
AI API Cost Calculator

Estimate what your LLM usage will cost across OpenAI (GPT), Anthropic (Claude), and Google (Gemini) — and compare every major model side by side. Pricing last updated July 2026.

Rule of thumb: 1,000 tokens ≈ 750 English words. A typical chat message is 50–300 tokens; a long document can be 10,000+.

Full pricing comparison (per 1 million tokens)

Sorted from cheapest to most expensive output price. The "Cost / 1,000 requests" column is a blended figure calculated from the inputs you enter above once you press Calculate.

Model | Provider | Input $/1M | Output $/1M | Cost / 1,000 requests

Prices are standard-tier, short-context list prices as of July 2026 and change often — always confirm on the provider's official pricing page before committing. Batch processing (up to 50% off) and prompt caching (up to 90% off repeated input) can substantially lower real-world costs.

Want details on one model? See the per-model breakdowns in our AI pricing directory.

How LLM API pricing works

Every major AI provider bills the same way: you pay per token, with separate rates for input (what you send — your prompt, documents, conversation history) and output (what the model writes back). Output tokens typically cost 3–6× more than input tokens.
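The billing rule above reduces to a two-term formula. Here is a minimal sketch in Python, assuming placeholder rates of $1.00 per 1M input tokens and $5.00 per 1M output tokens. These are illustrative numbers chosen to sit inside the 3–6× range, not any provider's actual prices, and the function name `request_cost` is hypothetical.

```python
def request_cost(input_tokens, output_tokens, input_rate_per_m, output_rate_per_m):
    """Return the dollar cost of one request.

    Rates are in dollars per 1 million tokens, matching the table's units.
    """
    input_cost = input_tokens / 1_000_000 * input_rate_per_m
    output_cost = output_tokens / 1_000_000 * output_rate_per_m
    return input_cost + output_cost


# Placeholder rates (assumed for illustration, not real list prices).
INPUT_RATE = 1.00   # $ per 1M input tokens
OUTPUT_RATE = 5.00  # $ per 1M output tokens

# One request with 1,000 input tokens and 400 output tokens.
per_request = request_cost(1_000, 400, INPUT_RATE, OUTPUT_RATE)
per_thousand_requests = per_request * 1_000

print(f"per request: ${per_request:.4f}")
print(f"per 1,000 requests: ${per_thousand_requests:.2f}")
```

With these assumed rates, output makes up two thirds of the cost even though it is under a third of the tokens.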

A token is roughly ¾ of an English word. "The quick brown fox" is 4 tokens; this paragraph is about 60. Code, non-English languages, and unusual formatting usually consume more tokens per word.

Three ways to cut your AI bill

  1. Prompt caching. If every request repeats the same long system prompt or document, providers charge cached input at ~10% of the normal rate. For agents and RAG apps this is often the single biggest saving.
  2. Batch processing. Work that can wait up to 24 hours (classification, embeddings, bulk generation) gets ~50% off at OpenAI, Anthropic, and Google.
  3. Model routing. Don't send every request to a flagship model. Route simple tasks (classification, extraction, short answers) to a budget model like Gemini 2.5 Flash-Lite or GPT-5.4 Nano — often 25–100× cheaper per token than flagships.
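To get a feel for how the three levers compare, here is a rough sketch that applies each one separately to the same workload. Everything in it is an assumption for illustration: the workload size, the flagship and budget rates, the 80% cacheable share of each prompt, and the 70/30 routing split. It also ignores any surcharge a provider may apply when writing to the cache, and it does not stack the discounts, since whether they combine varies by provider.

```python
def monthly_cost(requests, input_tokens, output_tokens, input_rate, output_rate,
                 cached_fraction=0.0, cache_multiplier=0.10, batch_discount=0.0):
    """Estimate a monthly bill in dollars; rates are $ per 1M tokens.

    cached_fraction:  share of each request's input served from the cache.
    cache_multiplier: cached input is billed at this fraction of the input rate.
    batch_discount:   fraction taken off the whole bill (0.5 for ~50% off).
    """
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    input_cost = (fresh + cached * cache_multiplier) / 1_000_000 * input_rate
    output_cost = output_tokens / 1_000_000 * output_rate
    return requests * (input_cost + output_cost) * (1 - batch_discount)


# Hypothetical workload and placeholder rates (not real list prices).
REQUESTS = 100_000
INPUT_TOKENS = 5_000      # of which 4,000 is a repeated system prompt
OUTPUT_TOKENS = 300
FLAGSHIP = (5.00, 25.00)  # (input, output) $ per 1M tokens, assumed
BUDGET = (0.10, 0.40)     # (input, output) $ per 1M tokens, assumed

baseline = monthly_cost(REQUESTS, INPUT_TOKENS, OUTPUT_TOKENS, *FLAGSHIP)
with_caching = monthly_cost(REQUESTS, INPUT_TOKENS, OUTPUT_TOKENS, *FLAGSHIP,
                            cached_fraction=0.8)
with_batch = monthly_cost(REQUESTS, INPUT_TOKENS, OUTPUT_TOKENS, *FLAGSHIP,
                          batch_discount=0.5)
# Routing: send 70% of requests to the budget model, 30% to the flagship.
with_routing = (monthly_cost(30_000, INPUT_TOKENS, OUTPUT_TOKENS, *FLAGSHIP)
                + monthly_cost(70_000, INPUT_TOKENS, OUTPUT_TOKENS, *BUDGET))

for label, cost in [("baseline", baseline), ("caching", with_caching),
                    ("batch", with_batch), ("routing", with_routing)]:
    print(f"{label:>8}: ${cost:,.2f}")
```

Under these assumptions the baseline is $3,250, caching brings it to $1,450, batching to $1,625, and routing to about $1,018. The ranking will shift with your own input/output mix and the share of traffic a budget model can handle.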

API vs. a chat subscription — which is cheaper?

A ChatGPT, Claude, or Gemini consumer subscription runs about $20/month flat. Using the calculator above: a heavy personal user sending ~100 messages a day (≈3,000 requests/month at ~1,000 input + 400 output tokens each, or about 3M input and 1.2M output tokens a month) pays roughly $2–8/month on a mid-tier model via the API. That is often cheaper than the subscription, but it comes without the polished app, and the bill grows with every call. For teams building products, the API is the only option; for casual personal use, subscriptions are simpler.
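The comparison can be reproduced with a few lines of arithmetic. This sketch uses the usage profile from the paragraph above; the three rate pairs are hypothetical stand-ins for budget, mid-tier, and flagship pricing, not quotes from any provider.

```python
SUBSCRIPTION = 20.00        # $ per month, flat
REQUESTS_PER_MONTH = 3_000  # ~100 messages a day
INPUT_TOKENS = 1_000        # per request
OUTPUT_TOKENS = 400         # per request


def api_monthly_cost(input_rate, output_rate):
    """Monthly API cost in dollars; rates are $ per 1M tokens."""
    input_millions = REQUESTS_PER_MONTH * INPUT_TOKENS / 1_000_000    # 3.0M tokens
    output_millions = REQUESTS_PER_MONTH * OUTPUT_TOKENS / 1_000_000  # 1.2M tokens
    return input_millions * input_rate + output_millions * output_rate


# Hypothetical (input, output) rates in $ per 1M tokens.
HYPOTHETICAL_TIERS = {
    "budget": (0.25, 1.25),
    "mid-tier": (1.00, 4.00),
    "flagship": (5.00, 25.00),
}

for name, (input_rate, output_rate) in HYPOTHETICAL_TIERS.items():
    cost = api_monthly_cost(input_rate, output_rate)
    verdict = "cheaper" if cost < SUBSCRIPTION else "more expensive"
    print(f"{name:>9}: ${cost:6.2f}/month ({verdict} than the subscription)")
```

With these assumed rates the budget and mid-tier cases land at $2.25 and $7.80 a month, while the flagship case comes to $45.00 and exceeds the flat subscription.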

Frequently asked questions

How many tokens is my prompt?

Roughly: words × 1.33 for English prose. For an exact count, use the tokenizer tool that each provider ships. For a quick estimate, paste your text into our word counter and multiply the word count by 1.33.
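The word-count estimate is easy to script. This is a minimal sketch of the 1.33 rule only; the function name `estimate_tokens` is made up for illustration, and it splits on whitespace rather than running a real tokenizer, so treat the result as a ballpark figure for English prose.

```python
def estimate_tokens(text, tokens_per_word=1.33):
    """Approximate the token count of English prose from its word count."""
    words = len(text.split())  # whitespace-separated words
    return round(words * tokens_per_word)


# A stand-in for a 1,500-word article.
article = "word " * 1_500
print(estimate_tokens(article))
```

Code, non-English text, and heavy formatting will usually come out higher than this estimate, so leave some headroom when budgeting.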

Do I pay for the conversation history?

Yes — chat APIs are stateless, so the whole conversation is re-sent (and billed as input) on every turn. Long chats get progressively more expensive; caching mitigates this.
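To see how quickly re-sent history adds up, here is a small sketch that totals the input tokens billed over a conversation. It assumes every user message and every reply has a fixed size (100 and 300 tokens here, both made-up figures) and that no caching or history truncation is applied.

```python
def conversation_input_tokens(turns, user_tokens, reply_tokens, system_tokens=0):
    """Total input tokens billed when the full history is re-sent on each turn."""
    total = 0
    history = system_tokens
    for _ in range(turns):
        history += user_tokens   # the new user message joins the history
        total += history         # the whole history is billed as input this turn
        history += reply_tokens  # the reply is re-sent on all later turns
    return total


ten_turns = conversation_input_tokens(10, user_tokens=100, reply_tokens=300)
twenty_turns = conversation_input_tokens(20, user_tokens=100, reply_tokens=300)

print(ten_turns)     # 19000
print(twenty_turns)  # 78000
```

Doubling the length of the chat roughly quadruples the billed input, because each turn pays again for everything that came before it.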

Why is output so much more expensive than input?

Output tokens are generated one at a time, and each one requires its own full forward pass through the model, whereas all input tokens are processed together in parallel. That compute asymmetry shows up in the price.

Are there free tiers?

Google's Gemini API has a meaningful free tier for low volumes. OpenAI and Anthropic offer trial credits. All are rate-limited and intended for development, not production.