AI Chatbot Cost Simulator

Building a customer-support bot, an in-app assistant, or an internal AI tool? Estimate the monthly model bill from your real usage shape — and see how caching and model choice change it.

Input = system prompt + retrieved docs + history. A stable system prompt and knowledge base make 40–70% of input tokens cacheable, and cached tokens bill at roughly 10% of the standard input price.
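To make the caching arithmetic concrete, here is a minimal sketch of the input-side cost of one request. The function name, the 4,000-token request, and the $1.00 per million tokens price are illustrative assumptions, not figures from the simulator. The 10% cached rate is the approximate figure quoted above, and the sketch ignores any cache-write surcharge a provider may apply.

```python
def input_cost_per_request(
    input_tokens: int,
    cacheable_fraction: float,
    price_per_mtok: float,
    cached_price_ratio: float = 0.10,
) -> float:
    """Dollar cost of the input side of one request with prompt caching.

    cacheable_fraction: share of input tokens served from cache (0 to 1).
    price_per_mtok: standard input price per million tokens (placeholder value).
    cached_price_ratio: cached tokens bill at this share of the standard price.
    """
    if not 0.0 <= cacheable_fraction <= 1.0:
        raise ValueError("cacheable_fraction must be between 0 and 1")
    cached = input_tokens * cacheable_fraction
    uncached = input_tokens - cached
    # Cached tokens are discounted; uncached tokens pay full price.
    effective_tokens = uncached + cached * cached_price_ratio
    return effective_tokens * price_per_mtok / 1_000_000


# Hypothetical request: 4,000 input tokens at a placeholder $1.00 per million.
no_cache = input_cost_per_request(4_000, 0.0, 1.00)
half_cached = input_cost_per_request(4_000, 0.5, 1.00)
print(f"uncached: ${no_cache:.6f}  50% cacheable: ${half_cached:.6f}")
print(f"input-side saving: {1 - half_cached / no_cache:.0%}")
```

Note that this covers input tokens only. Output tokens are never cached, so the saving on the total bill is smaller than the input-side saving.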

Cost at your usage, across models

Model | Monthly cost | Per user | Per 1,000 messages

Standard-tier prices, July 2026 — see the AI price tracker. Batch processing (~50% off) applies to non-real-time workloads, not live chat.

The three levers that dominate chatbot cost

  1. Model tier. The spread between a budget and a frontier model is 25–100× per token. Most support bots route 80% of traffic to a cheap model and escalate the hard 20%.
  2. Prompt caching. Your system prompt and knowledge base are identical on every request — cached input bills at ~10%. For RAG bots this is routinely a 40–60% total saving.
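  3. History management. Re-sending the whole conversation every turn makes long chats quadratically expensive: each new turn resends all earlier turns, so total input tokens grow roughly with the square of the conversation length. Summarize or truncate history beyond ~10 turns.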
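The routing and history levers reduce to simple arithmetic. The sketch below is a rough illustration under stated assumptions: the function names, the 50x price spread, and the token counts (1,000 fixed tokens, 200 tokens added per turn, 50 tokens per new message) are hypothetical placeholders. Truncation is modeled as simply dropping turns older than the window.

```python
from typing import Optional


def blended_price(cheap_price: float, frontier_price: float,
                  escalation_rate: float = 0.20) -> float:
    """Average per-token price when `escalation_rate` of traffic is escalated."""
    return (1 - escalation_rate) * cheap_price + escalation_rate * frontier_price


def cumulative_input_tokens(turns: int, fixed_tokens: int, tokens_per_turn: int,
                            message_tokens: int,
                            max_history_turns: Optional[int] = None) -> int:
    """Total input tokens billed over a conversation of `turns` turns.

    Each request carries the fixed prefix (system prompt plus retrieved docs),
    the earlier turns as history, and the new user message.
    """
    total = 0
    for turn in range(1, turns + 1):
        history_turns = turn - 1
        if max_history_turns is not None:
            # Keep only the most recent turns; older ones are dropped.
            history_turns = min(history_turns, max_history_turns)
        total += fixed_tokens + history_turns * tokens_per_turn + message_tokens
    return total


# Placeholder prices with a 50x spread, 80/20 routing as described above.
print("blended price:", blended_price(1.0, 50.0))

for turns in (10, 20, 40):
    full = cumulative_input_tokens(turns, 1_000, 200, 50)
    capped = cumulative_input_tokens(turns, 1_000, 200, 50, max_history_turns=10)
    print(f"{turns} turns: full history {full:,} tokens, 10-turn window {capped:,} tokens")
```

With the full history, doubling the conversation length roughly quadruples the history portion of the bill. With a fixed window, growth becomes linear once the window is full. Summarizing instead of dropping turns would add the summary's tokens to each request, which this sketch does not model.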

Sanity benchmarks

Product shape | Typical model cost
FAQ/support bot, budget model, cached | $0.001–0.01 per conversation
General assistant, flagship model | $0.01–0.05 per conversation
Agent doing multi-step work | $0.10–1.00+ per task

Rule of thumb: if model cost per user exceeds ~10% of revenue per user, revisit routing and caching.
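The rule of thumb can be written as a one-line check. This is a small sketch only: the function names and the example dollar figures are hypothetical, and the 10% threshold is the approximate figure quoted above.

```python
def cost_share_of_revenue(model_cost_per_user: float, revenue_per_user: float) -> float:
    """Model cost per user as a fraction of revenue per user."""
    if revenue_per_user <= 0:
        raise ValueError("revenue_per_user must be positive")
    return model_cost_per_user / revenue_per_user


def should_revisit_routing_and_caching(model_cost_per_user: float,
                                       revenue_per_user: float,
                                       threshold: float = 0.10) -> bool:
    """True when model cost exceeds the threshold share of revenue."""
    return cost_share_of_revenue(model_cost_per_user, revenue_per_user) > threshold


# Hypothetical product earning $6.00 per user per month.
for cost in (0.30, 0.90):
    share = cost_share_of_revenue(cost, 6.00)
    flag = should_revisit_routing_and_caching(cost, 6.00)
    print(f"model cost ${cost:.2f}/user -> {share:.0%} of revenue, revisit: {flag}")
```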

Frequently asked questions

What usage shape should I assume before launch?

A common planning baseline for consumer products: 10–20% of signups become monthly active, active users send 5–20 messages/month, and support-style conversations average 3–6 turns. Model with your best guess, then re-simulate with real data after two weeks.
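One way to turn that baseline into a planning range is to compute a low and a high scenario. In the sketch below, the 10,000 signups and the $2.00 per 1,000 messages cost are hypothetical placeholders. Only the activation and message ranges come from the baseline above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageShape:
    signups: int
    active_rate: float          # share of signups active in a month (baseline: 0.10 to 0.20)
    messages_per_active: float  # messages per active user per month (baseline: 5 to 20)

    def monthly_messages(self) -> float:
        return self.signups * self.active_rate * self.messages_per_active

    def monthly_cost(self, cost_per_1k_messages: float) -> float:
        """Monthly model bill given a cost per 1,000 messages (placeholder input)."""
        return self.monthly_messages() / 1_000 * cost_per_1k_messages


# Hypothetical product with 10,000 signups.
low = UsageShape(signups=10_000, active_rate=0.10, messages_per_active=5)
high = UsageShape(signups=10_000, active_rate=0.20, messages_per_active=20)

for label, shape in (("low", low), ("high", high)):
    print(f"{label}: {shape.monthly_messages():,.0f} messages/month, "
          f"${shape.monthly_cost(2.00):,.2f}/month at a placeholder $2.00 per 1,000")
```

The two ends of the baseline differ by 8x in message volume, which is why re-running the estimate with real data matters more than refining the initial guess.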

Does this include infrastructure costs?

No — this is the model (LLM API) bill only. Hosting a thin chat backend typically adds $5–50/month at small scale; vector databases for RAG add more.

How do rate limits affect scale?

Entry API tiers allow thousands of requests/minute — enough for most products. Past that, providers raise limits with usage history or a sales conversation, not extra fees.

Should I use batch pricing?

Not for live chat (batch takes up to 24h). Use it for the offline parts: nightly summarization, embedding generation, analytics — those get ~50% off.
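As a rough sketch of what that split is worth, the function below applies the batch discount only to the offline share of the bill. The function name and the dollar amounts are hypothetical, and the 50% discount is the approximate figure quoted above.

```python
def monthly_cost_with_batch(live_cost: float, offline_cost: float,
                            batch_discount: float = 0.50) -> float:
    """Monthly bill when only the offline workload moves to batch pricing.

    live_cost: standard-price cost of real-time chat traffic (not batchable).
    offline_cost: standard-price cost of summarization, embeddings, analytics.
    """
    if not 0.0 <= batch_discount <= 1.0:
        raise ValueError("batch_discount must be between 0 and 1")
    return live_cost + offline_cost * (1 - batch_discount)


# Hypothetical bill: $300 of live chat plus $100 of offline jobs.
before = 300.0 + 100.0
after = monthly_cost_with_batch(300.0, 100.0)
print(f"before: ${before:.2f}  after: ${after:.2f}  saving: {1 - after / before:.1%}")
```

Because live chat usually dominates a chatbot's bill, the overall saving from batch pricing tends to be much smaller than the headline discount.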