Every major LLM provider publishes a per-token rate. None of them give you the real cost of running a workload. This calculator models the actual bill across 16 production models from Anthropic, OpenAI, Google, Meta, and DeepSeek — accounting for the things vendor pricing pages omit:
Tokenizer differences — Opus 4.7 produces 1.0–1.46× as many tokens as Opus 4.6 for the same English text
Prompt caching — cached input tokens cost 90% less, and stable prompt prefixes often account for 60–90% of production prompt tokens
Batch API — 50% off async workloads, ignored by most calculators
Paste your actual prompt. Set your real volume. See the real cost across all models in one comparison table.
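The three adjustments above combine into a simple cost model. A minimal sketch, where the rates, the 90% cache discount, and the 50% batch discount come from the text but the example volumes and the 80% cached share are illustrative assumptions:

```python
# Sketch of the cost model described above. Rates and discounts are
# taken from the text; the example workload numbers are assumptions.

def workload_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate: float,             # $ per million input tokens
    output_rate: float,            # $ per million output tokens
    cached_fraction: float = 0.0,  # share of input tokens served from cache
    cache_discount: float = 0.9,   # cached input costs 90% less
    batch: bool = False,           # Batch API: 50% off the whole job
) -> float:
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    cost = (
        fresh * input_rate
        + cached * input_rate * (1 - cache_discount)
        + output_tokens * output_rate
    ) / 1_000_000
    return cost * (0.5 if batch else 1.0)

# Example: 10M input / 2M output tokens at $3/$15,
# with 80% of input cached and the Batch API enabled.
print(round(workload_cost(10_000_000, 2_000_000, 3, 15, 0.8, batch=True), 2))
# → 19.2  (vs $60.00 with no caching and no batching)
```

The same workload priced naively (input × rate + output × rate) comes to $60, so the discounts the vendor pricing pages omit cut this bill by more than two thirds.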
Frequently asked questions
What's the cheapest LLM?
DeepSeek V3.2 at $0.14/$0.28 per million tokens — but it's a smaller model. Among flagships, Claude Sonnet 4.6 at $3/$15 offers the best capability-per-dollar.
What's the most expensive LLM?
GPT-5.5 at $5/$30 per million tokens (output dominates the bill), followed by Claude Opus 4.7 at $5/$25.
How does prompt caching change the math?
Dramatically. For production workloads with stable system prompts (60–90% of tokens cacheable), caching can cut input costs by 80%+.
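The arithmetic behind that claim, as a quick sketch. The 90% discount comes from the text; the $3/M rate and the 90%-cacheable share (the top of the stated 60–90% range) are example assumptions:

```python
# Illustrative caching arithmetic. Rate and cacheable share are
# example assumptions; the 90% cache discount is from the text.
rate = 3.0        # $ per million input tokens (example flagship rate)
cacheable = 0.90  # share of input tokens served from the cache
discount = 0.90   # cached tokens cost 90% less

full_price = rate  # $/M input with no caching
with_cache = rate * ((1 - cacheable) + cacheable * (1 - discount))
reduction = 1 - with_cache / full_price
print(f"${with_cache:.2f}/M vs ${full_price:.2f}/M -> {reduction:.0%} cheaper")
```

At 90% cacheable you pay full price on 10% of tokens and one-tenth price on the rest, an effective 81% cut; at the low end of the range (60% cacheable) the cut is still 54%.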
What's the Batch API?
OpenAI, Anthropic, and Google all offer 50% off for async batch processing (results within 24 hours). Massively underused.
Which LLM pricing calculator is most accurate?
We use the vendors' official tokenizers (tiktoken for OpenAI, Anthropic's count_tokens API, Google's countTokens API) instead of character approximations. For Llama and DeepSeek, no free count API exists, so we use calibrated character approximations and flag those as estimated.
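For models without a free counting API, the fallback described above can be sketched in a few lines. The chars-per-token ratio here is a hypothetical calibration constant for English prose, not a vendor figure:

```python
# Sketch of the calibrated character-approximation fallback mentioned
# above. CHARS_PER_TOKEN is an assumed calibration, not a vendor value;
# results from this path should be flagged as estimates.
import math

CHARS_PER_TOKEN = 3.8  # assumed calibration for English prose

def estimate_tokens(text: str) -> int:
    """Rough token count from character length; flag as estimated."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

print(estimate_tokens("Every major LLM provider publishes a per-token rate."))
# → 14
```

A character-based estimate drifts for code, non-English text, or heavy punctuation, which is exactly why counts from this path are flagged as estimated rather than exact.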