Cheapest LLM APIs in 2025: a complete comparison
The price of frontier-quality LLM tokens has fallen by more than 90% over the last 18 months. In 2023, GPT-4 cost $30 per million input tokens. In 2025, you can get a model that beats that original GPT-4 on most benchmarks for under $0.15 per million input tokens.
This post is a practical, opinionated comparison of the cheapest production-grade APIs available right now.
What "cheap" actually means
When people say "cheap LLM API" they usually conflate three different things:
- Headline input price per 1M tokens. This is what the marketing pages show.
- Real cost on your workload, which depends on the input/output ratio. Most chat workloads are roughly 4:1 input-heavy. RAG workloads can be 20:1.
- Total cost of ownership, which includes retries on bad outputs, context caching, and how often you have to re-prompt because the model misunderstood you.
A model that costs 2x more per token but only needs one shot at a problem is often cheaper than a "cheap" model you have to call three times.
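To make that concrete, here is a minimal sketch of the arithmetic. The prices, token counts, and retry counts are hypothetical placeholders chosen only to illustrate the point, not quotes from any provider.

```python
def cost_per_call(input_tokens, output_tokens, input_price, output_price):
    """Dollar cost of one API call. Prices are USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cost_per_task(call_cost, calls_needed):
    """Dollar cost of getting one usable result, counting every attempt."""
    return call_cost * calls_needed


# Hypothetical workload: 2,000 input tokens and 500 output tokens per call.
# Hypothetical prices: the "pricey" model costs exactly 2x the "cheap" one.
cheap_call = cost_per_call(2000, 500, input_price=0.50, output_price=1.50)
pricey_call = cost_per_call(2000, 500, input_price=1.00, output_price=3.00)

# Assumption: the cheap model needs three attempts, the pricey one needs one.
cheap_task = cost_per_task(cheap_call, calls_needed=3)
pricey_task = cost_per_task(pricey_call, calls_needed=1)

print(f"cheap model:  ${cheap_task:.5f} per usable result")
print(f"pricey model: ${pricey_task:.5f} per usable result")
```

With these made-up numbers the cheap model ends up costing 1.5x as much per usable result. The break-even point is simply the price ratio: a model that costs 2x per call wins as soon as the cheaper one needs more than two attempts on average.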
The cheapest tier: under $0.40 / 1M input tokens
These are the models worth defaulting to for high-volume, latency-sensitive, or cost-sensitive workloads.
- DeepSeek V3: Currently the price-to-quality leader. General-purpose, strong coding, and absurdly cheap.
- GPT-4o mini: OpenAI's value pick. Slightly more expensive than DeepSeek on input, but with the most mature tooling, function calling, and structured outputs.
- Claude 3 Haiku: Anthropic's small model. Fast, friendly tone, very low hallucination rate on summarisation tasks.
- Gemini 1.5 Flash: Google's value pick. Insane 1M-token context window for the price.
If you are building a chatbot, a content-classification pipeline, or RAG over a small corpus, one of these four is almost certainly your answer.
The mid tier: $1 to $5 / 1M input tokens
This is where you go when the cheap tier starts producing visibly worse outputs on your specific task. In practice that's usually:
- Code generation beyond a few hundred lines
- Multi-step agentic workflows
- Long-document reasoning where you cannot afford to lose details
The models in this tier — Claude 3.5 Sonnet, Mistral Large, Llama 3.1 70B — are dramatically better at instruction-following and reasoning than the cheap tier, while still being 5–10x cheaper than the flagship tier.
The flagship tier: $10+ / 1M input tokens
GPT-4o, Claude 3 Opus, o1. Use these only when:
- The task is genuinely hard and the cheap tier has failed
- A single bad output costs more than the price of the call
- You are not the one absorbing the cost (i.e. it is passed through to paying end-users rather than eaten by you as the developer)
How to actually pick
A practical decision tree:
- Start at the cheap tier. Pick whichever of the four matches your stack (OpenAI tooling? GPT-4o mini. Anthropic ecosystem? Haiku. Already on Vertex? Flash. Cost-only? DeepSeek).
- Build the feature. Measure quality with a small eval set (50–100 examples is plenty).
- If quality is below your bar, climb one tier at a time. Don't skip from Haiku to Opus; try Sonnet first.
- If quality is at your bar, stay there. Don't pay for capability you aren't using.
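The steps above can be sketched as a small loop: walk the tiers from cheapest to most expensive and stop at the first one that clears your quality bar. This is only an illustration. The tier names, the scores, and the `fake_evaluate` function are hypothetical stand-ins for your own eval set and scoring logic.

```python
from typing import Callable, Optional, Sequence


def pick_cheapest_passing(
    models: Sequence[str],
    evaluate: Callable[[str], float],
    quality_bar: float,
) -> Optional[str]:
    """Return the first model (ordered cheapest first) whose eval score meets the bar.

    `evaluate` should run your eval set against the model and return a score
    in [0, 1], for example the fraction of examples judged acceptable.
    Returns None if no model clears the bar.
    """
    for model in models:
        if evaluate(model) >= quality_bar:
            return model
    return None


# Hypothetical tiers, ordered cheapest first, with made-up eval scores.
TIERS = ["cheap-tier", "mid-tier", "flagship"]
FAKE_SCORES = {"cheap-tier": 0.82, "mid-tier": 0.93, "flagship": 0.97}


def fake_evaluate(model: str) -> float:
    """Stand-in for running a 50-100 example eval set against `model`."""
    return FAKE_SCORES[model]


print(pick_cheapest_passing(TIERS, fake_evaluate, quality_bar=0.90))
```

Because the loop returns as soon as a tier passes, it never pays to evaluate (or deploy) a more expensive model than the task needs.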
A note on output prices
Output tokens are typically 3x to 5x more expensive than input tokens, and sometimes far more: the DeepSeek V3 rates listed at the end of this post work out to roughly 16x. If your responses are long (think long-form generation, not classification), output cost will dominate. Always model your real input:output ratio before locking in a provider.
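As a rough sketch of what modelling the ratio looks like, the snippet below computes a blended price per 1M tokens for the four cheap-tier models at different input:output ratios. The prices are the figures listed at the end of this post and may already be out of date; the `blended_price` and `cheapest_at` helpers are illustrative names, not part of any provider SDK.

```python
# (input price, output price) in USD per 1M tokens, as listed in this post.
# Assumption: these figures may be stale; check the official pricing pages.
PRICES = {
    "DeepSeek V3": (0.07, 1.10),
    "GPT-4o mini": (0.15, 0.60),
    "Claude 3 Haiku": (0.25, 1.25),
    "Gemini 1.5 Flash": (0.35, 1.05),
}


def blended_price(input_price, output_price, ratio):
    """Average USD per 1M tokens when you send `ratio` input tokens per output token."""
    return (ratio * input_price + output_price) / (ratio + 1)


def cheapest_at(ratio):
    """Name of the model with the lowest blended price at the given ratio."""
    return min(PRICES, key=lambda name: blended_price(*PRICES[name], ratio))


for ratio in (1, 4, 20):
    print(f"input:output = {ratio}:1")
    for name, (inp, out) in PRICES.items():
        print(f"  {name:<18} ${blended_price(inp, out, ratio):.3f} per 1M tokens")
    print(f"  cheapest: {cheapest_at(ratio)}")
```

With these listed prices the ranking flips depending on the workload: at a chat-like 4:1 ratio GPT-4o mini has the lowest blended price, while at a RAG-like 20:1 ratio DeepSeek V3 does. The headline input price alone would not have told you that.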
Updated daily
Prices change often. The comparison table on the homepage is updated every day from the official pricing pages. Bookmark it.
Related models (input / output price per 1M tokens)
- DeepSeek · DeepSeek V3: $0.07 / $1.10
- OpenAI · GPT-4o mini: $0.15 / $0.60
- Anthropic · Claude 3 Haiku: $0.25 / $1.25
- Google · Gemini 1.5 Flash: $0.35 / $1.05