Cheapest LLM APIs in 2025: a complete comparison

Published February 12, 2025 · 3 min read

The price of frontier-quality LLM tokens has fallen by more than 90% over the last 18 months. In 2023 you paid $30 per million input tokens to use GPT-4. In 2025, you can get a model that beats the original GPT-4 on most benchmarks for under $0.15 per million input tokens.

This post is a practical, opinionated comparison of the cheapest production-grade APIs available right now.

What "cheap" actually means

When people say "cheap LLM API" they usually conflate three different things:

Headline input price per 1M tokens. This is what the marketing pages show.
Real cost on your workload, which depends on the input/output ratio. Most chat workloads are roughly 4:1 input-heavy. RAG workloads can be 20:1.
Total cost of ownership, which includes retries on bad outputs, context caching, and how often you have to re-prompt because the model misunderstood you.
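
To make the second item concrete, here is a minimal sketch of a blended price per 1M tokens for a given input:output ratio. The function name and the $0.15 / $0.60 prices are placeholders chosen for illustration, not quotes from any provider.

```python
def blended_price_per_million(input_price: float, output_price: float,
                              input_output_ratio: float) -> float:
    """Average price per 1M tokens when input:output = input_output_ratio:1."""
    if input_output_ratio <= 0:
        raise ValueError("input_output_ratio must be positive")
    input_share = input_output_ratio / (input_output_ratio + 1)
    output_share = 1 - input_share
    return input_price * input_share + output_price * output_share


# Hypothetical model: $0.15 input and $0.60 output per 1M tokens.
chat = blended_price_per_million(0.15, 0.60, 4)    # 4:1 chat workload
rag = blended_price_per_million(0.15, 0.60, 20)    # 20:1 RAG workload
print(f"chat: ${chat:.3f} per 1M tokens, rag: ${rag:.3f} per 1M tokens")
```

The same model is noticeably cheaper per token on the RAG workload than on the chat workload, because far fewer of its tokens are billed at the output rate.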

A model that costs 2x more per token but only needs one shot at a problem is often cheaper than a "cheap" model you have to call three times.
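
A rough way to check this for your own workload is to divide the cost of one call by the fraction of calls that produce a usable answer. The sketch below assumes you retry until you succeed and that each attempt is independent, which is a simplification; the costs are relative units, not real prices.

```python
def expected_cost_per_success(cost_per_call: float, success_rate: float) -> float:
    """Expected spend to get one usable output, retrying until success."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    # With independent attempts, the expected number of calls is 1 / success_rate.
    return cost_per_call / success_rate


# Illustrative numbers only: the "cheap" model needs three calls on average,
# the model that costs 2x per call gets it right the first time.
cheap = expected_cost_per_success(cost_per_call=1.0, success_rate=1 / 3)
strong = expected_cost_per_success(cost_per_call=2.0, success_rate=1.0)
print(f"cheap: {cheap:.2f} units, strong: {strong:.2f} units")
```

Under these assumptions the model with double the per-call price ends up about a third cheaper per usable answer.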

The cheapest tier: under $0.30 / 1M input tokens

These are the models worth defaulting to for high-volume, latency-sensitive, or cost-sensitive workloads.

DeepSeek V3: Currently the price-to-quality leader. General-purpose, strong coding, and absurdly cheap.
GPT-4o mini: OpenAI's value pick. Slightly more expensive than DeepSeek, but with the most mature tooling, function calling, and structured outputs.
Claude 3 Haiku: Anthropic's small model. Fast, friendly tone, very low hallucination rate on summarisation tasks.
Gemini 1.5 Flash: Google's value pick. Insane 1M-token context window for the price.

If you are building a chatbot, a content-classification pipeline, or RAG over a small corpus, one of these four is almost certainly your answer.

The mid tier: $1 to $5 / 1M input tokens

This is where you go when the cheap tier starts producing visibly worse outputs on your specific task. In practice that's usually:

Code generation beyond a few hundred lines
Multi-step agentic workflows
Long-document reasoning where you cannot afford to lose details

The models in this tier — Claude 3.5 Sonnet, Mistral Large, Llama 3.1 70B — are dramatically better at instruction-following and reasoning than the cheap tier, while still being 5–10x cheaper than the flagship tier.

The flagship tier: $10+ / 1M input tokens

Claude 3 Opus, o1, and, on capability, GPT-4o (whose list input price has been below $10 per 1M tokens since mid-2024). Use these only when:

The task is genuinely hard and the cheap tier has failed
A single bad output costs more than the price of the call
The cost is passed through to your end users rather than absorbed by you (for example, with usage-based billing)
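
The second condition can be turned into a quick break-even check: compare the call price plus the expected cost of bad outputs for each model. This is a sketch with made-up prices and error rates; `bad_output_cost` stands for whatever a failure costs you (support time, refunds, a lost customer) and is an assumption you have to estimate yourself.

```python
def flagship_pays_off(cheap_price: float, flagship_price: float,
                      cheap_error_rate: float, flagship_error_rate: float,
                      bad_output_cost: float) -> bool:
    """True if the flagship's expected total cost per call is lower."""
    cheap_total = cheap_price + cheap_error_rate * bad_output_cost
    flagship_total = flagship_price + flagship_error_rate * bad_output_cost
    return flagship_total < cheap_total


# Hypothetical per-call prices ($0.001 vs $0.05) and error rates (5% vs 1%).
expensive_mistakes = flagship_pays_off(0.001, 0.05, 0.05, 0.01, bad_output_cost=5.00)
trivial_mistakes = flagship_pays_off(0.001, 0.05, 0.05, 0.01, bad_output_cost=0.10)
print(expensive_mistakes, trivial_mistakes)
```

With these numbers the flagship wins when a bad output costs $5 and loses when it costs 10 cents, which is the trade-off the list describes.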

How to actually pick

A practical decision tree:

Start at the cheap tier. Pick whichever of the four matches your stack (OpenAI tooling? GPT-4o mini. Anthropic ecosystem? Haiku. Already on Vertex? Flash. Cost-only? DeepSeek).
Build the feature. Measure quality with a small eval set (50–100 examples is plenty).
If quality is below your bar, climb the tier. Don't skip from Haiku to Opus — try Sonnet first.
If quality is at your bar, stay there. Don't pay for capability you aren't using.
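
The steps above can be sketched as a loop that walks the tiers from cheapest to most expensive and stops at the first model that clears your quality bar. The model names, the scores, and the `evaluate` callable are all hypothetical stand-ins; in practice `evaluate` would run your own eval set against the model and return a score.

```python
from typing import Callable, Optional, Sequence


def pick_cheapest_passing(
    models_by_price: Sequence[str],
    evaluate: Callable[[str], float],
    quality_bar: float,
) -> Optional[str]:
    """Return the first (cheapest) model whose eval score meets the bar."""
    for model in models_by_price:  # ordered cheapest to most expensive
        if evaluate(model) >= quality_bar:
            return model
    return None  # nothing passed: revisit the prompt, the eval set, or the bar


# Stand-in scores instead of real API calls, listed cheapest first.
fake_scores = {"cheap-model": 0.72, "mid-model": 0.88, "flagship-model": 0.93}
choice = pick_cheapest_passing(list(fake_scores), fake_scores.__getitem__, 0.85)
print(choice)
```

Because the loop returns as soon as one model passes, it never evaluates (or pays for) a tier above the one you need.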

A note on output prices

Output tokens typically cost 3x to 5x more than input tokens across the major providers. If your responses are long (think long-form generation, not classification), output cost will dominate. Always model your real input:output ratio before locking in a provider.
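
To see when output cost dominates, you can compute the share of the bill that goes to output tokens. If a workload has r input tokens per output token and output costs m times as much as input, the output share is m / (r + m). The sketch below assumes a 4x output multiple, which sits inside the 3x to 5x range above; the function name and workload labels are illustrative.

```python
def output_cost_share(input_output_ratio: float, output_price_multiple: float) -> float:
    """Fraction of total token spend that goes to output tokens."""
    if input_output_ratio < 0 or output_price_multiple <= 0:
        raise ValueError("ratio must be >= 0 and multiple must be > 0")
    return output_price_multiple / (input_output_ratio + output_price_multiple)


# Input tokens per output token for three illustrative workloads.
workloads = {"RAG (20:1)": 20, "chat (4:1)": 4, "long-form (1:4)": 0.25}
shares = {name: output_cost_share(ratio, 4) for name, ratio in workloads.items()}
for name, share in shares.items():
    print(f"{name}: {share:.0%} of spend is output")
```

At a 4x multiple, a 4:1 chat workload splits its spend evenly between input and output, while a workload that writes four tokens for every token it reads spends almost everything on output.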

Updated daily

Prices change often. The comparison table on the homepage is updated every day from the official pricing pages. Bookmark it.