Are open-source AI models cheaper than GPT and Claude?

Substantially, yes. The strongest open-weight models run roughly $0.14 to $1.75 per million input tokens and $0.28 to $4.40 per million output tokens. The closed frontier runs about $3 to $5 per million input and $15 to $30 per million output. On output tokens, which dominate most bills, closed frontier models cost roughly five to fifty times more than the cheapest capable open models. The gap is real, but it is not the whole story, because the closed frontier still leads on the hardest reasoning tasks.

Are Chinese open models like DeepSeek, Qwen, and Kimi actually cheaper?

Yes, and they are also the reason the open tier is competitive at all. DeepSeek, Alibaba's Qwen, Zhipu's GLM, Moonshot's Kimi, and MiniMax have shipped open-weight models that land within striking distance of the closed frontier on many tasks while pricing at a fraction of it. DeepSeek V4 Flash, for example, runs about $0.14 input and $0.28 output per million tokens, roughly a hundredth of a closed frontier flagship on output.

How much cheaper are open models than closed models?

On a per-token basis, five to fifty times cheaper for output, depending on which models you compare. A workload that costs $30 per million output tokens on a closed flagship can cost $0.28 to $4.40 on a strong open model. The practical saving depends on your mix of input and output tokens and how much prompt caching you get, but for output-heavy work the difference is large enough to change what is economically possible.

Are open models good enough to replace closed frontier models?

For a large and growing class of work, yes. In our own coding benchmark an open-weight model won a task outright against the closed frontier. But open models are more task-sensitive, so results vary more run to run, and the very hardest reasoning and long-horizon agent tasks still favor the closed frontier. The right answer is rarely all-open or all-closed. It is matching the model to the task.

Why are open models so much cheaper?

Three reasons. The weights are public, so many providers compete to host the same model and drive the hosting price toward the cost of compute. The models are often smaller or more efficient for a given capability. And there is no proprietary-model margin to recover. Closed frontier prices carry the cost of training runs and a margin that open-weight hosting does not.

Open vs closed AI models: what they actually cost in 2026

The strongest open-weight models now cost a fraction of the closed frontier, often five to fifty times less per token, and several of them come from Chinese labs. Here is the real per-million-token math on both sides, why the gap exists, and where paying for a closed model still makes sense.

July 5, 2026 · 8 min read

Six months ago the question we heard most from finance and engineering teams was "which model is best." Today it is increasingly "which model is best for the price," and that shift is being driven by a wave of open-weight models, many of them from Chinese labs, that are good enough to take seriously and cheap enough to change the math.

If you have searched for what DeepSeek, Qwen, GLM, Kimi, or MiniMax cost against GPT, Claude, or Gemini, you have probably found a mess of conflicting numbers. This piece lays out the real per-token pricing on both sides as of mid-2026, explains why the gap is as wide as it is, and is honest about where paying for a closed model still earns its keep.

How model pricing actually works

Model APIs bill per token, and almost always at two different rates: one for input tokens (the prompt you send, including any documents, system instructions, and prior conversation) and a higher one for output tokens (what the model generates). A token is roughly three-quarters of a word. Prices are quoted per million tokens.

The split matters because the ratio of input to output in your workload determines which price dominates your bill. A summarization job that reads a long document and writes a short answer is input-heavy. An agent that reads a little and writes a lot of code or analysis is output-heavy, and output is the more expensive side everywhere. Many providers also offer a discounted cached input rate for prompt content they can reuse across calls, which can cut the input side sharply for repeated work.

Keep that in mind reading the tables below: the output column is usually where the real money is.
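The billing rule described above can be sketched as a short calculation. This is a minimal illustration, not any provider's actual billing code: the `Rates` class, the `request_cost` function, the token counts, and the $0.30 cached rate are all assumptions made up for the example. Only the $3.00 / $15.00 pair is a rate quoted in this piece.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rates:
    """Prices in USD per million tokens (hypothetical container for this sketch)."""
    input_per_m: float
    output_per_m: float
    cached_input_per_m: Optional[float] = None  # not every provider offers one


def request_cost(rates: Rates, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Cost in USD of one call; cached_tokens is the part of the input billed at the cached rate."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    cached_rate = (rates.cached_input_per_m
                   if rates.cached_input_per_m is not None
                   else rates.input_per_m)
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * rates.input_per_m
             + cached_tokens * cached_rate
             + output_tokens * rates.output_per_m)
    return total / 1_000_000


# $3.00 in / $15.00 out; the $0.30 cached rate is an assumed value for illustration.
rates = Rates(input_per_m=3.00, output_per_m=15.00, cached_input_per_m=0.30)

# Input-heavy: read a long document, write a short summary.
summarize = request_cost(rates, input_tokens=20_000, output_tokens=500)
# Output-heavy: read a little, generate a lot.
agent = request_cost(rates, input_tokens=2_000, output_tokens=8_000)
# Same summarization call with most of the prompt served from cache.
summarize_cached = request_cost(rates, 20_000, 500, cached_tokens=18_000)

print(f"summarize:        ${summarize:.4f}")         # $0.0675
print(f"agent:            ${agent:.4f}")             # $0.1260
print(f"summarize cached: ${summarize_cached:.4f}")  # $0.0189
```

In the summarization call output is about 11% of the cost; in the agent call it is about 95%, which is why the output column tends to decide the bill for generation-heavy work.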

The closed frontier

These are the proprietary flagships from the three largest labs. Prices are the published standard rates per million tokens as of mid-2026.

Model	Lab	Input / 1M	Output / 1M
GPT-5.5	OpenAI	$5.00	$30.00
Claude Opus 4.8	Anthropic	$5.00	$25.00
Claude Sonnet 4.6	Anthropic	$3.00	$15.00
Gemini 2.5 Pro	Google	$1.25	$10.00

This is the top of the market: the models that still set the bar on the hardest reasoning, longest-horizon agent tasks, and most demanding tool use. You pay for that, and for the training runs and margin behind a closed model. Output sits between $10 and $30 per million tokens.

The open tier

Now the open-weight side. These models publish their weights, which means many providers can host the same model and compete on price. Several of the strongest come from Chinese labs: DeepSeek, Alibaba (Qwen), Zhipu / Z.ai (GLM), Moonshot (Kimi), and MiniMax. The rates below are representative published hosting rates per million tokens; the maintained list of what Perch runs, with current numbers, lives on the Models page.

Model	Lab	Input / 1M	Output / 1M
DeepSeek V4 Flash	DeepSeek	$0.14	$0.28
GPT-OSS 120B	OpenAI	$0.15	$0.60
Qwen 3.6	Alibaba	$0.25	$1.25
MiniMax M2	MiniMax	$0.30	$1.20
Kimi K2.6	Moonshot	$0.95	$4.00
GLM 5	Zhipu / Z.ai	$1.00	$3.20
DeepSeek V4 Pro	DeepSeek	$1.74	$3.48

The spread within the open tier is wide, from a flash model at $0.28 output to a flagship at $4.40, but on output even the top of the open range sits below the bottom of the closed frontier. On output tokens, the cheapest capable open model is roughly a hundredth the price of a closed flagship, and the strongest open flagship is still a fraction of one.

The gap, in one number

Take output tokens, since that is where most bills are decided. A closed frontier flagship charges $25 to $30 per million. A strong open model charges $3 to $4. A fast open model charges under $1. That is a five to fifty times difference for the same unit of generated text, depending on which pair you compare.
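How large the multiple is depends on which pair you pick, and that is easy to check. The sketch below uses only the output rates from the two tables above; the dictionary and function names are made up for the example. On those figures, most flagship-versus-open pairings fall inside the five-to-fifty range, while the extreme pairings fall outside it at both ends.

```python
# Output rates in USD per million tokens, copied from the tables above.
closed_output = {
    "GPT-5.5": 30.00,
    "Claude Opus 4.8": 25.00,
    "Claude Sonnet 4.6": 15.00,
    "Gemini 2.5 Pro": 10.00,
}
open_output = {
    "DeepSeek V4 Flash": 0.28,
    "GPT-OSS 120B": 0.60,
    "Qwen 3.6": 1.25,
    "MiniMax M2": 1.20,
    "Kimi K2.6": 4.00,
    "GLM 5": 3.20,
    "DeepSeek V4 Pro": 3.48,
}


def output_multiple(closed_model: str, open_model: str) -> float:
    """How many times more the closed model charges per output token."""
    return closed_output[closed_model] / open_output[open_model]


# Every closed/open pairing, smallest multiple first.
pairs = sorted(
    (output_multiple(c, o), c, o)
    for c in closed_output
    for o in open_output
)

smallest, largest = pairs[0], pairs[-1]
print(f"smallest: {smallest[0]:.1f}x ({smallest[1]} vs {smallest[2]})")
print(f"largest:  {largest[0]:.1f}x ({largest[1]} vs {largest[2]})")
print(f"GPT-5.5 vs GPT-OSS 120B:      {output_multiple('GPT-5.5', 'GPT-OSS 120B'):.1f}x")
print(f"Claude Opus 4.8 vs Kimi K2.6: {output_multiple('Claude Opus 4.8', 'Kimi K2.6'):.2f}x")
```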

For a small, occasional workload the absolute numbers are tiny either way and the price gap barely matters; run the best model and move on. But for anything at volume (an agent that runs continuously, a document pipeline processing thousands of files, a product feature in the hot path of every user request), the multiple is the difference between a line item and a budget. That is why open models are showing up in production: not because they always win on quality, but because at their price they are worth trying first.

Why open models are so much cheaper

Three forces compound:

The weights are public. When anyone can host a model, hosting becomes a commodity and providers compete the price down toward the cost of the compute. A closed model has exactly one seller.

They are often more efficient. Many of these models are smaller, or use mixture-of-experts designs that activate only part of the network per token, delivering a given capability for less compute. Less compute per token is less cost per token.

There is no proprietary-model margin. A closed frontier price has to recover the cost of frontier training runs and return a margin on a differentiated product. Open-weight hosting prices the compute and a thin operating margin, and little else.

None of this is a claim that open models match the frontier everywhere. It is an explanation of why, where they are competitive on quality, they are dramatically cheaper.

Where closed models still earn their price

Price is one axis. Honesty requires the others:

The hardest reasoning still favors the frontier. On the most demanding multi-step reasoning and long-horizon agent tasks, the closed flagships remain more reliable. If a task genuinely needs the top of the market, paying for it is the cheaper choice overall.

Open models are more variable. In our own coding benchmark, an open-weight model won one task outright, and another was the most volatile in the field, scoring 83 on one task and 30 on another. Higher variance means you have to test on your actual work rather than trust a single run.

The output multiple cuts both ways. A cheaper model that needs two attempts, or that generates more tokens to reach the same answer, can erase its per-token advantage. Cost per finished task, not cost per token, is what to measure.
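One rough way to put a number on cost per finished task is to divide the cost of a single attempt by the share of attempts that succeed. The sketch below assumes failed attempts are simply retried and that each attempt succeeds independently with the same probability, so the expected number of attempts is 1 / success rate. The function name, token counts, and success rates are all hypothetical; only the per-token rates come from the tables above.

```python
def cost_per_finished_task(input_tokens: int, output_tokens: int,
                           input_rate: float, output_rate: float,
                           success_rate: float) -> float:
    """Expected USD cost to get one successful result.

    Rates are USD per million tokens. Assumes independent retries until
    success, so expected attempts = 1 / success_rate (a simplification).
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    per_attempt = (input_tokens * input_rate
                   + output_tokens * output_rate) / 1_000_000
    return per_attempt / success_rate


# Closed flagship at $5.00 / $25.00: concise output, rarely needs a retry.
# The 95% success rate and the token counts are made-up inputs.
closed = cost_per_finished_task(3_000, 2_000, 5.00, 25.00, success_rate=0.95)

# Open flagship at $1.74 / $3.48: assumed to write 50% more tokens and to
# succeed only half the time on this hypothetical task.
open_ = cost_per_finished_task(3_000, 3_000, 1.74, 3.48, success_rate=0.50)

print(f"closed: ${closed:.4f} per finished task")
print(f"open:   ${open_:.4f} per finished task")
print(f"per-token output multiple: {25.00 / 3.48:.1f}x")
print(f"per-task multiple:         {closed / open_:.1f}x")
```

With these made-up inputs the open model is still cheaper per finished task, but the advantage shrinks from roughly seven times per output token to roughly two times per task. Different success rates or token counts can shrink it further or reverse it, which is the reason to measure on your own workload.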

How we think about it

Perch does not pick a side in the open-versus-closed debate, because the debate is the wrong frame. The useful question is which model fits the task in front of you, at what cost. So Perch is model-agnostic: Roost routes each task to a capable model automatically, and you can pin a specific one by hand. The full lineup and each model's published rate are on the Models page, and we meter usage against those published rates rather than marking them up.

That is also why we can run comparisons like this one honestly. We are not defending a house model. We evaluate every candidate the same way, on realistic tasks with known-correct answers, and we let the price sit next to the result. For a growing share of real work, the open tier now wins that comparison. For the hardest tasks, the closed frontier still does. The point is to see both numbers and choose deliberately.

For plans and what is included on each, see Pricing.
