Open vs closed AI models: what they actually cost in 2026
The strongest open-weight models now cost a fraction of the closed frontier, often five to fifty times less per token, and several of them come from Chinese labs. Here is the real per-million-token math on both sides, why the gap exists, and where paying for a closed model still makes sense.
Six months ago the question we heard most from finance and engineering teams was "which model is best." Today it is increasingly "which model is best for the price," and that shift is being driven by a wave of open-weight models, many of them from Chinese labs, that are good enough to take seriously and cheap enough to change the math.
If you have searched for what DeepSeek, Qwen, GLM, Kimi, or MiniMax cost against GPT, Claude, or Gemini, you have probably found a mess of conflicting numbers. This piece lays out the real per-token pricing on both sides as of mid-2026, explains why the gap is as wide as it is, and is honest about where paying for a closed model still earns its keep.
How model pricing actually works
Model APIs bill per token, and almost always at two different rates: one for input tokens (the prompt you send, including any documents, system instructions, and prior conversation) and a higher one for output tokens (what the model generates). In English text, a token is roughly three-quarters of a word. Prices are quoted per million tokens.
The split matters because the ratio of input to output in your workload determines which price dominates your bill. A summarization job that reads a long document and writes a short answer is input-heavy. An agent that reads a little and writes a lot of code or analysis is output-heavy, and output is the more expensive side everywhere. Many providers also offer a discounted cached input rate for prompt content they can reuse across calls, which can cut the input side sharply for repeated work.
Keep that in mind reading the tables below: the output column is usually where the real money is.
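To make the input/output split concrete, here is a minimal per-request cost sketch in Python. The `Rates` type and function names are illustrative, not any provider's SDK. The word-to-token conversion is only the rough English rule of thumb, not a real tokenizer. Cached-input rates vary by provider and are not listed in this article, so the example leaves that field unset.

```python
from dataclasses import dataclass
from typing import Optional

PER_MILLION = 1_000_000  # prices are quoted per million tokens


@dataclass(frozen=True)
class Rates:
    input_rate: float                          # USD per 1M uncached input tokens
    output_rate: float                         # USD per 1M output tokens
    cached_input_rate: Optional[float] = None  # USD per 1M cached input tokens, if offered


def estimate_tokens(words: int) -> int:
    """Rough English estimate: a token is about 0.75 words, so tokens = words / 0.75."""
    return round(words / 0.75)


def request_cost(rates: Rates, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """USD cost of one call. cached_tokens is the part of the input billed at the
    cached rate; it falls back to the normal input rate if none is offered."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    cached_rate = rates.input_rate if rates.cached_input_rate is None else rates.cached_input_rate
    uncached_tokens = input_tokens - cached_tokens
    return (uncached_tokens * rates.input_rate
            + cached_tokens * cached_rate
            + output_tokens * rates.output_rate) / PER_MILLION


if __name__ == "__main__":
    sonnet = Rates(input_rate=3.00, output_rate=15.00)  # Claude Sonnet 4.6 list rates from the table below
    summarize = request_cost(sonnet, input_tokens=50_000, output_tokens=500)  # input-heavy
    agent = request_cost(sonnet, input_tokens=2_000, output_tokens=20_000)    # output-heavy
    print(f"summarize: ${summarize:.4f}  agent: ${agent:.4f}")
```

With these hypothetical token counts, the summarization call costs about $0.16, almost all of it input. The agent call costs about $0.31, almost all of it output.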
The closed frontier
These are the proprietary flagships from the three largest labs. Prices are the published standard rates per million tokens as of mid-2026.
| Model | Lab | Input / 1M | Output / 1M |
|---|---|---|---|
| GPT-5.5 | OpenAI | $5.00 | $30.00 |
| Claude Opus 4.8 | Anthropic | $5.00 | $25.00 |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 |
This is the top of the market: the models that still set the bar on the hardest reasoning, longest-horizon agent tasks, and most demanding tool use. You pay for that, and for the training runs and margin behind a closed model. Output sits between $10 and $30 per million tokens.
The open tier
Now the open-weight side. These models publish their weights, which means many providers can host the same model and compete on price. Several of the strongest come from Chinese labs: DeepSeek, Alibaba (Qwen), Zhipu / Z.ai (GLM), Moonshot (Kimi), and MiniMax. The rates below are representative published hosting rates per million tokens; the maintained list of what Perch runs, with current numbers, lives on the Models page.
| Model | Lab | Input / 1M | Output / 1M |
|---|---|---|---|
| DeepSeek V4 Flash | DeepSeek | $0.14 | $0.28 |
| GPT-OSS 120B | OpenAI | $0.15 | $0.60 |
| Qwen 3.6 | Alibaba | $0.25 | $1.25 |
| MiniMax M2 | MiniMax | $0.30 | $1.20 |
| Kimi K2.6 | Moonshot | $0.95 | $4.00 |
| GLM 5 | Zhipu / Z.ai | $1.00 | $3.20 |
| DeepSeek V4 Pro | DeepSeek | $1.74 | $3.48 |
The spread within the open tier is wide, from a flash model at $0.28 output to a flagship at $4.00. Even so, on output the top of the open range sits well below the bottom of the closed frontier. On output tokens, the cheapest capable open model is roughly a hundredth the price of a closed flagship, and the most expensive open model still costs a sixth or less of what a closed flagship charges. (Input prices overlap slightly: DeepSeek V4 Pro's $1.74 is above Gemini 2.5 Pro's $1.25.)
The gap, in one number
Take output tokens, since that is where most bills are decided. A closed frontier flagship charges $25 to $30 per million. A strong open model charges $3 to $4. A fast open model charges under $1. That is roughly a five to fifty times difference for the same unit of generated text, depending on which pair you compare, and the most extreme pair in the tables ($0.28 against $30) is more than a hundredfold.
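The multiples can be checked directly against the tables. Below is a minimal Python sketch with output prices copied from the two tables above. The helper names are illustrative. For the two flagships, the pairings run from about 6x (Claude Opus 4.8 against Kimi K2.6) to over 100x (GPT-5.5 against DeepSeek V4 Flash).

```python
# Output prices in USD per 1M tokens, copied from the two tables above.
CLOSED_OUTPUT = {
    "GPT-5.5": 30.00,
    "Claude Opus 4.8": 25.00,
    "Claude Sonnet 4.6": 15.00,
    "Gemini 2.5 Pro": 10.00,
}
OPEN_OUTPUT = {
    "DeepSeek V4 Flash": 0.28,
    "GPT-OSS 120B": 0.60,
    "Qwen 3.6": 1.25,
    "MiniMax M2": 1.20,
    "Kimi K2.6": 4.00,
    "GLM 5": 3.20,
    "DeepSeek V4 Pro": 3.48,
}
FLAGSHIPS = ("GPT-5.5", "Claude Opus 4.8")


def output_multiple(closed: str, open_model: str) -> float:
    """How many times more the closed model charges per output token."""
    return CLOSED_OUTPUT[closed] / OPEN_OUTPUT[open_model]


def multiple_range(closed_names=FLAGSHIPS) -> tuple[float, float]:
    """Smallest and largest multiple across every closed/open pairing."""
    ratios = [output_multiple(c, o) for c in closed_names for o in OPEN_OUTPUT]
    return min(ratios), max(ratios)


if __name__ == "__main__":
    low, high = multiple_range()
    print(f"flagship output multiples: {low:.1f}x to {high:.1f}x")
```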
For a small, occasional workload the absolute numbers are tiny either way and the price gap barely matters; run the best model and move on. But for anything at volume (an agent that runs continuously, a document pipeline processing thousands of files, a product feature in the hot path of every user request), the multiple is the difference between a line item and a budget. That is why open models are showing up in production: not because they always win on quality, but because at their price they are worth trying first.
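To put volume in numbers, here is a sketch of a monthly bill for one hypothetical document pipeline. The workload figures (5,000 files a day, 20,000 input and 1,000 output tokens per file, no caching) are assumptions chosen for illustration, not measurements. The rates come from the tables above.

```python
# Hypothetical workload, assumed for illustration only: a document pipeline
# processing 5,000 files a day for 30 days, each call reading 20,000 input
# tokens and writing 1,000 output tokens, with no prompt caching.
REQUESTS_PER_DAY = 5_000
DAYS = 30
INPUT_TOKENS = 20_000
OUTPUT_TOKENS = 1_000

# (input, output) list rates in USD per 1M tokens, from the tables above.
RATES = {
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "DeepSeek V4 Pro": (1.74, 3.48),
    "DeepSeek V4 Flash": (0.14, 0.28),
}


def monthly_cost(input_rate: float, output_rate: float) -> float:
    """USD per month for the hypothetical workload above at per-million rates."""
    calls = REQUESTS_PER_DAY * DAYS
    per_call = (INPUT_TOKENS * input_rate + OUTPUT_TOKENS * output_rate) / 1_000_000
    return calls * per_call


if __name__ == "__main__":
    for name, (input_rate, output_rate) in RATES.items():
        print(f"{name:<18} ${monthly_cost(input_rate, output_rate):>10,.2f}")
```

Under these assumptions, GPT-5.5 comes to $19,500 a month and DeepSeek V4 Flash to $462. This particular workload is input-heavy, so the input rate does most of the work here. An output-heavy agent would lean on the output column instead.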
Why open models are so much cheaper
Three forces compound:
- The weights are public. When anyone can host a model, hosting becomes a commodity and providers compete the price down toward the cost of the compute. A closed model has exactly one seller.
- They are often more efficient. Many of these models are smaller, or use mixture-of-experts designs that activate only part of the network per token, delivering a given capability for less compute. Less compute per token is less cost per token.
- There is no proprietary-model margin. A closed frontier price has to recover the cost of frontier training runs and return a margin on a differentiated product. Open-weight hosting prices the compute and a thin operating margin, and little else.
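To show why a mixture-of-experts design uses less compute per token, here is a toy sketch of top-k expert routing in pure Python. It is illustrative only. Real MoE layers work on vectors, compute router scores from a learned projection of each token, and differ in how they normalize gate weights. This sketch uses one common variant that renormalizes over the chosen k. The scalar "experts" and router scores are made up and reflect no particular model. With 8 experts and k=2, only a quarter of the expert compute runs for each token.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[float], float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_scores: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    ranked = sorted(range(len(router_scores)), key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:k]
    gates = softmax([router_scores[i] for i in chosen])
    return list(zip(chosen, gates))


def moe_forward(x: float, experts: Sequence[Expert],
                router_scores: Sequence[float], k: int = 2) -> float:
    """Weighted sum of the chosen experts' outputs; unchosen experts never run."""
    return sum(gate * experts[i](x) for i, gate in top_k_route(router_scores, k))


if __name__ == "__main__":
    calls: List[int] = []

    def make_expert(i: int) -> Expert:
        def expert(x: float) -> float:
            calls.append(i)     # record which experts actually ran
            return x * (i + 1)  # stand-in for a feed-forward block
        return expert

    experts = [make_expert(i) for i in range(8)]
    scores = [0.1, 2.0, -1.0, 3.0, 0.0, 0.5, -0.3, 1.0]  # made-up router scores
    y = moe_forward(1.0, experts, scores, k=2)
    print(f"output={y:.3f}, experts run: {sorted(calls)} of {len(experts)}")
```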
None of this is a claim that open models match the frontier everywhere. It is an explanation of why, where they are competitive on quality, they are dramatically cheaper.
Where closed models still earn their price
Price is one axis. An honest comparison has to weigh the others:
- The hardest reasoning still favors the frontier. On the most demanding multi-step reasoning and long-horizon agent tasks, the closed flagships remain more reliable. If a task genuinely needs the top of the market, paying for it is the cheaper choice overall.
- Open models are more variable. In our own coding benchmark, an open-weight model won one task outright, and another was the most volatile in the field, scoring 83 on one task and 30 on another. Higher variance means you have to test on your actual work rather than trust a single run.
- The output multiple cuts both ways. A cheaper model that needs two attempts, or that generates more tokens to reach the same answer, can erase its per-token advantage. Cost per finished task, not cost per token, is what to measure.
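Cost per finished task can be estimated from the cost of one attempt and how often attempts succeed. The following is a minimal sketch. It assumes each attempt succeeds independently and that failures are detected and retried, which makes the number of attempts geometric with mean 1 / success rate. It ignores the cost of human review. The token counts and success rates in the usage note are hypothetical; only the per-token rates come from the tables above.

```python
def attempt_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """USD for one attempt at per-million-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def cost_per_finished_task(cost_per_attempt: float, success_rate: float) -> float:
    """Expected USD per accepted result, assuming independent attempts that
    succeed with probability success_rate and are retried until one passes.
    Attempts are then geometric with mean 1 / success_rate."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


def break_even_success_rate(cheap_attempt: float, pricey_finished: float) -> float:
    """Lowest success rate at which the cheaper model costs no more per finished
    task. A value above 1 means it can never catch up."""
    return cheap_attempt / pricey_finished


if __name__ == "__main__":
    # Hypothetical task: the closed model (Claude Opus 4.8 rates) writes 2,000
    # output tokens and passes 90% of the time; the open model (Kimi K2.6 rates)
    # writes three times as much and passes only half the time.
    closed = cost_per_finished_task(attempt_cost(10_000, 2_000, 5.00, 25.00), 0.9)
    cheap_attempt = attempt_cost(10_000, 6_000, 0.95, 4.00)
    open_model = cost_per_finished_task(cheap_attempt, 0.5)
    print(f"closed ${closed:.3f}  open ${open_model:.3f}  "
          f"break-even success rate {break_even_success_rate(cheap_attempt, closed):.1%}")
```

With these made-up numbers, the open model still wins, at about $0.067 against $0.111 per finished task. It would keep winning down to a success rate of roughly 30%. The useful inputs are your own measured success rates and token counts.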
How we think about it
Perch does not pick a side in the open-versus-closed debate, because the debate is the wrong frame. The useful question is which model fits the task in front of you, at what cost. So Perch is model-agnostic: Roost routes each task to a capable model automatically, and you can pin a specific one by hand. The full lineup and each model's published rate are on the Models page, and we meter usage against those published rates rather than marking them up.
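The article does not describe how Roost makes its routing decisions. The code below is therefore only a hypothetical sketch of the general idea of "cheapest capable model, with a manual pin", not Roost's implementation. The `Candidate` type, the generic model labels, the quality scores, and the threshold are all illustrative assumptions. In practice, such scores would come from evaluating candidates on your own tasks. The output rates are taken from the tables above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    output_rate: float  # USD per 1M output tokens
    quality: float      # hypothetical 0-1 score from your own evals for this task type


def choose_model(candidates: Sequence[Candidate], min_quality: float,
                 pinned: Optional[str] = None) -> Candidate:
    """Honor a pinned model if one is given; otherwise pick the cheapest
    candidate that clears the quality bar, falling back to the best-scoring one."""
    if pinned is not None:
        for c in candidates:
            if c.name == pinned:
                return c
        raise KeyError(f"pinned model not available: {pinned}")
    capable = [c for c in candidates if c.quality >= min_quality]
    if capable:
        return min(capable, key=lambda c: c.output_rate)
    return max(candidates, key=lambda c: c.quality)


if __name__ == "__main__":
    pool = [
        Candidate("open-fast", 0.28, 0.60),        # illustrative scores, not measurements
        Candidate("open-strong", 3.48, 0.80),
        Candidate("closed-flagship", 25.00, 0.95),
    ]
    print(choose_model(pool, min_quality=0.75).name)
    print(choose_model(pool, min_quality=0.75, pinned="closed-flagship").name)
```

The fallback to the highest-scoring model is one design choice among several. A real router might instead refuse the task or escalate it when nothing clears the bar.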
That is also why we can run comparisons like this one honestly. We are not defending a house model. We evaluate every candidate the same way, on realistic tasks with known-correct answers, and we let the price sit next to the result. For a growing share of real work, the open tier now wins that comparison. For the hardest tasks, the closed frontier still does. The point is to see both numbers and choose deliberately.
For plans and what is included on each, see Pricing.