The Math Behind Your $20 LLM Subscription

Part 2 of 3: The Cost Stack, and Why Neither Business Is Profitable Yet

Jun 06, 2026

In Part 1, I looked at the revenue side of OpenAI and Anthropic: the users, segments, and growth projections that have investors paying $852 billion for a company that does not yet make money. In this piece, I want to go into the cost layer: what does it actually cost to serve one user, and where does the margin sit today?

Part 1: AI Economics: Who Actually Gets Paid When You Use ChatGPT? (Pranjal Kalra, May 23)

This is something I learned in my early years at Beam: the most important analysis was Unit Economics. Once we had the per-trip unit, everything else fell into place. Revenue per trip. Cost per trip. Contribution per trip. The same logic applies here, except the unit is the session, and the cost stack has four layers that most people do not see.

Before we go deeper, let us define the two fundamental units for these businesses, the token and the session, and how costs stack up on top of them.

A token is roughly four characters of English text, or about three-quarters of a word. Every time you send a message and receive a response, both sides are measured in tokens. A short question and answer might consume 500 tokens. A long research session with documents could consume 50,000. Tokens are the engineering unit: precise, measurable, and directly tied to compute cost. Every dollar of inference cost ultimately traces back to tokens processed.

A session is one conversation: a start, an end, and a token count in between. It is the consumer unit, the closest thing to a trip. Revenue per session is a fraction of the monthly subscription. Cost per session is tokens consumed multiplied by the model’s token rate. The gap between those two numbers is where the economics live.
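To make the session arithmetic concrete, here is a minimal sketch. The $20 price and the 500 and 50,000 token sessions come from the text above; the sessions-per-month figure and the blended token rate are placeholder assumptions of mine, not numbers from the pricing pages.

```python
# Session economics sketch. Only the $20 price and the two session sizes
# come from the article; the other inputs are hypothetical placeholders.
SUBSCRIPTION_USD = 20.00             # monthly subscription price
SESSIONS_PER_MONTH = 60              # ASSUMPTION: sessions per user per month
RATE_USD_PER_MILLION_TOKENS = 5.00   # ASSUMPTION: blended input/output token rate


def session_economics(tokens: int) -> tuple[float, float, float]:
    """Return (revenue, cost, gap) in USD for a single session."""
    revenue = SUBSCRIPTION_USD / SESSIONS_PER_MONTH
    cost = tokens / 1_000_000 * RATE_USD_PER_MILLION_TOKENS
    return revenue, cost, revenue - cost


for tokens in (500, 50_000):
    revenue, cost, gap = session_economics(tokens)
    print(f"{tokens:>6} tokens: revenue ${revenue:.4f}, "
          f"cost ${cost:.4f}, gap ${gap:+.4f}")
```

Under these assumed inputs the short session costs a fraction of a cent, while the long research session consumes most of its share of the subscription. Change the two assumed constants and the gap moves with them, which is the point of working at the session level.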

Source: OpenAI and Anthropic pricing pages, May 2026

The Four Layer Cake: Costs of delivering value

L1: Inference

Inference is the largest and most visible layer. Every session consumes tokens, and every token consumes compute. OpenAI’s inference costs reached $8.4 billion in 2025 and are projected to hit $14.1 billion in 2026. Paying users account for roughly 66% of that spend. The remaining 34%, approximately $4.8 billion of the 2026 figure, is the cost of keeping the free tier alive. [Source: Sacra, 2026] That is $4.8 billion a year spent on 850 million people paying nothing.
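The free-tier figure is easier to judge per user. The sketch below only rearranges the numbers quoted above, and it assumes the 34% share applies to the projected 2026 spend, which is how the $4.8 billion is derived.

```python
# Free-tier inference cost per user, using only figures quoted in the text.
INFERENCE_2026_USD = 14.1e9   # projected 2026 inference spend
FREE_SHARE = 0.34             # share of inference spend going to free users
FREE_USERS = 850e6            # free users

free_tier_cost = INFERENCE_2026_USD * FREE_SHARE
per_user_year = free_tier_cost / FREE_USERS
per_user_month = per_user_year / 12

print(f"Free tier total: ${free_tier_cost / 1e9:.1f}B per year")
print(f"Per free user:   ${per_user_year:.2f} per year, "
      f"${per_user_month:.2f} per month")
```

That works out to a few dollars per free user per year: small for any one user, large only because of the size of the base.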

The free user costs almost nothing to serve per query. The heavy Plus user is where the inference cost starts to exceed the $20 subscription price before any other cost is counted.

L2: Infrastructure and platform tax

This sits underneath inference. Neither company owns the compute it runs on. OpenAI runs on Microsoft Azure and pays 20% of total revenue back to Microsoft through 2032, which amounts to over $13 billion in projected payments across 2026 and 2027 alone. [Source: Sacra, 2026] Anthropic runs on Google and Amazon, and its cloud costs came in 23% above projections in 2025. [Source: The Information, cited in Tiger Brokers, January 2026] Leasing compute from the same companies that are also your largest competitors is a structural vulnerability neither has fully resolved.

Estimates based on published daily infrastructure spend, user base figures, and average token consumption by tier.

The same infrastructure bill looks very different depending on whether you divide it across all users or only the paying ones. Expressed per token, the infrastructure cost alone adds roughly $2.80 per million tokens on top of the inference cost, which means the true cost of serving a paying user is nearly double what the model’s token price suggests.
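A rough sketch of both effects follows. The 900 million user count and the $2.80 per million tokens are the figures used in this piece; the annual infrastructure bill, the paying-user count, and the inference cost per million tokens are hypothetical placeholders chosen only to illustrate the shape of the calculation.

```python
# Denominator effect and per-token uplift. Values marked ASSUMPTION are
# illustrative placeholders, not published figures.
INFRA_BILL_USD = 10e9        # ASSUMPTION: annual infrastructure bill
ALL_USERS = 900e6            # total users, as used in this piece
PAYING_USERS = 50e6          # ASSUMPTION: paying subscribers

INFRA_PER_MTOK = 2.80        # infrastructure cost per million tokens (from the text)
INFERENCE_PER_MTOK = 3.00    # ASSUMPTION: inference cost per million tokens


def monthly_cost_per_user(annual_bill_usd: float, users: float) -> float:
    """Spread an annual bill evenly across a user base, per month."""
    return annual_bill_usd / users / 12


per_all = monthly_cost_per_user(INFRA_BILL_USD, ALL_USERS)
per_paying = monthly_cost_per_user(INFRA_BILL_USD, PAYING_USERS)
multiplier = (INFERENCE_PER_MTOK + INFRA_PER_MTOK) / INFERENCE_PER_MTOK

print(f"Infra per user per month, all users:    ${per_all:.2f}")
print(f"Infra per user per month, paying users: ${per_paying:.2f}")
print(f"True cost vs token price: {multiplier:.2f}x")
```

The multiplier only lands near 2x if the inference cost per million tokens is close to the $2.80 infrastructure add-on, which is what "nearly double" implies.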

L3: Training amortisation

This is the cost that accountants don’t know how to schedule, and the one most people forget. Building a frontier model is a one-time spend spread across everyone who uses it. In 2024, OpenAI spent $3 billion on training, $1.8 billion on inference, and $1 billion on research amortised over multiple years. [Source: Epoch AI, 2025] The more users, the lower the per-user training cost. 900 million users make each training dollar go further. But each new model costs more than the last. OpenAI projects $32 billion in training spend in 2026 and $65 billion in 2027. [Source: OpenAI internal projections, cited by CNBC, 2026] The amortisation math only works if the user base keeps growing faster than the training bill.

We have assumed a 2-year amortisation period for each model’s training cost. In reality, we don’t know over how many years these companies will actually amortise these costs.
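Here is a minimal sketch of the amortisation arithmetic under the 2-year assumption. It uses the $32 billion 2026 training projection and the 900 million user figure quoted above, and it simplifies by treating one year's training spend as a single asset and holding the user base constant.

```python
# Straight-line training amortisation per user. Simplification: one year's
# training spend is treated as a single asset, and users are held constant.
def training_cost_per_user_month(
    training_spend_usd: float,
    users: float,
    amortisation_years: float = 2,
) -> float:
    """Monthly training cost per user under straight-line amortisation."""
    annual_charge = training_spend_usd / amortisation_years
    return annual_charge / users / 12


TRAINING_2026_USD = 32e9   # projected 2026 training spend (from the text)
USERS = 900e6              # user base (from the text)

for years in (2, 4):
    cost = training_cost_per_user_month(TRAINING_2026_USD, USERS, years)
    print(f"{years}-year amortisation: ${cost:.2f} per user per month")
```

Doubling the amortisation period halves the monthly charge, which is why the choice of schedule matters so much to where this cost lands.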

L4: Talent and SG&A costs

OpenAI’s salaries and equity compensation run at $2.5 billion or more per year. Senior ML researchers command over $500,000 in total annual compensation. [Source: Medium, April 2026] These costs sit below CM1 and are not included in the per-user contribution calculation above.

OpenAI spent $2 billion on sales and advertising in the first half of 2025 alone, nearly double its entire 2024 budget for that category. [Source: sanj.dev, October 2025] R&D spend came in at $6.7 billion in H1 2025. These costs are what turn a marginally profitable unit into a deeply loss-making company at the operating level.

The Per-User Math: Where It Breaks

I understand this might be confusing at first, because a lot of the costs are split out in different ways. To make it easier, we will look at margins at four levels:

Gross margin (net of inference costs, purely based on tokens)
Contribution Margin 1 (net of infra / platform fees)
Contribution Margin 2 (net of talent / SG&A)
Contribution Margin 3 (net of training costs, which I am least sure where to park)

Estimates based on inference cost allocations, published gross margins, and revenue share disclosures.
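The four margin levels are a simple waterfall: each level subtracts one more cost layer from the one before it. The sketch below shows the mechanics only. Every per-user figure in the example is a hypothetical placeholder, except that the infrastructure line is set to 20% of the $20 price to mirror the revenue share described earlier.

```python
from dataclasses import dataclass


@dataclass
class UserMonth:
    """Per-user monthly figures in USD (all example values are hypothetical)."""
    revenue: float
    inference: float
    infra: float
    talent_sga: float
    training: float


def margin_waterfall(u: UserMonth) -> dict[str, float]:
    """Subtract one cost layer at a time, in the order listed above."""
    gross_margin = u.revenue - u.inference
    cm1 = gross_margin - u.infra
    cm2 = cm1 - u.talent_sga
    cm3 = cm2 - u.training
    return {"gross_margin": gross_margin, "cm1": cm1, "cm2": cm2, "cm3": cm3}


# ASSUMPTION: illustrative inputs, not estimates for any real company.
example = UserMonth(revenue=20.0, inference=8.0, infra=4.0,
                    talent_sga=6.0, training=5.0)

for level, value in margin_waterfall(example).items():
    print(f"{level:>12}: ${value:+.2f}")
```

With these placeholder inputs the user is positive at gross margin and CM1 and turns negative only once training is allocated, which is the pattern the levels are designed to expose.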

This lack of profitability will continue to be a problem for OpenAI as it scales. The negative margins keep putting pressure on the business even as the number of users grows. The only hope is a drastic reduction in token costs that makes each user sustainably profitable. We can see the negative margins carry through as the business scales, just as we discussed a few articles back in “Orthogonality in businesses”.

Understanding Orthogonality in businesses - for founders (Pranjal Kalra, August 23, 2025)

Why Token Costs Are Falling and Rising at the Same Time

This is the part that confuses most people, and it is worth understanding properly because it sits at the heart of whether either business can ever make money.

Per-token costs have fallen approximately 280x in about two years for equivalent capability levels. Querying a model at GPT-3.5 level cost $20 per million tokens in November 2022. The same capability cost approximately $0.07 per million tokens by October 2024. [Source: Stanford AI Index 2025] That looks like the cost problem solving itself. It is not.
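To get a feel for the pace, the 280x figure can be converted into an annual rate. This sketch assumes the decline was spread evenly over exactly two years, which is a simplification of mine rather than something the source states.

```python
# Convert a total fold-decline into an implied annual rate.
# ASSUMPTION: the decline compounds evenly over exactly two years.
TOTAL_FOLD_DECLINE = 280
YEARS = 2

annual_fold = TOTAL_FOLD_DECLINE ** (1 / YEARS)   # fold-decline per year
annual_price_retained = 1 / annual_fold           # fraction of price left after a year

print(f"Implied decline per year: {annual_fold:.1f}x")
print(f"Price after one year: {annual_price_retained:.1%} of the starting price")
```

Under that assumption, the price of a fixed capability level falls by roughly 16.7x each year.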

But the price of frontier capability is moving the other way. The newer, more capable models (GPT-5.4, GPT-5.5, Claude Opus 4.7) are significantly more expensive per token than their predecessors, not cheaper. GPT-5.5 launched at double the token price of GPT-5.4. [Source: OpenRouter, April 2026] The reason is straightforward: bigger models, longer reasoning chains, and extended thinking all consume more compute per token generated. The capability jump costs real compute.

So the pattern is this. The cost of last year’s intelligence keeps falling. The cost of this year’s intelligence keeps rising. And users always want this year’s. All AI companies face the same underlying reality. The product that retains users is the most capable, most expensive version. The product that makes economic sense is the cheaper, lighter version. Pricing them the same on a flat subscription is the structural tension neither has resolved. Something needs to change here: either prices will go up or the models will stagnate, and in both cases products will be optimised around these input costs.

Are the Unit Economics structurally broken today?

There is a paradox sitting at the centre of both businesses. The features that make users love these products (longer context, deeper reasoning, and agentic workflows that loop through many model calls) are precisely the features that make the economics harder. Agentic workflows can trigger ten to twenty model calls for a single user-initiated request, compared to one for a standard chat session. [Source: Oplexa, March 2026] Every capability improvement shipped to retain users costs more to serve than the last one.
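The effect of agentic workflows on cost per request can be sketched in a few lines. The call counts (1 versus 10 to 20) come from the text above; the tokens per call and the token rate are hypothetical placeholders, and holding tokens per call constant likely understates the gap, since agent loops tend to carry growing context.

```python
# Cost per user-initiated request as a function of model calls.
TOKENS_PER_CALL = 2_000        # ASSUMPTION: average tokens per model call
RATE_USD_PER_MTOK = 5.00       # ASSUMPTION: blended token rate


def request_cost(calls: int,
                 tokens_per_call: int = TOKENS_PER_CALL,
                 rate_per_mtok: float = RATE_USD_PER_MTOK) -> float:
    """Cost in USD of one user request that triggers `calls` model calls."""
    return calls * tokens_per_call / 1_000_000 * rate_per_mtok


chat = request_cost(1)
for label, calls in (("chat", 1), ("agent, low", 10), ("agent, high", 20)):
    cost = request_cost(calls)
    print(f"{label:>11}: ${cost:.3f} per request ({cost / chat:.0f}x chat)")
```

Whatever the assumed rate, the request cost scales linearly with the number of calls, while the subscription price stays flat.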

This is what makes the path to profitability genuinely uncertain, not just delayed. The cost base is not shrinking. It is growing faster than revenue in absolute terms. Whether it can turn, and what has to happen for it to do so, is what Part 3 is about.

Part 1: Who Actually Gets Paid When You Use ChatGPT?
Part 3: Can These Businesses Actually Make Money? [Pending]

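Sources: Sacra (April 2026), Epoch AI (2025), The Information (cited in Tiger Brokers, January 2026), OpenAI internal projections (cited by CNBC, 2026), AI Automation Global (March 2026), Stanford AI Index 2025, Oplexa (March 2026), Medium (April 2026), sanj.dev (October 2025), OpenRouter (April 2026).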

Beyond10x by Pranjal Kalra
