GPT-5.1: $1.25 / 1M tokens ($0.125 cached) vs an hour of work ($35 / hr)
February 3, 2025
I think about token pricing through the lens of transaction-cost economics: if cognition is a service, tokens are the marginal unit and the API is the market. Bounded rationality means humans spend time on search, parsing, and synthesis; tokens turn that work into a priced, automatable stream. The insight is not that tokens replace humans, but that the unit economics of thinking are now legible.
At $1.25 per million input tokens ($0.125 per million when cached), the dominant cost in many workflows is no longer compute but orchestration. The technical lever is whether you can structure prompts and retrieval so that stable facts live in the cached prefix and fresh tokens are spent only on the delta. Good system design looks a lot like caching strategy plus careful scoping of questions.
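A rough sketch of that cost split, using the input rates from the title; the token counts are hypothetical, just to show the shape of the math:

```python
# Back-of-envelope cost model for one request whose prompt is split into
# a cached prefix (stable facts) and a fresh delta (the new question).
# Prices are the per-million-token input rates from the title.
FRESH_PER_TOKEN = 1.25 / 1_000_000    # $1.25 / 1M fresh input tokens
CACHED_PER_TOKEN = 0.125 / 1_000_000  # $0.125 / 1M cached input tokens

def input_cost(cached_tokens: int, fresh_tokens: int) -> float:
    """Dollar cost of a single request's input."""
    return cached_tokens * CACHED_PER_TOKEN + fresh_tokens * FRESH_PER_TOKEN

# Hypothetical split: a 40k-token reusable context, 2k tokens of new delta.
print(f"cached prefix + delta: ${input_cost(40_000, 2_000):.4f}")  # $0.0075
print(f"same prompt, no cache: ${input_cost(0, 42_000):.4f}")      # $0.0525
```

The 7x gap between those two lines is the whole argument for treating prompt layout as a caching problem.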
Example: an agent that researches apartment listings for 2 hours is really a pipeline that hits listings APIs, map APIs, and transit APIs, then summarizes tradeoffs. With tight retrieval and a cached city profile, the workflow can stay under 150k input tokens and 10k output tokens, which is a fraction of a dollar. The expensive part is the human time to orchestrate the APIs and set the constraints, not the model cost. The model is cheap; the plumbing is what you pay for.
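Checking that arithmetic with the posted input rates (the output rate and the cached share below are assumptions for illustration, not figures from this post):

```python
# Worked cost for the apartment-research workflow described above.
# Input rates come from the title; the output rate is an assumed
# illustrative figure, since only input pricing is quoted here.
INPUT_RATE = 1.25 / 1_000_000    # $ per fresh input token
CACHED_RATE = 0.125 / 1_000_000  # $ per cached input token
OUTPUT_RATE = 10.00 / 1_000_000  # assumed $ per output token (illustrative)

input_tokens = 150_000   # budget for the whole workflow
output_tokens = 10_000
cached_share = 0.5       # assume half the input is the cached city profile

cost = (
    input_tokens * cached_share * CACHED_RATE
    + input_tokens * (1 - cached_share) * INPUT_RATE
    + output_tokens * OUTPUT_RATE
)
print(f"total model cost: ${cost:.2f}")  # ~$0.20, vs $70 for two human hours
```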
That flips the decision: use tokens to buy down search and synthesis, and use engineering time to design better interfaces for that cognition. The real cost center is how you route information through the model; do that well, and a small token budget replaces hours of manual browsing without sacrificing quality.
A practical rule: if a task is mostly information filtering and synthesis, the token budget is likely below 1 percent of the cost of a human hour. Design your system so you pay for decisions, not for browsing. In other words, spend time on architecture once so you can spend pennies on cognition forever.
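A quick sanity check on that rule, using the $35/hr and token prices from the title:

```python
# How many tokens does 1% of a $35 human hour buy at the posted rates?
HOURLY_RATE = 35.00
budget = 0.01 * HOURLY_RATE  # $0.35

fresh_tokens = budget / (1.25 / 1_000_000)    # ~280,000 fresh input tokens
cached_tokens = budget / (0.125 / 1_000_000)  # ~2,800,000 cached input tokens

print(f"1% of an hour = ${budget:.2f}")
print(f"buys ~{fresh_tokens:,.0f} fresh tokens or ~{cached_tokens:,.0f} cached")
```

That is more context than most filtering-and-synthesis tasks ever need, which is why the budget almost always clears the 1 percent bar.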