Writeups

Honest analysis of LLM pricing, tokenization quirks, and the actual cost of running models in production.

How to Reduce LLM API Costs: 7 Strategies That Actually Work

Most teams overpay for LLM APIs by 2–5×. Prompt caching, the batch API, model routing, and prompt trimming can cut a $360/month bill to under $100. Here's the math for each strategy.

May 19, 2026 · 7 min read

Pricing

The Hidden Cost of Long Context Windows (And the Pricing Cliffs Nobody Mentions)

A 1M-token context window doesn't mean 1M tokens at a flat rate. Here's what large prompts actually cost across Claude, GPT, and Gemini, including the tier cliffs that can double your bill.

May 14, 2026 · 5 min read

Pricing

GPT-5.5 Costs Twice as Much as GPT-5.4 for the Same Job. Here's the Math.

OpenAI shipped GPT-5.5 with a quiet 2× price hike on both input and output tokens. Both models use the same tokenizer. Here's what that does to a 30,000-request-per-month workload, and when the premium is worth paying.

Apr 30, 2026 · 3 min read