AI · 15 min

The quiet costs of putting LLMs in product

Enes Yıldız·Mar 21, 2026

Token price is a signal, not the real cost. Six things to measure once your LLM hits production.

P95 and queues

A call can be fast at the median (P50) and dramatically slower at the 95th percentile (P95). As the queue grows, users sit longer on your front-end's 'thinking…' message, and that burns trust.
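To make the P50/P95 gap concrete, here is a minimal sketch that computes both from a list of recorded latencies using the nearest-rank method. The `percentile` helper and the sample latencies are illustrative assumptions, not measurements from a real system.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that
    at least pct% of all samples are at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical end-to-end latencies in milliseconds for 20 calls.
latencies_ms = [420, 450, 460, 470, 480, 490, 500, 510, 520, 530,
                540, 550, 560, 580, 600, 650, 700, 900, 4200, 6100]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
print(f"P50={p50} ms, P95={p95} ms")
```

With these made-up numbers the median looks healthy while P95 is several times slower, which is the pattern an average or a P50 dashboard alone would hide.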

Retrieval quality

Bad chunking, duplicate context, and stale embeddings are not model problems; they are environment problems.
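Two of these problems can be checked before the prompt is built. The sketch below is one possible approach, assuming retrieved chunks arrive as plain strings and that you store a timestamp for when each embedding was computed; `dedupe_chunks`, `is_stale`, and the sample data are hypothetical names and values.

```python
import hashlib
from datetime import datetime, timezone

def dedupe_chunks(chunks):
    """Drop chunks whose text is identical after lowercasing and
    collapsing whitespace. Keeps the first occurrence, preserves order."""
    seen = set()
    unique = []
    for text in chunks:
        normalized = " ".join(text.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique

def is_stale(embedded_at, source_updated_at):
    """An embedding is stale if its source changed after it was embedded."""
    return source_updated_at > embedded_at

# Hypothetical retrieval result containing a near-verbatim duplicate.
retrieved = [
    "Refunds are processed within 5 days.",
    "refunds are  processed within 5 days.",
    "Shipping takes 2 to 4 days.",
]
unique = dedupe_chunks(retrieved)
duplicate_ratio = 1 - len(unique) / len(retrieved)
print(f"{len(unique)} unique chunks, duplicate ratio {duplicate_ratio:.2f}")
```

This only catches exact duplicates after normalization. Paraphrased or overlapping chunks would need a similarity threshold, which is outside this sketch.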

Observability

Beyond token counts, log three things: the prompt version, the retrieval context size, and the user feedback at the final step. Without these you cannot measure improvement.
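One way to capture those three fields is a single structured log line per call. The sketch below is a hedged example: the `LLMCallLog` record, its field names, and the choice to measure context size in tokens are all assumptions for illustration.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class LLMCallLog:
    request_id: str
    prompt_version: str            # e.g. a tag or hash of the prompt template
    retrieval_context_tokens: int  # size of the retrieved context (assumed unit: tokens)
    input_tokens: int
    output_tokens: int
    user_feedback: Optional[str] = None  # e.g. "up" or "down"; None until the user responds

def to_log_line(record):
    """Serialize the record as one JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)

# Hypothetical values for a single call.
record = LLMCallLog(
    request_id="req-001",
    prompt_version="support-answer-v3",
    retrieval_context_tokens=1850,
    input_tokens=2300,
    output_tokens=210,
    user_feedback="up",
)
line = to_log_line(record)
print(line)
```

Keeping the prompt version on every line is what lets you compare feedback rates before and after a prompt change instead of guessing.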
