AI · 15 min

The quiet costs of putting LLMs in product

Token price is a signal, not the real cost. Six things to measure once your LLM hits production.

P95 and queues

A call can be fast at the median (P50) and dramatically slower at P95. As the request queue grows, the tail grows with it, and every extra second your front end shows 'thinking…' burns user trust.
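To make the gap concrete, here is a minimal sketch that computes P50 and P95 from a list of request latencies using the nearest-rank method. The sample values and the `percentile` helper are made up for illustration; in practice the numbers would come from your own request logs or metrics system.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical end-to-end latencies in seconds for ten calls.
latencies_s = [0.8, 0.9, 1.0, 1.1, 0.9, 1.0, 1.2, 0.9, 1.0, 7.5]

p50 = percentile(latencies_s, 50)
p95 = percentile(latencies_s, 95)
print(f"P50={p50}s P95={p95}s")
```

With these illustrative numbers the median looks healthy while a single slow call dominates P95, which is why tracking only the average or the median can hide the experience of your slowest users.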

Retrieval quality

Bad chunking, duplicate context, and stale embeddings are not model problems; they are environment problems.
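Two of these issues can be checked mechanically before the prompt is assembled. The sketch below is one possible approach, not a prescribed one: the `Chunk` type, its timestamp fields, and both helper functions are hypothetical, and exact-match hashing only catches duplicates that differ by case or whitespace.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Chunk:
    # Hypothetical record for a retrieved chunk.
    text: str
    embedded_at: datetime        # when the embedding was computed
    source_updated_at: datetime  # when the source document last changed


def dedupe(chunks):
    """Drop chunks whose text is identical after lowercasing and
    collapsing whitespace, keeping the first occurrence."""
    seen = set()
    unique = []
    for chunk in chunks:
        normalized = " ".join(chunk.text.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique


def is_stale(chunk):
    """An embedding is stale if the source changed after it was embedded."""
    return chunk.source_updated_at > chunk.embedded_at


jan = datetime(2024, 1, 1, tzinfo=timezone.utc)
mar = datetime(2024, 3, 1, tzinfo=timezone.utc)
retrieved = [
    Chunk("Refunds take 5 days.", embedded_at=jan, source_updated_at=jan),
    Chunk("refunds  take 5 days.", embedded_at=jan, source_updated_at=jan),
    Chunk("Shipping is free.", embedded_at=jan, source_updated_at=mar),
]
context = dedupe(retrieved)
stale = [c for c in context if is_stale(c)]
```

Counting how many chunks get dropped or flagged per request gives you a retrieval-quality signal that is independent of which model you call.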

Observability

Beyond token counts, log three things for every request: the prompt version, the size of the retrieved context, and the user's feedback at the final step. Without these you cannot measure improvement.
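As a rough sketch, those fields can live in one structured record per call, emitted as a JSON line. The `LLMCallRecord` type, its field names, and the example values are assumptions for illustration; adapt them to whatever logging pipeline you already run.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class LLMCallRecord:
    # Hypothetical schema: one record per LLM call.
    request_id: str
    prompt_version: str            # which prompt template produced this call
    retrieval_context_chars: int   # size of the retrieved context sent to the model
    prompt_tokens: int
    completion_tokens: int
    user_feedback: Optional[str] = None  # e.g. "up" or "down", filled in later


def to_log_line(record):
    """Serialize a record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


record = LLMCallRecord(
    request_id="req-001",
    prompt_version="summarize-v3",
    retrieval_context_chars=4200,
    prompt_tokens=1350,
    completion_tokens=210,
)
record.user_feedback = "up"  # attached once the user reacts to the final answer
line = to_log_line(record)
print(line)
```

Keeping the prompt version and context size next to the feedback lets you group outcomes by prompt version and see whether a change actually helped.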
