Token price is a signal, not the real cost. Six things to measure once your LLM hits production.
P95 and queues
A call may be fast at P50 and dramatically slow at P95. As the queue grows, your front-end's 'thinking…' message burns user trust.
Retrieval quality
Bad chunking, duplicate context, stale embeddings — these are not model problems; they are environment problems.
Observability
Beyond tokens, log three things: prompt version, retrieval context size, and final-step user feedback. Without these you cannot measure improvement.