
The LLM production checklist: 12 things we check before every AI launch

Jay Sarvaiya
Founder & CEO · May 2025

Shipping a demo with a large language model takes an afternoon. Shipping one that survives real users, real cost pressure, and real edge cases takes a lot more discipline. After putting LLM features into production across fintech, legal, and education products, we boiled our pre-launch ritual down to a checklist. Here it is.

1. Stream everything the user waits on

Perceived latency kills AI products faster than actual latency. Even when a full response takes four seconds, streaming the first token in under 500 ms makes the product feel responsive. We stream by default and fall back to blocking responses only when the output feeds a downstream system rather than a human.
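
As a rough illustration of the split between the streaming path and the blocking path, here is a minimal Python sketch. It assumes the provider hands back an iterator of text chunks; `fake_token_stream` is a hypothetical stand-in for that iterator, not any real SDK call, and `emit` is whatever pushes text to the user (a websocket send, an SSE write, and so on).

```python
import time
from typing import Callable, Iterable, Iterator


def fake_token_stream(text: str, delay_s: float = 0.0) -> Iterator[str]:
    """Hypothetical stand-in for a provider's streaming response."""
    for word in text.split(" "):
        if delay_s:
            time.sleep(delay_s)
        yield word + " "


def stream_to_user(chunks: Iterable[str], emit: Callable[[str], None]) -> dict:
    """Forward each chunk as soon as it arrives and record time to first token."""
    start = time.monotonic()
    first_token_s = None
    parts = []
    for chunk in chunks:
        if first_token_s is None:
            first_token_s = time.monotonic() - start
        emit(chunk)  # the user sees text immediately
        parts.append(chunk)
    return {
        "text": "".join(parts).strip(),
        "first_token_s": first_token_s,
        "total_s": time.monotonic() - start,
    }


def collect_blocking(chunks: Iterable[str]) -> str:
    """Blocking path for downstream systems: return only the complete output."""
    return "".join(chunks).strip()
```

Recording `first_token_s` separately from `total_s` makes it possible to alert on the number users actually feel, rather than on total generation time.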

2. Put a hard cost cap on every request path

A runaway agent loop or an unbounded context window can turn a ₹2 request into a ₹200 one. We set per-request token ceilings, per-user daily budgets, and a global circuit breaker. None of these are optional once you have real traffic.
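
The three layers can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions: the class name `CostGuard`, its limits, and the use of token counts as the cost unit are all hypothetical, and a real deployment would keep the counters in shared storage and reset them on a daily schedule.

```python
from collections import defaultdict
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a request would break one of the cost limits."""


@dataclass
class CostGuard:
    max_tokens_per_request: int
    max_tokens_per_user_per_day: int
    max_tokens_global_per_day: int
    user_spend: dict = field(default_factory=lambda: defaultdict(int))
    global_spend: int = 0

    def check(self, user_id: str, requested_tokens: int) -> None:
        """Call before the model request; raises if any limit would be exceeded."""
        if requested_tokens > self.max_tokens_per_request:
            raise BudgetExceeded("per-request token ceiling")
        if self.user_spend[user_id] + requested_tokens > self.max_tokens_per_user_per_day:
            raise BudgetExceeded("per-user daily budget")
        if self.global_spend + requested_tokens > self.max_tokens_global_per_day:
            raise BudgetExceeded("global circuit breaker")

    def record(self, user_id: str, used_tokens: int) -> None:
        """Call after the model request with the tokens actually consumed."""
        self.user_spend[user_id] += used_tokens
        self.global_spend += used_tokens

    def reset_day(self) -> None:
        """Clear the daily counters (assumed to be triggered by a scheduler)."""
        self.user_spend.clear()
        self.global_spend = 0
```

In an agent loop, `check` would run before every model call, not just once per user action, so a runaway loop hits the ceiling instead of the invoice.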

3. Always have a fallback model

Providers have outages. Rate limits trip. A good architecture degrades gracefully: primary model, secondary model, and a deterministic non-AI fallback for the critical path so the product never simply breaks.
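
One way to express that degradation chain, sketched in Python. The function name and the idea of passing models as plain callables are assumptions made for illustration; in practice each callable would wrap a provider client with its own timeout, and the `except` clause would be narrowed to timeout and rate-limit errors rather than every exception.

```python
from typing import Callable, Sequence


def answer_with_fallback(
    prompt: str,
    models: Sequence[Callable[[str], str]],
    deterministic_fallback: Callable[[str], str],
) -> tuple[str, str]:
    """Try each model in order; return (answer, source) from the first that works."""
    for index, call_model in enumerate(models):
        try:
            return call_model(prompt), f"model_{index}"
        except Exception:
            # Broad catch for the sketch only: outage, rate limit, timeout.
            continue
    # No model answered: the critical path still returns something usable.
    return deterministic_fallback(prompt), "deterministic"
```

Returning the source alongside the answer lets logs and dashboards show how often traffic is actually landing on the secondary model or the non-AI path.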

4. Build the eval harness before the feature

You cannot improve what you cannot measure. We write a small evaluation set — real inputs with graded expected behaviour — before writing the prompt. Every prompt change runs against it. This is the single highest-leverage habit in AI engineering.

A prompt without an eval set is a vibe, not a feature.
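
A harness like this does not need to be large. The Python sketch below is one possible shape, not our exact tooling: `EvalCase`, `run_evals`, and the idea of a per-case `grade` function are hypothetical names chosen for the example, and `generate` stands for whatever function wraps the prompt plus the model call.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class EvalCase:
    name: str
    user_input: str
    grade: Callable[[str], bool]  # returns True when the output is acceptable


def run_evals(generate: Callable[[str], str], cases: Sequence[EvalCase]) -> dict:
    """Run every case through `generate` and summarise which ones failed."""
    failures = []
    for case in cases:
        output = generate(case.user_input)
        if not case.grade(output):
            failures.append(case.name)
    passed = len(cases) - len(failures)
    return {
        "passed": passed,
        "total": len(cases),
        "pass_rate": passed / len(cases) if cases else 0.0,
        "failures": failures,
    }
```

Running it on every prompt change and failing the build when `pass_rate` drops below the previous run turns the eval set into a regression test.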

The rest of the list

  • Log every prompt and completion (with PII redaction) so you can debug production.
  • Make outputs traceable to sources when the user is trusting the answer.
  • Surface confidence, and design the UI for when the model is wrong.
  • Rate-limit per user, not just per IP.
  • Cache aggressively — identical inputs should not pay twice.
  • Red-team for prompt injection before launch, not after.
  • Version your prompts like code, with a rollback path.
  • Decide your data-retention policy before the first request, not after the first audit.
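
Two of these items, caching and prompt versioning, fit together naturally, so here is a small Python sketch that combines them. Everything named in it is hypothetical: the `PROMPTS` registry, the template text, and the `call_model(model, prompt)` signature are assumptions for illustration. The point is that the cache key includes the prompt version, so rolling a prompt forward or back never serves answers produced by a different version.

```python
import hashlib
import json
from typing import Callable

# Hypothetical prompt registry; in practice this would live in version control.
PROMPTS = {
    "v1": "Summarise: {text}",
    "v2": "Summarise in one sentence: {text}",
}


class CompletionCache:
    """In-memory cache keyed on model, prompt version, and input text."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self._call_model = call_model
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt_version: str, text: str) -> str:
        payload = json.dumps([model, prompt_version, text], ensure_ascii=False)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, model: str, prompt_version: str, text: str) -> str:
        key = self._key(model, prompt_version, text)
        if key in self._store:
            self.hits += 1
            return self._store[key]  # identical input: no second model call
        self.misses += 1
        prompt = PROMPTS[prompt_version].format(text=text)
        result = self._call_model(model, prompt)
        self._store[key] = result
        return result
```

Rolling back is then a matter of passing the previous version string; entries cached under the other version are simply never looked up.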

None of this is glamorous. All of it is the difference between an AI feature that earns trust and one that quietly gets switched off three weeks after launch.

Written by Jay Sarvaiya
Founder & CEO at Satvix Tech Solutions