AI Checkout — Operational Runbook
For Swft engineers responding to AI Checkout incidents.
Quick health check
Per-merchant: GET https://api.swft.co.uk/health/ai-chat with x-swft-api-key.
Returns:
- status: ok | degraded | paused
- LLM + embedding configuration (just the booleans, not the keys)
- Knowledge stats
- Monthly cost vs cap
Common issues
“AI Checkout silently doesn’t work for merchant X”
- Hit the health endpoint with their API key.
- Check cost.monthly_cap_cents. If 0 → merchant explicitly disabled. If pct_used >= 1 → cap reached for the month.
- Check llm.configured and embeddings.configured. If false → env var missing on Railway.
- Check knowledge.product_chunks and total_chunks. If 0 and merchant enabled Mode B/C → catalog/policy sync hasn’t run.
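The checklist above can be sketched as a small function over the health response. This is a sketch under assumptions: the AiChatHealth shape is inferred from the field names in this runbook (pct_used is assumed to sit under cost, and total_chunks under knowledge), and diagnoseSilentFailure is a hypothetical helper, not part of the Swft API.

```typescript
// Hypothetical response shape, inferred from field names in this runbook.
interface AiChatHealth {
  status: "ok" | "degraded" | "paused";
  llm: { configured: boolean };
  embeddings: { configured: boolean };
  knowledge: { product_chunks: number; total_chunks: number };
  cost: { monthly_cap_cents: number; monthly_spend_cents: number; pct_used: number };
}

// Walks the checklist in order and returns every finding that applies.
function diagnoseSilentFailure(h: AiChatHealth, usesModeBOrC: boolean): string[] {
  const findings: string[] = [];
  if (h.cost.monthly_cap_cents === 0) {
    findings.push("cap is 0: merchant explicitly disabled AI Checkout");
  } else if (h.cost.pct_used >= 1) {
    findings.push("monthly cap reached");
  }
  if (!h.llm.configured) findings.push("LLM env var missing on Railway");
  if (!h.embeddings.configured) findings.push("embeddings env var missing on Railway");
  if (usesModeBOrC && (h.knowledge.product_chunks === 0 || h.knowledge.total_chunks === 0)) {
    findings.push("knowledge is empty: catalog/policy sync has not run");
  }
  return findings;
}

// Example: cap reached and embeddings key missing.
const sampleHealth: AiChatHealth = {
  status: "paused",
  llm: { configured: true },
  embeddings: { configured: false },
  knowledge: { product_chunks: 120, total_chunks: 150 },
  cost: { monthly_cap_cents: 5000, monthly_spend_cents: 5000, pct_used: 1 },
};
console.log(diagnoseSilentFailure(sampleHealth, true));
```

An empty result means none of the three checks explains the failure, so the problem is probably outside what the health endpoint reports.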
“Customer reports a hallucinated answer”
- Find the chat in ai_chat_sessions (via the dashboard Transcripts tab, filter by date).
- Check messages for the offending answer.
- Check mode_b_detour / mode_c_detour events for that chat_id; props.results_count shows what the search returned.
- If results_count: 0, the AI fell through to the “I don’t have a match” fallback. The customer was likely asking about a product or policy that isn’t synced.
- If results_count > 0, the AI may have hallucinated despite citations. Forward to ML team with the transcript ID.
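The last two steps are a two-way branch on the detour event that preceded the offending answer. A minimal sketch, assuming a trimmed-down event shape (only event_type and props.results_count come from this runbook; triageDetour and the result labels are hypothetical):

```typescript
// Hypothetical event shape; the real ai_chat_events row has more columns.
interface DetourEvent {
  event_type: "mode_b_detour" | "mode_c_detour";
  props: { results_count: number };
}

type Triage = "knowledge-not-synced" | "forward-to-ml";

// Pass the detour event that immediately preceded the offending answer.
function triageDetour(event: DetourEvent): Triage {
  // Zero results means the fallback answer was used, so the gap is in synced data.
  if (event.props.results_count === 0) return "knowledge-not-synced";
  // Search returned something, so a wrong answer points at the model.
  return "forward-to-ml";
}

console.log(triageDetour({ event_type: "mode_b_detour", props: { results_count: 0 } }));
// prints "knowledge-not-synced"
```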
“Stripe webhook not flipping chat to complete”
- Find the chat by payment_intent_id:
    select * from ai_chat_sessions where payment_intent_id = 'pi_xxx';
- Check the webhook logs (Railway → API service). Look for "AI chat ${id} completed via PI ${pi.id}".
- If not present, check the PI metadata: swft_chat_id must be set, otherwise the AI chat handler doesn’t fire.
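To reason about which branch a given PaymentIntent took, the handler behaviour described in this runbook can be sketched as follows. The shapes, the in-memory sessions store, and handlePaymentSucceeded are hypothetical stand-ins. Only swft_chat_id, current_state, the log line, and the early return on current_state === 'complete' come from this document, and the real webhooks.ts may differ.

```typescript
// Trimmed-down, hypothetical shapes; a real Stripe PaymentIntent has many more fields.
interface PaymentIntentLike {
  id: string;
  metadata: Record<string, string>;
}
interface ChatSession {
  id: string;
  current_state: string;
}

type Outcome = "ignored-no-chat-id" | "ignored-unknown-chat" | "already-complete" | "completed";

// In-memory stand-in for the ai_chat_sessions table.
const sessions: Record<string, ChatSession> = {
  "chat-1": { id: "chat-1", current_state: "awaiting_payment" },
};

function handlePaymentSucceeded(pi: PaymentIntentLike): Outcome {
  const chatId = pi.metadata["swft_chat_id"];
  if (!chatId) return "ignored-no-chat-id"; // without the metadata the AI chat path never runs
  const chat = sessions[chatId];
  if (!chat) return "ignored-unknown-chat";
  if (chat.current_state === "complete") return "already-complete"; // redelivery is a no-op
  chat.current_state = "complete";
  console.log(`AI chat ${chat.id} completed via PI ${pi.id}`);
  return "completed";
}

const pi: PaymentIntentLike = { id: "pi_xxx", metadata: { swft_chat_id: "chat-1" } };
console.log(handlePaymentSucceeded(pi)); // first delivery
console.log(handlePaymentSucceeded(pi)); // Stripe redelivers the same event
```

If the log line is missing, the first two outcomes are the ones to rule out: no swft_chat_id on the PaymentIntent, or a chat id that does not match any session.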
“Cost spiking unexpectedly”
- Hit the health endpoint, note cost.monthly_spend_cents.
- Query ai_chat_events for the merchant grouped by event_type:
    -- running_cost_cents is cumulative per chat; use count() for volume,
    -- and sum on ai_chat_sessions.llm_cost_cents for actual dollars.
    select event_type, count(*) as events
    from ai_chat_events
    where merchant_id = 'm-xxx'
      and ts > now() - interval '24 hours'
    group by event_type
    order by events desc;
- If mode_c_detour is high, Sonnet is the culprit (5× cost). Disable Mode C if needed.
- If state_transition is high, the spend is narration cost. Mode A defaults are cheap; if elevated, check for runaway loops.
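The warning about running_cost_cents being cumulative is easy to miss, so here is a small worked example of why summing it across events overstates spend. The sample rows and both helper functions are hypothetical. It also assumes the running total never decreases within a chat, so the highest value per chat is that chat's cost so far.

```typescript
interface ChatEvent {
  chat_id: string;
  event_type: string;
  running_cost_cents: number; // cumulative total for the chat at the time of the event
}

// Made-up sample data: two chats, five events.
const sampleEvents: ChatEvent[] = [
  { chat_id: "c1", event_type: "state_transition", running_cost_cents: 1 },
  { chat_id: "c1", event_type: "mode_c_detour", running_cost_cents: 3 },
  { chat_id: "c1", event_type: "state_transition", running_cost_cents: 6 },
  { chat_id: "c2", event_type: "state_transition", running_cost_cents: 2 },
  { chat_id: "c2", event_type: "mode_b_detour", running_cost_cents: 4 },
];

// Wrong: adds every cumulative snapshot, so early cost is counted repeatedly.
function naiveSumCents(events: ChatEvent[]): number {
  let total = 0;
  for (const e of events) total += e.running_cost_cents;
  return total;
}

// Closer: keep only the highest running total per chat and add those.
function finalCostCents(events: ChatEvent[]): number {
  const perChat: Record<string, number> = {};
  let total = 0;
  for (const e of events) {
    const seen = perChat[e.chat_id];
    const prev = seen === undefined ? 0 : seen;
    if (e.running_cost_cents > prev) {
      total += e.running_cost_cents - prev;
      perChat[e.chat_id] = e.running_cost_cents;
    }
  }
  return total;
}

console.log(naiveSumCents(sampleEvents)); // 16
console.log(finalCostCents(sampleEvents)); // 10
```

In this sample the naive sum is 16 cents against a real 10 cents, which is why the query above counts events and leaves dollar figures to ai_chat_sessions.llm_cost_cents.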
“Plugin not auto-syncing products”
- Check the WordPress error log on the merchant’s host
- Check swft_enabled and swft_api_key options
- Check swft_ai_chat_enabled option
- Try a manual save on a product; it should fire the woocommerce_update_product hook
- Check Cloudflare API logs for the merchant’s POST /merchants/ai-chat/knowledge/products requests
Rate limits to be aware of
- Anthropic: tier-1 default 50 RPM for Haiku. We’re not close in practice but a viral product could spike.
- OpenAI embeddings: tier-1 default 3,000 RPM. Bulk catalog scans throttle to 20 RPS.
- Stripe webhooks: at-least-once delivery. Idempotency is enforced via the current_state === 'complete' early-return in webhooks.ts.
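The embedding numbers above are in different units (RPS for the throttle, RPM for the limit), so a quick conversion shows how much headroom the throttle leaves. The constants come from this runbook. The helper names are hypothetical, and the scan-time estimate assumes one embedding request per catalog item, which may not match how the scanner batches.

```typescript
const OPENAI_EMBEDDINGS_TIER1_RPM = 3000; // tier-1 default quoted above
const BULK_SCAN_RPS = 20; // bulk catalog scan throttle quoted above

function rpsToRpm(rps: number): number {
  return rps * 60;
}

// Fraction of the per-minute limit consumed by a steady request rate.
function limitShareUsed(rps: number, limitRpm: number): number {
  return rpsToRpm(rps) / limitRpm;
}

// Lower bound on scan duration, assuming one request per item (an assumption).
function minScanSeconds(items: number, rps: number): number {
  return Math.ceil(items / rps);
}

console.log(rpsToRpm(BULK_SCAN_RPS)); // 1200
console.log(limitShareUsed(BULK_SCAN_RPS, OPENAI_EMBEDDINGS_TIER1_RPM)); // 0.4
console.log(minScanSeconds(6000, BULK_SCAN_RPS)); // 300
```

So a single throttled scan uses about 40% of the tier-1 embedding limit. Under this estimate, three scans running at once would exceed it.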
Eval gates before releases
Before any production release that touches LLM-adjacent code:
    cd api
    ANTHROPIC_API_KEY=sk-ant-... npm run eval:classifier
Should report ≥85% accuracy. Below that → block release, investigate prompt drift.
    OPENAI_API_KEY=sk-... EVAL_MERCHANT_ID=<staging-merchant> npm run eval:retrieval-products
    OPENAI_API_KEY=sk-... EVAL_MERCHANT_ID=<staging-merchant> npm run eval:retrieval-policies
Should both report ≥80% Recall@5. Below that → check embedding model version + fixture data.
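For anyone reading the eval output, here is a sketch of what Recall@5 measures and how the two thresholds combine into a go/no-go decision. It uses the common definition of Recall@k (the fraction of relevant items found in the top k results). The eval scripts may compute it differently, and EvalResults and releaseBlockers are hypothetical names.

```typescript
// Recall@k for one query: share of the relevant ids that appear in the top k results.
function recallAtK(retrieved: string[], relevant: string[], k: number): number {
  if (relevant.length === 0) return 0; // undefined in theory; treated as 0 in this sketch
  const topK = retrieved.slice(0, k);
  let hits = 0;
  for (const id of relevant) {
    if (topK.indexOf(id) !== -1) hits += 1;
  }
  return hits / relevant.length;
}

// Thresholds from this runbook.
const MIN_CLASSIFIER_ACCURACY = 0.85;
const MIN_RECALL_AT_5 = 0.8;

// Hypothetical summary of the three eval runs, as fractions between 0 and 1.
interface EvalResults {
  classifierAccuracy: number;
  productRecallAt5: number;
  policyRecallAt5: number;
}

// Returns the reasons to block a release; an empty list means the gates pass.
function releaseBlockers(r: EvalResults): string[] {
  const blockers: string[] = [];
  if (r.classifierAccuracy < MIN_CLASSIFIER_ACCURACY) {
    blockers.push("classifier accuracy below 85%: investigate prompt drift");
  }
  if (r.productRecallAt5 < MIN_RECALL_AT_5) {
    blockers.push("product Recall@5 below 80%: check embedding model version and fixtures");
  }
  if (r.policyRecallAt5 < MIN_RECALL_AT_5) {
    blockers.push("policy Recall@5 below 80%: check embedding model version and fixtures");
  }
  return blockers;
}

// "f" is relevant but ranked sixth, so only one of two relevant ids is in the top 5.
console.log(recallAtK(["a", "b", "c", "d", "e", "f"], ["a", "f"], 5)); // 0.5
console.log(releaseBlockers({ classifierAccuracy: 0.9, productRecallAt5: 0.79, policyRecallAt5: 0.8 }));
```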
Killswitch
Per-merchant: set merchants.ai_chat_monthly_cap_cents = 0. Takes effect on the next request (no cache invalidation needed).
Globally: pause API service on Railway. Customers fall back to standard Swft checkout via the plugin’s 402 handler.
Useful queries
-- Top 10 merchants by AI Checkout spend this month
select merchant_id, sum(llm_cost_cents)/100.0 as spend_dollars
from ai_chat_sessions
where created_at >= date_trunc('month', now())
group by merchant_id
order by spend_dollars desc
limit 10;

-- Merchants approaching their cap (>80%)
select m.id, m.name,
       m.ai_chat_monthly_cap_cents as cap_cents,
       coalesce(sum(s.llm_cost_cents), 0) as spend_cents,
       round(100.0 * coalesce(sum(s.llm_cost_cents), 0) / nullif(m.ai_chat_monthly_cap_cents, 0)) as pct
from merchants m
left join ai_chat_sessions s
  on s.merchant_id = m.id
 and s.created_at >= date_trunc('month', now())
-- cap = 0 means the merchant is disabled (see Killswitch); skip those rows,
-- otherwise they pass the having clause with a null pct.
where m.ai_chat_monthly_cap_cents > 0
group by m.id, m.name, m.ai_chat_monthly_cap_cents
having coalesce(sum(s.llm_cost_cents), 0) > 0.8 * m.ai_chat_monthly_cap_cents
order by pct desc;

-- Today's funnel for merchant X
select current_state, count(*)
from ai_chat_sessions
where merchant_id = 'm-xxx'
  and created_at >= current_date
group by current_state
order by count(*) desc;