How do you decide between latency vs accuracy for an AI feature?
### Signal to interviewer
I can resolve latency-accuracy tradeoffs through segmented operating points tied to user outcomes.
### Clarify
I would clarify request urgency classes, error tolerance by use case, and abandonment sensitivity to delay.
### Approach
Build a latency-accuracy Pareto frontier by cohort and choose route-specific operating points instead of a single universal setting.
### Metrics & instrumentation
Primary metric: successful task completion within cohort latency budget. Secondary metrics: abandonment by response time, correctness score by route, and fallback frequency. Guardrails: severe error growth and timeout spikes.
### Tradeoffs
Lower latency improves responsiveness but can reduce reasoning depth. Higher accuracy improves trust but can increase wait time and compute cost.
### Risks & mitigations
Risk: overfitting to average users; mitigate with cohort-level targets. Risk: hidden quality loss in fast path; mitigate with canary audits. Risk: unstable routing behavior; mitigate with policy hysteresis.
### Example
Customer support triage uses low-latency quick intents first, then escalates complex policy cases to higher-accuracy routes.
### 90-second version
Choose latency versus accuracy by segment and task risk. Optimize for successful outcomes under explicit response-time budgets, and continuously rebalance as behavior changes.
- Which journeys are most latency-sensitive versus accuracy-sensitive?
- What latency threshold begins to hurt completion materially?
- How would you implement dynamic routing across urgency classes?
- What monitoring catches silent quality regressions in fast paths?