Continuously monitor structured output accuracy, reasoning chain integrity, and latency across every major AI model — so you catch regressions before they hit production.
How are the models doing over time?
Track model drift in real time — AI providers push silent updates that break your agents. Lumina runs benchmarks against every major model, tracking structured output accuracy, reasoning step completion, and throughput.
When do providers slow down?
See provider reliability at a glance: latency spikes don't announce themselves. Track time to first token (TTFT) and throughput across every model to plan your routing strategy and avoid peak-hour failures.
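As a minimal sketch of the two metrics above: TTFT is the delay until the first streamed token arrives, and throughput is tokens per second over the whole response. The `fake_stream` generator below is a stand-in for a real provider's streaming API, not Lumina's or any vendor's actual client.

```python
import time

def measure_ttft_and_throughput(stream):
    """Return (time-to-first-token, tokens/sec) for a token iterator."""
    start = time.monotonic()
    ttft = None
    count = 0
    for _ in stream:
        if ttft is None:
            # First token observed: record the latency so far.
            ttft = time.monotonic() - start
        count += 1
    elapsed = time.monotonic() - start
    throughput = count / elapsed if elapsed > 0 else 0.0
    return ttft, throughput

def fake_stream(n_tokens=50, delay=0.001):
    """Simulated token stream standing in for a real streaming response."""
    for _ in range(n_tokens):
        time.sleep(delay)
        yield "tok"

ttft, tps = measure_ttft_and_throughput(fake_stream())
```

`time.monotonic()` is used rather than `time.time()` so the measurement is immune to wall-clock adjustments during a request.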
Lumina watches every model you depend on and alerts you the moment performance drifts. Instead of running expensive eval suites on a fixed schedule, trigger your pipelines only when something actually changes, cutting continuous eval compute by up to 95%.
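A drift-triggered pipeline might look like the sketch below: a webhook handler receives an alert and decides whether the drop is large enough to justify a full eval run. The payload fields (`baseline`, `current`, etc.) and the 5% threshold are illustrative assumptions, not a documented schema.

```python
import json

DRIFT_THRESHOLD = 0.05  # assumed: relative metric drop that warrants a re-run

def should_trigger_evals(alert_json: str) -> bool:
    """Decide whether a drift alert warrants running the full eval suite.

    The payload shape (model, metric, baseline, current) is an
    assumption for illustration, not a documented alert schema.
    """
    alert = json.loads(alert_json)
    baseline = alert["baseline"]
    current = alert["current"]
    if baseline == 0:
        return True  # no usable baseline: re-evaluate to re-establish one
    drop = (baseline - current) / baseline
    return drop >= DRIFT_THRESHOLD

payload = json.dumps({
    "model": "example-model",
    "metric": "structured_output_accuracy",
    "baseline": 0.92,
    "current": 0.84,
})
print(should_trigger_evals(payload))  # drop of about 8.7%, prints True
```

Gating the eval suite behind a check like this is what turns "run everything hourly" into "run only on real regressions."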
From free public dashboards to dedicated enterprise monitoring.
- **Free** (for individuals)
- **$149/mo** (for teams)
- **Custom** (contact us)
| Feature | Public | Starter | Professional | Enterprise |
|---|---|---|---|---|
| Models tracked | 6 | 15 | 25+ | Unlimited |
| Tasks per run | 15 | 30 | 70 | Custom |
| Test frequency | Hourly | Hourly | 15 min | 5 min |
| Data history | 24 hrs | 30 days | 12 months | Unlimited |
| Web dashboard | ✓ | ✓ | ✓ | ✓ |
| Live leaderboard | ✓ | ✓ | ✓ | ✓ |
| API access | — | — | ✓ | ✓ |
| Failure drill-down | — | — | ✓ | ✓ |
| Drift-triggered alerts | — | — | Webhook/API | PagerDuty and more |
| Custom evaluations | — | — | — | ✓ |
| SLA reporting | — | — | — | ✓ |
| Support | Community | Priority | Priority | Dedicated |