Supported Models
Three orchestrated modes for picking how Ava routes work, plus every individual model behind them. Pick a mode, or pick a single model and skip the routing.
Routing Modes
Three ways Ava can pick models for you — single conductor, polyglot ensemble, or a sovereign EU stack. Switch any time from the model picker.
Maestro
One conductor handles every persona, every step. Production-tuned, default for everyone.
- Coordinator: Qwen 3.6 Plus
- Specialists: The same model serves the full pipeline (Scout → Architect → Builder → Verifier).
- Data residency: Mixed (wherever Qwen is hosted)
Best for
Daily work, predictable cost, anyone who wants Ava to "just work" without thinking about routing.
Supernova
Polyglot ensemble — the coordinator picks the best specialist for each task, model by model.
- Coordinator: DeepSeek V4 Pro (1.6T / 49B active, 1M ctx)
- Specialists: V4 Flash for builds and review · Qwen 3.6 Plus as fallback · Qwen Omni when vision is in play
- Data residency: Mixed (DeepSeek + Qwen infrastructure)
Best for
Heavy multi-step work where each subtask wants its own specialist. Frontier coordinator on every plan.
Aurora
European AI stack, sovereign by design. Mistral-only routing in three tiers that never leaves EU infrastructure.
- Coordinator: Mistral Large 3 (675B / 41B active, 262K ctx), which also serves the heavy specialists
- Specialists: Mistral Medium 3.5 (128B dense, 256K ctx, vision encoder trained from scratch, 77.6% SWE-Bench Verified) for Builder, mid-tier specialists, vision, and long-form · Mistral Small 4 as the intent gate
- Data residency: EU only, open weights end-to-end
Best for
GDPR-strict deployments, public-sector and healthcare buyers, anyone with a sovereignty mandate.
Or skip routing entirely — pick a single model below and Ava drives just that one.
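To make the three modes concrete, here is a minimal Python sketch of the routing described above. It is an illustration only, not Ava's actual implementation: the `MODES` table, the `pick_model` function, and the `IntentGate` persona name are hypothetical, and two mappings are assumptions. "Builds and review" is assumed to mean the Builder and Verifier personas, and each mode's fallback model is assumed to handle any persona that has no dedicated specialist.

```python
# Hypothetical routing table built from the mode descriptions above.
# Model names are copied from the page; the structure is an assumption.
MODES = {
    "Maestro": {
        "coordinator": "Qwen 3.6 Plus",
        "by_persona": {},  # one model serves every persona
        "vision": None,
        "fallback": "Qwen 3.6 Plus",
        "residency": "Mixed",
    },
    "Supernova": {
        "coordinator": "DeepSeek V4 Pro",
        # Assumption: "builds and review" maps to Builder and Verifier.
        "by_persona": {"Builder": "V4 Flash", "Verifier": "V4 Flash"},
        "vision": "Qwen Omni",
        "fallback": "Qwen 3.6 Plus",
        "residency": "Mixed",
    },
    "Aurora": {
        "coordinator": "Mistral Large 3",
        "by_persona": {
            "Builder": "Mistral Medium 3.5",
            "IntentGate": "Mistral Small 4",  # hypothetical persona name
        },
        "vision": "Mistral Medium 3.5",
        # Assumption: unmapped personas count as "heavy specialists".
        "fallback": "Mistral Large 3",
        "residency": "EU only",
    },
}


def pick_model(mode, persona, needs_vision=False):
    """Return the model a mode would use for one persona (sketch only)."""
    cfg = MODES[mode]
    # Vision work goes to the mode's vision model when it has one.
    if needs_vision and cfg["vision"]:
        return cfg["vision"]
    # Otherwise use the persona's specialist, or the mode's fallback.
    return cfg["by_persona"].get(persona, cfg["fallback"])


if __name__ == "__main__":
    for mode in MODES:
        print(mode, "->", pick_model(mode, "Builder"))
```

Under these assumptions, Maestro returns the same model for every persona, while Supernova and Aurora return a different model depending on the persona and on whether vision is needed.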
Understanding These Numbers
SWE-Bench
Tests whether the model can solve real bugs from GitHub repositories — reading code, understanding the issue, and writing a working fix.
Higher score = Better at fixing real-world code problems autonomously
HumanEval
Given a function description, can the model write correct code that passes all test cases? Measures raw coding ability.
Higher score = More reliable at writing correct code from descriptions
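As a rough illustration of "passes all test cases", the sketch below scores candidate functions the way a HumanEval-style check works in principle: a problem counts as solved only if every test case passes. This is a simplified stand-in, not the official evaluation harness. The function names and the two toy problems are invented for the example.

```python
def passes_all(candidate, test_cases):
    """True only if the candidate returns the expected value for every case."""
    for args, expected in test_cases:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            # A crash counts as a failed test.
            return False
    return True


def score(problems):
    """Percentage of problems whose candidate passes all of its tests."""
    solved = sum(passes_all(candidate, tests) for candidate, tests in problems)
    return 100.0 * solved / len(problems)


# Two toy "model outputs": one correct, one with a bug.
def add(a, b):
    return a + b


def buggy_abs(x):
    return x  # wrong for negative inputs


problems = [
    (add, [((1, 2), 3), ((-1, 1), 0)]),
    (buggy_abs, [((3,), 3), ((-3,), 3)]),
]

if __name__ == "__main__":
    print(score(problems))  # 50.0
```

One failing test case is enough to mark a problem unsolved, which is why a small bug costs the whole problem rather than part of it.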
MMLU
A massive exam covering 57 subjects — science, history, law, medicine, maths. Tests general knowledge and reasoning breadth.
Higher score = Broader general knowledge across many domains
MATH
Competition-level maths problems — algebra, calculus, geometry, number theory. Tests deep mathematical reasoning.
Higher score = Stronger at solving complex mathematical problems
GPQA
Graduate-level science questions written by PhD researchers. Even experts struggle with these — tests frontier reasoning.
Higher score = Better at expert-level scientific reasoning
Tool Use
Can the model correctly call functions, pass the right arguments, and chain multiple tools together? Critical for an AI agent.
Higher score = More reliable at using tools like file editing, search, and git
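To show what "pass the right arguments" means in practice, here is a minimal sketch of checking a model's tool call against a declared schema. The tool names (`read_file`, `search`), their parameters, and the call format are hypothetical and are not Ava's actual tool interface.

```python
# Hypothetical tool schemas: tool name -> {argument name: expected type}.
TOOLS = {
    "read_file": {"path": str},
    "search": {"query": str, "max_results": int},
}


def validate_call(call):
    """Return a list of problems with a tool call; an empty list means it is well-formed."""
    name = call.get("name")
    if name not in TOOLS:
        return [f"unknown tool: {name}"]

    schema = TOOLS[name]
    args = call.get("arguments", {})
    problems = []

    # Every declared argument must be present and have the right type.
    for key, expected_type in schema.items():
        if key not in args:
            problems.append(f"missing argument: {key}")
        elif not isinstance(args[key], expected_type):
            problems.append(f"wrong type for {key}")

    # Arguments the tool does not declare are also errors.
    for key in args:
        if key not in schema:
            problems.append(f"unexpected argument: {key}")

    return problems


if __name__ == "__main__":
    good = {"name": "read_file", "arguments": {"path": "src/main.py"}}
    bad = {"name": "search", "arguments": {"query": "TODO", "max_results": "5"}}
    print(validate_call(good))  # []
    print(validate_call(bad))   # ['wrong type for max_results']
```

A tool-use benchmark is in effect asking how often a model's calls come back with an empty problem list, and whether it can do so across several chained calls.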
Vision
Can the model understand images — screenshots, diagrams, charts, photos? Tests visual comprehension and reasoning.
Higher score = Better at understanding what it sees on screen
How to Access These Models
Available through your Ava account. The Free tier includes 300 credits/month to evaluate the platform, with access to every model and every tool. Upgrade to Pro for 5,000 credits/month when ready, or stay on Free and use your own API key (BYOK usage is unlimited).
Bring Your Own Key (BYOK). Get an API key directly from the provider, paste it into Ava's settings, and pay the provider directly. No account needed. Runs 100% locally.
Available both ways. Use your Ava account for convenience, or bring your own key for direct access and maximum savings.