RFC: Auto-Assessment & Self-Healing Combo Engine
An Auto-Assessment Engine that continuously tests, categorizes, and self-heals combo configurations, making omniroute truly "plug and play" for non-technical users.
" errors: , , all fail with 400/404
- No automated way to know which models actually work: the endpoint lists 1,236 models, but fails for most of them
- Weight-based routing sends traffic to dead models: a model weighted at 30% that returns errors wastes 30% of requests
- Manual diagnosis took hours: we had to curl each model individually, categorize the results, then update the SQLite DB
- Provider field exists but isn't used for routing: has (active/banned/expired/credits_exhausted) but the combo resolver ignores it
Purpose: Probe every provider/model pair with a lightweight chat completion to determine if it works and measure performance.
interface ModelAssessment {
modelId: string;
providerId: string;
status: "working" | "broken" | "rate_limited" | "timeout" | "auth_error" | "unknown";
// Performance metrics
latencyP50: number; // milliseconds
latencyP95: number; // milliseconds
successRate: number; // 0..1 over last N probes
// Capability detection
supportsVision: boolean;
supportsToolCall: boolean;
supportsStreaming: boolean;
supportsStructuredOutput: boolean; // JSON mode; mirrors the supports_structured_output column
maxContextWindow: number;
maxOutputTokens: number;
categories: ModelCategory[]; // see ModelCategory below for the full list
tier: "premium" | "balanced" | "fast" | "free";
// Metadata
lastTested: string; // ISO timestamp
lastError: string | null;
consecutiveFails: number;
probeCount: number;
}
type ModelCategory =
| "coding" // Good at code generation, debugging, refactoring
| "reasoning" // Strong logical reasoning, math, analysis
| "reasoning_deep" // Extended thinking, complex multi-step reasoning
| "chat" // Good conversational ability
| "fast" // Sub-2s response time
| "vision" // Image input support
| "tool_call" // Function/tool calling support
| "structured_output"; // JSON mode / structured output
Assessment Probes (three tiers of testing):
| Quick | |||
| Standard | |||
| Deep |
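As a minimal sketch of how a single probe outcome could be mapped onto the `status` values and aggregated into `successRate`, `latencyP50`, and `latencyP95`: the names `ProbeSample`, `classifyProbe`, `percentile`, and `summarize` are hypothetical, and the HTTP-code mapping and the nearest-rank percentile method are assumptions rather than the engine's confirmed behavior.

```typescript
// Status values copied from ModelAssessment.status.
type ProbeStatus =
  | "working" | "broken" | "rate_limited" | "timeout" | "auth_error" | "unknown";

// Hypothetical record of one probe attempt.
interface ProbeSample {
  httpStatus: number | null; // null when no response arrived
  timedOut: boolean;
  latencyMs: number;
}

// Assumed mapping from HTTP outcome to status; providers may need overrides.
function classifyProbe(sample: ProbeSample): ProbeStatus {
  if (sample.timedOut) return "timeout";
  if (sample.httpStatus === null) return "unknown";
  if (sample.httpStatus >= 200 && sample.httpStatus < 300) return "working";
  if (sample.httpStatus === 401 || sample.httpStatus === 403) return "auth_error";
  if (sample.httpStatus === 429) return "rate_limited";
  return "broken"; // includes 400/404 responses
}

// Nearest-rank percentile (p in 0..100); returns 0 for an empty list.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Aggregate the last N probes; latency is measured on successful probes only.
function summarize(samples: ProbeSample[]) {
  const ok = samples.filter((s) => classifyProbe(s) === "working");
  const latencies = ok.map((s) => s.latencyMs);
  return {
    successRate: samples.length === 0 ? 0 : ok.length / samples.length,
    latencyP50: percentile(latencies, 50),
    latencyP95: percentile(latencies, 95),
  };
}
```

Restricting the latency percentiles to successful probes keeps fast error responses from making a failing model look quick; whether the real engine does this is an open design choice.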
Scheduling:
Purpose: Classify each working model into capability categories and assign fitness scores per category.
Category Detection Logic:
function categorizeModel(assessment: ModelAssessment): ModelCategory[] {
const categories: ModelCategory[] = [];
// Speed classification
if (assessment.latencyP50 < 2000) categories.push("fast");
// Capability from probe responses
if (assessment.supportsToolCall) categories.push("tool_call");
if (assessment.supportsVision) categories.push("vision");
// JSON mode is a separate capability from streaming; this needs a supportsStructuredOutput flag on ModelAssessment
if (assessment.supportsStructuredOutput) categories.push("structured_output");
// Tier-based reasoning classification
if (assessment.tier === "premium") {
categories.push("reasoning_deep", "coding", "reasoning");
} else if (assessment.tier === "balanced") {
categories.push("coding", "reasoning");
} else if (assessment.tier === "fast") {
categories.push("chat");
}
return categories;
}
Fitness Scores (0..1 per category):
| (only premium tier eligible) | |
Purpose: Automatically update combo model lists based on assessment results.
Auto-Heal Rules:
Auto-Generation of Combos:
const AUTO_COMBOS = [
{ name: "auto/best-coding", categories: ["coding"], tier: ["premium", "balanced"] },
{ name: "auto/best-reasoning", categories: ["reasoning_deep"], tier: ["premium"] },
{ name: "auto/best-fast", categories: ["fast"], tier: ["fast", "balanced"] },
{ name: "auto/best-vision", categories: ["vision"], tier: ["premium", "balanced"] },
{ name: "auto/best-chat", categories: ["chat"], tier: ["balanced", "premium"] },
{ name: "auto/coding", categories: ["coding"], tier: ["balanced", "fast", "premium"] },
{ name: "auto/fast", categories: ["fast"], tier: ["fast"] },
{ name: "auto/pro-coding", categories: ["coding"], tier: ["premium"] },
{ name: "auto/pro-reasoning", categories: ["reasoning_deep"], tier: ["premium"] },
{ name: "auto/pro-vision", categories: ["vision"], tier: ["premium"] },
{ name: "auto/pro-chat", categories: ["chat"], tier: ["premium"] },
{ name: "auto/pro-fast", categories: ["fast"], tier: ["fast"] },
];
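One way an entry of `AUTO_COMBOS` could be turned into a concrete model list is sketched below. The function `buildAutoCombo`, the `AssessedModel` shape, the ranking (success rate first, then median latency), and the default limit of 5 are all assumptions made for illustration, not rules stated in this RFC.

```typescript
type Tier = "premium" | "balanced" | "fast" | "free";

// Hypothetical subset of ModelAssessment needed for selection.
interface AssessedModel {
  modelId: string;
  status: string;
  tier: Tier;
  categories: string[];
  successRate: number; // 0..1
  latencyP50: number;  // milliseconds
}

// Same shape as the entries in AUTO_COMBOS.
interface AutoComboSpec {
  name: string;
  categories: string[];
  tier: Tier[];
}

// Select working models that match the spec, best candidates first.
function buildAutoCombo(
  spec: AutoComboSpec,
  assessed: AssessedModel[],
  limit = 5, // assumed cap on models per combo
): string[] {
  return assessed
    .filter((m) => m.status === "working")
    .filter((m) => spec.tier.includes(m.tier))
    .filter((m) => spec.categories.every((c) => m.categories.includes(c)))
    // Assumed ranking: higher success rate wins, lower latency breaks ties.
    .sort((a, b) => b.successRate - a.successRate || a.latencyP50 - b.latencyP50)
    .slice(0, limit)
    .map((m) => m.modelId);
}
```

Because broken models are filtered before ranking, a regenerated combo never contains a model that the latest assessment marked as failing.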
-- New tables for assessment engine
CREATE TABLE IF NOT EXISTS model_assessments (
id TEXT PRIMARY KEY,
model_id TEXT NOT NULL,
provider_id TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'unknown', -- working|broken|rate_limited|timeout|auth_error|unknown
latency_p50 INTEGER, -- milliseconds
latency_p95 INTEGER, -- milliseconds
success_rate REAL DEFAULT 0, -- 0..1
supports_vision INTEGER DEFAULT 0,
supports_tool_call INTEGER DEFAULT 0,
supports_streaming INTEGER DEFAULT 0,
supports_structured_output INTEGER DEFAULT 0,
max_context_window INTEGER,
max_output_tokens INTEGER,
categories TEXT DEFAULT '[]', -- JSON array of ModelCategory
fitness_scores TEXT DEFAULT '{}', -- JSON object: {category: score}
tier TEXT DEFAULT 'balanced', -- premium|balanced|fast|free
last_tested TEXT,
last_error TEXT,
consecutive_fails INTEGER DEFAULT 0,
probe_count INTEGER DEFAULT 0,
created_at TEXT NOT NULL DEFAULT (datetime('now')),
updated_at TEXT NOT NULL DEFAULT (datetime('now')),
UNIQUE(model_id, provider_id)
);
CREATE TABLE IF NOT EXISTS assessment_runs (
id TEXT PRIMARY KEY,
started_at TEXT NOT NULL,
completed_at TEXT,
models_tested INTEGER DEFAULT 0,
models_passed INTEGER DEFAULT 0,
models_failed INTEGER DEFAULT 0,
models_rate_limited INTEGER DEFAULT 0,
duration_ms INTEGER,
trigger TEXT DEFAULT 'scheduled', -- scheduled|on_demand|on_provider_change|on_error
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE TABLE IF NOT EXISTS combo_health (
combo_id TEXT PRIMARY KEY,
healthy_model_count INTEGER DEFAULT 0,
dead_model_count INTEGER DEFAULT 0,
total_model_count INTEGER DEFAULT 0,
last_auto_fix TEXT,
auto_fix_count INTEGER DEFAULT 0,
health_score REAL DEFAULT 0, -- 0..1, weighted by model health
updated_at TEXT NOT NULL DEFAULT (datetime('now')),
FOREIGN KEY (combo_id) REFERENCES combos(id)
);
CREATE INDEX IF NOT EXISTS idx_model_assessments_status ON model_assessments(status);
CREATE INDEX IF NOT EXISTS idx_model_assessments_provider ON model_assessments(provider_id);
CREATE INDEX IF NOT EXISTS idx_model_assessments_tier ON model_assessments(tier);
CREATE INDEX IF NOT EXISTS idx_combo_health_health_score ON combo_health(health_score);
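The `combo_health` columns could be filled from per-model statuses roughly as follows. This sketch assumes the simplest possible scoring, the unweighted share of working models, whereas the schema comment describes the score as weighted by model health; `computeComboHealth` and the `ComboHealthRow` shape are hypothetical names.

```typescript
// Hypothetical in-memory mirror of the combo_health counters.
interface ComboHealthRow {
  healthyModelCount: number;
  deadModelCount: number;
  totalModelCount: number;
  healthScore: number; // 0..1, rounded to two decimals
}

// Statuses are the ModelAssessment.status values of the combo's models.
function computeComboHealth(statuses: string[]): ComboHealthRow {
  const total = statuses.length;
  const healthy = statuses.filter((s) => s === "working").length;
  // Assumption: untested ("unknown") models count as neither healthy nor dead.
  const dead = statuses.filter((s) => s !== "working" && s !== "unknown").length;
  const score = total === 0 ? 0 : Math.round((healthy / total) * 100) / 100;
  return {
    healthyModelCount: healthy,
    deadModelCount: dead,
    totalModelCount: total,
    healthScore: score,
  };
}
```

Under this assumption, a combo with 5 working and 2 broken models scores 0.71.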
# Trigger assessment (blocking or background)
POST /api/assess/models
Body: { "scope": "all" | "provider:<id>" | "model:<id>", "tier": "quick" | "standard" | "deep" }
Response: { "run_id": "...", "status": "started" }
# Get assessment results
GET /api/assess/results
Query: ?status=working|broken|rate_limited&provider=kiro&category=coding
Response: { "models": [...] }
# Get combo health dashboard
GET /api/assess/combo-health
Response: { "combos": [{ "id": "...", "name": "...", "healthy_models": 5, "dead_models": 2, "health_score": 0.71 }] }
# Auto-fix all combos
POST /api/assess/auto-fix
Response: { "fixed_combos": 3, "removed_models": ["ollamacloud/glm-5.1", "..."], "added_models": [...] }
# Auto-generate combos from assessments
POST /api/assess/auto-generate
Response: { "generated_combos": ["auto/best-coding", "..."], "models_per_combo": { "auto/best-coding": 5 } }
# Get assessment run history
GET /api/assess/runs
Response: { "runs": [...] }
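The `scope` string accepted by `POST /api/assess/models` could be validated before a run is started. The sketch below is illustrative only: `parseScope`, `buildAssessRequest`, and the default tier of `"quick"` are assumptions, not part of the specified API.

```typescript
// Parsed form of the "all" | "provider:<id>" | "model:<id>" scope string.
type AssessScope =
  | { kind: "all" }
  | { kind: "provider"; id: string }
  | { kind: "model"; id: string };

type ProbeTier = "quick" | "standard" | "deep";

// Split on the first colon only, so model ids may contain ":" or "/".
function parseScope(raw: string): AssessScope {
  if (raw === "all") return { kind: "all" };
  const sep = raw.indexOf(":");
  const prefix = sep === -1 ? "" : raw.slice(0, sep);
  const id = sep === -1 ? "" : raw.slice(sep + 1);
  if ((prefix === "provider" || prefix === "model") && id.length > 0) {
    return { kind: prefix, id };
  }
  throw new Error(`Invalid scope: ${raw}`);
}

// Build the request for the trigger endpoint; throws on a malformed scope.
function buildAssessRequest(scope: string, tier: ProbeTier = "quick") {
  parseScope(scope);
  return {
    method: "POST",
    path: "/api/assess/models",
    body: JSON.stringify({ scope, tier }),
  };
}
```

Rejecting a malformed scope up front avoids starting an assessment run that would match no models.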
already handles , , , , and strategies. The enhancement:
// In comboResolver.ts: add health-aware filtering
export function resolveComboModel(combo, context = {}) {
const models = combo.models || [];
if (models.length === 0) {
throw new Error(`Combo "${combo.name}" has no models configured`);
}
const normalized = models
.map((entry) => ({
model: getComboStepTarget(entry) || "",
weight: getComboStepWeight(entry) || 1,
}))
.filter((entry) => entry.model);
// NEW: Filter out models with a known non-working status (broken, rate_limited, timeout, auth_error)
const healthy = normalized.filter((entry) => {
const assessment = getAssessment(entry.model);
if (!assessment) return true; // No assessment yet: allow (not tested)
return assessment.status === "working" || assessment.status === "unknown";
});
// If all models are unhealthy, fall back to full list (better to try than to fail)
const pool = healthy.length > 0 ? healthy : normalized;
const strategy = combo.strategy || "priority";
// ... existing resolution logic using `pool` instead of `normalized`
}
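To illustrate why the resolution step should draw from the filtered `pool`, here is a minimal weighted pick. It is not the existing resolver logic: `pickWeighted`, the `PoolEntry` shape, and the injectable `rand` parameter are hypothetical and shown only as a sketch of a weight-based strategy.

```typescript
// Same shape as the normalized entries: a model id plus its routing weight.
interface PoolEntry {
  model: string;
  weight: number;
}

// Pick one model with probability proportional to its weight.
// `rand` must return a number in [0, 1); it is injectable for testing.
function pickWeighted(pool: PoolEntry[], rand: () => number = Math.random): string {
  const total = pool.reduce((sum, entry) => sum + entry.weight, 0);
  if (pool.length === 0 || total <= 0) {
    throw new Error("Cannot pick from an empty or zero-weight pool");
  }
  let threshold = rand() * total;
  for (const entry of pool) {
    threshold -= entry.weight;
    if (threshold < 0) return entry.model;
  }
  // Guard against floating-point drift on the final entry.
  return pool[pool.length - 1].model;
}
```

With weights of 70 and 30, the second model receives roughly 30% of requests; once the health filter removes it from the pool, every request goes to the remaining model instead of being lost to errors.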
| NEW: from assessment categories | ||
currently uses a static fitness lookup (). With assessments, we derive fitness from live probe results: a model that actually passes coding probes gets a high fitness, not just because its name contains "coder".
, tables
| NEW | ||
| NEW | ||
| NEW | ||
| MODIFIED | ||
| MODIFIED | ||
| NEW | ||
| MODIFIED | ||
| NEW | ||
| NEW | ||
| NEW |
# Our actual test sequence that should be automated:
# 1. Start omniroute with 406 provider connections (51 providers)
# 2. Discover only 8 models actually work from 2 providers (kiro, ollamacloud)
# 3. Manually update 44 combos with working models only
# 4. Verify all 15 key combos pass end-to-end
# 5. Set up auto-sync cron for model list updates
# With auto-assessment, this entire process should be:
# 1. Start omniroute
# 2. Run: curl -X POST http://localhost:20128/api/assess/models -d '{"scope":"all"}'
# 3. Wait for assessment to complete
# 4. Run: curl -X POST http://localhost:20128/api/assess/auto-fix
# 5. All combos are now healthy
and be treated as the same model for combo purposes?