AAOS — ALAI Agent Operating System Executive Summary AAOS is the enforcement runtime for the ALAI agent system. It turns optional protocols (RAG-first, GOTCHA, evidence tracking, quality gates) into mandatory runtime gates that every agent passes through on every lifecycle transition. Core insight: Enforcement belongs at state transitions , not at every tool call. Per-tool-call enforcement caused 348 blocks/session (system unusable). AAOS uses 4 gates at 4 transitions — proven workable. Spec file: ~/system/specs/aaos-architecture.md Deployed: 2026-04-02 MC Task: #6921 Architecture Layers Layer 5: INTERFACE — John (Orchestrator) | MC Dashboard | Slack | CLI Layer 4: ORCHESTRATION — pi-orchestrator.js | team-coordinator.js | pipeline-engine.js Layer 3: ENFORCEMENT — Spawn Gate | Exec Gate | Claim Gate | Close Gate Layer 2: LIBRARY — Tool Registry | Skill Registry | RAG Index | Agent Registry | Context Assembler Layer 1: COMPUTE — Ollama ANVIL (12 models) | Ollama FORGE (7 models) | Claude API | Local Tools Layer 0: PERSISTENCE — SQLite (54 DBs) | Filesystem | HiveMind | Qdrant (vector search) The 4 Enforcement Gates Gate When Checks Implementation SPAWN GATE Agent creation MC task exists & in_progress, GOTCHA written (H/M), team composition meets minimum, budget check kernel/spawn-gate.js + pi-orchestrator Step 4.5 EXEC GATE During execution WIP limit (max 3), tool whitelist, budget cap, timeout Existing hooks ( alai-hooks binary) CLAIM GATE Before "done" All claims labeled L0-L4, no L0/L1 in final report, evidence artifacts exist kernel/claim-gate.js CLOSE GATE Task completion QA-19 score meets threshold, metrics recorded to agent_metrics, learning posted to HiveMind mc.js done handler Trust Levels (ZAKON #21) Level Meaning Allowed L0 Unverified — agent says "done" with no evidence ❌ Never to CEO L1 Self-Tested — agent ran its own tests ❌ Never to CEO L2 Peer-Tested — validator or tester confirmed ✅ Minimum for reports L3 Machine-Verified — exit codes, HTTP responses, DOM checks ✅ Required for aggregate claims L4 Human-Verified — Alem confirmed ✅ Gold standard Library-in-the-Middle The Library is a Node.js module ( kernel/library.js ) that unifies access to all existing stores. Agents don't browse ~/system/ looking for files — they call the Context Assembler which returns exactly what they need, within a token budget. API const library = require('~/system/kernel/library.js'); // Assemble full context for an agent on a task library.assemble(taskId, agentId) → { coreProtocol, agentPersona, projectContext, ragContext, skillSet, toolWhitelist, rules, tokenBudget } // Individual registries library.tools.search(query) // Search 1310 tools library.tools.audit(toolName, agentId, taskId) // Record usage library.skills.forAgent(agentId) // Cookbook-matched skills library.context.rag(query, limit) // HiveMind semantic search library.agents.roster(taskType, priority) // Recommended team composition library.rules.forTask(taskType) // Relevant ZAKONs Token Budgets Model Max Context Tokens Claude Opus 32,000 Claude Sonnet 16,000 Claude Haiku 4,000 Ollama 32B 8,000 Ollama 8B 4,000 Team Composition Rules Config: ~/system/config/team-templates.json Task Type Min Team Required Roles Trivial fix 1 Builder only Feature (M priority) 3 Builder + Validator + Tester Feature (H priority) 5 Builder + Validator + 2 Testers + Security Architecture 3 Architect + Devil's Advocate + Validator Deploy 3 Builder + DevOps + Validator Financial 3 Builder + Finance + Validator Specialist Agents 22 agents total in specialist-mapping.json . Key additions (2026-04-02): Builders (Write/Edit access) Agent Company Domain Expertise Hadi Hariri CodeCraft Kotlin/Ktor Kotlin, Ktor, coroutines, Gradle, JVM optimization Lee Robinson CodeCraft Next.js 15 App Router, React Server Components, Tailwind, Vercel Testers (READ-ONLY — no Write/Edit) Agent Company Focus Style Angie Jones Proveo Test automation Frameworks, E2E, API contracts, regression James Bach Proveo Exploratory testing Skeptical, edge cases, "what would a real user do?" Lisa Crispin Proveo Agile testing Business rules, acceptance criteria, Given/When/Then Dorota Huizinga Proveo Performance testing Load testing, chaos engineering, p50/p95/p99 latencies Tester Assignment Rule H-priority: All 4 testers (minimum 3) M-priority: Angie Jones + 1 other (minimum 2) L-priority: Angie Jones (minimum 1) Database Schema (New Tables) All in ~/system/databases/mission-control.db agent_metrics CREATE TABLE agent_metrics ( id INTEGER PRIMARY KEY AUTOINCREMENT, agent_id TEXT NOT NULL, -- e.g., 'bruce-momjian' task_id INTEGER, -- MC task ID qa_score REAL, -- QA-19 score (0-19) token_count INTEGER, -- tokens consumed duration_seconds INTEGER, -- wall clock time escalated BOOLEAN DEFAULT 0, -- task escalated to higher model? model_used TEXT, -- e.g., 'sonnet', 'qwen3:32b' claim_count INTEGER DEFAULT 0, evidence_count INTEGER DEFAULT 0, defects_found INTEGER DEFAULT 0, trust_level TEXT DEFAULT 'L0', -- L0-L4 created_at DATETIME DEFAULT CURRENT_TIMESTAMP ); team_composition CREATE TABLE team_composition ( id INTEGER PRIMARY KEY AUTOINCREMENT, task_id INTEGER NOT NULL, role TEXT NOT NULL, -- builder, validator, tester, security agent_id TEXT NOT NULL, assigned_at DATETIME DEFAULT CURRENT_TIMESTAMP ); library_usage CREATE TABLE library_usage ( id INTEGER PRIMARY KEY AUTOINCREMENT, task_id INTEGER, agent_id TEXT, tool_name TEXT, skill_name TEXT, used_at DATETIME DEFAULT CURRENT_TIMESTAMP ); Pi-Orchestrator Integration Wired 2026-04-02. Backup: pi-orchestrator.js.bak-aaos-20260402 Imports (line 66-72): library.js + spawn-gate.js with graceful degradation Spawn Gate (Step 4.5, line 3288): Advisory check before task claim — logs warning if gate fails, doesn't block pi-orch Library Context (line 770-782): RAG preloading via library.assemble() injected into buildPrompt() Prompt Template (line 928): aaosContextBlock added between contextBlock and projectContextBlock Graceful degradation: If AAOS modules fail to load, pi-orchestrator works exactly as before. Infrastructure Status Component Status Details Docker ✅ UP v29.2 Qdrant ✅ UP 3 collections (sessions, knowledge, hivemind) on port 6333 Ollama ANVIL ✅ UP 12 models on localhost:11434 Ollama FORGE ✅ UP 7 models on 10.0.0.2:11434 Tool Shed ✅ UP 240 tools on port 3050 HiveMind ✅ UP 25,309 entries, keyword search working Hooks Binary ✅ UP 15.7MB arm64, 4 blocking + 1 advisory gate Enforcement Configuration File: ~/.claude/hooks/config/enforcement.json Hook ZAKON Mode HopBuild #5 BLOCKING RAG-First #12 BLOCKING QA-19 #14 BLOCKING Evidence #21 BLOCKING Agent Testing #20 ADVISORY (promote to blocking after 2 weeks) File Map New Files (created 2026-04-02) ~/system/kernel/library.js — Library-in-the-Middle (283 lines) ~/system/kernel/spawn-gate.js — SPAWN GATE enforcement ~/system/kernel/claim-gate.js — CLAIM GATE enforcement ~/system/config/team-templates.json — Team composition rules (6 types) ~/system/specs/aaos-architecture.md — Full architecture spec (1060 lines) ~/system/agents/definitions/hadi-hariri.md + .yaml — Kotlin/Ktor specialist ~/system/agents/definitions/lee-robinson.md + .yaml — Next.js 15 specialist ~/system/agents/definitions/james-bach.md + .yaml — Exploratory tester ~/system/agents/definitions/lisa-crispin.md + .yaml — Agile tester ~/system/agents/definitions/dorota-huizinga.md + .yaml — Performance tester ~/system/agents/identities/{hadi,lee,james,lisa,dorota}-*.md — Full identities Modified Files ~/system/tools/mc.js — CLOSE GATE metrics recording in done handler ~/system/kernel/pi-orchestrator.js — AAOS wiring (spawn-gate + library context) ~/system/agents/specialist-mapping.json — 5 new agents (total: 22) ~/system/databases/mission-control.db — 3 new tables Metrics & Learning Loop Every task completion records to agent_metrics : Agent ID, task ID, model used Duration (seconds from mc.js start to done) QA-19 score (if available) Evidence count (files in /tmp/evidence-{id}/ ) Trust level (L0-L4, based on evidence presence and force flag) Every non-forced completion also posts a learning entry to HiveMind (knowledge type). Success Criteria Zero agents complete a task without RAG preloading (measured by SPAWN GATE rejection count) Zero L0/L1 claims reach Alem (measured by CLAIM GATE + CEO-reported false claims) Every H-priority task has 3+ testers (measured by team_composition table) Agent quality improves over time (measured by avg QA-19 score per agent, monthly) Token efficiency improves (measured by qa_score / token_count ratio, monthly)