ALAI AI System — v2.0 Operating Picture & Master Roadmap

Date: 2026-05-19
Architect: Petter Graff
Status: SYNTHESIS COMPLETE, pending dual validation (Proveo + Verifier)
Supersedes: ceo-ai-system-audit-2026-05-18-REPORT.md (v1.1; Wave 1 remains canonical for inventory, v2.0 adds the design and build roadmap)


1. Executive Brief

The ALAI AI system is a system that builds systems — and it has stopped building. Over the last 8 days it burned $742K on Anthropic Opus (99.98% of all spend), peaked at $377,487 in a single day (2026-05-11), and shipped zero production code in 7 days. Wave 1 (2026-05-18) identified the symptoms; Wave 2 (three parallel teams: Control, Knowledge, Workflow) identified the single causal narrative:

The orchestrator steers by frozen instruments, dispatches through gates that don't fire, into a free-tier fleet that doesn't exist, validates with probes that never run, and ships into a backlog with no exit. Every "save" is a watchdog that itself is dormant. The meta-failure — hook-drift-detector daemon exit 2, stopped — is what allows all other silent failures to hide.

The three planes fail compoundingly:

If you read nothing else

One sentence per plane


2. The Three Planes (Target Architecture)

2.1 Mermaid Super-Diagram

flowchart TB
  subgraph CEO_SURFACE [CEO Surface]
    Prompt[CEO prompt / Slack]
    Email[CEO email IMAP]
  end

  subgraph CONTROL [Plane 1 — Control & Determinism]
    KS[Kill switch<br/>tmp alai-killswitch]:::new
    OCG[opus-cost-guard v2<br/>daily $ ceiling]:::fix
    KSW[fleet-reconcile-probe<br/>tier-truth.json]:::new
    RAW[probe-liveness-watchdog]:::new
    HDD[hook-drift-detector v2]:::new
    EL[(evidence-ledger.db<br/>SQLite schema'd)]:::fix
    SSM[session-spend-monitor<br/>per-session $ ladder]:::new
  end

  subgraph KNOWLEDGE [Plane 2 — Knowledge & Memory]
    DJ[discover.js<br/>3-tier front door]:::fix
    L1[L1 MEMORY.md + session]:::ok
    L2[L2 HiveMind 21,741 rows]:::ok
    L3a[L3a LightRAG Azure]:::fix
    L3b[L3b Mem0 facts<br/>KILL → fold to HiveMind]:::kill
    BS[(BookStack 478 pages<br/>canonical wiki)]:::fix
    Z12[ZAKON #12<br/>rag-context-for-builder]:::new
    INV[manifest-index + skill-registry<br/>daily regen]:::fix
  end

  subgraph WORKFLOW [Plane 3 — Orchestration & Workflow]
    EID[email-intake-daemon]:::new
    MC[(MC tasks db)]:::ok
    RTR[router.js classify<br/>discover.js routing alias]:::new
    MEH[mehanik gate]:::fix
    SUB[Specialist subagents]:::ok
    PIO[pi-orchestrator<br/>route_eligibility expanded]:::fix
    PRO[Proveo E2E validation]:::ok
    TLDR[TLDR daemon<br/>~/system/data/insights]:::new
    TTL[backlog-ttl-daemon]:::new
    ESC[escalation-matrix hook]:::new
  end

  Prompt --> Z12
  Email --> EID --> MC
  MC --> RTR --> MEH --> SUB
  SUB -.queries.-> DJ
  DJ --> L1 & L2 & L3a & L3b
  DJ -. cite .-> BS
  Z12 --> DJ
  SUB --> OCG
  OCG -. breach .-> KS
  SSM -. breach .-> KS
  KS -. blocks.-> SUB & MEH
  KSW -. health .-> SUB
  RAW -. probes .-> PRO
  PRO --> EL
  EL --> MC
  HDD -. watches .-> OCG & KSW & RAW & EID & TLDR
  PIO --> PRO
  SUB --> PIO
  MC --> TTL
  TTL --> TLDR --> Prompt
  ESC -. gates .-> Prompt
  INV -. truth .-> DJ

  classDef new fill:#1d8c43,color:#fff
  classDef fix fill:#d4a017,color:#000
  classDef kill fill:#b3261e,color:#fff
  classDef ok fill:#5b9bd5,color:#fff

Legend: green = new build, yellow = fix-in-place, red = formal kill, blue = working today.

2.2 Plane Summaries

Control plane (Team A).

Current: Probes are designed but not running (0 PROBE_PASS events in 7d). 58 hooks are present, but only 5 have audit logs from today. opus-cost-guard blocks on a per-agent name match, not on a $-ceiling. May 11 ($377K) would not have triggered any gate. The evidence-ledger SQLite database is empty (0 tables); the JSONL ledger is 100% force_completion. The tier router is blind: 4/14 routes point at ghost models.

Target: Hard $-ceiling + global kill-switch + live fleet reconcile (5-min cycle) + Reality Anchor watchdog that auto-restarts dormant probes + evidence-ledger schema with HMAC chain + per-hook audit-log convention enforced by hook-drift-detector v2.

MCs: 9 (T-A-01 through T-A-09).
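The hard $-ceiling and global kill-switch in the Control plane target can be sketched as a single pre-dispatch check. This is a minimal illustration only: the function name `dispatch_allowed`, the kill-switch file location, and the way today's spend is supplied are assumptions, and the $500 default is borrowed from Decision 7 in section 6. It is not the actual opus-cost-guard v2 interface.

```python
import tempfile
from pathlib import Path

# Assumption: default taken from the $500/day proposal in section 6, Decision 7.
DEFAULT_DAILY_CEILING_USD = 500.0


def dispatch_allowed(spend_today_usd: float,
                     killswitch: Path,
                     ceiling_usd: float = DEFAULT_DAILY_CEILING_USD) -> bool:
    """Return False if the kill switch is set or today's spend reached the ceiling.

    On a ceiling breach the kill-switch file is created, so every later
    check fails until an operator removes the file.
    """
    if killswitch.exists():
        return False
    if spend_today_usd >= ceiling_usd:
        killswitch.touch()  # a breach trips the global kill switch
        return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ks = Path(tmp) / "alai-killswitch"  # hypothetical location
        print(dispatch_allowed(120.0, ks))   # spend under the ceiling
        print(dispatch_allowed(1000.0, ks))  # synthetic $1,000 breach
        print(dispatch_allowed(120.0, ks))   # kill switch is still set
```

Gating on a dollar total rather than an agent name is what would let a day like May 11 trip the guard regardless of which agent caused the spend.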

Knowledge plane (Team B).

Current: 5 critical governance subsystems (Reality Anchor, ZAKON NULA, Tier Router, Evidence Ledger, Hooks) have zero BookStack pages. discover.js cites a stale manifest. ZAKON #12 is dormant: every builder dispatch eats ~15K tokens of full MEMORY.md re-injection. LightRAG is degraded (15% timeout), its public endpoint is blocked by CF Access, and the pump is capped at 600/run against a backlog of 23,558. Mem0 is dead. ADR numbering has collisions (025×2, 026×4).

Target: One front door (discover.js memory --budget=2000) that spans L1+L2+L3 under a token-budget contract. CF Access token rotated → BookStack and the public LightRAG endpoint both unblocked. ZAKON #12 wired into PreToolUse → ~105K tokens/day saved. 8 governance pages published; ADR allocator + collision repair. Mem0 killed (Path B) and folded into the HiveMind facts table. Library built (Path A) as the central skill registry.

MCs: 17 (MC-B01 through MC-B17).
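One way to read the "token-budget contract" for the front door is: walk the memory tiers in priority order and keep snippets only while they fit in the budget. The sketch below illustrates that reading. The function names, the tier tuples, and the four-characters-per-token estimate are all assumptions for illustration; the source does not specify how discover.js counts tokens or ranks results.

```python
from typing import Iterable, List, Tuple


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def fill_budget(tiers: Iterable[Tuple[str, List[str]]],
                budget: int = 2000) -> List[Tuple[str, str]]:
    """Walk tiers in priority order and keep snippets while they fit the budget."""
    selected = []
    remaining = budget
    for tier_name, snippets in tiers:
        for snippet in snippets:
            cost = estimate_tokens(snippet)
            if cost > remaining:
                continue  # skip anything that would exceed the budget
            selected.append((tier_name, snippet))
            remaining -= cost
    return selected


if __name__ == "__main__":
    tiers = [
        ("L1", ["a" * 400]),             # ~100 tokens
        ("L2", ["b" * 4000, "c" * 800]), # ~1000 and ~200 tokens
        ("L3", ["d" * 8000]),            # ~2000 tokens, does not fit
    ]
    for tier, snippet in fill_budget(tiers, budget=2000):
        print(tier, estimate_tokens(snippet))
```

A fixed budget makes the cost of a memory lookup predictable, in contrast to re-injecting the full MEMORY.md on every dispatch.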

Workflow plane (Team C).

Current: The CEO email pipeline is broken at every transition. Email→MC linkage is dead (873/887 unlinked, 80 replay_required with no replay daemon). The discover.js routing CLI is fictional. claude-builder queue: 2,945 failed since April. PI-orch is alive, but route_eligibility=['post-build'] excludes every real MC. The TLDR daemon writes to a nonexistent dir. 2,400 zombie MCs. 65 agent files vs 30 mapping keys.

Target: email-intake-daemon classifies via local qwen3 ($0) → MC link 100%. router.js classify made real (the alias makes the CLAUDE.md claim honest). Mapping JSON closed (0 orphans). backlog-ttl-daemon enforces 30d/60d retirement. PI-orch route filter expanded to 5 categories → free-tier execution path revived. Session-spend-monitor closes the gap opus-cost-guard cannot (main session burn). Escalation matrix hook silences micro-decision pings to the CEO.

MCs: 14 (MC-C1-1 through MC-C5-1, including MC-C4-3).
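The 30d/60d retirement rule for backlog-ttl-daemon is spelled out under Decision 5 in section 6 and can be sketched as a pure decision function. The status and priority strings, the action names, and the definition of "idle days" are assumptions for illustration; cases the source does not mention (for example BLOCKER tasks) fall through to "keep".

```python
def ttl_action(status: str, priority: str, idle_days: int) -> str:
    """Map a task to a TTL action following the tiered rule in Decision 5.

    Returns one of "close", "pause", "digest", "keep" (labels are hypothetical).
    """
    if status == "paused" and idle_days > 60:
        return "close"    # paused + >60d -> auto-close
    if status == "open" and priority in ("M", "L") and idle_days > 30:
        return "pause"    # open + M/L + >30d -> auto-pause
    if status == "open" and priority == "H" and idle_days > 14:
        return "digest"   # H + open + >14d -> CEO digest entry
    return "keep"         # everything else is left untouched


if __name__ == "__main__":
    for task in [("open", "M", 45), ("paused", "L", 90), ("open", "H", 20), ("open", "H", 3)]:
        print(task, "->", ttl_action(*task))
```

Keeping the rule free of side effects makes it easy to dry-run against the backlog before the daemon is allowed to pause or close anything.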


3. Cross-Plane Couplings (the new picture Wave 1 didn't see)

These five couplings are why no single team can finish in isolation, and why sequencing matters.

3.1 ZAKON #12 wire-in = A + B + C all three

3.2 Cost guard is 3 layers, one per plane

3.3 discover.js is the single front door — three teams patch it

3.4 Email pipeline is ONE workflow with THREE breaks

The CEO daily flow has a single physical pipeline (Email → email-inbox.db → MC → router → mehanik → specialist → proveo → done → TLDR) with three independent breaks:

3.5 Gate-gaming (verdict-ledger 100% force_completion) is a consequence of A + B + C all failing

Cross-Team Contradictions (resolved)

Reviewed all three audit docs for conflicting claims; no hard contradictions found, only resolved revisions:


4. Master Roadmap (4 Weeks)

| Week | Theme | Teams | MCs to ship | End-state gate (deterministic probe) | Rollback |
|---|---|---|---|---|---|
| 1 | Stop the bleed | A | T-A-01 kill switch, T-A-02 $ ceiling, T-A-03 fleet reconcile, T-A-04 devstral, T-A-05 MLX, T-A-06 probe watchdog, T-A-07 evidence schema, T-A-08 hook-drift v2, T-A-09 daemon sweep | control-plane-health.sh returns 7/7 PASS: killswitch round-trip; cost-ceiling fires at synthetic $1000; tier-truth.json all 14 tiers healthy or explicitly disabled; probe-watchdog detects 48h synthetic stall; evidence-ledger.db has table + row-count == JSONL; hook-drift detects 24h synthetic silence; 0 flapping daemons | Disable killswitch + revert hook-drift v2 plist; T-A-02 ceiling can be raised to $10K/day as soft-rollback. Evidence schema is additive, so no rollback is needed. |
| 2 | Lights on | B (+ A finishing T-A-08 integration) | MC-B01 CF token, MC-B02 LightRAG pump, MC-B03 outbox-ingest decision, MC-B04 rag-context rewrite, MC-B05 ZAKON #12 wire, MC-B06 inventory regen, MC-B07 self-check, MC-B08 memory upgrade, MC-B09 HiveMind purge, MC-B10 dead-agent TTL | discover.js --self-check reports 0 drift on day 7; curl https://lightrag.alai.no/health returns 200; bookstack-staleness.js sample returns JSON; ZAKON #12 fires logged for ≥80% of builder dispatches; pre/post token count shows ≥40% reduction in builder prompts | MC-B05 hook is opt-in via env flag ZAKON12_ENABLED=1 for first 24h; if drift >5% on day 1, revert to off. MC-B09 stub removal: archive-first, restore is cp from _archive/. |
| 3 | Workflow restored | C | MC-C1-1 email→MC, MC-C1-2 router.js, MC-C1-3 mapping cleanup, MC-C1-4 TLDR, MC-C2-1 backlog TTL, MC-C2-2 session-spend, MC-C2-3 per-MC budget, MC-C3-1 HiveMind cleanup, MC-C3-2 skill registry, MC-C3-3 MCP cleanup, MC-C4-1 pi-orch routes, MC-C4-2 claude-builder archive, MC-C5-1 escalation hook | E2E test: CEO sends 1 test email → MC linked <5min → routed → mehanik authorized → specialist returned <60min → Proveo PASS to Slack #ceo-digest with screenshot → TLDR digest 6h later. 8/9 sub-criteria pass. | MC-C1-1 daemon can be disabled; backfill MC link via one-off script. MC-C2-2 session monitor is alert-only first 48h before model-flip is enabled. MC-C5-1 hook is WARN-only first 7 days. |
| 4 | Production resumes | All teams hardening + Bilko/Drop work | Production MCs from BUILD-BLUEPRINT.md per project; no new system-level MCs except hardening | git log --since=7.days --author=alai-builders ~/projects/bilko-cloud > 5 commits AND costs.db today < $5K AND verdict-ledger PROBE_PASS:force_completion ≥ 1:1 | If Week 4 cost burn returns to >$10K/day → freeze prod work, return to Week 3 hardening. Killswitch always available. |

Gate between weeks: each week's end-state probe must PASS before the next week's specialist dispatches are authorized. CEO sign-off on probe report = go.
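The week-to-week gate can be sketched as two small checks: the end-state probe must reach its pass threshold (7/7 for Week 1, 8/9 for Week 3), and the CEO must sign off on the probe report. The function names and the criterion labels below are hypothetical shorthand for the roadmap's end-state criteria; they are not output fields of control-plane-health.sh.

```python
from typing import Dict


def gate_passes(results: Dict[str, bool], required: int) -> bool:
    """True when at least `required` sub-criteria passed."""
    return sum(results.values()) >= required


def next_week_authorized(results: Dict[str, bool], required: int, ceo_signoff: bool) -> bool:
    """Dispatch for the next week needs both a passing probe and CEO sign-off."""
    return gate_passes(results, required) and ceo_signoff


if __name__ == "__main__":
    # Hypothetical labels for the seven Week 1 criteria.
    week1 = {
        "killswitch_round_trip": True,
        "cost_ceiling_fires": True,
        "tier_truth_resolved": True,
        "probe_watchdog_detects_stall": True,
        "evidence_ledger_rows_match_jsonl": True,
        "hook_drift_detects_silence": True,
        "no_flapping_daemons": False,
    }
    print(next_week_authorized(week1, required=7, ceo_signoff=True))  # 6/7 is not enough
```

Passing the threshold per week as a parameter keeps the Week 1 rule (all seven) and the Week 3 rule (eight of nine) in the same code path.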


5. MC Inventory (Consolidated 40 MCs)

| ID | Title | Team | Prio | Week | $ Save | Dep |
|---|---|---|---|---|---|---|
| T-A-01 | Kill switch + CLI | A | BLOCKER | 1 | insurance | |
| T-A-02 | opus-cost-guard v2 daily $ ceiling | A | BLOCKER | 1 | $20-70K/d | T-A-01 |
| T-A-03 | fleet-reconcile-probe + tier-truth | A | H | 1 | $2-8K/d | T-A-01 |
| T-A-04 | devstral pull or remap | A | H | 1 | $5-15K/d | T-A-03 |
| T-A-05 | MLX M2c+M3 repair | A | H | 1 | $1-5K/d | T-A-03 |
| T-A-06 | Reality Anchor watchdog | A | H | 1 | risk-redux | T-A-01 |
| T-A-07 | Evidence ledger SQLite schema | A | H | 1 | risk-redux | |
| T-A-08 | hook-drift-detector v2 | A | M | 1 | risk-redux | T-A-01, T-A-07 |
| T-A-09 | Daemon hygiene sweep | A | M | 1 | $0 direct | |
| MC-B01 | CF Access token rotate | B | H | 2 | unblock $15-42/mo | |
| MC-B02 | LightRAG pump 600→5000 | B | H | 2 | 40-80K tok/d | B01 |
| MC-B03 | outbox-ingest restore/decom (ADR-036) | B | M | 2 | qual | B01 |
| MC-B04 | rag-context-for-builder rewrite | B | H | 2 | 105K tok/d | B02, T-A-08 |
| MC-B05 | ZAKON #12 PreToolUse hook | B | H | 2 | activates B04 | B04, T-A hook fw |
| MC-B06 | Daily inventory regen cron | B | H | 2 | 5-30K tok/d | |
| MC-B07 | discover.js --self-check at boot | B | H | 2 | indirect | B06 |
| MC-B08 | discover.js memory 3-tier upgrade | B | M | 2 | qual | B02, B06 |
| MC-B09 | Purge 3 orphan HiveMind stubs | B | M | 2 | 10K tok/d | |
| MC-B10 | Dead-agent TTL ADR-035 | B | M | 2 | 6K tok/d | |
| MC-B11 | bookstack-staleness daemon revive | B | H | 3 | $0 direct | B01 |
| MC-B12 | Publish 8 governance pages | B | H | 3 | $0 direct | B01 |
| MC-B13 | ADR allocator + 6 collision repair | B | M | 3 | $0 | |
| MC-B14 | Mem0 ADR-033 (recommend KILL) | B | M | 3 | consolidation | |
| MC-B15 | Library ADR-034 (recommend BUILD) | B | M | 3 | qual | B06 |
| MC-B16 | specialist-mapping audit | B | M | 3 | $1-3/mo | B06 |
| MC-B17 | Hook .bak cruft cleanup | B | L | 3 | $0 | |
| MC-C1-1 | email-intake-daemon | C | BLOCKER | 3 | unblock | A T-A fleet |
| MC-C1-2 | router.js classify CLI | C | H | 3 | unblock | C1-3 |
| MC-C1-3 | specialist-mapping completion + ADR-027 | C | H | 3 | $1-3/mo | |
| MC-C1-4 | TLDR daemon reconnect | C | H | 3 | qual (closes loop) | C1-1 |
| MC-C2-1 | backlog-ttl-daemon | C | H | 3 | signal/noise | C1-4 |
| MC-C2-2 | Session spend monitor (Layer 2) | C | BLOCKER | 3 | $5-30K/d session cap | T-A-02 |
| MC-C2-3 | Per-MC budget (Layer 3) | C | H | 3 | $1-5K/d | C2-2 |
| MC-C3-1 | HiveMind ~85 zombie + 46 pollution cleanup | C | M | 3 | qual | |
| MC-C3-2 | Skill registry + retire wave | C | M | 3 | qual | |
| MC-C3-3 | MCP audit + decom stitch+local-rag (ADR-029) | C | M | 3 | startup time | |
| MC-C4-1 | pi-orch route_eligibility expansion | C | M | 3 | free-tier revival | T-A-04, T-A-05 |
| MC-C4-2 | claude-builder fossil archive (ADR-030) | C | M | 3 | $0 | |
| MC-C4-3 | edita owner audit + reassign | C | M | 3 | signal/noise | |
| MC-C5-1 | Escalation matrix hook | C | H | 3 | CEO-attention save | C1-4 |

Plus 5 Wave 1 P0 carryovers (now subsumed): P0-1 #101375 → T-A-02; P0-2 #101376 → T-A-04; P0-3 #101377 → T-A-06; P0-4 #101378 → MC-B07; P0-5 #101379 → T-A-05.

Total Wave 2 MCs: 40 distinct (including MC-C4-3) + 5 Wave 1 P0 consolidated.
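The Dep column implies a partial order on dispatch. As a hedged illustration, the Week 1 (Team A) dependencies from the inventory can be fed to Python's standard `graphlib.TopologicalSorter` to obtain one valid dispatch order. The dictionary below transcribes only the T-A rows; the helper name `dispatch_order` is hypothetical, and the result is just one of several valid orders, not the schedule in section 9.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each MC maps to the MCs it depends on (transcribed from the Dep column).
WEEK1_DEPS = {
    "T-A-01": [],
    "T-A-02": ["T-A-01"],
    "T-A-03": ["T-A-01"],
    "T-A-04": ["T-A-03"],
    "T-A-05": ["T-A-03"],
    "T-A-06": ["T-A-01"],
    "T-A-07": [],
    "T-A-08": ["T-A-01", "T-A-07"],
    "T-A-09": [],
}


def dispatch_order(deps):
    """Return one ordering in which every MC comes after its dependencies.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(dispatch_order(WEEK1_DEPS))
```

Running the same check over the full inventory would also surface any dependency that points at an MC scheduled for a later week.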


6. Risks & Open CEO Decisions

  1. Mem0 — resurrect (Path A) or kill+fold-into-HiveMind (Path B)? Recommendation: B. Reduces moving parts; Qdrant runtime removed; HiveMind facts table covers same use case. Mem0 has been dead 14+ days with no detected loss. Formalize via ADR-033 (MC-B14).

  2. Library system — build (Path A) or kill (Path B)? Recommendation: A — minimal build. ~/system/library.yaml is real intent, no consumer ever shipped. A 1-day install script gives one-place control over which skills are active where; the alternative is 96 skills with no source-of-truth. Formalize via ADR-034 (MC-B15).

  3. PI-orchestrator — expand route filter (Path A) or formal decommission (Path B)? Recommendation: A first, B as fallback. MC-C4-1 expands route_eligibility to 5 categories. Kill criterion (auto): if after T-A-04 + T-A-05 + MC-C4-1 ship, pi-orch still has 0 matching tasks in 7 days, formal kill via ADR-026 (one of the existing collision files — repaired in MC-B13).

  4. claude-builder durable-runner queue — drain + restart, or replace? Recommendation: drop the queue, do not restart. 2,945 failed / 1 completed since April = the architecture is fossilized. MC-C4-2 archives. Future "durable-runner v2" decision punts to Week 5+; not in current scope.

  5. 2,400 zombie MC tasks — auto-close at >14d idle? Recommendation: tiered TTL via MC-C2-1. Open + M/L + >30d → auto-pause. Paused + >60d → auto-close. H + open + >14d → CEO digest entry. Not blanket auto-close — preserves CEO-owned tasks (alem has 72 open).

  6. Production code resumption — Week 4 firm or conditional? Recommendation: conditional on Week 3 end-state E2E probe (8/9 sub-criteria PASS + 48h cost <$5K/day). If both gates green, resume Week 4. If either red, Week 4 = hardening cycle; production code Week 5.

  7. Daily $ ceiling level (T-A-02) — $500/day Opus default? Recommendation: yes, with ~/system/config/cost-ceilings.json knob. Pre-AI-Services-revenue, $500/day Opus = $15K/month. Override token TTL 60s for CEO-explicit cases. If CEO wants $300/day, change one JSON line.

  8. Session-spend ladder (MC-C2-2) — $200 alert / $500 model-flip / $1000 kill? Recommendation: alert-only first 48h, then enable model-flip + kill. Avoids same-day surprise on already-running session.

  9. Wave 2 build budget — what's the Opus ceiling for the build phase itself? Recommendation: $250 total for all 40 MCs. Each MC ≈ $1 prompt-forge + $2-5 specialist + $1 Sonnet sub + $1 Proveo + $0.50 Skillforge ≈ $5-8 avg. Build cost ≪ 1 hour of current burn. Use /prompt-forge only for H/BLOCKER (Week 1 + Week 3 BLOCKERs); skip for M/L.
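The session-spend ladder from Decision 8 ($200 alert / $500 model-flip / $1000 kill, alert-only for the first 48h) can be sketched as follows. The function name, the action labels, and the `hours_since_enable` parameter are assumptions for illustration, not the MC-C2-2 design.

```python
ALERT_USD = 200.0
MODEL_FLIP_USD = 500.0
KILL_USD = 1000.0
ALERT_ONLY_HOURS = 48.0  # grace period before model-flip and kill are enforced


def ladder_action(session_spend_usd: float, hours_since_enable: float) -> str:
    """Return the action for a session: "none", "alert", "model-flip" or "kill"."""
    if session_spend_usd < ALERT_USD:
        return "none"
    if hours_since_enable < ALERT_ONLY_HOURS:
        return "alert"  # during the first 48h every breach is only reported
    if session_spend_usd >= KILL_USD:
        return "kill"
    if session_spend_usd >= MODEL_FLIP_USD:
        return "model-flip"
    return "alert"


if __name__ == "__main__":
    for spend, hours in [(150, 72), (250, 72), (600, 12), (600, 72), (1200, 72)]:
        print(spend, hours, "->", ladder_action(spend, hours))
```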


7. Total Economics

| Source | Daily save (conservative) | Daily save (optimistic) | Monthly (conservative) |
|---|---|---|---|
| T-A-02 cost ceiling | $20,000 | $70,000 | $600,000 |
| T-A-03/T-A-04 ghost tier kill | $5,000 | $15,000 | $150,000 |
| T-A-05 MLX repair | $1,000 | $5,000 | $30,000 |
| MC-B04/B05 ZAKON #12 wire | $0.50 (token) | $1.40 (token) | $15-42 (token equiv) |
| MC-B06 inventory regen (re-dispatch prevent) | $0.30 | $1.80 | $9-54 |
| MC-C2-2 session spend ladder (caps catastrophic) | $5,000 | $30,000 | $150,000 |
| MC-C1-1 email→MC (operational efficiency) | $0 direct | $0 direct | unblocks revenue |
| MC-C2-1 backlog TTL (signal/noise) | $0 direct | $0 direct | CEO time |
| Total | ~$26,000/day | ~$90,000/day | $780K–$2.7M/month |

Wave 2 build phase cost (Opus + Sonnet): ~$250 one-time (see Decision 9).

Payback: at the conservative savings rate of $26K/day ($1,083/hour), the ~$250 build pays for itself in roughly 14 minutes of avoided burn, well under 1 hour.
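The totals and the payback figure can be re-derived with a few lines of arithmetic. The sketch below assumes a 30-day month and uses the ~$250 build cost from Decision 9. One observation from the numbers, not a statement in the source: the stated totals equal the sum of the three Team A rows alone, so the MC-C2-2 row ($5,000 to $30,000/day) appears to be treated as overlapping or counted separately.

```python
DAYS_PER_MONTH = 30      # assumption
BUILD_COST_USD = 250.0   # from Decision 9

# (conservative, optimistic) daily savings in USD for the rows that make up the total.
ROWS = {
    "T-A-02 cost ceiling": (20_000, 70_000),
    "T-A-03/T-A-04 ghost tier kill": (5_000, 15_000),
    "T-A-05 MLX repair": (1_000, 5_000),
}

conservative_daily = sum(low for low, _ in ROWS.values())
optimistic_daily = sum(high for _, high in ROWS.values())
monthly_low = conservative_daily * DAYS_PER_MONTH
monthly_high = optimistic_daily * DAYS_PER_MONTH

hourly = conservative_daily / 24
payback_minutes = BUILD_COST_USD / hourly * 60

if __name__ == "__main__":
    print(f"daily: ${conservative_daily:,} to ${optimistic_daily:,}")
    print(f"monthly: ${monthly_low:,} to ${monthly_high:,}")
    print(f"hourly (conservative): ${hourly:,.0f}")
    print(f"payback: {payback_minutes:.1f} minutes")
```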


8. Validation Plan

8.1 Proveo (Angie Jones) — re-probe ≥20% of synthesis claims

Focus areas (load-bearing claims):

Output: ~/tmp/proveo-v2-operating-picture-validation.jsonl.
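The "re-probe ≥20%" target can be met mechanically by sampling claims with a ceiling on the sample size. The sketch below uses uniform random sampling with a fixed seed, which is a simplification chosen for illustration: the plan above prioritizes load-bearing claims, and the claim identifiers and function name here are hypothetical.

```python
import random
from typing import List, Sequence


def sample_for_reprobe(claims: Sequence[str], percent: int = 20, seed: int = 0) -> List[str]:
    """Pick at least `percent` percent of the claims, reproducibly."""
    n = len(claims)
    k = (n * percent + 99) // 100  # integer ceiling, so the share never drops below target
    return random.Random(seed).sample(list(claims), k)


if __name__ == "__main__":
    claims = [f"C-{i:02d}" for i in range(1, 13)]  # 12 hypothetical claim ids
    picked = sample_for_reprobe(claims)
    print(len(picked), "of", len(claims), "claims selected")
```

A fixed seed keeps the selection auditable: a second reviewer can reproduce exactly which claims were drawn.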

8.2 Verifier — atomic-claim decomposition

Decompose into atomic claims:

Verdicts per claim: CONFIRMED / PARTIAL / HALLUCINATION. Cost <$0.50.

8.3 Publish

After dual validation PASS → BookStack page "System Architecture" book, page "ALAI AI System v2.0 — Operating Picture & Master Roadmap (CEO Rebuild Brief)". This becomes canonical; v1.1 (Wave 1) demoted to historical reference.


9. Build Phase Dispatch Order (Week 1 only)

Weeks 2–4 dispatch after Week 1 closes (gate from §4).

Day 1 (0–4h):  /prompt-forge T-A-01 → /mehanik → FlowForge dispatch (Kelsey)
                AC probe: killswitch round-trip + 17 PreToolUse hooks updated.

Day 1 (4–10h): /prompt-forge T-A-02 → /mehanik → FlowForge + Securion review dispatch
                AC probe: synthetic $1,000 cost row → next Opus dispatch BLOCKED + killswitch touched.

Day 2:         /prompt-forge T-A-03 → /mehanik → AgentForge + FlowForge dispatch (Georgi + Kelsey)
                AC probe: stop ANVIL Ollama → tier-truth marks 3 tiers unhealthy in 5min → restart recovers.

Day 3 (parallel A):  /mehanik T-A-04 → AgentForge (Georgi) — devstral pull/remap.
Day 3 (parallel B):  /mehanik T-A-05 → AgentForge (Georgi) — MLX M2c+M3 repair.
                Skip /prompt-forge for both (M-priority).

Day 4-5:       /prompt-forge T-A-06 → /mehanik → FlowForge + AgentForge dispatch
                AC probe: touch probe last.jsonl mtime=48h → watchdog STALL + restart in 5min.

Day 5-6:       /mehanik T-A-07 → CodeCraft (Bruce Momjian) dispatch (M-priority, no prompt-forge).
                AC probe: insert null-path row → mc.js done exits 2 "evidence_path required".

Day 6-7:       /mehanik T-A-08 → FlowForge + Securion dispatch.
                AC probe: kill pilot-discover-inject.py 24h → drift detector flags in 15min.

Day 7:         /mehanik T-A-09 → FlowForge dispatch (daemon sweep).
                Then run `control-plane-health.sh` master probe.
                7/7 PASS → CEO go-ahead for Week 2 Team B dispatch.
                <7 PASS → Week 1 extends by 1-2 days; do NOT proceed to Week 2.

After every dispatch: /task-postflight + verifier subagent in bg (per feedback_active_verifier_pattern_2026-05-14).

Each MC closes with mc.js done <id> only after Proveo PASS + Skillforge BookStack page (ZAKON PLAN).
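The T-A-06 acceptance probe (backdate a probe's last.jsonl and expect the watchdog to report a stall) relies on a staleness check over file modification times. A minimal sketch of such a check follows. The function name, the 24h threshold in the example, and the choice to treat a missing file as stalled are assumptions; the source only specifies the 48h synthetic stall used in the test.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Optional


def is_stalled(path: Path, max_age_hours: float, now: Optional[float] = None) -> bool:
    """True if the file is missing or was last modified more than max_age_hours ago."""
    now = time.time() if now is None else now
    try:
        mtime = path.stat().st_mtime
    except FileNotFoundError:
        return True  # assumption: no output at all counts as a stall
    return (now - mtime) > max_age_hours * 3600


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        probe_file = Path(tmp) / "last.jsonl"
        probe_file.write_text("{}\n")
        print(is_stalled(probe_file, max_age_hours=24))  # freshly written

        # Simulate the synthetic stall: backdate mtime by 48 hours.
        old = time.time() - 48 * 3600
        os.utime(probe_file, (old, old))
        print(is_stalled(probe_file, max_age_hours=24))
```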


END v2.0 OPERATING PICTURE.

Sources:


10. Validation Patches v2 (applied 2026-05-19 after Proveo + Verifier)

Sources: /tmp/srz-rebuild-2026-05-19/proveo-v2-verdict.json, /tmp/srz-rebuild-2026-05-19/verifier-v2-report.json

| Patch | Original | Corrected | Source |
|---|---|---|---|
| V2-P1 | "skill-registry.db has 1 row for 96 skills" | 96 rows, but only 12 with use_count>0; needs last_used column | verifier KP4 |
| V2-P2 | "Build cost: <$100" | ~$250 (40 MCs × $5–8 avg, consistent with §6 Decision 9 math) | verifier D4 |
| V2-P3 | "8 governance pages on BookStack" | 5 governance pages (Reality Anchor, Determinism, Tier Router, Evidence Ledger, Hooks) | verifier KP11 |
| V2-P4 | "Total Wave 2 MCs: 39 distinct" | 40 distinct (MC-C4-3 edita owner audit was missed in count) | verifier MC1 |
| V2-P5 | "65 agent files vs 30 mapping keys = 37 orphans" | 65 disk vs 52 mapping entries = 13 orphans | verifier WP8 |
| V2-P6 | "verdict-ledger 100% force_completion" | 79/107 rows (74%) force_completion; 28 standalone/done; PROBE_PASS=0 (gate-gaming concern stands) | verifier CP8 |
| V2-P7 | "claude-builder queue 2,945 failed / 1 completed" | TWO subsystems: queue-table has 2,944 rows (verifier WP3); durable-runner.db has 295/1/1 completed/failed/pending (Proveo C-04). MC-C4-2 NEEDS RE-PROBE before dispatch. | Proveo C-04 + verifier WP3 |
| V2-P8 | "TLDR daemon writes to ~/system/data/insights/ which does not exist" | Daemon writes to ~/system/logs/tldr-insights/ which EXISTS with files from 2026-04-24. MC-C1-4 scope needs re-audit. | Proveo C-11 |
| V2-P9 | "manifest-index.md last 2026-02-26" | mtime 2026-04-06 (Feb 26 is content audit date inside file); 43 days stale | verifier KP3 |
| V2-P10 | "HiveMind 21,741 rows" | 21,930 live (audit-snapshot drift) | verifier KP5 |
| V2-P11 | "True 7d = $365,104" | $366,236 (Proveo C-10, ±0.3% rounding) | Proveo C-10 |
| V2-P12 | "MC backlog blocked = 2,239" | 2,241 (Proveo C-02, +2 drift) | Proveo C-02 |
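Several of the corrected figures above are derived quantities and can be cross-checked with simple arithmetic. The snippet below is only a consistency check on the numbers quoted in the patches; measuring staleness against the document date (2026-05-19) is an assumption.

```python
from datetime import date

# V2-P5: agent files on disk minus mapping entries.
orphans = 65 - 52

# V2-P6: share of force_completion rows and the remainder.
force_share = 79 / 107
other_rows = 107 - 79

# V2-P9: days between the manifest mtime and the document date (assumed reference).
stale_days = (date(2026, 5, 19) - date(2026, 4, 6)).days

# V2-P11: relative difference between the original and corrected 7-day spend.
spend_drift = (366_236 - 365_104) / 365_104

# V2-P12: drift in the blocked-backlog count.
backlog_drift = 2_241 - 2_239

if __name__ == "__main__":
    print(f"orphans: {orphans}")
    print(f"force_completion share: {force_share:.0%}, other rows: {other_rows}")
    print(f"manifest staleness: {stale_days} days")
    print(f"7d spend drift: {spend_drift:.1%}")
    print(f"backlog drift: +{backlog_drift}")
```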

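Re-probe required (BLOCKERS for build dispatch): MC-C4-2 (claude-builder queue vs durable-runner.db counts, V2-P7) and MC-C1-4 (TLDR daemon output directory, V2-P8).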

Verdict on v2.0 after patches: Strategic narrative + 4-week roadmap + 9 CEO decisions HOLD. Six precision errors corrected in this section. v2.0 is publication-ready with footnoted re-probes on MC-C4-2 + MC-C1-4.


Revision #2
Created 2026-05-19 15:55:25 UTC by John
Updated 2026-06-21 20:03:50 UTC by John