# ALAI AI System — v2.0 Operating Picture & Master Roadmap

**Date:** 2026-05-19
**Architect:** Petter Graff
**Status:** SYNTHESIS COMPLETE. Dual validation (Proveo + Verifier) run 2026-05-19; patches in §10, re-probes pending on MC-C4-2 + MC-C1-4
**Supersedes:** `ceo-ai-system-audit-2026-05-18-REPORT.md` (v1.1 — Wave 1 still canonical for inventory; v2.0 adds design + build roadmap)

---

## 1. Executive Brief

The ALAI AI system is a *system that builds systems* — and it has stopped building. Over the last 8 days it burned **$742K on Anthropic Opus (99.98% of all spend)**, peaked at **$377,487 in a single day (2026-05-11)**, and shipped **zero production code in 7 days**. Wave 1 (2026-05-18) identified the symptoms; Wave 2 (three parallel teams: Control, Knowledge, Workflow) identified the single causal narrative:

> **The orchestrator steers by frozen instruments, dispatches through gates that don't fire, into a free-tier fleet that doesn't exist, validates with probes that never run, and ships into a backlog with no exit.** Every "save" is a watchdog that itself is dormant. The meta-failure — `hook-drift-detector` daemon exit 2, stopped — is what allows all other silent failures to hide.

The three planes fail compoundingly:

- **Control plane:** `opus-cost-guard` has no daily $ ceiling, defaults ALLOW when `model` field is absent, doesn't gate the main session — only sub-Tasks. The May 11 $377K spike *would not have been blocked*. 4 of 14 tier-routes are ghosts (devstral:24b absent, 2/3 MLX serve wrong model = bge-m3). Most hooks have zero audit logs today (verifier: 60 hooks on disk, majority dark). Evidence ledger SQLite has 0 tables; the JSONL has 107 verdict rows, 79/107 (74%) `force_completion` and 0 `PROBE_PASS` — gate-gaming theater (verifier-corrected).
- **Knowledge plane:** Mem0 (Pillar #3 winner per project_99124) is dead in runtime (port 9000=000, no LaunchAgent). `discover.js` cites `manifest-index.md` (mtime 2026-04-06, 43 days stale; embedded audit date 2026-02-26). `skill-registry.db` carries 96 skill rows but only 12 with non-zero `use_count` and no `last_used` column. BookStack API blocked (CF Access 302). LightRAG pump hard-capped at 600/run with 23,558 backlog that grows. ZAKON #12 RAG injection is referenced but unwired — every dispatch re-inhales ~15K-token MEMORY.md.
- **Workflow plane:** 873 of 887 emails (98.4%) unlinked to MC tasks. `discover.js routing` CLI cited in CLAUDE.md **does not exist**; routing is improvised by the LLM. `mehanik` + `dzevad-jahic` referenced but absent from `specialist-mapping.json`. claude-builder durable-runner: 2,945 failed / 1 completed since April (counts disputed, re-probe pending: §10 V2-P7). 2,400 zombie MC tasks >14d. TLDR daemon reported writing to `~/system/data/insights/`, which does not exist (contradicted by Proveo C-11: §10 V2-P8).

### If you read nothing else

- **A single $-ceiling hook (T-A-02) ships in 1 day and would have prevented the entire May 11 spike. Build it first.**
- **Sequence matters: the control plane turns on first (Week 1), then the knowledge plane is fixed (Week 2), then the workflow plane closes the loop (Week 3).**
- **9 CEO decisions are surfaced (§6). Six are go/no-go on existing components; three are scope-of-resumption.**
- **Combined save: $780K/month (conservative) to $2.7M/month (optimistic). Build cost: ~$250 (see §6 Decision 9). Payback: <1 hour at the conservative save rate.**

### One sentence per plane

- **Control:** Today blind & ungated → Week 1 kill-switch + $-ceiling + tier reconcile + Reality Anchor watchdog.
- **Knowledge:** Today stale & lying → Week 2 CF token + ZAKON #12 wire + manifest regen + 5 governance pages on BookStack.
- **Workflow:** Today disconnected end-to-end → Week 3 email→MC daemon + router.js + TLDR + backlog TTL + escalation matrix.
- **Production code:** Resumes Week 4 only after E2E test (CEO email → done in <90 min, no mid-loop prompts) passes 8/9.

---

## 2. The Three Planes (Target Architecture)

### 2.1 Mermaid Super-Diagram

```mermaid
flowchart TB
  subgraph CEO_SURFACE [CEO Surface]
    Prompt[CEO prompt / Slack]
    Email[CEO email IMAP]
  end

  subgraph CONTROL [Plane 1 — Control & Determinism]
    KS[Kill switch<br/>tmp alai-killswitch]:::new
    OCG[opus-cost-guard v2<br/>daily $ ceiling]:::fix
    KSW[fleet-reconcile-probe<br/>tier-truth.json]:::new
    RAW[probe-liveness-watchdog]:::new
    HDD[hook-drift-detector v2]:::new
    EL[(evidence-ledger.db<br/>SQLite schema'd)]:::fix
    SSM[session-spend-monitor<br/>per-session $ ladder]:::new
  end

  subgraph KNOWLEDGE [Plane 2 — Knowledge & Memory]
    DJ[discover.js<br/>3-tier front door]:::fix
    L1[L1 MEMORY.md + session]:::ok
    L2[L2 HiveMind 21,741 rows]:::ok
    L3a[L3a LightRAG Azure]:::fix
    L3b[L3b Mem0 facts<br/>KILL → fold to HiveMind]:::kill
    BS[(BookStack 478 pages<br/>canonical wiki)]:::fix
    Z12[ZAKON #12<br/>rag-context-for-builder]:::new
    INV[manifest-index + skill-registry<br/>daily regen]:::fix
  end

  subgraph WORKFLOW [Plane 3 — Orchestration & Workflow]
    EID[email-intake-daemon]:::new
    MC[(MC tasks db)]:::ok
    RTR[router.js classify<br/>discover.js routing alias]:::new
    MEH[mehanik gate]:::fix
    SUB[Specialist subagents]:::ok
    PIO[pi-orchestrator<br/>route_eligibility expanded]:::fix
    PRO[Proveo E2E validation]:::ok
    TLDR[TLDR daemon<br/>~/system/data/insights]:::new
    TTL[backlog-ttl-daemon]:::new
    ESC[escalation-matrix hook]:::new
  end

  Prompt --> Z12
  Email --> EID --> MC
  MC --> RTR --> MEH --> SUB
  SUB -.queries.-> DJ
  DJ --> L1 & L2 & L3a & L3b
  DJ -. cite .-> BS
  Z12 --> DJ
  SUB --> OCG
  OCG -. breach .-> KS
  SSM -. breach .-> KS
  KS -. blocks.-> SUB & MEH
  KSW -. health .-> SUB
  RAW -. probes .-> PRO
  PRO --> EL
  EL --> MC
  HDD -. watches .-> OCG & KSW & RAW & EID & TLDR
  PIO --> PRO
  SUB --> PIO
  MC --> TTL
  TTL --> TLDR --> Prompt
  ESC -. gates .-> Prompt
  INV -. truth .-> DJ

  classDef new fill:#1d8c43,color:#fff
  classDef fix fill:#d4a017,color:#000
  classDef kill fill:#b3261e,color:#fff
  classDef ok fill:#5b9bd5,color:#fff
```

Legend: green = new build, yellow = fix-in-place, red = formal kill, blue = working today.

### 2.2 Plane Summaries

**Control plane (Team A).**
*Current:* Probes designed but not running (0 PROBE_PASS events 7d). Hooks present (58) but only 5 with today's audit logs. `opus-cost-guard` blocks per-agent name match, not $-ceiling. May 11 ($377K) would not have triggered any gate. Evidence ledger SQLite empty (0 tables); JSONL = 79/107 rows (74%) `force_completion`, 0 `PROBE_PASS`. Tier router blind: 4/14 routes point at ghost models.
*Target:* Hard $-ceiling + global kill-switch + live fleet reconcile (5-min cycle) + Reality Anchor watchdog auto-restarting dormant probes + evidence-ledger schema with HMAC chain + per-hook audit-log convention enforced by hook-drift-detector v2.
*MCs:* 9 (T-A-01 through T-A-09).
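
The target above mentions an evidence-ledger schema with an HMAC chain. A minimal sketch of what such a chain could look like, assuming each row's MAC covers the previous row's MAC plus the row payload. The key, the field names (`mc_id`, `verdict`, `evidence_path`), and the function names are hypothetical and not taken from T-A-07.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # stand-in "previous MAC" for the first row


def row_mac(key, prev_mac, row):
    """HMAC-SHA256 over the previous MAC plus a canonical JSON encoding of the row."""
    payload = prev_mac.encode() + json.dumps(row, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_row(key, ledger, row):
    """Append a row whose MAC is chained to the MAC of the row before it."""
    prev = ledger[-1]["mac"] if ledger else GENESIS
    ledger.append({"row": row, "mac": row_mac(key, prev, row)})


def verify_chain(key, ledger):
    """Recompute every MAC in order; an edited or reordered row fails verification."""
    prev = GENESIS
    for entry in ledger:
        if not hmac.compare_digest(row_mac(key, prev, entry["row"]), entry["mac"]):
            return False
        prev = entry["mac"]
    return True


key = b"demo-key-only"  # hypothetical; a real key would live outside the ledger
ledger = []
append_row(key, ledger, {"mc_id": 1, "verdict": "PROBE_PASS", "evidence_path": "/tmp/a.json"})
append_row(key, ledger, {"mc_id": 2, "verdict": "force_completion", "evidence_path": None})
print(verify_chain(key, ledger))
```

Chaining means a row cannot be silently rewritten after the fact without recomputing every later MAC, which requires the key. Truncating rows from the tail is not detected by this sketch; a stored row count or head MAC would be needed for that.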

**Knowledge plane (Team B).**
*Current:* 5 critical governance subsystems (Reality Anchor, ZAKON NULA, Tier Router, Evidence Ledger, Hooks) have ZERO BookStack pages. `discover.js` cites stale manifest. ZAKON #12 dormant — every builder dispatch eats ~15K tokens of full MEMORY.md re-injection. LightRAG: degraded (15% timeout), public endpoint CF Access blocked, pump capped 600/run with 23,558 backlog. Mem0 dead. ADR numbering collisions (025×2, 026×4).
*Target:* One front door (`discover.js memory --budget=2000`) that spans L1+L2+L3 with token-budget contract. CF Access rotated → BookStack + LightRAG public both unblocked. ZAKON #12 wired into PreToolUse → ~105K tokens/day saved. 5 governance pages published (count per §10 V2-P3); ADR allocator + collision repair. Mem0 killed (Path B), folded into HiveMind facts table. Library built (Path A) as central skill registry.
*MCs:* 17 (MC-B01 through MC-B17).

**Workflow plane (Team C).**
*Current:* CEO email pipeline broken at every transition. Email→MC linkage dead (873/887 unlinked, 80 `replay_required` with no replay daemon). `discover.js routing` CLI is fictional. claude-builder queue: 2,945 failed since April (re-probe pending, §10 V2-P7). PI-orch alive but `route_eligibility=['post-build']` excludes every real MC. TLDR daemon reported writing to a nonexistent dir (contradicted by Proveo C-11, §10 V2-P8). 2,400 zombie MCs. 65 agent files on disk vs 52 mapping entries (13 orphans).
*Target:* `email-intake-daemon` classifies via local qwen3 ($0) → MC link 100%. `router.js classify` made real (alias makes CLAUDE.md claim honest). Mapping JSON closed (0 orphans). `backlog-ttl-daemon` enforces 30d/60d retirement. PI-orch route filter expanded to 5 categories → free-tier execution path revived. Session-spend-monitor closes the gap opus-cost-guard cannot (main session burn). Escalation matrix hook silences micro-decision pings to CEO.
*MCs:* 14 (MC-C1-1 through MC-C5-1).

---

## 3. Cross-Plane Couplings (the new picture Wave 1 didn't see)

These five couplings are why no single team can finish in isolation, and why sequencing matters.

### 3.1 ZAKON #12 wire-in = A + B + C all three

- **A owns** the PreToolUse hook plumbing (`~/.claude/settings.json` registration, audit log convention from T-A-08). Source: `team-a/control-plane-build-plan.md` T-A-08 + cross-team note line 182–184.
- **B owns** the retrieval logic — `rag-context-for-builder.js` rewrite with `--tier-budget L1:1200,L2:500,L3:300 --max-tokens 2000` (MC-B04). Source: `team-b/knowledge-plane-design.md` §3 + `team-b/knowledge-plane-build-plan.md` MC-B04/MC-B05.
- **C consumes** — every specialist dispatch through the new pipeline receives the 1,800-token block instead of MEMORY.md (workflow plane §3 sequence diagram). Source: `team-c/workflow-plane-design.md` §3.
- **Coupling rule:** B's MC-B05 cannot ship until A's hook framework lands; C's MC-C1-2 router classification reads the same `specialist-mapping.json` that B's MC-B16 patches. **Sequence: A finishes hook framework day 7 of Week 1 → B ships MC-B04/B05 Week 2 → C dispatches through both Week 3.**
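
The `--tier-budget L1:1200,L2:500,L3:300 --max-tokens 2000` contract in B's bullet can be illustrated with a small sketch. It assumes whitespace-split words as a stand-in for a real tokenizer and assumes snippets arrive already ranked within each tier; `parse_tier_budget`, `count_tokens`, and `build_context` are hypothetical names, not the actual `rag-context-for-builder.js` internals.

```python
def parse_tier_budget(spec):
    """Parse 'L1:1200,L2:500,L3:300' into {'L1': 1200, 'L2': 500, 'L3': 300}."""
    budget = {}
    for part in spec.split(","):
        tier, tokens = part.split(":")
        budget[tier.strip()] = int(tokens)
    return budget


def count_tokens(text):
    # Assumption: whitespace-split words approximate tokens.
    return len(text.split())


def build_context(snippets, tier_budget, max_tokens):
    """Take snippets tier by tier without exceeding the per-tier or overall cap."""
    chosen, total = [], 0
    for tier, cap in tier_budget.items():
        used = 0
        for text in snippets.get(tier, []):
            n = count_tokens(text)
            if used + n > cap or total + n > max_tokens:
                continue  # skip a snippet that would overflow; a smaller one may still fit
            chosen.append((tier, text))
            used += n
            total += n
    return chosen


budget = parse_tier_budget("L1:1200,L2:500,L3:300")
print(budget, sum(budget.values()))
```

The per-tier caps here sum to exactly the overall ceiling, so the `max_tokens` check only matters if someone later raises one tier without raising the total.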

### 3.2 Cost guard is 3 layers, one per plane

- **A — gate:** `opus-cost-guard v2` PreToolUse[Task] hard-block on daily $ ceiling + flip ALLOW-on-missing-model default to BLOCK. Source: `team-a/control-plane-design.md` COMP-1 + `team-a/control-plane-audit.md` §3 "CRITICAL GAP 1–4".
- **B — token-budget:** `rag-context-for-builder` `--max-tokens` ceiling per dispatch (105K tokens/day saved). Source: `team-b/knowledge-plane-design.md` §3 "Token-save math".
- **C — session ceiling:** `session-spend-monitor.js` polls `costs.db` by `session_id` every 5 min, Slack at $200 / model-flip at $500 / kill at $1,000. This **closes the gap A cannot reach** because `opus-cost-guard` fires on Task subagent dispatch but not on the main session. Source: `team-c/workflow-plane-audit.md` §9 + `team-c/workflow-plane-design.md` §2.5 + `team-c/workflow-plane-build-plan.md` MC-C2-2.
- **Coupling rule:** All three must land. A alone leaves the main session burning; B alone leaves the gate-bypass open; C alone has no per-dispatch ceiling.
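
To make layers A and C concrete, here is a minimal sketch of the two decisions they make. It assumes the $500/day default recommended in §6 Decision 7 and the $200 / $500 / $1,000 ladder from C's bullet; the function names, the return labels, and the substring match on `"opus"` are illustrative assumptions, not the actual hook code.

```python
def gate_dispatch(model, spent_today_usd, daily_ceiling_usd=500.0):
    """Layer A: block when the model field is missing or the daily ceiling is reached."""
    if not model:
        return "BLOCK"  # v2 behaviour described above: a missing model no longer defaults to ALLOW
    if "opus" in model.lower() and spent_today_usd >= daily_ceiling_usd:
        return "BLOCK"
    return "ALLOW"


def session_ladder(session_spend_usd):
    """Layer C: escalate on per-session spend ($200 alert, $500 model flip, $1,000 kill)."""
    if session_spend_usd >= 1000:
        return "KILL"
    if session_spend_usd >= 500:
        return "MODEL_FLIP"
    if session_spend_usd >= 200:
        return "SLACK_ALERT"
    return "OK"


print(gate_dispatch(None, 0.0))
print(gate_dispatch("claude-opus", 377487.0))
print(session_ladder(650.0))
```

The two functions are deliberately independent: the gate sees only sub-Task dispatches, while the ladder polls spend by session, which is the main-session gap the coupling rule describes.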

### 3.3 `discover.js` is the single front door — three teams patch it

- **A doesn't touch** `discover.js` directly but its T-A-03 `tier-truth.json` becomes a tier health source for B's L3 latency budgeting.
- **B regenerates** `manifest-index.md` + `skill-registry.db` daily (MC-B06), adds `--self-check` meta-probe at boot (MC-B07), upgrades `discover.js memory` to span 3 tiers (MC-B08). Source: `team-b/knowledge-plane-design.md` §7.
- **C makes** `discover.js routing` claim true via `router.js classify` alias (MC-C1-2). Source: `team-c/workflow-plane-audit.md` Break #2 + `team-c/workflow-plane-design.md` §2.2.
- **Coupling rule:** John currently does tool-first verification through a discover.js that lies; until both patch sets land (B's inventory regen, C's routing alias), every "tool-verified" claim downstream inherits residual rot.
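
The staleness that MC-B06 and MC-B07 are meant to catch reduces to a date comparison. A minimal sketch, assuming a daily regen implies a one-day freshness window; `days_stale`, `is_drifted`, and the `max_age_days` default are hypothetical, not the `--self-check` implementation.

```python
from datetime import date


def days_stale(mtime, today):
    """Whole days between an inventory file's last modification and today."""
    return (today - mtime).days


def is_drifted(mtime, today, max_age_days=1):
    """With a daily regen in place, anything older than max_age_days counts as drift."""
    return days_stale(mtime, today) > max_age_days


# manifest-index.md: mtime 2026-04-06, checked on the report date 2026-05-19
print(days_stale(date(2026, 4, 6), date(2026, 5, 19)))  # 43
print(is_drifted(date(2026, 4, 6), date(2026, 5, 19)))  # True
```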

### 3.4 Email pipeline is ONE workflow with THREE breaks

The CEO daily flow has a single physical pipeline (`Email → email-inbox.db → MC → router → mehanik → specialist → proveo → done → TLDR`) with three independent breaks:
- (B→E) Email-to-MC linkage broken (873/887 unlinked) — `team-c/workflow-plane-audit.md` Break #1.
- (F) `discover.js routing` CLI fictional — Break #2.
- (J) TLDR daemon reported writing to nonexistent `~/system/data/insights/` (Break #4; writer path under re-audit, §10 V2-P8).
- **Coupling rule:** Fixing only one keeps the pipe dark. **MC-C1-1 + MC-C1-2 + MC-C1-4 must ship as a triple** in Week 3 days 1–3. Without all three, CEO email "Pls fix Bilko 500" never reaches a specialist.

### 3.5 Gate-gaming (verdict-ledger 74% `force_completion`, 0 `PROBE_PASS`) is a consequence of A + B + C all failing

- **A** — probes off → no PROBE_PASS rows → only path to "done" is `--force`. Source: `team-a/control-plane-audit.md` §5 "107 rows, all `force_completion`".
- **B** — discover.js lies → builder doesn't know correct evidence path → fabricates artifact (Proveo hallucination 2026-05-07). Source: `MEMORY.md` `feedback_proveo_hallucination_2026-05-07.md`.
- **C** — claude-builder queue dead → fallback to inline subagent → no durable record → trivial to fake claim. Source: `team-c/workflow-plane-audit.md` Break #5.
- **Coupling rule:** "Stop gate-gaming" is **not a single-MC fix**. The fix is sequential: T-A-06 Reality Anchor watchdog → T-A-07 evidence ledger schema + null-path block at mc.js done → MC-B04 ZAKON #12 wire (so builders get correct context) → MC-C1-1 email→MC (so MCs land with real source) → MC-C4-2 claude-builder fossil archive. After this chain, the `verdict-ledger` `PROBE_PASS:force_completion` ratio should shift from 0:79 toward 50:50 within 7 days (T-A-06 AC).
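
The ratio in the coupling rule can be tracked with a few lines over the verdict JSONL. A minimal sketch, assuming each line is a JSON object with a `verdict` field (the field name is an assumption); the sample data reproduces the corrected §10 counts of 79 `force_completion`, 28 other rows, and 0 `PROBE_PASS`.

```python
import json
from collections import Counter


def verdict_counts(jsonl_lines):
    """Count verdict values across JSONL rows, ignoring blank lines."""
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if line:
            counts[json.loads(line).get("verdict", "UNKNOWN")] += 1
    return counts


def probe_pass_share(counts):
    """PROBE_PASS as a share of PROBE_PASS + force_completion (0.5 is the 50:50 target)."""
    probe = counts.get("PROBE_PASS", 0)
    forced = counts.get("force_completion", 0)
    return probe / (probe + forced) if (probe + forced) else 0.0


sample = ['{"verdict": "force_completion"}'] * 79 + ['{"verdict": "done"}'] * 28
counts = verdict_counts(sample)
print(counts["force_completion"], sum(counts.values()), probe_pass_share(counts))
```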

### Cross-Team Contradictions (resolved)

Reviewed all three audit docs for conflicting claims; **no hard contradictions found**, only resolved revisions:

- **Team C corrects Wave 1 on PI-orch.** Wave 1 said "pi-orch HTTP dead 50d"; Team C probed `launchctl list` and found PID 57544 alive, polling, but `route_eligibility=['post-build']` matches zero real MCs. **Verdict:** PI-orch is alive but useless; the underlying claim ("free-tier execution path is broken") holds. Memory note `project_ai_factory_audit_2026-05-09` should be updated.
- **Team C corrects Wave 1 on skill-registry.** Wave 1 said 1 row; Team C found 96 rows (registry was rebuilt at some point) but only 12 have non-zero `use_count` and there's no `last_used` timestamp — so the substantive claim ("skill catalog isn't measured") holds.
- **Team C corrects Wave 1 on edita queue.** Wave 1 cited 161 dead-letter; Team C found 22 in `dead_letter_queue` but 2,945 in `queue_entries` failed against `claude-builder`. The number moved tables; the magnitude is **larger**, not smaller.

---

## 4. Master Roadmap (4 Weeks)

| Week | Theme | Teams | MCs to ship | End-state gate (deterministic probe) | Rollback |
|------|-------|-------|-------------|--------------------------------------|----------|
| **1** | Stop the bleed | A | T-A-01 kill switch, T-A-02 $ ceiling, T-A-03 fleet reconcile, T-A-04 devstral, T-A-05 MLX, T-A-06 probe watchdog, T-A-07 evidence schema, T-A-08 hook-drift v2, T-A-09 daemon sweep | `control-plane-health.sh` returns 7/7 PASS: killswitch round-trip; cost-ceiling fires at synthetic $1000; tier-truth.json all 14 tiers healthy or explicitly disabled; probe-watchdog detects 48h synthetic stall; evidence-ledger.db has table + row-count == JSONL; hook-drift detects 24h synthetic silence; 0 flapping daemons | Disable killswitch + revert hook-drift v2 plist; T-A-02 ceiling can be raised to $10K/day as soft-rollback. Evidence schema is additive — no rollback needed. |
| **2** | Lights on | B (+ A finishing T-A-08 integration) | MC-B01 CF token, MC-B02 LightRAG pump, MC-B03 outbox-ingest decision, MC-B04 rag-context rewrite, MC-B05 ZAKON #12 wire, MC-B06 inventory regen, MC-B07 self-check, MC-B08 memory upgrade, MC-B09 HiveMind purge, MC-B10 dead-agent TTL | `discover.js --self-check` reports 0 drift on day 7; `curl https://lightrag.alai.no/health` returns 200; `bookstack-staleness.js sample` returns JSON; ZAKON #12 fires logged for ≥80% of builder dispatches; pre/post token count shows ≥40% reduction in builder prompts | MC-B05 hook is opt-in via env flag `ZAKON12_ENABLED=1` for first 24h; if drift >5% on day 1, revert to off. MC-B09 stub removal: archive-first, restore is `cp` from `_archive/`. |
| **3** | Workflow restored | C (+ B finishing MC-B11–B17) | MC-C1-1 email→MC, MC-C1-2 router.js, MC-C1-3 mapping cleanup, MC-C1-4 TLDR, MC-C2-1 backlog TTL, MC-C2-2 session-spend, MC-C2-3 per-MC budget, MC-C3-1 HiveMind cleanup, MC-C3-2 skill registry, MC-C3-3 MCP cleanup, MC-C4-1 pi-orch routes, MC-C4-2 claude-builder archive, MC-C4-3 edita owner audit, MC-C5-1 escalation hook | **E2E test:** CEO sends 1 test email → MC linked <5min → routed → mehanik authorized → specialist returned <60min → Proveo PASS to Slack #ceo-digest with screenshot → TLDR digest 6h later. 8/9 sub-criteria pass. | MC-C1-1 daemon can be disabled; backfill MC link via one-off script. MC-C2-2 session monitor is alert-only first 48h before model-flip is enabled. MC-C5-1 hook is WARN-only first 7 days. |
| **4** | Production resumes | All teams hardening + Bilko/Drop work | Production MCs from BUILD-BLUEPRINT.md per project; no new system-level MCs except hardening | `git log --since=7.days --author=alai-builders ~/projects/bilko-cloud` > 5 commits AND `costs.db today < $5K` AND `verdict-ledger PROBE_PASS:force_completion ≥ 1:1` | If Week 4 cost burn returns to >$10K/day → freeze prod work, return to Week 3 hardening. Killswitch always available. |

**Gate between weeks:** each week's end-state probe must PASS before the next week's specialist dispatches are authorized. CEO sign-off on probe report = go.
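
The gate rule is a simple threshold over named probe results. A minimal sketch, assuming each probe reports a boolean; the dictionary keys paraphrase the seven Week 1 checks from the table and `week_gate` is a hypothetical name, not the `control-plane-health.sh` interface.

```python
def week_gate(results, min_pass):
    """Return True only when at least min_pass probes report PASS."""
    passed = sum(1 for ok in results.values() if ok)
    return passed >= min_pass


# Week 1: the seven checks named in the roadmap row, all required (7/7).
week1 = {
    "killswitch_round_trip": True,
    "cost_ceiling_fires": True,
    "tier_truth_healthy": True,
    "probe_watchdog_detects_stall": True,
    "evidence_ledger_row_count": True,
    "hook_drift_detects_silence": True,
    "no_flapping_daemons": False,  # one failure is enough to hold Week 2
}
print(week_gate(week1, min_pass=7))  # False
```

Week 3 would use the same function with nine sub-criteria and `min_pass=8`.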

---

## 5. MC Inventory (Consolidated 40 MCs)

| ID | Title | Team | Prio | Week | $ Save | Dep |
|----|-------|------|------|------|--------|-----|
| T-A-01 | Kill switch + CLI | A | BLOCKER | 1 | insurance | — |
| T-A-02 | opus-cost-guard v2 daily $ ceiling | A | BLOCKER | 1 | $20-70K/d | T-A-01 |
| T-A-03 | fleet-reconcile-probe + tier-truth | A | H | 1 | $2-8K/d | T-A-01 |
| T-A-04 | devstral pull or remap | A | H | 1 | $5-15K/d | T-A-03 |
| T-A-05 | MLX M2c+M3 repair | A | H | 1 | $1-5K/d | T-A-03 |
| T-A-06 | Reality Anchor watchdog | A | H | 1 | risk-redux | T-A-01 |
| T-A-07 | Evidence ledger SQLite schema | A | H | 1 | risk-redux | — |
| T-A-08 | hook-drift-detector v2 | A | M | 1 | risk-redux | T-A-01, T-A-07 |
| T-A-09 | Daemon hygiene sweep | A | M | 1 | $0 direct | — |
| MC-B01 | CF Access token rotate | B | H | 2 | unblock $15-42/mo | — |
| MC-B02 | LightRAG pump 600→5000 | B | H | 2 | 40-80K tok/d | B01 |
| MC-B03 | outbox-ingest restore/decom (ADR-036) | B | M | 2 | qual | B01 |
| MC-B04 | rag-context-for-builder rewrite | B | H | 2 | 105K tok/d | B02, T-A-08 |
| MC-B05 | ZAKON #12 PreToolUse hook | B | H | 2 | activates B04 | B04, T-A hook fw |
| MC-B06 | Daily inventory regen cron | B | H | 2 | 5-30K tok/d | — |
| MC-B07 | discover.js --self-check at boot | B | H | 2 | indirect | B06 |
| MC-B08 | discover.js memory 3-tier upgrade | B | M | 2 | qual | B02, B06 |
| MC-B09 | Purge 3 orphan HiveMind stubs | B | M | 2 | 10K tok/d | — |
| MC-B10 | Dead-agent TTL ADR-035 | B | M | 2 | 6K tok/d | — |
| MC-B11 | bookstack-staleness daemon revive | B | H | 3 | $0 direct | B01 |
| MC-B12 | Publish 5 governance pages (count per §10 V2-P3) | B | H | 3 | $0 direct | B01 |
| MC-B13 | ADR allocator + 6 collision repair | B | M | 3 | $0 | — |
| MC-B14 | Mem0 ADR-033 (recommend KILL) | B | M | 3 | consolidation | — |
| MC-B15 | Library ADR-034 (recommend BUILD) | B | M | 3 | qual | B06 |
| MC-B16 | specialist-mapping audit | B | M | 3 | $1-3/mo | B06 |
| MC-B17 | Hook .bak cruft cleanup | B | L | 3 | $0 | — |
| MC-C1-1 | email-intake-daemon | C | BLOCKER | 3 | unblock A | T-A fleet |
| MC-C1-2 | router.js classify CLI | C | H | 3 | unblock | C1-3 |
| MC-C1-3 | specialist-mapping completion + ADR-027 | C | H | 3 | $1-3/mo | — |
| MC-C1-4 | TLDR daemon reconnect | C | H | 3 | qual (closes loop) | C1-1 |
| MC-C2-1 | backlog-ttl-daemon | C | H | 3 | signal/noise | C1-4 |
| MC-C2-2 | Session spend monitor (Layer 2) | C | BLOCKER | 3 | $5-30K/d session cap | T-A-02 |
| MC-C2-3 | Per-MC budget (Layer 3) | C | H | 3 | $1-5K/d | C2-2 |
| MC-C3-1 | HiveMind ~85 zombie + 46 pollution cleanup | C | M | 3 | qual | — |
| MC-C3-2 | Skill registry + retire wave | C | M | 3 | qual | — |
| MC-C3-3 | MCP audit + decom stitch+local-rag (ADR-029) | C | M | 3 | startup time | — |
| MC-C4-1 | pi-orch route_eligibility expansion | C | M | 3 | free-tier revival | T-A-04, T-A-05 |
| MC-C4-2 | claude-builder fossil archive (ADR-030) | C | M | 3 | $0 | — |
| MC-C4-3 | edita owner audit + reassign | C | M | 3 | signal/noise | — |
| MC-C5-1 | Escalation matrix hook | C | H | 3 | CEO-attention save | C1-4 |

Plus 5 Wave 1 P0 carryovers (now subsumed): P0-1 #101375 → T-A-02; P0-2 #101376 → T-A-04; P0-3 #101377 → T-A-06; P0-4 #101378 → MC-B07; P0-5 #101379 → T-A-05.

**Total Wave 2 MCs:** 40 distinct (including MC-C4-3) + 5 Wave 1 P0 consolidated.

---

## 6. Risks & Open CEO Decisions

1. **Mem0 — resurrect (Path A) or kill+fold-into-HiveMind (Path B)?**
   *Recommendation: B.* Reduces moving parts; Qdrant runtime removed; HiveMind `facts` table covers same use case. Mem0 has been dead 14+ days with no detected loss. Formalize via ADR-033 (MC-B14).

2. **Library system — build (Path A) or kill (Path B)?**
   *Recommendation: A — minimal build.* `~/system/library.yaml` is real intent, no consumer ever shipped. A 1-day install script gives one-place control over which skills are active where; the alternative is 96 skills with no source-of-truth. Formalize via ADR-034 (MC-B15).

3. **PI-orchestrator — expand route filter (Path A) or formal decommission (Path B)?**
   *Recommendation: A first, B as fallback.* MC-C4-1 expands `route_eligibility` to 5 categories. **Kill criterion (auto):** if after T-A-04 + T-A-05 + MC-C4-1 ship, pi-orch still has 0 matching tasks in 7 days, formal kill via ADR-026 (one of the existing collision files — repaired in MC-B13).

4. **claude-builder durable-runner queue — drain + restart, or replace?**
   *Recommendation: drop the queue, do not restart.* 2,945 failed / 1 completed since April = the architecture is fossilized. MC-C4-2 archives. Future "durable-runner v2" decision punts to Week 5+; not in current scope.

5. **2,400 zombie MC tasks — auto-close at >14d idle?**
   *Recommendation: tiered TTL via MC-C2-1.* Open + M/L + >30d → auto-pause. Paused + >60d → auto-close. H + open + >14d → CEO digest entry. **Not** blanket auto-close — preserves CEO-owned tasks (alem has 72 open).

6. **Production code resumption — Week 4 firm or conditional?**
   *Recommendation: conditional on Week 3 end-state E2E probe (8/9 sub-criteria PASS + 48h cost <$5K/day).* If both gates green, resume Week 4. If either red, Week 4 = hardening cycle; production code Week 5.

7. **Daily $ ceiling level (T-A-02) — $500/day Opus default?**
   *Recommendation: yes, with `~/system/config/cost-ceilings.json` knob.* Pre-AI-Services-revenue, $500/day Opus = $15K/month. Override token TTL 60s for CEO-explicit cases. If CEO wants $300/day, change one JSON line.

8. **Session-spend ladder (MC-C2-2) — $200 alert / $500 model-flip / $1000 kill?**
   *Recommendation: alert-only first 48h, then enable model-flip + kill.* Avoids same-day surprise on already-running session.

9. **Wave 2 build budget — what's the Opus ceiling for the build phase itself?**
   *Recommendation: $250 total for all 40 MCs.* Each MC ≈ $1 prompt-forge + $2-5 specialist + $1 Sonnet sub + $1 Proveo + $0.50 Skillforge ≈ $5-8 avg. Build cost ≪ 1 hour of current burn. Use `/prompt-forge` only for H/BLOCKER (Week 1 + Week 3 BLOCKERs); skip for M/L.
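
The tiered TTL in Decision 5 can be written as a small decision function. A minimal sketch, assuming tasks carry a status, a priority (H/M/L), and an idle-day count; the labels and `ttl_action` itself are hypothetical, and the CEO-ownership exemption is not modeled here.

```python
def ttl_action(status, priority, idle_days):
    """Map one MC task to the tiered-TTL action recommended for MC-C2-1."""
    if status == "open" and priority == "H" and idle_days > 14:
        return "CEO_DIGEST"   # surfaced to the CEO, never auto-closed
    if status == "open" and priority in ("M", "L") and idle_days > 30:
        return "AUTO_PAUSE"
    if status == "paused" and idle_days > 60:
        return "AUTO_CLOSE"
    return "KEEP"


print(ttl_action("open", "H", 20))
print(ttl_action("open", "M", 45))
print(ttl_action("paused", "L", 61))
```

Because an open task can only be paused, never closed directly, every auto-close passes through a paused state first, which is what makes this different from a blanket >14d auto-close.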

---

## 7. Total Economics

| Source | Daily save (conservative) | Daily save (optimistic) | Monthly (conservative) |
|--------|---------------------------|-------------------------|------------------------|
| T-A-02 cost ceiling | $20,000 | $70,000 | $600,000 |
| T-A-03/T-A-04 ghost tier kill | $5,000 | $15,000 | $150,000 |
| T-A-05 MLX repair | $1,000 | $5,000 | $30,000 |
| MC-B04/B05 ZAKON #12 wire | $0.50 (token) | $1.40 (token) | $15-42 (token equiv) |
| MC-B06 inventory regen (re-dispatch prevent) | $0.30 | $1.80 | $9-54 |
| MC-C2-2 session spend ladder (caps catastrophic) | $5,000 | $30,000 | $150,000 |
| MC-C1-1 email→MC (operational efficiency) | $0 direct | $0 direct | unblocks revenue |
| MC-C2-1 backlog TTL (signal/noise) | $0 direct | $0 direct | CEO time |
| **Total (excluding MC-C2-2)** | **~$26,000/day** | **~$90,000/day** | **$780K (conservative) to $2.7M (optimistic)** |

**Wave 2 build phase cost (Opus + Sonnet):** ~$250 one-time (see Decision 9).

**Payback:** **<1 hour.** At the conservative save rate of $26K/day ($1,083/hour), the ~$250 build cost is recovered in roughly 14 minutes.
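
The payback and per-MC figures can be checked with a few lines of arithmetic. This sketch uses only numbers stated in this document ($26K/day conservative save, ~$250 build cost, 40 MCs, and the per-MC components from Decision 9); the variable and function names are illustrative.

```python
def payback_minutes(build_cost_usd, daily_save_usd):
    """Minutes of savings needed to recover a one-time build cost."""
    hourly_save = daily_save_usd / 24
    return build_cost_usd / hourly_save * 60


# Per-MC components from Decision 9: prompt-forge + specialist + Sonnet sub + Proveo + Skillforge
per_mc_low = 1 + 2 + 1 + 1 + 0.50
per_mc_high = 1 + 5 + 1 + 1 + 0.50

print(round(26000 / 24))                      # hourly save in dollars
print(round(payback_minutes(250, 26000), 1))  # minutes to recover $250
print(40 * per_mc_low, 40 * per_mc_high)      # build-cost range for 40 MCs
```

The 40-MC range of $220 to $340 brackets the $250 budget, so the budget assumes most MCs land near the low end of the specialist cost.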

---

## 8. Validation Plan

### 8.1 Proveo (Angie Jones) — re-probe ≥20% of synthesis claims
Focus areas (load-bearing claims):
- Cross-plane coupling 3.1: ZAKON #12 token-save math (10 dispatches × 10,500 tok). Verify `wc -l` on actual MEMORY.md + measured builder prompt sizes.
- Coupling 3.2: that `opus-cost-guard` does NOT gate main session — re-run probe `~/.cache/opus-cost-guard-*.log` for last 48h on current Opus session.
- Coupling 3.4: re-run `sqlite3 email-inbox.db "SELECT COUNT(*) FROM emails WHERE status='new' AND mc_task_id IS NULL"` — assert ≥870.
- Coupling 3.5: verdict-ledger `force_completion` count — assert ≥100, PROBE_PASS = 0.
- Master roadmap Week 1 gate: probe `~/system/tools/control-plane-health.sh` (does not exist yet — flag if T-A-09 doesn't ship one).
- Decision 4 evidence: re-probe `claude-builder` queue counts — assert ≥2,900 failed and ≤2 completed.

Output: `~/tmp/proveo-v2-operating-picture-validation.jsonl`.

### 8.2 Verifier — atomic-claim decomposition
Decompose into atomic claims:
- All headline facts in §1 Executive Brief.
- Each row of MC inventory table — task ID, team, priority, week, dep correctness.
- Each "$ save" figure — does it come from a team build plan, and does the math add up?
- Each "Path X recommended" — is there a cited reason in the corresponding team design?

Verdicts per claim: CONFIRMED / PARTIAL / HALLUCINATION. Cost <$0.50.

### 8.3 Publish
After dual validation PASS → BookStack page **"System Architecture" book, page "ALAI AI System v2.0 — Operating Picture & Master Roadmap (CEO Rebuild Brief)"**. This becomes canonical; v1.1 (Wave 1) demoted to historical reference.

---

## 9. Build Phase Dispatch Order (Week 1 only)

Weeks 2–4 dispatch after Week 1 closes (gate from §4).

```
Day 1 (0–4h):  /prompt-forge T-A-01 → /mehanik → FlowForge dispatch (Kelsey)
                AC probe: killswitch round-trip + 17 PreToolUse hooks updated.

Day 1 (4–10h): /prompt-forge T-A-02 → /mehanik → FlowForge + Securion review dispatch
                AC probe: synthetic $1,000 cost row → next Opus dispatch BLOCKED + killswitch touched.

Day 2:         /prompt-forge T-A-03 → /mehanik → AgentForge + FlowForge dispatch (Georgi + Kelsey)
                AC probe: stop ANVIL Ollama → tier-truth marks 3 tiers unhealthy in 5min → restart recovers.

Day 3 (parallel A):  /mehanik T-A-04 → AgentForge (Georgi) — devstral pull/remap.
Day 3 (parallel B):  /mehanik T-A-05 → AgentForge (Georgi) — MLX M2c+M3 repair.
                Skip /prompt-forge for both (M-priority).

Day 4-5:       /prompt-forge T-A-06 → /mehanik → FlowForge + AgentForge dispatch
                AC probe: touch probe last.jsonl mtime=48h → watchdog STALL + restart in 5min.

Day 5-6:       /mehanik T-A-07 → CodeCraft (Bruce Momjian) dispatch (M-priority, no prompt-forge).
                AC probe: insert null-path row → mc.js done exits 2 "evidence_path required".

Day 6-7:       /mehanik T-A-08 → FlowForge + Securion dispatch.
                AC probe: kill pilot-discover-inject.py 24h → drift detector flags in 15min.

Day 7:         /mehanik T-A-09 → FlowForge dispatch (daemon sweep).
                Then run `control-plane-health.sh` master probe.
                7/7 PASS → CEO go-ahead for Week 2 Team B dispatch.
                <7 PASS → Week 1 extends by 1-2 days; do NOT proceed to Week 2.
```

After every dispatch: `/task-postflight` + verifier subagent in bg (per `feedback_active_verifier_pattern_2026-05-14`).

Each MC closes with `mc.js done <id>` only after Proveo PASS + Skillforge BookStack page (ZAKON PLAN).

---

**END v2.0 OPERATING PICTURE.**

**Sources:**
- `/tmp/srz-rebuild-2026-05-19/team-a/{control-plane-audit, control-plane-design, control-plane-build-plan}.md`
- `/tmp/srz-rebuild-2026-05-19/team-b/{knowledge-plane-audit, knowledge-plane-design, knowledge-plane-build-plan}.md`
- `/tmp/srz-rebuild-2026-05-19/team-c/{workflow-plane-audit, workflow-plane-design, workflow-plane-build-plan}.md`
- `~/system/specs/ceo-ai-system-audit-2026-05-18-REPORT.md` (v1.1)
- `~/system/specs/srz-rebuild-3-teams-2026-05-19-plan.md` (charter)

---

## 10. Validation Patches v2 (applied 2026-05-19 after Proveo + Verifier)

**Sources:** `/tmp/srz-rebuild-2026-05-19/proveo-v2-verdict.json`, `/tmp/srz-rebuild-2026-05-19/verifier-v2-report.json`

| Patch | Original | Corrected | Source |
|---|---|---|---|
| V2-P1 | "skill-registry.db has 1 row for 96 skills" | 96 rows, but only 12 with use_count>0; needs last_used column | verifier KP4 |
| V2-P2 | "Build cost: <$100" | ~$250 (40 MCs × $5–8 avg, consistent with §6 Decision 9 math) | verifier D4 |
| V2-P3 | "8 governance pages on BookStack" | 5 governance pages (Reality Anchor, Determinism, Tier Router, Evidence Ledger, Hooks) | verifier KP11 |
| V2-P4 | "Total Wave 2 MCs: 39 distinct" | 40 distinct (MC-C4-3 edita owner audit was missed in count) | verifier MC1 |
| V2-P5 | "65 agent files vs 30 mapping keys = 37 orphans" | 65 disk vs 52 mapping entries = 13 orphans | verifier WP8 |
| V2-P6 | "verdict-ledger 100% force_completion" | 79/107 rows (74%) force_completion; 28 standalone/done; PROBE_PASS=0 (gate-gaming concern stands) | verifier CP8 |
| V2-P7 | "claude-builder queue 2,945 failed / 1 completed" | TWO subsystems: queue-table has 2,944 rows (verifier WP3); durable-runner.db has 295/1/1 completed/failed/pending (Proveo C-04). MC-C4-2 NEEDS RE-PROBE before dispatch. | Proveo C-04 + verifier WP3 |
| V2-P8 | "TLDR daemon writes to ~/system/data/insights/ which does not exist" | Daemon writes to ~/system/logs/tldr-insights/ which EXISTS with files from 2026-04-24. MC-C1-4 scope needs re-audit. | Proveo C-11 |
| V2-P9 | "manifest-index.md last 2026-02-26" | mtime 2026-04-06 (Feb 26 is content audit date inside file); 43 days stale | verifier KP3 |
| V2-P10 | "HiveMind 21,741 rows" | 21,930 live (audit-snapshot drift) | verifier KP5 |
| V2-P11 | "True 7d = $365,104" | $366,236 (Proveo C-10, ±0.3% rounding) | Proveo C-10 |
| V2-P12 | "MC backlog blocked = 2,239" | 2,241 (Proveo C-02, +2 drift) | Proveo C-02 |

**Re-probe required (BLOCKERS for build dispatch):**
- MC-C4-2 (claude-builder drain decision) — Team C must specify exact DB path + table before scope freeze
- MC-C1-4 (TLDR daemon fix) — re-audit actual writer path vs `~/system/logs/tldr-insights/`
- WP6 "2,400 zombie MCs" — verifier blocked by bash-danger-gate; needs read-only sqlite policy fix or alternate probe

**Verdict on v2.0 after patches:** Strategic narrative + 4-week roadmap + 9 CEO decisions HOLD. Six precision errors corrected in this section. v2.0 is publication-ready with footnoted re-probes on MC-C4-2 + MC-C1-4.