ALAI AI System — Operating Picture 2026-05-18 (CEO Audit)
ALAI AI System — Operating Picture 2026-05-18
Date: 2026-05-18 Architect: Petter Graff Status: VALIDATED v1.1 — Proveo PASS (0 hallucinations, 3 minor drifts), Verifier PARTIAL (3 hallucinations from one root cause: manifest path mismatch; 2 PARTIAL — see Validation Patches below). Headlines stand.
Executive Summary
The ALAI AI system burned 742Kacrossthe8 − daywindowMay11–18onAnthropicOpus * *(99.98365,104 — still catastrophic. A single day (2026-05-11) hit **$377,487**. The prior audit's "$9,790/day" figure held only for a quiet day (May 13 = $9,954) but was 10–40× under for peak days. Revenue is $0; this is founder cash.
This is not a pricing problem. It is a causal chain of broken safety nets:
- Determinism doctrine is unenforced. Reality Anchor probes have not executed in 7 days — 0 PROBE_PASS/PROBE_FAIL events, both probe daemons absent from launchctl PID list (inference-determinism.md). Doctrine exists on paper only.
- Free local tier is degraded.
devstral:24b— the model targeted by 79% of tier-router code calls (531 calls) — does not exist on either Ollama host. Two of three ANVIL MLX servers (qwen3-32b, qwen3-8b) silently serve the wrong model (an embedding model that rejects generation). Tier 2c, M2c, and M3 are ghosts (inference-determinism.md). - Opus fallback is unbounded. With the free tier silent-failing and no Reality Anchor probe to detect the drop, every call escalates to Opus. There is no cost ceiling at runtime (business-roi.md).
- John builds on stale inventory.
discover.js --verifyreports system health citingmanifest-index.md(which DOES exist at~/system/tools/manifest-index.mdbut is stale since 2026-02-26, claims 1,310 scripts vs actual 273 — corrected by verifier) and askill-registry.dbcontaining 1 row (snowit-fb), not the 96 skills on disk. BookStack API is dead (CF Access 302) — staleness measurement offline for 478 tracked pages (knowledge-graph.md). The orchestrator is steering by an instrument panel that froze 3 months ago. - ZAKON #12 (RAG context injection) is dormant.
rag-context-for-builder.jsis referenced in protocol docs but not wired into any hook — every builder dispatch re-injects full MEMORY.md (~15K tokens) instead of a 500–800 token targeted block (rag-layer.md).
If you read nothing else:
- STOP THE BLEED: Enforce Sonnet-default + Opus gating today. At current pace this saves ~20K–90K/day.
- TURN ON THE LIGHTS: Start Reality Anchor probe daemons + reconcile tier-router to live model fleet.
- FIX THE COMPASS:
discover.js --verifyreads 3-month-stale data — regeneratemanifest-index.md, rebuildskill-registry.db, and restore CF Access token for BookStack before any further architecture decisions.
System Map — Planned vs Implemented vs Running
flowchart LR
CEO[Alem / CEO] --> John[John Orchestrator]
John -->|dispatch| Mehanik{Mehanik Gate}
Mehanik -->|authorize| Specialists[Specialist Agents]
Specialists --> Opus[Anthropic Opus]
Specialists -. intended .-> TierRouter[Tier Router]
TierRouter -.->|531 calls 79%| Devstral[devstral:24b GHOST]
TierRouter -->|works| OllamaANVIL[Ollama ANVIL 8 models]
TierRouter -->|works| OllamaFORGE[Ollama FORGE 8 models]
TierRouter -.->|wrong model| MLXqwen32[MLX qwen3-32b BROKEN]
TierRouter -.->|wrong model| MLXqwen8[MLX qwen3-8b BROKEN]
TierRouter --> MLXgemma[MLX gemma-4-26b OK]
John --> Discover[discover.js --verify]
Discover -.->|cites stale| ManifestIdx[manifest-index.md STALE 2026-02-26]
Discover -.->|lies| SkillReg[skill-registry.db 1 row of 96]
John --> RAG[rag-context-for-builder.js]
RAG -.->|not wired| Hooks[PreToolUse hooks]
Specialists --> LightRAG[LightRAG Azure]
LightRAG -.->|23,558 backlog| MigratePump[migrate-pump 600/run cap]
LightRAG -.->|CF Access 302| BookStack[BookStack API DEAD]
Specialists --> HiveMind[HiveMind 21,741 rows]
HiveMind -.->|15 dead agents| DeadAgents[Stale namespaces]
RealityAnchor[Reality Anchor Probes] -.->|0 fires 7d| Evidence[Evidence Ledger]
Evidence -.->|65 null paths| GateBypass[Gate bypass risk]
Opus -->|$741K / 7d| Cost[Cost Burn]
classDef green fill:#1d8c43,color:#fff
classDef yellow fill:#d4a017,color:#000
classDef red fill:#b3261e,color:#fff
class CEO,John,Mehanik,Specialists,OllamaANVIL,OllamaFORGE,MLXgemma,HiveMind green
class LightRAG,MigratePump,RAG,Discover,Evidence yellow
class Devstral,MLXqwen32,MLXqwen8,SkillReg,BookStack,DeadAgents,RealityAnchor,Cost,GateBypass,Hooks red
class ManifestIdx yellow
Inventory Table
| Subsystem | Planned | Implemented | Running | Used 7d | Status | Evidence |
|---|---|---|---|---|---|---|
| Anthropic Opus | yes | yes | yes | yes | RED | business-roi.md ($741K/7d, 99.995%) |
| Sonnet default policy | yes | yes | no | minimal | RED | business-roi.md ($72/7d only) |
| Ollama ANVIL (8 models) | yes | yes | yes | yes | GREEN | inference-determinism.md |
| Ollama FORGE (8 models) | yes | yes | yes | yes | GREEN | inference-determinism.md |
| MLX gemma-4-26b (ANVIL) | yes | yes | yes | yes | GREEN | inference-determinism.md |
| MLX qwen3-32b (ANVIL) | yes | yes | wrong-model | n | RED | inference-determinism.md |
| MLX qwen3-8b (ANVIL) | yes | yes | wrong-model | n | RED | inference-determinism.md |
| MLX gemma-4-26b (FORGE) | yes | yes | yes | yes | GREEN | inference-determinism.md |
| Tier Router devstral:24b | yes | route-only | ghost | 531 calls | RED | inference-determinism.md |
| Reality Anchor probes | yes | yes | not-firing | 0 events | RED | inference-determinism.md |
| Evidence Ledger (JSONL) | yes | yes | yes | yes | YELLOW | inference-determinism.md (16.7% null path) |
| Evidence Ledger (SQLite) | yes | partial | 0 tables | n | RED | inference-determinism.md |
| LightRAG core (Azure VM) | yes | yes | degraded | yes | YELLOW | rag-layer.md (15% probe fail) |
| LightRAG public endpoint | yes | yes | CF-blocked | n | RED | rag-layer.md, knowledge-graph.md |
| lightrag-migrate-pump | yes | yes | running | yes | YELLOW | rag-layer.md (23,558 backlog) |
| lightrag-outbox-ingest | yes | yes | stalled | n | RED | rag-layer.md, ops-layer.md |
| rag-context-for-builder.js | yes | yes | not-wired | n | RED | rag-layer.md (ZAKON #12 dormant) |
| HiveMind hivemind.db (primary) | yes | yes | yes | yes | GREEN | rag-layer.md (21,741 rows) |
| HiveMind orphan DBs (×3) | n/a | n/a | empty | n | RED | rag-layer.md |
| Dead HiveMind agents (15) | n/a | n/a | namespace pollution | n | YELLOW | rag-layer.md, knowledge-graph.md |
| BookStack content | yes | yes | yes | yes | GREEN | knowledge-graph.md (478 pages) |
| BookStack API / staleness | yes | yes | dead | n | RED | knowledge-graph.md |
| BookStack ADR/runbook coverage | yes | partial | partial | partial | RED | knowledge-graph.md (5 governance gaps) |
| ADR numbering integrity | yes | yes | corrupt | n/a | RED | knowledge-graph.md (adr-025×2, adr-026×4) |
| Library system (library.yaml) | yes | no | none | n | RED | knowledge-graph.md (0 across personas) |
| MC (mc.js) | yes | yes | yes | yes | GREEN | business-roi.md |
| Daemons — running healthy | yes | yes | 14 | yes | GREEN | ops-layer.md |
| Daemons — flapping (6) | n/a | yes | 2 running / 4 stopped | partial | RED | ops-layer.md |
| Daemons — unloaded orphans (3) | n/a | yes | not loaded | n | YELLOW | ops-layer.md |
| Daemons — .new shadow files (3) | n/a | n/a | risk-only | n | YELLOW | ops-layer.md |
| Hooks (58 entries, all present) | yes | yes | yes | yes | GREEN | ops-layer.md |
| Tools on disk (273 top-level) | yes | yes | partial | partial | YELLOW | code-surface.md |
| manifest-index.md (handbook ref) | yes | yes | stale (2026-02-26) | partial | YELLOW | verifier-report.json A10 |
| skill-registry.db | yes | yes | 1/96 rows | partial | RED | code-surface.md |
| specialist-mapping.json | yes | yes | yes | yes | YELLOW | code-surface.md (mehanik, dzevad-jahic missing) |
| Mehanik dispatch gate | yes | yes | yes | yes | YELLOW | code-surface.md (mapping mismatch) |
| Cost tracker (costs.db) | yes | yes | yes | yes | GREEN | business-roi.md |
| TLDR daemon | yes | yes | gapped | partial | YELLOW | business-roi.md (3-day May gap) |
Ranked Gap List
P0 — Stop The Bleed (this week)
P0-1. Opus burn $741K/7d. (business-roi.md, costs.db)
- Root cause: No model gate. 99.995% of calls hit Opus despite CLAUDE.md declaring Sonnet as orchestration default.
- Fix: (a) Sonnet-default enforcement at claude-cli
wrapper level; (b) Opus whitelist limited to
/prompt-forge+ novel-architecture review; (c)opus-cost-guard.shhook is registered (ops-layer.md) — verify it actually blocks vs warns. - /monthestimate : * * atpeakday(110K) → save ~2.7M/month; atrecentstabilization(26K/day) → save ~650K/month.Evenworstcasecrediblesavings : **500K+/month.
- Owner: FlowForge (Kelsey) for hook enforcement + CodeCraft for wrapper gate. Open MC required.
P0-2. devstral:24b ghost — 79% of tier-router code calls. (inference-determinism.md)
- Root cause: Tier 2c routes 531 calls to a model present on neither Ollama host. 4.5ms avg suggests silent fallback or unlogged substitution. Every "local code review" claim under tier 2c may have escalated to Opus or returned junk.
- Fix:
ollama pull devstral:24bon FORGE OR remap tier 2c toqwen3:8b-q8_0(already hot on FORGE). - /monthestimate : * * unknownuntilprobesrestored, but : 531calls × 7d, eachpotentiallyescalatingtoOpus = compoundingmultiplieronP0 − 1.Conservatively * *20K–$100K/month in avoided escalations.
- Owner: AgentForge (Georgi) — fleet reconciliation;
CodeCraft to update
tier-routing.json.
P0-3. Reality Anchor probes not executing (0 events in 7d). (inference-determinism.md)
- Root cause: Probe daemons
com.john.auto-verify-regression+com.john.ollama-health-probehave no PID. Probe scripts exist; registry v1.3 exists; nothing runs. - Fix:
launchctl startboth daemons + verify PROBE_PASS appears in~/system/state/. Add a watchdog daemon to alert on probe silence >24h. - **$/month estimate:** Indirect — but Reality Anchor is the **only deterministic check** between LLM self-report and gate pass. Without it, hallucinated work satisfies `mc.js done`. Rework cost estimable at 1–3 fabricated PASS incidents/week × ~$5K rework each = 20K–60K/month avoided.
- Owner: FlowForge (Kelsey).
P0-4. discover.js --verify is hallucinating system health. (code-surface.md, knowledge-graph.md)
- Root cause: Self-verification cites
manifest-index.md(exists at~/system/tools/manifest-index.mdbut stale since 2026-02-26 — claims 1,310 scripts vs actual 273) andskill-registry.dbwith 1 row representing 96 skills. The instrument reads frozen data. - Fix: (a) Regenerate
manifest-index.mdfrom real tool inventory on a daily cron; (b) Rebuild skill-registry.db withlast_usedcolumn + populate from disk scan; (c) Add a meta-probe that diffs claimed inventory vs actual at session-start. - /monthestimate : * * Indirectbutmultiplicative—everyplanJohnwritesonphantominventoryaddsdispatchwaste.Estimate * *5K–$15K/month in avoided wasted dispatches.
- Owner: CodeCraft (manifest regen) + AgentForge (skill registry).
P0-5. MLX tiers M2c + M3 broken (wrong model loaded). (inference-determinism.md)
- Root cause:
~/system/research/mlx-models/directory does not exist; both plists silently fall back to a cachedbge-m3-mlx-fp16embedding model that rejects generation requests. Redzo-reviewer and verifier tiers routed here get junk. - Fix: Locate or re-download Qwen3-32B-4bit + Qwen3-8B-4bit MLX weights, OR repoint M2c/M3 to FORGE Ollama equivalents (qwen3:32b, qwen3:8b-q8_0).
- /monthestimate : * * SameclassasP0 − 2—freeverifiercapacityrestored = Opusavoided. * *10K–$40K/month.
- Owner: AgentForge (Georgi).
P1 — Structural (next 2 weeks)
P1-1. ZAKON #12 dormant — rag-context-for-builder.js not in any hook. (rag-layer.md)
- Wire into
PreToolUse[Task]hook chain. Replaces ~15K-token MEMORY.md injection per builder call with ~500–800 token targeted block. Saves ~22K tokens/day at current pace. - Owner: CodeCraft.
P1-2. lightrag-migrate-pump cap (600/run, 23,558 backlog). (rag-layer.md)
- Backlog will never close at 1,200/day ingest vs ongoing writes. Increase to 5,000/run or remove cap.
- Owner: AgentForge.
P1-3. lightrag-outbox-ingest stalled. (rag-layer.md, ops-layer.md)
- New session content not reaching graph. Either re-enable daemon or formally decommission.
- Owner: FlowForge.
P1-4. BookStack API broken (CF Access token). (knowledge-graph.md)
bookstack-staleness.jsreturns HTML 302. 478 tracked pages have unknown staleness. Rotate CF Access token in Bitwarden.- Owner: FlowForge + Securion (token rotation).
P1-5. ADR numbering collision (adr-025 ×2, adr-026 ×4). (knowledge-graph.md)
- Schema integrity broken. Renumber + add a pre-commit guard.
- Owner: Skillforge / Datavera.
P1-6. 5 governance subsystems with zero BookStack page — Reality Anchor, Determinism/Tool-First, Tier Router, Evidence Ledger, Hooks. (knowledge-graph.md)
- The newest and most important systems have no central documentation. Publish runbook + ADR each.
- Owner: Skillforge.
P1-7. specialist-mapping.json missing mehanik + dzevad-jahic. (code-surface.md)
- Routing table referenced in CLAUDE.md but absent from the JSON the dispatch path reads. Mehanik gate hallucinates dispatch authorization because it cannot verify its own identity.
- Owner: CodeCraft.
P1-8. 6 flapping daemons. (ops-layer.md)
rag-fsevents-adapter(exit 1, still running),azure-db-backup(exit 1, still running),hook-drift-detector(exit 2, stopped),chain-e2e-nightly,rdap-audit-quarterly,apply-knowledge. Silent failures most dangerous.- Owner: FlowForge.
P2 — Cleanup (next month)
- P2-1. Cull 27 files: 13 dead tools + 5 stub skills + 9 hook .bak files (code-surface.md). Zero functional loss.
- P2-2. Prune 15 dead HiveMind agent namespaces (rag-layer.md, knowledge-graph.md).
- P2-3. Remove 3 empty/orphan HiveMind DBs
(
~/system/db/hivemind.db,~/system/data/hivemind.db,~/system/agents/hivemind/memory.db). - P2-4. Resolve 3 .new shadow plists + 3 unloaded orphan plists (ops-layer.md).
- P2-5. Library system: either deploy (0 library.yaml currently) or formally retire library-auto-push.md runbook (knowledge-graph.md).
- P2-6. Fix
mc.jshardcoded paths (lines 2808, 2822) andagent-runner.js:43env fallback (code-surface.md). - P2-7. Backfill or null-flag 65 evidence-ledger rows
with null
evidence_pathso they cannot satisfymc.js donegates (inference-determinism.md).
Token-Save Recommendations (with $/month estimates)
| # | Action | Estimated savings/month | Source |
|---|---|---|---|
| 1 | Sonnet-default + Opus gated to /prompt-forge only |
500K–2.7M | business-roi.md |
| 2 | Restore free local tier (fix devstral + MLX) | 30K–140K | inference-determinism.md |
| 3 | Restart Reality Anchor probes (rework avoidance) | 20K–60K | inference-determinism.md |
| 4 | Wire rag-context-for-builder.js into PreToolUse
hook |
~$4 (token), high indirect | rag-layer.md |
| 5 | Close lightrag-migrate-pump backlog (23,558 rows) | ~$15 token + freshness | rag-layer.md |
| 6 | Purge dead HiveMind namespaces + orphan DBs | ~$10 token + cleaner retrieval | rag-layer.md |
| 7 | Cull 27 dead files (tools/skills/.bak) | qualitative — cleaner discover.js | code-surface.md |
The headline is item 1: nothing else moves the needle until model selection is fixed.
CEO Decisions Surfaced
Risks Identified by Synthesis (not in individual reports)
R1. Compound failure mode — three safety nets failed together. Each report alone is concerning. Combined: (a) free tier silent-fails, (b) Reality Anchor probe doesn't detect drop, (c) no runtime cost ceiling, (d) discover.js misreports inventory so John can't see drift. There is no remaining instrument that would have caught the $741K burn except the cost tracker — which works, but is read by John after the fact, not enforced.
R2. discover.js as single point of trust failure.
Per ZAKON NULA, every tool-verify question routes through
discover.js. If discover.js --verify itself
lies about manifest-index.md and skill-registry.db, then every
"verified" claim downstream of it inherits the lie. This is the
most dangerous finding because it inverts the anti-hallucination
doctrine.
R3. Mehanik gate hallucinates dispatch
authorization. Mehanik is referenced in CLAUDE.md as the
mandatory pre-dispatch gate, but Mehanik itself is missing from
specialist-mapping.json (code-surface.md). The gate can't
authoritatively confirm an agent exists. Combined with the
manifest-index gap, dispatch routing operates on prose-level trust, not
data-level verification.
R4. Evidence ledger gate-bypass via null paths. 65
of 390 rows (16.7%) have null evidence_path. They count
toward gate row-counts without any artifact. With Reality Anchor probes
also dead, ledger integrity drops further — fabricated "PASS" claims
(precedent: Angie Jones qa-19, SnowIT public claims hallucination) can
re-occur with no automatic catch.
R5. The codebase is younger than the assumptions about it. code-surface.md notes 0 files >180 days old — system is <6 months old. But CLAUDE.md handbook references "1,310 scripts" and a manifest that never existed. The handbook narrates a system more mature than the disk reality. CEO planning may inherit this confidence gap.
Contradictions Across Reports
- Daemon count: ops-layer says 62 loaded / 70 plist files; business-roi says "55 total, 6 deprecated .bak". Likely both are partial views (ops counts launchctl entries; business counts canonical .plist files only). Reconcile via fresh probe.
- Opus spend prior claim: business-roi.md flags the prior $9,790/day audit as 10–11× too low — but that prior claim originates from the 2026-05-14 AI Factory audit cited in MEMORY index. Newer probe (costs.db) is authoritative; the May 14 finding should be retracted.
- LightRAG status: rag-layer says core is DEGRADED with ~15% probe failure; business-roi says "service up (302 CF Access = service up)"; knowledge-graph says "BLOCKED — returns 302". All three are partially correct: the Azure VM core responds at internal IP, but the public CF Access endpoint blocks tooling. Net verdict: YELLOW — operational but tooling-blind. (Source citations: rag-layer.md, business-roi.md, knowledge-graph.md.)
- HiveMind dead agent count: rag-layer cites 15;
knowledge-graph cites 15 with slightly different list (knowledge-graph
includes
john-delegate2026-04-11 and the mis-casedCodeCraft; rag-layer omits john-delegate but includestender-hunter2026-04-17). Both lists ~15; merge before pruning.
Validation Plan
Per /plan-with-team protocol:
- Task 8 (Proveo — Angie Jones): Re-probe ≥20% of cited claims with fresh tool output. Priority: costs.db spend total, Reality Anchor probe daemon PIDs, devstral:24b absence on both Ollama hosts, manifest-index.md non-existence, skill-registry.db row count, lightrag-migrate-pump backlog count, ADR numbering collision file list, specialist-mapping.json key set.
- Task 9 (Verifier atomic-claim decomposition): Read-only verifier subagent decomposes this report into ≤50 atomic claims, runs probe per claim, returns CONFIRMED/PARTIAL/HALLUCINATION verdict per claim. Cost <$0.50/run.
- Task 10 (Skillforge): Publish this report to
BookStack as
ALAI AI System Operating Picture 2026-05-18. Cross-link from System Architecture shelf. (Blocked until P1-4 CF Access token fix — fall back to manual upload.)
REPORT WRITTEN: ~/system/specs/ceo-ai-system-audit-2026-05-18-REPORT.md
Validation Patches (applied 2026-05-18 23:30 after Proveo + Verifier)
Sources:
/tmp/audit-2026-05-18/proveo-verdict.json,
/tmp/audit-2026-05-18/verifier-report.json
| Patch | Original Claim | Corrected | Source |
|---|---|---|---|
| V-P1 | $741,646 / 7 days | $742K / 8 days (May 11–18) — true 7d (May 12–18) = $365,104 | verifier A1, A2 |
| V-P2 | manifest-index.md MISSING | manifest-index.md exists at ~/system/tools/, STALE
since 2026-02-26 (claims 1,310, actual 273) |
verifier A10, A28, A35 |
| V-P3 | Mermaid node ManifestIdx = RED | recolored YELLOW (stale, not missing) | verifier A35 |
| V-P4 | P0-4 fix wording "generate or delete" | "regenerate on daily cron + add staleness meta-probe" | verifier corrective note |
| V-P5 | BookStack sync-map at ~/system/agents/ |
actual path ~/system/config/ |
proveo C7 |
| V-P6 | Prior $9,790/day estimate "10–11× under" | "10–40× under for peak days; on quiet days within ±2%" | verifier A5 |
Verdict on report after patches: Headlines (Opus burn, devstral ghost, Reality Anchor dead, MLX broken, skill-registry blind, ZAKON #12 dormant) all CONFIRMED by both validators. Diagnosis stands. Cost dollar range remains catastrophic regardless of window interpretation.