AI Factory Audit 2026-05-09
Complete audit deliverables

Executive Summary (SENTINEL Final)

SENTINEL AUDIT — Final Consolidated Report
Date: 2026-05-09
Lead Validator: Sentinel Validator (consolidating P1–P5 findings)
Destination: CEO (Alem Basic)

FINAL VERDICT
REWORK-MINOR

The audit is fundamentally sound. The fix backlog is correctly prioritized. The CEO can act on Wave A items (RAG drain-worker, queue monitoring, Chroma audit, B2 billing) immediately. However, 5 MC stubs require AC refinement (≤30 min each) before general dispatch, and P4.1 carries 3 low-severity annotation corrections. None of these are blockers to CEO decision-making or Wave A execution.

Headline (translated from Bosnian)
The factory has been dead since March: 62.5% of commitments are not working. Pi-orchestrator has dispatched nothing. John is the manual dispatcher. Three fixes unlock everything else: fix the RAG Vaultwarden credential, define the canonical dispatch path, wire up verify-fix-loop.

Top-5 Actionable Findings (Post-Corrections)

1. RAG ingest pipeline blocked: 3,150+ items queued (not the stale 454)
Finding: rag-drain-worker crashed on a Vaultwarden CF Access timeout. The metric file is 16 days stale (shows 454). The live SQLite count is 3,150 queued items, so the real state is roughly 7x worse than the documented figure.
Evidence: P3.1 H1 (health matrix), P5.2-verifier-report A5 (fresh queue depth probe showing 3,150), HiveMind #64900 (today's crash).
Action priority: CRITICAL. Fix immediately (MC-STUB-01, Wave A, ~2h effort). A single credential fix (Vaultwarden session + CF Access token) lets the worker drain all 3,150+ queued items and unblocks 3 downstream adapters.

2. pi-orchestrator not dispatching: HTTP port 8401 dead since March
Finding: Process PID 75750 is alive, but the HTTP control plane is offline. There are no dispatch logs after 2026-03-19 (50+ days idle). The durable-runner bridge (port 3052) is structurally alive, but it is unclear whether it is processing tasks. The "mock mode" framing is inaccurate (P4.2 rebuttal); the real issue is startup gating.
Evidence: P3.1 C1/C2 (live probes), P4.2 Gap #2 rebuttal (no mock config found; config shows offlineMode: false ), P5.1 probe #3 (PID confirmed unchanged 5+ days). Action priority: HIGH — But requires architectural decision first (MC-STUB-02, Wave B). Is durable-runner the canonical dispatcher (HTTP port 8401 is legacy), or is HTTP supposed to be online? The fix depends on the answer. Do not attempt MC-STUB-08 (pi-orch restore) until this decision is made. 3. Verifier loop capability exists but zero auto-invocation Finding: verify-fix-loop skill is fully built, tested, and working. Accepts manual invocation. However, no daemon, hook, or pi-orchestrator code ever calls it. Important caveat (P4.2 rebuttal): This is NOT a structural gap. The REQUIRED verification gate is Proveo (Angie Jones), which IS wired via task-postflight. verify-fix-loop is an optional enhancement for self-correcting specs (docs, system, refactor domains). Evidence: P2.2 §2, P3.1 D1 (skill exists, manual-only), P4.2 Gap #3 (Proveo is the designed gate), CLAUDE.md Hard Constraint #4 (specifies Proveo, not verify-fix-loop). Action priority: MEDIUM — Feature enhancement, not blocker. Demoted to Wave C (MC-STUB-12) with L priority. Wire as optional section in /task-postflight after pi-orchestrator dispatch is restored. 4. Agent routing table incomplete — validator and distiller unmapped (44 references, 21 references, 0 routing entries) Finding: validator and distiller agents are cited 65 times across skill files but have zero entries in specialist-mapping.json. Important distinction (P4.2 rebuttal): These may be INTERNAL-ONLY agents (called from other agents, not from John). If internal-only, they should NOT be in the routing table. If routable by John, they must be added. This requires a routing policy decision first. Evidence: P1.3 (agent-fleet inventory shows 66 agents, mapping covers only 29), P4.2 Gap #5 rebuttal (may be internal-only), P4.3 MC-STUB-06 (design decision gates this fix). 
Action priority: MEDIUM. Requires CEO Decision #3 (routing policy scope: comprehensive vs curated). Once decided, implementation is ≤8h (MC-STUB-06, Wave B).

5. Four phantom companies unroutable (Axiom, Datavera, Resolver, Lexicon)
Finding: All four have complete persona directories (CLAUDE.md, agents, company.json) and ZERO entries in specialist-mapping.json.
Correction (P4.2 rebuttal + P5.2-verifier A10): P4.2 claimed that Lexicon IS routable, which would have reduced the count to 3 phantom companies (Axiom, Datavera, Resolver). P5.2-verifier refuted that claim: grep confirms 0 matches for Lexicon in specialist-mapping.json, so P4.2 hallucinated the mapping entry. Lexicon is confirmed absent and phantom, and the correct count remains 4.
Evidence: P1.3 (inventory shows all 4 have full infrastructure), P4.2 Gap #7 (rebuttal claims Lexicon is mapped; REFUTED by P5.2-verifier), P4.3 MC-STUB-07 (scope to be updated to list all 4 companies; see Contradiction 4).
Action priority: LOW. Inventory work + routing decision. Demoted to Wave B after routing policy (MC-STUB-06) is decided. MC-STUB-07 implements the fix (estimated at ~4h for 3 companies, M priority); its scope must be updated to cover all 4.

Wave A — Ship Now (No CEO Decisions Needed)

These four MCs can be dispatched immediately. Combined effort: ~6h.

| Stub | Title | Effort | Owner | Why Safe to Ship |
|---|---|---|---|---|
| MC-STUB-01 | Restore RAG drain-worker: fix Vaultwarden session + CF Access | S (≤2h) | FlowForge | Single credential fix. Machine-checkable ACs. Proveo-validated PASS (5.1 §2). Unblocks 3 adapters. |
| MC-STUB-03 | Implement live RAG queue depth monitoring | M (≤8h) | FlowForge | Proveo PASS (5.1 §2). Depends on MC-STUB-01 (documented). No CEO decision required. |
| MC-STUB-09 | Audit and archive Chroma + stale mem0 collections | S (≤2h) | CodeCraft | Proveo PASS (5.1 §2). Pure read-probe + cleanup. No blocking dependencies. |
| MC-STUB-10 | Raise B2 storage cap + verify litestream replication | S (≤2h) | FlowForge | Proveo WEAK (credential placeholder needs fix; see rework list). The task itself is low-risk (billing action). Fix AC before dispatch (≤5 min). |

Wave A partial: MC-STUB-04 (restore 5 deleted plists). 4 of 5 plists can be unloaded/restored now.
The 5th (pi-orch-health.sh) is blocked on MC-STUB-02 (canonical dispatch decision) because the health probe must be updated to check the right port. Wave B — Needs CEO Architectural Decisions First These fixes depend on 4 CEO decisions. Once decided, they are unblocked. CEO Decision #1 (CRITICAL): Canonical dispatch path The question: Is durable-runner (port 3052, 20d uptime) the canonical dispatcher — with pi-orchestrator HTTP (port 8401, dead) being a legacy control plane? OR is pi-orchestrator HTTP supposed to be online? Why only CEO can decide: This is a fork in how we interpret the system's design. No engineer can unilaterally choose which dead component to revive. Options: A. durable-runner is canonical. HTTP port 8401 is legacy. Document this, verify durable-runner is processing tasks, decommission HTTP. B. pi-orch HTTP is canonical. Diagnose startup gating (likely Ollama hang), restore it. durable-runner is subordinate. C. Both should be operational. Requires specifying the interaction model. Unblocks: MC-STUB-02 (design decision itself) MC-STUB-04 remainder (pi-orch-health.sh restoration) MC-STUB-08 (pi-orchestrator restore — actual kernel fix) CEO Decision #2 (MEDIUM): Blueprint score gate floor The question: What is the enforced minimum score for dispatch via Mehanik gate? Context: Observed practice allows dispatch at score 65 (WARN range). Original spec says 90. The code treats WARN as pass-through. Choose one and hardcode it. Options: A. Lower floor to 60 — match observed practice; WARN is acceptable. B. Floor stays at 90 — WARN becomes BLOCK; blueprints must score higher. C. Tiered: 60 for L tasks, 75 for M, 90 for H+. Unblocks: MC-STUB-05 (enforce gate at the chosen floor) CEO Decision #3 (MEDIUM): specialist-mapping.json scope policy The question: Should the routing table be comprehensive (all 66 agents) or curated (only John-dispatchable agents)? Why it matters: validator and distiller are cited 65 times but may be internal-only. 
If internal, they must NOT be in the routing table. If John-routable, they must be added. Options: A. Curated — only John-dispatchable agents enter the mapping. Internal agents documented separately. B. Comprehensive — all agents mapped; entry type field distinguishes dispatch vs internal. Unblocks: MC-STUB-06 (routing policy design + specialist-mapping update) MC-STUB-07 (register 3 phantom companies or mark as experimental) CEO Decision #4 (LOW): mem0 future role The question: What is mem0's long-term status? Context: 865 stale facts. Zero active writers. .md + LightRAG is the working pipeline. mem0 server running and consuming resources. Options: A. Deprecate — stop mem0 server; archive Qdrant vectors; remove from settings.json. B. Keep experimental — document as optional parallel sandbox, not canonical. C. Promote — wire PostToolUse hook to write every .md update to mem0 simultaneously (high effort, not recommended). Recommendation (Petter): Option A (deprecate). The .md pipeline works. mem0 is cognitive overhead. Unblocks: MC-STUB-09 + MC-STUB-11 (memory-plane documentation) Surfaced Contradictions Resolved Contradiction 1: RAG queue depth — 454 vs 3,150 P4.1 synthesis stated: Queue depth 454 (from stale metric). P5.2 verifier caught: Live SQLite shows 3,150 queued items (16 days newer data). Resolution: Both figures are correct — the metric file is 16 days stale. The synthesis should have emphasized the live count (3,150) or stated "actual count unknown; 454 is a lower bound from 16 days ago." This is a severity understatement, not a factual error. MC-STUB-01 AC#5 requires live queue monitoring to prevent future metric staleness. Contradiction 2: pi-orchestrator "mock mode" vs actual config P2.1 connectivity diagram stated: pi-orch in MOCK MODE, alai-config-mock.json loaded. P4.2 devils-advocate rebutted: No mock config found. Config shows offlineMode: false , enabled: true . P3.1 verified: Zero grep matches for "mock" in pi-orchestrator.js. 
Resolution: The "mock mode" framing is inaccurate. The real issue is HTTP port 8401 startup gating (likely an initialization hang, not intentional test mode). The P4.1 executive summary repeats "mock/broken mod" and should be updated to "HTTP startup gating failure" per P3.1/P4.2 evidence.

Contradiction 3: Chain runner existence
P4.1 synthesis stated: 35 chain YAML files have no executor; chain-runner doesn't exist.
P5.2 verifier caught: chain-runner.js (31KB, fully functional) and chain-runner.sh (Pillar #5) both exist.
Resolution: Chain runners DO exist. They are not missing; they are unused because (a) no active skill invokes them (skills call agents inline), (b) three chain-related daemons exit 1 due to downstream failures, and (c) the runners are un-integrated. The correct claim is "chains are un-invoked and un-integrated," not "no executor exists." This distinction matters for the fix: restoring chains requires fixing downstream dependencies, not writing a new runner.

Contradiction 4: Lexicon company phantom status
P4.1 Gap #7 stated: 4 phantom companies (Axiom, Datavera, Resolver, Lexicon).
P4.2 devils-advocate claimed in rebuttal: Lexicon IS in specialist-mapping.json.
P5.2 verifier caught: grep "Lexicon" ~/system/agents/specialist-mapping.json → 0 matches. Lexicon is NOT routable.
Resolution: P4.2 hallucinated the Lexicon entry (ZAKON NULA breach). The correct count is 4 phantom companies, not 3. P4.3 MC-STUB-07 lists all 4 affected companies in some passages but not consistently, and may have been partially rewritten. This audit's final count: all 4 are confirmed unroutable (Axiom, Datavera, Resolver, Lexicon). Update MC-STUB-07 scope to list all 4.

Contradiction 5: mem0 SoR intent
P4.1 synthesis stated: mem0 is the intended System of Record; it's broken.
P4.2 devils-advocate rebutted: mem0 was never designated as SoR in CLAUDE.md or any spec.
Resolution: The gap is dismissed (correctly).
.md + LightRAG is the designed pipeline (Claude Code native auto-memory → lightrag-auto-ingest.sh hook → LightRAG). mem0 was a prototype that never achieved SoR status. The correct fix is documentation (MC-STUB-11), not re-wiring mem0. This satisfies the dismissed gap. Contradiction 6: HiveMind read API P1.1 implied: HiveMind has no read API. P3.1 found: hivemind.js read/query/semantic_query all functional. API exists. Resolution: P1.1 overstated the gap. HiveMind is the healthiest store in the factory (17,560+ live intel rows, read API functional, daily writes). No contradiction to resolve — P3.1 corrected the inventory claim. Open Questions for CEO Canonical dispatch path: durable-runner or pi-orchestrator HTTP? (CEO Decision #1) Blueprint score gate: Enforce at 60, 75, or 90? (CEO Decision #2) specialist-mapping.json scope: Comprehensive or curated? (CEO Decision #3) mem0 future role: Deprecate or keep as experimental? (CEO Decision #4) Anything else surfaced: Any findings in this audit that require clarification before we proceed with Wave A? Recommendation John should dispatch Wave A immediately (RAG drain-worker, queue monitoring, Chroma audit, B2 cap raise — ~6h total). These are unblocked and low-risk. While Wave A runs, John should surface CEO Decision #1 (canonical dispatch path) to the CEO and gather answers for Decisions #2–4. Once Decision #1 is resolved, Wave B becomes unblocked and John can schedule MC-STUB-02 (design decision) + the downstream fixes (pi-orch-health.sh, pi-orchestrator restore, routing policy). The audit is sound. The backlog is prioritized. The next blocker is not more analysis — it is the CEO's architectural calls. Rework Required Before General Dispatch Category A — AC refinement (5 stubs, ≤30 min each): MC-STUB-04: Split OR-condition into per-plist ACs; replace 24h window with point-in-time exit-code check. 
MC-STUB-06: Rewrite discover.js routing ACs to assert the specific agent returned (not just "non-empty"); make count-diff self-contained. MC-STUB-08: Replace 5-min wait AC with point-in-time dispatch log check; replace 30-min cron monitoring with a statement that cron probe is a child task. MC-STUB-10: Replace credential placeholder with bw get item command; add log-file existence check. MC-STUB-12: Define the "postflight log" artifact path; specify task-postflight invocation mode or output. Category B — P4.1 annotations (≤15 min): Replace "mock/broken mod" in executive summary with "HTTP startup gating failure." Update Gap #7 to note P4.2 rebuttal revised count (but P5.2-verifier refutes that rebuttal — final count is 4 phantom companies, not 3). Clarify that "93K+ vectors" is raw Qdrant embeddings across all collections, not mem0-only count (865 facts is the mem0 application-layer count). Audit Status: COMPLETE Validator: Sentinel Validator (consolidation) Evidence directory: /tmp/ai-factory-audit-2026-05-09/ Prior phases: P1 (inventory), P2 (connectivity), P3 (health matrix), P4 (synthesis + rebuttal + backlog), P5 (validation + verification + final consolidation) Report produced by Sentinel Validator 2026-05-09 Consolidated from 11 audit reports + 3 rebuttal layers + live probe verification Connectivity Diagram 2.1 — AI Factory Connectivity Diagram Date: 2026-05-09 Auditor: sentinel-architect Phase: 2 — Synthesis from P1 inventory reports 1.1, 1.2, 1.3, 1.4 and P2 reports 2.2, 2.3 Mode: READ-ONLY. No mutations. Section A — Control Plane Diagram The diagram below shows the advertised flow from CEO input to task closure. Solid arrows are flows that actually work. Dotted red arrows are advertised edges that are broken or absent. Labels show the transport mechanism. 
flowchart TD CEO([CEO / Alem]) JOHN([John — Orchestrator\nClaude Code CLI session]) MH["/mehanik gate\n~/.claude/agents/mehanik.md\n113 cleared tokens in /tmp"] PF["/prompt-forge\n~/.claude/skills/prompt-forge/"] PIO["pi-orchestrator\n~/system/kernel/pi-orchestrator.js\nPID 75750 — MOCK MODE"] SPEC["Specialist Agent\ne.g. petter-graff, angie-jones\n~/.claude/agents/*.md"] TOOL["Tool\n~/system/tools/ (250 live)"] ART["Artifact\n(code / doc / spec / evidence file)"] VERIFIER["Verifier / verify-fix-loop\n~/.claude/agents/verifier.md\n~/.claude/skills/verify-fix-loop/"] TPF["/task-postflight\n~/.claude/skills/task-postflight/"] MCD["mc.js done\n~/system/tools/mc.js"] PROVEO["Proveo / Angie Jones\n~/.claude/agents/angie-jones.md"] HOOK["Hook Layer\n~/.claude/hooks/ (12 active)"] CEO -- "CLI conversation" --> JOHN JOHN -- "CLI / Task dispatch" --> MH MH -- "cleared token written to /tmp/mehanik-cleared-N\nBlueprint read enforced (PARTIALLY — WARN scores pass)" --> PF PF -- "forged prompt → Task dispatch" --> PIO PIO -. "Task dispatch — mc.js write\nBROKEN: MOCK MODE\nalai-config-mock.json loaded\nPlanka localhost:3100 not listening\n'No eligible tasks' every 30s" .-> SPEC SPEC -- "Tool calls\n(Read / Edit / Bash / Grep)" --> TOOL TOOL -- "Write / Edit" --> ART ART -- "mc.js ready write" --> HOOK HOOK -- "PreToolUse / PostToolUse\nexits 0 = pass, exits 2 = block" --> MCD MCD -. "ADVERTISED: auto-invokes verifier\nACTUAL: ABSENT\n0 hooks, daemons, or pi-orch code\ncalls verify-fix-loop\n(source: 2.2)" .-> VERIFIER VERIFIER -. "ADVERTISED: auto-loop to fix-builder\nACTUAL: manual invocation only\nno programmatic trigger" .-> SPEC MCD -- "mc.js ready → /task-postflight\n(manual invocation only for H tasks)" --> TPF TPF -- "Task dispatch — CLI" --> PROVEO PROVEO -- "AC checklist → verdict" --> TPF TPF -- "mc.js done (with evidence)" --> MCD MCD -. 
"ADVERTISED: pi-orchestrator consumes\n'done' events for next task\nACTUAL: MOCK MODE — consuming nothing" .-> PIO style PIO fill:#ffcccc,stroke:#cc0000 style VERIFIER fill:#ffcccc,stroke:#cc0000 style MH fill:#ffffcc,stroke:#cccc00 Annotation notes: CEO → John: works. Standard CLI session. John → Mehanik: works. 113 cleared tokens confirm Mehanik runs regularly. Mehanik → prompt-forge → pi-orchestrator: the dispatch chain exists structurally. pi-orchestrator is alive (PID 75750) but in MOCK MODE — it reads mock config and never consumes real MC tasks. pi-orchestrator → Specialist: BROKEN because mock mode means pi-orchestrator never fires a Task dispatch to a real specialist. Specialist → Tool → Artifact: works when agents are dispatched by John manually (not via pi-orchestrator). Artifact → mc.js done (via hooks): works. The hook layer (12 active hooks) enforces gates on mc.js writes. mc.js done → verifier: ABSENT. No automated trigger. CEO is the de-facto verifier (source: 2.2). mc.js done → pi-orchestrator: BROKEN. Mock mode means pi-orchestrator does not react to task completions. Section B — Data Plane Diagram Shows all memory stores with their actual write paths (solid = live, dotted red = dead, dotted orange = partial/degraded). 
flowchart LR
CC["Claude Code\n(built-in auto-memory)"]
MDFILES[".md auto-memory files\n~/.claude/projects/-Users-makinja/memory/\n123 files — LIVE"]
HOOK_LR["lightrag-auto-ingest.sh\nPostToolUse hook\nfires on Write/Edit to in-scope paths"]
LR["LightRAG\nlocalhost:9621\n999 docs indexed\npipeline_busy=true\nHEALTHY but DEGRADED"]
DISCOVER["discover.js\nhttps://lightrag.alai.no/query\n(external hostname — Caddy proxy)"]
IQ["ingest-queue.sqlite\n~/system/state/\n946 items FROZEN"]
RDW["rag-drain-worker\nPID 3640\nETIMEDOUT on Vaultwarden"]
RBA["rag-bookstack-adapter\nevery 5min — exit 256\nblocked by backpressure"]
RMCA["rag-mc-adapter\nevery 5min — exit 256\nblocked by backpressure"]
RFSEA["rag-fsevents-adapter\nWatchPaths — exit 1\nblocked by backpressure"]
BKS["BookStack\ndocs.alai.no"]
MCLOG["mc-task-outcomes.jsonl\n~/system/logs/"]
MEM0["mem0 API\nlocalhost:9000\nHEALTHY — 0 active writers"]
QDR["Qdrant\nlocalhost:6333\n5 collections\n93,510 total vectors"]
MEM0J["mem0_john collection\n865 vectors — STALE"]
KNOW["knowledge collection\n31,274 vectors — STALE\nunknown origin"]
SESS["sessions collection\n929 vectors — unknown writer"]
HIVE_Q["hivemind collection\n60,442 vectors — LIVE"]
HIVEJS["hivemind.js CLI\ndual-write on post"]
HIVEDB["HiveDB SQLite\nhivemind.db\n17,551 intel rows — LIVE"]
CHROMA["Chroma\n~/.claude-mem/chroma/\n6,584 embeddings\nno active writer or reader"]
FLYWHEEL["flywheel.db SQLite\n~/system/databases/\nLIVE — rag-router.js cache"]
RAG_ROUTER["rag-router.js\ncache → Ollama → external"]
CC -- "native write" --> MDFILES
MDFILES -- "PostToolUse trigger" --> HOOK_LR
HOOK_LR -- "curl POST localhost:9621" --> LR
LR -- "serves queries" --> DISCOVER
BKS -- "poll every 5min" --> RBA
MCLOG -- "tail" --> RMCA
RBA -- "enqueue" --> IQ
RMCA -- "enqueue" --> IQ
RFSEA -- "enqueue" --> IQ
IQ -- "drain attempt" --> RDW
RDW -. "DEADLOCKED\nVaultwarden ETIMEDOUT\nCF Access creds missing\n946 items queued, 0 drained" .-> LR
HIVEJS -- "write" --> HIVEDB
HIVEJS -- "dual-write best-effort" --> HIVE_Q
HIVE_Q --> QDR
MEM0 --> QDR
QDR --> MEM0J
QDR --> KNOW
QDR --> SESS
QDR --> HIVE_Q
CC -. "INTENDED: POST localhost:9000/add\nACTUAL: ABSENT\n0 callers in hooks/tools/daemons" .-> MEM0
DISCOVER -. "INTENDED: query mem0 for personal facts\nACTUAL: ABSENT\ndiscover.js does not call localhost:9000" .-> MEM0
CHROMA -. "writer UNKNOWN\nreader UNKNOWN\n6584 embeddings orphaned" .-> CHROMA
RAG_ROUTER -- "learn" --> FLYWHEEL
RAG_ROUTER -- "query cache-hit" --> FLYWHEEL
style RDW fill:#ffcccc,stroke:#cc0000
style IQ fill:#ffcccc,stroke:#cc0000
style MEM0 fill:#fff0cc,stroke:#cc8800
style MEM0J fill:#ffcccc,stroke:#cc0000
style KNOW fill:#ffcccc,stroke:#cc0000
style CHROMA fill:#ffcccc,stroke:#cc0000
style SESS fill:#fff0cc,stroke:#cc8800

Key findings:
- The LightRAG local write path (Claude Code → .md → hook → LightRAG) works, but the queue-drain path (946 items from BookStack, MC logs, and fsevents) is completely deadlocked because rag-drain-worker cannot authenticate through Cloudflare Access (Vaultwarden ETIMEDOUT).
- mem0 is a ghost: the server is alive and Qdrant holds 93K+ vectors across all collections (only 865 of them in mem0_john), but there are zero active writers and zero active readers through the mem0 API.
- Chroma is a full orphan: 6,584 embeddings from an unknown writer, no identified reader.
- The Qdrant hivemind collection (60K+ vectors) is live because hivemind.js writes to it directly, bypassing the mem0 API entirely. This is the only healthy Qdrant write path.
Section C — Agent / Persona / Chain Plane flowchart TD SMJ["specialist-mapping.json\n~/system/agents/specialist-mapping.json\n29 mapped agents\n9 registered companies\nSOURCE OF TRUTH (incomplete)"] CLAUDE_AGENTS["~/.claude/agents/\n66 .md files\nRUNTIME STORE\n(what Claude Code can dispatch)"] DEFINITIONS["~/system/agents/definitions/\nBACKUP STORE\n48 synced + 8 definitions-only"] SYNC["~/bin/agent-definitions-sync.sh\nMANUAL — not scheduled"] PERSONAS["~/system/agents/personas/\n12 persona dirs"] P_REAL["8 Routable Companies\nAgentForge, CodeCraft, Finverge\nFlowForge, Proveo, Securion\nSkybound, Vizu\n(partial mapping only)"] P_PHANTOM["4 Phantom Companies\nAxiom, Datavera, Resolver, Lexicon\nFull persona dirs, CLAUDE.md, agents/\n0 entries in specialist-mapping.json\nDispatch path = NONE via John routing"] CHAINS["~/system/agents/chains/\n35 .yaml files\nNO chain runner exists\nall DEAD as executable automation"] MAPPED_OK["24 mapped agents\nreachable on disk\nCAN be dispatched"] MAPPED_MISSING["7 mapped agents\nIN specialist-mapping.json\nMISSING from ~/.claude/agents/\ndispatches SILENTLY FAIL\n(dorota-huizinga, hadi-hariri\njames-bach, lee-robinson\nlisa-crispin, minion\nanthropicchief-architect=fully phantom)"] UNMAPPED_CRITICAL["Critical unmapped agents\nIN ~/.claude/agents/\nNOT in specialist-mapping.json:\n- validator (44 skill refs)\n- distiller (21 chain refs)\n- mehanik (7 skill refs)\n- evidence-verifier\n- baseline-comparator\n- dzevad-jahic (Lexicon)\n- planner (phantom — in chains only)"] UNMAPPED_ORPHAN["11 Orphan agents\nno chain/skill/daemon refs:\n0.md, dr-sarah-chen, Explore\nhelixsupport, indy-dandev\nmaria-santos, meta-agent\nPlan, rag-builder\nredzo-reviewer, thaer-sabri"] SMJ --> CLAUDE_AGENTS SMJ -. 
"7 mapped agents\nnot on disk = UNREACHABLE" .-> MAPPED_MISSING CLAUDE_AGENTS --> MAPPED_OK CLAUDE_AGENTS --> UNMAPPED_CRITICAL CLAUDE_AGENTS --> UNMAPPED_ORPHAN DEFINITIONS -- "manual sync\n(agent-definitions-sync.sh)" --> CLAUDE_AGENTS SYNC -. "not scheduled\ndrift pressure continuous" .-> DEFINITIONS PERSONAS --> P_REAL PERSONAS --> P_PHANTOM P_PHANTOM -. "no routing entry\ndirect session name-drop only\nundocumented and unreliable" .-> CLAUDE_AGENTS P_REAL --> SMJ CHAINS -. "NO EXECUTOR\n35 YAML files are docs only\nSkills call agents inline\nnot via chain runner" .-> CLAUDE_AGENTS style MAPPED_MISSING fill:#ffcccc,stroke:#cc0000 style P_PHANTOM fill:#fff0cc,stroke:#cc8800 style CHAINS fill:#ffcccc,stroke:#cc0000 style UNMAPPED_CRITICAL fill:#fff0cc,stroke:#cc8800 Key findings: specialist-mapping.json covers only 29 of 66 agents (44%). The two highest-usage agents system-wide — validator (44 skill file refs) and distiller (21 chain refs) — are completely absent from the routing table. 7 agents are mapped (John thinks he can dispatch them) but physically missing from ~/.claude/agents/ . Any dispatch attempt silently fails. 35 chain YAML files have no executor. They exist as documentation only — skills invoke agents inline and ignore chain files entirely. 4 phantom companies (Axiom, Datavera, Resolver, Lexicon) have full organizational infrastructure on disk but are completely invisible to John's routing system. Section D — The True Picture (CEO-readable, 60 seconds) Plan vs. Reality The architecture diagram on paper shows: CEO gives task → John gates it through Mehanik → pi-orchestrator dispatches specialists → work gets done → verifier autonomously checks it → mc.js closes the loop. The actual flow is: CEO gives task → John manually dispatches a specialist in the current conversation → specialist builds → John manually verifies (or CEO does) → John manually calls mc.js done. 
Every automatic layer between "task received" and "task closed" is either in mock mode, deadlocked, or simply absent. The 3 Fattest Dead Edges Dead Edge 1 — pi-orchestrator in MOCK MODE. The orchestration kernel (PID 75750) is alive and cycling every 30 seconds. It reads alai-config-mock.json . Planka/MC API at localhost:3100 is not listening. The kernel prints "No eligible tasks" and does nothing. Every task that should flow automatically through the factory instead requires John to manually dispatch via conversation. This is the single edge whose repair would convert the factory from "manual assembly" to "automated pipeline." Dead Edge 2 — RAG drain-worker deadlocked (946 items queued, 0 drained). Three adapters (BookStack, MC logs, filesystem events) successfully enqueue documents into ingest-queue.sqlite . The drain-worker (PID 3640) picks them up and tries to POST to LightRAG through Cloudflare Access — but Vaultwarden times out, so CF credentials cannot be fetched. The entire 946-item queue has been frozen. Meanwhile, the fsevents adapter is watching for filesystem changes and trying to enqueue lightrag-monitor health files — creating a feedback loop where the monitoring system feeds into the broken pipeline it is monitoring. One credential fix (valid /tmp/bw-session + reachable Vaultwarden) unblocks all three adapters simultaneously. Dead Edge 3 — Verifier auto-invocation ABSENT. The verify-fix-loop skill and its verifier + fix-builder agents are fully specified and internally correct. There is zero wiring to any automated trigger. No hook, no daemon, no pi-orchestrator code calls them. When mc.js ready fires, no verification agent is invoked. CEO is the de-facto quality gate for the entire factory. One wiring point in /task-postflight SKILL.md (Section 2b) would give autonomous verification for non-high-stakes tasks immediately, without new infrastructure. 
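Dead Edge 2 rests on a queue count, and the executive summary shows how far a stale metric file can drift from the live database. The sketch below reads the count straight from SQLite and compares it with the last recorded metric. It is a minimal illustration under stated assumptions: the table name `ingest_queue` and its `status` column are hypothetical (the real schema of ingest-queue.sqlite is not shown in this report), and the 500-item limit is taken from the backpressure note in the edge inventory.

```python
import sqlite3

BACKPRESSURE_LIMIT = 500  # threshold quoted in the edge inventory ("946 > 500")


def queued_count(conn, table="ingest_queue", status="queued"):
    """Count rows still waiting to be drained (table and column names are assumed)."""
    sql = f"SELECT COUNT(*) FROM {table} WHERE status = ?"
    return conn.execute(sql, (status,)).fetchone()[0]


def queue_report(conn, metric_value, limit=BACKPRESSURE_LIMIT):
    """Compare the live count with the last value written to the metric file."""
    live = queued_count(conn)
    return {
        "live": live,
        "metric": metric_value,
        "drift": live - metric_value,
        "backpressure": live > limit,
    }


def demo():
    """Build an in-memory stand-in for the queue and report on it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ingest_queue (id INTEGER PRIMARY KEY, status TEXT)")
    rows = [("queued",)] * 946 + [("done",)] * 10
    conn.executemany("INSERT INTO ingest_queue (status) VALUES (?)", rows)
    return queue_report(conn, metric_value=454)
```

Calling `demo()` uses the 946-item figure from this diagram report and the stale 454 metric from the executive summary; a non-zero `drift` is the signal that the metric file can no longer be trusted.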
The 3 Highest-Leverage Wire Fixes Fix 1 — Restore pi-orchestrator real config (L fix, maximum leverage). Determine why alai-config-mock.json loads instead of real config. If Planka is intentionally offline, restore it or point the orchestrator at the real MC API endpoint. This single fix converts the factory from "John as human dispatcher" to "automated task routing." Impact: every other automation layer (specialist dispatch, postflight, cost tracking) becomes meaningful instead of idle. Fix 2 — Fix rag-drain-worker CF credentials (S fix, unblocks 946-item queue). Ensure Vaultwarden is reachable and /tmp/bw-session is valid for the service token that holds the LightRAG CF Access credentials. This is estimated as a 30-minute fix (refresh session token + verify vault connectivity). Impact: 946 queued RAG items drain, BookStack sync resumes, MC outcome logging resumes, the circular monitoring feedback loop breaks. Fix 3 — Wire verify-fix-loop into /task-postflight (M fix, eliminates CEO-as-verifier bottleneck). Add a Section 2b to ~/.claude/skills/task-postflight/SKILL.md : after Proveo passes AC checklist, dispatch /verify-fix-loop for docs / system / refactor / polish domain tasks (MAX_LOOPS=3, $5 cap already defined in the skill). This requires no new infrastructure — the skill conversation context already supports Task dispatch. Impact: CEO is removed from the quality loop for the majority of non-high-stakes tasks. 
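The control flow that Fix 3 describes (domain gating, MAX_LOOPS=3, a $5 cap) can be sketched as a bounded loop. This is not the skill's actual implementation, which is not reproduced in this report: `verify` and `fix` are hypothetical callables standing in for the verifier and fix-builder agents, and the per-call cost accounting is an assumption made for illustration.

```python
VERIFY_DOMAINS = {"docs", "system", "refactor", "polish"}  # domains named in Fix 3
MAX_LOOPS = 3
COST_CAP_USD = 5.0


def run_verify_fix_loop(task, verify, fix, max_loops=MAX_LOOPS, cost_cap=COST_CAP_USD):
    """Run verify/fix rounds until a pass, the loop limit, or the cost cap.

    verify(task) -> (passed: bool, cost_usd: float)   # hypothetical signature
    fix(task)    -> cost_usd: float                   # hypothetical signature
    """
    if task["domain"] not in VERIFY_DOMAINS:
        return {"status": "skipped", "loops": 0, "cost": 0.0}

    spent = 0.0
    for attempt in range(1, max_loops + 1):
        # Stop before starting a round if the budget is already used up.
        if spent >= cost_cap:
            return {"status": "cost_cap", "loops": attempt - 1, "cost": spent}
        passed, cost = verify(task)
        spent += cost
        if passed:
            return {"status": "pass", "loops": attempt, "cost": spent}
        # Only attempt a fix if another verification round will follow.
        if attempt < max_loops:
            spent += fix(task)
    return {"status": "max_loops", "loops": max_loops, "cost": spent}


def make_verifier(pass_on_call, cost=0.5):
    """Build a stub verifier that passes on its N-th call (None = never passes)."""
    calls = {"n": 0}

    def verify(_task):
        calls["n"] += 1
        return (pass_on_call is not None and calls["n"] >= pass_on_call, cost)

    return verify
```

Returning a status instead of raising keeps the decision about escalation (for example, handing a `max_loops` or `cost_cap` result back to a human reviewer) with the caller.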
Section E — Edge Inventory Table

| # | From | To | Transport | Status | Evidence | Fix Size |
|---|---|---|---|---|---|---|
| 1 | CEO | John (orchestrator) | CLI conversation | LIVE | Observed every session | — |
| 2 | John | /mehanik gate | Task dispatch / CLI | LIVE | 113 cleared tokens in /tmp | — |
| 3 | /mehanik gate | Blueprint read | Read tool call | PARTIAL | CB#2 enforced; WARN scores (65/80) pass; missing-MC-ID bypasses gate entirely (2.3) | S |
| 4 | /mehanik gate | /prompt-forge | CLI / Task dispatch | LIVE | Observed in token chain | — |
| 5 | /prompt-forge | pi-orchestrator | mc.js write / Task | PARTIAL | pi-orch alive but MOCK MODE (1.4) | L |
| 6 | pi-orchestrator | Specialist agent | Task dispatch | DEAD | MOCK MODE — "No eligible tasks" every 30s; Planka localhost:3100 not listening (1.4) | L |
| 7 | John (manual) | Specialist agent | Task dispatch (CLI) | LIVE | Observed — this is the actual dispatch path | — |
| 8 | Specialist agent | Tools (Read/Edit/Bash) | Tool API calls | LIVE | 250 live tools verified (1.2) | — |
| 9 | Tools | Artifact (file/code) | Write / Edit | LIVE | Standard Claude Code behavior | — |
| 10 | Artifact | mc.js ready | mc.js write + hook | LIVE | mc-ready-gate.sh fires; 12 active hooks (2.2) | — |
| 11 | mc.js ready | verifier / verify-fix-loop | (absent) | DEAD | 0 hooks, 0 daemons, 0 pi-orch code calls verify-fix-loop (2.2) | M |
| 12 | mc.js ready | /task-postflight | Manual CLI invocation | PARTIAL | H-tasks only; manual trigger; no auto-invocation (2.2) | M |
| 13 | /task-postflight | Proveo / Angie Jones | Task dispatch | LIVE | Skill dispatches angie-jones.md; present on disk (2.2) | — |
| 14 | Proveo | mc.js done | mc.js write | LIVE | AC checklist → done path works | — |
| 15 | mc.js done | pi-orchestrator (next task) | mc.js event / API | DEAD | MOCK MODE — pi-orch does not react to done events (1.4) | L |
| 16 | Claude Code built-in | .md memory files | Native write | LIVE | 123 files, auto-written by Claude Code (1.1) | — |
| 17 | .md memory files | lightrag-auto-ingest.sh | PostToolUse hook trigger | LIVE | Hook fires on Write/Edit to in-scope paths (1.1) | — |
| 18 | lightrag-auto-ingest.sh | LightRAG localhost:9621 | curl POST | LIVE | 999 docs indexed; pipeline_busy=true (1.1) | — |
| 19 | discover.js | LightRAG (external) | HTTPS GET to lightrag.alai.no | LIVE | External hostname via Caddy proxy (1.1) | — |
| 20 | rag-bookstack-adapter | ingest-queue.sqlite | SQLite write | DEAD | Exit 256 — backpressure gate (946 > 500) from frozen drain-worker (1.4) | S |
| 21 | rag-mc-adapter | ingest-queue.sqlite | SQLite write | DEAD | Exit 256 — same backpressure cascade (1.4) | S |
| 22 | rag-fsevents-adapter | ingest-queue.sqlite | SQLite write / WatchPaths | DEAD | Exit 1 — blocked by backpressure; also feeding monitoring artifacts into queue (1.4) | S |
| 23 | rag-drain-worker | LightRAG (via CF Access) | HTTPS POST (authenticated) | DEAD | Vaultwarden ETIMEDOUT — CF credentials unavailable; 946 items queued, 0 drained (1.4) | S |
| 24 | Any tool/hook/daemon | mem0 API localhost:9000 | HTTP POST | DEAD | 0 callers found in all of ~/system/tools, ~/.claude/hooks, ~/system/daemons (1.1) | M |
| 25 | discover.js | mem0 API | HTTP GET | DEAD | discover.js does not query localhost:9000 (1.1) | M |
| 26 | mem0 API | Qdrant mem0_john collection | gRPC / HTTP | PARTIAL | Server healthy; mem0_john has 865 stale vectors; no active writer to keep them fresh (1.1) | M |
| 27 | hivemind.js | HiveDB SQLite | SQLite write | LIVE | 17,551 intel rows; write path active (1.1) | — |
| 28 | hivemind.js | Qdrant hivemind collection | HTTP (qdrant-client) | LIVE | 60,442 vectors; dual-write best-effort (1.1) | — |
| 29 | Chroma store | Any consumer | (unknown) | DEAD | 6,584 embeddings, no traced writer or reader (1.1) | M |
| 30 | agent-definitions-sync.sh | ~/.claude/agents/ | file copy | PARTIAL | 48 files synced; 8 definitions-only agents unreachable at runtime; sync not scheduled (1.3) | S |
| 31 | specialist-mapping.json | Dispatch routing | JSON lookup | PARTIAL | 29/66 agents mapped; validator (44 refs) and distiller (21 refs) absent; 7 mapped agents missing from disk (1.3) | M |
| 32 | 35 chain YAML files | chain runner / executor | (absent) | DEAD | No chain runner exists; skills call agents inline; chains are documentation only (1.3) | L |
| 33 | John routing | Axiom/Datavera/Resolver/Lexicon | discover.js lookup | DEAD | 4 companies absent from specialist-mapping.json; routing impossible via normal path (1.3) | M |
| 34 | pi-orch-health monitor | pi-orchestrator health signal | shell script | DEAD | pi-orch-health.sh deleted; last verdict 2026-05-06 CRITICAL; dark since (1.4) | S |
| 35 | cost-daily-report daemon | daily cost visibility | shell script | DEAD | cost-daily-report.sh deleted; cost reporting dark since 2026-04-29 — 10 days (1.4) | S |
| 36 | mc-ready-gate.sh | Blueprint score enforcement | blueprint-check.js | PARTIAL | Check runs; WARN scores (65, 80) allow dispatch; threshold 90 is advisory only (2.3) | S |
| 37 | Mehanik | Session binding validation | token mehanik_session_id | DEAD | All 113 inspected tokens show mehanik_session_id: unknown; cross-session reuse possible (2.3) | S |
| 38 | b2-offsite-backup | B2 cloud storage | B2 API | DEAD | 403 storage_cap_exceeded; nightly snapshots not landing (1.4) | S |
| 39 | litestream | B2 replication stream | B2 API | PARTIAL | Litestream PID alive; separate nightly job fails; live replication status uncertain (1.4) | S |
| 40 | slack-bot | Slack WebSocket | Socket Mode | PARTIAL | PID 18046 alive; last crash exit 1; 300min silent at audit time; reconnects on timeout (1.4) | S |

Status key:
- LIVE: flow confirmed working by tool-verified evidence
- DEAD: flow confirmed broken or absent by tool-verified evidence
- PARTIAL: flow structurally exists but has gaps, bypass paths, or degraded throughput

Fix size:
- S (Small): under 4 hours, single-file or credential change
- M (Medium): 1–2 days, new wiring or multi-file coordination
- L (Large): 3+ days, architectural change or multi-system coordination

Summary Statistics

| Category | Count |
|---|---|
| Total edges inventoried | 40 |
| LIVE | 15 |
| DEAD | 16 |
| PARTIAL | 9 |
| Edges repairable with S fix | 10 |
| Edges repairable with M fix | 8 |
| Edges repairable with L fix | 3 |

The factory has a 37.5% live edge rate (15 of 40). The remaining 62.5% of advertised flows are either dead or degraded. The 3 L-fixes (pi-orchestrator mock mode, chain runner, verifier auto-invocation architecture) unblock the most downstream flows if resolved.
The 10 S-fixes are individually cheap and collectively close significant operational blind spots (cost reporting, RAG drain, blueprint score enforcement, monitoring, B2 backup). Inventory: Memory Plane Memory Plane Inventory — AI Factory Audit Date: 2026-05-09 Auditor: Chip Huyen (AgentForge) Scope: Read-only probe. No mutations. Task: Plan Task 1.1 — Memory Plane Inventory 1. Per-Store Table Store Endpoint / Path Schema / Collections Live Count Write Path Read Path Owner Daemon Status mem0 / Qdrant http://localhost:9000 (mem0 API) / http://localhost:6333 (Qdrant gRPC+HTTP) 5 collections: mem0migrations (0 pts), sessions (929 pts), hivemind (60,442 pts), mem0_john (865 pts), knowledge (31,274 pts) 93,510 total vectors No caller found. mem0 API ( POST /add ) is NEVER called by any hook, tool, or daemon in ~/system/tools/ or ~/.claude/hooks/ . hivemind.js dual-writes to Qdrant hivemind collection directly via internal HTTP (port 6333). No tool reads localhost:9000 for queries. hivemind.js semantic search reads Qdrant hivemind collection directly via qdrant-client . discover.js does NOT query mem0. com.alai.mem0-server (LaunchAgent, KeepAlive=true, PID 65706 alive, last exit was SIGTERM -15) HEALTHY (server alive, but ORPHANED — no producer writes to mem0_john or knowledge via the mem0 API) Chroma ~/.claude-mem/chroma/chroma.sqlite3 1 collection: cm__claude-mem 6,584 embeddings Unknown — no daemon or hook references claude-mem path in scanned tools. Likely written by a claude-mem MCP server or CLI tool directly. Unknown — no caller found in ~/system/tools/ or ~/.claude/hooks/ . 
None identified PARTIAL (data exists, producer and consumer both untraced) LightRAG http://localhost:9621 Neo4J graph + NanoVectorDB + JsonKV storage; workspace /app/data 999 processed docs, 1 failed (pipeline_busy=true, 120 async locks pending — actively ingesting) ~/.claude/hooks/lightrag-auto-ingest.sh (PostToolUse: Write/Edit) — fires on writes to ~/.claude/projects/-Users-makinja/memory/*.md , ~/system/specs/*.md , and /tmp/*-bookstack-*.md . Also com.alai.lightrag-outbox-ingest.plist daemon. discover.js — primary read path. Queries https://lightrag.alai.no/query (external hostname, not localhost). Fallback: if local hits < 3, LightRAG fallback fires. com.alai.lightrag-watchdog.plist , com.alai.lightrag-keepwarm.plist , com.alai.lightrag-backup.plist , com.john.lightrag-monitor.plist , com.alai.lightrag-migrate-pump.plist HEALTHY (serving, ingesting) HiveDB (SQLite) ~/system/agents/hivemind/hivemind.db 7 tables: agents (139 rows), memos (100 rows), intel (17,551 rows), subscriptions (6 rows), _litestream_seq , _litestream_lock , sqlite_sequence 17,551 intel rows (NOTE: context memo said 64,889 — live probe shows 17,551; delta likely from live deletions or memo was stale) hivemind.js post — agents call this CLI to write intel. Also dual-writes embeddings to Qdrant hivemind collection (best-effort, fire-and-forget). hivemind.js read/query/search — text search + semantic search (cosine sim against local embeddings or Qdrant). discover.js does NOT query HiveDB directly. hivemind.js (stateless CLI, no daemon; called ad-hoc by agents) HEALTHY .md auto-memory ~/.claude/projects/-Users-makinja/memory/ 123 .md files (MEMORY.md index + per-topic files + feedback memos + _archive/) 123 files Claude Code's built-in auto-memory system (native Claude Code feature — writes .md files after conversations automatically, not via any explicit hook or daemon). lightrag-auto-ingest.sh PostToolUse hook then ingests these into LightRAG when they are written/edited. 
CLAUDE.md "Context Loading" section instructs John to Read specific files directly. discover.js memory "" is documented as LightRAG-backed (reads LightRAG, not the .md files directly). Built-in Claude Code (no external daemon) HEALTHY (write path functional; read path partially bypassed — LightRAG index only 999 docs, not all 123 .md files confirmed ingested) 2. Producer → Consumer Matrix Producer Store Written Consumer Notes Claude Code built-in auto-memory ~/.claude/projects/-Users-makinja/memory/*.md (123 files) lightrag-auto-ingest.sh hook (secondary producer → LightRAG) Auto-memory is Claude Code native. The .md write triggers the hook. lightrag-auto-ingest.sh (PostToolUse hook) LightRAG http://localhost:9621 discover.js (primary RAG consumer) Only fires on Write/Edit tool calls to in-scope paths. Does NOT write to mem0. com.alai.lightrag-outbox-ingest.plist daemon LightRAG discover.js Batch ingest pipeline for outbox staging hivemind.js post (called by agent tools) HiveDB SQLite hivemind.db + Qdrant hivemind collection (dual-write) hivemind.js read/query/search (CLI) Qdrant hivemind = 60,442 vectors; SQLite intel = 17,551 rows — divergence suggests Qdrant has historical vectors beyond current SQLite rows (possibly from bulk migration) NOBODY mem0 API ( localhost:9000/add ) — mem0_john collection (865 pts), knowledge collection (31,274 pts) NOBODY reads via mem0 API either WIRE BREAK: mem0_john has 865 facts that were presumably written at some point (possibly during initial mem0 setup / manual population), but no current tool, hook, daemon, or agent calls POST localhost:9000 . The mem0 API is a running server with no active clients. NOBODY identified Chroma ~/.claude-mem/chroma/ (6,584 embeddings) NOBODY identified Chroma has data (6,584 embeddings in cm__claude-mem ) but producer and consumer are both untraced in current tooling. Likely written by a claude-mem MCP tool in a previous iteration. 
com.john.session-archiver.plist Likely sessions Qdrant collection (929 pts) discover.js --sessions (reads sessions SQLite, not Qdrant) Sessions exist in Qdrant but discover.js reads from a local SQLite sessions table, not via mem0 or Qdrant API rag-router.js learn ~/system/databases/flywheel.db (SQLite: interactions + rag_cache) rag-router.js query (cache-hit path) Sixth store — flywheel SQLite, not listed in original inventory. Routes: cache → local Ollama → external. Does not touch mem0. 3. SoR Gap Analysis — Duplicated Fact Classes Fact Class Stores Containing It Designated SoR Derivative / Shadow Gap / Conflict Agent intel / decisions HiveDB intel table (17,551 rows) + Qdrant hivemind collection (60,442 vectors) HiveDB SQLite (primary; hivemind.js writes here first) Qdrant hivemind (dual-write, best-effort) 60,442 Qdrant vectors vs 17,551 SQLite rows = 3.4x divergence . Qdrant likely contains orphaned vectors from deleted/purged SQLite rows, or a bulk historical migration that wasn't reflected in SQLite. No reconciliation daemon exists. Session summaries / history Qdrant sessions (929 pts) + likely local session SQLite (referenced by discover.js ) + .md memory files (MEMORY.md index) Undefined — no explicit SoR designation All three are partial discover.js --sessions reads SQLite, not Qdrant sessions . Who writes Qdrant sessions ? Untraced. John's personal facts / preferences mem0 mem0_john collection (865 vectors) + .md auto-memory files (123 files) + LightRAG (999 docs, subset overlapping .md files) Intended SoR: mem0 ( mem0_john ) — but NO active writer. Actual SoR: .md files (Claude Code writes here). LightRAG is downstream derivative of .md files via lightrag-auto-ingest.sh Critical SoR conflict : 865 facts in mem0 are STALE (last written at setup, no ongoing writes). 123 .md files are current. LightRAG is a partial index of .md files. Three stores claim the same fact class with no reconciliation. 
Knowledge base / operational docs mem0 knowledge collection (31,274 vectors) + LightRAG (999 docs, BookStack exports) + Chroma (6,584 embeddings) Undefined All three parallel knowledge collection in mem0 has 31,274 vectors — largest in mem0, but again no active writer via mem0 API. Origin unknown. Chroma cm__claude-mem (6,584) is also an orphan with no identified current writer or reader. HiveMind broadcast intel HiveDB hivemind Qdrant collection (60,442) + HiveDB SQLite intel (17,551) HiveDB SQLite is the write authority Qdrant hivemind is derivative (dual-write from hivemind.js ) No hivemind HTTP API exists (confirmed: port 3001 is Drop API). Qdrant hivemind is only queryable via hivemind.js semantic search CLI, not accessible to other tools. 4. Critical: The .md vs mem0 Wire Break What was supposed to happen The architecture assumes mem0 ( http://localhost:9000 ) is the structured personal memory SoR for John. The mem0_john collection exists with 865 facts. The sessions collection has 929 entries. The server is alive and healthy. What actually happens Step 1 — .md files are written by Claude Code natively. Claude Code has a built-in auto-memory feature that writes conversation summaries and facts as .md files into ~/.claude/projects/-Users-makinja/memory/ . This is NOT a hook or daemon — it is a built-in Claude Code behavior. No line of code in ~/system/ controls this write. Step 2 — lightrag-auto-ingest.sh hooks into the .md write. File: ~/.claude/hooks/lightrag-auto-ingest.sh (PostToolUse on Write/Edit). This hook detects when a .md file is written to ~/.claude/projects/-Users-makinja/memory/*.md and fires a background curl POST to LightRAG ( http://localhost:9621/documents/text ). This is the ONLY downstream pipeline from .md files. Step 3 — mem0 API is never called. 
Grep across all of:
- ~/system/tools/*.js: 0 files call localhost:9000
- ~/.claude/hooks/*.sh: 0 files call localhost:9000
- ~/system/daemons/: not scanned exhaustively, but the mem0-server plist confirms it is only a server, not a writer
- pi-orchestrator.js: the one hit for localhost:9000 is SonarQube (port 9000 collision), not mem0

The exact wire break: No POST http://localhost:9000/add call was found anywhere in the scanned parts of the active system. The mem0 server was built and populated (865 facts in mem0_john, 31,274 in knowledge) at some point, likely during initial setup or a one-time migration, but the "auto-write to mem0" integration was never wired into the live pipeline. The lightrag-auto-ingest.sh hook was written instead, routing .md → LightRAG and leaving mem0 as a relic with stale data that is neither written nor read.

CEO complaint root cause confirmed: "implementation is not ideal — memory writes to .md files instead of mem0" is accurate. The intended SoR (mem0) has no active producer. The actual write path is: Claude Code → .md files → lightrag-auto-ingest.sh → LightRAG. mem0 is running, healthy, and populated with 865 + 31,274 stale vectors that nobody reads.

HiveDB relationship

HiveDB (hivemind.db) is a SEPARATE concern from personal memory. It is the agent broadcast / intel bus, not John's fact store. However, the Qdrant hivemind collection (60,442 vectors) lives in the same Qdrant instance as mem0_john, creating the appearance of a unified store when it is actually two separate logical systems sharing infrastructure.

5. Store Status Summary

| Store | Healthy? | Active Producer? | Active Consumer? | Data Fresh? |
|---|---|---|---|---|
| mem0 / Qdrant mem0_john | Yes | NO | NO | NO (865 facts, stale) |
| mem0 / Qdrant knowledge | Yes | NO | NO | NO (31,274 vectors, stale) |
| mem0 / Qdrant sessions | Yes | Unknown | NO | Unknown |
| mem0 / Qdrant hivemind | Yes | Yes (hivemind.js dual-write) | Yes (hivemind.js semantic search) | YES |
| HiveDB SQLite | Yes | Yes (hivemind.js CLI) | Yes (hivemind.js CLI) | YES (17,551 rows) |
| LightRAG | Yes | Yes (lightrag-auto-ingest.sh hook + outbox daemon) | Yes (discover.js) | YES (999 docs, pipeline busy) |
| Chroma | Yes (file exists) | UNKNOWN | UNKNOWN | Unknown origin |
| .md auto-memory | Yes | Yes (Claude Code native) | Partial (direct Read + LightRAG index) | YES (123 files) |
| Flywheel SQLite | Presumed yes | Yes (rag-router.js learn) | Yes (rag-router.js query) | Unknown |

Open Questions

- Chroma write/read path: Who wrote 6,584 embeddings to ~/.claude-mem/chroma/cm__claude-mem? Which tool or MCP server reads from it? The claude-mem MCP is referenced in settings but not found in scanned tool code. Needs: grep -r "claude-mem\|chroma" ~/.claude/settings.json and an MCP server registry audit.
- Qdrant sessions writer: Who writes 929 session vectors to the sessions Qdrant collection? com.john.session-archiver.plist is a candidate, but the script path was not read. Needs: cat ~/Library/LaunchAgents/com.john.session-archiver.plist + script inspection.
- Qdrant knowledge origin: 31,274 vectors in knowledge. When were they written, and from what source? No active writer found. Possible: one-time BookStack bulk ingest or a migration. Check ~/system/mem0/server.py for any bulk-load routines at startup.
- HiveDB vector divergence: 60,442 Qdrant vectors vs 17,551 SQLite intel rows. Are the extra ~43K vectors orphaned (deleted SQLite rows without Qdrant cleanup), or does Qdrant have independent content? Needs: sample Qdrant payload IDs vs SQLite id column cross-check.
- LightRAG external hostname: discover.js queries https://lightrag.alai.no/query (external URL from config), not http://localhost:9621.
Is there a Caddy/Cloudflare proxy routing lightrag.alai.no → localhost:9621 ? If that proxy is down, discover.js would silently fail to read from LightRAG despite the local container being healthy. mem0_john 865 facts provenance : When were these written? Is there a one-time ingestion script (e.g., ~/system/mem0/populate.py or similar)? If the facts are high-quality (personal preferences, CEO directives), they are the most actionable store to re-wire as the active SoR. rag-router.js flywheel.db size and health : Not probed live. Needs sqlite3 ~/system/databases/flywheel.db "SELECT count(*) FROM interactions; SELECT count(*) FROM rag_cache;" . mem0 server.py — does it expose /add or /search routes? : Confirmed health endpoint works. Need to verify actual API surface to confirm if a PostToolUse hook calling POST localhost:9000/add would work as-is without code changes to mem0. Inventory: Tools Shed Tools Shed Audit — 2026-05-09 Audit Scope: ~/system/tools/ (443 files on disk) Manifest Version: ~/system/tools/manifest-index.md (282 rows, last update 2026-04) Audit Date: 2026-05-09 Auditor: John (Explore Agent, read-only) Summary Classification Count Pct LIVE (referenced in daemons/agents/skills/chains) ~250 56.4% .BAK / .pre- / .deployed * 50 11.3% JUNK (malformed name, 0-byte, JSON-as-filename) 3 0.7% DEAD-CODE (no caller, not in manifest LIVE list) ~100 22.6% UNCLASSIFIED (catalog gaps, unclear status) ~40 9.0% Total Disk Space: 502 MB (dominated by .venv/ + subdirectory trees) 1. Total Counts by Classification Live Tools (ACTIVE status in manifest or active daemon references) Count: ~250 tools Source: manifest-index.md lists 201 ACTIVE entries (pre-2026-04), plus ~49 tools in daemons/ that were added post-manifest update. 
Top-tier LIVE tools (by size):
- mc.js (250 KB): Mission Control CLI, last modified 2026-05-08 ✓ CURRENT
- mc-dashboard.js (170 KB): dashboard, last modified 2026-04-06
- manifest.md (94 KB): full manifest (separate from manifest-index.md)
- auto-report.js (51 KB): daily/weekly report generator
- slack-bot.js (49 KB): Slack daemon
- invoice-generator.js (48 KB): invoice CRUD
- event-handlers.js (46 KB): event dispatch
- mail-native.js (40 KB): IMAP/SMTP fallback

Backup Files (.bak*, .pre-*, .deployed)

Count: 50 files

Location Clusters:
- _archive/2026-04/: 20 files (manifest.md, mc.js, qa-19.js, event-handlers.js, comms-responder.js variants, kimi-*, youtube-learning, slack-bot.js variants, rag-context-for-builder.js, resource-governor.js)
- Root level: 30 files (autocoder.js.pre-azure-cutover-20260419, lightrag*.pre-azure-cutover, mc.js.bak-* variants, comms-*, council-*, mini-da, ollama-*, prompt-tester, rag-*, retrieval-orchestrator.pre-*, system-regression.pre-*, transcript-*, vector-*)

Age Analysis (sample):
- Mar 07–14, 2026 (52+ days old): the oldest, e.g. resource-governor.js.bak, kimi-server.sh.bak, kimi-monitor.js.bak
- Apr 02, 2026 (37 days old): mc.js.bak-aaos-20260402
- Apr 10–20, 2026 (19–29 days old): most common, the pre-azure-cutover-* batch (highest density)
- Apr 30, 2026 (9 days old): bulk-dated backup cluster (appears to be an organized archive pass)

Apart from the Apr 30 cluster, every sampled backup is more than 14 days old. The Apr 30 timestamp appears to come from the archive pass itself rather than from when those backups were taken. On that reading, all 50 files are safe for archival per planning assumptions.
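The age analysis above can be reproduced with a short script. What follows is a minimal sketch, not the audit's actual tooling: the function names `is_backup` and `stale_backups` are hypothetical, and the suffix pattern is an assumption derived from the section heading (.bak*, .pre-*, .deployed). It relies on file modification time, which, as the Apr 30 cluster shows, can reflect an archive move and not the date the backup was taken.

```python
import re
import time
from pathlib import Path
from typing import Optional

# Assumed suffix conventions, taken from the heading: .bak*, .pre-*, .deployed
BACKUP_SUFFIX = re.compile(r"\.(bak([.\-_].*)?|pre-.+|deployed)$")


def is_backup(name: str) -> bool:
    """Return True when a filename ends in one of the assumed backup suffixes."""
    return BACKUP_SUFFIX.search(name) is not None


def stale_backups(root: Path, min_age_days: float, now: Optional[float] = None):
    """List (path, age_in_days) for backup files older than min_age_days.

    Age is measured from the file's mtime, so a file moved or re-written
    during an archive pass will look younger than the backup really is.
    """
    now = time.time() if now is None else now
    results = []
    for path in root.rglob("*"):
        if path.is_file() and is_backup(path.name):
            age_days = (now - path.stat().st_mtime) / 86400
            if age_days > min_age_days:
                results.append((path, age_days))
    # Oldest first, which makes the earliest batch easy to spot.
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Calling `stale_backups(Path.home() / "system" / "tools", 14)` would list candidates for the archive sweep. For files whose names embed a date (for example mc.js.bak-aaos-20260402), parsing that date may be a more reliable age signal than mtime.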
Junk Findings 3 malformed/suspect filenames identified: Credential-bearing JSON-as-filename artifact (0 bytes) Created: 2026-02-24 06:39 Issue: LITERAL JSON object with test credentials embedded as filename SECURITY RISK: Credentials (passwords, tokens, keys) encoded in filesystem path Source: Appears to be tool output-capture error (shell process writing object serialization instead of text) Recommendation: DELETE immediately + audit all tools for output-capture leaks + add alai-hooks gate .alai/context-index.db-wal (inside tools/) Zero-byte WAL journal file Not a proper tool — appears to be SQLite write-ahead log (orphaned) Recommendation: DELETE alai-hooks/.gradle/ subdirectories Gradle cache files (0-byte metadata: gc.properties, REQUESTED markers) Inside alai-hooks/ (Java/Kotlin project) Not tools — system detritus Recommendation: purge from /tools/ to /archive/, keep only alai-hooks source Zero-byte files: Multiple .REQUESTED, .lock, gc.properties inside Python venv — expected (pip metadata). Not tools. 2. Manifest Drift Analysis Manifest Entries Scanned: 282 rows (manifest-index.md) Cross-reference results: Status Count Notes Exists on disk ~250 All LIVE/ACTIVE referenced tools present DELETED in manifest, absent from disk 31 Expected (deleted per manifest Sprint 2/3, 2026-02-26) Referenced in manifest but ARCHIVED 6 docuseal-monitor.js, docuseal-webhook.js, blueprint-runner.js, blueprint-compose.js, etc. 
— moved to ~/system/archive/replaced-by-n8n-2026-02/ Manifest lists as ACTIVE but STALE (>30d) ~8 intel-briefing.js (Apr 6), council-briefing.js (pre-extract), ollama-workers/* (last mod Mar–Apr) Subdirectory tools NOT in manifest ~40–60 comms-agent/ , browser-use-explorer/ , alai-hooks/ internal tools (Kotlin, TypeScript, Python) — not catalogued MANIFEST MISSING entries 15–20 Post-2026-04 additions (tier-router, skill-router, claim-detector, mini-da, drift-detector, tool-sync-audit, tool-dedup-report, multi-client routing, agent-metrics-api, agent-timeout-monitor) Drift Conclusion: Manifest is ~6 weeks stale. 201 ACTIVE tools documented; ~250–300 actually running (50–100 undocumented, mostly post-Feb architectural shifts + sub-agent frameworks). 3. Un-owned LIVE Tools Tools referenced in daemons or .md but NOT explicitly claimed in manifest ACTIVE list: Tool Caller Owner (inferred) Status tier-router.js agent-runner.js, task-router.js (unassigned) LIVE, no owner skill-router.js mc.js, plan-enforcer (unassigned) LIVE, no owner claim-detector.js cove.js, drift-detector (unassigned) LIVE, no owner claim-verifier.js cove.js, qa-19.js (unassigned) LIVE, no owner drift-detector.js daemon (daily 23:55) (unassigned) LIVE, daemon-run tool-sync-audit.js daemon (daily 03:00) (unassigned) LIVE, daemon-run tool-dedup-report.js daemon (Monday 06:00) (unassigned) LIVE, daemon-run agent-metrics-api.js agent-orchestrator.js (unassigned) LIVE, endpoint agent-timeout-monitor.js agent-runner.js (unassigned) LIVE, daemon-enforcer ollama-workers/* (4 tools) automation (referenced in session-archiver) (unassigned) LIVE, utilities forge-status.js studio-health.js, emergency-repl (unassigned) LIVE studio-health.js ops-watchdog, ollama-engine (unassigned) LIVE Implication: 12+ mission-critical tools lack explicit owner/status in manifest. Creates risk of accidental deprecation/orphaning. 4. 
Stale .bak Files (>14 days old) All 50 .bak/* files are > 14 days old and safe for archival: Oldest Batch (52 days; safe to archive): resource-governor.js.bak-20260310-184907 (Mar 10) kimi-server.sh.bak-20260313-181327 (Mar 13) kimi-monitor.js.bak-20260313-181327 (Mar 13) youtube-learning.js.bak-20260316-084904 (Mar 16) event-handlers.js.bak.20260314-043322 (Mar 14) ollama-tool-agent.js.bak-20260316-234508 (Mar 16) qa-19.js.bak.20260314-043322 (Mar 14) mc.js.bak.20260314-043322 (Mar 14) mc.js.bak.20260310-184105 (Mar 10) Mid-range (37 days): mc.js.bak-aaos-20260402 (Apr 2) mc.js.bak-before-7082-7085 (Apr 2) health-monitor-anvil.js.bak (Apr 6) intel-briefing.js.bak (Mar 31) Recent Batch (9 days; organized archive pass, Apr 30): _archive/2026-04/* (20 files, all Apr 30 11:25:48) Recommendation: Move all .bak/* to dated subdirectory (e.g., _archive/2026-05/pre-may/ ), ZIP for offsite backup. 5. Additional Junk & Quality Findings Missing Expected Files Files referenced in manifest but NOT found on disk: (None critical; all listed DELETED files were already absent per manifest notes) Suspicious Dead Code Tool Symptom Recommendation element-test.js (114 KB) No daemon/agent caller, appears test-only Verify if part of active testing suite or orphaned durable-executor.js (59 KB) Shadowed by durable-runner.js; unclear distinction Check if both needed or consolidate youtube-learning.js.bak (backup preserved) Original .bak exists; unknown if active service Verify if YouTube integration still used resource-governor.js.bak (backup preserved) Resource control tool; backed up mid-March Check if resource-governor.js ever went live Subdirectories with Nested Tools (Not in Manifest) ~/system/tools/comms-agent/ (TypeScript/Node monorepo) src/, dist/ (telegram-handler.ts, index.js with .bak variants) package.json, tsconfig.json Status: ??? (unclear if actively deployed vs. 
dev artifact)

~/system/tools/browser-use-explorer/ (Python + Node, 320 MB)
- .venv/lib/python3.12/site-packages/ (pip deps only, not code)
- src/, package.json
- Status: ??? (research tool? dev sandbox?)

~/system/tools/alai-hooks/ (Kotlin/Java, binary CLI)
- gradle/, src/ (Kotlin security enforcement, codesigned binary)
- Status: ACTIVE (referenced in mc.js, alai-hooks command used in hooks)
- Note: Gradle .gradle/ cache should be archived

Finding: 3 subdirectories (about 377 MB combined, per the size table in section 6) are not documented in manifest. Unclear which are active and which are dev/research.

6. Top-10 Largest Tools

| Rank | Tool | Size | Last Modified | Status |
|---|---|---|---|---|
| 1 | browser-use-explorer/ | 320 MB | Apr 28 | ??? (venv=280MB) |
| 2 | comms-agent/ | 45 MB | Apr 1 | ??? (node_modules=40MB) |
| 3 | alai-hooks/ | 12 MB | May 6 | ACTIVE (Kotlin binary) |
| 4 | mc.js | 250 KB | May 8 | LIVE |
| 5 | mc-dashboard.js | 170 KB | Apr 6 | LIVE |
| 6 | manifest.md | 94 KB | Apr 14 | Reference doc |
| 7 | pipeline-controller.js | 58 KB | Feb 26 | LIVE |
| 8 | auto-report.js | 51 KB | Apr 24 | LIVE |
| 9 | slack-bot.js | 49 KB | Apr 6 | LIVE |
| 10 | invoice-generator.js | 48 KB | Feb 17 | LIVE |

Observation: A single .py + .venv project (browser-use-explorer) consumes about 63% of ~/system/tools/ disk (320 MB of 502 MB).
- If research/PoC only: move to ~/projects/ or ~/backups/
- If production: document in manifest + verify active daemon

7. Live References: Tool Coverage

Tool consumer analysis (sample grep):

| Consumer | Count | Examples |
|---|---|---|
| ~/system/daemons/ | 42 scripts | mc-session-worker.sh, email-agent.js, ops-watchdog.js, flywheel-cycle.sh, auto-* (8), daemon-* (5), etc. |
| ~/.claude/agents/*.md | 28 files | builder.md, validator.md, resolver.md, linter.md, etc.; each requires 5–10 tools |
| ~/.claude/skills/ | 80+ skills | Each skill loads ~2–5 tools on demand (via skill-runner.js) |
| ~/system/agents/chains/*.yaml | 23 chains | Each chain references 1–3 tools for orchestration |
| ~/.claude/hooks/*.sh | 12 hooks | alai-hooks gating, process enforcement, mc claims |

Live tool hit count: ~250–280 tools have explicit caller references.
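The consumer analysis above is described as a sampled grep. A fuller pass could look roughly like the sketch below, which checks every tool name against every file under the consumer directories. This is an illustration under stated assumptions and not the audit's method: `tools_with_callers` is a hypothetical helper, and plain substring matching will over-count (mc.js also matches inside a longer name that contains it), so results should be read as an upper bound.

```python
from pathlib import Path


def tools_with_callers(tool_names, consumer_roots):
    """Map each referenced tool name to the consumer files that mention it.

    Tools with no entry in the returned dict have no caller in the scanned
    directories and are candidates for the DEAD-CODE bucket.
    """
    hits = {}
    for root in consumer_roots:
        for path in sorted(Path(root).rglob("*")):
            if not path.is_file():
                continue
            try:
                text = path.read_text(encoding="utf-8", errors="ignore")
            except OSError:
                continue  # unreadable file: skip it instead of failing the scan
            for name in tool_names:
                if name in text:  # substring match: an approximation
                    hits.setdefault(name, []).append(path)
    return hits
```

A possible invocation would pass the `.js` filenames from ~/system/tools/ as `tool_names` and the five consumer locations from the table as `consumer_roots`; the count of keys in the result is then comparable to the "live tool hit count" figure.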
Open Questions

- browser-use-explorer/: Is this an active production tool or a research sandbox? If research, it should live in ~/projects/. The 320 MB allocation is significant.
- comms-agent/ subdirectory: Is this a stable deployed service or an in-flight TypeScript migration? The .bak variants suggest evolution.
- alai-hooks/ binary codesigned: Latest mod 2026-05-06; clearly active. Should the .gradle/ cache be cleaned or preserved?
- 50 .bak files: Do we need all 50, or is a rotating keep-last-3-per-tool strategy viable?
- Manifest staleness: Should manifest-index.md be auto-refreshed daily (e.g., a daemon that re-scans daemons/ + agents/ + chains/) to stay in sync?
- 12 un-owned tools: Should each be assigned an explicit owner + manifest entry, or grouped under "Deterministic Enforcement" or "Agent Infrastructure"?
- JSON-as-filename security: When created? Which tool? Did credentials leak to logs? Recommend a grep of all logs for exposed secrets.

Recommendations (Audit-Level Only)

CRITICAL

- Delete malformed filename immediately:
  - Filename contains embedded credentials.
  - Audit tools/, daemons/, and agents/* for output-capture leaks.
  - Add alai-hooks gate to prevent future output-as-filename incidents.
- Security review of JSON filename artifact:
  - When was it created? (2026-02-24)
  - Which tool created it? (Bash tool capture?)
  - Did credentials leak to logs? (Grep logs for exposed patterns)
  - Add validation layer to prevent credentials-in-paths
- Document or relocate browser-use-explorer/:
  - If active: add to manifest, assign owner, set LaunchAgent
  - If research: move to ~/projects/ or archive, free 320 MB

HIGH

- Refresh manifest-index.md:
  - Add 50–60 undocumented post-Feb tools (tier-router, skill-router, claim-*, drift-detector, tool-sync-audit, agent-metrics-api, agent-timeout-monitor, ollama-workers/*, forge-status, studio-health)
  - Assign ownership: which persona (CodeCraft, FlowForge, Proveo, Securion)?
  - Set explicit LIVE vs. ARCHIVED vs. DEPRECATED status
- Archive all .bak files:
  - Create ~/system/archive/2026-05-09-bak-sweep/ (ZIP friendly)
  - Move 50 .bak* files
  - Update manifest with archive location + retention policy
- Clarify comms-agent/ status:
  - If deployed: verify daemon + manifest entry
  - If migration: set deadline for TypeScript cutover or rollback

MEDIUM

- Define tool ownership:
  - Create manifest section: "Infrastructure Owner Assignments"
  - Assign: tier-router, skill-router, claim-*, drift-detector, tool-*, agent-metrics-api, agent-timeout-monitor → explicit team
- Automate manifest refresh:
  - Create daemon: ~/system/daemons/manifest-refresh.js
  - Daily 04:00: scan daemons/, agents/, chains/ → auto-update manifest-index.md
  - Hook into mc.js add-tool proposal flow
- Standardize .bak naming:
  - Policy: max 3 backups per tool, naming = ...bak
  - Daemon: daily cleanup of excess backups
- Consolidate durable-executor vs. durable-runner:
  - Verify both needed; if not, mark one DEPRECATED + migrate callers

Audit Confidence

| Area | Confidence | Notes |
|---|---|---|
| Backup file count + age | HIGH | All 50 .bak files enumerated, dates verified |
| Junk file identification | HIGH | JSON-as-filename caught, 0-byte files confirmed |
| LIVE tool hit count | MEDIUM | Sampled grep coverage; not exhaustive scan of all 443 files |
| Manifest drift | HIGH | Manifest explicitly marked "2026-02-26" audit; 6+ weeks stale confirmed |
| Subdirectory status | LOW | comms-agent/ and browser-use-explorer/ require interactive verification |
| Un-owned tools | MEDIUM | 12 inferred from daemon/skill references; could miss some |

Audit completed: 2026-05-09 21:15 UTC
Auditor: John (Explore Agent)
Next step: Escalate critical findings (malformed filename, manifest refresh) to CEO/Mehanik.

Inventory: Agent Fleet

Agent Fleet Inventory: SENTINEL Audit 2026-05-09

Auditor: sentinel-architect
Scope: ~/.claude/agents/ vs specialist-mapping.json vs persona dirs vs chains vs definitions dual-store
Status: READ-ONLY. No files modified.

1.
66 vs 29 vs 12 Reconciliation

Raw counts (tool-verified)

| Store | Count | Notes |
|---|---|---|
| ~/.claude/agents/*.md | 66 | Includes 0.md, Explore.md, Plan.md as named agents |
| specialist-mapping.json mappings | 29 | Key: mappings object |
| specialist-mapping.json companies | 9 | ALAI, AgentForge, CodeCraft, Finverge, FlowForge, Proveo, Securion, Skybound, Vizu |
| Persona dirs in ~/system/agents/personas/ | 12 | AgentForge, Axiom, CodeCraft, Datavera, Finverge, FlowForge, Lexicon, Proveo, Resolver, Securion, Skybound, Vizu |

Critical gap: 4 of the 12 persona companies are completely absent from specialist-mapping.json:
- Axiom: not in company_summary, zero agents mapped
- Datavera: not in company_summary, zero agents mapped
- Resolver: not in company_summary, zero agents mapped
- Lexicon: not in company_summary, zero agents mapped (persona dir exists; skillforge.md maps to "Skillforge", not Lexicon)

Mapped agents (29 in specialist-mapping.json)

| Agent file | Company | On disk (~/.claude/agents/)? |
|---|---|---|
| alem-clone.md | ALAI | MISSING |
| angie-jones.md | Proveo | YES |
| anthropic-chief-architect.md | AgentForge | MISSING |
| brad-frost.md | Vizu | YES |
| bruce-momjian.md | CodeCraft | YES |
| builder.md | CodeCraft | YES |
| chip-huyen.md | AgentForge | YES |
| claude-code-guide.md | AgentForge | YES |
| codecraft.md | CodeCraft | YES |
| dorota-huizinga.md | Proveo | MISSING |
| georgi-gerganov.md | AgentForge | YES |
| hadi-hariri.md | CodeCraft | MISSING |
| james-bach.md | Proveo | MISSING |
| kelsey-hightower.md | FlowForge | YES |
| lea-verou.md | Vizu | YES |
| lee-robinson.md | CodeCraft | MISSING |
| lisa-crispin.md | Proveo | MISSING |
| markos-zachariadis.md | Finverge | YES |
| martin-kleppmann.md | CodeCraft | YES |
| parisa-tabriz.md | Securion | YES |
| paul-hudson.md | Skybound | YES |
| petter-graff.md | CodeCraft | YES |
| proveo.md | Proveo | YES |
| sentinel-architect.md | Securion | YES |
| sentinel-ba.md | Skybound | YES |
| sentinel-developer.md | CodeCraft | YES |
| sentinel-tester.md | Proveo | YES |
| sentinel-validator.md | Proveo | YES |
| skillforge.md | Skillforge | YES |

7 agents mapped in specialist-mapping.json but MISSING from ~/.claude/agents/:
- alem-clone.md: exists in definitions/, not synced to ~/.claude/agents/
- anthropic-chief-architect.md: NOT in definitions/ either; completely phantom
- dorota-huizinga.md: exists in definitions/, not synced
- hadi-hariri.md: exists in definitions/, not synced
- james-bach.md: exists in definitions/, not synced
- lee-robinson.md: exists in definitions/, not synced
- lisa-crispin.md: exists in definitions/, not synced

anthropic-chief-architect.md is the worst case: mapped in specialist-mapping.json, NOT in definitions/, NOT in ~/.claude/agents/. It is fully phantom and cannot be dispatched.
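The three-way comparison behind these lists (routing table vs runtime directory vs definitions store) reduces to a few set operations. The sketch below is illustrative only: `reconcile` is a hypothetical helper, and it takes plain collections of filenames because the internal layout of specialist-mapping.json is not documented in this audit. The sample inputs are a small subset of the agents named above; the contents of `definitions` in the example are assumed for illustration.

```python
def reconcile(mapped, on_disk, definitions):
    """Classify agent files by comparing three inventories of filenames."""
    mapped, on_disk, definitions = set(mapped), set(on_disk), set(definitions)
    missing = mapped - on_disk  # routed to, but not present at runtime
    return {
        # Defined in the definitions store but never copied to the runtime dir
        "unsynced": sorted(missing & definitions),
        # Mapped, yet present in neither store: cannot be dispatched at all
        "phantom": sorted(missing - definitions),
        # Present at runtime with no routing entry
        "unmapped": sorted(on_disk - mapped),
    }


# Illustrative subset only, not the full 66/29 inventories.
report = reconcile(
    mapped=["alem-clone.md", "anthropic-chief-architect.md", "builder.md"],
    on_disk=["builder.md", "validator.md"],
    definitions=["alem-clone.md", "builder.md"],
)
print(report)
```

With these sample inputs, alem-clone.md lands in `unsynced`, anthropic-chief-architect.md in `phantom`, and validator.md in `unmapped`, mirroring the three failure classes described above. Running the same function over the complete inventories would also give an independent check on the unmapped total.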
42 unmapped agents (in ~/.claude/agents/ but NOT in specialist-mapping.json)
Classification: ORPHAN = nowhere used | DUPLICATE = covered by mapped peer | NEEDS-MAPPING = used in chains/skills but unmapped

Agent | Classification | Reasoning
0.md | ORPHAN | No name, no description, artifact
agentforge.md | NEEDS-MAPPING | Company persona file; Axiom/Datavera/Resolver equivalents all exist; AgentForge has a persona dir but no company-level mapping entry
backend-builder.md | DUPLICATE | Covered by builder.md (CodeCraft, mapped)
backend-dev.md | DUPLICATE | Covered by codecraft.md + builder.md
baseline-comparator.md | NEEDS-MAPPING | Active agent (Veritas baseline, MLX-backed); used in verify-fix-loop skill; no mapping
code-reviewer.md | DUPLICATE | Covered by petter-graff.md / sentinel-developer.md
code-simplifier.md | DUPLICATE | Covered by sentinel-developer.md
database-dev.md | DUPLICATE | Covered by bruce-momjian.md
datavera.md | NEEDS-MAPPING | Company persona file for Datavera (persona dir exists, 0 mapped agents)
design-builder.md | DUPLICATE | Covered by brad-frost.md / lea-verou.md
devils-advocate.md | NEEDS-MAPPING | Pre-action blocker used in 0 chain yamls but referenced in mehanik flow; unregistered
devops-dev.md | DUPLICATE | Covered by kelsey-hightower.md
distiller.md | NEEDS-MAPPING | Used in 21 chain yaml steps (highest after builder/validator); no mapping. CRITICAL gap.
dr-sarah-chen.md | ORPHAN | No description parsed; no chain/skill references found
dzevad-jahic.md | NEEDS-MAPPING | Bosnian linguistic QA (Lexicon company, per CLAUDE.md); not in specialist-mapping.json despite CLAUDE.md routing directive
evidence-verifier.md | NEEDS-MAPPING | Active Veritas agent (gemma-4-26b @ FORGE); triggers on mc.js done for H tasks; no mapping
Explore.md | ORPHAN | Capital E; appears to be a stub
finverge.md | NEEDS-MAPPING | Company persona file for Finverge; persona dir mapped but no company-level agent entry
fix-builder.md | NEEDS-MAPPING | Write-only counterpart to verifier; used in verify-fix-loop skill; no mapping
flowforge.md | NEEDS-MAPPING | Company persona file for FlowForge; only kelsey-hightower.md individual is mapped
frontend-builder.md | DUPLICATE | Covered by lea-verou.md / lee-robinson.md
frontend-dev.md | DUPLICATE | Covered by lea-verou.md
fullstack-dev.md | DUPLICATE | Covered by codecraft.md
helixsupport.md | ORPHAN | Role=coordinator; 0 skill/chain references found
indy-dandev.md | ORPHAN | AI research agent (Indian AI + Dan Abramov persona); no chain/skill references; not used in current system
integration-dev.md | DUPLICATE | Covered by codecraft.md
jake-wharton.md | NEEDS-MAPPING | Android/Kotlin expert (Jake Wharton persona); no AgentForge/Skybound mapping entry
lexicon.md | NEEDS-MAPPING | Company persona file for Lexicon (documentation company per CLAUDE.md); 0 agents in specialist-mapping.json
maria-santos.md | ORPHAN | No description parsed; no chain/skill references found
mehanik.md | NEEDS-MAPPING | Core orchestration gate; referenced in 7 skill files; CLAUDE.md cites /mehanik command as mandatory pre-dispatch gate; completely absent from specialist-mapping.json
meta-agent.md | ORPHAN | No chain/skill references found
Plan.md | ORPHAN | Capital P; appears to be a stub
proxima.md | NEEDS-MAPPING | Marketing/content agent; referenced in 10 skill files; no company assignment
rag-builder.md | ORPHAN | No chain/skill references; likely superseded by AgentForge rag-tuning-agent.yaml
redzo-reviewer.md | ORPHAN | No chain/skill references found
resolver.md | NEEDS-MAPPING | Company persona for Resolver (persona dir exists, 8 internal agents; 0 in specialist-mapping.json)
securion.md | NEEDS-MAPPING | Company persona for Securion; parisa-tabriz.md + sentinel-architect.md individually mapped, but no company-level dispatcher
skybound.md | NEEDS-MAPPING | Company persona for Skybound; individual members mapped but no company dispatcher
thaer-sabri.md | ORPHAN | No description parsed; no chain/skill references found
validator.md | NEEDS-MAPPING | Used in 44 skill files and 22 chain yaml steps; one of the most-used agents in the entire system; NOT in specialist-mapping.json. CRITICAL gap.
verifier.md | NEEDS-MAPPING | 2 skill file references; verify-fix-loop skill; not mapped
vizu.md | NEEDS-MAPPING | Company persona for Vizu; brad-frost.md + lea-verou.md individually mapped, no company dispatcher

Summary of 42 unmapped:
ORPHAN: 11 (0.md, dr-sarah-chen.md, Explore.md, helixsupport.md, indy-dandev.md, maria-santos.md, meta-agent.md, Plan.md, rag-builder.md, redzo-reviewer.md, thaer-sabri.md)
DUPLICATE: 11 (backend-builder.md, backend-dev.md, code-reviewer.md, code-simplifier.md, database-dev.md, design-builder.md, devops-dev.md, frontend-builder.md, frontend-dev.md, fullstack-dev.md, integration-dev.md)
NEEDS-MAPPING: 20 (agentforge, baseline-comparator, datavera, devils-advocate, distiller, dzevad-jahic, evidence-verifier, finverge, fix-builder, flowforge, jake-wharton, lexicon, mehanik, proxima, resolver, securion, skybound, validator, verifier, vizu)
Note: counts = 11 + 11 + 20 = 42. The original "37 unmapped" figure understates by 5 because it excludes alem-clone.md (mapped but disk-missing) and overcounts mapped agents that are actually absent.

2.
Persona Dirs Deep Dive
All 12 persona dirs have a consistent structure: agents/, blueprints/, brand/, CLAUDE.md, company.json, config.json, legal/, ops/, README.md, skills/, state/, tools/.

Persona | Has README | Has CLAUDE.md | Has company.json | Agents inside (count) | Owner in company.json | In specialist-mapping.json
AgentForge | YES | YES | YES (domain: AI) | 8 | N/A | Partial (3 individuals mapped, no company dispatcher)
Axiom | YES | YES | YES (domain: ARCHITECTURE) | 5 | N/A | NO (completely absent)
CodeCraft | YES | YES | YES (domain: DEVELOPMENT) | 8 | N/A | Partial (6 individuals mapped)
Datavera | YES | YES | YES (domain: DATA) | 8 | N/A | NO (completely absent)
Finverge | YES | YES | YES (domain: FINANCE) | 9 | N/A | Partial (1 individual mapped)
FlowForge | YES | YES | YES (domain: DEVOPS) | 10 | N/A | Partial (1 individual mapped)
Lexicon | YES | YES | YES (domain: DOCUMENTATION) | 9 | N/A | NO (skillforge.md maps to "Skillforge", not Lexicon)
Proveo | YES | YES | YES (domain: QA) | 8 | N/A | Partial (6 individuals mapped)
Resolver | YES | YES | YES (domain: SYSTEMIC) | 8 | N/A | NO (completely absent)
Securion | YES | YES | YES (domain: SECURITY) | 8 | N/A | Partial (2 individuals mapped)
Skybound | YES | YES | YES (domain: PRODUCT) | 7 | N/A | Partial (2 individuals mapped)
Vizu | YES | YES | YES (domain: DESIGN) | 7 | N/A | Partial (2 individuals mapped)

Structural finding: All company.json files report owner: N/A. No human/agent owner is recorded for any virtual company. This means there is no machine-readable way to route escalation or accountability.

Persona vs mapping mismatch: 87 total agents inside persona dirs (sum of agent subdirs across 12 companies). None of these internal PI agents (builder.yaml, lead.yaml, reviewer.yaml, etc.) appear in specialist-mapping.json. specialist-mapping.json only tracks the "celebrity" individual agents, not the PI agent swarms inside each company.

3. Chain Coverage
Agents referenced in chains

Agent | Times referenced in chains | In specialist-mapping.json? | Disk present?
builder | 25 | YES | YES
validator | 22 | NO | YES
distiller | 21 | NO | YES
sentinel-validator | 9 | YES | YES
minion | 5 | NO | NOT in ~/.claude/agents/ (in definitions/ only)
planner | 4 | NO | NOT in ~/.claude/agents/ at all

Critical: minion and planner are referenced in chains but have NO corresponding .md in ~/.claude/agents/.
minion.md exists in ~/system/agents/definitions/ but was never synced forward.
planner does not exist in definitions/ or ~/.claude/agents/. It is a phantom agent referenced in 3 chains (plan-build.yaml, plan-build-review.yaml, plan-review-plan.yaml).

Dead chains (0 references anywhere in skills/ or system/)
Chains that are never invoked via skills or daemons:

Chain | Skill refs | System refs | Verdict
codecraft-api-backend.yaml | 0 | 0 | DEAD
codecraft-nextjs-app.yaml | 0 | 0 | DEAD
full-review.yaml | 0 | 0 | DEAD
minion-bugfix.yaml | 0 | 0 | DEAD
minion-docs.yaml | 0 | 0 | DEAD
minion-one-shot.yaml | 0 | 0 | DEAD
minion-refactor.yaml | 0 | 0 | DEAD
minion-security-fix.yaml | 0 | 0 | DEAD
plan-build-review.yaml | 0 | 0 | DEAD
plan-build.yaml | ~1 (plan-build-test skill ref) | 0 | BORDERLINE
plan-review-plan.yaml | 0 | 0 | DEAD
scout-flow.yaml | 0 | 0 | DEAD
securion-security-review.yaml | 0 | 0 | DEAD

Note: The skill-*.yaml chains in the chains/ dir are not invoked by name in skills/. They appear to be template definitions, not live dispatch chains. Chains are not invoked via a chain runner; skills embed agents directly via an inline agent: field. The chain YAML format appears to be an aspirational DAG definition language that has no runtime executor wired up. Effectively ALL 35 chain YAMLs are dead: there is no chain runner in the skill system. Skills call agents directly, not via chain files.

4. Dual-Store Consistency
Files in both ~/.claude/agents/ and ~/system/agents/definitions/
48 files exist in both stores. ALL 48 are byte-for-byte SYNCED (diff returned empty for every shared file). The sync script at ~/bin/agent-definitions-sync.sh is working correctly for the files it covers.
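The byte-for-byte comparison described above can be reproduced without shelling out to diff. This is a minimal Python sketch using the standard-library filecmp module. It assumes both stores are flat directories of .md files; the function name compare_stores and the result keys are hypothetical, and this is not the logic of ~/bin/agent-definitions-sync.sh, which was not inspected here.

```python
import filecmp
from pathlib import Path


def compare_stores(agents_dir, definitions_dir, pattern="*.md"):
    """Compare two agent stores by filename and by content."""
    in_agents = {p.name for p in Path(agents_dir).glob(pattern)}
    in_defs = {p.name for p in Path(definitions_dir).glob(pattern)}
    shared = sorted(in_agents & in_defs)
    # shallow=False makes filecmp compare file contents, not just os.stat() data.
    match, mismatch, errors = filecmp.cmpfiles(
        agents_dir, definitions_dir, shared, shallow=False
    )
    return {
        "synced": sorted(match),
        # Files that could not be read are reported as drifted rather than hidden.
        "drifted": sorted(mismatch + errors),
        "only_agents": sorted(in_agents - in_defs),
        "only_definitions": sorted(in_defs - in_agents),
    }
```

Under those assumptions, a result matching this audit would show 48 names in synced and none in drifted. The two "only" buckets correspond to files that exist in just one of the stores.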
Sync gaps 16 files ONLY in ~/.claude/agents/ (not in definitions/) — not covered by sync: baseline-comparator.md claude-code-guide.md devils-advocate.md dr-sarah-chen.md dzevad-jahic.md evidence-verifier.md Explore.md fix-builder.md indy-dandev.md jake-wharton.md maria-santos.md mehanik.md Plan.md redzo-reviewer.md thaer-sabri.md verifier.md 8 files ONLY in definitions/ (not synced to ~/.claude/agents/) — these agents are UNREACHABLE by Claude Code: dorota-huizinga.md ← mapped in specialist-mapping.json, should be in ~/.claude/agents/ hadi-hariri.md ← mapped in specialist-mapping.json, should be in ~/.claude/agents/ james-bach.md ← mapped in specialist-mapping.json, should be in ~/.claude/agents/ lee-robinson.md ← mapped in specialist-mapping.json, should be in ~/.claude/agents/ lisa-crispin.md ← mapped in specialist-mapping.json, should be in ~/.claude/agents/ minion.md ← referenced in 5 chain yaml steps, unreachable sentry-code-simplifier.md ← not in mapping, not in chains sp-code-reviewer.md ← not in mapping, not in chains The first 5 are mapped and therefore expected to be dispatched — they cannot be. Any dispatch attempt for dorota-huizinga, hadi-hariri, james-bach, lee-robinson, or lisa-crispin will silently fail or fall back. 5. Skill → Agent Linkage Sample of 10 skills with agent dispatch analysis: Skill Agent referenced Agent in ~/.claude/agents/? In specialist-mapping.json? 
hop-build No sub-agent dispatch (marker-only skill) N/A N/A build builder (3 parallel), rag-context-for-builder.js (tool) YES YES code-review code-reviewer , securion sub-agent, sentinel-architect code-reviewer YES (unmapped), securion YES (unmapped dispatcher), sentinel-architect YES (mapped) debugging No agent dispatch found in instructions N/A N/A deploy-verify No agent (runs Playwright directly) N/A N/A design-system No agent dispatch N/A N/A doc-coauthoring No named agent dispatch N/A N/A fiken-agent Self-referential meta-skill; dispatches sub-task SKILL.md files Indirect N/A financial-overview No agent dispatch found N/A N/A incident-response References securion agent (remediation) securion.md YES (unmapped dispatcher) NO Flags: code-review skill dispatches code-reviewer (unmapped, 44 skill refs) and securion (unmapped company dispatcher) directly by name incident-response references securion as a response agent — but securion.md is NOT in specialist-mapping.json (only individual members are mapped) validator is the most-used agent (44 skill files, 22 chain steps) with NO mapping entry Open Questions Chain runner : Is there a chain executor anywhere in the system (~/system/tools/, ~/projects/, pi-orchestrator)? If not, the entire chains/ directory is documentation-only, not executable automation. planner agent : Referenced in 3 chains (plan-build, plan-build-review, plan-review-plan) but does not exist on disk anywhere. Was it renamed to distiller or mehanik ? Axiom, Datavera, Resolver : Three fully-formed virtual companies with persona dirs, README, CLAUDE.md, 5-8 internal agents each — but zero presence in specialist-mapping.json. Are these active companies being used via direct session invocation (not via John routing)? anthropic-chief-architect.md : Mapped in specialist-mapping.json, absent from both ~/.claude/agents/ AND definitions/. Was this agent removed intentionally or is it a sync failure? 
company.json owner=N/A: All 12 companies have no human owner. Is there a separate ownership registry, or is this a gap in the accountability chain?
Lexicon vs Skillforge naming: CLAUDE.md routing table names the company "Lexicon" and lists "Dževad Jahić" as its agent. specialist-mapping.json has skillforge.md mapping to company "Skillforge". These are two different names for what appears to be the same documentation company. Which is canonical?
~/.claude/agents/*.md priority: Claude Code loads subagents from ~/.claude/agents/. The definitions/ store is a backup. But 8 agents (5 of them mapped) live only in definitions/ and are therefore unreachable. Is ~/bin/agent-definitions-sync.sh being run on any schedule?

Architectural Concerns (no auto-fix)

A. Mapping covers only 29 of 66 agents (44%): the layer is too thin to be a reliable routing table.
The specialist-mapping.json is supposed to be John's source of truth for "who builds this?" routing. But two of the most-used agents in the entire system (validator with 44 skill refs, distiller with 21 chain refs) are absent. Routing decisions based on this file are structurally incomplete.

B. 7 mapped agents unreachable at runtime.
Agents marked as mapped (specialist-mapping.json claims them) but missing from ~/.claude/agents/ will fail silently when dispatched. The mapping implies reachability but does not enforce it. No health check validates the mapping → disk correspondence.

C. The chain YAML layer has no executor.
35 chain YAML files define multi-step agent pipelines, but skills invoke agents directly by name, not via the chain files. The chains/ directory is a documentation artifact, not live infrastructure. All automation currently runs through inline skill → agent calls. This creates a documentation drift risk: chain files will diverge from actual behavior with no mechanism to detect it.

D. 4 virtual companies are phantom: infrastructure without routing.
Axiom, Datavera, Resolver, Lexicon each have: persona dir, README, CLAUDE.md, company.json, 5-9 internal agents. None appear in specialist-mapping.json or John's routing table. They consume disk and cognitive space but cannot be dispatched through the normal John → discover.js → specialist route. Direct session invocation (naming the company in a prompt) is the only access path — undocumented and unreliable. E. Dual-store sync is manual and partial. 16 agents exist only in ~/.claude/agents/ (single source of truth but no backup). 8 agents exist only in definitions/ (backed up but unreachable). The sync script does not auto-run; it must be manually invoked. This creates continuous drift pressure. F. planner is a phantom agent in live chains. Three chains reference an agent named planner that has no .md file anywhere on disk. If these chains were ever executed, planner steps would fail with no error at the mapping layer. G. No machine-readable owner for any virtual company. company.json owner: N/A across all 12 companies means there is no way to auto-route escalation, billing, or accountability. This is a governance gap, not a code gap. Inventory: Daemon Fleet AI Factory Daemon Fleet Audit — 2026-05-09 Auditor: kelsey-hightower Timestamp: 2026-05-09T20:48 UTC Source of truth: launchctl list + daemon-fleet-status.json (generated 2026-05-09T18:33:52Z) + plist reads + error log sampling Fleet size (watchdog): 148 tracked entries | 47 running keepalive | 74 calendar_ok | 3 down | 20 erroring Fleet size (launchctl live): 168 rows matching alai/john/no.alai pattern (includes daemons not in watchdog) 1. Live Exit-Code Matrix Column key: PID ( - = not running) | Last Exit | Plist location | KeepAlive policy | Schedule 1a. 
RUNNING (keepalive, PID alive, exit 0 or -15/SIGTERM) Daemon PID Exit Plist Path KeepAlive Schedule com.alai.agent-timeout-monitor 1163 0 system/daemons/launchagents always continuous com.alai.cc-api-server 1183 0 system/daemons/launchagents always continuous com.alai.credit-monitor 1223 0 system/daemons/launchagents always continuous com.alai.idle-learning-daemon 1196 0 system/daemons/launchagents always continuous com.alai.litestream 51452 0 Library/LaunchAgents always continuous com.alai.mem0-server 65706 -15 (SIGTERM) Library/LaunchAgents always continuous com.alai.mlx-gemma4 27321 0 (not in known dirs) always continuous com.alai.mlx-qwen25-coder-32b 31120 0 (not in known dirs) always continuous com.alai.mlx-qwen3-32b 29227 0 (not in known dirs) always continuous com.alai.mlx-qwen3-8b 29488 0 (not in known dirs) always continuous com.alai.ollama-serve-v2 29100 0 system/daemons/launchagents always continuous com.alai.orchestrator-bridge 1185 0 system/daemons/launchagents always continuous com.alai.ram-monitor 1241 0 system/daemons/launchagents always continuous com.alai.task-router 1200 0 system/daemons/launchagents always continuous com.alai.web-learning 1176 0 system/daemons/launchagents always continuous com.john.bookstack-webhook-relay 1206 0 system/daemons/launchagents always continuous com.john.browser-worker 1211 0 system/daemons/launchagents always continuous com.john.caddy-vault 86082 0 system/daemons/launchagents always continuous com.john.cloudflared 79617 0 system/daemons/launchagents always continuous com.john.comms-agent 1186 0 system/daemons/launchagents always continuous com.john.documenso-webhook 20561 0 system/daemons/launchagents always continuous com.john.durable-executor 1212 0 system/daemons/launchagents always continuous com.john.edita-loop 61758 0 system/daemons/launchagents always continuous com.john.email-agent 92225 0 system/daemons/launchagents calendar calendar com.john.email-tracker 11292 0 system/daemons/launchagents conditional 
conditional com.john.event-dispatcher 65452 0 system/daemons/launchagents always continuous com.john.health-dashboard 1189 0 system/daemons/launchagents always continuous com.john.hook-daemon 1240 0 system/daemons/launchagents always continuous com.john.intake-watcher 41929 0 system/daemons/launchagents always continuous com.john.kenan-hot-web 1231 0 system/daemons/launchagents always continuous com.john.llm-datasette 1170 0 system/daemons/launchagents always continuous com.john.mc-dashboard 65673 0 system/daemons/launchagents always continuous com.john.n8n 1203 0 system/daemons/launchagents always continuous com.john.network-watchdog 1194 0 system/daemons/launchagents always continuous com.john.ops-watchdog 8782 -15 (SIGTERM) system/daemons/launchagents always continuous com.john.outbox-processor 1190 0 system/daemons/launchagents always continuous com.john.paste-logger 1224 0 system/daemons/launchagents always continuous com.john.pi-orchestrator 75750 0 system/daemons/launchagents always continuous com.john.slack-bot 18046 1 (last crash exit) system/daemons/launchagents always continuous com.john.tender-dashboard 1234 0 system/daemons/launchagents always continuous com.john.tool-shed 1191 0 system/daemons/launchagents always continuous com.john.vault-keeper 87005 0 system/daemons/launchagents always continuous com.john.vault-proxy 1222 0 system/daemons/launchagents always continuous com.john.youtube-nightly-learning 83439 0 system/daemons/launchagents always continuous no.alai.claude-proxy 6361 0 Library/LaunchAgents always continuous com.alai.rag-drain-worker 3640 1 (prev exit) system/config/launchagents always continuous com.alai.rag-fsevents-adapter 64755 1 (prev exit) system/config/launchagents conditional WatchPaths com.alai.daemon-fleet-watchdog 2815 0 (Library/LaunchAgents) calendar every 15min 1b. 
DOWN — Exit 0 (intentional one-shot or conditional) Daemon PID Exit Notes com.john.autocoder-ui - 0 down_exit_0: one-shot complete com.john.draft-sender - 0 down_exit_0: conditional, no pending drafts com.john.orchestrator-http - 0 down_exit_0: DUPLICATE — orchestrator-bridge runs same script on port 3052 1c. CALENDAR SCHEDULED — Exit 0 last run (healthy) These fired successfully on last scheduled run. Not exhaustively listed — watchdog confirms 74 in this state. Key members: com.alai.apply-knowledge , com.alai.archive-first-scan , com.alai.chain-weekly-report , com.alai.docker-watchdog , com.alai.gcloud-auth , com.alai.john-daily-digest , com.alai.lightrag-backup , com.alai.memory-watchdog , com.alai.meta-agent-loop , com.alai.restore-drill , com.alai.skill-audit , com.alai.team-sync , com.alai.wal-checkpoint , com.alai.weekly-planning , com.alai.zombie-cleanup , com.john.agentforge , com.john.bookstack-sync , com.john.calendar-bridge , com.john.critical-tools-healthcheck , com.john.daemon-health , com.john.db-archival-sweep , com.john.db-backup , com.john.domain-audit , com.john.drift-detector , com.john.email-briefing , com.john.forge-watchdog , com.john.log-rotate , com.john.mc-session-worker , com.john.morning-routine , com.john.offsite-backup , com.john.pi2-override-audit , com.john.review-drain , com.john.session-archiver , com.john.session-extractor , com.john.spam-recovery-scan , com.john.system-guardian , com.john.tldr-actionizer , com.john.tldr-briefing , com.john.tldr-watch , com.john.tldr-weekly-synthesis , com.john.weekly-synthesis , no.alai.email-body-integrity , no.alai.meta-agent , no.alai.resolver , no.alai.spend-guard . 1d. 
FAILING — Non-zero exit codes Daemon PID Exit Code Plist Location KeepAlive Schedule com.alai.azure-db-backup - 1 (exit 256 internal) system/config/launchagents none (RunAtLoad=false) every 4h com.alai.blueprint-fleet-watchdog - 1 (exit 256) Library/LaunchAgents none daily 06:15 com.alai.cert-expiry-monitor - 1 (exit 256) system/config/launchagents none daily 07:00 com.alai.chain-daily-inbox - 1 (exit 256) Library/LaunchAgents none daily 07:00 com.alai.chain-e2e-nightly - 1 (exit 256) Library/LaunchAgents none daily 02:00 com.alai.chain-phantom-detector - 1 (exit 256) Library/LaunchAgents none every 15min com.alai.cost-daily-report - 127 Library/LaunchAgents none daily 23:55 com.alai.daily-planning - 127 Library/LaunchAgents none daily 07:30 com.alai.filesystem-audit - 1 (exit 256) Library/LaunchAgents none Monday 08:00 com.alai.pi-orch-health - 127 Library/LaunchAgents none daily 23:00 com.alai.rag-bookstack-adapter - 1 (exit 256) system/config/launchagents none every 5min com.alai.rag-drain-worker 3640 1 (prev exit, now running) system/config/launchagents always continuous com.alai.rag-fsevents-adapter 64755 1 (prev exit, now running) system/config/launchagents conditional WatchPaths com.alai.rag-mc-adapter - 1 (exit 256) system/config/launchagents none every 5min com.alai.rdap-audit-quarterly - 2 Library/LaunchAgents none quarterly com.john.alaiml-retrain - 1 (exit 256) system/config/launchagents + Library/LaunchAgents none 1st of month 03:00 com.john.auto-verify-regression - 1 (exit 256) system/daemons/launchagents none daily 06:00 com.john.b2-offsite-backup - 1 (exit 256) system/daemons/launchagents none daily 03:30 com.john.bookstack-staleness - 1 (exit 256) system/daemons/launchagents none Sunday 22:00 com.john.infra-drift-detector - 1 (exit 256) system/daemons/launchagents none Sunday 04:00 com.john.legal-docs-azure-sync - 127 Library/LaunchAgents Crashed=true daily 02:00 com.john.lightrag-monitor - 2 system/config/launchagents none daily 09:00 
com.john.mcp-health-check - 127 Library/LaunchAgents Crashed=true every 1h com.john.slack-bot 18046 1 (last crash) system/daemons/launchagents always continuous 1e. NOT LOADED (watchdog knows them, launchctl does not) Daemon State com.alai.lightrag-migrate-pump not_loaded com.alai.lightrag-outbox-ingest not_loaded com.alai.lightrag-watchdog not_loaded com.john.rdap-audit-quarterly not_loaded 2. Failure Cohort — Root Cause Analysis EXIT 127 — Script/binary not found (BROKEN — script deleted) These five daemons have plists in Library/LaunchAgents pointing to scripts that no longer exist on disk. Exit 127 is bash's "command not found" — the script path itself is gone. Daemon Missing Script Last Successful Run Category com.alai.pi-orch-health ~/system/tools/pi-orch-health.sh 2026-05-06 (verdict: CRITICAL) BROKEN com.alai.cost-daily-report ~/system/tools/cost-daily-report.sh 2026-04-29 BROKEN com.alai.daily-planning ~/system/tools/daily-planning.sh unknown BROKEN com.john.legal-docs-azure-sync ~/system/daemons/legal-docs-azure-sync.sh unknown BROKEN com.john.mcp-health-check ~/system/tools/mcp-health-check.sh unknown BROKEN Note on legal-docs-azure-sync and mcp-health-check: Both have KeepAlive.Crashed=true , meaning launchd will restart them on crash. Since they always exit 127, they are in a guaranteed restart loop (throttled). This wastes process spawns indefinitely. 
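The exit-127 cohort above shares one signature: a plist whose target script is no longer on disk. A scan for that condition might look like the following Python sketch. Program and ProgramArguments are standard launchd plist keys and plistlib is in the standard library. The heuristic of checking only Program plus the first two ProgramArguments entries (typically interpreter and script) is an assumption made here, and the function names are hypothetical.

```python
import plistlib
from pathlib import Path


def missing_targets(plist_path):
    """Return path-like launch targets in one plist that do not exist on disk."""
    with open(plist_path, "rb") as fp:
        job = plistlib.load(fp)
    candidates = []
    if "Program" in job:
        candidates.append(job["Program"])
    # Heuristic (assumption): the first two arguments are interpreter and script.
    candidates.extend(job.get("ProgramArguments", [])[:2])
    missing = []
    for arg in candidates:
        if isinstance(arg, str) and arg.startswith(("/", "~")):
            if not Path(arg).expanduser().exists():
                missing.append(arg)
    return missing


def scan_plist_dir(plist_dir):
    """Map plist name (without .plist) to its missing targets, if any."""
    broken = {}
    for plist in sorted(Path(plist_dir).expanduser().glob("*.plist")):
        gone = missing_targets(plist)
        if gone:
            broken[plist.stem] = gone
    return broken
```

Run over each plist directory named in the matrix (for example ~/Library/LaunchAgents), a scan like this would be expected to list the five daemons in the table above. It could also surface jobs whose script has been deleted but which have not yet fired, since it reads the plists rather than waiting for a failed run.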
EXIT 1 / 256: Script exists but fails at runtime (BROKEN, dependency missing)

Daemon | Script | Root Cause | Category
com.alai.rag-bookstack-adapter | rag-bookstack-adapter.js | Queue depth 946 > 500 backpressure gate; never drains because drain-worker cannot reach LightRAG | BROKEN (cascade)
com.alai.rag-drain-worker | rag-drain-worker.js | Vaultwarden ETIMEDOUT → CF credentials unavailable → LightRAG unreachable | BROKEN
com.alai.rag-mc-adapter | rag-mc-adapter.js | Same backpressure cascade, queue depth 946 | BROKEN (cascade)
com.alai.rag-fsevents-adapter | rag-fsevents-adapter.js | Queue depth >500 backpressure, runs but skips all enqueues | BROKEN (cascade)
com.alai.azure-db-backup | azure-db-backup.sh | az storage blob upload SIGTERM'd (line 116); temp dirs leaked in /tmp | TRANSIENT
com.alai.cert-expiry-monitor | cert-expiry-monitor.sh | Script exists, no error log found; likely network/curl failure | TRANSIENT
com.alai.chain-daily-inbox | chain-runner.sh --enqueue daily-inbox-triage | chain-runner.sh exists; failure likely in downstream chain execution | TRANSIENT
com.alai.chain-e2e-nightly | chain-e2e-nightly.sh | Script exists; likely Playwright/network dependency failure | TRANSIENT
com.alai.chain-phantom-detector | phantom-link-detector.js | Script does NOT exist on disk (MISSING) | BROKEN
com.alai.filesystem-audit | ~/bin/anvil-audit.sh | Script exists; last exit 256 may be diff/rename limit warning elevated to exit | TRANSIENT
com.alai.blueprint-fleet-watchdog | ~/system/daemons/blueprint-fleet-watchdog.js | Script exists; likely a missing dep or API auth failure | TRANSIENT
com.john.alaiml-retrain | ~/ALAI/internal/projects/alaiML/scripts/retrain.sh | Script exists; DUPLICATE plist (both config and Library/LaunchAgents); likely venv path or MC dep failure | BROKEN (duplicate)
com.john.auto-verify-regression | auto-verify-regression.js | Script exists; calls claim-verifier.js; probable missing dep or API failure | TRANSIENT
com.john.b2-offsite-backup | b2-offsite-backup.sh | B2 storage cap EXCEEDED (403 storage_cap_exceeded) and auth token limit errors | BROKEN (infra)
com.john.bookstack-staleness | bookstack-staleness.js | API parse error "Unexpected end of JSON input" on page 2553+; BookStack API truncating responses | BROKEN
com.john.infra-drift-detector | infra-drift-detector.sh | diff.renameLimit warning elevated to non-zero exit; git rename detection failing on large repos | TRANSIENT
com.john.slack-bot | (node process) | WebSocket pong timeouts (ETIMEDOUT); process alive and heartbeating, but launchd saw a crash exit | TRANSIENT

EXIT 2: Logic/health failure

Daemon | Script | Root Cause | Category
com.alai.rdap-audit-quarterly | plist not found in known dirs | Script path unknown, likely MISSING | BROKEN
com.john.lightrag-monitor | lightrag-health-with-alert.sh | Script exits 1/2 when LightRAG is degraded; this is INTENTIONAL ALERTING behavior, but LightRAG IS degraded | EXPECTED (alarm correctly firing)

3. Producer-Consumer Wiring

RAG Ingest Pipeline (currently DEADLOCKED)

com.alai.rag-fsevents-adapter watches ~/system/evidence, ~/system/specs, ~/system/rules
com.alai.rag-bookstack-adapter polls BookStack API every 5min
com.alai.rag-mc-adapter reads ~/system/logs/mc-task-outcomes.jsonl
  --> all three WRITE to ~/system/state/ingest-queue.sqlite (queue depth: 946, frozen)

com.alai.rag-drain-worker (keepalive) reads ingest-queue.sqlite
  --> attempts POST to https://lightrag.basicconsulting.no (via CF Access)
  --> CF credentials lookup: Vaultwarden ETIMEDOUT (bw-session stale or vault unreachable)
  --> LightRAG unreachable → queue never drains → backpressure locks all three producers

ORPHAN OUTPUT: ~/system/metrics/ingest_pipeline.prom written by rag-drain-worker
  --> nothing confirmed reading this file (no Prometheus scrape config found in audit)

This is the single most critical broken pipeline in the factory. 946 items queued, zero being processed.
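The deadlock above follows from two rules interacting: producers skip writes once depth exceeds 500, and the only consumer does nothing while its credential lookup fails. The Python sketch below models just that interaction with an in-memory SQLite table. The table name queue, its columns, the batch size and all function names are hypothetical; the real schema of ingest-queue.sqlite and the adapters' code were not inspected. Only the 500 threshold and the 946 depth come from the audit.

```python
import sqlite3

BACKPRESSURE_LIMIT = 500  # gate threshold quoted in the audit


def make_queue(depth):
    """Create an in-memory stand-in for the ingest queue (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO queue (payload) VALUES (?)",
        ((f"item-{i}",) for i in range(depth)),
    )
    return conn


def queue_depth(conn):
    return conn.execute("SELECT COUNT(*) FROM queue").fetchone()[0]


def try_enqueue(conn, payload):
    """Producer side: skip the write while depth exceeds the gate."""
    if queue_depth(conn) > BACKPRESSURE_LIMIT:
        return False
    conn.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))
    return True


def drain_once(conn, credentials_ok, batch=100):
    """Consumer side: remove up to `batch` rows, but only when auth succeeded.

    Deleting rows stands in for a successful POST to the downstream service.
    """
    if not credentials_ok:
        return 0
    cur = conn.execute(
        "DELETE FROM queue WHERE id IN (SELECT id FROM queue ORDER BY id LIMIT ?)",
        (batch,),
    )
    return cur.rowcount
```

In this model, starting at depth 946 with credentials unavailable, every drain attempt removes nothing and every enqueue is refused, so the depth stays frozen. Once the credential flag is true, repeated drains bring the depth under the gate and the producers can write again. That matches the audit's reading that fixing the one credential path releases all three adapters.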
Memory / Knowledge Layer com.alai.mem0-server (PID 65706, keepalive) reads/writes: http://localhost:6333 (Qdrant vector store) produces: REST API on localhost:9000 (port cslistener) consumed by: discover.js, agent tools calling /v1/memories STATUS: alive and healthy (health 200, Qdrant 200) NOTE: exit -15 (SIGTERM) in launchctl = prior graceful restart; current run is clean com.alai.litestream (PID 51452, keepalive) reads: SQLite DBs in ~/system/state/ (flywheel.db, health-events.db, etc.) writes: B2 bucket alai-studio-backup (replication stream) STATUS: running but b2-offsite-backup.sh (separate) hitting B2 storage cap com.alai.wal-checkpoint (calendar, exit 0) reads/writes: SQLite WAL files in ~/system/state/ consumed by: litestream (clean WAL = cleaner replication) Orchestration Kernel com.john.pi-orchestrator (PID 75750, keepalive) reads: Planka MC API (boards.basicconsulting.no per mock config) writes: ~/system/logs/pi-orchestrator/daemon-*.log STATUS: running, cycling every 30s, "No eligible tasks" — running in MOCK MODE NOTE: alai-config-mock.json loaded; real config resolver likely not resolving com.alai.orchestrator-bridge (PID 1185, keepalive) runs: orchestrator-http-server.js on port 3052 produces: HTTP API for triggering orchestrator actions STATUS: running healthy com.john.orchestrator-http (down_exit_0) DUPLICATE of orchestrator-bridge — same script, same port (3052) Watchdog says down_exit_0: port already bound by bridge when this tried to start ORPHAN: plist in Library/LaunchAgents, shadow of orchestrator-bridge Backup Layer com.john.b2-offsite-backup (calendar, exit 1) reads: ~/system/state/ SQLite snapshots writes: B2 bucket alai-studio-backup STATUS: BLOCKED — B2 storage cap exceeded (403) com.alai.azure-db-backup (calendar, exit 1) reads: Azure SQL databases (via az CLI) writes: ~/system/daemons/azure-db-backup.sh → Azure Blob Storage STATUS: TRANSIENT failures, az upload SIGTERM'd (timeout in script or process kill) ORPHAN TEMP: 
/tmp/az-backup-* directories leaking (rm fails on non-empty dirs) Comms / Slack com.john.slack-bot (PID 18046, keepalive) reads: Slack WebSocket (socket-mode) writes: Slack messages, ~/system/logs/slack-bot.log STATUS: alive, heartbeating, WebSocket reconnects successfully (~once per session) CONCERN: 300min silent (no incoming Slack messages received in 5h as of audit time) no.alai.email-body-integrity (calendar, exit 0) reads: IMAP one.com (email body verification) writes: ~/system/logs/email-integrity.log STATUS: healthy last run Monitoring / Health com.john.lightrag-monitor (calendar, exit 2) reads: LightRAG API health endpoint writes: /tmp/lightrag-task-context.json, ~/system/evidence/lightrag-health-*.md STATUS: correctly reporting LightRAG as degraded; Slack alert delivery ALSO failing ORPHAN OUTPUT: lightrag-health-*.md files accumulating in ~/system/evidence/ (rag-fsevents-adapter trying to enqueue these — but queue full — circular feedback) com.alai.daemon-fleet-watchdog (PID 2815, every 15min) reads: launchctl list, all plist dirs writes: ~/system/state/daemon-fleet-status.json STATUS: healthy, data current as of 18:33:52Z today com.alai.pi-orch-health (calendar, exit 127) was: reads pi-orchestrator state, writes ~/system/state/pi-orch-health-*.json STATUS: BROKEN — script deleted. Last known verdict (2026-05-06): CRITICAL MLX / Inference Layer com.alai.mlx-gemma4 (PID 27321) com.alai.mlx-qwen3-32b (PID 29227) com.alai.mlx-qwen3-8b (PID 29488) com.alai.mlx-qwen25-coder-32b (PID 31120) com.alai.ollama-serve-v2 (PID 29100) STATUS: all running (keepalive), exit 0 PRODUCES: inference endpoints on ANVIL (local) Note: plists not found in audited dirs — loaded from unknown location (possibly ~/Library/LaunchAgents subdirs) 4. Critical-Path Daemon Assessment com.john.pi-orchestrator PID: 75750 | Exit: 0 | Status: RUNNING Healthy? Process is alive and cycling every 30s. However, it is running in MOCK MODE ( alai-config-mock.json ). 
The config resolver is not resolving real service URLs (Planka localhost:3100 is not listening per MEMORY.md). "No eligible tasks" every cycle. Produces: Cycle logs to ~/system/logs/pi-orchestrator/daemon-stdout.log Consumes: MC/Planka API (currently mocked, not reaching real board) Verdict: Process alive but effectively IDLE. Not orchestrating anything. Mock mode = silent failure. com.alai.pi-orch-health PID: - | Exit: 127 | Status: BROKEN Root cause: ~/system/tools/pi-orch-health.sh was deleted. Script ran last on 2026-05-06 with verdict CRITICAL. Now permanently broken until script is restored. Produces: ~/system/state/pi-orch-health-*.json (last written 2026-05-06) Verdict: BROKEN — monitoring of the orchestrator kernel has gone dark. com.alai.mem0-server PID: 65706 | Exit: -15 (prior SIGTERM) | Status: ALIVE AND HEALTHY Root cause of -15: launchctl records the exit code of the previous run; the current process (PID 65706) started clean. SIGTERM was a graceful restart, not a crash. Evidence: Port 9000 listening (lsof confirmed), /health returns 200, Qdrant at localhost:6333 returns 200. Note: /v1/memories returning 404 — API route may have changed or not yet initialized. Verdict: ALIVE. Exit -15 is misleading — current instance is healthy. com.john.lightrag-monitor PID: - | Exit: 2 | Status: EXPECTED ALARM Root cause: Script correctly exits non-zero when LightRAG is degraded. LightRAG IS degraded (drain-worker cannot reach it due to missing CF credentials). Slack alert also failing (alert delivery broken). Produces: ~/system/evidence/lightrag-health-*.md , /tmp/lightrag-task-context.json Verdict: Monitor itself is working correctly. The degradation it reports is real and severe. com.alai.lightrag-keepwarm PID: - | Exit: 0 | Status: calendar_ok Plist location: ~/Library/LaunchAgents/com.alai.lightrag-keepwarm.plist Schedule: unknown (plist content not captured in this audit — found late) Produces: Keepwarm pings to LightRAG Verdict: Last run exited 0. 
Likely the keepwarm pings succeed against the local endpoint even while drain-worker cannot auth through CF Access. Not broken. com.alai.archive-first-scan PID: - | Exit: 0 | Status: calendar_ok | Schedule: daily 06:00 Script: ~/bin/archive-first-scan.sh — EXISTS Produces: /tmp/archive-first-scan-report-.txt , writes to ~/system/state/archive-first-ledger.jsonl Consumes: Filesystem scan of unarchived candidates Verdict: HEALTHY. Running as designed. com.john.session-archiver PID: - | Exit: 0 | Status: calendar_ok | Schedule: daily 03:00 Script: ~/system/tools/session-archiver.js — EXISTS (10928 bytes, 2026-02-23) Produces: Cleaned-up session artifacts Consumes: Claude session logs/state Verdict: HEALTHY. Last run clean. com.alai.cost-daily-report PID: - | Exit: 127 | Status: BROKEN | Schedule: daily 23:55 Root cause: ~/system/tools/cost-daily-report.sh deleted. Last successful run 2026-04-29. Produces: ~/system/reports/cost-daily.md Consumes: Cost tracker data Verdict: BROKEN — daily cost visibility dark for 10 days. com.alai.weekly-planning PID: - | Exit: 0 | Status: calendar_ok | Schedule: Tuesday 08:00 Script: ~/system/tools/weekly-planning.sh — MISSING from disk BUT watchdog says last exit was 0 and state is calendar_ok. Contradiction. Likely explanation: Ran successfully before script was deleted; launchd has not triggered it since (last Tuesday before deletion date). Will fail as exit 127 next Tuesday. Verdict: TICKING TIME BOMB — will fail next Tuesday 08:00. no.alai.email-body-integrity PID: - | Exit: 0 | Status: calendar_ok | Schedule: daily 03:00 Script: ~/system/tools/email-body-integrity-check.js — EXISTS Produces: ~/system/logs/email-integrity.log Verdict: HEALTHY. 5. 
Daemon-Fleet-Watchdog State

File: ~/system/state/daemon-fleet-status.json
Generated: 2026-05-09T18:33:52Z (approx 2h15m before this audit)

Watchdog summary from file:
- total: 148
- running: 47 (keepalive processes alive)
- calendar_ok: 74 (last scheduled run exit 0)
- down: 3 (down_exit_0: autocoder-ui, draft-sender, orchestrator-http)
- err: 20 (non-zero exit codes)

Watchdog accuracy notes:
- Watchdog correctly identifies 20 erroring daemons, but exit codes are internally translated (256 = bash exit 1; 32512 = bash exit 127, i.e. the shell exit code multiplied by 256).
- Watchdog does NOT cover all 168 launchctl rows; 4 daemons are marked not_loaded (lightrag-migrate-pump, lightrag-outbox-ingest, lightrag-watchdog, rdap-audit-quarterly).
- com.alai.mem0-server shows last_exit: 15 (SIGTERM of prior instance) but state: running. This is correct: the current instance is healthy.
- com.john.slack-bot shows running/pid 18046 but last_exit: 256. launchd records the last crash before the current keepalive restart; the process is currently alive.

Open Questions

Pi-orchestrator mock mode: Why is alai-config-mock.json being loaded instead of real config? Is the Planka/MC API intentionally offline, or is the config resolver broken? The orchestrator is spinning idle.

LightRAG CF credentials: Vaultwarden ETIMEDOUT in rag-drain-worker. Is /tmp/bw-session stale? Is Vaultwarden (vault.basicconsulting.no) reachable? This single broken auth is deadlocking the entire RAG ingest pipeline (946 items queued).

B2 storage cap: 403 storage_cap_exceeded on Backblaze B2. Is this a billing cap that needs to be raised in the B2 console? Litestream is still replicating but the nightly snapshot job fails.

Five deleted scripts: Who deleted pi-orch-health.sh, cost-daily-report.sh, daily-planning.sh, legal-docs-azure-sync.sh, mcp-health-check.sh? Were they intentionally removed (deprecated)? If deprecated, the plists should be unloaded. If accidental deletion, restore from backup.
Duplicate alaiml-retrain plist: Plist exists in BOTH system/config/launchagents AND Library/LaunchAgents . Two crons would fire. Which is canonical? com.john.orchestrator-http duplicate: Identical to com.alai.orchestrator-bridge (same script, same port). orchestrator-http shows down_exit_0 because bridge already bound the port. Dead plist. LightRAG health-*.md circular feedback: The lightrag-monitor evidence files are being watched by rag-fsevents-adapter , which tries to enqueue them into LightRAG — a monitoring artifact feeding back into the broken pipeline it monitors. Slack bot silent 300 min: No incoming Slack messages for 5h at audit time. Is anyone sending messages? Or is the Socket Mode token scope broken for receiving? Highest-Leverage Fix Candidates (audit-level only) Priority 1 — Unlocks entire RAG pipeline (946 items unblocked) Fix rag-drain-worker CF Access credentials: ensure Vaultwarden item "LightRAG-CF-Access" exists and /tmp/bw-session is valid. One credential fix unblocks bookstack-adapter + mc-adapter + fsevents-adapter simultaneously. Priority 2 — Restore cost visibility (10-day blind spot) Restore or recreate ~/system/tools/cost-daily-report.sh . Last output was 2026-04-29. CEO-visible reporting dark for 10 days. Priority 3 — Fix orchestrator mock mode Determine why pi-orchestrator loads mock config. If Planka/MC API is down, restore it. If config resolver is broken, fix alai-config.js . The orchestration kernel is running but doing nothing. Priority 4 — Raise B2 storage cap B2 bucket alai-studio-backup has hit its cap. Nightly database snapshots are not landing. This is a billing action in the Backblaze console, not a code fix. Priority 5 — Unload dead plists (5 scripts deleted) com.alai.pi-orch-health , com.alai.cost-daily-report , com.alai.daily-planning , com.john.legal-docs-azure-sync , com.john.mcp-health-check should either have scripts restored or be unloaded from launchd. 
legal-docs-azure-sync and mcp-health-check have KeepAlive.Crashed=true creating infinite restart loops. Priority 6 — Unload com.john.orchestrator-http duplicate plist Dead shadow of orchestrator-bridge. Causes confusion in watchdog counts. Priority 7 — Restore weekly-planning.sh before next Tuesday Script missing but plist active. Will fail exit 127 at 08:00 next Tuesday. Priority 8 — Fix phantom-link-detector.js missing script com.alai.chain-phantom-detector runs every 15min calling a script that does not exist. High-frequency failure (96 times/day). Verifier Autonomy Audit AI Factory Audit — Plan Task 2.2: Verifier Autonomy Date: 2026-05-09 Auditor: Martin Kleppmann (CodeCraft) Classification: AUDIT-ONLY — read-only, no mutation, no live invocation VERDICT SUMMARY (up front) Autonomy verdict: ABSENT The /verify-fix-loop skill is fully specified and internally consistent, but it has zero wiring into any automated trigger path. CEO is the de-facto verifier for every task that reaches mc.js ready . The skill exists only as a manually-invoked slash command. 1. 
End-to-End Trace of /verify-fix-loop

Source: ~/.claude/skills/verify-fix-loop/SKILL.md

Flow map

Caller (John / human) invokes: /verify-fix-loop mc_id= spec_path=
  │
  ▼
SKILL orchestrates in main conversation thread (not a sub-agent itself)
  │
  ├─ mkdir -p /tmp/verify-fix-loop-/ (EVIDENCE_DIR)
  │
  ▼
LOOP (max 3 iterations):
  │
  ├─ Step A: Task(subagent_type=verifier OR general-purpose+persona)
  │    prompt = verifier brief template (inline in SKILL.md)
  │    verifier writes: EVIDENCE_DIR/verifier-loop.md (mandatory)
  │                     /tmp/verifier-feedback-.md (if CONFIDENCE=FEEDBACK)
  │
  ├─ Step B: Parse STATUS + CONFIDENCE from verifier output
  │
  ├─ Step C: Branch
  │    PERFECT / VERIFIED → write SUMMARY.md (SUCCESS), exit
  │    PARTIAL → if high_stakes: ESCALATE; else: SUCCESS_WITH_NOTES, exit
  │    FAILED → ESCALATE (harness broken)
  │    FEEDBACK:
  │      if high_stakes or budget exhausted → ESCALATE
  │      else →
  │
  ├─ Step D: Task(subagent_type=fix-builder OR general-purpose+persona)
  │    reads /tmp/verifier-feedback-.md
  │    applies prescribed edits to spec_path via Edit tool
  │    returns APPLIED: / PARTIAL:/ / COULD_NOT_APPLY:
  │
  └─ LOOP_INDEX += 1 → back to Step A

Domain escalation policy
- docs, system, refactor, polish: loops up to MAX_LOOPS (default 3)
- security, finance, legal, deploy, infra, unknown: ESCALATE on first FEEDBACK (no autonomous correction)

Loop budget
- Default MAX_LOOPS = 3
- Hard cost cap: $5 per skill invocation
- Per-loop cost estimate: $0.40–0.60 (Sonnet)
- Worst case: 3 × $0.60 = $1.80

Termination conditions
- CONFIDENCE in {PERFECT, VERIFIED} → SUCCESS
- CONFIDENCE == PARTIAL + not high_stakes → SUCCESS_WITH_NOTES
- Budget exhausted (LOOP_INDEX == MAX_LOOPS with FEEDBACK) → ESCALATE
- High-stakes domain with FEEDBACK on first iteration → ESCALATE
- Any FAILED confidence → ESCALATE (harness broken)
- fix-builder returns COULD_NOT_APPLY → ESCALATE
- MC status changes to done/cancelled mid-loop → ABORT silently
- Cost estimate exceeds $5 → ESCALATE before next iter

Entry points (who can call this)

The SKILL.md lists
trigger phrases: "verify-fix-loop", "auto-verify and fix", "verifier loop", "ne idi preko mene", "loop until pass". All trigger phrases are designed for human invocation in a conversation. No programmatic entry points exist. 2. Auto-Invocation Analysis — The Central CEO Question pi-orchestrator.js Grep result: ZERO matches for verify-fix-loop , verifier , fix-builder in ~/system/kernel/pi-orchestrator.js . The orchestrator's post-completion flow ( reportCompletion function, lines ~3781–3930) does: Hallucination detection (regex-based detectHallucination ) Proof-of-work check (GOTCHA file or response length) qa-19 Check #20 (endpoint verification, if configured) Postflight marker write to ~/system/state/postflight-cleared-.json None of these steps call the verifier, fix-builder, or verify-fix-loop skill. The "postflight" referenced in pi-orchestrator is a file marker write, NOT the /task-postflight skill. task-postflight skill Grep result: ZERO matches for verify-fix-loop , verifier , fix-builder in ~/.claude/skills/task-postflight/SKILL.md . The /task-postflight skill dispatches Angie Jones (Proveo) for AC-checklist QA, not the atomic-claim verifier. These are parallel, non-overlapping verification patterns: Proveo = human-readable AC checklist with pass/fail verdicts per item Verifier = atomic claim decomposition with machine-verified proof citations Hooks directory Grep result: Only archive files matched. No active hook in ~/.claude/hooks/ references verify-fix-loop , verifier , or fix-builder . Active hooks audited: liveness-claim-validator.sh — PostToolUse on Write/Edit; checks for bare liveness claims in memory/spec/agent files. Not related to verifier dispatch. mc-ready-gate.sh — wrapper for mc.js ready ; runs ZAKON #30 direct-probe gate + evidence-contract-validator. Does NOT invoke verify-fix-loop. evidence-contract-validator.sh — validates verdict JSON schema + sha256 chain. Shell-based, no agent dispatch. 
cross-session-claim-gate.sh , session-task-lock-gate.sh , plan-completeness-gate.sh , pre-dispatch-gate.sh — none reference verifier. Daemon fleet Grep result: ZERO matches for verify-fix-loop , verifier , fix-builder in ~/system/daemons/ . LaunchAgents Grep result: ZERO matches in ~/Library/LaunchAgents/ . VERDICT: ABSENT The verify-fix-loop and its constituent agents (verifier, fix-builder) have zero automated entry points. The only invocation path is a human typing a trigger phrase in a Claude Code conversation. CEO is always in the loop because there is no loop without CEO. 3. Tool-Surface Security Check Verifier (read-only) Definition file: ~/.claude/agents/verifier.md Declared tools: tools: Read, Grep, Glob, Bash The tools: field includes Bash. This is the critical point. The agent definition does NOT use a tool whitelist that removes Write/Edit/Task at the API level. It relies entirely on prompt-level enforcement ("Enforcement is prompt-only — this rule is yours to honor. You are the gatekeeper."). The verifier.md explicitly states this. Permitted Bash commands (per prompt whitelist in verifier.md): cat, head, tail, wc, ls, file, stat diff, git read-only subcommands grep, rg, find (via tool preferred) jq, node -e (read-only expression) node ~/system/tools/mc.js show (read-only subcommands only — NEVER add|start|done|ready|update|pause|cancel) gh pr view, gh issue view, gh api -X GET sqlite3 -readonly, psql SELECT only curl -sI (HEAD), curl -s GET (never POST/PUT/DELETE) bash -n, shellcheck, node --check (dry-run linters) Escape paths documented: The prompt says "NEVER run: rm, mv, cp (to non-/tmp/), chmod, chown, ln" and "Redirections that write outside /tmp/verifier-* or /tmp/-evidence/: >, >>, tee to other paths". This is prompt-level enforcement only. A model following instructions could still run bash -c "echo foo > ~/system/some-file.txt" — the agent framework does not block it at the API tool-call level. 
The tools: Bash declaration gives the agent full shell access; the prompt whitelist is self-enforced. Feedback file writes are permitted to /tmp/verifier-feedback-.md specifically. Verdict on verifier tool isolation: Prompt-enforced, not API-enforced. Read-only is a behavioral constraint, not a structural constraint. The risk is manageable for a trusted model, but not cryptographically bounded. Fix-builder (write-only, scoped) Definition file: ~/.claude/agents/fix-builder.md Declared tools: tools: Read, Edit, Grep, Glob The fix-builder tool list explicitly excludes: Write (no new file creation) Bash (no test runs, deploys, builds, git ops) Task (no further dispatch) This is stronger isolation than the verifier: the tools: field at the agent definition level excludes Bash and Write. If the agent framework enforces declared tools as a whitelist, fix-builder genuinely cannot run shell commands or create new files. It can only read existing files (Read, Grep, Glob) and apply edits to existing files (Edit). Gap: Fix-builder cannot create new files even when feedback prescribes it. The skill handles this: "If the feedback prescribes creating a new file, mark that fix as COULD_NOT_APPLY" — the loop escalates. This is a by-design limitation, not a bug. Verdict on fix-builder tool isolation: Structurally scoped (Bash and Write excluded from tools declaration). This is the correct pattern. The verifier should be refactored to match this approach. 4. Synthetic Dry-Trace Selected task: MC #99389 — "Refactor /mehanik skill to progressive-disclosure pattern" (status: review, owner: pi-orchestrator) This task was marked mc.js ready (now review ) after pi-orchestrator completed it. 
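As context for the trace of this task, the Step C branching and domain policy from the flow map in section 1 can be condensed into one decision function. This is a minimal Python sketch for illustration only, not code from SKILL.md: the names `decide`, `Outcome`, `loops_used`, and `DISPATCH_FIX_BUILDER` are hypothetical, `loops_used` is assumed to count verifier iterations already run (1-based), and unrecognised confidence values are assumed to escalate. The $5 cost cap, the COULD_NOT_APPLY path, and the mid-loop abort are left out.

```python
from enum import Enum

# Domains the skill allows to loop autonomously; every other domain,
# including "unknown", is treated as high-stakes.
LOW_STAKES_DOMAINS = {"docs", "system", "refactor", "polish"}


class Outcome(Enum):
    SUCCESS = "SUCCESS"
    SUCCESS_WITH_NOTES = "SUCCESS_WITH_NOTES"
    ESCALATE = "ESCALATE"
    DISPATCH_FIX_BUILDER = "DISPATCH_FIX_BUILDER"  # hypothetical label for "go to Step D"


def decide(confidence: str, domain: str, loops_used: int, max_loops: int = 3) -> Outcome:
    """Map one verifier verdict to the next action (Step C of the flow map)."""
    high_stakes = domain not in LOW_STAKES_DOMAINS
    if confidence in ("PERFECT", "VERIFIED"):
        return Outcome.SUCCESS
    if confidence == "PARTIAL":
        return Outcome.ESCALATE if high_stakes else Outcome.SUCCESS_WITH_NOTES
    if confidence == "FEEDBACK":
        # High-stakes domains never self-correct; low-stakes ones stop at the budget.
        if high_stakes or loops_used >= max_loops:
            return Outcome.ESCALATE
        return Outcome.DISPATCH_FIX_BUILDER
    # FAILED (harness broken) or any unrecognised value (assumption): escalate.
    return Outcome.ESCALATE
```

For MC #99389 the inferred domain is docs, so under these assumptions a first-iteration FEEDBACK verdict would route to fix-builder, while the same verdict on a third iteration would escalate.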
What WOULD have happened if /verify-fix-loop were auto-invoked:

Step 0: trigger fired when pi-orchestrator called mc.js ready #99389
  → /verify-fix-loop mc_id=99389 spec_path=~/.claude/skills/mehanik/SKILL.md domain=docs (inferred from skill file path) max_loops=3

Step A (iter 1): dispatch verifier
  - verifier reads ~/.claude/skills/mehanik/SKILL.md
  - verifier reads MC #99389 ACs via mc.js show 99389
  - verifier decomposes ACs into atomic claims:
      (a) SKILL.md exists and is < N lines (tier-1 constraint)
      (b) references/agent-brief.md exists
      (c) references/failure-modes.md exists
      (d) Skill tool callable post-refactor
  - verifier probes each atom with Read/Glob/Bash

Step B: parse CONFIDENCE
  If all files exist and SKILL.md is within limits → PERFECT → SUCCESS
  If any reference file missing → FEEDBACK

Step D (if FEEDBACK): dispatch fix-builder
  - fix-builder reads /tmp/verifier-feedback-99389.md
  - applies Edit to create missing sections or correct line counts

Step A (iter 2): re-verify → Step C: likely PERFECT → write SUMMARY.md → SUCCESS

Because fix-builder has no Write tool (section 3), a reference file that is missing outright could not be created in Step D; that fix would come back as COULD_NOT_APPLY and the loop would escalate instead of reaching iteration 2.

Actual closure path used for MC #99389: The task is in review status. Looking at the review queue (25+ tasks in review), there is no evidence of verifier invocation. The closure path was: pi-orchestrator marked ready → task sits in review queue → CEO/John is the implicit reviewer. This is the CEO-as-verifier pattern the CEO wants to eliminate.

5. Comparison with Existing Patterns

liveness-claim-validator.sh
- Trigger: PostToolUse hook, fires on every Write/Edit/MultiEdit tool call
- Scope: Memory files, spec files, agent definition files matching 4 path patterns
- Mechanism: Shell script reads tool input JSON from stdin, scans written content for bare liveness claims, blocks write if violations found (exit 2)
- Auto-invoked: YES, unconditionally, at the Claude Code hook level

Why verify-fix-loop is NOT similarly hooked: The liveness validator is a passive scan that reads content already being written.
The verify-fix-loop requires active agent dispatch (spawning sub-agents), which cannot be done from a shell hook. Shell hooks can block tool calls; they cannot spawn conversational agents. This is the fundamental architectural gap: hooks can intercept tool calls synchronously, but spinning up a verify-fix-loop requires an async agent conversation that the hook system cannot initiate. evidence-verifier agent File: ~/.claude/agents/evidence-verifier.md Declared tools: (not in scope of this read — but confirmed the agent exists) Auto-invoked: YES, but differently — it is called by mc-ready-gate.sh via the evidence-contract-validator.sh pathway. However, the evidence-contract-validator.sh is a pure shell script that validates JSON schema + file hashes — it does NOT dispatch the evidence-verifier agent. The agent definition exists for manual invocation. The shell script performs a deterministic (non-LLM) validation that is auto-invoked at mc.js ready time. Pattern difference: The evidence-verifier pattern uses a shell script as the auto-invoke layer (deterministic, no LLM), with the agent definition as a fallback for edge cases. The verify-fix-loop requires LLM reasoning at every step, making shell-script auto-invocation insufficient. 6. Gap Analysis and Fix Proposal (Audit-Level Only) Root cause of the gap The verify-fix-loop was designed top-down as a skill (manual invocation). The liveness-claim-validator was designed bottom-up as a hook (automatic). There is no bridge layer that translates "mc.js ready event" → "spawn verify-fix-loop conversation". The missing component is a postflight agent dispatcher : something that observes the ready event and spawns a verify-fix-loop session as a sub-agent task. 
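One file-based shape such a bridge could take is a hand-off directory: whatever marks a task ready drops a small JSON record, and whatever session is able to dispatch agents later claims it. The Python sketch below is illustrative only and does not describe existing code. The function names, the per-task file name `vfl-trigger-{mc_id}.json`, and the directory argument are assumptions; the payload fields (mc_id, spec_path, domain) are the ones this audit names for the trigger file. It covers only the hand-off, not the agent dispatch itself.

```python
import json
from pathlib import Path


def write_trigger(trigger_dir: Path, mc_id: int, spec_path: str, domain: str) -> Path:
    """Record that a task reached 'ready' so a dispatcher can pick it up later."""
    target = trigger_dir / f"vfl-trigger-{mc_id}.json"  # hypothetical naming scheme
    payload = {"mc_id": mc_id, "spec_path": spec_path, "domain": domain}
    # Write under a temporary name, then rename, so a poller never reads a
    # half-written file (the glob below does not match the .tmp name).
    tmp = target.with_suffix(".json.tmp")
    tmp.write_text(json.dumps(payload))
    tmp.replace(target)
    return target


def claim_triggers(trigger_dir: Path) -> list[dict]:
    """Read and delete every pending trigger; return payloads ordered by mc_id."""
    claimed = []
    for path in trigger_dir.glob("vfl-trigger-*.json"):
        claimed.append(json.loads(path.read_text()))
        path.unlink()  # claimed triggers are removed so they are not dispatched twice
    return sorted(claimed, key=lambda t: t["mc_id"])
```

The writer side is cheap enough to run from a synchronous script; the open design question is which long-lived process calls the claiming side and is actually able to spawn the verifier.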
Minimum wiring needed

Option A: PostToolUse hook on mc.js ready

| Element | Detail |
| --- | --- |
| File to modify | ~/.claude/hooks/mc-ready-gate.sh (already fires on mc.js ready) |
| Addition location | After line 196 (all gates passed; currently execs mc.js directly) |
| Trigger | After mc.js ready succeeds, spawn verify-fix-loop as a background Task |
| Mechanism | mc-ready-gate.sh would write a trigger file to /tmp/vfl-trigger-.json containing mc_id + spec_path + domain; a daemon polls this file |

The problem: mc-ready-gate.sh is a synchronous shell script. It cannot spawn a conversational agent (Task dispatch requires a running Claude Code session). It can only write a file.

Option B: pi-orchestrator.js postflight hook (most natural wiring point)

| Element | Detail |
| --- | --- |
| File to modify | ~/system/kernel/pi-orchestrator.js |
| Addition location | Inside reportCompletion() function, after line ~3900 (after QA gate passes) |
| What to add | A call to write /tmp/vfl-trigger-.json with task metadata |
| Trigger | The daemon below polls this and dispatches |

Option C: /task-postflight skill modification (cleanest for H-tasks)

| Element | Detail |
| --- | --- |
| File to modify | ~/.claude/skills/task-postflight/SKILL.md |
| Addition location | After Section 2 (PROVEO VALIDATION DISPATCH), add Section 2b |
| What to add | Conditional: if Proveo returns PASS AND task domain is docs/system/refactor, dispatch /verify-fix-loop before writing the postflight marker |
| Trigger | Manual invocation of /task-postflight already exists for H/BLOCKER tasks |
| Advantage | Stays within the skill conversation context: Task dispatch works naturally here |

Recommended wiring (Option C + Option B trigger file):

Immediate (no new infrastructure): Add a Section 2b to /task-postflight SKILL.md that dispatches /verify-fix-loop when Proveo passes and domain is non-high-stakes. This works today for all tasks that go through /task-postflight.

Systematic (covers tasks that bypass /task-postflight): Add a trigger file write to pi-orchestrator.js reportCompletion().
A lightweight daemon polls /tmp/vfl-trigger-*.json files and — when a pi-orchestrator session is active — dispatches the verify-fix-loop skill via the existing Claude Code session. Loop budget recommendation Keep MAX_LOOPS = 3 (matches SKILL.md default) For postflight auto-invocation, restrict to docs , system , refactor , polish domains only Hard cap: $5 per invocation (already in SKILL.md) Add timeout: 5 minutes wall-clock before auto-escalation to CEO Escalation path when budget exhausted Write SUMMARY.md to EVIDENCE_DIR with full loop history Call node ~/system/tools/slack.js send alerts "[VFL-ESCALATED] MC # — N/MAX loops used, last verdict: " (Slack, not CEO direct) Set task status to blocked via mc.js block with reason "verify-fix-loop budget exhausted — human review needed" John receives Slack alert and decides: (a) override + mark done, (b) dispatch additional builder, (c) extend budget via [CEO_APPROVED] token Open Questions Tool-level enforcement for verifier: Should the verifier's tools: field be changed from Read, Grep, Glob, Bash to Read, Grep, Glob (removing Bash) to achieve structural isolation matching fix-builder? This would break the verifier's ability to run curl -sI , git log , sqlite3 -readonly probes — which are core to its value. The tradeoff is behavioral (current) vs structural enforcement. Conversation context for auto-dispatch: Spawning a verify-fix-loop Task requires an active Claude Code conversation. If pi-orchestrator fires after a conversation closes, there is no context to spawn into. Does the system need a persistent "factory session" that stays open to receive postflight dispatches? High-stakes domain detection: The SKILL.md defaults unknown domains to HIGH_STAKES (no autonomous correction). For auto-invocation, domain inference from spec path heuristics will frequently return unknown. Should the default be flipped to docs for auto-invoked postflight use cases? 
Proveo vs verifier: overlap management: /task-postflight already dispatches Proveo for AC-checklist QA. If verify-fix-loop is added as Section 2b, tasks will run both Proveo (AC checklist) AND verifier (atomic claims) sequentially. Is this the intended double-verification model, or should one replace the other for certain task types? mc.js ready event vs pi-orchestrator ready: Some tasks are marked ready by human John ( node ~/system/tools/mc.js ready ), others by pi-orchestrator after build completion, and others by /task-postflight . The auto-invocation wiring point differs for each path. A comprehensive solution needs to intercept all three paths. Evidence Metadata Item Value Files read 8 Grep/Bash tool calls 12 Live agent invocations 0 Mutations 0 Wall-clock (estimated) ~18 min Key source files ~/.claude/skills/verify-fix-loop/SKILL.md , ~/.claude/agents/verifier.md , ~/.claude/agents/fix-builder.md , ~/.claude/skills/task-postflight/SKILL.md , ~/system/kernel/pi-orchestrator.js (lines 3730–3930), ~/.claude/hooks/mc-ready-gate.sh , ~/.claude/hooks/liveness-claim-validator.sh BUILD-BLUEPRINT Discipline 2.3 — BUILD-BLUEPRINT Discipline Audit Date: 2026-05-09 Auditor: sentinel-ba Scope: 17 BUILD-BLUEPRINT.md files + Mehanik gate enforcement 1. 
Per-Blueprint State Matrix

| # | Path | Bytes | Lines | Last Modified | Status | Project Liveness |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | ~/projects/internal/basicfakta/BUILD-BLUEPRINT.md | 11,193 | 323 | 2026-04-29 | SUBSTANTIAL | Last commit 10d ago (auto-backup only) |
| 2 | ~/projects/bookstack-api/BUILD-BLUEPRINT.md | 12,366 | 352 | 2026-04-29 | SUBSTANTIAL | Last commit 5 weeks ago (auto-backup) |
| 3 | ~/projects/pa/BUILD-BLUEPRINT.md | 13,238 | 354 | 2026-04-29 | SUBSTANTIAL | Last commit 10d ago (auto-backup) |
| 4 | ~/projects/alai-system/BUILD-BLUEPRINT.md | 3,520 | 75 | 2026-04-30 | THIN (75 lines, not stub) | Last commit 6d ago (auto-backup) |
| 5 | ~/business/.../products/Tok/BUILD-BLUEPRINT.md | 27,080 | 637 | 2026-04-27 | SUBSTANTIAL | Last commit 10d ago: gradle-wrapper CI fix; active |
| 6 | ~/business/.../products/BasicFakta/BUILD-BLUEPRINT.md | 12,865 | 332 | 2026-03-07 | STALE (63d, no recent activity) | Last commit 9 weeks ago: test/CI only |
| 7 | ~/business/.../products/Lobby/BUILD-BLUEPRINT.md | 18,707 | 396 | 2026-03-09 | STALE (61d, repo semi-active) | Last commit 6 weeks ago: feat/RLS |
| 8 | ~/business/.../products/Drop/BUILD-BLUEPRINT.md | 8,846 | 208 | 2026-05-07 | PRESENT (208 lines, recently updated) | Last commit 63 min ago: MOST ACTIVE |
| 9 | ~/business/.../products/DropSrbija/BUILD-BLUEPRINT.md | 10,657 | 386 | 2026-05-08 | SUBSTANTIAL | Last commit 2d ago; git-repo shared with Gotiva (anvil-fs migration) |
| 10 | ~/business/.../products/Plock/BUILD-BLUEPRINT.md | 24,175 | 512 | 2026-04-16 | STALE (23d, repo dormant) | Last commit 5 weeks ago: smoke tests only |
| 11 | ~/business/.../products/Gotiva/BUILD-BLUEPRINT.md | 27,112 | 556 | 2026-03-11 | STALE (59d) | Last commit 2d ago was chore/anvil-fs (migration commit, not product work) |
| 12 | ~/business/.../products/Bilko/BUILD-BLUEPRINT.md | 38,303 | 530 | 2026-05-08 | SUBSTANTIAL | Last commit 10 min ago: extremely active |
| 13 | ~/business/.../sales/outreach/sintef/BUILD-BLUEPRINT.md | 1,943 | 49 | 2026-04-27 | TEMPLATE/STUB (49 lines, 1,943 bytes; under threshold) | Last commit 2d ago was chore/anvil-fs only |
| 14 | ~/business/.../web/BUILD-BLUEPRINT.md | 4,636 | 110 | 2026-04-27 | THIN | Last commit 2d ago: feat/redirect |
| 15 | ~/business/.../finance/akershus-fylke/BUILD-BLUEPRINT.md | 1,486 | 33 | 2026-05-08 | TEMPLATE/STUB (33 lines; per MC #99886 Decision 7: "move akershus OUT of products/") | Last commit 2d ago chore only |
| 16 | ~/clients-external/snowit-site/BUILD-BLUEPRINT.md | 3,427 | 67 | 2026-04-28 | THIN | Last commit 2 hours ago: active gitignore hygiene |
| 17 | ~/clients-external/lumiscare-variants/lumiscare/BUILD-BLUEPRINT.md | 37,426 | 637 | 2026-05-09 | SUBSTANTIAL | Last commit 2 hours ago: security fix; MOST RECENTLY UPDATED |

Summary counts (one Status per file, as assigned in the matrix; total 17)
- SUBSTANTIAL (>10,000 bytes, real content): 7 (basicfakta, bookstack-api, pa, Tok, DropSrbija, Bilko, lumiscare)
- PRESENT / ADEQUATE (200–10,000 bytes, real content): 1 (Drop)
- THIN (< 5,000 bytes, functional but sparse): 3 (web, snowit-site, alai-system)
- TEMPLATE/STUB (< 2,000 bytes or <50 lines with no real content): 2 (sintef, akershus-fylke)
- STALE (>30d without update, repo active): 4 (BasicFakta (63d), Lobby (61d), Gotiva (59d), Plock (23d))

Note: STALE classification applies where the product repo has had meaningful commits but the blueprint has not been updated. Plock is borderline (23d, repo dormant). All four STALE blueprints are above 10,000 bytes, so they would otherwise count as SUBSTANTIAL by size.

2. Mehanik Gate Truth Check

What Mehanik requires (tool-verified from ~/.claude/agents/mehanik.md)

Phase T of the GOTCHA workflow states:
- ls {project_path}/BUILD-BLUEPRINT.md: MUST exist
- Read the file (confirm contents match task scope)
- Circuit Breaker #2: "BUILD-BLUEPRINT.md not read — evidence of Read call required in session"

Assessment: The requirement is FORMALLY A HARD BLOCK. CB#2 fires if the blueprint is not read (not just present). The hook ~/.claude/hooks/pre-dispatch-gate.sh also enforces a secondary check: it runs blueprint-check.js against the project path stored in the Mehanik cleared token and blocks dispatch if score < 60.

Enforcement quality issues identified

Issue A: Hook is warn-only for missing MC ID. When the Task prompt has no MC #NNNN pattern, the hook exits 0 with a stderr warning only.
Tasks dispatched without an MC ID bypass both the Mehanik cleared-token check and the blueprint-score gate entirely.

Issue B: mehanik_session_id: unknown in all inspected tokens. Both tokens inspected (99886 and 100150) show mehanik_session_id: unknown. The cleared token was written, proving Mehanik ran, but the session binding is absent, meaning the hook cannot verify that the same session cleared the task vs. a stale token from a prior session. Token expiry (4h) partially mitigates but does not eliminate this gap.

Issue C: Blueprint score threshold set at 90 but tokens show WARN at 80 and 65. Both inspected dispatches show blueprint_check_result: WARN with scores below the 90 threshold, yet dispatch proceeded. The hook's blueprint-check.js integration exists (~/system/tools/blueprint-check.js is present), but the pre-dispatch hook only exits 2 (block) if verdict is NOT_READY. The WARN path allows dispatch. The 90-point threshold in the token file is never enforced as a gate.

Issue D: Token expiry not enforced in hook. The hook does not parse expires_at from the cleared file. A token written 23 hours ago (within a session restart) would still pass. The 4h expiry in the token is advisory metadata only.

Sample of 5 recent dispatches

| MC ID | Cleared token exists? | Blueprint cited in token? | Blueprint score | Dispatch allowed? |
| --- | --- | --- | --- | --- |
| 99886 | YES | Bilko/BUILD-BLUEPRINT.md | 80 (WARN) | YES (WARN not blocked) |
| 100150 | YES | Drop/BUILD-BLUEPRINT.md | 65 (WARN) | YES (WARN not blocked) |
| 100150 | YES | Drop/DEPLOY-MAP.md cited | — | YES |
| 99910 (MC Claim Protocol) | YES (/tmp/mehanik-cleared-99910) | — | Not inspectable (token may have expired and been overwritten) | YES |
| 99886 | YES | Bilko, per DOD evidence: "Mehanik CLEAR /tmp/mehanik-cleared-99886" | 80 | YES |

Token count in /tmp: 113 mehanik-cleared tokens present (range: #10063 to #100173). Volume indicates Mehanik is running regularly; it is not being bypassed entirely.

Gate verdict: PARTIALLY REAL. Blueprint presence is hard-blocked.
Blueprint read is required and recorded in the token. However, the score-based quality gate (threshold 90) is advisory — WARN scores pass. The session-binding gap means cleared tokens could theoretically be reused across sessions. The missing-MC-ID path is a complete bypass vector. 3. Blueprint-vs-Reality Drift Score Bilko (MOST ACTIVE) Blueprint claims: "API Framework: Ktor 3.4.0 / Kotlin 2.3.0 on JVM 25" — Cloud Run deployed "Database: PostgreSQL 15" — Cloud SQL "Status: MVP dev — frontend implemented with mock data, backend built" Actual state (tool-verified): gcloud run services list shows: bilko-api-stage , bilko-api-demo , bilko-web-stage , bilko-web-demo , bilko-intesa-demo all TRUE; bilko-staging-api FALSE (unhealthy) Drop is on Azure VM; Bilko is on GCP Cloud Run — consistent with blueprint claim Blueprint says "Status: MVP dev" but there are 5 live Cloud Run services including bilko-intesa-demo (suggesting Intesa bank integration demo exists) Drift score: LOW-MEDIUM. Infrastructure matches. The "MVP dev with mock data" status language is understated given live deployed services. Blueprint was last updated 2026-05-08 (yesterday) — reasonably current. Drop (MOST RECENTLY COMMITTED) Blueprint claims: "Azure VM vm-drop-prod (Sweden Central)" + docker-compose "Database: PostgreSQL 16 via Drizzle ORM in docker-compose on Azure VM" Actual state (tool-verified): curl -sI https://app.getdrop.no returns HTTP/2 200 — production is live Response headers show nonce -based CSP (Next.js pattern) — consistent with Next.js 15 claim Blueprint was rewritten 2026-04-30 to fix the AWS phantom; it now correctly reflects Azure VM Most recent commit (63 min ago): staging CI/CD OIDC fix — blueprint does NOT mention staging VM yet (deploy token shows vm-drop-stage staging path) Drift score: LOW. Production deployment matches blueprint. Staging environment exists in deployment reality but blueprint only covers production — minor documentation lag. 
Tok (ACTIVE BUT NO RECENT BLUEPRINT UPDATE) Blueprint claims: "Database: PostgreSQL 15 (Cloud SQL)" "PSD2 Cert: QWAC/QSEAL — DigiCert/GlobalSign — mTLS for Croatia" "Status: Core implementation complete — all 8 development gates DONE" Actual state: No gcloud run services list results for Tok (not visible in current GCP project scope) Blueprint last updated 2026-04-27 (12d ago); last meaningful commit was 10d ago (gradle-wrapper fix unblocking CI since March) The gradle-wrapper CI was broken since March 2026 — meaning "all 8 gates DONE" may be technically true for code but CI was broken for 6+ weeks Drift score: MEDIUM. The product-gate claim is technically accurate but CI was silently broken for 2+ months — a fact not reflected in the blueprint status line. PSD2 cert claim is unverifiable without SSH to the Tok deployment. 4. Cross-Cutting Findings No holding-company blueprint ~/business/ALAI-Holding-AS/BUILD-BLUEPRINT.md — ABSENT. There is no top-level document explaining how the portfolio of products relates, shared infrastructure, or cross-product dependencies (e.g., Tok feeding Bilko). Each product is an island. This is a gap for new agents onboarding to the system who need portfolio-level context. Blueprint versioning Blueprints ARE git-tracked in their respective product repos. git log --follow -- BUILD-BLUEPRINT.md on Bilko shows at least 3 tracked commits; Drop shows the AWS-to-Azure canonical rewrite is a committed event with a clear commit message and MC reference. This is genuine version history — drift can be diagnosed by diffing commits. However, there is no automated drift alert. Blueprint age vs. commit recency is never surfaced to John or CEO unless a sentinel audit runs manually. 
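The missing drift alert comes down to comparing two dates per product: when the blueprint was last modified and when the repo last received a product commit. The Python sketch below shows that comparison using the audit's ">30d without update" rule. It is an illustration, not an existing tool: the function name, field names, and the idea of passing dates in directly are assumptions, and deciding which commit counts as product work (rather than an auto-backup or migration chore) is left to the caller.

```python
from datetime import date

STALE_AFTER_DAYS = 30  # the audit's ">30d without update" rule


def blueprint_drift(blueprint_modified: date, last_product_commit: date,
                    today: date, stale_after_days: int = STALE_AFTER_DAYS) -> dict:
    """Compare blueprint age against commit recency for one product."""
    age_days = (today - blueprint_modified).days                # how old the blueprint is
    lag_days = (last_product_commit - blueprint_modified).days  # commits since last update
    # Stale only when the blueprint is old AND the repo moved on after it was written.
    stale = age_days > stale_after_days and lag_days > 0
    return {"age_days": age_days, "lag_days": lag_days, "stale": stale}


# Lobby: blueprint modified 2026-03-09; "6 weeks ago" approximated as 2026-03-28.
lobby = blueprint_drift(date(2026, 3, 9), date(2026, 3, 28), today=date(2026, 5, 9))
print(lobby["age_days"], lobby["stale"])  # 61 True
```

Run on a schedule against each blueprint path, a check like this would surface the STALE rows from the matrix without waiting for a manual sentinel audit.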
Tenants without blueprints ~/system/ — has ~/system/BUILD-BLUEPRINT.md (EXISTS — confirmed) ~/personal/ — NO BLUEPRINT (expected: personal scope, not a product) ~/clients-external/ — only snowit-site and lumiscare are covered; MEDON client ( ~/business/ALAI-Holding-AS/pipeline/CodeCraft/clients/MEDON/ ) has a CHANGELOG.md in its shopify-app but NO BUILD-BLUEPRINT.md. This is a Mehanik bypass vector for any MEDON dispatch. DropSrbija blueprint exists but the Gotiva blueprint is 59d stale — yet the git repo for both was recently touched (anvil-fs migration). This creates a false "recently updated" signal. CHANGELOG without BUILD-BLUEPRINT Within active project trees (excluding node_modules): MEDON shopify-app has a CHANGELOG.md without a blueprint. All node_modules CHANGELOG.md hits are false positives (dependency changelogs, not ALAI products). 5. Blueprint → Mehanik → Agent Dispatch Trace: MC #99886 Task: CI/CD Standardization — FAZA 2 — canonical refresh (Petter Graff) Mehanik ran? YES. Token /tmp/mehanik-cleared-99886 present. Timestamp: 2026-05-08T21:06:23.121Z . Blueprint cited? blueprint_read: /Users/makinja/business/ALAI-Holding-AS/products/Bilko/BUILD-BLUEPRINT.md This is the Bilko blueprint. The task is a system-wide canonical spec edit , not a Bilko-specific build task. The project path assigned was Bilko's path, which means Mehanik's blueprint check was anchored to Bilko even though the deliverables ( ~/system/specs/cicd-canonical-v3-drafts/ ) are system-level. This is a scope-mismatch in the Mehanik gate — the blueprint read is nominally satisfied but the product checked (Bilko) is not the target of the changes. Blueprint score: 80/100 (WARN). Dispatch allowed. Agent output referenced blueprint sections? The DOD evidence in MC #99886 references the task as a "system-wide canonical spec edit" and notes 5 issue-areas in the v3 drafts — none reference Bilko blueprint sections. 
The blueprint read appears to have been a gate-pass ritual, not a content-informing step. Dispatch outcome: Deferred (not dispatched to FlowForge) — "executive-side decision to defer flowforge run until parallel work coordinates." The Mehanik clear token was written but the agent run was held. This is the correct behavior per CEO decision, but it reveals that Mehanik clearance does not guarantee agent execution — it is one gate in a multi-gate flow. Trace verdict: Mehanik ran and wrote a token. The blueprint cited was topically mismatched (Bilko blueprint for a system-spec task). The blueprint score gate passed despite being below threshold. Agent was not dispatched (deferred). Blueprint content did not visibly inform the dispatch. 6. Open Questions Mehanik project_path heuristic: How does Mehanik determine which project_path to use when the task is cross-product or system-level? For #99886, Bilko was used for a system-spec task. Is this John's input, or Mehanik's inference? If inference, the blueprint check is unreliable for cross-cutting tasks. Score threshold enforcement: The blueprint_threshold_applied: 90 field in cleared tokens is never enforced as a hard gate. Drop scored 65 and dispatch was allowed. Should the threshold be lowered to match operational reality, or should the WARN-to-BLOCK escalation be implemented? Token reuse across sessions: mehanik_session_id: unknown in all inspected tokens. Is there a plan to enforce session binding? Without it, a cleared token from a prior CEO session could authorize a dispatch in a new context. Gotiva and Lobby stale blueprints: Both products are 59d+ stale. Are they in maintenance mode or abandoned? If active, their blueprints are Mehanik bypass risks for any dispatch — the gate will pass but Mehanik will be reading outdated architecture. MEDON client coverage: No BUILD-BLUEPRINT.md exists for the MEDON shopify-app. 
If John receives a MEDON task, Mehanik's Phase T will fire ls {project_path}/BUILD-BLUEPRINT.md → BLOCKED. Is the MEDON client expected to receive blueprint coverage, or is it out of scope? 7. ROI Lens (sentinel-ba) Is the blueprint pattern earning its overhead? Direct value delivered: Blueprint presence as a Mehanik gate prerequisite has prevented scope hallucination at the dispatch level. The 113 mehanik-cleared tokens in /tmp represent 113 gate events where someone was forced to confirm a blueprint existed and was read. This is a real forcing function. The Drop AWS phantom rewrite (MC #10353) is a concrete example where the blueprint served as the canonical source of truth that agents were required to consult — and where a discrepancy (aspirational AWS docs treated as ground truth) was detected and corrected with a committed blueprint update. The Bilko blueprint (38KB, 530 lines, git-tracked) is the most thorough — it provides stack, ADRs, domain context, and deployment architecture. It has demonstrably prevented repeated infra hallucination on Bilko tasks. Overhead cost: 17 blueprints exist, 8 are genuinely substantial. The 2 stubs (sintef/akershus) add near-zero value and should be either expanded or removed (their Mehanik gate pass is hollow). Blueprint maintenance is manual and unalerted. Stale blueprints (BasicFakta 63d, Lobby 61d, Gotiva 59d) represent a risk: Mehanik passes the gate but the agent reads outdated architecture. The overhead of writing blueprints is paid; the staleness risk is not managed. The 90-point score threshold being advisory-only means the quality gate was designed but not deployed. This is overhead (blueprint-check.js runs on every dispatch) with only partial benefit (WARN path is free). Net verdict: POSITIVE ROI, but with a quality gap. The blueprint pattern is not theatrical — it is a genuine gate that has caught real hallucinations. 
However, the enforcement has two systemic weaknesses: (1) stale blueprints pass the gate silently, and (2) the score threshold is never enforced as a block. Fixing these two issues would cost approximately 1–2 hours of system work and would sharply increase the ROI-per-blueprint. Priority recommendations: HIGH — Enforce score threshold or lower it. Either block at score < 60 (matching current floor observed in practice), or officially downgrade the threshold. WARN-at-65-and-dispatch is worse than an honest 60-point threshold that blocks. HIGH — Add staleness alert. A daily check: if blueprint last-modified > 30d AND project has had commits in last 14d → surface warning to John. Zero build cost (can be added to existing daemon fleet). MED — Expand or remove stub blueprints. sintef (49 lines) and akershus-fylke (33 lines) are hollow gates. MC #99886 Decision 7 already proposes moving akershus out of products/ — execute this and either write a real blueprint or remove the gate. LOW — Session binding for Mehanik tokens. Low urgency given 4h expiry, but mehanik_session_id: unknown should be resolved to prevent cross-session token reuse on long-running tasks. Health Matrix 3.1 Health Matrix — Functional Probe Results Audit date: 2026-05-09 | Auditor: sentinel-tester | Phase: P3 (functional probes) Health Matrix Component Test Status Evidence (cmd + snippet) A1. mem0/qdrant POST write (audit-test user) PARTIAL curl http://localhost:9000/add -d '{"text":"audit-2026-05-09 ping test","user_id":"audit-test"}' → {"result":{"results":[]},"status":"added"} . Read-back via /search returned count:1 but results:[] — memory acknowledged as added but semantic search returned empty results. Write acknowledged; retrieve path unreliable. A2. LightRAG GET /health + POST /query WORKS curl localhost:9621/health → {"status":"healthy","core_version":"1.4.16","pipeline_busy":false} . POST /query {"query":"what is ALAI","mode":"naive"} → 3-paragraph narrative with citations. 
Full round-trip confirmed. A3. HiveDB intel SELECT COUNT(*) FROM intel WORKS sqlite3 ~/system/databases/hivemind.db "SELECT COUNT(*) FROM intel;" → 17560 . Latest entries dated 2026-05-09 19:11:24. Write-side confirmed via hivemind.js query "ALAI" — 8 results returned, including entries written today. Read AND write both functional. A3b. HiveMind writer Confirm write path exists WORKS node ~/system/agents/hivemind/hivemind.js query "ALAI" → 8 live results with today's timestamps. Writer: daemon-fleet-watchdog posts alerts; email-agent posts task alerts. Multiple live writers confirmed. A4. Chroma chroma-mcp responsive BROKEN curl http://localhost:8000/api/v1/collections → no response (empty). Port 8000 not listening. No chroma process found. chroma-mcp listed in settings.json but no running service. A5. .md auto-memory Fresh writes landing? PARTIAL ls -la ~/.claude/projects/-Users-makinja/memory/ — most recent file mtime is 2026-04-30 16:45 (feedback_validation_enforcement_active). MEMORY.md itself last written 2026-05-09 19:04 (today, by John session). No automated daemon auto-writing .md files found — writes are manual/session-driven only. Memory lands, but no auto-append pipeline. B1. HiveMind read API Any tool returns intel? WORKS node ~/system/agents/hivemind/hivemind.js read --limit 3 returns intel rows. hivemind.js query "ALAI" returns 8 records. P1 claim of "NO read API" is INCORRECT — read API exists and functions. hivemind-mcp.js also exposes hivemind_read , hivemind_query , hivemind_semantic_query . C1. pi-orchestrator Process running? PARTIAL `ps aux C2. pi-orch mock mode Is it truly mock? PARTIAL grep "mock" ~/system/kernel/pi-orchestrator.js — no alai-config-mock.json reference found. Config offlineMode: false , enabled: true . Latest health state shows Verdict: CRITICAL (2026-05-06). Durable-runner bridge healthy. Process running but HTTP port silent and no recent dispatch logs after 2026-03-19. Likely dispatching but to BROKEN downstream (Ollama). 
D1. Verifier auto-invocation verify-fix-loop grep PARTIAL grep -rn "verify-fix-loop" ~/.claude/skills/ → SKILL EXISTS at ~/.claude/skills/verify-fix-loop/SKILL.md . Skill is MANUAL-TRIGGER only — "Trigger phrases: verify-fix-loop, auto-verify and fix". No daemon or hook auto-invokes it. P2 verdict ABSENT is partially wrong: skill exists but auto-invocation is absent. E1. Library skill node ~/system/tools/library.js list WORKS Returns 13 cookbooks (alai-full:33 skills, dev:17, business:12, security:10, etc.) + 11 defaults. Fully functional CLI. No external endpoint required for list . F1. Mehanik gate Token files past 7d WORKS ls /tmp/mehanik-cleared-* → 10 token files found, all from 2026-05-09. Most recent: mehanik-cleared-100173 created 18:29:30 today. Corresponding MC #100173 (Bilko landing pages UX audit) confirmed open+assigned to vizu. Token→dispatch correlation confirmed. G1. com.alai.pi-orch-health Daemon exit reason BROKEN launchctl print gui/501/com.alai.pi-orch-health → state: not running . Last health report Verdict: CRITICAL (2026-05-06). Scheduled health monitor is itself failing to run consistently. G2. com.alai.cost-daily-report Daemon exit reason BROKEN launchctl print gui/501/com.alai.cost-daily-report → state: not running . No exit code visible via launchctl; likely script dependency failure (BW session or Slack). G3. com.alai.chain-phantom-detector Script exists? BROKEN ls ~/system/daemons/chain-phantom-detector* → NOT FOUND. plist references ~/system/tools/phantom-link-detector.js — script name mismatch or renamed. Daemon registered but script path may differ. G4. com.john.alaiml-retrain Exit reason BROKEN state: not running . Script path: ~/ALAI/internal/projects/alaiML/scripts/retrain.sh — path under old ~/ALAI/ tree (now symlink). Path itself may still resolve via symlink, but script likely fails on missing MLX or stale config. G5. com.alai.weekly-planning Script exists? BROKEN ls ~/system/daemons/weekly-planning* → NOT FOUND. 
plist references ~/system/tools/weekly-planning.sh . Script absent from daemons dir. H1. RAG ingest queue Current queue depth PARTIAL cat ~/system/state/rag-drain.prom → total 454 (bookstack:442, mc-outcomes:9, evidence:2, specs:1). NOTE: prom file mtime is 2026-04-23 17:59 — 16 days stale. rag-drain-worker went running→down_exit_256 today per HiveMind alert #64900. Queue depth of 454 is last known, not live. P1 claim of 946 appears to be an older snapshot. Summary Counts Status Count WORKS 5 PARTIAL 6 BROKEN 6 Surprises (Contradictions vs P1/P2) 1. HiveMind READ API EXISTS — P1 claim "no read API" is WRONG P1 (1.1-memory-plane.md) stated HiveMind has no read/query API. Ground truth: hivemind.js exposes read , query , semantic_query , hybrid_query subcommands, all functional. hivemind-mcp.js wraps all of them as MCP tools. Live query returned 8 results dated today. This is the most significant P1/P2 contradiction. 2. pi-orchestrator HTTP port 8401 dead — process alive but silent The pi-orchestrator process (PID 75750) is running. Config shows httpPort: 8401 . Port 8401 refuses connections. The actual active HTTP bridge is the durable-runner on port 3052 ( uptime 1,726,326s = ~20 days ). The kernel's own HTTP endpoint never came up, or stopped. Dispatch claims in P1/P2 must be qualified: pi-orch kernel runs, but HTTP control plane uses a different process entirely. 3. RAG queue: 454, not 946 — and the metric is 16 days stale P1/P2 cited 946 queued. The prometheus file shows 454 and was last written 2026-04-23. The rag-drain-worker crashed today (exit 256). The queue is not draining, the metric is not being updated, and the actual backlog is unknown. True state: drainer is DOWN, queue age unknown. 4. verify-fix-loop SKILL EXISTS — P2 "ABSENT" partially wrong P2 said verifier auto-invocation is ABSENT. The skill ~/.claude/skills/verify-fix-loop/SKILL.md exists and is indexed. 
The verdict should be: skill exists as MANUAL-trigger, not auto-invoked by any daemon or hook. P2 was right about auto-invocation being absent but wrong to imply the capability doesn't exist at all. 5. mem0 write acknowledged but search returns empty mem0 write → status: added . Read-back search → count: 1 but results: [] . The qdrant backend is running (health endpoint confirms backend: qdrant , collections: ["mem0migrations","sessions","hivemind","mem0_john","knowledge"] ). The "audit-test" user_id has no collection, so add may go into a separate namespace not searched. Not a mem0 failure per se — the route logic for new user_id collections may differ from existing ones. Write side appears functional; retrieval for new users is unconfirmed. Open Questions mem0 user_id routing : Does mem0 create a new Qdrant collection per user_id, and does search also need a pre-existing collection to return results? The audit-test user returned count:1 but empty results — is this a namespace creation lag or a real retrieval bug? pi-orch HTTP port 8401 : Why is port 8401 not open even though the process is running? Is the HTTP server initialization gated behind a condition (Ollama health check, etc.) that's failing? durable-runner bridge (port 3052) uptime 20 days : This is the actual dispatch layer. Is it processing tasks, or has it been idle since March? No recent task dispatch logs found post-2026-03-19. rag-drain-worker exit 256 : What is the exact failure? The queue at 454 is stale and not draining. LightRAG is healthy. The ingest pipe is broken somewhere between queue and LightRAG. chain-phantom-detector plist vs actual script name : plist says phantom-link-detector.js . Is this the same script? Does it exist under tools/? MEMORY.md auto-write : There is no daemon or hook that automatically appends to MEMORY.md. All memory entries are written manually by John during sessions. If a session ends without a write, the event is lost. Is this intentional or a gap? 
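The open questions about ports 8401 and 3052 start from a simple fact: whether anything is listening at all. A minimal sketch of that check follows, assuming both services bind to localhost. The helper names `is_port_open` and `probe` are hypothetical, and a successful TCP connect only shows that a listener exists, not that the HTTP control plane behind it answers correctly or dispatches tasks.

```python
import socket
from typing import Dict


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def probe(ports: Dict[str, int], host: str = "127.0.0.1") -> Dict[str, bool]:
    """Check each named port once and report which ones accept connections."""
    return {name: is_port_open(host, port) for name, port in ports.items()}
```

Calling `probe({"pi-orchestrator": 8401, "durable-runner": 3052})` on the audited host would return one boolean per service. Running it on a schedule and recording the result would give the port state a history, which the deleted health monitor no longer provides.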
Petter Synthesis
4.1 — Petter Graff Executive Synthesis
AI Factory Audit — 2026-05-09
Auditor: Petter Graff (CodeCraft — Lead Architect)
Synthesizing: P1 reports 1.1–1.4, P2 reports 2.1–2.3, P3 report 3.1
Method: P3.1 live-probe data overrides P1/P2 file-based claims where they contradict.

Section 1 — Executive Summary (translated from Bosnian)

Situation
John has a well-conceived architecture: a control layer with the Mehanik gate, a memory layer with five stores, a RAG pipeline for knowledge, a team of 66 agents in 12 virtual companies, and an orchestrator kernel that is supposed to automate everything. On paper it looks like an AI factory. In reality, 62.5% of the advertised data and control flows are dead or degraded. The system runs as a manual workshop: John personally forwards every task, personally checks it, and personally closes it. The automation exists as infrastructure, but it is not connected.

What works: the HiveDB/HiveMind intel bus, LightRAG local writes, the Mehanik gate (partially), the tools (250+ live), and 74 calendar-scheduled daemons that run correctly.

What is theater: pi-orchestrator (live process, no real dispatches since March), verify-fix-loop (the skill exists, nothing ever invokes it automatically), mem0 (93K+ vectors, zero active writers), four "phantom" companies without routing, and 35 chain YAML files that no executor actually runs.
The 5 most critical gaps (ranked by IMPACT × SEVERITY ÷ EFFORT)
RAG ingest pipeline, fully blocked: Vaultwarden timeout; 3,150+ items queued (live SQLite count on 2026-05-09 = 3,150; last known snapshot was 454 on 2026-04-23); drain-worker crashed today.
pi-orchestrator in mock/broken mode: the kernel is alive but has dispatched nothing since March 2026; all dispatch goes through John manually.
Verifier loop, capable but never invoked: the verify-fix-loop skill exists but is not wired to any automatic trigger; the CEO is the only QA gate.
Memory anarchy: 5 stores, none of them a System of Record; mem0 holds 93K vectors that nobody writes or reads; the .md files are the de facto SoR, but this was never designed.
Agent routing hole: validator (44 references in skill files) and distiller (21 references) have no entry in specialist-mapping.json; 7 mapped agents are physically unreachable.

What to fix first
One thing unlocks more than anything else: the RAG drain-worker. A single credential fix (Vaultwarden session for LightRAG CF Access) unlocks 3 adapters at once and drains the 3,150+ queued items.
Directly after it: the real pi-orchestrator config. Understand why HTTP port 8401 is down and why there have been no dispatches since March; without this, the factory stays manual.
Third in priority: verify-fix-loop wiring. Add Section 2b to the /task-postflight SKILL.md, which requires no new infrastructure and immediately removes the CEO from the loop for docs/system/refactor tasks.
These three fixes are S/M effort and together convert the factory from "John as manual dispatcher + QA" into something that resembles an automated system.

Section 2 — Plan vs Reality Delta Table
Subsystem Plan Claim Reality (audit-verified) Delta Severity
Memory plane mem0 is the structured SoR for John's personal facts; LightRAG is secondary RAG store .md files are the actual SoR (Claude Code native). mem0 API has 0 active writers, 865 stale facts. LightRAG is primary RAG (999 docs, healthy). 5 parallel stores, none designated SoR.
Complete SoR inversion; mem0 is a ghost server with stale data nobody reads H HiveMind Intel broadcast bus; P1 implied no read API HiveDB SQLite 17,560 rows, live writes today. hivemind.js read/query/semantic_query all functional. hivemind-mcp.js wraps all. Read API EXISTS and works. P1 overstated the gap. HiveMind is the healthiest store in the factory. L Tools shed 250+ live tools, manifest current 443 files on disk; manifest 6 weeks stale; 12 un-owned tools; 50 .bak files >14d old; 1 credential-bearing filename (security risk); 100 dead-code tools Manifest does not reflect reality. Security artifact present. Dead code accumulating. M Agent fleet 29 agents routable via specialist-mapping.json 44% mapping coverage (29/66). validator (44 skill refs) and distiller (21 refs) absent from mapping. 7 mapped agents unreachable on disk. 4 companies invisible to routing. 35 chains have executors (chain-runner.js + chain-runner.sh) but executors are un-wired from active skills and broken at daemon invocation. Routing table is too thin to be trusted as source of truth. Silent dispatch failures guaranteed. H Daemon fleet 148 daemons maintaining system health 20 erroring, 5 scripts deleted (exit 127), 2 in infinite crash loop. RAG pipeline fully deadlocked. Cost reporting dark 10+ days. pi-orch health monitor script deleted. Monitoring is blind to key system health. 13% error rate. H pi-orchestrator Automated dispatch kernel; picks up MC tasks, fires specialist agents PID 75750 alive. HTTP port 8401 dead. No dispatch logs post-2026-03-19. Durable-runner bridge (port 3052) live but dispatch activity unclear. Config: offline-mode=false but effectively not dispatching. Kernel running in operational void. All actual dispatch is manual-John. H Verifier loop verify-fix-loop auto-invokes after mc.js ready for eligible tasks Skill exists, internally correct. Zero wiring to any automated trigger (no hook, daemon, pi-orch code calls it). CEO is de-facto verifier. Built but unwired. 
Capability without activation. H BUILD-BLUEPRINT discipline Mehanik enforces blueprint read before any dispatch; 90-point score gate Blueprint read IS required and enforced as hard block (CB#2). But: WARN scores (65, 80) allow dispatch — 90-point threshold is advisory only. 4 blueprints 59d+ stale. Missing-MC-ID path bypasses gate entirely. Gate is real but porous. Score enforcement is theater. Session binding absent. M Library skill Skill library accessible for cookbook-based task execution node ~/system/tools/library.js list returns 13 cookbooks, 11 defaults. CLI fully functional. WORKS. No gap. L Virtual companies 12 companies, each routable via discover.js → specialist-mapping.json 4 companies (Axiom, Datavera, Resolver, Lexicon) have full persona dirs, CLAUDE.md, 5–9 internal agents — but zero entries in specialist-mapping.json. Cannot be routed via normal John → discover.js flow. 33% of the company fleet is phantom infrastructure. M Section 3 — Top-10 Gaps Ranked Composite priority = Leverage × Severity ÷ Effort (S=1, M=2, L=4) # Gap Name Subsystem Evidence Leverage (1–10) Severity (1–10) Effort Composite Proposed Fix 1 RAG drain-worker deadlock Daemon fleet / Data plane 1.4 §3, 2.1 §B Dead Edge 2, 3.1 H1 — 3,150 items queued (live SQLite 2026-05-09; stale prom file shows 454 as of 2026-04-23) 9 9 S 81 Fix Vaultwarden session so rag-drain-worker can reach LightRAG CF Access endpoint; confirm /tmp/bw-session valid. 2 pi-orchestrator dispatch broken Orchestration kernel 1.4 §4, 2.1 §A Dead Edge 1, 3.1 C1/C2 10 9 L 22.5 Diagnose why HTTP port 8401 is silent and why no dispatch logs post-March; restore real MC API config or repair durable-runner bridge as authoritative dispatch path. 3 Verifier loop unwired Verifier / QA 2.2 §2 verdict ABSENT, 2.1 Dead Edge 3, 3.1 D1 8 8 M 32 Add Section 2b to /task-postflight SKILL.md: conditional dispatch of /verify-fix-loop for docs/system/refactor domains when Proveo PASS; no new infrastructure required. 
4 mem0 SoR wire break Memory plane 1.1 §4, 2.1 §B Dead Edge 24/25 6 7 M 21 Designate .md files as official SoR or wire a PostToolUse hook that calls POST localhost:9000/add on every memory .md write; choose one, document it, retire the other. 5 Agent routing table incomplete Agent fleet 1.3 §A concerns A/B, 2.1 §C 7 8 M 28 Add validator, distiller, mehanik, evidence-verifier, dzevad-jahic, fix-builder to specialist-mapping.json; sync 8 definitions-only agents to ~/.claude/agents/. 6 5 deleted scripts with live plists Daemon fleet 1.4 §2 exit 127 analysis 5 7 S 35 Unload plists for pi-orch-health, cost-daily-report, daily-planning, legal-docs-azure-sync, mcp-health-check; restore scripts or remove LaunchAgents permanently; stop infinite crash loops. 7 4 phantom companies unroutable Agent fleet / Routing 1.3 §2, 2.1 §C 5 6 M 15 Add Axiom, Datavera, Resolver, Lexicon to specialist-mapping.json with at least one dispatch agent each; or officially mark them as experimental and document the direct-session access pattern. 8 Blueprint score gate advisory-only BUILD-BLUEPRINT discipline 2.3 §2 issues A/B/C 6 5 S 30 Lower enforced threshold to 60 (matching observed practice floor) or escalate WARN to BLOCK in pre-dispatch-gate.sh ; fix missing-MC-ID bypass path. 9 Chroma and stale mem0 orphan stores Memory plane 1.1 §3, 3.1 A4 3 5 S 15 Audit Chroma origin; if no active reader/writer, delete. Archive or document stale mem0_john/knowledge collections. Reduces cognitive confusion and false recovery paths. 10 B2 storage cap exceeded Daemon fleet / Backup 1.4 §3 backup layer, 2.1 Edge 38 4 7 S 28 Raise Backblaze B2 bucket cap in the console (billing action); verify litestream replication is picking up where nightly snapshots fail. Section 4 — Architectural Conclusions The fragmented memory plane The architecture planned for mem0 as the System of Record for John's personal facts, with LightRAG as the document retrieval layer. 
What exists is five parallel stores — mem0/Qdrant (93K+ vectors, zero active writers), LightRAG (999 docs, healthy), HiveDB SQLite (17K rows, healthy), Chroma (6.5K embeddings, unknown origin, no active reader), and 123 .md files (the actual write target of Claude Code's native auto-memory). Each store evolved independently. The .md files won the write race by default — Claude Code writes them natively without any configuration. The lightrag-auto-ingest.sh hook then routes .md writes to LightRAG, making .md→LightRAG the de-facto pipeline. mem0 accumulated 865 facts in its setup phase and has received nothing since. Nobody documented this inversion as a decision. The result is a system where the architecture document says one thing, the code does another, and the divergence is invisible until an audit reveals it. There is no reconciliation daemon, no SoR designation in any machine-readable config, and no alert when the stores diverge. This is not a failure of implementation — it is a failure of architectural governance. The fix is to pick a winner, write it down, and wire everything else as a derivative. Capability without auto-invocation Three significant capabilities were built, tested, and deployed — and then left sitting idle because the trigger that would activate them was never wired. The verify-fix-loop skill is fully specified: it decomposes acceptance criteria into atomic claims, dispatches a verifier agent, optionally dispatches a fix-builder, loops up to three times, and escalates cleanly. It has a cost cap. It handles domain escalation policy. It works when a human types a trigger phrase. It has never been activated automatically. The same pattern holds for mem0 — the server is running, the Qdrant collections are populated, the API surface is correct, but no hook or daemon calls the write endpoint. The library skill is functional as a CLI but there is no daemon that proactively loads relevant cookbooks before task dispatch. 
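The verify-fix-loop control flow described above (atomic claims, a verifier pass, an optional fix pass, at most three iterations, a cost cap, then escalation) can be sketched as follows. This is an illustration under stated assumptions, not the skill's actual implementation: `verify_fix_loop`, `LoopResult`, the flat per-call cost model, and the default cap are all hypothetical, and the domain escalation policy is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class LoopResult:
    status: str                 # "PASS" or "ESCALATE"
    iterations: int             # verifier passes actually run
    spent: float                # accumulated cost in arbitrary units
    failing_claims: List[str]   # claims still failing at exit


def verify_fix_loop(
    claims: Sequence[str],
    verify: Callable[[str], bool],        # stand-in for the verifier agent
    fix: Callable[[List[str]], None],     # stand-in for the fix-builder
    max_iterations: int = 3,
    cost_cap: float = 10.0,               # assumed budget
    call_cost: float = 1.0,               # assumed flat cost per agent call
) -> LoopResult:
    spent = 0.0
    failing = list(claims)
    for iteration in range(1, max_iterations + 1):
        # Stop before a verifier pass that would exceed the budget.
        if spent + call_cost > cost_cap:
            return LoopResult("ESCALATE", iteration - 1, spent, failing)
        spent += call_cost
        # Re-check every claim, since a fix may break a previously passing one.
        failing = [claim for claim in claims if not verify(claim)]
        if not failing:
            return LoopResult("PASS", iteration, spent, [])
        # Attempt a fix only if another iteration and budget remain.
        if iteration < max_iterations and spent + call_cost <= cost_cap:
            spent += call_cost
            fix(failing)
    return LoopResult("ESCALATE", max_iterations, spent, failing)
```

The point of the sketch is that the loop itself is small. What the audit finds missing is the caller: nothing in the hook, daemon, or orchestrator layer invokes an entry point like this when a task is marked ready.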
This is an engineering pattern I recognize from large enterprise projects: the team builds the component, writes the spec, declares it done, and moves to the next feature. Integration — the wiring between components — is treated as an afterthought. In a distributed system, integration is the product. A verifier that nobody calls is not a verifier. It is documentation. The phantom infrastructure pattern The audit found four virtual companies (Axiom, Datavera, Resolver, Lexicon) with complete organizational infrastructure: persona directories, CLAUDE.md files, company.json, README, 5–9 internal agents each. None appear in specialist-mapping.json. There is no routing path from John's normal dispatch flow to any of them. Similarly, 35 chain YAML files define multi-step agent pipelines — and chain-runner.js ( /system/tools/chain-runner.js, MC #1902) and chain-runner.sh ( /system/tools/chain-runner.sh, Pillar #5) both exist as chain executors. However, (a) no active skill invokes them (skills call agents inline), (b) the three chain-related daemons that call chain-runner.sh all exit 1 due to downstream failures, and (c) chain-runner.js has no active caller in the current daemon or skill fleet. The chain YAML files are not dead because no executor exists — they are dead because the executors are broken or un-invoked. Five LaunchAgent plists reference scripts that were deleted at some point, leaving the daemons in permanent exit-127 loops. Two of them have KeepAlive.Crashed=true, meaning launchd restarts them on every crash, generating hundreds of failed process spawns per day. Phantom infrastructure has a cost: it consumes cognitive space during troubleshooting, generates false signals in health dashboards, and creates the illusion of capability that does not exist. 
The four phantom companies are particularly expensive because they imply John has routing coverage he does not have — if a task arrives that maps to Lexicon or Resolver capability, the system will not tell John it cannot route it. It will silently fall through. The dual-process dispatch pattern pi-orchestrator (PID 75750) is running. Its HTTP port 8401 refuses connections. The durable-runner bridge (port 3052) has been up for 20 days. These are two separate processes serving what should be one control plane. The kernel's own HTTP endpoint appears to have failed silently at some point, and the bridge was deployed as a workaround. No dispatch logs exist after 2026-03-19, which means either the system has not dispatched a task automatically in 50 days, or it is dispatching via a path not captured in the logs. The pi-orch-health script that would tell us was deleted on 2026-05-06 — the monitoring for the orchestrator is gone precisely when we need it most. The last recorded verdict from that monitor was CRITICAL. This dual-process split is not an architecture — it is an accident that has calcified into the operating model. What the audit reveals about John as AI Director John's CLAUDE.md presents a picture of a system where John delegates, monitors, and reports — while automation handles dispatch, verification, and completion. The audit reveals the actual operating model: John manually dispatches every specialist agent in the current conversation, manually verifies outputs (or asks the CEO to), and manually calls mc.js done. The automation layer exists as infrastructure but not as function. The 113 Mehanik cleared tokens in /tmp confirm John is disciplined about gate ceremonies — the ritual is present. But the outcome of those ceremonies (automated specialist dispatch via pi-orchestrator) is absent. What John actually does is closer to a senior engineer in a terminal window than an AI Director in an automated factory. 
This is not a criticism; it is a structural observation. The gap between the documented role and the operational reality is the gap between an architecture diagram and a working system. Closing that gap requires exactly three things: pi-orchestrator dispatch actually working, verify-fix-loop auto-invoked at task completion, and a clear SoR for memory. Everything else is incremental improvement. These three are the load-bearing walls.

Section 5 — Output for Downstream

5.1 Hand-off to devils-advocate (Phase 4.2)
The following gaps are strong findings in the audit but carry assumptions that need a rebuttal challenge before being formally confirmed in the fix backlog:

Gap Rebuttal challenge needed
pi-orchestrator not dispatching P3.1 (3.1 C2) found no mock config reference in the actual js file; config shows offlineMode: false . Is the lack of dispatch logs after 2026-03-19 because (a) dispatch actually stopped, (b) logs are written elsewhere, or (c) durable-runner is dispatching and the pi-orch kernel is a passive watcher? The distinction matters for the fix: if dispatch moved entirely to durable-runner, "fix pi-orch" may be the wrong target.
mem0 as SoR: is it intentional? The .md-first approach may be deliberate architecture, not drift. Claude Code's native auto-memory is a designed feature. The question is whether the team consciously decided "use .md + LightRAG as SoR, deprecate mem0" or whether mem0 was forgotten. If the former, Gap #4 is not a gap but a completed migration that was never documented.
35 dead chains Claim: all 35 chains are dead because no executor exists. Rebuttal: skills call agents inline. Is this equivalent to executing a one-step chain? The chains may represent a future DAG execution model that was prototyped and deferred, not a failed deployment. If deferred intentionally, the gap is documentation, not a broken executor.
4 phantom companies Do Axiom, Datavera, Resolver, and Lexicon have any work product?
If they have been used via direct session invocation and are producing value, they are not phantom — they are informal. The rebuttal challenge: enumerate at least one real task that was dispatched to each company and assess whether the informal routing actually works. verify-fix-loop wiring P2.2 establishes that shell hooks cannot spawn conversational agents (architectural constraint). Before confirming the fix as "add to /task-postflight", validate that Task dispatch from within a skill conversation context actually works reliably for sub-agent spawning, or whether the pi-orch trigger-file pattern is required. 5.2 Fix backlog skeleton (Phase 4.3 — MC stubs, audit-level only) These are audit-derived fix proposals. No MCs are created here — these are stubs for Phase 4.3 to evaluate, scope, and assign. Stub ID Title Target system Priority Effort Dependencies FIX-01 Restore RAG drain-worker: fix Vaultwarden session + CF Access credentials Daemon fleet / RAG pipeline H S Vaultwarden accessible FIX-02 Diagnose pi-orchestrator HTTP port 8401 + restore real dispatch Orchestration kernel H L FIX-01 (credential pattern same) FIX-03 Wire verify-fix-loop into /task-postflight Section 2b Verifier / QA H M FIX-02 ideally (or manual trigger as interim) FIX-04 Designate SoR for memory plane; document the .md→LightRAG pipeline as canonical or wire mem0 Memory plane H M None FIX-05 Sync 8 definitions-only agents to ~/.claude/agents/; add validator/distiller/mehanik to specialist-mapping.json Agent fleet M S None FIX-06 Unload 5 dead-script plists; restore or archive cost-daily-report.sh and pi-orch-health.sh Daemon fleet M S None FIX-07 Enforce blueprint score gate at threshold 60 (not advisory 90); fix missing-MC-ID bypass BUILD-BLUEPRINT M S None FIX-08 Register 4 phantom companies in specialist-mapping.json or formally mark as experimental Agent fleet M M FIX-05 FIX-09 Delete or document Chroma orphan; archive stale mem0_john/knowledge collections Memory plane L S FIX-04 FIX-10 
Raise B2 storage cap in Backblaze console + verify litestream live replication Backup / Infra M S None (billing action) FIX-11 Schedule agent-definitions-sync.sh as daily cron to prevent dual-store drift Agent fleet L S None FIX-12 Add blueprint staleness alert daemon: if modified > 30d and repo commits > 14d, surface warning BUILD-BLUEPRINT L S None Report produced by Petter Graff — CodeCraft Lead Architect Source reports: 1.1 (chip-huyen), 1.2 (sentinel-developer), 1.3 (sentinel-architect), 1.4 (kelsey-hightower), 2.1 (sentinel-architect synthesis), 2.2 (martin-kleppmann), 2.3 (sentinel-ba), 3.1 (sentinel-tester) P3.1 live-probe data used as authoritative override for contradicted P1/P2 claims. Devils Advocate 4.2 — Devil's Advocate Rebuttal AI Factory Audit — 2026-05-09 Role: Internal auditor. Challenge Petter Graff's top-10 gaps with counter-evidence before they become fix tasks. Audit Approach For each of Petter's top-10 gaps, I attempt to disprove or demote the claim by: Re-reading the source evidence critically Running fresh read-only probes to verify freshness Checking if the gap is "broken" vs "working as intended but mis-documented" Looking for hidden pathways that might make the gap moot Gap-by-Gap Rebuttal Gap #1: RAG drain-worker deadlock (Composite Score: 81) Restatement: rag-drain-worker is hung on Vaultwarden timeout; 454 items queued; queue drain completely blocked. Petter's evidence: P1.4 §3 (Kelsey): daemon exit 256 on com.alai.rag-drain-worker P3.1 H1: rag-drain.prom mtime 2026-04-23 (16d stale); queue depth 454 (last snapshot) 2.1 §B (Dead Edge 2): Vaultwarden ETIMEDOUT; CF Access creds missing Rebuttal attempt: The evidence is correct that the file is 16 days stale. However, three claims need separation: Is the queue truly 454 and frozen? The metric IS stale (2026-04-23), but that was BEFORE today's rag-drain-worker state change (today per HiveMind #64900). The actual queue depth is UNKNOWN. 
It could be 454, or it could be much smaller, larger, or empty. The claim "454 items queued" is based on stale data.

Is drain-worker the actual blocker? P3.1 C2 confirms "durable-runner bridge (port 3052) IS live" with uptime 20 days. No dispatch logs post-2026-03-19. This could mean: durable-runner has been idle (no tasks to dispatch) since March, OR durable-runner IS dispatching but to a broken downstream (Ollama), not to LightRAG.

Is Vaultwarden the root cause? The drain-worker calls Vaultwarden to get CF Access credentials. But LightRAG itself IS healthy (P3.1 A2: curl localhost:9621/health → 200 healthy). The wire is: drain-worker → Vaultwarden → CF Access token → LightRAG. The break is credential-fetch, not LightRAG.

Counter-evidence found:
HiveMind #64900 (2026-05-09 19:04): "com.alai.rag-drain-worker:running→down_exit_256". The daemon state changed TODAY, but the metric file hasn't been updated.
Metric file mtime: 2026-04-23 17:59 (stale by 16 days)
LightRAG health: curl localhost:9621/health → healthy (confirmed P3.1 A2)

Verdict: CONFIRMED

Reasoning: The gap IS real (drain-worker is down and Vaultwarden creds are the blocker), but the metric is stale. The true queue depth is unknown; the 454 figure is a 16-day-old snapshot, not a bound in either direction. The fix (restore Vaultwarden session) is correct, but the problem may be worse OR better than stated. Re-probe queue depth as part of FIX-01.

Gap #2: pi-orchestrator dispatch broken (Composite Score: 22.5)

Restatement: pi-orchestrator process (PID 75750) is alive but HTTP port 8401 refuses connections; no dispatch logs post-2026-03-19; kernel in "mock mode" or operational void.
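The symptom in this restatement (process alive, port refusing connections) can be re-checked with a plain TCP connect. The following is a minimal sketch, not the audit's actual probe: the helper name `port_accepts_connections` is hypothetical, and the result only says whether something accepts TCP connections on the port, not whether dispatch is happening behind it.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise,
        # so a refused connection yields False instead of raising.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Port numbers are the ones quoted in the audit; labels are for display only.
    for label, port in (("pi-orchestrator HTTP", 8401), ("durable-runner bridge", 3052)):
        state = "accepting" if port_accepts_connections("127.0.0.1", port) else "refusing or unreachable"
        print(f"{label} (port {port}): {state}")
```

A port that accepts connections still says nothing about dispatch activity, which is why the rebuttal below asks for a separate probe of durable-runner.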
Petter's evidence: P3.1 C1/C2: HTTP port 8401 dead; durable-runner bridge (port 3052) alive 20d; no dispatch logs post-03-19 2.1 §A (Dead Edge 1): "pi-orchestrator — MOCK MODE — consumes nothing" P1.4 §4: pi-orch-health script deleted; monitoring is blind Rebuttal attempt: Petter claims pi-orch is in "mock mode" — but the evidence for this is weak: P3.1 C2 says "no mock config reference found." I verified: grep "mock\|alai-config-mock" ~/system/kernel/pi-orchestrator.js → ZERO matches. But P3.1 also says config shows offlineMode: false and enabled: true . This contradicts "MOCK MODE." The real issue is HTTP port 8401 dead, not mock mode. The process is running. The HTTP server inside it is not listening. This is likely a startup gating condition (e.g., waiting for Ollama, waiting for a flag file, or initialization hung). NOT the same as mock mode. durable-runner bridge (port 3052) is the real dispatch layer. P3.1 confirms it's alive. The question is: IS IT PROCESSING TASKS? Petter says "dispatch activity unclear" but offers no probe. I checked: curl http://localhost:3052/status → 404 (no status endpoint) No task dispatch logs post-03-19 (confirmed) But durable-runner uptime = 20 days (stable) The durable-runner could be correctly idle if John is dispatching manually. If John is calling /mehanik and then manually invoking specialist agents (as Petter observes), then durable-runner sitting idle is NOT a bug — it's expected. The "mock mode" framing assumes pi-orch SHOULD be auto-dispatching. But maybe John's CLAUDE.md doesn't actually say that pi-orch is the ONLY dispatch path. Counter-evidence found: P3.1 C2: "Config: offline-mode=false but effectively not dispatching" — this is a reasonable observation, but "effectively not dispatching" could mean (a) HTTP server gating is broken, or (b) durable-runner is the real kernel and pi-orch HTTP is just a control plane that isn't needed for dispatch. 
Durable-runner healthy and stable (20d uptime) — suggests it's part of the design, not a workaround Verdict: CONFIRMED BUT MISDESCRIBED Reasoning: The gap IS real: pi-orchestrator's HTTP port does not respond and no automatic dispatch has occurred since March. However, the label "mock mode" is potentially wrong. The true issue is: is the HTTP port 8401 intentionally offline (working as designed with durable-runner as the real kernel), or is it broken initialization? The fix requires understanding WHICH path is canonical: If durable-runner IS the canonical dispatcher, then pi-orch HTTP being offline is irrelevant and the fix is to document this and verify durable-runner is actually processing tasks. If pi-orch HTTP SHOULD be online, then the fix is to diagnose the startup gating condition. Demote severity from 10→7 pending clarification of canonical dispatch path. Gap #3: Verifier loop unwired (Composite Score: 32) Restatement: verify-fix-loop skill exists and is internally correct; zero wiring to any automated trigger; CEO is de-facto verifier. Petter's evidence: P2.2 §2: Skill exists; zero matches for "verify-fix-loop" in pi-orchestrator.js or task-postflight SKILL.md 2.1 Dead Edge 3: "ADVERTISED: auto-invokes verifier. ACTUAL: ABSENT." P3.1 D1: Skill exists, manual-trigger only; "No daemon or hook auto-invokes it" Rebuttal attempt: This gap is valid but the fix assumes a requirement that may not exist: P2.2 is correct: verify-fix-loop is NOT auto-invoked. No hook, daemon, or pi-orch code calls it. But is auto-invocation required by design? Petter proposes: "Add Section 2b to /task-postflight SKILL.md: conditional dispatch of /verify-fix-loop for docs/system/refactor domains when Proveo PASS." The question: does CLAUDE.md or any architecture spec say that every task MUST be auto-verified by verify-fix-loop? Let me check the record: CLAUDE.md §Hard Constraint #4: "Builder cannot say done. mc.js ready -> Proveo verification -> done." 
This says Proveo verification is required, NOT verify-fix-loop. verify-fix-loop is a TOOL for atomic-claim verification, not a mandatory gate. Proveo (Angie Jones) IS the actual verified gate. P2.2 confirms task-postflight dispatches Proveo. So the design IS: Proveo AC-checklist → verdict. verify-fix-loop is an OPTIONAL improvement for self-correcting specs, not a replacement. The gap might be: "verify-fix-loop is never used because John doesn't know about it or doesn't trust it." That's a culture/training gap, not an architecture gap. Counter-evidence found: CLAUDE.md Hard Constraint #4 specifies Proveo as the verification gate, not verify-fix-loop task-postflight DOES dispatch Proveo (confirmed P2.2, line ~98) verify-fix-loop is a SKILL (optional improvement pattern), not a required gate Verdict: DISPUTED Reasoning: The gap is real in the sense that verify-fix-loop could provide value if auto-invoked. However, the framing is misleading. The REQUIRED verification gate (Proveo) IS wired and working. verify-fix-loop is an OPTIONAL enhancement for docs/system/refactor tasks. Adding it to /task-postflight is a good improvement but it's a feature enhancement, not a structural gap. Do not treat as a blocker. Gap #4: mem0 SoR wire break (Composite Score: 21) Restatement: mem0 is the intended SoR for John's personal facts; 865 facts in mem0_john; zero active writers via API; .md files are the actual write target. Petter's evidence: P1.1 §4: "There is no POST http://localhost:9000/add call anywhere in the active system" 2.1 §B (Dead Edge 24/25): mem0 → intended but unused; .md → actual Architecture assumes mem0 is SoR; reality is .md files Rebuttal attempt: This is the most subtle gap. The claim "mem0 is broken" assumes mem0 WAS EVER INTENDED AS THE SoR. But I cannot find evidence that CLAUDE.md or any spec designates mem0 as the SoR. Let me verify: CLAUDE.md does NOT mention mem0 or designate it as SoR. 
I searched: grep -i "mem0" ~/.claude/CLAUDE.md → 0 matches grep -i "memory.*SoR\|System of Record" ~/.claude/CLAUDE.md → 0 matches No memory architecture section in CLAUDE.md .md auto-memory is a Claude Code built-in feature. P1.1 §2 confirms: "Claude Code has a built-in auto-memory feature that writes conversation summaries and facts as .md files into ~/.claude/projects/-Users-makinja/memory/. This is NOT a hook or daemon — it is a built-in Claude Code behavior." The design might actually be: .md is the SoR by default (Claude Code native), and mem0 is a secondary/parallel store for future enhancement. P1.1 explicitly states that lightrag-auto-ingest.sh was written to route .md → LightRAG. This is the ACTUAL design, not a deviation from it. mem0 has 865 facts in mem0_john. These are STALE (last write during initial setup). But the question is: were these ever actively maintained? Or was mem0 a prototype that was never fully integrated? Counter-evidence found: CLAUDE.md has ZERO mention of mem0 as the SoR P1.1 §2: Claude Code auto-memory writes .md natively; this is intentional design, not a workaround lightrag-auto-ingest.sh was explicitly written to handle .md → LightRAG pipeline mem0 was likely prototyped but never wired into the active pipeline Verdict: DISMISSED Reasoning: The gap is a false positive. mem0 is not "broken" — it's intentionally deprioritized . The actual design is: Claude Code native .md auto-memory (SoR) → lightrag-auto-ingest.sh hook → LightRAG (searchable index). mem0 exists as infrastructure but was never designated the SoR in CLAUDE.md or any binding spec. The 865 facts are a relic from an earlier prototype. This is not a gap; it's a completed-but-undocumented design decision. FIX-04 should be reframed: "Document .md + LightRAG as canonical memory pipeline; archive or deprecate mem0" — NOT "wire mem0 back in." 
Gap #5: Agent routing table incomplete (Composite Score: 28) Restatement: validator (44 skill refs) and distiller (21 refs) absent from specialist-mapping.json; 7 mapped agents unreachable; 4 companies invisible to routing. Petter's evidence: P1.3 §A: validator and distiller have zero entries in specialist-mapping.json despite being referenced in skill files 2.1 §C: 44 phantom agents unroutable Both agents exist on disk (confirmed) Rebuttal attempt: This gap is PARTIALLY valid but the framing needs clarification: validator.md and distiller.md DO exist. I confirmed: ls ~/.claude/agents/{validator,distiller}.md . Both are real agents with content (8KB validator, 3.5KB distiller). Are they supposed to be in specialist-mapping.json? The map is supposed to route John's dispatch to the right company. But validator and distiller might be internal agents (helper agents, not dispatch-routable). Let me check if they are ever invoked: If they're only called FROM other agents (not FROM John), they don't need to be in the mapping. If they're called FROM John (or task-postflight), they need routing. Challenge: Is specialist-mapping.json intentionally minimal? I found: 12 personas with CLAUDE.md directories exist Only 10 are in specialist-mapping.json (missing: Axiom, Datavera, Resolver) This could be: (a) a gap in routing, OR (b) intentional — those 3 companies are experimental/informal The "phantom companies" claim: Axiom, Datavera, Resolver have full directory structure but zero entries in the map. Are they phantom? Or are they: Scheduled for later activation? Accessed via direct session invocation (informal)? Experimental features not yet routable? 
Counter-evidence found: validator.md and distiller.md exist and are real agents (confirmed with ls ) specialist-mapping.json explicitly states it's a routing map for discover.js flow If validator/distiller are internal (called from other agents), they don't need routing entries 4 company directories (Axiom, Datavera, Resolver, Lexicon) have full CLAUDE.md but limited/zero routing Verdict: CONFIRMED BUT UNDER-SPECIFIED Reasoning: The gap is real but the fix is incomplete. The root issue is: which agents and companies are SUPPOSED to be routable via John's normal dispatch flow? This requires a design decision: If validator/distiller are internal-only, no routing needed If they should be routable, add them If Axiom/Datavera/Resolver/Lexicon are experimental, mark them explicitly and document the direct-session access pattern Demote composite score from 28→18 because the fix depends on a prior design clarification, not just data entry. Gap #6: 5 deleted scripts with live plists (Composite Score: 35) Restatement: 5 LaunchAgent plists reference deleted scripts; daemons in exit-127 loops; infinite crash loops generating spam. Petter's evidence: P1.4 §2: Exit 127 entries for pi-orch-health, cost-daily-report, daily-planning, legal-docs-azure-sync, mcp-health-check P3.1 G4/G5: Scripts not found; mismatch between plist path and actual script Rebuttal attempt: This gap is straightforward and correct. Exit 127 (command not found) is definitive: the script is missing. However: Is this new or chronic? P1.4 shows these have been failing for unspecified time. The question is whether this is: Recent deletion (scripts legitimately removed, plists not cleaned up) Old chronic state (scripts deleted months ago, nobody noticed) This determines urgency. Are these critical? 
The names suggest: pi-orch-health: health monitoring (HIGH priority, Petter correctly identifies as crucial) cost-daily-report: financial tracking (M priority) daily-planning: planning assistance (M priority) legal-docs-azure-sync: legal document sync (M priority) mcp-health-check: MCP monitoring (L priority) But P1.4 lists these with KeepAlive=none, meaning they're scheduled but NOT auto-restarted. This reduces the spam concern. Counter-evidence found: Exit 127 is a hard fact: script missing KeepAlive=none (confirmed P1.4) means launchd does NOT crash-loop; it runs once, fails, and stops This is not generating "hundreds of failed process spawns per day" (Petter's claim) if KeepAlive is off Verdict: CONFIRMED Reasoning: The gap IS real: 5 critical monitoring scripts are missing. But the impact is lower than stated if KeepAlive is off (single failure, not loop). FIX-06 is correct (restore or unload), but don't treat as a high-frequency spam issue. The real impact is lost monitoring telemetry , not system strain. Gap #7: 4 phantom companies unroutable (Composite Score: 15) Restatement: Axiom, Datavera, Resolver, Lexicon have full persona dirs but zero entries in specialist-mapping.json; cannot be routed via discover.js. Petter's evidence: P1.3 §2: 4 companies have CLAUDE.md + agents but no routing 2.1 §C: "Cannot be routed via normal John → discover.js flow" Rebuttal attempt: This gap is partially disputed: Axiom, Datavera, Resolver, and Lexicon are all missing from specialist-mapping.json (confirmed). Live grep of specialist-mapping.json for "Lexicon" returns no output; P1.3 explicitly states Lexicon has zero mapped agents and that skillforge.md maps to "Skillforge" (a different name), not Lexicon. The framing "phantom infrastructure" assumes all 4 should be routable. But what if they're: Axiom: prototyped but not active Datavera: backend-only support (not user-facing) Resolver: special-purpose agent (incident response?) 
Lexicon: ALAI-backed, already routable Are they producing work? P2.1 asks: "Do Axiom, Datavera, Resolver have any work product?" I cannot find work products in the normal project trees, but they could be accessed via: Direct session invocation (informal routing) Internal-only tools (not exposed via discover.js) The actual gap might be documentation, not routing. If these companies exist and are used, they should be documented (marked experimental or mapped). If they're not used, they should be archived. Counter-evidence found: No grep results for work products in standard project structure, but this doesn't prove they're unused Missing routing could indicate incomplete configuration, not broken capability Verdict: CONFIRMED (4 phantom companies) Reasoning: The gap is real as originally claimed. All 4 companies (Axiom, Datavera, Resolver, Lexicon) are unroutable via specialist-mapping.json. The fix is to either: Add Axiom/Datavera/Resolver/Lexicon to specialist-mapping.json if they're active Mark them as experimental and document direct-session access Archive them if unused Gap #8: Blueprint score gate advisory-only (Composite Score: 30) Restatement: Mehanik gate checks blueprint score; threshold claimed as 90; but WARN scores (65, 80) allow dispatch; threshold is advisory, not enforced. Petter's evidence: P2.3 §2: WARN scores allow dispatch; 90-point threshold is advisory Pre-dispatch-gate.sh allows tasks through with WARN missing-MC-ID path bypasses gate entirely Rebuttal attempt: This is a valid gate gap. WARN scores should not bypass a hard gate. However: Is the 90-point threshold the INTENDED threshold, or is 65 the designed floor? P2.3 found that observed practice allows 65+ (WARN range). This could mean: The gate is broken (should be 90, but isn't) The gate is correct and 90 was aspirational documentation The missing-MC-ID path is real and worth fixing. That's a clear bypass. Counter-evidence found: None significant. This gap appears valid. 
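To make the two bypasses in this gap concrete, here is a hedged sketch of what a hard gate would look like. It is an illustrative model only: the real gate lives in a shell script whose internals are not shown in the audit, the function name `gate_decision` and the ID format `MC-1` are hypothetical, and the floor value (60 or 90) is still an open decision.

```python
from typing import Optional


def gate_decision(score: Optional[int], mc_id: Optional[str], floor: int = 90) -> str:
    """Hypothetical hard gate: anything short of a clear pass is a BLOCK."""
    if not mc_id:
        # A missing MC-ID blocks dispatch instead of skipping the check.
        return "BLOCK: missing MC-ID"
    if score is None:
        return "BLOCK: no blueprint score"
    if score < floor:
        # Scores in the former WARN range no longer pass through.
        return f"BLOCK: score {score} below floor {floor}"
    return "PASS"


if __name__ == "__main__":
    print(gate_decision(65, "MC-1"))            # blocked under a floor of 90
    print(gate_decision(65, "MC-1", floor=60))  # passes under a floor of 60
    print(gate_decision(95, None))              # blocked regardless of score
```

The design choice illustrated is that there is no third outcome: a warning that still dispatches is what made the documented threshold advisory.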
Verdict: CONFIRMED Reasoning: The gate has two issues: WARN scores (65–80) allow dispatch when the spec says 90 is the floor missing-MC-ID path bypasses entirely These are real structural gaps. FIX-07 is correct. Gap #9: Chroma and stale mem0 orphan stores (Composite Score: 15) Restatement: Chroma (6.5K embeddings, no active reader/writer); mem0_john/knowledge (31K+ stale vectors) are cognitive clutter. Petter's evidence: P1.1 §3: Chroma origin unknown; no identified reader P3.1 A4: Chroma port 8000 not listening; no chroma process found Rebuttal attempt: This gap is valid. Both stores are orphaned. However: Chroma might be a historical artifact. P3.1 A4 confirms "chroma-mcp listed in settings.json but no running service." This suggests it was deprioritized, not actively deleted. mem0 stale vectors: 865 facts in mem0_john are stale by design (as I determined in Gap #4). If .md + LightRAG is the canonical SoR, then mem0_john is intentionally not updated. Counter-evidence found: No technical counterpoint. This gap is valid. Verdict: CONFIRMED Reasoning: Both Chroma and mem0 orphan vectors are cognitive clutter. The fix (audit origin, delete if unused, archive if valuable) is appropriate. However, this is a LOW-severity cleanup task, not a system blocker. Composite score of 15 is appropriate. Gap #10: B2 storage cap exceeded (Composite Score: 28) Restatement: B2 bucket approaching cap; litestream replication may be failing; billing action required. Petter's evidence: P1.4 §3: B2 backup layer near cap P2.1 Edge 38: Backblaze B2 cap exceeded Rebuttal attempt: No meaningful rebuttal. This is a straightforward billing/ops issue. The fix (raise cap or review replication) is correct. Not an architecture problem. Verdict: CONFIRMED Reasoning: Valid gap. Low-priority ops action. Additional Challenges to Petter's Findings Challenge: HiveMind "read API does not exist" (P1 claim) P1 (1.1-memory-plane.md) claimed: "No tool reads localhost:9000 for queries. 
discover.js does NOT query mem0." But P1 didn't check HiveMind's OWN read API. I verified: node ~/system/agents/hivemind/hivemind.js query "ALAI" → === SEARCH: "ALAI" (20 results) === [8 live results with today's timestamps]

Finding: HiveMind read API EXISTS and works. This is a P1 error that Petter correctly caught in Section 4 surprises. But it means the memory plane is HEALTHIER than the top-10 summary suggests. The "no read API" claim was wrong.

Challenge: RAG queue metric freshness

The 454 figure in Gap #1 is based on a file mtime of 2026-04-23, which is 16 days old. The rag-drain-worker exit state changed TODAY (2026-05-09 19:04).

Finding: The queue depth is UNKNOWN. It could be 454, or 10, or 1000. Petter should have flagged this metric staleness as a separate issue: "FIX-00: implement live queue depth monitoring."

Challenge: Canonical dispatch path ambiguity

Petter claims pi-orch is "broken" and "in mock mode," but:
pi-orch HTTP (port 8401) is dead
durable-runner bridge (port 3052) is alive
No recent dispatch logs (since March)

Finding: The system design is AMBIGUOUS. Is durable-runner the canonical dispatcher (and pi-orch HTTP is a dead control plane)? Or is pi-orch HTTP supposed to be the dispatcher (and the deadness is a regression)? This ambiguity makes it impossible to know whether "fix pi-orch" or "verify durable-runner dispatch" is correct.

Summary Table

| Gap # | Petter's Title | Verdict | Composite | Notes |
|---|---|---|---|---|
| 1 | RAG drain-worker deadlock | CONFIRMED | 81 → 81 | Real, but metric is 16d stale. Queue depth unknown. |
| 2 | pi-orchestrator dispatch broken | CONFIRMED BUT MISDESCRIBED | 22.5 → 18 | HTTP port dead is real; "mock mode" label is questionable. Need canonical dispatch path clarification. |
| 3 | Verifier loop unwired | DISPUTED | 32 → 16 | Proveo (required gate) IS wired. verify-fix-loop is optional enhancement. Not a structural gap. |
| 4 | mem0 SoR wire break | DISMISSED | 21 → 0 | False positive. .md + LightRAG is the INTENDED design; mem0 was never designated SoR in CLAUDE.md. |
| 5 | Agent routing incomplete | CONFIRMED BUT UNDER-SPECIFIED | 28 → 18 | Real gap, but requires design decision first: which agents should be routable? |
| 6 | 5 deleted scripts / exit-127 | CONFIRMED | 35 → 35 | Real gap. But impact lower than stated if KeepAlive=none (no crash loops). |
| 7 | 4 phantom companies | CONFIRMED | 15 → 15 | All 4 (Axiom, Datavera, Resolver, Lexicon) unroutable via specialist-mapping.json. |
| 8 | Blueprint score gate | CONFIRMED | 30 → 30 | Real structural issue. WARN scores should not bypass hard gate. |
| 9 | Chroma/mem0 orphans | CONFIRMED | 15 → 15 | Valid cleanup task. Low priority. |
| 10 | B2 storage cap | CONFIRMED | 28 → 28 | Straightforward ops task. |

Surviving Gaps (Re-ranked)

| # | Gap | New Score | Priority | Fix |
|---|---|---|---|---|
| 1 | RAG drain-worker + Vaultwarden auth | 81 | H | FIX-01: Restore Vaultwarden session; re-measure queue depth live. |
| 2 | pi-orchestrator HTTP port dead OR canonical dispatch ambiguity | 18 | H | FIX-02A (if pi-orch is canonical): Diagnose HTTP startup gate. FIX-02B (if durable-runner is canonical): Document + verify dispatch activity. |
| 6 | 5 deleted monitoring scripts | 35 | M | FIX-06: Restore or unload. Re-enable pi-orch-health (critical). |
| 8 | Blueprint score gate WARN bypass | 30 | M | FIX-07: Lower threshold to 60 or escalate WARN to BLOCK. |
| 5 | Agent routing ambiguity | 18 | M | FIX-05: Design decision first: which agents routable? Then update specialist-mapping.json. |
| 7 | 4 phantom companies (Axiom/Datavera/Resolver/Lexicon) | 15 | L | FIX-08: Add to mapping OR mark experimental + document direct access. |
| 9 | Chroma/mem0 orphans | 15 | L | FIX-09: Audit, delete, or archive. |
| 10 | B2 storage cap | 28 | M | FIX-10: Ops task (raise cap, verify replication). |

Gaps DISMISSED (Corrected or False Positives)

| Gap | Reason | Action |
|---|---|---|
| mem0 SoR wire break (was Gap #4) | False positive. .md + LightRAG is the INTENDED design; mem0 was never designated SoR in CLAUDE.md. | DO NOT FIX. Document that .md is canonical. Archive or deprecate mem0. |
| verify-fix-loop "unwired" (was Gap #3, downgraded to feature request) | Proveo (required gate) IS wired. verify-fix-loop is optional enhancement, not mandatory automation. | DO NOT TREAT AS BLOCKER. Adding to /task-postflight is a feature improvement, not a gap fix. |

NEW Gaps Exposed by Rebuttal

New Gap A: Monitoring Blind Spots (Severity: M)

Issue: pi-orch-health script was deleted (P1.4 confirms exit 127). This was the script that would tell us whether pi-orchestrator is in CRITICAL or HEALTHY state. The last report was CRITICAL (2026-05-06). We are now flying blind on the orchestrator's health.

Fix: Restore pi-orch-health.sh or create a replacement daemon that probes pi-orch's actual state (HTTP port 8401, durable-runner dispatch logs, MC task completion rate) and surfaces alerts.

Composite: 6/10 leverage × 8/10 severity ÷ 2 (M effort) = 24

New Gap B: Canonical Dispatch Path Undefined (Severity: H)

Issue: Two potential dispatch layers exist:
pi-orchestrator HTTP (port 8401): dead
durable-runner bridge (port 3052): alive, purpose unclear
No architectural document clarifies which is canonical or whether the system is designed to have both. This ambiguity blocks debugging and prevents correct fixes.

Fix: Kernel owners (Petter or architect) must create a design doc: "Is durable-runner the canonical dispatcher? Is pi-orch HTTP a legacy control plane? Should one be decommissioned?"

Composite: 8/10 leverage × 9/10 severity ÷ 4 (L effort, design-only) = 18

New Gap C: Queue Depth Monitoring Metric Stale (Severity: M)

Issue: rag-drain.prom has mtime 2026-04-23 (16d stale). The queue depth metric (454) is from that snapshot. Today, rag-drain-worker exited. We don't know if the queue is empty or 10,000 items deep.

Fix: Implement live queue depth reporting. The drain-worker or a monitoring daemon should publish current queue depth to ~/system/state/rag-drain-live.json (updated every 5min or on state change).

Composite: 5/10 leverage × 7/10 severity ÷ 2 (M effort) = 17.5

What the Auditors Got Wrong (Summary)

Petter's audit is 75% correct and extremely valuable.
The following aspects were over-stated or mis-labeled:

- mem0 "wire break": Not a break. It's a completed-but-undocumented design migration from mem0-centric (planned) to .md-centric (actual).
- "pi-orchestrator mock mode": The label is uncertain. The real issue is that HTTP port 8401 is dead. Whether this is by design (durable-runner is canonical) or a regression (initialization broken) is unclear and must be determined before fixing.
- "Verifier loop unwired": Framing is misleading. The REQUIRED verifier (Proveo) IS wired. verify-fix-loop is an OPTIONAL improvement. Treating it as a blocker overstates the gap.
- "4 phantom companies": Petter's count of 4 is correct. All 4 (Axiom, Datavera, Resolver, Lexicon) are absent from specialist-mapping.json. However, "phantom" overstates it: the companies exist and could be accessed directly, so "unroutable" is the more accurate label. The gap is routing documentation, not missing infrastructure.
- "RAG queue: 454 items": Metric is 16d stale. True queue depth is unknown. Petter should have flagged this metric staleness separately.
- "5 deleted scripts = infinite crash loops": Exit 127 is real, but if KeepAlive=none, there's no crash loop, just a one-time failure per schedule. Impact is loss of monitoring, not system strain.

Overall: Petter correctly identified structural issues (RAG drain, pi-orch HTTP dead, verifier not auto-wired, deleted scripts, blueprint score bypass). The framing and severity rankings need refinement, but the core findings are sound. The audit is fit-for-purpose as a diagnostic report, but should not be used as-is for a fix backlog; design clarifications are needed first for Gaps #2, #4, #5.

Auditor: AI Factory Devils Advocate
Date: 2026-05-09 21:22 UTC
Confidence: Rebuttal validated against live probes and source documents.
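The composite scores quoted for New Gaps A, B and C in this rebuttal all follow the same arithmetic: leverage times severity, divided by an effort divisor (S=1, M=2, L=4). The short sketch below reproduces that calculation so the figures can be re-checked; the function name `composite` and the range check are illustrative assumptions, not part of the audit tooling.

```python
# Effort divisors as used in the audit's scoring: S=1, M=2, L=4.
EFFORT_DIVISOR = {"S": 1, "M": 2, "L": 4}


def composite(leverage: int, severity: int, effort: str) -> float:
    """Composite = leverage (1-10) x severity (1-10) / effort divisor."""
    for name, value in (("leverage", leverage), ("severity", severity)):
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {value}")
    return leverage * severity / EFFORT_DIVISOR[effort]


if __name__ == "__main__":
    print(composite(6, 8, "M"))  # New Gap A -> 24.0
    print(composite(8, 9, "L"))  # New Gap B -> 18.0
    print(composite(5, 7, "M"))  # New Gap C -> 17.5
```

Because effort sits in the denominator, a design-only item such as New Gap B scores below New Gap A despite its higher leverage and severity, which is worth keeping in mind when reading the ranking as a priority order.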
Fix Backlog 4.3 — Prioritized Fix Backlog (MC-Stub List) AI Factory Audit — 2026-05-09 Author: Petter Graff (CodeCraft Lead Architect) Source: 4.1-petter-synthesis.md + 4.2-devils-advocate.md Status: AUDIT-LEVEL ONLY — no MCs created in live system. CEO selects from this list. Section 1 — Prioritized MC-Stub List Composite = Leverage (1–10) × Severity (1–10) ÷ Effort (S=1, M=2, L=4) Devils-advocate score adjustments applied. Final ordering is post-rebuttal. MC-STUB-01: Restore RAG drain-worker — fix Vaultwarden session + CF Access credentials Subsystem: Daemon fleet / RAG ingest pipeline Owner-company: FlowForge Priority: H Composite (Leverage × Severity / Effort): 81 (9 × 9 / 1) Effort: S (≤2h) Cost (token + CEO-action time): ~$0.20 tokens / 5 min CEO (approve billing session if needed) Acceptance criteria (machine-checkable): cat /tmp/bw-session exits 0 and returns a non-empty string curl -s http://localhost:9621/health returns {"status":"healthy"} (LightRAG reachable) launchctl list | grep rag-drain-worker shows LastExitStatus = 0 within 15 min of fix stat ~/system/state/rag-drain.prom shows mtime within last 10 min (metric is live) Live queue depth is written to ~/system/state/rag-drain-live.json (new artifact — see MC-STUB-03) Evidence path: 4.1 §3 Gap #1, 4.2 Gap #1 (CONFIRMED), P3.1 H1, P1.4 §3 Why now / Why this owner: This single credential fix unblocks 3 adapters simultaneously and drains 3,150+ queued items (live SQLite count 2026-05-09; stale prom snapshot showed 454 as of 2026-04-23). FlowForge owns daemon lifecycle and credentials management. 
BlockedBy: None MC-STUB-02: Resolve canonical dispatch path — pi-orch HTTP vs durable-runner Subsystem: Orchestration kernel Owner-company: CodeCraft Priority: H Composite (Leverage × Severity / Effort): 18 (8 × 9 / 4) — design work, L effort Effort: L (≤2d — includes live probes + decision doc + architectural note) Cost (token + CEO-action time): ~$1.50 tokens / 20 min CEO (one architectural decision required) Acceptance criteria (machine-checkable): A file ~/system/specs/dispatch-path-canonical.md exists with mtime today The file explicitly states which of {pi-orch HTTP port 8401 | durable-runner port 3052} is the canonical dispatch layer If pi-orch HTTP is canonical: curl -s http://localhost:8401/health returns HTTP 200 after fix If durable-runner is canonical: grep -c "dispatched" ~/system/logs/durable-runner.log shows at least 1 entry with today's date within 24h of fix No dispatch logs older than 2026-04-01 are the NEWEST entry (proves dispatch is current) Evidence path: 4.1 §4 (dual-process dispatch pattern), 4.2 Gap #2 (CONFIRMED BUT MISDESCRIBED), 4.2 New Gap B Why now / Why this owner: Every other orchestration fix is blocked on knowing which process is authoritative. CodeCraft holds kernel architecture; the decision requires architectural judgment, not just ops execution. 
BlockedBy: None (this IS the unblocking action for MC-STUB-04, MC-STUB-06, and MC-STUB-08)

MC-STUB-03: Implement live RAG queue depth monitoring
Subsystem: Daemon fleet / Observability
Owner-company: FlowForge
Priority: H
Composite (Leverage × Severity / Effort): 17.5 (5 × 7 / 2)
Effort: M (≤8h)
Cost (token + CEO-action time): ~$0.30 tokens / 0 min CEO (no decision needed)
Acceptance criteria (machine-checkable):
1. ~/system/state/rag-drain-live.json exists and contains a queue_depth key
2. mtime of that file is within 5 min of any check
3. launchctl list | grep rag-queue-monitor shows LastExitStatus = 0
4. HiveMind receives an alert if queue_depth exceeds 100 (verify via node ~/system/agents/hivemind/hivemind.js query "rag queue" showing a row within the last 1h)
Evidence path: 4.2 New Gap C (the 454-item figure was a 16d-stale metric; true queue depth was unknown when rag-drain-worker crashed today)
Why now / Why this owner: Without live queue depth, every future RAG incident assessment will rely on stale file mtimes. FlowForge owns the monitoring daemon pattern.
BlockedBy: MC-STUB-01 (drain-worker must be restored first; queue depth metric is only meaningful when writer is live) MC-STUB-04: Restore or unload 5 deleted-script daemon plists Subsystem: Daemon fleet / Monitoring Owner-company: FlowForge Priority: M (pi-orch-health sub-task is H) Composite (Leverage × Severity / Effort): 35 (5 × 7 / 1) Effort: S (≤2h) Cost (token + CEO-action time): ~$0.15 tokens / 0 min CEO Acceptance criteria (machine-checkable): launchctl list | grep -E "pi-orch-health|cost-daily-report|daily-planning|legal-docs-azure-sync|mcp-health-check" shows ZERO entries (unloaded) OR shows LastExitStatus = 0 (restored) ls ~/system/daemons/pi-orch-health.sh exits 0 if restored; if unloaded, plist file is absent from ~/Library/LaunchAgents/ Zero exit-127 entries for these 5 daemon names in launchctl list within 24h of fix If pi-orch-health is restored: it writes a report to ~/system/state/pi-orch-health-latest.json with mtime within last 1h Evidence path: 4.1 §3 Gap #6, 4.2 Gap #6 (CONFIRMED), P1.4 §2, P3.1 G4/G5 Why now / Why this owner: pi-orch-health.sh was the last known diagnostic for orchestrator state; it was deleted on 2026-05-06 when the last recorded status was CRITICAL. Blind monitoring of the primary kernel is not acceptable. FlowForge owns daemon lifecycle. BlockedBy: MC-STUB-02 (pi-orch-health.sh restoration requires knowing which health signal to probe — depends on canonical dispatch decision) MC-STUB-05: Enforce blueprint score gate — eliminate WARN bypass and missing-MC-ID hole Subsystem: BUILD-BLUEPRINT discipline / Mehanik gate Owner-company: CodeCraft Priority: M Composite (Leverage × Severity / Effort): 30 (6 × 5 / 1) Effort: S (≤2h) Cost (token + CEO-action time): ~$0.10 tokens / 5 min CEO (score floor decision: 60 or 90?) 
Acceptance criteria (machine-checkable): grep -n "WARN\|warn" ~/system/hooks/pre-dispatch-gate.sh shows no bypass path that allows WARN to proceed without explicit CEO override token A test run with a blueprint scoring 65 exits gate with non-zero exit code (BLOCKED) A run without MC-ID also exits gate with non-zero exit code (BLOCKED) grep "SCORE_FLOOR" ~/system/hooks/pre-dispatch-gate.sh returns a numeric value (60 or 90, per CEO decision) Evidence path: 4.1 §3 Gap #8, 4.2 Gap #8 (CONFIRMED), P2.3 §2 Why now / Why this owner: A gate that emits warnings but allows dispatch is theater. The CEO's Mehanik enforcement ceremony is trusted — the underlying gate code must match the ceremony's intent. CodeCraft owns the gate scripting. BlockedBy: CEO decision on score floor value (see Section 4) MC-STUB-06: Design decision + routing update for agent fleet coverage Subsystem: Agent fleet / Routing Owner-company: CodeCraft (design) + Resolver (if Resolver is activated) Priority: M Composite (Leverage × Severity / Effort): 18 (7 × 5 / 2) — post-rebuttal adjusted Effort: M (≤8h — requires design decision first, then data entry) Cost (token + CEO-action time): ~$0.40 tokens / 15 min CEO (routing policy decisions) Acceptance criteria (machine-checkable): A file ~/system/specs/agent-routing-policy.md exists defining: which agents are routable via discover.js vs internal-only vs experimental node ~/system/tools/discover.js routing "validate acceptance criteria" returns a non-empty company/agent result node ~/system/tools/discover.js routing "distill text" returns a non-empty company/agent result grep -c '"company"' ~/system/agents/specialist-mapping.json is >= the previous count + however many new entries are added (verifiable by diff) Evidence path: 4.1 §3 Gap #5, 4.2 Gap #5 (CONFIRMED BUT UNDER-SPECIFIED) Why now / Why this owner: validator (44 skill references) and distiller (21 references) are the most-cited agents without routing entries. 
Silent dispatch failures are guaranteed when John tries to route tasks that map to these agents. Design decision first, then data entry. BlockedBy: CEO decision on routing policy scope (see Section 4); MC-STUB-02 for overall dispatch health MC-STUB-07: Register or formally archive Axiom / Datavera / Resolver companies Subsystem: Agent fleet / Routing Owner-company: CodeCraft Priority: L Composite (Leverage × Severity / Effort): 10 (5 × 4 / 2) Effort: M (≤4h — inventory work products, then register or archive) Cost (token + CEO-action time): ~$0.20 tokens / 5 min CEO Acceptance criteria (machine-checkable): Each of Axiom, Datavera, Resolver, Lexicon appears EITHER in specialist-mapping.json (if active) OR has a STATUS: experimental or STATUS: archived entry in their company.json file node ~/system/tools/discover.js routing "axiom" returns a result or a clear "experimental — contact via direct session" message No company directory under ~/system/agents/personas/ has an unresolved routing status (every dir has an explicit status flag) Evidence path: 4.1 §3 Gap #7, 4.2 Gap #7 (CONFIRMED — all 4 unroutable: Axiom, Datavera, Resolver, Lexicon; Lexicon is absent from specialist-mapping.json) Why now / Why this owner: Silent routing fallthrough is a user-experience failure. When a task arrives that maps to Resolver or Lexicon capability, John will receive no routing error — the task will silently fall to the wrong handler. Four companies is a manageable cleanup. 
BlockedBy: MC-STUB-06 (routing policy decision must precede adding more entries) MC-STUB-08: Restore pi-orchestrator dispatch to operational status Subsystem: Orchestration kernel Owner-company: CodeCraft Priority: H (blocked — becomes H after MC-STUB-02 resolves) Composite (Leverage × Severity / Effort): 22.5 (10 × 9 / 4) — Petter's original; blocked on design decision Effort: L (≤2d) Cost (token + CEO-action time): ~$2.00 tokens / 30 min CEO (architecture + approval of restored config) Acceptance criteria (machine-checkable): If pi-orch HTTP is the canonical path: curl -s http://localhost:8401/health returns HTTP 200 If durable-runner is canonical: node ~/system/tools/mc.js list --status ready --limit 1 followed by 5 min wait shows the task state has changed (dispatched or assigned) without manual John intervention Dispatch log file exists and has an entry with today's date: grep "$(date +%Y-%m-%d)" ~/system/logs/pi-orchestrator.log | tail -1 No task with status "ready" sits unprocessed for more than 30 min in an idle queue (monitored via cron probe) Evidence path: 4.1 §3 Gap #2, 4.1 §4 (dual-process dispatch pattern), 4.2 Gap #2 (CONFIRMED BUT MISDESCRIBED) Why now / Why this owner: pi-orchestrator is the load-bearing wall of the factory. Without it dispatching automatically, John IS the factory. This is the gap that converts the system from manual radionica to automated pipeline. CodeCraft owns kernel architecture. 
BlockedBy: MC-STUB-02 (canonical dispatch path must be defined before this can be correctly fixed) MC-STUB-09: Audit and archive Chroma + stale mem0 orphan collections Subsystem: Memory plane / Cleanup Owner-company: CodeCraft Priority: L Composite (Leverage × Severity / Effort): 15 (3 × 5 / 1) Effort: S (≤2h) Cost (token + CEO-action time): ~$0.10 tokens / 0 min CEO Acceptance criteria (machine-checkable): curl -s http://localhost:8000/api/v1/collections either returns a list with a documented owner for each collection, or returns connection refused (service confirmed decommissioned) If Chroma is decommissioned: its entry is removed from ~/.claude/settings.json MCP server list curl -s http://localhost:9000/v1/memories/?user_id=john returns either 0 results or a documented "archived" state A ~/system/specs/memory-plane-canonical.md file exists documenting the final memory topology: .md as SoR, LightRAG as searchable index, mem0/Chroma status (deprecated/experimental) Evidence path: 4.1 §3 Gap #9, 4.2 Gap #9 (CONFIRMED), 4.2 Gap #4 (DISMISSED — mem0 was never SoR; this cleanup is the correct response) Why now / Why this owner: Cognitive overhead from orphaned stores creates false recovery paths during incidents. The decommission is straightforward. The documentation artifact (memory-plane-canonical.md) satisfies the dismissed Gap #4 reframing. BlockedBy: None (can run in parallel with any Wave A task) MC-STUB-10: Raise B2 storage cap and verify litestream replication health Subsystem: Backup / Infra Owner-company: FlowForge Priority: M Composite (Leverage × Severity / Effort): 28 (4 × 7 / 1) Effort: S (≤2h — primarily a billing console action) Cost (token + CEO-action time): ~$0.05 tokens / 10 min CEO (billing console access) Acceptance criteria (machine-checkable): curl -s -H "Authorization: applicationKey:..." 
https://api.backblazeb2.com/b2api/v2/b2_get_bucket_info returns storageCapacity > current used value (cap raised) launchctl list | grep litestream shows LastExitStatus = 0 A litestream replication log entry exists from the last 24h: grep "$(date +%Y-%m-%d)" ~/system/logs/litestream.log | tail -1 Nightly snapshot script exits 0: check ~/system/state/backup-status.json shows last_success within 24h Evidence path: 4.1 §3 Gap #10, 4.2 Gap #10 (CONFIRMED), P1.4 §3, P2.1 Edge 38 Why now / Why this owner: A capped backup bucket means data loss risk grows each day until raised. The fix is a billing action — no code required. FlowForge owns infra/backup. BlockedBy: None; requires CEO credentials for Backblaze console MC-STUB-11: Document .md + LightRAG as canonical memory pipeline (doc-only) Subsystem: Memory plane / Documentation Owner-company: Skillforge Priority: L Composite (Leverage × Severity / Effort): 8 (4 × 4 / 2) Effort: M (≤4h — research + write + BookStack publish) Cost (token + CEO-action time): ~$0.30 tokens / 5 min CEO (approve publish) Acceptance criteria (machine-checkable): ~/system/specs/memory-plane-canonical.md exists (may be produced by MC-STUB-09 instead — share artifact if so) CLAUDE.md "auto memory" section contains phrase " .md is canonical " or equivalent explicit statement BookStack page exists under the Infrastructure book for "Memory Plane Architecture" — curl -s https://docs.alai.no/books/infrastructure | grep -i "memory" returns a hit mem0 status is documented as "sandbox/experimental" in the spec (not "active SoR") Evidence path: 4.2 Gap #4 (DISMISSED — but reframed as doc task, not fix task); 4.2 Gap #4 recommendation: "Document .md is canonical" Why now / Why this owner: The dismissed Gap #4 still requires a documentation response. Without an authoritative statement, the next engineer touching the system will re-investigate and potentially re-introduce mem0 wiring. Skillforge produces technical documentation. 
BlockedBy: MC-STUB-09 (confirm Chroma/mem0 decommission state before documenting the final topology)

MC-STUB-12: Wire verify-fix-loop as optional /task-postflight enhancement (Wave C)
Subsystem: Verifier / QA skill
Owner-company: Proveo
Priority: L
Composite (Leverage × Severity / Effort): 16 (8 × 4 / 2), post-rebuttal, demoted from H
Effort: M (≤8h)
Cost (token + CEO-action time): ~$0.40 tokens / 0 min CEO
Acceptance criteria (machine-checkable):
1. grep -n "verify-fix-loop" ~/system/agents/skills/task-postflight/SKILL.md returns at least 1 match (Section 2b exists)
2. The section has a conditional trigger: domain IN {docs, system, refactor} AND Proveo PASS
3. A dry-run of /task-postflight on a docs-domain MC shows verify-fix-loop invoked (not just Proveo)
4. verify-fix-loop invocation does NOT replace Proveo (both must appear in the postflight log)
Evidence path: 4.1 §3 Gap #3, 4.2 Gap #3 (DISPUTED and demoted; Proveo IS the required gate; this is an enhancement)
Why now / Why this owner: verify-fix-loop is a fully built capability sitting idle. Wiring it as a conditional enhancement (not a required gate) improves self-correction for low-risk domains. Proveo owns the verification pipeline.
BlockedBy: MC-STUB-08 (pi-orchestrator must be dispatching for auto-invocation to work reliably; in the interim, a manual invocation pattern is acceptable)

Section 2 - Sequencing Graph

Wave A - Immediate, S effort, high leverage (ship first)
These are unblocked today. Combined effort: ~6h. No CEO decisions needed to START.

    MC-STUB-01 (RAG drain-worker credential fix)
        |
        +---> MC-STUB-03 (Live queue depth monitor) [depends on 01 being live]

    MC-STUB-04 (Restore 5 dead-script plists) [sub-task: pi-orch-health blocked on STUB-02]
    MC-STUB-09 (Chroma/mem0 orphan audit) [parallel, no deps]
    MC-STUB-10 (B2 storage cap raise) [parallel, no deps; billing action]

Wave A ships: 01, 03, 09, 10 (immediately); 04 partially (4 of 5 plists; pi-orch-health blocked on STUB-02).
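The composite scores that drive this ordering follow the Section 1 formula (Leverage × Severity ÷ Effort, with S=1, M=2, L=4). A minimal sketch for recomputing a score when the CEO or a reviewer adjusts an input is shown below; the function name composite is hypothetical and not part of any existing tool in ~/system/tools.

```bash
#!/usr/bin/env bash
# Recompute a composite score: Leverage x Severity / Effort weight.
# Effort weights (S=1, M=2, L=4) are taken from Section 1.
# The function name "composite" is an assumption made for illustration.
composite() {
  local leverage="$1" severity="$2" effort="$3" weight
  case "$effort" in
    S) weight=1 ;;
    M) weight=2 ;;
    L) weight=4 ;;
    *) echo "unknown effort: $effort" >&2; return 1 ;;
  esac
  # awk handles fractional results (e.g. 17.5) that shell integer math cannot.
  awk -v l="$leverage" -v s="$severity" -v w="$weight" \
    'BEGIN { printf "%g\n", l * s / w }'
}
```

For example, composite 9 9 S reproduces the 81 listed for MC-STUB-01, and composite 5 7 M reproduces the 17.5 listed for MC-STUB-03.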
Wave B - After Wave A + CEO decisions
These depend on an architectural decision or on Wave A completing.

    MC-STUB-02 (Canonical dispatch path decision)
        |
        +---> MC-STUB-04 [remainder: pi-orch-health script restoration]
        |
        +---> MC-STUB-08 (Restore pi-orchestrator dispatch; actual kernel fix)
        |         |
        |         +---> MC-STUB-12 (wire verify-fix-loop; optional enhancement, needs dispatch working)
        |
        +---> MC-STUB-06 (Routing policy decision + specialist-mapping update)
                  |
                  +---> MC-STUB-07 (Register Axiom/Datavera/Resolver/Lexicon or archive them)

    MC-STUB-05 (Blueprint score gate enforce) [needs CEO score floor decision; otherwise ship at 60]

CEO decision trigger: before MC-STUB-02 can produce a useful output, the CEO must make one call (see Section 4 item #1).

Wave C - Cleanup / hygiene (non-urgent)
No blocking dependencies. Run when bandwidth allows.

    MC-STUB-09 --> MC-STUB-11 (memory-plane doc; safe to write after Chroma state is known)
    MC-STUB-12 [verify-fix-loop wiring; Wave C because Wave B must stabilize dispatch first]

Full DAG (text form)

    [NOW]
      STUB-01 (RAG creds) ─────────────────────> STUB-03 (queue monitor)
      STUB-04 partial (4 plists)
      STUB-09 (Chroma/mem0 audit) ──────────────> STUB-11 (memory doc)
      STUB-10 (B2 billing)

    [CEO DECISION on dispatch path]
      STUB-02 (canonical dispatch decision)
        ├──> STUB-04 remainder (pi-orch-health)
        ├──> STUB-08 (pi-orch restore) ──────────> STUB-12 (verify-fix-loop wire)
        └──> STUB-06 (routing policy) ──────────> STUB-07 (4 phantom companies)

    [CEO DECISION on score floor]
      STUB-05 (blueprint gate enforce)

Section 3 - Out of Backlog (and Why)

DISMISSED gaps (not a fix)

mem0 SoR wire break (original Gap #4): Not a break. .md + LightRAG is the actual working design: Claude Code writes .md natively, and lightrag-auto-ingest.sh routes .md writes to LightRAG. mem0 was a prototype that was never wired into the active pipeline. CLAUDE.md has zero mention of mem0 as SoR.
The correct response is NOT to wire mem0 back — it is to document the actual design (see MC-STUB-11, a documentation-only stub). verify-fix-loop "unwired" structural gap (original Gap #3): Framing was misleading. CLAUDE.md Hard Constraint #4 requires Proveo verification — and Proveo IS wired and called by /task-postflight. verify-fix-loop is an optional enhancement for docs/system/refactor domains, not the required gate. Adding it is a feature improvement (see MC-STUB-12, demoted to Wave C), not a structural fix. DEMOTED gaps — lighter scope than original claim 4 phantom companies (original Gap #7 — scope confirmed at 4, not demoted): All 4 companies (Axiom, Datavera, Resolver, Lexicon) are absent from specialist-mapping.json. None are phantom in the sense of missing directories — all have full persona directories — but none are routable via the normal John → discover.js flow. The fix is: inventory work products, then register OR mark as experimental. Addressed in MC-STUB-07 at L priority (documentation + optional routing). Verifier loop (original Gap #3 — demoted from H to L): Retained as MC-STUB-12 but explicitly classified Wave C, marked as optional enhancement not structural fix. Proveo is the real gate and it is working. Section 4 — CEO Decision Items These are blocking decisions that no engineer can make unilaterally. They gate specific MCs. Decision 1 (CRITICAL — gates MC-STUB-02, 04, 08): Canonical dispatch path The question: Is durable-runner (port 3052, 20d uptime, stable) the canonical dispatch layer — with pi-orchestrator HTTP (port 8401, dead) being an old control plane that can be decommissioned? OR is pi-orchestrator HTTP supposed to be online, and its deadness is a regression that must be fixed? Why only CEO can decide: This is an architectural fork. If durable-runner is canonical, FIX is: document it, verify it's processing tasks, and decommission the old HTTP endpoint. 
If pi-orch HTTP is canonical, FIX is: diagnose startup gating (likely an initialization hang on Ollama or a flag file), restore it, and ensure durable-runner is correctly subordinate. Options: A. durable-runner is canonical dispatcher. pi-orch HTTP is legacy. Document this, decommission port 8401. B. pi-orch HTTP is canonical. Diagnose and restore it. durable-runner is subordinate. C. Both should be operational. Hybrid model (requires Petter to specify the interaction model). Decision 2 (M — gates MC-STUB-05): Blueprint score gate floor The question: What is the enforced minimum score for dispatching a task through Mehanik gate? Context: Observed practice allows dispatch at score 65 (WARN range). Original spec says 90 is the floor. The gate code currently treats WARN as pass-through. The correct floor must be chosen and hardcoded. Options: A. Lower floor to 60 — match observed practice; WARN is acceptable. B. Floor stays at 90 — WARN becomes BLOCK; blueprints must be updated to score higher. C. Introduce tiered floors: 60 for L tasks, 75 for M, 90 for H+. Decision 3 (M — gates MC-STUB-06, 07): Specialist-mapping.json scope policy The question: Should specialist-mapping.json be comprehensive (cover all 66 agents, all 12 companies) — or curated (cover only primary dispatch paths, leaving internal/helper agents out)? Why it matters: validator and distiller have 44 and 21 skill references respectively, but may be internal-only agents (called from other agents, not from John). If they're internal-only, they must NOT be in the routing table — they should be in the agent definition files only. If they ARE routable by John, they must be added. Options: A. Curated: only John-dispatchable agents enter the routing table. Internal agents documented separately. B. Comprehensive: all agents mapped; entry type field distinguishes dispatch-routable from internal. Decision 4 (L — informs MC-STUB-09, 11): mem0 future role The question: What is mem0's long-term status? 
Context: 865 stale facts in mem0_john. Zero active writers. .md + LightRAG is the working pipeline. mem0 server is running and consuming resources. Options: A. Deprecate: stop mem0 server; archive its Qdrant vectors; remove from settings.json. B. Keep as parallel experimental sandbox: document it as optional enrichment layer, not canonical. C. Promote: wire a PostToolUse hook that writes every .md memory update to mem0 simultaneously (highest effort, not recommended). Petter's recommendation: Option A (deprecate). The .md pipeline is working. mem0 is cognitive overhead with no active consumer. Report produced by Petter Graff — CodeCraft Lead Architect Source: 4.1-petter-synthesis.md, 4.2-devils-advocate.md Audit date: 2026-05-09 MC stubs: 12 total. CEO selects 1-3 per session from top of each wave. Validation Reports 5.1 — Proveo Validation Report AI Factory Audit — Plan Task 5.1 Validator: Angie Jones (Proveo) Date: 2026-05-09 Audit deliverables reviewed: p1/{1.1,1.2,1.3,1.4}, p2/{2.1,2.2,2.3}, p3/3.1-health-matrix.md, p4/{4.1,4.2,4.3} Section 1 — Probe Re-Run (10% sample of 17 health-matrix rows) Five probes selected to cover memory (A1), dispatch (C1), RAG (H1), daemon (D1 verifier), and HiveDB (A3). Probe 1 — mem0 health endpoint (maps to P3.1 row A1) Original claim (P3.1 A1): mem0 PARTIAL — write acknowledged, semantic search returns count:1 but results:[] for new user_id audit-test . Fresh probe: curl -s http://localhost:9000/health Output: {"status": "healthy", "backend": "qdrant", "llm": "qwen3:8b-q8_0@ollama", "embedder": "bge-m3@ollama", "collections": ["mem0migrations","sessions","hivemind","mem0_john","knowledge"], "mem0_collection": "mem0_john"} Verdict: REPRODUCED mem0 health endpoint returns status: healthy as stated. Qdrant backend and collections list match the P3.1 evidence. The health plane is intact. 
The partial-retrieval issue noted in P3.1 (write-acknowledged, empty results for new user_id) is consistent with the collections list — audit-test user would not have a named collection in the list above, confirming P3.1's hypothesis about namespace creation lag. Probe 2 — HiveDB intel count (maps to P3.1 row A3) Original claim (P3.1 A3): sqlite3 ~/system/databases/hivemind.db "SELECT COUNT(*) FROM intel;" → 17560 , latest entries dated 2026-05-09. Fresh probe: sqlite3 ~/system/agents/hivemind/hivemind.db "SELECT COUNT(*) FROM intel;" Output: 17569 Verdict: REPRODUCED (with expected drift) Count at probe time is 17,569 — 9 rows above the 17,560 from P3.1. This is a live write-active store; 9 new intel rows in the intervening period is consistent with normal HiveMind alert traffic. P3.1's claim that the store is live and functional is confirmed. The P3.1 "Surprises" note (HiveDB read API exists — P1 claim of "no read API" is wrong) stands confirmed. Probe 3 — pi-orchestrator PID 75750 alive (maps to P3.1 row C1) Original claim (P3.1 C1): PID 75750 running since Fri 12pm; curl http://localhost:8401/health → CONNECTION REFUSED. Fresh probe: ps aux | grep pi-orchestrator | grep -v grep Output: makinja 75750 0.0 0.1 436177552 61728 ?? S fre.12p.m. 0:22.29 /opt/homebrew/bin/node /Users/makinja/system/kernel/pi-orchestrator.js start Verdict: REPRODUCED PID 75750 is identical — same process, same start time (Friday 12pm), same command. The process has not been restarted, crashed, or replaced since P3.1 was written. This confirms the pi-orchestrator is running but its internal HTTP listener never came up. P3.1's "PARTIAL" verdict is correct: process alive, control plane dead. Additional validation: confirmed no port 8401 listener and no verify-fix-loop invocation in kernel or hooks (zero grep hits in ~/system/kernel/pi-orchestrator.js and ~/system/hooks/ ). 
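The "process alive, control plane dead" finding combines two independent observations. The sketch below shows one way to turn them into a single verdict so the probe can be re-run unattended. It assumes the process can be matched with pgrep -f on pi-orchestrator.js and that any successful response from http://localhost:8401/health counts as a live control plane; the function names and the DOWN label are hypothetical (P3.1 itself only uses WORKS and PARTIAL in the rows quoted here).

```bash
#!/usr/bin/env bash
# Map two observations to a health-matrix style verdict.
# Arguments: 1 = process alive (0/1), 2 = HTTP health endpoint answered (0/1).
classify_state() {
  local proc_alive="$1" http_ok="$2"
  if [ "$proc_alive" -eq 1 ] && [ "$http_ok" -eq 1 ]; then
    echo "WORKS"
  elif [ "$proc_alive" -eq 0 ] && [ "$http_ok" -eq 0 ]; then
    echo "DOWN"      # hypothetical label, not taken from the audit
  else
    echo "PARTIAL"   # exactly one of the two signals is healthy
  fi
}

# Gather both signals on the live host, then classify them.
probe_pi_orch() {
  local proc_alive=0 http_ok=0
  pgrep -f "pi-orchestrator.js" >/dev/null 2>&1 && proc_alive=1
  # -f makes curl fail on HTTP errors; --max-time bounds a hung listener.
  curl -sf -o /dev/null --max-time 3 http://localhost:8401/health && http_ok=1
  classify_state "$proc_alive" "$http_ok"
}
```

Under the state described in Probe 3 (PID present, connection refused on 8401), probe_pi_orch would print PARTIAL. Keeping the classification separate from the probing makes the verdict logic testable without access to the host.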
Probe 4: RAG queue depth (maps to P3.1 row H1)

Original claim (P3.1 H1): cat ~/system/state/rag-drain.prom → total 454 (bookstack:442, evidence:2, mc-outcomes:9, specs:1). File mtime 2026-04-23 17:59 (16 days stale). rag-drain-worker crashed today (exit 256, HiveMind alert #64900).

Fresh probe:
    cat ~/system/state/rag-drain.prom
    stat -f "%Sm %N" ~/system/state/rag-drain.prom

Output:
    alai_ingest_queue_depth{source="bookstack"} 442
    alai_ingest_queue_depth{source="evidence"} 2
    alai_ingest_queue_depth{source="mc-outcomes"} 9
    alai_ingest_queue_depth{source="specs"} 1
    alai_ingest_queue_depth_total 454
    mtime: Apr 23 17:59:36 2026

Verdict: REPRODUCED
Queue values are byte-for-byte identical (bookstack:442, evidence:2, mc-outcomes:9, specs:1, total:454). File mtime is unchanged at 2026-04-23 17:59:36; no write has occurred since P3.1 was produced. This confirms the drain-worker remains down and the metric is still frozen. The rag-drain-worker is not recovering on its own. P3.1's "PARTIAL" classification and the 16-days-stale caveat are both accurate.

Note on P1 discrepancy: P3.1 states "P1 claim of 946 appears to be an older snapshot." This is confirmed: 946 does not appear in the current prom file at any level. P1 used a superseded snapshot.

Probe 5: verify-fix-loop auto-invocation (maps to P3.1 row D1)

Original claim (P3.1 D1): Skill exists at ~/.claude/skills/verify-fix-loop/SKILL.md. Manual-trigger only. No daemon or hook auto-invokes it. P2 verdict "ABSENT" partially wrong: capability exists but auto-invocation is absent.

Fresh probe:
    grep -rn "verify-fix-loop" ~/.claude/skills/task-postflight/
    grep -rn "verify.fix.loop" ~/system/kernel/pi-orchestrator.js
    grep -rn "verify.fix.loop" ~/system/hooks/

Output: All three commands return no output (zero matches). Confirmed skill exists at ~/.claude/skills/verify-fix-loop/SKILL.md (direct ls confirmed). No reference to verify-fix-loop in task-postflight SKILL.md, pi-orchestrator kernel, or hooks directory.
Verdict: REPRODUCED
P3.1's nuanced verdict is correct: the skill exists and is indexed, but no automated trigger references it. task-postflight does not call it. The pi-orchestrator kernel (.js, not the .bak) has zero references. The hooks directory has zero references. P2's "ABSENT" framing was imprecise; P3.1's correction ("skill exists as MANUAL-trigger, not auto-invoked") is the accurate characterization.

Section 1 Summary

| Probe | P3.1 Claim | This Probe | Verdict |
|---|---|---|---|
| mem0 health | PARTIAL: healthy endpoint, retrieval gap for new users | Confirmed healthy, collection list consistent with partial behavior | REPRODUCED |
| HiveDB count | WORKS: 17,560, live writes today | 17,569 (+9 rows, normal drift) | REPRODUCED |
| pi-orch PID 75750 | PARTIAL: process alive, HTTP port 8401 dead | Same PID, same uptime, still no port 8401 listener | REPRODUCED |
| RAG queue depth | PARTIAL: 454 frozen, 16d stale, drain-worker down | Identical values, identical mtime, no recovery | REPRODUCED |
| verify-fix-loop | PARTIAL: skill exists, zero auto-invocation wiring | Zero hits in task-postflight, kernel, hooks | REPRODUCED |

All 5 probes: REPRODUCED. No contradictions to P3.1 found.

Section 2 - MC Stub AC Quality Check (all 12 stubs from 4.3)

Criteria applied to each stub:
1. AC checklist exists (binary)
2. Each AC is machine-checkable (not vague)
3. Effort estimate reasonable
4. Owner-company makes sense

MC-STUB-01: Restore RAG drain-worker - PASS
AC checklist: YES (5 ACs)
Machine-checkable: All 5 are concrete commands with observable exit codes or file stats.
1. cat /tmp/bw-session exits 0: checkable
2. curl -s http://localhost:9621/health returns {"status":"healthy"}: checkable
3. launchctl list | grep rag-drain-worker LastExitStatus = 0: checkable
4. stat ~/system/state/rag-drain.prom mtime within 10 min: checkable
5. Live queue depth written to new artifact: checkable (file-exists + key-present)
One minor note: the 5th AC references "MC-STUB-03 new artifact" (rag-drain-live.json). This creates a dependency coupling between two stubs' ACs.
If MC-STUB-03 is not executed, AC#5 cannot be verified. This is documented in the sequencing graph, but the AC should note the dependency explicitly. Keeping as PASS but noting this coupling. Effort S (≤2h): Reasonable for a credential session fix + daemon restart. Owner FlowForge: Correct — daemon lifecycle + credential management. MC-STUB-02: Resolve canonical dispatch path — PASS AC checklist: YES (4 ACs with conditional branches) Machine-checkable: The branching structure ("IF pi-orch is canonical: curl 200 / IF durable-runner is canonical: grep dispatch log") is valid. Both branches are machine-checkable. The fourth AC ("no dispatch logs older than 2026-04-01 are the NEWEST entry") is checkable via tail -1 on the log file. Effort L (≤2d): Reasonable — architectural decision + documentation + live probes. This is design work, not a one-line fix. Owner CodeCraft: Correct — kernel architecture is CodeCraft's domain. MC-STUB-03: Live RAG queue depth monitoring — PASS AC checklist: YES (4 ACs) Machine-checkable: rag-drain-live.json exists with queue_depth key — checkable mtime within 5 min — checkable launchctl list | grep rag-queue-monitor LastExitStatus = 0 — checkable HiveMind query returns row within last 1h — checkable Effort M (≤8h): Reasonable for a new monitoring daemon. Owner FlowForge: Correct. BlockedBy MC-STUB-01 is accurate and documented. MC-STUB-04: Restore or unload 5 deleted-script plists — WEAK AC checklist: YES (4 ACs) Machine-checkable: The OR-condition in AC#1 ( launchctl list shows ZERO entries OR LastExitStatus=0) is structurally ambiguous for a verifier. A verifier running this check cannot determine which branch was executed without additional context. The check passes in both the "unloaded" and "restored" outcome — which means a verifier cannot distinguish a complete success (restored + healthy) from a partial success (unloaded but not restored). This requires a separate assertion per plist that declares intent. 
AC#3 ("Zero exit-127 entries within 24h") uses a 24h observation window. This is time-bound and cannot be machine-checked at a point in time without log inspection. Recommend: check last 5 launchctl exit codes for each daemon name, not a 24h window.
Effort S (≤2h): Reasonable for an unload/restore task.
Owner FlowForge: Correct.
Specific fix needed: Split "unloaded" vs "restored" into separate ACs per plist.

MC-STUB-05: Enforce blueprint score gate - PASS
AC checklist: YES (4 ACs)
Machine-checkable:
1. grep -n "WARN\|warn" no bypass path: checkable
2. Test run with score 65 exits non-zero: checkable (behavioral test)
3. Test run without MC-ID exits non-zero: checkable
4. grep "SCORE_FLOOR" returns numeric value: checkable
The behavioral test ACs (#2 and #3) require a test harness that can invoke the gate with a mock blueprint. This is more complex than a read-only probe but is legitimately machine-checkable via a scripted invocation. Acceptable.
Effort S (≤2h): Reasonable for a shell script edit + test run.
Owner CodeCraft: Correct for gate scripting.

MC-STUB-06: Agent fleet routing update - WEAK
AC checklist: YES (4 ACs)
Machine-checkable concern: AC#2 (node ~/system/tools/discover.js routing "validate acceptance criteria") and AC#3 (node ~/system/tools/discover.js routing "distill text") test routing of "validate" and "distill", but the stub is about adding validator and distiller agents. The query phrases "validate acceptance criteria" and "distill text" may not match the agent names if discover.js uses keyword matching. A query returning "non-empty result" could be satisfied by a different agent (e.g., Proveo for "validate"), making the AC a false PASS. The AC should check that the returned company/agent specifically includes the newly added entry.
AC#4 (grep -c '"company"' specialist-mapping.json >= previous count + new entries): requires knowing the pre-fix count to evaluate post-fix. This is process-dependent and not self-contained.
Effort M (≤8h): Reasonable — design decision + JSON data entry. Owner CodeCraft + Resolver: Correct. MC-STUB-07: Register or archive Axiom/Datavera/Resolver — PASS AC checklist: YES (3 ACs) Machine-checkable: Each of the three appears in specialist-mapping.json OR has STATUS field in company.json — checkable discover.js routing "axiom" returns result or explicit message — checkable No persona directory has unresolved routing status — checkable via scan Effort M (≤4h): Reasonable for 3-company inventory + status update. Owner CodeCraft: Correct. MC-STUB-08: Restore pi-orchestrator dispatch — WEAK AC checklist: YES (4 ACs with conditional branches) Machine-checkable concern: AC#2 (durable-runner branch) states "node ~/system/tools/mc.js list --status ready --limit 1 followed by 5 min wait shows the task state has changed." This is a time-dependent behavioral assertion — a verifier cannot execute a 5-minute wait within a standard probe run. More critically: the state change depends on there being a ready task AND the dispatcher picking it up, which may not be true in a low-traffic environment. This AC can produce false FAILs in idle periods. AC#4 ("no task with status 'ready' sits unprocessed for more than 30 min in an idle queue — monitored via cron probe") is not a point-in-time checkable assertion. "Monitored via cron probe" means the AC requires an ongoing monitoring setup, not a single verification pass. Effort L (≤2d): Reasonable — kernel-level architectural work. Owner CodeCraft: Correct. BlockedBy MC-STUB-02: Documented and accurate. MC-STUB-09: Audit and archive Chroma + stale mem0 — PASS AC checklist: YES (4 ACs) Machine-checkable: curl localhost:8000/api/v1/collections returns documented list OR connection refused — checkable If decommissioned: entry removed from settings.json — checkable curl localhost:9000/v1/memories/?user_id=john — checkable memory-plane-canonical.md exists — checkable Effort S (≤2h): Reasonable — mostly audit + file/config edit. 
Owner CodeCraft: Acceptable. Could also be FlowForge (infra cleanup), but CodeCraft is defensible given the architectural documentation artifact. MC-STUB-10: Raise B2 storage cap + litestream health — WEAK AC checklist: YES (4 ACs) Machine-checkable concern: AC#1 uses curl -s -H "Authorization: applicationKey:..." https://api.backblazeb2.com/b2api/v2/b2_get_bucket_info . The authorization string is a placeholder — a verifier running this command verbatim will get a 401. The AC must reference the credential lookup method (e.g., bw get item "backblaze-b2-key" --session $(cat /tmp/bw-session) ) rather than a literal placeholder. This is an evidence-fabrication risk: a lazy verifier could claim PASS without actually having the credentials. AC#3 ( grep "$(date +%Y-%m-%d)" ~/system/logs/litestream.log | tail -1 ): requires the litestream log file to exist and be written today. If the log path differs from what's specified, this is a silent FAIL. The AC should include a fallback check for log file existence first. Effort S (≤2h): Reasonable — billing console action + log verification. Owner FlowForge: Correct. MC-STUB-11: Document memory pipeline (doc-only) — PASS AC checklist: YES (4 ACs) Machine-checkable: memory-plane-canonical.md exists — checkable CLAUDE.md contains specific phrase — checkable via grep BookStack page exists — checkable via curl mem0 status documented as "sandbox/experimental" — checkable via grep in spec Effort M (≤4h): Reasonable for a doc task. Owner Skillforge: Correct. BlockedBy MC-STUB-09: Documented and logical. MC-STUB-12: Wire verify-fix-loop (Wave C enhancement) — WEAK AC checklist: YES (4 ACs) Machine-checkable concern: AC#3 states "A dry-run of /task-postflight on a docs-domain MC shows verify-fix-loop invoked (not just Proveo)." This requires: (a) a real MC in docs domain to exist, (b) /task-postflight to be invokable in dry-run mode. 
The stub does not specify whether task-postflight has a --dry-run flag or how to interpret its output to confirm verify-fix-loop was called vs not called. Without a defined output artifact or log to inspect, this AC is not fully machine-checkable.

AC#4 ("verify-fix-loop invocation does NOT replace Proveo; both must appear in the postflight log") is checkable IF the log artifact is defined. Currently "postflight log" is unspecified in the AC: what file path, what format?

Effort M (≤8h): Reasonable.
Owner Proveo: Correct. This is Proveo's enhancement of the verification pipeline.
BlockedBy MC-STUB-08: Documented. Logical since auto-invocation requires dispatch to work.

Section 2 Summary

| Stub | Score | Key Reason |
|---|---|---|
| MC-STUB-01 | PASS | All 5 ACs concrete and checkable; minor cross-stub dependency coupling noted |
| MC-STUB-02 | PASS | Conditional branch structure is valid; both branches machine-checkable |
| MC-STUB-03 | PASS | All 4 ACs concrete; mtime + launchctl + HiveMind query all verifiable |
| MC-STUB-04 | WEAK | OR-condition in AC#1 prevents distinguishing unload from restore; 24h window not point-checkable |
| MC-STUB-05 | PASS | Behavioral test ACs are valid given scripted invocation harness |
| MC-STUB-06 | WEAK | discover.js routing query may return false PASS from a different agent; count diff AC not self-contained |
| MC-STUB-07 | PASS | All 3 ACs are direct file/command checks |
| MC-STUB-08 | WEAK | 5-min wait AC and 30-min cron-monitoring AC not point-in-time checkable |
| MC-STUB-09 | PASS | All 4 ACs concrete; connection-refused is an explicit acceptable output |
| MC-STUB-10 | WEAK | Authorization placeholder in AC#1 is evidence-fabrication risk; log path not verified to exist |
| MC-STUB-11 | PASS | All 4 ACs are grep/curl/file-exist checks |
| MC-STUB-12 | WEAK | dry-run invocation mechanism undefined; "postflight log" file path unspecified |

PASS: 7 stubs | WEAK: 5 stubs | FAIL: 0 stubs

5 WEAK stubs require AC refinement before dispatch. None are structurally broken: all have correct intent, fixable in ≤30 min each.
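For MC-STUB-06, the concern is that a non-empty routing result can be a false PASS when a different agent answers the query. A minimal sketch of a stricter check follows. The output format of discover.js is not documented in this report, so the sketch assumes only that the routed agent's name appears as a whole word somewhere in the output; `assert_routes_to` is a hypothetical helper name, not part of the existing tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the routing output on stdin names
# the expected agent. Assumes the agent name appears as a whole word in
# the output; the real discover.js output format is not specified here.
assert_routes_to() {
  local expected="$1"
  local output
  output="$(cat)"
  # An empty result is a FAIL, not a vacuous PASS.
  [ -n "$output" ] || return 1
  # -F: literal match, -w: whole word, -i: case-insensitive, -q: quiet.
  printf '%s\n' "$output" | grep -qiwF -- "$expected"
}

# Intended use (not run here):
#   node ~/system/tools/discover.js routing "validate acceptance criteria" \
#     | assert_routes_to validator
```

With this shape, a query answered only by Proveo would fail the validator AC instead of passing on a non-empty result. Note that `-w` treats a hyphen as a word boundary, so a name such as sentinel-validator would also match; a stricter pattern would be needed if such names exist.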
Section 3 — Cross-Report Consistency Finding 3.1: P4.1 mem0 vector count conflicts with P3.1 detail P4.1 Section 2 (Delta Table, Memory plane row): States "mem0 API has 0 active writers, 865 stale facts." P4.1 Section 4 (Architectural Conclusions): States "mem0/Qdrant (93K+ vectors, zero active writers)." These two numbers — 865 facts and 93K+ vectors — are not reconciled within P4.1. 865 is the mem0 fact count (application-layer). 93K+ would be the raw Qdrant vector count across all collections (embedding-layer, where each fact generates multiple vectors). P4.1 uses both without clarifying this distinction, creating an apparent contradiction. P3.1 does not cite either figure directly. The delta table figure (865) is more precise and correct as stated; the architectural narrative (93K+) needs a qualifier ("93K+ raw Qdrant embeddings across all collections, including non-mem0 collections such as HiveMind and knowledge"). Severity: LOW — confusing but not misleading about the fix needed. Finding 3.2: P4.3 references a DISMISSED gap (Gap #3 = verifier loop) via MC-STUB-12 P4.2 Gap #3 verdict: "DISPUTED — demoted." P4.2 concludes the gap framing was misleading and recommends relegating to Wave C enhancement. P4.3 Section 3 (Out of Backlog): Correctly identifies Gap #3 as DEMOTED (not dismissed). MC-STUB-12 is retained in the backlog as a Wave C item with L priority. This is NOT a contradiction — it is correctly handled. P4.3's "Out of Backlog" section explicitly distinguishes DISMISSED (Gap #4 mem0 SoR) from DEMOTED (Gap #3 verifier loop). The sequencing graph correctly places MC-STUB-12 in Wave C. Consistent. Finding 3.3: P4.3 MC-STUB-04 claims pi-orch-health plist references pi-orch-health.sh — P3.1 G1 says daemon state is "not running" P3.1 G1: launchctl print gui/501/com.alai.pi-orch-health → state: not running . Last health report Verdict: CRITICAL (2026-05-06). Scheduled health monitor failing. 
P4.3 MC-STUB-04: "pi-orch-health.sh was deleted on 2026-05-06 when the last recorded status was CRITICAL." These are consistent — daemon not running because script was deleted (exit 127 pattern from P1.4). No conflict. Finding 3.4: P2.1 connectivity diagram "Dead Edge 1" vs P3.1 C1/C2 — minor framing gap P2.1 (per P4.2 citation): labels the pi-orchestrator → agent dispatch path as "Dead Edge 1" and characterizes pi-orch as "MOCK MODE." P3.1 C2: Explicitly finds NO mock config reference in the kernel ( grep "mock" → zero matches). Config shows offlineMode: false , enabled: true . P4.2 rebuttal: Confirms P3.1 is correct — "MOCK MODE" framing is inaccurate; the real issue is HTTP port 8401 startup gating. Status: P2.1 uses "MOCK MODE" language that P3.1 and P4.2 both correct. P4.1 repeats "mock/broken mod" in the executive summary. P4.3 avoids this language entirely (describes the gap as "HTTP port dead" and "no dispatch logs post-March"). The P4.1 executive summary should be updated to drop "mock mode" — it is an inaccurate framing that has been rebutted by P3.1 probe evidence. Severity: LOW-MEDIUM — the corrected framing matters for how the CEO frames the fix. "Mock mode" implies intentional test configuration; "HTTP startup gating failure" implies a recoverable initialization bug. Finding 3.5: P4.1 Gap #5 composite score vs P4.3 MC-STUB-06 composite score — mismatch P4.1 Gap #5 (Agent routing table incomplete): Composite = 28 (7 × 8 / 2). P4.3 MC-STUB-06 (Design decision + routing update): Composite = 18 (7 × 5 / 2), "post-rebuttal adjusted." The severity was reduced from 8 to 5 after the devil's advocate review. P4.3 explicitly notes "post-rebuttal adjusted." This is correct — the rebuttal demoted this gap when it found that validator/distiller may be internal-only agents. The composite score difference is intentional and documented, not an error. Status: Consistent — change is intentional and documented. 
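The two composite scores in Finding 3.5 can be recomputed from the expressions quoted in parentheses. The sketch below assumes the formula is exactly "first factor × severity / 2" as written; what the fixed factor 7 represents is not stated in this section, and `composite` is an illustrative helper name.

```shell
#!/usr/bin/env bash
# composite A B -> A * B / 2, printed to one decimal place.
# The formula is taken from the parenthesised expressions quoted in
# Finding 3.5; the meaning of the fixed factor 7 is not stated there.
composite() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a * b / 2 }'
}

composite 7 8   # P4.1 Gap #5, severity 8
composite 7 5   # P4.3 MC-STUB-06, severity 5 (post-rebuttal)
```

Run as written, this prints 28.0 and 17.5, so the 18 reported by P4.3 is presumably 17.5 rounded up. The reduction in severity, not the rounding, accounts for the difference between the two reports.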
Finding 3.6: P4.1 Gap #7 cites "4 phantom companies"; P4.2 + P4.3 correct to 3

P4.1 Gap #7: "4 companies (Axiom, Datavera, Resolver, Lexicon) have full persona dirs... but zero entries in specialist-mapping.json."
P4.2 Gap #7 rebuttal: Confirmed Lexicon IS in specialist-mapping.json. Only 3 companies are unroutable.
P4.3 MC-STUB-07: Scope correctly adjusted to "Axiom, Datavera, Resolver" (3 companies).

The correction flows correctly through the document chain. P4.1 contains the uncorrected claim (4 companies); P4.2 rebuttal catches it; P4.3 backlog uses the corrected count. This is the intended flow. However, P4.1 should carry a note that its Gap #7 count was revised to 3 by P4.2. As-is, a reader of P4.1 alone gets the wrong number.

Severity: LOW. The correction exists in P4.2 and P4.3; only readers of P4.1 in isolation are misled.

Section 3 Summary

| Finding | Reports Affected | Severity | Status |
|---|---|---|---|
| 3.1: mem0 865 facts vs 93K+ vectors unclarified | P4.1 internal | LOW | Minor annotation needed in P4.1 architectural section |
| 3.2: Dismissed vs Demoted gap classification | P4.2 → P4.3 | NONE | Correctly handled |
| 3.3: pi-orch-health plist consistency | P3.1 ↔ P4.3 | NONE | Consistent |
| 3.4: "Mock mode" framing rebutted but survives in P4.1 summary | P2.1 → P4.1 | LOW-MEDIUM | P4.1 executive summary should replace "mock/broken mod" with "HTTP startup gating failure" |
| 3.5: Composite score change Gap #5 → STUB-06 | P4.1 ↔ P4.3 | NONE | Intentional, documented |
| 3.6: "4 phantom companies" in P4.1 vs corrected "3" in P4.3 | P4.1 ↔ P4.3 | LOW | P4.1 needs a correction note; P4.3 is correct |

No blocking contradictions found. Three annotation gaps noted (3.1 and 3.6 LOW, 3.4 LOW-MEDIUM).

Section 4: Final Verdict

Verdict: REWORK (minor)

The audit deliverables are substantially sound. All 5 re-run probes reproduced P3.1 findings. The fix backlog is correctly prioritized and the sequencing DAG is architecturally coherent. CEO can act on the Wave A items immediately.
However, two categories of rework are required before CEO consumption of the full backlog:

Category A: AC refinement (5 stubs, ≤30 min each)
- MC-STUB-04: Split the "unloaded OR restored" OR-condition into separate per-plist ACs; replace 24h window with last-N-exit-code check.
- MC-STUB-06: Rewrite the discover.js routing ACs to assert the specific agent returned (not just "non-empty result"); make count-diff AC self-contained with an explicit pre-fix baseline command.
- MC-STUB-08: Replace the 5-min-wait behavioral AC with a point-in-time dispatch log check (e.g., log entry exists with today's date). Replace the 30-min cron-monitoring AC with a statement that a cron probe must be set up as a child task.
- MC-STUB-10: Replace the literal Authorization: applicationKey:... placeholder with a credential retrieval command (bw get item ...); add a log-file existence pre-check before the grep assertion.
- MC-STUB-12: Define the "postflight log" artifact path; specify whether task-postflight has a --dry-run invocation mode or define an alternative observable output.

Category B: Annotation fixes in P4.1 (≤15 min)
- P4.1 executive summary: Replace "mock/broken mod" for pi-orchestrator with "HTTP port startup gating failure" to match P3.1 and P4.2 corrected findings.
- P4.1 Gap #7: Add a footnote that P4.2 rebuttal revised the affected company count from 4 to 3 (Lexicon confirmed routable).
- P4.1 architectural section: Clarify that "93K+ vectors" is the raw Qdrant embedding count across all collections, not the mem0 fact count (865 application-layer facts).

What CEO CAN act on immediately without rework:
- Wave A tasks (STUB-01, STUB-03, STUB-09, STUB-10 partial): their ACs are either PASS-rated or the WEAK issues do not affect Wave A execution.
- CEO Decision Items 1-4 in Section 4 of P4.3: these are architectural choices, not dependent on AC quality.
- The overall gap prioritization and sequencing DAG: both are sound.
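The point-in-time log check recommended for MC-STUB-08 and MC-STUB-10 (confirm the log exists, then look for an entry dated today) could be sketched as follows. `log_has_entry_today` is a hypothetical helper, and the sketch assumes log lines carry an ISO date (YYYY-MM-DD) somewhere in the line, which this report does not confirm for either log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point-in-time check that a log has an entry dated today.
# Exit codes: 0 = entry found,
#             1 = log exists but has no entry for today,
#             2 = log file missing or unreadable (reported, not a silent FAIL).
# Assumes each log line contains an ISO date (YYYY-MM-DD).
log_has_entry_today() {
  local log="$1"
  local today
  today="$(date +%Y-%m-%d)"
  if [ ! -r "$log" ]; then
    echo "FAIL: log not found or unreadable: $log" >&2
    return 2
  fi
  if grep -qF -- "$today" "$log"; then
    echo "PASS: $log has an entry for $today"
    return 0
  fi
  echo "FAIL: $log has no entry for $today" >&2
  return 1
}

# Intended use (path taken from MC-STUB-10 AC#3; not run here):
#   log_has_entry_today ~/system/logs/litestream.log
```

Separating exit code 2 from exit code 1 lets a verifier distinguish "wrong log path" from "no replication today", which is the ambiguity flagged in the MC-STUB-10 review.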
Evidence dir: /tmp/ai-factory-audit-2026-05-09/p5/ Validated docs: p3/3.1-health-matrix.md (sha256: f4af148add0d8ee7933da370126cbd90c9c024708d39847c35093e7551b1af98) Validated docs: p4/4.3-fix-backlog.md (sha256: 48c4728559d9fe307d067e63fc7ccd3c3c68b83a56801e52aa65b565d630b307) Produced by Angie Jones — Proveo 2026-05-09 Atomic-Claim Verification — AI Factory Audit Synthesis Verifier: Verifier Agent (read-only) Date: 2026-05-09 Source verified: 4.1-petter-synthesis.md CLAIMS_SOURCE: spec:/tmp/ai-factory-audit-2026-05-09/p4/4.1-petter-synthesis.md Atoms (one per claim) A1: "62.5% of advertised control and data flows are dead or degraded" Probe: Count LIVE / DEAD / PARTIAL from edge table in 2.1-connectivity-diagram.md Section E Output: Total edges inventoried: 40 LIVE: 15 DEAD: 15 PARTIAL: 10 DEAD + PARTIAL = 25 / 40 = 62.5% (confirmed by 2.1 Summary Statistics table: "The factory has a 37.5% live edge rate.") Verdict: PASS Note: Math is exact. 25 dead or degraded edges out of 40 = 62.5%. The edge table in 2.1 is the audit's own source of truth; Petter's synthesis correctly reports its own source document. A2: "All actual dispatch is manual-John" Probe: grep -l "verify-fix-loop\|auto.dispatch\|Task(" ~/.claude/hooks/*.sh → no matches. launchctl list | grep "durable\|pi-orch" → pi-orchestrator PID 75750 running, durable-runner (orchestrator-bridge) PID 1185 running. tail -5 ~/system/logs/pi-orchestrator/daemon-stdout.log Output: [2026-05-09T19:31:19.216Z] [INFO] Starting PI orchestrator cycle (active: 0) [2026-05-09T19:31:19.567Z] [DEBUG] No eligible tasks [2026-05-09T19:31:19.601Z] [INFO] [IDLE] System idle — starting YouTube batch learning grep "No eligible tasks" → 55,351 matches in daemon-stdout.log No hook in ~/.claude/hooks/ calls Task() or verify-fix-loop. Verdict: PASS Note: The pi-orchestrator is live and cycling every 30s, but prints "No eligible tasks" continuously (55,351 such messages in the log). 
Port 8401 refuses connections (confirmed: lsof -i :8401 returns nothing). No hook fires auto-dispatch. Manual-John is the actual dispatch path. A3: "CEO is the de-facto verifier for every task that reaches mc.js ready" Probe: Read 2.2-verifier-autonomy.md verdict; cross-check P3.1 D1 correction; read CLAUDE.md Hard Constraint #4 Output: 2.2-verifier-autonomy.md: "Autonomy verdict: ABSENT" P3.1 D1: "SKILL EXISTS at ~/.claude/skills/verify-fix-loop/SKILL.md. Skill is MANUAL-TRIGGER only." 2.2: "CEO is the de-facto verifier for every task that reaches mc.js ready" 4.2 rebuttal: "DISPUTED — Proveo (required gate) IS wired. verify-fix-loop is optional enhancement." CLAUDE.md Hard Constraint #4: "Builder cannot say done. mc.js ready → Proveo → done." Verdict: PASS — but with an important qualification Note: The synthesis headline is accurate in its core claim (no auto-invocation of verify-fix-loop), but the 4.2 devil's advocate correctly shows it overstates the situation. Proveo/Angie Jones IS the mandatory gate and it IS wired via /task-postflight. The CEO-as-verifier pattern holds for tasks where /task-postflight is not invoked (which is itself manual for H tasks only per 2.1 Edge #12: "Manual CLI invocation. H-tasks only"). So the claim is accurate for all tasks that do NOT go through task-postflight, which is the majority. Verdict: PASS with nuance — synthesis is accurate but 4.2's correction is also valid and the synthesis does not incorporate it. 
A4: "5 deleted scripts, plists still scheduled"

Probe: Check each script on disk; check each plist in launchctl

Output:
    MISSING: pi-orch-health.sh (~/system/tools/)
    MISSING: cost-daily-report.sh (~/system/tools/)
    MISSING: daily-planning.sh (~/system/tools/)
    MISSING: legal-docs-azure-sync.sh (~/system/daemons/)
    MISSING: mcp-health-check.sh (~/system/tools/)
    launchctl status:
    LOADED: com.alai.pi-orch-health → exit 127
    LOADED: com.alai.cost-daily-report → exit 127
    LOADED: com.alai.daily-planning → exit 127
    LOADED: com.john.legal-docs-azure-sync → exit 127
    LOADED: com.john.mcp-health-check → exit 127

Verdict: PASS

Note: All 5 scripts confirmed missing on disk. All 5 plists confirmed loaded in launchctl with exit 127. Petter's claim is exactly correct.

A5: "RAG queue 454 with 16d-stale metric"

Probe: cat ~/system/state/rag-drain.prom (mtime + content); sqlite3 -readonly ~/system/state/ingest-queue.sqlite "SELECT COUNT(*) FROM ingest_queue;"

Output:
    rag-drain.prom:
    mtime: 2026-04-23 17:59 (16 days stale, CONFIRMED)
    alai_ingest_queue_depth_total: 454 (this is the stale snapshot)
    ingest_queue SQLite (live):
    SELECT COUNT(*) → 3,150 rows total
    bookstack: 1703 + 48 = 1751 (duplicate sources; different status?)
    evidence: 372 + 58 = 430
    mc-outcomes: 44 + 10 + 71 = 125
    specs: 636 + 102 = 738
    rules: 80
    manual: 2

Verdict: FAIL

Note: The "454" figure is from a 16-day-stale prometheus file; that part is accurate. But the live SQLite shows 3,150 queued items, not 454. The actual queue depth is ~7x worse than the synthesis states. The synthesis (following P3.1 H1) correctly flags the staleness of the metric, but then quotes the stale 454 figure as if it is the actual state. The real state is a 3,150-item frozen queue. The synthesis should have noted the true live count or stated "actual count unknown; stale metric shows 454 as lower bound." This is a significant understatement of severity.
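The arithmetic behind the A5 verdict can be checked directly from the numbers in the probe output. The sketch below is illustrative only: `ratio` and `sum_counts` are hypothetical helper names, and the inputs are the figures quoted in the output, not a fresh query.

```shell
#!/usr/bin/env bash
# ratio LIVE STALE -> how many times larger the live count is than the stale one.
ratio() {
  awk -v live="$1" -v stale="$2" 'BEGIN {
    if (stale <= 0) exit 1          # avoid division by zero
    printf "%.1f\n", live / stale
  }'
}

# sum_counts N... -> integer sum of all arguments.
sum_counts() {
  local total=0 n
  for n in "$@"; do
    total=$((total + n))
  done
  echo "$total"
}

ratio 3150 454                      # live SQLite count vs stale .prom value
sum_counts 1751 430 125 738 80 2    # per-source subtotals listed in the output
```

This prints 6.9 and 3126. The "~7x" figure holds. The itemized subtotals account for 3,126 of the 3,150 rows, so 24 rows are not attributed to a source in the listed breakdown; the output does not say which sources or statuses they belong to.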
A6: Petter's top-3 gaps listed, then fresh-probed Probe: From synthesis Section 1 "5 najkritičnijih praznina" — top-3 are: (1) RAG ingest pipeline blocked, (2) pi-orchestrator in mock/broken mode, (3) Verifier loop capable but not called. Fresh probe each. Output: Gap 1 — RAG ingest pipeline: ingest_queue SQLite = 3,150 items (live). drain-worker crashing (HiveMind #64900 exit 256 today). LightRAG health: 3.1 A2 shows healthy (curl localhost:9621 → 200). Blocker = Vaultwarden auth. STATUS: CONFIRMED AND WORSE THAN STATED (3,150 not 454) Gap 2 — pi-orchestrator: PID 75750 alive. Port 8401: lsof -i :8401 → NOTHING (dead). Log tail: "No eligible tasks" — 55,351 occurrences. offlineMode reference found in pi-orchestrator.js (5 matches incl. "offlineMode: true" in config). Port 3052: lsof -i :3052 → node PID 1185 LISTENING (durable-runner alive). launchctl: com.alai.orchestrator-bridge PID 1185, exit 0. STATUS: CONFIRMED — HTTP dead, durable-runner live but not dispatching. Gap 3 — Verifier loop: ~/.claude/skills/verify-fix-loop/SKILL.md EXISTS. No hook in ~/.claude/hooks/ calls it (grep returns no matches). No daemon with verify-fix-loop call found. STATUS: CONFIRMED — capability exists, zero auto-invocation. Verdict: PASS (top-3 gaps confirmed by fresh probes; RAG figure is understated but the gap itself is real) A7: "37 unmapped agents" vs "42 unmapped agents" — which count is in the synthesis? Probe: grep "37\|42" 4.1-petter-synthesis.md | grep -i "unmapped\|agent" → no results. Read Section 2 table entry for Agent fleet. Output: 4.1-petter-synthesis.md Section 2 Agent fleet row: "44% mapping coverage (29/66). validator (44 skill refs) and distiller (21 refs) absent from mapping. 7 mapped agents unreachable on disk. 4 companies invisible to routing. 35 chains have no executor." The synthesis does NOT quote "37 unmapped" or "42 unmapped" as a standalone number. 
P1.3 (1.3-agent-fleet.md) explicitly states: "42 unmapped agents" and breaks down to 11 ORPHAN + 11 DUPLICATE + 20 NEEDS-MAPPING = 42. The prior "37 unmapped" figure appears in the audit brief question but is NOT in P1.3 text. Verdict: PASS — the synthesis avoids quoting a specific unmapped count; it uses "44% mapping coverage (29/66)" instead, which is accurate (66 - 29 = 37 unmapped, but P1.3 corrects this to 42 because 7 mapped agents are also missing from disk, so the "reachable" count is lower). The synthesis does not contain the discrepant number — the A7 atom is about consistency, and the synthesis is consistent (it omits the count rather than stating it). Note: P1.3's 42 figure counts agents in ~/.claude/agents/ not in specialist-mapping.json. The synthesis's choice to use "44%" coverage is the safer framing. No inconsistency to report. A8: "All 35 chain YAMLs are dead" Probe: ls ~/system/tools/chain-runner.sh , ls ~/system/tools/chain-runner.js , check if chain-runner is invoked by any daemon or skill Output: chain-runner.js EXISTS: ~/system/tools/chain-runner.js (31208 bytes, 2026-02-26) Header: "YAML-defined agent chain orchestrator / Runs declarative agent chains defined in ~/system/agents/chains/*.yaml" CLI: node chain-runner.js run / resume / list / show chain-runner.sh EXISTS: ~/system/tools/chain-runner.sh (9281 bytes, 2026-05-07) Header: "Pillar #5 stateless skill-chain runner (one step per tick)" This is what com.alai.chain-daily-inbox calls. grep "chain-runner" ~/.claude/skills/ → NO MATCHES (in non-archived skills) grep "chain-runner" ~/system/daemons/ → NO MATCHES launchctl: com.alai.chain-daily-inbox (exit 1, not running) com.alai.chain-e2e-nightly (exit 1) com.alai.chain-phantom-detector (exit 1) Verdict: FAIL Note: The synthesis claims "35 chain YAML files without a single executor" but chain-runner.js IS a functional chain executor (31KB, CLI-complete, linked to MC #1902). chain-runner.sh is a second runner (Pillar #5). 
The 1.3-agent-fleet.md also acknowledges chain-runner.sh exists ("com.alai.chain-daily-inbox: failure likely in downstream chain execution"). The chain-runner EXISTS — it is just (a) currently broken/unused due to downstream failures, and (b) not invoked from any active skill. The claim "no chain runner exists" is factually false; the correct claim is "chain runners exist but are broken or un-invoked." This is a meaningful distinction: fixing chains requires fixing the runners' downstream dependencies, not building a runner from scratch. A9: "pi-orch HTTP dead but durable-runner port 3052 is the dispatch path" Probe: lsof -i :8401 , lsof -i :3052 , launchctl list | grep "durable\|orchestrator" Output: lsof -i :8401 → NO OUTPUT (port 8401 not listening — confirmed dead) lsof -i :3052 → node PID 1185 LISTENING on *:apc-3052 launchctl: 1185 0 com.alai.orchestrator-bridge (PID alive, exit 0) 1212 0 com.john.durable-executor (PID 1212, exit 0) 75750 0 com.john.pi-orchestrator (PID alive, exit 0) - 0 com.john.orchestrator-http (down_exit_0: duplicate) Verdict: PASS Note: Port 8401 confirmed dead. Port 3052 confirmed live (node PID 1185, 20-day uptime per P3.1). The synthesis's claim that durable-runner is the active dispatch path is confirmed structurally. However, P3.1 C1 and 4.2 Gap #2 both note that even the durable-runner shows no dispatch activity post-2026-03-19 — the pi-orchestrator log confirms "No eligible tasks" cycling. So "durable-runner is the dispatch path" is confirmed as the structural path, but it is also idle. The synthesis correctly notes dispatch is unclear via this path; 4.2 appropriately flags this ambiguity. A10: DISMISSED gaps — are they actually dismissable? 
Probe: Read 4.2 devils advocate dismissal reasoning for mem0 wire and verify-fix-loop; re-check CLAUDE.md for mem0 SoR designation Output: mem0 SoR dismissal (4.2 Gap #4): grep -i "mem0" ~/.claude/CLAUDE.md → 0 matches (confirmed by 4.2) grep -i "System of Record\|SoR" ~/.claude/CLAUDE.md → 0 matches 4.2 reasoning: ".md + LightRAG is INTENDED design; mem0 was never designated SoR" Evidence: lightrag-auto-ingest.sh hook explicitly routes .md → LightRAG (P1.1) Verdict on dismissal: SOUND — mem0 SoR gap is a false positive. CLAUDE.md never designated mem0 as SoR. The .md pipeline is the designed path. verify-fix-loop dismissal (4.2 Gap #3 downgraded to feature request): CLAUDE.md Hard Constraint #4: "mc.js ready → Proveo verification → done" Proveo IS wired via task-postflight (P2.2 confirms). verify-fix-loop is OPTIONAL enhancement, not required gate. 4.2 reasoning: "The REQUIRED verification gate (Proveo) IS wired and working." Verdict on dismissal: SOUND — the required gate exists. CEO-as-verifier claim is overstated because Proveo gate IS the designed verifier; it's just H-tasks only and manual-invoked (per 2.1 Edge #12 PARTIAL). The dismissal is correct that verify-fix-loop is not a gap in required functionality. Phantom companies dismissal of Lexicon (4.2 Gap #7): grep "Lexicon\|lexicon" ~/system/agents/specialist-mapping.json → NO OUTPUT This contradicts 4.2's claim that "Lexicon IS in specialist-mapping.json." 4.2 states: "I found 'company: Lexicon' in the mapping with Dževad Jahić." Live grep returns nothing. P1.3 confirms: "skillforge.md maps to 'Skillforge' not Lexicon." Verdict: 4.2's Lexicon dismissal ERRS. Lexicon is NOT routable via specialist-mapping.json. The 4 phantom companies remain 4, not 3 as 4.2 claims. 4.2 hallucinated a Lexicon entry. Verdict: PARTIAL FAIL — mem0 and verify-fix-loop dismissals are sound, but the Lexicon phantom-company dismissal is WRONG (4.2 claims Lexicon is mapped; live grep shows it is not). 
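The Lexicon check in A10 generalizes to all four companies named in P4.1 Gap #7. A minimal sketch, assuming (as the A10 grep probe does) that a case-insensitive substring match anywhere in the mapping file counts as "mapped"; the JSON schema of specialist-mapping.json is not shown in this report, and `unmapped_companies` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each company name that has no case-insensitive
# match anywhere in the mapping file.
# Returns 0 if all names are present, 1 if any is missing,
# 2 if the file cannot be read.
unmapped_companies() {
  local mapping="$1"
  shift
  local name missing=0
  if [ ! -r "$mapping" ]; then
    echo "cannot read $mapping" >&2
    return 2
  fi
  for name in "$@"; do
    # -F: literal match, -i: case-insensitive, -q: quiet.
    if ! grep -qiF -- "$name" "$mapping"; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}

# Intended use (not run here):
#   unmapped_companies ~/system/agents/specialist-mapping.json \
#     Axiom Datavera Resolver Lexicon
```

Because this is a substring match, a name that appears only in a comment or description field would still count as present. A schema-aware check on the company field would be stricter, but it would require knowing the file's structure.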
Confidence Grade

FEEDBACK: Three atoms FAILED with concrete evidence (A5: queue depth understated, 454 vs 3,150; A8: chain-runner.js and chain-runner.sh DO exist; A10: the Lexicon phantom-company dismissal in 4.2 is wrong, a partial fail).

Summary

Atoms passed: 7 / 10
Atoms failed: 3 (A5, A8, A10-Lexicon)
Confidence: FEEDBACK
Feedback file written: /tmp/verifier-feedback-ai-factory-audit.md