AI Factory Audit 2026-05-09
Complete audit deliverables
- Executive Summary (SENTINEL Final)
- Connectivity Diagram
- Inventory: Memory Plane
- Inventory: Tools Shed
- Inventory: Agent Fleet
- Inventory: Daemon Fleet
- Verifier Autonomy Audit
- BUILD-BLUEPRINT Discipline
- Health Matrix
- Petter Synthesis
- Devils Advocate
- Fix Backlog
- Validation Reports
Executive Summary (SENTINEL Final)
SENTINEL AUDIT — Final Consolidated Report
Date: 2026-05-09
Lead Validator: Sentinel Validator (consolidating P1–P5 findings)
Destination: CEO (Alem Basic)
FINAL VERDICT
REWORK-MINOR
The audit is fundamentally sound. The fix backlog is correctly prioritized. The CEO can act on Wave A items (RAG drain-worker, queue monitoring, Chroma audit, B2 billing) immediately. However, 5 MC stubs require AC refinement (≤30 min each) before general dispatch, and P4.1 carries 3 low-severity annotation corrections. None of these are blockers to CEO decision-making or Wave A execution.
Headline (translated from Bosnian)
The factory has been dead since March: 62.5% of commitments are not working. pi-orchestrator has dispatched nothing. John is the manual dispatcher. Three fixes unlock everything else: fix the RAG Vaultwarden credential, define the canonical dispatch path, and wire up verify-fix-loop.
Top-5 Actionable Findings (Post-Corrections)
1. RAG ingest pipeline blocked — 3,150+ items queued (not the stale 454)
- Finding: rag-drain-worker crashed on Vaultwarden CF Access timeout. The metric file is 16 days stale (shows 454). Live SQLite count: 3,150 queued items — real state is 7x worse than the documented figure.
- Evidence: P3.1 H1 (health matrix), P5.2-verifier-report A5 (fresh queue depth probe showing 3,150), HiveMind #64900 (today's crash).
- Action priority: CRITICAL — Fix immediately (MC-STUB-01, Wave A, ~2h effort). Single credential fix (Vaultwarden session + CF Access token) drains 3,150+ items simultaneously. This single fix unblocks 3 downstream adapters.
2. pi-orchestrator not dispatching — HTTP port 8401 dead since March
- Finding: Process PID 75750 is alive. HTTP control plane is offline. No dispatch logs post-2026-03-19 (50+ days idle). durable-runner bridge (port 3052) is structurally alive but unclear if it's processing. The framing "mock mode" is inaccurate (P4.2 rebuttal) — the real issue is startup gating.
- Evidence: P3.1 C1/C2 (live probes), P4.2 Gap #2 rebuttal (no mock config found; config shows offlineMode: false), P5.1 probe #3 (PID confirmed unchanged 5+ days).
- Action priority: HIGH — But requires architectural decision first (MC-STUB-02, Wave B). Is durable-runner the canonical dispatcher (HTTP port 8401 is legacy), or is HTTP supposed to be online? The fix depends on the answer. Do not attempt MC-STUB-08 (pi-orch restore) until this decision is made.
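The liveness part of this finding can be approximated with a point-in-time TCP check. The following is a minimal sketch, not the probe the audit actually ran: `port_is_listening` and `probe_dispatchers` are hypothetical helpers and the host 127.0.0.1 is an assumption. Only the port numbers 8401 and 3052 come from the finding.

```python
import socket


def port_is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def probe_dispatchers(ports: dict) -> dict:
    """Map each component name to whether its port accepts connections."""
    return {name: port_is_listening(port) for name, port in ports.items()}


# Example call (port numbers taken from the finding above):
# probe_dispatchers({"pi-orchestrator HTTP": 8401, "durable-runner bridge": 3052})
```

A listening port only shows that the component accepts connections. It does not show that tasks are being processed, which is why the durable-runner question stays open even though port 3052 is reported as structurally alive.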
3. Verifier loop capability exists but zero auto-invocation
- Finding: verify-fix-loop skill is fully built, tested, and working. Accepts manual invocation. However, no daemon, hook, or pi-orchestrator code ever calls it. Important caveat (P4.2 rebuttal): This is NOT a structural gap. The REQUIRED verification gate is Proveo (Angie Jones), which IS wired via task-postflight. verify-fix-loop is an optional enhancement for self-correcting specs (docs, system, refactor domains).
- Evidence: P2.2 §2, P3.1 D1 (skill exists, manual-only), P4.2 Gap #3 (Proveo is the designed gate), CLAUDE.md Hard Constraint #4 (specifies Proveo, not verify-fix-loop).
- Action priority: MEDIUM — Feature enhancement, not blocker. Demoted to Wave C (MC-STUB-12) with L priority. Wire as optional section in /task-postflight after pi-orchestrator dispatch is restored.
4. Agent routing table incomplete — validator and distiller unmapped (44 references, 21 references, 0 routing entries)
- Finding: validator and distiller agents are cited 65 times across skill files but have zero entries in specialist-mapping.json. Important distinction (P4.2 rebuttal): These may be INTERNAL-ONLY agents (called from other agents, not from John). If internal-only, they should NOT be in the routing table. If routable by John, they must be added. This requires a routing policy decision first.
- Evidence: P1.3 (agent-fleet inventory shows 66 agents, mapping covers only 29), P4.2 Gap #5 rebuttal (may be internal-only), P4.3 MC-STUB-06 (design decision gates this fix).
- Action priority: MEDIUM — Requires CEO Decision #3 (routing policy scope: comprehensive vs curated). Once decided, implementation is ≤8h (MC-STUB-06, Wave B).
5. Four phantom companies unroutable (Axiom, Datavera, Resolver, Lexicon)
- Finding: All four have complete persona directories (CLAUDE.md, agents, company.json) and ZERO entries in specialist-mapping.json. Correction (P4.2 rebuttal + P5.2-verifier A10): P4.2 claimed Lexicon is routable, but grep confirms 0 matches; P4.2 hallucinated a mapping entry. The correct count is therefore 4 phantom companies (Axiom, Datavera, Resolver, Lexicon), not 3. Lexicon is confirmed absent and phantom.
- Evidence: P1.3 (inventory shows all 4 have full infrastructure), P4.2 Gap #7 (rebuttal claims Lexicon is mapped — REFUTED by P5.2-verifier), P4.3 MC-STUB-07 (lists only 3 companies in places; scope must be updated to all 4).
- Action priority: LOW — Inventory work + routing decision. Demoted to Wave B after routing policy (MC-STUB-06) is decided. MC-STUB-07 implements the fix (~4h effort, M priority); its scope must cover all 4 companies, not 3.
Wave A — Ship Now (No CEO Decisions Needed)
These four MCs can be dispatched immediately; MC-STUB-10 first needs the ≤5 min AC fix noted in the table. Combined effort: ~6h.
| Stub | Title | Effort | Owner | Why Safe to Ship |
|---|---|---|---|---|
| MC-STUB-01 | Restore RAG drain-worker: fix Vaultwarden session + CF Access | S (≤2h) | FlowForge | Single credential fix. Machine-checkable ACs. Proveo-validated PASS (5.1 §2). Unblocks 3 adapters. |
| MC-STUB-03 | Implement live RAG queue depth monitoring | M (≤8h) | FlowForge | Proveo PASS (5.1 §2). Depends on MC-STUB-01 (documented). No CEO decision required. |
| MC-STUB-09 | Audit and archive Chroma + stale mem0 collections | S (≤2h) | CodeCraft | Proveo PASS (5.1 §2). Pure read-probe + cleanup. No blocking dependencies. |
| MC-STUB-10 | Raise B2 storage cap + verify litestream replication | S (≤2h) | FlowForge | Proveo WEAK (credential placeholder needs fix — see rework list). But the task itself is low-risk (billing action). Fix AC before dispatch (≤5 min). |
Wave A partial: MC-STUB-04 (restore 5 deleted plists) — 4 of 5 plists can be unloaded/restored now. The 5th (pi-orch-health.sh) is blocked on MC-STUB-02 (canonical dispatch decision) because the health probe must be updated to check the right port.
Wave B — Needs CEO Architectural Decisions First
These fixes depend on 4 CEO decisions. Once decided, they are unblocked.
CEO Decision #1 (CRITICAL): Canonical dispatch path
The question: Is durable-runner (port 3052, 20d uptime) the canonical dispatcher — with pi-orchestrator HTTP (port 8401, dead) being a legacy control plane? OR is pi-orchestrator HTTP supposed to be online?
Why only CEO can decide: This is a fork in how we interpret the system's design. No engineer can unilaterally choose which dead component to revive.
Options:
- A. durable-runner is canonical. HTTP port 8401 is legacy. Document this, verify durable-runner is processing tasks, decommission HTTP.
- B. pi-orch HTTP is canonical. Diagnose startup gating (likely Ollama hang), restore it. durable-runner is subordinate.
- C. Both should be operational. Requires specifying the interaction model.
Unblocks:
- MC-STUB-02 (design decision itself)
- MC-STUB-04 remainder (pi-orch-health.sh restoration)
- MC-STUB-08 (pi-orchestrator restore — actual kernel fix)
CEO Decision #2 (MEDIUM): Blueprint score gate floor
The question: What is the enforced minimum score for dispatch via Mehanik gate?
Context: Observed practice allows dispatch at score 65 (WARN range). Original spec says 90. The code treats WARN as pass-through. Choose one and hardcode it.
Options:
- A. Lower floor to 60 — match observed practice; WARN is acceptable.
- B. Floor stays at 90 — WARN becomes BLOCK; blueprints must score higher.
- C. Tiered: 60 for L tasks, 75 for M, 90 for H+.
Unblocks: MC-STUB-05 (enforce gate at the chosen floor)
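The three options differ only in which floor a blueprint score is compared against. The sketch below makes that concrete under stated assumptions: `floor_for` and `gate_allows_dispatch` are hypothetical names, not functions in the Mehanik gate, and treating every priority other than L or M as "H+" is an interpretation of Option C.

```python
# Option C floors for L and M tasks; any other priority is treated as H+ (assumption).
TIERED_FLOORS = {"L": 60, "M": 75}
H_PLUS_FLOOR = 90


def floor_for(option: str, priority: str) -> int:
    """Return the minimum dispatch score under CEO Decision #2 option A, B, or C."""
    if option == "A":
        return 60
    if option == "B":
        return 90
    if option == "C":
        return TIERED_FLOORS.get(priority, H_PLUS_FLOOR)
    raise ValueError(f"unknown option: {option!r}")


def gate_allows_dispatch(score: int, option: str, priority: str) -> bool:
    """A score passes the gate only if it meets the floor for the chosen option."""
    return score >= floor_for(option, priority)
```

With the observed score of 65, Option A allows dispatch, Option B blocks it, and Option C allows it only for L tasks.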
CEO Decision #3 (MEDIUM): specialist-mapping.json scope policy
The question: Should the routing table be comprehensive (all 66 agents) or curated (only John-dispatchable agents)?
Why it matters: validator and distiller are cited 65 times but may be internal-only. If internal, they must NOT be in the routing table. If John-routable, they must be added.
Options:
- A. Curated — only John-dispatchable agents enter the mapping. Internal agents documented separately.
- B. Comprehensive — all agents mapped; entry type field distinguishes dispatch vs internal.
Unblocks:
- MC-STUB-06 (routing policy design + specialist-mapping update)
- MC-STUB-07 (register the 4 phantom companies or mark as experimental)
CEO Decision #4 (LOW): mem0 future role
The question: What is mem0's long-term status?
Context: 865 stale facts. Zero active writers. .md + LightRAG is the working pipeline. mem0 server running and consuming resources.
Options:
- A. Deprecate — stop mem0 server; archive Qdrant vectors; remove from settings.json.
- B. Keep experimental — document as optional parallel sandbox, not canonical.
- C. Promote — wire PostToolUse hook to write every .md update to mem0 simultaneously (high effort, not recommended).
Recommendation (Petter): Option A (deprecate). The .md pipeline works. mem0 is cognitive overhead.
Unblocks: MC-STUB-09 + MC-STUB-11 (memory-plane documentation)
Surfaced Contradictions Resolved
Contradiction 1: RAG queue depth — 454 vs 3,150
P4.1 synthesis stated: Queue depth 454 (from stale metric).
P5.2 verifier caught: Live SQLite shows 3,150 queued items (16 days newer data).
Resolution: Both figures are correct — the metric file is 16 days stale. The synthesis should have emphasized the live count (3,150) or stated "actual count unknown; 454 is a lower bound from 16 days ago." This is a severity understatement, not a factual error. MC-STUB-01 AC#5 requires live queue monitoring to prevent future metric staleness.
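The gap between a stale metric file and the live queue can be checked by reading both sources side by side. This is a minimal sketch under loud assumptions: the table name `ingest_queue`, the `status` column, and the value `'queued'` are hypothetical, since the report does not describe the SQLite schema, and the one-day staleness threshold is illustrative.

```python
import os
import sqlite3
import time
from contextlib import closing

SECONDS_PER_DAY = 86_400


def live_queue_depth(db_path: str) -> int:
    """Count queued rows directly in SQLite (hypothetical table and column names)."""
    with closing(sqlite3.connect(db_path)) as conn:
        row = conn.execute(
            "SELECT COUNT(*) FROM ingest_queue WHERE status = 'queued'"
        ).fetchone()
    return row[0]


def metric_age_days(metric_path: str) -> float:
    """Age of the metric file in days, based on its modification time."""
    return (time.time() - os.path.getmtime(metric_path)) / SECONDS_PER_DAY


def queue_report(db_path: str, metric_path: str, reported_depth: int,
                 max_age_days: float = 1.0) -> dict:
    """Compare the depth a metric file reports with the live count."""
    live = live_queue_depth(db_path)
    return {
        "live_depth": live,
        "reported_depth": reported_depth,
        "metric_stale": metric_age_days(metric_path) > max_age_days,
        "understated_by": live - reported_depth,
    }
```

Applied to the figures in this contradiction, a reported depth of 454 against a live count of 3,150 would give `understated_by` of 2,696 and flag the 16-day-old metric as stale.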
Contradiction 2: pi-orchestrator "mock mode" vs actual config
P2.1 connectivity diagram stated: pi-orch in MOCK MODE, alai-config-mock.json loaded.
P4.2 devils-advocate rebutted: No mock config found. Config shows offlineMode: false, enabled: true.
P3.1 verified: Zero grep matches for "mock" in pi-orchestrator.js.
Resolution: The "mock mode" framing is inaccurate. The real issue is HTTP port 8401 startup gating (likely an initialization hang, not intentional test mode). P4.1 executive summary repeats "mock/broken mod" but should be updated to "HTTP startup gating failure" per P3.1/P4.2 evidence.
Contradiction 3: Chain runner existence
P4.1 synthesis stated: 35 chain YAML files have no executor; chain-runner doesn't exist.
P5.2 verifier caught: chain-runner.js (31KB, fully functional) and chain-runner.sh (Pillar #5) both exist.
Resolution: Chain runners DO exist. They are not broken in the sense of missing — they are broken/unused because:
- (a) No active skill invokes them (skills call agents inline),
- (b) Three chain-related daemons exit 1 due to downstream failures,
- (c) The runners are un-integrated, not absent.
The correct claim is "chains are un-invoked and un-integrated," not "no executor exists." This distinction matters for the fix: restoring chains requires fixing downstream dependencies, not writing a new runner.
Contradiction 4: Lexicon company phantom status
P4.1 Gap #7 stated: 4 phantom companies — Axiom, Datavera, Resolver, Lexicon.
P4.2 devils-advocate claimed rebuttal: Lexicon IS in specialist-mapping.json.
P5.2 verifier caught: grep "Lexicon" ~/system/agents/specialist-mapping.json → 0 matches. Lexicon is NOT routable.
Resolution: P4.2 hallucinated the Lexicon entry (ZAKON NULA breach). The correct count is 4 phantom companies, not 3. P4.3 MC-STUB-07 lists the full 4 in some passages but may have been partially rewritten to 3 in others. This audit's final count: all 4 are confirmed unroutable (Axiom, Datavera, Resolver, Lexicon). Update MC-STUB-07 scope to list all 4.
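The verifier's grep amounts to a case-sensitive substring count per company name, which is easy to repeat for all four names at once. A minimal sketch, assuming that any occurrence of the name anywhere in the file counts as a routing entry; `count_mentions` and `unroutable` are hypothetical helpers.

```python
from pathlib import Path


def count_mentions(text: str, names: list) -> dict:
    """Case-sensitive substring count per name, mirroring a plain grep."""
    return {name: text.count(name) for name in names}


def unroutable(text: str, names: list) -> list:
    """Names with zero mentions in the mapping text, in input order."""
    return [name for name, hits in count_mentions(text, names).items() if hits == 0]


def unroutable_in_file(mapping_path: str, names: list) -> list:
    """Read the mapping file as text and report the names it never mentions."""
    text = Path(mapping_path).expanduser().read_text(encoding="utf-8")
    return unroutable(text, names)


# Example call against the path quoted by the verifier:
# unroutable_in_file("~/system/agents/specialist-mapping.json",
#                    ["Axiom", "Datavera", "Resolver", "Lexicon"])
```

A substring count can overstate routability if a name appears only in a comment or description field, so a zero result is strong evidence of absence while a non-zero result still needs a look at the matching entry.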
Contradiction 5: mem0 SoR intent
P4.1 synthesis stated: mem0 is the intended System of Record; it's broken.
P4.2 devils-advocate rebutted: mem0 was never designated as SoR in CLAUDE.md or any spec.
Resolution: The gap is correctly dismissed. .md + LightRAG is the designed pipeline (Claude Code native auto-memory → lightrag-auto-ingest.sh hook → LightRAG). mem0 was a prototype that never achieved SoR status. The correct fix is documentation (MC-STUB-11), not re-wiring mem0; that documentation is all the dismissed gap requires.
Contradiction 6: HiveMind read API
P1.1 implied: HiveMind has no read API.
P3.1 found: hivemind.js read/query/semantic_query all functional. API exists.
Resolution: P1.1 overstated the gap. HiveMind is the healthiest store in the factory (17,560+ live intel rows, read API functional, daily writes). No contradiction to resolve — P3.1 corrected the inventory claim.
Open Questions for CEO
- Canonical dispatch path: durable-runner or pi-orchestrator HTTP? (CEO Decision #1)
- Blueprint score gate: Enforce at 60, 75, or 90? (CEO Decision #2)
- specialist-mapping.json scope: Comprehensive or curated? (CEO Decision #3)
- mem0 future role: Deprecate or keep as experimental? (CEO Decision #4)
- Anything else surfaced: Any findings in this audit that require clarification before we proceed with Wave A?
Recommendation
John should dispatch Wave A immediately (RAG drain-worker, queue monitoring, Chroma audit, B2 cap raise — ~6h total). These are unblocked and low-risk. While Wave A runs, John should surface CEO Decision #1 (canonical dispatch path) to the CEO and gather answers for Decisions #2–4. Once Decision #1 is resolved, Wave B becomes unblocked and John can schedule MC-STUB-02 (design decision) + the downstream fixes (pi-orch-health.sh, pi-orchestrator restore, routing policy). The audit is sound. The backlog is prioritized. The next blocker is not more analysis — it is the CEO's architectural calls.
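The ordering constraints scattered through this report can be collected into one dependency graph and sorted. The sketch below uses only the dependencies the report states explicitly (MC-STUB-03 on 01, MC-STUB-08 on 02, MC-STUB-07 on 06, MC-STUB-12 on 08); the partial dependency of MC-STUB-04 on MC-STUB-02 is left out because it affects only one of five plists. `dispatch_order` is a hypothetical helper, not part of mc.js.

```python
from graphlib import TopologicalSorter

# stub -> stubs that must finish first, as stated in this report
DEPENDS_ON = {
    "MC-STUB-03": {"MC-STUB-01"},  # queue monitoring needs the drain-worker fix
    "MC-STUB-08": {"MC-STUB-02"},  # pi-orch restore waits for the dispatch decision
    "MC-STUB-07": {"MC-STUB-06"},  # company registration waits for routing policy
    "MC-STUB-12": {"MC-STUB-08"},  # verify-fix-loop wiring waits for pi-orch restore
}


def dispatch_order(depends_on: dict) -> list:
    """Return one valid dispatch order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(depends_on).static_order())
```

Any order returned places each stub after its prerequisites, so a scheduler can walk the list front to back. Stubs with no stated dependencies, such as MC-STUB-09 and MC-STUB-10, can be dispatched at any point and are not constrained by the graph.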
Rework Required Before General Dispatch
Category A — AC refinement (5 stubs, ≤30 min each):
- MC-STUB-04: Split OR-condition into per-plist ACs; replace 24h window with point-in-time exit-code check.
- MC-STUB-06: Rewrite discover.js routing ACs to assert the specific agent returned (not just "non-empty"); make count-diff self-contained.
- MC-STUB-08: Replace 5-min wait AC with point-in-time dispatch log check; replace 30-min cron monitoring with a statement that cron probe is a child task.
- MC-STUB-10: Replace credential placeholder with a bw get item command; add log-file existence check.
- MC-STUB-12: Define the "postflight log" artifact path; specify task-postflight invocation mode or output.
Category B — P4.1 annotations (≤15 min):
- Replace "mock/broken mod" in executive summary with "HTTP startup gating failure."
- Update Gap #7 to note P4.2 rebuttal revised count (but P5.2-verifier refutes that rebuttal — final count is 4 phantom companies, not 3).
- Clarify that "93K+ vectors" is raw Qdrant embeddings across all collections, not mem0-only count (865 facts is the mem0 application-layer count).
Audit Status: COMPLETE
Validator: Sentinel Validator (consolidation)
Evidence directory: /tmp/ai-factory-audit-2026-05-09/
Prior phases: P1 (inventory), P2 (connectivity), P3 (health matrix), P4 (synthesis + rebuttal + backlog), P5 (validation + verification + final consolidation)
Report produced by Sentinel Validator, 2026-05-09. Consolidated from 11 audit reports + 3 rebuttal layers + live probe verification.
Connectivity Diagram
2.1 — AI Factory Connectivity Diagram
Date: 2026-05-09
Auditor: sentinel-architect
Phase: 2 (synthesis from P1 inventory reports 1.1, 1.2, 1.3, 1.4 and P2 reports 2.2, 2.3)
Mode: READ-ONLY. No mutations.
Section A — Control Plane Diagram
The diagram below shows the advertised flow from CEO input to task closure. Solid arrows are flows that actually work. Dotted red arrows are advertised edges that are broken or absent. Labels show the transport mechanism.
flowchart TD
CEO([CEO / Alem])
JOHN([John — Orchestrator\nClaude Code CLI session])
MH["/mehanik gate\n~/.claude/agents/mehanik.md\n113 cleared tokens in /tmp"]
PF["/prompt-forge\n~/.claude/skills/prompt-forge/"]
PIO["pi-orchestrator\n~/system/kernel/pi-orchestrator.js\nPID 75750 — MOCK MODE"]
SPEC["Specialist Agent\ne.g. petter-graff, angie-jones\n~/.claude/agents/*.md"]
TOOL["Tool\n~/system/tools/ (250 live)"]
ART["Artifact\n(code / doc / spec / evidence file)"]
VERIFIER["Verifier / verify-fix-loop\n~/.claude/agents/verifier.md\n~/.claude/skills/verify-fix-loop/"]
TPF["/task-postflight\n~/.claude/skills/task-postflight/"]
MCD["mc.js done\n~/system/tools/mc.js"]
PROVEO["Proveo / Angie Jones\n~/.claude/agents/angie-jones.md"]
HOOK["Hook Layer\n~/.claude/hooks/ (12 active)"]
CEO -- "CLI conversation" --> JOHN
JOHN -- "CLI / Task dispatch" --> MH
MH -- "cleared token written to /tmp/mehanik-cleared-N\nBlueprint read enforced (PARTIALLY — WARN scores pass)" --> PF
PF -- "forged prompt → Task dispatch" --> PIO
PIO -. "Task dispatch — mc.js write\nBROKEN: MOCK MODE\nalai-config-mock.json loaded\nPlanka localhost:3100 not listening\n'No eligible tasks' every 30s" .-> SPEC
SPEC -- "Tool calls\n(Read / Edit / Bash / Grep)" --> TOOL
TOOL -- "Write / Edit" --> ART
ART -- "mc.js ready write" --> HOOK
HOOK -- "PreToolUse / PostToolUse\nexits 0 = pass, exits 2 = block" --> MCD
MCD -. "ADVERTISED: auto-invokes verifier\nACTUAL: ABSENT\n0 hooks, daemons, or pi-orch code\ncalls verify-fix-loop\n(source: 2.2)" .-> VERIFIER
VERIFIER -. "ADVERTISED: auto-loop to fix-builder\nACTUAL: manual invocation only\nno programmatic trigger" .-> SPEC
MCD -- "mc.js ready → /task-postflight\n(manual invocation only for H tasks)" --> TPF
TPF -- "Task dispatch — CLI" --> PROVEO
PROVEO -- "AC checklist → verdict" --> TPF
TPF -- "mc.js done (with evidence)" --> MCD
MCD -. "ADVERTISED: pi-orchestrator consumes\n'done' events for next task\nACTUAL: MOCK MODE — consuming nothing" .-> PIO
style PIO fill:#ffcccc,stroke:#cc0000
style VERIFIER fill:#ffcccc,stroke:#cc0000
style MH fill:#ffffcc,stroke:#cccc00
Annotation notes:
- CEO → John: works. Standard CLI session.
- John → Mehanik: works. 113 cleared tokens confirm Mehanik runs regularly.
- Mehanik → prompt-forge → pi-orchestrator: the dispatch chain exists structurally. pi-orchestrator is alive (PID 75750) but in MOCK MODE — it reads mock config and never consumes real MC tasks.
- pi-orchestrator → Specialist: BROKEN because mock mode means pi-orchestrator never fires a Task dispatch to a real specialist.
- Specialist → Tool → Artifact: works when agents are dispatched by John manually (not via pi-orchestrator).
- Artifact → mc.js done (via hooks): works. The hook layer (12 active hooks) enforces gates on mc.js writes.
- mc.js done → verifier: ABSENT. No automated trigger. CEO is the de-facto verifier (source: 2.2).
- mc.js done → pi-orchestrator: BROKEN. Mock mode means pi-orchestrator does not react to task completions.
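The hook-layer convention in the diagram (exit 0 passes, exit 2 blocks) can be expressed as a small wrapper around a hook command. This is a sketch only: `hook_verdict` and `run_hook` are hypothetical helpers, and because the diagram does not say how other exit codes are handled, they are reported here as "unspecified" rather than guessed.

```python
import subprocess


def hook_verdict(exit_code: int) -> str:
    """Translate a hook exit code using the convention shown in the diagram."""
    if exit_code == 0:
        return "pass"
    if exit_code == 2:
        return "block"
    # The diagram does not define other codes; surface them instead of guessing.
    return "unspecified"


def run_hook(argv: list) -> str:
    """Run a hook command and return its verdict."""
    completed = subprocess.run(argv, capture_output=True)
    return hook_verdict(completed.returncode)
```

Keeping the translation in one function means a gate such as mc-ready-gate.sh only has to choose between two exit codes, and the caller decides what to do with anything else.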
Section B — Data Plane Diagram
Shows all memory stores with their actual write paths (solid = live, dotted red = dead, dotted orange = partial/degraded).
flowchart LR
CC["Claude Code\n(built-in auto-memory)"]
MDFILES[".md auto-memory files\n~/.claude/projects/-Users-makinja/memory/\n123 files — LIVE"]
HOOK_LR["lightrag-auto-ingest.sh\nPostToolUse hook\nfires on Write/Edit to in-scope paths"]
LR["LightRAG\nlocalhost:9621\n999 docs indexed\npipeline_busy=true\nHEALTHY but DEGRADED"]
DISCOVER["discover.js\nhttps://lightrag.alai.no/query\n(external hostname — Caddy proxy)"]
IQ["ingest-queue.sqlite\n~/system/state/\n946 items FROZEN"]
RDW["rag-drain-worker\nPID 3640\nETIMEDOUT on Vaultwarden"]
RBA["rag-bookstack-adapter\nevery 5min — exit 256\nblocked by backpressure"]
RMCA["rag-mc-adapter\nevery 5min — exit 256\nblocked by backpressure"]
RFSEA["rag-fsevents-adapter\nWatchPaths — exit 1\nblocked by backpressure"]
BKS["BookStack\ndocs.alai.no"]
MCLOG["mc-task-outcomes.jsonl\n~/system/logs/"]
MEM0["mem0 API\nlocalhost:9000\nHEALTHY — 0 active writers"]
QDR["Qdrant\nlocalhost:6333\n5 collections\n93,510 total vectors"]
MEM0J["mem0_john collection\n865 vectors — STALE"]
KNOW["knowledge collection\n31,274 vectors — STALE\nunknown origin"]
SESS["sessions collection\n929 vectors — unknown writer"]
HIVE_Q["hivemind collection\n60,442 vectors — LIVE"]
HIVEJS["hivemind.js CLI\ndual-write on post"]
HIVEDB["HiveDB SQLite\nhivemind.db\n17,551 intel rows — LIVE"]
CHROMA["Chroma\n~/.claude-mem/chroma/\n6,584 embeddings\nno active writer or reader"]
FLYWHEEL["flywheel.db SQLite\n~/system/databases/\nLIVE — rag-router.js cache"]
RAG_ROUTER["rag-router.js\ncache → Ollama → external"]
CC -- "native write" --> MDFILES
MDFILES -- "PostToolUse trigger" --> HOOK_LR
HOOK_LR -- "curl POST localhost:9621" --> LR
LR -- "serves queries" --> DISCOVER
BKS -- "poll every 5min" --> RBA
MCLOG -- "tail" --> RMCA
RBA -- "enqueue" --> IQ
RMCA -- "enqueue" --> IQ
RFSEA -- "enqueue" --> IQ
IQ -- "drain attempt" --> RDW
RDW -. "DEADLOCKED\nVaultwarden ETIMEDOUT\nCF Access creds missing\n946 items queued, 0 drained" .-> LR
HIVEJS -- "write" --> HIVEDB
HIVEJS -- "dual-write best-effort" --> HIVE_Q
HIVE_Q --> QDR
MEM0 --> QDR
QDR --> MEM0J
QDR --> KNOW
QDR --> SESS
QDR --> HIVE_Q
CC -. "INTENDED: POST localhost:9000/add\nACTUAL: ABSENT\n0 callers in hooks/tools/daemons" .-> MEM0
DISCOVER -. "INTENDED: query mem0 for personal facts\nACTUAL: ABSENT\ndiscover.js does not call localhost:9000" .-> MEM0
CHROMA -. "writer UNKNOWN\nreader UNKNOWN\n6584 embeddings orphaned" .-> CHROMA
RAG_ROUTER -- "learn" --> FLYWHEEL
RAG_ROUTER -- "query cache-hit" --> FLYWHEEL
style RDW fill:#ffcccc,stroke:#cc0000
style IQ fill:#ffcccc,stroke:#cc0000
style MEM0 fill:#fff0cc,stroke:#cc8800
style MEM0J fill:#ffcccc,stroke:#cc0000
style KNOW fill:#ffcccc,stroke:#cc0000
style CHROMA fill:#ffcccc,stroke:#cc0000
style SESS fill:#fff0cc,stroke:#cc8800
Key findings:
- The LightRAG local write path (Claude Code → .md → hook → LightRAG) works, but the queue-drain path (946 items from BookStack, MC logs, fsevents) is completely deadlocked because rag-drain-worker cannot authenticate through Cloudflare Access (Vaultwarden ETIMEDOUT).
- mem0 is a ghost: server alive, 93K+ vectors in Qdrant across all collections (865 in mem0_john), zero active writers, zero active readers through the API.
- Chroma is a full orphan: 6,584 embeddings from an unknown writer, no identified reader.
- The Qdrant hivemind collection (60K+ vectors) is live because hivemind.js writes to it directly, bypassing the mem0 API entirely. This is the only healthy Qdrant write path.
Section C — Agent / Persona / Chain Plane
flowchart TD
SMJ["specialist-mapping.json\n~/system/agents/specialist-mapping.json\n29 mapped agents\n9 registered companies\nSOURCE OF TRUTH (incomplete)"]
CLAUDE_AGENTS["~/.claude/agents/\n66 .md files\nRUNTIME STORE\n(what Claude Code can dispatch)"]
DEFINITIONS["~/system/agents/definitions/\nBACKUP STORE\n48 synced + 8 definitions-only"]
SYNC["~/bin/agent-definitions-sync.sh\nMANUAL — not scheduled"]
PERSONAS["~/system/agents/personas/\n12 persona dirs"]
P_REAL["8 Routable Companies\nAgentForge, CodeCraft, Finverge\nFlowForge, Proveo, Securion\nSkybound, Vizu\n(partial mapping only)"]
P_PHANTOM["4 Phantom Companies\nAxiom, Datavera, Resolver, Lexicon\nFull persona dirs, CLAUDE.md, agents/\n0 entries in specialist-mapping.json\nDispatch path = NONE via John routing"]
CHAINS["~/system/agents/chains/\n35 .yaml files\nNO chain runner exists\nall DEAD as executable automation"]
MAPPED_OK["24 mapped agents\nreachable on disk\nCAN be dispatched"]
MAPPED_MISSING["7 mapped agents\nIN specialist-mapping.json\nMISSING from ~/.claude/agents/\ndispatches SILENTLY FAIL\n(dorota-huizinga, hadi-hariri\njames-bach, lee-robinson\nlisa-crispin, minion\nanthropicchief-architect=fully phantom)"]
UNMAPPED_CRITICAL["Critical unmapped agents\nIN ~/.claude/agents/\nNOT in specialist-mapping.json:\n- validator (44 skill refs)\n- distiller (21 chain refs)\n- mehanik (7 skill refs)\n- evidence-verifier\n- baseline-comparator\n- dzevad-jahic (Lexicon)\n- planner (phantom — in chains only)"]
UNMAPPED_ORPHAN["11 Orphan agents\nno chain/skill/daemon refs:\n0.md, dr-sarah-chen, Explore\nhelixsupport, indy-dandev\nmaria-santos, meta-agent\nPlan, rag-builder\nredzo-reviewer, thaer-sabri"]
SMJ --> CLAUDE_AGENTS
SMJ -. "7 mapped agents\nnot on disk = UNREACHABLE" .-> MAPPED_MISSING
CLAUDE_AGENTS --> MAPPED_OK
CLAUDE_AGENTS --> UNMAPPED_CRITICAL
CLAUDE_AGENTS --> UNMAPPED_ORPHAN
DEFINITIONS -- "manual sync\n(agent-definitions-sync.sh)" --> CLAUDE_AGENTS
SYNC -. "not scheduled\ndrift pressure continuous" .-> DEFINITIONS
PERSONAS --> P_REAL
PERSONAS --> P_PHANTOM
P_PHANTOM -. "no routing entry\ndirect session name-drop only\nundocumented and unreliable" .-> CLAUDE_AGENTS
P_REAL --> SMJ
CHAINS -. "NO EXECUTOR\n35 YAML files are docs only\nSkills call agents inline\nnot via chain runner" .-> CLAUDE_AGENTS
style MAPPED_MISSING fill:#ffcccc,stroke:#cc0000
style P_PHANTOM fill:#fff0cc,stroke:#cc8800
style CHAINS fill:#ffcccc,stroke:#cc0000
style UNMAPPED_CRITICAL fill:#fff0cc,stroke:#cc8800
Key findings:
- specialist-mapping.json covers only 29 of 66 agents (44%). The two highest-usage agents system-wide, validator (44 skill file refs) and distiller (21 chain refs), are completely absent from the routing table.
- 7 agents are mapped (John thinks he can dispatch them) but physically missing from ~/.claude/agents/. Any dispatch attempt silently fails.
- 35 chain YAML files have no executor. They exist as documentation only: skills invoke agents inline and ignore chain files entirely.
- 4 phantom companies (Axiom, Datavera, Resolver, Lexicon) have full organizational infrastructure on disk but are completely invisible to John's routing system.
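The mapping gaps in these findings reduce to set differences between the routing table and the runtime agent directory. A minimal sketch, assuming agent names are the stems of the .md files in the agents directory and that the caller extracts mapped names from specialist-mapping.json, whose internal structure this report does not describe; `agents_on_disk` and `routing_gaps` are hypothetical helpers.

```python
from pathlib import Path


def agents_on_disk(agents_dir: str) -> set:
    """Agent names inferred from *.md file stems (assumed naming convention)."""
    return {p.stem for p in Path(agents_dir).expanduser().glob("*.md")}


def routing_gaps(mapped: set, on_disk: set) -> dict:
    """Split agents into reachable, mapped-but-missing, and present-but-unmapped."""
    return {
        "mapped_and_present": sorted(mapped & on_disk),
        "mapped_but_missing": sorted(mapped - on_disk),    # dispatch fails silently
        "present_but_unmapped": sorted(on_disk - mapped),  # invisible to routing
    }
```

Under this split, agents such as james-bach would land in `mapped_but_missing` and validator and distiller in `present_but_unmapped`, matching the two failure classes the diagram distinguishes.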
Section D — The True Picture (CEO-readable, 60 seconds)
Plan vs. Reality
The architecture diagram on paper shows: CEO gives task → John gates it through Mehanik → pi-orchestrator dispatches specialists → work gets done → verifier autonomously checks it → mc.js closes the loop.
The actual flow is: CEO gives task → John manually dispatches a specialist in the current conversation → specialist builds → John manually verifies (or CEO does) → John manually calls mc.js done.
Every automatic layer between "task received" and "task closed" is either in mock mode, deadlocked, or simply absent.
The 3 Fattest Dead Edges
Dead Edge 1 — pi-orchestrator in MOCK MODE.
The orchestration kernel (PID 75750) is alive and cycling every 30 seconds. It reads alai-config-mock.json. Planka/MC API at localhost:3100 is not listening. The kernel prints "No eligible tasks" and does nothing. Every task that should flow automatically through the factory instead requires John to manually dispatch via conversation. This is the single edge whose repair would convert the factory from "manual assembly" to "automated pipeline."
Dead Edge 2 — RAG drain-worker deadlocked (946 items queued, 0 drained).
Three adapters (BookStack, MC logs, filesystem events) successfully enqueue documents into ingest-queue.sqlite. The drain-worker (PID 3640) picks them up and tries to POST to LightRAG through Cloudflare Access — but Vaultwarden times out, so CF credentials cannot be fetched. The entire 946-item queue has been frozen. Meanwhile, the fsevents adapter is watching for filesystem changes and trying to enqueue lightrag-monitor health files — creating a feedback loop where the monitoring system feeds into the broken pipeline it is monitoring. One credential fix (valid /tmp/bw-session + reachable Vaultwarden) unblocks all three adapters simultaneously.
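The freeze described here follows from a backpressure gate: the edge inventory cites a limit of 500 (946 > 500), so once the drain-worker stops draining, no adapter can enqueue again. The simulation below is a sketch under assumptions: the per-tick enqueue and drain amounts are illustrative, and `simulate_queue` is a hypothetical helper, not adapter code.

```python
BACKPRESSURE_LIMIT = 500  # limit cited in the edge inventory ("946 > 500")


def may_enqueue(queue_depth: int, limit: int = BACKPRESSURE_LIMIT) -> bool:
    """Adapters may enqueue only while the queue is at or below the limit."""
    return queue_depth <= limit


def simulate_queue(initial_depth: int, enqueue_per_tick: int,
                   drained_per_tick: int, ticks: int,
                   limit: int = BACKPRESSURE_LIMIT) -> tuple:
    """Return (final_depth, items_accepted) after the given number of ticks."""
    depth = initial_depth
    accepted = 0
    for _ in range(ticks):
        depth = max(0, depth - drained_per_tick)  # drain-worker runs first
        if not may_enqueue(depth, limit):
            continue  # adapter refuses to enqueue and exits non-zero
        depth += enqueue_per_tick
        accepted += enqueue_per_tick
    return depth, accepted
```

With zero drain the queue never leaves 946 and the adapters accept nothing, which is the deadlock in this section. As soon as the drain rate is non-zero the depth falls below the limit and enqueueing resumes without touching the adapters.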
Dead Edge 3 — Verifier auto-invocation ABSENT.
The verify-fix-loop skill and its verifier + fix-builder agents are fully specified and internally correct. There is zero wiring to any automated trigger. No hook, no daemon, no pi-orchestrator code calls them. When mc.js ready fires, no verification agent is invoked. CEO is the de-facto quality gate for the entire factory. One wiring point in /task-postflight SKILL.md (Section 2b) would give autonomous verification for non-high-stakes tasks immediately, without new infrastructure.
The 3 Highest-Leverage Wire Fixes
Fix 1 — Restore pi-orchestrator real config (L fix, maximum leverage).
Determine why alai-config-mock.json loads instead of real config. If Planka is intentionally offline, restore it or point the orchestrator at the real MC API endpoint. This single fix converts the factory from "John as human dispatcher" to "automated task routing." Impact: every other automation layer (specialist dispatch, postflight, cost tracking) becomes meaningful instead of idle.
Fix 2 — Fix rag-drain-worker CF credentials (S fix, unblocks 946-item queue).
Ensure Vaultwarden is reachable and /tmp/bw-session is valid for the service token that holds the LightRAG CF Access credentials. This is estimated as a 30-minute fix (refresh session token + verify vault connectivity). Impact: 946 queued RAG items drain, BookStack sync resumes, MC outcome logging resumes, the circular monitoring feedback loop breaks.
Fix 3 — Wire verify-fix-loop into /task-postflight (M fix, eliminates CEO-as-verifier bottleneck).
Add a Section 2b to ~/.claude/skills/task-postflight/SKILL.md: after Proveo passes AC checklist, dispatch /verify-fix-loop for docs / system / refactor / polish domain tasks (MAX_LOOPS=3, $5 cap already defined in the skill). This requires no new infrastructure — the skill conversation context already supports Task dispatch. Impact: CEO is removed from the quality loop for the majority of non-high-stakes tasks.
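The bounded behaviour this fix relies on (MAX_LOOPS=3 and a $5 cap) can be sketched as a loop that alternates verification and fixing until one bound is hit. This illustrates the control flow only, not the verify-fix-loop skill itself: the `verify` and `fix` callables, the verdict strings, and the choice to check the cap after each fix are all assumptions.

```python
from typing import Callable


def verify_fix_loop(verify: Callable[[], bool],
                    fix: Callable[[], float],
                    max_loops: int = 3,
                    cost_cap_usd: float = 5.0) -> dict:
    """Verify, then run up to max_loops fix-and-verify rounds within the cost cap.

    verify() returns True when the artifact passes.
    fix() attempts a repair and returns what that attempt cost in USD.
    """
    spent = 0.0
    if verify():
        return {"verdict": "PASS", "loops": 0, "spent_usd": spent}
    for loop in range(1, max_loops + 1):
        spent += fix()
        if spent > cost_cap_usd:
            return {"verdict": "ESCALATE", "reason": "cost cap exceeded",
                    "loops": loop, "spent_usd": spent}
        if verify():
            return {"verdict": "PASS", "loops": loop, "spent_usd": spent}
    return {"verdict": "ESCALATE", "reason": "max loops reached",
            "loops": max_loops, "spent_usd": spent}
```

Both bounds end in an escalation rather than a silent failure, so a human reviewer sees the tasks the loop could not close.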
Section E — Edge Inventory Table
| # | From | To | Transport | Status | Evidence | Fix Size |
|---|---|---|---|---|---|---|
| 1 | CEO | John (orchestrator) | CLI conversation | LIVE | Observed every session | — |
| 2 | John | /mehanik gate | Task dispatch / CLI | LIVE | 113 cleared tokens in /tmp | — |
| 3 | /mehanik gate | Blueprint read | Read tool call | PARTIAL | CB#2 enforced; WARN scores (65/80) pass; missing-MC-ID bypasses gate entirely (2.3) | S |
| 4 | /mehanik gate | /prompt-forge | CLI / Task dispatch | LIVE | Observed in token chain | — |
| 5 | /prompt-forge | pi-orchestrator | mc.js write / Task | PARTIAL | pi-orch alive but MOCK MODE (1.4) | L |
| 6 | pi-orchestrator | Specialist agent | Task dispatch | DEAD | MOCK MODE — "No eligible tasks" every 30s; Planka localhost:3100 not listening (1.4) | L |
| 7 | John (manual) | Specialist agent | Task dispatch (CLI) | LIVE | Observed — this is the actual dispatch path | — |
| 8 | Specialist agent | Tools (Read/Edit/Bash) | Tool API calls | LIVE | 250 live tools verified (1.2) | — |
| 9 | Tools | Artifact (file/code) | Write / Edit | LIVE | Standard Claude Code behavior | — |
| 10 | Artifact | mc.js ready | mc.js write + hook | LIVE | mc-ready-gate.sh fires; 12 active hooks (2.2) | — |
| 11 | mc.js ready | verifier / verify-fix-loop | (absent) | DEAD | 0 hooks, 0 daemons, 0 pi-orch code calls verify-fix-loop (2.2) | M |
| 12 | mc.js ready | /task-postflight | Manual CLI invocation | PARTIAL | H-tasks only; manual trigger; no auto-invocation (2.2) | M |
| 13 | /task-postflight | Proveo / Angie Jones | Task dispatch | LIVE | Skill dispatches angie-jones.md; present on disk (2.2) | — |
| 14 | Proveo | mc.js done | mc.js write | LIVE | AC checklist → done path works | — |
| 15 | mc.js done | pi-orchestrator (next task) | mc.js event / API | DEAD | MOCK MODE — pi-orch does not react to done events (1.4) | L |
| 16 | Claude Code built-in | .md memory files | Native write | LIVE | 123 files, auto-written by Claude Code (1.1) | — |
| 17 | .md memory files | lightrag-auto-ingest.sh | PostToolUse hook trigger | LIVE | Hook fires on Write/Edit to in-scope paths (1.1) | — |
| 18 | lightrag-auto-ingest.sh | LightRAG localhost:9621 | curl POST | LIVE | 999 docs indexed; pipeline_busy=true (1.1) | — |
| 19 | discover.js | LightRAG (external) | HTTPS GET to lightrag.alai.no | LIVE | External hostname via Caddy proxy (1.1) | — |
| 20 | rag-bookstack-adapter | ingest-queue.sqlite | SQLite write | DEAD | Exit 256 — backpressure gate (946 > 500) from frozen drain-worker (1.4) | S |
| 21 | rag-mc-adapter | ingest-queue.sqlite | SQLite write | DEAD | Exit 256 — same backpressure cascade (1.4) | S |
| 22 | rag-fsevents-adapter | ingest-queue.sqlite | SQLite write / WatchPaths | DEAD | Exit 1 — blocked by backpressure; also feeding monitoring artifacts into queue (1.4) | S |
| 23 | rag-drain-worker | LightRAG (via CF Access) | HTTPS POST (authenticated) | DEAD | Vaultwarden ETIMEDOUT — CF credentials unavailable; 946 items queued, 0 drained (1.4) | S |
| 24 | Any tool/hook/daemon | mem0 API localhost:9000 | HTTP POST | DEAD | 0 callers found in all of ~/system/tools, ~/.claude/hooks, ~/system/daemons (1.1) | M |
| 25 | discover.js | mem0 API | HTTP GET | DEAD | discover.js does not query localhost:9000 (1.1) | M |
| 26 | mem0 API | Qdrant mem0_john collection | gRPC / HTTP | PARTIAL | Server healthy; mem0_john has 865 stale vectors; no active writer to keep them fresh (1.1) | M |
| 27 | hivemind.js | HiveDB SQLite | SQLite write | LIVE | 17,551 intel rows; write path active (1.1) | — |
| 28 | hivemind.js | Qdrant hivemind collection | HTTP (qdrant-client) | LIVE | 60,442 vectors; dual-write best-effort (1.1) | — |
| 29 | Chroma store | Any consumer | (unknown) | DEAD | 6,584 embeddings, no traced writer or reader (1.1) | M |
| 30 | agent-definitions-sync.sh | ~/.claude/agents/ | file copy | PARTIAL | 48 files synced; 8 definitions-only agents unreachable at runtime; sync not scheduled (1.3) | S |
| 31 | specialist-mapping.json | Dispatch routing | JSON lookup | PARTIAL | 29/66 agents mapped; validator (44 refs) and distiller (21 refs) absent; 7 mapped agents missing from disk (1.3) | M |
| 32 | 35 chain YAML files | chain runner / executor | (absent) | DEAD | No chain runner exists; skills call agents inline; chains are documentation only (1.3) | L |
| 33 | John routing | Axiom/Datavera/Resolver/Lexicon | discover.js lookup | DEAD | 4 companies absent from specialist-mapping.json; routing impossible via normal path (1.3) | M |
| 34 | pi-orch-health monitor | pi-orchestrator health signal | shell script | DEAD | pi-orch-health.sh deleted; last verdict 2026-05-06 CRITICAL; dark since (1.4) | S |
| 35 | cost-daily-report daemon | daily cost visibility | shell script | DEAD | cost-daily-report.sh deleted; cost reporting dark since 2026-04-29 — 10 days (1.4) | S |
| 36 | mc-ready-gate.sh | Blueprint score enforcement | blueprint-check.js | PARTIAL | Check runs; WARN scores (65, 80) allow dispatch; threshold 90 is advisory only (2.3) | S |
| 37 | Mehanik | Session binding validation | token mehanik_session_id | DEAD | All 113 inspected tokens show mehanik_session_id: unknown; cross-session reuse possible (2.3) | S |
| 38 | b2-offsite-backup | B2 cloud storage | B2 API | DEAD | 403 storage_cap_exceeded; nightly snapshots not landing (1.4) | S |
| 39 | litestream | B2 replication stream | B2 API | PARTIAL | Litestream PID alive; separate nightly job fails; live replication status uncertain (1.4) | S |
| 40 | slack-bot | Slack WebSocket | Socket Mode | PARTIAL | PID 18046 alive; last crash exit 1; 300min silent at audit time; reconnects on timeout (1.4) | S |
Status key:
- LIVE — flow confirmed working by tool-verified evidence
- DEAD — flow confirmed broken or absent by tool-verified evidence
- PARTIAL — flow structurally exists but has gaps, bypass paths, or degraded throughput
Fix size:
- S — Small: under 4 hours, single-file or credential change
- M — Medium: 1–2 days, new wiring or multi-file coordination
- L — Large: 3+ days, architectural change or multi-system coordination
Summary Statistics
| Category | Count |
|---|---|
| Total edges inventoried | 40 |
| LIVE | 15 |
| DEAD | 15 |
| PARTIAL | 10 |
| Edges repairable with S fix | 10 |
| Edges repairable with M fix | 8 |
| Edges repairable with L fix | 3 |
The factory has a 37.5% live edge rate. The remaining 62.5% of advertised flows are either dead or degraded. The 3 L-fixes (pi-orchestrator mock mode, chain runner, verifier auto-invocation architecture) unblock the most downstream flows if resolved. The 10 S-fixes are individually cheap and collectively close significant operational blind spots (cost reporting, RAG drain, blueprint score enforcement, monitoring, B2 backup).
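The headline percentages follow directly from the summary counts. A minimal sketch of that arithmetic, using only the LIVE/DEAD/PARTIAL figures from the table above (the variable and function names are illustrative):

```python
# Edge counts copied from the Summary Statistics table.
counts = {"LIVE": 15, "DEAD": 15, "PARTIAL": 10}

total = sum(counts.values())  # 40 edges inventoried


def share(status: str) -> float:
    """Percentage of inventoried edges that carry the given status."""
    return 100 * counts[status] / total


live_rate = share("LIVE")
not_live_rate = share("DEAD") + share("PARTIAL")

print(f"live: {live_rate:.1f}%  dead or degraded: {not_live_rate:.1f}%")
```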
Inventory: Memory Plane
Memory Plane Inventory — AI Factory Audit
Date: 2026-05-09
Auditor: Chip Huyen (AgentForge)
Scope: Read-only probe. No mutations.
Task: Plan Task 1.1 — Memory Plane Inventory
1. Per-Store Table
| Store | Endpoint / Path | Schema / Collections | Live Count | Write Path | Read Path | Owner Daemon | Status |
|---|---|---|---|---|---|---|---|
| mem0 / Qdrant | http://localhost:9000 (mem0 API) / http://localhost:6333 (Qdrant gRPC+HTTP) | 5 collections: mem0migrations (0 pts), sessions (929 pts), hivemind (60,442 pts), mem0_john (865 pts), knowledge (31,274 pts) | 93,510 total vectors | No caller found. mem0 API (POST /add) is NEVER called by any hook, tool, or daemon in ~/system/tools/ or ~/.claude/hooks/. hivemind.js dual-writes to Qdrant hivemind collection directly via internal HTTP (port 6333). | No tool reads localhost:9000 for queries. hivemind.js semantic search reads Qdrant hivemind collection directly via qdrant-client. discover.js does NOT query mem0. | com.alai.mem0-server (LaunchAgent, KeepAlive=true, PID 65706 alive, last exit was SIGTERM -15) | HEALTHY (server alive, but ORPHANED — no producer writes to mem0_john or knowledge via the mem0 API) |
| Chroma | ~/.claude-mem/chroma/chroma.sqlite3 | 1 collection: cm__claude-mem | 6,584 embeddings | Unknown — no daemon or hook references claude-mem path in scanned tools. Likely written by a claude-mem MCP server or CLI tool directly. | Unknown — no caller found in ~/system/tools/ or ~/.claude/hooks/. | None identified | PARTIAL (data exists, producer and consumer both untraced) |
| LightRAG | http://localhost:9621 | Neo4J graph + NanoVectorDB + JsonKV storage; workspace /app/data | 999 processed docs, 1 failed (pipeline_busy=true, 120 async locks pending — actively ingesting) | ~/.claude/hooks/lightrag-auto-ingest.sh (PostToolUse: Write/Edit) — fires on writes to ~/.claude/projects/-Users-makinja/memory/*.md, ~/system/specs/*.md, and /tmp/*-bookstack-*.md. Also com.alai.lightrag-outbox-ingest.plist daemon. | discover.js — primary read path. Queries https://lightrag.alai.no/query (external hostname, not localhost). Fallback: if local hits < 3, LightRAG fallback fires. | com.alai.lightrag-watchdog.plist, com.alai.lightrag-keepwarm.plist, com.alai.lightrag-backup.plist, com.john.lightrag-monitor.plist, com.alai.lightrag-migrate-pump.plist | HEALTHY (serving, ingesting) |
| HiveDB (SQLite) | ~/system/agents/hivemind/hivemind.db | 7 tables: agents (139 rows), memos (100 rows), intel (17,551 rows), subscriptions (6 rows), _litestream_seq, _litestream_lock, sqlite_sequence | 17,551 intel rows (NOTE: context memo said 64,889 — live probe shows 17,551; delta likely from live deletions or memo was stale) | hivemind.js post <agent> <type> <message> — agents call this CLI to write intel. Also dual-writes embeddings to Qdrant hivemind collection (best-effort, fire-and-forget). | hivemind.js read/query/search — text search + semantic search (cosine sim against local embeddings or Qdrant). discover.js does NOT query HiveDB directly. | hivemind.js (stateless CLI, no daemon; called ad-hoc by agents) | HEALTHY |
| .md auto-memory | ~/.claude/projects/-Users-makinja/memory/ | 123 .md files (MEMORY.md index + per-topic files + feedback memos + _archive/) | 123 files | Claude Code's built-in auto-memory system (native Claude Code feature — writes .md files after conversations automatically, not via any explicit hook or daemon). lightrag-auto-ingest.sh PostToolUse hook then ingests these into LightRAG when they are written/edited. | CLAUDE.md "Context Loading" section instructs John to Read specific files directly. discover.js memory "<topic>" is documented as LightRAG-backed (reads LightRAG, not the .md files directly). | Built-in Claude Code (no external daemon) | HEALTHY (write path functional; read path partially bypassed — LightRAG index only 999 docs, not all 123 .md files confirmed ingested) |
2. Producer → Consumer Matrix
| Producer | Store Written | Consumer | Notes |
|---|---|---|---|
| Claude Code built-in auto-memory | ~/.claude/projects/-Users-makinja/memory/*.md (123 files) | lightrag-auto-ingest.sh hook (secondary producer → LightRAG) | Auto-memory is Claude Code native. The .md write triggers the hook. |
| lightrag-auto-ingest.sh (PostToolUse hook) | LightRAG http://localhost:9621 | discover.js (primary RAG consumer) | Only fires on Write/Edit tool calls to in-scope paths. Does NOT write to mem0. |
| com.alai.lightrag-outbox-ingest.plist daemon | LightRAG | discover.js | Batch ingest pipeline for outbox staging |
| hivemind.js post (called by agent tools) | HiveDB SQLite hivemind.db + Qdrant hivemind collection (dual-write) | hivemind.js read/query/search (CLI) | Qdrant hivemind = 60,442 vectors; SQLite intel = 17,551 rows — divergence suggests Qdrant has historical vectors beyond current SQLite rows (possibly from bulk migration) |
| NOBODY | mem0 API (localhost:9000/add) — mem0_john collection (865 pts), knowledge collection (31,274 pts) | NOBODY reads via mem0 API either | WIRE BREAK: mem0_john has 865 facts that were presumably written at some point (possibly during initial mem0 setup / manual population), but no current tool, hook, daemon, or agent calls POST localhost:9000. The mem0 API is a running server with no active clients. |
| NOBODY identified | Chroma ~/.claude-mem/chroma/ (6,584 embeddings) | NOBODY identified | Chroma has data (6,584 embeddings in cm__claude-mem) but producer and consumer are both untraced in current tooling. Likely written by a claude-mem MCP tool in a previous iteration. |
| com.john.session-archiver.plist | Likely sessions Qdrant collection (929 pts) | discover.js --sessions (reads sessions SQLite, not Qdrant) | Sessions exist in Qdrant but discover.js reads from a local SQLite sessions table, not via mem0 or Qdrant API |
| rag-router.js learn | ~/system/databases/flywheel.db (SQLite: interactions + rag_cache) | rag-router.js query (cache-hit path) | Sixth store — flywheel SQLite, not listed in original inventory. Routes: cache → local Ollama → external. Does not touch mem0. |
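The orphaned stores in this matrix are the rows where neither a producer nor a consumer was traced. A minimal sketch of that check, assuming the matrix is transcribed by hand into records (the `Flow` type and `orphaned` helper are hypothetical, not part of the audited tooling):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Flow:
    producer: Optional[str]  # None means no producer was traced
    store: str
    consumer: Optional[str]  # None means no consumer was traced


# A subset of the matrix rows above, transcribed by hand.
flows = [
    Flow("lightrag-auto-ingest.sh", "LightRAG", "discover.js"),
    Flow("hivemind.js post", "HiveDB + Qdrant hivemind", "hivemind.js read/query/search"),
    Flow(None, "mem0 API (mem0_john, knowledge)", None),
    Flow(None, "Chroma cm__claude-mem", None),
    Flow("rag-router.js learn", "flywheel.db", "rag-router.js query"),
]


def orphaned(rows: List[Flow]) -> List[str]:
    """Stores with neither a traced producer nor a traced consumer."""
    return [r.store for r in rows if r.producer is None and r.consumer is None]


print(orphaned(flows))
```

Keeping the matrix as data rather than prose would let a later audit rerun the same check after any rewiring.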
3. SoR Gap Analysis — Duplicated Fact Classes
| Fact Class | Stores Containing It | Designated SoR | Derivative / Shadow | Gap / Conflict |
|---|---|---|---|---|
| Agent intel / decisions | HiveDB intel table (17,551 rows) + Qdrant hivemind collection (60,442 vectors) | HiveDB SQLite (primary; hivemind.js writes here first) | Qdrant hivemind (dual-write, best-effort) | 60,442 Qdrant vectors vs 17,551 SQLite rows = 3.4x divergence. Qdrant likely contains orphaned vectors from deleted/purged SQLite rows, or a bulk historical migration that wasn't reflected in SQLite. No reconciliation daemon exists. |
| Session summaries / history | Qdrant sessions (929 pts) + likely local session SQLite (referenced by discover.js) + .md memory files (MEMORY.md index) | Undefined — no explicit SoR designation | All three are partial | discover.js --sessions reads SQLite, not Qdrant sessions. Who writes Qdrant sessions? Untraced. |
| John's personal facts / preferences | mem0 mem0_john collection (865 vectors) + .md auto-memory files (123 files) + LightRAG (999 docs, subset overlapping .md files) | Intended SoR: mem0 (mem0_john) — but NO active writer. Actual SoR: .md files (Claude Code writes here). | LightRAG is downstream derivative of .md files via lightrag-auto-ingest.sh | Critical SoR conflict: 865 facts in mem0 are STALE (last written at setup, no ongoing writes). 123 .md files are current. LightRAG is a partial index of .md files. Three stores claim the same fact class with no reconciliation. |
| Knowledge base / operational docs | mem0 knowledge collection (31,274 vectors) + LightRAG (999 docs, BookStack exports) + Chroma (6,584 embeddings) | Undefined | All three parallel | knowledge collection in mem0 has 31,274 vectors — largest in mem0, but again no active writer via mem0 API. Origin unknown. Chroma cm__claude-mem (6,584) is also an orphan with no identified current writer or reader. |
| HiveMind broadcast intel | Qdrant hivemind collection (60,442) + HiveDB SQLite intel (17,551) | HiveDB SQLite is the write authority | Qdrant hivemind is derivative (dual-write from hivemind.js) | No hivemind HTTP API exists (confirmed: port 3001 is Drop API). Qdrant hivemind is only queryable via hivemind.js semantic search CLI, not accessible to other tools. |
4. Critical: The .md vs mem0 Wire Break
What was supposed to happen
The architecture assumes mem0 (http://localhost:9000) is the structured personal memory SoR for John. The mem0_john collection exists with 865 facts. The sessions collection has 929 entries. The server is alive and healthy.
What actually happens
Step 1 — .md files are written by Claude Code natively.
Claude Code has a built-in auto-memory feature that writes conversation summaries and facts as .md files into ~/.claude/projects/-Users-makinja/memory/. This is NOT a hook or daemon — it is a built-in Claude Code behavior. No line of code in ~/system/ controls this write.
Step 2 — lightrag-auto-ingest.sh hooks into the .md write.
File: ~/.claude/hooks/lightrag-auto-ingest.sh (PostToolUse on Write/Edit).
This hook detects when a .md file is written to ~/.claude/projects/-Users-makinja/memory/*.md and fires a background curl POST to LightRAG (http://localhost:9621/documents/text). This is the ONLY downstream pipeline from .md files.
Step 3 — mem0 API is never called.
Grep across all of:
- ~/system/tools/*.js — 0 files call localhost:9000
- ~/.claude/hooks/*.sh — 0 files call localhost:9000
- ~/system/daemons/ — not scanned exhaustively but mem0-server plist confirms it's only a server, not a writer
- pi-orchestrator.js — the one hit for localhost:9000 is SonarQube (port 9000 collision), not mem0
The exact wire break: There is no POST http://localhost:9000/add call anywhere in the active system. The mem0 server was built and populated (865 facts in mem0_john, 31,274 in knowledge) at some point — likely during initial setup or a one-time migration — but the "auto-write to mem0" integration was never wired into the live pipeline. The lightrag-auto-ingest.sh hook was written instead, routing .md → LightRAG, leaving mem0 as a read-only relic with stale data.
CEO complaint root cause confirmed: "implementation is not ideal — memory writes to .md files instead of mem0" is accurate. The intended SoR (mem0) has no active producer. The actual write path is: Claude Code → .md files → lightrag-auto-ingest.sh → LightRAG. mem0 is running, healthy, and populated with 865+31,274 stale vectors that nobody reads.
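The grep in Step 3 can be approximated with a short script. This is a sketch under stated assumptions, not the command the auditor ran: it searches recursively (the audit globbed top-level `*.js` and `*.sh` files), and `find_callers` is a hypothetical helper name.

```python
import re
from pathlib import Path

# The mem0 endpoint named in this audit.
MEM0_PATTERN = re.compile(r"localhost:9000")


def find_callers(roots, suffixes=(".js", ".sh")):
    """Return (path, line_number, line) for every line mentioning localhost:9000."""
    hits = []
    for root in roots:
        root = Path(root).expanduser()
        if not root.is_dir():
            continue  # skip paths that do not exist on this machine
        for path in sorted(root.rglob("*")):
            if not path.is_file() or path.suffix not in suffixes:
                continue
            text = path.read_text(errors="ignore")
            for number, line in enumerate(text.splitlines(), start=1):
                if MEM0_PATTERN.search(line):
                    hits.append((path, number, line.strip()))
    return hits
```

A call such as `find_callers(["~/system/tools", "~/.claude/hooks"])` would list candidate callers. Each hit still needs a manual read, because port 9000 is shared with SonarQube and a match does not prove a mem0 call.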
HiveDB relationship
HiveDB (hivemind.db) is a SEPARATE concern from personal memory. It is the agent broadcast / intel bus, not John's fact store. However, the Qdrant hivemind collection (60,442 vectors) lives in the same Qdrant instance as mem0_john, creating the appearance of a unified store when it is actually two separate logical systems sharing infrastructure.
5. Store Status Summary
| Store | Healthy? | Active Producer? | Active Consumer? | Data Fresh? |
|---|---|---|---|---|
| mem0 / Qdrant mem0_john | Yes | NO | NO | NO — 865 facts, stale |
| mem0 / Qdrant knowledge | Yes | NO | NO | NO — 31,274 vectors, stale |
| mem0 / Qdrant sessions | Yes | Unknown | NO | Unknown |
| mem0 / Qdrant hivemind | Yes | Yes (hivemind.js dual-write) | Yes (hivemind.js semantic search) | YES |
| HiveDB SQLite | Yes | Yes (hivemind.js CLI) | Yes (hivemind.js CLI) | YES — 17,551 rows |
| LightRAG | Yes | Yes (lightrag-auto-ingest.sh hook + outbox daemon) | Yes (discover.js) | YES — 999 docs, pipeline busy |
| Chroma | Yes (file exists) | UNKNOWN | UNKNOWN | Unknown origin |
| .md auto-memory | Yes | Yes (Claude Code native) | Partial (direct Read + LightRAG index) | YES — 123 files |
| Flywheel SQLite | Presumed yes | Yes (rag-router.js learn) | Yes (rag-router.js query) | Unknown |
Open Questions
- Chroma write/read path: Who wrote 6,584 embeddings to ~/.claude-mem/chroma/cm__claude-mem? Which tool or MCP server reads from it? The claude-mem MCP is referenced in settings but not found in scanned tool code. Needs: grep -r "claude-mem\|chroma" ~/.claude/settings.json and MCP server registry audit.
- Qdrant sessions writer: Who writes 929 session vectors to the sessions Qdrant collection? com.john.session-archiver.plist is a candidate but the script path was not read. Needs: cat ~/Library/LaunchAgents/com.john.session-archiver.plist + script inspection.
- Qdrant knowledge origin: 31,274 vectors in knowledge — when were they written and from what source? No active writer found. Possible: one-time BookStack bulk ingest or a migration. Check ~/system/mem0/server.py for any bulk-load routines at startup.
- HiveDB vector divergence: 60,442 Qdrant vectors vs 17,551 SQLite intel rows. Are the extra ~43K vectors orphaned (deleted SQLite rows without Qdrant cleanup), or does Qdrant have independent content? Needs: sample Qdrant payload IDs vs SQLite id column cross-check.
- LightRAG external hostname: discover.js queries https://lightrag.alai.no/query (external URL from config), not http://localhost:9621. Is there a Caddy/Cloudflare proxy routing lightrag.alai.no → localhost:9621? If that proxy is down, discover.js would silently fail to read from LightRAG despite the local container being healthy.
- mem0_john 865 facts provenance: When were these written? Is there a one-time ingestion script (e.g., ~/system/mem0/populate.py or similar)? If the facts are high-quality (personal preferences, CEO directives), they are the most actionable store to re-wire as the active SoR.
- rag-router.js flywheel.db size and health: Not probed live. Needs sqlite3 ~/system/databases/flywheel.db "SELECT count(*) FROM interactions; SELECT count(*) FROM rag_cache;".
- mem0 server.py — does it expose /add or /search routes?: Confirmed health endpoint works. Need to verify actual API surface to confirm if a PostToolUse hook calling POST localhost:9000/add would work as-is without code changes to mem0.
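For the HiveDB vector divergence question, the proposed cross-check reduces to a set difference. A minimal sketch, assuming each Qdrant point carries an identifier that corresponds to the SQLite intel `id` column (that correspondence is unverified); the in-memory table and the ID sample are stand-ins, and `orphaned_vector_ids` is a hypothetical helper:

```python
import sqlite3


def orphaned_vector_ids(conn, qdrant_ids):
    """IDs present in the vector sample but absent from the intel table."""
    sqlite_ids = {row[0] for row in conn.execute("SELECT id FROM intel")}
    return sorted(set(qdrant_ids) - sqlite_ids)


# Stand-in data: an in-memory table and a hand-written ID sample.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE intel (id INTEGER PRIMARY KEY, message TEXT)")
conn.executemany(
    "INSERT INTO intel (id, message) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (4, "d")],
)

sample_from_qdrant = [1, 2, 3, 4, 5]  # hypothetical payload IDs
print(orphaned_vector_ids(conn, sample_from_qdrant))
```

Against the real stores, the connection would point at hivemind.db opened read-only, and the sample would come from a Qdrant export. A large orphan share in the sample would support the deleted-rows explanation over independent content.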
Inventory: Tools Shed
Tools Shed Audit — 2026-05-09
Audit Scope: ~/system/tools/ (443 files on disk)
Manifest Version: ~/system/tools/manifest-index.md (282 rows, last update 2026-04)
Audit Date: 2026-05-09
Auditor: John (Explore Agent, read-only)
Summary
| Classification | Count | Pct |
|---|---|---|
| LIVE (referenced in daemons/agents/skills/chains) | ~250 | 56.4% |
| .BAK / .pre- / .deployed* | 50 | 11.3% |
| JUNK (malformed name, 0-byte, JSON-as-filename) | 3 | 0.7% |
| DEAD-CODE (no caller, not in manifest LIVE list) | ~100 | 22.6% |
| UNCLASSIFIED (catalog gaps, unclear status) | ~40 | 9.0% |
Total Disk Space: 502 MB (dominated by .venv/ + subdirectory trees)
1. Total Counts by Classification
Live Tools (ACTIVE status in manifest or active daemon references)
Count: ~250 tools
Source: manifest-index.md lists 201 ACTIVE entries (pre-2026-04), plus ~49 tools in daemons/ that were added post-manifest update.
Top-tier LIVE tools (by size):
- mc.js (250 KB) — Mission Control CLI, last modified 2026-05-08 ✓ CURRENT
- mc-dashboard.js (170 KB) — dashboard, last modified 2026-04-06
- manifest.md (94 KB) — full manifest (separate from manifest-index.md)
- auto-report.js (51 KB) — daily/weekly report generator
- slack-bot.js (49 KB) — Slack daemon
- invoice-generator.js (48 KB) — invoice CRUD
- event-handlers.js (46 KB) — event dispatch
- mail-native.js (40 KB) — IMAP/SMTP fallback
Backup Files (.bak*, .pre-*, .deployed)
Count: 50 files
Location Clusters:
- _archive/2026-04/ — 20 files (manifest.md, mc.js, qa-19.js, event-handlers.js, comms-responder.js variants, kimi-*, youtube-learning, slack-bot.js variants, rag-context-for-builder.js, resource-governor.js)
- Root level — 30 files (autocoder.js.pre-azure-cutover-20260419, lightrag*.pre-azure-cutover, mc.js.bak-* variants, comms-*, council-*, mini-da, ollama-*, prompt-tester, rag-*, retrieval-orchestrator.pre-*, system-regression.pre-*, transcript-*, vector-*)
Age Analysis (sample):
- Mar 07–14, 2026 (56–63 days old) — oldest: resource-governor.js.bak, kimi-server.sh.bak, kimi-monitor.js.bak
- Apr 02, 2026 (37 days old) — mc.js.bak-aaos-20260402
- Apr 10–20, 2026 (19–29 days old) — most common, pre-azure-cutover-* batch (highest density)
- Apr 30, 2026 (9 days old) — bulk-dated backup cluster (appears to be organized archive pass)
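All .bak files are > 14 days old, apart from the Apr 30 cluster, whose 9-day timestamps appear to come from the archive pass and not from new backups. Safe for archival per planning assumptions.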
Junk Findings
3 malformed/suspect filenames identified:
- Credential-bearing JSON-as-filename artifact (0 bytes)
  - Created: 2026-02-24 06:39
  - Issue: LITERAL JSON object with test credentials embedded as filename
  - SECURITY RISK: Credentials (passwords, tokens, keys) encoded in filesystem path
  - Source: Appears to be tool output-capture error (shell process writing object serialization instead of text)
  - Recommendation: DELETE immediately + audit all tools for output-capture leaks + add alai-hooks gate
- .alai/context-index.db-wal (inside tools/)
  - Zero-byte WAL journal file
  - Not a proper tool — appears to be SQLite write-ahead log (orphaned)
  - Recommendation: DELETE
- alai-hooks/.gradle/ subdirectories
  - Gradle cache files (0-byte metadata: gc.properties, REQUESTED markers)
  - Inside alai-hooks/ (Java/Kotlin project)
  - Not tools — system detritus
  - Recommendation: purge from /tools/ to /archive/, keep only alai-hooks source
Zero-byte files: Multiple .REQUESTED, .lock, gc.properties inside Python venv — expected (pip metadata). Not tools.
2. Manifest Drift Analysis
Manifest Entries Scanned: 282 rows (manifest-index.md)
Cross-reference results:
| Status | Count | Notes |
|---|---|---|
| Exists on disk | ~250 | All LIVE/ACTIVE referenced tools present |
| DELETED in manifest, absent from disk | 31 | Expected (deleted per manifest Sprint 2/3, 2026-02-26) |
| Referenced in manifest but ARCHIVED | 6 | docuseal-monitor.js, docuseal-webhook.js, blueprint-runner.js, blueprint-compose.js, etc. — moved to ~/system/archive/replaced-by-n8n-2026-02/ |
| Manifest lists as ACTIVE but STALE (>30d) | ~8 | intel-briefing.js (Apr 6), council-briefing.js (pre-extract), ollama-workers/* (last mod Mar–Apr) |
| Subdirectory tools NOT in manifest | ~40–60 | comms-agent/, browser-use-explorer/, alai-hooks/ internal tools (Kotlin, TypeScript, Python) — not catalogued |
| MANIFEST MISSING entries | 15–20 | Post-2026-04 additions (tier-router, skill-router, claim-detector, mini-da, drift-detector, tool-sync-audit, tool-dedup-report, multi-client routing, agent-metrics-api, agent-timeout-monitor) |
Drift Conclusion: Manifest is ~6 weeks stale. 201 ACTIVE tools documented; ~250–300 actually running (50–100 undocumented, mostly post-Feb architectural shifts + sub-agent frameworks).
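The drift categories above come down to comparing two name sets. A minimal sketch of that comparison, assuming the manifest's ACTIVE names and the on-disk file names have already been extracted into lists (the `manifest_drift` helper and the sample inputs are illustrative; the real manifest parser is not shown in this audit):

```python
def manifest_drift(manifest_active, on_disk):
    """Compare manifest ACTIVE names with file names found on disk."""
    manifest_active, on_disk = set(manifest_active), set(on_disk)
    return {
        "undocumented": sorted(on_disk - manifest_active),  # on disk, not in manifest
        "missing": sorted(manifest_active - on_disk),       # in manifest, not on disk
    }


# Illustrative inputs only; the real lists would come from
# manifest-index.md and a directory listing of ~/system/tools/.
manifest = ["mc.js", "slack-bot.js", "blueprint-runner.js"]
disk = ["mc.js", "slack-bot.js", "tier-router.js", "skill-router.js"]

print(manifest_drift(manifest, disk))
```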
3. Un-owned LIVE Tools
Tools referenced in daemons or .md but NOT explicitly claimed in manifest ACTIVE list:
| Tool | Caller | Owner (inferred) | Status |
|---|---|---|---|
| tier-router.js | agent-runner.js, task-router.js | (unassigned) | LIVE, no owner |
| skill-router.js | mc.js, plan-enforcer | (unassigned) | LIVE, no owner |
| claim-detector.js | cove.js, drift-detector | (unassigned) | LIVE, no owner |
| claim-verifier.js | cove.js, qa-19.js | (unassigned) | LIVE, no owner |
| drift-detector.js | daemon (daily 23:55) | (unassigned) | LIVE, daemon-run |
| tool-sync-audit.js | daemon (daily 03:00) | (unassigned) | LIVE, daemon-run |
| tool-dedup-report.js | daemon (Monday 06:00) | (unassigned) | LIVE, daemon-run |
| agent-metrics-api.js | agent-orchestrator.js | (unassigned) | LIVE, endpoint |
| agent-timeout-monitor.js | agent-runner.js | (unassigned) | LIVE, daemon-enforcer |
| ollama-workers/* (4 tools) | automation (referenced in session-archiver) | (unassigned) | LIVE, utilities |
| forge-status.js | studio-health.js, emergency-repl | (unassigned) | LIVE |
| studio-health.js | ops-watchdog, ollama-engine | (unassigned) | LIVE |
Implication: 12+ mission-critical tools lack explicit owner/status in manifest. Creates risk of accidental deprecation/orphaning.
4. Stale .bak Files (>14 days old)
All 50 .bak/* files are treated as > 14 days old and safe for archival. The Apr 30 timestamps on the final batch appear to reflect the archive pass itself and not new backups:
Oldest Batch (54–60 days; safe to archive):
- resource-governor.js.bak-20260310-184907 (Mar 10)
- kimi-server.sh.bak-20260313-181327 (Mar 13)
- kimi-monitor.js.bak-20260313-181327 (Mar 13)
- youtube-learning.js.bak-20260316-084904 (Mar 16)
- event-handlers.js.bak.20260314-043322 (Mar 14)
- ollama-tool-agent.js.bak-20260316-234508 (Mar 16)
- qa-19.js.bak.20260314-043322 (Mar 14)
- mc.js.bak.20260314-043322 (Mar 14)
- mc.js.bak.20260310-184105 (Mar 10)
Mid-range (33–39 days):
- mc.js.bak-aaos-20260402 (Apr 2)
- mc.js.bak-before-7082-7085 (Apr 2)
- health-monitor-anvil.js.bak (Apr 6)
- intel-briefing.js.bak (Mar 31)
Recent Batch (9 days; organized archive pass, Apr 30):
- _archive/2026-04/* (20 files, all Apr 30 11:25:48)
Recommendation: Move all .bak/* to dated subdirectory (e.g., _archive/2026-05/pre-may/), ZIP for offsite backup.
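Many of the backup names listed above embed their own date, so the 14-day test does not have to rely on file mtime (which an archive pass can reset). A minimal sketch, assuming an eight-digit `YYYYMMDD` run in the name is the backup date; `backup_age_days` and `safe_to_archive` are hypothetical helpers, and names without an embedded date return `None` so they can be handled separately:

```python
import re
from datetime import date, datetime

AUDIT_DATE = date(2026, 5, 9)
STAMP = re.compile(r"(?<!\d)(\d{8})(?!\d)")  # e.g. 20260310 in "...bak-20260310-184907"


def backup_age_days(name, today=AUDIT_DATE):
    """Age in days from the date embedded in a backup file name, or None."""
    match = STAMP.search(name)
    if match is None:
        return None  # no embedded date; the caller must fall back to mtime
    try:
        stamped = datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:
        return None  # eight digits that are not a calendar date
    return (today - stamped).days


def safe_to_archive(name, threshold_days=14):
    """True only when the embedded date is older than the threshold."""
    age = backup_age_days(name)
    return age is not None and age > threshold_days


for name in ["resource-governor.js.bak-20260310-184907",
             "mc.js.bak-aaos-20260402",
             "mc.js.bak-before-7082-7085"]:
    print(name, backup_age_days(name), safe_to_archive(name))
```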
5. Additional Junk & Quality Findings
Missing Expected Files
Files referenced in manifest but NOT found on disk:
- (None critical; all listed DELETED files were already absent per manifest notes)
Suspicious Dead Code
| Tool | Symptom | Recommendation |
|---|---|---|
| element-test.js (114 KB) | No daemon/agent caller, appears test-only | Verify if part of active testing suite or orphaned |
| durable-executor.js (59 KB) | Shadowed by durable-runner.js; unclear distinction | Check if both needed or consolidate |
| youtube-learning.js.bak (backup preserved) | Original .bak exists; unknown if active service | Verify if YouTube integration still used |
| resource-governor.js.bak (backup preserved) | Resource control tool; backed up mid-March | Check if resource-governor.js ever went live |
Subdirectories with Nested Tools (Not in Manifest)
~/system/tools/comms-agent/ (TypeScript/Node monorepo)
src/, dist/ (telegram-handler.ts, index.js with .bak variants)
package.json, tsconfig.json
Status: ??? (unclear if actively deployed vs. dev artifact)
~/system/tools/browser-use-explorer/ (Python + Node, 1.2 GB)
.venv/lib/python3.12/site-packages/ (pip deps only, not code)
src/, package.json
Status: ??? (research tool? dev sandbox?)
~/system/tools/alai-hooks/ (Kotlin/Java, binary CLI)
gradle/, src/ (Kotlin security enforcement, codesigned binary)
Status: ACTIVE (referenced in mc.js, alai-hooks command used in hooks)
Note: Gradle .gradle/ cache should be archived
Finding: 3 subdirectories (about 377 MB combined, going by the sizes listed in section 6) are not documented in manifest. Unclear which are active, which are dev/research.
6. Top-10 Largest Tools
| Rank | Tool | Size | Last Modified | Status |
|---|---|---|---|---|
| 1 | browser-use-explorer/ | 320 MB | Apr 28 | ??? (venv=280MB) |
| 2 | comms-agent/ | 45 MB | Apr 1 | ??? (node_modules=40MB) |
| 3 | alai-hooks/ | 12 MB | May 6 | ACTIVE (Kotlin binary) |
| 4 | mc.js | 250 KB | May 8 | LIVE |
| 5 | mc-dashboard.js | 170 KB | Apr 6 | LIVE |
| 6 | manifest.md | 94 KB | Apr 14 | Reference doc |
| 7 | pipeline-controller.js | 58 KB | Feb 26 | LIVE |
| 8 | auto-report.js | 51 KB | Apr 24 | LIVE |
| 9 | slack-bot.js | 49 KB | Apr 6 | LIVE |
| 10 | invoice-generator.js | 48 KB | Feb 17 | LIVE |
Observation: Single .py + .venv project (browser-use-explorer) consumes 63% of ~/system/tools/ disk (320 MB).
- If research/PoC only: move to ~/projects/ or ~/backups/
- If production: document in manifest + verify active daemon
7. Live References — Tool Coverage
Tool consumer analysis (sample grep):
| Consumer | Count | Examples |
|---|---|---|
| ~/system/daemons/ | 42 scripts | mc-session-worker.sh, email-agent.js, ops-watchdog.js, flywheel-cycle.sh, auto-* (8), daemon-* (5), etc. |
| ~/.claude/agents/*.md | 28 files | builder.md, validator.md, resolver.md, linter.md, etc. — each requires 5–10 tools |
| ~/.claude/skills/ | 80+ skills | Each skill loads ~2–5 tools on demand (via skill-runner.js) |
| ~/system/agents/chains/*.yaml | 23 chains | Each chain references 1–3 tools for orchestration |
| ~/.claude/hooks/*.sh | 12 hooks | alai-hooks gating, process enforcement, mc claims |
Live tool hit count: ~250–280 tools have explicit caller references.
Open Questions
- browser-use-explorer/: Is this an active production tool or a research sandbox? If research, should live in ~/projects/. 320 MB allocation is significant.
- comms-agent/ subdirectory: Is this a stable deployed service or in-flight TypeScript migration? .bak variants suggest evolution.
- alai-hooks/ binary codesigned: Latest mod 2026-05-06; clearly active. Should .gradle/ cache be cleaned or preserved?
- 50 .bak files: Do we need all 50, or is a rotating keep-last-3-per-tool strategy viable?
- Manifest staleness: Should manifest-index.md be auto-refreshed daily (e.g., daemon that re-scans daemons/ + agents/ + chains/) to stay in sync?
- 12 un-owned tools: Should each be assigned explicit owner + manifest entry, or grouped under "Deterministic Enforcement" or "Agent Infrastructure"?
- JSON-as-filename security: When created? Which tool? Did credentials leak to logs? Recommend grep of all logs for exposed secrets.
Recommendations (Audit-Level Only)
CRITICAL
- Delete malformed filename immediately: Filename contains embedded credentials. Audit tools/, daemons/, and agents/* for output-capture leaks. Add alai-hooks gate to prevent future output-as-filename incidents.
- Security review of JSON filename artifact:
  - When was it created? (2026-02-24)
  - Which tool created it? (Bash tool capture?)
  - Did credentials leak to logs? (Grep logs for exposed patterns)
  - Add validation layer to prevent credentials-in-paths
- Document or relocate browser-use-explorer/:
  - If active: add to manifest, assign owner, set LaunchAgent
  - If research: move to ~/projects/ or archive, free 320 MB
HIGH
- Refresh manifest-index.md:
  - Add 50–60 undocumented post-Feb tools (tier-router, skill-router, claim-*, drift-detector, tool-sync-audit, agent-metrics-api, agent-timeout-monitor, ollama-workers/, forge-status, studio-health)
  - Assign ownership: which persona (CodeCraft, FlowForge, Proveo, Securion)?
  - Set explicit LIVE vs. ARCHIVED vs. DEPRECATED status
- Archive all .bak files:
  - Create ~/system/archive/2026-05-09-bak-sweep/ (ZIP friendly)
  - Move 50 .bak* files
  - Update manifest with archive location + retention policy
- Clarify comms-agent/ status:
  - If deployed: verify daemon + manifest entry
  - If migration: set deadline for TypeScript cutover or rollback
MEDIUM
- Define tool ownership:
  - Create manifest section: "Infrastructure Owner Assignments"
  - Assign: tier-router, skill-router, claim-*, drift-detector, tool-*, agent-metrics-api, agent-timeout-monitor → explicit team
- Automate manifest refresh:
  - Create daemon: ~/system/daemons/manifest-refresh.js
  - Daily 04:00: scan daemons/, agents/, chains/ → auto-update manifest-index.md
  - Hook into mc.js add-tool proposal flow
- Standardize .bak naming:
  - Policy: max 3 backups per tool, naming = <tool>.<date>.<hash>.bak
  - Daemon: daily cleanup of excess backups
- Consolidate durable-executor vs. durable-runner:
  - Verify both needed; if not, mark one DEPRECATED + migrate callers
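The proposed keep-3 retention policy could be prototyped as a pure function before any cleanup daemon exists. A minimal sketch, assuming the proposed `<tool>.<date>.<hash>.bak` naming with the date written as `YYYYMMDD` so that text order matches time order; `excess_backups` is a hypothetical helper, and names that do not follow the pattern are left untouched:

```python
from collections import defaultdict


def excess_backups(names, keep=3):
    """Return the backups to delete, keeping the `keep` newest per tool."""
    by_tool = defaultdict(list)
    for name in names:
        parts = name.rsplit(".", 3)  # tool names may contain dots (mc.js)
        if len(parts) != 4 or parts[3] != "bak":
            continue  # not <tool>.<date>.<hash>.bak, so leave it alone
        tool, stamp, _hash, _ext = parts
        by_tool[tool].append((stamp, name))
    doomed = []
    for entries in by_tool.values():
        entries.sort(reverse=True)  # newest first; YYYYMMDD sorts as text
        doomed.extend(name for _, name in entries[keep:])
    return sorted(doomed)
```

Returning a list instead of deleting files directly keeps the policy testable and lets the daemon log or dry-run its decisions first.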
Audit Confidence
| Area | Confidence | Notes |
|---|---|---|
| Backup file count + age | HIGH | All 50 .bak files enumerated, dates verified |
| Junk file identification | HIGH | JSON-as-filename caught, 0-byte files confirmed |
| LIVE tool hit count | MEDIUM | Sampled grep coverage; not exhaustive scan of all 443 files |
| Manifest drift | HIGH | Manifest explicitly marked "2026-02-26" audit; 6+ weeks stale confirmed |
| Subdirectory status | LOW | comms-agent/ and browser-use-explorer/ require interactive verification |
| Un-owned tools | MEDIUM | 12 inferred from daemon/skill references; could miss some |
Audit completed: 2026-05-09 21:15 UTC
Auditor: John (Explore Agent)
Next step: Escalate critical findings (malformed filename, manifest refresh) to CEO/Mehanik.
Inventory: Agent Fleet
Agent Fleet Inventory — SENTINEL Audit 2026-05-09
Auditor: sentinel-architect
Scope: ~/.claude/agents/ vs specialist-mapping.json vs persona dirs vs chains vs definitions dual-store
Status: READ-ONLY. No files modified.
1. 66 vs 29 vs 12 Reconciliation
Raw counts (tool-verified)
| Store | Count | Notes |
|---|---|---|
| ~/.claude/agents/*.md | 66 | Includes 0.md, Explore.md, Plan.md as named agents |
| specialist-mapping.json mappings | 29 | Key: mappings object |
| specialist-mapping.json companies | 9 | ALAI, AgentForge, CodeCraft, Finverge, FlowForge, Proveo, Securion, Skybound, Vizu |
| Persona dirs in ~/system/agents/personas/ | 12 | AgentForge, Axiom, CodeCraft, Datavera, Finverge, FlowForge, Lexicon, Proveo, Resolver, Securion, Skybound, Vizu |
Critical gap: 4 of the 12 persona companies are completely absent from specialist-mapping.json:
- Axiom: not in company_summary, zero agents mapped
- Datavera: not in company_summary, zero agents mapped
- Resolver: not in company_summary, zero agents mapped
- Lexicon: not in company_summary, zero agents mapped (persona dir exists, but skillforge.md maps to "Skillforge", not Lexicon)
So the real company gap is 4 of 12 personas with no presence in specialist-mapping.json.
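The reconciliation above is a set difference between the two name lists. The sketch below uses the names exactly as listed in the raw-counts table; the function name `company_gaps` is hypothetical.

```python
# Names copied from the raw-counts table above.
PERSONA_DIRS = {
    "AgentForge", "Axiom", "CodeCraft", "Datavera", "Finverge", "FlowForge",
    "Lexicon", "Proveo", "Resolver", "Securion", "Skybound", "Vizu",
}
MAPPING_COMPANIES = {
    "ALAI", "AgentForge", "CodeCraft", "Finverge", "FlowForge",
    "Proveo", "Securion", "Skybound", "Vizu",
}


def company_gaps(persona_dirs, mapping_companies):
    """Return (personas absent from the mapping, mapping companies without a persona dir)."""
    missing_from_mapping = sorted(persona_dirs - mapping_companies)
    missing_persona_dir = sorted(mapping_companies - persona_dirs)
    return missing_from_mapping, missing_persona_dir
```

Run against the two lists, the first result is the four companies named above. The reverse difference also surfaces ALAI, which the table lists as a mapping company but not as a persona dir.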
Mapped agents (29 in specialist-mapping.json)
| Agent file | Company | On disk (~/.claude/agents/)? |
|---|---|---|
| alem-clone.md | ALAI | MISSING |
| angie-jones.md | Proveo | YES |
| anthropic-chief-architect.md | AgentForge | MISSING |
| brad-frost.md | Vizu | YES |
| bruce-momjian.md | CodeCraft | YES |
| builder.md | CodeCraft | YES |
| chip-huyen.md | AgentForge | YES |
| claude-code-guide.md | AgentForge | YES |
| codecraft.md | CodeCraft | YES |
| dorota-huizinga.md | Proveo | MISSING |
| georgi-gerganov.md | AgentForge | YES |
| hadi-hariri.md | CodeCraft | MISSING |
| james-bach.md | Proveo | MISSING |
| kelsey-hightower.md | FlowForge | YES |
| lea-verou.md | Vizu | YES |
| lee-robinson.md | CodeCraft | MISSING |
| lisa-crispin.md | Proveo | MISSING |
| markos-zachariadis.md | Finverge | YES |
| martin-kleppmann.md | CodeCraft | YES |
| parisa-tabriz.md | Securion | YES |
| paul-hudson.md | Skybound | YES |
| petter-graff.md | CodeCraft | YES |
| proveo.md | Proveo | YES |
| sentinel-architect.md | Securion | YES |
| sentinel-ba.md | Skybound | YES |
| sentinel-developer.md | CodeCraft | YES |
| sentinel-tester.md | Proveo | YES |
| sentinel-validator.md | Proveo | YES |
| skillforge.md | Skillforge | YES |
7 agents mapped in specialist-mapping.json but MISSING from ~/.claude/agents/:
- alem-clone.md: exists in definitions/, not synced to ~/.claude/agents/
- anthropic-chief-architect.md: NOT in definitions/ either; completely phantom
- dorota-huizinga.md: exists in definitions/, not synced
- hadi-hariri.md: exists in definitions/, not synced
- james-bach.md: exists in definitions/, not synced
- lee-robinson.md: exists in definitions/, not synced
- lisa-crispin.md: exists in definitions/, not synced
anthropic-chief-architect.md is the worst case: mapped in specialist-mapping.json, NOT in definitions/, NOT in ~/.claude/agents/ — fully phantom, cannot be dispatched.
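The three outcomes described above (reachable, unsynced, fully phantom) can be checked mechanically. This is a sketch under one assumption the audit does not spell out: that the keys of the `mappings` object are agent filenames such as `builder.md`. `classify_mapped_agents` is a hypothetical helper, not an existing tool.

```python
import json
from pathlib import Path


def classify_mapped_agents(mapping_file, agents_dir, definitions_dir):
    """Classify each mapped agent by where its .md file can be found.

    Assumes the JSON has a top-level "mappings" object keyed by agent filename.
    """
    mappings = json.loads(Path(mapping_file).read_text())["mappings"]
    report = {"reachable": [], "unsynced": [], "phantom": []}
    for agent_file in sorted(mappings):
        if (Path(agents_dir) / agent_file).is_file():
            report["reachable"].append(agent_file)  # dispatchable
        elif (Path(definitions_dir) / agent_file).is_file():
            report["unsynced"].append(agent_file)   # in definitions/ only
        else:
            report["phantom"].append(agent_file)    # in neither store
    return report
```

Anything in `unsynced` or `phantom` is mapped but cannot be dispatched, which is the condition the table above records as MISSING.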
42 unmapped agents (in ~/.claude/agents/ but NOT in specialist-mapping.json)
Classification: ORPHAN = nowhere used | DUPLICATE = covered by mapped peer | NEEDS-MAPPING = used in chains/skills but unmapped
| Agent | Classification | Reasoning |
|---|---|---|
| 0.md | ORPHAN | No name, no description, artifact |
| agentforge.md | NEEDS-MAPPING | Company persona file; Axiom/Datavera/Resolver equivalents all exist. AgentForge has a persona dir but no company-level mapping entry |
| backend-builder.md | DUPLICATE | Covered by builder.md (CodeCraft, mapped) |
| backend-dev.md | DUPLICATE | Covered by codecraft.md + builder.md |
| baseline-comparator.md | NEEDS-MAPPING | Active agent (Veritas baseline, MLX-backed); used in verify-fix-loop skill; no mapping |
| code-reviewer.md | DUPLICATE | Covered by petter-graff.md / sentinel-developer.md |
| code-simplifier.md | DUPLICATE | Covered by sentinel-developer.md |
| database-dev.md | DUPLICATE | Covered by bruce-momjian.md |
| datavera.md | NEEDS-MAPPING | Company persona file for Datavera (persona dir exists, 0 mapped agents) |
| design-builder.md | DUPLICATE | Covered by brad-frost.md / lea-verou.md |
| devils-advocate.md | NEEDS-MAPPING | Pre-action blocker used in 0 chain yamls but referenced in mehanik flow; unregistered |
| devops-dev.md | DUPLICATE | Covered by kelsey-hightower.md |
| distiller.md | NEEDS-MAPPING | Used in 21 chain yaml steps (highest after builder/validator); no mapping. CRITICAL gap. |
| dr-sarah-chen.md | ORPHAN | No description parsed; no chain/skill references found |
| dzevad-jahic.md | NEEDS-MAPPING | Bosnian linguistic QA (Lexicon company, per CLAUDE.md); not in specialist-mapping.json despite CLAUDE.md routing directive |
| evidence-verifier.md | NEEDS-MAPPING | Active Veritas agent (gemma-4-26b @ FORGE); triggers on mc.js done for H tasks; no mapping |
| Explore.md | ORPHAN | Capital E; appears to be a stub |
| finverge.md | NEEDS-MAPPING | Company persona file for Finverge; persona dir mapped but no company-level agent entry |
| fix-builder.md | NEEDS-MAPPING | Write-only counterpart to verifier; used in verify-fix-loop skill; no mapping |
| flowforge.md | NEEDS-MAPPING | Company persona file for FlowForge; only kelsey-hightower.md individual is mapped |
| frontend-builder.md | DUPLICATE | Covered by lea-verou.md / lee-robinson.md |
| frontend-dev.md | DUPLICATE | Covered by lea-verou.md |
| fullstack-dev.md | DUPLICATE | Covered by codecraft.md |
| helixsupport.md | ORPHAN | Role=coordinator; 0 skill/chain references found |
| indy-dandev.md | ORPHAN | AI research agent (Indian AI + Dan Abramov persona); no chain/skill references; not used in current system |
| integration-dev.md | DUPLICATE | Covered by codecraft.md |
| jake-wharton.md | NEEDS-MAPPING | Android/Kotlin expert (Jake Wharton persona); no AgentForge/Skybound mapping entry |
| lexicon.md | NEEDS-MAPPING | Company persona file for Lexicon (documentation company per CLAUDE.md); 0 agents in specialist-mapping.json |
| maria-santos.md | ORPHAN | No description parsed; no chain/skill references found |
| mehanik.md | NEEDS-MAPPING | Core orchestration gate; referenced in 7 skill files; CLAUDE.md cites /mehanik command as mandatory pre-dispatch gate; completely absent from specialist-mapping.json |
| meta-agent.md | ORPHAN | No chain/skill references found |
| Plan.md | ORPHAN | Capital P; appears to be a stub |
| proxima.md | NEEDS-MAPPING | Marketing/content agent; referenced in 10 skill files; no company assignment |
| rag-builder.md | ORPHAN | No chain/skill references; likely superseded by AgentForge rag-tuning-agent.yaml |
| redzo-reviewer.md | ORPHAN | No chain/skill references found |
| resolver.md | NEEDS-MAPPING | Company persona for Resolver (persona dir exists, 8 internal agents; 0 in specialist-mapping.json) |
| securion.md | NEEDS-MAPPING | Company persona for Securion; parisa-tabriz.md + sentinel-architect.md individually mapped, but no company-level dispatcher |
| skybound.md | NEEDS-MAPPING | Company persona for Skybound; individual members mapped but no company dispatcher |
| thaer-sabri.md | ORPHAN | No description parsed; no chain/skill references found |
| validator.md | NEEDS-MAPPING | Used in 44 skill files and 22 chain yaml steps; one of the most-used agents in the entire system; NOT in specialist-mapping.json. CRITICAL gap. |
| verifier.md | NEEDS-MAPPING | 2 skill file references; verify-fix-loop skill; not mapped |
| vizu.md | NEEDS-MAPPING | Company persona for Vizu; brad-frost.md + lea-verou.md individually mapped, no company dispatcher |
Summary of 42 unmapped:
- ORPHAN: 11 (0.md, dr-sarah-chen.md, Explore.md, helixsupport.md, indy-dandev.md, maria-santos.md, meta-agent.md, Plan.md, rag-builder.md, redzo-reviewer.md, thaer-sabri.md)
- DUPLICATE: 11 (backend-builder.md, backend-dev.md, code-reviewer.md, code-simplifier.md, database-dev.md, design-builder.md, devops-dev.md, frontend-builder.md, frontend-dev.md, fullstack-dev.md, integration-dev.md)
- NEEDS-MAPPING: 20 (agentforge, baseline-comparator, datavera, devils-advocate, distiller, dzevad-jahic, evidence-verifier, finverge, fix-builder, flowforge, jake-wharton, lexicon, mehanik, proxima, resolver, securion, skybound, validator, verifier, vizu)
Note: counts = 11+11+20 = 42. The original "37 unmapped" figure understates by 5 because it excludes alem-clone.md (mapped but disk-missing) and overcounts mapped agents that are actually absent.
2. Persona Dirs Deep Dive
All 12 persona dirs have a consistent structure: agents/, blueprints/, brand/, CLAUDE.md, company.json, config.json, legal/, ops/, README.md, skills/, state/, tools/.
| Persona | Has README | Has CLAUDE.md | Has company.json | Agents inside (count) | Owner in company.json | In specialist-mapping.json |
|---|---|---|---|---|---|---|
| AgentForge | YES | YES | YES (domain: AI) | 8 | N/A | Partial (3 individuals mapped, no company dispatcher) |
| Axiom | YES | YES | YES (domain: ARCHITECTURE) | 5 | N/A | NO — completely absent |
| CodeCraft | YES | YES | YES (domain: DEVELOPMENT) | 8 | N/A | Partial (6 individuals mapped) |
| Datavera | YES | YES | YES (domain: DATA) | 8 | N/A | NO — completely absent |
| Finverge | YES | YES | YES (domain: FINANCE) | 9 | N/A | Partial (1 individual mapped) |
| FlowForge | YES | YES | YES (domain: DEVOPS) | 10 | N/A | Partial (1 individual mapped) |
| Lexicon | YES | YES | YES (domain: DOCUMENTATION) | 9 | N/A | NO — skillforge.md maps to "Skillforge" not Lexicon |
| Proveo | YES | YES | YES (domain: QA) | 8 | N/A | Partial (6 individuals mapped) |
| Resolver | YES | YES | YES (domain: SYSTEMIC) | 8 | N/A | NO — completely absent |
| Securion | YES | YES | YES (domain: SECURITY) | 8 | N/A | Partial (2 individuals mapped) |
| Skybound | YES | YES | YES (domain: PRODUCT) | 7 | N/A | Partial (2 individuals mapped) |
| Vizu | YES | YES | YES (domain: DESIGN) | 7 | N/A | Partial (2 individuals mapped) |
Structural finding: All company.json files report owner: N/A. No human/agent owner is recorded for any virtual company. This means there is no machine-readable way to route escalation or accountability.
Persona vs mapping mismatch:
- 95 total agents inside persona dirs (sum of the per-company agent counts in the table above across 12 companies). None of these internal PI agents (builder.yaml, lead.yaml, reviewer.yaml, etc.) appear in specialist-mapping.json, which tracks only the "celebrity" individual agents, not the PI agent swarms inside each company.
3. Chain Coverage
Agents referenced in chains
| Agent | Times referenced in chains | In specialist-mapping.json? | Disk present? |
|---|---|---|---|
| builder | 25 | YES | YES |
| validator | 22 | NO | YES |
| distiller | 21 | NO | YES |
| sentinel-validator | 9 | YES | YES |
| minion | 5 | NO | NOT in ~/.claude/agents/ (in definitions/ only) |
| planner | 4 | NO | NOT in ~/.claude/agents/ at all |
Critical: minion and planner are referenced in chains but have NO corresponding .md in ~/.claude/agents/.
- minion.md exists in ~/system/agents/definitions/ but was never synced forward
- planner does not exist in definitions/ or ~/.claude/agents/; it is a phantom agent referenced in 3 chains (plan-build.yaml, plan-build-review.yaml, plan-review-plan.yaml)
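A reference count like the one in the table above can be approximated by scanning chain files for agent fields and comparing against the agents present on disk. The sketch assumes chain steps name their agent with a plain `agent: <name>` line; the audit confirms an `agent:` field for skills, but the chain YAML layout is an assumption here. Both function names are hypothetical.

```python
import re
from collections import Counter

# Assumed step syntax: "agent: name" or "- agent: name", optionally quoted.
AGENT_FIELD = re.compile(
    r"^[ \t]*(?:-[ \t]*)?agent:[ \t]*['\"]?([\w.-]+)['\"]?[ \t]*$",
    re.MULTILINE,
)


def count_agent_refs(yaml_texts):
    """Count how often each agent name appears in the given chain file contents."""
    counts = Counter()
    for text in yaml_texts:
        counts.update(AGENT_FIELD.findall(text))
    return counts


def unreachable_refs(counts, agents_on_disk):
    """Return referenced agents (with counts) that have no file on disk."""
    return {agent: n for agent, n in counts.items() if agent not in agents_on_disk}
```

A regex scan keeps the sketch dependency-free; a real check would more likely parse the YAML properly, since nested or templated agent fields would be missed here.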
Dead chains (0 references anywhere in skills/ or system/)
Chains that are never invoked via skills or daemons:
| Chain | Skill refs | System refs | Verdict |
|---|---|---|---|
| codecraft-api-backend.yaml | 0 | 0 | DEAD |
| codecraft-nextjs-app.yaml | 0 | 0 | DEAD |
| full-review.yaml | 0 | 0 | DEAD |
| minion-bugfix.yaml | 0 | 0 | DEAD |
| minion-docs.yaml | 0 | 0 | DEAD |
| minion-one-shot.yaml | 0 | 0 | DEAD |
| minion-refactor.yaml | 0 | 0 | DEAD |
| minion-security-fix.yaml | 0 | 0 | DEAD |
| plan-build-review.yaml | 0 | 0 | DEAD |
| plan-build.yaml | ~1 (plan-build-test skill ref) | 0 | BORDERLINE |
| plan-review-plan.yaml | 0 | 0 | DEAD |
| scout-flow.yaml | 0 | 0 | DEAD |
| securion-security-review.yaml | 0 | 0 | DEAD |
Note: The skill-*.yaml chains in the chains/ dir are not invoked by name in skills/. They appear to be template definitions, not live dispatch chains. Chains are not invoked via a chain runner — skills embed agents directly via agent: field inline. The chain YAML format appears to be an aspirational DAG definition language that has no runtime executor wired up.
Effectively ALL 35 chain YAMLs are dead — there is no chain runner in the skill system. Skills call agents directly, not via chain files.
4. Dual-Store Consistency
Files in both ~/.claude/agents/ and ~/system/agents/definitions/
48 files exist in both stores. ALL 48 are byte-for-byte SYNCED (diff returned empty for every shared file). The sync script at ~/bin/agent-definitions-sync.sh is working correctly for the files it covers.
Sync gaps
16 files ONLY in ~/.claude/agents/ (not in definitions/) — not covered by sync:
baseline-comparator.md
claude-code-guide.md
devils-advocate.md
dr-sarah-chen.md
dzevad-jahic.md
evidence-verifier.md
Explore.md
fix-builder.md
indy-dandev.md
jake-wharton.md
maria-santos.md
mehanik.md
Plan.md
redzo-reviewer.md
thaer-sabri.md
verifier.md
8 files ONLY in definitions/ (not synced to ~/.claude/agents/) — these agents are UNREACHABLE by Claude Code:
dorota-huizinga.md ← mapped in specialist-mapping.json, should be in ~/.claude/agents/
hadi-hariri.md ← mapped in specialist-mapping.json, should be in ~/.claude/agents/
james-bach.md ← mapped in specialist-mapping.json, should be in ~/.claude/agents/
lee-robinson.md ← mapped in specialist-mapping.json, should be in ~/.claude/agents/
lisa-crispin.md ← mapped in specialist-mapping.json, should be in ~/.claude/agents/
minion.md ← referenced in 5 chain yaml steps, unreachable
sentry-code-simplifier.md ← not in mapping, not in chains
sp-code-reviewer.md ← not in mapping, not in chains
The first 5 are mapped and therefore expected to be dispatched — they cannot be. Any dispatch attempt for dorota-huizinga, hadi-hariri, james-bach, lee-robinson, or lisa-crispin will silently fail or fall back.
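The dual-store comparison above (shared and byte-identical, only in one store, only in the other) maps onto set operations plus a content comparison. A minimal sketch, assuming both stores are flat directories of .md files; `compare_stores` is a hypothetical name and not the existing sync script.

```python
import filecmp
from pathlib import Path


def compare_stores(agents_dir, definitions_dir):
    """Compare two flat directories of .md files by name and by content."""
    agents_dir, definitions_dir = Path(agents_dir), Path(definitions_dir)
    agents = {p.name for p in agents_dir.glob("*.md")}
    definitions = {p.name for p in definitions_dir.glob("*.md")}
    shared = agents & definitions
    # shallow=False forces a byte-for-byte comparison instead of a stat check.
    drifted = sorted(
        name for name in shared
        if not filecmp.cmp(agents_dir / name, definitions_dir / name, shallow=False)
    )
    return {
        "only_in_agents": sorted(agents - definitions),
        "only_in_definitions": sorted(definitions - agents),
        "synced": sorted(shared - set(drifted)),
        "drifted": drifted,
    }
```

In the state reported above, `synced` would hold 48 names, `drifted` would be empty, and the two `only_in_*` lists would hold 16 and 8 names respectively.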
5. Skill → Agent Linkage
Sample of 10 skills with agent dispatch analysis:
| Skill | Agent referenced | Agent in ~/.claude/agents/? | In specialist-mapping.json? |
|---|---|---|---|
| hop-build | No sub-agent dispatch (marker-only skill) | N/A | N/A |
| build | builder (3 parallel), rag-context-for-builder.js (tool) | YES | YES |
| code-review | code-reviewer, securion sub-agent, sentinel-architect | code-reviewer YES (unmapped), securion YES (unmapped dispatcher), sentinel-architect YES (mapped) | |
| debugging | No agent dispatch found in instructions | N/A | N/A |
| deploy-verify | No agent (runs Playwright directly) | N/A | N/A |
| design-system | No agent dispatch | N/A | N/A |
| doc-coauthoring | No named agent dispatch | N/A | N/A |
| fiken-agent | Self-referential meta-skill; dispatches sub-task SKILL.md files | Indirect | N/A |
| financial-overview | No agent dispatch found | N/A | N/A |
| incident-response | References securion agent (remediation) | securion.md YES (unmapped dispatcher) | NO |
Flags:
- code-review skill dispatches code-reviewer (unmapped, 44 skill refs) and securion (unmapped company dispatcher) directly by name
- incident-response references securion as a response agent, but securion.md is NOT in specialist-mapping.json (only individual members are mapped)
- validator is the most-used agent (44 skill files, 22 chain steps) with NO mapping entry
Open Questions
- Chain runner: Is there a chain executor anywhere in the system (~/system/tools/, ~/projects/, pi-orchestrator)? If not, the entire chains/ directory is documentation-only, not executable automation.
- planner agent: Referenced in 3 chains (plan-build, plan-build-review, plan-review-plan) but does not exist on disk anywhere. Was it renamed to distiller or mehanik?
- Axiom, Datavera, Resolver: Three fully-formed virtual companies with persona dirs, README, CLAUDE.md, 5-8 internal agents each, but zero presence in specialist-mapping.json. Are these active companies being used via direct session invocation (not via John routing)?
- anthropic-chief-architect.md: Mapped in specialist-mapping.json, absent from both ~/.claude/agents/ AND definitions/. Was this agent removed intentionally or is it a sync failure?
- company.json owner=N/A: All 12 companies have no human owner. Is there a separate ownership registry, or is this a gap in the accountability chain?
- Lexicon vs Skillforge naming: CLAUDE.md routing table names the company "Lexicon" and lists "Dževad Jahić" as its agent. specialist-mapping.json has skillforge.md mapping to company "Skillforge". These are two different names for what appears to be the same documentation company. Which is canonical?
- ~/.claude/agents/*.md priority: Claude Code loads subagents from ~/.claude/agents/. The definitions/ store is a backup. But 8 agents (5 of them mapped) live only in definitions/ and are therefore unreachable. Is ~/bin/agent-definitions-sync.sh being run on any schedule?
Architectural Concerns (no auto-fix)
A. Mapping covers only 29 of 66 agents (44%) — the layer is too thin to be a reliable routing table.
The specialist-mapping.json is supposed to be John's source of truth for "who builds this?" routing. But the two highest-usage agents in the entire system (validator with 44 skill refs, distiller with 21 chain refs) are absent. Routing decisions based on this file are structurally incomplete.
B. 7 mapped agents unreachable at runtime.
Agents marked as mapped (specialist-mapping.json claims them) but missing from ~/.claude/agents/ will fail silently when dispatched. The mapping implies reachability but does not enforce it. No health check validates the mapping → disk correspondence.
C. The chain YAML layer has no executor.
35 chain YAML files define multi-step agent pipelines, but skills invoke agents directly by name — not via the chain files. The chains/ directory is a documentation artifact, not live infrastructure. All automation currently runs through inline skill → agent calls. This creates a documentation drift risk: chain files will diverge from actual behavior with no mechanism to detect it.
D. 4 virtual companies are phantom — infrastructure without routing.
Axiom, Datavera, Resolver, Lexicon each have: persona dir, README, CLAUDE.md, company.json, 5-9 internal agents. None appear in specialist-mapping.json or John's routing table. They consume disk and cognitive space but cannot be dispatched through the normal John → discover.js → specialist route. Direct session invocation (naming the company in a prompt) is the only access path — undocumented and unreliable.
E. Dual-store sync is manual and partial.
16 agents exist only in ~/.claude/agents/ (single source of truth but no backup). 8 agents exist only in definitions/ (backed up but unreachable). The sync script does not auto-run; it must be manually invoked. This creates continuous drift pressure.
F. planner is a phantom agent in live chains.
Three chains reference an agent named planner that has no .md file anywhere on disk. If these chains were ever executed, planner steps would fail with no error at the mapping layer.
G. No machine-readable owner for any virtual company.
company.json owner: N/A across all 12 companies means there is no way to auto-route escalation, billing, or accountability. This is a governance gap, not a code gap.
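Concern G could be turned into a recurring check. The sketch below assumes each persona dir holds a company.json with a top-level `owner` field, as the audit's "owner: N/A" wording suggests; the exact schema is not shown in the audit, and `companies_without_owner` is a hypothetical name.

```python
import json
from pathlib import Path


def companies_without_owner(personas_root):
    """Return persona dir names whose company.json has no usable owner value."""
    unowned = []
    for company_file in sorted(Path(personas_root).glob("*/company.json")):
        data = json.loads(company_file.read_text())
        owner = data.get("owner")  # assumed top-level field name
        if owner is None or str(owner).strip() in ("", "N/A"):
            unowned.append(company_file.parent.name)
    return unowned
```

With the state described above, this would return all 12 company names; an empty list would indicate the governance gap is closed at the data level.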
Inventory: Daemon Fleet
AI Factory Daemon Fleet Audit — 2026-05-09
Auditor: kelsey-hightower
Timestamp: 2026-05-09T20:48 UTC
Source of truth: launchctl list + daemon-fleet-status.json (generated 2026-05-09T18:33:52Z) + plist reads + error log sampling
Fleet size (watchdog): 148 tracked entries | 47 running keepalive | 74 calendar_ok | 3 down | 20 erroring
Fleet size (launchctl live): 168 rows matching alai/john/no.alai pattern (includes daemons not in watchdog)
1. Live Exit-Code Matrix
Column key: PID (- = not running) | Last Exit | Plist location | KeepAlive policy | Schedule
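The matrix below is built from `launchctl list`, which prints three tab-separated columns (PID, Status, Label) with `-` in the PID column for jobs that are not running. A minimal parsing sketch follows; the bucket names and the rule that any positive last exit counts as erroring are simplifying assumptions, not the watchdog's actual logic.

```python
def parse_launchctl_list(output):
    """Parse `launchctl list` text into dicts with label, pid and last_exit."""
    rows = []
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3 or parts[0] == "PID":  # skip header and odd lines
            continue
        pid_text, status_text, label = parts
        try:
            last_exit = int(status_text)
        except ValueError:
            continue  # unexpected status format; ignore the row
        pid = None if pid_text == "-" else int(pid_text)
        rows.append({"label": label, "pid": pid, "last_exit": last_exit})
    return rows


def bucket(row):
    """Assign a coarse health bucket (illustrative rules only)."""
    if row["last_exit"] > 0:
        return "erroring"
    if row["pid"] is not None:
        return "running"  # includes jobs whose previous exit was a signal
    return "down_exit_0" if row["last_exit"] == 0 else "down_signal"
```

Under these rules a job such as com.john.slack-bot (PID alive, last exit 1) lands in `erroring`, which is why it appears in both the running and the failing tables below.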
1a. RUNNING (keepalive, PID alive, exit 0 or -15/SIGTERM)
| Daemon | PID | Exit | Plist Path | KeepAlive | Schedule |
|---|---|---|---|---|---|
| com.alai.agent-timeout-monitor | 1163 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.cc-api-server | 1183 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.credit-monitor | 1223 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.idle-learning-daemon | 1196 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.litestream | 51452 | 0 | Library/LaunchAgents | always | continuous |
| com.alai.mem0-server | 65706 | -15 (SIGTERM) | Library/LaunchAgents | always | continuous |
| com.alai.mlx-gemma4 | 27321 | 0 | (not in known dirs) | always | continuous |
| com.alai.mlx-qwen25-coder-32b | 31120 | 0 | (not in known dirs) | always | continuous |
| com.alai.mlx-qwen3-32b | 29227 | 0 | (not in known dirs) | always | continuous |
| com.alai.mlx-qwen3-8b | 29488 | 0 | (not in known dirs) | always | continuous |
| com.alai.ollama-serve-v2 | 29100 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.orchestrator-bridge | 1185 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.ram-monitor | 1241 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.task-router | 1200 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.web-learning | 1176 | 0 | system/daemons/launchagents | always | continuous |
| com.john.bookstack-webhook-relay | 1206 | 0 | system/daemons/launchagents | always | continuous |
| com.john.browser-worker | 1211 | 0 | system/daemons/launchagents | always | continuous |
| com.john.caddy-vault | 86082 | 0 | system/daemons/launchagents | always | continuous |
| com.john.cloudflared | 79617 | 0 | system/daemons/launchagents | always | continuous |
| com.john.comms-agent | 1186 | 0 | system/daemons/launchagents | always | continuous |
| com.john.documenso-webhook | 20561 | 0 | system/daemons/launchagents | always | continuous |
| com.john.durable-executor | 1212 | 0 | system/daemons/launchagents | always | continuous |
| com.john.edita-loop | 61758 | 0 | system/daemons/launchagents | always | continuous |
| com.john.email-agent | 92225 | 0 | system/daemons/launchagents | calendar | calendar |
| com.john.email-tracker | 11292 | 0 | system/daemons/launchagents | conditional | conditional |
| com.john.event-dispatcher | 65452 | 0 | system/daemons/launchagents | always | continuous |
| com.john.health-dashboard | 1189 | 0 | system/daemons/launchagents | always | continuous |
| com.john.hook-daemon | 1240 | 0 | system/daemons/launchagents | always | continuous |
| com.john.intake-watcher | 41929 | 0 | system/daemons/launchagents | always | continuous |
| com.john.kenan-hot-web | 1231 | 0 | system/daemons/launchagents | always | continuous |
| com.john.llm-datasette | 1170 | 0 | system/daemons/launchagents | always | continuous |
| com.john.mc-dashboard | 65673 | 0 | system/daemons/launchagents | always | continuous |
| com.john.n8n | 1203 | 0 | system/daemons/launchagents | always | continuous |
| com.john.network-watchdog | 1194 | 0 | system/daemons/launchagents | always | continuous |
| com.john.ops-watchdog | 8782 | -15 (SIGTERM) | system/daemons/launchagents | always | continuous |
| com.john.outbox-processor | 1190 | 0 | system/daemons/launchagents | always | continuous |
| com.john.paste-logger | 1224 | 0 | system/daemons/launchagents | always | continuous |
| com.john.pi-orchestrator | 75750 | 0 | system/daemons/launchagents | always | continuous |
| com.john.slack-bot | 18046 | 1 (last crash exit) | system/daemons/launchagents | always | continuous |
| com.john.tender-dashboard | 1234 | 0 | system/daemons/launchagents | always | continuous |
| com.john.tool-shed | 1191 | 0 | system/daemons/launchagents | always | continuous |
| com.john.vault-keeper | 87005 | 0 | system/daemons/launchagents | always | continuous |
| com.john.vault-proxy | 1222 | 0 | system/daemons/launchagents | always | continuous |
| com.john.youtube-nightly-learning | 83439 | 0 | system/daemons/launchagents | always | continuous |
| no.alai.claude-proxy | 6361 | 0 | Library/LaunchAgents | always | continuous |
| com.alai.rag-drain-worker | 3640 | 1 (prev exit) | system/config/launchagents | always | continuous |
| com.alai.rag-fsevents-adapter | 64755 | 1 (prev exit) | system/config/launchagents | conditional | WatchPaths |
| com.alai.daemon-fleet-watchdog | 2815 | 0 | (Library/LaunchAgents) | calendar | every 15min |
1b. DOWN — Exit 0 (intentional one-shot or conditional)
| Daemon | PID | Exit | Notes |
|---|---|---|---|
| com.john.autocoder-ui | - | 0 | down_exit_0: one-shot complete |
| com.john.draft-sender | - | 0 | down_exit_0: conditional, no pending drafts |
| com.john.orchestrator-http | - | 0 | down_exit_0: DUPLICATE — orchestrator-bridge runs same script on port 3052 |
1c. CALENDAR SCHEDULED — Exit 0 last run (healthy)
These fired successfully on last scheduled run. Not exhaustively listed — watchdog confirms 74 in this state.
Key members: com.alai.apply-knowledge, com.alai.archive-first-scan, com.alai.chain-weekly-report, com.alai.docker-watchdog, com.alai.gcloud-auth, com.alai.john-daily-digest, com.alai.lightrag-backup, com.alai.memory-watchdog, com.alai.meta-agent-loop, com.alai.restore-drill, com.alai.skill-audit, com.alai.team-sync, com.alai.wal-checkpoint, com.alai.weekly-planning, com.alai.zombie-cleanup, com.john.agentforge, com.john.bookstack-sync, com.john.calendar-bridge, com.john.critical-tools-healthcheck, com.john.daemon-health, com.john.db-archival-sweep, com.john.db-backup, com.john.domain-audit, com.john.drift-detector, com.john.email-briefing, com.john.forge-watchdog, com.john.log-rotate, com.john.mc-session-worker, com.john.morning-routine, com.john.offsite-backup, com.john.pi2-override-audit, com.john.review-drain, com.john.session-archiver, com.john.session-extractor, com.john.spam-recovery-scan, com.john.system-guardian, com.john.tldr-actionizer, com.john.tldr-briefing, com.john.tldr-watch, com.john.tldr-weekly-synthesis, com.john.weekly-synthesis, no.alai.email-body-integrity, no.alai.meta-agent, no.alai.resolver, no.alai.spend-guard.
1d. FAILING — Non-zero exit codes
| Daemon | PID | Exit Code | Plist Location | KeepAlive | Schedule |
|---|---|---|---|---|---|
| com.alai.azure-db-backup | - | 1 (exit 256 internal) | system/config/launchagents | none (RunAtLoad=false) | every 4h |
| com.alai.blueprint-fleet-watchdog | - | 1 (exit 256) | Library/LaunchAgents | none | daily 06:15 |
| com.alai.cert-expiry-monitor | - | 1 (exit 256) | system/config/launchagents | none | daily 07:00 |
| com.alai.chain-daily-inbox | - | 1 (exit 256) | Library/LaunchAgents | none | daily 07:00 |
| com.alai.chain-e2e-nightly | - | 1 (exit 256) | Library/LaunchAgents | none | daily 02:00 |
| com.alai.chain-phantom-detector | - | 1 (exit 256) | Library/LaunchAgents | none | every 15min |
| com.alai.cost-daily-report | - | 127 | Library/LaunchAgents | none | daily 23:55 |
| com.alai.daily-planning | - | 127 | Library/LaunchAgents | none | daily 07:30 |
| com.alai.filesystem-audit | - | 1 (exit 256) | Library/LaunchAgents | none | Monday 08:00 |
| com.alai.pi-orch-health | - | 127 | Library/LaunchAgents | none | daily 23:00 |
| com.alai.rag-bookstack-adapter | - | 1 (exit 256) | system/config/launchagents | none | every 5min |
| com.alai.rag-drain-worker | 3640 | 1 (prev exit, now running) | system/config/launchagents | always | continuous |
| com.alai.rag-fsevents-adapter | 64755 | 1 (prev exit, now running) | system/config/launchagents | conditional | WatchPaths |
| com.alai.rag-mc-adapter | - | 1 (exit 256) | system/config/launchagents | none | every 5min |
| com.alai.rdap-audit-quarterly | - | 2 | Library/LaunchAgents | none | quarterly |
| com.john.alaiml-retrain | - | 1 (exit 256) | system/config/launchagents + Library/LaunchAgents | none | 1st of month 03:00 |
| com.john.auto-verify-regression | - | 1 (exit 256) | system/daemons/launchagents | none | daily 06:00 |
| com.john.b2-offsite-backup | - | 1 (exit 256) | system/daemons/launchagents | none | daily 03:30 |
| com.john.bookstack-staleness | - | 1 (exit 256) | system/daemons/launchagents | none | Sunday 22:00 |
| com.john.infra-drift-detector | - | 1 (exit 256) | system/daemons/launchagents | none | Sunday 04:00 |
| com.john.legal-docs-azure-sync | - | 127 | Library/LaunchAgents | Crashed=true | daily 02:00 |
| com.john.lightrag-monitor | - | 2 | system/config/launchagents | none | daily 09:00 |
| com.john.mcp-health-check | - | 127 | Library/LaunchAgents | Crashed=true | every 1h |
| com.john.slack-bot | 18046 | 1 (last crash) | system/daemons/launchagents | always | continuous |
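Several rows above pair exit code 1 with "exit 256". If the 256 values are raw wait statuses as returned by wait(2), which the audit does not state explicitly, the two numbers describe the same event: the exit code lives in the high byte, so 256 decodes to exit code 1. A small sketch using the standard POSIX macros exposed by Python's `os` module (Unix only); `decode_wait_status` is a hypothetical helper.

```python
import os


def decode_wait_status(status):
    """Decode a raw POSIX wait status into ("exit", code) or ("signal", number)."""
    if os.WIFEXITED(status):
        return ("exit", os.WEXITSTATUS(status))
    if os.WIFSIGNALED(status):
        return ("signal", os.WTERMSIG(status))
    return ("other", status)
```

Under the same assumption, a raw status of 32512 (127 shifted left by 8 bits) corresponds to the exit 127 rows, and a raw status of 15 corresponds to termination by SIGTERM.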
1e. NOT LOADED (watchdog knows them, launchctl does not)
| Daemon | State |
|---|---|
| com.alai.lightrag-migrate-pump | not_loaded |
| com.alai.lightrag-outbox-ingest | not_loaded |
| com.alai.lightrag-watchdog | not_loaded |
| com.john.rdap-audit-quarterly | not_loaded |
2. Failure Cohort — Root Cause Analysis
EXIT 127 — Script/binary not found (BROKEN — script deleted)
These five daemons have plists in Library/LaunchAgents pointing to scripts that no longer exist on disk. Exit 127 is bash's "command not found" — the script path itself is gone.
| Daemon | Missing Script | Last Successful Run | Category |
|---|---|---|---|
| com.alai.pi-orch-health | ~/system/tools/pi-orch-health.sh | 2026-05-06 (verdict: CRITICAL) | BROKEN |
| com.alai.cost-daily-report | ~/system/tools/cost-daily-report.sh | 2026-04-29 | BROKEN |
| com.alai.daily-planning | ~/system/tools/daily-planning.sh | unknown | BROKEN |
| com.john.legal-docs-azure-sync | ~/system/daemons/legal-docs-azure-sync.sh | unknown | BROKEN |
| com.john.mcp-health-check | ~/system/tools/mcp-health-check.sh | unknown | BROKEN |
Note on legal-docs-azure-sync and mcp-health-check: Both have KeepAlive.Crashed=true, meaning launchd will restart them on crash. Since they always exit 127, they are in a guaranteed restart loop (throttled). This wastes process spawns indefinitely.
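The exit 127 cohort could be caught before launchd ever runs the job by checking that the paths a plist points at still exist. The sketch reads the standard `Program` and `ProgramArguments` keys with `plistlib`; treating every absolute-path argument as something that must exist is an assumption, and `missing_program_paths` is a hypothetical name.

```python
import plistlib
from pathlib import Path


def missing_program_paths(plist_path):
    """Return absolute paths named by a launchd plist that do not exist on disk."""
    with open(plist_path, "rb") as handle:
        job = plistlib.load(handle)
    candidates = []
    if "Program" in job:
        candidates.append(job["Program"])
    candidates.extend(job.get("ProgramArguments", []))
    missing = []
    for arg in candidates:
        expanded = Path(arg).expanduser()
        # Flags and relative arguments are skipped; only absolute paths are checked.
        if expanded.is_absolute() and not expanded.exists():
            missing.append(str(expanded))
    return missing
```

A non-empty result for a plist is a strong hint that the job will exit 127. An absolute argument can also be an output path that legitimately does not exist yet, so hits should be reviewed rather than acted on automatically.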
EXIT 1 / 256 — Script exists but fails at runtime (BROKEN — dependency missing)
| Daemon | Script | Root Cause | Category |
|---|---|---|---|
| com.alai.rag-bookstack-adapter | rag-bookstack-adapter.js | Queue depth 946 > 500 backpressure gate; never drains because drain-worker cannot reach LightRAG | BROKEN (cascade) |
| com.alai.rag-drain-worker | rag-drain-worker.js | Vaultwarden ETIMEDOUT → CF credentials unavailable → LightRAG unreachable | BROKEN |
| com.alai.rag-mc-adapter | rag-mc-adapter.js | Same backpressure cascade, queue depth 946 | BROKEN (cascade) |
| com.alai.rag-fsevents-adapter | rag-fsevents-adapter.js | Queue depth >500 backpressure, runs but skips all enqueues | BROKEN (cascade) |
| com.alai.azure-db-backup | azure-db-backup.sh | az storage blob upload SIGTERM'd (line 116); temp dirs leaked in /tmp | TRANSIENT |
| com.alai.cert-expiry-monitor | cert-expiry-monitor.sh | Script exists, no error log found; likely network/curl failure | TRANSIENT |
| com.alai.chain-daily-inbox | chain-runner.sh --enqueue daily-inbox-triage | chain-runner.sh exists; failure likely in downstream chain execution | TRANSIENT |
| com.alai.chain-e2e-nightly | chain-e2e-nightly.sh | Script exists; likely Playwright/network dependency failure | TRANSIENT |
| com.alai.chain-phantom-detector | phantom-link-detector.js | Script does NOT exist on disk (MISSING) | BROKEN |
| com.alai.filesystem-audit | ~/bin/anvil-audit.sh | Script exists; last exit 256 may be diff/rename limit warning elevated to exit | TRANSIENT |
| com.alai.blueprint-fleet-watchdog | ~/system/daemons/blueprint-fleet-watchdog.js | Script exists; likely a missing dep or API auth failure | TRANSIENT |
| com.john.alaiml-retrain | ~/ALAI/internal/projects/alaiML/scripts/retrain.sh | Script exists; DUPLICATE plist (both config and Library/LaunchAgents); likely venv path or MC dep failure | BROKEN (duplicate) |
| com.john.auto-verify-regression | auto-verify-regression.js | Script exists; calls claim-verifier.js; probable missing dep or API failure | TRANSIENT |
| com.john.b2-offsite-backup | b2-offsite-backup.sh | B2 storage cap EXCEEDED (403 storage_cap_exceeded) and auth token limit errors | BROKEN (infra) |
| com.john.bookstack-staleness | bookstack-staleness.js | API parse error "Unexpected end of JSON input" on page 2553+; BookStack API truncating responses | BROKEN |
| com.john.infra-drift-detector | infra-drift-detector.sh | diff.renameLimit warning elevated to non-zero exit; git rename detection failing on large repos | TRANSIENT |
| com.john.slack-bot | (node process) | WebSocket pong timeouts (ETIMEDOUT); process alive and heartbeating, but launchd saw a crash exit | TRANSIENT |
EXIT 2 — Logic/health failure
| Daemon | Script | Root Cause | Category |
|---|---|---|---|
| com.alai.rdap-audit-quarterly | plist not found in known dirs | Script path unknown, likely MISSING | BROKEN |
| com.john.lightrag-monitor | lightrag-health-with-alert.sh | Script exits 1/2 when LightRAG is degraded; this is INTENTIONAL ALERTING behavior, but LightRAG IS degraded | EXPECTED (alarm correctly firing) |
3. Producer-Consumer Wiring
RAG Ingest Pipeline (currently DEADLOCKED)
com.alai.rag-fsevents-adapter watches ~/system/evidence, ~/system/specs, ~/system/rules
com.alai.rag-bookstack-adapter polls BookStack API every 5min
com.alai.rag-mc-adapter reads ~/system/logs/mc-task-outcomes.jsonl
--> all three WRITE to ~/system/state/ingest-queue.sqlite (queue depth: 946, frozen)
com.alai.rag-drain-worker (keepalive) reads ingest-queue.sqlite
--> attempts POST to https://lightrag.basicconsulting.no (via CF Access)
--> CF credentials lookup: Vaultwarden ETIMEDOUT (bw-session stale or vault unreachable)
--> LightRAG unreachable → queue never drains → backpressure locks all three producers
ORPHAN OUTPUT: ~/system/metrics/ingest_pipeline.prom written by rag-drain-worker
--> nothing confirmed reading this file (no Prometheus scrape config found in audit)
This is the single most critical broken pipeline in the factory. 946 items queued, zero being processed.
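The deadlock can be illustrated with a toy model. This is a sketch only: the 500-item gate and the starting depth of 946 come from this report, while the function names, the drain batch size, and the per-cycle enqueue count are hypothetical and do not reflect the real adapters' code.

```python
BACKPRESSURE_LIMIT = 500  # gate threshold reported in this audit


def producer_may_enqueue(queue_depth: int) -> bool:
    """Producers skip enqueues while the queue is above the gate."""
    return queue_depth <= BACKPRESSURE_LIMIT


def simulate(depth: int, drain_ok: bool, cycles: int,
             drain_batch: int = 100, enqueue_per_cycle: int = 10) -> int:
    """Return the queue depth after `cycles` rounds (hypothetical batch sizes)."""
    for _ in range(cycles):
        if drain_ok:  # drain-worker can reach LightRAG
            depth = max(0, depth - drain_batch)
        if producer_may_enqueue(depth):
            depth += enqueue_per_cycle
    return depth


frozen = simulate(946, drain_ok=False, cycles=10)    # no drain: depth never moves
recovered = simulate(946, drain_ok=True, cycles=10)  # drain restored: falls below the gate
```

With the consumer down, the depth never changes and the producers never pass the gate, which is why restoring the drain-worker alone is expected to release all three adapters.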
Memory / Knowledge Layer
com.alai.mem0-server (PID 65706, keepalive)
reads/writes: http://localhost:6333 (Qdrant vector store)
produces: REST API on localhost:9000 (port cslistener)
consumed by: discover.js, agent tools calling /v1/memories
STATUS: alive and healthy (health 200, Qdrant 200)
NOTE: exit -15 (SIGTERM) in launchctl = prior graceful restart; current run is clean
com.alai.litestream (PID 51452, keepalive)
reads: SQLite DBs in ~/system/state/ (flywheel.db, health-events.db, etc.)
writes: B2 bucket alai-studio-backup (replication stream)
STATUS: running but b2-offsite-backup.sh (separate) hitting B2 storage cap
com.alai.wal-checkpoint (calendar, exit 0)
reads/writes: SQLite WAL files in ~/system/state/
consumed by: litestream (clean WAL = cleaner replication)
Orchestration Kernel
com.john.pi-orchestrator (PID 75750, keepalive)
reads: Planka MC API (boards.basicconsulting.no per mock config)
writes: ~/system/logs/pi-orchestrator/daemon-*.log
STATUS: running, cycling every 30s, "No eligible tasks" — running in MOCK MODE
NOTE: alai-config-mock.json loaded; real config resolver likely not resolving
com.alai.orchestrator-bridge (PID 1185, keepalive)
runs: orchestrator-http-server.js on port 3052
produces: HTTP API for triggering orchestrator actions
STATUS: running healthy
com.john.orchestrator-http (down_exit_0)
DUPLICATE of orchestrator-bridge — same script, same port (3052)
Watchdog says down_exit_0: port already bound by bridge when this tried to start
ORPHAN: plist in Library/LaunchAgents, shadow of orchestrator-bridge
Backup Layer
com.john.b2-offsite-backup (calendar, exit 1)
reads: ~/system/state/ SQLite snapshots
writes: B2 bucket alai-studio-backup
STATUS: BLOCKED — B2 storage cap exceeded (403)
com.alai.azure-db-backup (calendar, exit 1)
reads: Azure SQL databases (via az CLI)
writes: Azure Blob Storage (via ~/system/daemons/azure-db-backup.sh)
STATUS: TRANSIENT failures, az upload SIGTERM'd (timeout in script or process kill)
ORPHAN TEMP: /tmp/az-backup-* directories leaking (rm fails on non-empty dirs)
Comms / Slack
com.john.slack-bot (PID 18046, keepalive)
reads: Slack WebSocket (socket-mode)
writes: Slack messages, ~/system/logs/slack-bot.log
STATUS: alive, heartbeating, WebSocket reconnects successfully (~once per session)
CONCERN: 300min silent (no incoming Slack messages received in 5h as of audit time)
no.alai.email-body-integrity (calendar, exit 0)
reads: IMAP one.com (email body verification)
writes: ~/system/logs/email-integrity.log
STATUS: healthy last run
Monitoring / Health
com.john.lightrag-monitor (calendar, exit 2)
reads: LightRAG API health endpoint
writes: /tmp/lightrag-task-context.json, ~/system/evidence/lightrag-health-*.md
STATUS: correctly reporting LightRAG as degraded; Slack alert delivery ALSO failing
ORPHAN OUTPUT: lightrag-health-*.md files accumulating in ~/system/evidence/
(rag-fsevents-adapter trying to enqueue these — but queue full — circular feedback)
com.alai.daemon-fleet-watchdog (PID 2815, every 15min)
reads: launchctl list, all plist dirs
writes: ~/system/state/daemon-fleet-status.json
STATUS: healthy, data current as of 18:33:52Z today
com.alai.pi-orch-health (calendar, exit 127)
was: reads pi-orchestrator state, writes ~/system/state/pi-orch-health-*.json
STATUS: BROKEN — script deleted. Last known verdict (2026-05-06): CRITICAL
MLX / Inference Layer
com.alai.mlx-gemma4 (PID 27321)
com.alai.mlx-qwen3-32b (PID 29227)
com.alai.mlx-qwen3-8b (PID 29488)
com.alai.mlx-qwen25-coder-32b (PID 31120)
com.alai.ollama-serve-v2 (PID 29100)
STATUS: all running (keepalive), exit 0
PRODUCES: inference endpoints on ANVIL (local)
Note: plists not found in audited dirs — loaded from unknown location (possibly ~/Library/LaunchAgents subdirs)
4. Critical-Path Daemon Assessment
com.john.pi-orchestrator
- PID: 75750 | Exit: 0 | Status: RUNNING
- Healthy? Process is alive and cycling every 30s. However, it is running in MOCK MODE (alai-config-mock.json). The config resolver is not resolving real service URLs (Planka localhost:3100 is not listening per MEMORY.md). "No eligible tasks" every cycle.
- Produces: Cycle logs to ~/system/logs/pi-orchestrator/daemon-stdout.log
- Consumes: MC/Planka API (currently mocked, not reaching real board)
- Verdict: Process alive but effectively IDLE. Not orchestrating anything. Mock mode = silent failure.
com.alai.pi-orch-health
- PID: - | Exit: 127 | Status: BROKEN
- Root cause: ~/system/tools/pi-orch-health.sh was deleted. Script ran last on 2026-05-06 with verdict CRITICAL. Now permanently broken until script is restored.
- Produces: ~/system/state/pi-orch-health-*.json (last written 2026-05-06)
- Verdict: BROKEN. Monitoring of the orchestrator kernel has gone dark.
com.alai.mem0-server
- PID: 65706 | Exit: -15 (prior SIGTERM) | Status: ALIVE AND HEALTHY
- Root cause of -15: launchctl records the exit code of the previous run; the current process (PID 65706) started clean. SIGTERM was a graceful restart, not a crash.
- Evidence: Port 9000 listening (lsof confirmed), /health returns 200, Qdrant at localhost:6333 returns 200.
- Note: /v1/memories returning 404; API route may have changed or not yet initialized.
- Verdict: ALIVE. Exit -15 is misleading; current instance is healthy.
com.john.lightrag-monitor
- PID: - | Exit: 2 | Status: EXPECTED ALARM
- Root cause: Script correctly exits non-zero when LightRAG is degraded. LightRAG IS degraded (drain-worker cannot reach it due to missing CF credentials). Slack alert also failing (alert delivery broken).
- Produces: ~/system/evidence/lightrag-health-*.md, /tmp/lightrag-task-context.json
- Verdict: Monitor itself is working correctly. The degradation it reports is real and severe.
com.alai.lightrag-keepwarm
- PID: - | Exit: 0 | Status: calendar_ok
- Plist location: ~/Library/LaunchAgents/com.alai.lightrag-keepwarm.plist
- Schedule: unknown (plist content not captured in this audit; found late)
- Produces: Keepwarm pings to LightRAG
- Verdict: Last run exited 0. Likely the keepwarm pings succeed against the local endpoint even while drain-worker cannot auth through CF Access. Not broken.
com.alai.archive-first-scan
- PID: - | Exit: 0 | Status: calendar_ok | Schedule: daily 06:00
- Script: ~/bin/archive-first-scan.sh (EXISTS)
- Produces: /tmp/archive-first-scan-report-<date>.txt, writes to ~/system/state/archive-first-ledger.jsonl
- Consumes: Filesystem scan of unarchived candidates
- Verdict: HEALTHY. Running as designed.
com.john.session-archiver
- PID: - | Exit: 0 | Status: calendar_ok | Schedule: daily 03:00
- Script: ~/system/tools/session-archiver.js (EXISTS; 10928 bytes, 2026-02-23)
- Produces: Cleaned-up session artifacts
- Consumes: Claude session logs/state
- Verdict: HEALTHY. Last run clean.
com.alai.cost-daily-report
- PID: - | Exit: 127 | Status: BROKEN | Schedule: daily 23:55
- Root cause: ~/system/tools/cost-daily-report.sh deleted. Last successful run 2026-04-29.
- Produces: ~/system/reports/cost-daily.md
- Consumes: Cost tracker data
- Verdict: BROKEN — daily cost visibility dark for 10 days.
com.alai.weekly-planning
- PID: - | Exit: 0 | Status: calendar_ok | Schedule: Tuesday 08:00
- Script: ~/system/tools/weekly-planning.sh (MISSING from disk)
- BUT watchdog says last exit was 0 and state is calendar_ok. Contradiction.
- Likely explanation: Ran successfully before script was deleted; launchd has not triggered it since (last Tuesday before deletion date). Will fail as exit 127 next Tuesday.
- Verdict: TICKING TIME BOMB — will fail next Tuesday 08:00.
no.alai.email-body-integrity
- PID: - | Exit: 0 | Status: calendar_ok | Schedule: daily 03:00
- Script: ~/system/tools/email-body-integrity-check.js (EXISTS)
- Produces: ~/system/logs/email-integrity.log
- Verdict: HEALTHY.
5. Daemon-Fleet-Watchdog State
File: ~/system/state/daemon-fleet-status.json
Generated: 2026-05-09T18:33:52Z (approx 2h15m before this audit)
Watchdog summary from file:
total: 148
running: 47 (keepalive processes alive)
calendar_ok: 74 (last scheduled run exit 0)
down: 3 (down_exit_0: autocoder-ui, draft-sender, orchestrator-http)
err: 20 (non-zero exit codes)
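As a quick consistency check on the summary, the four listed states can be added up and compared with the total. This is a sketch; reading the remainder as the not_loaded daemons mentioned in the accuracy notes is an assumption, not something the status file is confirmed to state.

```python
# Counts copied from the watchdog summary in this report.
summary = {"running": 47, "calendar_ok": 74, "down": 3, "err": 20}
total = 148

accounted = sum(summary.values())  # daemons covered by the four listed states
unaccounted = total - accounted    # remainder not explained by those states
```

The four states account for 144 of 148 entries, leaving 4, which matches the number of daemons reported as not_loaded.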
Watchdog accuracy notes:
- Watchdog correctly identifies 20 erroring daemons but exit codes are internally translated (256 = bash exit 1; 32512 = bash exit 127).
- Watchdog does NOT cover all 168 launchctl rows; 4 daemons are marked not_loaded (lightrag-migrate-pump, lightrag-outbox-ingest, lightrag-watchdog, rdap-audit-quarterly).
- com.alai.mem0-server shows last_exit: 15 (SIGTERM of prior instance) but state: running. This is correct: the current instance is healthy.
- com.john.slack-bot shows running/pid 18046 but last_exit: 256. launchd records the last crash before the current keepalive restart; the process is currently alive.
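The translated codes in the first note are consistent with raw POSIX wait statuses, where a normal exit code sits in the high byte and a terminating signal in the low 7 bits. The helper below is a minimal sketch under that assumption; decode_wait_status is a hypothetical name and not part of the watchdog.

```python
def decode_wait_status(raw: int) -> str:
    """Translate a raw wait status into 'exit N' or 'signal N'.

    Assumes the classic POSIX layout: low 7 bits hold the terminating
    signal (0 if the process exited normally), the next byte holds the
    exit code.
    """
    signal_number = raw & 0x7F
    if signal_number:
        return f"signal {signal_number}"
    return f"exit {(raw >> 8) & 0xFF}"


examples = {raw: decode_wait_status(raw) for raw in (0, 15, 256, 512, 32512)}
```

Under this reading, 256 and 32512 decode to exit 1 and exit 127, and the mem0-server value of 15 decodes to signal 15 (SIGTERM), in line with the notes above.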
Open Questions
- Pi-orchestrator mock mode: Why is alai-config-mock.json being loaded instead of real config? Is the Planka/MC API intentionally offline, or is the config resolver broken? The orchestrator is spinning idle.
- LightRAG CF credentials: Vaultwarden ETIMEDOUT in rag-drain-worker. Is /tmp/bw-session stale? Is Vaultwarden (vault.basicconsulting.no) reachable? This single broken auth is deadlocking the entire RAG ingest pipeline (946 items queued).
- B2 storage cap: 403 storage_cap_exceeded on Backblaze B2. Is this a billing cap that needs to be raised in the B2 console? Litestream is still replicating but the nightly snapshot job fails.
- Five deleted scripts: Who deleted pi-orch-health.sh, cost-daily-report.sh, daily-planning.sh, legal-docs-azure-sync.sh, mcp-health-check.sh? Were they intentionally removed (deprecated)? If deprecated, the plists should be unloaded. If accidental deletion, restore from backup.
- Duplicate alaiml-retrain plist: Plist exists in BOTH system/config/launchagents AND Library/LaunchAgents. Two crons would fire. Which is canonical?
- com.john.orchestrator-http duplicate: Identical to com.alai.orchestrator-bridge (same script, same port). orchestrator-http shows down_exit_0 because bridge already bound the port. Dead plist.
- LightRAG health-*.md circular feedback: The lightrag-monitor evidence files are being watched by rag-fsevents-adapter, which tries to enqueue them into LightRAG. A monitoring artifact is feeding back into the broken pipeline it monitors.
- Slack bot silent 300 min: No incoming Slack messages for 5h at audit time. Is anyone sending messages? Or is the Socket Mode token scope broken for receiving?
Highest-Leverage Fix Candidates (audit-level only)
Priority 1 — Unlocks entire RAG pipeline (946 items unblocked)
- Fix rag-drain-worker CF Access credentials: ensure Vaultwarden item "LightRAG-CF-Access" exists and /tmp/bw-session is valid. One credential fix unblocks bookstack-adapter + mc-adapter + fsevents-adapter simultaneously.
Priority 2 — Restore cost visibility (10-day blind spot)
- Restore or recreate ~/system/tools/cost-daily-report.sh. Last output was 2026-04-29. CEO-visible reporting dark for 10 days.
Priority 3 — Fix orchestrator mock mode
- Determine why pi-orchestrator loads mock config. If Planka/MC API is down, restore it. If config resolver is broken, fix alai-config.js. The orchestration kernel is running but doing nothing.
Priority 4 — Raise B2 storage cap
- B2 bucket alai-studio-backup has hit its cap. Nightly database snapshots are not landing. This is a billing action in the Backblaze console, not a code fix.
Priority 5 — Unload dead plists (5 scripts deleted)
- com.alai.pi-orch-health, com.alai.cost-daily-report, com.alai.daily-planning, com.john.legal-docs-azure-sync, com.john.mcp-health-check should either have scripts restored or be unloaded from launchd. legal-docs-azure-sync and mcp-health-check have KeepAlive.Crashed=true, creating infinite restart loops.
Priority 6 — Unload com.john.orchestrator-http duplicate plist
- Dead shadow of orchestrator-bridge. Causes confusion in watchdog counts.
Priority 7 — Restore weekly-planning.sh before next Tuesday
- Script missing but plist active. Will fail exit 127 at 08:00 next Tuesday.
Priority 8 — Fix phantom-link-detector.js missing script
- com.alai.chain-phantom-detector runs every 15min calling a script that does not exist. High-frequency failure (96 times/day).
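Priorities 5, 7 and 8 all concern plists whose script no longer exists on disk. A check along the following lines could surface such jobs before they fail with exit 127. It is a sketch, not an existing tool in this fleet: missing_script_paths is a hypothetical helper, and treating every absolute path in ProgramArguments as a required script is a simplifying assumption.

```python
import os
import plistlib


def missing_script_paths(plist_path: str) -> list[str]:
    """Return absolute paths named by a launchd plist that do not exist on disk."""
    with open(plist_path, "rb") as fh:
        job = plistlib.load(fh)
    candidates = list(job.get("ProgramArguments", []))
    if "Program" in job:
        candidates.append(job["Program"])
    missing = []
    for arg in candidates:
        path = os.path.expanduser(arg)
        # Heuristic: only absolute paths are treated as scripts or binaries;
        # flags and bare command names are skipped.
        if os.path.isabs(path) and not os.path.exists(path):
            missing.append(path)
    return missing
```

Run against each plist directory, any non-empty result marks a job that should either have its script restored or be unloaded.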
Verifier Autonomy Audit
AI Factory Audit — Plan Task 2.2: Verifier Autonomy
Date: 2026-05-09 Auditor: Martin Kleppmann (CodeCraft) Classification: AUDIT-ONLY — read-only, no mutation, no live invocation
VERDICT SUMMARY (up front)
Autonomy verdict: ABSENT
The /verify-fix-loop skill is fully specified and internally consistent, but it has zero wiring into any automated trigger path. CEO is the de-facto verifier for every task that reaches mc.js ready. The skill exists only as a manually-invoked slash command.
1. End-to-End Trace of /verify-fix-loop
Source: ~/.claude/skills/verify-fix-loop/SKILL.md
Flow map
Caller (John / human) invokes: /verify-fix-loop mc_id=<N> spec_path=<path>
│
▼
SKILL orchestrates in main conversation thread (not a sub-agent itself)
│
├─ mkdir -p /tmp/verify-fix-loop-<mc_id>/ (EVIDENCE_DIR)
│
▼
LOOP (max 3 iterations):
│
├─ Step A: Task(subagent_type=verifier OR general-purpose+persona)
│ prompt = verifier brief template (inline in SKILL.md)
│ verifier writes: EVIDENCE_DIR/verifier-loop<N>.md (mandatory)
│ /tmp/verifier-feedback-<mc_id>.md (if CONFIDENCE=FEEDBACK)
│
├─ Step B: Parse STATUS + CONFIDENCE from verifier output
│
├─ Step C: Branch
│ PERFECT / VERIFIED → write SUMMARY.md (SUCCESS), exit
│ PARTIAL → if high_stakes: ESCALATE; else: SUCCESS_WITH_NOTES, exit
│ FAILED → ESCALATE (harness broken)
│ FEEDBACK:
│ if high_stakes or budget exhausted → ESCALATE
│ else →
│
├─ Step D: Task(subagent_type=fix-builder OR general-purpose+persona)
│ reads /tmp/verifier-feedback-<mc_id>.md
│ applies prescribed edits to spec_path via Edit tool
│ returns APPLIED:<N> / PARTIAL:<N>/<M> / COULD_NOT_APPLY:<reason>
│
└─ LOOP_INDEX += 1 → back to Step A
Domain escalation policy
- docs, system, refactor, polish: loops up to MAX_LOOPS (default 3)
- security, finance, legal, deploy, infra, unknown: ESCALATE on first FEEDBACK (no autonomous correction)
Loop budget
- Default MAX_LOOPS = 3
- Hard cost cap: $5 per skill invocation
- Per-loop cost estimate: $0.40–0.60 (Sonnet)
- Worst case: 3 × $0.60 = $1.80
Termination conditions
- CONFIDENCE in {PERFECT, VERIFIED} → SUCCESS
- CONFIDENCE == PARTIAL + not high_stakes → SUCCESS_WITH_NOTES
- Budget exhausted (LOOP_INDEX == MAX_LOOPS with FEEDBACK) → ESCALATE
- High-stakes domain with FEEDBACK on first iteration → ESCALATE
- Any FAILED confidence → ESCALATE (harness broken)
- fix-builder returns COULD_NOT_APPLY → ESCALATE
- MC status changes to done/cancelled mid-loop → ABORT silently
- Cost estimate exceeds $5 → ESCALATE before next iter
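The branch and termination rules above can be condensed into a single decision function. This is a sketch of the documented policy, not code from SKILL.md: next_step and the DISPATCH_FIX_BUILDER label are illustrative names, loop_index is assumed to be 1-based, and treating an unrecognised confidence value as ESCALATE is an assumption. The COULD_NOT_APPLY and mid-loop MC status checks are left out for brevity.

```python
HIGH_STAKES = {"security", "finance", "legal", "deploy", "infra", "unknown"}
COST_CAP_USD = 5.0  # hard cap per skill invocation


def next_step(confidence: str, domain: str, loop_index: int,
              max_loops: int = 3, projected_cost_usd: float = 0.0) -> str:
    """Map one verifier verdict to the loop's next step."""
    high_stakes = domain in HIGH_STAKES
    if confidence in ("PERFECT", "VERIFIED"):
        return "SUCCESS"
    if confidence == "PARTIAL":
        return "ESCALATE" if high_stakes else "SUCCESS_WITH_NOTES"
    if confidence == "FEEDBACK":
        if high_stakes or loop_index >= max_loops:
            return "ESCALATE"  # no autonomous correction, or budget exhausted
        if projected_cost_usd > COST_CAP_USD:
            return "ESCALATE"  # cost cap would be exceeded by another iteration
        return "DISPATCH_FIX_BUILDER"
    # FAILED, or anything unrecognised (assumption: treat as a broken harness)
    return "ESCALATE"
```

For example, next_step("FEEDBACK", "docs", 1) continues the loop, while the same verdict for domain "infra", or on the third iteration, escalates.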
Entry points (who can call this)
The SKILL.md lists trigger phrases: "verify-fix-loop", "auto-verify and fix", "verifier loop", "ne idi preko mene", "loop until pass". All trigger phrases are designed for human invocation in a conversation. No programmatic entry points exist.
2. Auto-Invocation Analysis — The Central CEO Question
pi-orchestrator.js
Grep result: ZERO matches for verify-fix-loop, verifier, fix-builder in ~/system/kernel/pi-orchestrator.js.
The orchestrator's post-completion flow (reportCompletion function, lines ~3781–3930) does:
- Hallucination detection (regex-based detectHallucination)
- Proof-of-work check (GOTCHA file or response length)
- qa-19 Check #20 (endpoint verification, if configured)
- Postflight marker write to ~/system/state/postflight-cleared-<id>.json
None of these steps call the verifier, fix-builder, or verify-fix-loop skill.
The "postflight" referenced in pi-orchestrator is a file marker write, NOT the /task-postflight skill.
task-postflight skill
Grep result: ZERO matches for verify-fix-loop, verifier, fix-builder in ~/.claude/skills/task-postflight/SKILL.md.
The /task-postflight skill dispatches Angie Jones (Proveo) for AC-checklist QA, not the atomic-claim verifier. These are parallel, non-overlapping verification patterns:
- Proveo = human-readable AC checklist with pass/fail verdicts per item
- Verifier = atomic claim decomposition with machine-verified proof citations
Hooks directory
Grep result: Only archive files matched. No active hook in ~/.claude/hooks/ references verify-fix-loop, verifier, or fix-builder.
Active hooks audited:
- liveness-claim-validator.sh: PostToolUse on Write/Edit; checks for bare liveness claims in memory/spec/agent files. Not related to verifier dispatch.
- mc-ready-gate.sh: wrapper for mc.js ready; runs ZAKON #30 direct-probe gate + evidence-contract-validator. Does NOT invoke verify-fix-loop.
- evidence-contract-validator.sh: validates verdict JSON schema + sha256 chain. Shell-based, no agent dispatch.
- cross-session-claim-gate.sh, session-task-lock-gate.sh, plan-completeness-gate.sh, pre-dispatch-gate.sh: none reference verifier.
Daemon fleet
Grep result: ZERO matches for verify-fix-loop, verifier, fix-builder in ~/system/daemons/.
LaunchAgents
Grep result: ZERO matches in ~/Library/LaunchAgents/.
VERDICT: ABSENT
The verify-fix-loop and its constituent agents (verifier, fix-builder) have zero automated entry points. The only invocation path is a human typing a trigger phrase in a Claude Code conversation. CEO is always in the loop because there is no loop without CEO.
3. Tool-Surface Security Check
Verifier (read-only)
Definition file: ~/.claude/agents/verifier.md
Declared tools: tools: Read, Grep, Glob, Bash
The tools: field includes Bash. This is the critical point.
The agent definition does NOT use a tool whitelist that removes Write/Edit/Task at the API level. It relies entirely on prompt-level enforcement ("Enforcement is prompt-only — this rule is yours to honor. You are the gatekeeper."). The verifier.md explicitly states this.
Permitted Bash commands (per prompt whitelist in verifier.md):
- cat, head, tail, wc, ls, file, stat
- diff, git read-only subcommands
- grep, rg, find (via tool preferred)
- jq, node -e (read-only expression)
- node ~/system/tools/mc.js show (read-only subcommands only — NEVER add|start|done|ready|update|pause|cancel)
- gh pr view, gh issue view, gh api -X GET
- sqlite3 -readonly, psql SELECT only
- curl -sI (HEAD), curl -s GET (never POST/PUT/DELETE)
- bash -n, shellcheck, node --check (dry-run linters)
Escape paths documented:
- The prompt says "NEVER run: rm, mv, cp (to non-/tmp/), chmod, chown, ln" and "Redirections that write outside /tmp/verifier-* or /tmp/<task_id>-evidence/: >, >>, tee to other paths".
- This is prompt-level enforcement only. A model that deviates from these instructions could still run bash -c "echo foo > ~/system/some-file.txt"; the agent framework does not block it at the API tool-call level.
- The tools: Bash declaration gives the agent full shell access; the prompt whitelist is self-enforced.
- Feedback file writes are permitted to /tmp/verifier-feedback-<TASK_ID>.md specifically.
Verdict on verifier tool isolation: Prompt-enforced, not API-enforced. Read-only is a behavioral constraint, not a structural constraint. The risk is manageable for a trusted model, but nothing at the tool layer bounds it.
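To make the behavioral versus structural distinction concrete, the sketch below shows what a structural pre-execution check on Bash commands might look like. It is purely illustrative: nothing like it exists in the agent framework as audited, is_allowed is a hypothetical name, and the allowlist is a deliberately conservative subset of the prompt whitelist (it rejects any command containing shell metacharacters, including some legitimate read-only ones).

```python
import shlex

# Subset of the prompt whitelist; `find` is left out because -exec/-delete can mutate.
READ_ONLY_BINARIES = {"cat", "head", "tail", "wc", "ls", "file", "stat",
                      "diff", "grep", "rg", "jq"}
# Characters that enable redirection, chaining, or substitution.
SHELL_METACHARACTERS = (">", "<", "|", ";", "&", "`", "$(", "\n")


def is_allowed(command: str) -> bool:
    """Conservatively accept only a single whitelisted read-only command."""
    if any(meta in command for meta in SHELL_METACHARACTERS):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes
        return False
    return bool(argv) and argv[0] in READ_ONLY_BINARIES
```

Under this check the escape example from the list above, bash -c "echo foo > ~/system/some-file.txt", is rejected twice over: bash is not on the allowlist and the command contains a redirection.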
Fix-builder (edit-only, scoped)
Definition file: ~/.claude/agents/fix-builder.md
Declared tools: tools: Read, Edit, Grep, Glob
The fix-builder tool list explicitly excludes:
- Write (no new file creation)
- Bash (no test runs, deploys, builds, git ops)
- Task (no further dispatch)
This is stronger isolation than the verifier: the tools: field at the agent definition level excludes Bash and Write. If the agent framework enforces declared tools as a whitelist, fix-builder genuinely cannot run shell commands or create new files. It can only read existing files (Read, Grep, Glob) and apply edits to existing files (Edit).
Gap: Fix-builder cannot create new files even when feedback prescribes it. The skill handles this: "If the feedback prescribes creating a new file, mark that fix as COULD_NOT_APPLY" — the loop escalates. This is a by-design limitation, not a bug.
Verdict on fix-builder tool isolation: Structurally scoped (Bash and Write excluded from tools declaration). This is the correct pattern. The verifier should be refactored to match this approach.
4. Synthetic Dry-Trace
Selected task: MC #99389 — "Refactor /mehanik skill to progressive-disclosure pattern" (status: review, owner: pi-orchestrator)
This task was marked mc.js ready (now review) after pi-orchestrator completed it.
What WOULD have happened if /verify-fix-loop were auto-invoked:
Step 0: trigger fired when pi-orchestrator called mc.js ready #99389
→ /verify-fix-loop mc_id=99389 spec_path=~/.claude/skills/mehanik/SKILL.md
domain=docs (inferred from skill file path)
max_loops=3
Step A (iter 1): dispatch verifier
- verifier reads ~/.claude/skills/mehanik/SKILL.md
- verifier reads MC #99389 ACs via mc.js show 99389
- verifier decomposes ACs into atomic claims:
(a) SKILL.md exists and is < N lines (tier-1 constraint)
(b) references/agent-brief.md exists
(c) references/failure-modes.md exists
(d) Skill tool callable post-refactor
- verifier probes each atom with Read/Glob/Bash
Step B: parse CONFIDENCE
If all files exist and SKILL.md is within limits → PERFECT → SUCCESS
If any reference file missing → FEEDBACK
Step D (if FEEDBACK): dispatch fix-builder
- fix-builder reads /tmp/verifier-feedback-99389.md
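- applies Edit to add missing sections or correct line counts in existing files
  (a prescribed new file, such as a missing reference file, would return COULD_NOT_APPLY → ESCALATE)
Step A (iter 2): re-verify → likely PERFECT → write SUMMARY.md → SUCCESS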
Actual closure path used for MC #99389:
The task is in review status. Looking at the review queue (25+ tasks in review), there is no evidence of verifier invocation. The closure path was: pi-orchestrator marked ready → task sits in review queue → CEO/John is the implicit reviewer. This is the CEO-as-verifier pattern the CEO wants to eliminate.
5. Comparison with Existing Patterns
liveness-claim-validator.sh
- Trigger: PostToolUse hook, fires on every Write/Edit/MultiEdit tool call
- Scope: Memory files, spec files, agent definition files matching 4 path patterns
- Mechanism: Shell script reads tool input JSON from stdin, scans written content for bare liveness claims, blocks write if violations found (exit 2)
- Auto-invoked: YES, unconditionally, at the Claude Code hook level
- Why verify-fix-loop is NOT similarly hooked: The liveness validator is a passive scan that reads content already being written. The verify-fix-loop requires active agent dispatch (spawning sub-agents), which cannot be done from a shell hook. Shell hooks can block tool calls; they cannot spawn conversational agents.
This is the fundamental architectural gap: hooks can intercept tool calls synchronously, but spinning up a verify-fix-loop requires an async agent conversation that the hook system cannot initiate.
evidence-verifier agent
File: ~/.claude/agents/evidence-verifier.md
Declared tools: (not in scope of this read — but confirmed the agent exists)
Auto-invoked: NO for the agent itself; YES for a deterministic stand-in. mc-ready-gate.sh calls evidence-contract-validator.sh, a pure shell script that validates JSON schema + file hashes. That script does NOT dispatch the evidence-verifier agent; the agent definition exists for manual invocation. What is auto-invoked at mc.js ready time is the shell script's deterministic (non-LLM) validation.
Pattern difference: The evidence-verifier pattern uses a shell script as the auto-invoke layer (deterministic, no LLM), with the agent definition as a fallback for edge cases. The verify-fix-loop requires LLM reasoning at every step, making shell-script auto-invocation insufficient.
6. Gap Analysis and Fix Proposal (Audit-Level Only)
Root cause of the gap
The verify-fix-loop was designed top-down as a skill (manual invocation). The liveness-claim-validator was designed bottom-up as a hook (automatic). There is no bridge layer that translates "mc.js ready event" → "spawn verify-fix-loop conversation".
The missing component is a postflight agent dispatcher: something that observes the ready event and spawns a verify-fix-loop session as a sub-agent task.
Minimum wiring needed
Option A: PostToolUse hook on mc.js ready
| Element | Detail |
|---|---|
| File to modify | ~/.claude/hooks/mc-ready-gate.sh (already fires on mc.js ready) |
| Addition location | After line 196 (all gates passed — currently execs mc.js directly) |
| Trigger | After mc.js ready succeeds, spawn verify-fix-loop as a background Task |
| Mechanism | mc-ready-gate.sh would write a trigger file to /tmp/vfl-trigger-<mc_id>.json containing mc_id + spec_path + domain; a daemon polls this file |
The problem: mc-ready-gate.sh is a synchronous shell script. It cannot spawn a conversational agent (Task dispatch requires a running Claude Code session). It can only write a file.
Option B: pi-orchestrator.js postflight hook (most natural wiring point)
| Element | Detail |
|---|---|
| File to modify | ~/system/kernel/pi-orchestrator.js |
| Addition location | Inside reportCompletion() function, after line ~3900 (after QA gate passes) |
| What to add | A call to write /tmp/vfl-trigger-<task_id>.json with task metadata |
| Trigger | The daemon below polls this and dispatches |
Option C: /task-postflight skill modification (cleanest for H-tasks)
| Element | Detail |
|---|---|
| File to modify | ~/.claude/skills/task-postflight/SKILL.md |
| Addition location | After Section 2 (PROVEO VALIDATION DISPATCH), add Section 2b |
| What to add | Conditional: if Proveo returns PASS AND task domain is docs/system/refactor, dispatch /verify-fix-loop before writing the postflight marker |
| Trigger | Manual invocation of /task-postflight already exists for H/BLOCKER tasks |
| Advantage | Stays within the skill conversation context — Task dispatch works naturally here |
Recommended wiring (Option C + Option B trigger file):
- Immediate (no new infrastructure): Add a Section 2b to /task-postflight SKILL.md that dispatches /verify-fix-loop when Proveo passes and domain is non-high-stakes. This works today for all tasks that go through /task-postflight.
- Systematic (covers tasks that bypass /task-postflight): Add a trigger file write to pi-orchestrator.js reportCompletion(). A lightweight daemon polls /tmp/vfl-trigger-*.json files and, when a pi-orchestrator session is active, dispatches the verify-fix-loop skill via the existing Claude Code session.
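The trigger-file handoff can be sketched as a writer and a poller. The payload fields (mc_id, spec_path, domain) and the vfl-trigger-<mc_id>.json naming follow the options above; everything else is an assumption. In particular, write_trigger and claim_triggers are hypothetical names, the real write would live in pi-orchestrator.js (JavaScript), and Python is used here only to illustrate the pattern.

```python
import glob
import json
import os
import tempfile


def write_trigger(trigger_dir: str, mc_id: int, spec_path: str, domain: str) -> str:
    """Atomically write vfl-trigger-<mc_id>.json and return its path."""
    final_path = os.path.join(trigger_dir, f"vfl-trigger-{mc_id}.json")
    payload = {"mc_id": mc_id, "spec_path": spec_path, "domain": domain}
    # Write to a temp file first so a poller never sees a half-written trigger.
    fd, tmp_path = tempfile.mkstemp(dir=trigger_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(payload, fh)
    os.replace(tmp_path, final_path)
    return final_path


def claim_triggers(trigger_dir: str) -> list[dict]:
    """Read and remove every pending trigger (one poll cycle)."""
    claimed = []
    for path in sorted(glob.glob(os.path.join(trigger_dir, "vfl-trigger-*.json"))):
        with open(path) as fh:
            claimed.append(json.load(fh))
        os.remove(path)  # consumed; the caller is now responsible for dispatch
    return claimed
```

The atomic rename matters because the writer and the poller run in different processes; the poller still needs an active session to hand each claimed trigger to, which this sketch does not address.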
Loop budget recommendation
- Keep MAX_LOOPS = 3 (matches SKILL.md default)
- For postflight auto-invocation, restrict to docs, system, refactor, polish domains only
- Hard cap: $5 per invocation (already in SKILL.md)
- Add timeout: 5 minutes wall-clock before auto-escalation to CEO
Escalation path when budget exhausted
- Write SUMMARY.md to EVIDENCE_DIR with full loop history
- Call node ~/system/tools/slack.js send alerts "[VFL-ESCALATED] MC #<id> — N/MAX loops used, last verdict: <CONFIDENCE>" (Slack, not CEO direct)
- Set task status to blocked via mc.js block with reason "verify-fix-loop budget exhausted — human review needed"
- John receives Slack alert and decides: (a) override + mark done, (b) dispatch additional builder, (c) extend budget via [CEO_APPROVED] token
Open Questions
- Tool-level enforcement for verifier: Should the verifier's tools: field be changed from Read, Grep, Glob, Bash to Read, Grep, Glob (removing Bash) to achieve structural isolation matching fix-builder? This would break the verifier's ability to run curl -sI, git log, sqlite3 -readonly probes, which are core to its value. The tradeoff is behavioral (current) vs structural enforcement.
- Conversation context for auto-dispatch: Spawning a verify-fix-loop Task requires an active Claude Code conversation. If pi-orchestrator fires after a conversation closes, there is no context to spawn into. Does the system need a persistent "factory session" that stays open to receive postflight dispatches?
- High-stakes domain detection: The SKILL.md defaults unknown domains to HIGH_STAKES (no autonomous correction). For auto-invocation, domain inference from spec path heuristics will frequently return unknown. Should the default be flipped to docs for auto-invoked postflight use cases?
- Proveo vs verifier: overlap management: /task-postflight already dispatches Proveo for AC-checklist QA. If verify-fix-loop is added as Section 2b, tasks will run both Proveo (AC checklist) AND verifier (atomic claims) sequentially. Is this the intended double-verification model, or should one replace the other for certain task types?
- mc.js ready event vs pi-orchestrator ready: Some tasks are marked ready by human John (node ~/system/tools/mc.js ready <id>), others by pi-orchestrator after build completion, and others by /task-postflight. The auto-invocation wiring point differs for each path. A comprehensive solution needs to intercept all three paths.
Evidence Metadata
| Item | Value |
|---|---|
| Files read | 8 |
| Grep/Bash tool calls | 12 |
| Live agent invocations | 0 |
| Mutations | 0 |
| Wall-clock (estimated) | ~18 min |
| Key source files | ~/.claude/skills/verify-fix-loop/SKILL.md, ~/.claude/agents/verifier.md, ~/.claude/agents/fix-builder.md, ~/.claude/skills/task-postflight/SKILL.md, ~/system/kernel/pi-orchestrator.js (lines 3730–3930), ~/.claude/hooks/mc-ready-gate.sh, ~/.claude/hooks/liveness-claim-validator.sh |
BUILD-BLUEPRINT Discipline
2.3 — BUILD-BLUEPRINT Discipline Audit
Date: 2026-05-09 Auditor: sentinel-ba Scope: 17 BUILD-BLUEPRINT.md files + Mehanik gate enforcement
1. Per-Blueprint State Matrix
| # | Path | Bytes | Lines | Last Modified | Status | Project Liveness |
|---|---|---|---|---|---|---|
| 1 | ~/projects/internal/basicfakta/BUILD-BLUEPRINT.md | 11,193 | 323 | 2026-04-29 | SUBSTANTIAL | Last commit 10d ago (auto-backup only) |
| 2 | ~/projects/bookstack-api/BUILD-BLUEPRINT.md | 12,366 | 352 | 2026-04-29 | SUBSTANTIAL | Last commit 5 weeks ago (auto-backup) |
| 3 | ~/projects/pa/BUILD-BLUEPRINT.md | 13,238 | 354 | 2026-04-29 | SUBSTANTIAL | Last commit 10d ago (auto-backup) |
| 4 | ~/projects/alai-system/BUILD-BLUEPRINT.md | 3,520 | 75 | 2026-04-30 | THIN (75 lines, not stub) | Last commit 6d ago (auto-backup) |
| 5 | ~/business/.../products/Tok/BUILD-BLUEPRINT.md | 27,080 | 637 | 2026-04-27 | SUBSTANTIAL | Last commit 10d ago (gradle-wrapper CI fix); active |
| 6 | ~/business/.../products/BasicFakta/BUILD-BLUEPRINT.md | 12,865 | 332 | 2026-03-07 | STALE (63d, no recent activity) | Last commit 9 weeks ago (test/CI only) |
| 7 | ~/business/.../products/Lobby/BUILD-BLUEPRINT.md | 18,707 | 396 | 2026-03-09 | STALE (61d, repo semi-active) | Last commit 6 weeks ago (feat/RLS) |
| 8 | ~/business/.../products/Drop/BUILD-BLUEPRINT.md | 8,846 | 208 | 2026-05-07 | PRESENT (208 lines, recently updated) | Last commit 63 min ago; MOST ACTIVE |
| 9 | ~/business/.../products/DropSrbija/BUILD-BLUEPRINT.md | 10,657 | 386 | 2026-05-08 | SUBSTANTIAL | Last commit 2d ago; git-repo shared with Gotiva (anvil-fs migration) |
| 10 | ~/business/.../products/Plock/BUILD-BLUEPRINT.md | 24,175 | 512 | 2026-04-16 | STALE (23d, repo dormant) | Last commit 5 weeks ago (smoke tests only) |
| 11 | ~/business/.../products/Gotiva/BUILD-BLUEPRINT.md | 27,112 | 556 | 2026-03-11 | STALE (59d) | Last commit 2d ago was chore/anvil-fs (migration commit, not product work) |
| 12 | ~/business/.../products/Bilko/BUILD-BLUEPRINT.md | 38,303 | 530 | 2026-05-08 | SUBSTANTIAL | Last commit 10 min ago; extremely active |
| 13 | ~/business/.../sales/outreach/sintef/BUILD-BLUEPRINT.md | 1,943 | 49 | 2026-04-27 | TEMPLATE/STUB (49 lines, 1,943 bytes; under threshold) | Last commit 2d ago was chore/anvil-fs only |
| 14 | ~/business/.../web/BUILD-BLUEPRINT.md | 4,636 | 110 | 2026-04-27 | THIN | Last commit 2d ago (feat/redirect) |
| 15 | ~/business/.../finance/akershus-fylke/BUILD-BLUEPRINT.md | 1,486 | 33 | 2026-05-08 | TEMPLATE/STUB (33 lines; per MC #99886 Decision 7: "move akershus OUT of products/") | Last commit 2d ago chore only |
| 16 | ~/clients-external/snowit-site/BUILD-BLUEPRINT.md | 3,427 | 67 | 2026-04-28 | THIN | Last commit 2 hours ago (active gitignore hygiene) |
| 17 | ~/clients-external/lumiscare-variants/lumiscare/BUILD-BLUEPRINT.md | 37,426 | 637 | 2026-05-09 | SUBSTANTIAL | Last commit 2 hours ago (security fix); MOST RECENTLY UPDATED |
Summary counts
- SUBSTANTIAL (>10,000 bytes, real content): 8 (basicfakta, bookstack-api, pa, Tok, DropSrbija, Gotiva, Bilko, lumiscare)
- PRESENT / ADEQUATE (200–10,000 bytes, real content): 2 (Drop, alai-system)
- THIN (< 5,000 bytes, functional but sparse): 3 (web, snowit-site, alai-system)
- TEMPLATE/STUB (< 2,000 bytes or <50 lines with no real content): 2 (sintef, akershus-fylke)
- STALE (>30d without update, repo active): 4 (BasicFakta 63d, Lobby 61d, Gotiva 59d, Plock 23d)
Note: The categories overlap, so the counts sum to 19 across 17 files. alai-system is listed under both PRESENT / ADEQUATE and THIN (the matrix marks it THIN), and Gotiva under both SUBSTANTIAL and STALE (the matrix marks it STALE). STALE classification applies where the product repo has had meaningful commits but the blueprint has not been updated. Plock is borderline: at 23d it is under the 30d threshold and its repo is dormant.
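The size thresholds in the summary can be made mutually exclusive by checking them in a fixed order. The sketch below is one possible ordering, not the auditor's method: the function names and the precedence (stub first, then thin, then present) are assumptions, and the "no real content" qualifier is ignored because it cannot be derived from byte or line counts.

```python
def classify_blueprint(size_bytes, line_count):
    """Size-based class only; staleness is tracked as a separate flag."""
    if size_bytes < 2_000 or line_count < 50:
        return "TEMPLATE/STUB"
    if size_bytes < 5_000:
        return "THIN"
    if size_bytes <= 10_000:
        return "PRESENT"
    return "SUBSTANTIAL"


def is_stale(days_since_update, threshold_days=30):
    """A blueprint counts as stale once it is older than the threshold."""
    return days_since_update > threshold_days


# Rows taken from the state matrix: (name, bytes, lines, days since update)
for name, size, lines, age in [
    ("sintef", 1_943, 49, 12),
    ("alai-system", 3_520, 75, 9),
    ("Drop", 8_846, 208, 2),
    ("Gotiva", 27_112, 556, 59),
]:
    print(name, classify_blueprint(size, lines), "STALE" if is_stale(age) else "current")
```

Treating staleness as an independent flag lets a file such as Gotiva be both SUBSTANTIAL and STALE without being counted twice.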
2. Mehanik Gate Truth Check
What Mehanik requires (tool-verified from ~/.claude/agents/mehanik.md)
Phase T of the GOTCHA workflow states:
- ls {project_path}/BUILD-BLUEPRINT.md: MUST exist
- Read the file (confirm contents match task scope)
- Circuit Breaker #2: "BUILD-BLUEPRINT.md not read — evidence of Read call required in session"
Assessment: The requirement is FORMALLY A HARD BLOCK. CB#2 fires if the blueprint is not read (not just present). The hook ~/.claude/hooks/pre-dispatch-gate.sh also enforces a secondary check: it runs blueprint-check.js against the project path stored in the Mehanik cleared token and blocks dispatch if score < 60.
Enforcement quality issues identified
Issue A — Hook is warn-only for missing MC ID. When the Task prompt has no MC #NNNN pattern, the hook exits 0 with a stderr warning only. Tasks dispatched without an MC ID bypass both the Mehanik cleared-token check and the blueprint-score gate entirely.
Issue B — mehanik_session_id: unknown in all inspected tokens. Both tokens inspected (99886 and 100150) show mehanik_session_id: unknown. The cleared token was written, proving Mehanik ran, but the session binding is absent — meaning the hook cannot verify that the same session cleared the task vs. a stale token from a prior session. Token expiry (4h) partially mitigates but does not eliminate this gap.
Issue C — Blueprint score threshold set at 90 but tokens show WARN at 80 and 65. Both inspected dispatches show blueprint_check_result: WARN with scores below the 90 threshold, yet dispatch proceeded. The hook's blueprint-check.js integration exists (~/system/tools/blueprint-check.js is present), but the pre-dispatch hook only exits 2 (block) if verdict is NOT_READY. The WARN path allows dispatch. The 90-point threshold in the token file is never enforced as a gate.
Issue D — Token expiry not enforced in hook. The hook does not parse expires_at from the cleared file. A token written 23 hours ago (within a session restart) would still pass. The 4h expiry in the token is advisory metadata only.
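Issues A through D can be read as four missing checks in one gate function. The following is a hedged sketch of what a stricter gate might look like, not the behaviour of pre-dispatch-gate.sh: the token is modelled as a dict, `blueprint_score` is a hypothetical field name, and the regular expression is an assumed reading of the "MC #NNNN" pattern. Only `expires_at` and `mehanik_session_id` are field names that appear in the inspected tokens.

```python
import re
from datetime import datetime, timezone

MC_ID = re.compile(r"MC #\d+")  # assumed shape of the "MC #NNNN" pattern


def parse_ts(value):
    # Normalise a trailing "Z" so older Python versions can parse the timestamp.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def evaluate_gate(prompt, token, now, threshold=90):
    """Return (allowed, reason). `token` is a dict of cleared-token fields or None."""
    if not MC_ID.search(prompt):
        return False, "no MC ID in prompt"  # Issue A: block instead of warn
    if token is None:
        return False, "no cleared token"
    if token.get("mehanik_session_id", "unknown") == "unknown":
        return False, "token not bound to a session"  # Issue B
    if parse_ts(token["expires_at"]) <= now:
        return False, "token expired"  # Issue D
    if token["blueprint_score"] < threshold:  # hypothetical field name
        return False, "blueprint score below threshold"  # Issue C
    return True, "cleared"


now = datetime(2026, 5, 9, tzinfo=timezone.utc)
token = {
    "mehanik_session_id": "session-1",
    "expires_at": "2026-05-09T01:06:23.121Z",
    "blueprint_score": 65,
}
print(evaluate_gate("MC #100150 staging fix", token, now))
```

Returning a reason string alongside the verdict keeps each bypass path distinguishable in logs, which the current warn-only stderr output does not.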
Sample of 5 recent dispatches
| MC ID | Cleared token exists? | Blueprint cited in token? | Blueprint score | Dispatch allowed? |
|---|---|---|---|---|
| 99886 | YES | Bilko/BUILD-BLUEPRINT.md | 80 (WARN) | YES — WARN not blocked |
| 100150 | YES | Drop/BUILD-BLUEPRINT.md | 65 (WARN) | YES — WARN not blocked |
| 100150 | YES | Drop/DEPLOY-MAP.md cited | n/a | YES |
| 99910 (MC Claim Protocol) | YES (/tmp/mehanik-cleared-99910) | n/a | Not inspectable (token may have expired and been overwritten) | YES |
| 99886 | YES | Bilko — per DOD evidence: "Mehanik CLEAR /tmp/mehanik-cleared-99886" | 80 | YES |
Token count in /tmp: 113 mehanik-cleared tokens present (range: #10063 to #100173). Volume indicates Mehanik is running regularly — it is not being bypassed entirely.
Gate verdict: PARTIALLY REAL. Blueprint presence is hard-blocked. Blueprint read is required and recorded in the token. However, the score-based quality gate (threshold 90) is advisory — WARN scores pass. The session-binding gap means cleared tokens could theoretically be reused across sessions. The missing-MC-ID path is a complete bypass vector.
3. Blueprint-vs-Reality Drift Score
Bilko (MOST ACTIVE)
Blueprint claims:
- "API Framework: Ktor 3.4.0 / Kotlin 2.3.0 on JVM 25" — Cloud Run deployed
- "Database: PostgreSQL 15" — Cloud SQL
- "Status: MVP dev — frontend implemented with mock data, backend built"
Actual state (tool-verified):
- gcloud run services list shows: bilko-api-stage, bilko-api-demo, bilko-web-stage, bilko-web-demo, bilko-intesa-demo all TRUE; bilko-staging-api FALSE (unhealthy)
- Drop is on Azure VM; Bilko is on GCP Cloud Run, consistent with blueprint claim
- Blueprint says "Status: MVP dev" but there are 5 live Cloud Run services including bilko-intesa-demo (suggesting Intesa bank integration demo exists)
Drift score: LOW-MEDIUM. Infrastructure matches. The "MVP dev with mock data" status language is understated given live deployed services. Blueprint was last updated 2026-05-08 (yesterday) — reasonably current.
Drop (MOST RECENTLY COMMITTED)
Blueprint claims:
- "Azure VM vm-drop-prod (Sweden Central)" + docker-compose
- "Database: PostgreSQL 16 via Drizzle ORM in docker-compose on Azure VM"
Actual state (tool-verified):
- curl -sI https://app.getdrop.no returns HTTP/2 200: production is live
- Response headers show nonce-based CSP (Next.js pattern), consistent with Next.js 15 claim
- Blueprint was rewritten 2026-04-30 to fix the AWS phantom; it now correctly reflects Azure VM
- Most recent commit (63 min ago): staging CI/CD OIDC fix. Blueprint does NOT mention staging VM yet (deploy token shows vm-drop-stage staging path)
Drift score: LOW. Production deployment matches blueprint. Staging environment exists in deployment reality but blueprint only covers production — minor documentation lag.
Tok (ACTIVE BUT NO RECENT BLUEPRINT UPDATE)
Blueprint claims:
- "Database: PostgreSQL 15 (Cloud SQL)"
- "PSD2 Cert: QWAC/QSEAL — DigiCert/GlobalSign — mTLS for Croatia"
- "Status: Core implementation complete — all 8 development gates DONE"
Actual state:
- No gcloud run services list results for Tok (not visible in current GCP project scope)
- Blueprint last updated 2026-04-27 (12d ago); last meaningful commit was 10d ago (gradle-wrapper fix unblocking CI since March)
- The gradle-wrapper CI was broken since March 2026, meaning "all 8 gates DONE" may be technically true for code but CI was broken for 6+ weeks
Drift score: MEDIUM. The product-gate claim is technically accurate, but CI was silently broken for 6+ weeks (from March 2026 until the gradle-wrapper fix 10d ago), a fact not reflected in the blueprint status line. The PSD2 cert claim is unverifiable without SSH to the Tok deployment.
4. Cross-Cutting Findings
No holding-company blueprint
~/business/ALAI-Holding-AS/BUILD-BLUEPRINT.md — ABSENT. There is no top-level document explaining how the portfolio of products relates, shared infrastructure, or cross-product dependencies (e.g., Tok feeding Bilko). Each product is an island. This is a gap for new agents onboarding to the system who need portfolio-level context.
Blueprint versioning
Blueprints ARE git-tracked in their respective product repos. git log --follow -- BUILD-BLUEPRINT.md on Bilko shows at least 3 tracked commits; Drop shows the AWS-to-Azure canonical rewrite is a committed event with a clear commit message and MC reference. This is genuine version history — drift can be diagnosed by diffing commits.
However, there is no automated drift alert. Blueprint age vs. commit recency is never surfaced to John or CEO unless a sentinel audit runs manually.
Tenants without blueprints
- ~/system/: has ~/system/BUILD-BLUEPRINT.md (EXISTS, confirmed)
- ~/personal/: NO BLUEPRINT (expected: personal scope, not a product)
- ~/clients-external/: only snowit-site and lumiscare are covered; MEDON client (~/business/ALAI-Holding-AS/pipeline/CodeCraft/clients/MEDON/) has a CHANGELOG.md in its shopify-app but NO BUILD-BLUEPRINT.md. This is a Mehanik bypass vector for any MEDON dispatch.
- DropSrbija blueprint exists but the Gotiva blueprint is 59d stale, yet the git repo for both was recently touched (anvil-fs migration). This creates a false "recently updated" signal.
CHANGELOG without BUILD-BLUEPRINT
Within active project trees (excluding node_modules): MEDON shopify-app has a CHANGELOG.md without a blueprint. All node_modules CHANGELOG.md hits are false positives (dependency changelogs, not ALAI products).
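A scan of this kind can be reproduced with a short directory walk. The sketch below is an illustration under stated assumptions rather than the command the auditor ran: the function name is hypothetical, and it only compares files within the same directory, so a blueprint kept one level above its CHANGELOG.md would still be reported.

```python
import os


def changelog_without_blueprint(root):
    """List directories under `root` holding CHANGELOG.md but no BUILD-BLUEPRINT.md."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune dependency trees in place so their changelogs are never visited.
        dirnames[:] = [d for d in dirnames if d != "node_modules"]
        if "CHANGELOG.md" in filenames and "BUILD-BLUEPRINT.md" not in filenames:
            hits.append(dirpath)
    return sorted(hits)


if __name__ == "__main__":
    # Scans the current directory; substitute a project root as needed.
    for path in changelog_without_blueprint("."):
        print(path)
```

Pruning node_modules during the walk removes the false positives at the source instead of filtering them out afterwards.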
5. Blueprint → Mehanik → Agent Dispatch Trace: MC #99886
Task: CI/CD Standardization — FAZA 2 — canonical refresh (Petter Graff)
Mehanik ran? YES. Token /tmp/mehanik-cleared-99886 present. Timestamp: 2026-05-08T21:06:23.121Z.
Blueprint cited?
- blueprint_read: /Users/makinja/business/ALAI-Holding-AS/products/Bilko/BUILD-BLUEPRINT.md
- This is the Bilko blueprint. The task is a system-wide canonical spec edit, not a Bilko-specific build task.
- The project path assigned was Bilko's path, which means Mehanik's blueprint check was anchored to Bilko even though the deliverables (~/system/specs/cicd-canonical-v3-drafts/) are system-level. This is a scope mismatch in the Mehanik gate: the blueprint read is nominally satisfied but the product checked (Bilko) is not the target of the changes.
Blueprint score: 80/100 (WARN). Dispatch allowed.
Agent output referenced blueprint sections? The DOD evidence in MC #99886 references the task as a "system-wide canonical spec edit" and notes 5 issue-areas in the v3 drafts — none reference Bilko blueprint sections. The blueprint read appears to have been a gate-pass ritual, not a content-informing step.
Dispatch outcome: Deferred (not dispatched to FlowForge) — "executive-side decision to defer flowforge run until parallel work coordinates." The Mehanik clear token was written but the agent run was held. This is the correct behavior per CEO decision, but it reveals that Mehanik clearance does not guarantee agent execution — it is one gate in a multi-gate flow.
Trace verdict: Mehanik ran and wrote a token. The blueprint cited was topically mismatched (Bilko blueprint for a system-spec task). The blueprint score gate passed despite being below threshold. Agent was not dispatched (deferred). Blueprint content did not visibly inform the dispatch.
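The scope mismatch in this trace is mechanically detectable: the deliverable path does not sit under the directory of the blueprint that was read. A minimal sketch of such a check follows. The function name is hypothetical, and expanding ~ to /Users/makinja is an assumption based on the blueprint_read path in the token.

```python
from pathlib import PurePosixPath


def blueprint_covers(blueprint_path, deliverable_path):
    """True if the deliverable lives inside the blueprint's project directory."""
    project_dir = PurePosixPath(blueprint_path).parent
    try:
        PurePosixPath(deliverable_path).relative_to(project_dir)
        return True
    except ValueError:
        # relative_to raises ValueError when the path is outside project_dir.
        return False


bilko = "/Users/makinja/business/ALAI-Holding-AS/products/Bilko/BUILD-BLUEPRINT.md"
deliverable = "/Users/makinja/system/specs/cicd-canonical-v3-drafts/"  # ~ expanded by assumption
print(blueprint_covers(bilko, deliverable))
```

A check like this would flag cross-product or system-level tasks before the blueprint read is accepted as satisfied.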
6. Open Questions
- Mehanik project_path heuristic: How does Mehanik determine which project_path to use when the task is cross-product or system-level? For #99886, Bilko was used for a system-spec task. Is this John's input, or Mehanik's inference? If inference, the blueprint check is unreliable for cross-cutting tasks.
- Score threshold enforcement: The blueprint_threshold_applied: 90 field in cleared tokens is never enforced as a hard gate. Drop scored 65 and dispatch was allowed. Should the threshold be lowered to match operational reality, or should the WARN-to-BLOCK escalation be implemented?
- Token reuse across sessions: mehanik_session_id: unknown in all inspected tokens. Is there a plan to enforce session binding? Without it, a cleared token from a prior CEO session could authorize a dispatch in a new context.
- Gotiva and Lobby stale blueprints: Both products are 59d+ stale. Are they in maintenance mode or abandoned? If active, their blueprints are Mehanik bypass risks for any dispatch: the gate will pass but Mehanik will be reading outdated architecture.
- MEDON client coverage: No BUILD-BLUEPRINT.md exists for the MEDON shopify-app. If John receives a MEDON task, Mehanik's Phase T will fire ls {project_path}/BUILD-BLUEPRINT.md → BLOCKED. Is the MEDON client expected to receive blueprint coverage, or is it out of scope?
7. ROI Lens (sentinel-ba)
Is the blueprint pattern earning its overhead?
Direct value delivered:
- Blueprint presence as a Mehanik gate prerequisite has prevented scope hallucination at the dispatch level. The 113 mehanik-cleared tokens in /tmp represent 113 gate events where someone was forced to confirm a blueprint existed and was read. This is a real forcing function.
- The Drop AWS phantom rewrite (MC #10353) is a concrete example where the blueprint served as the canonical source of truth that agents were required to consult, and where a discrepancy (aspirational AWS docs treated as ground truth) was detected and corrected with a committed blueprint update.
- The Bilko blueprint (38KB, 530 lines, git-tracked) is the most thorough: it provides stack, ADRs, domain context, and deployment architecture. It has demonstrably prevented repeated infra hallucination on Bilko tasks.
Overhead cost:
- 17 blueprints exist, 8 are genuinely substantial. The 2 stubs (sintef/akershus) add near-zero value and should be either expanded or removed (their Mehanik gate pass is hollow).
- Blueprint maintenance is manual and unalerted. Stale blueprints (BasicFakta 63d, Lobby 61d, Gotiva 59d) represent a risk: Mehanik passes the gate but the agent reads outdated architecture. The overhead of writing blueprints is paid; the staleness risk is not managed.
- The 90-point score threshold being advisory-only means the quality gate was designed but not deployed. This is overhead (blueprint-check.js runs on every dispatch) with only partial benefit (WARN path is free).
Net verdict: POSITIVE ROI, but with a quality gap. The blueprint pattern is not theatrical — it is a genuine gate that has caught real hallucinations. However, the enforcement has two systemic weaknesses: (1) stale blueprints pass the gate silently, and (2) the score threshold is never enforced as a block. Fixing these two issues would cost approximately 1–2 hours of system work and would sharply increase the ROI-per-blueprint.
Priority recommendations:
- HIGH: Enforce score threshold or lower it. Either block at score < 60 (matching current floor observed in practice), or officially downgrade the threshold. WARN-at-65-and-dispatch is worse than an honest 60-point threshold that blocks.
- HIGH: Add staleness alert. A daily check: if blueprint last-modified > 30d AND project has had commits in last 14d → surface warning to John. Zero build cost (can be added to existing daemon fleet).
- MED: Expand or remove stub blueprints. sintef (49 lines) and akershus-fylke (33 lines) are hollow gates. MC #99886 Decision 7 already proposes moving akershus out of products/. Execute this and either write a real blueprint or remove the gate.
- LOW: Session binding for Mehanik tokens. Low urgency given the 4h expiry, although Issue D shows the hook does not yet enforce that expiry. mehanik_session_id: unknown should be resolved to prevent cross-session token reuse on long-running tasks.
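The staleness rule in the second recommendation is small enough to state as code. The sketch below is a minimal version under stated assumptions: the function and parameter names are hypothetical, and the dates would in practice come from the file mtime and git log rather than being passed in by hand.

```python
from datetime import date, timedelta


def needs_staleness_warning(blueprint_modified, last_commit, today,
                            max_blueprint_age_days=30, active_window_days=14):
    """Warn when the blueprint is old but the repo has recent commits."""
    blueprint_old = (today - blueprint_modified) > timedelta(days=max_blueprint_age_days)
    repo_active = (today - last_commit) <= timedelta(days=active_window_days)
    return blueprint_old and repo_active


today = date(2026, 5, 9)
# Gotiva: blueprint from 2026-03-11, repo touched 2d ago
print(needs_staleness_warning(date(2026, 3, 11), date(2026, 5, 7), today))
# Bilko: blueprint from 2026-05-08, repo touched today
print(needs_staleness_warning(date(2026, 5, 8), date(2026, 5, 9), today))
```

Raw commit recency will count chore commits such as the anvil-fs migration as activity, so a production version would probably need to filter by commit type to avoid the false "recently updated" signal noted in Section 4.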
Health Matrix
3.1 Health Matrix — Functional Probe Results
Audit date: 2026-05-09 | Auditor: sentinel-tester | Phase: P3 (functional probes)
Health Matrix
| Component | Test | Status | Evidence (cmd + snippet) |
|---|---|---|---|
| A1. mem0/qdrant | POST write (audit-test user) | PARTIAL | curl http://localhost:9000/add -d '{"text":"audit-2026-05-09 ping test","user_id":"audit-test"}' → {"result":{"results":[]},"status":"added"}. Read-back via /search returned count:1 but results:[] — memory acknowledged as added but semantic search returned empty results. Write acknowledged; retrieve path unreliable. |
| A2. LightRAG | GET /health + POST /query | WORKS | curl localhost:9621/health → {"status":"healthy","core_version":"1.4.16","pipeline_busy":false}. POST /query {"query":"what is ALAI","mode":"naive"} → 3-paragraph narrative with citations. Full round-trip confirmed. |
| A3. HiveDB intel | SELECT COUNT(*) FROM intel | WORKS | sqlite3 ~/system/databases/hivemind.db "SELECT COUNT(*) FROM intel;" → 17560. Latest entries dated 2026-05-09 19:11:24. Write-side confirmed via hivemind.js query "ALAI" — 8 results returned, including entries written today. Read AND write both functional. |
| A3b. HiveMind writer | Confirm write path exists | WORKS | node ~/system/agents/hivemind/hivemind.js query "ALAI" → 8 live results with today's timestamps. Writer: daemon-fleet-watchdog posts alerts; email-agent posts task alerts. Multiple live writers confirmed. |
| A4. Chroma | chroma-mcp responsive | BROKEN | curl http://localhost:8000/api/v1/collections → no response (empty). Port 8000 not listening. No chroma process found. chroma-mcp listed in settings.json but no running service. |
| A5. .md auto-memory | Fresh writes landing? | PARTIAL | ls -la ~/.claude/projects/-Users-makinja/memory/ — most recent file mtime is 2026-04-30 16:45 (feedback_validation_enforcement_active). MEMORY.md itself last written 2026-05-09 19:04 (today, by John session). No automated daemon auto-writing .md files found — writes are manual/session-driven only. Memory lands, but no auto-append pipeline. |
| B1. HiveMind read API | Any tool returns intel? | WORKS | node ~/system/agents/hivemind/hivemind.js read --limit 3 returns intel rows. hivemind.js query "ALAI" returns 8 records. P1 claim of "NO read API" is INCORRECT — read API exists and functions. hivemind-mcp.js also exposes hivemind_read, hivemind_query, hivemind_semantic_query. |
| C1. pi-orchestrator | Process running? | PARTIAL | `ps aux |
| C2. pi-orch mock mode | Is it truly mock? | PARTIAL | grep "mock" ~/system/kernel/pi-orchestrator.js — no alai-config-mock.json reference found. Config offlineMode: false, enabled: true. Latest health state shows Verdict: CRITICAL (2026-05-06). Durable-runner bridge healthy. Process running but HTTP port silent and no recent dispatch logs after 2026-03-19. Likely dispatching but to BROKEN downstream (Ollama). |
| D1. Verifier auto-invocation | verify-fix-loop grep | PARTIAL | grep -rn "verify-fix-loop" ~/.claude/skills/ → SKILL EXISTS at ~/.claude/skills/verify-fix-loop/SKILL.md. Skill is MANUAL-TRIGGER only — "Trigger phrases: verify-fix-loop, auto-verify and fix". No daemon or hook auto-invokes it. P2 verdict ABSENT is partially wrong: skill exists but auto-invocation is absent. |
| E1. Library skill | node ~/system/tools/library.js list | WORKS | Returns 13 cookbooks (alai-full:33 skills, dev:17, business:12, security:10, etc.) + 11 defaults. Fully functional CLI. No external endpoint required for list. |
| F1. Mehanik gate | Token files past 7d | WORKS | ls /tmp/mehanik-cleared-* → 10 token files found, all from 2026-05-09. Most recent: mehanik-cleared-100173 created 18:29:30 today. Corresponding MC #100173 (Bilko landing pages UX audit) confirmed open+assigned to vizu. Token→dispatch correlation confirmed. |
| G1. com.alai.pi-orch-health | Daemon exit reason | BROKEN | launchctl print gui/501/com.alai.pi-orch-health → state: not running. Last health report Verdict: CRITICAL (2026-05-06). Scheduled health monitor is itself failing to run consistently. |
| G2. com.alai.cost-daily-report | Daemon exit reason | BROKEN | launchctl print gui/501/com.alai.cost-daily-report → state: not running. No exit code visible via launchctl; likely script dependency failure (BW session or Slack). |
| G3. com.alai.chain-phantom-detector | Script exists? | BROKEN | ls ~/system/daemons/chain-phantom-detector* → NOT FOUND. plist references ~/system/tools/phantom-link-detector.js — script name mismatch or renamed. Daemon registered but script path may differ. |
| G4. com.john.alaiml-retrain | Exit reason | BROKEN | state: not running. Script path: ~/ALAI/internal/projects/alaiML/scripts/retrain.sh — path under old ~/ALAI/ tree (now symlink). Path itself may still resolve via symlink, but script likely fails on missing MLX or stale config. |
| G5. com.alai.weekly-planning | Script exists? | BROKEN | ls ~/system/daemons/weekly-planning* → NOT FOUND. plist references ~/system/tools/weekly-planning.sh. Script absent from daemons dir. |
| H1. RAG ingest queue | Current queue depth | PARTIAL | cat ~/system/state/rag-drain.prom → total 454 (bookstack:442, mc-outcomes:9, evidence:2, specs:1). NOTE: prom file mtime is 2026-04-23 17:59 — 16 days stale. rag-drain-worker went running→down_exit_256 today per HiveMind alert #64900. Queue depth of 454 is last known, not live. P1 claim of 946 appears to be an older snapshot. |
Summary Counts
| Status | Count |
|---|---|
| WORKS | 6 (5 if A3 and A3b are counted as one component) |
| PARTIAL | 6 |
| BROKEN | 6 |
Surprises (Contradictions vs P1/P2)
1. HiveMind READ API EXISTS — P1 claim "no read API" is WRONG
P1 (1.1-memory-plane.md) stated HiveMind has no read/query API. Ground truth: hivemind.js exposes read, query, semantic_query, hybrid_query subcommands, all functional. hivemind-mcp.js wraps all of them as MCP tools. Live query returned 8 results dated today. This is the most significant P1/P2 contradiction.
2. pi-orchestrator HTTP port 8401 dead — process alive but silent
The pi-orchestrator process (PID 75750) is running. Config shows httpPort: 8401. Port 8401 refuses connections. The actual active HTTP bridge is the durable-runner on port 3052 (uptime 1,726,326s = ~20 days). The kernel's own HTTP endpoint never came up, or stopped. Dispatch claims in P1/P2 must be qualified: pi-orch kernel runs, but HTTP control plane uses a different process entirely.
3. RAG queue: 454, not 946 — and the metric is 16 days stale
P1/P2 cited 946 queued. The prometheus file shows 454 and was last written 2026-04-23. The rag-drain-worker crashed today (exit 256). The queue is not draining, the metric is not being updated, and the actual backlog is unknown. True state: drainer is DOWN, queue age unknown.
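The 16-day figure follows directly from the file mtime, and the same arithmetic could guard any consumer of the metric. This is a hedged sketch: the function names and the one-day freshness limit are assumptions, and the probe time of 19:00 on the audit date is approximate.

```python
from datetime import datetime


def metric_age_days(file_mtime, now):
    """Whole days elapsed since the metric file was last written."""
    return (now - file_mtime).days


def is_metric_stale(file_mtime, now, max_age_days=1):
    # max_age_days is an assumed freshness limit, not a documented setting.
    return metric_age_days(file_mtime, now) > max_age_days


prom_mtime = datetime(2026, 4, 23, 17, 59)  # rag-drain.prom mtime from H1
probe_time = datetime(2026, 5, 9, 19, 0)    # approximate audit probe time
print(metric_age_days(prom_mtime, probe_time), is_metric_stale(prom_mtime, probe_time))
```

A reader that refuses values older than the limit would have reported "unknown" instead of repeating 454.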
4. verify-fix-loop SKILL EXISTS — P2 "ABSENT" partially wrong
P2 said verifier auto-invocation is ABSENT. The skill ~/.claude/skills/verify-fix-loop/SKILL.md exists and is indexed. The verdict should be: skill exists as MANUAL-trigger, not auto-invoked by any daemon or hook. P2 was right about auto-invocation being absent but wrong to imply the capability doesn't exist at all.
5. mem0 write acknowledged but search returns empty
mem0 write → status: added. Read-back search → count: 1 but results: []. The qdrant backend is running (health endpoint confirms backend: qdrant, collections: ["mem0migrations","sessions","hivemind","mem0_john","knowledge"]). The "audit-test" user_id has no collection, so add may go into a separate namespace not searched. Not a mem0 failure per se — the route logic for new user_id collections may differ from existing ones. Write side appears functional; retrieval for new users is unconfirmed.
Open Questions
- mem0 user_id routing: Does mem0 create a new Qdrant collection per user_id, and does search also need a pre-existing collection to return results? The audit-test user returned count:1 but empty results. Is this a namespace creation lag or a real retrieval bug?
- pi-orch HTTP port 8401: Why is port 8401 not open even though the process is running? Is the HTTP server initialization gated behind a condition (Ollama health check, etc.) that's failing?
- durable-runner bridge (port 3052) uptime 20 days: This is the actual dispatch layer. Is it processing tasks, or has it been idle since March? No recent task dispatch logs found post-2026-03-19.
- rag-drain-worker exit 256: What is the exact failure? The queue at 454 is stale and not draining. LightRAG is healthy. The ingest pipe is broken somewhere between queue and LightRAG.
- chain-phantom-detector plist vs actual script name: plist says phantom-link-detector.js. Is this the same script? Does it exist under tools/?
- MEMORY.md auto-write: There is no daemon or hook that automatically appends to MEMORY.md. All memory entries are written manually by John during sessions. If a session ends without a write, the event is lost. Is this intentional or a gap?
Petter Synthesis
4.1 — Petter Graff Executive Synthesis
AI Factory Audit — 2026-05-09 Auditor: Petter Graff (CodeCraft — Lead Architect) Synthesizing: P1 reports 1.1–1.4, P2 reports 2.1–2.3, P3 report 3.1 Method: P3.1 live-probe data overrides P1/P2 file-based claims where they contradict.
Section 1 - Executive Summary (translated from Bosnian)
Situation
John has a well-conceived architecture: a control layer with the Mehanik gate, a memory layer with five stores, a RAG pipeline for knowledge, a team of 66 agents across 12 virtual companies, and an orchestrator kernel that is supposed to automate everything. On paper it looks like an AI factory. In reality, 62.5% of the advertised data and control flows are dead or degraded. The system runs as a manual workshop: John personally forwards every task, personally checks it, personally closes it. Automation exists as infrastructure, but it is not connected. What works: the HiveDB/HiveMind intel bus, LightRAG local writes, the Mehanik gate (partially), tools (250+ live), and 74 calendar-scheduled daemons that run correctly. What is theater: pi-orchestrator (live process, no real dispatches since March), verify-fix-loop (the skill exists, nobody ever invokes it automatically), mem0 (93K+ vectors, zero active writers), four "phantom" companies without routing, and 35 chain YAML files without a single executor.
5 most critical gaps (ranked by IMPACT × SEVERITY ÷ EFFORT)
- RAG ingest pipeline: fully blocked (Vaultwarden timeout; 3,150+ items queued (last known snapshot: 454 on 2026-04-23; live SQLite count on 2026-05-09 = 3,150); drain-worker crashed today)
- pi-orchestrator in mock/broken mode: the kernel is alive but has dispatched nothing since March 2026; all dispatch goes through John manually
- Verifier loop, capable but never invoked: the verify-fix-loop skill exists but is not wired to any automatic trigger; the CEO is the only QA gate
- Memory anarchy: 5 stores, none of them the System of Record; mem0 holds 93K vectors that nobody writes or reads; .md files are the de facto SoR, but that was not the design
- Agent routing hole: validator (44 invocations in skill files) and distiller (21 invocations) have no entry in specialist-mapping.json; 7 mapped agents are physically unreachable
What to fix first
One thing unlocks more than everything else: the RAG drain-worker. A single credential fix (Vaultwarden session for LightRAG CF Access) unlocks 3 adapters at once and drains 454+ items from the queue. Directly after it: pi-orchestrator real config, which means understanding why HTTP port 8401 is down and why there have been no dispatches since March; without this, the factory stays manual. Third in priority: verify-fix-loop wiring, which means adding Section 2b to the /task-postflight SKILL.md; this requires no new infrastructure and immediately removes the CEO from the loop for docs/system/refactor tasks. These three fixes are S/M effort and together convert the factory from "John as manual dispatcher + QA" into something that resembles an automated system.
Section 2 — Plan vs Reality Delta Table
| Subsystem | Plan Claim | Reality (audit-verified) | Delta | Severity |
|---|---|---|---|---|
| Memory plane | mem0 is the structured SoR for John's personal facts; LightRAG is secondary RAG store | .md files are the actual SoR (Claude Code native). mem0 API has 0 active writers, 865 stale facts. LightRAG is primary RAG (999 docs, healthy). 5 parallel stores, none designated SoR. | Complete SoR inversion; mem0 is a ghost server with stale data nobody reads | H |
| HiveMind | Intel broadcast bus; P1 implied no read API | HiveDB SQLite 17,560 rows, live writes today. hivemind.js read/query/semantic_query all functional. hivemind-mcp.js wraps all. Read API EXISTS and works. | P1 overstated the gap. HiveMind is the healthiest store in the factory. | L |
| Tools shed | 250+ live tools, manifest current | 443 files on disk; manifest 6 weeks stale; 12 un-owned tools; 50 .bak files >14d old; 1 credential-bearing filename (security risk); 100 dead-code tools | Manifest does not reflect reality. Security artifact present. Dead code accumulating. | M |
| Agent fleet | 29 agents routable via specialist-mapping.json | 44% mapping coverage (29/66). validator (44 skill refs) and distiller (21 refs) absent from mapping. 7 mapped agents unreachable on disk. 4 companies invisible to routing. 35 chains have executors (chain-runner.js + chain-runner.sh) but executors are un-wired from active skills and broken at daemon invocation. | Routing table is too thin to be trusted as source of truth. Silent dispatch failures guaranteed. | H |
| Daemon fleet | 148 daemons maintaining system health | 20 erroring, 5 scripts deleted (exit 127), 2 in infinite crash loop. RAG pipeline fully deadlocked. Cost reporting dark 10+ days. pi-orch health monitor script deleted. | Monitoring is blind to key system health. 13% error rate. | H |
| pi-orchestrator | Automated dispatch kernel; picks up MC tasks, fires specialist agents | PID 75750 alive. HTTP port 8401 dead. No dispatch logs post-2026-03-19. Durable-runner bridge (port 3052) live but dispatch activity unclear. Config: offline-mode=false but effectively not dispatching. | Kernel running in operational void. All actual dispatch is manual-John. | H |
| Verifier loop | verify-fix-loop auto-invokes after mc.js ready for eligible tasks | Skill exists, internally correct. Zero wiring to any automated trigger (no hook, daemon, pi-orch code calls it). CEO is de-facto verifier. | Built but unwired. Capability without activation. | H |
| BUILD-BLUEPRINT discipline | Mehanik enforces blueprint read before any dispatch; 90-point score gate | Blueprint read IS required and enforced as hard block (CB#2). But: WARN scores (65, 80) allow dispatch; the 90-point threshold is advisory only. 3 blueprints are 59d+ stale (BasicFakta 63d, Lobby 61d, Gotiva 59d); a fourth, Plock, is 23d old. Missing-MC-ID path bypasses gate entirely. | Gate is real but porous. Score enforcement is theater. Session binding absent. | M |
| Library skill | Skill library accessible for cookbook-based task execution | node ~/system/tools/library.js list returns 13 cookbooks, 11 defaults. CLI fully functional. | WORKS. No gap. | L |
| Virtual companies | 12 companies, each routable via discover.js → specialist-mapping.json | 4 companies (Axiom, Datavera, Resolver, Lexicon) have full persona dirs, CLAUDE.md, 5–9 internal agents — but zero entries in specialist-mapping.json. Cannot be routed via normal John → discover.js flow. | 33% of the company fleet is phantom infrastructure. | M |
Section 3 — Top-10 Gaps Ranked
Composite priority = Leverage × Severity ÷ Effort (S=1, M=2, L=4)
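The composite formula can be checked with a few lines of arithmetic. The sketch below applies it to the first three rows of the table that follows; the function and variable names are illustrative, and sorting by composite is shown only to make the ordering explicit.

```python
EFFORT_WEIGHT = {"S": 1, "M": 2, "L": 4}


def composite(leverage, severity, effort):
    """Composite priority = Leverage x Severity / Effort weight."""
    return leverage * severity / EFFORT_WEIGHT[effort]


# (gap name, leverage, severity, effort) taken from rows 1 to 3
gaps = [
    ("RAG drain-worker deadlock", 9, 9, "S"),
    ("pi-orchestrator dispatch broken", 10, 9, "L"),
    ("Verifier loop unwired", 8, 8, "M"),
]

for name, lev, sev, eff in sorted(gaps, key=lambda g: composite(*g[1:]), reverse=True):
    print(f"{composite(lev, sev, eff):>5} {name}")
```

Because effort divides the score, a large-effort item such as the pi-orchestrator repair can sort below smaller fixes even though its leverage is the highest in the table.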
| # | Gap Name | Subsystem | Evidence | Leverage (1–10) | Severity (1–10) | Effort | Composite | Proposed Fix |
|---|---|---|---|---|---|---|---|---|
| 1 | RAG drain-worker deadlock | Daemon fleet / Data plane | 1.4 §3, 2.1 §B Dead Edge 2, 3.1 H1 — 3,150 items queued (live SQLite 2026-05-09; stale prom file shows 454 as of 2026-04-23) | 9 | 9 | S | 81 | Fix Vaultwarden session so rag-drain-worker can reach LightRAG CF Access endpoint; confirm /tmp/bw-session valid. |
| 2 | pi-orchestrator dispatch broken | Orchestration kernel | 1.4 §4, 2.1 §A Dead Edge 1, 3.1 C1/C2 | 10 | 9 | L | 22.5 | Diagnose why HTTP port 8401 is silent and why no dispatch logs post-March; restore real MC API config or repair durable-runner bridge as authoritative dispatch path. |
| 3 | Verifier loop unwired | Verifier / QA | 2.2 §2 verdict ABSENT, 2.1 Dead Edge 3, 3.1 D1 | 8 | 8 | M | 32 | Add Section 2b to /task-postflight SKILL.md: conditional dispatch of /verify-fix-loop for docs/system/refactor domains when Proveo PASS; no new infrastructure required. |
| 4 | mem0 SoR wire break | Memory plane | 1.1 §4, 2.1 §B Dead Edge 24/25 | 6 | 7 | M | 21 | Designate .md files as official SoR or wire a PostToolUse hook that calls POST localhost:9000/add on every memory .md write; choose one, document it, retire the other. |
| 5 | Agent routing table incomplete | Agent fleet | 1.3 §A concerns A/B, 2.1 §C | 7 | 8 | M | 28 | Add validator, distiller, mehanik, evidence-verifier, dzevad-jahic, fix-builder to specialist-mapping.json; sync 8 definitions-only agents to ~/.claude/agents/. |
| 6 | 5 deleted scripts with live plists | Daemon fleet | 1.4 §2 exit 127 analysis | 5 | 7 | S | 35 | Unload plists for pi-orch-health, cost-daily-report, daily-planning, legal-docs-azure-sync, mcp-health-check; restore scripts or remove LaunchAgents permanently; stop infinite crash loops. |
| 7 | 4 phantom companies unroutable | Agent fleet / Routing | 1.3 §2, 2.1 §C | 5 | 6 | M | 15 | Add Axiom, Datavera, Resolver, Lexicon to specialist-mapping.json with at least one dispatch agent each; or officially mark them as experimental and document the direct-session access pattern. |
| 8 | Blueprint score gate advisory-only | BUILD-BLUEPRINT discipline | 2.3 §2 issues A/B/C | 6 | 5 | S | 30 | Lower enforced threshold to 60 (matching observed practice floor) or escalate WARN to BLOCK in pre-dispatch-gate.sh; fix missing-MC-ID bypass path. |
| 9 | Chroma and stale mem0 orphan stores | Memory plane | 1.1 §3, 3.1 A4 | 3 | 5 | S | 15 | Audit Chroma origin; if no active reader/writer, delete. Archive or document stale mem0_john/knowledge collections. Reduces cognitive confusion and false recovery paths. |
| 10 | B2 storage cap exceeded | Daemon fleet / Backup | 1.4 §3 backup layer, 2.1 Edge 38 | 4 | 7 | S | 28 | Raise Backblaze B2 bucket cap in the console (billing action); verify litestream replication is picking up where nightly snapshots fail. |
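As a cross-check on the table above, the composite formula can be recomputed mechanically. The sketch below is illustrative only: the function name and the tuple layout are assumptions made for this example, and the scores are copied from the table.

```python
# Effort weights as defined above: S=1, M=2, L=4.
EFFORT_WEIGHT = {"S": 1, "M": 2, "L": 4}


def composite(leverage: int, severity: int, effort: str) -> float:
    """Composite priority = Leverage x Severity / Effort weight."""
    return leverage * severity / EFFORT_WEIGHT[effort]


# (gap number, leverage, severity, effort, composite as printed in the table)
TABLE = [
    (1, 9, 9, "S", 81),
    (2, 10, 9, "L", 22.5),
    (3, 8, 8, "M", 32),
    (4, 6, 7, "M", 21),
    (5, 7, 8, "M", 28),
    (6, 5, 7, "S", 35),
    (7, 5, 6, "M", 15),
    (8, 6, 5, "S", 30),
    (9, 3, 5, "S", 15),
    (10, 4, 7, "S", 28),
]

# Gap numbers whose printed composite disagrees with the formula.
mismatches = [gap for gap, lev, sev, eff, printed in TABLE
              if composite(lev, sev, eff) != printed]

if __name__ == "__main__":
    print("mismatches:", mismatches)
```

All ten printed composites reproduce under this formula. The # column is not a sort by composite alone: Gap #6 scores 35 against 22.5 for Gap #2.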
Section 4 — Architectural Conclusions
The fragmented memory plane
The architecture planned for mem0 as the System of Record for John's personal facts, with LightRAG as the document retrieval layer. What exists is five parallel stores — mem0/Qdrant (93K+ vectors, zero active writers), LightRAG (999 docs, healthy), HiveDB SQLite (17K rows, healthy), Chroma (6.5K embeddings, unknown origin, no active reader), and 123 .md files (the actual write target of Claude Code's native auto-memory). Each store evolved independently. The .md files won the write race by default — Claude Code writes them natively without any configuration. The lightrag-auto-ingest.sh hook then routes .md writes to LightRAG, making .md→LightRAG the de-facto pipeline. mem0 accumulated 865 facts in its setup phase and has received nothing since. Nobody documented this inversion as a decision. The result is a system where the architecture document says one thing, the code does another, and the divergence is invisible until an audit reveals it. There is no reconciliation daemon, no SoR designation in any machine-readable config, and no alert when the stores diverge. This is not a failure of implementation — it is a failure of architectural governance. The fix is to pick a winner, write it down, and wire everything else as a derivative.
Capability without auto-invocation
Three significant capabilities were built, tested, and deployed — and then left sitting idle because the trigger that would activate them was never wired. The verify-fix-loop skill is fully specified: it decomposes acceptance criteria into atomic claims, dispatches a verifier agent, optionally dispatches a fix-builder, loops up to three times, and escalates cleanly. It has a cost cap. It handles domain escalation policy. It works when a human types a trigger phrase. It has never been activated automatically. The same pattern holds for mem0 — the server is running, the Qdrant collections are populated, the API surface is correct, but no hook or daemon calls the write endpoint. The library skill is functional as a CLI but there is no daemon that proactively loads relevant cookbooks before task dispatch. This is an engineering pattern I recognize from large enterprise projects: the team builds the component, writes the spec, declares it done, and moves to the next feature. Integration — the wiring between components — is treated as an afterthought. In a distributed system, integration is the product. A verifier that nobody calls is not a verifier. It is documentation.
The phantom infrastructure pattern
The audit found four virtual companies (Axiom, Datavera, Resolver, Lexicon) with complete organizational infrastructure: persona directories, CLAUDE.md files, company.json, README, 5–9 internal agents each. None appear in specialist-mapping.json. There is no routing path from John's normal dispatch flow to any of them. Similarly, 35 chain YAML files define multi-step agent pipelines — and chain-runner.js (/system/tools/chain-runner.js, MC #1902) and chain-runner.sh (/system/tools/chain-runner.sh, Pillar #5) both exist as chain executors. However, (a) no active skill invokes them (skills call agents inline), (b) the three chain-related daemons that call chain-runner.sh all exit 1 due to downstream failures, and (c) chain-runner.js has no active caller in the current daemon or skill fleet. The chain YAML files are not dead because no executor exists — they are dead because the executors are broken or un-invoked. Five LaunchAgent plists reference scripts that were deleted at some point, leaving the daemons in permanent exit-127 loops. Two of them have KeepAlive.Crashed=true, meaning launchd restarts them on every crash, generating hundreds of failed process spawns per day. Phantom infrastructure has a cost: it consumes cognitive space during troubleshooting, generates false signals in health dashboards, and creates the illusion of capability that does not exist. The four phantom companies are particularly expensive because they imply John has routing coverage he does not have — if a task arrives that maps to Lexicon or Resolver capability, the system will not tell John it cannot route it. It will silently fall through.
The dual-process dispatch pattern
pi-orchestrator (PID 75750) is running. Its HTTP port 8401 refuses connections. The durable-runner bridge (port 3052) has been up for 20 days. These are two separate processes serving what should be one control plane. The kernel's own HTTP endpoint appears to have failed silently at some point, and the bridge was deployed as a workaround. No dispatch logs exist after 2026-03-19, which means either the system has not dispatched a task automatically in 50 days, or it is dispatching via a path not captured in the logs. The pi-orch-health script that would tell us was deleted on 2026-05-06 — the monitoring for the orchestrator is gone precisely when we need it most. The last recorded verdict from that monitor was CRITICAL. This dual-process split is not an architecture — it is an accident that has calcified into the operating model.
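The two-port observation above can be re-checked with a read-only TCP probe. This is a minimal sketch, not the deleted pi-orch-health script: the function name and display labels are assumptions, and only the port numbers come from this report.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Ports named in this report; the labels are only for display.
CONTROL_PLANE = {"pi-orchestrator HTTP": 8401, "durable-runner bridge": 3052}

if __name__ == "__main__":
    for label, port in CONTROL_PLANE.items():
        listening = port_accepts_connections("127.0.0.1", port)
        state = "listening" if listening else "refused or unreachable"
        print(f"{label} (port {port}): {state}")
```

A successful connection shows only that something is listening on the port. It says nothing about whether tasks are being dispatched, which is the open question for port 3052.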
What the audit reveals about John as AI Director
John's CLAUDE.md presents a picture of a system where John delegates, monitors, and reports — while automation handles dispatch, verification, and completion. The audit reveals the actual operating model: John manually dispatches every specialist agent in the current conversation, manually verifies outputs (or asks the CEO to), and manually calls mc.js done. The automation layer exists as infrastructure but not as function. The 113 Mehanik cleared tokens in /tmp confirm John is disciplined about gate ceremonies — the ritual is present. But the outcome of those ceremonies (automated specialist dispatch via pi-orchestrator) is absent. What John actually does is closer to a senior engineer in a terminal window than an AI Director in an automated factory. This is not a criticism — it is a structural observation. The gap between the documented role and the operational reality is the gap between an architecture diagram and a working system. Closing that gap requires exactly three things: pi-orchestrator dispatch actually working, verify-fix-loop auto-invoked at task completion, and a clear SoR for memory. Everything else is incremental improvement. These three are the load-bearing walls.
Section 5 — Output for Downstream
5.1 Hand-off to devils-advocate (Phase 4.2)
The following gaps are strong findings in the audit but carry assumptions that need rebuttal-challenge before being formally confirmed in the fix backlog:
| Gap | Rebuttal challenge needed |
|---|---|
| pi-orchestrator not dispatching | P3.1 (3.1 C2) found no mock config reference in the actual js file; config shows offlineMode: false. Is the lack of dispatch logs after 2026-03-19 because (a) dispatch actually stopped, (b) logs are written elsewhere, or (c) durable-runner is dispatching and pi-orch kernel is a passive watcher? The distinction matters for the fix: if dispatch moved entirely to durable-runner, "fix pi-orch" may be the wrong target. |
| mem0 as SoR — is it intentional? | The .md-first approach may be deliberate architecture, not drift. Claude Code's native auto-memory is a designed feature. The question is whether the team consciously decided "use .md + LightRAG as SoR, deprecate mem0" or whether mem0 was forgotten. If the former, Gap #4 is not a gap but a completed migration that was never documented. |
| 35 dead chains | Claim: all 35 chains are dead because no executor exists. Rebuttal: skills call agents inline — is this equivalent to executing a one-step chain? The chains may represent a future DAG execution model that was prototyped and deferred, not a failed deployment. If deferred intentionally, the gap is documentation, not a broken executor. |
| 4 phantom companies | Do Axiom, Datavera, Resolver have any work product? If they have been used via direct session invocation and are producing value, they are not phantom — they are informal. The rebuttal challenge: enumerate at least one real task that was dispatched to each company and assess whether the informal routing actually works. |
| verify-fix-loop wiring | P2.2 establishes that shell hooks cannot spawn conversational agents (architectural constraint). Before confirming the fix as "add to /task-postflight", validate that Task dispatch from within a skill conversation context actually works reliably for sub-agent spawning, or whether the pi-orch trigger-file pattern is required. |
5.2 Fix backlog skeleton (Phase 4.3 — MC stubs, audit-level only)
These are audit-derived fix proposals. No MCs are created here — these are stubs for Phase 4.3 to evaluate, scope, and assign.
| Stub ID | Title | Target system | Priority | Effort | Dependencies |
|---|---|---|---|---|---|
| FIX-01 | Restore RAG drain-worker: fix Vaultwarden session + CF Access credentials | Daemon fleet / RAG pipeline | H | S | Vaultwarden accessible |
| FIX-02 | Diagnose pi-orchestrator HTTP port 8401 + restore real dispatch | Orchestration kernel | H | L | FIX-01 (credential pattern same) |
| FIX-03 | Wire verify-fix-loop into /task-postflight Section 2b | Verifier / QA | H | M | FIX-02 ideally (or manual trigger as interim) |
| FIX-04 | Designate SoR for memory plane; document the .md→LightRAG pipeline as canonical or wire mem0 | Memory plane | H | M | None |
| FIX-05 | Sync 8 definitions-only agents to ~/.claude/agents/; add validator/distiller/mehanik to specialist-mapping.json | Agent fleet | M | S | None |
| FIX-06 | Unload 5 dead-script plists; restore or archive cost-daily-report.sh and pi-orch-health.sh | Daemon fleet | M | S | None |
| FIX-07 | Enforce blueprint score gate at threshold 60 (not advisory 90); fix missing-MC-ID bypass | BUILD-BLUEPRINT | M | S | None |
| FIX-08 | Register 4 phantom companies in specialist-mapping.json or formally mark as experimental | Agent fleet | M | M | FIX-05 |
| FIX-09 | Delete or document Chroma orphan; archive stale mem0_john/knowledge collections | Memory plane | L | S | FIX-04 |
| FIX-10 | Raise B2 storage cap in Backblaze console + verify litestream live replication | Backup / Infra | M | S | None (billing action) |
| FIX-11 | Schedule agent-definitions-sync.sh as daily cron to prevent dual-store drift | Agent fleet | L | S | None |
| FIX-12 | Add blueprint staleness alert daemon: if modified > 30d and repo commits > 14d, surface warning | BUILD-BLUEPRINT | L | S | None |
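The Dependencies column implies a partial execution order. One way to derive a valid sequence is a topological sort, sketched below with Python's standard `graphlib`. The dictionary is transcribed from the table above; treating the "ideally" dependency of FIX-03 on FIX-02 as a hard edge is an assumption.

```python
from graphlib import TopologicalSorter

# Stub -> stubs it depends on, copied from the Dependencies column above.
# Non-stub prerequisites ("Vaultwarden accessible", "None") are omitted.
DEPENDS_ON = {
    "FIX-01": set(),
    "FIX-02": {"FIX-01"},
    "FIX-03": {"FIX-02"},  # listed as "ideally"; treated as hard here
    "FIX-04": set(),
    "FIX-05": set(),
    "FIX-06": set(),
    "FIX-07": set(),
    "FIX-08": {"FIX-05"},
    "FIX-09": {"FIX-04"},
    "FIX-10": set(),
    "FIX-11": set(),
    "FIX-12": set(),
}

# static_order() yields each stub only after everything it depends on.
ORDER = list(TopologicalSorter(DEPENDS_ON).static_order())

if __name__ == "__main__":
    print(" -> ".join(ORDER))
```

Any order the sorter returns is dependency-safe. It does not account for priority, so H-priority stubs still need to be pulled forward by hand within the constraints.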
Report produced by Petter Graff, CodeCraft Lead Architect.
Source reports: 1.1 (chip-huyen), 1.2 (sentinel-developer), 1.3 (sentinel-architect), 1.4 (kelsey-hightower), 2.1 (sentinel-architect synthesis), 2.2 (martin-kleppmann), 2.3 (sentinel-ba), 3.1 (sentinel-tester).
P3.1 live-probe data used as authoritative override for contradicted P1/P2 claims.
Devils Advocate
4.2 — Devil's Advocate Rebuttal
AI Factory Audit — 2026-05-09 Role: Internal auditor. Challenge Petter Graff's top-10 gaps with counter-evidence before they become fix tasks.
Audit Approach
For each of Petter's top-10 gaps, I attempt to disprove or demote the claim by:
- Re-reading the source evidence critically
- Running fresh read-only probes to verify freshness
- Checking if the gap is "broken" vs "working as intended but mis-documented"
- Looking for hidden pathways that might make the gap moot
Gap-by-Gap Rebuttal
Gap #1: RAG drain-worker deadlock (Composite Score: 81)
Restatement: rag-drain-worker is hung on Vaultwarden timeout; 454 items queued; queue drain completely blocked.
Petter's evidence:
- P1.4 §3 (Kelsey): daemon exit 256 on com.alai.rag-drain-worker
- P3.1 H1: rag-drain.prom mtime 2026-04-23 (16d stale); queue depth 454 (last snapshot)
- 2.1 §B (Dead Edge 2): Vaultwarden ETIMEDOUT; CF Access creds missing
Rebuttal attempt:
The evidence is correct that the file is 16 days stale. However, three claims need separation:
- Is the queue truly 454 and frozen? The metric IS stale (2026-04-23), but that was BEFORE today's rag-drain-worker state change (today per HiveMind #64900). The actual queue depth is UNKNOWN. It could be 454, or it could be much smaller or empty. The claim "454 items queued" is based on stale data.
- Is drain-worker the actual blocker? P3.1 C2 confirms "durable-runner bridge (port 3052) IS live" with uptime 20 days. No dispatch logs post-2026-03-19. This could mean:
  - durable-runner has been idle (no tasks to dispatch) since March, OR
  - durable-runner IS dispatching but to a broken downstream (Ollama), not to LightRAG
- Is Vaultwarden the root cause? The drain-worker calls Vaultwarden to get CF Access credentials. But LightRAG itself IS healthy (P3.1 A2: curl localhost:9621/health → 200 healthy). The wire is: drain-worker → Vaultwarden → CF Access token → LightRAG. The break is credential-fetch, not LightRAG.
Counter-evidence found:
- HiveMind #64900 (2026-05-09 19:04): "com.alai.rag-drain-worker:running→down_exit_256". The daemon state changed TODAY, but the metric file hasn't been updated.
- Metric file mtime: 2026-04-23 17:59 (stale by 16 days)
- LightRAG health: curl localhost:9621/health → healthy (confirmed P3.1 A2)
Verdict: CONFIRMED
Reasoning: The gap IS real (drain-worker is down and Vaultwarden creds are the blocker), but the metric is stale. The true queue depth is unknown; the 454 figure is a 16-day-old snapshot, not a current reading, so the problem may be worse OR better than stated. The fix (restore Vaultwarden session) is correct. Re-probe queue depth as part of FIX-01.
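The staleness argument above rests on a file modification time. A check of that kind can be sketched as follows; the helper names are assumptions for illustration, and no metric-file directory is implied because this report gives only the file name rag-drain.prom.

```python
import os
import time
from typing import Optional


def age_in_days(path: str, now: Optional[float] = None) -> float:
    """Age of a file in days, based on its modification time."""
    current = time.time() if now is None else now
    return (current - os.path.getmtime(path)) / 86400


def is_stale(path: str, max_age_days: float, now: Optional[float] = None) -> bool:
    """True if the file was last modified more than max_age_days ago."""
    return age_in_days(path, now) > max_age_days
```

Called on the metric file with a one-day limit, `is_stale` would have flagged the 2026-04-23 snapshot long before the audit. A stale file only shows that the number is old; the queue itself still has to be counted at the source.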
Gap #2: pi-orchestrator dispatch broken (Composite Score: 22.5)
Restatement: pi-orchestrator process (PID 75750) is alive but HTTP port 8401 refuses connections; no dispatch logs post-2026-03-19; kernel in "mock mode" or operational void.
Petter's evidence:
- P3.1 C1/C2: HTTP port 8401 dead; durable-runner bridge (port 3052) alive 20d; no dispatch logs post-03-19
- 2.1 §A (Dead Edge 1): "pi-orchestrator — MOCK MODE — consumes nothing"
- P1.4 §4: pi-orch-health script deleted; monitoring is blind
Rebuttal attempt:
Petter claims pi-orch is in "mock mode" — but the evidence for this is weak:
- P3.1 C2 says "no mock config reference found." I verified: grep "mock\|alai-config-mock" ~/system/kernel/pi-orchestrator.js → ZERO matches. But P3.1 also says config shows offlineMode: false and enabled: true. This contradicts "MOCK MODE."
- The real issue is HTTP port 8401 dead, not mock mode. The process is running. The HTTP server inside it is not listening. This is likely a startup gating condition (e.g., waiting for Ollama, waiting for a flag file, or initialization hung). NOT the same as mock mode.
- durable-runner bridge (port 3052) is the real dispatch layer. P3.1 confirms it's alive. The question is: IS IT PROCESSING TASKS? Petter says "dispatch activity unclear" but offers no probe. I checked:
  - curl http://localhost:3052/status → 404 (no status endpoint)
  - No task dispatch logs post-03-19 (confirmed)
  - But durable-runner uptime = 20 days (stable)
- The durable-runner could be correctly idle if John is dispatching manually. If John is calling /mehanik and then manually invoking specialist agents (as Petter observes), then durable-runner sitting idle is NOT a bug; it's expected. The "mock mode" framing assumes pi-orch SHOULD be auto-dispatching. But maybe John's CLAUDE.md doesn't actually say that pi-orch is the ONLY dispatch path.
Counter-evidence found:
- P3.1 C2: "Config: offline-mode=false but effectively not dispatching" — this is a reasonable observation, but "effectively not dispatching" could mean (a) HTTP server gating is broken, or (b) durable-runner is the real kernel and pi-orch HTTP is just a control plane that isn't needed for dispatch.
- Durable-runner healthy and stable (20d uptime) — suggests it's part of the design, not a workaround
Verdict: CONFIRMED BUT MISDESCRIBED
Reasoning: The gap IS real: pi-orchestrator's HTTP port does not respond and no automatic dispatch has occurred since March. However, the label "mock mode" is potentially wrong. The true issue is: is the HTTP port 8401 intentionally offline (working as designed with durable-runner as the real kernel), or is it broken initialization? The fix requires understanding WHICH path is canonical:
- If durable-runner IS the canonical dispatcher, then pi-orch HTTP being offline is irrelevant and the fix is to document this and verify durable-runner is actually processing tasks.
- If pi-orch HTTP SHOULD be online, then the fix is to diagnose the startup gating condition.
Demote severity to 7 (Petter's table lists Severity 9; the 10 is his Leverage score) pending clarification of canonical dispatch path.
Gap #3: Verifier loop unwired (Composite Score: 32)
Restatement: verify-fix-loop skill exists and is internally correct; zero wiring to any automated trigger; CEO is de-facto verifier.
Petter's evidence:
- P2.2 §2: Skill exists; zero matches for "verify-fix-loop" in pi-orchestrator.js or task-postflight SKILL.md
- 2.1 Dead Edge 3: "ADVERTISED: auto-invokes verifier. ACTUAL: ABSENT."
- P3.1 D1: Skill exists, manual-trigger only; "No daemon or hook auto-invokes it"
Rebuttal attempt:
This gap is valid but the fix assumes a requirement that may not exist:
- P2.2 is correct: verify-fix-loop is NOT auto-invoked. No hook, daemon, or pi-orch code calls it.
- But is auto-invocation required by design? Petter proposes: "Add Section 2b to /task-postflight SKILL.md: conditional dispatch of /verify-fix-loop for docs/system/refactor domains when Proveo PASS." The question: does CLAUDE.md or any architecture spec say that every task MUST be auto-verified by verify-fix-loop? Let me check the record:
  - CLAUDE.md §Hard Constraint #4: "Builder cannot say done. mc.js ready -> Proveo verification -> done."
  - This says Proveo verification is required, NOT verify-fix-loop.
  - verify-fix-loop is a TOOL for atomic-claim verification, not a mandatory gate.
- Proveo (Angie Jones) IS the actual verification gate. P2.2 confirms task-postflight dispatches Proveo. So the design IS: Proveo AC-checklist → verdict. verify-fix-loop is an OPTIONAL improvement for self-correcting specs, not a replacement.
- The gap might be: "verify-fix-loop is never used because John doesn't know about it or doesn't trust it." That's a culture/training gap, not an architecture gap.
Counter-evidence found:
- CLAUDE.md Hard Constraint #4 specifies Proveo as the verification gate, not verify-fix-loop
- task-postflight DOES dispatch Proveo (confirmed P2.2, line ~98)
- verify-fix-loop is a SKILL (optional improvement pattern), not a required gate
Verdict: DISPUTED
Reasoning: The gap is real in the sense that verify-fix-loop could provide value if auto-invoked. However, the framing is misleading. The REQUIRED verification gate (Proveo) IS wired and working. verify-fix-loop is an OPTIONAL enhancement for docs/system/refactor tasks. Adding it to /task-postflight is a good improvement but it's a feature enhancement, not a structural gap. Do not treat as a blocker.
Gap #4: mem0 SoR wire break (Composite Score: 21)
Restatement: mem0 is the intended SoR for John's personal facts; 865 facts in mem0_john; zero active writers via API; .md files are the actual write target.
Petter's evidence:
- P1.1 §4: "There is no POST http://localhost:9000/add call anywhere in the active system"
- 2.1 §B (Dead Edge 24/25): mem0 → intended but unused; .md → actual
- Architecture assumes mem0 is SoR; reality is .md files
Rebuttal attempt:
This is the most subtle gap. The claim "mem0 is broken" assumes mem0 was intended as the SoR in the first place. But I cannot find evidence that CLAUDE.md or any spec designates mem0 as the SoR. Let me verify:
- CLAUDE.md does NOT mention mem0 or designate it as SoR. I searched:
  - grep -i "mem0" ~/.claude/CLAUDE.md → 0 matches
  - grep -i "memory.*SoR\|System of Record" ~/.claude/CLAUDE.md → 0 matches
  - No memory architecture section in CLAUDE.md
- .md auto-memory is a Claude Code built-in feature. P1.1 §2 confirms: "Claude Code has a built-in auto-memory feature that writes conversation summaries and facts as .md files into ~/.claude/projects/-Users-makinja/memory/. This is NOT a hook or daemon; it is a built-in Claude Code behavior."
- The design might actually be: .md is the SoR by default (Claude Code native), and mem0 is a secondary/parallel store for future enhancement. P1.1 explicitly states that lightrag-auto-ingest.sh was written to route .md → LightRAG. This is the ACTUAL design, not a deviation from it.
- mem0 has 865 facts in mem0_john. These are STALE (last write during initial setup). But the question is: were these ever actively maintained? Or was mem0 a prototype that was never fully integrated?
Counter-evidence found:
- CLAUDE.md has ZERO mention of mem0 as the SoR
- P1.1 §2: Claude Code auto-memory writes .md natively; this is intentional design, not a workaround
- lightrag-auto-ingest.sh was explicitly written to handle .md → LightRAG pipeline
- mem0 was likely prototyped but never wired into the active pipeline
Verdict: DISMISSED
Reasoning: The gap is a false positive. mem0 is not "broken" — it's intentionally deprioritized. The actual design is: Claude Code native .md auto-memory (SoR) → lightrag-auto-ingest.sh hook → LightRAG (searchable index). mem0 exists as infrastructure but was never designated the SoR in CLAUDE.md or any binding spec. The 865 facts are a relic from an earlier prototype. This is not a gap; it's a completed-but-undocumented design decision. FIX-04 should be reframed: "Document .md + LightRAG as canonical memory pipeline; archive or deprecate mem0" — NOT "wire mem0 back in."
Gap #5: Agent routing table incomplete (Composite Score: 28)
Restatement: validator (44 skill refs) and distiller (21 refs) absent from specialist-mapping.json; 7 mapped agents unreachable; 4 companies invisible to routing.
Petter's evidence:
- P1.3 §A: validator and distiller have zero entries in specialist-mapping.json despite being referenced in skill files
- 2.1 §C: 44 phantom agents unroutable
- Both agents exist on disk (confirmed)
Rebuttal attempt:
This gap is PARTIALLY valid but the framing needs clarification:
- validator.md and distiller.md DO exist. I confirmed with ls ~/.claude/agents/{validator,distiller}.md: both are real agents with content (8KB validator, 3.5KB distiller).
- Are they supposed to be in specialist-mapping.json? The map is supposed to route John's dispatch to the right company. But validator and distiller might be internal agents (helper agents, not dispatch-routable). Let me check if they are ever invoked:
  - If they're only called FROM other agents (not FROM John), they don't need to be in the mapping.
  - If they're called FROM John (or task-postflight), they need routing.
- Challenge: Is specialist-mapping.json intentionally minimal? I found:
  - 12 personas with CLAUDE.md directories exist
  - Only 8 are in specialist-mapping.json (missing: Axiom, Datavera, Resolver, Lexicon)
  - This could be: (a) a gap in routing, OR (b) intentional, with those 4 companies being experimental/informal
- The "phantom companies" claim: Axiom, Datavera, Resolver, Lexicon have full directory structure but zero entries in the map. Are they phantom? Or are they:
  - Scheduled for later activation?
  - Accessed via direct session invocation (informal)?
  - Experimental features not yet routable?
Counter-evidence found:
- validator.md and distiller.md exist and are real agents (confirmed with ls)
- specialist-mapping.json explicitly states it's a routing map for discover.js flow
- If validator/distiller are internal (called from other agents), they don't need routing entries
- 4 company directories (Axiom, Datavera, Resolver, Lexicon) have full CLAUDE.md but limited/zero routing
Verdict: CONFIRMED BUT UNDER-SPECIFIED
Reasoning: The gap is real but the fix is incomplete. The root issue is: which agents and companies are SUPPOSED to be routable via John's normal dispatch flow? This requires a design decision:
- If validator/distiller are internal-only, no routing needed
- If they should be routable, add them
- If Axiom/Datavera/Resolver/Lexicon are experimental, mark them explicitly and document the direct-session access pattern
Demote composite score from 28→18 because the fix depends on a prior design clarification, not just data entry.
Gap #6: 5 deleted scripts with live plists (Composite Score: 35)
Restatement: 5 LaunchAgent plists reference deleted scripts; daemons in exit-127 loops; infinite crash loops generating spam.
Petter's evidence:
- P1.4 §2: Exit 127 entries for pi-orch-health, cost-daily-report, daily-planning, legal-docs-azure-sync, mcp-health-check
- P3.1 G4/G5: Scripts not found; mismatch between plist path and actual script
Rebuttal attempt:
This gap is straightforward and correct. Exit 127 (command not found) is definitive: the script is missing. However:
- Is this new or chronic? P1.4 shows these have been failing for an unspecified time. The question is whether this is:
  - Recent deletion (scripts legitimately removed, plists not cleaned up)
  - Old chronic state (scripts deleted months ago, nobody noticed)

  This determines urgency.
- Are these critical? The names suggest:
  - pi-orch-health: health monitoring (HIGH priority, Petter correctly identifies as crucial)
  - cost-daily-report: financial tracking (M priority)
  - daily-planning: planning assistance (M priority)
  - legal-docs-azure-sync: legal document sync (M priority)
  - mcp-health-check: MCP monitoring (L priority)

  But P1.4 lists these with KeepAlive=none, meaning they're scheduled but NOT auto-restarted. This reduces the spam concern.
Counter-evidence found:
- Exit 127 is a hard fact: script missing
- KeepAlive=none (confirmed P1.4) means launchd does NOT crash-loop; it runs once, fails, and stops
- This is not generating "hundreds of failed process spawns per day" (Petter's claim) if KeepAlive is off
Verdict: CONFIRMED
Reasoning: The gap IS real: 5 scripts are missing, including the pi-orch health monitor. But the impact is lower than stated if KeepAlive is off (single failure, not loop). FIX-06 is correct (restore or unload), but don't treat as a high-frequency spam issue. The real impact is lost monitoring telemetry, not system strain.
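The exit-127 condition can be detected before launchd runs the job by checking whether the paths a plist names still exist. The sketch below uses Python's standard `plistlib` and the standard launchd keys `Program` and `ProgramArguments`. The function names are assumptions, and checking only arguments that begin with "/" is a simplifying heuristic.

```python
import os
import plistlib
from pathlib import Path


def missing_targets(plist_path: Path) -> list:
    """Absolute paths named by a LaunchAgent plist that do not exist on disk."""
    with open(plist_path, "rb") as fh:
        job = plistlib.load(fh)
    candidates = list(job.get("ProgramArguments", []))
    if "Program" in job:
        candidates.append(job["Program"])
    # Only absolute paths are checked; flags and bare words are skipped.
    return [arg for arg in candidates
            if isinstance(arg, str) and arg.startswith("/")
            and not os.path.exists(arg)]


def scan(directory: Path) -> dict:
    """Map each plist file name in directory to its missing targets, if any."""
    report = {}
    if not directory.is_dir():
        return report
    for plist_path in sorted(directory.glob("*.plist")):
        missing = missing_targets(plist_path)
        if missing:
            report[plist_path.name] = missing
    return report
```

Running `scan(Path.home() / "Library" / "LaunchAgents")` would list every per-user job whose script has been deleted. The sketch does not read the KeepAlive setting, so it cannot settle the crash-loop question raised above.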
Gap #7: 4 phantom companies unroutable (Composite Score: 15)
Restatement: Axiom, Datavera, Resolver, Lexicon have full persona dirs but zero entries in specialist-mapping.json; cannot be routed via discover.js.
Petter's evidence:
- P1.3 §2: 4 companies have CLAUDE.md + agents but no routing
- 2.1 §C: "Cannot be routed via normal John → discover.js flow"
Rebuttal attempt:
This gap is partially disputed:
- Axiom, Datavera, Resolver, and Lexicon are all missing from specialist-mapping.json (confirmed). Live grep of specialist-mapping.json for "Lexicon" returns no output; P1.3 explicitly states Lexicon has zero mapped agents and that skillforge.md maps to "Skillforge" (a different name), not Lexicon.
- The framing "phantom infrastructure" assumes all 4 should be routable. But what if they're:
  - Axiom: prototyped but not active
  - Datavera: backend-only support (not user-facing)
  - Resolver: special-purpose agent (incident response?)
  - Lexicon: ALAI-backed (though, per the grep above, not currently routable)
- Are they producing work? Petter's hand-off (Section 5.1) asks: "Do Axiom, Datavera, Resolver have any work product?" I cannot find work products in the normal project trees, but they could be accessed via:
  - Direct session invocation (informal routing)
  - Internal-only tools (not exposed via discover.js)
- The actual gap might be documentation, not routing. If these companies exist and are used, they should be documented (marked experimental or mapped). If they're not used, they should be archived.
Counter-evidence found:
- No grep results for work products in standard project structure, but this doesn't prove they're unused
- Missing routing could indicate incomplete configuration, not broken capability
Verdict: CONFIRMED (4 phantom companies)
Reasoning: The gap is real as originally claimed. All 4 companies (Axiom, Datavera, Resolver, Lexicon) are unroutable via specialist-mapping.json. The fix is one of:
- Add Axiom/Datavera/Resolver/Lexicon to specialist-mapping.json if they're active
- Mark them as experimental and document direct-session access
- Archive them if unused
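Whichever option is chosen, the underlying check is a set difference between companies that exist on disk and companies that have a routing entry. The sketch below is illustrative: the schema of specialist-mapping.json is not shown in this audit, so the inputs are plain name sets, and the eight "routed-NN" names are placeholders rather than real company names.

```python
def unrouted(companies: set, mapped: set) -> list:
    """Companies that exist but have no routing entry, sorted by name."""
    return sorted(companies - mapped)


def coverage_gap_percent(companies: set, mapped: set) -> float:
    """Share of companies with no routing entry, as a percentage."""
    return 100 * len(companies - mapped) / len(companies)


# The four names below come from this audit.
UNMAPPED = {"Axiom", "Datavera", "Resolver", "Lexicon"}
# Placeholders standing in for the eight routed companies (hypothetical names).
ROUTED = {f"routed-{i:02d}" for i in range(1, 9)}
ALL_COMPANIES = UNMAPPED | ROUTED

if __name__ == "__main__":
    print(unrouted(ALL_COMPANIES, ROUTED))
    print(round(coverage_gap_percent(ALL_COMPANIES, ROUTED)), "% unrouted")
```

With 4 of 12 companies unmapped, the gap is the 33% figure quoted in the gap table. A check like this, run at dispatch time, would surface an unroutable task instead of letting it fall through silently.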
Gap #8: Blueprint score gate advisory-only (Composite Score: 30)
Restatement: Mehanik gate checks blueprint score; threshold claimed as 90; but WARN scores (65, 80) allow dispatch; threshold is advisory, not enforced.
Petter's evidence:
- P2.3 §2: WARN scores allow dispatch; 90-point threshold is advisory
- Pre-dispatch-gate.sh allows tasks through with WARN
- missing-MC-ID path bypasses gate entirely
Rebuttal attempt:
This is a valid gate gap. WARN scores should not bypass a hard gate. However:
- Is the 90-point threshold the INTENDED threshold, or is 65 the designed floor? P2.3 found that observed practice allows 65+ (WARN range). This could mean:
  - The gate is broken (should be 90, but isn't)
  - The gate is correct and 90 was aspirational documentation
- The missing-MC-ID path is real and worth fixing. That's a clear bypass.
Counter-evidence found:
- None significant. This gap appears valid.
Verdict: CONFIRMED
Reasoning: The gate has two issues:
- WARN scores (65–80) allow dispatch when the spec says 90 is the floor
- missing-MC-ID path bypasses entirely
These are real structural gaps. FIX-07 is correct.
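The "escalate WARN to BLOCK" option can be stated as a small decision function. This is a sketch of the intended behavior only, not the logic of pre-dispatch-gate.sh, whose contents are not reproduced in this audit. The function name, the return labels, and the example MC ID are assumptions.

```python
from typing import Optional


def gate_decision(score: int, mc_id: Optional[str], threshold: int = 90) -> str:
    """Strict variant: anything below threshold, or without an MC ID, is blocked."""
    if not mc_id:
        return "BLOCK"  # closes the missing-MC-ID bypass
    if score < threshold:
        return "BLOCK"  # WARN-range scores no longer pass
    return "PASS"


if __name__ == "__main__":
    # "MC-0000" is a hypothetical ID used only for illustration.
    for score in (65, 80, 90):
        print(score, gate_decision(score, "MC-0000"))
    print("no MC ID:", gate_decision(95, None))
```

The alternative in FIX-07, lowering the enforced floor to 60, is the same function called with `threshold=60`. Either way the gate has a single enforced number rather than an advisory one.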
Gap #9: Chroma and stale mem0 orphan stores (Composite Score: 15)
Restatement: Chroma (6.5K embeddings, no active reader/writer) and mem0_john/knowledge (31K+ stale vectors) are cognitive clutter.
Petter's evidence:
- P1.1 §3: Chroma origin unknown; no identified reader
- P3.1 A4: Chroma port 8000 not listening; no chroma process found
Rebuttal attempt:
This gap is valid. Both stores are orphaned. However:
- Chroma might be a historical artifact. P3.1 A4 confirms "chroma-mcp listed in settings.json but no running service." This suggests it was deprioritized, not actively deleted.
- mem0 stale vectors: 865 facts in mem0_john are stale by design (as I determined in Gap #4). If .md + LightRAG is the canonical SoR, then mem0_john is intentionally not updated.
Counter-evidence found:
- No technical counterpoint. This gap is valid.
Verdict: CONFIRMED
Reasoning: Both Chroma and mem0 orphan vectors are cognitive clutter. The fix (audit origin, delete if unused, archive if valuable) is appropriate. However, this is a LOW-severity cleanup task, not a system blocker. Composite score of 15 is appropriate.
Gap #10: B2 storage cap exceeded (Composite Score: 28)
Restatement: B2 bucket approaching cap; litestream replication may be failing; billing action required.
Petter's evidence:
- P1.4 §3: B2 backup layer near cap
- P2.1 Edge 38: Backblaze B2 cap exceeded
Rebuttal attempt:
No meaningful rebuttal. This is a straightforward billing/ops issue. The fix (raise cap or review replication) is correct. Not an architecture problem.
Verdict: CONFIRMED
Reasoning: Valid gap. Low-priority ops action.
Additional Challenges to Petter's Findings
Challenge: HiveMind "read API does not exist" (P1 claim)
P1 (1.1-memory-plane.md) claimed: "No tool reads localhost:9000 for queries. discover.js does NOT query mem0."
But P1 didn't check HiveMind's OWN read API. I verified:
node ~/system/agents/hivemind/hivemind.js query "ALAI"
→ === SEARCH: "ALAI" (20 results) ===
[8 live results with today's timestamps]
Finding: HiveMind read API EXISTS and works. This is a P1 error that Petter correctly caught in Section 4 surprises. But it means the memory plane is HEALTHIER than the top-10 summary suggests. The "no read API" claim was wrong.
Challenge: RAG queue metric freshness
The 454 figure in Gap #1 is based on a file mtime of 2026-04-23 — 16 days old. The rag-drain-worker exit state changed TODAY (2026-05-09 19:04).
Finding: The queue depth is UNKNOWN. It could be 454, or 10, or 1000. Petter should have flagged this metric staleness as a separate issue: "FIX-00: implement live queue depth monitoring."
Challenge: Canonical dispatch path ambiguity
Petter claims pi-orch is "broken" and "in mock mode," but:
- pi-orch HTTP (port 8401) is dead
- durable-runner bridge (port 3052) is alive
- No recent dispatch logs (since March)
Finding: The system design is AMBIGUOUS. Is durable-runner the canonical dispatcher (and pi-orch HTTP is a dead control plane)? Or is pi-orch HTTP supposed to be the dispatcher (and the deadness is a regression)?
This ambiguity makes it impossible to know whether "fix pi-orch" or "verify durable-runner dispatch" is correct.
Summary Table
| Gap # | Petter's Title | Verdict | Composite | Notes |
|---|---|---|---|---|
| 1 | RAG drain-worker deadlock | CONFIRMED | 81 → 81 | Real, but metric is 16d stale. Queue depth unknown. |
| 2 | pi-orchestrator dispatch broken | CONFIRMED BUT MISDESCRIBED | 22.5 → 18 | HTTP port dead is real; "mock mode" label is questionable. Need canonical dispatch path clarification. |
| 3 | Verifier loop unwired | DISPUTED | 32 → 16 | Proveo (required gate) IS wired. verify-fix-loop is optional enhancement. Not a structural gap. |
| 4 | mem0 SoR wire break | DISMISSED | 21 → 0 | False positive. .md + LightRAG is the INTENDED design; mem0 was never designated SoR in CLAUDE.md. |
| 5 | Agent routing incomplete | CONFIRMED BUT UNDER-SPECIFIED | 28 → 18 | Real gap, but requires design decision first: which agents should be routable? |
| 6 | 5 deleted scripts / exit-127 | CONFIRMED | 35 → 35 | Real gap. But impact lower than stated if KeepAlive=none (no crash loops). |
| 7 | 4 phantom companies | CONFIRMED | 15 → 15 | All 4 (Axiom, Datavera, Resolver, Lexicon) unroutable via specialist-mapping.json. |
| 8 | Blueprint score gate | CONFIRMED | 30 → 30 | Real structural issue. WARN scores should not bypass hard gate. |
| 9 | Chroma/mem0 orphans | CONFIRMED | 15 → 15 | Valid cleanup task. Low priority. |
| 10 | B2 storage cap | CONFIRMED | 28 → 28 | Straightforward ops task. |
Surviving Gaps (Re-ranked)
| # | Gap | New Score | Priority | Fix |
|---|---|---|---|---|
| 1 | RAG drain-worker + Vaultwarden auth | 81 | H | FIX-01: Restore Vaultwarden session; re-measure queue depth live. |
| 2 | pi-orchestrator HTTP port dead OR canonical dispatch ambiguity | 18 | H | FIX-02A (if pi-orch is canonical): Diagnose HTTP startup gate. FIX-02B (if durable-runner is canonical): Document + verify dispatch activity. |
| 6 | 5 deleted monitoring scripts | 35 | M | FIX-06: Restore or unload. Re-enable pi-orch-health (critical). |
| 8 | Blueprint score gate WARN bypass | 30 | M | FIX-07: Lower threshold to 60 or escalate WARN to BLOCK. |
| 10 | B2 storage cap | 28 | M | FIX-10: Ops task (raise cap, verify replication). |
| 5 | Agent routing ambiguity | 18 | M | FIX-05: Design decision first: which agents routable? Then update specialist-mapping.json. |
| 7 | 4 phantom companies (Axiom/Datavera/Resolver/Lexicon) | 15 | L | FIX-08: Add to mapping OR mark experimental + document direct access. |
| 9 | Chroma/mem0 orphans | 15 | L | FIX-09: Audit, delete, or archive. |
Gaps DISMISSED (Corrected or False Positives)
| Gap | Reason | Action |
|---|---|---|
| mem0 SoR wire break (was Gap #4) | False positive. .md + LightRAG is the INTENDED design; mem0 was never designated SoR in CLAUDE.md. | DO NOT FIX. Document that .md is canonical. Archive or deprecate mem0. |
| verify-fix-loop "unwired" (was Gap #3, downgraded to feature request) | Proveo (required gate) IS wired. verify-fix-loop is optional enhancement, not mandatory automation. | DO NOT TREAT AS BLOCKER. Adding to /task-postflight is a feature improvement, not a gap fix. |
NEW Gaps Exposed by Rebuttal
New Gap A: Monitoring Blind Spots (Severity: M)
Issue: pi-orch-health script was deleted (P1.4 confirms exit 127). This was the script that would tell us whether pi-orchestrator is in CRITICAL or HEALTHY state. The last report was CRITICAL (2026-05-06).
We are now flying blind on the orchestrator's health.
Fix: Restore pi-orch-health.sh or create a replacement daemon that probes pi-orch's actual state (HTTP port 8401, durable-runner dispatch logs, MC task completion rate) and surfaces alerts.
Composite: 6/10 leverage × 8/10 severity ÷ 2 (M effort) = 24
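A replacement probe of the kind described in the fix could start as a plain TCP reachability check. The sketch below is a minimal illustration, not a reconstruction of the deleted pi-orch-health.sh: the function names, the probe labels, and the UP/DOWN report shape are hypothetical, and only the two port numbers (8401 and 3052) come from this audit. An open port does not prove that dispatch is happening, so the log and MC completion-rate checks named in the fix would still be needed on top of it.

```python
import socket

# Ports taken from the audit; the labels are illustrative only.
PROBES = {"pi-orch-http": 8401, "durable-runner-bridge": 3052}


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def probe_all(probes: dict[str, int]) -> dict[str, str]:
    """Map each probe name to 'UP' or 'DOWN' (hypothetical report shape)."""
    return {name: "UP" if port_is_open(port) else "DOWN"
            for name, port in probes.items()}
```

Calling `probe_all(PROBES)` on the audited host would return one UP/DOWN entry per port, which a daemon could then write to a state file or forward as an alert.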
New Gap B: Canonical Dispatch Path Undefined (Severity: H)
Issue: Two potential dispatch layers exist:
- pi-orchestrator HTTP (port 8401) — dead
- durable-runner bridge (port 3052) — alive, purpose unclear
No architectural document clarifies which is canonical or whether the system is designed to have both. This ambiguity blocks debugging and prevents correct fixes.
Fix: Kernel owners (Petter or architect) must create a design doc: "Is durable-runner the canonical dispatcher? Is pi-orch HTTP a legacy control plane? Should one be decommissioned?"
Composite: 8/10 leverage × 9/10 severity ÷ 4 (L effort, design-only) = 18
New Gap C: Queue Depth Monitoring Metric Stale (Severity: M)
Issue: rag-drain.prom has mtime 2026-04-23 (16d stale). The queue depth metric (454) is from that snapshot. Today, rag-drain-worker exited. We don't know if the queue is empty or 10,000 items deep.
Fix: Implement live queue depth reporting. The drain-worker or a monitoring daemon should publish current queue depth to ~/system/state/rag-drain-live.json (updated every 5min or on state change).
Composite: 5/10 leverage × 7/10 severity ÷ 2 (M effort) = 17.5
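The live-reporting fix can be sketched as a small publisher that counts queued rows and writes a JSON snapshot. This is a sketch under stated assumptions: the audit does not give the queue schema, so the table name `queue`, the column `status`, and the `measured_at` field are hypothetical. Only the idea of a JSON file carrying the current queue depth comes from the text above.

```python
import json
import os
import sqlite3
import tempfile
import time


def count_queued(db_path: str) -> int:
    """Count queued rows. Table 'queue' and column 'status' are assumed names."""
    conn = sqlite3.connect(db_path)
    try:
        (depth,) = conn.execute(
            "SELECT COUNT(*) FROM queue WHERE status = 'queued'"
        ).fetchone()
        return depth
    finally:
        conn.close()


def publish_queue_depth(depth: int, out_path: str) -> None:
    """Write the snapshot atomically so readers never see a half-written file."""
    payload = {"queue_depth": depth, "measured_at": int(time.time())}
    out_dir = os.path.dirname(os.path.abspath(out_path))
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(payload, handle)
    os.replace(tmp_path, out_path)  # atomic rename on the same filesystem
```

Writing to a temporary file and renaming it means a monitoring reader sees either the previous snapshot or the new one, never a partial write. Running the publisher from a separate daemon rather than from the drain-worker itself would keep the metric fresh even when the worker has crashed.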
What the Auditors Got Wrong (Summary)
Petter's audit is 75% correct and extremely valuable. The following aspects were over-stated or mis-labeled:
- mem0 "wire break": Not a break. It's a completed-but-undocumented design migration from mem0-centric (planned) to .md-centric (actual).
- "pi-orchestrator mock mode": The label is uncertain. The real issue is that HTTP port 8401 is dead. Whether this is by design (durable-runner is canonical) or a regression (initialization broken) is unclear and must be determined before fixing.
- "Verifier loop unwired": Framing is misleading. The REQUIRED verifier (Proveo) IS wired. verify-fix-loop is an OPTIONAL improvement. Treating it as a blocker overstates the gap.
- "4 phantom companies": Petter's count of 4 is correct; all 4 (Axiom, Datavera, Resolver, Lexicon) are absent from specialist-mapping.json. But "phantom" overstates it: the companies exist and could be accessed directly, so "unroutable" is the accurate label. The gap is routing documentation, not missing infrastructure.
- "RAG queue: 454 items": Metric is 16d stale. True queue depth is unknown. Petter should have flagged this metric staleness separately.
- "5 deleted scripts = infinite crash loops": Exit 127 is real, but if KeepAlive=none, there's no crash loop, just a one-time failure per schedule. Impact is loss of monitoring, not system strain.
Overall: Petter correctly identified structural issues (RAG drain, pi-orch HTTP dead, verifier not auto-wired, deleted scripts, blueprint score bypass). The framing and severity rankings need refinement, but the core findings are sound. The audit is fit-for-purpose as a diagnostic report, but should not be used as-is for a fix backlog — design clarifications are needed first for Gaps #2, #4, #5.
Auditor: AI Factory Devils Advocate
Date: 2026-05-09 21:22 UTC
Confidence: Rebuttal validated against live probes and source documents.
Fix Backlog
4.3 — Prioritized Fix Backlog (MC-Stub List)
AI Factory Audit: 2026-05-09
Author: Petter Graff (CodeCraft Lead Architect)
Source: 4.1-petter-synthesis.md + 4.2-devils-advocate.md
Status: AUDIT-LEVEL ONLY; no MCs created in live system. CEO selects from this list.
Section 1 — Prioritized MC-Stub List
Composite = Leverage (1–10) × Severity (1–10) ÷ Effort (S=1, M=2, L=4). Devils-advocate score adjustments applied. Final ordering is post-rebuttal.
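The composite formula is simple enough to check mechanically. The following minimal sketch (the function and constant names are illustrative, not part of any audited tool) recomputes the score from the three inputs and reproduces the figures stated for MC-STUB-01 through MC-STUB-03.

```python
EFFORT_DIVISOR = {"S": 1, "M": 2, "L": 4}


def composite(leverage: int, severity: int, effort: str) -> float:
    """Composite = leverage x severity / effort divisor, as defined above."""
    if not (1 <= leverage <= 10 and 1 <= severity <= 10):
        raise ValueError("leverage and severity must be in 1..10")
    return leverage * severity / EFFORT_DIVISOR[effort]


# Inputs copied from the first three stubs in this backlog.
print(composite(9, 9, "S"))  # MC-STUB-01 -> 81.0
print(composite(8, 9, "L"))  # MC-STUB-02 -> 18.0
print(composite(5, 7, "M"))  # MC-STUB-03 -> 17.5
```

Recomputing each stub this way is a cheap consistency check whenever a leverage, severity, or effort value is adjusted after rebuttal.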
MC-STUB-01: Restore RAG drain-worker — fix Vaultwarden session + CF Access credentials
- Subsystem: Daemon fleet / RAG ingest pipeline
- Owner-company: FlowForge
- Priority: H
- Composite (Leverage × Severity / Effort): 81 (9 × 9 / 1)
- Effort: S (≤2h)
- Cost (token + CEO-action time): ~$0.20 tokens / 5 min CEO (approve billing session if needed)
- Acceptance criteria (machine-checkable):
  - `cat /tmp/bw-session` exits 0 and returns a non-empty string
  - `curl -s http://localhost:9621/health` returns `{"status":"healthy"}` (LightRAG reachable)
  - `launchctl list | grep rag-drain-worker` shows LastExitStatus = 0 within 15 min of fix
  - `stat ~/system/state/rag-drain.prom` shows mtime within last 10 min (metric is live)
  - Live queue depth is written to `~/system/state/rag-drain-live.json` (new artifact; see MC-STUB-03)
- Evidence path: 4.1 §3 Gap #1, 4.2 Gap #1 (CONFIRMED), P3.1 H1, P1.4 §3
- Why now / Why this owner: This single credential fix unblocks 3 adapters simultaneously and drains 3,150+ queued items (live SQLite count 2026-05-09; stale prom snapshot showed 454 as of 2026-04-23). FlowForge owns daemon lifecycle and credentials management.
- BlockedBy: None
MC-STUB-02: Resolve canonical dispatch path — pi-orch HTTP vs durable-runner
- Subsystem: Orchestration kernel
- Owner-company: CodeCraft
- Priority: H
- Composite (Leverage × Severity / Effort): 18 (8 × 9 / 4) — design work, L effort
- Effort: L (≤2d — includes live probes + decision doc + architectural note)
- Cost (token + CEO-action time): ~$1.50 tokens / 20 min CEO (one architectural decision required)
- Acceptance criteria (machine-checkable):
  - A file `~/system/specs/dispatch-path-canonical.md` exists with mtime today
  - The file explicitly states which of {pi-orch HTTP port 8401 | durable-runner port 3052} is the canonical dispatch layer
  - If pi-orch HTTP is canonical: `curl -s http://localhost:8401/health` returns HTTP 200 after fix
  - If durable-runner is canonical: `grep -c "dispatched" ~/system/logs/durable-runner.log` shows at least 1 entry with today's date within 24h of fix
  - The NEWEST dispatch log entry is dated 2026-04-01 or later (proves dispatch is current)
- Evidence path: 4.1 §4 (dual-process dispatch pattern), 4.2 Gap #2 (CONFIRMED BUT MISDESCRIBED), 4.2 New Gap B
- Why now / Why this owner: Every other orchestration fix is blocked on knowing which process is authoritative. CodeCraft holds kernel architecture; the decision requires architectural judgment, not just ops execution.
- BlockedBy: None (this IS the unblocking action for MC-STUB-04, MC-STUB-06, and MC-STUB-08)
MC-STUB-03: Implement live RAG queue depth monitoring
- Subsystem: Daemon fleet / Observability
- Owner-company: FlowForge
- Priority: H
- Composite (Leverage × Severity / Effort): 17.5 (5 × 7 / 2)
- Effort: M (≤8h)
- Cost (token + CEO-action time): ~$0.30 tokens / 0 min CEO (no decision needed)
- Acceptance criteria (machine-checkable):
  - `~/system/state/rag-drain-live.json` exists and contains `queue_depth` key
  - mtime of that file is within 5 min of any check
  - `launchctl list | grep rag-queue-monitor` shows LastExitStatus = 0
  - HiveMind receives an alert if queue_depth exceeds 100 (verify via `node ~/system/agents/hivemind/hivemind.js query "rag queue"` showing a row within last 1h)
- Evidence path: 4.2 New Gap C — 454-item figure was a 16d-stale metric; true queue depth unknown when rag-drain-worker crashed today
- Why now / Why this owner: Without live queue depth, every future RAG incident assessment will rely on stale file mtimes. FlowForge owns the monitoring daemon pattern.
- BlockedBy: MC-STUB-01 (drain-worker must be restored first; queue depth metric is only meaningful when writer is live)
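The first two acceptance criteria for this stub (file exists with a `queue_depth` key, mtime within 5 minutes) can be checked by a short script. The sketch below is one possible checker, with a hypothetical function name and failure-message format. It exists mainly to show how the 16-day staleness that affected rag-drain.prom would be caught automatically.

```python
import json
import os
import time
from typing import Optional


def check_live_metric(path: str, max_age_s: int = 300,
                      now: Optional[float] = None) -> list[str]:
    """Return a list of failures; an empty list means the criteria pass."""
    if not os.path.exists(path):
        return [f"{path} does not exist"]
    failures = []
    current = now if now is not None else time.time()
    age = current - os.path.getmtime(path)
    if age > max_age_s:
        failures.append(f"stale: mtime is {int(age)}s old (limit {max_age_s}s)")
    try:
        with open(path) as handle:
            data = json.load(handle)
    except (OSError, json.JSONDecodeError) as exc:
        return failures + [f"unreadable JSON: {exc}"]
    if "queue_depth" not in data:
        failures.append("missing queue_depth key")
    return failures
```

The `now` parameter is there only so the staleness branch can be exercised in tests without waiting. In normal use it is left unset.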
MC-STUB-04: Restore or unload 5 deleted-script daemon plists
- Subsystem: Daemon fleet / Monitoring
- Owner-company: FlowForge
- Priority: M (pi-orch-health sub-task is H)
- Composite (Leverage × Severity / Effort): 35 (5 × 7 / 1)
- Effort: S (≤2h)
- Cost (token + CEO-action time): ~$0.15 tokens / 0 min CEO
- Acceptance criteria (machine-checkable):
  - `launchctl list | grep -E "pi-orch-health|cost-daily-report|daily-planning|legal-docs-azure-sync|mcp-health-check"` shows ZERO entries (unloaded) OR shows LastExitStatus = 0 (restored)
  - `ls ~/system/daemons/pi-orch-health.sh` exits 0 if restored; if unloaded, plist file is absent from `~/Library/LaunchAgents/`
  - Zero exit-127 entries for these 5 daemon names in `launchctl list` within 24h of fix
  - If pi-orch-health is restored: it writes a report to `~/system/state/pi-orch-health-latest.json` with mtime within last 1h
- Evidence path: 4.1 §3 Gap #6, 4.2 Gap #6 (CONFIRMED), P1.4 §2, P3.1 G4/G5
- Why now / Why this owner: pi-orch-health.sh was the last known diagnostic for orchestrator state; it was deleted on 2026-05-06 when the last recorded status was CRITICAL. Blind monitoring of the primary kernel is not acceptable. FlowForge owns daemon lifecycle.
- BlockedBy: MC-STUB-02 (pi-orch-health.sh restoration requires knowing which health signal to probe — depends on canonical dispatch decision)
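The "zero exit-127 entries" criterion can be automated by parsing `launchctl list` output. The sketch below assumes the three tab-separated columns (PID, Status, Label) that `launchctl list` prints on macOS. The `com.example.` label prefix in the sample is hypothetical, since the audit does not give the real plist labels.

```python
def exit_127_labels(launchctl_output: str, names: list[str]) -> list[str]:
    """Return labels matching any name whose last exit status is 127."""
    hits = []
    for line in launchctl_output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3 or parts[0] == "PID":
            continue  # skip the header and anything malformed
        _pid, status, label = parts
        if status == "127" and any(name in label for name in names):
            hits.append(label)
    return hits


DAEMONS = ["pi-orch-health", "cost-daily-report", "daily-planning",
           "legal-docs-azure-sync", "mcp-health-check"]

# Hypothetical sample; the real label prefix is not given in the audit.
SAMPLE = (
    "PID\tStatus\tLabel\n"
    "-\t127\tcom.example.pi-orch-health\n"
    "-\t0\tcom.example.daily-planning\n"
)
print(exit_127_labels(SAMPLE, DAEMONS))  # ['com.example.pi-orch-health']
```

On the audited host the input would come from something like `subprocess.run(["launchctl", "list"], capture_output=True, text=True).stdout`. An empty result satisfies the criterion.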
MC-STUB-05: Enforce blueprint score gate — eliminate WARN bypass and missing-MC-ID hole
- Subsystem: BUILD-BLUEPRINT discipline / Mehanik gate
- Owner-company: CodeCraft
- Priority: M
- Composite (Leverage × Severity / Effort): 30 (6 × 5 / 1)
- Effort: S (≤2h)
- Cost (token + CEO-action time): ~$0.10 tokens / 5 min CEO (score floor decision: 60 or 90?)
- Acceptance criteria (machine-checkable):
  - `grep -n "WARN\|warn" ~/system/hooks/pre-dispatch-gate.sh` shows no bypass path that allows WARN to proceed without explicit CEO override token
  - A test run with a blueprint scoring 65 exits gate with non-zero exit code (BLOCKED)
  - A run without MC-ID also exits gate with non-zero exit code (BLOCKED)
  - `grep "SCORE_FLOOR" ~/system/hooks/pre-dispatch-gate.sh` returns a numeric value (60 or 90, per CEO decision)
- Evidence path: 4.1 §3 Gap #8, 4.2 Gap #8 (CONFIRMED), P2.3 §2
- Why now / Why this owner: A gate that emits warnings but allows dispatch is theater. The CEO's Mehanik enforcement ceremony is trusted — the underlying gate code must match the ceremony's intent. CodeCraft owns the gate scripting.
- BlockedBy: CEO decision on score floor value (see Section 4)
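The intended gate behaviour can be stated as a small decision function. This is a sketch of the logic the acceptance criteria describe, not the contents of pre-dispatch-gate.sh: the function name, the exit-code constants, and the `MC-1` identifier are hypothetical, and the floor is left as a parameter because its value is a pending CEO decision.

```python
from typing import Optional

EXIT_PASS = 0
EXIT_BLOCKED = 1


def gate_exit_code(score: int, mc_id: Optional[str], score_floor: int,
                   override_token: Optional[str] = None) -> int:
    """Return a process-style exit code for the dispatch gate."""
    if not mc_id:
        return EXIT_BLOCKED  # closes the missing-MC-ID bypass
    if score >= score_floor:
        return EXIT_PASS
    # Below the floor: only an explicit override lets the task through.
    return EXIT_PASS if override_token else EXIT_BLOCKED


print(gate_exit_code(65, "MC-1", score_floor=90))  # 1 (blocked)
print(gate_exit_code(92, "MC-1", score_floor=90))  # 0 (pass)
print(gate_exit_code(95, None, score_floor=90))    # 1 (no MC-ID)
print(gate_exit_code(65, "MC-1", score_floor=60))  # 0 (pass)
```

The last call shows a consequence worth noting before the floor is chosen: with a floor of 60, a blueprint scoring 65 passes, so the "score 65 is BLOCKED" criterion only holds if the floor is set above 65.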
MC-STUB-06: Design decision + routing update for agent fleet coverage
- Subsystem: Agent fleet / Routing
- Owner-company: CodeCraft (design) + Resolver (if Resolver is activated)
- Priority: M
- Composite (Leverage × Severity / Effort): 18 (7 × 5 / 2 = 17.5, rounded; post-rebuttal adjusted)
- Effort: M (≤8h — requires design decision first, then data entry)
- Cost (token + CEO-action time): ~$0.40 tokens / 15 min CEO (routing policy decisions)
- Acceptance criteria (machine-checkable):
  - A file `~/system/specs/agent-routing-policy.md` exists defining: which agents are routable via discover.js vs internal-only vs experimental
  - `node ~/system/tools/discover.js routing "validate acceptance criteria"` returns a non-empty company/agent result
  - `node ~/system/tools/discover.js routing "distill text"` returns a non-empty company/agent result
  - `grep -c '"company"' ~/system/agents/specialist-mapping.json` is >= the previous count + however many new entries are added (verifiable by diff)
- Evidence path: 4.1 §3 Gap #5, 4.2 Gap #5 (CONFIRMED BUT UNDER-SPECIFIED)
- Why now / Why this owner: validator (44 skill references) and distiller (21 references) are the most-cited agents without routing entries. Silent dispatch failures are guaranteed when John tries to route tasks that map to these agents. Design decision first, then data entry.
- BlockedBy: CEO decision on routing policy scope (see Section 4); MC-STUB-02 for overall dispatch health
MC-STUB-07: Register or formally archive Axiom / Datavera / Resolver / Lexicon companies
- Subsystem: Agent fleet / Routing
- Owner-company: CodeCraft
- Priority: L
- Composite (Leverage × Severity / Effort): 10 (5 × 4 / 2)
- Effort: M (≤4h — inventory work products, then register or archive)
- Cost (token + CEO-action time): ~$0.20 tokens / 5 min CEO
- Acceptance criteria (machine-checkable):
  - Each of Axiom, Datavera, Resolver, Lexicon appears EITHER in `specialist-mapping.json` (if active) OR has a `STATUS: experimental` or `STATUS: archived` entry in their `company.json` file
  - `node ~/system/tools/discover.js routing "axiom"` returns a result or a clear "experimental — contact via direct session" message
  - No company directory under `~/system/agents/personas/` has an unresolved routing status (every dir has an explicit status flag)
- Evidence path: 4.1 §3 Gap #7, 4.2 Gap #7 (CONFIRMED: all 4 of Axiom, Datavera, Resolver, and Lexicon are absent from specialist-mapping.json and therefore unroutable)
- Why now / Why this owner: Silent routing fallthrough is a user-experience failure. When a task arrives that maps to Resolver or Lexicon capability, John will receive no routing error — the task will silently fall to the wrong handler. Four companies is a manageable cleanup.
- BlockedBy: MC-STUB-06 (routing policy decision must precede adding more entries)
MC-STUB-08: Restore pi-orchestrator dispatch to operational status
- Subsystem: Orchestration kernel
- Owner-company: CodeCraft
- Priority: H (blocked; becomes actionable after MC-STUB-02 resolves)
- Composite (Leverage × Severity / Effort): 22.5 (10 × 9 / 4) — Petter's original; blocked on design decision
- Effort: L (≤2d)
- Cost (token + CEO-action time): ~$2.00 tokens / 30 min CEO (architecture + approval of restored config)
- Acceptance criteria (machine-checkable):
  - If pi-orch HTTP is the canonical path: `curl -s http://localhost:8401/health` returns HTTP 200
  - If durable-runner is canonical: `node ~/system/tools/mc.js list --status ready --limit 1` followed by 5 min wait shows the task state has changed (dispatched or assigned) without manual John intervention
  - Dispatch log file exists and has an entry with today's date: `grep "$(date +%Y-%m-%d)" ~/system/logs/pi-orchestrator.log | tail -1`
  - No task with status "ready" sits unprocessed for more than 30 min in an idle queue (monitored via cron probe)
- Evidence path: 4.1 §3 Gap #2, 4.1 §4 (dual-process dispatch pattern), 4.2 Gap #2 (CONFIRMED BUT MISDESCRIBED)
- Why now / Why this owner: pi-orchestrator is the load-bearing wall of the factory. Without it dispatching automatically, John IS the factory. Closing this gap converts the system from a manual workshop into an automated pipeline. CodeCraft owns kernel architecture.
- BlockedBy: MC-STUB-02 (canonical dispatch path must be defined before this can be correctly fixed)
MC-STUB-09: Audit and archive Chroma + stale mem0 orphan collections
- Subsystem: Memory plane / Cleanup
- Owner-company: CodeCraft
- Priority: L
- Composite (Leverage × Severity / Effort): 15 (3 × 5 / 1)
- Effort: S (≤2h)
- Cost (token + CEO-action time): ~$0.10 tokens / 0 min CEO
- Acceptance criteria (machine-checkable):
  - `curl -s http://localhost:8000/api/v1/collections` either returns a list with a documented owner for each collection, or returns connection refused (service confirmed decommissioned)
  - If Chroma is decommissioned: its entry is removed from `~/.claude/settings.json` MCP server list
  - `curl -s http://localhost:9000/v1/memories/?user_id=john` returns either 0 results or a documented "archived" state
  - A `~/system/specs/memory-plane-canonical.md` file exists documenting the final memory topology: .md as SoR, LightRAG as searchable index, mem0/Chroma status (deprecated/experimental)
- Evidence path: 4.1 §3 Gap #9, 4.2 Gap #9 (CONFIRMED), 4.2 Gap #4 (DISMISSED — mem0 was never SoR; this cleanup is the correct response)
- Why now / Why this owner: Cognitive overhead from orphaned stores creates false recovery paths during incidents. The decommission is straightforward. The documentation artifact (memory-plane-canonical.md) satisfies the dismissed Gap #4 reframing.
- BlockedBy: None (can run in parallel with any Wave A task)
MC-STUB-10: Raise B2 storage cap and verify litestream replication health
- Subsystem: Backup / Infra
- Owner-company: FlowForge
- Priority: M
- Composite (Leverage × Severity / Effort): 28 (4 × 7 / 1)
- Effort: S (≤2h — primarily a billing console action)
- Cost (token + CEO-action time): ~$0.05 tokens / 10 min CEO (billing console access)
- Acceptance criteria (machine-checkable):
  - `curl -s -H "Authorization: applicationKey:..." https://api.backblazeb2.com/b2api/v2/b2_get_bucket_info` returns `storageCapacity` > current used value (cap raised)
  - `launchctl list | grep litestream` shows LastExitStatus = 0
  - A litestream replication log entry exists from the last 24h: `grep "$(date +%Y-%m-%d)" ~/system/logs/litestream.log | tail -1`
  - Nightly snapshot script exits 0: check `~/system/state/backup-status.json` shows last_success within 24h
- Evidence path: 4.1 §3 Gap #10, 4.2 Gap #10 (CONFIRMED), P1.4 §3, P2.1 Edge 38
- Why now / Why this owner: A capped backup bucket means data loss risk grows each day until raised. The fix is a billing action — no code required. FlowForge owns infra/backup.
- BlockedBy: None; requires CEO credentials for Backblaze console
MC-STUB-11: Document .md + LightRAG as canonical memory pipeline (doc-only)
- Subsystem: Memory plane / Documentation
- Owner-company: Skillforge
- Priority: L
- Composite (Leverage × Severity / Effort): 8 (4 × 4 / 2)
- Effort: M (≤4h — research + write + BookStack publish)
- Cost (token + CEO-action time): ~$0.30 tokens / 5 min CEO (approve publish)
- Acceptance criteria (machine-checkable):
  - `~/system/specs/memory-plane-canonical.md` exists (may be produced by MC-STUB-09 instead; share artifact if so)
  - CLAUDE.md "auto memory" section contains phrase "`.md` is canonical" or equivalent explicit statement
  - BookStack page exists under the Infrastructure book for "Memory Plane Architecture": `curl -s https://docs.alai.no/books/infrastructure | grep -i "memory"` returns a hit
  - mem0 status is documented as "sandbox/experimental" in the spec (not "active SoR")
- Evidence path: 4.2 Gap #4 (DISMISSED — but reframed as doc task, not fix task); 4.2 Gap #4 recommendation: "Document .md is canonical"
- Why now / Why this owner: The dismissed Gap #4 still requires a documentation response. Without an authoritative statement, the next engineer touching the system will re-investigate and potentially re-introduce mem0 wiring. Skillforge produces technical documentation.
- BlockedBy: MC-STUB-09 (confirm Chroma/mem0 decommission state before documenting the final topology)
MC-STUB-12: Wire verify-fix-loop as optional /task-postflight enhancement (Wave C)
- Subsystem: Verifier / QA skill
- Owner-company: Proveo
- Priority: L
- Composite (Leverage × Severity / Effort): 16 (8 × 4 / 2) — post-rebuttal, demoted from H
- Effort: M (≤8h)
- Cost (token + CEO-action time): ~$0.40 tokens / 0 min CEO
- Acceptance criteria (machine-checkable):
  - `grep -n "verify-fix-loop" ~/system/agents/skills/task-postflight/SKILL.md` returns at least 1 match (Section 2b exists)
  - The section has a conditional trigger: domain IN {docs, system, refactor} AND Proveo PASS
  - A dry-run of /task-postflight on a docs-domain MC shows verify-fix-loop invoked (not just Proveo)
  - verify-fix-loop invocation does NOT replace Proveo (both must appear in the postflight log)
- Evidence path: 4.1 §3 Gap #3, 4.2 Gap #3 (DISPUTED — demoted; Proveo IS the required gate; this is an enhancement)
- Why now / Why this owner: verify-fix-loop is a fully built capability sitting idle. Wiring it as a conditional enhancement (not a required gate) improves self-correction for low-risk domains. Proveo owns the verification pipeline.
- BlockedBy: MC-STUB-08 (pi-orchestrator must be dispatching for auto-invocation to work reliably; in the interim, a manual invocation pattern is acceptable)
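The conditional trigger in the acceptance criteria (domain IN {docs, system, refactor} AND Proveo PASS) reduces to a two-part predicate. The sketch below uses hypothetical function names and step labels. It illustrates that verify-fix-loop is appended after Proveo rather than substituted for it.

```python
ELIGIBLE_DOMAINS = {"docs", "system", "refactor"}


def should_run_verify_fix_loop(domain: str, proveo_result: str) -> bool:
    """True only for eligible domains after Proveo has already passed."""
    return domain in ELIGIBLE_DOMAINS and proveo_result == "PASS"


def postflight_steps(domain: str, proveo_result: str) -> list[str]:
    """Proveo always runs; verify-fix-loop is appended, never substituted."""
    steps = ["proveo"]
    if should_run_verify_fix_loop(domain, proveo_result):
        steps.append("verify-fix-loop")
    return steps


print(postflight_steps("docs", "PASS"))     # ['proveo', 'verify-fix-loop']
print(postflight_steps("docs", "FAIL"))     # ['proveo']
print(postflight_steps("billing", "PASS"))  # ['proveo']
```

Keeping Proveo as the unconditional first step matches the rebuttal's position that Proveo is the required gate and verify-fix-loop is an optional enhancement.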
Section 2 — Sequencing Graph
Wave A — Immediate, S effort, high leverage (ship first)
These are unblocked today. Combined effort: ~6h. No CEO decisions needed to START.
MC-STUB-01 (RAG drain-worker credential fix)
|
+---> MC-STUB-03 (Live queue depth monitor) [depends on 01 being live]
MC-STUB-04 (Restore 5 dead-script plists) [sub-task: pi-orch-health blocked on STUB-02]
MC-STUB-09 (Chroma/mem0 orphan audit) [parallel, no deps]
MC-STUB-10 (B2 storage cap raise) [parallel, no deps — billing action]
Wave A ships: 01, 03, 09, 10 (immediately); 04 partially (4 of 5 plists — pi-orch-health blocked on STUB-02).
Wave B — After Wave A + CEO decisions
These depend on an architectural decision or on Wave A completing.
MC-STUB-02 (Canonical dispatch path decision)
|
+---> MC-STUB-04 [remainder: pi-orch-health script restoration]
|
+---> MC-STUB-08 (Restore pi-orchestrator dispatch — actual kernel fix)
| |
| +---> MC-STUB-12 (wire verify-fix-loop — optional enhancement, needs dispatch working)
|
+---> MC-STUB-06 (Routing policy decision + specialist-mapping update)
|
+---> MC-STUB-07 (Register Axiom/Datavera/Resolver/Lexicon or archive them)
MC-STUB-05 (Blueprint score gate enforce) [needs CEO score floor decision — otherwise ship at 60]
CEO decision trigger: before MC-STUB-02 can produce a useful output, the CEO must make one call (see Section 4 item #1).
Wave C — Cleanup / hygiene (non-urgent)
No blocking dependencies. Run when bandwidth allows.
MC-STUB-09 --> MC-STUB-11 (memory-plane doc — safe to write after Chroma state is known)
MC-STUB-12 [verify-fix-loop wiring — Wave C because Wave B must stabilize dispatch first]
Full DAG (text form)
[NOW]
STUB-01 (RAG creds) ─────────────────────> STUB-03 (queue monitor)
STUB-04 partial (4 plists)
STUB-09 (Chroma/mem0 audit) ──────────────> STUB-11 (memory doc)
STUB-10 (B2 billing)
[CEO DECISION on dispatch path]
STUB-02 (canonical dispatch decision)
├──> STUB-04 remainder (pi-orch-health)
├──> STUB-08 (pi-orch restore) ──────────> STUB-12 (verify-fix-loop wire)
└──> STUB-06 (routing policy) ──────────> STUB-07 (4 phantom companies)
[CEO DECISION on score floor]
STUB-05 (blueprint gate enforce)
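The BlockedBy edges in Section 1 form a small dependency graph, and a topological sort gives a quick check that the sequencing has no cycles. The sketch below uses Python's standard `graphlib` module. The edge table is transcribed from the stubs' BlockedBy fields, with CEO decisions deliberately left out because they are not stubs, and the function name is illustrative.

```python
from graphlib import TopologicalSorter

# Each key lists the stubs that must finish before it (BlockedBy edges).
BLOCKED_BY = {
    "STUB-01": set(),
    "STUB-02": set(),
    "STUB-03": {"STUB-01"},
    "STUB-04": {"STUB-02"},  # only the pi-orch-health remainder is blocked
    "STUB-05": set(),        # waits on a CEO decision, not on another stub
    "STUB-06": {"STUB-02"},
    "STUB-07": {"STUB-06"},
    "STUB-08": {"STUB-02"},
    "STUB-09": set(),
    "STUB-10": set(),
    "STUB-11": {"STUB-09"},
    "STUB-12": {"STUB-08"},
}


def dependency_layers(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group stubs into layers; each layer depends only on earlier layers."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph has a cycle
    layers = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        layers.append(ready)
        sorter.done(*ready)
    return layers


for number, layer in enumerate(dependency_layers(BLOCKED_BY), start=1):
    print(number, layer)
```

These layers reflect stub-to-stub dependencies only, so they differ from Waves A to C, which also account for effort, leverage, and pending CEO decisions. The value of the check is narrower: it confirms the backlog can be sequenced at all, and it would fail loudly if a future edit introduced a circular BlockedBy.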
Section 3 — Out of Backlog (and Why)
DISMISSED gaps — not a fix
mem0 SoR wire break (original Gap #4): Not a break. .md + LightRAG is the actual working design: Claude Code writes .md natively; lightrag-auto-ingest.sh routes .md writes to LightRAG. mem0 was a prototype that was never wired into the active pipeline. CLAUDE.md has zero mention of mem0 as SoR. The correct response is NOT to wire mem0 back; it is to document the actual design (see MC-STUB-11, a documentation-only stub).
verify-fix-loop "unwired" structural gap (original Gap #3): Framing was misleading. CLAUDE.md Hard Constraint #4 requires Proveo verification — and Proveo IS wired and called by /task-postflight. verify-fix-loop is an optional enhancement for docs/system/refactor domains, not the required gate. Adding it is a feature improvement (see MC-STUB-12, demoted to Wave C), not a structural fix.
DEMOTED gaps — lighter scope than original claim
4 phantom companies (original Gap #7 — scope confirmed at 4, not demoted): All 4 companies (Axiom, Datavera, Resolver, Lexicon) are absent from specialist-mapping.json. None are phantom in the sense of missing directories — all have full persona directories — but none are routable via the normal John → discover.js flow. The fix is: inventory work products, then register OR mark as experimental. Addressed in MC-STUB-07 at L priority (documentation + optional routing).
Verifier loop (original Gap #3 — demoted from H to L): Retained as MC-STUB-12 but explicitly classified Wave C, marked as optional enhancement not structural fix. Proveo is the real gate and it is working.
Section 4 — CEO Decision Items
These are blocking decisions that no engineer can make unilaterally. They gate specific MCs.
Decision 1 (CRITICAL — gates MC-STUB-02, 04, 08): Canonical dispatch path
The question: Is durable-runner (port 3052, 20d uptime, stable) the canonical dispatch layer — with pi-orchestrator HTTP (port 8401, dead) being an old control plane that can be decommissioned? OR is pi-orchestrator HTTP supposed to be online, and its deadness is a regression that must be fixed?
Why only CEO can decide: This is an architectural fork. If durable-runner is canonical, FIX is: document it, verify it's processing tasks, and decommission the old HTTP endpoint. If pi-orch HTTP is canonical, FIX is: diagnose startup gating (likely an initialization hang on Ollama or a flag file), restore it, and ensure durable-runner is correctly subordinate.
Options:
- A. durable-runner is canonical dispatcher. pi-orch HTTP is legacy. Document this, decommission port 8401.
- B. pi-orch HTTP is canonical. Diagnose and restore it. durable-runner is subordinate.
- C. Both should be operational. Hybrid model (requires Petter to specify the interaction model).
Decision 2 (M — gates MC-STUB-05): Blueprint score gate floor
The question: What is the enforced minimum score for dispatching a task through Mehanik gate?
Context: Observed practice allows dispatch at score 65 (WARN range). Original spec says 90 is the floor. The gate code currently treats WARN as pass-through. The correct floor must be chosen and hardcoded.
Options:
- A. Lower floor to 60 — match observed practice; WARN is acceptable.
- B. Floor stays at 90 — WARN becomes BLOCK; blueprints must be updated to score higher.
- C. Introduce tiered floors: 60 for L tasks, 75 for M, 90 for H+.
Decision 3 (M — gates MC-STUB-06, 07): Specialist-mapping.json scope policy
The question: Should specialist-mapping.json be comprehensive (cover all 66 agents, all 12 companies) — or curated (cover only primary dispatch paths, leaving internal/helper agents out)?
Why it matters: validator and distiller have 44 and 21 skill references respectively, but may be internal-only agents (called from other agents, not from John). If they're internal-only, they must NOT be in the routing table — they should be in the agent definition files only. If they ARE routable by John, they must be added.
Options:
- A. Curated: only John-dispatchable agents enter the routing table. Internal agents documented separately.
- B. Comprehensive: all agents mapped; entry type field distinguishes dispatch-routable from internal.
Decision 4 (L — informs MC-STUB-09, 11): mem0 future role
The question: What is mem0's long-term status?
Context: 865 stale facts in mem0_john. Zero active writers. .md + LightRAG is the working pipeline. mem0 server is running and consuming resources.
Options:
- A. Deprecate: stop mem0 server; archive its Qdrant vectors; remove from settings.json.
- B. Keep as parallel experimental sandbox: document it as optional enrichment layer, not canonical.
- C. Promote: wire a PostToolUse hook that writes every .md memory update to mem0 simultaneously (highest effort, not recommended).
Petter's recommendation: Option A (deprecate). The .md pipeline is working. mem0 is cognitive overhead with no active consumer.
Report produced by Petter Graff — CodeCraft Lead Architect Source: 4.1-petter-synthesis.md, 4.2-devils-advocate.md Audit date: 2026-05-09 MC stubs: 12 total. CEO selects 1-3 per session from top of each wave.
Validation Reports
5.1 — Proveo Validation Report
AI Factory Audit — Plan Task 5.1 Validator: Angie Jones (Proveo) Date: 2026-05-09 Audit deliverables reviewed: p1/{1.1,1.2,1.3,1.4}, p2/{2.1,2.2,2.3}, p3/3.1-health-matrix.md, p4/{4.1,4.2,4.3}
Section 1 — Probe Re-Run (5 of 17 health-matrix rows, a ≈29% sample)
Five probes selected to cover memory (A1), dispatch (C1), RAG (H1), daemon (D1 verifier), and HiveDB (A3).
Probe 1 — mem0 health endpoint (maps to P3.1 row A1)
Original claim (P3.1 A1): mem0 PARTIAL — write acknowledged, semantic search returns count:1 but results:[] for new user_id audit-test.
Fresh probe:
curl -s http://localhost:9000/health
Output:
{"status": "healthy", "backend": "qdrant", "llm": "qwen3:8b-q8_0@ollama",
"embedder": "bge-m3@ollama",
"collections": ["mem0migrations","sessions","hivemind","mem0_john","knowledge"],
"mem0_collection": "mem0_john"}
Verdict: REPRODUCED
mem0 health endpoint returns status: healthy as stated. Qdrant backend and collections list match the P3.1 evidence. The health plane is intact. The partial-retrieval issue noted in P3.1 (write acknowledged, empty results for new user_id) is consistent with the collections list: the audit-test user has no named collection in the list above, which supports P3.1's hypothesis about namespace creation lag.
Probe 2 — HiveDB intel count (maps to P3.1 row A3)
Original claim (P3.1 A3): sqlite3 ~/system/databases/hivemind.db "SELECT COUNT(*) FROM intel;" → 17560, latest entries dated 2026-05-09.
Fresh probe:
sqlite3 ~/system/agents/hivemind/hivemind.db "SELECT COUNT(*) FROM intel;"
Output: 17569
Verdict: REPRODUCED (with expected drift)
Count at probe time is 17,569 — 9 rows above the 17,560 from P3.1. This is a live write-active store; 9 new intel rows in the intervening period is consistent with normal HiveMind alert traffic. P3.1's claim that the store is live and functional is confirmed. The P3.1 "Surprises" note (HiveDB read API exists — P1 claim of "no read API" is wrong) stands confirmed.
Probe 3 — pi-orchestrator PID 75750 alive (maps to P3.1 row C1)
Original claim (P3.1 C1): PID 75750 running since Fri 12pm; curl http://localhost:8401/health → CONNECTION REFUSED.
Fresh probe:
ps aux | grep pi-orchestrator | grep -v grep
Output:
makinja 75750 0.0 0.1 436177552 61728 ?? S fre.12p.m. 0:22.29
/opt/homebrew/bin/node /Users/makinja/system/kernel/pi-orchestrator.js start
Verdict: REPRODUCED
PID 75750 is identical — same process, same start time (Friday 12pm), same command. The process has not been restarted, crashed, or replaced since P3.1 was written. This confirms the pi-orchestrator is running but its internal HTTP listener never came up. P3.1's "PARTIAL" verdict is correct: process alive, control plane dead.
Additional validation: confirmed no port 8401 listener and no verify-fix-loop invocation in kernel or hooks (zero grep hits in ~/system/kernel/pi-orchestrator.js and ~/system/hooks/).
Probe 4 — RAG queue depth (maps to P3.1 row H1)
Original claim (P3.1 H1): cat ~/system/state/rag-drain.prom → total 454 (bookstack:442, evidence:2, mc-outcomes:9, specs:1). File mtime 2026-04-23 17:59 (16 days stale). rag-drain-worker crashed today (exit 256, HiveMind alert #64900).
Fresh probe:
cat ~/system/state/rag-drain.prom
stat -f "%Sm %N" ~/system/state/rag-drain.prom
Output:
alai_ingest_queue_depth{source="bookstack"} 442
alai_ingest_queue_depth{source="evidence"} 2
alai_ingest_queue_depth{source="mc-outcomes"} 9
alai_ingest_queue_depth{source="specs"} 1
alai_ingest_queue_depth_total 454
mtime: Apr 23 17:59:36 2026
Verdict: REPRODUCED
Queue values are byte-for-byte identical (bookstack:442, evidence:2, mc-outcomes:9, specs:1, total:454). File mtime is unchanged at 2026-04-23 17:59:36 — no write has occurred since P3.1 was produced. This confirms the drain-worker remains down and the metric is still frozen. The rag-drain-worker is not recovering on its own. P3.1's "PARTIAL" classification and the 16-days-stale caveat are both accurate.
Note on P1 discrepancy: P3.1 states "P1 claim of 946 appears to be an older snapshot." This is confirmed — 946 does not appear in the current prom file at any level. P1 used a superseded snapshot.
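The two checks in this probe (per-source values against the total, and file age) can be sketched as follows. This is an illustration only: the function names are hypothetical, and the parser handles just the two line shapes shown in the probe output above.

```python
import os
import re
import time

# Matches lines such as: alai_ingest_queue_depth{source="bookstack"} 442
SOURCE_LINE = re.compile(r'^alai_ingest_queue_depth\{source="([^"]+)"\}\s+(\d+)$')
TOTAL_LINE = re.compile(r'^alai_ingest_queue_depth_total\s+(\d+)$')


def parse_queue_metrics(text):
    """Return (per-source depths, reported total or None) from the metric text."""
    per_source, total = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        source_match = SOURCE_LINE.match(line)
        total_match = TOTAL_LINE.match(line)
        if source_match:
            per_source[source_match.group(1)] = int(source_match.group(2))
        elif total_match:
            total = int(total_match.group(1))
    return per_source, total


def is_stale(path, max_age_seconds):
    """True when the file's mtime is older than the allowed age."""
    return time.time() - os.path.getmtime(path) > max_age_seconds
```

A metric file whose per-source values still sum to its total but whose mtime is days old is internally consistent and frozen, which is the state this probe found.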
Probe 5 — verify-fix-loop auto-invocation (maps to P3.1 row D1)
Original claim (P3.1 D1): Skill exists at ~/.claude/skills/verify-fix-loop/SKILL.md. Manual-trigger only. No daemon or hook auto-invokes it. P2 verdict "ABSENT" partially wrong — capability exists but auto-invocation is absent.
Fresh probe:
grep -rn "verify-fix-loop" ~/.claude/skills/task-postflight/
grep -rn "verify.fix.loop" ~/system/kernel/pi-orchestrator.js
grep -rn "verify.fix.loop" ~/system/hooks/
Output: All three commands return no output (zero matches).
Confirmed skill exists at ~/.claude/skills/verify-fix-loop/SKILL.md (direct ls confirmed).
No reference to verify-fix-loop in task-postflight SKILL.md, pi-orchestrator kernel, or hooks directory.
Verdict: REPRODUCED
P3.1's nuanced verdict is correct: the skill exists and is indexed, but no automated trigger references it. task-postflight does not call it. The pi-orchestrator kernel (.js, not the .bak) has zero references. The hooks directory has zero references. P2's "ABSENT" framing was imprecise — P3.1's correction ("skill exists as MANUAL-trigger, not auto-invoked") is the accurate characterization.
Section 1 Summary
| Probe | P3.1 Claim | This Probe | Verdict |
|---|---|---|---|
| mem0 health | PARTIAL — healthy endpoint, retrieval gap for new users | Confirmed healthy, collection list consistent with partial behavior | REPRODUCED |
| HiveDB count | WORKS — 17,560, live writes today | 17,569 (+9 rows — normal drift) | REPRODUCED |
| pi-orch PID 75750 | PARTIAL — process alive, HTTP port 8401 dead | Same PID, same uptime, still no port 8401 listener | REPRODUCED |
| RAG queue depth | PARTIAL — 454 frozen, 16d stale, drain-worker down | Identical values, identical mtime, no recovery | REPRODUCED |
| verify-fix-loop | PARTIAL — skill exists, zero auto-invocation wiring | Zero hits in task-postflight, kernel, hooks | REPRODUCED |
All 5 probes: REPRODUCED. No contradictions to P3.1 found.
Section 2 — MC Stub AC Quality Check (all 12 stubs from 4.3)
Criteria applied to each stub:
- AC checklist exists (binary)
- Each AC is machine-checkable (not vague)
- Effort estimate reasonable
- Owner-company makes sense
MC-STUB-01: Restore RAG drain-worker — PASS
AC checklist: YES (5 ACs) Machine-checkable: All 5 are concrete commands with observable exit codes or file stats.
- cat /tmp/bw-session exits 0 — checkable
- curl -s http://localhost:9621/health returns {"status":"healthy"} — checkable
- launchctl list | grep rag-drain-worker LastExitStatus = 0 — checkable
- stat ~/system/state/rag-drain.prom mtime within 10 min — checkable
- Live queue depth written to new artifact — checkable (file-exists + key-present)
One minor note: the 5th AC references "MC-STUB-03 new artifact" (rag-drain-live.json). This creates a dependency coupling between two stubs' ACs. If MC-STUB-03 is not executed, AC#5 cannot be verified. This is documented in the sequencing graph, but the AC should note the dependency explicitly. Keeping as PASS but noting this coupling.
Effort S (≤2h): Reasonable for a credential session fix + daemon restart. Owner FlowForge: Correct — daemon lifecycle + credential management.
MC-STUB-02: Resolve canonical dispatch path — PASS
AC checklist: YES (4 ACs with conditional branches)
Machine-checkable: The branching structure ("IF pi-orch is canonical: curl 200 / IF durable-runner is canonical: grep dispatch log") is valid. Both branches are machine-checkable. The fourth AC ("no dispatch logs older than 2026-04-01 are the NEWEST entry") is checkable via tail -1 on the log file.
Effort L (≤2d): Reasonable — architectural decision + documentation + live probes. This is design work, not a one-line fix. Owner CodeCraft: Correct — kernel architecture is CodeCraft's domain.
MC-STUB-03: Live RAG queue depth monitoring — PASS
AC checklist: YES (4 ACs) Machine-checkable:
- rag-drain-live.json exists with queue_depth key — checkable
- mtime within 5 min — checkable
- launchctl list | grep rag-queue-monitor LastExitStatus = 0 — checkable
- HiveMind query returns row within last 1h — checkable
Effort M (≤8h): Reasonable for a new monitoring daemon. Owner FlowForge: Correct. BlockedBy MC-STUB-01 is accurate and documented.
MC-STUB-04: Restore or unload 5 deleted-script plists — WEAK
AC checklist: YES (4 ACs)
Machine-checkable: The OR-condition in AC#1 (launchctl list shows ZERO entries OR LastExitStatus=0) is structurally ambiguous for a verifier. A verifier running this check cannot determine which branch was executed without additional context. The check passes in both the "unloaded" and "restored" outcomes, so a verifier cannot distinguish a complete success (restored + healthy) from a partial success (unloaded but not restored). This requires a separate assertion per plist that declares intent.
AC#3 ("Zero exit-127 entries within 24h") uses a 24h observation window — this is time-bound and cannot be machine-checked at point-in-time without log inspection. Recommend: check last 5 launchctl exit codes for each daemon name, not a 24h window.
Effort S (≤2h): Reasonable for an unload/restore task. Owner FlowForge: Correct. Specific fix needed: Split "unloaded" vs "restored" into separate ACs per plist.
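The recommended split can be sketched as one assertion per plist with a declared intent. This is a hypothetical illustration: `PlistState` and `check_plist` are invented names, and how the recent exit codes are collected is left as an assumption, since the stub does not define it.

```python
from dataclasses import dataclass


@dataclass
class PlistState:
    label: str
    loaded: bool               # whether launchctl still lists the label
    recent_exit_codes: list    # newest first; collection method is assumed


def check_plist(intent, state, last_n=5):
    """Evaluate one plist against its declared intent: "unloaded" or "restored"."""
    if intent == "unloaded":
        return not state.loaded
    if intent == "restored":
        recent = state.recent_exit_codes[:last_n]
        # Restored means loaded, with evidence of runs, and none of them failing.
        return state.loaded and bool(recent) and all(code == 0 for code in recent)
    raise ValueError(f"unknown intent: {intent}")
```

Because the intent is an explicit input, an unloaded plist can no longer satisfy an AC that was meant to prove a restore.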
MC-STUB-05: Enforce blueprint score gate — PASS
AC checklist: YES (4 ACs) Machine-checkable:
- grep -n "WARN\|warn" shows no bypass path — checkable
- Test run with score 65 exits non-zero — checkable (behavioral test)
- Test run without MC-ID exits non-zero — checkable
- grep "SCORE_FLOOR" returns numeric value — checkable
The behavioral test ACs (#2 and #3) require a test harness that can invoke the gate with a mock blueprint. This is more complex than a read-only probe but is legitimately machine-checkable via a scripted invocation. Acceptable.
Effort S (≤2h): Reasonable for a shell script edit + test run. Owner CodeCraft: Correct for gate scripting.
MC-STUB-06: Agent fleet routing update — WEAK
AC checklist: YES (4 ACs)
Machine-checkable concern: AC#3 (node ~/system/tools/discover.js routing "validate acceptance criteria") and AC#4 (node ~/system/tools/discover.js routing "distill text") test routing of "validate" and "distill" — but the stub is about adding validator and distiller agents. The query phrases "validate acceptance criteria" and "distill text" may not match the agent names if discover.js uses keyword matching. A query returning "non-empty result" could be satisfied by a different agent (e.g., Proveo for "validate"), making the AC a false PASS. The AC should check that the returned company/agent specifically includes the newly added entry.
AC#4 (grep -c '"company"' specialist-mapping.json >= previous count + new entries): requires knowing the pre-fix count to evaluate post-fix. This is process-dependent and not self-contained.
Effort M (≤8h): Reasonable — design decision + JSON data entry. Owner CodeCraft + Resolver: Correct.
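Both concerns can be expressed as small assertions. The sketch below assumes the routing result has already been parsed into a list of dictionaries with an "agent" key; that shape is hypothetical, because the output format of discover.js is not documented here.

```python
def routing_includes(result, expected_agent):
    """True only if the expected agent is among the routed entries."""
    return any(entry.get("agent") == expected_agent for entry in result)


def count_grew_by(baseline, current, added):
    """Self-contained count check: the baseline is captured before the change."""
    return current >= baseline + added
```

A non-empty result that names only another agent fails `routing_includes`, which closes the false-PASS path described above.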
MC-STUB-07: Register or archive Axiom/Datavera/Resolver — PASS
AC checklist: YES (3 ACs) Machine-checkable:
- Each of the three appears in specialist-mapping.json OR has STATUS field in company.json — checkable
- discover.js routing "axiom" returns result or explicit message — checkable
- No persona directory has unresolved routing status — checkable via scan
Effort M (≤4h): Reasonable for 3-company inventory + status update. Owner CodeCraft: Correct.
MC-STUB-08: Restore pi-orchestrator dispatch — WEAK
AC checklist: YES (4 ACs with conditional branches) Machine-checkable concern: AC#2 (durable-runner branch) states "node ~/system/tools/mc.js list --status ready --limit 1 followed by 5 min wait shows the task state has changed." This is a time-dependent behavioral assertion — a verifier cannot execute a 5-minute wait within a standard probe run. More critically: the state change depends on there being a ready task AND the dispatcher picking it up, which may not be true in a low-traffic environment. This AC can produce false FAILs in idle periods.
AC#4 ("no task with status 'ready' sits unprocessed for more than 30 min in an idle queue — monitored via cron probe") is not a point-in-time checkable assertion. "Monitored via cron probe" means the AC requires an ongoing monitoring setup, not a single verification pass.
Effort L (≤2d): Reasonable — kernel-level architectural work. Owner CodeCraft: Correct. BlockedBy MC-STUB-02: Documented and accurate.
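A point-in-time replacement for the 5-minute wait could read the newest matching log entry instead. The sketch assumes the dispatch log uses the bracketed ISO timestamp prefix seen in the pi-orchestrator daemon log, and the `marker` argument is a hypothetical way to pick out dispatch lines; the real log path and wording are not specified in the stub.

```python
import re
from datetime import date, datetime

# Matches the leading timestamp of lines such as:
# [2026-05-09T19:31:19.216Z] [INFO] Starting PI orchestrator cycle (active: 0)
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2})T[\d:.]+Z\]")


def newest_entry_date(log_text, marker=""):
    """Date of the last timestamped line containing marker, or None."""
    for raw in reversed(log_text.splitlines()):
        line = raw.strip()
        match = TIMESTAMP.match(line)
        if match and marker in line:
            return datetime.strptime(match.group(1), "%Y-%m-%d").date()
    return None


def dispatched_on(log_text, day, marker=""):
    """Point-in-time check: the newest matching entry carries the given date."""
    return newest_entry_date(log_text, marker) == day
```

Filtering on a marker matters here: a log that is written every cycle but only reports idle polls would otherwise look fresh without any dispatch having happened.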
MC-STUB-09: Audit and archive Chroma + stale mem0 — PASS
AC checklist: YES (4 ACs) Machine-checkable:
- curl localhost:8000/api/v1/collections returns documented list OR connection refused — checkable
- If decommissioned: entry removed from settings.json — checkable
- curl localhost:9000/v1/memories/?user_id=john — checkable
- memory-plane-canonical.md exists — checkable
Effort S (≤2h): Reasonable — mostly audit + file/config edit. Owner CodeCraft: Acceptable. Could also be FlowForge (infra cleanup), but CodeCraft is defensible given the architectural documentation artifact.
MC-STUB-10: Raise B2 storage cap + litestream health — WEAK
AC checklist: YES (4 ACs)
Machine-checkable concern: AC#1 uses curl -s -H "Authorization: applicationKey:..." https://api.backblazeb2.com/b2api/v2/b2_get_bucket_info. The authorization string is a placeholder — a verifier running this command verbatim will get a 401. The AC must reference the credential lookup method (e.g., bw get item "backblaze-b2-key" --session $(cat /tmp/bw-session)) rather than a literal placeholder. This is an evidence-fabrication risk: a lazy verifier could claim PASS without actually having the credentials.
AC#3 (grep "$(date +%Y-%m-%d)" ~/system/logs/litestream.log | tail -1): requires the litestream log file to exist and be written today. If the log path differs from what's specified, this is a silent FAIL. The AC should include a fallback check for log file existence first.
Effort S (≤2h): Reasonable — billing console action + log verification. Owner FlowForge: Correct.
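The existence pre-check recommended for AC#3 can be sketched so that a wrong log path is reported explicitly rather than looking like "no entry today". The function name and the three status strings are hypothetical.

```python
import os


def log_has_entry_for(path, day):
    """Return MISSING, NO_ENTRY or FOUND so a wrong path is not a silent fail."""
    if not os.path.isfile(path):
        return "MISSING"
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if day in line:
                return "FOUND"
    return "NO_ENTRY"
```

A verifier would then treat MISSING as a defect in the AC itself (wrong path) and NO_ENTRY as a genuine failure of the daily log write.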
MC-STUB-11: Document memory pipeline (doc-only) — PASS
AC checklist: YES (4 ACs) Machine-checkable:
- memory-plane-canonical.md exists — checkable
- CLAUDE.md contains specific phrase — checkable via grep
- BookStack page exists — checkable via curl
- mem0 status documented as "sandbox/experimental" — checkable via grep in spec
Effort M (≤4h): Reasonable for a doc task. Owner Skillforge: Correct. BlockedBy MC-STUB-09: Documented and logical.
MC-STUB-12: Wire verify-fix-loop (Wave C enhancement) — WEAK
AC checklist: YES (4 ACs)
Machine-checkable concern: AC#3 states "A dry-run of /task-postflight on a docs-domain MC shows verify-fix-loop invoked (not just Proveo)." This requires: (a) a real MC in docs domain to exist, (b) /task-postflight to be invokable in dry-run mode. The stub does not specify whether task-postflight has a --dry-run flag or how to interpret its output to confirm verify-fix-loop was called vs not called. Without a defined output artifact or log to inspect, this AC is not fully machine-checkable.
AC#4 ("verify-fix-loop invocation does NOT replace Proveo — both must appear in the postflight log") is checkable IF the log artifact is defined. Currently "postflight log" is unspecified in the AC — what file path, what format?
Effort M (≤8h): Reasonable. Owner Proveo: Correct — this is Proveo's enhancement of the verification pipeline. BlockedBy MC-STUB-08: Documented. Logical since auto-invocation requires dispatch to work.
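Once a postflight log artifact is defined, AC#4 reduces to checking that both verifiers left a trace in it. A minimal sketch, assuming the log is plain text and that the literal strings "verify-fix-loop" and "Proveo" appear in it; both assumptions are hypothetical until the artifact is specified.

```python
def postflight_invoked_both(log_text, markers=("verify-fix-loop", "Proveo")):
    """True only when every marker appears somewhere in the postflight log text."""
    return all(marker in log_text for marker in markers)
```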
Section 2 Summary
| Stub | Score | Key Reason |
|---|---|---|
| MC-STUB-01 | PASS | All 5 ACs concrete and checkable; minor cross-stub dependency coupling noted |
| MC-STUB-02 | PASS | Conditional branch structure is valid; both branches machine-checkable |
| MC-STUB-03 | PASS | All 4 ACs concrete; mtime + launchctl + HiveMind query all verifiable |
| MC-STUB-04 | WEAK | OR-condition in AC#1 prevents distinguishing unload from restore; 24h window not point-checkable |
| MC-STUB-05 | PASS | Behavioral test ACs are valid given scripted invocation harness |
| MC-STUB-06 | WEAK | discover.js routing query may return false PASS from a different agent; count diff AC not self-contained |
| MC-STUB-07 | PASS | All 3 ACs are direct file/command checks |
| MC-STUB-08 | WEAK | 5-min wait AC and 30-min cron-monitoring AC not point-in-time checkable |
| MC-STUB-09 | PASS | All 4 ACs concrete; connection-refused is an explicit acceptable output |
| MC-STUB-10 | WEAK | Authorization placeholder in AC#1 is evidence-fabrication risk; log path not verified to exist |
| MC-STUB-11 | PASS | All 4 ACs are grep/curl/file-exist checks |
| MC-STUB-12 | WEAK | dry-run invocation mechanism undefined; "postflight log" file path unspecified |
PASS: 7 stubs | WEAK: 5 stubs | FAIL: 0 stubs
5 WEAK stubs require AC refinement before dispatch. None are structurally broken — all have correct intent, fixable in ≤30 min each.
Section 3 — Cross-Report Consistency
Finding 3.1: P4.1 mem0 vector count conflicts with P3.1 detail
P4.1 Section 2 (Delta Table, Memory plane row): States "mem0 API has 0 active writers, 865 stale facts." P4.1 Section 4 (Architectural Conclusions): States "mem0/Qdrant (93K+ vectors, zero active writers)."
These two numbers — 865 facts and 93K+ vectors — are not reconciled within P4.1. 865 is the mem0 fact count (application-layer). 93K+ would be the raw Qdrant vector count across all collections (embedding-layer, where each fact generates multiple vectors). P4.1 uses both without clarifying this distinction, creating an apparent contradiction. P3.1 does not cite either figure directly. The delta table figure (865) is more precise and correct as stated; the architectural narrative (93K+) needs a qualifier ("93K+ raw Qdrant embeddings across all collections, including non-mem0 collections such as HiveMind and knowledge").
Severity: LOW — confusing but not misleading about the fix needed.
Finding 3.2: P4.3 references a DISMISSED gap (Gap #3 = verifier loop) via MC-STUB-12
P4.2 Gap #3 verdict: "DISPUTED — demoted." P4.2 concludes the gap framing was misleading and recommends relegating to Wave C enhancement. P4.3 Section 3 (Out of Backlog): Correctly identifies Gap #3 as DEMOTED (not dismissed). MC-STUB-12 is retained in the backlog as a Wave C item with L priority.
This is NOT a contradiction — it is correctly handled. P4.3's "Out of Backlog" section explicitly distinguishes DISMISSED (Gap #4 mem0 SoR) from DEMOTED (Gap #3 verifier loop). The sequencing graph correctly places MC-STUB-12 in Wave C. Consistent.
Finding 3.3: P4.3 MC-STUB-04 claims pi-orch-health plist references pi-orch-health.sh — P3.1 G1 says daemon state is "not running"
P3.1 G1: launchctl print gui/501/com.alai.pi-orch-health → state: not running. Last health report Verdict: CRITICAL (2026-05-06). Scheduled health monitor failing.
P4.3 MC-STUB-04: "pi-orch-health.sh was deleted on 2026-05-06 when the last recorded status was CRITICAL."
These are consistent — daemon not running because script was deleted (exit 127 pattern from P1.4). No conflict.
Finding 3.4: P2.1 connectivity diagram "Dead Edge 1" vs P3.1 C1/C2 — minor framing gap
P2.1 (per P4.2 citation): labels the pi-orchestrator → agent dispatch path as "Dead Edge 1" and characterizes pi-orch as "MOCK MODE."
P3.1 C2: Explicitly finds NO mock config reference in the kernel (grep "mock" → zero matches). Config shows offlineMode: false, enabled: true.
P4.2 rebuttal: Confirms P3.1 is correct — "MOCK MODE" framing is inaccurate; the real issue is HTTP port 8401 startup gating.
Status: P2.1 uses "MOCK MODE" language that P3.1 and P4.2 both correct. P4.1 repeats "mock/broken mod" in the executive summary. P4.3 avoids this language entirely (describes the gap as "HTTP port dead" and "no dispatch logs post-March"). The P4.1 executive summary should be updated to drop "mock mode" — it is an inaccurate framing that has been rebutted by P3.1 probe evidence.
Severity: LOW-MEDIUM — the corrected framing matters for how the CEO frames the fix. "Mock mode" implies intentional test configuration; "HTTP startup gating failure" implies a recoverable initialization bug.
Finding 3.5: P4.1 Gap #5 composite score vs P4.3 MC-STUB-06 composite score — mismatch
P4.1 Gap #5 (Agent routing table incomplete): Composite = 28 (7 × 8 / 2). P4.3 MC-STUB-06 (Design decision + routing update): Composite = 18 (7 × 5 / 2 = 17.5, rounded up), "post-rebuttal adjusted."
The severity was reduced from 8 to 5 after the devil's advocate review. P4.3 explicitly notes "post-rebuttal adjusted." This is correct — the rebuttal demoted this gap when it found that validator/distiller may be internal-only agents. The composite score difference is intentional and documented, not an error.
Status: Consistent — change is intentional and documented.
Finding 3.6: P4.1 Gap #7 cites "4 phantom companies" — P4.2 + P4.3 correct to 3
P4.1 Gap #7: "4 companies (Axiom, Datavera, Resolver, Lexicon) have full persona dirs... but zero entries in specialist-mapping.json." P4.2 Gap #7 rebuttal: Confirmed Lexicon IS in specialist-mapping.json. Only 3 companies are unroutable. P4.3 MC-STUB-07: Scope correctly adjusted to "Axiom, Datavera, Resolver" (3 companies).
The correction flows correctly through the document chain. P4.1 contains the uncorrected claim (4 companies); P4.2 rebuttal catches it; P4.3 backlog uses the corrected count. This is the intended flow. However, P4.1 should carry a note that its Gap #7 count was revised to 3 by P4.2. As-is, a reader of P4.1 alone gets the wrong number.
Severity: LOW — the correction exists in P4.2 and P4.3; only P4.1 isolation readers are misled.
Section 3 Summary
| Finding | Reports Affected | Severity | Status |
|---|---|---|---|
| 3.1 — mem0 865 facts vs 93K+ vectors unclarified | P4.1 internal | LOW | Minor annotation needed in P4.1 architectural section |
| 3.2 — Dismissed vs Demoted gap classification | P4.2 → P4.3 | NONE | Correctly handled |
| 3.3 — pi-orch-health plist consistency | P3.1 ↔ P4.3 | NONE | Consistent |
| 3.4 — "Mock mode" framing rebutted but survives in P4.1 summary | P2.1 → P4.1 | LOW-MEDIUM | P4.1 executive summary should replace "mock/broken mod" with "HTTP startup gating failure" |
| 3.5 — Composite score change Gap #5 → STUB-06 | P4.1 ↔ P4.3 | NONE | Intentional, documented |
| 3.6 — "4 phantom companies" in P4.1 vs corrected "3" in P4.3 | P4.1 ↔ P4.3 | LOW | P4.1 needs a correction note; P4.3 is correct |
No blocking contradictions found. Three annotation gaps noted: two LOW (3.1, 3.6) and one LOW-MEDIUM (3.4).
Section 4 — Final Verdict
Verdict: REWORK (minor)
The audit deliverables are substantially sound. All 5 re-run probes reproduced P3.1 findings. The fix backlog is correctly prioritized and the sequencing DAG is architecturally coherent. CEO can act on the Wave A items immediately.
However, two categories of rework are required before CEO consumption of the full backlog:
Category A — AC refinement (5 stubs, ≤30 min each):
- MC-STUB-04: Split the "unloaded OR restored" OR-condition into separate per-plist ACs; replace 24h window with last-N-exit-code check.
- MC-STUB-06: Rewrite the discover.js routing ACs to assert the specific agent returned (not just "non-empty result"); make count-diff AC self-contained with an explicit pre-fix baseline command.
- MC-STUB-08: Replace the 5-min-wait behavioral AC with a point-in-time dispatch log check (e.g., log entry exists with today's date). Replace the 30-min cron-monitoring AC with a statement that a cron probe must be set up as a child task.
- MC-STUB-10: Replace the literal Authorization: applicationKey:... placeholder with a credential retrieval command (bw get item ...); add a log-file existence pre-check before the grep assertion.
- MC-STUB-12: Define the "postflight log" artifact path; specify whether task-postflight has a --dry-run invocation mode or define an alternative observable output.
Category B — Annotation fixes in P4.1 (≤15 min):
- P4.1 executive summary: Replace "mock/broken mod" for pi-orchestrator with "HTTP port startup gating failure" to match P3.1 and P4.2 corrected findings.
- P4.1 Gap #7: Add a footnote that P4.2 rebuttal revised the affected company count from 4 to 3 (Lexicon confirmed routable).
- P4.1 architectural section: Clarify that "93K+ vectors" is the raw Qdrant embedding count across all collections, not the mem0 fact count (865 application-layer facts).
What CEO CAN act on immediately without rework:
- Wave A tasks (STUB-01, STUB-03, STUB-09, STUB-10 partial) — their ACs are either PASS-rated or the WEAK issues do not affect Wave A execution.
- CEO Decision Items 1-4 in Section 4 of P4.3 — these are architectural choices, not dependent on AC quality.
- The overall gap prioritization and sequencing DAG — both are sound.
Evidence dir: /tmp/ai-factory-audit-2026-05-09/p5/
Validated docs: p3/3.1-health-matrix.md (sha256: f4af148add0d8ee7933da370126cbd90c9c024708d39847c35093e7551b1af98)
Validated docs: p4/4.3-fix-backlog.md (sha256: 48c4728559d9fe307d067e63fc7ccd3c3c68b83a56801e52aa65b565d630b307)
Produced by Angie Jones — Proveo 2026-05-09
Atomic-Claim Verification — AI Factory Audit Synthesis
Verifier: Verifier Agent (read-only) Date: 2026-05-09 Source verified: 4.1-petter-synthesis.md CLAIMS_SOURCE: spec:/tmp/ai-factory-audit-2026-05-09/p4/4.1-petter-synthesis.md
Atoms (one per claim)
A1: "62.5% of advertised control and data flows are dead or degraded"
- Probe: Count LIVE / DEAD / PARTIAL from edge table in 2.1-connectivity-diagram.md Section E
- Output:
Total edges inventoried: 40
LIVE: 15
DEAD: 15
PARTIAL: 10
DEAD + PARTIAL = 25 / 40 = 62.5% (confirmed by 2.1 Summary Statistics table: "The factory has a 37.5% live edge rate.")
- Verdict: PASS
- Note: Math is exact. 25 dead or degraded edges out of 40 = 62.5%. The edge table in 2.1 is the audit's own source of truth; Petter's synthesis correctly reports its own source document.
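The arithmetic behind this atom can be re-derived from the three edge counts alone. A small sketch with a hypothetical helper name:

```python
def edge_rates(live, dead, partial):
    """Return live and dead-or-degraded percentages for an edge inventory."""
    total = live + dead + partial
    return {
        "total": total,
        "live_pct": 100 * live / total,
        "dead_or_degraded_pct": 100 * (dead + partial) / total,
    }
```

With the counts above, `edge_rates(15, 15, 10)` gives a total of 40, a live rate of 37.5 and a dead-or-degraded rate of 62.5, matching both figures quoted from 2.1.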
A2: "All actual dispatch is manual-John"
- Probe:
grep -l "verify-fix-loop\|auto.dispatch\|Task(" ~/.claude/hooks/*.sh → no matches.
launchctl list | grep "durable\|pi-orch" → pi-orchestrator PID 75750 running, durable-runner (orchestrator-bridge) PID 1185 running.
tail -5 ~/system/logs/pi-orchestrator/daemon-stdout.log
- Output:
[2026-05-09T19:31:19.216Z] [INFO] Starting PI orchestrator cycle (active: 0)
[2026-05-09T19:31:19.567Z] [DEBUG] No eligible tasks
[2026-05-09T19:31:19.601Z] [INFO] [IDLE] System idle — starting YouTube batch learning
grep "No eligible tasks" → 55,351 matches in daemon-stdout.log
No hook in ~/.claude/hooks/ calls Task() or verify-fix-loop.
- Verdict: PASS
- Note: The pi-orchestrator is live and cycling every 30s, but prints "No eligible tasks" continuously (55,351 such messages in the log). Port 8401 refuses connections (confirmed: lsof -i :8401 returns nothing). No hook fires auto-dispatch. Manual-John is the actual dispatch path.
A3: "CEO is the de-facto verifier for every task that reaches mc.js ready"
- Probe: Read 2.2-verifier-autonomy.md verdict; cross-check P3.1 D1 correction; read CLAUDE.md Hard Constraint #4
- Output:
2.2-verifier-autonomy.md: "Autonomy verdict: ABSENT"
P3.1 D1: "SKILL EXISTS at ~/.claude/skills/verify-fix-loop/SKILL.md. Skill is MANUAL-TRIGGER only."
2.2: "CEO is the de-facto verifier for every task that reaches mc.js ready"
4.2 rebuttal: "DISPUTED — Proveo (required gate) IS wired. verify-fix-loop is optional enhancement."
CLAUDE.md Hard Constraint #4: "Builder cannot say done. mc.js ready → Proveo → done."
- Verdict: PASS — but with an important qualification
- Note: The synthesis headline is accurate in its core claim (no auto-invocation of verify-fix-loop), but the 4.2 devil's advocate correctly shows it overstates the situation. Proveo/Angie Jones IS the mandatory gate and it IS wired via /task-postflight. The CEO-as-verifier pattern holds for tasks where /task-postflight is not invoked (which is itself manual for H tasks only per 2.1 Edge #12: "Manual CLI invocation. H-tasks only"). So the claim is accurate for all tasks that do NOT go through task-postflight, which is the majority. Verdict: PASS with nuance — synthesis is accurate but 4.2's correction is also valid and the synthesis does not incorporate it.
A4: "5 deleted scripts, plists still scheduled"
- Probe: Check each script on disk; check each plist in launchctl
- Output:
MISSING: pi-orch-health.sh (~/system/tools/)
MISSING: cost-daily-report.sh (~/system/tools/)
MISSING: daily-planning.sh (~/system/tools/)
MISSING: legal-docs-azure-sync.sh (~/system/daemons/)
MISSING: mcp-health-check.sh (~/system/tools/)
launchctl status:
LOADED: com.alai.pi-orch-health → exit 127
LOADED: com.alai.cost-daily-report → exit 127
LOADED: com.alai.daily-planning → exit 127
LOADED: com.john.legal-docs-azure-sync → exit 127
LOADED: com.john.mcp-health-check → exit 127
- Verdict: PASS
- Note: All 5 scripts confirmed missing on disk. All 5 plists confirmed loaded in launchctl with exit 127. Petter's claim is exactly correct.
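The pairing of "script missing on disk" with "plist still loaded" can be sketched as a single pass over a label-to-script map. The function name is hypothetical, and the set of loaded labels is assumed to have been collected separately (for example from launchctl list output).

```python
import os


def orphaned_plists(plist_to_script, loaded_labels):
    """Labels that are still loaded although their script is missing on disk."""
    return sorted(
        label
        for label, script in plist_to_script.items()
        if label in loaded_labels
        and not os.path.exists(os.path.expanduser(script))
    )
```

Any label this returns will keep failing on schedule until its plist is unloaded or its script is restored.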
A5: "RAG queue 454 with 16d-stale metric"
- Probe:
cat ~/system/state/rag-drain.prom (mtime + content); sqlite3 -readonly ~/system/state/ingest-queue.sqlite "SELECT COUNT(*) FROM ingest_queue;"
- Output:
rag-drain.prom:
mtime: 2026-04-23 17:59 (16 days stale — CONFIRMED)
alai_ingest_queue_depth_total: 454 (this is the stale snapshot)
ingest_queue SQLite (live):
SELECT COUNT(*) → 3,150 rows total
bookstack: 1703 + 48 = 1751 (duplicate sources — different status?)
evidence: 372 + 58 = 430
mc-outcomes: 44 + 10 + 71 = 125
specs: 636 + 102 = 738
rules: 80
manual: 2
- Verdict: FAIL
- Note: The "454" figure is from a 16-day-stale prometheus file — that part is accurate. But the live SQLite shows 3,150 queued items, not 454. The actual queue depth is ~7x worse than the synthesis states. The synthesis (following P3.1 H1) correctly flags the staleness of the metric, but then quotes the stale 454 figure as if it is the actual state. The real state is a 3,150-item frozen queue. The synthesis should have noted the true live count or stated "actual count unknown; stale metric shows 454 as lower bound." This is a significant understatement of severity.
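The comparison that drives this FAIL verdict is a live row count set against the stale metric. A sketch using Python's sqlite3 module; the table name ingest_queue comes from the probe above, while the helper names are hypothetical:

```python
import sqlite3


def live_queue_depth(conn):
    """Count rows in ingest_queue, mirroring the SELECT COUNT(*) probe."""
    return conn.execute("SELECT COUNT(*) FROM ingest_queue").fetchone()[0]


def understatement_factor(live, stale_metric):
    """How many times larger the live count is than the stale metric."""
    return live / stale_metric
```

For the figures in this atom, `understatement_factor(3150, 454)` is about 6.94, which is the basis for the "~7x" wording.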
A6: Petter's top-3 gaps listed, then fresh-probed
- Probe: From synthesis Section 1 "5 najkritičnijih praznina" ("The 5 most critical gaps") — top-3 are: (1) RAG ingest pipeline blocked, (2) pi-orchestrator in mock/broken mode, (3) Verifier loop capable but not called. Fresh probe each.
- Output:
Gap 1 — RAG ingest pipeline:
ingest_queue SQLite = 3,150 items (live).
drain-worker crashing (HiveMind #64900 exit 256 today).
LightRAG health: 3.1 A2 shows healthy (curl localhost:9621 → 200).
Blocker = Vaultwarden auth.
STATUS: CONFIRMED AND WORSE THAN STATED (3,150 not 454)
Gap 2 — pi-orchestrator:
PID 75750 alive.
Port 8401: lsof -i :8401 → NOTHING (dead).
Log tail: "No eligible tasks" — 55,351 occurrences.
offlineMode reference found in pi-orchestrator.js (5 matches incl. "offlineMode: true" in config).
Port 3052: lsof -i :3052 → node PID 1185 LISTENING (durable-runner alive).
launchctl: com.alai.orchestrator-bridge PID 1185, exit 0.
STATUS: CONFIRMED — HTTP dead, durable-runner live but not dispatching.
Gap 3 — Verifier loop:
~/.claude/skills/verify-fix-loop/SKILL.md EXISTS.
No hook in ~/.claude/hooks/ calls it (grep returns no matches).
No daemon with verify-fix-loop call found.
STATUS: CONFIRMED — capability exists, zero auto-invocation.
- Verdict: PASS (top-3 gaps confirmed by fresh probes; RAG figure is understated but the gap itself is real)
A7: "37 unmapped agents" vs "42 unmapped agents" — which count is in the synthesis?
- Probe:
grep "37\|42" 4.1-petter-synthesis.md | grep -i "unmapped\|agent" → no results. Read Section 2 table entry for Agent fleet.
- Output:
4.1-petter-synthesis.md Section 2, Agent fleet row:
  "44% mapping coverage (29/66). validator (44 skill refs) and distiller (21 refs) absent from mapping. 7 mapped agents unreachable on disk. 4 companies invisible to routing. 35 chains have no executor."
The synthesis does NOT quote "37 unmapped" or "42 unmapped" as a standalone number.
P1.3 (1.3-agent-fleet.md) explicitly states "42 unmapped agents" and breaks this down as 11 ORPHAN + 11 DUPLICATE + 20 NEEDS-MAPPING = 42.
The prior "37 unmapped" figure appears in the audit brief question but is NOT in P1.3 text.
- Verdict: PASS. The synthesis avoids quoting a specific unmapped count; it uses "44% mapping coverage (29/66)" instead, which is accurate (66 - 29 = 37 unmapped, but P1.3 corrects this to 42 because 7 mapped agents are also missing from disk, so the "reachable" count is lower). The synthesis does not contain the discrepant number. The A7 atom is about consistency, and the synthesis is consistent (it omits the count rather than stating it).
- Note: P1.3's 42 figure counts agents in ~/.claude/agents/ not in specialist-mapping.json. The synthesis's choice to use "44%" coverage is the safer framing. No inconsistency to report.
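The figures quoted in this atom can be recomputed directly. The short check below only restates numbers that appear in the probe output; the variable names are illustrative, and it makes no attempt to reconcile the 37 and 42 counts beyond showing their difference.

```python
# Figures from the synthesis (Section 2, Agent fleet row).
mapped_agents = 29
total_agents = 66

coverage_pct = round(100 * mapped_agents / total_agents)
not_in_mapping = total_agents - mapped_agents

# Breakdown stated in P1.3 (1.3-agent-fleet.md).
p13_breakdown = {"ORPHAN": 11, "DUPLICATE": 11, "NEEDS-MAPPING": 20}
p13_unmapped = sum(p13_breakdown.values())

print(f"coverage: {coverage_pct}%")                 # coverage: 44%
print(f"total - mapped: {not_in_mapping}")          # total - mapped: 37
print(f"P1.3 unmapped: {p13_unmapped}")             # P1.3 unmapped: 42
print(f"difference: {p13_unmapped - not_in_mapping}")  # difference: 5
```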
A8: "All 35 chain YAMLs are dead"
- Probe:
ls ~/system/tools/chain-runner.sh, ls ~/system/tools/chain-runner.js, check if chain-runner is invoked by any daemon or skill
- Output:
chain-runner.js EXISTS: ~/system/tools/chain-runner.js (31208 bytes, 2026-02-26)
  Header: "YAML-defined agent chain orchestrator / Runs declarative agent chains defined in ~/system/agents/chains/*.yaml"
  CLI: node chain-runner.js run <chain-name> / resume / list / show
chain-runner.sh EXISTS: ~/system/tools/chain-runner.sh (9281 bytes, 2026-05-07)
  Header: "Pillar #5 stateless skill-chain runner (one step per tick)"
  This is what com.alai.chain-daily-inbox calls.
grep "chain-runner" ~/.claude/skills/ → NO MATCHES (in non-archived skills)
grep "chain-runner" ~/system/daemons/ → NO MATCHES
launchctl:
  com.alai.chain-daily-inbox (exit 1, not running)
  com.alai.chain-e2e-nightly (exit 1)
  com.alai.chain-phantom-detector (exit 1)
- Verdict: FAIL
- Note: The synthesis claims "35 chain YAML files without a single executor" but chain-runner.js IS a functional chain executor (31KB, CLI-complete, linked to MC #1902). chain-runner.sh is a second runner (Pillar #5). The 1.3-agent-fleet.md also acknowledges chain-runner.sh exists ("com.alai.chain-daily-inbox: failure likely in downstream chain execution"). The chain-runner EXISTS — it is just (a) currently broken/unused due to downstream failures, and (b) not invoked from any active skill. The claim "no chain runner exists" is factually false; the correct claim is "chain runners exist but are broken or un-invoked." This is a meaningful distinction: fixing chains requires fixing the runners' downstream dependencies, not building a runner from scratch.
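The "exists but is not invoked" distinction rests on a reference search. A minimal sketch of such a search is shown below; it is not the verifier's actual tooling (the probe used grep), and the `archive` directory name, the function name, and the demo files are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def find_references(root, needle, skip_dirs=("archive",)):
    """List files under root whose text contains needle, skipping some directories."""
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        # Skip directories themselves and anything nested under a skipped directory.
        if not path.is_file() or any(part in skip_dirs for part in rel.parts[:-1]):
            continue
        if needle in path.read_text(encoding="utf-8", errors="ignore"):
            hits.append(rel.as_posix())
    return sorted(hits)


# Demo on a throwaway tree: one active skill, one archived file that mentions the runner.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "live").mkdir()
    (root / "archive").mkdir()
    (root / "live" / "SKILL.md").write_text("run: verify-fix-loop\n", encoding="utf-8")
    (root / "archive" / "old.md").write_text("calls chain-runner.sh\n", encoding="utf-8")
    active_hits = find_references(root, "chain-runner")
    all_hits = find_references(root, "chain-runner", skip_dirs=())

print(active_hits)  # []
print(all_hits)     # ['archive/old.md']
```

An empty result for the active tree supports "not invoked from any active skill"; it says nothing about whether the runner file itself exists, which is a separate check.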
A9: "pi-orch HTTP dead but durable-runner port 3052 is the dispatch path"
- Probe:
lsof -i :8401, lsof -i :3052, launchctl list | grep "durable\|orchestrator"
- Output:
lsof -i :8401 → NO OUTPUT (port 8401 not listening; confirmed dead)
lsof -i :3052 → node PID 1185 LISTENING on *:apc-3052
launchctl:
  1185   0  com.alai.orchestrator-bridge (PID alive, exit 0)
  1212   0  com.john.durable-executor (PID 1212, exit 0)
  75750  0  com.john.pi-orchestrator (PID alive, exit 0)
  -      0  com.john.orchestrator-http (down_exit_0: duplicate)
- Verdict: PASS
- Note: Port 8401 confirmed dead. Port 3052 confirmed live (node PID 1185, 20-day uptime per P3.1). The synthesis's claim that durable-runner is the active dispatch path is confirmed structurally. However, P3.1 C1 and 4.2 Gap #2 both note that even the durable-runner shows no dispatch activity post-2026-03-19 — the pi-orchestrator log confirms "No eligible tasks" cycling. So "durable-runner is the dispatch path" is confirmed as the structural path, but it is also idle. The synthesis correctly notes dispatch is unclear via this path; 4.2 appropriately flags this ambiguity.
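A port-liveness check of this kind can also be scripted. The sketch below uses a plain TCP connect rather than lsof, so it is a different technique from the probe above; the function name and the demo socket are illustrative. As the note points out, a listening port only shows the process is structurally alive, not that it is dispatching.

```python
import socket


def is_listening(port, host="127.0.0.1", timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        return probe.connect_ex((host, port)) == 0


# Demo: open a throwaway listener on an OS-assigned port, probe it, close it, probe again.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
demo_port = server.getsockname()[1]

open_result = is_listening(demo_port)
server.close()
closed_result = is_listening(demo_port)

print(open_result, closed_result)
```

On the audited host, `is_listening(8401)` and `is_listening(3052)` would be the corresponding calls; dispatch activity still has to be confirmed from logs.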
A10: DISMISSED gaps — are they actually dismissable?
- Probe: Read 4.2 devils advocate dismissal reasoning for mem0 wire and verify-fix-loop; re-check CLAUDE.md for mem0 SoR designation
- Output:
mem0 SoR dismissal (4.2 Gap #4):
  grep -i "mem0" ~/.claude/CLAUDE.md → 0 matches (confirmed by 4.2)
  grep -i "System of Record\|SoR" ~/.claude/CLAUDE.md → 0 matches
  4.2 reasoning: ".md + LightRAG is INTENDED design; mem0 was never designated SoR"
  Evidence: lightrag-auto-ingest.sh hook explicitly routes .md → LightRAG (P1.1)
  Verdict on dismissal: SOUND. The mem0 SoR gap is a false positive. CLAUDE.md never designated mem0 as SoR. The .md pipeline is the designed path.
verify-fix-loop dismissal (4.2 Gap #3 downgraded to feature request):
  CLAUDE.md Hard Constraint #4: "mc.js ready → Proveo verification → done"
  Proveo IS wired via task-postflight (P2.2 confirms).
  verify-fix-loop is OPTIONAL enhancement, not required gate.
  4.2 reasoning: "The REQUIRED verification gate (Proveo) IS wired and working."
  Verdict on dismissal: SOUND. The required gate exists. CEO-as-verifier claim is overstated because Proveo gate IS the designed verifier; it's just H-tasks only and manual-invoked (per 2.1 Edge #12 PARTIAL). The dismissal is correct that verify-fix-loop is not a gap in required functionality.
Phantom companies dismissal of Lexicon (4.2 Gap #7):
  grep "Lexicon\|lexicon" ~/system/agents/specialist-mapping.json → NO OUTPUT
  This contradicts 4.2's claim that "Lexicon IS in specialist-mapping.json."
  4.2 states: "I found 'company: Lexicon' in the mapping with Dževad Jahić."
  Live grep returns nothing.
  P1.3 confirms: "skillforge.md maps to 'Skillforge' not Lexicon."
  Verdict: 4.2's Lexicon dismissal ERRS. Lexicon is NOT routable via specialist-mapping.json. The 4 phantom companies remain 4, not 3 as 4.2 claims. 4.2 hallucinated a Lexicon entry.
- Verdict: PARTIAL FAIL. The mem0 and verify-fix-loop dismissals are sound, but the Lexicon phantom-company dismissal is WRONG (4.2 claims Lexicon is mapped; live grep shows it is not).
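The Lexicon check amounts to asking whether a term appears anywhere in the mapping file. A structure-agnostic version of that check is sketched below. The real layout of specialist-mapping.json is not shown in this report, so the sample document is entirely hypothetical and only echoes the Skillforge detail quoted from P1.3.

```python
import json


def contains_term(node, term):
    """Case-insensitive search for term in any key or string value of parsed JSON."""
    term = term.lower()
    if isinstance(node, dict):
        return any(
            term in str(key).lower() or contains_term(value, term)
            for key, value in node.items()
        )
    if isinstance(node, list):
        return any(contains_term(item, term) for item in node)
    return isinstance(node, str) and term in node.lower()


# Hypothetical sample; the actual file's schema is an assumption.
sample = json.loads('{"agents": [{"file": "skillforge.md", "company": "Skillforge"}]}')

has_lexicon = contains_term(sample, "Lexicon")
has_skillforge = contains_term(sample, "skillforge")
print(has_lexicon, has_skillforge)  # False True
```

Searching the parsed structure rather than trusting a recalled entry is what separates the live probe from 4.2's claim.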
Confidence Grade
FEEDBACK: Three atoms FAILED with concrete evidence (A5: queue depth understated, 454 vs 3,150; A8: chain-runner.js and chain-runner.sh DO exist; A10: Lexicon phantom-company dismissal in 4.2 is wrong).
Summary
- Atoms passed: 7 / 10
- Atoms failed: 3 (A5, A8, A10-Lexicon)
- Confidence: FEEDBACK
- Feedback file written: /tmp/verifier-feedback-ai-factory-audit.md