Health Matrix

3.1 Health Matrix — Functional Probe Results 
 Audit date: 2026-05-09 | Auditor: sentinel-tester | Phase: P3 (functional probes) 
 
 
 
 
 Component 
 Test 
 Status 
 Evidence (cmd + snippet) 
 
 
 
 
 A1. mem0/qdrant 
 POST write (audit-test user) 
 PARTIAL 
 curl http://localhost:9000/add -d '{"text":"audit-2026-05-09 ping test","user_id":"audit-test"}' → {"result":{"results":[]},"status":"added"}. Read-back via /search returned count:1 but results:[], so the write was acknowledged but semantic search returned nothing. Write path acknowledged; retrieval path unreliable. 
 
 
 A2. LightRAG 
 GET /health + POST /query 
 WORKS 
 curl localhost:9621/health → {"status":"healthy","core_version":"1.4.16","pipeline_busy":false}. POST /query {"query":"what is ALAI","mode":"naive"} → 3-paragraph narrative with citations. Full round-trip confirmed. 
 
 
 A3. HiveDB intel 
 SELECT COUNT(*) FROM intel 
 WORKS 
 sqlite3 ~/system/databases/hivemind.db "SELECT COUNT(*) FROM intel;" → 17560. Latest entries dated 2026-05-09 19:11:24. Write side confirmed indirectly: hivemind.js query "ALAI" returned 8 results, including entries written today. Read and write both functional. 
 
 
 A3b. HiveMind writer 
 Confirm write path exists 
 WORKS 
 node ~/system/agents/hivemind/hivemind.js query "ALAI" → 8 live results with today's timestamps. Writer: daemon-fleet-watchdog posts alerts; email-agent posts task alerts. Multiple live writers confirmed. 
 
 
 A4. Chroma 
 chroma-mcp responsive 
 BROKEN 
 curl http://localhost:8000/api/v1/collections → no response (empty). Port 8000 not listening. No chroma process found. chroma-mcp listed in settings.json but no running service. 
 
 
 A5. .md auto-memory 
 Fresh writes landing? 
 PARTIAL 
 ls -la ~/.claude/projects/-Users-makinja/memory/ → most recent file mtime is 2026-04-30 16:45 (feedback_validation_enforcement_active). MEMORY.md itself was last written 2026-05-09 19:04 (today, by a John session). No daemon auto-writes .md files; writes are manual and session-driven only. Memory lands, but there is no auto-append pipeline. 
 
 
 B1. HiveMind read API 
 Any tool returns intel? 
 WORKS 
 node ~/system/agents/hivemind/hivemind.js read --limit 3 returns intel rows. hivemind.js query "ALAI" returns 8 records. P1 claim of "NO read API" is INCORRECT: the read API exists and functions. hivemind-mcp.js also exposes hivemind_read, hivemind_query, hivemind_semantic_query. 
 
 
 C1. pi-orchestrator 
 Process running? 
 PARTIAL 
 `ps aux 
 
 
 C2. pi-orch mock mode 
 Is it truly mock? 
 PARTIAL 
 grep "mock" ~/system/kernel/pi-orchestrator.js → no alai-config-mock.json reference found. Config shows offlineMode: false, enabled: true. Latest health state shows Verdict: CRITICAL (2026-05-06). Durable-runner bridge healthy. Process is running, but the HTTP port is silent and there are no dispatch logs after 2026-03-19. It is likely dispatching, but to a BROKEN downstream (Ollama). 
 
 
 D1. Verifier auto-invocation 
 verify-fix-loop grep 
 PARTIAL 
 grep -rn "verify-fix-loop" ~/.claude/skills/ → SKILL EXISTS at ~/.claude/skills/verify-fix-loop/SKILL.md. Skill is MANUAL-TRIGGER only ("Trigger phrases: verify-fix-loop, auto-verify and fix"). No daemon or hook auto-invokes it. P2 verdict ABSENT is partially wrong: skill exists but auto-invocation is absent. 
 
 
 E1. Library skill 
 node ~/system/tools/library.js list 
 WORKS 
 Returns 13 cookbooks (alai-full:33 skills, dev:17, business:12, security:10, etc.) + 11 defaults. Fully functional CLI. No external endpoint required for list. 
 
 
 F1. Mehanik gate 
 Token files past 7d 
 WORKS 
 ls /tmp/mehanik-cleared-* → 10 token files found, all from 2026-05-09. Most recent: mehanik-cleared-100173 created 18:29:30 today. Corresponding MC #100173 (Bilko landing pages UX audit) confirmed open+assigned to vizu. Token→dispatch correlation confirmed. 
 
 
 G1. com.alai.pi-orch-health 
 Daemon exit reason 
 BROKEN 
 launchctl print gui/501/com.alai.pi-orch-health → state: not running. Last health report Verdict: CRITICAL (2026-05-06). Scheduled health monitor is itself failing to run consistently. 
 
 
 G2. com.alai.cost-daily-report 
 Daemon exit reason 
 BROKEN 
 launchctl print gui/501/com.alai.cost-daily-report → state: not running. No exit code visible via launchctl; likely script dependency failure (BW session or Slack). 
 
 
 G3. com.alai.chain-phantom-detector 
 Script exists? 
 BROKEN 
 ls ~/system/daemons/chain-phantom-detector* → NOT FOUND. The plist references ~/system/tools/phantom-link-detector.js, so the script was either renamed or lives under a different name and directory than the one probed. Daemon is registered, but the script at the plist path was not verified. 
 
 
 G4. com.john.alaiml-retrain 
 Exit reason 
 BROKEN 
 state: not running. Script path: ~/ALAI/internal/projects/alaiML/scripts/retrain.sh, under the old ~/ALAI/ tree (now a symlink). The path may still resolve via the symlink, but the script likely fails on missing MLX or stale config. 
 
 
 G5. com.alai.weekly-planning 
 Script exists? 
 BROKEN 
 ls ~/system/daemons/weekly-planning* → NOT FOUND. plist references ~/system/tools/weekly-planning.sh. Script absent from daemons dir. 
 
 
 H1. RAG ingest queue 
 Current queue depth 
 PARTIAL 
 cat ~/system/state/rag-drain.prom → total 454 (bookstack:442, mc-outcomes:9, evidence:2, specs:1). NOTE: prom file mtime is 2026-04-23 17:59 — 16 days stale. rag-drain-worker went running→down_exit_256 today per HiveMind alert #64900. Queue depth of 454 is last known, not live. P1 claim of 946 appears to be an older snapshot. 
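
 The G1 to G5 rows above all rest on reading a state line out of launchctl print output. A minimal sketch of that extraction follows. It assumes the output contains a line of the form "state = not running" or "state: not running"; the exact layout is not reproduced in this report, so both separators are accepted, and the sample text in the usage lines is hypothetical.

```python
import re

# Assumption: the state appears on its own line as "state = <value>" or
# "state: <value>". The real launchctl print layout is not shown in this report.
STATE_LINE = re.compile(r"^\s*state\s*[=:]\s*(.+?)\s*$", re.MULTILINE)


def daemon_state(launchctl_output: str) -> str:
    """Return the first reported state, or "unknown" if no state line exists."""
    match = STATE_LINE.search(launchctl_output)
    return match.group(1) if match else "unknown"


def is_running(launchctl_output: str) -> bool:
    """True only when the reported state is exactly "running"."""
    return daemon_state(launchctl_output) == "running"


# Hypothetical sample resembling the evidence quoted for G1.
sample = "com.alai.pi-orch-health = {\n\tstate = not running\n}"
print(daemon_state(sample))
print(is_running(sample))
```

 A state of "not running" on its own does not say why the job stopped, which is why the G rows pair it with a second piece of evidence such as a missing script or a stale health report.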
 
 
 
 
 Summary Counts 
 
 
 
 Status 
 Count 
 
 
 
 
 WORKS 
 6 
 
 
 PARTIAL 
 6 
 
 
 BROKEN 
 6 
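
 Hand tallies of a matrix this size are easy to get wrong, so the counts can be recomputed from the Status column. The sketch below transcribes the status of each component ID from the matrix above; the dictionary and function names are illustrative only.

```python
from collections import Counter

# Status column transcribed from the Health Matrix, keyed by component ID.
MATRIX = {
    "A1": "PARTIAL", "A2": "WORKS", "A3": "WORKS", "A3b": "WORKS",
    "A4": "BROKEN", "A5": "PARTIAL", "B1": "WORKS", "C1": "PARTIAL",
    "C2": "PARTIAL", "D1": "PARTIAL", "E1": "WORKS", "F1": "WORKS",
    "G1": "BROKEN", "G2": "BROKEN", "G3": "BROKEN", "G4": "BROKEN",
    "G5": "BROKEN", "H1": "PARTIAL",
}


def summary_counts(matrix: dict) -> dict:
    """Count rows per status, always reporting all three statuses."""
    counts = Counter(matrix.values())
    return {status: counts.get(status, 0) for status in ("WORKS", "PARTIAL", "BROKEN")}


for status, count in summary_counts(MATRIX).items():
    print(f"{status:8s}{count}")
```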
 
 
 
 
 Surprises (Contradictions vs P1/P2) 
 1. HiveMind READ API EXISTS — P1 claim "no read API" is WRONG 
P1 (1.1-memory-plane.md) stated HiveMind has no read/query API. Ground truth: hivemind.js exposes read, query, semantic_query, and hybrid_query subcommands, all functional. hivemind-mcp.js wraps all of them as MCP tools. Live query returned 8 results dated today. This is the most significant P1/P2 contradiction. 
 2. pi-orchestrator HTTP port 8401 dead — process alive but silent 
The pi-orchestrator process (PID 75750) is running. Config shows httpPort: 8401, but port 8401 refuses connections. The active HTTP bridge is the durable-runner on port 3052 (uptime 1,726,326s = ~20 days). The kernel's own HTTP endpoint either never came up or stopped. Dispatch claims in P1/P2 must be qualified: the pi-orch kernel runs, but the HTTP control plane is served by a different process entirely. 
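
The "process alive, port dead" distinction can be checked independently of ps by attempting a TCP connection. The sketch below is one way to do that, along with the uptime conversion quoted above; the function names are illustrative, and the host is assumed to be the local machine.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def uptime_days(seconds: int) -> float:
    """Convert an uptime in seconds to days."""
    return seconds / 86_400


if __name__ == "__main__":
    for port in (8401, 3052):  # pi-orch httpPort and durable-runner bridge
        print(port, port_is_listening("127.0.0.1", port))
    print(round(uptime_days(1_726_326), 2))  # 19.98
```

A refused connection only shows that nothing accepts on that port; it does not show whether the HTTP server failed at startup or stopped later.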
 3. RAG queue: 454, not 946 — and the metric is 16 days stale 
 P1/P2 cited 946 queued. The prometheus file shows 454 and was last written 2026-04-23. The rag-drain-worker crashed today (exit 256). The queue is not draining, the metric is not being updated, and the actual backlog is unknown. True state: drainer is DOWN, queue age unknown. 
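
The staleness figure is plain date arithmetic on the metric file's mtime. A minimal sketch, using calendar dates taken from this report; the one-day freshness threshold is an assumption chosen for illustration, not a value from the system.

```python
from datetime import date


def staleness_days(last_written: date, probed_on: date) -> int:
    """Whole calendar days between the last metric write and the probe."""
    return (probed_on - last_written).days


def metric_is_stale(last_written: date, probed_on: date, max_age_days: int = 1) -> bool:
    """Assumed policy: a metric older than max_age_days is not live."""
    return staleness_days(last_written, probed_on) > max_age_days


prom_written = date(2026, 4, 23)  # mtime of rag-drain.prom per the probe
audit_date = date(2026, 5, 9)

print(staleness_days(prom_written, audit_date))   # 16
print(metric_is_stale(prom_written, audit_date))  # True
```

Checking the mtime before quoting the value is what separates "last known depth" from "current depth" here.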
 4. verify-fix-loop SKILL EXISTS — P2 "ABSENT" partially wrong 
 P2 said verifier auto-invocation is ABSENT. The skill ~/.claude/skills/verify-fix-loop/SKILL.md exists and is indexed. The verdict should be: skill exists as MANUAL-trigger, not auto-invoked by any daemon or hook. P2 was right about auto-invocation being absent but wrong to imply the capability doesn't exist at all. 
 5. mem0 write acknowledged but search returns empty 
mem0 write → status: added. Read-back search → count: 1 but results: []. The qdrant backend is running (health endpoint confirms backend: qdrant, collections: ["mem0migrations","sessions","hivemind","mem0_john","knowledge"]). The "audit-test" user_id has no collection of its own, so the add may have gone into a separate namespace that search does not cover. This is not necessarily a mem0 failure: the routing logic for a new user_id may differ from that for existing ones. Write side appears functional; retrieval for new users is unconfirmed. 
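
The PARTIAL verdict for mem0 follows from two observations: the add was acknowledged, and the search reported a count that its result list did not back up. The sketch below encodes that reasoning. The add response shape is taken from the evidence above; the search response shape ({"count": ..., "results": [...]}) is an assumption inferred from the "count:1 but results:[]" wording, and the function names are illustrative.

```python
def classify_mem0_probe(add_response: dict, search_response: dict) -> str:
    """Map a write + read-back pair onto the matrix statuses."""
    write_ok = add_response.get("status") == "added"
    hits = search_response.get("results") or []
    if not write_ok:
        return "BROKEN"
    return "WORKS" if hits else "PARTIAL"


def count_mismatch(search_response: dict) -> bool:
    """True when the reported count disagrees with the returned results."""
    hits = search_response.get("results") or []
    return search_response.get("count", 0) != len(hits)


add_resp = {"result": {"results": []}, "status": "added"}
search_resp = {"count": 1, "results": []}  # assumed shape, see note above

print(classify_mem0_probe(add_resp, search_resp))  # PARTIAL
print(count_mismatch(search_resp))                 # True
```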
 
 Open Questions 
 
 
 mem0 user_id routing: Does mem0 create a new Qdrant collection per user_id, and does search also need a pre-existing collection to return results? The audit-test user returned count:1 but empty results. Is this a namespace creation lag or a real retrieval bug? 
 
 
 pi-orch HTTP port 8401: Why is port 8401 not open even though the process is running? Is the HTTP server initialization gated behind a condition (Ollama health check, etc.) that's failing? 
 
 
 durable-runner bridge (port 3052) uptime 20 days: This is the actual dispatch layer. Is it processing tasks, or has it been idle since March? No recent task dispatch logs found post-2026-03-19. 
 
 
 rag-drain-worker exit 256: What is the exact failure? The queue at 454 is stale and not draining. LightRAG is healthy, so the ingest pipe is broken somewhere between the queue and LightRAG. 
 
 
 chain-phantom-detector plist vs actual script name: plist says phantom-link-detector.js. Is this the same script? Does it exist under tools/? 
 
 
 MEMORY.md auto-write: There is no daemon or hook that automatically appends to MEMORY.md. All memory entries are written manually by John during sessions. If a session ends without a write, the event is lost. Is this intentional or a gap?
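
 The script-path questions above (chain-phantom-detector, and likewise weekly-planning) could be settled by reading the path out of each plist and checking it on disk, rather than globbing a directory by daemon name. A minimal sketch, assuming the plists use the standard launchd keys Program and ProgramArguments; the function name and the sample job are hypothetical.

```python
import plistlib
from pathlib import Path


def missing_program_paths(plist_bytes: bytes) -> list:
    """Return path-like launchd arguments that do not exist on disk."""
    job = plistlib.loads(plist_bytes)
    args = list(job.get("ProgramArguments", []))
    if "Program" in job:
        args.insert(0, job["Program"])
    # Only absolute or home-relative arguments are treated as paths.
    candidates = [a for a in args if a.startswith(("/", "~"))]
    return [a for a in candidates if not Path(a).expanduser().exists()]


# Hypothetical job definition built in memory for illustration.
sample = plistlib.dumps({
    "Label": "com.example.audit-demo",
    "ProgramArguments": ["/bin/sh", "/nonexistent/audit-demo.sh"],
})
print(missing_program_paths(sample))
```

 An empty list means every referenced path resolves; it says nothing about whether the script then runs successfully.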