# 3.1 Health Matrix — Functional Probe Results
**Audit date:** 2026-05-09 | **Auditor:** sentinel-tester | **Phase:** P3 (functional probes)

---

## Health Matrix

| Component | Test | Status | Evidence (cmd + snippet) |
|---|---|---|---|
| **A1. mem0/qdrant** | POST write (audit-test user) | PARTIAL | `curl http://localhost:9000/add -d '{"text":"audit-2026-05-09 ping test","user_id":"audit-test"}'` → `{"result":{"results":[]},"status":"added"}`. Read-back via `/search` returned `count:1` but `results:[]`. The write was acknowledged, but the memory could not be retrieved through semantic search, so the read path is unreliable for this user. |
| **A2. LightRAG** | GET /health + POST /query | WORKS | `curl localhost:9621/health` → `{"status":"healthy","core_version":"1.4.16","pipeline_busy":false}`. POST /query `{"query":"what is ALAI","mode":"naive"}` → 3-paragraph narrative with citations. Full round-trip confirmed. |
| **A3. HiveDB intel** | SELECT COUNT(*) FROM intel | WORKS | `sqlite3 ~/system/databases/hivemind.db "SELECT COUNT(*) FROM intel;"` → `17560`. Latest entries dated 2026-05-09 19:11:24. Write-side confirmed via `hivemind.js query "ALAI"` — 8 results returned, including entries written today. Read AND write both functional. |
| **A3b. HiveMind writer** | Confirm write path exists | WORKS | `node ~/system/agents/hivemind/hivemind.js query "ALAI"` → 8 live results with today's timestamps. Writer: daemon-fleet-watchdog posts alerts; email-agent posts task alerts. Multiple live writers confirmed. |
| **A4. Chroma** | chroma-mcp responsive | BROKEN | `curl http://localhost:8000/api/v1/collections` → no response (empty). Port 8000 not listening. No chroma process found. chroma-mcp listed in settings.json but no running service. |
| **A5. .md auto-memory** | Fresh writes landing? | PARTIAL | `ls -la ~/.claude/projects/-Users-makinja/memory/`: the most recent file other than `MEMORY.md` has mtime `2026-04-30 16:45` (feedback_validation_enforcement_active). `MEMORY.md` itself was last written `2026-05-09 19:04` (today, by John's session). No daemon that auto-writes .md files was found; writes are manual/session-driven only. Memory lands, but there is no auto-append pipeline. |
| **B1. HiveMind read API** | Any tool returns intel? | WORKS | `node ~/system/agents/hivemind/hivemind.js read --limit 3` returns intel rows. `hivemind.js query "ALAI"` returns 8 records. P1 claim of "NO read API" is INCORRECT — read API exists and functions. hivemind-mcp.js also exposes `hivemind_read`, `hivemind_query`, `hivemind_semantic_query`. |
| **C1. pi-orchestrator** | Process running? | PARTIAL | `ps aux \| grep pi-orchestrator` → PID 75750, running since Fri 12pm. However, `curl http://localhost:8401/health` → CONNECTION REFUSED: the config sets `httpPort: 8401`, but nothing is listening on that port. Dispatch activity is unclear; the latest entry under `orchestrator/` is dated 2026-03-19. The durable-runner bridge (port 3052) IS live: `{"status":"ok","service":"orchestrator-http","version":"2.0.0","uptime":1726326s}`. |
| **C2. pi-orch mock mode** | Is it truly mock? | PARTIAL | `grep "mock" ~/system/kernel/pi-orchestrator.js` found no reference to `alai-config-mock.json`, and the config has `offlineMode: false`, `enabled: true`, so it does not appear to be running in mock mode. Latest health state shows `Verdict: CRITICAL` (2026-05-06). The durable-runner bridge is healthy. The process is running, but the HTTP port is silent and there are no dispatch logs after 2026-03-19. It is likely dispatching, but to a BROKEN downstream (Ollama). |
| **D1. Verifier auto-invocation** | verify-fix-loop grep | PARTIAL | `grep -rn "verify-fix-loop" ~/.claude/skills/` → SKILL EXISTS at `~/.claude/skills/verify-fix-loop/SKILL.md`. Skill is MANUAL-TRIGGER only — "Trigger phrases: verify-fix-loop, auto-verify and fix". No daemon or hook auto-invokes it. P2 verdict ABSENT is partially wrong: skill exists but auto-invocation is absent. |
| **E1. Library skill** | node ~/system/tools/library.js list | WORKS | Returns 13 cookbooks (alai-full:33 skills, dev:17, business:12, security:10, etc.) + 11 defaults. Fully functional CLI. No external endpoint required for `list`. |
| **F1. Mehanik gate** | Token files from the past 7d | WORKS | `ls /tmp/mehanik-cleared-*` → 10 token files found, all from 2026-05-09. Most recent: `mehanik-cleared-100173` created 18:29:30 today. Corresponding MC #100173 (Bilko landing pages UX audit) confirmed open+assigned to vizu. Token→dispatch correlation confirmed. |
| **G1. com.alai.pi-orch-health** | Daemon exit reason | BROKEN | `launchctl print gui/501/com.alai.pi-orch-health` → `state: not running`. Last health report `Verdict: CRITICAL` (2026-05-06). Scheduled health monitor is itself failing to run consistently. |
| **G2. com.alai.cost-daily-report** | Daemon exit reason | BROKEN | `launchctl print gui/501/com.alai.cost-daily-report` → `state: not running`. No exit code visible via launchctl; likely script dependency failure (BW session or Slack). |
| **G3. com.alai.chain-phantom-detector** | Script exists? | BROKEN | `ls ~/system/daemons/chain-phantom-detector*` → NOT FOUND. The plist references `~/system/tools/phantom-link-detector.js`, a different name and directory, so the script may have been renamed or moved. The daemon is registered, but the existence of the script at the plist path was not verified. |
| **G4. com.john.alaiml-retrain** | Exit reason | BROKEN | `state: not running`. Script path: `~/ALAI/internal/projects/alaiML/scripts/retrain.sh` — path under old `~/ALAI/` tree (now symlink). Path itself may still resolve via symlink, but script likely fails on missing MLX or stale config. |
| **G5. com.alai.weekly-planning** | Script exists? | BROKEN | `ls ~/system/daemons/weekly-planning*` → NOT FOUND. The plist references `~/system/tools/weekly-planning.sh`. No matching script exists in the daemons dir, and the evidence here does not include a check of the tools path. |
| **H1. RAG ingest queue** | Current queue depth | PARTIAL | `cat ~/system/state/rag-drain.prom` → total **454** (bookstack:442, mc-outcomes:9, evidence:2, specs:1). NOTE: prom file mtime is **2026-04-23 17:59** — 16 days stale. rag-drain-worker went `running→down_exit_256` today per HiveMind alert #64900. Queue depth of 454 is last known, not live. P1 claim of 946 appears to be an older snapshot. |
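
Rows G1 to G5 depend on reading `launchctl print` output by eye. The sketch below shows one way to pull the state and last exit code out of captured output so the check can be repeated. It assumes the output uses `key = value` lines such as `state = not running` and `last exit code = 1`; the sample text is illustrative and was not captured during this audit, and `parse_launchctl_print` is a hypothetical helper name.

```python
import re

def parse_launchctl_print(text: str) -> dict:
    """Extract 'state' and 'last exit code' from `launchctl print` text.

    Assumes 'key = value' lines; keys that are absent come back as None.
    Only the first occurrence of each key is kept.
    """
    fields = {"state": None, "last exit code": None}
    for line in text.splitlines():
        match = re.match(r"^\s*(state|last exit code)\s*=\s*(.+?)\s*$", line)
        if match and fields[match.group(1)] is None:
            fields[match.group(1)] = match.group(2)
    return fields

# Illustrative sample only, not output captured during this audit.
SAMPLE = """\
gui/501/com.example.demo = {
    state = not running
    last exit code = 1
}
"""

info = parse_launchctl_print(SAMPLE)
print(info)
```

If the real output carries no exit-code line for a service, the helper returns `None` for that key, which matches the "no exit code visible" situation noted for G2.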

---

## Summary Counts

| Status | Count |
|--------|-------|
| WORKS | 6 |
| PARTIAL | 6 |
| BROKEN | 6 |
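
The tally can be recomputed from the matrix rather than counted by hand. The sketch below transcribes the status of each of the 18 rows and counts them, treating A3b as its own row (an assumption about how the summary is meant to be counted).

```python
from collections import Counter

# Status per matrix row, transcribed from the Health Matrix table.
MATRIX = {
    "A1": "PARTIAL", "A2": "WORKS", "A3": "WORKS", "A3b": "WORKS",
    "A4": "BROKEN", "A5": "PARTIAL", "B1": "WORKS", "C1": "PARTIAL",
    "C2": "PARTIAL", "D1": "PARTIAL", "E1": "WORKS", "F1": "WORKS",
    "G1": "BROKEN", "G2": "BROKEN", "G3": "BROKEN", "G4": "BROKEN",
    "G5": "BROKEN", "H1": "PARTIAL",
}

counts = Counter(MATRIX.values())
for status in ("WORKS", "PARTIAL", "BROKEN"):
    print(f"{status}: {counts[status]}")
print(f"total rows: {len(MATRIX)}")
```

With A3b counted separately, each status comes out at 6 across 18 rows; folding A3b into A3 would give 5 WORKS across 17 rows.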

---

## Surprises (Contradictions vs P1/P2)

### 1. HiveMind READ API EXISTS — P1 claim "no read API" is WRONG
P1 (1.1-memory-plane.md) stated HiveMind has no read/query API. Ground truth: `hivemind.js` exposes `read`, `query`, `semantic_query`, `hybrid_query` subcommands, all functional. `hivemind-mcp.js` wraps all of them as MCP tools. Live query returned 8 results dated today. This is the most significant P1/P2 contradiction.

### 2. pi-orchestrator HTTP port 8401 dead — process alive but silent
The pi-orchestrator process (PID 75750) is running. Config shows `httpPort: 8401`. Port 8401 refuses connections. The *actual* active HTTP bridge is the durable-runner on port 3052 (`uptime 1,726,326s = ~20 days`). The kernel's own HTTP endpoint never came up, or stopped. Dispatch claims in P1/P2 must be qualified: pi-orch kernel runs, but HTTP control plane uses a different process entirely.
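
The distinction between "process alive" and "port listening" can be checked without `curl`. The following is a minimal sketch of a TCP connect probe for the two ports named above; the helper name `port_accepts_connections` and the one-second timeout are illustrative choices, not part of the audited system.

```python
import socket

def port_accepts_connections(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connect to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False

if __name__ == "__main__":
    # Ports taken from the finding above: 8401 (pi-orchestrator httpPort)
    # and 3052 (durable-runner bridge).
    for name, port in (("pi-orchestrator", 8401), ("durable-runner", 3052)):
        state = "open" if port_accepts_connections("127.0.0.1", port) else "closed"
        print(f"{name} :{port} -> {state}")
```

A successful connect only shows that something is listening; it says nothing about whether that listener is dispatching work, which is why the dispatch question stays open.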

### 3. RAG queue: 454, not 946 — and the metric is 16 days stale
P1/P2 cited 946 queued. The Prometheus metrics file (`~/system/state/rag-drain.prom`) shows 454 and was last written 2026-04-23, 16 days before this audit. The rag-drain-worker went down today (`down_exit_256`). The queue is not draining, the metric is not being updated, and the actual backlog is unknown. True state: the drainer is DOWN and the live queue depth is unknown.
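
Both numbers in this finding are simple to recompute. The sketch below sums per-source queue depths from a Prometheus-style text file and computes how stale the file is. The metric name `rag_drain_queue_depth` and its `source` label are assumptions made for illustration, since the audit does not quote the file contents; only the per-source values and the two dates come from the probe.

```python
from datetime import date

# Hypothetical file contents: the metric name and label are assumptions,
# only the per-source values come from the audit.
SAMPLE_PROM = """\
# TYPE rag_drain_queue_depth gauge
rag_drain_queue_depth{source="bookstack"} 442
rag_drain_queue_depth{source="mc-outcomes"} 9
rag_drain_queue_depth{source="evidence"} 2
rag_drain_queue_depth{source="specs"} 1
"""

def total_queue_depth(prom_text: str, metric: str = "rag_drain_queue_depth") -> int:
    """Sum every sample of `metric`, skipping comments and other metrics.

    Assumes each sample line is '<name>{labels} <value>' with no timestamp.
    """
    total = 0
    for line in prom_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name_and_labels, _, value = line.rpartition(" ")
        if name_and_labels.split("{", 1)[0] == metric:
            total += int(float(value))
    return total

def days_stale(last_written: date, audit_day: date) -> int:
    """Whole days between the file's last write and the audit date."""
    return (audit_day - last_written).days

print(total_queue_depth(SAMPLE_PROM))                    # 454
print(days_stale(date(2026, 4, 23), date(2026, 5, 9)))   # 16
```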

### 4. verify-fix-loop SKILL EXISTS — P2 "ABSENT" partially wrong
P2 said verifier auto-invocation is ABSENT. The skill `~/.claude/skills/verify-fix-loop/SKILL.md` exists and is indexed. The verdict should be: skill exists as MANUAL-trigger, not auto-invoked by any daemon or hook. P2 was right about auto-invocation being absent but wrong to imply the capability doesn't exist at all.

### 5. mem0 write acknowledged but search returns empty
mem0 write → `status: added`. Read-back search → `count: 1` but `results: []`. The qdrant backend is running (the health endpoint confirms `backend: qdrant`, collections: `["mem0migrations","sessions","hivemind","mem0_john","knowledge"]`). The `audit-test` user_id has no collection, so the add may have gone into a separate namespace that the search does not cover. The add response itself also carried an empty inner `results` list, so it is worth confirming that anything was stored at all. This is not necessarily a mem0 failure: the routing logic for a new user_id may differ from the logic for existing ones. The write side appears functional; retrieval for new users is unconfirmed.
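
The grading rule applied to this probe can be stated as code. The sketch below classifies a write/read-back pair using only the fields quoted in the audit (`status` on the add response, `count` and `results` on the search response); the function name and the exact grading rule are illustrative assumptions, not the auditor's tooling.

```python
def classify_roundtrip(add_resp: dict, search_resp: dict) -> str:
    """Grade a write/read-back probe as WORKS, PARTIAL, or BROKEN."""
    write_acked = add_resp.get("status") == "added"
    hits = search_resp.get("results") or []
    if not write_acked:
        return "BROKEN"
    # Acknowledged write with no retrievable hits is only a partial pass.
    return "WORKS" if hits else "PARTIAL"

# Payloads as quoted in the probe; any search fields beyond `count` and
# `results` are not shown in the audit, so only those two keys are modeled.
add_resp = {"result": {"results": []}, "status": "added"}
search_resp = {"count": 1, "results": []}

print(classify_roundtrip(add_resp, search_resp))  # PARTIAL
```

The rule deliberately ignores `count`, since a non-zero count with an empty `results` list is exactly the inconsistency under investigation.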

---

## Open Questions

1. **mem0 user_id routing**: Does mem0 create a new Qdrant collection per user_id, and does search also need a pre-existing collection to return results? The `audit-test` user returned `count:1` but empty results — is this a namespace creation lag or a real retrieval bug?

2. **pi-orch HTTP port 8401**: Why is port 8401 not open even though the process is running? Is the HTTP server initialization gated behind a condition (Ollama health check, etc.) that's failing?

3. **durable-runner bridge (port 3052) uptime 20 days**: This is the actual dispatch layer. Is it processing tasks, or has it been idle since March? No recent task dispatch logs found post-2026-03-19.

4. **rag-drain-worker exit 256**: What is the exact failure? The queue at 454 is stale and not draining. LightRAG is healthy. The ingest pipe is broken somewhere between queue and LightRAG.

5. **chain-phantom-detector plist vs actual script name**: plist says `phantom-link-detector.js`. Is this the same script? Does it exist under tools/?

6. **MEMORY.md auto-write**: There is no daemon or hook that automatically appends to MEMORY.md. All memory entries are written manually by John during sessions. If a session ends without a write, the event is lost. Is this intentional or a gap?
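
On open question 4: POSIX exit codes occupy 8 bits (0 to 255), so `256` cannot be a literal exit code. One possible reading, which is an assumption and not confirmed by the alert, is that `256` is a raw wait status as returned by `wait()`/`waitpid()`. The sketch below decodes such a status with Python's standard `os` helpers (Unix only).

```python
import os

def decode_wait_status(status: int) -> str:
    """Describe a raw POSIX wait status (as returned by os.wait / waitpid)."""
    if os.WIFEXITED(status):
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        return f"killed by signal {os.WTERMSIG(status)}"
    return "unknown status"

# Assumption: the 256 in `down_exit_256` is a raw wait status.
print(decode_wait_status(256))
```

Under that reading the worker exited normally with code 1, a generic failure, so the worker's own logs would still be needed to find the cause.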