# Inventory: Daemon Fleet

# AI Factory Daemon Fleet Audit — 2026-05-09

**Auditor:** kelsey-hightower  
**Timestamp:** 2026-05-09T20:48 UTC  
**Source of truth:** `launchctl list` + `daemon-fleet-status.json` (generated 2026-05-09T18:33:52Z) + plist reads + error log sampling  
**Fleet size (watchdog):** 148 tracked entries | 47 running keepalive | 74 calendar_ok | 3 down | 20 erroring | 4 not_loaded  
**Fleet size (launchctl live):** 168 rows matching alai/john/no.alai pattern (includes daemons not in watchdog)

---

## 1. Live Exit-Code Matrix

Column key: PID (`-` = not running) | Last Exit | Plist location | KeepAlive policy | Schedule

### 1a. RUNNING (PID alive; last recorded exit 0, -15/SIGTERM, or a prior non-zero exit)

| Daemon | PID | Exit | Plist Path | KeepAlive | Schedule |
|--------|-----|------|-----------|-----------|----------|
| com.alai.agent-timeout-monitor | 1163 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.cc-api-server | 1183 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.credit-monitor | 1223 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.idle-learning-daemon | 1196 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.litestream | 51452 | 0 | Library/LaunchAgents | always | continuous |
| com.alai.mem0-server | 65706 | -15 (SIGTERM) | Library/LaunchAgents | always | continuous |
| com.alai.mlx-gemma4 | 27321 | 0 | (not in known dirs) | always | continuous |
| com.alai.mlx-qwen25-coder-32b | 31120 | 0 | (not in known dirs) | always | continuous |
| com.alai.mlx-qwen3-32b | 29227 | 0 | (not in known dirs) | always | continuous |
| com.alai.mlx-qwen3-8b | 29488 | 0 | (not in known dirs) | always | continuous |
| com.alai.ollama-serve-v2 | 29100 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.orchestrator-bridge | 1185 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.ram-monitor | 1241 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.task-router | 1200 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.web-learning | 1176 | 0 | system/daemons/launchagents | always | continuous |
| com.john.bookstack-webhook-relay | 1206 | 0 | system/daemons/launchagents | always | continuous |
| com.john.browser-worker | 1211 | 0 | system/daemons/launchagents | always | continuous |
| com.john.caddy-vault | 86082 | 0 | system/daemons/launchagents | always | continuous |
| com.john.cloudflared | 79617 | 0 | system/daemons/launchagents | always | continuous |
| com.john.comms-agent | 1186 | 0 | system/daemons/launchagents | always | continuous |
| com.john.documenso-webhook | 20561 | 0 | system/daemons/launchagents | always | continuous |
| com.john.durable-executor | 1212 | 0 | system/daemons/launchagents | always | continuous |
| com.john.edita-loop | 61758 | 0 | system/daemons/launchagents | always | continuous |
| com.john.email-agent | 92225 | 0 | system/daemons/launchagents | calendar | calendar |
| com.john.email-tracker | 11292 | 0 | system/daemons/launchagents | conditional | conditional |
| com.john.event-dispatcher | 65452 | 0 | system/daemons/launchagents | always | continuous |
| com.john.health-dashboard | 1189 | 0 | system/daemons/launchagents | always | continuous |
| com.john.hook-daemon | 1240 | 0 | system/daemons/launchagents | always | continuous |
| com.john.intake-watcher | 41929 | 0 | system/daemons/launchagents | always | continuous |
| com.john.kenan-hot-web | 1231 | 0 | system/daemons/launchagents | always | continuous |
| com.john.llm-datasette | 1170 | 0 | system/daemons/launchagents | always | continuous |
| com.john.mc-dashboard | 65673 | 0 | system/daemons/launchagents | always | continuous |
| com.john.n8n | 1203 | 0 | system/daemons/launchagents | always | continuous |
| com.john.network-watchdog | 1194 | 0 | system/daemons/launchagents | always | continuous |
| com.john.ops-watchdog | 8782 | -15 (SIGTERM) | system/daemons/launchagents | always | continuous |
| com.john.outbox-processor | 1190 | 0 | system/daemons/launchagents | always | continuous |
| com.john.paste-logger | 1224 | 0 | system/daemons/launchagents | always | continuous |
| com.john.pi-orchestrator | 75750 | 0 | system/daemons/launchagents | always | continuous |
| com.john.slack-bot | 18046 | 1 (last crash exit) | system/daemons/launchagents | always | continuous |
| com.john.tender-dashboard | 1234 | 0 | system/daemons/launchagents | always | continuous |
| com.john.tool-shed | 1191 | 0 | system/daemons/launchagents | always | continuous |
| com.john.vault-keeper | 87005 | 0 | system/daemons/launchagents | always | continuous |
| com.john.vault-proxy | 1222 | 0 | system/daemons/launchagents | always | continuous |
| com.john.youtube-nightly-learning | 83439 | 0 | system/daemons/launchagents | always | continuous |
| no.alai.claude-proxy | 6361 | 0 | Library/LaunchAgents | always | continuous |
| com.alai.rag-drain-worker | 3640 | 1 (prev exit) | system/config/launchagents | always | continuous |
| com.alai.rag-fsevents-adapter | 64755 | 1 (prev exit) | system/config/launchagents | conditional | WatchPaths |
| com.alai.daemon-fleet-watchdog | 2815 | 0 | (Library/LaunchAgents) | calendar | every 15min |

### 1b. DOWN — Exit 0 (intentional one-shot or conditional)

| Daemon | PID | Exit | Notes |
|--------|-----|------|-------|
| com.john.autocoder-ui | - | 0 | down_exit_0: one-shot complete |
| com.john.draft-sender | - | 0 | down_exit_0: conditional, no pending drafts |
| com.john.orchestrator-http | - | 0 | down_exit_0: DUPLICATE — orchestrator-bridge runs same script on port 3052 |

### 1c. CALENDAR SCHEDULED — Exit 0 last run (healthy)

These fired successfully on last scheduled run. Not exhaustively listed — watchdog confirms 74 in this state.  
Key members: `com.alai.apply-knowledge`, `com.alai.archive-first-scan`, `com.alai.chain-weekly-report`, `com.alai.docker-watchdog`, `com.alai.gcloud-auth`, `com.alai.john-daily-digest`, `com.alai.lightrag-backup`, `com.alai.memory-watchdog`, `com.alai.meta-agent-loop`, `com.alai.restore-drill`, `com.alai.skill-audit`, `com.alai.team-sync`, `com.alai.wal-checkpoint`, `com.alai.weekly-planning`, `com.alai.zombie-cleanup`, `com.john.agentforge`, `com.john.bookstack-sync`, `com.john.calendar-bridge`, `com.john.critical-tools-healthcheck`, `com.john.daemon-health`, `com.john.db-archival-sweep`, `com.john.db-backup`, `com.john.domain-audit`, `com.john.drift-detector`, `com.john.email-briefing`, `com.john.forge-watchdog`, `com.john.log-rotate`, `com.john.mc-session-worker`, `com.john.morning-routine`, `com.john.offsite-backup`, `com.john.pi2-override-audit`, `com.john.review-drain`, `com.john.session-archiver`, `com.john.session-extractor`, `com.john.spam-recovery-scan`, `com.john.system-guardian`, `com.john.tldr-actionizer`, `com.john.tldr-briefing`, `com.john.tldr-watch`, `com.john.tldr-weekly-synthesis`, `com.john.weekly-synthesis`, `no.alai.email-body-integrity`, `no.alai.meta-agent`, `no.alai.resolver`, `no.alai.spend-guard`.

### 1d. FAILING — Non-zero exit codes

| Daemon | PID | Exit Code | Plist Location | KeepAlive | Schedule |
|--------|-----|-----------|---------------|-----------|----------|
| com.alai.azure-db-backup | - | 1 (exit 256 internal) | system/config/launchagents | none (RunAtLoad=false) | every 4h |
| com.alai.blueprint-fleet-watchdog | - | 1 (exit 256) | Library/LaunchAgents | none | daily 06:15 |
| com.alai.cert-expiry-monitor | - | 1 (exit 256) | system/config/launchagents | none | daily 07:00 |
| com.alai.chain-daily-inbox | - | 1 (exit 256) | Library/LaunchAgents | none | daily 07:00 |
| com.alai.chain-e2e-nightly | - | 1 (exit 256) | Library/LaunchAgents | none | daily 02:00 |
| com.alai.chain-phantom-detector | - | 1 (exit 256) | Library/LaunchAgents | none | every 15min |
| com.alai.cost-daily-report | - | 127 | Library/LaunchAgents | none | daily 23:55 |
| com.alai.daily-planning | - | 127 | Library/LaunchAgents | none | daily 07:30 |
| com.alai.filesystem-audit | - | 1 (exit 256) | Library/LaunchAgents | none | Monday 08:00 |
| com.alai.pi-orch-health | - | 127 | Library/LaunchAgents | none | daily 23:00 |
| com.alai.rag-bookstack-adapter | - | 1 (exit 256) | system/config/launchagents | none | every 5min |
| com.alai.rag-drain-worker | 3640 | 1 (prev exit, now running) | system/config/launchagents | always | continuous |
| com.alai.rag-fsevents-adapter | 64755 | 1 (prev exit, now running) | system/config/launchagents | conditional | WatchPaths |
| com.alai.rag-mc-adapter | - | 1 (exit 256) | system/config/launchagents | none | every 5min |
| com.alai.rdap-audit-quarterly | - | 2 | Library/LaunchAgents | none | quarterly |
| com.john.alaiml-retrain | - | 1 (exit 256) | system/config/launchagents + Library/LaunchAgents | none | 1st of month 03:00 |
| com.john.auto-verify-regression | - | 1 (exit 256) | system/daemons/launchagents | none | daily 06:00 |
| com.john.b2-offsite-backup | - | 1 (exit 256) | system/daemons/launchagents | none | daily 03:30 |
| com.john.bookstack-staleness | - | 1 (exit 256) | system/daemons/launchagents | none | Sunday 22:00 |
| com.john.infra-drift-detector | - | 1 (exit 256) | system/daemons/launchagents | none | Sunday 04:00 |
| com.john.legal-docs-azure-sync | - | 127 | Library/LaunchAgents | Crashed=true | daily 02:00 |
| com.john.lightrag-monitor | - | 2 | system/config/launchagents | none | daily 09:00 |
| com.john.mcp-health-check | - | 127 | Library/LaunchAgents | Crashed=true | every 1h |
| com.john.slack-bot | 18046 | 1 (last crash) | system/daemons/launchagents | always | continuous |

### 1e. NOT LOADED (watchdog knows them, launchctl does not)

| Daemon | State |
|--------|-------|
| com.alai.lightrag-migrate-pump | not_loaded |
| com.alai.lightrag-outbox-ingest | not_loaded |
| com.alai.lightrag-watchdog | not_loaded |
| com.john.rdap-audit-quarterly | not_loaded |

---

## 2. Failure Cohort — Root Cause Analysis

### EXIT 127 — Script/binary not found (BROKEN — script deleted)

These five daemons have plists in Library/LaunchAgents that point to scripts no longer present on disk. Exit 127 is the shell's "command not found" status: the script path itself is gone.

| Daemon | Missing Script | Last Successful Run | Category |
|--------|---------------|--------------------|----------|
| com.alai.pi-orch-health | `~/system/tools/pi-orch-health.sh` | 2026-05-06 (verdict: CRITICAL) | BROKEN |
| com.alai.cost-daily-report | `~/system/tools/cost-daily-report.sh` | 2026-04-29 | BROKEN |
| com.alai.daily-planning | `~/system/tools/daily-planning.sh` | unknown | BROKEN |
| com.john.legal-docs-azure-sync | `~/system/daemons/legal-docs-azure-sync.sh` | unknown | BROKEN |
| com.john.mcp-health-check | `~/system/tools/mcp-health-check.sh` | unknown | BROKEN |

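**Note on legal-docs-azure-sync and mcp-health-check:** Both have `KeepAlive.Crashed=true`, which tells launchd to restart the job after a crash. If launchd counts the exit 127 as a crash, they are in a guaranteed (throttled) restart loop that wastes process spawns indefinitely. `launchd.plist(5)` describes `Crashed` as applying to exits caused by a signal, so confirm that respawns are actually occurring before treating this as a loop.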
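The exit 127 cohort could in principle be caught before launchd ever fires the job, by checking that every path named in a plist still exists. The following is a minimal sketch of that check, not the watchdog's actual logic. It relies on the standard launchd keys `Program` and `ProgramArguments`; the function name and the rule "only arguments starting with `/` or `~` are paths" are assumptions made for illustration.

```python
import os
import plistlib


def missing_program_paths(plist_bytes: bytes) -> list[str]:
    """Return path-like Program/ProgramArguments entries that are absent on disk."""
    job = plistlib.loads(plist_bytes)
    candidates = []
    if "Program" in job:
        candidates.append(job["Program"])
    candidates.extend(job.get("ProgramArguments", []))

    missing = []
    for arg in candidates:
        # Assumption: only absolute or home-relative arguments are file paths.
        # Flags such as "--enqueue" and bare words are skipped.
        if not isinstance(arg, str) or not arg.startswith(("/", "~")):
            continue
        if not os.path.exists(os.path.expanduser(arg)):
            missing.append(arg)
    return missing


if __name__ == "__main__":
    # Hypothetical job definition, used only to exercise the function.
    example = plistlib.dumps({
        "Label": "com.example.demo",
        "ProgramArguments": ["/bin/sh", "/nonexistent/demo-script.sh"],
    })
    print(missing_program_paths(example))
```

Run against each plist directory on a schedule, a check like this would also flag jobs whose script is gone but whose last recorded exit is still 0.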

### EXIT 1 / 256 — Script exists but fails at runtime (BROKEN — dependency missing)

| Daemon | Script | Root Cause | Category |
|--------|--------|-----------|----------|
| com.alai.rag-bookstack-adapter | `rag-bookstack-adapter.js` | Queue depth 946 > 500 backpressure gate — never drains because drain-worker cannot reach LightRAG | BROKEN (cascade) |
| com.alai.rag-drain-worker | `rag-drain-worker.js` | Vaultwarden ETIMEDOUT → CF credentials unavailable → LightRAG unreachable | BROKEN |
| com.alai.rag-mc-adapter | `rag-mc-adapter.js` | Same backpressure cascade, queue depth 946 | BROKEN (cascade) |
| com.alai.rag-fsevents-adapter | `rag-fsevents-adapter.js` | Queue depth >500 backpressure, runs but skips all enqueues | BROKEN (cascade) |
| com.alai.azure-db-backup | `azure-db-backup.sh` | `az storage blob upload` SIGTERM'd (line 116); temp dirs leaked in /tmp | TRANSIENT |
| com.alai.cert-expiry-monitor | `cert-expiry-monitor.sh` | Script exists, no error log found — likely network/curl failure | TRANSIENT |
| com.alai.chain-daily-inbox | `chain-runner.sh --enqueue daily-inbox-triage` | chain-runner.sh exists; failure likely in downstream chain execution | TRANSIENT |
| com.alai.chain-e2e-nightly | `chain-e2e-nightly.sh` | Script exists; likely Playwright/network dependency failure | TRANSIENT |
| com.alai.chain-phantom-detector | `phantom-link-detector.js` | Script does NOT exist on disk — MISSING | BROKEN |
| com.alai.filesystem-audit | `~/bin/anvil-audit.sh` | Script exists; last exit 256 may be diff/rename limit warning elevated to exit | TRANSIENT |
| com.alai.blueprint-fleet-watchdog | `~/system/daemons/blueprint-fleet-watchdog.js` | Script exists; likely a missing dep or API auth failure | TRANSIENT |
| com.john.alaiml-retrain | `~/ALAI/internal/projects/alaiML/scripts/retrain.sh` | Script exists; DUPLICATE plist (both config and Library/LaunchAgents); likely venv path or MC dep failure | BROKEN (duplicate) |
| com.john.auto-verify-regression | `auto-verify-regression.js` | Script exists; calls `claim-verifier.js` — probable missing dep or API failure | TRANSIENT |
| com.john.b2-offsite-backup | `b2-offsite-backup.sh` | B2 storage cap EXCEEDED (403 storage_cap_exceeded) and auth token limit errors | BROKEN (infra) |
| com.john.bookstack-staleness | `bookstack-staleness.js` | API parse error "Unexpected end of JSON input" on page 2553+ — BookStack API truncating responses | BROKEN |
| com.john.infra-drift-detector | `infra-drift-detector.sh` | `diff.renameLimit` warning elevated to non-zero exit; git rename detection failing on large repos | TRANSIENT |
| com.john.slack-bot | `(node process)` | WebSocket pong timeouts (ETIMEDOUT); process alive and heartbeating, but launchd saw a crash exit | TRANSIENT |

### EXIT 2 — Logic/health failure

| Daemon | Script | Root Cause | Category |
|--------|--------|-----------|----------|
| com.alai.rdap-audit-quarterly | plist not found in known dirs | Script path unknown, likely MISSING | BROKEN |
| com.john.lightrag-monitor | `lightrag-health-with-alert.sh` | Script exits 1/2 when LightRAG is degraded. This is intentional alerting behavior, and LightRAG is in fact degraded | EXPECTED (alarm correctly firing) |

---

## 3. Producer-Consumer Wiring

### RAG Ingest Pipeline (currently DEADLOCKED)

```
com.alai.rag-fsevents-adapter   watches ~/system/evidence, ~/system/specs, ~/system/rules
com.alai.rag-bookstack-adapter  polls BookStack API every 5min
com.alai.rag-mc-adapter         reads ~/system/logs/mc-task-outcomes.jsonl
  --> all three WRITE to ~/system/state/ingest-queue.sqlite (queue depth: 946, frozen)

com.alai.rag-drain-worker (keepalive) reads ingest-queue.sqlite
  --> attempts POST to https://lightrag.basicconsulting.no (via CF Access)
  --> CF credentials lookup: Vaultwarden ETIMEDOUT (bw-session stale or vault unreachable)
  --> LightRAG unreachable → queue never drains → backpressure locks all three producers

ORPHAN OUTPUT: ~/system/metrics/ingest_pipeline.prom written by rag-drain-worker
  --> nothing confirmed reading this file (no Prometheus scrape config found in audit)
```

**This is the single most critical broken pipeline in the factory.** 946 items queued, zero being processed.
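The deadlock follows directly from the numbers in the diagram: producers stop enqueuing while the depth is above 500, and the depth only falls when the drain worker succeeds. The sketch below models that arithmetic. The depth (946) and limit (500) come from this audit; the function names and the per-cycle drain rate are hypothetical, since the real batch size of `rag-drain-worker` was not captured.

```python
import math
from typing import Optional

QUEUE_DEPTH = 946         # depth reported in this audit
BACKPRESSURE_LIMIT = 500  # gate threshold reported in this audit


def producers_blocked(depth: int, limit: int = BACKPRESSURE_LIMIT) -> bool:
    """Producers skip enqueues while the queue is deeper than the limit."""
    return depth > limit


def cycles_until_unblocked(depth: int, drained_per_cycle: int,
                           limit: int = BACKPRESSURE_LIMIT) -> Optional[int]:
    """Drain cycles needed before producers resume; None if nothing drains."""
    if not producers_blocked(depth, limit):
        return 0
    if drained_per_cycle <= 0:
        # Current state: the consumer cannot reach LightRAG, so depth never falls.
        return None
    return math.ceil((depth - limit) / drained_per_cycle)


if __name__ == "__main__":
    print(cycles_until_unblocked(QUEUE_DEPTH, drained_per_cycle=0))
    # Hypothetical drain rate of 50 items per cycle once credentials work again.
    print(cycles_until_unblocked(QUEUE_DEPTH, drained_per_cycle=50))
```

With a drain rate of zero the function returns `None`, which is the frozen state described above. Any positive rate gives a finite recovery time, so the consumer is the only component that needs fixing.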

### Memory / Knowledge Layer

```
com.alai.mem0-server (PID 65706, keepalive)
  reads/writes: http://localhost:6333 (Qdrant vector store)
  produces: REST API on localhost:9000 (lsof shows port 9000 by its service name, cslistener)
  consumed by: discover.js, agent tools calling /v1/memories
  STATUS: alive and healthy (health 200, Qdrant 200)
  NOTE: exit -15 (SIGTERM) in launchctl = prior graceful restart; current run is clean

com.alai.litestream (PID 51452, keepalive)
  reads: SQLite DBs in ~/system/state/ (flywheel.db, health-events.db, etc.)
  writes: B2 bucket alai-studio-backup (replication stream)
  STATUS: running but b2-offsite-backup.sh (separate) hitting B2 storage cap

com.alai.wal-checkpoint (calendar, exit 0)
  reads/writes: SQLite WAL files in ~/system/state/
  consumed by: litestream (clean WAL = cleaner replication)
```

### Orchestration Kernel

```
com.john.pi-orchestrator (PID 75750, keepalive)
  reads: Planka MC API (boards.basicconsulting.no per mock config)
  writes: ~/system/logs/pi-orchestrator/daemon-*.log
  STATUS: running, cycling every 30s, "No eligible tasks" — running in MOCK MODE
  NOTE: alai-config-mock.json loaded; real config resolver likely not resolving

com.alai.orchestrator-bridge (PID 1185, keepalive)
  runs: orchestrator-http-server.js on port 3052
  produces: HTTP API for triggering orchestrator actions
  STATUS: running healthy

com.john.orchestrator-http (down_exit_0)
  DUPLICATE of orchestrator-bridge — same script, same port (3052)
  Watchdog says down_exit_0: port already bound by bridge when this tried to start
  ORPHAN: plist in Library/LaunchAgents, shadow of orchestrator-bridge
```
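The `down_exit_0` explanation for `com.john.orchestrator-http` rests on port 3052 already being bound. A minimal sketch for confirming that from the host is shown below. The function name is hypothetical, and it only reports whether something accepts TCP connections on the port, not which process owns it.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return probe.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("3052 bound:", port_in_use(3052))
```

Pairing this with `lsof` on the same port would show whether the listener is the bridge process (PID 1185 at audit time).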

### Backup Layer

```
com.john.b2-offsite-backup (calendar, exit 1)
  reads: ~/system/state/ SQLite snapshots
  writes: B2 bucket alai-studio-backup
  STATUS: BLOCKED — B2 storage cap exceeded (403)

com.alai.azure-db-backup (calendar, exit 1)
  reads: Azure SQL databases (via az CLI)
  runs: ~/system/daemons/azure-db-backup.sh
  writes: Azure Blob Storage
  STATUS: TRANSIENT failures, az upload SIGTERM'd (timeout in script or process kill)
  ORPHAN TEMP: /tmp/az-backup-* directories leaking (rm fails on non-empty dirs)
```

### Comms / Slack

```
com.john.slack-bot (PID 18046, keepalive)
  reads: Slack WebSocket (socket-mode)
  writes: Slack messages, ~/system/logs/slack-bot.log
  STATUS: alive, heartbeating, WebSocket reconnects successfully (~once per session)
  CONCERN: 300min silent (no incoming Slack messages received in 5h as of audit time)

no.alai.email-body-integrity (calendar, exit 0)
  reads: IMAP one.com (email body verification)
  writes: ~/system/logs/email-integrity.log
  STATUS: healthy last run
```

### Monitoring / Health

```
com.john.lightrag-monitor (calendar, exit 2)
  reads: LightRAG API health endpoint
  writes: /tmp/lightrag-task-context.json, ~/system/evidence/lightrag-health-*.md
  STATUS: correctly reporting LightRAG as degraded; Slack alert delivery ALSO failing
  ORPHAN OUTPUT: lightrag-health-*.md files accumulating in ~/system/evidence/
    (rag-fsevents-adapter trying to enqueue these — but queue full — circular feedback)

com.alai.daemon-fleet-watchdog (PID 2815, every 15min)
  reads: launchctl list, all plist dirs
  writes: ~/system/state/daemon-fleet-status.json
  STATUS: healthy, data current as of 18:33:52Z today

com.alai.pi-orch-health (calendar, exit 127)
  was: reads pi-orchestrator state, writes ~/system/state/pi-orch-health-*.json
  STATUS: BROKEN — script deleted. Last known verdict (2026-05-06): CRITICAL
```
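The circular feedback noted for `lightrag-monitor` (its `lightrag-health-*.md` files land in a directory that `rag-fsevents-adapter` watches) could be broken with a filename exclusion on the producer side. The sketch below shows one possible filter. The function name and the idea of an exclusion list are assumptions; only the `lightrag-health-*.md` pattern comes from this audit, and the adapter's real enqueue logic was not inspected.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical exclusion list; only the lightrag-health pattern is from the audit.
EXCLUDED_NAME_PATTERNS = ("lightrag-health-*.md",)


def should_enqueue(path: str) -> bool:
    """Return False for monitoring artifacts that should not be ingested."""
    name = PurePosixPath(path).name
    return not any(fnmatch(name, pattern) for pattern in EXCLUDED_NAME_PATTERNS)


if __name__ == "__main__":
    print(should_enqueue("~/system/evidence/lightrag-health-2026-05-09.md"))
    print(should_enqueue("~/system/specs/design.md"))
```

Writing the health reports to a directory outside the watched paths would achieve the same result without touching the adapter.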

### MLX / Inference Layer

```
com.alai.mlx-gemma4 (PID 27321)
com.alai.mlx-qwen3-32b (PID 29227)
com.alai.mlx-qwen3-8b (PID 29488)
com.alai.mlx-qwen25-coder-32b (PID 31120)
com.alai.ollama-serve-v2 (PID 29100)
  STATUS: all running (keepalive), exit 0
  PRODUCES: inference endpoints on ANVIL (local)
  Note: plists not found in audited dirs — loaded from unknown location (possibly ~/Library/LaunchAgents subdirs)
```

---

## 4. Critical-Path Daemon Assessment

### com.john.pi-orchestrator
- **PID:** 75750 | **Exit:** 0 | **Status:** RUNNING
- **Healthy?** Process is alive and cycling every 30s, but it is running in MOCK MODE (`alai-config-mock.json`). The config resolver is not resolving real service URLs (Planka localhost:3100 is not listening per MEMORY.md), and every cycle logs "No eligible tasks".
- **Produces:** Cycle logs to `~/system/logs/pi-orchestrator/daemon-stdout.log`
- **Consumes:** MC/Planka API (currently mocked, not reaching real board)
- **Verdict:** Process alive but effectively IDLE. Not orchestrating anything. Mock mode = silent failure.

### com.alai.pi-orch-health
- **PID:** - | **Exit:** 127 | **Status:** BROKEN
- **Root cause:** `~/system/tools/pi-orch-health.sh` was deleted. Script ran last on 2026-05-06 with verdict CRITICAL. Now permanently broken until script is restored.
- **Produces:** `~/system/state/pi-orch-health-*.json` (last written 2026-05-06)
- **Verdict:** BROKEN — monitoring of the orchestrator kernel has gone dark.

### com.alai.mem0-server
- **PID:** 65706 | **Exit:** -15 (prior SIGTERM) | **Status:** ALIVE AND HEALTHY
- **Root cause of -15:** launchctl records the exit code of the previous run; the current process (PID 65706) started clean. SIGTERM was a graceful restart, not a crash.
- **Evidence:** Port 9000 listening (lsof confirmed), `/health` returns 200, Qdrant at localhost:6333 returns 200.
- **Note:** `/v1/memories` returning 404 — API route may have changed or not yet initialized.
- **Verdict:** ALIVE. Exit -15 is misleading — current instance is healthy.

### com.john.lightrag-monitor
- **PID:** - | **Exit:** 2 | **Status:** EXPECTED ALARM
- **Root cause:** Script correctly exits non-zero when LightRAG is degraded. LightRAG IS degraded (drain-worker cannot reach it due to missing CF credentials). Slack alert also failing (alert delivery broken).
- **Produces:** `~/system/evidence/lightrag-health-*.md`, `/tmp/lightrag-task-context.json`
- **Verdict:** Monitor itself is working correctly. The degradation it reports is real and severe.

### com.alai.lightrag-keepwarm
- **PID:** - | **Exit:** 0 | **Status:** calendar_ok
- **Plist location:** `~/Library/LaunchAgents/com.alai.lightrag-keepwarm.plist`
- **Schedule:** unknown (plist content not captured in this audit — found late)
- **Produces:** Keepwarm pings to LightRAG
- **Verdict:** Last run exited 0. Likely the keepwarm pings succeed against the local endpoint even while drain-worker cannot auth through CF Access. Not broken.

### com.alai.archive-first-scan
- **PID:** - | **Exit:** 0 | **Status:** calendar_ok | **Schedule:** daily 06:00
- **Script:** `~/bin/archive-first-scan.sh` — EXISTS
- **Produces:** `/tmp/archive-first-scan-report-<date>.txt`, writes to `~/system/state/archive-first-ledger.jsonl`
- **Consumes:** Filesystem scan of unarchived candidates
- **Verdict:** HEALTHY. Running as designed.

### com.john.session-archiver
- **PID:** - | **Exit:** 0 | **Status:** calendar_ok | **Schedule:** daily 03:00
- **Script:** `~/system/tools/session-archiver.js` — EXISTS (10928 bytes, 2026-02-23)
- **Produces:** Cleaned-up session artifacts
- **Consumes:** Claude session logs/state
- **Verdict:** HEALTHY. Last run clean.

### com.alai.cost-daily-report
- **PID:** - | **Exit:** 127 | **Status:** BROKEN | **Schedule:** daily 23:55
- **Root cause:** `~/system/tools/cost-daily-report.sh` deleted. Last successful run 2026-04-29.
- **Produces:** `~/system/reports/cost-daily.md`
- **Consumes:** Cost tracker data
- **Verdict:** BROKEN — daily cost visibility dark for 10 days.

### com.alai.weekly-planning
- **PID:** - | **Exit:** 0 | **Status:** calendar_ok | **Schedule:** Tuesday 08:00
- **Script:** `~/system/tools/weekly-planning.sh` — MISSING from disk
- **Contradiction:** the watchdog reports last exit 0 and state calendar_ok even though the script is missing.
- **Likely explanation:** The job last ran successfully on a Tuesday before the script was deleted, and launchd has not triggered it since. The audit date, 2026-05-09, is a Saturday, so the next trigger is Tuesday 2026-05-12 at 08:00, when it will exit 127.
- **Verdict:** TICKING TIME BOMB: will fail with exit 127 at 08:00 on Tuesday 2026-05-12.
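The date of the next failure can be derived from the audit timestamp and the weekly schedule. The sketch below does that calculation. The function name is hypothetical, and it ignores time zones: the audit timestamp is UTC while launchd calendar schedules use local time, so the result is approximate near day boundaries.

```python
from datetime import datetime, timedelta


def next_weekly_run(now: datetime, weekday: int, hour: int, minute: int = 0) -> datetime:
    """Next occurrence of weekday (Monday=0) at hour:minute strictly after now."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        # Today's slot has already passed, so move to next week.
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    audit_time = datetime(2026, 5, 9, 20, 48)  # audit timestamp, treated as naive
    print(next_weekly_run(audit_time, weekday=1, hour=8))  # Tuesday 08:00
```

For the audit timestamp this yields 2026-05-12 08:00, which is the deadline for restoring `weekly-planning.sh` or unloading the plist.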

### no.alai.email-body-integrity
- **PID:** - | **Exit:** 0 | **Status:** calendar_ok | **Schedule:** daily 03:00
- **Script:** `~/system/tools/email-body-integrity-check.js` — EXISTS
- **Produces:** `~/system/logs/email-integrity.log`
- **Verdict:** HEALTHY.

---

## 5. Daemon-Fleet-Watchdog State

**File:** `~/system/state/daemon-fleet-status.json`  
**Generated:** 2026-05-09T18:33:52Z (approx 2h15m before this audit)

Watchdog summary from file:
```
total:       148
running:      47 (keepalive processes alive)
calendar_ok:  74 (last scheduled run exit 0)
down:          3 (down_exit_0: autocoder-ui, draft-sender, orchestrator-http)
err:          20 (non-zero exit codes)
```
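The four listed states sum to 144 (47 + 74 + 3 + 20), which leaves 4 of the 148 entries unaccounted for; that matches the four `not_loaded` daemons in section 1e. The sketch below reproduces the cross-check and shows how per-state counts could be recomputed from the status file. The `state` field name is taken from the accuracy notes in this section, but the overall JSON layout is not documented here, so `tally_states` takes a plain list of entries as an assumption.

```python
from collections import Counter

# Counts copied from the watchdog summary in this audit.
REPORTED = {"running": 47, "calendar_ok": 74, "down": 3, "err": 20}
REPORTED_TOTAL = 148


def unaccounted(total: int, by_state: dict[str, int]) -> int:
    """Entries in the total that no listed state accounts for."""
    return total - sum(by_state.values())


def tally_states(entries: list[dict]) -> Counter:
    """Count entries per `state` value (entry layout is an assumption)."""
    return Counter(entry.get("state", "unknown") for entry in entries)


if __name__ == "__main__":
    print("unaccounted entries:", unaccounted(REPORTED_TOTAL, REPORTED))
```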

**Watchdog accuracy notes:**
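- Watchdog correctly identifies 20 erroring daemons, but it reports raw wait statuses rather than shell exit codes. The exit code is shifted left by 8 bits, as in a POSIX wait status: 256 = exit 1; 32512 = exit 127.
- Watchdog and launchctl do not cover the same set. The watchdog tracks 148 entries, 4 of which are `not_loaded` (lightrag-migrate-pump, lightrag-outbox-ingest, lightrag-watchdog, rdap-audit-quarterly) and therefore absent from launchctl, while launchctl lists 168 matching rows, so some loaded daemons are not tracked by the watchdog.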
- `com.alai.mem0-server` shows `last_exit: 15` (SIGTERM of prior instance) but `state: running` — correct, the current instance is healthy.
- `com.john.slack-bot` shows `running/pid 18046` but `last_exit: 256` — launchd records last crash before current keepalive restart. Process is currently alive.
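The translated values in these notes (256 for exit 1, 32512 for exit 127, 15 for SIGTERM) are consistent with the layout of a POSIX wait status, where the low 7 bits carry the terminating signal and the next 8 bits carry the exit code. The sketch below decodes that layout. It assumes the watchdog stores the raw wait status, which the values suggest but this audit did not verify, and the function name is hypothetical.

```python
def decode_wait_status(status: int) -> str:
    """Translate a raw POSIX-style wait status into a readable label."""
    signal_number = status & 0x7F     # low 7 bits: terminating signal, 0 if none
    exit_code = (status >> 8) & 0xFF  # next 8 bits: exit code of a normal exit
    if signal_number:
        return f"signal {signal_number}"
    return f"exit {exit_code}"


if __name__ == "__main__":
    for raw in (0, 256, 512, 32512, 15):
        print(raw, "->", decode_wait_status(raw))
```

Normalizing the values this way before display would make the watchdog file directly comparable with the `launchctl list` column, which shows signals as negative numbers such as -15.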

---

## Open Questions

1. **Pi-orchestrator mock mode:** Why is `alai-config-mock.json` being loaded instead of real config? Is the Planka/MC API intentionally offline, or is the config resolver broken? The orchestrator is spinning idle.

2. **LightRAG CF credentials:** Vaultwarden ETIMEDOUT in `rag-drain-worker`. Is `/tmp/bw-session` stale? Is Vaultwarden (vault.basicconsulting.no) reachable? This single broken auth is deadlocking the entire RAG ingest pipeline (946 items queued).

3. **B2 storage cap:** `403 storage_cap_exceeded` on Backblaze B2. Is this a billing cap that needs to be raised in the B2 console? Litestream is still replicating but the nightly snapshot job fails.

4. **Five deleted scripts:** Who deleted `pi-orch-health.sh`, `cost-daily-report.sh`, `daily-planning.sh`, `legal-docs-azure-sync.sh`, `mcp-health-check.sh`? Were they intentionally removed (deprecated)? If deprecated, the plists should be unloaded. If accidental deletion, restore from backup.

5. **Duplicate alaiml-retrain plist:** Plist exists in BOTH `system/config/launchagents` AND `Library/LaunchAgents`. If both copies are loaded, the retrain job could be scheduled twice. Which is canonical?

6. **`com.john.orchestrator-http` duplicate:** Identical to `com.alai.orchestrator-bridge` (same script, same port). orchestrator-http shows down_exit_0 because bridge already bound the port. Is there any reason to keep this dead plist loaded?

7. **LightRAG health-*.md circular feedback:** The `lightrag-monitor` evidence files are being watched by `rag-fsevents-adapter`, which tries to enqueue them into LightRAG, so a monitoring artifact feeds back into the broken pipeline it monitors. Should these files be excluded from ingest or written outside the watched paths?

8. **Slack bot silent 300 min:** No incoming Slack messages for 5h at audit time. Is anyone sending messages? Or is the Socket Mode token scope broken for receiving?
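Question 5 could be answered fleet-wide by scanning the plist directories for filenames that appear more than once. The sketch below is one way to do that. It compares filenames only, not the `Label` key inside each plist, and the function name is hypothetical; the directories to scan would be the three locations named in this audit.

```python
from collections import defaultdict
from pathlib import Path


def find_duplicate_plists(directories: list[str]) -> dict[str, list[str]]:
    """Map plist filename to directories, for names found in more than one directory."""
    seen = defaultdict(list)
    for directory in directories:
        for plist in sorted(Path(directory).expanduser().glob("*.plist")):
            seen[plist.name].append(str(plist.parent))
    return {name: dirs for name, dirs in seen.items() if len(dirs) > 1}


if __name__ == "__main__":
    # Directory names as they appear in this audit, relative to the home directory.
    audited = [
        "~/system/daemons/launchagents",
        "~/system/config/launchagents",
        "~/Library/LaunchAgents",
    ]
    for name, dirs in find_duplicate_plists(audited).items():
        print(name, "->", dirs)
```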

---

## Highest-Leverage Fix Candidates (audit-level only)

**Priority 1 — Unlocks entire RAG pipeline (946 items unblocked)**
- Fix `rag-drain-worker` CF Access credentials: ensure Vaultwarden item "LightRAG-CF-Access" exists and `/tmp/bw-session` is valid. One credential fix unblocks bookstack-adapter + mc-adapter + fsevents-adapter simultaneously.

**Priority 2 — Restore cost visibility (10-day blind spot)**
- Restore or recreate `~/system/tools/cost-daily-report.sh`. Last output was 2026-04-29. CEO-visible reporting dark for 10 days.

**Priority 3 — Fix orchestrator mock mode**
- Determine why pi-orchestrator loads mock config. If Planka/MC API is down, restore it. If config resolver is broken, fix `alai-config.js`. The orchestration kernel is running but doing nothing.

**Priority 4 — Raise B2 storage cap**
- B2 bucket `alai-studio-backup` has hit its cap. Nightly database snapshots are not landing. This is a billing action in the Backblaze console, not a code fix.

**Priority 5 — Unload dead plists (5 scripts deleted)**
- `com.alai.pi-orch-health`, `com.alai.cost-daily-report`, `com.alai.daily-planning`, `com.john.legal-docs-azure-sync`, `com.john.mcp-health-check` should either have scripts restored or be unloaded from launchd. `legal-docs-azure-sync` and `mcp-health-check` also set `KeepAlive.Crashed=true`, which may be causing throttled restart loops on top of the scheduled failures.

**Priority 6 — Unload `com.john.orchestrator-http` duplicate plist**
- Dead shadow of orchestrator-bridge. Causes confusion in watchdog counts.

**Priority 7 — Restore `weekly-planning.sh` before next Tuesday**
- Script missing but plist active. Will fail exit 127 at 08:00 next Tuesday.

**Priority 8 — Fix `phantom-link-detector.js` missing script**
- `com.alai.chain-phantom-detector` runs every 15min calling a script that does not exist. High-frequency failure (96 times/day).
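The 96 failures per day follow from the 15-minute interval (24 × 60 / 15). Applying the same arithmetic to the other fixed-interval jobs in the failing table gives a rough ranking by failure volume. The sketch below assumes every scheduled run fails, which holds for the missing-script jobs but may overstate the TRANSIENT ones; the function names are hypothetical and the intervals are taken from section 1d.

```python
def runs_per_day(interval_minutes: int) -> int:
    """Number of launches per 24 hours for a fixed-interval job."""
    return (24 * 60) // interval_minutes


# Intervals (in minutes) taken from the failing-daemon table in this audit.
FAILING_INTERVAL_JOBS = {
    "com.alai.rag-bookstack-adapter": 5,
    "com.alai.rag-mc-adapter": 5,
    "com.alai.chain-phantom-detector": 15,
    "com.john.mcp-health-check": 60,
    "com.alai.azure-db-backup": 240,
}


def rank_by_daily_failures(jobs: dict[str, int]) -> list[tuple[str, int]]:
    """Sort jobs by runs per day, highest first (assumes every run fails)."""
    ranked = [(label, runs_per_day(minutes)) for label, minutes in jobs.items()]
    return sorted(ranked, key=lambda item: -item[1])


if __name__ == "__main__":
    for label, count in rank_by_daily_failures(FAILING_INTERVAL_JOBS):
        print(f"{count:4d}/day  {label}")
```

On these numbers the two 5-minute RAG adapters fail more often than the phantom detector, which is one more reason the credential fix in Priority 1 comes first.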