Inventory: Daemon Fleet
AI Factory Daemon Fleet Audit — 2026-05-09
Auditor: kelsey-hightower
Timestamp: 2026-05-09T20:48 UTC
Source of truth: launchctl list + daemon-fleet-status.json (generated 2026-05-09T18:33:52Z) + plist reads + error log sampling
Fleet size (watchdog): 148 tracked entries | 47 running keepalive | 74 calendar_ok | 3 down | 20 erroring | 4 not_loaded
Fleet size (launchctl live): 168 rows matching alai/john/no.alai pattern (includes daemons not in watchdog)
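The live row count comes from filtering launchctl list output by label. A minimal sketch of that filter follows, assuming the usual three tab-separated columns (PID, Status, Label) and assuming the "alai/john/no.alai pattern" means the label prefixes com.alai., com.john., and no.alai. The function name, the regex, and the sample text are illustrative, not the auditor's actual tooling.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed reading of the audit's "alai/john/no.alai pattern".
FLEET_PATTERN = re.compile(r"^(com\.alai\.|com\.john\.|no\.alai\.)")


class Row(NamedTuple):
    pid: Optional[int]  # None when launchctl prints "-" (not running)
    last_exit: int      # negative values are signals, e.g. -15 for SIGTERM
    label: str


def parse_launchctl_list(text: str) -> List[Row]:
    """Parse `launchctl list` text and keep only fleet-owned labels."""
    rows = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3 or parts[0] == "PID":
            continue  # skip the header and anything malformed
        pid_s, status_s, label = parts
        if not FLEET_PATTERN.match(label):
            continue
        try:
            last_exit = int(status_s)
        except ValueError:
            continue
        rows.append(Row(None if pid_s == "-" else int(pid_s), last_exit, label))
    return rows


# Illustrative sample; the com.apple row is invented to show the filter.
SAMPLE = (
    "PID\tStatus\tLabel\n"
    "1163\t0\tcom.alai.agent-timeout-monitor\n"
    "-\t127\tcom.alai.cost-daily-report\n"
    "65706\t-15\tcom.alai.mem0-server\n"
    "412\t0\tcom.apple.Finder\n"
)
rows = parse_launchctl_list(SAMPLE)
```

On a live host the text would come from running launchctl list and capturing its standard output.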
1. Live Exit-Code Matrix
Column key: PID (- = not running) | Last Exit | Plist location | KeepAlive policy | Schedule
1a. RUNNING (keepalive, PID alive; last exit 0, -15/SIGTERM, or 1 from a prior crash)
| Daemon | PID | Exit | Plist Path | KeepAlive | Schedule |
|---|---|---|---|---|---|
| com.alai.agent-timeout-monitor | 1163 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.cc-api-server | 1183 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.credit-monitor | 1223 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.idle-learning-daemon | 1196 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.litestream | 51452 | 0 | Library/LaunchAgents | always | continuous |
| com.alai.mem0-server | 65706 | -15 (SIGTERM) | Library/LaunchAgents | always | continuous |
| com.alai.mlx-gemma4 | 27321 | 0 | (not in known dirs) | always | continuous |
| com.alai.mlx-qwen25-coder-32b | 31120 | 0 | (not in known dirs) | always | continuous |
| com.alai.mlx-qwen3-32b | 29227 | 0 | (not in known dirs) | always | continuous |
| com.alai.mlx-qwen3-8b | 29488 | 0 | (not in known dirs) | always | continuous |
| com.alai.ollama-serve-v2 | 29100 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.orchestrator-bridge | 1185 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.ram-monitor | 1241 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.task-router | 1200 | 0 | system/daemons/launchagents | always | continuous |
| com.alai.web-learning | 1176 | 0 | system/daemons/launchagents | always | continuous |
| com.john.bookstack-webhook-relay | 1206 | 0 | system/daemons/launchagents | always | continuous |
| com.john.browser-worker | 1211 | 0 | system/daemons/launchagents | always | continuous |
| com.john.caddy-vault | 86082 | 0 | system/daemons/launchagents | always | continuous |
| com.john.cloudflared | 79617 | 0 | system/daemons/launchagents | always | continuous |
| com.john.comms-agent | 1186 | 0 | system/daemons/launchagents | always | continuous |
| com.john.documenso-webhook | 20561 | 0 | system/daemons/launchagents | always | continuous |
| com.john.durable-executor | 1212 | 0 | system/daemons/launchagents | always | continuous |
| com.john.edita-loop | 61758 | 0 | system/daemons/launchagents | always | continuous |
| com.john.email-agent | 92225 | 0 | system/daemons/launchagents | calendar | calendar |
| com.john.email-tracker | 11292 | 0 | system/daemons/launchagents | conditional | conditional |
| com.john.event-dispatcher | 65452 | 0 | system/daemons/launchagents | always | continuous |
| com.john.health-dashboard | 1189 | 0 | system/daemons/launchagents | always | continuous |
| com.john.hook-daemon | 1240 | 0 | system/daemons/launchagents | always | continuous |
| com.john.intake-watcher | 41929 | 0 | system/daemons/launchagents | always | continuous |
| com.john.kenan-hot-web | 1231 | 0 | system/daemons/launchagents | always | continuous |
| com.john.llm-datasette | 1170 | 0 | system/daemons/launchagents | always | continuous |
| com.john.mc-dashboard | 65673 | 0 | system/daemons/launchagents | always | continuous |
| com.john.n8n | 1203 | 0 | system/daemons/launchagents | always | continuous |
| com.john.network-watchdog | 1194 | 0 | system/daemons/launchagents | always | continuous |
| com.john.ops-watchdog | 8782 | -15 (SIGTERM) | system/daemons/launchagents | always | continuous |
| com.john.outbox-processor | 1190 | 0 | system/daemons/launchagents | always | continuous |
| com.john.paste-logger | 1224 | 0 | system/daemons/launchagents | always | continuous |
| com.john.pi-orchestrator | 75750 | 0 | system/daemons/launchagents | always | continuous |
| com.john.slack-bot | 18046 | 1 (last crash exit) | system/daemons/launchagents | always | continuous |
| com.john.tender-dashboard | 1234 | 0 | system/daemons/launchagents | always | continuous |
| com.john.tool-shed | 1191 | 0 | system/daemons/launchagents | always | continuous |
| com.john.vault-keeper | 87005 | 0 | system/daemons/launchagents | always | continuous |
| com.john.vault-proxy | 1222 | 0 | system/daemons/launchagents | always | continuous |
| com.john.youtube-nightly-learning | 83439 | 0 | system/daemons/launchagents | always | continuous |
| no.alai.claude-proxy | 6361 | 0 | Library/LaunchAgents | always | continuous |
| com.alai.rag-drain-worker | 3640 | 1 (prev exit) | system/config/launchagents | always | continuous |
| com.alai.rag-fsevents-adapter | 64755 | 1 (prev exit) | system/config/launchagents | conditional | WatchPaths |
| com.alai.daemon-fleet-watchdog | 2815 | 0 | (Library/LaunchAgents) | calendar | every 15min |
1b. DOWN — Exit 0 (intentional one-shot or conditional)
| Daemon | PID | Exit | Notes |
|---|---|---|---|
| com.john.autocoder-ui | - | 0 | down_exit_0: one-shot complete |
| com.john.draft-sender | - | 0 | down_exit_0: conditional, no pending drafts |
| com.john.orchestrator-http | - | 0 | down_exit_0: DUPLICATE — orchestrator-bridge runs same script on port 3052 |
1c. CALENDAR SCHEDULED — Exit 0 last run (healthy)
These fired successfully on last scheduled run. Not exhaustively listed — watchdog confirms 74 in this state.
Key members: com.alai.apply-knowledge, com.alai.archive-first-scan, com.alai.chain-weekly-report, com.alai.docker-watchdog, com.alai.gcloud-auth, com.alai.john-daily-digest, com.alai.lightrag-backup, com.alai.memory-watchdog, com.alai.meta-agent-loop, com.alai.restore-drill, com.alai.skill-audit, com.alai.team-sync, com.alai.wal-checkpoint, com.alai.weekly-planning, com.alai.zombie-cleanup, com.john.agentforge, com.john.bookstack-sync, com.john.calendar-bridge, com.john.critical-tools-healthcheck, com.john.daemon-health, com.john.db-archival-sweep, com.john.db-backup, com.john.domain-audit, com.john.drift-detector, com.john.email-briefing, com.john.forge-watchdog, com.john.log-rotate, com.john.mc-session-worker, com.john.morning-routine, com.john.offsite-backup, com.john.pi2-override-audit, com.john.review-drain, com.john.session-archiver, com.john.session-extractor, com.john.spam-recovery-scan, com.john.system-guardian, com.john.tldr-actionizer, com.john.tldr-briefing, com.john.tldr-watch, com.john.tldr-weekly-synthesis, com.john.weekly-synthesis, no.alai.email-body-integrity, no.alai.meta-agent, no.alai.resolver, no.alai.spend-guard.
1d. FAILING — Non-zero exit codes
| Daemon | PID | Exit Code | Plist Location | KeepAlive | Schedule |
|---|---|---|---|---|---|
| com.alai.azure-db-backup | - | 1 (exit 256 internal) | system/config/launchagents | none (RunAtLoad=false) | every 4h |
| com.alai.blueprint-fleet-watchdog | - | 1 (exit 256) | Library/LaunchAgents | none | daily 06:15 |
| com.alai.cert-expiry-monitor | - | 1 (exit 256) | system/config/launchagents | none | daily 07:00 |
| com.alai.chain-daily-inbox | - | 1 (exit 256) | Library/LaunchAgents | none | daily 07:00 |
| com.alai.chain-e2e-nightly | - | 1 (exit 256) | Library/LaunchAgents | none | daily 02:00 |
| com.alai.chain-phantom-detector | - | 1 (exit 256) | Library/LaunchAgents | none | every 15min |
| com.alai.cost-daily-report | - | 127 | Library/LaunchAgents | none | daily 23:55 |
| com.alai.daily-planning | - | 127 | Library/LaunchAgents | none | daily 07:30 |
| com.alai.filesystem-audit | - | 1 (exit 256) | Library/LaunchAgents | none | Monday 08:00 |
| com.alai.pi-orch-health | - | 127 | Library/LaunchAgents | none | daily 23:00 |
| com.alai.rag-bookstack-adapter | - | 1 (exit 256) | system/config/launchagents | none | every 5min |
| com.alai.rag-drain-worker | 3640 | 1 (prev exit, now running) | system/config/launchagents | always | continuous |
| com.alai.rag-fsevents-adapter | 64755 | 1 (prev exit, now running) | system/config/launchagents | conditional | WatchPaths |
| com.alai.rag-mc-adapter | - | 1 (exit 256) | system/config/launchagents | none | every 5min |
| com.alai.rdap-audit-quarterly | - | 2 | Library/LaunchAgents | none | quarterly |
| com.john.alaiml-retrain | - | 1 (exit 256) | system/config/launchagents + Library/LaunchAgents | none | 1st of month 03:00 |
| com.john.auto-verify-regression | - | 1 (exit 256) | system/daemons/launchagents | none | daily 06:00 |
| com.john.b2-offsite-backup | - | 1 (exit 256) | system/daemons/launchagents | none | daily 03:30 |
| com.john.bookstack-staleness | - | 1 (exit 256) | system/daemons/launchagents | none | Sunday 22:00 |
| com.john.infra-drift-detector | - | 1 (exit 256) | system/daemons/launchagents | none | Sunday 04:00 |
| com.john.legal-docs-azure-sync | - | 127 | Library/LaunchAgents | Crashed=true | daily 02:00 |
| com.john.lightrag-monitor | - | 2 | system/config/launchagents | none | daily 09:00 |
| com.john.mcp-health-check | - | 127 | Library/LaunchAgents | Crashed=true | every 1h |
| com.john.slack-bot | 18046 | 1 (last crash) | system/daemons/launchagents | always | continuous |
1e. NOT LOADED (watchdog knows them, launchctl does not)
| Daemon | State |
|---|---|
| com.alai.lightrag-migrate-pump | not_loaded |
| com.alai.lightrag-outbox-ingest | not_loaded |
| com.alai.lightrag-watchdog | not_loaded |
| com.john.rdap-audit-quarterly | not_loaded |
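The not_loaded state is a set difference: labels the watchdog tracks that launchctl does not report. A minimal sketch, with a hypothetical function name and hand-written label sets standing in for the two real sources:

```python
from typing import Iterable, List


def not_loaded(watchdog_labels: Iterable[str],
               launchctl_labels: Iterable[str]) -> List[str]:
    """Labels known to the watchdog but absent from launchctl."""
    return sorted(set(watchdog_labels) - set(launchctl_labels))


# Small illustrative inputs, not the full 148-entry fleet.
watchdog = [
    "com.alai.lightrag-watchdog",
    "com.alai.mem0-server",
    "com.john.rdap-audit-quarterly",
]
launchctl = ["com.alai.mem0-server", "com.alai.mlx-gemma4"]
print(not_loaded(watchdog, launchctl))
```

Swapping the arguments gives the opposite gap: daemons live in launchctl that the watchdog does not track.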
2. Failure Cohort — Root Cause Analysis
EXIT 127 — Script/binary not found (BROKEN — script deleted)
These five daemons have plists in Library/LaunchAgents pointing to scripts that no longer exist on disk. Exit 127 is bash's "command not found" — the script path itself is gone.
| Daemon | Missing Script | Last Successful Run | Category |
|---|---|---|---|
| com.alai.pi-orch-health | ~/system/tools/pi-orch-health.sh | 2026-05-06 (verdict: CRITICAL) | BROKEN |
| com.alai.cost-daily-report | ~/system/tools/cost-daily-report.sh | 2026-04-29 | BROKEN |
| com.alai.daily-planning | ~/system/tools/daily-planning.sh | unknown | BROKEN |
| com.john.legal-docs-azure-sync | ~/system/daemons/legal-docs-azure-sync.sh | unknown | BROKEN |
| com.john.mcp-health-check | ~/system/tools/mcp-health-check.sh | unknown | BROKEN |
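Note on legal-docs-azure-sync and mcp-health-check: Both have KeepAlive.Crashed=true, which asks launchd to restart a job after a crash. The audit treats them as stuck in a throttled restart loop because every launch exits 127, which would waste process spawns indefinitely. launchd documents Crashed as applying to signal-type exits, so it is worth confirming in the logs that an exit 127 actually triggers a respawn.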
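One way to catch this cohort before launchd does is to check that every path a plist references still exists. A minimal sketch using Python's plistlib, assuming the script path appears in Program or ProgramArguments. The helper name and the sample plist are hypothetical.

```python
import os
import plistlib
from typing import Callable, List


def missing_paths(plist_bytes: bytes,
                  exists: Callable[[str], bool] = os.path.exists) -> List[str]:
    """Return path-like program arguments that are not present on disk."""
    plist = plistlib.loads(plist_bytes)
    candidates = []
    if "Program" in plist:
        candidates.append(plist["Program"])
    candidates.extend(plist.get("ProgramArguments", []))

    missing = []
    for arg in candidates:
        # Only arguments that look like paths are checked; flags are skipped.
        if isinstance(arg, str) and arg.startswith(("/", "~")):
            path = os.path.expanduser(arg)
            if not exists(path):
                missing.append(path)
    return missing


# Hypothetical plist; the label and script path are invented for illustration.
sample_plist = plistlib.dumps({
    "Label": "com.example.demo",
    "ProgramArguments": ["/bin/bash", "/opt/demo/missing.sh", "--verbose"],
})
```

The exists parameter is injectable so the check can be exercised without touching the real filesystem.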
EXIT 1 / 256 — Script exists but fails at runtime (BROKEN — dependency missing)
| Daemon | Script | Root Cause | Category |
|---|---|---|---|
| com.alai.rag-bookstack-adapter | rag-bookstack-adapter.js | Queue depth 946 > 500 backpressure gate; never drains because drain-worker cannot reach LightRAG | BROKEN (cascade) |
| com.alai.rag-drain-worker | rag-drain-worker.js | Vaultwarden ETIMEDOUT → CF credentials unavailable → LightRAG unreachable | BROKEN |
| com.alai.rag-mc-adapter | rag-mc-adapter.js | Same backpressure cascade, queue depth 946 | BROKEN (cascade) |
| com.alai.rag-fsevents-adapter | rag-fsevents-adapter.js | Queue depth >500 backpressure, runs but skips all enqueues | BROKEN (cascade) |
| com.alai.azure-db-backup | azure-db-backup.sh | az storage blob upload SIGTERM'd (line 116); temp dirs leaked in /tmp | TRANSIENT |
| com.alai.cert-expiry-monitor | cert-expiry-monitor.sh | Script exists, no error log found; likely network/curl failure | TRANSIENT |
| com.alai.chain-daily-inbox | chain-runner.sh --enqueue daily-inbox-triage | chain-runner.sh exists; failure likely in downstream chain execution | TRANSIENT |
| com.alai.chain-e2e-nightly | chain-e2e-nightly.sh | Script exists; likely Playwright/network dependency failure | TRANSIENT |
| com.alai.chain-phantom-detector | phantom-link-detector.js | Script does NOT exist on disk (MISSING) | BROKEN |
| com.alai.filesystem-audit | ~/bin/anvil-audit.sh | Script exists; last exit 256 may be diff/rename limit warning elevated to a non-zero exit | TRANSIENT |
| com.alai.blueprint-fleet-watchdog | ~/system/daemons/blueprint-fleet-watchdog.js | Script exists; likely a missing dep or API auth failure | TRANSIENT |
| com.john.alaiml-retrain | ~/ALAI/internal/projects/alaiML/scripts/retrain.sh | Script exists; DUPLICATE plist (both config and Library/LaunchAgents); likely venv path or MC dep failure | BROKEN (duplicate) |
| com.john.auto-verify-regression | auto-verify-regression.js | Script exists; calls claim-verifier.js; probable missing dep or API failure | TRANSIENT |
| com.john.b2-offsite-backup | b2-offsite-backup.sh | B2 storage cap EXCEEDED (403 storage_cap_exceeded) and auth token limit errors | BROKEN (infra) |
| com.john.bookstack-staleness | bookstack-staleness.js | API parse error "Unexpected end of JSON input" on page 2553+; BookStack API truncating responses | BROKEN |
| com.john.infra-drift-detector | infra-drift-detector.sh | diff.renameLimit warning elevated to non-zero exit; git rename detection failing on large repos | TRANSIENT |
| com.john.slack-bot | (node process) | WebSocket pong timeouts (ETIMEDOUT); process alive and heartbeating, but launchd saw a crash exit | TRANSIENT |
EXIT 2 — Logic/health failure
| Daemon | Script | Root Cause | Category |
|---|---|---|---|
| com.alai.rdap-audit-quarterly | plist not found in known dirs | Script path unknown, likely MISSING | BROKEN |
| com.john.lightrag-monitor | lightrag-health-with-alert.sh | Script exits 1/2 when LightRAG is degraded; this is INTENTIONAL ALERTING behavior, but LightRAG IS degraded | EXPECTED (alarm correctly firing) |
3. Producer-Consumer Wiring
RAG Ingest Pipeline (currently DEADLOCKED)
com.alai.rag-fsevents-adapter watches ~/system/evidence, ~/system/specs, ~/system/rules
com.alai.rag-bookstack-adapter polls BookStack API every 5min
com.alai.rag-mc-adapter reads ~/system/logs/mc-task-outcomes.jsonl
--> all three WRITE to ~/system/state/ingest-queue.sqlite (queue depth: 946, frozen)
com.alai.rag-drain-worker (keepalive) reads ingest-queue.sqlite
--> attempts POST to https://lightrag.basicconsulting.no (via CF Access)
--> CF credentials lookup: Vaultwarden ETIMEDOUT (bw-session stale or vault unreachable)
--> LightRAG unreachable → queue never drains → backpressure locks all three producers
ORPHAN OUTPUT: ~/system/metrics/ingest_pipeline.prom written by rag-drain-worker
--> nothing confirmed reading this file (no Prometheus scrape config found in audit)
This is the single most critical broken pipeline in the factory. 946 items queued, zero being processed.
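The deadlock follows directly from how a backpressure gate interacts with a stalled consumer. A minimal sketch of that mechanism using an in-memory SQLite queue: the table schema, function names, and the send callback are assumptions for illustration; only the 500-item threshold and the 946-item depth come from the audit.

```python
import sqlite3
from typing import Callable

BACKPRESSURE_LIMIT = 500  # threshold cited in the audit


def open_queue(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS queue "
        "(id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
    )
    return conn


def queue_depth(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM queue").fetchone()[0]


def try_enqueue(conn: sqlite3.Connection, payload: str,
                limit: int = BACKPRESSURE_LIMIT) -> bool:
    """Producer side: refuse new work while the queue is over the limit."""
    if queue_depth(conn) > limit:
        return False
    conn.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))
    conn.commit()
    return True


def drain(conn: sqlite3.Connection, send: Callable[[str], bool],
          batch: int = 100) -> int:
    """Consumer side: delete an item only after send() reports success."""
    rows = conn.execute(
        "SELECT id, payload FROM queue ORDER BY id LIMIT ?", (batch,)
    ).fetchall()
    drained = 0
    for row_id, payload in rows:
        if not send(payload):
            break  # downstream unreachable: leave the rest queued
        conn.execute("DELETE FROM queue WHERE id = ?", (row_id,))
        drained += 1
    conn.commit()
    return drained
```

With 946 rows queued and a send callback that always fails, drain removes nothing and try_enqueue keeps returning False, which is the frozen state described above. Once send succeeds, depth falls below the limit and producers resume on their own.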
Memory / Knowledge Layer
com.alai.mem0-server (PID 65706, keepalive)
reads/writes: http://localhost:6333 (Qdrant vector store)
produces: REST API on localhost:9000 (port cslistener)
consumed by: discover.js, agent tools calling /v1/memories
STATUS: alive and healthy (health 200, Qdrant 200)
NOTE: exit -15 (SIGTERM) in launchctl = prior graceful restart; current run is clean
com.alai.litestream (PID 51452, keepalive)
reads: SQLite DBs in ~/system/state/ (flywheel.db, health-events.db, etc.)
writes: B2 bucket alai-studio-backup (replication stream)
STATUS: running but b2-offsite-backup.sh (separate) hitting B2 storage cap
com.alai.wal-checkpoint (calendar, exit 0)
reads/writes: SQLite WAL files in ~/system/state/
consumed by: litestream (clean WAL = cleaner replication)
Orchestration Kernel
com.john.pi-orchestrator (PID 75750, keepalive)
reads: Planka MC API (boards.basicconsulting.no per mock config)
writes: ~/system/logs/pi-orchestrator/daemon-*.log
STATUS: running, cycling every 30s, "No eligible tasks" — running in MOCK MODE
NOTE: alai-config-mock.json loaded; real config resolver likely not resolving
com.alai.orchestrator-bridge (PID 1185, keepalive)
runs: orchestrator-http-server.js on port 3052
produces: HTTP API for triggering orchestrator actions
STATUS: running healthy
com.john.orchestrator-http (down_exit_0)
DUPLICATE of orchestrator-bridge — same script, same port (3052)
Watchdog says down_exit_0: port already bound by bridge when this tried to start
ORPHAN: plist in Library/LaunchAgents, shadow of orchestrator-bridge
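Shadow plists like this one can be found mechanically by grouping labels on the command they run. A minimal sketch, assuming each job's ProgramArguments has already been read into a dict; the function name is hypothetical and the "node" interpreter in the sample is an assumption, since the audit names only the script.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def duplicate_commands(
    jobs: Dict[str, Sequence[str]]
) -> Dict[Tuple[str, ...], List[str]]:
    """Map each command line shared by more than one label to those labels."""
    groups = defaultdict(list)
    for label, args in jobs.items():
        groups[tuple(args)].append(label)
    return {cmd: sorted(labels)
            for cmd, labels in groups.items() if len(labels) > 1}


# Illustrative input; argument lists are assumed, not read from the plists.
jobs = {
    "com.alai.orchestrator-bridge": ["node", "orchestrator-http-server.js"],
    "com.john.orchestrator-http": ["node", "orchestrator-http-server.js"],
    "com.john.tool-shed": ["node", "tool-shed.js"],
}
for command, labels in duplicate_commands(jobs).items():
    print(" ".join(command), "->", ", ".join(labels))
```

The same grouping applied to labels rather than commands would surface a plist present in two directories, as with alaiml-retrain.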
Backup Layer
com.john.b2-offsite-backup (calendar, exit 1)
reads: ~/system/state/ SQLite snapshots
writes: B2 bucket alai-studio-backup
STATUS: BLOCKED — B2 storage cap exceeded (403)
com.alai.azure-db-backup (calendar, exit 1)
reads: Azure SQL databases (via az CLI)
writes: ~/system/daemons/azure-db-backup.sh → Azure Blob Storage
STATUS: TRANSIENT failures, az upload SIGTERM'd (timeout in script or process kill)
ORPHAN TEMP: /tmp/az-backup-* directories leaking (rm fails on non-empty dirs)
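Leaked scratch directories of this kind are usually avoided by tying cleanup to scope rather than to a final rm step. A minimal Python sketch of the pattern, shown as an illustration rather than a patch to azure-db-backup.sh (which is a shell script); run_backup and demo_work are hypothetical names, and the az-backup- prefix mirrors the /tmp/az-backup-* pattern noted above.

```python
import os
import tempfile


def run_backup(work):
    """Run work(tmp_dir) in a scratch directory that is always removed.

    TemporaryDirectory deletes the directory recursively on exit from the
    with block, including when work() raises.
    """
    with tempfile.TemporaryDirectory(prefix="az-backup-") as tmp_dir:
        return work(tmp_dir)


def demo_work(tmp_dir):
    # Stand-in for the dump and upload steps; leaves a file behind on purpose.
    with open(os.path.join(tmp_dir, "dump.bak"), "w") as fh:
        fh.write("placeholder")
    return tmp_dir


leftover = run_backup(demo_work)
print(os.path.exists(leftover))  # False: the non-empty directory was removed
```

Cleanup on an uncaught signal such as SIGTERM still depends on the interpreter being allowed to unwind, so a hard kill can leave a directory behind.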
Comms / Slack
com.john.slack-bot (PID 18046, keepalive)
reads: Slack WebSocket (socket-mode)
writes: Slack messages, ~/system/logs/slack-bot.log
STATUS: alive, heartbeating, WebSocket reconnects successfully (~once per session)
CONCERN: 300min silent (no incoming Slack messages received in 5h as of audit time)
no.alai.email-body-integrity (calendar, exit 0)
reads: IMAP one.com (email body verification)
writes: ~/system/logs/email-integrity.log
STATUS: healthy last run
Monitoring / Health
com.john.lightrag-monitor (calendar, exit 2)
reads: LightRAG API health endpoint
writes: /tmp/lightrag-task-context.json, ~/system/evidence/lightrag-health-*.md
STATUS: correctly reporting LightRAG as degraded; Slack alert delivery ALSO failing
ORPHAN OUTPUT: lightrag-health-*.md files accumulating in ~/system/evidence/
(rag-fsevents-adapter trying to enqueue these — but queue full — circular feedback)
com.alai.daemon-fleet-watchdog (PID 2815, every 15min)
reads: launchctl list, all plist dirs
writes: ~/system/state/daemon-fleet-status.json
STATUS: healthy, data current as of 18:33:52Z today
com.alai.pi-orch-health (calendar, exit 127)
was: reads pi-orchestrator state, writes ~/system/state/pi-orch-health-*.json
STATUS: BROKEN — script deleted. Last known verdict (2026-05-06): CRITICAL
MLX / Inference Layer
com.alai.mlx-gemma4 (PID 27321)
com.alai.mlx-qwen3-32b (PID 29227)
com.alai.mlx-qwen3-8b (PID 29488)
com.alai.mlx-qwen25-coder-32b (PID 31120)
com.alai.ollama-serve-v2 (PID 29100)
STATUS: all running (keepalive), exit 0
PRODUCES: inference endpoints on ANVIL (local)
Note: plists not found in audited dirs — loaded from unknown location (possibly ~/Library/LaunchAgents subdirs)
4. Critical-Path Daemon Assessment
com.john.pi-orchestrator
- PID: 75750 | Exit: 0 | Status: RUNNING
- Healthy? Process is alive and cycling every 30s. However, it is running in MOCK MODE (alai-config-mock.json). The config resolver is not resolving real service URLs (Planka localhost:3100 is not listening per MEMORY.md). "No eligible tasks" every cycle.
- Produces: Cycle logs to ~/system/logs/pi-orchestrator/daemon-stdout.log
- Consumes: MC/Planka API (currently mocked, not reaching real board)
- Verdict: Process alive but effectively IDLE. Not orchestrating anything. Mock mode = silent failure.
com.alai.pi-orch-health
- PID: - | Exit: 127 | Status: BROKEN
- Root cause: ~/system/tools/pi-orch-health.sh was deleted. Script ran last on 2026-05-06 with verdict CRITICAL. Now permanently broken until script is restored.
- Produces: ~/system/state/pi-orch-health-*.json (last written 2026-05-06)
- Verdict: BROKEN. Monitoring of the orchestrator kernel has gone dark.
com.alai.mem0-server
- PID: 65706 | Exit: -15 (prior SIGTERM) | Status: ALIVE AND HEALTHY
- Root cause of -15: launchctl records the exit code of the previous run; the current process (PID 65706) started clean. SIGTERM was a graceful restart, not a crash.
- Evidence: Port 9000 listening (lsof confirmed), /health returns 200, Qdrant at localhost:6333 returns 200.
- Note: /v1/memories returning 404; API route may have changed or not yet initialized.
- Verdict: ALIVE. Exit -15 is misleading; current instance is healthy.
com.john.lightrag-monitor
- PID: - | Exit: 2 | Status: EXPECTED ALARM
- Root cause: Script correctly exits non-zero when LightRAG is degraded. LightRAG IS degraded (drain-worker cannot reach it due to missing CF credentials). Slack alert also failing (alert delivery broken).
- Produces: ~/system/evidence/lightrag-health-*.md, /tmp/lightrag-task-context.json
- Verdict: Monitor itself is working correctly. The degradation it reports is real and severe.
com.alai.lightrag-keepwarm
- PID: - | Exit: 0 | Status: calendar_ok
- Plist location: ~/Library/LaunchAgents/com.alai.lightrag-keepwarm.plist
- Schedule: unknown (plist content not captured in this audit; found late)
- Produces: Keepwarm pings to LightRAG
- Verdict: Last run exited 0. Likely the keepwarm pings succeed against the local endpoint even while drain-worker cannot auth through CF Access. Not broken.
com.alai.archive-first-scan
- PID: - | Exit: 0 | Status: calendar_ok | Schedule: daily 06:00
- Script: ~/bin/archive-first-scan.sh (EXISTS)
- Produces: /tmp/archive-first-scan-report-<date>.txt, writes to ~/system/state/archive-first-ledger.jsonl
- Consumes: Filesystem scan of unarchived candidates
- Verdict: HEALTHY. Running as designed.
com.john.session-archiver
- PID: - | Exit: 0 | Status: calendar_ok | Schedule: daily 03:00
- Script: ~/system/tools/session-archiver.js (EXISTS, 10928 bytes, 2026-02-23)
- Produces: Cleaned-up session artifacts
- Consumes: Claude session logs/state
- Verdict: HEALTHY. Last run clean.
com.alai.cost-daily-report
- PID: - | Exit: 127 | Status: BROKEN | Schedule: daily 23:55
- Root cause: ~/system/tools/cost-daily-report.sh deleted. Last successful run 2026-04-29.
- Produces: ~/system/reports/cost-daily.md
- Consumes: Cost tracker data
- Verdict: BROKEN — daily cost visibility dark for 10 days.
com.alai.weekly-planning
- PID: - | Exit: 0 | Status: calendar_ok | Schedule: Tuesday 08:00
- Script: ~/system/tools/weekly-planning.sh (MISSING from disk)
- BUT watchdog says last exit was 0 and state is calendar_ok. Contradiction.
- Likely explanation: Ran successfully before script was deleted; launchd has not triggered it since (last Tuesday before deletion date). Will fail as exit 127 next Tuesday.
- Verdict: TICKING TIME BOMB — will fail next Tuesday 08:00.
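The deadline implied by "next Tuesday 08:00" can be computed from the audit timestamp. A minimal sketch with a hypothetical helper name; it uses naive datetimes, which assumes the audit's UTC timestamp and the host's local schedule are close enough for day-level planning.

```python
from datetime import datetime, timedelta


def next_weekly_run(now: datetime, weekday: int, hour: int,
                    minute: int = 0) -> datetime:
    """Next occurrence of a weekly slot strictly after `now`.

    weekday follows datetime.weekday(): Monday is 0, Tuesday is 1.
    """
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


audit_time = datetime(2026, 5, 9, 20, 48)  # audit timestamp from the header
deadline = next_weekly_run(audit_time, weekday=1, hour=8)
print(deadline)  # 2026-05-12 08:00:00
```

Under those assumptions the script would need to be restored before 2026-05-12 08:00.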
no.alai.email-body-integrity
- PID: - | Exit: 0 | Status: calendar_ok | Schedule: daily 03:00
- Script: ~/system/tools/email-body-integrity-check.js (EXISTS)
- Produces: ~/system/logs/email-integrity.log
- Verdict: HEALTHY.
5. Daemon-Fleet-Watchdog State
File: ~/system/state/daemon-fleet-status.json
Generated: 2026-05-09T18:33:52Z (approx 2h15m before this audit)
Watchdog summary from file:
total: 148
running: 47 (keepalive processes alive)
calendar_ok: 74 (last scheduled run exit 0)
down: 3 (down_exit_0: autocoder-ui, draft-sender, orchestrator-http)
err: 20 (non-zero exit codes)
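A quick reconciliation of these figures shows whether the four state buckets account for the whole fleet. A minimal sketch, assuming the summary is available as a dict with the key names listed above; the function name is hypothetical.

```python
from typing import Dict, Sequence


def unaccounted(summary: Dict[str, int],
                buckets: Sequence[str] = ("running", "calendar_ok", "down", "err"),
                ) -> int:
    """Entries in `total` that none of the named buckets cover."""
    return summary["total"] - sum(summary[b] for b in buckets)


summary = {"total": 148, "running": 47, "calendar_ok": 74, "down": 3, "err": 20}
print(unaccounted(summary))  # 4
```

The remainder of 4 matches the four not_loaded entries listed in section 1e, which suggests those are counted in the total but not in any of the four buckets.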
Watchdog accuracy notes:
- Watchdog correctly identifies 20 erroring daemons but exit codes are internally translated (256 = bash exit 1; 32512 = bash exit 127).
- Watchdog does NOT cover all 168 launchctl rows, and 4 daemons it does track are marked not_loaded (lightrag-migrate-pump, lightrag-outbox-ingest, lightrag-watchdog, rdap-audit-quarterly).
- com.alai.mem0-server shows last_exit: 15 (SIGTERM of prior instance) but state: running. This is correct; the current instance is healthy.
- com.john.slack-bot shows running/pid 18046 but last_exit: 256. launchd records the last crash before the current keepalive restart. Process is currently alive.
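The translated codes in the watchdog file are consistent with raw POSIX wait statuses, where a normal exit code sits in the high byte and a terminating signal in the low 7 bits. A minimal decoder under that assumption; the function name is hypothetical, and stopped or core-dump statuses are deliberately ignored.

```python
from typing import Tuple


def decode_wait_status(status: int) -> Tuple[str, int]:
    """Split a raw wait status into ("exit", code) or ("signal", number)."""
    signal_number = status & 0x7F
    if signal_number == 0:
        return ("exit", (status >> 8) & 0xFF)
    return ("signal", signal_number)


for raw in (256, 32512, 15):
    print(raw, decode_wait_status(raw))
```

Under this reading 256 is exit 1, 32512 is exit 127, and 15 is termination by SIGTERM, matching the values reported for slack-bot, the deleted-script cohort, and mem0-server.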
Open Questions
- Pi-orchestrator mock mode: Why is alai-config-mock.json being loaded instead of real config? Is the Planka/MC API intentionally offline, or is the config resolver broken? The orchestrator is spinning idle.
- LightRAG CF credentials: Vaultwarden ETIMEDOUT in rag-drain-worker. Is /tmp/bw-session stale? Is Vaultwarden (vault.basicconsulting.no) reachable? This single broken auth is deadlocking the entire RAG ingest pipeline (946 items queued).
- B2 storage cap: 403 storage_cap_exceeded on Backblaze B2. Is this a billing cap that needs to be raised in the B2 console? Litestream is still replicating but the nightly snapshot job fails.
- Five deleted scripts: Who deleted pi-orch-health.sh, cost-daily-report.sh, daily-planning.sh, legal-docs-azure-sync.sh, mcp-health-check.sh? Were they intentionally removed (deprecated)? If deprecated, the plists should be unloaded. If accidental deletion, restore from backup.
- Duplicate alaiml-retrain plist: Plist exists in BOTH system/config/launchagents AND Library/LaunchAgents. Two crons would fire. Which is canonical?
- com.john.orchestrator-http duplicate: Identical to com.alai.orchestrator-bridge (same script, same port). orchestrator-http shows down_exit_0 because bridge already bound the port. Dead plist.
- LightRAG health-*.md circular feedback: The lightrag-monitor evidence files are being watched by rag-fsevents-adapter, which tries to enqueue them into LightRAG: a monitoring artifact feeding back into the broken pipeline it monitors.
- Slack bot silent 300 min: No incoming Slack messages for 5h at audit time. Is anyone sending messages? Or is the Socket Mode token scope broken for receiving?
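For the circular-feedback question, one possible mitigation is an exclusion filter applied before a watched file is enqueued. A minimal sketch, assuming filtering by file name is acceptable; the function name and exclusion list are hypothetical and nothing here reflects how rag-fsevents-adapter is actually configured.

```python
import os
from fnmatch import fnmatchcase
from typing import Sequence

# Assumed exclusion list; the pattern comes from the evidence file names above.
EXCLUDE_PATTERNS = ("lightrag-health-*.md",)


def should_enqueue(path: str,
                   exclude: Sequence[str] = EXCLUDE_PATTERNS) -> bool:
    """False for files whose base name matches any exclusion pattern."""
    name = os.path.basename(path)
    return not any(fnmatchcase(name, pattern) for pattern in exclude)


print(should_enqueue("/Users/demo/system/evidence/lightrag-health-0509.md"))
print(should_enqueue("/Users/demo/system/specs/pipeline.md"))
```

The first call prints False and the second True, so health reports stay on disk as evidence without being fed back into the pipeline they describe. The sample paths are invented.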
Highest-Leverage Fix Candidates (audit-level only)
Priority 1 — Unlocks entire RAG pipeline (946 items unblocked)
- Fix rag-drain-worker CF Access credentials: ensure Vaultwarden item "LightRAG-CF-Access" exists and /tmp/bw-session is valid. One credential fix unblocks bookstack-adapter + mc-adapter + fsevents-adapter simultaneously.
Priority 2 — Restore cost visibility (10-day blind spot)
- Restore or recreate ~/system/tools/cost-daily-report.sh. Last output was 2026-04-29. CEO-visible reporting dark for 10 days.
Priority 3 — Fix orchestrator mock mode
- Determine why pi-orchestrator loads mock config. If Planka/MC API is down, restore it. If config resolver is broken, fix alai-config.js. The orchestration kernel is running but doing nothing.
Priority 4 — Raise B2 storage cap
- B2 bucket alai-studio-backup has hit its cap. Nightly database snapshots are not landing. This is a billing action in the Backblaze console, not a code fix.
Priority 5 — Unload dead plists (5 scripts deleted)
- com.alai.pi-orch-health, com.alai.cost-daily-report, com.alai.daily-planning, com.john.legal-docs-azure-sync, com.john.mcp-health-check should either have scripts restored or be unloaded from launchd. legal-docs-azure-sync and mcp-health-check have KeepAlive.Crashed=true, creating infinite restart loops.
Priority 6 — Unload com.john.orchestrator-http duplicate plist
- Dead shadow of orchestrator-bridge. Causes confusion in watchdog counts.
Priority 7 — Restore weekly-planning.sh before next Tuesday
- Script missing but plist active. Will fail exit 127 at 08:00 next Tuesday.
Priority 8 — Fix phantom-link-detector.js missing script
- com.alai.chain-phantom-detector runs every 15min calling a script that does not exist. High-frequency failure (96 times/day).