Inventory: Daemon Fleet

AI Factory Daemon Fleet Audit — 2026-05-09

Auditor: kelsey-hightower
Timestamp: 2026-05-09T20:48 UTC
Source of truth: launchctl list + daemon-fleet-status.json (generated 2026-05-09T18:33:52Z) + plist reads + error log sampling
Fleet size (watchdog): 148 tracked entries | 47 running keepalive | 74 calendar_ok | 3 down | 20 erroring | 4 not_loaded (section 1e)
Fleet size (launchctl live): 168 rows matching alai/john/no.alai pattern (includes daemons not in watchdog)
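
A minimal sketch of how the live row count could be reproduced, assuming the usual tab-separated, three-column `launchctl list` output (PID, Status, Label). The sample text, the `Row` type, and the prefix tuple are illustrative and were not captured from the audited host.

```python
from typing import NamedTuple, Optional

# Label prefixes the audit filtered on (alai / john / no.alai).
PREFIXES = ("com.alai.", "com.john.", "no.alai.")


class Row(NamedTuple):
    pid: Optional[int]   # None when launchctl prints "-" (not running)
    last_exit: int       # last exit status; negative values are signals
    label: str


def parse_launchctl_list(text: str) -> list[Row]:
    """Parse `launchctl list` output and keep only fleet labels."""
    rows = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3 or parts[0] == "PID":
            continue  # skip the header and malformed lines
        pid_s, status_s, label = parts
        if not label.startswith(PREFIXES):
            continue
        pid = None if pid_s == "-" else int(pid_s)
        rows.append(Row(pid, int(status_s), label))
    return rows


# Illustrative sample: three fleet rows from section 1 plus one unrelated row.
SAMPLE = (
    "PID\tStatus\tLabel\n"
    "1163\t0\tcom.alai.agent-timeout-monitor\n"
    "-\t127\tcom.alai.cost-daily-report\n"
    "65706\t-15\tcom.alai.mem0-server\n"
    "412\t0\tcom.apple.example\n"
)
fleet = parse_launchctl_list(SAMPLE)
```

On the audited host, `len(fleet)` over the real output would correspond to the 168-row figure, if the same prefix filter was used.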


1. Live Exit-Code Matrix

Column key: PID (- = not running) | Last Exit | Plist location | KeepAlive policy | Schedule

1a. RUNNING (PID alive; last exit 0 or -15/SIGTERM unless noted in the Exit column)

Daemon | PID | Exit | Plist Path | KeepAlive | Schedule
com.alai.agent-timeout-monitor | 1163 | 0 | system/daemons/launchagents | always | continuous
com.alai.cc-api-server | 1183 | 0 | system/daemons/launchagents | always | continuous
com.alai.credit-monitor | 1223 | 0 | system/daemons/launchagents | always | continuous
com.alai.idle-learning-daemon | 1196 | 0 | system/daemons/launchagents | always | continuous
com.alai.litestream | 51452 | 0 | Library/LaunchAgents | always | continuous
com.alai.mem0-server | 65706 | -15 (SIGTERM) | Library/LaunchAgents | always | continuous
com.alai.mlx-gemma4 | 27321 | 0 | (not in known dirs) | always | continuous
com.alai.mlx-qwen25-coder-32b | 31120 | 0 | (not in known dirs) | always | continuous
com.alai.mlx-qwen3-32b | 29227 | 0 | (not in known dirs) | always | continuous
com.alai.mlx-qwen3-8b | 29488 | 0 | (not in known dirs) | always | continuous
com.alai.ollama-serve-v2 | 29100 | 0 | system/daemons/launchagents | always | continuous
com.alai.orchestrator-bridge | 1185 | 0 | system/daemons/launchagents | always | continuous
com.alai.ram-monitor | 1241 | 0 | system/daemons/launchagents | always | continuous
com.alai.task-router | 1200 | 0 | system/daemons/launchagents | always | continuous
com.alai.web-learning | 1176 | 0 | system/daemons/launchagents | always | continuous
com.john.bookstack-webhook-relay | 1206 | 0 | system/daemons/launchagents | always | continuous
com.john.browser-worker | 1211 | 0 | system/daemons/launchagents | always | continuous
com.john.caddy-vault | 86082 | 0 | system/daemons/launchagents | always | continuous
com.john.cloudflared | 79617 | 0 | system/daemons/launchagents | always | continuous
com.john.comms-agent | 1186 | 0 | system/daemons/launchagents | always | continuous
com.john.documenso-webhook | 20561 | 0 | system/daemons/launchagents | always | continuous
com.john.durable-executor | 1212 | 0 | system/daemons/launchagents | always | continuous
com.john.edita-loop | 61758 | 0 | system/daemons/launchagents | always | continuous
com.john.email-agent | 92225 | 0 | system/daemons/launchagents | calendar | calendar
com.john.email-tracker | 11292 | 0 | system/daemons/launchagents | conditional | conditional
com.john.event-dispatcher | 65452 | 0 | system/daemons/launchagents | always | continuous
com.john.health-dashboard | 1189 | 0 | system/daemons/launchagents | always | continuous
com.john.hook-daemon | 1240 | 0 | system/daemons/launchagents | always | continuous
com.john.intake-watcher | 41929 | 0 | system/daemons/launchagents | always | continuous
com.john.kenan-hot-web | 1231 | 0 | system/daemons/launchagents | always | continuous
com.john.llm-datasette | 1170 | 0 | system/daemons/launchagents | always | continuous
com.john.mc-dashboard | 65673 | 0 | system/daemons/launchagents | always | continuous
com.john.n8n | 1203 | 0 | system/daemons/launchagents | always | continuous
com.john.network-watchdog | 1194 | 0 | system/daemons/launchagents | always | continuous
com.john.ops-watchdog | 8782 | -15 (SIGTERM) | system/daemons/launchagents | always | continuous
com.john.outbox-processor | 1190 | 0 | system/daemons/launchagents | always | continuous
com.john.paste-logger | 1224 | 0 | system/daemons/launchagents | always | continuous
com.john.pi-orchestrator | 75750 | 0 | system/daemons/launchagents | always | continuous
com.john.slack-bot | 18046 | 1 (last crash exit) | system/daemons/launchagents | always | continuous
com.john.tender-dashboard | 1234 | 0 | system/daemons/launchagents | always | continuous
com.john.tool-shed | 1191 | 0 | system/daemons/launchagents | always | continuous
com.john.vault-keeper | 87005 | 0 | system/daemons/launchagents | always | continuous
com.john.vault-proxy | 1222 | 0 | system/daemons/launchagents | always | continuous
com.john.youtube-nightly-learning | 83439 | 0 | system/daemons/launchagents | always | continuous
no.alai.claude-proxy | 6361 | 0 | Library/LaunchAgents | always | continuous
com.alai.rag-drain-worker | 3640 | 1 (prev exit) | system/config/launchagents | always | continuous
com.alai.rag-fsevents-adapter | 64755 | 1 (prev exit) | system/config/launchagents | conditional | WatchPaths
com.alai.daemon-fleet-watchdog | 2815 | 0 | Library/LaunchAgents | calendar | every 15min

1b. DOWN — Exit 0 (intentional one-shot or conditional)

Daemon | PID | Exit | Notes
com.john.autocoder-ui | - | 0 | down_exit_0: one-shot complete
com.john.draft-sender | - | 0 | down_exit_0: conditional, no pending drafts
com.john.orchestrator-http | - | 0 | down_exit_0: DUPLICATE; orchestrator-bridge runs the same script on port 3052

1c. CALENDAR SCHEDULED — Exit 0 last run (healthy)

These fired successfully on last scheduled run. Not exhaustively listed — watchdog confirms 74 in this state.
Key members: com.alai.apply-knowledge, com.alai.archive-first-scan, com.alai.chain-weekly-report, com.alai.docker-watchdog, com.alai.gcloud-auth, com.alai.john-daily-digest, com.alai.lightrag-backup, com.alai.memory-watchdog, com.alai.meta-agent-loop, com.alai.restore-drill, com.alai.skill-audit, com.alai.team-sync, com.alai.wal-checkpoint, com.alai.weekly-planning, com.alai.zombie-cleanup, com.john.agentforge, com.john.bookstack-sync, com.john.calendar-bridge, com.john.critical-tools-healthcheck, com.john.daemon-health, com.john.db-archival-sweep, com.john.db-backup, com.john.domain-audit, com.john.drift-detector, com.john.email-briefing, com.john.forge-watchdog, com.john.log-rotate, com.john.mc-session-worker, com.john.morning-routine, com.john.offsite-backup, com.john.pi2-override-audit, com.john.review-drain, com.john.session-archiver, com.john.session-extractor, com.john.spam-recovery-scan, com.john.system-guardian, com.john.tldr-actionizer, com.john.tldr-briefing, com.john.tldr-watch, com.john.tldr-weekly-synthesis, com.john.weekly-synthesis, no.alai.email-body-integrity, no.alai.meta-agent, no.alai.resolver, no.alai.spend-guard.

1d. FAILING — Non-zero exit codes

Daemon | PID | Exit Code | Plist Location | KeepAlive | Schedule
com.alai.azure-db-backup | - | 1 (exit 256 internal) | system/config/launchagents | none (RunAtLoad=false) | every 4h
com.alai.blueprint-fleet-watchdog | - | 1 (exit 256) | Library/LaunchAgents | none | daily 06:15
com.alai.cert-expiry-monitor | - | 1 (exit 256) | system/config/launchagents | none | daily 07:00
com.alai.chain-daily-inbox | - | 1 (exit 256) | Library/LaunchAgents | none | daily 07:00
com.alai.chain-e2e-nightly | - | 1 (exit 256) | Library/LaunchAgents | none | daily 02:00
com.alai.chain-phantom-detector | - | 1 (exit 256) | Library/LaunchAgents | none | every 15min
com.alai.cost-daily-report | - | 127 | Library/LaunchAgents | none | daily 23:55
com.alai.daily-planning | - | 127 | Library/LaunchAgents | none | daily 07:30
com.alai.filesystem-audit | - | 1 (exit 256) | Library/LaunchAgents | none | Monday 08:00
com.alai.pi-orch-health | - | 127 | Library/LaunchAgents | none | daily 23:00
com.alai.rag-bookstack-adapter | - | 1 (exit 256) | system/config/launchagents | none | every 5min
com.alai.rag-drain-worker | 3640 | 1 (prev exit, now running) | system/config/launchagents | always | continuous
com.alai.rag-fsevents-adapter | 64755 | 1 (prev exit, now running) | system/config/launchagents | conditional | WatchPaths
com.alai.rag-mc-adapter | - | 1 (exit 256) | system/config/launchagents | none | every 5min
com.alai.rdap-audit-quarterly | - | 2 | Library/LaunchAgents | none | quarterly
com.john.alaiml-retrain | - | 1 (exit 256) | system/config/launchagents + Library/LaunchAgents | none | 1st of month 03:00
com.john.auto-verify-regression | - | 1 (exit 256) | system/daemons/launchagents | none | daily 06:00
com.john.b2-offsite-backup | - | 1 (exit 256) | system/daemons/launchagents | none | daily 03:30
com.john.bookstack-staleness | - | 1 (exit 256) | system/daemons/launchagents | none | Sunday 22:00
com.john.infra-drift-detector | - | 1 (exit 256) | system/daemons/launchagents | none | Sunday 04:00
com.john.legal-docs-azure-sync | - | 127 | Library/LaunchAgents | Crashed=true | daily 02:00
com.john.lightrag-monitor | - | 2 | system/config/launchagents | none | daily 09:00
com.john.mcp-health-check | - | 127 | Library/LaunchAgents | Crashed=true | every 1h
com.john.slack-bot | 18046 | 1 (last crash) | system/daemons/launchagents | always | continuous
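
The "exit 256" annotations are consistent with a raw POSIX wait status, where the exit code sits in the high byte (256 = 1 << 8), while negative values such as -15 are signal numbers. The sketch below decodes both forms under that assumption. The helper names are illustrative, and whether the watchdog actually stores raw wait statuses is not confirmed by the audit.

```python
import os
import signal


def describe_wait_status(raw_wait_status: int) -> str:
    """Turn a raw POSIX wait status (e.g. 256) into a readable description."""
    code = os.waitstatus_to_exitcode(raw_wait_status)  # Python 3.9+
    if code < 0:
        return f"killed by {signal.Signals(-code).name}"
    return f"exit {code}"


def describe_launchctl_status(status: int) -> str:
    """launchctl list shows signal terminations as negative numbers."""
    if status < 0:
        return f"terminated by {signal.Signals(-status).name}"
    return f"exit {status}"


print(describe_wait_status(256))        # exit 1
print(describe_launchctl_status(-15))   # terminated by SIGTERM
```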

1e. NOT LOADED (watchdog knows them, launchctl does not)

Daemon | State
com.alai.lightrag-migrate-pump | not_loaded
com.alai.lightrag-outbox-ingest | not_loaded
com.alai.lightrag-watchdog | not_loaded
com.john.rdap-audit-quarterly | not_loaded

2. Failure Cohort — Root Cause Analysis

EXIT 127 — Script/binary not found (BROKEN — script deleted)

These five daemons have plists in Library/LaunchAgents pointing to scripts that no longer exist on disk. Exit 127 is the shell's "command not found" status: the script path named in the plist no longer resolves.

Daemon | Missing Script | Last Successful Run | Category
com.alai.pi-orch-health | ~/system/tools/pi-orch-health.sh | 2026-05-06 (verdict: CRITICAL) | BROKEN
com.alai.cost-daily-report | ~/system/tools/cost-daily-report.sh | 2026-04-29 | BROKEN
com.alai.daily-planning | ~/system/tools/daily-planning.sh | unknown | BROKEN
com.john.legal-docs-azure-sync | ~/system/daemons/legal-docs-azure-sync.sh | unknown | BROKEN
com.john.mcp-health-check | ~/system/tools/mcp-health-check.sh | unknown | BROKEN
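
A check like the one behind this table can be sketched as follows: read each plist and report program paths that are missing on disk. This assumes the jobs name their script in the standard `Program` or `ProgramArguments` keys. The function name is hypothetical and this is not the audit's actual tooling.

```python
import plistlib
from pathlib import Path


def missing_program_paths(plist_path: Path) -> list[str]:
    """Return absolute paths named in a launchd plist that do not exist."""
    with plist_path.open("rb") as fh:
        job = plistlib.load(fh)
    candidates = list(job.get("ProgramArguments", []))
    if "Program" in job:
        candidates.append(job["Program"])
    missing = []
    for arg in candidates:
        expanded = Path(arg).expanduser()
        # Only absolute paths are checked; flags and bare words are skipped.
        if expanded.is_absolute() and not expanded.exists():
            missing.append(str(expanded))
    return missing
```

Running it over every plist in Library/LaunchAgents would flag the exit-127 cohort before the next scheduled run rather than after it.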

EXIT 1 / 256 — Script exists but fails at runtime (BROKEN — dependency missing)

Daemon | Script | Root Cause | Category
com.alai.rag-bookstack-adapter | rag-bookstack-adapter.js | Queue depth 946 > 500 backpressure gate; never drains because drain-worker cannot reach LightRAG | BROKEN (cascade)
com.alai.rag-drain-worker | rag-drain-worker.js | Vaultwarden ETIMEDOUT → CF credentials unavailable → LightRAG unreachable | BROKEN
com.alai.rag-mc-adapter | rag-mc-adapter.js | Same backpressure cascade, queue depth 946 | BROKEN (cascade)
com.alai.rag-fsevents-adapter | rag-fsevents-adapter.js | Queue depth >500 backpressure, runs but skips all enqueues | BROKEN (cascade)
com.alai.azure-db-backup | azure-db-backup.sh | az storage blob upload SIGTERM'd (line 116); temp dirs leaked in /tmp | TRANSIENT
com.alai.cert-expiry-monitor | cert-expiry-monitor.sh | Script exists, no error log found; likely network/curl failure | TRANSIENT
com.alai.chain-daily-inbox | chain-runner.sh --enqueue daily-inbox-triage | chain-runner.sh exists; failure likely in downstream chain execution | TRANSIENT
com.alai.chain-e2e-nightly | chain-e2e-nightly.sh | Script exists; likely Playwright/network dependency failure | TRANSIENT
com.alai.chain-phantom-detector | phantom-link-detector.js | Script does NOT exist on disk (MISSING) | BROKEN
com.alai.filesystem-audit | ~/bin/anvil-audit.sh | Script exists; last exit 256 may be diff/rename limit warning elevated to exit | TRANSIENT
com.alai.blueprint-fleet-watchdog | ~/system/daemons/blueprint-fleet-watchdog.js | Script exists; likely a missing dep or API auth failure | TRANSIENT
com.john.alaiml-retrain | ~/ALAI/internal/projects/alaiML/scripts/retrain.sh | Script exists; DUPLICATE plist (both config and Library/LaunchAgents); likely venv path or MC dep failure | BROKEN (duplicate)
com.john.auto-verify-regression | auto-verify-regression.js | Script exists; calls claim-verifier.js; probable missing dep or API failure | TRANSIENT
com.john.b2-offsite-backup | b2-offsite-backup.sh | B2 storage cap EXCEEDED (403 storage_cap_exceeded) and auth token limit errors | BROKEN (infra)
com.john.bookstack-staleness | bookstack-staleness.js | API parse error "Unexpected end of JSON input" on page 2553+; BookStack API truncating responses | BROKEN
com.john.infra-drift-detector | infra-drift-detector.sh | diff.renameLimit warning elevated to non-zero exit; git rename detection failing on large repos | TRANSIENT
com.john.slack-bot | (node process) | WebSocket pong timeouts (ETIMEDOUT); process alive and heartbeating, but launchd saw a crash exit | TRANSIENT

EXIT 2 — Logic/health failure

Daemon | Script | Root Cause | Category
com.alai.rdap-audit-quarterly | plist not found in known dirs | Script path unknown, likely MISSING | BROKEN
com.john.lightrag-monitor | lightrag-health-with-alert.sh | Script exits 1/2 when LightRAG is degraded; this is INTENTIONAL ALERTING behavior, and LightRAG IS currently degraded | EXPECTED (alarm correctly firing)

3. Producer-Consumer Wiring

RAG Ingest Pipeline (currently DEADLOCKED)

com.alai.rag-fsevents-adapter   watches ~/system/evidence, ~/system/specs, ~/system/rules
com.alai.rag-bookstack-adapter  polls BookStack API every 5min
com.alai.rag-mc-adapter         reads ~/system/logs/mc-task-outcomes.jsonl
  --> all three WRITE to ~/system/state/ingest-queue.sqlite (queue depth: 946, frozen)

com.alai.rag-drain-worker (keepalive) reads ingest-queue.sqlite
  --> attempts POST to https://lightrag.basicconsulting.no (via CF Access)
  --> CF credentials lookup: Vaultwarden ETIMEDOUT (bw-session stale or vault unreachable)
  --> LightRAG unreachable → queue never drains → backpressure locks all three producers

ORPHAN OUTPUT: ~/system/metrics/ingest_pipeline.prom written by rag-drain-worker
  --> nothing confirmed reading this file (no Prometheus scrape config found in audit)

This is the single most critical broken pipeline in the factory. 946 items queued, zero being processed.
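
The deadlock can be illustrated with a minimal backpressure gate: once the depth passes the limit, producers skip every enqueue, and nothing recovers until the consumer drains. The table name `ingest_queue`, the function name, and the in-memory database are assumptions for illustration. The real schema of ingest-queue.sqlite is not shown in this audit.

```python
import sqlite3

MAX_QUEUE_DEPTH = 500  # gate threshold reported in the audit


def try_enqueue(conn: sqlite3.Connection, payload: str) -> bool:
    """Enqueue unless the queue is over the backpressure limit."""
    (depth,) = conn.execute("SELECT COUNT(*) FROM ingest_queue").fetchone()
    if depth > MAX_QUEUE_DEPTH:
        return False  # producer skips; only a working consumer lowers depth
    conn.execute("INSERT INTO ingest_queue (payload) VALUES (?)", (payload,))
    conn.commit()
    return True


# In-memory database standing in for ingest-queue.sqlite (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ingest_queue (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO ingest_queue (payload) VALUES (?)",
    [(f"doc-{i}",) for i in range(946)],  # frozen depth from the audit
)
accepted = try_enqueue(conn, "new-evidence.md")  # False while depth is 946
```

With the drain worker unable to reach LightRAG, no code path in this model ever lowers the depth, which matches the frozen queue observed above.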

Memory / Knowledge Layer

com.alai.mem0-server (PID 65706, keepalive)
  reads/writes: http://localhost:6333 (Qdrant vector store)
  produces: REST API on localhost:9000 (port 9000 appears as "cslistener", its /etc/services name)
  consumed by: discover.js, agent tools calling /v1/memories
  STATUS: alive and healthy (health 200, Qdrant 200)
  NOTE: exit -15 (SIGTERM) in launchctl = prior graceful restart; current run is clean

com.alai.litestream (PID 51452, keepalive)
  reads: SQLite DBs in ~/system/state/ (flywheel.db, health-events.db, etc.)
  writes: B2 bucket alai-studio-backup (replication stream)
  STATUS: running; the separate b2-offsite-backup.sh job is the one hitting the B2 storage cap

com.alai.wal-checkpoint (calendar, exit 0)
  reads/writes: SQLite WAL files in ~/system/state/
  consumed by: litestream (clean WAL = cleaner replication)

Orchestration Kernel

com.john.pi-orchestrator (PID 75750, keepalive)
  reads: Planka MC API (boards.basicconsulting.no per mock config)
  writes: ~/system/logs/pi-orchestrator/daemon-*.log
  STATUS: running, cycling every 30s, "No eligible tasks" — running in MOCK MODE
  NOTE: alai-config-mock.json loaded; real config resolver likely not resolving

com.alai.orchestrator-bridge (PID 1185, keepalive)
  runs: orchestrator-http-server.js on port 3052
  produces: HTTP API for triggering orchestrator actions
  STATUS: running healthy

com.john.orchestrator-http (down_exit_0)
  DUPLICATE of orchestrator-bridge — same script, same port (3052)
  Watchdog says down_exit_0: port already bound by bridge when this tried to start
  ORPHAN: plist in Library/LaunchAgents, shadow of orchestrator-bridge
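
A quick way to confirm that a port is already taken before loading a second job is a local connect probe. This is a sketch only; the function name is hypothetical and the default host is an assumption about where the bridge listens.

```python
import socket


def port_is_bound(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something already accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 on success and an errno value otherwise.
        return probe.connect_ex((host, port)) == 0
```

For the case above, `port_is_bound(3052)` returning True while orchestrator-bridge is up would explain why orchestrator-http cannot serve on the same port.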

Backup Layer

com.john.b2-offsite-backup (calendar, exit 1)
  reads: ~/system/state/ SQLite snapshots
  writes: B2 bucket alai-studio-backup
  STATUS: BLOCKED — B2 storage cap exceeded (403)

com.alai.azure-db-backup (calendar, exit 1)
  reads: Azure SQL databases (via az CLI)
  writes: Azure Blob Storage (upload performed by ~/system/daemons/azure-db-backup.sh)
  STATUS: TRANSIENT failures, az upload SIGTERM'd (timeout in script or process kill)
  ORPHAN TEMP: /tmp/az-backup-* directories leaking (rm fails on non-empty dirs)
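
The audited script is shell, so the following Python sketch only illustrates the cleanup pattern that avoids this kind of leak: remove the working directory recursively in a `finally` step, so a failed upload cannot leave a non-empty directory behind. The function name and the `az-backup-` prefix are illustrative.

```python
import shutil
import tempfile
from typing import Callable


def run_backup_step(work: Callable[[str], None]) -> None:
    """Run work(tmpdir) and always remove the temp dir, even if non-empty."""
    tmpdir = tempfile.mkdtemp(prefix="az-backup-")
    try:
        work(tmpdir)
    finally:
        # rmtree removes the directory together with everything inside it.
        shutil.rmtree(tmpdir, ignore_errors=True)
```

The same idea in the shell script would be a recursive remove in an exit trap, instead of a plain remove that fails on non-empty directories.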

Comms / Slack

com.john.slack-bot (PID 18046, keepalive)
  reads: Slack WebSocket (socket-mode)
  writes: Slack messages, ~/system/logs/slack-bot.log
  STATUS: alive, heartbeating, WebSocket reconnects successfully (~once per session)
  CONCERN: 300min silent (no incoming Slack messages received in 5h as of audit time)

no.alai.email-body-integrity (calendar, exit 0)
  reads: IMAP one.com (email body verification)
  writes: ~/system/logs/email-integrity.log
  STATUS: healthy last run

Monitoring / Health

com.john.lightrag-monitor (calendar, exit 2)
  reads: LightRAG API health endpoint
  writes: /tmp/lightrag-task-context.json, ~/system/evidence/lightrag-health-*.md
  STATUS: correctly reporting LightRAG as degraded; Slack alert delivery ALSO failing
  ORPHAN OUTPUT: lightrag-health-*.md files accumulating in ~/system/evidence/
    (rag-fsevents-adapter trying to enqueue these — but queue full — circular feedback)
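
One way to break this loop is an ignore filter in the producer, so monitoring artifacts are never enqueued. A minimal sketch, assuming a filename-based filter would be acceptable; the `IGNORE_PATTERNS` tuple and `should_enqueue` helper are hypothetical and not part of rag-fsevents-adapter.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical ignore list; the pattern mirrors the file names in the audit.
IGNORE_PATTERNS = ("lightrag-health-*.md",)


def should_enqueue(path: str) -> bool:
    """Skip monitoring artifacts so they do not feed back into the queue."""
    name = PurePosixPath(path).name
    return not any(fnmatch(name, pattern) for pattern in IGNORE_PATTERNS)
```

Writing the health reports to a directory outside the watched paths would have the same effect without touching the adapter.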

com.alai.daemon-fleet-watchdog (PID 2815, every 15min)
  reads: launchctl list, all plist dirs
  writes: ~/system/state/daemon-fleet-status.json
  STATUS: healthy, data current as of 18:33:52Z today

com.alai.pi-orch-health (calendar, exit 127)
  was: reads pi-orchestrator state, writes ~/system/state/pi-orch-health-*.json
  STATUS: BROKEN — script deleted. Last known verdict (2026-05-06): CRITICAL

MLX / Inference Layer

com.alai.mlx-gemma4 (PID 27321)
com.alai.mlx-qwen3-32b (PID 29227)
com.alai.mlx-qwen3-8b (PID 29488)
com.alai.mlx-qwen25-coder-32b (PID 31120)
com.alai.ollama-serve-v2 (PID 29100)
  STATUS: all running (keepalive), exit 0
  PRODUCES: inference endpoints on ANVIL (local)
  Note: plists not found in audited dirs — loaded from unknown location (possibly ~/Library/LaunchAgents subdirs)

4. Critical-Path Daemon Assessment

com.john.pi-orchestrator

com.alai.pi-orch-health

com.alai.mem0-server

com.john.lightrag-monitor

com.alai.lightrag-keepwarm

com.alai.archive-first-scan

com.john.session-archiver

com.alai.cost-daily-report

com.alai.weekly-planning

no.alai.email-body-integrity


5. Daemon-Fleet-Watchdog State

File: ~/system/state/daemon-fleet-status.json
Generated: 2026-05-09T18:33:52Z (approx 2h15m before this audit)
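
The staleness figure can be checked directly from the two timestamps in this report. The comparison against a 15-minute interval assumes the watchdog schedule listed in section 1a is what rewrites this file, which the audit does not confirm.

```python
from datetime import datetime, timedelta, timezone

generated = datetime(2026, 5, 9, 18, 33, 52, tzinfo=timezone.utc)
audited = datetime(2026, 5, 9, 20, 48, tzinfo=timezone.utc)

age = audited - generated                 # 2:14:08
expected_interval = timedelta(minutes=15)
# Whole 15-minute intervals that elapsed without a newer file.
elapsed_intervals = age // expected_interval
print(age, elapsed_intervals)
```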

Watchdog summary from file:

total:       148
running:      47 (keepalive processes alive)
calendar_ok:  74 (last scheduled run exit 0)
down:          3 (down_exit_0: autocoder-ui, draft-sender, orchestrator-http)
err:          20 (non-zero exit codes)
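
A quick consistency check on these numbers: the four listed categories do not add up to the total. The dictionary below is a hypothetical rendering of the summary; the actual layout of daemon-fleet-status.json is not shown in the audit.

```python
# Hypothetical shape; values copied from the summary above.
summary = {"total": 148, "running": 47, "calendar_ok": 74, "down": 3, "err": 20}

categorized = sum(count for key, count in summary.items() if key != "total")
unaccounted = summary["total"] - categorized
print(categorized, unaccounted)
```

The remainder of 4 equals the number of not_loaded entries in section 1e, which suggests (but does not prove) that those entries make up the difference.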

Watchdog accuracy notes:


Open Questions

  1. Pi-orchestrator mock mode: Why is alai-config-mock.json being loaded instead of real config? Is the Planka/MC API intentionally offline, or is the config resolver broken? The orchestrator is spinning idle.

  2. LightRAG CF credentials: Vaultwarden ETIMEDOUT in rag-drain-worker. Is /tmp/bw-session stale? Is Vaultwarden (vault.basicconsulting.no) reachable? This single broken auth is deadlocking the entire RAG ingest pipeline (946 items queued).

  3. B2 storage cap: 403 storage_cap_exceeded on Backblaze B2. Is this a billing cap that needs to be raised in the B2 console? Litestream is still replicating but the nightly snapshot job fails.

  4. Five deleted scripts: Who deleted pi-orch-health.sh, cost-daily-report.sh, daily-planning.sh, legal-docs-azure-sync.sh, mcp-health-check.sh? Were they intentionally removed (deprecated)? If deprecated, the plists should be unloaded. If accidental deletion, restore from backup.

  5. Duplicate alaiml-retrain plist: Plist exists in BOTH system/config/launchagents AND Library/LaunchAgents. Two scheduled jobs could fire. Which is canonical?

  6. com.john.orchestrator-http duplicate: Identical to com.alai.orchestrator-bridge (same script, same port). orchestrator-http shows down_exit_0 because bridge already bound the port. Dead plist.

  7. LightRAG health-*.md circular feedback: The lightrag-monitor evidence files are being watched by rag-fsevents-adapter, which tries to enqueue them into LightRAG — a monitoring artifact feeding back into the broken pipeline it monitors.

  8. Slack bot silent 300 min: No incoming Slack messages for 5h at audit time. Is anyone sending messages? Or is the Socket Mode token scope broken for receiving?
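
For question 5, duplicate labels across plist directories can be found mechanically. The sketch below assumes every job declares the standard `Label` key; the function name is hypothetical. It only catches identical labels, so a same-script duplicate under a different label (question 6) would need a comparison of program paths instead.

```python
import plistlib
from collections import defaultdict
from pathlib import Path
from xml.parsers.expat import ExpatError


def duplicate_labels(dirs: list[Path]) -> dict[str, list[Path]]:
    """Map each launchd Label found in more than one plist to its files."""
    seen: dict[str, list[Path]] = defaultdict(list)
    for directory in dirs:
        for plist in sorted(directory.glob("*.plist")):
            try:
                with plist.open("rb") as fh:
                    label = plistlib.load(fh).get("Label")
            except (plistlib.InvalidFileException, ExpatError, OSError):
                continue  # unreadable or malformed plist
            if label:
                seen[label].append(plist)
    return {label: paths for label, paths in seen.items() if len(paths) > 1}
```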


Highest-Leverage Fix Candidates (audit-level only)

Priority 1 — Unlocks entire RAG pipeline (946 items unblocked)

Priority 2 — Restore cost visibility (10-day blind spot)

Priority 3 — Fix orchestrator mock mode

Priority 4 — Raise B2 storage cap

Priority 5 — Unload dead plists (5 scripts deleted)

Priority 6 — Unload com.john.orchestrator-http duplicate plist

Priority 7 — Restore weekly-planning.sh before next Tuesday

Priority 8 — Fix phantom-link-detector.js missing script


Revision #2
Created 2026-05-09 19:44:20 UTC by John
Updated 2026-06-14 20:02:55 UTC by John