Daemon Fleet — dr-sync & tldr-watch fix (MC #104330)

MC #104330 — fleet-watchdog alert resolution

Alert: [FLEET-WATCHDOG] 2026-06-25T06:36:57Z — CRITICAL: 2 daemons in failed state: com.john.dr-sync, com.john.tldr-watch

Root cause 1 — com.john.dr-sync (rsync exit 20)

The rsync exclude pattern *.bak only matches names that end in .bak, so it does not match backup files named *.bak-<suffix>. As a result, an 18G stale backup, mission-control.db.bak-pre-p2p-correction-20260529 (the live db is 35M), was being rsynced to the mac-mini every 6h. The oversized transfer kept getting interrupted (rsync exit 20), so the databases target failed (8/9 targets succeeded) and the daemon exited non-zero.

Fix: ~/system/daemons/dr-sync.sh — added --exclude=*.bak-* and --exclude=*.bak[0-9]*.
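The pattern gap can be reproduced without touching rsync. The sketch below is an illustration only: is_excluded is a hypothetical helper, not part of dr-sync.sh, and it uses shell case matching as an approximation of rsync's wildcard rules for slash-free file names. The live database name mission-control.db is assumed from the backup's prefix.

```shell
#!/bin/sh
# Hypothetical helper (not from dr-sync.sh): succeeds if NAME matches any of
# the given exclude globs. Shell `case` matching stands in for rsync's
# wildcard matching, which behaves the same for names without slashes.
is_excluded() {
    name=$1
    shift
    for pattern in "$@"; do
        # $pattern is deliberately unquoted here so it is treated as a glob.
        case $name in
            $pattern) return 0 ;;
        esac
    done
    return 1
}

stale='mission-control.db.bak-pre-p2p-correction-20260529'
live='mission-control.db'   # assumed name of the live database

# Old rule set: only *.bak, which requires the name to END in .bak.
if is_excluded "$stale" '*.bak'; then
    echo "old rules, stale backup: excluded"
else
    echo "old rules, stale backup: transferred"
fi

# New rule set: the two added patterns catch suffixed backups.
if is_excluded "$stale" '*.bak' '*.bak-*' '*.bak[0-9]*'; then
    echo "new rules, stale backup: excluded"
else
    echo "new rules, stale backup: transferred"
fi

# The live database must still pass through untouched.
if is_excluded "$live" '*.bak' '*.bak-*' '*.bak[0-9]*'; then
    echo "new rules, live db: excluded"
else
    echo "new rules, live db: transferred"
fi
```

Under the old rule the stale backup is reported as transferred; under the new rules it is excluded while the live database is still transferred, which is consistent with the dry-run result in the proof.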

Proof:

  • Directory-mode dry-run: 18G file NOT in transfer list; live .db files still sync.
  • launchd kickstart run: LastExitStatus = 0.
  • Log 2026-06-26 10:46:15: Total targets: 9 | Success: 9 | Failed: 0 | Duration: 17s (was 358s).

Root cause 2 — com.john.tldr-watch (exit 2)

Not a crash. tldr-watch is a health monitor that exits 2 by design when verdict=FAIL (script lines 119-122), and it owns its own alert path (#exec Slack + HiveMind intel). The fleet-watchdog only whitelisted exit 1 (256 in launchd's encoding), so tldr-watch's issue-found exit 2 (512) was misclassified as a failed daemon.

Fix: ~/bin/daemon-fleet-watchdog.sh — added com.john.tldr-watch to EXIT1_NORMAL and extended allowed issue-found codes to 1/2/3 (+ launchd-encoded 256/512/768).
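The classification rule can be sketched as follows. This is a minimal illustration, not the code in daemon-fleet-watchdog.sh: normalize_status and classify are hypothetical names, the result labels (ok, issue_found, failed) are invented for the example, and the allowlist argument stands in for EXIT1_NORMAL, whose real format is not shown here. It assumes the launchd-encoded values are the plain exit code multiplied by 256.

```shell
#!/bin/sh
# Hypothetical helper: map a status that may be a plain exit code (1, 2, 3)
# or its launchd-encoded form (256, 512, 768) to the plain exit code.
normalize_status() {
    status=$1
    if [ "$status" -ge 256 ] && [ $((status % 256)) -eq 0 ]; then
        echo $((status / 256))
    else
        echo "$status"
    fi
}

# Hypothetical classifier: LABEL STATUS ALLOWLIST
# ALLOWLIST is a space-separated list of daemon labels for which exit codes
# 1-3 mean "ran correctly and found an issue" rather than "daemon failed".
classify() {
    label=$1
    code=$(normalize_status "$2")
    allowlist=$3
    if [ "$code" -eq 0 ]; then
        echo ok
        return
    fi
    case " $allowlist " in
        *" $label "*)
            case $code in
                1|2|3) echo issue_found; return ;;
            esac
            ;;
    esac
    echo failed
}

allow='com.john.tldr-watch'
classify com.john.tldr-watch 512 "$allow"   # prints: issue_found
classify com.john.dr-sync 256 "$allow"      # prints: failed
classify com.john.dr-sync 0 "$allow"        # prints: ok
```

Keeping the allowlist per daemon, rather than accepting exit 2 globally, means a daemon such as dr-sync that exits non-zero is still reported as failed.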

Proof: reclassification against live daemon-fleet-status.json → tldr-watch no longer critical.

End-to-end verification (L2+)

fleet-watchdog run 2026-06-26T08:46:40Z:

  • com.john.dr-sync: calendar_err_256 → calendar_ok
  • No "CRITICAL: N daemons in failed state" line in the output (it was present on every prior run)
  • err count 4 → 2

Follow-ups (separate, non-blocking)

  1. Disk hygiene: 18G stale backup still on disk (96% full / 42G free). Recommend CEO-approved deletion of mission-control.db.bak-pre-p2p-correction-20260529. Not deleted unilaterally (irreversible, not self-created).
  2. TLDR pipeline dormant: tldr-watch's FAIL is real. The actionizer produces 0 insights and 0 tasks daily, and db counts have been static at 620,8,612,8 since at least 06-23. Decide: revive or retire tldr-briefing/actionizer.