Runbook: LightRAG ingest LaunchAgent fix (MC #10286)

Overview

This runbook documents the investigation and fix applied to three LightRAG-related LaunchAgents on the ALAI Mac Studio host under MC #10286. Proveo (Angie Jones) validated the fix with a PARTIAL verdict: 3 PASS, 1 PARTIAL (AC3), and 1 FAIL (AC4, unverifiable on the same day). The CF Access root cause is tracked separately in MC #10298.


1. Symptom — How to Detect This Failure

These signals indicate the com.alai.lightrag-outbox-ingest LaunchAgent is failing silently:


2. Root Cause

The primary failure was in com.alai.lightrag-outbox-ingest:

Workaround applied: Changed LIGHTRAG_URL to http://localhost:9621 in the plist. The CF Access token 302 root cause (why the local host receives a redirect instead of being authorized) is tracked in MC #10298 (priority: M).

The other two daemons were not functionally broken:


3. Fix Procedure

Preconditions: You have shell access to the Mac Studio host. LightRAG is running locally on port 9621.

Step 1: Verify current plist URL

grep -A1 "LIGHTRAG_URL" ~/Library/LaunchAgents/com.alai.lightrag-outbox-ingest.plist

If the value is https://lightrag.alai.no, proceed. If already http://localhost:9621, skip to Step 4.
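
The check in Step 1 could also be scripted. The following is a minimal sketch, assuming the plist is stored as XML (not binary) and that the string value sits on the same line as the key or on the line right after it. The function names lightrag_url_from_plist and needs_url_fix are hypothetical helpers, not part of the existing tooling.

```shell
# Hypothetical helper: print the LIGHTRAG_URL value found in an XML plist.
# Assumes the <string> value is on the key's line or on the next line.
lightrag_url_from_plist() {
  local plist="$1"
  grep -A1 '<key>LIGHTRAG_URL</key>' "$plist" \
    | sed -n 's:.*<string>\(.*\)</string>.*:\1:p' \
    | head -n 1
}

# Hypothetical helper: succeed when the plist still needs the Step 2 edit.
needs_url_fix() {
  [ "$(lightrag_url_from_plist "$1")" != "http://localhost:9621" ]
}
```

Example use: needs_url_fix ~/Library/LaunchAgents/com.alai.lightrag-outbox-ingest.plist && echo "edit required"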

Step 2: Edit the plist

nano ~/Library/LaunchAgents/com.alai.lightrag-outbox-ingest.plist

Change the LIGHTRAG_URL string value from https://lightrag.alai.no to http://localhost:9621. The correct plist line:

<key>LIGHTRAG_URL</key><string>http://localhost:9621</string>
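
If a non-interactive edit is preferred over nano, something like the sketch below may work on macOS. It assumes LIGHTRAG_URL lives under the EnvironmentVariables dictionary of the plist, which this runbook does not confirm, so check the key path first. The function name set_lightrag_url is hypothetical. The final plutil -lint call only validates plist syntax and is also worth running after a manual edit.

```shell
# Hypothetical non-interactive alternative to the nano edit (macOS only).
# Assumption: LIGHTRAG_URL is stored under the EnvironmentVariables dict.
set_lightrag_url() {
  local plist="$1" url="${2:-http://localhost:9621}"
  # Refuse to continue if the plist does not exist.
  [ -f "$plist" ] || { echo "plist not found: $plist" >&2; return 1; }
  # Keep a backup copy next to the original before changing anything.
  cp "$plist" "$plist.bak" || return 1
  /usr/libexec/PlistBuddy -c "Set :EnvironmentVariables:LIGHTRAG_URL $url" "$plist" || return 1
  # Validate that the file is still a well-formed plist.
  plutil -lint "$plist"
}
```

Example use: set_lightrag_url ~/Library/LaunchAgents/com.alai.lightrag-outbox-ingest.plist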

Step 3: Unload all 3 lightrag plists

launchctl unload ~/Library/LaunchAgents/com.alai.lightrag-outbox-ingest.plist
launchctl unload ~/Library/LaunchAgents/com.alai.lightrag-backup.plist
launchctl unload ~/Library/LaunchAgents/com.john.lightrag-monitor.plist

Step 4: Reload all 3 lightrag plists

launchctl load -w ~/Library/LaunchAgents/com.alai.lightrag-outbox-ingest.plist
launchctl load -w ~/Library/LaunchAgents/com.alai.lightrag-backup.plist
launchctl load -w ~/Library/LaunchAgents/com.john.lightrag-monitor.plist
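
Steps 3 and 4 can be combined into one loop. The sketch below uses the same launchctl unload and load -w commands and the three labels listed above. Unlike the manual steps, it reloads each agent in turn rather than unloading all three first. The DRY_RUN switch and the function name are hypothetical additions for previewing the commands before running them.

```shell
# Labels taken from Steps 3 and 4 of this runbook.
LIGHTRAG_LABELS="com.alai.lightrag-outbox-ingest com.alai.lightrag-backup com.john.lightrag-monitor"

# Hypothetical helper: unload and reload each LightRAG LaunchAgent.
# With DRY_RUN=1 the commands are printed instead of executed.
reload_lightrag_agents() {
  local run="" label plist
  [ "${DRY_RUN:-0}" = "1" ] && run="echo"
  for label in $LIGHTRAG_LABELS; do
    plist="$HOME/Library/LaunchAgents/$label.plist"
    $run launchctl unload "$plist"
    $run launchctl load -w "$plist"
  done
}
```

Example use: DRY_RUN=1 reload_lightrag_agents to preview, then reload_lightrag_agents to apply.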

Step 5: Drain the outbox manually (if backlog exists)

node ~/system/tools/lightrag-outbox-ingest.js

The script is idempotent: it records each item in outbox-ingest.sqlite with correlation_id as the PRIMARY KEY, which acts as a dedup gate, so running it multiple times is safe. Expected output once the backlog is cleared: processed: 0, skipped: N, failed: 0.
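
The expected summary can be checked mechanically. This is a rough sketch that assumes the script prints a summary line of the form "processed: 0, skipped: N, failed: 0"; the exact output format is not confirmed beyond the example above. The function name backlog_cleared is hypothetical.

```shell
# Hypothetical check: succeed only when the summary reports nothing newly
# processed and no failures.
# Assumption: the summary looks like "processed: 0, skipped: N, failed: 0".
backlog_cleared() {
  local summary="$1" processed failed
  processed="$(printf '%s\n' "$summary" | sed -n 's/.*processed: \([0-9][0-9]*\).*/\1/p')"
  failed="$(printf '%s\n' "$summary" | sed -n 's/.*failed: \([0-9][0-9]*\).*/\1/p')"
  [ "$processed" = "0" ] && [ "$failed" = "0" ]
}
```

Example use, assuming the summary is the last line of output: backlog_cleared "$(node ~/system/tools/lightrag-outbox-ingest.js | tail -n 1)" && echo "backlog cleared"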

Step 6: Kickstart the ingest daemon to verify immediate fire

launchctl kickstart -k gui/$(id -u)/com.alai.lightrag-outbox-ingest

Check the log immediately after:

tail -20 ~/system/logs/lightrag-outbox-ingest.log

Expected: an [ingest] DONE line indicating a successful exit.

Step 7: Confirm watchdog detects healthy state

bash ~/bin/daemon-fleet-watchdog.sh 2>&1 | grep lightrag

Expected: All 3 labels in calendar_ok state. No calendar_err_* or not_loaded transitions.
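
The expectation above can be turned into a pass/fail filter. The sketch below assumes the watchdog prints one line per label containing both the label and its state (calendar_ok, calendar_err_*, or not_loaded); the real output format may differ. The function name lightrag_watchdog_healthy is hypothetical.

```shell
# Hypothetical filter: read watchdog output on stdin and fail if any
# lightrag line reports an unhealthy state.
# Assumption: each line carries both the label and its state string.
lightrag_watchdog_healthy() {
  local lines
  # Having no lightrag lines at all is treated as a failure.
  lines="$(grep lightrag)" || return 1
  if printf '%s\n' "$lines" | grep -Eq 'calendar_err_|not_loaded'; then
    return 1
  fi
  return 0
}
```

Example use: bash ~/bin/daemon-fleet-watchdog.sh 2>&1 | lightrag_watchdog_healthy && echo "lightrag agents healthy"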


4. Verification Commands

# 1. All 3 plists loaded with LastExitStatus=0 (second column of the output)
launchctl list | grep lightrag

# 2. Checkpoint DB row count (should match mc-task-outcomes.jsonl line count)
sqlite3 ~/system/state/outbox-ingest.sqlite "SELECT count(*) FROM processed"

# 3. Most recent ingest timestamp
sqlite3 ~/system/state/outbox-ingest.sqlite "SELECT MAX(ingested_at) FROM processed"

# 4. LightRAG pipeline health
curl http://localhost:9621/documents/pipeline_status

# 5. LightRAG document total count
curl -s http://localhost:9621/documents | jq .total

# 6. Outbox log last run summary
grep "DONE" ~/system/logs/lightrag-outbox-ingest.log | tail -5

# 7. Watchdog recent transitions for lightrag
grep lightrag ~/system/logs/daemon-fleet-watchdog.log | tail -20
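
Check 1 above can be made pass/fail instead of read by eye. This sketch relies on the launchctl list output having three columns (PID, Status, Label) and treats any non-zero status, or fewer than three loaded LightRAG labels, as a failure. The function name lightrag_exit_status_check is hypothetical.

```shell
# Hypothetical check: read `launchctl list` output on stdin (columns: PID,
# Status, Label) and print every lightrag label with a non-zero status.
# Exits 1 if any such label exists or if fewer than the expected number
# of lightrag labels (default 3) are loaded.
lightrag_exit_status_check() {
  awk -v want="${1:-3}" '
    $3 ~ /lightrag/ { seen++; if ($2 != "0") { print $3 " status=" $2; bad++ } }
    END { exit (bad > 0 || seen < want) ? 1 : 0 }
  '
}
```

Example use: launchctl list | lightrag_exit_status_check || echo "lightrag agents need attention"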

5. Known Limitations


6. Watchdog Coverage

The daemon-fleet-watchdog at ~/bin/daemon-fleet-watchdog.sh covers all 3 LightRAG plists via its glob at line 39:

for plist in "$HOME"/Library/LaunchAgents/com.{alai,john}.*.plist

This glob automatically includes any new LightRAG LaunchAgents matching the pattern without code changes. The watchdog runs every 15 minutes via com.alai.daemon-fleet-watchdog.
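
To see which file names the pattern picks up, a small sketch like the following can be run against any directory. The function name list_watched_plists is hypothetical, and the loop only mirrors the glob quoted above, not the rest of the watchdog. Brace expansion is a bash feature, so the loop is run under bash explicitly.

```shell
# Hypothetical helper: list the plist file names in a directory that match
# the same brace-and-glob pattern the watchdog uses.
list_watched_plists() {
  # Brace expansion is not available in plain POSIX sh, so use bash here.
  bash -c '
    for plist in "$1"/com.{alai,john}.*.plist; do
      [ -e "$plist" ] || continue   # skip a pattern that matched nothing
      basename "$plist"
    done
  ' _ "$1"
}
```

Example use: list_watched_plists "$HOME/Library/LaunchAgents" | grep lightrag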

Alert states to watch for:

Healthy state: calendar_ok (LastExitStatus=0, plist loaded)


MC | Title | Status | Notes
#10286 | Fix LightRAG ingest LaunchAgents — drain 312 outbox + add watchdog | DONE (PARTIAL verify) | This fix. Delivered by Kelsey Hightower. Proveo: 3 PASS, 1 PARTIAL, 1 FAIL.
#10298 | CF Access service token 302 root cause investigation | OPEN (priority: M) | Why does https://lightrag.alai.no return 302 for local host? Resolves the need for the localhost bypass.


Revision #2
Created 2026-04-30 13:31:46 UTC by John
Updated 2026-05-31 20:07:03 UTC by John