Infrastructure

Infrastructure runbooks: daemons, email, backups, monitoring

Email Agent Runbook

Service: Email Agent Daemon
Location: ~/system/daemons/email-agent.js
LaunchAgent: com.john.email-agent
Interval: Every 5 minutes (300s)
Last Updated: 2026-04-15


1. Architecture

What It Does

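The Email Agent is a 24/7 daemon that polls the monitored accounts every 5 minutes and classifies incoming mail through the pipeline described below.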

Accounts Monitored

Account Key | Email Address           | Bitwarden Vault Name
john        | john@basicconsulting.no | Email - john@basicconsulting.no
info        | info@basicconsulting.no | Email - info@basicconsulting.no
alai        | john@alai.no            | Email - john@alai.no
alem        | alem@alai.no            | Email - alem@alai.no
dev         | dev@alai.no             | Email - dev@alai.no
gmail       | alembasic@gmail.com     | Email - alembasic@gmail.com

Classification Pipeline

  1. VIP Bypass: Emails from CEO/family → forced to ACTION/high, label: CEO FORWARD
  2. Quick Filter: Pattern-based detection for OWN emails and known SPAM
  3. Ollama Classification: Remaining emails sent to local llama3.1:8b model
  4. Circuit Breaker: Falls back to pattern heuristics if Ollama is down (3 failure threshold)
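
The four stages above can be sketched as a single function. This is a minimal illustration, not the code in email-agent.js: the names classify, quickFilter, heuristic and callModel, the sample address sets, and the spam and heuristic patterns are all assumptions made for the example.

```javascript
// Hypothetical sketch of the four stages; none of these names come from email-agent.js.
const VIP = new Set(['alem@alai.no']);                   // stand-in for the VIP list
const OWN = new Set(['dev@alai.no']);                    // stand-in for the agent's own addresses
const SPAM_PATTERNS = [/unsubscribe/i, /you have won/i]; // assumed patterns

// Stage 2: cheap pattern checks; returns null when nothing matches.
function quickFilter(email) {
  if (OWN.has(email.from)) return { category: 'OWN', priority: 'low' };
  if (SPAM_PATTERNS.some((re) => re.test(email.subject))) {
    return { category: 'SPAM', priority: 'low' };
  }
  return null;
}

// Stage 4 fallback: pattern heuristic used when the model is unavailable (assumed rule).
function heuristic(email) {
  return /invoice|urgent|deadline/i.test(email.subject)
    ? { category: 'ACTION', priority: 'normal' }
    : { category: 'INFO', priority: 'low' };
}

// breaker is a plain object: { failures: <consecutive model failures> }.
async function classify(email, callModel, breaker) {
  // Stage 1: VIP bypass.
  if (VIP.has(email.from)) {
    return { category: 'ACTION', priority: 'high', label: 'CEO FORWARD' };
  }
  // Stage 2: quick filter.
  const quick = quickFilter(email);
  if (quick) return quick;
  // Stage 4: breaker open after 3 failures, so skip the model entirely.
  if (breaker.failures >= 3) return heuristic(email);
  // Stage 3: model classification, falling back to the heuristic on error.
  try {
    const result = await callModel(email);
    breaker.failures = 0;
    return result;
  } catch (err) {
    breaker.failures += 1;
    return heuristic(email);
  }
}
```

The model call is passed in as a function so the routing logic can be exercised without a running Ollama instance.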

VIP Senders (CEO Bypass List)

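Emails from the addresses in the VIP_SENDERS array in ~/system/daemons/email-agent.js bypass all filters and are always classified as ACTION/high with label CEO FORWARD. The current list appears in Section 4 under "Problem: Alem's Emails Not Showing as ACTION".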

Transport: Himalaya Adapter

The daemon uses ~/system/tools/himalaya-adapter.js, which wraps the Rust-based himalaya CLI (/opt/homebrew/bin/himalaya).

Config: ~/.config/himalaya/config.toml — all 6 accounts configured.


2. Credentials

Bitwarden Storage

All email accounts are stored in Bitwarden with vault item names following the pattern: Email - <address>.

Gmail Account (Special Configuration)

The Gmail account (alembasic@gmail.com) uses App Password authentication (not the regular Google account password).

Bitwarden Item: Email - alembasic@gmail.com
Custom Fields in Vault:

Himalaya Config

File: ~/.config/himalaya/config.toml

Contains 6 account blocks with IMAP/SMTP settings. Credentials are loaded from Bitwarden at runtime via mail-native.js.


3. How to Verify

Is the Daemon Running?

launchctl list | grep email-agent
# Expected output: PID, last exit status 0, and the label
# (the PID column shows "-" when the job is loaded but not currently running)
# Example: 12345  0  com.john.email-agent
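
For scripted checks, a line of launchctl list output can be split into its three columns (PID, last exit status, label). The helper below is a sketch; the function name parseLaunchctlLine is an assumption, not part of any existing tool in ~/system/tools.

```javascript
// Hypothetical helper: parses one line of `launchctl list` output.
// Columns are PID, last exit status, and label, separated by whitespace.
function parseLaunchctlLine(line) {
  const [pid, status, label] = line.trim().split(/\s+/);
  return {
    pid: pid === '-' ? null : Number(pid), // "-" means loaded but not running
    lastExitStatus: Number(status),
    label,
  };
}

// Example with the sample line from this runbook.
const entry = parseLaunchctlLine('12345  0  com.john.email-agent');
console.log(entry.label, entry.lastExitStatus === 0 ? 'OK' : 'FAILED');
```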

Last Heartbeat (Should Be < 10 Minutes Ago)

cat ~/system/logs/email-agent-heartbeat.txt
# Shows timestamp of last successful run
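
The "less than 10 minutes" rule can be checked programmatically. The sketch below assumes the heartbeat file holds a single timestamp that Date.parse understands (for example ISO 8601, like the log timestamps); the file format is not documented here, so treat that as an assumption. The function names are illustrative.

```javascript
// Returns the heartbeat age in minutes, or NaN if the text is not a parseable date.
// Assumes the file content is a single timestamp such as 2026-04-15T13:49:06.450Z.
function heartbeatAgeMinutes(text, now = Date.now()) {
  return (now - Date.parse(text.trim())) / 60000;
}

// True when the heartbeat is parseable and younger than maxMinutes.
function isHeartbeatFresh(text, now = Date.now(), maxMinutes = 10) {
  const age = heartbeatAgeMinutes(text, now);
  return Number.isFinite(age) && age < maxMinutes;
}

// Usage against the real file (path from this runbook):
// const fs = require('fs');
// const os = require('os');
// const file = os.homedir() + '/system/logs/email-agent-heartbeat.txt';
// console.log(isHeartbeatFresh(fs.readFileSync(file, 'utf8')));
```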

Recent Activity Log

tail -20 ~/system/logs/email-agent-launchd.log
# Should show recent classification activity like:
# {"timestamp":"2026-04-15T13:49:06.450Z","service":"email-agent","level":"info","message":"Classifying via Ollama: ..."}

Pending Emails (Email Inbox Tool)

node ~/system/tools/email-inbox.js pending
# Lists emails waiting for classification or action

Daemon Status (Full Details)

launchctl print gui/$(id -u)/com.john.email-agent
# Shows full launchd status, last run time, exit codes

4. Troubleshooting

Problem: Daemon Dead (MODULE_NOT_FOUND Error)

Symptom:

tail -20 ~/system/logs/email-agent-launchd-error.log
# Shows: Error: Cannot find module '~/system/tools/himalaya-adapter'

Root Cause: The himalaya-adapter.js file was accidentally archived or deleted.

Fix:

  1. Verify the file exists: ls -lh ~/system/tools/himalaya-adapter.js
  2. If missing, restore from ~/system/tools/archive/ or Git history
  3. Restart the daemon:
    launchctl unload ~/Library/LaunchAgents/com.john.email-agent.plist
    launchctl load ~/Library/LaunchAgents/com.john.email-agent.plist
    
  4. Verify restart: launchctl list | grep email-agent

Problem: Gmail "Unknown Account" Error

Symptom:

Error: Unknown account: gmail. Available: john, info, alai, alem, dev

Root Cause: The gmail key is missing from the VAULT_NAMES object in ~/system/tools/mail-native.js.

Fix:

  1. Open ~/system/tools/mail-native.js
  2. Locate the VAULT_NAMES object (around line 20)
  3. Add the gmail entry:
    const VAULT_NAMES = {
      john: 'Email - john@basicconsulting.no',
      info: 'Email - info@basicconsulting.no',
      alai: 'Email - john@alai.no',
      alem: 'Email - alem@alai.no',
      dev: 'Email - dev@alai.no',
      gmail: 'Email - alembasic@gmail.com'  // Add this line
    };
    
  4. Save and reload daemon
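
To illustrate why a missing key produces this error, here is a sketch of a lookup that fails in the same way. The function vaultNameFor is hypothetical; the actual lookup code in mail-native.js is not shown in this runbook, and only the VAULT_NAMES entries and the error wording are taken from above.

```javascript
const VAULT_NAMES = {
  john: 'Email - john@basicconsulting.no',
  info: 'Email - info@basicconsulting.no',
  alai: 'Email - john@alai.no',
  alem: 'Email - alem@alai.no',
  dev: 'Email - dev@alai.no',
  gmail: 'Email - alembasic@gmail.com',
};

// Hypothetical lookup: returns the vault item name or throws an error
// worded like the one in the Symptom above.
function vaultNameFor(accountKey) {
  if (!Object.prototype.hasOwnProperty.call(VAULT_NAMES, accountKey)) {
    const available = Object.keys(VAULT_NAMES).join(', ');
    throw new Error(`Unknown account: ${accountKey}. Available: ${available}`);
  }
  return VAULT_NAMES[accountKey];
}
```

With the gmail entry present, vaultNameFor('gmail') resolves to the Bitwarden item name; without it, the same call throws and lists only the remaining five keys.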

Problem: Gmail Hanging Daemon (High CPU/Memory)

Symptom: High CPU or memory usage, with multiple email-agent processes running at the same time.

Root Cause: Gmail IMAP fetch is hanging indefinitely, causing overlapping daemon instances.

Fix:

  1. Identify stuck process:
    ps aux | grep email-agent
    
  2. Kill the stuck process gracefully:
    kill -TERM <PID>
    # Or if unresponsive:
    kill -9 <PID>
    
  3. Unload and reload daemon:
    launchctl unload ~/Library/LaunchAgents/com.john.email-agent.plist
    launchctl load ~/Library/LaunchAgents/com.john.email-agent.plist
    

Problem: Vault Credentials Unavailable (Circuit Breaker Triggered)

Symptom:

Error: Bitwarden session not available
# Or: Circuit breaker OPEN for account: john

Root Cause: Bitwarden CLI session expired or /tmp/bw-session is empty.

Fix:

  1. Check session file:
    cat /tmp/bw-session
    # Should contain a session token string
    
  2. If empty, unlock Bitwarden and regenerate session:
    bw unlock --raw > /tmp/bw-session
    # Enter master password when prompted
    
  3. Verify session works:
    bw get item "Email - john@basicconsulting.no" --session $(cat /tmp/bw-session)
    
  4. Circuit breaker will reset automatically on next successful run (backoff resets after threshold period)

Problem: Alem's Emails Not Showing as ACTION

Symptom: Emails from the CEO are classified as INFO or SPAM instead of ACTION/high.

Root Cause: VIP_SENDERS list is incomplete or outdated.

Fix:

  1. Open ~/system/daemons/email-agent.js
  2. Locate the VIP_SENDERS array (around line 92)
  3. Ensure every CEO and family address is present:
    const VIP_SENDERS = [
      'alem@alai.no',
      'alem@basicconsulting.no',
      'alem.basic@gmail.com',
      'alembasic@gmail.com',
      'sibilabasic@gmail.com',
      'riadbasic007@gmail.com'
    ];
    
  4. Save and reload daemon
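
Even with a complete list, a match can fail if the From header carries a display name or different letter casing. The sketch below shows one way to normalize before comparing. Whether email-agent.js normalizes like this is not stated in this runbook; extractAddress and isVipSender are hypothetical names.

```javascript
const VIP_SENDERS = [
  'alem@alai.no',
  'alem@basicconsulting.no',
  'alem.basic@gmail.com',
  'alembasic@gmail.com',
  'sibilabasic@gmail.com',
  'riadbasic007@gmail.com',
];

// Extracts the bare address from headers such as 'Alem Basic <Alem@alai.no>'.
function extractAddress(fromHeader) {
  const match = fromHeader.match(/<([^>]+)>/);
  return (match ? match[1] : fromHeader).trim().toLowerCase();
}

// Exact, case-insensitive comparison against the VIP list.
function isVipSender(fromHeader) {
  return VIP_SENDERS.includes(extractAddress(fromHeader));
}
```

An exact comparison (rather than a substring test) keeps look-alike addresses such as alem@alai.no.example.com from being treated as VIP.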

Problem: Ollama Circuit Breaker Open (Fallback Mode)

Symptom:

WARN: Ollama circuit breaker OPEN — using pattern heuristic

Root Cause: Ollama service is down or unresponsive (3+ consecutive failures).

Fix:

  1. Check Ollama service:
    curl http://localhost:11434/api/tags
    # Should return JSON list of models
    
  2. If unresponsive, restart Ollama:
    brew services restart ollama
    # Or manually:
    ollama serve
    
  3. Circuit breaker will auto-reset after backoff period (starts at 10s, max 5 minutes)
  4. Emails will still be processed using pattern-based heuristics during circuit breaker OPEN state
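
The breaker behaviour described above (3-failure threshold, backoff from 10 seconds up to 5 minutes) can be sketched as follows. The doubling policy and the class and method names are assumptions for illustration; the daemon's actual implementation may differ.

```javascript
// Hypothetical circuit breaker: opens after `threshold` consecutive failures,
// then allows one probe each time the backoff window has elapsed.
class CircuitBreaker {
  constructor({ threshold = 3, baseMs = 10000, maxMs = 300000 } = {}) {
    this.threshold = threshold;
    this.baseMs = baseMs;
    this.maxMs = maxMs;
    this.failures = 0;        // consecutive failures
    this.openedAt = null;     // time the breaker last (re)opened
    this.backoffMs = baseMs;  // current wait before the next probe
  }

  // True while calls to Ollama should be skipped in favour of the heuristic.
  isOpen(now = Date.now()) {
    if (this.openedAt === null) return false;
    return now - this.openedAt < this.backoffMs;
  }

  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
    this.backoffMs = this.baseMs;
  }

  recordFailure(now = Date.now()) {
    this.failures += 1;
    if (this.failures < this.threshold) return;
    // A failed probe after the breaker has already opened doubles the wait,
    // capped at maxMs (assumed policy).
    if (this.openedAt !== null) {
      this.backoffMs = Math.min(this.backoffMs * 2, this.maxMs);
    }
    this.openedAt = now;
  }
}
```

A caller would check isOpen() before each Ollama request, then call recordSuccess() or recordFailure() depending on the outcome.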

5. Gmail App Password Setup

If the Gmail App Password needs to be regenerated (e.g., after credential rotation or security incident):

  1. Go to https://myaccount.google.com/apppasswords (must be logged in as alembasic@gmail.com)
  2. Click Generate
  3. Select app: Mail
  4. Select device: Mac (or custom name like "IMAP Daemon")
  5. Copy the 16-character App Password (no spaces)
  6. Update Bitwarden:
    bw get item "Email - alembasic@gmail.com" --session $(cat /tmp/bw-session) | \
      jq '.login.password = "<NEW_APP_PASSWORD>"' | \
      bw encode | \
      bw edit item $(bw get item "Email - alembasic@gmail.com" --session $(cat /tmp/bw-session) | jq -r .id) --session $(cat /tmp/bw-session)
    
    Or update manually via Bitwarden web vault.
  7. Reload daemon:
    launchctl unload ~/Library/LaunchAgents/com.john.email-agent.plist
    launchctl load ~/Library/LaunchAgents/com.john.email-agent.plist
    

6. Key Files and Locations

File                                               | Purpose
~/system/daemons/email-agent.js                    | Main daemon script
~/system/tools/mail-native.js                      | VAULT_NAMES map + credential loader
~/system/tools/himalaya-adapter.js                 | Himalaya CLI wrapper (IMAP/SMTP)
~/.config/himalaya/config.toml                     | Himalaya account configuration
~/Library/LaunchAgents/com.john.email-agent.plist  | LaunchAgent config (5-minute interval)
~/system/logs/email-agent-launchd.log              | Daemon stdout log
~/system/logs/email-agent-launchd-error.log        | Daemon stderr log
~/system/logs/email-agent-heartbeat.txt            | Last successful run timestamp
~/system/logs/email-triage-results.jsonl           | JSONL log of all classifications
/tmp/bw-session                                    | Bitwarden CLI session token

7. Escalation

If the daemon is down for more than 30 minutes and the troubleshooting steps do not resolve the issue:

  1. Check email-agent-launchd-error.log for stack traces
  2. Capture full logs:
    tail -100 ~/system/logs/email-agent-launchd.log > /tmp/email-agent-debug.log
    tail -100 ~/system/logs/email-agent-launchd-error.log >> /tmp/email-agent-debug.log
    launchctl print gui/$(id -u)/com.john.email-agent >> /tmp/email-agent-debug.log
    
  3. Slack alert to #ops:
    node ~/system/tools/slack.js send ops "@john Email Agent daemon DOWN for 30+ minutes. Logs: /tmp/email-agent-debug.log"
    
  4. Fallback: manually check inboxes via webmail until daemon is restored

Document Status: ✅ Production
Owner: John (primary agent)
Last Incident: 2026-02-25 — MODULE_NOT_FOUND (himalaya-adapter archived)
Last Review: 2026-04-15

Runbook: LightRAG ingest LaunchAgent fix (MC #10286)

Overview

This runbook documents the investigation and fix applied to three LightRAG-related LaunchAgents on the ALAI Mac Studio host in MC #10286. The fix was validated by Proveo (Angie Jones) with a PARTIAL verdict: 3 PASS, 1 PARTIAL (AC3), 1 FAIL (AC4 — same-day unverifiable). CF Access root cause is tracked separately in MC #10298.


1. Symptom — How to Detect This Failure

These signals indicate the com.alai.lightrag-outbox-ingest LaunchAgent is failing silently:


2. Root Cause

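The primary failure was in com.alai.lightrag-outbox-ingest: its plist pointed LIGHTRAG_URL at https://lightrag.alai.no, and requests from the local host received a CF Access 302 redirect instead of being authorized.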

Workaround applied: Changed LIGHTRAG_URL to http://localhost:9621 in the plist. The CF Access token 302 root cause (why the local host receives a redirect instead of being authorized) is tracked in MC #10298 (priority: M).

The other two daemons, com.alai.lightrag-backup and com.john.lightrag-monitor, were not functionally broken.


3. Fix Procedure

Preconditions: You have shell access to the Mac Studio host. LightRAG is running locally on port 9621.

Step 1: Verify current plist URL

grep -A1 "LIGHTRAG_URL" ~/Library/LaunchAgents/com.alai.lightrag-outbox-ingest.plist

If the value is https://lightrag.alai.no, proceed. If already http://localhost:9621, skip to Step 4.

Step 2: Edit the plist

# Open in editor — change the LIGHTRAG_URL string value:
# FROM: https://lightrag.alai.no
# TO:   http://localhost:9621
nano ~/Library/LaunchAgents/com.alai.lightrag-outbox-ingest.plist

The relevant section in the plist:

<key>LIGHTRAG_URL</key><string>http://localhost:9621</string>

Step 3: Unload all 3 lightrag plists

launchctl unload ~/Library/LaunchAgents/com.alai.lightrag-outbox-ingest.plist
launchctl unload ~/Library/LaunchAgents/com.alai.lightrag-backup.plist
launchctl unload ~/Library/LaunchAgents/com.john.lightrag-monitor.plist

Step 4: Reload all 3 lightrag plists

launchctl load -w ~/Library/LaunchAgents/com.alai.lightrag-outbox-ingest.plist
launchctl load -w ~/Library/LaunchAgents/com.alai.lightrag-backup.plist
launchctl load -w ~/Library/LaunchAgents/com.john.lightrag-monitor.plist

Step 5: Drain the outbox manually (if backlog exists)

node ~/system/tools/lightrag-outbox-ingest.js

The script is idempotent: it uses outbox-ingest.sqlite, with correlation_id as the PRIMARY KEY, as a dedup gate, so running it multiple times is safe. Expected output once the backlog is cleared: processed: 0, skipped: N, failed: 0.
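
The dedup gate can be illustrated with a small sketch. To stay self-contained it swaps the SQLite processed table for an in-memory Map keyed by correlation_id; the function ingestOutbox and the send callback are hypothetical and do not reflect the internals of lightrag-outbox-ingest.js.

```javascript
// In-memory stand-in for the `processed` table: the Map key plays the role
// of the correlation_id PRIMARY KEY, the value the ingested_at timestamp.
function ingestOutbox(records, processed, send) {
  const summary = { processed: 0, skipped: 0, failed: 0 };
  for (const record of records) {
    if (processed.has(record.correlation_id)) {
      summary.skipped += 1; // already ingested on an earlier run
      continue;
    }
    try {
      send(record); // hypothetical upload to LightRAG
      processed.set(record.correlation_id, new Date().toISOString());
      summary.processed += 1;
    } catch (err) {
      summary.failed += 1; // not recorded, so the next run retries it
    }
  }
  return summary;
}

// Running twice over the same outbox: the second pass only skips.
const seen = new Map();
const outbox = [{ correlation_id: 'a' }, { correlation_id: 'b' }];
console.log(ingestOutbox(outbox, seen, () => {}));
console.log(ingestOutbox(outbox, seen, () => {}));
```

Because a failed record is never written to the gate, a later run retries it, while successfully ingested records are skipped.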

Step 6: Kickstart the ingest daemon to verify immediate fire

launchctl kickstart -k gui/$(id -u)/com.alai.lightrag-outbox-ingest

Check the log immediately after:

tail -20 ~/system/logs/lightrag-outbox-ingest.log

Expected: A [ingest] DONE line with exit success.

Step 7: Confirm watchdog detects healthy state

bash ~/bin/daemon-fleet-watchdog.sh 2>&1 | grep lightrag

Expected: All 3 labels in calendar_ok state. No calendar_err_* or not_loaded transitions.


4. Verification Commands

# 1. All 3 plists loaded with LastExitStatus=0
launchctl list | grep lightrag

# 2. Checkpoint DB row count (should match mc-task-outcomes.jsonl line count)
sqlite3 ~/system/state/outbox-ingest.sqlite "SELECT count(*) FROM processed"

# 3. Most recent ingest timestamp
sqlite3 ~/system/state/outbox-ingest.sqlite "SELECT MAX(ingested_at) FROM processed"

# 4. LightRAG pipeline health
curl http://localhost:9621/documents/pipeline_status

# 5. LightRAG document total count
curl http://localhost:9621/documents | jq .total

# 6. Outbox log last run summary
grep "DONE" ~/system/logs/lightrag-outbox-ingest.log | tail -5

# 7. Watchdog recent transitions for lightrag
grep lightrag ~/system/logs/daemon-fleet-watchdog.log | tail -20

5. Known Limitations


6. Watchdog Coverage

The daemon-fleet-watchdog at ~/bin/daemon-fleet-watchdog.sh covers all 3 LightRAG plists via its glob at line 39:

for plist in "$HOME"/Library/LaunchAgents/com.{alai,john}.*.plist

This glob automatically includes any new LightRAG LaunchAgents matching the pattern without code changes. The watchdog runs every 15 minutes via com.alai.daemon-fleet-watchdog.
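
To check whether a planned plist filename would be picked up, the glob can be expressed as a regular expression. This is an illustrative equivalent only; the watchdog itself relies on the shell glob shown above, and the name isWatched is an assumption.

```javascript
// Regex equivalent of the shell glob com.{alai,john}.*.plist
const WATCHED_PLIST = /^com\.(alai|john)\..*\.plist$/;

// True when a filename in ~/Library/LaunchAgents would match the watchdog glob.
function isWatched(filename) {
  return WATCHED_PLIST.test(filename);
}

console.log(isWatched('com.alai.lightrag-outbox-ingest.plist'));
console.log(isWatched('com.apple.example.plist'));
```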

Alert states to watch for: calendar_err_* and not_loaded.

Healthy state: calendar_ok (LastExitStatus=0, plist loaded)


MC     | Title                                                              | Status                | Notes
#10286 | Fix LightRAG ingest LaunchAgents — drain 312 outbox + add watchdog | DONE (PARTIAL verify) | This fix. Delivered by Kelsey Hightower. Proveo: 3 PASS, 1 PARTIAL, 1 FAIL.
#10298 | CF Access service token 302 root cause investigation               | OPEN (priority: M)    | Why does https://lightrag.alai.no return 302 for the local host? Resolving this should remove the need for the localhost bypass.

