Ops Agent Runbook
Last Verified: 2026-02-17 | Owner: John
- Service: com.john.ops-agent
- Type: LaunchAgent daemon (Node.js)
- Interval: Every 5 minutes (300s)
- Location: ~/system/daemons/ops-agent.js
- Plist: ~/Library/LaunchAgents/com.john.ops-agent.plist
- Owner: John
- Cost: $0 (Ollama local AI)
What It Does
Autonomous operations agent that runs 24/7:
- MM Monitoring — reads all 4 Mattermost teams (basic, wizard, rendrom, riad)
- Message Classification — Ollama llama3.1:8b classifies: ROUTINE / TASK / INCIDENT
- Intelligent Response — Ollama qwen2.5-coder:32b generates contextual MM replies
- Task Creation — creates MC tasks with BILLABLE/INTERNAL tag + Planka cards
- Health Monitoring — runs health-check.js (Docker, HTTP, system, daemons)
- Auto-Fix — auto-fix.js for known issues (max 3 attempts/hour/service)
- Escalation — creates HIGH priority MC task + MM alert when it can't resolve
Status Check
# Is it running?
launchctl list | grep ops-agent
# Recent activity
tail -50 ~/system/logs/ops-agent.log
# LaunchAgent stdout/stderr
tail -20 ~/system/logs/ops-agent-launchd.log
tail -20 ~/system/logs/ops-agent-launchd-error.log
# State file
cat /tmp/ops-agent-state.json
# Stats
cat ~/system/agents/state/ops.json
Restart
# Graceful restart (unload + load)
launchctl unload ~/Library/LaunchAgents/com.john.ops-agent.plist
launchctl load ~/Library/LaunchAgents/com.john.ops-agent.plist
# Verify
launchctl list | grep ops-agent
Manual Run (Testing)
# Run one cycle manually
node ~/system/daemons/ops-agent.js
# Watch output in real time and keep a copy (the /tmp path is only an example)
node ~/system/daemons/ops-agent.js 2>&1 | tee /tmp/ops-agent-manual.log
Troubleshooting
Ops agent not running
# Check if loaded
launchctl list | grep ops-agent
# Expected: "- 0 com.john.ops-agent" (columns: PID, last exit status, label; "-" means no process is running right now)
# If not loaded:
launchctl load ~/Library/LaunchAgents/com.john.ops-agent.plist
# If load fails, check plist:
plutil -lint ~/Library/LaunchAgents/com.john.ops-agent.plist
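To make the expected `launchctl list` line easier to read: the command prints three whitespace-separated columns (PID, last exit status, label). Below is a minimal sketch of parsing such a line. `parseLaunchctlLine` is a hypothetical helper name used only for illustration and is not part of ops-agent.js.

```javascript
// Parse one line of `launchctl list` output: "<PID> <Status> <Label>".
function parseLaunchctlLine(line) {
  const [pid, status, ...labelParts] = line.trim().split(/\s+/);
  return {
    pid: pid === "-" ? null : Number(pid), // "-" means no process is running right now
    lastExitStatus: Number(status),        // 0 means the last run exited cleanly
    label: labelParts.join(" "),
  };
}

// Example using the expected line from this runbook.
const entry = parseLaunchctlLine("-\t0\tcom.john.ops-agent");
console.log(entry);
```

A `null` PID with exit status 0 is consistent with an interval-based job that is loaded and idle between cycles. A non-zero status suggests the last run failed, in which case the launchd error log is the next place to look.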
Not processing messages
# Check state — is last_check_ms recent?
cat /tmp/ops-agent-state.json | python3 -m json.tool
# Check MM connectivity
node ~/system/tools/mm.js status
# Check MM token
cat /tmp/mm-token.json | python3 -m json.tool
# Verify Mattermost is up
curl -s http://localhost:8065/api/v4/system/ping
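The "is last_check_ms recent?" question above can be turned into an explicit check. This sketch assumes that `last_check_ms` is an epoch timestamp in milliseconds (suggested by the name, not verified against ops-agent.js) and treats the state as stale after two missed 300s cycles. That threshold is an arbitrary choice for illustration.

```javascript
const INTERVAL_MS = 300 * 1000; // run interval from this runbook (300s)

// Returns true when the last check is older than `maxMissedCycles` intervals,
// or when the state has no usable timestamp at all.
function isStale(state, nowMs = Date.now(), maxMissedCycles = 2) {
  if (!state || typeof state.last_check_ms !== "number") return true;
  return nowMs - state.last_check_ms > maxMissedCycles * INTERVAL_MS;
}

const now = Date.now();
console.log(isStale({ last_check_ms: now - 60 * 1000 }, now));      // checked one minute ago
console.log(isStale({ last_check_ms: now - 30 * 60 * 1000 }, now)); // checked 30 minutes ago
```

If the state is stale while the agent is loaded, the Mattermost connectivity and token checks above are the likely next steps.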
Classification wrong (Ollama issues)
# Check Ollama is running
curl -s http://localhost:11434/api/tags | python3 -m json.tool
# Test classification manually
curl -s http://localhost:11434/api/generate -d '{
"model": "llama3.1:8b",
"prompt": "Classify: ROUTINE, TASK, or INCIDENT. Reply ONE word.\n\nMessage: Can you fix the login page?",
"stream": false,
"options": {"temperature": 0.1, "num_predict": 10}
}' | python3 -c "import sys,json; print(json.load(sys.stdin)['response'])"
# If Ollama is down, ops-agent falls back to keyword heuristics (classification still works)
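To illustrate what a keyword fallback of this kind might look like, here is a rough sketch. The keyword lists and the function name `classifyByKeywords` are hypothetical. The actual heuristics live in ops-agent.js and may differ.

```javascript
// Hypothetical keyword lists, chosen only for illustration.
const INCIDENT_WORDS = ["down", "outage", "crash", "error", "urgent"];
const TASK_WORDS = ["fix", "add", "update", "create", "please", "can you"];

// Classify a message as INCIDENT, TASK, or ROUTINE using substring matches.
// Incident keywords are checked first so that urgent messages win ties.
function classifyByKeywords(message) {
  const text = message.toLowerCase();
  if (INCIDENT_WORDS.some((word) => text.includes(word))) return "INCIDENT";
  if (TASK_WORDS.some((word) => text.includes(word))) return "TASK";
  return "ROUTINE";
}

console.log(classifyByKeywords("Can you fix the login page?"));
console.log(classifyByKeywords("The site is down"));
console.log(classifyByKeywords("Thanks, see you tomorrow"));
```

Plain substring matching is cheap but coarse (for example, "add" also matches "address"), which is one reason a model-based classifier is preferred when Ollama is available.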
Health check reporting false positives
# Run health check directly
node ~/system/tools/health-check.js
# JSON output for debugging
node ~/system/tools/health-check.js --json 2>/dev/null | python3 -m json.tool
# Check specific service
docker ps --format '{{.Names}} {{.Status}}' | grep <service>
Auto-fix loop (service keeps restarting)
# Check fix history (max 3/hour enforcement)
cat /tmp/ops-fix-history.json | python3 -m json.tool
# Clear fix history (reset counter)
rm /tmp/ops-fix-history.json
# Check auto-fix directly
node ~/system/tools/auto-fix.js <service> <issue>
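The "max 3 attempts/hour/service" rule can be pictured as a sliding one-hour window per service. The sketch below assumes the history is an object mapping a service name to an array of attempt timestamps in milliseconds. That shape is an assumption and not the verified format of /tmp/ops-fix-history.json, and the function names are hypothetical.

```javascript
const MAX_ATTEMPTS = 3;           // limit stated in this runbook
const WINDOW_MS = 60 * 60 * 1000; // one hour

// Keep only the attempts that fall inside the last hour.
function recentAttempts(history, service, nowMs) {
  return (history[service] || []).filter((t) => nowMs - t < WINDOW_MS);
}

// True when another auto-fix attempt is still allowed for this service.
function canAttemptFix(history, service, nowMs = Date.now()) {
  return recentAttempts(history, service, nowMs).length < MAX_ATTEMPTS;
}

// Returns a new history object with this attempt recorded.
function recordAttempt(history, service, nowMs = Date.now()) {
  return { ...history, [service]: [...recentAttempts(history, service, nowMs), nowMs] };
}

// Four attempts in a row: the fourth should be blocked.
let history = {};
const now = Date.now();
for (let i = 1; i <= 4; i++) {
  const allowed = canAttemptFix(history, "example-service", now);
  console.log(`attempt ${i}: ${allowed ? "allowed" : "blocked"}`);
  if (allowed) history = recordAttempt(history, "example-service", now);
}
```

Under this model, deleting the history file is equivalent to starting with an empty object, so the next three attempts are allowed again immediately.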
Planka card not created
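# Check Planka is up and login works
# (PLANKA_PASSWORD is a placeholder variable; set it in your shell first)
curl -s http://localhost:3100/api/access-tokens -X POST \
-H "Content-Type: application/json" \
-d "{\"emailOrUsername\":\"john\",\"password\":\"$PLANKA_PASSWORD\"}"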
# Check ops-agent log for Planka errors
grep "Planka" ~/system/logs/ops-agent.log | tail -10
Dependencies
| Service | Required | Fallback |
|---|---|---|
| Mattermost (8065) | YES | Agent skips MM check cycle |
| Ollama (11434) | NO | Falls back to keyword classification |
| MC (mc.js) | YES | Tasks not created (error logged) |
| Planka (3100) | NO | Cards not created (task still created in MC) |
| HiveMind | NO | Intel not posted (ops still works) |
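As a quick way to reason about partial outages, the table above can be encoded as a lookup. This is only a sketch: the `DEPENDENCIES` map and the `describeOutage` helper are hypothetical names, and the entries simply restate the table.

```javascript
// Dependency table from this runbook, keyed by service name.
const DEPENDENCIES = new Map([
  ["Mattermost", { required: true, fallback: "Agent skips MM check cycle" }],
  ["Ollama", { required: false, fallback: "Falls back to keyword classification" }],
  ["MC", { required: true, fallback: "Tasks not created (error logged)" }],
  ["Planka", { required: false, fallback: "Cards not created (task still created in MC)" }],
  ["HiveMind", { required: false, fallback: "Intel not posted (ops still works)" }],
]);

// Summarize what to expect when the given services are down.
// Names that are not in the table are ignored.
function describeOutage(downServices) {
  return downServices
    .filter((name) => DEPENDENCIES.has(name))
    .map((name) => {
      const { required, fallback } = DEPENDENCIES.get(name);
      return `${name} (${required ? "required" : "optional"}): ${fallback}`;
    });
}

console.log(describeOutage(["Ollama", "Planka"]).join("\n"));
```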
Configuration
Monitored MM Teams
Defined in ops-agent.js. Currently: basic, wizard, rendrom, riad
Ignored Users (bots)
john, edita, system-bot, boards, calls, tester — defined by user ID in ops-agent.js
Billable Logic
- basic team = INTERNAL (not billable)
- wizard, rendrom, riad = BILLABLE (client teams)
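The billable rule above amounts to a small lookup from team name to tag. Here is a minimal sketch, assuming team names are matched exactly as listed. `TEAM_TAGS` and `billingTag` are hypothetical names, and returning `null` for an unknown team is an illustrative choice rather than documented behavior.

```javascript
// Team-to-tag mapping as described in this runbook.
const TEAM_TAGS = new Map([
  ["basic", "INTERNAL"],   // own team, not billable
  ["wizard", "BILLABLE"],  // client team
  ["rendrom", "BILLABLE"], // client team
  ["riad", "BILLABLE"],    // client team
]);

// Returns "INTERNAL" or "BILLABLE", or null when the team is not monitored.
function billingTag(team) {
  return TEAM_TAGS.get(team) ?? null;
}

for (const team of ["basic", "wizard", "unknown-team"]) {
  console.log(`${team}: ${billingTag(team)}`);
}
```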
Health Check Services
Defined in health-check.js:
- 8 Docker containers
- 6 HTTP endpoints
- 2 system checks (disk, memory)
- 4 LaunchAgent daemons
Files
| File | Purpose |
|---|---|
| ~/system/daemons/ops-agent.js | Main daemon code |
| ~/Library/LaunchAgents/com.john.ops-agent.plist | LaunchAgent config |
| ~/system/tools/health-check.js | Service health monitor |
| ~/system/tools/auto-fix.js | Automated recovery |
| ~/system/agents/identities/ops.md | Agent identity card |
| ~/system/agents/state/ops.json | Persistent state |
| /tmp/ops-agent-state.json | Runtime state (last check timestamp) |
| /tmp/mm-token.json | Cached MM auth token |
| /tmp/ops-fix-history.json | Auto-fix attempt tracking |
| ~/system/logs/ops-agent.log | Activity log |
| ~/system/logs/ops-agent-launchd.log | LaunchAgent stdout |
| ~/system/logs/ops-agent-launchd-error.log | LaunchAgent stderr |
Disaster Recovery
Complete reset
# 1. Stop daemon
launchctl unload ~/Library/LaunchAgents/com.john.ops-agent.plist
# 2. Clear state
rm -f /tmp/ops-agent-state.json /tmp/mm-token.json /tmp/ops-fix-history.json
# 3. Restart
launchctl load ~/Library/LaunchAgents/com.john.ops-agent.plist
# Note: First run will check messages from last 30 minutes only (default)
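The 30-minute note above implies that, with no saved state, the agent has to pick a starting point for reading messages. One plausible way to express that is sketched here, assuming the saved `last_check_ms` (epoch milliseconds) is reused when present. `sinceTimestamp` is a hypothetical name and the real logic in ops-agent.js may differ.

```javascript
const DEFAULT_LOOKBACK_MS = 30 * 60 * 1000; // 30 minutes, per the note above

// Pick the timestamp from which messages should be read.
// With saved state, resume from the last check; otherwise look back 30 minutes.
function sinceTimestamp(state, nowMs = Date.now()) {
  if (state && typeof state.last_check_ms === "number") return state.last_check_ms;
  return nowMs - DEFAULT_LOOKBACK_MS;
}

const now = Date.now();
console.log(new Date(sinceTimestamp(null, now)).toISOString());                        // after a reset
console.log(new Date(sinceTimestamp({ last_check_ms: now - 5000 }, now)).toISOString()); // normal cycle
```

The practical consequence of a complete reset is that messages older than 30 minutes at the time of the first run are not revisited.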
Rollback to mm-responder
# 1. Stop ops-agent
launchctl unload ~/Library/LaunchAgents/com.john.ops-agent.plist
# 2. Restore mm-responder
cp ~/system/archive/mm-responder.sh.archived-2026-02-10 ~/system/daemons/mm-responder.sh
chmod +x ~/system/daemons/mm-responder.sh
launchctl load ~/Library/LaunchAgents/com.john.mm-responder.plist
# 3. Update health-check.js daemon list (add mm-responder, remove ops-agent)
Metrics
Check via MC:
node ~/system/tools/mc.js stats # Task creation stats
node ~/system/tools/mc.js list --owner ops # Tasks created by ops-agent
Check via state:
cat ~/system/agents/state/ops.json # Cumulative stats
cat /tmp/ops-agent-state.json # Current cycle stats
Created: 2026-02-10 | Last Updated: 2026-02-10 | Next Review: 2026-03-10