Operations
Runbooks, cold start procedures, service registry, monitoring.
- Overview
- BookStack Runbook
- BookStack MFA Setup
- CEO Dashboard Runbook
- Infrastructure Runbook
- Mission Control Dashboard
- Planka Runbook
- Ops Agent Runbook
- Service Registry
- Ops Agent
- Daemons & Services
- Go-Live Runbook
- Operational Runbook
- Incident Report
- Post-Mortem
- SLA Report
- Terminal & Tmux Shortcuts
- Baikal CalDAV Runbook
- ALAI Infrastructure Map & Ops Runbooks
- System Map — Infrastructure & Services
- ALAI Domain Migration — basicconsulting.no → alai.no
- AWS CLI Setup — john-deploy IAM
- Slack alaiops Bot — Backend Architecture
- Documenso Self-Hosted — sign.basicconsulting.no
- Azure Blob Offsite Backup Setup
- ANVIL Memory Troubleshooting — Mac Studio
- Email Pipeline + Edita PA — Runbook
- Ollama Fleet Architecture
- Static Hosting Migration — Progress Log
- ANVIL DR Bootstrap Runbook (Mac Air)
- Incident — 2026-04-21 alai.no Contact Form Failure
- Incident Postmortem — Bilko Deploy Fix 2026-04-22
- pi-orchestrator: john H-task auto-pause root cause + #10063 reconcile
- Pi-Orchestrator: GOTCHA Fabrication Removal (MC #10549)
- Email-Agent Ingest Gap Postmortem (2026-05-23) — MC #101887
- ALAI Mail Topology — Migadu Domains, Mailbox Inventory, John's 19-Account Ingest Loop (2026-06-08)
Overview
Operations Overview
Runbooks, cold start procedures, service registry, and monitoring documentation.
Owner: John | Last Verified: 2026-02-17
Contents
To be populated from ~/system/ops/
BookStack Runbook
Last Verified: 2026-02-17 | Owner: John
Runbook: BookStack
Service Type: Wiki / Knowledge Base
Container: bookstack (lscr.io/linuxserver/bookstack:latest)
Ports: 6875 (external) → 80 (internal)
Internal URL: http://localhost:6875
External URL: http://192.168.68.61:6875 (LAN only, no Cloudflare tunnel yet)
Database: MariaDB (bookstack_db)
Compose File: ~/system/services/bookstack/docker-compose.yml
Service Info
BookStack is the documentation wiki for BasicAS Group. Stores runbooks, system docs, org info.
Stack:
- bookstack - Main app (LinuxServer.io build)
- bookstack_db - MariaDB (LinuxServer.io build)
Access:
- Admin URL: http://localhost:6875 or http://192.168.68.61:6875
- Admin Email: admin@admin.com
- Admin Password: password
- WARNING: Default admin credentials! Change immediately after first login.
API:
- Token ID: jpipe2-c33b96497a61ca91
- Token Secret: 100527aa211096463db2f775c9a267c816d11d54b1ec3e038b2b41ee2ae6c6c4
- Config: ~/system/config/bookstack.json
- Sync Tool: node ~/system/tools/bookstack-sync.js sync
Status Check
Container Health
docker ps | grep bookstack
Expected output:
bookstack Up X hours
bookstack_db Up X hours
HTTP Check
curl -I http://localhost:6875
Expected: 200 OK or 302 Found
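The HTTP check above can be scripted so that it yields an exit code instead of headers to read by eye. This is a minimal sketch, not part of the existing tooling: the function names http_ok and check_url are hypothetical, and the 5-second timeout is an assumed value.

```shell
# http_ok CODE -> exit 0 if CODE is one of the statuses this runbook expects
http_ok() {
  case "$1" in
    200|302) return 0 ;;
    *)       return 1 ;;
  esac
}

# check_url URL -> fetch only the status code, then classify it
check_url() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1")
  http_ok "$code"
}
```

Usage: check_url http://localhost:6875 && echo "BookStack reachable". When the connection fails, curl reports the code as 000, which http_ok rejects.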
API Check
curl -s -H "Authorization: Token jpipe2-c33b96497a61ca91:100527aa211096463db2f775c9a267c816d11d54b1ec3e038b2b41ee2ae6c6c4" http://localhost:6875/api/docs.json | head -5
Expected: JSON response with API docs.
Database Check
docker exec bookstack_db mysql -u bookstack -pB4s1cAS_w1k1_2026! bookstackapp -e "SELECT count(*) FROM pages;"
Restart Procedure
Quick Restart (Container Only)
docker restart bookstack
Full Stack Restart (Container + Database)
cd ~/system/services/bookstack
docker compose down
docker compose up -d
Wait 30 seconds, then verify:
docker ps | grep bookstack
curl -I http://localhost:6875
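A fixed 30-second wait can be too short on a cold start and longer than needed on a warm one. One possible alternative is to poll until the check passes. The helper below is a sketch; wait_for is a hypothetical name, and the retry count and delay in the usage line are assumptions to tune.

```shell
# wait_for TRIES DELAY CMD... -> retry CMD until it succeeds or TRIES run out
wait_for() {
  local tries="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$tries" ]; do
    "$@" && return 0      # stop as soon as the check passes
    sleep "$delay"
    i=$((i + 1))
  done
  return 1                # every attempt failed
}
```

Usage: wait_for 12 5 curl -sf -o /dev/null http://localhost:6875 polls for up to about a minute. curl -f fails on HTTP 400 and above, so a 200 or 302 response ends the wait.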
Sync System Docs to BookStack
BookStack is auto-populated from ~/system/ using the sync tool.
Sync All Mapped Content
node ~/system/tools/bookstack-sync.js sync
Sync Single File
node ~/system/tools/bookstack-sync.js sync ~/system/rules/development.md
Check Sync Status
node ~/system/tools/bookstack-sync.js status
Force Overwrite All
node ~/system/tools/bookstack-sync.js push
Mapping File: ~/system/config/bookstack-sync-map.json State File: ~/system/config/bookstack-sync-state.json
Troubleshooting
Problem: Container won't start
Check logs:
docker logs bookstack --tail 100
Common causes:
- Database not ready - wait 30s and retry
- Port 6875 already bound - check lsof -i :6875
- Volume permission issues - check ~/system/services/bookstack/data/
Fix:
cd ~/system/services/bookstack
docker compose down
docker compose up -d bookstack_db
sleep 30
docker compose up -d bookstack
Problem: Can't login (wrong password)
Check if admin credentials were changed in UI:
- Default: admin@admin.com / password
- If changed, use new credentials or reset via database
Reset admin password:
docker exec -it bookstack php /app/www/artisan bookstack:create-admin --email=admin@admin.com --name=Admin --password=newpassword
Problem: API returns 401 Unauthorized
Check token exists:
cat ~/system/config/bookstack.json
Regenerate token in UI:
- Login to BookStack
- Go to Settings → API Tokens
- Create new token
- Update ~/system/config/bookstack.json
Problem: Sync tool fails (500 error)
Check BookStack is running:
curl -I http://localhost:6875
Check API endpoint:
curl -s -H "Authorization: Token jpipe2-c33b96497a61ca91:100527aa211096463db2f775c9a267c816d11d54b1ec3e038b2b41ee2ae6c6c4" http://localhost:6875/api/shelves | head -20
Check logs:
docker logs bookstack --tail 100
Problem: Database connection issues
Check database health:
docker exec bookstack_db mysqladmin -u bookstack -pB4s1cAS_w1k1_2026! ping
Expected: mysqld is alive
Check connection settings:
docker exec bookstack env | grep DB_
Expected:
DB_HOST=bookstack_db
DB_PORT=3306
DB_USERNAME=bookstack
DB_PASSWORD=B4s1cAS_w1k1_2026!
DB_DATABASE=bookstackapp
API Usage
List Shelves
curl -s -H "Authorization: Token jpipe2-c33b96497a61ca91:100527aa211096463db2f775c9a267c816d11d54b1ec3e038b2b41ee2ae6c6c4" http://localhost:6875/api/shelves
List Books
curl -s -H "Authorization: Token jpipe2-c33b96497a61ca91:100527aa211096463db2f775c9a267c816d11d54b1ec3e038b2b41ee2ae6c6c4" http://localhost:6875/api/books
List Pages
curl -s -H "Authorization: Token jpipe2-c33b96497a61ca91:100527aa211096463db2f775c9a267c816d11d54b1ec3e038b2b41ee2ae6c6c4" http://localhost:6875/api/pages
Create Page
curl -X POST -H "Authorization: Token jpipe2-c33b96497a61ca91:100527aa211096463db2f775c9a267c816d11d54b1ec3e038b2b41ee2ae6c6c4" \
-H "Content-Type: application/json" \
-d '{"book_id":1,"name":"Page Title","markdown":"# Content"}' \
http://localhost:6875/api/pages
Full API docs: http://localhost:6875/api/docs
Dependencies
- Docker - Service runtime
- No external dependencies - LAN-only access
Backup
Database Dump
docker exec bookstack_db mysqldump -u bookstack -pB4s1cAS_w1k1_2026! bookstackapp | gzip > ~/backups/bookstack-$(date +%Y%m%d-%H%M%S).sql.gz
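In a mysqldump | gzip pipeline the shell reports gzip's exit status by default, so a failed dump can still leave a valid but empty archive behind. A quick sanity check after each backup may catch that. The sketch below assumes a hypothetical helper name, verify_dump; it only confirms the archive is intact and non-empty, not that the SQL is complete.

```shell
# verify_dump FILE -> exit 0 only if FILE is an intact gzip archive with content
verify_dump() {
  # gzip -t checks archive integrity without writing the decompressed data
  gzip -t "$1" 2>/dev/null || return 1
  # an empty dump still compresses to a valid archive, so look for a first byte
  [ "$(gzip -dc "$1" | head -c 1 | wc -c | tr -d ' ')" -gt 0 ]
}
```

Usage: verify_dump ~/backups/bookstack-YYYYMMDD-HHMMSS.sql.gz || echo "backup looks broken".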
Data Volumes (includes uploads, images)
cd ~/system/services/bookstack
tar -czf ~/backups/bookstack-data-$(date +%Y%m%d-%H%M%S).tar.gz data/
Restore from Backup
# Stop service
cd ~/system/services/bookstack
docker compose down
# Start only the database so the dump can be imported
docker compose up -d bookstack_db
sleep 30
# Restore database
gunzip -c ~/backups/bookstack-YYYYMMDD-HHMMSS.sql.gz | docker exec -i bookstack_db mysql -u bookstack -pB4s1cAS_w1k1_2026! bookstackapp
# Restore data (if needed)
cd ~/system/services/bookstack
tar -xzf ~/backups/bookstack-data-YYYYMMDD-HHMMSS.tar.gz
# Start service
docker compose up -d
Configuration
Key Environment Variables
- APP_URL - Public URL (http://192.168.68.61:6875)
- APP_KEY - Laravel encryption key (base64-encoded)
- DB_HOST - Database host (bookstack_db)
- DB_USERNAME - Database user (bookstack)
- DB_PASSWORD - Database password
- DB_DATABASE - Database name (bookstackapp)
- QUEUE_CONNECTION - Job queue driver (database)
- PUID/PGID - User/group IDs (1000/1000)
- TZ - Timezone (Europe/Sarajevo)
Full config: ~/system/services/bookstack/docker-compose.yml
Application Settings (via UI)
- Access: Settings (gear icon, top-right)
- Customize: Branding, registration, auth, permissions
Content Structure
BookStack organizes content as:
Shelf (top-level category)
└─ Book (collection of pages)
   ├─ Chapter (optional grouping of pages)
   └─ Page (markdown document)
Current structure (as of 2026-02-10):
- 2 shelves (BasicAS System, Organization)
- 15 books (System Architecture, Operations, Runbooks, etc.)
- 43 pages (GOTCHA framework, rules, agent docs, runbooks, etc.)
Notes
- Admin password: Default is password - MUST be changed!
- External access: LAN-only (no Cloudflare tunnel) - consider adding tunnel for remote access
- API token: Stored in plaintext in config file - secure via file permissions (chmod 600)
- Sync tool: Auto-updates BookStack from ~/system/ markdown files
- Timezone: Europe/Sarajevo (BiH time)
- LinuxServer.io build: Community-maintained, not official BookStack image
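The chmod 600 recommendation for the config file can be checked mechanically. The sketch below uses hypothetical helper names (file_mode, is_private). It tries GNU stat first and falls back to the BSD/macOS form, since the two take different flags.

```shell
# file_mode FILE -> print octal permission bits (GNU stat first, then BSD/macOS stat)
file_mode() {
  stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1"
}

# is_private FILE -> exit 0 only if FILE is readable and writable by its owner alone
is_private() {
  [ "$(file_mode "$1")" = "600" ]
}
```

Usage: is_private ~/system/config/bookstack.json || chmod 600 ~/system/config/bookstack.json.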
Last updated: 2026-02-10 | Maintained by: John (AI Director)
BookStack MFA Setup
Last Verified: 2026-02-17 | Owner: John
BookStack MFA and API Token Setup
Service: BookStack Knowledge Base
URL: http://localhost:6875 or http://192.168.68.61:6875
Overview
This runbook covers:
- Setting up Multi-Factor Authentication (MFA) for admin accounts
- Creating new API tokens after admin account changes
- Security best practices
Prerequisites
- BookStack is running and accessible
- Admin account: john@alai.no (password: BkStk_J0hn_2026!Secure)
- Browser access to BookStack web interface
Part 1: Enable MFA (Multi-Factor Authentication)
Step 1: Login as Admin
Step 2: Access Account Settings
- Click on your profile icon (top-right corner)
- Select "Edit Profile" or "My Account"
Step 3: Enable MFA
- Scroll to "Multi-Factor Authentication" section
- Click "Setup MFA"
- Choose method:
  - TOTP (Recommended): Time-based One-Time Password (Google Authenticator, Authy, etc.)
  - Backup Codes: Generate backup recovery codes
- For TOTP setup:
  - Scan QR code with authenticator app
  - Enter 6-digit verification code
  - Save backup codes in secure location (~/system/config/bookstack-mfa-backup.txt)
- Click "Confirm" to enable MFA
Step 4: Test MFA
- Log out
- Log back in with same credentials
- Verify you're prompted for MFA code
- Enter code from authenticator app
- Successful login confirms MFA is working
Part 2: Create New API Token
The old API token was invalidated when the default admin@admin.com account was deleted. You need to create a new token for the john@alai.no account.
Step 1: Navigate to API Settings
- Login to BookStack as john@alai.no
- Click profile icon (top-right)
- Select "Edit Profile" or "My Account"
- Click on "API Tokens" tab
Step 2: Create Token
- Click "Create Token"
- Enter token details:
  - Name: System Integration Token
  - Expiry: Never (or set appropriate expiry)
- Click "Save"
Step 3: Copy Token Credentials
IMPORTANT: Token secret is only shown once!
You will see:
- Token ID: (example: jpipe2-abc123xyz)
- Token Secret: (long hexadecimal string)
Copy both values immediately.
Step 4: Update Config File
Update ~/system/config/bookstack.json with new token:
# Edit the config file
nano ~/system/config/bookstack.json
Replace token_id and token_secret with new values:
{
"url": "http://localhost:6875",
"external_url": "http://192.168.68.61:6875",
"token_id": "YOUR_NEW_TOKEN_ID",
"token_secret": "YOUR_NEW_TOKEN_SECRET",
"admin_email": "john@alai.no",
"admin_password": "BkStk_J0hn_2026!Secure",
"alem_email": "alem@basicconsulting.no",
"alem_password": "V4YawdA13PdsRBIOtFz9"
}
Save the file (Ctrl+O, Enter, Ctrl+X in nano).
Step 5: Test API Token
# Read token from config
TOKEN_ID=$(cat ~/system/config/bookstack.json | grep token_id | cut -d'"' -f4)
TOKEN_SECRET=$(cat ~/system/config/bookstack.json | grep token_secret | cut -d'"' -f4)
# Test API call
curl -s -H "Authorization: Token $TOKEN_ID:$TOKEN_SECRET" http://localhost:6875/api/shelves
Expected: JSON response with list of shelves.
If you see {"error":{"message":"No matching API token was found"...}}, the token is incorrect.
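The grep and cut pipeline in Step 5 depends on each key sitting on its own line and matches any line that merely contains the key name. A slightly stricter variant matches the quoted key followed by a colon. This is a sketch with a hypothetical helper name, json_field; it handles only flat string values, and a real JSON parser such as jq is the more robust choice where it is installed.

```shell
# json_field FILE KEY -> print the first string value stored under "KEY"
json_field() {
  sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$1" | head -n 1
}
```

Usage: TOKEN_ID=$(json_field ~/system/config/bookstack.json token_id), and likewise for token_secret.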
Part 3: Additional Security Measures
Disable Guest Access (Optional)
If you want to require authentication for all access:
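- Edit docker-compose.yml:
cd ~/system/services/bookstack
nano docker-compose.yml
- Change:
- ALLOW_GUEST_ACCESS=true
to:
- ALLOW_GUEST_ACCESS=false
- Recreate the BookStack container so the new value is applied (docker compose restart does not pick up changes to docker-compose.yml):
docker compose up -d bookstack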
Review User Permissions
- Login as admin
- Go to Settings (gear icon) → Users
- Review all user accounts
- Set appropriate roles (Admin, Editor, Viewer)
- Remove or deactivate unused accounts
Enable Audit Log
- Settings → Audit Log
- Enable logging of user actions
- Review periodically for suspicious activity
Regular Backups
Ensure regular backups are configured:
# Database backup
docker exec bookstack_db mysqldump -u bookstack -p8CdydCxVBD7wBoCVRXZE bookstackapp | gzip > ~/backups/bookstack-$(date +%Y%m%d).sql.gz
# Data backup
cd ~/system/services/bookstack
tar -czf ~/backups/bookstack-data-$(date +%Y%m%d).tar.gz data/
Add to daily cron job or LaunchAgent.
Troubleshooting
MFA Not Working
Problem: Can't login with MFA code
Solutions:
- Check time sync on server and phone (TOTP requires accurate time)
- Use backup codes if available
- Reset MFA via database (emergency only):
docker exec bookstack_db mysql -u bookstack -p8CdydCxVBD7wBoCVRXZE bookstackapp \
  -e "UPDATE users SET mfa_values = NULL WHERE email = 'john@alai.no';"
Lost API Token
Problem: Token was not saved and is no longer visible
Solution:
- Delete old token in web UI (API Tokens tab)
- Create new token (see Part 2)
- Update config file
Cannot Access Web UI
Problem: BookStack returns 500 error or won't load
Solutions:
- Check container status: docker ps | grep bookstack
- Check logs: docker logs bookstack --tail 100
- Restart service: cd ~/system/services/bookstack && docker compose restart
Security Best Practices
- MFA on all admin accounts - Always enable MFA for admins
- Strong passwords - Use 20+ character passwords with mixed case, numbers, symbols
- Regular token rotation - Rotate API tokens every 90 days
- Least privilege - Give users minimum permissions needed
- Audit logs - Review regularly for suspicious activity
- Backups - Daily database + data backups
- HTTPS - Use Cloudflare tunnel for external access (see bookstack.md)
- Keep updated - Update BookStack image regularly
Next Steps
After completing this setup:
- Enable MFA for john@alai.no
- Create new API token
- Update ~/system/config/bookstack.json
- Test API token works
- Enable MFA for alem@basicconsulting.no
- Review and set user permissions
- Configure daily backups
- Consider Cloudflare tunnel for external access
Last updated: 2026-02-17 | Maintained by: John (AI Director) | Related: ~/system/context/docs/runbooks/bookstack.md
CEO Dashboard Runbook
Last Verified: 2026-02-17 | Owner: John
CEO Dashboard
URL: http://localhost:3030/ceo
Server: Mission Control Dashboard (port 3030)
Auto-refresh: 60 seconds
Theme: Dark (ALAI brand)
Overview
The CEO Dashboard provides Alem with a single-screen view of all critical business metrics. It aggregates data from multiple sources (Mission Control tasks, sales pipeline, invoices, support tickets, decisions) into a real-time executive view.
Sections
1. Revenue Overview (Banner)
- MRR (Monthly Recurring Revenue) — Estimated from total invoiced / months
- Outstanding — Total unpaid invoices
- 3-Month Trend — Revenue trend (TODO: implement calculation)
- Next Invoice Due — Next upcoming payment deadline
Data Source: invoice-generator.js stats and invoice-generator.js list
2. Pipeline Funnel
Visual funnel showing lead progression:
- Prospect → Qualified → Proposal Sent → Negotiating → Won
- Each stage shows count of active leads
Data Source: sales-pipeline.js stats
3. Active Projects (Kanban)
Project status board with 3 columns:
- Active — In progress tasks with project tag
- Pending — Paused tasks with project tag
- Stalled — Blocked tasks with project tag
Data Source: Mission Control tasks table (filtered by project IS NOT NULL)
4. Decisions Pending
Top 5 GO/NO-GO decisions awaiting Alem's response:
- Title of decision
- Recommendation (MUST GO / GO / CONDITIONAL GO / NO-GO)
- Visual badge indicating action needed
Data Source: ~/system/specs/alem-decisions-2026-02.md (parsed from markdown)
5. Alerts Panel
Critical alerts requiring attention:
- Overdue invoices (from invoice-generator.js check-overdue)
- SLA breaches (from ticket-sla-checker.js)
- Stale tasks (open >7 days from MC)
Color coding:
- 🔴 Critical (red) — SLA breaches
- ⚠️ Warning (yellow) — Overdue invoices
- ℹ️ Info (blue) — Stale tasks
Data Sources: invoice-generator.js, ticket-sla-checker.js, MC tasks table
6. Upcoming Deadlines
Timeline of upcoming deadlines (next 14 days):
- Tasks with "deadline" keyword in description
- Sorted by creation date (proxy for urgency)
Data Source: Mission Control tasks table (filtered by description LIKE '%deadline%')
Technical Details
Implementation
- Added as route /ceo to existing MC dashboard server
- Server file: ~/system/tools/mc-dashboard.js
- HTML file: ~/system/tools/ceo-dashboard.html
- API endpoint: GET /api/ceo/dashboard (JSON)
Data Aggregation
Dashboard uses child_process.execSync to call existing tools:
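const { execSync } = require('child_process');
// execSync runs each command through a shell, so ~ expands and 2>/dev/null discards stderr
const invoiceStatsRaw = execSync('node ~/system/tools/invoice-generator.js stats 2>/dev/null');
const pipelineRaw = execSync('node ~/system/tools/sales-pipeline.js stats 2>/dev/null');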
Data is cached for 60 seconds to avoid hammering tools on every browser refresh.
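The 60-second cache can be pictured with a small shell analogue. This is not the dashboard's code, which lives in mc-dashboard.js; it is a sketch of the same time-to-live idea. The helper name cached and the cache file layout (timestamp on the first line, payload after it) are assumptions.

```shell
# cached TTL FILE CMD... -> print CMD's output, reusing FILE if it is younger than TTL seconds
cached() {
  local ttl="$1" file="$2" now stamp out
  shift 2
  now=$(date +%s)
  if [ -f "$file" ]; then
    stamp=$(head -n 1 "$file")          # first line holds the write time
    if [ $((now - stamp)) -lt "$ttl" ]; then
      tail -n +2 "$file"                # still fresh: serve the stored output
      return 0
    fi
  fi
  out=$("$@") || return 1               # stale or missing: run the real command
  { echo "$now"; printf '%s\n' "$out"; } > "$file"
  printf '%s\n' "$out"
}
```

Usage: cached 60 /tmp/invoice-stats.cache node ~/system/tools/invoice-generator.js stats. Repeated calls within a minute return the stored output without re-running the tool.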
Styling
- Pure CSS (no frameworks)
- ALAI brand colors:
  - Background: #09090b
  - Accent: #00E5A0
  - Cards: #18181b
  - Borders: #27272a
  - Text: #e4e4e7
- Responsive grid layout
- Mobile-friendly (single column on mobile)
Auto-refresh
Two mechanisms:
- HTML meta refresh: <meta http-equiv="refresh" content="60">
- JavaScript interval: setInterval(loadDashboard, 60000)
Access
Local
- Direct: http://localhost:3030/ceo
- From MC dashboard: Click "CEO Dashboard" link (TODO: add link to MC dashboard)
LAN Access
Dashboard is bound to 0.0.0.0:3030, accessible from any device on the network:
- Find Mac Studio IP: ifconfig | grep "inet " | grep -v 127.0.0.1
- Access from phone/tablet: http://[MAC_IP]:3030/ceo
Mobile
Fully responsive. Recommended for iPad/tablet in landscape mode for best experience.
Future Enhancements
Phase 2 (Interactive)
- Click on decisions to mark GO/NO-GO (updates alem-decisions file)
- Click on alerts to take action (send reminder, escalate ticket)
- Filter pipeline by source/date range
- Drill-down from project kanban to task list
Phase 3 (Advanced Metrics)
- Revenue trend calculation (3-month moving average)
- Pipeline conversion rates (qualified → won)
- Task velocity (tasks closed per week)
- SLA compliance percentage over time
- Contract expiration warnings
Phase 4 (AI Insights)
- Weekly digest summary (Ollama-generated)
- Anomaly detection (sudden drop in pipeline, spike in alerts)
- Predictive revenue forecasting
- Recommendations engine (which decision to prioritize)
Maintenance
Update Decision File
When Alem makes decisions, update:
~/system/specs/alem-decisions-2026-02.md
Dashboard will auto-parse on next refresh.
Restart Dashboard
If changes are made to server code:
launchctl kickstart -k gui/$(id -u)/com.john.mc-dashboard
Check Logs
tail -f ~/system/logs/mc-dashboard.log
tail -f ~/system/logs/mc-dashboard-error.log
Troubleshooting
Dashboard shows "Loading..." indefinitely
- Check API endpoint: curl http://localhost:3030/api/ceo/dashboard
- Check browser console for JavaScript errors
- Verify MC dashboard daemon is running: launchctl list | grep mc-dashboard
Data shows 0 or N/A
- Verify tool outputs: node ~/system/tools/invoice-generator.js stats
- Check tool paths in mc-dashboard.js API route
- Ensure database files exist in ~/system/databases/
Mobile layout broken
- Clear browser cache
- Test responsive design in browser dev tools
- Check CSS media queries in ceo-dashboard.html
Related Files
- Server: /Users/makinja/system/tools/mc-dashboard.js
- HTML: /Users/makinja/system/tools/ceo-dashboard.html
- Daemon: ~/Library/LaunchAgents/com.john.mc-dashboard.plist
- Manifest: ~/system/tools/manifest.md
- Decisions: ~/system/specs/alem-decisions-2026-02.md
Infrastructure Runbook
Last Verified: 2026-02-17 | Owner: John
Runbook: Local Infrastructure
Platform: Mac Studio M3 Ultra, 96GB RAM, macOS
Services: Docker containers, LaunchAgents, Cloudflare tunnels
Docker Services
Status Check
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'
Services
| Container | Image | Port | Health |
|---|---|---|---|
| mattermost | mattermost/mattermost-enterprise | 8065 | healthcheck |
| mattermost-db | postgres:13 | 5432 (internal) | — |
| planka | ghcr.io/plankanban/planka | 3100→1337 | healthcheck |
| planka-db | postgres:15-alpine | 5433 (internal) | healthcheck |
| documenso | documenso/documenso | 3003 | — |
| documenso-db | postgres | 5434 (internal) | healthcheck |
| bookstack | lscr.io/linuxserver/bookstack | 6875→80 | — |
| bookstack_db | lscr.io/linuxserver/mariadb | 3306 (internal) | — |
Restart a container
docker restart <container_name>
# Example: docker restart mattermost
Restart all
# Mattermost stack
cd ~/system/services/mattermost && docker compose down && docker compose up -d
# Planka stack
cd ~/system/services/planka && docker compose down && docker compose up -d
# Documenso
cd ~/system/services/documenso && docker compose down && docker compose up -d
# BookStack
cd ~/system/services/bookstack && docker compose down && docker compose up -d
View logs
docker logs <container_name> --tail 50
docker logs <container_name> -f # follow
Disk cleanup (if disk >90%)
docker system prune -f # Remove unused images, containers, networks
docker volume prune -f # Remove unused volumes (CAREFUL: data loss)
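The "disk >90%" condition can be tested before pruning rather than judged by eye. The sketch below uses hypothetical helper names (disk_pct, needs_cleanup). Which path to measure is an assumption: on macOS, Docker Desktop keeps its data inside a disk image, so host usage is only a proxy.

```shell
# disk_pct PATH -> print the used-space percentage of the filesystem holding PATH
disk_pct() {
  # df -P gives the POSIX layout; column 5 of the second line is the capacity, e.g. 45%
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# needs_cleanup PCT LIMIT -> exit 0 if PCT is strictly above LIMIT
needs_cleanup() {
  [ "$1" -gt "$2" ]
}
```

Usage: needs_cleanup "$(disk_pct /)" 90 && docker system prune -f. The volume prune is left as a manual step because of its data-loss risk.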
Cloudflare Tunnels
Config
cat ~/.cloudflared/config.yml
Routes
| Hostname | Target | Service |
|---|---|---|
| mm.basicconsulting.no | localhost:8065 | Mattermost |
| boards.basicconsulting.no | localhost:3100 | Planka |
| sign.basicconsulting.no | localhost:3003 | Documenso |
Status
cloudflared tunnel info mattermost
Restart tunnel
# Tunnel runs as LaunchAgent
launchctl unload ~/Library/LaunchAgents/com.cloudflare.tunnel.plist
launchctl load ~/Library/LaunchAgents/com.cloudflare.tunnel.plist
LaunchAgents (Daemons)
List all custom daemons
launchctl list | grep -E "com\.(john|edita|cloudflare)"
Expected daemons
| Daemon | Interval | Location |
|---|---|---|
| com.john.ops-agent | 5 min | ~/Library/LaunchAgents/ |
| com.edita.autowork | 30 min | ~/Library/LaunchAgents/ |
| com.john.mc-dashboard | always | ~/Library/LaunchAgents/ |
| com.john.mc-session-worker | on events | ~/Library/LaunchAgents/ |
Load/unload
launchctl load ~/Library/LaunchAgents/<plist-name>.plist
launchctl unload ~/Library/LaunchAgents/<plist-name>.plist
Ollama (Local AI)
Status
curl -s http://localhost:11434/api/tags | python3 -c "import sys,json; [print(m['name']) for m in json.load(sys.stdin)['models']]"
Models
| Model | Size | Use |
|---|---|---|
| llama3.1:8b | 5GB | Fast classification (ops-agent) |
| qwen2.5-coder:32b | 19GB | Code generation, contextual responses |
| llama3.1:70b | 40GB | Research, writing |
Restart Ollama
# Ollama runs as macOS app
killall ollama 2>/dev/null
open -a Ollama
Mission Control Dashboard
Status
curl -s http://localhost:3030 | head -1
Restart
launchctl unload ~/Library/LaunchAgents/com.john.mc-dashboard.plist
launchctl load ~/Library/LaunchAgents/com.john.mc-dashboard.plist
Full Health Check
# Human-readable
node ~/system/tools/health-check.js
# JSON (programmatic)
node ~/system/tools/health-check.js --json
# Quick (HTTP only)
node ~/system/tools/health-check.js --quick
After System Reboot
All LaunchAgents with RunAtLoad: true start automatically. Verify:
# 1. Check Docker is running
docker ps
# 2. Check all daemons
launchctl list | grep -E "com\.(john|edita|cloudflare)"
# 3. Run health check
node ~/system/tools/health-check.js
# 4. If anything missing, load it
launchctl load ~/Library/LaunchAgents/<missing>.plist
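Step 2 of the reboot check can report exactly which expected daemons are absent instead of leaving the comparison to the reader. The sketch below takes its label list from the Expected daemons table; the function name missing_daemons is hypothetical. It reads launchctl list output on stdin, where the label is the third column.

```shell
# Labels from the "Expected daemons" table
EXPECTED_DAEMONS="com.john.ops-agent com.edita.autowork com.john.mc-dashboard com.john.mc-session-worker"

# missing_daemons -> read `launchctl list` output on stdin, print expected labels that are absent
missing_daemons() {
  local loaded label
  loaded=$(awk '{ print $3 }')          # third column is the job label
  for label in $EXPECTED_DAEMONS; do
    printf '%s\n' "$loaded" | grep -Fxq -- "$label" || echo "$label"
  done
}
```

Usage: launchctl list | missing_daemons prints one label per line and nothing when all four are loaded. Loading a missing one still needs its plist path, which is assumed here to follow the label name.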
Created: 2026-02-10 | Last Updated: 2026-02-10
Mission Control Dashboard
Last Verified: 2026-02-17 | Owner: John
Runbook: Mission Control Dashboard
Service Type: Task Management Web UI
Runtime: Node.js (Express)
Port: 3030 (internal + LAN accessible)
Internal URL: http://localhost:3030
LAN URL: http://192.168.68.61:3030 (mobile-friendly)
Database: SQLite (~/system/databases/mission-control.db)
LaunchAgent: com.john.mc-dashboard
Source: ~/system/tools/mc-dashboard.js
Service Info
Mission Control Dashboard is the web UI for task management. Provides CRUD operations, priority management, status tracking, and team coordination.
Features:
- Task list with filters (open/closed, owner, priority)
- Create/edit/delete tasks
- Start/pause/resume tasks
- Priority management (H/M/L)
- Owner assignment (john/edita/—)
- Real-time status updates
- Mobile-responsive design
- Auto-refresh every 30 seconds
CLI Alternative:
node ~/system/tools/mc.js list|add|start|done|pause|resume|block
Status Check
LaunchAgent Status
launchctl list | grep mc-dashboard
Expected output: PID shown (e.g., 12345 0 com.john.mc-dashboard)
If not running: - 0 com.john.mc-dashboard (no PID)
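The difference between a numeric PID and a dash in the first column can be turned into an exit code for scripts. A minimal sketch follows, with a hypothetical helper name, agent_running. It reads launchctl list output on stdin and assumes the usual three-column PID, status, label layout.

```shell
# agent_running LABEL -> read `launchctl list` output on stdin;
# exit 0 only if LABEL is listed with a numeric PID in the first column
agent_running() {
  awk -v label="$1" '
    $3 == label && $1 ~ /^[0-9]+$/ { found = 1 }
    END { exit !found }
  '
}
```

Usage: launchctl list | agent_running com.john.mc-dashboard || echo "dashboard not running".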
HTTP Check
curl -I http://localhost:3030
Expected: 200 OK
LAN Access Check (from another device)
curl -I http://192.168.68.61:3030
Expected: 200 OK
Database Check
sqlite3 ~/system/databases/mission-control.db "SELECT count(*) FROM tasks WHERE status = 'open';"
Restart Procedure
Stop Service
launchctl unload ~/Library/LaunchAgents/com.john.mc-dashboard.plist
Start Service
launchctl load ~/Library/LaunchAgents/com.john.mc-dashboard.plist
Restart (Stop + Start)
launchctl unload ~/Library/LaunchAgents/com.john.mc-dashboard.plist
launchctl load ~/Library/LaunchAgents/com.john.mc-dashboard.plist
Note: LaunchAgent auto-restarts on crash (KeepAlive=true).
View Logs
stdout (General logs)
tail -f ~/system/logs/mc-dashboard.log
stderr (Error logs)
tail -f ~/system/logs/mc-dashboard.err
Recent errors
tail -50 ~/system/logs/mc-dashboard.err
Troubleshooting
Problem: Dashboard won't start
Check LaunchAgent:
launchctl list | grep mc-dashboard
Check error log:
tail -50 ~/system/logs/mc-dashboard.err
Common causes:
- Port 3030 already bound - check lsof -i :3030
- Database locked - check for stale processes using SQLite
- Node.js not found - check which node
- Permission issues - check file ownership
Fix:
# Kill any process on port 3030
lsof -ti :3030 | xargs kill -9
# Restart
launchctl unload ~/Library/LaunchAgents/com.john.mc-dashboard.plist
launchctl load ~/Library/LaunchAgents/com.john.mc-dashboard.plist
Problem: Can't connect from mobile (LAN)
Check service is listening on all interfaces:
lsof -i :3030
Expected: *:3030 (listening on all IPs, not just 127.0.0.1)
Check firewall:
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --getglobalstate
If firewall is on, allow Node.js:
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --add /opt/homebrew/bin/node
Check Mac IP:
ipconfig getifaddr en0 # WiFi
ipconfig getifaddr en1 # Ethernet
Expected: 192.168.68.61 (or similar)
Problem: Tasks not updating (stale data)
Check database integrity:
sqlite3 ~/system/databases/mission-control.db "PRAGMA integrity_check;"
Expected: ok
Check last write:
ls -lh ~/system/databases/mission-control.db
Restart dashboard:
launchctl unload ~/Library/LaunchAgents/com.john.mc-dashboard.plist
launchctl load ~/Library/LaunchAgents/com.john.mc-dashboard.plist
Problem: 500 errors in UI
Check server logs:
tail -f ~/system/logs/mc-dashboard.log ~/system/logs/mc-dashboard.err
Check database:
sqlite3 ~/system/databases/mission-control.db "SELECT * FROM tasks LIMIT 1;"
Common causes:
- Database schema mismatch - migrate database
- Corrupted task data - fix in SQLite
- Node.js error - check stack trace in error log
CLI Integration
Mission Control has two interfaces:
- Dashboard (UI) - http://localhost:3030
- CLI - node ~/system/tools/mc.js
Both read/write the same SQLite database: ~/system/databases/mission-control.db
CLI Commands
# List tasks
node ~/system/tools/mc.js list
node ~/system/tools/mc.js list --owner john
# Start task (creates /tmp/mc-active-task)
node ~/system/tools/mc.js start <id>
# Complete task
node ~/system/tools/mc.js done <id> "outcome summary"
# Pause task (removes /tmp/mc-active-task)
node ~/system/tools/mc.js pause <id>
# Block task
node ~/system/tools/mc.js block <id> "blocker reason"
# Show full details
node ~/system/tools/mc.js show <id>
# Who's working on what
node ~/system/tools/mc.js active
Dependencies
- Node.js - Runtime (/opt/homebrew/bin/node)
- SQLite3 - Database (built-in with Node.js)
- LaunchAgent - Auto-start on login
- No external services - Fully local
Backup
Database Backup
cp ~/system/databases/mission-control.db ~/backups/mission-control-$(date +%Y%m%d-%H%M%S).db
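Copying a SQLite file with cp while the dashboard is writing to it can capture a half-written state. The sqlite3 command-line tool's .backup command takes a consistent snapshot of a live database instead. The wrapper below is a sketch with a hypothetical name, backup_sqlite. It assumes the sqlite3 CLI is installed and that the destination path contains no single quotes.

```shell
# backup_sqlite SRC DEST -> snapshot SRC into DEST using SQLite's online backup
backup_sqlite() {
  sqlite3 "$1" ".backup '$2'"
}
```

Usage: backup_sqlite ~/system/databases/mission-control.db ~/backups/mission-control-$(date +%Y%m%d-%H%M%S).db. Running PRAGMA integrity_check; on the copy afterwards confirms it is readable.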
Automated Backup (daily)
Add to crontab or LaunchAgent:
0 2 * * * cp ~/system/databases/mission-control.db ~/backups/mission-control-$(date +\%Y\%m\%d).db
Restore from Backup
# Stop dashboard
launchctl unload ~/Library/LaunchAgents/com.john.mc-dashboard.plist
# Restore database
cp ~/backups/mission-control-YYYYMMDD-HHMMSS.db ~/system/databases/mission-control.db
# Start dashboard
launchctl load ~/Library/LaunchAgents/com.john.mc-dashboard.plist
Configuration
LaunchAgent Plist
Path: ~/Library/LaunchAgents/com.john.mc-dashboard.plist
Key settings:
- KeepAlive: true - Auto-restart on crash
- RunAtLoad: true - Start on login
- StandardOutPath - Log stdout
- StandardErrorPath - Log stderr
- EnvironmentVariables: HOME - User home directory
Application Config
Port: 3030 (hardcoded in mc-dashboard.js)
Database: ~/system/databases/mission-control.db (hardcoded)
Auto-refresh: 30 seconds (client-side)
To change port:
- Edit ~/system/tools/mc-dashboard.js
- Change const PORT = 3030; to desired port
- Restart LaunchAgent
Related Services
Mission Control Session Worker
LaunchAgent: com.john.mc-session-worker
Purpose: Background daemon for session-level task monitoring
Status check:
launchctl list | grep mc-session-worker
Notes
- Access: LAN-accessible (no auth) - consider adding auth for remote access
- Mobile-friendly: Responsive design, touch-optimized
- No auth: Anyone on LAN can create/modify tasks - secure network required
- Auto-refresh: Dashboard auto-refreshes every 30s
- Active task enforcement: ~/system/.claude/hooks/gotcha-enforcer.py checks /tmp/mc-active-task before Write/Edit
- CLI vs UI: Both interfaces are equal - use whichever is convenient
Last updated: 2026-02-10 | Maintained by: John (AI Director)
Planka Runbook
Last Verified: 2026-02-17 | Owner: John
Runbook: Planka
Service Type: Kanban Board / Project Management
Container: planka (ghcr.io/plankanban/planka:2.0.0-rc.4)
Ports: 3100 (external) → 1337 (internal)
External URL: https://boards.basicconsulting.no
Database: PostgreSQL 15 (planka-db)
Compose File: ~/system/services/planka/docker-compose.yml
Service Info
Planka is the visual project management tool for BasicAS Group. Kanban-style boards for task tracking.
Stack:
- planka - Main app (RC4)
- planka-db - PostgreSQL 15 (alpine)
External Access:
- Exposed via Cloudflare Tunnel: boards.basicconsulting.no
- Trust proxy enabled for correct client IPs
Admin Access:
- Web UI: http://localhost:3100 (local) or https://boards.basicconsulting.no
- Username: john
- Password: BasicAS2026!
- Email: john@basicconsulting.no
- Database: postgresql://postgres@planka-db/planka (internal only, no auth)
Status Check
Container Health
docker ps | grep planka
Expected output:
planka Up X hours (healthy)
planka-db Up X hours (healthy)
HTTP Check
curl -I http://localhost:3100
Expected: 200 OK or 302 Found
External Access Check
curl -I https://boards.basicconsulting.no
Expected: 200 OK or 302 Found
Database Check
docker exec planka-db psql -U postgres -d planka -c "SELECT count(*) FROM \"user\";"
Restart Procedure
Quick Restart (Container Only)
docker restart planka
Full Stack Restart (Container + Database)
cd ~/system/services/planka
docker compose down
docker compose up -d
Wait 30 seconds for healthcheck to pass, then verify:
docker ps | grep planka
curl -I http://localhost:3100
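Because both Planka containers define a Docker healthcheck, the wait can key off the reported health status rather than a fixed 30 seconds. The sketch below uses hypothetical helper names (container_health, wait_healthy). It relies on docker inspect exposing .State.Health.Status, which is only populated for containers that have a healthcheck.

```shell
# container_health NAME -> print the Docker healthcheck status (starting, healthy, unhealthy)
container_health() {
  docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null
}

# wait_healthy NAME TRIES DELAY -> poll until NAME reports "healthy" or TRIES run out
wait_healthy() {
  local name="$1" tries="$2" delay="$3" i=1
  while [ "$i" -le "$tries" ]; do
    [ "$(container_health "$name")" = "healthy" ] && return 0
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

Usage: wait_healthy planka-db 12 5 && wait_healthy planka 12 5 && curl -I http://localhost:3100. The retry count and delay are assumed values.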
Troubleshooting
Problem: Container won't start
Check logs:
docker logs planka --tail 100
Common causes:
- Database not ready - wait 30s and retry
- Port 3100 already bound - check lsof -i :3100
- Volume permission issues - check docker volumes
Fix:
cd ~/system/services/planka
docker compose down
docker compose up -d planka-db
sleep 30
docker compose up -d planka
Problem: Login issues (can't sign in with admin credentials)
Check environment variables:
docker exec planka env | grep DEFAULT_ADMIN
Expected:
DEFAULT_ADMIN_EMAIL=john@basicconsulting.no
DEFAULT_ADMIN_PASSWORD=BasicAS2026!
DEFAULT_ADMIN_NAME=John AI
DEFAULT_ADMIN_USERNAME=john
If the admin account was changed in the UI, the default credentials won't work. List the current admin accounts in the database to see which login applies:
docker exec planka-db psql -U postgres -d planka -c "SELECT email, username FROM \"user\" WHERE \"isAdmin\" = true;"
Problem: 502 Bad Gateway (external access)
Check container is running:
docker ps | grep planka
Check Cloudflare tunnel:
cloudflared tunnel info boards
Check BASE_URL:
docker exec planka env | grep BASE_URL
Expected: BASE_URL=https://boards.basicconsulting.no
Problem: Database connection issues
Check database health:
docker exec planka-db pg_isready -U postgres -d planka
Check connection string:
docker exec planka env | grep DATABASE_URL
Expected: DATABASE_URL=postgresql://postgres@planka-db/planka
API Access
Planka has a REST API. Example:
Get Boards (requires auth token)
curl -H "Authorization: Bearer <TOKEN>" http://localhost:3100/api/boards
Get Token:
- Login via UI
- Inspect browser Network tab → find accessToken in response
- Or use user credentials to authenticate programmatically
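Programmatic authentication could look roughly like the sketch below. It only builds the two requests and does not send them. The `/api/access-tokens` endpoint and its `emailOrUsername`/`password` body come from the Ops Agent runbook, and `/api/boards` from the example above; the helper names `token_request` and `boards_request` are hypothetical, and the shape of the token response is not assumed here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3100"  # local Planka port from this runbook


def token_request(username, password, base_url=BASE_URL):
    """Build (but do not send) the login request for an access token."""
    body = json.dumps({"emailOrUsername": username, "password": password})
    return urllib.request.Request(
        f"{base_url}/api/access-tokens",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def boards_request(token, base_url=BASE_URL):
    """Build (but do not send) an authenticated request for the boards list."""
    return urllib.request.Request(
        f"{base_url}/api/boards",
        headers={"Authorization": f"Bearer {token}"},
    )
```

Pass either request to `urllib.request.urlopen` to send it, and read credentials from the environment or a secrets store rather than hard-coding them.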
Dependencies
- Docker - Service runtime
- Cloudflare Tunnel - External access (boards.basicconsulting.no)
No dependencies on other local services.
Backup
Database Dump
docker exec planka-db pg_dump -U postgres planka | gzip > ~/backups/planka-$(date +%Y%m%d-%H%M%S).sql.gz
Docker Volumes (includes file uploads)
docker run --rm -v planka-data:/data -v ~/backups:/backup alpine tar -czf /backup/planka-data-$(date +%Y%m%d-%H%M%S).tar.gz -C /data .
docker run --rm -v planka-db-data:/data -v ~/backups:/backup alpine tar -czf /backup/planka-db-data-$(date +%Y%m%d-%H%M%S).tar.gz -C /data .
Restore from Backup
# Stop service
cd ~/system/services/planka
docker compose down
# Restore volumes (if needed) while the containers are stopped
docker run --rm -v planka-data:/data -v ~/backups:/backup alpine tar -xzf /backup/planka-data-YYYYMMDD-HHMMSS.tar.gz -C /data
docker run --rm -v planka-db-data:/data -v ~/backups:/backup alpine tar -xzf /backup/planka-db-data-YYYYMMDD-HHMMSS.tar.gz -C /data
# Start only the database: the SQL restore needs a running planka-db container
docker compose up -d planka-db
sleep 30
# Restore database from the SQL dump (skip if the planka-db-data volume was restored above)
gunzip -c ~/backups/planka-YYYYMMDD-HHMMSS.sql.gz | docker exec -i planka-db psql -U postgres -d planka
# Start service
docker compose up -d
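Choosing which file to substitute for the YYYYMMDD-HHMMSS placeholder can be automated. This is a small sketch under the naming scheme used by the backup commands above; `latest_backup` is a hypothetical helper, not an existing tool.

```python
import re

# Matches the three backup families written by the commands in this runbook:
# planka-<stamp>.sql.gz, planka-data-<stamp>.tar.gz, planka-db-data-<stamp>.tar.gz
_BACKUP = re.compile(
    r"^(?P<prefix>planka(?:-data|-db-data)?)-(?P<stamp>\d{8}-\d{6})\.(?:sql\.gz|tar\.gz)$"
)


def latest_backup(filenames, prefix):
    """Return the newest backup filename for the given prefix, or None.

    YYYYMMDD-HHMMSS stamps sort lexicographically in chronological order,
    so taking the maximum stamp is enough.
    """
    matches = []
    for name in filenames:
        m = _BACKUP.match(name)
        if m and m.group("prefix") == prefix:
            matches.append((m.group("stamp"), name))
    return max(matches)[1] if matches else None
```

For example, `latest_backup(os.listdir(os.path.expanduser("~/backups")), "planka")` would return the most recent SQL dump.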
Configuration
Key Environment Variables
- BASE_URL - External URL (https://boards.basicconsulting.no)
- DATABASE_URL - PostgreSQL connection string
- SECRET_KEY - Encryption key for sessions/tokens
- TOKEN_EXPIRES_IN - JWT token expiry (365 days)
- DEFAULT_LANGUAGE - UI language (en-US)
- DEFAULT_ADMIN_* - Initial admin user credentials
- TRUST_PROXY - Enable for correct IPs behind Cloudflare
Full config: ~/system/services/planka/docker-compose.yml
Notes
- Version: 2.0.0-rc.4 (release candidate, not stable)
- Auth method: Password-based (no SSO/LDAP yet)
- Database: Uses PostgreSQL with trust auth (no password) - secure as internal-only
- Token expiry: 365 days (1 year) - very long, consider shorter for security
- Admin password: Stored in docker-compose.yml (plaintext) - consider secrets management
Last updated: 2026-02-10 Maintained by: John (AI Director)
Ops Agent Runbook
Last Verified: 2026-02-17 | Owner: John
Runbook: Ops Agent
Service: com.john.ops-agent
Type: LaunchAgent daemon (Node.js)
Interval: Every 5 minutes (300s)
Location: ~/system/daemons/ops-agent.js
Plist: ~/Library/LaunchAgents/com.john.ops-agent.plist
Owner: John
Cost: $0 (Ollama local AI)
What It Does
Autonomous operations agent that runs 24/7:
- MM Monitoring — reads all 4 Mattermost teams (basic, wizard, rendrom, riad)
- Message Classification — Ollama llama3.1:8b classifies: ROUTINE / TASK / INCIDENT
- Intelligent Response — Ollama qwen2.5-coder:32b generates contextual MM replies
- Task Creation — creates MC tasks with BILLABLE/INTERNAL tag + Planka cards
- Health Monitoring — runs health-check.js (Docker, HTTP, system, daemons)
- Auto-Fix — auto-fix.js for known issues (max 3 attempts/hour/service)
- Escalation — creates HIGH priority MC task + MM alert when it can't resolve
Status Check
# Is it running?
launchctl list | grep ops-agent
# Recent activity
tail -50 ~/system/logs/ops-agent.log
# LaunchAgent stdout/stderr
tail -20 ~/system/logs/ops-agent-launchd.log
tail -20 ~/system/logs/ops-agent-launchd-error.log
# State file
cat /tmp/ops-agent-state.json
# Stats
cat ~/system/agents/state/ops.json
Restart
# Graceful restart (unload + load)
launchctl unload ~/Library/LaunchAgents/com.john.ops-agent.plist
launchctl load ~/Library/LaunchAgents/com.john.ops-agent.plist
# Verify
launchctl list | grep ops-agent
Manual Run (Testing)
# Run one cycle manually
node ~/system/daemons/ops-agent.js
# Watch output in real-time
node ~/system/daemons/ops-agent.js 2>&1 | tee /dev/tty
Troubleshooting
Ops agent not running
# Check if loaded
launchctl list | grep ops-agent
# Expected: "- 0 com.john.ops-agent"
# If not loaded:
launchctl load ~/Library/LaunchAgents/com.john.ops-agent.plist
# If load fails, check plist:
plutil -lint ~/Library/LaunchAgents/com.john.ops-agent.plist
Not processing messages
# Check state — is last_check_ms recent?
cat /tmp/ops-agent-state.json | python3 -m json.tool
# Check MM connectivity
node ~/system/tools/mm.js status
# Check MM token
cat /tmp/mm-token.json | python3 -m json.tool
# Verify Mattermost is up
curl -s http://localhost:8065/api/v4/system/ping
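Whether `last_check_ms` counts as "recent" can be checked mechanically. The sketch below assumes the state file is a JSON object with a `last_check_ms` field in epoch milliseconds (the field name comes from the comment above; the rest of the file layout is not assumed), and treats two missed 5-minute cycles as the staleness limit, which is an arbitrary choice.

```python
import json
import time

INTERVAL_S = 300  # documented run interval: every 5 minutes


def state_is_fresh(state_json, now_ms=None, max_missed_cycles=2):
    """Return True if last_check_ms is within max_missed_cycles intervals of now."""
    state = json.loads(state_json)
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    age_ms = now_ms - state["last_check_ms"]
    return age_ms <= max_missed_cycles * INTERVAL_S * 1000
```

Usage: `state_is_fresh(open("/tmp/ops-agent-state.json").read())`.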
Classification wrong (Ollama issues)
# Check Ollama is running
curl -s http://localhost:11434/api/tags | python3 -m json.tool
# Test classification manually
curl -s http://localhost:11434/api/generate -d '{
"model": "llama3.1:8b",
"prompt": "Classify: ROUTINE, TASK, or INCIDENT. Reply ONE word.\n\nMessage: Can you fix the login page?",
"stream": false,
"options": {"temperature": 0.1, "num_predict": 10}
}' | python3 -c "import sys,json; print(json.load(sys.stdin)['response'])"
# If Ollama down, ops-agent falls back to keyword heuristics (still works)
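The fallback behaviour described above can be illustrated as follows. This is a sketch only: the real heuristics live in ops-agent.js, and the keyword lists and function names here are hypothetical. It accepts the model's reply when it starts with one of the three valid labels and otherwise falls back to keyword matching.

```python
VALID_LABELS = ("ROUTINE", "TASK", "INCIDENT")

# Hypothetical keyword lists for illustration; not taken from ops-agent.js.
INCIDENT_WORDS = ("down", "outage", "urgent", "crash", "error")
TASK_WORDS = ("can you", "please", "fix", "add", "update", "need")


def keyword_classify(message):
    """Classify by simple substring matching, checking incidents first."""
    text = message.lower()
    if any(word in text for word in INCIDENT_WORDS):
        return "INCIDENT"
    if any(word in text for word in TASK_WORDS):
        return "TASK"
    return "ROUTINE"


def classify(message, model_reply=None):
    """Use the model's reply if its first word is a valid label, else fall back."""
    if model_reply:
        words = model_reply.strip().upper().split()
        if words:
            label = words[0].strip(".,:!\"'")
            if label in VALID_LABELS:
                return label
    return keyword_classify(message)
```

Passing `model_reply=None` models the case where Ollama is unreachable.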
Health check reporting false positives
# Run health check directly
node ~/system/tools/health-check.js
# JSON output for debugging
node ~/system/tools/health-check.js --json 2>/dev/null | python3 -m json.tool
# Check specific service
docker ps --format '{{.Names}} {{.Status}}' | grep <service>
Auto-fix loop (service keeps restarting)
# Check fix history (max 3/hour enforcement)
cat /tmp/ops-fix-history.json | python3 -m json.tool
# Clear fix history (reset counter)
rm /tmp/ops-fix-history.json
# Check auto-fix directly
node ~/system/tools/auto-fix.js <service> <issue>
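The "max 3 attempts/hour/service" rule amounts to a sliding-window counter per service. A minimal sketch, assuming the history is a mapping from service name to a list of attempt timestamps in seconds; the actual layout of /tmp/ops-fix-history.json is not documented here, and `may_attempt_fix` is a hypothetical name.

```python
MAX_ATTEMPTS = 3   # documented limit per service
WINDOW_S = 3600    # one hour


def may_attempt_fix(history, service, now):
    """Record and allow a fix attempt unless the hourly limit is reached.

    history maps service name -> list of attempt timestamps (seconds) and is
    updated in place. Returns True if the attempt is allowed.
    """
    recent = [t for t in history.get(service, []) if now - t < WINDOW_S]
    allowed = len(recent) < MAX_ATTEMPTS
    if allowed:
        recent.append(now)
    history[service] = recent
    return allowed
```

Deleting the history file, as shown above, corresponds to starting again with an empty `history` dict.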
Planka card not created
# Check Planka is up
curl -s http://localhost:3100/api/access-tokens -X POST \
-H "Content-Type: application/json" \
-d '{"emailOrUsername":"john","password":"BasicAS2026!"}'
# Check ops-agent log for Planka errors
grep "Planka" ~/system/logs/ops-agent.log | tail -10
Dependencies
| Service | Required | Fallback |
|---|---|---|
| Mattermost (8065) | YES | Agent skips MM check cycle |
| Ollama (11434) | NO | Falls back to keyword classification |
| MC (mc.js) | YES | Tasks not created (error logged) |
| Planka (3100) | NO | Cards not created (task still created in MC) |
| HiveMind | NO | Intel not posted (ops still works) |
Configuration
Monitored MM Teams
Defined in ops-agent.js. Currently: basic, wizard, rendrom, riad
Ignored Users (bots)
john, edita, system-bot, boards, calls, tester — defined by user ID in ops-agent.js
Billable Logic
- basic team = INTERNAL (not billable)
- wizard, rendrom, riad = BILLABLE (client teams)
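The billable rule is a direct lookup on the team name. A small sketch using the four team names listed above; the function name and the choice to raise on an unknown team are assumptions, not behaviour confirmed from ops-agent.js.

```python
INTERNAL_TEAMS = {"basic"}
CLIENT_TEAMS = {"wizard", "rendrom", "riad"}


def billing_tag(team):
    """Return the MC task tag for a Mattermost team name."""
    name = team.strip().lower()
    if name in INTERNAL_TEAMS:
        return "INTERNAL"
    if name in CLIENT_TEAMS:
        return "BILLABLE"
    # Assumption: an unmonitored team is treated as an error, not a default.
    raise ValueError(f"unmonitored team: {team!r}")
```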
Health Check Services
Defined in health-check.js:
- 8 Docker containers
- 6 HTTP endpoints
- 2 system checks (disk, memory)
- 4 LaunchAgent daemons
Files
| File | Purpose |
|---|---|
| ~/system/daemons/ops-agent.js | Main daemon code |
| ~/Library/LaunchAgents/com.john.ops-agent.plist | LaunchAgent config |
| ~/system/tools/health-check.js | Service health monitor |
| ~/system/tools/auto-fix.js | Automated recovery |
| ~/system/agents/identities/ops.md | Agent identity card |
| ~/system/agents/state/ops.json | Persistent state |
| /tmp/ops-agent-state.json | Runtime state (last check timestamp) |
| /tmp/mm-token.json | Cached MM auth token |
| /tmp/ops-fix-history.json | Auto-fix attempt tracking |
| ~/system/logs/ops-agent.log | Activity log |
| ~/system/logs/ops-agent-launchd.log | LaunchAgent stdout |
| ~/system/logs/ops-agent-launchd-error.log | LaunchAgent stderr |
Disaster Recovery
Complete reset
# 1. Stop daemon
launchctl unload ~/Library/LaunchAgents/com.john.ops-agent.plist
# 2. Clear state
rm -f /tmp/ops-agent-state.json /tmp/mm-token.json /tmp/ops-fix-history.json
# 3. Restart
launchctl load ~/Library/LaunchAgents/com.john.ops-agent.plist
# Note: First run will check messages from last 30 minutes only (default)
Rollback to mm-responder
# 1. Stop ops-agent
launchctl unload ~/Library/LaunchAgents/com.john.ops-agent.plist
# 2. Restore mm-responder
cp ~/system/archive/mm-responder.sh.archived-2026-02-10 ~/system/daemons/mm-responder.sh
chmod +x ~/system/daemons/mm-responder.sh
launchctl load ~/Library/LaunchAgents/com.john.mm-responder.plist
# 3. Update health-check.js daemon list (add mm-responder, remove ops-agent)
Metrics
Check via MC:
node ~/system/tools/mc.js stats # Task creation stats
node ~/system/tools/mc.js list --owner ops # Tasks created by ops-agent
Check via state:
cat ~/system/agents/state/ops.json # Cumulative stats
cat /tmp/ops-agent-state.json # Current cycle stats
Created: 2026-02-10 Last Updated: 2026-02-10 Next Review: 2026-03-10
Service Registry
Last Verified: 2026-02-17 | Owner: John
Service Registry — ALAI Holding
Last Updated: 2026-02-12 Owner: John (AI Director)
Domains
| Domain | Registrar | Nameservers | Points To | Purpose | Renewal |
|---|---|---|---|---|---|
| basicconsulting.no | one.com | Cloudflare | Cloudflare Tunnel | Consulting brand | Check one.com |
| mm.basicconsulting.no | — | Cloudflare | Tunnel → localhost:8065 | Mattermost | — |
| sign.basicconsulting.no | — | Cloudflare | Tunnel → localhost:3003 | Documenso | — |
| boards.basicconsulting.no | — | Cloudflare | Tunnel → localhost:3100 | Planka | — |
| vault.basicconsulting.no | — | Cloudflare | Tunnel → localhost:8200 | Vaultwarden | — |
| alai.no | one.com | Vercel | Vercel | ALAI Holding website | Check one.com |
| getdrop.no | one.com | Vercel (pending) | Vercel → drop-landing | Drop fintech landing | Check one.com |
| basicfakta.no | one.com | Vercel | Vercel | BasicFakta SaaS | Check one.com |
Hosting & Deploy
| Service | Platform | URL | Deploy Method |
|---|---|---|---|
| Drop landing | Vercel | getdrop.no | vercel --prod from ~/ALAI/products/Drop/landing |
| ALAI website | Vercel | alai.no | vercel --prod from ~/ALAI/web |
| BasicFakta | Vercel | basicfakta.no | TBD |
Local Services (Mac Studio M3 Ultra, 96GB)
| Service | Type | Port | Domain | Purpose | Status |
|---|---|---|---|---|---|
| Mattermost | Docker | 8065 | mm.basicconsulting.no | Team chat | Active |
| Planka | Docker | 3100 | boards.basicconsulting.no | Kanban boards | Active |
| Documenso | Docker | 3003 | sign.basicconsulting.no | E-signatures | Active |
| BookStack | Docker | 6875 | localhost only | Internal wiki | Active |
| Vaultwarden | Docker | 8200 | vault.basicconsulting.no | Password manager | Active |
| MC Dashboard | Node.js | 3030 | localhost (LAN) | Mission Control | Active |
| Ollama | Native | 11434 | localhost | Local AI | Active |
| n8n | Docker | 5678 | localhost | Workflow automation | Active |
| MinIO | Docker | 9000 | localhost | S3 storage (Documenso) | Active |
Cloudflare
| Item | Value |
|---|---|
| Account ID | d0ac2afb6bb5b298723b85a114151a04 |
| Tunnel ID | 3315a609-7934-45c5-ad0c-56d86d16374d |
| CLI | /opt/homebrew/bin/cloudflared |
| Zone | basicconsulting.no |
Email
| Address | Provider | Purpose |
|---|---|---|
| john@basicconsulting.no | one.com | Support / John agent |
| info@basicconsulting.no | one.com | Edita / general |
| alem@basicconsulting.no | one.com | CEO |
| post@alai.no | TBD | Drop + ALAI public contact |
Accounts & SaaS
| Service | URL | Purpose | Owner |
|---|---|---|---|
| Vercel | vercel.com | Static hosting | john-3447 |
| Cloudflare | dash.cloudflare.com | DNS, tunnel, CDN | Alem |
| one.com | one.com | Domain registrar + email | Alem |
| GitHub | github.com | Code repos | TBD |
| Fiken | fiken.no | Accounting | Alem |
| Flowcase | everdeen.flowcase.com | CV management | Alem |
Daemons (LaunchAgents)
| Daemon | Interval | Purpose |
|---|---|---|
| com.john.ops-agent | 5 min | MM monitoring, health, auto-fix |
| com.john.mc-dashboard | always | Web dashboard :3030 |
| com.john.mc-session-worker | events | Session state extraction |
| com.john.morning-routine | 07:00 | Daily briefing |
| com.john.agentforge | 4h | Auto-audit agents |
| com.john.mm-bridge | 5s poll | Alem→John chat (#ceo) |
| com.edita.autowork | 30 min | Background task worker |
| com.john.health-check | 5 min | Service health monitoring |
| com.john.email-agent | 5 min | Email triage |
| com.john.intake-watcher | 5 min | Email→task pipeline |
| com.edita.job-hunter | periodic | Opportunity scanning |
Maintenance Notes
- Domain renewals: All on one.com — check annually
- SSL: Vercel = auto (Let's Encrypt), Cloudflare = auto
- Docker updates: docker compose pull in ~/system/services/{service}/
- Backups: bash ~/system/tools/db-backup.sh (daily via daemon)
Daemons & Services
Tools Manifest
CHECK THIS BEFORE CREATING NEW TOOLS. If a tool exists, use it. If you create a new tool, add it here.
TOOL-FIRST PROTOCOL: ~/system/rules/tool-first-protocol.md
Order: Our tools → Our skills → Our knowledge base (HiveMind) → Internet → Update the knowledge base
Last audit: 2026-02-13 — Spring cleaning: 22 deprecated tools archived, 3 empty DBs deleted, 1 broken daemon unloaded, MEMORY.md trimmed 229→184 lines.
Task Management
| Tool | Command | Description |
|---|---|---|
| task.sh | ~/system/tools/task.sh list\|add\|start\|done\|block | Task CLI using Taskwarrior 3 (cross-session) |
| mc.js | node ~/system/tools/mc.js list\|add\|start\|done\|show\|routes | Mission Control - Task management with agent routing |
| mc.js routes | node ~/system/tools/mc.js routes | List available task routes (backend, frontend, devops, qa, bizdev, general) |
| mc.js add --route | node ~/system/tools/mc.js add "Task" --route backend | Create task with route - auto-spawns agent on start |
Task → Agent Routing: MC tasks can be tagged with routes that automatically spawn appropriate Ollama agents when task starts.
- Routes: backend (dev), frontend (designer+dev), devops (devops), qa (auditor), bizdev (marketer), general (dev)
- Agent output is captured and stored in task.agent_output field
- Visible in mc.js show <id> command
- If Ollama unavailable, gracefully degrades (logs error, doesn't block task)
- Agent runs in background via exec() - non-blocking
- Logs to HiveMind on spawn/completion/error
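The route-to-agent mapping listed above can be written as a lookup table. This is a sketch: the mapping itself is taken from the list, while the function name and the fallback to the general route for a missing or unknown route are assumptions rather than documented mc.js behaviour.

```python
# Route -> agents to spawn, as listed in this section.
ROUTE_AGENTS = {
    "backend": ["dev"],
    "frontend": ["designer", "dev"],
    "devops": ["devops"],
    "qa": ["auditor"],
    "bizdev": ["marketer"],
    "general": ["dev"],
}


def agents_for_route(route=None):
    """Return the agents for a route; unknown routes fall back to 'general'."""
    key = (route or "general").strip().lower()
    return list(ROUTE_AGENTS.get(key, ROUTE_AGENTS["general"]))
```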
Briefings & Analysis
| Tool | Command | Description |
|---|---|---|
| ceo-briefing.js | node ~/system/tools/ceo-briefing.js --full | ZAKON #11: All-source CEO briefing (5 email accounts, MC tasks, HiveMind, sessions, daemon briefing). Zero LLM cost. |
| ceo-briefing.js | node ~/system/tools/ceo-briefing.js --quick | Quick boot summary (counts + top items, <500 tokens). Called by boot.sh. |
| ceo-briefing.js | node ~/system/tools/ceo-briefing.js --email | All 5 email accounts: inbox + sent for each. |
| ceo-briefing.js | node ~/system/tools/ceo-briefing.js --followup | Open/blocked MC tasks overview. |
| ceo-briefing.js | node ~/system/tools/ceo-briefing.js --topic "X" | Topic search across sessions + HiveMind + all email accounts. |
| council-briefing.js | node ~/system/tools/council-briefing.js | AI Council: 4 personas (Growth, Revenue, Skeptic, Ops) analyze business data via Ollama. Posts to Slack #exec. Nightly at 22:00. |
| meeting-prep.js | node ~/system/tools/meeting-prep.js [--ics file.ics] [--date YYYY-MM-DD] | Calendar-aware meeting prep: ICS parsing, CRM attendee lookup, pipeline context, contextual notes. |
| council-briefing.js | node ~/system/tools/council-briefing.js --model 70b | Use 70b model for deeper analysis |
| council-briefing.js | node ~/system/tools/council-briefing.js --dry-run | Gather data only, no Ollama/Slack |
| john-morning.sh | bash ~/system/tools/john-morning.sh | Morning routine: Quran, tasks, HiveMind, health, daily synthesis. Daily at 07:00. |
| memory-synthesizer.js | node ~/system/tools/memory-synthesizer.js daily [date] | Summarize day's intel → HiveMind memo. Auto in morning-routine. |
| memory-synthesizer.js | node ~/system/tools/memory-synthesizer.js weekly | Synthesize week → HiveMind memo. Auto Sundays 23:00. |
| memory-synthesizer.js | node ~/system/tools/memory-synthesizer.js promote | Promote weekly → long-term knowledge |
| memory-synthesizer.js | node ~/system/tools/memory-synthesizer.js prune | Delete daily memos >30 days |
| memory-synthesizer.js | node ~/system/tools/memory-synthesizer.js view [tier] | View tiered memory (daily/weekly/longterm) |
Meeting & Transcript Processing
| Tool | Command | Description |
|---|---|---|
| transcript-to-tasks.js | node ~/system/tools/transcript-to-tasks.js <file> | Extract action items from meeting transcript → MC tasks via Ollama |
| transcript-to-tasks.js | node ~/system/tools/transcript-to-tasks.js <file> --preview | Preview extracted actions (no task creation) |
| transcript-to-tasks.js | node ~/system/tools/transcript-to-tasks.js <file> --owner john | Assign all extracted tasks to owner |
Formats: .txt, .md, .srt, .vtt. Tasks prefixed with [TRANSCRIPT].
Health & Quality
| Tool | Command | Description |
|---|---|---|
| drift-detector.js | node ~/system/tools/drift-detector.js snapshot | Behavioral drift analysis engine: records daily metrics from 5 data sources (session claims, verification audits, email-audit.db, mission-control.db, hivemind.db) to drift.db. Anomaly detection with σ-based thresholds. Alerts to HiveMind + Slack. Daily at 23:55 via com.john.drift-detector LaunchAgent. Created: 2026-02-23. |
| drift-detector.js | node ~/system/tools/drift-detector.js analyze [--days N] | Analyze recent metric trends (default: 7 days). Returns trend, per-metric stats, anomalies. |
| drift-detector.js | node ~/system/tools/drift-detector.js report [--days N] | Human-readable drift report (default: 30 days). |
| drift-detector.js | node ~/system/tools/drift-detector.js alert-test | Test alert pipeline (HiveMind + Slack). |
| daemon-health.sh | bash ~/system/daemons/daemon-health.sh | Daemon health monitor with Slack alerts: monitors ALL com.john.* LaunchAgents, sends alerts to #alerts channel for failures/warnings/recoveries, runs every 15 min via LaunchAgent. Created: 2026-02-23. |
| daemon-health.sh | bash ~/system/daemons/daemon-health.sh --status | Show current daemon status (KeepAlive vs interval-based) |
| daemon-health.sh | bash ~/system/daemons/daemon-health.sh --test | Test Slack alert integration |
| stbs-health.js | node ~/system/tools/stbs-health.js | STBS v3 production monitoring: 5 hardening components (SQLite BUSY retry, heartbeat, optimistic lock, approval tokens, session staleness). MC #1724. |
| stbs-health.js | node ~/system/tools/stbs-health.js --json | JSON output (for ops-watchdog integration) |
| stbs-health.js | node ~/system/tools/stbs-health.js --alert | Alert mode (exit 1 if any threshold exceeded) |
| stbs-health.js | node ~/system/tools/stbs-health.js --metric <name> | Check specific metric only |
| md-health.js | node ~/system/tools/md-health.js | Markdown health scanner: broken links, TODOs, empty files, stale dates. Integrated in AgentForge. |
| md-health.js | node ~/system/tools/md-health.js --json | JSON output (for programmatic use) |
| md-health.js | node ~/system/tools/md-health.js --fix-todos | List all TODOs across codebase |
| md-health.js | node ~/system/tools/md-health.js ~/path | Scan specific path |
| doc-index.sh | bash ~/system/tools/doc-index.sh [--output file.json] [--verbose] | Document indexer: scans ~/projects, ~/ALAI, ~/companies for all markdown files. Creates JSON index with metadata (path, category, size, modified). Output: ~/system/databases/doc-index.json |
| doc-index.sh | bash ~/system/tools/doc-index.sh --verbose | Verbose mode: shows progress and breakdown by category |
| bookstack-sync.js | node ~/system/tools/bookstack-sync.js sync | Sync system docs to BookStack wiki (full sync) |
| bookstack-sync.js | node ~/system/tools/bookstack-sync.js status | Show what needs syncing (new/changed/ok) |
| bookstack-sync.js | node ~/system/tools/bookstack-sync.js push | Force overwrite all pages |
| bookstack-sync.js | node ~/system/tools/bookstack-sync.js auto-sync | Auto-sync changed files (daemon mode) |
BookStack Sync v2 Features (2026-02-18):
- Glob expansion: Sources can use "glob": "~/.claude/skills/*/SKILL.md" patterns
- Chapter support: Books can have "chapters" array with grouped sources
- Metadata headers: Auto-prepends source path to synced pages
- Stale page cleanup: Detects deleted source files, removes BookStack pages
- New books: Skills Catalog (113 skills), Hooks Reference (24 hooks), Agent Catalog (35 agents)
| Tool | Command | Description |
|---|---|---|
| bookstack-staleness.js | node ~/system/tools/bookstack-staleness.js | Scan all pages, tag stale ones, generate report |
| bookstack-staleness.js | node ~/system/tools/bookstack-staleness.js --dry-run | Scan and report only (no tagging) |
| bookstack-staleness.js | node ~/system/tools/bookstack-staleness.js --slack | Post report to Slack #general |
| bookstack-webhook-relay.js | Service running on localhost:3077/webhook (internal only) | Receives BookStack webhook events and forwards to Slack |
Backup & Data Protection
| Tool | Command | Description |
|---|---|---|
| db-backup.sh | bash ~/system/daemons/db-backup.sh | Safe daily backup of all SQLite databases using sqlite3 .backup. 30-day retention. Daily at 03:00 via LaunchAgent. |
| db-backup-verify.sh | bash ~/system/tools/db-backup-verify.sh | Verify backup integrity for today's backups. Checks file size and runs PRAGMA integrity_check on all backups. |
Backup Strategy:
- Location: ~/system/backups/databases/
- Format: Individual .db files (not compressed) for granular restore
- Naming: {db-name}-{YYYY-MM-DD}.db
- Integrity: Each backup verified with PRAGMA integrity_check after creation
- Retention: Automatic cleanup of backups older than 30 days
- Logging: ~/system/logs/db-backup.log
- Daemon: com.john.db-backup (LaunchAgent) runs at 03:00 daily
- Databases: 33 SQLite DBs (flywheel, mission-control, knowledge, hivemind, leads, etc.)
BookStack Auto-Sync:
- Daemon: com.john.bookstack-sync (LaunchAgent, runs every 5 min)
- Rate limiting: Max 10 API calls per run
- Lock file: /tmp/bookstack-sync.lock (prevents concurrent runs)
- Last sync tracking: ~/system/services/bookstack/.last-sync
- Logging: ~/system/logs/bookstack-sync.log
- Map: ~/system/config/bookstack-sync-map.json
- State: ~/system/config/bookstack-sync-state.json
- API: https://docs.alai.no (local fallback: http://localhost:6875, via vault.js)
- Created: 2026-02-17 — Auto-syncs ~/system/ docs to BookStack on file changes
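The lock file and the per-run call limit work together to keep runs from overlapping or flooding the API. The sketch below shows one common way to implement both, using exclusive file creation for the lock; it is not the code in bookstack-sync.js, and `SyncLock`, `sync_changed`, and `push` are hypothetical names.

```python
import os

MAX_API_CALLS = 10  # documented per-run limit


class SyncLock:
    """Exclusive lock file; 'held' is False if another run owns the lock."""

    def __init__(self, path):
        self.path = path
        self.held = False

    def __enter__(self):
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return self
        os.write(fd, str(os.getpid()).encode())
        os.close(fd)
        self.held = True
        return self

    def __exit__(self, *exc):
        # Only the run that created the lock file removes it.
        if self.held:
            os.remove(self.path)
            self.held = False
        return False


def sync_changed(paths, push, max_calls=MAX_API_CALLS):
    """Push at most max_calls paths; return (synced, deferred)."""
    synced = list(paths[:max_calls])
    for path in synced:
        push(path)
    return synced, list(paths[max_calls:])
```

A run would wrap its work in `with SyncLock("/tmp/bookstack-sync.lock") as lock:` and exit early when `lock.held` is False; deferred paths are picked up by the next 5-minute run.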
BookStack Staleness Monitor:
- Daemon: com.john.bookstack-staleness (LaunchAgent, Sunday 22:00)
- Thresholds: Current (<30d), Needs Review (30-90d), Outdated (>90d)
- Tagging: Applies "staleness" tag to stale pages via API
- Reporting: Weekly Slack report to #general
- Logging: ~/system/logs/bookstack-staleness-launchd.log
- Created: 2026-02-17 — Task #1272 BookStack Activation
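The three thresholds above map a page's age to a label. A minimal sketch; the function name is hypothetical, and a page exactly 90 days old is placed in "Needs Review" because the list defines "Outdated" as more than 90 days.

```python
def staleness(age_days):
    """Classify a page by days since its last update."""
    if age_days < 30:
        return "Current"
    if age_days <= 90:
        return "Needs Review"
    return "Outdated"
```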
BookStack Webhook Relay:
- Daemon: com.john.bookstack-webhook-relay (LaunchAgent, auto-start)
- Port: localhost:3077/webhook (internal relay, not user-facing)
- Function: Receives BookStack webhook POST → formats message → posts to Slack #all-alai
- Events: page_create, page_update, page_delete, chapter/book/shelf events
- Logging: ~/system/logs/bookstack-webhook.log
- Setup: Configure webhook in BookStack UI → Settings → Webhooks → Add webhook with endpoint localhost:3077/webhook
- Created: 2026-02-17 — Task #1272 BookStack Activation
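The "formats message" step of the relay could look like the sketch below. It assumes the incoming payload carries `event`, `text`, and `url` fields, which matches BookStack's documented webhook format as far as is known here but should be verified against a real delivery; the function name and output layout are hypothetical and not taken from bookstack-webhook-relay.js.

```python
def format_slack_message(event):
    """Turn a BookStack webhook payload (dict) into a Slack message payload."""
    name = event.get("event", "unknown_event")
    text = event.get("text") or name
    line = f"[{name}] {text}"
    url = event.get("url")
    if url:
        # Slack link syntax: <url|label>
        line += f" <{url}|Open>"
    return {"channel": "#all-alai", "text": line}
```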
API Utilities
| Tool | Command | Description |
|---|---|---|
| api-fallback.js | require('./api-fallback') | Tiered API fallback + caching. fetchWithFallback(key, tiers, opts) tries each tier, caches result. |
| api-fallback.js | node ~/system/tools/api-fallback.js cache-stats | Show cache stats |
| api-fallback.js | node ~/system/tools/api-fallback.js cache-clear | Clear API cache |
Cache: ~/system/cache/api-fallback/ (file-based, per-key, TTL-aware)
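The tiered-fallback pattern can be illustrated in a few lines. This is a Python sketch of the general technique, not a port of api-fallback.js: it uses an in-memory dict instead of the file-based cache, and the signature, default TTL, and error handling are assumptions.

```python
import time


def fetch_with_fallback(key, tiers, cache, ttl_s=300.0, now=time.time):
    """Return a cached value for key, or the first tier result that succeeds.

    tiers is an ordered list of zero-argument callables (preferred source
    first). cache is a dict mapping key -> (stored_at, value).
    """
    hit = cache.get(key)
    if hit is not None and now() - hit[0] < ttl_s:
        return hit[1]
    errors = []
    for tier in tiers:
        try:
            value = tier()
        except Exception as exc:  # any failure moves on to the next tier
            errors.append(exc)
            continue
        cache[key] = (now(), value)
        return value
    raise RuntimeError(f"all {len(tiers)} tiers failed for {key!r}: {errors}")
```

Ordering the tiers from most to least preferred means cheaper or less accurate sources are only consulted when the better ones fail, and the cache avoids repeating that walk on every call.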
Usage Tracking
| Tool | Command | Description |
|---|---|---|
| usage-tracker.js | node ~/system/tools/usage-tracker.js log <agent> <model> <in> <out> | Log AI call usage (auto-hooked in agent-runner.js + council-briefing.js) |
| usage-tracker.js | node ~/system/tools/usage-tracker.js stats | Usage summary (today, month, all-time) |
| usage-tracker.js | node ~/system/tools/usage-tracker.js stats --agent <name> | Per-agent breakdown |
| usage-tracker.js | node ~/system/tools/usage-tracker.js stats --month | Daily breakdown this month |
| usage-tracker.js | node ~/system/tools/usage-tracker.js top | Top agents by cost |
| usage-tracker.js | node ~/system/tools/usage-tracker.js recent [limit] | Recent calls |
DB: ~/system/db/usage.db (SQLite). Auto-logged from agent-runner.js (Ollama) and council-briefing.js.
Session Tracking
| Tool | Command | Description |
|---|---|---|
| session-ledger.sh | Auto (Stop/PreCompact hook) | Deterministic session extraction (files, commands, topics, errors, git) |
| session-search.sh | bash ~/system/tools/session-search.sh topic\|file\|task\|keyword\|errors\|recent | Search sessions |
| daily-consolidate.sh | bash ~/system/tools/daily-consolidate.sh [YYYY-MM-DD] | Consolidate day's sessions into daily log |
| weekly-digest.sh | bash ~/system/tools/weekly-digest.sh [YYYY-MM-DD] | Generate weekly summary |
Session files: ~/system/memory/sessions/YYYY-MM-DD-HHMM-sessionid.md
Memory
| Tool | Command | Description |
|---|---|---|
| hivemind.js | node ~/system/agents/hivemind/hivemind.js read [agent] [limit] | Read shared intelligence (replaces memory-lookup.js) |
| hivemind.js | node ~/system/agents/hivemind/hivemind.js post <agent> <type> <msg> | Post intel |
| hivemind.js | node ~/system/agents/hivemind/hivemind.js query <search> | Search intel |
| hivemind.js | node ~/system/agents/hivemind/hivemind.js memo save\|get\|search\|list | Key-value memory store |
| facts.js | node ~/system/tools/facts.js save\|get\|list\|correct\|history\|display\|search\|seed | Long-running critical facts — SQLite event-sourced memory that survives context compression. Boot-injected. |
| facts.js display | node ~/system/tools/facts.js display | Compact boot output of all critical facts |
| facts.js seed | node ~/system/tools/facts.js seed [--force] | Populate/reset initial seed data |
| memory-indexer.py | python ~/system/tools/memory-indexer.py | Index memory for search |
Communication
| Tool | Command | Description |
|---|---|---|
| slack.js | node ~/system/tools/slack.js send <channel> "msg" | Send plain text message to Slack channel |
| slack.js | node ~/system/tools/slack.js sendBlocks <channel> <blocksFile> [fallback] | Send Block Kit formatted message (blocks from JSON file) |
| slack.js | node ~/system/tools/slack.js read <channel> [limit] | Read recent messages from channel |
| slack.js | node ~/system/tools/slack.js channels | List all Slack channels |
| slack.js | node ~/system/tools/slack.js create-channel <name> | Create new channel |
| slack.js | node ~/system/tools/slack.js unread | Check unread messages |
| slack.js | node ~/system/tools/slack.js users | List workspace users |
| slack.js | node ~/system/tools/slack.js status | Check Slack connection |
| slack-blocks.js | node ~/system/tools/slack-blocks.js test [channel] | Slack Block Kit formatting library — test command sends sample to channel |
| slack-blocks.js | require('./slack-blocks') | Module API: builder(), tenderAlert(), tenderDigest(), emailBriefing(), emailEscalation(), weeklyPipeline(), pipelineEvent(), opsAlert(), send() |
| slack-bot.js | node ~/system/tools/slack-bot.js | Slack bot daemon — Claude Haiku via CLI (Socket Mode). AI backend: API → CLI → Ollama |
| slack-bot.js | node ~/system/tools/slack-bot.js --test | Test AI backend connection |
| email-to-task.js | node ~/system/tools/email-to-task.js --from "x" --subject "y" --message-id "z" --class ACTION [--priority high] | Auto-create MC tasks from ACTION emails with deduplication |
| email-to-task.js | node ~/system/tools/email-to-task.js --status | Show email classification stats |
| email-inbox.js | node ~/system/tools/email-inbox.js status | SQLite-backed email inbox — per-account stats (john, info, alai) |
| email-inbox.js | node ~/system/tools/email-inbox.js pending | List unanswered ACTION emails |
| email-inbox.js | node ~/system/tools/email-inbox.js search "keyword" | Full-text search in subject/from/sender name |
| email-inbox.js | node ~/system/tools/email-inbox.js mark <id> responded\|archived\|read\|ignored | Update email status |
| email-inbox.js | node ~/system/tools/email-inbox.js stale [hours] | Show emails unanswered > N hours (default 48) |
| email-inbox.js | node ~/system/tools/email-inbox.js insert --message-id "x" --account john --from-addr "x" --subject "x" --classification ACTION --priority high | Insert email into inbox DB |
| MCP email | mcp__email__emails_find | Search emails (sender, subject, date, folder). Account: "john" or "info" |
| MCP email | mcp__email__email_send | Send emails (to, subject, body, HTML, attachments) |
| MCP email | mcp__email__email_respond | Reply/forward with proper threading |
| MCP email | mcp__email__emails_modify | Mark read/unread, flag, archive, move |
| MCP email | mcp__email__folders_list | List all email folders |
| mail-native.js | node ~/system/tools/mail-native.js search\|read\|send\|reply\|forward\|folders\|unread\|flag\|move\|attachment\|test | Direct IMAP/SMTP CLI — zero MCP dependency. Works from daemons, agents, interactive. Supports --folder and --account params. |
| email-audit.js | node ~/system/tools/email-audit.js find\|stats\|recent | Centralized audit logger for ALL email operations. DB: email-audit.db. Module API: logEmail(), findEmails(), stats(), recent(). |
EMAIL RULE: ALL email operations use the MCP email tools (custom: email-mcp-bridge.js).
- Two accounts: john@alai.no (account="john"), info@alai.no (account="info")
- Server: ~/system/tools/email-mcp-bridge.js (ImapFlow + Nodemailer, wraps our proven stack)
- Configured in ~/.claude/mcp.json mcpServers.email
- Credentials: Vaultwarden (vault.js) — vault items "Email - john@alai.no", "Email - info@alai.no"
- CLI fallback: ~/system/tools/mail-native.js (for daemons and background agents that have no MCP)
- Audit trail: Every sent email is logged to ~/system/databases/email-audit.db via email-audit.js
Slack: alai-talk.slack.com (channels: ops, development, client-support, exec)
Credential Management (Vaultwarden)
| Tool | Command | Description |
|---|---|---|
| vault.js | node ~/system/tools/vault.js get <name> | Get password from Vaultwarden by item name |
| vault.js | node ~/system/tools/vault.js get <name> --field <field> | Get specific field (custom field, username, notes) |
| vault.js | node ~/system/tools/vault.js get <name> --json | Get full item as JSON |
| vault.js | node ~/system/tools/vault.js add <name> <user> <pass> [opts] | Create new vault item (--uri, --notes, --field k=v, --hidden-field k=v) |
| vault.js | node ~/system/tools/vault.js list | List all vault items |
| vault.js | node ~/system/tools/vault.js login | Interactive unlock + cache session (no TTL, /tmp/bw-session) |
| vault.js | node ~/system/tools/vault.js migrate | Migrate 10 config files to vault (one-time) |
| vault.js | node ~/system/tools/vault.js sync | Force sync with Vaultwarden server (clears cache) |
| vault.js | node ~/system/tools/vault.js refresh | Force reload in-memory credential cache |
| password-share.js | node ~/system/tools/password-share.js create\|retrieve\|list\|cleanup\|audit | Secure one-time password sharing with clients |
| client-vault.js | node ~/system/tools/client-vault.js init\|add\|list\|get\|rotate\|check-rotation | Per-client encrypted credential storage |
Vault Module API (for other tools):
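const os = require('os');
const path = require('path');
// Node does not expand "~" in require(), so resolve the home directory explicitly.
const vault = require(path.join(os.homedir(), 'system/tools/vault.js'));
const pass = await vault.get('Email - john@alai.no'); // await must run inside an async function
const token = await vault.get('Slack Bot', 'token');
const val = await vault.getWithFallback('Slack Bot', 'token', () => jsonFallback());
vault.hasSession(); // boolean, non-throwing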
Session: BW_SESSION env → /tmp/bw-session (0600, no TTL). Session key via env var (NOT in ps aux).
Cache: First call loads all items (~600ms), subsequent <1ms. Refreshes on sync/add/refresh().
Non-TTY: Daemons get VAULT_LOCKED error (no hang). Graceful retry pattern.
Vault items: AWS Console, Microsoft Azure, Vaultwarden Admin, Sentry + 10 migrated services.
Note: vault-helper.js DELETED — all consumers now use vault.js directly.
Agent Infrastructure
| Tool | Command | Description |
|---|---|---|
| agent-reporter.js | node ~/system/tools/agent-reporter.js --task <id> --agent <name> --status <status> --summary <text> | Structured agent output — validates against schema, stores in mission-control.db, emits events, posts to HiveMind |
| agent-reporter.js | node ~/system/tools/agent-reporter.js --help | Show usage and examples |
| agent-reporter.js | node ~/system/tools/agent-reporter.js --task 937 --agent B1 --status completed --summary "..." --deliverables '[...]' | Full structured report with deliverables, metrics, evidence |
| schema-validator.py | PostToolUse hook on TaskUpdate | Validates agent output JSON against agent-output-schema.json, logs violations to /tmp/schema-violations.log (warning-only, never blocks) |
| goal-verifier.js | node ~/system/tools/goal-verifier.js --task <id> | Automated goal verification — reads goal-schema.json, runs verification commands, updates statuses, stores in goals.db, emits events |
| goal-verifier.js | node ~/system/tools/goal-verifier.js --help | Show usage, goal types, and operators |
| goal-verifier.js | node ~/system/tools/goal-verifier.js --task 937 --verbose | Run verification with detailed output per goal |
| goal-verifier.js | node ~/system/tools/goal-verifier.js --task 937 --dry-run | Preview what would be verified without running commands |
| agent-worker.js | node ~/system/tools/agent-worker.js | Local-model-first agent worker — polls MC, executes via Ollama tool agent, queues complex tasks for human |
| agent-worker.js | node ~/system/tools/agent-worker.js --once | Run single cycle then exit |
| agent-worker.js | node ~/system/tools/agent-worker.js --dry-run | Show next task without executing |
| agent-worker.js | node ~/system/tools/agent-worker.js --status | Show worker status, queue stats |
| agent-worker.js | node ~/system/tools/agent-worker.js --stop | Stop daemon gracefully |
| human-queue.js | node ~/system/tools/human-queue.js list | Show all tasks queued for human review |
| human-queue.js | node ~/system/tools/human-queue.js claim <id> | Claim task (remove from queue, resume in MC) |
| human-queue.js | node ~/system/tools/human-queue.js stats | Queue statistics (by priority, reason, age) |
| human-queue.js | node ~/system/tools/human-queue.js clear | Clear entire human queue |
| human-queue.js | node ~/system/tools/human-queue.js notify | Send Slack summary if queue > 0 |
Agent Output Schema: ~/system/specs/agent-output-schema.json (JSON Schema draft-07)
DB Table: mission-control.db.agent_reports (task_id, agent, status, summary, report_json)
Event: agent.report emitted to event bus on report submission
Created: 2026-02-15 (MC #937 Phase 1)
Goal Schema: ~/system/specs/goal-schema.json (JSON Schema draft-07)
DB: ~/system/databases/goals.db (goals, goal_history tables)
Verification: verification-gate.py enforces goal verification for H/M priority tasks (if goal-schema.json present)
Events: goal.verified, goal.failed emitted to event bus
Created: 2026-02-15 (MC #937 Phase 4)
Subagents (~/.claude/agents/)
| Agent | Role | Description |
|---|---|---|
| builder.md | Build | Implements ONE task using GOTCHA, self-validates, reports via agent-reporter.js or TaskUpdate |
| validator.md | Verify | Read-only GOTCHA compliance check + acceptance criteria, reports via agent-reporter.js |
Local AI (Ollama on Mac Studio M3 Ultra)
2 Tools — Executor + Orchestrator
| Tool | Command | Description |
|---|---|---|
| agent-runner.js | node ~/system/tools/agent-runner.js <agent> --task "X" | Executor — sends ONE task to Ollama with agent identity + state |
| agent-runner.js | node ~/system/tools/agent-runner.js list | List all agents with status |
| agent-scheduler.js | node ~/system/kernel/agent-scheduler.js spawn <agent> <task> | Orchestrator — forks agent-runner.js as child processes for parallel execution |
| team-coordinator.js | node ~/system/kernel/team-coordinator.js assign\|execute\|status\|message\|sync | Team Orchestrator — multi-team coordination (Backend/Frontend/DevOps/QA) with cross-team messaging |
Relationship: agent-scheduler.js spawns agent-runner.js. Runner = single agent. Scheduler = multi-agent. team-coordinator.js uses scheduler for team execution.
What agents do: Generate text responses via Ollama. They don't execute anything.
State: ~/system/agents/state/*.json (persists between runs)
Identities: ~/system/agents/identities/*.md (15 agents)
| Tool | Command | Description |
|---|---|---|
| offline-mode.js | node ~/system/tools/offline-mode.js status | Offline Mode — check Ollama readiness for Claude fallback |
| offline-mode.js | node ~/system/tools/offline-mode.js run "task" | Route task to best local model (auto-detects type) |
| offline-mode.js | node ~/system/tools/offline-mode.js run "task" --agent dev | Use specific agent identity |
| offline-mode.js | node ~/system/tools/offline-mode.js run "task" --text-only | Text-only mode (no tool execution) |
| offline-mode.js | node ~/system/tools/offline-mode.js queue | Show outputs waiting for Claude review |
| offline-mode.js | node ~/system/tools/offline-mode.js capabilities | What local models can/can't do |
| offline-mode.js | node ~/system/tools/offline-mode.js batch tasks.txt | Run tasks from file (one per line) |
| offline-mode.js | node ~/system/tools/offline-mode.js enable\|disable | Toggle offline mode on/off |
| offline-mode.js | node ~/system/tools/offline-mode.js whitelist | Show safe read-only commands allowed offline |
| offline-mode.js | node ~/system/tools/offline-mode.js check "command" | Check if command is whitelisted for offline use |
Offline Mode: When Claude API hits usage limits, switch to local Ollama models. Auto-routes tasks to best model (qwen-coder for code, 70b for reasoning, 8b for trivial). All outputs saved to ~/system/offline-queue/ with NEEDS_REVIEW status. Claude reviews when back online. Capability matrix built in — knows what local models can/can't do. Created 2026-02-12.
Ollama Background Workers (~/system/tools/ollama-workers/)
| Tool | Command | Description |
|---|---|---|
| run-all.sh | bash ~/system/tools/ollama-workers/run-all.sh | Run all background workers (embedding-backfill, session-summarizer, knowledge-scorer) |
| run-all.sh | bash ~/system/tools/ollama-workers/run-all.sh --dry-run | Preview all workers, no writes |
| run-all.sh | bash ~/system/tools/ollama-workers/run-all.sh --status | Check Ollama + Qdrant health |
| knowledge-scorer.js | node ~/system/tools/ollama-workers/knowledge-scorer.js run [--limit N] [--offset ID] [--dry-run] | Score and tag Qdrant 'knowledge' entries: quality_score (1-5) + category via llama3.1:8b. Skips already-scored. Default limit 500/run. |
| embedding-backfill.js | node ~/system/tools/ollama-workers/embedding-backfill.js run [--db knowledge\|hivemind\|flywheel\|all] [--limit N] [--dry-run] | Find rows with NULL embeddings across knowledge.db/hivemind.db/flywheel.db, batch-embed via Ollama bge-m3 (batches of 32), write BLOB back to SQLite, upsert to Qdrant. |
Workers: Idempotent (skip already-processed). Safe to run repeatedly. Use --dry-run to preview. Logs to ~/system/logs/ollama-workers/.
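The idempotent, batched processing described above can be sketched as two small helpers: one that selects only rows still missing an embedding, and one that yields batches of 32. The row shape (`{ id, embedding }`) and both function names are assumptions for illustration, not the workers' actual code.

```javascript
// Sketch only: row shape and helper names are hypothetical.
const BATCH_SIZE = 32; // batch size stated for embedding-backfill.js

// Idempotency: rows that already have an embedding are skipped,
// so running the worker again does no duplicate work.
function pendingRows(rows) {
  return rows.filter((row) => row.embedding == null);
}

// Yield consecutive slices of at most `size` rows.
function* batches(rows, size = BATCH_SIZE) {
  for (let i = 0; i < rows.length; i += size) {
    yield rows.slice(i, i + size);
  }
}

const rows = Array.from({ length: 70 }, (_, id) => ({ id, embedding: null }));
for (const batch of batches(pendingRows(rows))) {
  console.log(`would embed ${batch.length} rows`);
}
```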
Tier Routing (CC Rate Limit Optimization)
| Tool | Command | Description |
|---|---|---|
| ollama-engine.js | require('./ollama-engine') | Centralized Ollama API — generate(), classify(), healthCheck(). Consolidates duplicated Ollama HTTP code from 5+ files. |
| ollama-engine.js | node ~/system/tools/ollama-engine.js test | Run health check + generate test |
| tier-router.js | require('./tier-router') | Central AI Router — classify(caller, task) → {tier, engine, model}. Routes tasks to Ollama (local) or human-queue. NO CC/API. |
| tier-router.js | node ~/system/tools/tier-router.js test | Run routing tests |
| tier-router.js | node ~/system/tools/tier-router.js classify <caller> <task> | Test classification for caller+task |
| tier-router.js | node ~/system/tools/tier-router.js stats | Show routing stats (ollama vs human-queue) |
| ollama-tool-agent.js | node ~/system/tools/ollama-tool-agent.js --task "X" --model Y | Ollama + Tools — multi-turn agent with read-only tools (read_file, glob, grep, list_dir, run_cmd). Replaces CC for explore/validate tasks. |
| ollama-tool-agent.js | node ~/system/tools/ollama-tool-agent.js --task "X" --verbose | Verbose mode (show tool calls) |
Tier Routing Architecture:
- Tier 1 (Ollama 8b): classify, filter, extract, triage
- Tier 2 (Ollama 72b): summarize, draft, analyze, research, review
- Tier 2c (Ollama coder:32b): code review, debug, simple fix
- Tier 3 (CC Sonnet): multi-file coding, architecture
- Tier 4 (CC Opus): interactive sessions only
- Config: ~/system/config/tier-routing.json (caller→tier mapping, keywords, fallback)
- Integration: agent-worker.js routes tasks through tier-router before execution
- Fallback: Ollama failure → auto-escalate to CC
- Created: 2026-02-16
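A rough sketch of keyword-based tier classification, using the tier keywords listed above. The `TIERS` table, the substring matching, and the human-queue result for unmatched tasks are assumptions for illustration; the real mapping lives in tier-routing.json and may work differently. Model labels are copied from the tier list as written.

```javascript
// Sketch only: not the logic in tier-router.js.
// More specific phrases ("code review") are listed before generic ones ("review").
const TIERS = [
  { tier: '2c', engine: 'ollama', model: 'coder:32b', keywords: ['code review', 'debug', 'simple fix'] },
  { tier: '1', engine: 'ollama', model: '8b', keywords: ['classify', 'filter', 'extract', 'triage'] },
  { tier: '2', engine: 'ollama', model: '72b', keywords: ['summarize', 'draft', 'analyze', 'research', 'review'] },
];

function classify(caller, task) {
  const text = task.toLowerCase();
  for (const { keywords, ...route } of TIERS) {
    if (keywords.some((keyword) => text.includes(keyword))) {
      return { caller, ...route };
    }
  }
  // Nothing matched a local tier: hand the task to the human queue.
  return { caller, tier: 'human', engine: 'human-queue', model: null };
}

console.log(classify('agent-worker', 'Code review for the intake module'));
console.log(classify('agent-worker', 'Triage new inbox items'));
```

Plain substring matching is crude (for example "draft" also matches "redraft"), so a production router would likely want word boundaries or per-caller overrides.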
Models
| Model | Size | Use For |
|---|---|---|
| qwen2.5-coder:32b | 19GB | Coding, debugging, refactoring |
| llama3.1:70b | 40GB | Research, writing, analysis |
| llama3.1:8b | 5GB | Fast validation, simple queries |
Routing & Decision
| Tool | Command | Description |
|---|---|---|
| route.js | node ~/system/tools/route.js project <name> | Lookup project (internal/external) |
| route.js | node ~/system/tools/route.js query "<request>" | Match request to company by routes |
| route.js | node ~/system/tools/route.js list | List all projects and companies |
| route.js | node ~/system/tools/route.js add <name> <type> | Add project to registry |
| decision.js | node ~/system/tools/decision.js log <key> <decision> [--by alem] [--tags X] [--task ID] [--rationale "..."] [--evidence "path"] [--supersedes ID] | Decision audit log — queryable decision trail with rationale, evidence, and supersede chains. Stores in mission-control.db decisions table. |
| decision.js | node ~/system/tools/decision.js list [--tags X] [--since DATE] [--by alem] [--limit N] | List all decisions (optionally filtered by tags, date, or author) |
| decision.js | node ~/system/tools/decision.js query "<term>" | Full-text search across key+decision+rationale |
| decision.js | node ~/system/tools/decision.js show <id> | Show single decision with history chain and supersede references |
| decision.js | node ~/system/tools/decision.js history <key> | All decisions for a specific key (newest first), shows decision evolution |
| decision.js | node ~/system/tools/decision.js latest [--limit 10] | Most recent decisions (default 10) — used in boot display for Alem |
| decision.js | node ~/system/tools/decision.js stats | Decision statistics: count by tag, by decided_by, by month |
Database: ~/system/databases/mission-control.db (decisions table)
Registry: ~/system/databases/projects.json
Event Bus
| Tool | Command | Description |
|---|---|---|
| event-bus.js | node ~/system/tools/event-bus.js emit <type> <json> [--publisher X] | SQLite event bus — async emit/subscribe/dispatch. Decouples tools from point-to-point execSync. |
| event-bus.js | node ~/system/tools/event-bus.js list [--type X] [--status X] [--limit N] | List events (supports * wildcard for type) |
| event-bus.js | node ~/system/tools/event-bus.js show <id> | Show event details with payload |
| event-bus.js | node ~/system/tools/event-bus.js replay <id> | Re-process a failed/completed event |
| event-bus.js | node ~/system/tools/event-bus.js dead-letter list\|resolve\|replay | Dead letter queue management |
| event-bus.js | node ~/system/tools/event-bus.js stats | Event bus statistics (counts, last 24h by type) |
| event-bus.js | node ~/system/tools/event-bus.js subscriptions list\|register\|seed | Manage handler subscriptions |
| event-bus.js | node ~/system/tools/event-bus.js dispatch [--once] [--interval N] | Start dispatch loop (default 2s) |
| event-handlers.js | require('./event-handlers.js') | All subscriber handlers — task, lead, invoice, draft, email, job events |
| durable-runner.js | node ~/system/tools/durable-runner.js start <name> --steps '["s1","s2"]' [--mc-task <id>] | Durable workflow execution engine with SQLite persistence. Checkpoint/resume capability. Emits events via outbox table. |
| durable-runner.js | node ~/system/tools/durable-runner.js status\|resume\|rollback <workflow-id> | Workflow status, resume from checkpoint, or rollback to step N |
| durable-runner.js | node ~/system/tools/durable-runner.js step-complete <id> <step> [--output '{}'] | Mark step complete with output/files/commits |
| durable-runner.js (module) | const { DurableRunner } = require('./durable-runner') | Module API: createWorkflow(), completeStep(), failStep(), resume(), rollback() |
| chain-runner.js | node ~/system/tools/chain-runner.js run <chain> "<input>" [--mc-task <id>] [--durable] | YAML-defined agent chain orchestrator. DAG-ordered steps, Saga rollback, $INPUT/$ORIGINAL substitution, injection sanitization. |
| chain-runner.js | node ~/system/tools/chain-runner.js list | List all available chains from ~/system/agents/chains/*.yaml |
| chain-runner.js | node ~/system/tools/chain-runner.js show <chain> | Show chain definition with steps, deps, timeouts |
| chain-runner.js | node ~/system/tools/chain-runner.js resume <workflow-id> | Resume a durable chain workflow from checkpoint |
| chain-runner.js (module) | const { ChainRunner } = require('./chain-runner') | Module API: loadChain(), run(), listChains(), showChain(), resolveAgent() |
Event Bus Architecture (Transactional Outbox Pattern):
- Domain tools (mc.js, sales-pipeline.js, invoice-generator.js, drafts.js, durable-runner.js) write events to outbox table in their own domain DB — same transaction as domain data. Atomic: if domain write succeeds, event is guaranteed.
- Daemon tools (email-agent.js, job-hunter-agent.js) use direct bus.emit() — no domain DB, fire-and-forget.
- Two-daemon pipeline:
  - outbox-processor.js (2s poll): reads outbox tables from durable-runner.db + mission-control.db → emits to event-bus → marks processed. Purges old events (7d+).
  - event-dispatcher.js (2s poll): relays outbox from legacy domain DBs (leads, invoices, drafts, tenders) → dispatches all events.db events to handlers.
- Handlers in event-handlers.js process events (Slack, HiveMind, Planka, leads, MC tasks, etc.)
- Retry: 3 attempts with backoff (0s → 30s → 2min) → dead letter queue → Slack alert
- DB: ~/system/databases/events.db (central store, separate from domain DBs)
- Outbox tables: durable-runner.db, mission-control.db, leads.db, invoices.db, drafts.db, tenders.db
- Daemons: com.john.outbox-processor (durable-runner + MC), com.john.event-dispatcher (legacy DBs + dispatch)
- Event types: task.\*, lead.\*, invoice.\*, draft.\*, workflow.\*, step.\*, email.\*, job.\*, tender.\*, intake.\*, proposal.\*, follow_up.\*, contract.\*
- Integrated tools: durable-runner.js, mc.js, sales-pipeline.js, invoice-generator.js, drafts.js (outbox), email-agent.js, job-hunter-agent.js (direct emit)
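The retry policy above (3 attempts with backoff 0s → 30s → 2min, then dead letter) can be sketched as a pure scheduling function. This reads the three delays as the wait before attempts 1, 2, and 3, which is an interpretation; the `attempts` field and the `nextStep` name are hypothetical and not taken from event-bus.js.

```javascript
// Sketch only: assumed interpretation of the documented backoff schedule.
const BACKOFF_MS = [0, 30_000, 120_000]; // wait before attempt 1, 2, 3

// event.attempts = number of delivery attempts that have already failed.
function nextStep(event, nowMs) {
  if (event.attempts >= BACKOFF_MS.length) {
    // All three attempts used up: park the event for manual handling.
    return { action: 'dead-letter' };
  }
  return { action: 'retry', runAt: nowMs + BACKOFF_MS[event.attempts] };
}

console.log(nextStep({ attempts: 0 }, Date.now())); // first attempt runs immediately
console.log(nextStep({ attempts: 3 }, Date.now())); // goes to the dead letter queue
```

Keeping the schedule in one array makes the policy easy to audit against the dead-letter entries shown by `event-bus.js dead-letter list`.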
GOTCHA Core
| Tool | Command | Description |
|---|---|---|
| utils.js | require('~/system/lib/utils') | Shared utility library (log, file, path, time, validate) |
| sales-pipeline.js | node ~/system/tools/sales-pipeline.js add\|list\|show\|advance\|stats\|forecast\|auto-actions | Lead CRM — tracks leads from prospect to won/lost. Auto-actions: archive old leads (lost >30d), escalate stale proposals (>14d no activity) |
| outbound.js | node ~/system/tools/outbound.js start\|list\|stats | Cold outreach prospecting — 3-email sequence (Day 1 intro, Day 3 follow-up, Day 7 final). Creates lead (cold_email), drafts intro email (LOW risk), schedules Day 3+7 reminders. Tags leads with outbound-seq. |
| email-to-contact.js | node ~/system/tools/email-to-contact.js backfill | Auto-populate contacts.db from email classifications. Creates contacts, logs interactions, skips spam/own. |
| email-to-contact.js | node ~/system/tools/email-to-contact.js stats | CRM import statistics (auto-imported vs manual, interactions) |
| contacts.js | node ~/system/tools/contacts.js add\|list\|show\|search\|update\|log\|tag\|stats | Central contact database — all partners, clients, brokers, vendors |
| contacts.js | node ~/system/tools/contacts.js export-n8n | Export n8n-monitored emails for Known Contact workflow |
| contacts.js | node ~/system/tools/contacts.js import-leads | Import contacts from leads.db |
| unified-crm.js | node ~/system/tools/unified-crm.js pipeline\|client\|search\|dashboard | READ-ONLY integration layer across 5 databases (contacts, leads, invoices, tickets, MC tasks) |
| contract-manager.js | node ~/system/tools/contract-manager.js add\|list\|show\|renew\|terminate\|renewal-check\|status | Contract lifecycle management — tracks contract status (draft→sent→signed→active→expired→terminated), auto-renewal alerts, MC task creation, Slack notifications. DB: contracts.db. Types: NDA, DPA, contract, SLA, MSA. |
| contract-manager.js | node ~/system/tools/contract-manager.js renewal-check [--dry-run] | Check for contracts expiring within 30 days, create MC renewal tasks (auto-renew only), send Slack alerts to #ops |
| document-store.js | node ~/system/tools/document-store.js store <client> <type> <file> | Document storage & retention system — organizes business documents with retention policies. Standard path: ~/ALAI/clients/{client}/documents/{type}/. Types: contract (10y), nda (5y), invoice (5y), proposal (2y), dpa (10y), agreement (10y), signed (10y). DB: documents.db |
| document-store.js | node ~/system/tools/document-store.js list [client] [--type TYPE] | List documents with optional filters |
| document-store.js | node ~/system/tools/document-store.js find <search> | Search documents by client/filename/notes |
| document-store.js | node ~/system/tools/document-store.js retention-check | Flag documents past retention period (non-destructive) |
| document-store.js | node ~/system/tools/document-store.js stats | Storage statistics by type and client |
| send-signing-email.js | node ~/system/tools/send-signing-email.js send\|send-single\|test\|check | ALAI branded document signing — creates DocuSeal submission + sends ALAI branded email with embedded logo via SMTP. Standard for all contracts/NDAs/DPAs. Always test first with test command. |
| nda-generator.js | node ~/system/tools/nda-generator.js create <email> --name "Name" --company "Company" | NDA PDF generator + DocuSeal signing flow — generates ALAI-branded NDA PDF via Puppeteer, uploads to DocuSeal, creates submission, sends ALAI branded signing emails. Flags: --preview (local PDF only), --test (send to post@alai.no), --orgnr, --address, --phone, --project. |
| fiken.js | node ~/system/tools/fiken.js status\|companies\|invoices\|contacts\|balances\|dashboard | Fiken API v2 integration — invoices list/show/sync, contacts list/show/sync, bank balances, CEO dashboard data. Syncs to invoices.db + contacts.db. |
| invoice-generator.js | node ~/system/tools/invoice-generator.js create\|list\|show\|pay\|pdf\|send\|remind\|check-overdue\|auto-remind\|dashboard\|stats | Invoice CRUD with VAT, PDF/HTML generation, MCP email draft creation, auto-reminders (3 levels: friendly/firm/urgent), automatic escalation system (Day 7/14/30+) |
| invoice-generator.js | node ~/system/tools/invoice-generator.js auto-remind [--dry-run] | Automatic invoice reminder escalation — Day 7: friendly (LOW risk draft), Day 14: firm (LOW risk draft + Slack), Day 30+: HIGH MC task + URGENT Slack. Norwegian templates. |
| support-ticket.js | node ~/system/tools/support-ticket.js create\|list\|show\|update\|assign\|comment\|stats | Support ticket system with SLA tracking (P1-P4) |
| email-to-ticket.js | node ~/system/tools/email-to-ticket.js --sender "email" --subject "subject" --body "body" --uid uid | Email → ticket bridge — detects support emails, creates tickets, generates ACK drafts, Slack + HiveMind notifications |
| ticket-sla-checker.js | node ~/system/tools/ticket-sla-checker.js | SLA breach detector — monitors open tickets, escalates to Slack #ops, generates escalation drafts, HiveMind logs |
| ticket-resolve-notify.js | node ~/system/tools/ticket-resolve-notify.js --ticket-id TKT-12345 | Resolution notifier — generates client resolution email draft, HiveMind log |
| team-coordinator.js | node ~/system/tools/team-coordinator.js teams\|assign\|handoff\|block\|unblock\|sync\|status | Cross-team orchestration |
| onboard-client.js | node ~/system/tools/onboard-client.js new\|status\|list\|timeline\|undo | One-command client onboarding — orchestrates project scaffold, sales pipeline, support, teams, routing, welcome email, pipeline events, HiveMind |
| expansion-dashboard.js | node ~/system/tools/expansion-dashboard.js [--compact] | Aggregate view: companies, pipeline, invoices, support, teams |
| proposal-gen.js | node ~/system/tools/proposal-gen.js create\|edit\|pdf\|send\|list\|show\|approve\|reject | Professional proposal generator — auto-populates from leads, generates PDF, sends via SMTP (3 templates: standard, landing-page, webapp) |
| pipeline-events.js | node ~/system/tools/pipeline-events.js check-reminders | Stage transition event handlers — auto-triggered by sales-pipeline.js on advance/lose, generates drafts (→ drafts.db), creates reminders (~/system/reminders/), logs to HiveMind, sends Slack notifications. Handlers: onQualified, onProposal, onNegotiating, onWon, onActive, onLost |
| follow-up.js | node ~/system/tools/follow-up.js check [--auto] | Follow-up reminder processor — scans ~/system/reminders/ for due reminders, generates language-aware follow-up drafts (NO/EN/BS), 3 escalation levels (day 3/7/14), Slack alert on day 14 |
| follow-up.js | node ~/system/tools/follow-up.js list | List all pending follow-up reminders with due dates and escalation levels |
| follow-up.js | node ~/system/tools/follow-up.js add <lead_id> <type> <days> | Manually create follow-up reminder (types: proposal, inquiry) |
| drafts.js | node ~/system/tools/drafts.js list\|show\|approve\|reject\|send\|stats | Draft approval workflow — 3-level risk classification (low/medium/high), content-based pattern matching, smart auto-approval |
| drafts.js | node ~/system/tools/drafts.js process-auto [--dry-run] | Auto-classify and process all pending drafts (LOW→approve+send, MEDIUM→approve+Slack+send, HIGH→manual) |
| drafts.js | node ~/system/tools/drafts.js auto-approve [--type type1,type2] | Auto-approve low-risk drafts (optional type filter) |
| drafts.js | node ~/system/tools/drafts.js mark-sent <id> [--message-id mid] | Mark draft as sent (updates linked invoice status) |
| drafts.js | node ~/system/tools/drafts.js import | Import JSON drafts from ~/system/drafts/ |
| intake-analyzer.js | node ~/system/tools/intake-analyzer.js detect-lang "text" | Language detection (NO/EN/BS) via character markers + word frequency |
| intake-analyzer.js | node ~/system/tools/intake-analyzer.js analyze "text" | Request analysis via Ollama — extracts category/scope/urgency, generates 3 pricing options from Vizu pricing.md |
| intake-analyzer.js (module) | const { detectLanguage, analyzeInquiry, generateOptions } = require('./intake-analyzer') | Module API for client intake pipeline |
intake-analyzer.js: Language detector (æøå→NO, ćčšžđ→BS, word frequency lists) + request analyzer (Ollama llama3.1:8b JSON extraction) + option generator (reads ~/ALAI/pipeline/Vizu/finance/pricing.md, maps category→packages, generates A/B/C options). Heuristic fallback when Ollama unavailable. Pure Node.js, no dependencies. Created: 2026-02-13 (MC #840).
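The two-stage language detection described above (character markers first, then word frequency) might look roughly like this. The word lists and the English default are placeholders chosen for illustration; the real lists in intake-analyzer.js are not shown in this runbook.

```javascript
// Sketch only: word lists and the EN default are assumptions.
const COMMON_WORDS = {
  NO: ['og', 'jeg', 'ikke', 'det', 'har', 'vi', 'til'],
  BS: ['je', 'da', 'se', 'nije', 'za', 'sam', 'ali'],
  EN: ['the', 'and', 'is', 'we', 'to', 'of', 'for'],
};

function detectLanguage(text) {
  const lower = text.toLowerCase();
  // Stage 1: unambiguous character markers.
  if (/[æøå]/.test(lower)) return 'NO';
  if (/[ćčšžđ]/.test(lower)) return 'BS';

  // Stage 2: count hits against each language's common-word list.
  const tokens = lower.split(/[^\p{L}]+/u).filter(Boolean);
  let best = 'EN';
  let bestScore = 0;
  for (const [lang, words] of Object.entries(COMMON_WORDS)) {
    const score = tokens.filter((token) => words.includes(token)).length;
    if (score > bestScore) {
      best = lang;
      bestScore = score;
    }
  }
  return best;
}

console.log(detectLanguage('Vi trenger en ny nettside på norsk'));
console.log(detectLanguage('We need a quote for the new website'));
```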
follow-up.js: Automated follow-up reminder system. Proposal reminders: day 3 (gentle), day 7 (nudge), day 14 (final + Slack). General inquiry: day 5. Language-aware templates (NO/EN/BS) extracted from lead intake analysis. Idempotent processing (marks reminders as processed). Legacy reminder migration: infers missing escalation_level and lang fields from due date and lead notes. Wired into gotcha-health.sh (runs every 15 min). Reminder format: JSON files in ~/system/reminders/ with fields: id, lead_id, type, due_date, escalation_level, created_at, processed, lang. Created: 2026-02-13 (MC #840).
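As a sketch of the proposal escalation schedule and the idempotent due-check described above: the code assumes `due_date` is an ISO `YYYY-MM-DD` string and `escalation_level` is a number from 1 to 3. Neither encoding is confirmed by this runbook, and the helper names are hypothetical.

```javascript
// Sketch only: field encodings are assumptions about the reminder JSON files.
const PROPOSAL_STEPS = [
  { day: 3, level: 1, tone: 'gentle', slack: false },
  { day: 7, level: 2, tone: 'nudge', slack: false },
  { day: 14, level: 3, tone: 'final', slack: true },
];

// Idempotent selection: reminders already marked processed are never returned again.
// ISO date strings compare correctly as plain strings.
function dueReminders(reminders, today) {
  return reminders.filter((r) => !r.processed && r.due_date <= today);
}

function stepFor(reminder) {
  if (reminder.type !== 'proposal') return null; // inquiries have a single day-5 reminder
  return PROPOSAL_STEPS.find((s) => s.level === reminder.escalation_level) ?? PROPOSAL_STEPS[0];
}

const reminders = [
  { id: 'r1', lead_id: 12, type: 'proposal', due_date: '2026-02-16', escalation_level: 3, processed: false, lang: 'NO' },
  { id: 'r2', lead_id: 13, type: 'proposal', due_date: '2026-03-01', escalation_level: 1, processed: false, lang: 'EN' },
];
for (const r of dueReminders(reminders, '2026-02-17')) {
  console.log(r.id, stepFor(r));
}
```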
Image Generation
| Tool | Command | Description |
|---|---|---|
| image-gen.js | node ~/system/tools/image-gen.js --prompt "desc" --output path.png | Generate image via Gemini (free) or Together.ai |
| image-gen.js | node ~/system/tools/image-gen.js --setup gemini YOUR_KEY | Save API key to config |
| image-gen.js | node ~/system/tools/image-gen.js --prompt "desc" --count 4 | Generate multiple images |
Providers: Gemini (default, free, no CC), Together.ai (FLUX, free tier)
Keys: ~/system/config/image-gen.json or env vars GEMINI_API_KEY, TOGETHER_API_KEY
Get key: https://aistudio.google.com/apikey (2 min, no credit card)
| brand-compositor.js | node ~/system/tools/brand-compositor.js all | Deterministic brand asset generator — resize/composite REAL logo (profile-pic.png) onto social banners, profiles, favicons. No AI generation. |
| brand-compositor.js | node ~/system/tools/brand-compositor.js profile\|avatar\|banner-linkedin\|banner-twitter\|og-image\|favicon | Generate specific asset type |
| design-engine.js | node ~/system/tools/design-engine.js render <template> --data '{}' --output path.png | Puppeteer-based HTML/CSS template rendering engine — pixel-perfect typography with Inter font, retina quality |
| design-engine.js | node ~/system/tools/design-engine.js list | List available templates |
Brand Compositor: Uses sharp (npm) for deterministic resize + composite. Same pixels every time. Source: ~/system/context/branding/alai/social/profile-pic.png. Output: ~/system/context/branding/alai/social/. Options: --source <file>, --output <dir>.
Design Engine: Uses Puppeteer (headless Chrome) to render HTML templates with professional typography (kerning, ligatures, OpenType). Templates: linkedin-banner (1584x396), twitter-banner (1500x500), og-image (1200x630), profile-card (400x400), favicon (180x180). Uses {{mustache}} placeholders. Reuses browser for batch rendering. Module export: require('./design-engine'). Options: --data '{"key":"value"}', --output path.png, --scale 2.
Created: 2026-02-10
Intel & News Aggregation
| Tool | Command | Description |
|---|---|---|
| intel-briefing.js | node ~/system/tools/intel-briefing.js | Full daily briefing: fetch RSS + HN, summarize via Ollama, deliver to Slack #exec + HiveMind |
| intel-briefing.js | node ~/system/tools/intel-briefing.js --preview | Preview briefing in terminal |
| intel-briefing.js | node ~/system/tools/intel-briefing.js --fetch | Fetch only: list items without summarization |
| intel-briefing.js | node ~/system/tools/intel-briefing.js --hours 48 | Custom lookback period (default: 24h) |
Sources (7): Anthropic News, Anthropic Engineering, Claude Code Changelog, OpenAI News, TechCrunch AI, Simon Willison, Hacker News API
Summarization: Ollama llama3.1:8b (local, $0 cost)
Delivery: Slack #exec channel + HiveMind + ~/system/logs/intel-briefing-{date}.md
Daemon: com.edita.intel-briefing (daily 7:00 AM)
MCP RSS: @missionsquad/mcp-rss added to Edita MCP config for live RSS queries
Created: 2026-02-11
Tender Hunting & Public Procurement
| Tool | Command | Description |
|---|---|---|
| tender-hunter-agent.js | node ~/system/daemons/tender-hunter-agent.js | Doffin (Norway): TED API scanner for Norwegian IT tenders. Analyzes via Ollama, scores company fit (ALAI), stores in tenders.db. NO Puppeteer, NO Finn.no, NO TheHub. |
| tender-hunter-agent.js | node ~/system/daemons/tender-hunter-agent.js --briefing | Generate briefing from tenders.db (HOT/WARM summary) |
| tender-hunter-agent.js | node ~/system/daemons/tender-hunter-agent.js --dry-run --verbose | Test mode with detailed logging |
| bih-tender-hunter.js | node ~/system/daemons/bih-tender-hunter.js | BiH Tender Hunter: TED API (primary) + ejn.gov.ba (secondary) scanner for BiH IT tenders. Analyzes via Ollama, scores company fit (SnowIT), stores in bih-tenders.db. |
| bih-tender-hunter.js | node ~/system/daemons/bih-tender-hunter.js --briefing | Generate briefing from bih-tenders.db |
| bih-tender-hunter.js | node ~/system/daemons/bih-tender-hunter.js --pages 5 | Custom page count (default: 3) |
| bih-tender-hunter.js | node ~/system/daemons/bih-tender-hunter.js --source ted\|ejn | Filter by data source (default: all) |
| bih-tender-hunter.js | node ~/system/daemons/bih-tender-hunter.js --help | Show usage and options |
Doffin Agent:
- Data Source: TED API (buyer-country = "NOR")
- Keywords: Norwegian + English IT terms
- Scoring: 0-100 (75+ HOT, 55-74 WARM, <55 COLD) — remote, English, tech stack match, framework, team size bonuses; security clearance, on-site, Norwegian-only penalties
- DB: ~/system/databases/tenders.db (tenders + outbox tables)
- Events: tender.hot, tender.warm → event bus
- Delivery: Slack #exec
- Daemon: com.john.tender-hunter (30 min interval)
- Created: 2026-02-15
BiH Agent:
- Data Sources: Tier 1 (TED API buyer-country = "BIH"), Tier 2 (ejn.gov.ba — needs Puppeteer scraper)
- Keywords: Bosnian + English IT terms (digitalizacija, e-usluge, softver, etc.)
- Scoring: 0-100 (75+ HOT, 55-74 WARM, <55 COLD) — BiH-specific bonuses: digitalizacija (+15), transport/railway sector (+10), BAM currency (+10)
- DB: ~/system/databases/bih-tenders.db (tenders + outbox tables with source field: 'ted' or 'ejn')
- Events: tender.hot, tender.warm → event bus
- Delivery: Email reports (primary) + Slack #exec (fallback)
- Daemons: com.snowit.bih-tender-hunter (30 min), com.snowit.bih-tender-briefing (daily 07:30)
- Created: 2026-02-16 (MC #1057)
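The tier thresholds and BiH bonuses listed above can be sketched as follows. The thresholds and point values come from this runbook; the tender object shape (`title`, `sector`, `currency`), the way each signal is detected, and the clamping to 0-100 are assumptions for illustration.

```javascript
// Tier thresholds from the runbook: 75+ HOT, 55-74 WARM, below 55 COLD.
function tierFor(score) {
  if (score >= 75) return 'HOT';
  if (score >= 55) return 'WARM';
  return 'COLD';
}

// BiH bonuses listed above. How each signal is detected is an assumption.
const BIH_BONUSES = [
  { points: 15, applies: (t) => /digitalizacija/i.test(t.title) },
  { points: 10, applies: (t) => ['transport', 'railway'].includes(t.sector) },
  { points: 10, applies: (t) => t.currency === 'BAM' },
];

function scoreBihTender(tender, baseScore) {
  const bonus = BIH_BONUSES
    .filter((b) => b.applies(tender))
    .reduce((sum, b) => sum + b.points, 0);
  // Clamp to the documented 0-100 range.
  const score = Math.max(0, Math.min(100, baseScore + bonus));
  return { score, tier: tierFor(score) };
}

console.log(scoreBihTender({ title: 'Digitalizacija e-usluga', sector: 'railway', currency: 'BAM' }, 45));
```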
Reporting & Analytics
| Tool | Command | Description |
|---|---|---|
| auto-report.js | node ~/system/tools/auto-report.js daily | Daily brief: revenue, pipeline, tasks, decisions, alerts. Generates email draft in ~/system/drafts/ |
| auto-report.js | node ~/system/tools/auto-report.js weekly | Weekly report: revenue summary, pipeline progress, team performance, achievements. Email draft with ALAI branding |
| auto-report.js | node ~/system/tools/auto-report.js preview | Preview report in terminal without generating draft |
| client-status-update.js | node ~/system/tools/client-status-update.js generate [--dry-run] | Weekly client status updates: queries MC for completed tasks per project, matches to client contacts, generates ALAI-branded HTML email drafts (MEDIUM risk). LaunchAgent: Mondays 08:00. |
| client-status-update.js | node ~/system/tools/client-status-update.js list | Show recently generated status update drafts |
Auto-Report Features:
- Aggregates data from: invoice-generator, sales-pipeline, mc.js, support-ticket, decisions doc
- ALAI brand styling (dark #09090b, accent #00E5A0)
- Mobile-friendly HTML emails
- Text + HTML versions in JSON draft
- Daemon config: ~/system/daemons/auto-report-config.json
- Recipient: alembasic@gmail.com
- Schedule: Daily 7:00 AM, Weekly Monday 8:00 AM
Dashboards
| Dashboard | URL | Description |
|---|---|---|
| Mission Control | https://mc.alai.no | Task management, sessions, active work |
| CEO Dashboard | https://mc.alai.no/ceo | Executive metrics — revenue, pipeline, projects, decisions, alerts |
| Client Portal | https://mc.alai.no/client?token=XXX | Client-facing project status — tasks, tickets, SLA. Token-authenticated. |
CEO Dashboard Features:
- Revenue Overview: MRR, outstanding invoices, 3-month trend, next due date
- Pipeline Funnel: Visual funnel from prospect to won (data from sales-pipeline.js)
- Active Projects: Kanban board (active/pending/stalled) from MC tasks
- Decisions Pending: GO/NO-GO decisions from ~/system/specs/alem-decisions-2026-02.md
- Alerts Panel: Overdue invoices, SLA breaches, stale tasks (>7 days)
- Upcoming Timeline: Next 14 days deadlines from MC tasks
- Dark theme (ALAI brand: #09090b background, #00E5A0 accent)
- Auto-refresh: 60 seconds
- Mobile responsive
Client Portal Features:
- Token auth: POST /api/client/tokens (local network only) to generate tokens
- Summary: active tasks, completed count, open tickets, blocked items
- Task list: filtered by client project, shows priority/status
- Ticket list: from tickets.db, shows SLA compliance
- ALAI dark theme, auto-refresh 60s, mobile responsive
- Token management: create/list/revoke via local API
Testing & Verification
| Tool | Command | Description |
|---|---|---|
| smoke-test.js | node ~/system/tools/smoke-test.js | Run all smoke tests (Docker, Slack, daemons, MC, HiveMind) |
| smoke-test.js | node ~/system/tools/smoke-test.js report | Run all + post report to Slack #ops |
| smoke-test.js | node ~/system/tools/smoke-test.js slack\|docker\|daemons\|mc\|hivemind | Test specific suite |
| smoke-test.js | node ~/system/tools/smoke-test.js api <url> | Test specific API endpoint |
| health-check.js | node ~/system/tools/health-check.js | Monitor all services (Docker, HTTP, system, daemons) with human/JSON output |
| health-check.js | node ~/system/tools/health-check.js --quick | HTTP endpoints only (fast check) |
| health-check.js | node ~/system/tools/health-check.js --json | JSON output for programmatic use |
| daemon-health.js | node ~/system/tools/daemon-health.js | Daemon heartbeat monitor: checks all com.john.* LaunchAgents, reports PID/exit/status, detects unloaded plists |
| daemon-health.js | node ~/system/tools/daemon-health.js --quick | Quick status only |
| daemon-health.js | node ~/system/tools/daemon-health.js --json | JSON output for dashboards |
| auto-fix.js | node ~/system/tools/auto-fix.js <service> <issue> | Automated service recovery (restart loop prevention: max 3/hour) |
| ops-watchdog.js | node ~/system/daemons/ops-watchdog.js | Master watchdog daemon: health checks every 120s, auto-recovery via auto-fix.js, Slack alerts, event bus integration. Config: ~/system/config/ops-watchdog.json |
| cold-start.sh | bash ~/system/ops/cold-start.sh | Bring entire system up from fresh boot: 5-layer startup (infra→docker→core→business→workers→enrichment), pre-flight checks, verification |
| planka-sync.js | node ~/system/tools/planka-sync.js test\|status\|sync <mc-id> | MC↔Planka bidirectional sync: auto-moves cards on mc.js start/done/pause/resume |
| preflight-check.js | node ~/system/tools/preflight-check.js --task <id> | Pre-closure quality gate aggregator: checks GOTCHA, HOP Build, evidence, CoVe, validator, HiveMind, syntax before mc.js done |
| MCP playwright | mcp__playwright__* (native Claude tools) | Browser automation: navigate, click, fill, screenshot |
Reports: ~/system/reports/smoke-test-*.json
Protocol: Smoke test BEFORE + AFTER infra changes. Playwright for UI. npm test for code.
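The restart loop prevention mentioned for auto-fix.js (max 3/hour) can be sketched as a sliding-window limiter. This is an assumption about the approach, not the actual auto-fix.js implementation; the class name `RestartLimiter` and the in-memory storage are hypothetical.

```javascript
// Hypothetical sketch: allow at most `max` restarts per service inside a sliding window.
class RestartLimiter {
  constructor(max = 3, windowMs = 60 * 60 * 1000) {
    this.max = max;
    this.windowMs = windowMs;
    this.history = new Map(); // service name -> restart timestamps (ms)
  }

  // Returns true and records the attempt if the restart is allowed.
  tryRestart(service, now = Date.now()) {
    // Keep only the attempts that still fall inside the window.
    const recent = (this.history.get(service) || []).filter((t) => now - t < this.windowMs);
    if (recent.length >= this.max) {
      this.history.set(service, recent);
      return false; // limit reached: escalate instead of looping
    }
    recent.push(now);
    this.history.set(service, recent);
    return true;
  }
}

const limiter = new RestartLimiter();
console.log(limiter.tryRestart('bookstack'));
```

A persistent store would be needed for the limit to survive a daemon restart; the Map here is only for illustration.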
Deploy Quality Gate
| Tool | Command | Description |
|---|---|---|
| qa-19.js | node ~/system/tools/qa-19.js check <task-id> | PRIMARY quality gate (ZAKON #14). 19-point check in 5 phases. Adapts per task type. |
| qa-19.js | node ~/system/tools/qa-19.js list | Show all 19 checks |
| quality-gate.js | DELETED 2026-02-26 | Superseded by qa-19.js. Do not use. |
Checks (19): RAG queried, GOTCHA written, tools checked, context read, build passes, tests pass, no secrets, no debug artifacts, error handling, performance, output matches spec, evidence captured, destination verified, visual check, backup taken, self-review, validator review, quality gate, CEO acceptance.
Rule: ZAKON #14 — Run qa-19.js check <task-id> before mc.js done. Minimum 15/19 (M priority) or 17/19 (H priority).
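As a worked example of the rule above, the pass thresholds can be expressed as a small lookup. The function name `gateAllowsDone` is hypothetical, and no threshold is assumed for priorities other than M and H because the runbook does not state one.

```javascript
// Minimum passing checks per priority, taken from ZAKON #14 as stated above.
const MIN_PASSING = { M: 15, H: 17 };

function gateAllowsDone(passed, priority) {
  const required = MIN_PASSING[priority];
  if (required === undefined) {
    // The runbook only documents M and H, so refuse to guess for other priorities.
    throw new Error(`No threshold documented for priority "${priority}"`);
  }
  return passed >= required;
}

console.log(gateAllowsDone(16, 'M'), gateAllowsDone(16, 'H'));
```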
Anti-Hallucination & Drift Detection
| Tool | Command | Description |
|---|---|---|
| cove.js | node ~/system/tools/cove.js verify --task-id <id> --claims-file <path> | Chain-of-Verification: deterministically re-verify session claims using claim-types.json spec. Reads JSONL, executes file/syntax/server/build checks, writes cove-report.json |
| cove.js | node ~/system/tools/cove.js report --task-id <id> | Display CoVe verification report for a task |
| vcr.js | node ~/system/tools/vcr.js record --session-id <id> --tool <name> --input <json> --output <text> --duration <ms> | Record a tool interaction to vcr.db (used by vcr-recorder.py hook) |
| vcr.js | node ~/system/tools/vcr.js replay <session-id> | Replay recorded session: re-executes deterministic tools (Read/Glob/Grep), compares output hashes, flags regressions |
| vcr.js | node ~/system/tools/vcr.js list [--days 7] | List recorded VCR sessions |
| vcr.js | node ~/system/tools/vcr.js compare <session1> <session2> | Diff two sessions: detect behavioral changes between recordings |
| drift-detector.js | node ~/system/tools/drift-detector.js snapshot | Collect today's behavioral metrics from all data sources (claims, email-audit, MC, HiveMind, verification audits) |
| drift-detector.js | node ~/system/tools/drift-detector.js analyze | Analyze recent trends: anomaly detection via rolling 7-day mean ± 2σ |
| drift-detector.js | node ~/system/tools/drift-detector.js report [--days 30] | Human-readable drift report with ASCII table |
VCR activation: touch /tmp/vcr-recording to start, rm /tmp/vcr-recording to stop. Hook: vcr-recorder.py (PostToolUse, advisory).
Drift daemon: com.john.drift-detector runs daily at 23:55 (snapshot + analyze). Alerts: HiveMind (always) + Slack #john-alerts (MEDIUM+).
Rule: ~/system/rules/determinism-spectrum.md — maps all 44 system components to 5-level determinism scale.
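The "rolling 7-day mean ± 2σ" check used by the drift detector can be illustrated as below. This is a sketch of the general technique, not the drift-detector.js code: the use of population standard deviation and the decision to skip the check until 7 data points exist are assumptions.

```javascript
// Flag the latest value if it falls outside mean ± 2σ of the previous `window` days.
function isAnomaly(history, latest, window = 7, sigmas = 2) {
  const recent = history.slice(-window);
  if (recent.length < window) return false; // not enough data yet (assumption)
  const mean = recent.reduce((s, v) => s + v, 0) / recent.length;
  const variance = recent.reduce((s, v) => s + (v - mean) ** 2, 0) / recent.length;
  const sd = Math.sqrt(variance); // population standard deviation (assumption)
  return Math.abs(latest - mean) > sigmas * sd;
}

const lastWeek = [10, 12, 11, 9, 10, 11, 10];
console.log(isAnomaly(lastWeek, 20), isAnomaly(lastWeek, 11));
```

With a perfectly flat history the standard deviation is zero, so any change at all is flagged; a real detector might add a minimum tolerance.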
Test Quality
| Tool | Command | Description |
|---|---|---|
| test-auditor.js | node ~/system/tools/test-auditor.js <project-dir> | Scan test suite for weak validation: detects "no crash" without rejection, missing stupid-user inputs, unused chaos strings |
| test-auditor.js | node ~/system/tools/test-auditor.js <dir> --json | JSON output for pipeline integration |
Detects: (1) Chaos tests with "no crash" but no rejection assertion, (2) Form fields missing stupid-user inputs (numbers in names, letters in phones), (3) CHAOS_STRINGS defined but unused. Exit: 0=clean, 1=findings.
Rule: ~/system/rules/testing.md (Mandatory Input Rejection Tests section)
Plan Enforcement
| Tool | Command | Description |
|---|---|---|
| plan-advance-step.js | node ~/system/tools/plan-advance-step.js | Manually advance to next plan step with gate checks (for builder agents) |
| plan-adherence-report.js | node ~/system/tools/plan-adherence-report.js <task-id> | Post-execution adherence report: did agent follow the plan? Shows step execution, violations, summary |
Plan Enforcement Architecture:
- Hook: ~/.claude/hooks/plan-enforcer.py (PreToolUse) gates Write/Edit/Bash based on current plan step
- Plan files: /tmp/plan-{task-id}.json (machine-readable plan), /tmp/plan-state-{task-id}.json (execution state)
- Audit log: /tmp/plan-audit-{task-id}.jsonl (every hook decision logged)
- Graceful degradation: If no plan file exists, hook warns but allows (not all tasks have plans)
- Manual step advance: Builder calls plan-advance-step.js when ready to move forward
- Validator check: Validator runs plan-adherence-report.js to verify compliance
- Created: 2026-02-13 (MC #845)
Build Mode
| Tool | Command | Description |
|---|---|---|
| build-mode.js | node ~/system/tools/build-mode.js start <dir> [--task N] [--concurrency N] [--yolo] | Activate build mode: bypass process hooks for project dir |
| build-mode.js | node ~/system/tools/build-mode.js stop [--status completed\|failed] | Deactivate build mode |
| build-mode.js | node ~/system/tools/build-mode.js status | Show current build mode state |
| build-mode.js | node ~/system/tools/build-mode.js pause\|resume | Pause/resume build mode |
| build-mode.js | node ~/system/tools/build-mode.js sessions [--limit N] | List build sessions |
| build-mode.js | node ~/system/tools/build-mode.js autocoder [--project-dir <dir>] [--yolo] | Launch AutoCoder agent |
| build-mode.js | node ~/system/tools/build-mode.js update-features <total> <passing> | Update feature progress |
Build Mode: Switches from Operations→Build mode. Bypasses GOTCHA checklist, delegation enforcer, agent protocol, verification gate for files WITHIN project dir. Security hooks (forbidden paths, hallucination, bash security) remain active. 8h TTL auto-expire. DB: build_sessions table in mission-control.db. Flag: /tmp/build-mode-active.json. Hook: ~/.claude/hooks/build_mode.py (shared module).
AutoCoder: ~/system/services/autocoder/ — autonomous coding agent (Python, Claude Agent SDK). Initializer creates features in SQLite, Coding Agent implements them. Supports parallel mode (--concurrency) and YOLO mode (skip browser tests).
Skill: /build <dir> — activates build mode via skill.
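The two conditions that decide whether build mode applies to a file (the 8h TTL and the "within project dir" scope) might be checked along these lines. The real logic lives in the Python hook module; this JavaScript sketch only illustrates the idea, and the flag file fields `projectDir` and `startedAt` are hypothetical.

```javascript
const path = require('node:path');

const TTL_MS = 8 * 60 * 60 * 1000; // 8h TTL from the runbook

// `state` mimics a parsed /tmp/build-mode-active.json; field names are hypothetical.
function buildModeApplies(state, filePath, now = Date.now()) {
  if (!state) return false; // no flag file means normal operations mode
  if (now - Date.parse(state.startedAt) >= TTL_MS) return false; // auto-expired
  const rel = path.relative(path.resolve(state.projectDir), path.resolve(filePath));
  // The file is inside the project dir when the relative path never climbs out of it.
  const firstSegment = rel.split(path.sep)[0];
  return rel !== '' && firstSegment !== '..' && !path.isAbsolute(rel);
}

const state = { projectDir: '/tmp/proj', startedAt: '2026-02-13T08:00:00Z' };
console.log(buildModeApplies(state, '/tmp/proj/src/a.js', Date.parse('2026-02-13T10:00:00Z')));
```

Comparing relative paths rather than string prefixes avoids treating a sibling directory such as /tmp/proj-other as part of /tmp/proj.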
Build Pipeline
| Tool | Command | Description |
|---|---|---|
| build-project.js | node ~/system/tools/build-project.js prep "Name" "type" "Description" | Scaffold + CLAUDE.md + onboard + spec + task |
| build-project.js | node ~/system/tools/build-project.js deploy "Name" | Vercel deploy |
| build-project.js | node ~/system/tools/build-project.js status "Name" | Check project state |
| assert-log.sh | source ~/system/tools/assert-log.sh | Structured assertion library for deterministic verification (Phase 1) |
| gate-pre-claim.sh | bash ~/system/tools/gate-pre-claim.sh --spec spec.json --workdir /path | Pre-claim verification gate: file exists, hash changed, forbidden patterns (Phase 2) |
| gate-pre-claim.sh | bash ~/system/tools/gate-pre-claim.sh --snapshot --workdir /path | Snapshot file hashes before build |
| gate-pre-deploy.sh | bash ~/system/tools/gate-pre-deploy.sh --project-dir /path | Pre-deploy verification gate: tests, build, artifacts, TODO check (Phase 4) |
| pipeline-controller.js | node ~/system/tools/pipeline-controller.js create\|status\|advance\|gate\|gate-pass\|abort\|resume\|history\|list\|dashboard | Central pipeline orchestrator — tracks projects through 13 lifecycle phases (lead→support), automated gate checks, phase history, abort/resume. DB: pipeline.db |
| pipeline-watchdog.js | node ~/system/tools/pipeline-watchdog.js scan\|status [--auto-resume] [--notify] | Detects stalled pipelines (2h threshold), orphan Claude team tasks (1h), stale MC tasks. Marks stalled, auto-resumes, Slack alerts (2h cooldown). Skips aborted. |
| docuseal-webhook.js | node ~/system/tools/docuseal-webhook.js start [--port 3033] | Standalone DocuSeal webhook server — emits contract.signed events to event-bus. Port 3033. MC #1039 |
| docuseal-register-webhook.js | node ~/system/tools/docuseal-register-webhook.js register\|list\|delete [--url URL] | DocuSeal webhook registration helper — register/list/delete webhooks via API. Requires vault session. MC #1756 |
| test-docuseal-webhook.sh | bash ~/system/tools/test-docuseal-webhook.sh | Test DocuSeal webhook endpoint with mock payloads. MC #1756 |
| rollback.js | node ~/system/tools/rollback.js tag\|list\|rollback\|status <project> | Git tag-based deployment rollback — tag deploys, list history, one-command rollback. Projects in ~/projects/. |
| post-mortem.js | node ~/system/tools/post-mortem.js generate\|create\|list\|show | Incident post-mortem management — generate from ticket, create blank, list/show. Template: ~/system/template/post-mortem.md. Output: ~/system/reports/post-mortems/ |
Types: landing-page | nextjs-app | api-backend
Templates: ~/system/template/types/<type>/CLAUDE.md + spec.md
CI/CD: ~/system/template/github-actions/ci.yml (copied by scaffold.sh), ~/system/template/docker-compose.staging.yml
Deploy: --platform vercel|railway|fly (auto-detects from type if omitted)
Pipeline Gates: Part of Zero-Hallucination Deterministic Build Pipeline
Client Interaction & Design Review
| Tool | Command | Description |
|---|---|---|
| preview-share.js | node ~/system/tools/preview-share.js start\|stop\|status\|list | Client preview sharing: starts local dev server + Cloudflare tunnel for public URL. Auto-detects build output dirs. |
| design-approval.js | node ~/system/tools/design-approval.js create\|list\|approve\|reject\|show\|stats | Design review workflow: tracks design approval from draft→sent→reviewing→approved/rejected→implemented. DB: design-reviews.db |
| design-board.js | node ~/system/tools/design-board.js create\|list\|stop\|restart | Client-facing design review board: ALAI-branded web page with design options, feedback form, approve/reject. Cloudflare tunnel (http2 protocol) for public URL. Health check endpoint. Integrates with design-reviews.db. |
| client-signoff.js | node ~/system/tools/client-signoff.js create <project> <email> --type uat\|delivery [--project-type webapp] [--message "X"] | UAT + delivery approval workflow. Sends email with approval link, client approves/rejects via web UI (https://mc.alai.no/signoff/{token}), pipeline auto-advances. Commands: create, status, approve, reject, checklist, check, list. DB: design-reviews.db |
UAT Template: ~/system/template/uat-checklist.md (per project type: webapp, landing-page, api-backend)
DB: ~/system/databases/design-reviews.db (reviews + signoffs tables)
File Editing
| Tool | Command | Description |
|---|---|---|
| smart-edit.js | node ~/system/tools/smart-edit.js view <file> [start-end] | Show file lines with line numbers |
| smart-edit.js | node ~/system/tools/smart-edit.js replace <file> <start-end> <content> | Replace line range with new content |
| smart-edit.js | node ~/system/tools/smart-edit.js insert <file> <after> <content> | Insert content after line number |
| smart-edit.js | node ~/system/tools/smart-edit.js delete <file> <start-end> | Delete line range |
| smart-edit.js | node ~/system/tools/smart-edit.js append <file> <content> | Append content to end of file |
Why: Line-number based editing is more reliable than str_replace (exact match failures). Inspired by The Harness Problem. Reduces edit fail rate from ~15-20% to ~5%.
Backup: Auto-creates .bak before each edit. Use --no-backup to skip.
Stdin: Use - as content arg to pipe content via stdin (for multi-line edits).
Lines: 1-indexed, inclusive ranges (10-15 = lines 10 through 15).
Workflow: view to see lines → replace/insert/delete by line number.
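The 1-indexed, inclusive range convention is easy to get wrong by one, so here is a small sketch of how a line-range replace maps onto a 0-indexed array. It illustrates the convention only and is not the smart-edit.js source; the function name `replaceLines` is hypothetical.

```javascript
// Replace lines start..end (1-indexed, inclusive) with new content.
function replaceLines(text, start, end, content) {
  const lines = text.split('\n');
  const valid = Number.isInteger(start) && Number.isInteger(end)
    && start >= 1 && end >= start && end <= lines.length;
  if (!valid) {
    throw new RangeError(`Invalid range ${start}-${end} for ${lines.length} lines`);
  }
  // splice is 0-indexed and takes a count, hence start - 1 and end - start + 1.
  lines.splice(start - 1, end - start + 1, ...content.split('\n'));
  return lines.join('\n');
}

console.log(replaceLines('a\nb\nc\nd', 2, 3, 'X'));
```

Rejecting out-of-range input up front matters here because a silent clamp would edit the wrong lines.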
Daemons (LaunchAgents)
| Daemon | Interval | Description |
|---|---|---|
| com.john.slack-bot | always | Slack bot — Claude Haiku via Socket Mode. AI: API → CLI → Ollama. Needs SLACK_BOT_TOKEN + SLACK_APP_TOKEN |
| com.john.mc-dashboard | always | Mission Control web dashboard (port 3030) — includes CEO Dashboard at /ceo, DocuSeal webhook at /webhooks/docuseal (auto-advances pipeline on NDA/contract signing) |
| com.john.mc-session-worker | on session events | Session state extraction |
| com.john.pipeline-watcher | 60 sec | Pipeline event dispatcher + invoice auto-reminder daemon — checks unsigned proposals, triggers invoice escalation (Day 7/14/30+ reminders) |
| com.john.event-dispatcher | always | Event bus dispatcher daemon — polls events.db every 2s, routes to handlers, retry with backoff, dead letter queue |
| com.john.outbox-processor | always | Outbox processor daemon — polls durable-runner.db + mission-control.db outbox tables every 2s, emits to event-bus, purges old events (7d+). MC #1760 |
| com.john.ops-watchdog | always | Master watchdog — health checks every 120s, auto-recovery, Slack alerts, event bus. Config: ~/system/config/ops-watchdog.json |
| com.john.client-status-update | Monday 08:00 | Weekly client status update generator — queries MC for completed tasks, generates ALAI-branded email drafts per project |
| com.john.network-watchdog | 60 sec | Network monitoring daemon — ping gateway, DNS resolution check, internet connectivity check. Alert chain: Slack ops → macOS notification → log. 3 consecutive failures trigger alert with 10min cooldown. Tracks uptime stats. |
| com.john.vault-keeper | always | Vault auto-unlock daemon — auto-unlocks Vaultwarden using macOS Keychain password, session refresh every 15min, circuit breaker, macOS notifications |
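The network watchdog's alert rule (3 consecutive failures, then a 10-minute cooldown) can be sketched as a small state machine. This is an illustration under assumptions, not the daemon's code: the class name `FailureAlerter` is hypothetical, and the sketch assumes a success resets the failure counter.

```javascript
// Alert after N consecutive failures, then stay quiet for the cooldown period.
class FailureAlerter {
  constructor(threshold = 3, cooldownMs = 10 * 60 * 1000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.consecutiveFailures = 0;
    this.lastAlertAt = -Infinity;
  }

  // Record one check result; returns true when an alert should fire now.
  record(ok, now = Date.now()) {
    if (ok) {
      this.consecutiveFailures = 0; // assumption: any success resets the streak
      return false;
    }
    this.consecutiveFailures += 1;
    const due = this.consecutiveFailures >= this.threshold;
    const cooledDown = now - this.lastAlertAt >= this.cooldownMs;
    if (due && cooledDown) {
      this.lastAlertAt = now;
      return true;
    }
    return false;
  }
}

const alerter = new FailureAlerter();
console.log([0, 60000, 120000].map((t) => alerter.record(false, t)));
```

With a 60-second check interval this fires on the third failed check and at most once every ten minutes while the outage lasts.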
Ops Documentation: ~/system/ops/ — service catalog, dependency map, 15 runbooks, cold-start script, ops README.
Ops Dashboard: https://mc.alai.no/ops (status page), /api/ops/health (JSON), /api/ops/history (events)
Env Vars (both profiles):
- enableToolSearch=true (lazy-load MCP tools)
- CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=true (agent teams)
- DISABLE_AUTOUPDATER=1 (prevent auto-update breaking custom setup)
- CLAUDE_CODE_DISABLE_AUTO_COMPACT=true (manual compaction control)
Boards (Planka — Kanban)
| Tool | URL | Description |
|---|---|---|
| Planka | https://boards.alai.no | Kanban boards per project (Trello-like) |
| Planka local | http://localhost:3100 | Direct local access (use https://boards.alai.no for sharing) |
Admin: john / BasicAS2026!
User: alem / Alem2026!
Password reset: node ~/system/tools/planka-admin.js reset-password <username> <new-pass>
Add user: node ~/system/tools/planka-admin.js add-user <email> <username> <name> <pass>
SMTP: Configured (send.one.com:465, john@alai.no), used for notifications
Docker: ~/system/services/planka/docker-compose.yml
Projects: Wizard NUF, Ren Drom, Riad Basic, Drop Fintech, ALAI Internal, BasicAS Operations
Hosting: Azure Container Apps (boards.alai.no via Cloudflare DNS)
Setup & Backup
| Tool | Command | Description |
|---|---|---|
| syslog.sh | bash ~/system/tools/syslog.sh add "description" | System Changelog: logs changes for both agents |
| syslog.sh | bash ~/system/tools/syslog.sh today | Today's changelog entries |
| syslog.sh | bash ~/system/tools/syslog.sh recent [N] | Last N entries |
| setup-backup.sh | bash ~/system/tools/setup-backup.sh "description" | Backup setup files + changelog |
| sync-to-mini.sh | bash ~/system/tools/sync-to-mini.sh [--execute] | Sync GOTCHA to Mac Mini |
| daemon-manager.js | node ~/system/daemons/daemon-manager.js list\|start\|stop\|status | Manage persistent background services |
| team-cleanup.sh | bash ~/system/tools/team-cleanup.sh [--force] [--days N] | Clean stale Agent Teams task/team dirs (default 7d) |
Company Management
| Tool | Command | Description |
|---|---|---|
| company.sh | ~/system/tools/company.sh list\|info\|add | Company registry management |
| company-worker.js | node ~/system/tools/company-worker.js run\|run-all\|status\|list\|dry-run | Autonomous work loop generator for pipeline companies. Generates MC tasks per company (Securion/Proveo/Proxima), posts to Slack/HiveMind, emits events. Config: ~/system/tools/config/company-worker-config.json |
| skill-resolver.js | node ~/system/tools/skill-resolver.js resolve <skill-name> [--company X] | Resolve skill path with company override. Priority: ~/companies/COMPANY/skills/SKILL/SKILL.md (if company set) → ~/.claude/skills/SKILL/SKILL.md (global fallback). Returns absolute path or exit 1. Performance: ~47ms. |
| tool-resolver.js | node ~/system/tools/tool-resolver.js check <tool-name> [--company X] | Check if tool allowed for company via tools.json config. Modes: whitelist (financial), blacklist (dev), inherit-all (orchestrators). Pattern matching: exact + glob (invoice-*.js). Returns ALLOWED\|DENIED with reason on stderr. Performance: ~49ms. |
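The three tool-resolver modes and the exact + glob matching can be sketched as follows. This is a hedged illustration: the config shape (`mode`, `patterns`) is a hypothetical stand-in for tools.json, the glob support is limited to `*`, and unknown modes are denied by assumption.

```javascript
// Convert a simple glob (only `*` supported here) into an anchored RegExp.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&').replace(/\*/g, '.*');
  return new RegExp(`^${escaped}$`);
}

// `config` mimics a company's tools.json; its shape is hypothetical.
function checkTool(tool, config) {
  const matches = (config.patterns || []).some((p) => globToRegExp(p).test(tool));
  switch (config.mode) {
    case 'inherit-all':
      return { allowed: true, reason: 'inherit-all mode' };
    case 'whitelist':
      return { allowed: matches, reason: matches ? 'on whitelist' : 'not on whitelist' };
    case 'blacklist':
      return { allowed: !matches, reason: matches ? 'on blacklist' : 'not on blacklist' };
    default:
      return { allowed: false, reason: `unknown mode "${config.mode}"` }; // deny by default
  }
}

console.log(checkTool('invoice-generator.js', { mode: 'whitelist', patterns: ['invoice-*.js'] }));
```

An exact name is just a glob without `*`, so one matcher covers both cases.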
Skills (Claude Code Slash Commands)
| Command | Description |
|---|---|
| /plan-with-team | Creates plan with builder/validator teams |
| /build-plan | Executes approved plan using TaskList |
| /code-review | Systematic GOTCHA code review (security, quality, performance) |
| /debugging | Systematic bug investigation and resolution |
| /security-audit | OWASP Top 10 + config + infra security review |
| /design-system | AI-powered design generator: multi-tool (v0.dev, Google Stitch, Figma Make, Codia AI). Prompt templates per tool. Brief → kickass design + code. |
| /figma-design | Figma WebSocket bridge operations: populate design systems, create screens programmatically |
| /build | Switch to Build Mode: bypass process hooks, launch AutoCoder, track sessions |
Workflow: /plan-with-team "task" → plan → approval → /build-plan → execution
Build: /build <project_dir> → activate build mode → code freely → stop
Design: /design-system "brief" → AI tool selection → optimized prompts → Figma + code
Review: /code-review <file> or /security-audit <target>
Debug: /debugging "<bug description>"
Vector & Semantic Search
| Tool | Command | Description |
|---|---|---|
| vector-db.js | node ~/system/tools/vector-db.js help | Hybrid Vector DB: SQLite + vector columns for semantic search. Reusable module. |
| vector-db.js (module) | const { VectorDB } = require('./vector-db') | Module API: createCollection(), insert(), search(), hybridSearch(), bulkInsert() |
| vector-db.js search | node ~/system/tools/vector-db.js search <db> <collection> <query> | Semantic search via Ollama nomic-embed-text (768-dim) |
| vector-db.js hybrid | node ~/system/tools/vector-db.js hybrid <db> <col> <query> --where "cond" | SQL filter + vector ranking combined |
| knowledge-base.js | node ~/system/tools/knowledge-base.js add <url-or-file> [--tag t] | KB: drop URL/file → chunk → vector store. Semantic search over all docs. |
| knowledge-base.js | node ~/system/tools/knowledge-base.js search <query> [--tag t] | Semantic search across knowledge base documents |
| humanizer.js | echo "text" \| node ~/system/tools/humanizer.js [--deep] | Remove AI patterns from text. Quick (regex) or deep (Ollama rewrite). Module: require('./humanizer') |
| hourly-backup.sh | bash ~/system/tools/hourly-backup.sh [--dry-run\|--list] | Hourly auto-commit to 'auto-backup' branch across all repos. LaunchAgent: com.john.hourly-backup. |
| db-backup.sh | bash ~/system/tools/db-backup.sh [--list\|--restore] | Daily SQLite backup (14 DBs). sqlite3 .backup, tar.gz, 30-day rotation. LaunchAgent: com.john.db-backup (03:00). |
| cron-notify.sh | bash ~/system/tools/cron-notify.sh "job" "OK\|ERROR" "details" | Post cron results to Slack #ops channel. Used by db-backup, hourly-backup. |
| memory-indexer.py | python3 ~/system/tools/memory-indexer.py index\|search\|stats\|test-embed | Index ~/system/ MD files into knowledge.db (SQLite + Ollama nomic-embed-text, 768-dim, tag='memory-file') |
Vector Pattern: Embeddings stored as BLOB (Float32Array) in SQLite. Cosine similarity computed in JS. Model: nomic-embed-text (768-dim, local Ollama). Batch embedding supported (32/batch). Usage tracked via usage-tracker.js. Unified model: ALL embedding tools use nomic-embed-text via Ollama — no model mismatch.
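The storage pattern described above (Float32Array serialized to a BLOB, cosine similarity computed in JS) can be sketched like this. The helper names `toBlob`, `fromBlob`, and `cosineSimilarity` are hypothetical and the sketch leaves out the SQLite and Ollama calls; it only shows the round trip and the similarity math.

```javascript
// Serialize an embedding to a Buffer suitable for a SQLite BLOB column.
function toBlob(vector) {
  const f32 = Float32Array.from(vector);
  return Buffer.from(f32.buffer, f32.byteOffset, f32.byteLength);
}

// Read a BLOB back into a Float32Array (copying first so alignment is never an issue).
function fromBlob(blob) {
  const copy = new Uint8Array(blob); // copies the bytes into a fresh ArrayBuffer
  return new Float32Array(copy.buffer, 0, copy.byteLength / 4);
}

function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Zero vectors have no direction, so treat their similarity as 0.
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

const stored = fromBlob(toBlob([1, 0, 0.5]));
console.log(stored.length, cosineSimilarity(stored, [1, 0, 0.5]));
```

A 768-dim nomic-embed-text vector stored this way takes 3072 bytes per row (768 values at 4 bytes each).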
RAG & Knowledge Flywheel
| Tool | Command | Description |
|---|---|---|
| retrieval-orchestrator.js | node ~/system/tools/retrieval-orchestrator.js query "text" [--limit N] [--verbose] |
Multi-store retrieval: HiveMind + Knowledge DB + RAG Cache + Sessions → RRF merge |
| retrieval-orchestrator.js | node ~/system/tools/retrieval-orchestrator.js stats |
Store statistics (coverage, entry counts) |
| retrieval-orchestrator.js | node ~/system/tools/retrieval-orchestrator.js stores |
List available stores and status |
| session-archiver.js | node ~/system/tools/session-archiver.js stats |
Session file statistics (count, size, savings) |
| session-archiver.js | node ~/system/tools/session-archiver.js archive [--dry-run] [--days 14] |
Strip raw transcripts from old sessions |
| session-archiver.js | node ~/system/tools/session-archiver.js index [--limit N] |
Embed session summaries into knowledge DB |
| session-archiver.js | node ~/system/tools/session-archiver.js cleanup [--dry-run] |
Archive + index (LaunchAgent runs daily 03:00) |
| docuseal-monitor.js | node ~/system/tools/docuseal-monitor.js check |
Poll DocuSeal for new signings → Slack + email + HiveMind + contracts.db |
| docuseal-monitor.js | node ~/system/tools/docuseal-monitor.js status |
Show recent DocuSeal submissions with signer status |
| docuseal-monitor.js | node ~/system/tools/docuseal-monitor.js history |
All tracked signings from contracts.db |
| rag-health.js | node ~/system/tools/rag-health.js |
Full RAG health check: Ollama, Knowledge DB, HiveMind, RAG Cache, Session Archiver, Orchestrator smoke |
| rag-health.js | node ~/system/tools/rag-health.js --json |
JSON output (for ops-watchdog integration) |
| rag-health.js | node ~/system/tools/rag-health.js --alert |
Exit 1 if any critical check fails (for cron/alerting) |
| rag-health.js | node ~/system/tools/rag-health.js --smoke |
Run orchestrator smoke query only |
| lightrag.js | node ~/system/tools/lightrag.js query "question" [--mode hybrid|local|global|naive] | LightRAG REST client — semantic query, document upload, graph exploration, RAG cache sync via configured Azure/Cloud endpoint |
| lightrag.js | node ~/system/tools/lightrag.js upload <file-or-dir> [--recursive] | Upload documents to LightRAG knowledge graph |
| lightrag.js | node ~/system/tools/lightrag.js explore [--entity "name"] [--limit N] | Explore knowledge graph entities and relationships |
| lightrag.js | node ~/system/tools/lightrag.js status | Get LightRAG system status and statistics |
| lightrag.js | node ~/system/tools/lightrag.js sync-from-rag | Import rag-router cache → LightRAG |
| lightrag.js | node ~/system/tools/lightrag.js sync-to-rag | Export LightRAG results → rag-router cache |
| lightrag-migrate.js | node ~/system/tools/lightrag-migrate.js start [--source hivemind|knowledge|both] [--rate 2] [--limit 1000] [--tier 1] [--type type1,type2] [--tag tag] [--dry-run] | Daemon: migrate HiveMind + Knowledge DB to LightRAG (HTTP API). Idempotent, rate-limited (default 2 docs/min), resumable with state tracking. |
| lightrag-migrate.js | node ~/system/tools/lightrag-migrate.js status | Show migration progress (source, last_id, total_migrated, failed, rate) |
| lightrag-migrate.js | node ~/system/tools/lightrag-migrate.js stop | Stop running migration daemon (graceful SIGTERM + kill) |
| lightrag-migrate.js | node ~/system/tools/lightrag-migrate.js reset | Clear migration state file (/tmp/lightrag-migration-state.json) |
| rag-router.js | node ~/system/tools/rag-router.js query "text" | RAG intelligence router — embed, cache search, local model dispatch, interaction logging |
| rag-router.js | node ~/system/tools/rag-router.js learn "question" "answer" | Add Q&A pair to RAG cache |
| rag-router.js | node ~/system/tools/rag-router.js stats | Flywheel metrics (cache hit rate, cost savings) |
| rag-router.js | node ~/system/tools/rag-router.js test | Run self-test suite |
| rag-router.js | node ~/system/tools/rag-router.js capture <id> "response" | Capture external response for interaction, auto-index to cache |
| rag-router.js (module) | const { RAGRouter } = require('./rag-router') | Module API: query(), learn(), capture(), stats() |
| rag-mcp.js | MCP server (stdio) | RAG MCP server — exposes rag_query, rag_learn, rag_stats tools. Config: ~/.claude/mcp.json |
| MCP rag | mcp__rag__rag_query | Route query through RAG cache + local models. Returns response or needs_external flag |
| MCP rag | mcp__rag__rag_learn | Add Q&A pair to RAG cache with source tracking |
| MCP rag | mcp__rag__rag_stats | Flywheel metrics (cache hit rate, cost savings, training queue) |
| flywheel-extractor.js | node ~/system/tools/flywheel-extractor.js extract [--output path] [--batch-name "X"] | Extract external interactions from flywheel.db → JSONL for alaiML training |
| flywheel-extractor.js | node ~/system/tools/flywheel-extractor.js stats | Show training queue size, extraction batches |
| flywheel-indexer.js | node ~/system/tools/flywheel-indexer.js index [--batch YYYYMMDD] [--dry-run] | Sync high-quality external responses back to rag_cache (closes the loop) |
| flywheel-indexer.js | node ~/system/tools/flywheel-indexer.js stats | Show pending/cached/total counts |
| flywheel-session-extractor.js | node ~/system/tools/flywheel-session-extractor.js extract [--dry-run] [--limit N] | Extract Q&A pairs from Claude Code session transcripts → RAG cache |
| flywheel-session-extractor.js | node ~/system/tools/flywheel-session-extractor.js stats | Show extraction metrics (processed/pending sessions, pairs extracted) |
| flywheel-session-extractor.js | node ~/system/tools/flywheel-session-extractor.js reprocess <session-id> | Force re-extract a specific session |
RAG Flywheel Architecture:
- Cache: Embedding-based semantic cache (0.85 similarity threshold). Hit → instant response
- Local: Tier-router dispatch to Ollama models (tier 2: qwen2.5:72b). Hit → fast local response
- External: Falls back to Claude Code when cache miss + local unavailable
- Session Capture: Q&A pairs from session transcripts auto-extracted every 5min (daemon)
- Response Capture: External responses can be captured back via capture() → auto-index to cache
- Learning: Every interaction logged to flywheel.db. High-quality Q&A pairs added to cache
- DB: ~/system/databases/flywheel.db (interactions + rag_cache tables)
- Integration: Uses vector-db.js (embeddings) + tier-router.js (local dispatch)
- Cost Savings: Tracks queries answered locally vs externally, cumulative savings
- Created: 2026-02-21 (MC #1610)
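The cache → local → external order described above can be sketched roughly as follows. This is an illustration only, not the rag-router.js implementation: `route`, `cosine_similarity`, the in-memory cache layout, and the `local_model` callable are hypothetical names, and treating the 0.85 threshold as inclusive is an assumption.

```python
import math
from typing import Callable, Optional

SIMILARITY_THRESHOLD = 0.85  # value taken from the architecture notes


def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route(query_vec: list, cache: list,
          local_model: Optional[Callable[[], Optional[str]]] = None) -> dict:
    """Return the first tier that can answer: cache, then local, then external.

    cache is a list of (embedding, answer) pairs (hypothetical layout).
    """
    # Tier 1: best semantic match in the cache, accepted only at or above the threshold.
    best_score, best_answer = 0.0, None
    for vec, answer in cache:
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_score, best_answer = score, answer
    if best_answer is not None and best_score >= SIMILARITY_THRESHOLD:
        return {"source": "cache", "response": best_answer, "needs_external": False}

    # Tier 2: local model dispatch; a None result stands for "model unavailable".
    if local_model is not None:
        response = local_model()
        if response is not None:
            return {"source": "local", "response": response, "needs_external": False}

    # Tier 3: tell the caller to fall back to an external model.
    return {"source": "external", "response": None, "needs_external": True}
```

The `needs_external` key mirrors the flag that mcp__rag__rag_query is described as returning; the rest of the return shape is made up for the sketch.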
OSINT Investigation
| Tool | Command | Description |
|---|---|---|
| investigate.js | node ~/system/tools/investigate.js investigate --phone X --name Y --email Z --location W | OSINT person lookup — spawns 4 parallel Claude subagents (phone, social, business, news) + synthesizer. SQLite backend with confidence scoring. |
| investigate.js | node ~/system/tools/investigate.js show <id> | Show investigation findings grouped by category |
| investigate.js | node ~/system/tools/investigate.js list | List all investigations |
| investigate.js | node ~/system/tools/investigate.js report <id> | Full formatted investigation report |
| investigate.js | node ~/system/tools/investigate.js save-findings <id> <source> <json> | Save agent findings (internal — used by orchestrator) |
| investigate.js | node ~/system/tools/investigate.js complete <id> | Mark investigation as complete |
Architecture: 4 parallel investigator agents + 1 synthesizer:
- Phone Lookup — phone directories, carrier, business listings
- Social Media — LinkedIn, Facebook, Instagram, GitHub, Twitter/X
- Business Registry — BiH registar, OpenCorporates, Brønnøysund, court records
- News & Public — klix.ba, avaz.ba, nrk.no, Google News, academic records
- Synthesizer — deduplication, cross-reference, confidence upgrade, profile building
Confidence levels: verified (2+ sources), likely (1 reliable), possible (indirect), unverified (uncertain)
Phone parser: Auto-detects BiH (06x→+387) and Norwegian (4x/9x→+47) numbers
DB: ~/system/databases/investigations.db
Created: 2026-02-21
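A minimal sketch of the phone auto-detection and confidence rules listed above. It is not the investigate.js code: the function names, the digit-count patterns (9 or 10 digits for BiH, 8 digits for Norway), and the way reliable and indirect sources are counted are all assumptions made for illustration.

```python
import re


def normalize_phone(raw: str) -> str:
    """Normalize a phone number to international form under the assumed rules."""
    digits = re.sub(r"[^\d+]", "", raw)  # keep digits and a leading plus sign
    if digits.startswith("+"):
        return digits  # already international
    if digits.startswith("00"):
        return "+" + digits[2:]
    # BiH mobile: national format 06x..., drop the trunk 0 and prefix +387.
    if re.fullmatch(r"06\d{7,8}", digits):
        return "+387" + digits[1:]
    # Norwegian mobile: 8 digits starting with 4 or 9, prefix +47.
    if re.fullmatch(r"[49]\d{7}", digits):
        return "+47" + digits
    return digits  # unknown format: leave untouched


def confidence_level(reliable_sources: int, indirect_sources: int = 0) -> str:
    """Map source counts to the four confidence levels (assumed interpretation)."""
    if reliable_sources >= 2:
        return "verified"
    if reliable_sources == 1:
        return "likely"
    if indirect_sources > 0:
        return "possible"
    return "unverified"
```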
Databases (~/system/databases/)
| Database | Description |
|---|---|
| investigations.db | OSINT person investigations — use investigate.js |
| leads.db | Sales pipeline / Lead CRM — use sales-pipeline.js |
| invoices.db | Invoice tracking — use invoice-generator.js |
| contracts.db | Contract lifecycle management — use contract-manager.js |
| documents.db | Document storage & retention — use document-store.js |
| tickets.db | Support tickets with SLA — use support-ticket.js |
| teams.db | Cross-team coordination — use team-coordinator.js |
| strategy-tracker.db | Strategic goals |
| alem-directives.db | Alem's direct orders |
| projects.db | Project lifecycle (phases, milestones, metrics) |
| hivemind.db | Agent shared intelligence |
| facts.db | Critical facts with event-sourced history — use facts.js |
| drafts.db | Email draft approval workflow — use drafts.js |
| events.db | Event bus store — use event-bus.js |
| flywheel.db | RAG flywheel — interactions log + cache. Use rag-router.js |
| projects.json | Routing registry — use route.js |
| company-registry.json | Company information registry |
Enforcement Hooks (~/.claude/hooks/)
| Hook | Matcher | Description |
|---|---|---|
| security-guard.py | .* (all tools) | Blocks forbidden paths, dangerous commands, delete protection, business-critical doc enforcement |
| agent-protocol-enforcer.py | Task | CORE PROTOCOL enforcement for subagent spawning |
| gotcha-enforcer.py | Write|Edit|NotebookEdit|Bash | Boot flag + MC active task enforcement |
| gate-pre-commit.py | Bash | Pre-commit validation |
| hallucination-detector.py | Write|Edit | Phantom tools, phantom paths, wrong ports, phantom require/import detection |
| teammate-quality-gate.py | TeammateIdle | Quality gate for agent teammates — checks TODO/FIXME markers, syntax errors in recent files. Exit 2 = keep working |
Global: All hooks apply to ALL agents (parent + subagents) via ~/.claude/settings.json.
ZAKON #1: AI does not work without enforcement. Hooks are the deterministic enforcement.
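To see which hooks fire for a given tool call, the matchers in the table can be evaluated roughly like this. A sketch only: it assumes each matcher is a regular expression that must match the whole tool name, and `hooks_for_tool` is a hypothetical helper. teammate-quality-gate.py is left out because TeammateIdle looks like an event name rather than a tool name (assumption).

```python
import re

# Hook name -> matcher, copied from the table above.
HOOKS = {
    "security-guard.py": ".*",
    "agent-protocol-enforcer.py": "Task",
    "gotcha-enforcer.py": "Write|Edit|NotebookEdit|Bash",
    "gate-pre-commit.py": "Bash",
    "hallucination-detector.py": "Write|Edit",
}


def hooks_for_tool(tool_name: str) -> list:
    """Return the hooks whose matcher fully matches the tool name, in table order."""
    return [hook for hook, matcher in HOOKS.items()
            if re.fullmatch(matcher, tool_name)]
```

Under these assumptions a Bash call passes through three hooks, while a read-only tool only hits security-guard.py.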
Design & Figma
| Tool | Command | Description |
|---|---|---|
| figma-extract.js | node ~/system/tools/figma-extract.js extract-tokens <file-key> | Extract design tokens (colors, typography, effects) from Figma file |
| figma-extract.js | node ~/system/tools/figma-extract.js extract-components <file-key> | List components with metadata and variants |
| figma-extract.js | node ~/system/tools/figma-extract.js frame-to-prompt <file-key> <node> | Generate implementation prompt from Figma frame |
| figma-extract.js | node ~/system/tools/figma-extract.js file-info <file-key> | File metadata and pages |
| figma-to-react.js | node ~/system/tools/figma-to-react.js <file-key> <node-id> --output Login.tsx | Figma → React + Tailwind — generates production React TSX from Figma frame via REST API. Post-processing: Pass 1 token replacement (figma-token-map.json), Pass 2 component mapping (figma-component-map.json), Pass 3 icon resolution (Lucide). Flag: --no-post-process to skip. |
| figma-to-react.js | node ~/system/tools/figma-to-react.js <file-key> <node-id> --component Name | Custom component name (default: derived from frame name) |
| figma-to-react.js | node ~/system/tools/figma-to-react.js <file-key> <node-id> | Output to stdout (pipe to file or preview) |
| figma-validate.js | node ~/system/tools/figma-validate.js compare <file-key> <node-id> <url> --output /tmp/validate/ | Visual validation tool — compare built page vs Figma design via pixel diff. Exit: 0=PASS 1=FAIL 2=ERROR. Enforces ZAKON 0.1 |
| figma-validate.js | node ~/system/tools/figma-validate.js compare ... --threshold 0.05 --viewport 1920x1080 | Custom threshold (default 0.1=10%) and viewport (default 375x812) |
| figma-token-sync.js | node ~/system/tools/figma-token-sync.js <file-key> --output ./tokens/ --format all | Figma Variables → Design Tokens — extracts Variables API → W3C DTCG JSON + Tailwind theme + CSS custom properties. Supports modes (light/dark). |
| figma-token-sync.js | node ~/system/tools/figma-token-sync.js <file-key> --format tailwind --output ./tailwind-tokens.js | Single format: tailwind, css, w3c, json, or all |
| figma-token-map.json | ~/system/config/figma-token-map.json | Hex color → Tailwind token lookup table for figma-to-react.js Pass 1 (token replacement). Source: Bilko tailwind.config.ts |
| figma-component-map.json | ~/system/config/figma-component-map.json | Figma component → shadcn/ui mapping + Lucide icon map for figma-to-react.js Pass 2-3 (component mapping, icon resolution) |
| figma-populate.js | bun ~/system/tools/figma-populate.js <channel-id> | Populate Figma with design tokens (colors, typography, spacing, radius, buttons) via WebSocket bridge |
| v0-generate.js | node ~/system/tools/v0-generate.js generate "prompt" | v0.dev Platform API wrapper — prompt → React+Tailwind code. Also generates optimized prompts for manual use. |
| v0-generate.js | node ~/system/tools/v0-generate.js generate --brief Name --screen login --industry fintech --primary "#hex" | Structured brief → optimized prompt |
| v0-generate.js | node ~/system/tools/v0-generate.js prompt --brief Name --industry fintech | Output prompt only (no API call) — for copy-paste into v0.dev or Google Stitch |
| v0-generate.js | node ~/system/tools/v0-generate.js setup <api-key> | Save v0.dev API key |
| design-to-code.js | node ~/system/tools/design-to-code.js assemble --stitch-code <html> --assets-dir <dir> --target-page <tsx> | Assemble Stitch HTML + Figma assets → Next.js TSX. Converts HTML→JSX, inline styles→Tailwind, integrates assets, optional logic preservation. |
| design-to-code.js | node ~/system/tools/design-to-code.js assemble ... --preserve-logic | Extract and keep business logic (useState, handlers) from existing page |
| MCP figma | mcp__figma__* (native Claude tools) | Figma MCP integration — direct Figma access from Claude |
Config: ~/system/config/figma.json or FIGMA_TOKEN env var
v0 Config: ~/system/config/v0.json or V0_API_KEY env var
File key: From Figma URL — figma.com/design/<FILE-KEY>/...
Node ID: From Figma URL (select frame, copy link) or use figma-extract.js list-nodes <file-key>
Figma bridge: WebSocket on port 3055 (bun). Channel ID from Figma Desktop → Plugins → Claude MCP Plugin.
External AI tools: v0.dev ($20/mo), Google Stitch (free: stitch.withgoogle.com), Figma Make (native), Codia AI (Figma plugin)
Design output: ~/system/design-output/
Created: 2026-02-12 (figma-extract), 2026-02-13 (figma-populate, v0-generate, /design-system skill), 2026-02-14 (figma-to-react, figma-validate, figma-token-sync)
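The pass/fail rule documented for figma-validate.js (pixel diff against a threshold, exit codes 0=PASS 1=FAIL 2=ERROR) can be illustrated as below. This is a simplified sketch, not the tool's code: images are plain lists of RGB tuples, any differing pixel counts as a mismatch, and `diff_ratio` and `validate` are hypothetical names.

```python
def diff_ratio(design: list, built: list) -> float:
    """Fraction of pixels that differ between two same-sized images."""
    if not design or len(design) != len(built):
        raise ValueError("images must be non-empty and the same size")
    mismatched = sum(1 for a, b in zip(design, built) if a != b)
    return mismatched / len(design)


def validate(design: list, built: list, threshold: float = 0.1) -> int:
    """Return an exit code: 0 = PASS, 1 = FAIL, 2 = ERROR."""
    try:
        ratio = diff_ratio(design, built)
    except ValueError:
        return 2  # comparison could not be performed
    return 0 if ratio <= threshold else 1
```

With the default threshold of 0.1, a page where 5% of pixels differ passes; tightening the threshold on the command line makes the same page fail.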
Browser Form Filling
| Tool | Command | Description |
|---|---|---|
| form-filler.py | python ~/system/tools/form-filler.py <url> <fields.json> | Fill web forms from JSON config — visible browser (Alem sees), CAPTCHA pause, screenshot |
| form-filler.py | python ~/system/tools/form-filler.py <url> <fields.json> --headless --submit | Headless auto-fill + submit |
| form-filler.py | python ~/system/tools/form-filler.py <url> <fields.json> --wait-for-captcha --submit | Fill, pause for CAPTCHA, submit |
| form-filler.py | python ~/system/tools/form-filler.py <url> <fields.json> --screenshot /tmp/out.png | Fill + screenshot |
| form-filler.py | python ~/system/tools/form-filler.py <url> <fields.json> --dry-run | Print fields without browser |
Pre-built configs: ~/system/tools/form-configs/
- anthropic-startup.json — Anthropic Claude Startup Program ($25K-$100K)
- aws-activate.json — AWS Activate Founders ($1K-$100K)
- google-cloud-startups.json — Google Cloud for Startups ($2K-$200K)
- microsoft-founders-hub.json — Microsoft Founders Hub ($1K-$150K)
JSON format: {"fields": [{"selector": "label=X", "value": "Y", "type": "text|select|checkbox|radio|date|click|file"}], "submit_selector": "button[type='submit']"}
Selectors: CSS (input[name='x']), text=, placeholder=, label=, role=, nth=N suffix
Requires: Python Playwright (pip install playwright)
Created: 2026-02-18
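Before pointing form-filler.py at a live form, a config can be sanity-checked against the JSON format above. The checker below is a hypothetical helper, not part of form-filler.py; it only assumes what the format line states (a `fields` list whose entries carry a `selector` and one of the listed `type` values).

```python
import json

# Field types listed in the JSON format line above.
ALLOWED_TYPES = {"text", "select", "checkbox", "radio", "date", "click", "file"}


def validate_config(raw: str) -> list:
    """Return a list of problems found in a fields.json string (empty = looks valid)."""
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]
    fields = config.get("fields")
    if not isinstance(fields, list) or not fields:
        return ["'fields' must be a non-empty list"]
    problems = []
    for i, field in enumerate(fields):
        if not isinstance(field, dict):
            problems.append(f"field {i}: must be an object")
            continue
        if not field.get("selector"):
            problems.append(f"field {i}: missing selector")
        if field.get("type") not in ALLOWED_TYPES:
            problems.append(f"field {i}: unknown type {field.get('type')!r}")
    return problems
```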
Archived (DO NOT EXIST, for reference only)
| Tool | Status | Note |
|---|---|---|
| | REMOVED (2026-02-07) | Orphaned code, never hooked, conflicts with session-ledger.sh |
| | REMOVED | Replaced by HiveMind |
| | REMOVED | Replaced by HiveMind |
| | NEVER EXISTED | Hallucinated |
| | NEVER EXISTED | Hallucinated |
| | NEVER EXISTED | Hallucinated; the real enforcement is ~/.claude/hooks/ |
| | NEVER EXISTED | Hallucinated |
| | NEVER EXISTED | Hallucinated |
| | NEVER EXISTED | Hallucinated |
| | NEVER EXISTED | Hallucinated |
| | NEVER EXISTED | Hallucinated |
| | NEVER EXISTED | Hallucinated |
| | NEVER EXISTED | Hallucinated |
| | NEVER EXISTED | Hallucinated |
| | NEVER EXISTED | Hallucinated |
| | NEVER EXISTED | Hallucinated |
| | NEVER EXISTED | Hallucinated |
| | NEVER EXISTED | Hallucinated |
| | NEVER EXISTED | Hallucinated |
| | ARCHIVED (2026-02-06) | Was orphaned — see ~/system/archive/ |
| | ARCHIVED (2026-02-06) | Was checker-only — see ~/system/archive/ |
| | DEPRECATED (2026-02-11) | Community MCP server — unreliable, replaced by custom email-mcp-bridge.js |
| | TESTED (2026-02-11) | Python MCP — ClosedResourceError bug, not used |
brand-package.js
Purpose: Generate brand package (guidelines, colors, typography) for company factory pipeline
Location: ~/system/tools/brand-package.js
Usage: node ~/system/tools/brand-package.js "ProjectName" --logo /path/to/logo.png [--colors "primary:#hex,secondary:#hex"] [--output /path/]
Dependencies: None (pure Node.js)
Output: Creates brand-guidelines.md, colors.json, typography.json
Features: Extracts colors from PNG logo, supports color overrides, generates complete brand identity
Created: 2026-02-09
Go-Live Runbook
Project: {{PROJECT_NAME}}
Version: {{VERSION}}
Date: {{DATE}}
Author: {{AUTHOR}}
Status: Draft | In Review | Approved
Reviewers: {{REVIEWERS}}
Document History
| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | {{DATE}} | {{AUTHOR}} | Initial draft |
1. Go-Live Overview
What: {{PROJECT_NAME}} v{{VERSION}} production launch
When: {{LAUNCH_DATE}} at {{LAUNCH_TIME}} {{TIMEZONE}}
Deployment window: {{WINDOW_START}} – {{WINDOW_END}} ({{WINDOW_DURATION}}h window)
Go-Live Type: {{TYPE}}
Incident Commander: {{IC}} (primary), {{IC_BACKUP}} (backup)
Technical Lead: {{TECH_LEAD}}
Communications Lead: {{COMMS_LEAD}}
War Room: {{WAR_ROOM_LINK}}
Status Page: {{STATUS_PAGE_URL}}
2. Pre-Launch Checklist
T-7 Days: Infrastructure Verification
- All production infrastructure provisioned and tested
- Load balancer health checks passing for all instances
- Auto-scaling groups configured and tested (scale-up + scale-down)
- Database replicas in sync and replication lag < {{REPLICATION_LAG}}s
- Backup jobs running successfully (last backup verified: {{VERIFY_DATE}})
- CDN configured and serving assets correctly
- All IAM roles and permissions verified
- Infrastructure monitoring dashboards showing green
- Estimated cost reviewed and within budget
Owner: {{INFRA_OWNER}} | Due: T-7 days
T-5 Days: DNS Configuration
- DNS records created/updated in {{DNS_PROVIDER}}
  - {{DOMAIN}} → Load balancer (TTL set to {{LOW_TTL}} for easy rollback)
  - api.{{DOMAIN}} → API load balancer
  - www.{{DOMAIN}} → Redirect to {{DOMAIN}}
- DNS propagation verified (check from multiple regions)
- DNS failover routing configured (if applicable)
- Old DNS records documented (for rollback reference)
Owner: {{DNS_OWNER}} | Due: T-5 days
T-5 Days: SSL Certificates
- TLS certificates provisioned for all domains
  - {{DOMAIN}} ✅
  - *.{{DOMAIN}} ✅
- Certificate expiry > 90 days from go-live date
- HTTPS redirect configured (HTTP → HTTPS)
- HSTS header configured
- SSL Labs test: Grade A or better ({{SSL_TEST_LINK}})
Owner: {{SSL_OWNER}} | Due: T-5 days
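The "expiry > 90 days from go-live date" item can be checked mechanically. The sketch below assumes the expiry string is the value printed after `notAfter=` by `openssl x509 -noout -enddate` and that the go-live date is given as an ISO date; both helper names are hypothetical.

```python
from datetime import datetime, timezone

MIN_DAYS_AFTER_GO_LIVE = 90  # from the checklist item above


def days_past_go_live(not_after: str, go_live: str) -> int:
    """Whole days between the go-live date and certificate expiry.

    not_after: openssl text format, e.g. "Jun 15 12:00:00 2026 GMT".
    go_live:   ISO date such as "2026-03-01", treated as midnight UTC.
    """
    expiry = datetime.strptime(not_after, "%b %d %H:%M:%S %Y GMT")
    expiry = expiry.replace(tzinfo=timezone.utc)
    launch = datetime.fromisoformat(go_live).replace(tzinfo=timezone.utc)
    return (expiry - launch).days


def certificate_ok(not_after: str, go_live: str) -> bool:
    """True if the certificate outlives go-live by more than the minimum margin."""
    return days_past_go_live(not_after, go_live) > MIN_DAYS_AFTER_GO_LIVE
```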
T-3 Days: CDN Configuration
- CDN distribution pointing to production origin
- Cache behaviors configured per specification
- Static asset cache headers correct (1yr for fingerprinted assets)
- CDN WAF rules enabled and tested
- CDN purge command tested and documented
- CDN performance verified from target geographies
Owner: {{CDN_OWNER}} | Due: T-3 days
T-3 Days: Database Migration
- Final migration scripts reviewed and approved
- Migration tested on staging with production-sized data (timing recorded: {{MIGRATION_TIME}}min)
- Rollback/down migration tested
- Migration script idempotent (safe to run twice)
- Database backup taken immediately before migration window
- Data integrity checks script prepared (scripts/verify-migration.sh)
Owner: {{DB_OWNER}} | Due: T-3 days
T-2 Days: Feature Flags
- All new features behind feature flags
- Feature flags defaulting to OFF in production
- Flag rollout plan documented (which flags, in what order, with what criteria)
- Kill switch flags configured (disable any feature immediately if needed)
Owner: {{FF_OWNER}} | Due: T-2 days
T-2 Days: Third-Party Integrations
- {{INTEGRATION_1}} — live API keys configured in secrets manager
- {{INTEGRATION_2}} — live API keys configured in secrets manager
- Payment gateway: live mode activated and tested with real card (refunded)
- Email service: sending domain authenticated (SPF, DKIM, DMARC)
- All integrations tested in production with smoke tests
- Webhook URLs updated to production endpoints
Owner: {{INTEGRATION_OWNER}} | Due: T-2 days
T-1 Day: Monitoring & Alerting
- All alert rules deployed to production monitoring
- Alert routing configured — PagerDuty / on-call active
- Dashboards showing production data
- Log aggregation capturing production logs
- Distributed tracing enabled
- Synthetic monitoring configured (uptime checks every 1 min)
- Alert test fired and received by on-call
Owner: {{MONITORING_OWNER}} | Due: T-1 day
T-1 Day: Backup Verification
- Production backup job running on schedule
- Last backup restored to test environment and verified
- Backup storage has sufficient capacity (> {{BACKUP_DAYS}} days)
- Point-in-time recovery tested
Owner: {{BACKUP_OWNER}} | Due: T-1 day
T-1 Day: Legal / Compliance Sign-off
- Privacy policy published and linked
- Terms of service published and linked
- Cookie consent banner implemented (if required by jurisdiction)
- GDPR data processing inventory updated
- Security assessment completed and any findings resolved or accepted
- Legal sign-off obtained: {{LEGAL_SIGNOFF}} on {{DATE}}
Owner: {{LEGAL_OWNER}} | Due: T-1 day
T-0: Pre-Launch Final Checks (Within 2 Hours of Launch)
- Staging smoke tests passing (last run: {{TIMESTAMP}})
- All engineers briefed and available
- War room open and all participants joined
- Rollback procedure rehearsed mentally
- Monitoring dashboards open
- Status page updated: "Scheduled maintenance: {{TIME}} - {{END_TIME}}"
- Customer support briefed on launch features and potential issues
- Deployment script / CI pipeline ready to trigger
3. Launch Day Procedure (Hour by Hour)
H-0: Deployment Start
| Time | Action | Owner | Status | Notes |
|---|---|---|---|---|
| H+0:00 | Announce in war room: "Deployment started" | {{IC}} | | |
| H+0:00 | Take final pre-deploy database backup | {{DB_OWNER}} | | |
| H+0:05 | Enable maintenance mode (if applicable) | {{DEPLOY_OWNER}} | | |
| H+0:10 | Trigger production deployment pipeline | {{DEPLOY_OWNER}} | | Pipeline: {{PIPELINE_LINK}} |
| H+0:15 | Monitor deployment progress | {{TECH_LEAD}} | | |
H+0:15 → H+0:45: Database Migration Execution
| Time | Action | Owner | Status |
|---|---|---|---|
| H+0:15 | Confirm deployment artifact ready | {{DEPLOY_OWNER}} | |
| H+0:20 | Run database migrations: bash scripts/migrate-prod.sh | {{DB_OWNER}} | |
| H+0:25 | Verify migration completed: bash scripts/verify-migration.sh | {{DB_OWNER}} | |
| H+0:30 | Confirm new application instances healthy | {{TECH_LEAD}} | |
| H+0:40 | Deploy new application version to all instances | {{DEPLOY_OWNER}} | |
H+0:45 → H+1:00: DNS Cutover
| Time | Action | Owner | Status |
|---|---|---|---|
| H+0:45 | Point DNS to production load balancer | {{DNS_OWNER}} | |
| H+0:50 | Monitor DNS propagation | {{DNS_OWNER}} | |
| H+0:55 | Confirm HTTPS working from external network | {{TECH_LEAD}} | |
| H+1:00 | Disable maintenance mode | {{DEPLOY_OWNER}} | |
H+1:00 → H+1:30: Smoke Tests
| Time | Action | Owner | Status |
|---|---|---|---|
| H+1:00 | Run automated smoke tests: bash scripts/smoke-tests.sh production | {{QA_OWNER}} | |
| H+1:10 | Manual smoke test — critical user journey 1 | {{QA_OWNER}} | |
| H+1:15 | Manual smoke test — critical user journey 2 | {{QA_OWNER}} | |
| H+1:20 | Verify payment processing (test transaction) | {{QA_OWNER}} | |
| H+1:25 | Verify email delivery (test email) | {{QA_OWNER}} | |
| H+1:30 | All smoke tests PASS → proceed to monitoring | {{IC}} | |
H+1:30 → H+2:00: Monitoring Verification
| Time | Action | Owner | Status |
|---|---|---|---|
| H+1:30 | Verify error rate < {{ERROR_THRESHOLD}}% | {{TECH_LEAD}} | |
| H+1:35 | Verify P99 latency < {{P99_THRESHOLD}}ms | {{TECH_LEAD}} | |
| H+1:40 | Verify no unexpected spikes in DB CPU/connections | {{DB_OWNER}} | |
| H+1:50 | Begin enabling feature flags (per rollout plan) | {{FF_OWNER}} | |
| H+2:00 | Declare go-live successful | {{IC}} | |
4. Post-Launch Monitoring (T+1 to T+7)
Enhanced Monitoring Period
Duration: {{POST_LAUNCH_MONITORING}}h enhanced monitoring
Monitoring cadence: Every 30 min for the first 4h, then hourly until H+24, then standard monitoring
| Period | Check Frequency | Responsible |
|---|---|---|
| H+0 to H+4 | Every 30 min | On-call engineer |
| H+4 to H+24 | Every 60 min | On-call engineer |
| Day 2-7 | Standard monitoring | On-call rotation |
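The cadence table can be expressed as a small lookup, for example to drive a reminder script. A sketch only; `check_interval_minutes` is a hypothetical helper, and returning None for "standard monitoring" is an arbitrary choice.

```python
from typing import Optional


def check_interval_minutes(hours_since_launch: float) -> Optional[int]:
    """Minutes between manual checks, or None once standard monitoring applies."""
    if hours_since_launch < 0:
        raise ValueError("launch has not happened yet")
    if hours_since_launch < 4:
        return 30  # H+0 to H+4
    if hours_since_launch < 24:
        return 60  # H+4 to H+24
    return None  # Day 2-7: standard monitoring via the on-call rotation
```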
Metrics to watch during enhanced monitoring:
- Error rate (target: < {{ERROR_THRESHOLD}}%)
- P99 latency (target: < {{P99_THRESHOLD}}ms)
- DB connection pool utilization (target: < {{DB_POOL}}%)
- Cache hit rate (target: > {{CACHE_HIT}}%)
- Memory trend (should be stable, not growing)
Support Escalation Procedures
| Issue Type | First Contact | Escalation |
|---|---|---|
| User-facing errors | Customer support → Engineering | On-call engineer |
| Performance degradation | On-call engineer | Tech lead + Eng manager |
| Data issues | On-call engineer | DB owner + Engineering lead |
| Security concern | Security contact → CISO | Immediate escalation |
Performance Baseline Comparison
Compare post-launch metrics to pre-launch staging baseline:
| Metric | Staging Baseline | Production Actual | Delta | Status |
|---|---|---|---|---|
| P95 latency | {{STG_P95}}ms | TBD | TBD | TBD |
| Error rate | {{STG_ERR}}% | TBD | TBD | TBD |
| Throughput | {{STG_RPS}} rps | TBD | TBD | TBD |
5. Rollback Triggers & Procedure
Rollback Decision Criteria
Automatic rollback triggers:
- Smoke tests fail after deployment
- Error rate > {{ROLLBACK_ERROR_RATE}}% for {{ROLLBACK_DURATION}} consecutive minutes
- Database migration causes data integrity issues
Manual rollback triggers (decision by {{ROLLBACK_AUTHORITY}}):
- P99 latency > {{ROLLBACK_P99}}ms sustained for {{ROLLBACK_LATENCY_DURATION}} min
- Critical feature broken with no quick fix available
- Security vulnerability discovered in new release
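The automatic triggers above reduce to a simple predicate. The sketch below assumes one error-rate sample per minute (oldest first) and uses hypothetical parameter names standing in for {{ROLLBACK_ERROR_RATE}} and {{ROLLBACK_DURATION}}; the manual triggers stay with {{ROLLBACK_AUTHORITY}} and are deliberately not modeled.

```python
def should_auto_rollback(error_rates_pct: list,
                         threshold_pct: float,
                         duration_min: int,
                         smoke_tests_passed: bool = True,
                         data_integrity_ok: bool = True) -> bool:
    """True if any automatic rollback trigger fires.

    error_rates_pct holds one sample per minute, oldest first (assumption).
    """
    # Failed smoke tests or data integrity problems trigger rollback immediately.
    if not smoke_tests_passed or not data_integrity_ok:
        return True
    # Error rate must stay above the threshold for duration_min consecutive samples.
    consecutive = 0
    for rate in error_rates_pct:
        consecutive = consecutive + 1 if rate > threshold_pct else 0
        if consecutive >= duration_min:
            return True
    return False
```

A single dip below the threshold resets the counter, so a spiky but recovering error rate does not trigger a rollback under this rule.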
Rollback Procedure (Quick Reference)
- Announce in war room: "Initiating rollback"
- Update status page: "We are investigating an issue and may revert recent changes"
- Run: bash scripts/rollback.sh production (or trigger CI pipeline rollback)
- Monitor health checks — confirm previous version healthy
- If DB migration included: run down migration: bash scripts/migrate-down.sh production
- Verify all smoke tests pass on previous version
- Update status page: "Issue resolved, system restored"
- Notify stakeholders
Full rollback procedure: See rollback-plan.md
6. Communication Plan
Pre-Launch Communications
| Audience | Channel | When | Message |
|---|---|---|---|
| Internal team | Slack #launches | T-3 days | Launch schedule and plan |
| Customer support | Briefing doc + Slack | T-2 days | Features, FAQ, escalation path |
| Existing users | Email / in-app banner | T-1 day | "Exciting updates coming" |
| Status page subscribers | Status page | T-4 hours | Scheduled maintenance notification |
Launch Day Communications
| Audience | Channel | When | Message |
|---|---|---|---|
| Status page | status page | T-0 | "Scheduled deployment in progress" |
| Internal | Slack #launches | At success | "🚀 {{PROJECT}} is live!" |
| Users | Email / in-app | H+1 after success | Launch announcement |
| Status page | status page | H+1 | "Deployment complete — all systems normal" |
7. Stakeholder Notification Timeline
| Milestone | Notify | Channel | Owner |
|---|---|---|---|
| Deployment started | Engineering team | Slack war room | {{IC}} |
| Smoke tests pass | Engineering + Product | Slack | {{IC}} |
| Go-live declared | All stakeholders | Email + Slack | {{COMMS_LEAD}} |
| Rollback initiated | All stakeholders + Management | Immediate call + Slack | {{IC}} |
Related Documents
- Deployment Checklist
- Rollback Plan
- Operational Runbook
- Monitoring & Observability
- Disaster Recovery Plan
Approval
| Role | Name | Date | Signature |
|---|---|---|---|
| Author | | | |
| Reviewer | | | |
| Approver | | | |
Operational Runbook
Project: {{PROJECT_NAME}}
Version: {{VERSION}}
Date: {{DATE}}
Author: {{AUTHOR}}
Status: Draft | In Review | Approved
Reviewers: {{REVIEWERS}}
Document History
| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | {{DATE}} | {{AUTHOR}} | Initial draft |
1. Service Overview
Service: {{PROJECT_NAME}}
Purpose: {{SERVICE_PURPOSE}}
Technology stack: {{STACK}}
Architecture reference: Deployment Architecture
Service URLs:
| Environment | URL | Health Check |
|---|---|---|
| Production | {{PROD_URL}} | {{PROD_URL}}/health |
| Staging | {{STG_URL}} | {{STG_URL}}/health |
Key dashboards:
- System overview: {{DASHBOARD_LINK}}
- Service metrics: {{SERVICE_DASHBOARD_LINK}}
- Logs: {{LOG_DASHBOARD_LINK}}
2. Common Operational Tasks
2.1 Service Restart Procedure
When to use: Application unresponsive, hanging workers, suspected deadlock
Steps:
Option A — Rolling restart (no downtime):
# AWS ECS
aws ecs update-service --cluster {{CLUSTER}} --service {{SERVICE}} --force-new-deployment
# Kubernetes
kubectl rollout restart deployment/{{DEPLOYMENT}} -n {{NAMESPACE}}
Option B — Emergency restart (brief downtime, use only if rolling restart fails):
# Stop all instances
{{STOP_COMMAND}}
# Wait for drain
sleep 30
# Start fresh
{{START_COMMAND}}
Verify:
# Check all instances healthy
{{HEALTH_CHECK_COMMAND}}
# Check for errors post-restart
{{LOG_CHECK_COMMAND}}
Expected restart time: {{RESTART_TIME}} minutes
Alert expected: Service restart will trigger deployment alert; acknowledge it in PagerDuty
2.2 Log Retrieval & Analysis
Centralized logs: {{LOG_URL}}
Quick log retrieval:
# Last 100 error lines
{{LOG_TOOL}} --filter "level=error" --since "1h" --service {{SERVICE}}
# Logs for a specific user
{{LOG_TOOL}} --filter "user_id={{USER_ID}}" --since "24h"
# Logs for a specific request
{{LOG_TOOL}} --filter "request_id={{REQUEST_ID}}"
# Database slow query logs
{{DB_LOG_COMMAND}}
Log format reference: See Monitoring & Observability
2.3 Database Maintenance
Connection count check:
SELECT count(*) as connections, state FROM pg_stat_activity GROUP BY state;
Kill idle connections:
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
AND state_change < now() - interval '5 minutes'
AND pid <> pg_backend_pid();
Running queries (detect long-running):
SELECT pid, now() - query_start AS duration, query, state
FROM pg_stat_activity
WHERE (now() - query_start) > interval '1 minute'
AND state != 'idle';
Vacuum / analyze (if table bloat suspected):
VACUUM ANALYZE {{TABLE_NAME}};
Check replication lag:
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
2.4 Cache Clearing / Warming
Clear all cache (use with caution — may spike DB load):
{{CACHE_FLUSH_COMMAND}}
Clear specific key pattern:
{{CACHE_DELETE_PATTERN_COMMAND}}
Check cache hit rate:
{{CACHE_STATS_COMMAND}}
Warm cache after clearing:
# Run cache warming script
bash scripts/warm-cache.sh {{ENVIRONMENT}}
# Or trigger warming job
{{WARM_CACHE_JOB_COMMAND}}
Expected DB load spike after cache clear: {{CACHE_CLEAR_IMPACT}} minutes of elevated load
2.5 Certificate Renewal
Automated renewal: Configured via {{CERT_TOOL}} (Let's Encrypt / ACM)
Auto-renewal trigger: 30 days before expiry
Manual renewal (if auto-renewal fails):
# Check expiry
echo | openssl s_client -connect {{DOMAIN}}:443 2>/dev/null | openssl x509 -noout -dates
# Manual renewal
{{CERT_RENEW_COMMAND}}
# Verify
{{CERT_VERIFY_COMMAND}}
Verify renewal alert is working:
- Alert configured: "Certificate expiring in < 30 days" → {{ALERT_CHANNEL}}
- Test certificate: curl -I https://{{DOMAIN}} and check Strict-Transport-Security header
2.6 Scaling Up / Down
Scale up (increase capacity):
# AWS ECS
aws ecs update-service --cluster {{CLUSTER}} --service {{SERVICE}} --desired-count {{COUNT}}
# Kubernetes
kubectl scale deployment/{{DEPLOYMENT}} --replicas={{COUNT}} -n {{NAMESPACE}}
Verify scale-out:
# Check instance count
{{INSTANCE_COUNT_COMMAND}}
# Confirm health
{{HEALTH_CHECK_COMMAND}}
Scale down (reduce capacity — use cautiously):
- Do NOT scale below {{MIN_INSTANCES}} instances
- Scale down during off-peak hours only ({{OFF_PEAK_HOURS}})
- Monitor for 10 minutes after scaling down to confirm stability
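The scale-down rules above can be enforced with a small guard before the scaling command runs. A sketch under assumptions: hours are whole numbers in 0-23, the off-peak window may wrap past midnight, and the function and parameter names are hypothetical stand-ins for {{MIN_INSTANCES}} and {{OFF_PEAK_HOURS}}.

```python
def in_off_peak(hour: int, start: int, end: int) -> bool:
    """True if hour falls in [start, end), allowing windows that wrap past midnight."""
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end


def can_scale_down(target: int, min_instances: int, hour: int,
                   off_peak_start: int, off_peak_end: int) -> tuple:
    """Return (allowed, reason) for a requested scale-down."""
    if target < min_instances:
        return False, f"target {target} is below the minimum of {min_instances}"
    if not in_off_peak(hour, off_peak_start, off_peak_end):
        return False, "outside off-peak hours"
    return True, "ok"
```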
3. Troubleshooting Playbooks
3.1 High CPU Usage
Symptoms: CPU alert fires, slow responses, possible OOM
- Identify the source:
  # Top processes by CPU
  {{CPU_TOP_COMMAND}}
- Check for: runaway loops, large queries being processed, missing cache causing recalculation
- Check for recently deployed code — did CPU spike after a deploy? → Consider rollback
- Check queue depth — backed-up job queue causes worker CPU spike
- If single instance: restart that instance ({{RESTART_SINGLE_COMMAND}})
- If all instances: scale up immediately, then investigate root cause
- Escalate if: CPU > {{CPU_ESCALATE}}% for > {{ESCALATE_DURATION}} min after scaling
3.2 Memory Leaks
Symptoms: Slowly increasing memory, eventual OOM kill / restart loop
- Check memory trend in monitoring dashboard — linear increase over hours = leak
- Identify the leak:
  - Enable heap dump: {{HEAP_DUMP_COMMAND}}
  - Profile with: {{PROFILER}}
- Short-term mitigation: Schedule rolling restarts every {{RESTART_INTERVAL}}h
  {{SCHEDULED_RESTART_COMMAND}}
- Create ticket with heap dump attached — requires developer investigation
- Escalate if: Restart cycle < {{MIN_RESTART_INTERVAL}}h (memory fills too fast)
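"Linear increase over hours = leak" can be quantified with a least-squares slope over memory samples, which also gives a rough time until the memory limit is hit. A minimal sketch: samples are hypothetical (hours, MB) pairs exported from the monitoring dashboard, and both function names are made up.

```python
from typing import Optional


def memory_slope_mb_per_hour(samples: list) -> float:
    """Least-squares slope of memory (MB) over time (hours)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    numerator = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("samples must span more than one point in time")
    return numerator / denominator


def hours_until_limit(samples: list, limit_mb: float) -> Optional[float]:
    """Estimated hours until limit_mb is reached, or None if memory is not growing."""
    slope = memory_slope_mb_per_hour(samples)
    if slope <= 0:
        return None
    latest_mb = samples[-1][1]
    return max(0.0, (limit_mb - latest_mb) / slope)
```

If the estimate drops below {{MIN_RESTART_INTERVAL}}h, the scheduled-restart mitigation no longer keeps up and the escalation rule applies.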
3.3 Slow Database Queries
Symptoms: High P99 latency, DB CPU spike, timeouts in logs
- Find slow queries:
  SELECT query, calls, mean_exec_time, max_exec_time
  FROM pg_stat_statements
  ORDER BY mean_exec_time DESC
  LIMIT 20;
- Check for missing indexes: Look for sequential scans on large tables
- Check for blocking queries:
  SELECT blocking.pid AS blocking_pid, blocking.query AS blocking_query,
         blocked.pid AS blocked_pid, blocked.query AS blocked_query
  FROM pg_stat_activity blocked
  JOIN pg_stat_activity blocking ON blocking.pid = ANY(pg_blocking_pids(blocked.pid));
- Kill blocking query if safe:
  SELECT pg_cancel_backend({{PID}});
  -- If cancel doesn't work:
  SELECT pg_terminate_backend({{PID}});
- Create ticket — developer must optimize the query
3.4 Service Connectivity Issues
Symptoms: Connectivity errors between services, 502/503 errors
- Check health endpoints:
  curl -I {{SERVICE_URL}}/health
- Check network security groups / firewall rules: was anything changed recently?
- Check service discovery: is DNS resolving correctly?
  nslookup {{SERVICE_INTERNAL_DNS}}
- Check if service is running:
  {{SERVICE_STATUS_COMMAND}}
- Check logs for connection errors:
  {{CONNECTIVITY_LOG_COMMAND}}
3.5 High Error Rates
Symptoms: Error rate alert, user complaints, 5xx in logs
- Identify error type: {{LOG_ERROR_COMMAND}} (what errors, what services, what endpoints?)
- Check if correlated with: recent deployment, external service outage, traffic spike
- Check external service status pages:
- {{SERVICE_1}} status: {{STATUS_PAGE_1}}
- {{SERVICE_2}} status: {{STATUS_PAGE_2}}
- If recent deployment: Consider rollback if errors affecting > {{ROLLBACK_ERROR_THRESHOLD}}% of requests
- If external service down: Check circuit breaker status, enable fallback
- Escalate if: Error rate > {{ESCALATE_ERROR_RATE}}% for > {{ESCALATE_DURATION}} min
3.6 Disk Space Issues
Symptoms: Disk space alert, application errors writing files
- Check disk usage:
  df -h
  du -sh /var/log/* | sort -rh | head -10
- Quick wins:
  # Rotate and compress logs
  logrotate -f /etc/logrotate.conf
  # Clear old Docker images
  docker image prune -a --filter "until=24h"
  # Clear /tmp
  find /tmp -mtime +7 -delete
- If database disk: Check for table bloat, dead tuples, WAL accumulation
  SELECT pg_size_pretty(pg_database_size('{{DB_NAME}}'));
- Escalate if: Disk > {{DISK_ESCALATE}}% and cannot free space quickly
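The escalation threshold above can also be evaluated from a script. A minimal sketch using the standard library's `shutil.disk_usage`; the helper names and the threshold argument (standing in for `{{DISK_ESCALATE}}`) are assumptions, and the percentage is computed as used/total, which can differ slightly from the `Use%` column that `df` reports.

```python
import shutil


def percent_used(total_bytes: int, used_bytes: int) -> float:
    """Used space as a percentage of total capacity."""
    return 100.0 * used_bytes / total_bytes


def should_escalate(path: str, threshold_pct: float) -> bool:
    """True if the filesystem holding `path` is above the threshold (hypothetical helper)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return percent_used(usage.total, usage.used) > threshold_pct
```

For example, `should_escalate("/var/log", 90.0)` would check the filesystem that holds the log directory.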
4. Health Check Endpoints
| Endpoint | Method | Expected Response | What It Checks |
|---|---|---|---|
| {{BASE_URL}}/health | GET | HTTP 200 {"status":"ok"} | Application running |
| {{BASE_URL}}/health/ready | GET | HTTP 200 {"status":"ready"} | App + DB + Cache connected |
| {{BASE_URL}}/health/live | GET | HTTP 200 {"status":"alive"} | App process alive |
| {{BASE_URL}}/health/db | GET | HTTP 200 {"status":"ok","latency_ms":X} | Database reachable |
| {{BASE_URL}}/health/cache | GET | HTTP 200 {"status":"ok"} | Redis reachable |
Health check from load balancer: {{HEALTH_CHECK_PATH}} every {{LB_INTERVAL}}s
Unhealthy threshold: {{UNHEALTHY_COUNT}} consecutive failures
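The unhealthy threshold counts consecutive failures, so a single passing check resets the count. A small sketch of that rule, assuming check results are kept as a list of booleans with the oldest first; both function names are hypothetical:

```python
def trailing_failures(results):
    """Count failed checks at the end of the history (True = check passed)."""
    count = 0
    for ok in reversed(results):
        if ok:
            break
        count += 1
    return count


def is_unhealthy(results, unhealthy_count):
    """True once the most recent `unhealthy_count` checks have all failed."""
    return trailing_failures(results) >= unhealthy_count
```

With a threshold of 3, the history `[True, False, False, False]` marks the instance unhealthy, while `[False, False, True, False]` does not.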
5. Alert Response Procedures
| Alert | Immediate Action | Runbook Section |
|---|---|---|
| HighErrorRate | Check logs, identify error type, assess scope | 3.5 High Error Rates |
| SlowP99 | Check DB slow queries, recent deploys | 3.3 Slow DB Queries |
| ServiceDown | Restart service, check logs | 2.1 Service Restart |
| HighCPU | Scale up, identify source | 3.1 High CPU |
| DiskAlmostFull | Clear logs/tmp, escalate if > 90% | 3.6 Disk Space |
| DBReplicationLag | Check replication, network, disk on replica | DB section |
| CertificateExpiring | Trigger manual renewal | 2.5 Certificate Renewal |
6. Escalation Matrix
| Situation | First Contact | Escalation | Ultimate Escalation |
|---|---|---|---|
| Service down | On-call engineer | Tech lead | Engineering manager |
| Data loss / corruption | On-call + Tech lead | CTO | CTO |
| Security incident | Security contact | CISO | CEO |
| Payment system down | On-call + Payment owner | Stripe/payment provider support | Engineering manager |
Emergency contacts:
| Role | Name | Phone | Slack |
|---|---|---|---|
| On-call (primary) | {{PRIMARY}} | {{PHONE}} | {{SLACK}} |
| On-call (backup) | {{BACKUP}} | {{PHONE}} | {{SLACK}} |
| Tech Lead | {{TECH_LEAD}} | {{PHONE}} | {{SLACK}} |
| Engineering Manager | {{ENG_MGR}} | {{PHONE}} | {{SLACK}} |
7. On-Call Handoff Procedure
Handoff cadence: {{HANDOFF_CADENCE}} Handoff time: {{HANDOFF_TIME}}
Outgoing on-call must document:
- Any open incidents or ongoing issues
- Any monitoring anomalies (elevated error rates, slow queries not yet resolved)
- Any upcoming events that may affect the system (marketing campaigns, scheduled maintenance)
- Any temporary mitigations in place that need permanent fixes
- Context on any unusual alerts that fired and were noise
Handoff document template: {{HANDOFF_TEMPLATE_LINK}}
8. Maintenance Window Procedure
Maintenance window schedule: {{MAINTENANCE_WINDOW}} (lowest traffic period)
Pre-maintenance:
- Announce in Slack #ops: "Maintenance window {{DATE}} {{TIME}}-{{END_TIME}}"
- Update status page: "Scheduled maintenance" with details
- Notify impacted customers if downtime expected > {{DOWNTIME_NOTIFY_THRESHOLD}} minutes
- Confirm rollback plan is ready
During maintenance:
- Enable maintenance mode (if applicable): {{MAINTENANCE_MODE_CMD}}
- Execute maintenance tasks per the specific runbook for the task
- Run smoke tests after each major step
- Document every action taken with timestamps
Post-maintenance:
- Disable maintenance mode: {{DISABLE_MAINTENANCE_CMD}}
- Run full smoke test suite
- Monitor for 30 minutes
- Update status page: "Maintenance complete, all systems normal"
- Post-maintenance report in #ops Slack channel
Related Documents
Approval
| Role | Name | Date | Signature |
|---|---|---|---|
| Author | | | |
| Reviewer | | | |
| Approver | | | |
Incident Report
Incident Report
Project: {{PROJECT_NAME}} | Version: {{VERSION}} | Date: {{DATE}} | Author: {{AUTHOR}} | Status: Draft / In Review / Approved | Reviewers: {{REVIEWERS}}
Document History
| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | {{DATE}} | {{AUTHOR}} | Initial draft |
1. Incident Metadata
| Field | Value |
|---|---|
| Incident ID | INC-{{YYYY}}-{{SEQ}} |
| Severity | P{{SEVERITY}} |
| Status | {{STATUS}} |
| Incident Commander | {{IC}} |
| Technical Lead | {{TECH_LEAD}} |
| Communications Lead | {{COMMS_LEAD}} |
| Declared at | {{START_TIME}} {{TIMEZONE}} |
| Resolved at | {{END_TIME}} {{TIMEZONE}} |
| Total duration | {{DURATION}} |
| Affected service(s) | {{SERVICES}} |
| Environment | Production / Staging |
2. Executive Summary
{{EXECUTIVE_SUMMARY}}
Example: "On {{DATE}}, a database connection pool exhaustion caused the {{SERVICE}} API to return 503 errors for approximately 47 minutes, affecting {{AFFECTED_COUNT}} users and resulting in an estimated {{REVENUE_IMPACT}} in lost transactions. The root cause was a code change in the v{{VERSION}} deployment that introduced N+1 queries under high load."
3. Detection
Detected by: {{DETECTION_METHOD}} | Detected at: {{DETECTION_TIME}} | Lag from start to detection: {{DETECTION_LAG}} minutes | Detecting system: {{DETECTING_SYSTEM}}
Alerting effectiveness:
- Alert fired within the expected window (< {{ALERT_SLA}} minutes)
- Alert delivered to on-call without delay
- Alert contained sufficient context to begin investigation
Improvements to detection identified:
- {{DETECTION_IMPROVEMENT_1}}
4. Detailed Timeline
Timezone: All times in {{TIMEZONE}}
| Time | Event | Actor | Notes |
|---|---|---|---|
| {{TIME}} | {{EVENT_1}} | {{ACTOR}} | |
| {{TIME}} | {{EVENT_2}} | System | Alert ID: {{ALERT_ID}} |
| {{TIME}} | {{EVENT_3}} | {{ENGINEER}} | |
| {{TIME}} | {{EVENT_4}} | {{IC}} | |
| {{TIME}} | {{EVENT_5}} | {{ENGINEER}} | |
| {{TIME}} | {{EVENT_6}} | {{ENGINEER}} | |
| {{TIME}} | {{EVENT_7}} | System | |
| {{TIME}} | {{EVENT_8}} | {{IC}} | |
5. Impact Assessment
Users Affected
| Metric | Value |
|---|---|
| Total users affected | {{USER_COUNT}} |
| % of total user base | {{USER_PERCENT}}% |
| Geography affected | {{GEOGRAPHY}} |
| User tier affected | {{USER_TIER}} |
Services Affected
| Service | Impact Type | Severity | Duration |
|---|---|---|---|
| {{SERVICE_1}} | {{IMPACT_TYPE}} | {{SEV}} | {{DURATION}} |
| {{SERVICE_2}} | {{IMPACT_TYPE}} | {{SEV}} | {{DURATION}} |
Data Impact
| Type | Assessment |
|---|---|
| Data loss | {{DATA_LOSS}} |
| Data corruption | {{DATA_CORRUPTION}} |
| Data exposure | {{DATA_EXPOSURE}} |
| Verification method | {{VERIFICATION}} |
Financial Impact
| Category | Amount | Notes |
|---|---|---|
| Lost transactions | ${{AMOUNT}} | {{TRANSACTION_COUNT}} failed transactions |
| SLA credits | ${{AMOUNT}} | Per SLA contract |
| Operational cost | ${{AMOUNT}} | Engineering hours to resolve |
| Total estimated | ${{TOTAL}} | |
SLA Breach Assessment
| SLA Metric | Target | Actual | Breach |
|---|---|---|---|
| Uptime | {{UPTIME_SLA}}% | {{ACTUAL_UPTIME}}% | {{BREACH}} |
| Response time (P99) | < {{P99_SLA}}ms | {{P99_ACTUAL}}ms | {{BREACH}} |
| MTTR | < {{MTTR_SLA}} | {{MTTR_ACTUAL}} | {{BREACH}} |
6. Root Cause Analysis
5 Whys
| Why # | Question | Answer |
|---|---|---|
| Why 1 | Why did users see errors? | {{ANSWER_1}} |
| Why 2 | Why was the API returning 503? | {{ANSWER_2}} |
| Why 3 | Why was the connection pool exhausted? | {{ANSWER_3}} |
| Why 4 | Why was the N+1 query introduced? | {{ANSWER_4}} |
| Why 5 | Why did code review miss it? | {{ANSWER_5}} |
Root cause: {{ROOT_CAUSE}}
Contributing Factors
- {{FACTOR_1}}
- {{FACTOR_2}}
- {{FACTOR_3}}
Trigger Event
What triggered this specific incident now: {{TRIGGER}}
7. Resolution Steps
| Step | Time | Action | Result |
|---|---|---|---|
| 1 | {{TIME}} | {{ACTION_1}} | {{RESULT_1}} |
| 2 | {{TIME}} | {{ACTION_2}} | {{RESULT_2}} |
| 3 | {{TIME}} | {{ACTION_3}} | {{RESULT_3}} |
Resolution commands (for runbook):
# {{RESOLUTION_DESCRIPTION}}
{{RESOLUTION_COMMAND}}
8. What Went Well
- {{WENT_WELL_1}}
- {{WENT_WELL_2}}
- {{WENT_WELL_3}}
9. What Went Wrong
- {{WENT_WRONG_1}}
- {{WENT_WRONG_2}}
- {{WENT_WRONG_3}}
10. Action Items
| # | Action | Owner | Due Date | Priority | Status |
|---|---|---|---|---|---|
| 1 | {{ACTION_1}} | {{OWNER}} | {{DUE}} | High | Open |
| 2 | {{ACTION_2}} | {{OWNER}} | {{DUE}} | High | Open |
| 3 | {{ACTION_3}} | {{OWNER}} | {{DUE}} | Medium | Open |
| 4 | {{ACTION_4}} | {{OWNER}} | {{DUE}} | High | Open |
| 5 | {{ACTION_5}} | {{OWNER}} | {{DUE}} | Low | Open |
11. Lessons Learned
- {{LESSON_1}}
- {{LESSON_2}}
- {{LESSON_3}}
12. Related Incidents
| Incident ID | Date | Similarity | Resolved |
|---|---|---|---|
| INC-{{ID}} | {{DATE}} | {{DESCRIPTION}} | Yes / No |
13. Communication Log
| Time | Channel | Message Summary | Audience | Sent By |
|---|---|---|---|---|
| {{TIME}} | Status page | "Investigating reports of elevated errors" | All users | {{SENDER}} |
| {{TIME}} | Status page | "Identified root cause, applying fix" | All users | {{SENDER}} |
| {{TIME}} | Status page | "Incident resolved, all systems normal" | All users | {{SENDER}} |
| {{TIME}} | | Customer notification for SLA breach | Affected customers | {{SENDER}} |
Related Documents
Approval
| Role | Name | Date | Signature |
|---|---|---|---|
| Author | | | |
| Reviewer | | | |
| Approver | | | |
Post-Mortem
Post-Mortem
Project: {{PROJECT_NAME}} | Version: {{VERSION}} | Date: {{DATE}} | Author: {{AUTHOR}} | Status: Draft / In Review / Approved | Reviewers: {{REVIEWERS}}
Document History
| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | {{DATE}} | {{AUTHOR}} | Initial draft |
Blameless Culture Statement
This post-mortem is conducted in a blameless spirit. Our goal is to understand how and why the incident occurred — not to assign fault to individuals. People make the best decisions they can with the information and tools available at the time. When things go wrong, we look for systemic improvements that make the right action easier and the wrong action harder for everyone.
1. Incident Reference & Metadata
| Field | Value |
|---|---|
| Incident ID | INC-{{YYYY}}-{{SEQ}} |
| Severity | P{{SEVERITY}} |
| Incident Report | INC-{{YYYY}}-{{SEQ}} |
| Post-Mortem Facilitator | {{FACILITATOR}} |
| Post-Mortem Date | {{PM_DATE}} |
| Attendees | {{ATTENDEES}} |
| Status | Draft / In Review / Final |
2. Executive Summary
{{EXECUTIVE_SUMMARY}}
Example: "A database index was dropped during a migration on {{DATE}}, causing query performance to degrade by 50× under load. This resulted in a 1h 23min degraded service period affecting {{USERS}} users. We have restored the index, added migration validation tooling, and created safeguards to prevent similar incidents."
3. Impact Summary
| Metric | Value |
|---|---|
| Total duration | {{DURATION}} (detected at {{DETECTED}}, resolved at {{RESOLVED}}) |
| Users affected | {{USER_COUNT}} ({{USER_PERCENT}}% of user base) |
| Requests affected | {{REQUEST_COUNT}} ({{REQUEST_PERCENT}}% error rate during incident) |
| Estimated revenue impact | ${{REVENUE}} |
| SLA breach | {{SLA_BREACH}} |
| SLA credits owed | ${{CREDITS}} |
4. Detailed Timeline
timeline
title Incident Timeline
{{TIME_1}} : {{EVENT_1}}
{{TIME_2}} : {{EVENT_2}}
{{TIME_3}} : {{EVENT_3}}
{{TIME_4}} : {{EVENT_4}}
{{TIME_5}} : {{EVENT_5}}
| Time | Event | MTTD/MTTR Marker |
|---|---|---|
| {{T1}} | {{EVENT}} | ← Incident start |
| {{T2}} | {{EVENT}} | |
| {{T3}} | {{EVENT}} | ← Detection (MTTD = T3 - T1) |
| {{T4}} | {{EVENT}} | |
| {{T5}} | {{EVENT}} | |
| {{T6}} | {{EVENT}} | |
| {{T7}} | {{EVENT}} | |
| {{T8}} | {{EVENT}} | ← Resolved (MTTR = T8 - T1) |
MTTD (Mean Time to Detect): {{MTTD}} minutes
MTTR (Mean Time to Resolve): {{MTTR}} minutes
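The markers in the timeline table define MTTD as T3 - T1 and MTTR as T8 - T1. As a worked illustration, the sketch below computes both in minutes from timestamp strings. The timestamps and the `minutes_between` helper are hypothetical, and the `YYYY-MM-DD HH:MM` format is an assumption about how timeline entries are recorded.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M"  # assumed timeline format


def minutes_between(start: str, end: str) -> float:
    """Elapsed minutes from `start` to `end`."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta.total_seconds() / 60


# Hypothetical example values for T1, T3, and T8
t1_incident_start = "2026-01-01 10:00"
t3_detection = "2026-01-01 10:12"
t8_resolved = "2026-01-01 11:23"

mttd = minutes_between(t1_incident_start, t3_detection)  # T3 - T1 -> 12.0
mttr = minutes_between(t1_incident_start, t8_resolved)   # T8 - T1 -> 83.0
```

All times must share one timezone, as the timeline section requires, or the differences will be off by the offset.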
5. Root Cause Analysis
5.1 5 Whys Analysis
| Why # | Question | Answer |
|---|---|---|
| Why 1 | Why did users experience {{SYMPTOM}}? | {{WHY_1}} |
| Why 2 | Why did {{WHY_1_ANSWER}} happen? | {{WHY_2}} |
| Why 3 | Why did {{WHY_2_ANSWER}} happen? | {{WHY_3}} |
| Why 4 | Why did {{WHY_3_ANSWER}} happen? | {{WHY_4}} |
| Why 5 | Why did {{WHY_4_ANSWER}} happen? | {{WHY_5}} |
Root cause: {{ROOT_CAUSE}}
5.2 Contributing Factors
| Factor | Type | Action Required |
|---|---|---|
| {{FACTOR_1}} | Technical / Process / Human | Yes / No |
| {{FACTOR_2}} | Technical / Process / Human | Yes / No |
| {{FACTOR_3}} | Technical / Process / Human | Yes / No |
5.3 Trigger Event
The specific trigger for this incident: {{TRIGGER}}
6. What Went Well
- {{CATEGORY_1}}: {{DESCRIPTION}}
- {{CATEGORY_2}}: {{DESCRIPTION}}
- {{CATEGORY_3}}: {{DESCRIPTION}}
7. What Went Wrong
- {{CATEGORY_1}}: {{DESCRIPTION}}
- {{CATEGORY_2}}: {{DESCRIPTION}}
- {{CATEGORY_3}}: {{DESCRIPTION}}
8. Where We Got Lucky
- {{LUCKY_1}}
- {{LUCKY_2}}
- {{LUCKY_3}}
9. Action Items
Short-Term Fixes (This Sprint)
| # | Action | Owner | Due | Priority | Ticket |
|---|---|---|---|---|---|
| 1 | {{SHORT_TERM_1}} | {{OWNER}} | {{DATE}} | Critical | {{TICKET}} |
| 2 | {{SHORT_TERM_2}} | {{OWNER}} | {{DATE}} | High | {{TICKET}} |
| 3 | {{SHORT_TERM_3}} | {{OWNER}} | {{DATE}} | Medium | {{TICKET}} |
Long-Term Improvements (Next Quarter)
| # | Action | Owner | Due | Priority | Ticket |
|---|---|---|---|---|---|
| 1 | {{LONG_TERM_1}} | {{OWNER}} | {{DATE}} | High | {{TICKET}} |
| 2 | {{LONG_TERM_2}} | {{OWNER}} | {{DATE}} | Medium | {{TICKET}} |
Process Changes
| # | Change | Owner | Implementation Date |
|---|---|---|---|
| 1 | {{PROCESS_1}} | {{OWNER}} | {{DATE}} |
| 2 | {{PROCESS_2}} | {{OWNER}} | {{DATE}} |
10. Follow-Up Tracking
Follow-up review date: {{FOLLOWUP_DATE}} (4 weeks after incident) Follow-up owner: {{FOLLOWUP_OWNER}}
| Action Item | Expected Completion | Verified Complete | Effective |
|---|---|---|---|
| {{ACTION_1}} | {{DATE}} | Yes / No | Yes / No / TBD |
| {{ACTION_2}} | {{DATE}} | Yes / No | Yes / No / TBD |
11. Recurrence Prevention
Before this incident: {{BEFORE_STATE}}
After implementing action items: {{AFTER_STATE}}
Confidence in prevention: {{CONFIDENCE}} / 10 Residual risk: {{RESIDUAL_RISK}}
12. Review & Sign-Off
Post-mortem presented at: {{MEETING}} on {{MEETING_DATE}} Meeting recording: {{RECORDING_LINK}} Meeting notes: {{NOTES_LINK}}
Related Documents
Approval
| Role | Name | Date | Signature |
|---|---|---|---|
| Author | | | |
| Reviewer | | | |
| Approver | | | |
SLA Report
SLA Report
Project: {{PROJECT_NAME}} | Version: {{VERSION}} | Date: {{DATE}} | Author: {{AUTHOR}} | Status: Draft / In Review / Approved | Reviewers: {{REVIEWERS}}
Document History
| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | {{DATE}} | {{AUTHOR}} | Initial draft |
1. Reporting Period
| Field | Value |
|---|---|
| Period | {{MONTH}} {{YEAR}} |
| From | {{START_DATE}} 00:00:00 UTC |
| To | {{END_DATE}} 23:59:59 UTC |
| Report Generated | {{REPORT_DATE}} |
| Generated By | {{AUTHOR}} |
2. SLA Summary Table
| Metric | SLA Target | Actual | Status | Notes |
|---|---|---|---|---|
| Availability (uptime) | ≥ {{AVAIL_SLA}}% | {{AVAIL_ACTUAL}}% | ✅ Pass / ❌ Breach | |
| P95 Response Time | ≤ {{P95_SLA}}ms | {{P95_ACTUAL}}ms | ✅ Pass / ❌ Breach | |
| P99 Response Time | ≤ {{P99_SLA}}ms | {{P99_ACTUAL}}ms | ✅ Pass / ❌ Breach | |
| Error Rate | ≤ {{ERR_SLA}}% | {{ERR_ACTUAL}}% | ✅ Pass / ❌ Breach | |
| MTTR (P1 incidents) | ≤ {{MTTR_SLA}} | {{MTTR_ACTUAL}} | ✅ Pass / ❌ Breach | |
| MTTD (alert detection) | ≤ {{MTTD_SLA}} | {{MTTD_ACTUAL}} | ✅ Pass / ❌ Breach | |
| Scheduled maintenance | ≤ {{MAINT_SLA}}h/mo | {{MAINT_ACTUAL}}h | ✅ Pass / ❌ Breach | |
Overall SLA compliance this period: {{OVERALL_STATUS}}
3. Availability Report
3.1 Uptime Percentage
| Service | Total Minutes | Downtime Minutes | Uptime Minutes | Uptime % |
|---|---|---|---|---|
| {{SERVICE_1}} | {{TOTAL_MIN}} | {{DOWN_MIN}} | {{UP_MIN}} | {{UP_PCT}}% |
| {{SERVICE_2}} | {{TOTAL_MIN}} | {{DOWN_MIN}} | {{UP_MIN}} | {{UP_PCT}}% |
| Aggregate | | | | {{AGG_UPTIME}}% |
Note: Only unplanned downtime counts against SLA uptime calculations. See Section 3.3 for maintenance exclusions.
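To make the uptime column concrete, here is a minimal sketch of the calculation in which only unplanned downtime is subtracted. The function name is hypothetical, and leaving the total-minutes denominator unchanged when maintenance is excluded is an assumption; some contracts also remove maintenance minutes from the total.

```python
def uptime_pct(total_minutes: float, unplanned_downtime_minutes: float) -> float:
    """Uptime percentage counting only unplanned downtime against the SLA."""
    return 100.0 * (total_minutes - unplanned_downtime_minutes) / total_minutes


# A 30-day period has 30 * 24 * 60 = 43,200 minutes.
# 43.2 minutes of unplanned downtime corresponds to roughly 99.9% uptime.
example = uptime_pct(43_200, 43.2)
```

Scheduled maintenance minutes are simply not passed in as downtime, which matches the exclusion described in the note above.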
3.2 Downtime Incidents
| Incident ID | Start | End | Duration | Service | Cause | SLA Counted |
|---|---|---|---|---|---|---|
| INC-{{ID}} | {{START}} | {{END}} | {{DURATION}}min | {{SERVICE}} | {{CAUSE}} | Yes / Excluded |
Total unplanned downtime: {{TOTAL_DOWNTIME}} minutes Downtime excluded (scheduled maintenance): {{EXCL_DOWNTIME}} minutes
3.3 Maintenance Windows
| Date | Duration | Service Affected | Pre-announced | Purpose |
|---|---|---|---|---|
| {{DATE}} | {{DURATION}}min | {{SERVICE}} | Yes ({{DAYS}} days advance notice) | {{PURPOSE}} |
4. Performance Report
4.1 Response Time
| Service / Endpoint | P50 | P90 | P95 | P99 | Max | SLA (P95) | Status |
|---|---|---|---|---|---|---|---|
| Overall | {{P50}}ms | {{P90}}ms | {{P95}}ms | {{P99}}ms | {{MAX}}ms | {{SLA}}ms | ✅ / ❌ |
| GET / | {{P50}}ms | {{P90}}ms | {{P95}}ms | {{P99}}ms | {{MAX}}ms | {{SLA}}ms | ✅ / ❌ |
| POST /api/{{RESOURCE}} | {{P50}}ms | {{P90}}ms | {{P95}}ms | {{P99}}ms | {{MAX}}ms | {{SLA}}ms | ✅ / ❌ |
4.2 Throughput
| Service | Avg Requests/sec | Peak Requests/sec | Peak Time |
|---|---|---|---|
| {{SERVICE_1}} | {{AVG_RPS}} | {{PEAK_RPS}} | {{PEAK_TIME}} |
Total requests served this period: {{TOTAL_REQUESTS}}
4.3 Error Rate
| Service | Total Requests | 4xx Errors | 5xx Errors | Error Rate | SLA | Status |
|---|---|---|---|---|---|---|
| {{SERVICE_1}} | {{TOTAL}} | {{4XX}} | {{5XX}} | {{ERR_RATE}}% | ≤ {{ERR_SLA}}% | ✅ / ❌ |
5. Incident Summary
5.1 Incidents by Severity
| Severity | Count | Total Duration | Avg MTTR |
|---|---|---|---|
| P1 (Critical) | {{P1_COUNT}} | {{P1_DURATION}} | {{P1_MTTR}} |
| P2 (High) | {{P2_COUNT}} | {{P2_DURATION}} | {{P2_MTTR}} |
| P3 (Medium) | {{P3_COUNT}} | {{P3_DURATION}} | {{P3_MTTR}} |
| P4 (Low) | {{P4_COUNT}} | {{P4_DURATION}} | {{P4_MTTR}} |
| Total | {{TOTAL_COUNT}} | {{TOTAL_DURATION}} | {{AVG_MTTR}} |
5.2 MTTR (Mean Time to Resolve)
| Severity | SLA Target | This Period | Last Period | Trend |
|---|---|---|---|---|
| P1 | ≤ {{P1_MTTR_SLA}} | {{P1_MTTR_ACT}} | {{P1_MTTR_PREV}} | ↑ / ↓ / → |
| P2 | ≤ {{P2_MTTR_SLA}} | {{P2_MTTR_ACT}} | {{P2_MTTR_PREV}} | ↑ / ↓ / → |
5.3 MTTD (Mean Time to Detect)
| Period | MTTD | vs SLA | Trend |
|---|---|---|---|
| This period | {{MTTD_ACT}} | {{MTTD_STATUS}} | ↑ / ↓ / → |
| Last period | {{MTTD_PREV}} | | |
6. SLA Breach Analysis
{{#if SLA_BREACH}}
Breach Details
| Breach # | Metric | SLA | Actual | Duration | Customers Affected |
|---|---|---|---|---|---|
| 1 | {{METRIC}} | {{SLA_TARGET}} | {{ACTUAL}} | {{BREACH_DURATION}} | {{CUSTOMERS}} |
Root Cause
{{BREACH_ROOT_CAUSE}}
Remediation
{{BREACH_REMEDIATION}}
Contractual Obligations
| Customer | Contract Reference | Credit Due | Notification Required | Notification Sent |
|---|---|---|---|---|
| {{CUSTOMER}} | {{CONTRACT_REF}} | ${{CREDIT}} | Yes | {{DATE}} |
{{else}}
No SLA breaches this period. All commitments met.
{{/if}}
7. Trend Analysis
Availability Trend (Last 6 Months)
| Month | Uptime % | vs Target | Incidents |
|---|---|---|---|
| {{MONTH_6}} | {{PCT}}% | {{STATUS}} | {{COUNT}} |
| {{MONTH_5}} | {{PCT}}% | {{STATUS}} | {{COUNT}} |
| {{MONTH_4}} | {{PCT}}% | {{STATUS}} | {{COUNT}} |
| {{MONTH_3}} | {{PCT}}% | {{STATUS}} | {{COUNT}} |
| {{MONTH_2}} | {{PCT}}% | {{STATUS}} | {{COUNT}} |
| {{MONTH_1}} (This period) | {{PCT}}% | {{STATUS}} | {{COUNT}} |
P95 Latency Trend (Last 6 Months)
| Month | P95 (ms) | vs SLA |
|---|---|---|
| {{MONTH_6}} | {{P95}}ms | ✅ / ❌ |
| {{MONTH_5}} | {{P95}}ms | ✅ / ❌ |
| {{MONTH_4}} | {{P95}}ms | ✅ / ❌ |
| {{MONTH_3}} | {{P95}}ms | ✅ / ❌ |
| {{MONTH_2}} | {{P95}}ms | ✅ / ❌ |
| {{MONTH_1}} (This period) | {{P95}}ms | ✅ / ❌ |
8. Improvement Initiatives
| Initiative | Source | Owner | Target Date | Status | Expected Impact |
|---|---|---|---|---|---|
| {{INITIATIVE_1}} | Post-mortem INC-{{ID}} | {{OWNER}} | {{DATE}} | {{STATUS}} | +{{IMPACT}}% availability |
| {{INITIATIVE_2}} | Proactive | {{OWNER}} | {{DATE}} | {{STATUS}} | P99 < {{P99}} ms |
| {{INITIATIVE_3}} | Customer feedback | {{OWNER}} | {{DATE}} | {{STATUS}} | Reduce MTTR by 30% |
9. Customer Communication Summary
| Date | Type | Recipients | Subject | Sent By |
|---|---|---|---|---|
| {{DATE}} | Incident notification | All customers | {{SUBJECT}} | {{SENDER}} |
| {{DATE}} | SLA credit notice | Affected customers | {{SUBJECT}} | {{SENDER}} |
| {{DATE}} | Monthly SLA report | Enterprise customers | {{SUBJECT}} | {{SENDER}} |
10. Next Period Targets
| Metric | This Period | Next Period Target | Rationale |
|---|---|---|---|
| Availability | {{AVAIL_ACT}}% | {{AVAIL_NEXT}}% | {{RATIONALE}} |
| P95 latency | {{P95_ACT}}ms | {{P95_NEXT}}ms | {{RATIONALE}} |
| Error rate | {{ERR_ACT}}% | {{ERR_NEXT}}% | {{RATIONALE}} |
| MTTR (P1) | {{MTTR_ACT}} | {{MTTR_NEXT}} | {{RATIONALE}} |
Related Documents
Approval
| Role | Name | Date | Signature |
|---|---|---|---|
| Author | | | |
| Reviewer | | | |
| Approver | | | |
Terminal & Tmux Shortcuts
Terminal & Tmux Shortcuts
A quick reference of shortcuts for everyday work in the terminal and tmux.
Tmux: Pane Navigation
Prefix: Ctrl+A (our custom config)
| Shortcut | Description |
|---|---|
| Ctrl+A → o | Switch to the next pane (cycles in order) |
| Ctrl+A → ← → ↑ ↓ | Switch to the pane in that direction |
| Ctrl+A → q + number | Show pane numbers; press a number to jump to that pane |
| Ctrl+A → z | Zoom (fullscreen) the current pane (repeat to undo) |
| Ctrl+A → x | Close the current pane |
| Ctrl+A → % | Split the pane vertically (left/right) |
| Ctrl+A → " | Split the pane horizontally (top/bottom) |
Tmux: Window Navigation
| Shortcut | Description |
|---|---|
| Ctrl+A → n | Next window |
| Ctrl+A → p | Previous window |
| Ctrl+A → 0-9 | Go directly to a window by number |
| Ctrl+A → c | Create a new window |
| Ctrl+A → , | Rename the current window |
| Ctrl+A → w | List all windows (interactive chooser) |
Tmux: Session Management
| Shortcut | Description |
|---|---|
| Ctrl+A → d | Detach from the session (the session stays alive) |
| Ctrl+A → s | List sessions (and switch between them) |
| Ctrl+A → $ | Rename the session |
| tmux ls | List all sessions from the terminal |
| tmux a -t <name> | Attach to a session |
| tmux new -s <name> | Create a new session |
Tmux: Copy Mode (Scroll)
| Shortcut | Description |
|---|---|
| Ctrl+A → [ | Enter copy/scroll mode |
| q | Exit copy mode |
| ↑ ↓ or PgUp PgDn | Scroll |
| Space → select → Enter | Copy text |
Terminal: Readline Shortcuts
| Shortcut | Description |
|---|---|
| Ctrl+A | Jump to the start of the line |
| Ctrl+E | Jump to the end of the line |
| Ctrl+K | Delete from the cursor to the end of the line |
| Ctrl+U | Delete from the cursor to the start of the line |
| Ctrl+W | Delete the previous word |
| Ctrl+R | Search command history |
| Ctrl+L | Clear the screen |
| Ctrl+C | Interrupt the current command |
| Ctrl+D | Exit (EOF) |
Claude Code: Shortcuts
| Shortcut | Description |
|---|---|
| Enter | Send message |
| Shift+Tab | Accept edits |
| Esc | Cancel / Interrupt |
| Ctrl+O | Expand/collapse tool output |
| /help | Help |
| /clear | Clear context |
Tip: On the Studio server the tmux prefix is Ctrl+A (not the default Ctrl+B). Config: ~/.tmux.conf
Baikal CalDAV Runbook
Service: Baikal CalDAV
Label: Docker container baikal + LaunchAgent com.john.calendar-bridge
Tier: P2 (Business)
Port: 5232 (local), calendar.basicconsulting.no (public via Cloudflare)
What It Does
Self-hosted CalDAV server for the ALAI Business calendar. Alem syncs from iPhone/MacBook via the native Calendar app. The calendar-bridge.js daemon scans email every 5 minutes, detects meeting invites, forwards them to alem@alai.no, and creates the matching CalDAV events.
Architecture
Email (john@) → email-agent.js → calendar-bridge.js → Baikal CalDAV → Alem iPhone/Mac
                                         ↓
                          mail-native.js forward → alem@alai.no
Components
| Component | Location | Type |
|---|---|---|
| Baikal server | ~/system/services/baikal/docker-compose.yml | Docker |
| calendar-bridge.js | ~/system/tools/calendar-bridge.js | Tool + Daemon |
| LaunchAgent | ~/Library/LaunchAgents/com.john.calendar-bridge.plist | Daemon (5min) |
| Cloudflare tunnel | calendar.basicconsulting.no → localhost:5232 | Tunnel |
| Credentials | Vaultwarden → "Baikal CalDAV" | Vault |
| Calendar | "ALAI Business" (CalDAV user: alem) | CalDAV |
| Data | ~/system/services/baikal/data/ | Persistent volume |
Dependencies
- Docker (container: baikal)
- Cloudflare tunnel (com.john.cloudflared)
- Vaultwarden (credentials)
- mail-native.js (email forwarding)
- email-agent.js (inline meeting detection)
Health Check
# Quick check
node ~/system/tools/calendar-bridge.js test
# Docker container
docker ps --filter name=baikal
# CalDAV endpoint
curl -s -o /dev/null -w "%{http_code}" http://localhost:5232/dav.php/
# Public URL (expect 401 = auth required = healthy)
curl -s -o /dev/null -w "%{http_code}" https://calendar.basicconsulting.no/dav.php/
# List events
node ~/system/tools/calendar-bridge.js list
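The status codes printed by the `curl` checks above need interpreting, since a 401 on the public URL is the healthy result. The sketch below encodes the mapping this runbook describes (401 healthy, 502 pointing at the container, 404 pointing at the tunnel). The function name and the returned labels are illustrative and not part of calendar-bridge.js.

```python
def interpret_public_status(http_code: int) -> str:
    """Map the public CalDAV endpoint's HTTP status to a likely diagnosis."""
    if http_code == 401:
        # Auth challenge: the tunnel routes and Baikal answers.
        return "healthy"
    if http_code == 502:
        # Tunnel is up but nothing answers behind it (see Failure 1).
        return "check Baikal container"
    if http_code == 404:
        # Hostname is not routed to the service (see Failure 2).
        return "check Cloudflare tunnel config"
    return "unexpected status, check logs"
```

A timeout produces no status code at all, so it has to be handled by the caller; this runbook treats it as a tunnel problem as well.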
Common Failures & Fixes
Failure 1: Baikal container down
Symptoms: calendar-bridge.js test fails, CalDAV returns 502 or connection refused
Fix:
cd ~/system/services/baikal && docker compose up -d
Failure 2: Cloudflare tunnel not routing
Symptoms: Public URL returns 404 or times out, local URL works fine
Fix:
# Check config includes calendar entry
grep calendar ~/.cloudflared/config.yml
# Restart tunnel
launchctl kickstart -k gui/$(id -u)/com.john.cloudflared
Failure 3: Calendar-bridge scan finds nothing
Symptoms: Meeting invites arrive but no events are created and nothing is forwarded
Check:
# Check daemon is running
launchctl list | grep calendar-bridge
# Check logs
tail -50 ~/system/logs/calendar-bridge.log
# Check state file
cat ~/system/logs/calendar-bridge-state.json
# Manual scan with verbose
node ~/system/tools/calendar-bridge.js scan --verbose
Failure 4: Alem can't sync from iPhone
Symptoms: iPhone Calendar shows an error, events do not appear
Check:
- Verify credentials in Vault: node ~/system/tools/vault.js get "Baikal CalDAV"
- Test public CalDAV endpoint (should return 401, not 502/404)
- iPhone settings: Server = calendar.basicconsulting.no/dav.php/principals/alem
Failure 5: Authentication failure
Symptoms: 401 with correct password
Fix: The stored digest may be out of sync with the password. Re-hash it in the Baikal DB:
NEW_PASS=$(bw get password "Baikal CalDAV" --session "$(cat /tmp/bw-session)")
# printf '%s' keeps % and \ in the password from being read as format directives
DIGEST=$(printf '%s' "alem:BaikalDAV:$NEW_PASS" | md5)
docker exec baikal sqlite3 /var/www/baikal/Specific/db/db.sqlite \
"UPDATE users SET digesta1='$DIGEST' WHERE username='alem';"
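The value written to `digesta1` above is the MD5 hex digest of `username:realm:password`, with `BaikalDAV` as the realm used in the command. For checking a digest without touching the database, the same value can be computed as in this sketch; the function name is hypothetical and the password shown is a placeholder, not a real credential.

```python
import hashlib


def digest_a1(username: str, realm: str, password: str) -> str:
    """MD5 hex digest of 'username:realm:password', as stored in digesta1."""
    raw = f"{username}:{realm}:{password}".encode("utf-8")
    return hashlib.md5(raw).hexdigest()


# Placeholder password for illustration only
candidate = digest_a1("alem", "BaikalDAV", "example-password")
```

Comparing `candidate` with the current `digesta1` value for the user shows whether the stored hash matches the vault password before any UPDATE is run.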
Restart Procedure
# Restart Baikal
cd ~/system/services/baikal && docker compose restart
# Restart calendar-bridge daemon
launchctl kickstart -k gui/$(id -u)/com.john.calendar-bridge
Backup
- SQLite DB: ~/system/services/baikal/data/Specific/db/db.sqlite
- Config: ~/system/services/baikal/data/config/baikal.yaml
- Included in daily db-backup.sh via Docker volume mount
MC Task
Created: #3029 (Deploy), #3035 (Documentation + Watchdog)
ALAI Infrastructure Map & Ops Runbooks
ALAI Infrastructure Map & Ops Runbooks
Last updated: 2026-03-12 | Author: John (AI Director)
1. Infrastructure Overview
Azure VM — vm-alai-support
| Property | Value |
|---|---|
| IP | 4.223.110.181 |
| Region | Sweden Central |
| Size | Standard_B2als_v2 (2 vCPU, 4GB RAM) |
| OS | Ubuntu 22.04 LTS |
| SSH | ssh -i ~/.ssh/azure_alai alai-admin@4.223.110.181 |
| Resource Group | rg-alai-support |
| Cost | ~$35/mo (Founders Hub credits, expires 2026-11-15) |
| Compose | /opt/alai/docker-compose.yml |
ANVIL — Mac Studio M3 Max (Local)
| Property | Value |
|---|---|
| Role | AI inference, product dev, agent orchestration |
| Services | Ollama, Qdrant, Pi-Orchestrator, Telegram, Email, Tool-Shed |
| Tunnel | Cloudflare Tunnel for lobby, api, mc, auth, track, ssh, vnc |
2. Services on Azure VM (16 containers)
| Service | URL | Container |
|---|---|---|
| BookStack (Wiki) | docs.basicconsulting.no | alai-bookstack-1 |
| Documenso (e-Sign) | sign.basicconsulting.no | alai-documenso-1 |
| Planka (Boards) | boards.basicconsulting.no | alai-planka-1 |
| Vaultwarden | vault.basicconsulting.no | alai-vaultwarden-1 |
| Baikal (CalDAV) | calendar.basicconsulting.no | alai-baikal-1 |
| Grafana | grafana.basicconsulting.no | alai-grafana-1 |
| Prometheus | prometheus.basicconsulting.no | alai-prometheus-1 |
| Paperless-ngx | archive.basicconsulting.no | alai-paperless-1 |
| Caddy (TLS proxy) | — | alai-caddy-1 |
3. ANVIL Daemons
| Daemon | LaunchAgent | Script |
|---|---|---|
| Pi-Orchestrator | com.john.pi-orchestrator | ~/system/kernel/pi-orchestrator.js |
| Telegram Agent | com.john.telegram-agent | ~/system/tools/telegram-agent.js |
| Email Agent | com.john.email-agent | ~/system/daemons/email-agent.js |
| Vault Keeper | com.john.vault-keeper | ~/system/daemons/vault-keeper.js |
| Event Dispatcher | com.john.event-dispatcher | ~/system/daemons/event-dispatcher.js |
| Tool-Shed | com.john.tool-shed | ~/system/tools/tool-shed.js (:3050) |
4. DNS — Cloudflare
Zone: basicconsulting.no | Zone ID: 4670dbd0acfeab4174ac0d4746d11ea0
| Subdomain | Target | Proxy |
|---|---|---|
| docs, sign, boards, vault, calendar, grafana, prometheus, archive | 4.223.110.181 (Azure VM) | Orange cloud |
| lobby, lobby-api, api, drop-api, mc, auth, track, ssh, vnc | Cloudflare Tunnel (ANVIL) | Orange cloud |
5. Runbooks
5.1 Azure VM Full Restart
az vm restart -g rg-alai-support -n vm-alai-support
ssh -i ~/.ssh/azure_alai alai-admin@4.223.110.181
cd /opt/alai && docker compose up -d
docker ps # verify 16 containers
5.2 Single Service Recovery
ssh -i ~/.ssh/azure_alai alai-admin@4.223.110.181
cd /opt/alai && docker compose restart bookstack
docker logs alai-bookstack-1 --tail 50
5.3 TLS Certificate Issues
Caddy renews certificates automatically. If renewal fails: temporarily disable the Cloudflare proxy for the affected record, restart Caddy, then re-enable the proxy.
5.4 ANVIL Daemon Recovery
launchctl list | grep com.john
launchctl kickstart -k gui/$(id -u)/com.john.pi-orchestrator
tail -50 ~/system/logs/pi-orchestrator.log
5.5 Database Backup
docker exec alai-bookstack-db-1 mysqldump -u bookstack bookstack > bookstack.sql
docker exec alai-planka-db-1 pg_dump -U postgres planka > planka.sql
docker exec alai-documenso-db-1 pg_dump -U documenso documenso > documenso.sql
5.6 Pi-Orchestrator Not Processing
curl http://localhost:8401/status
claude auth status
launchctl kickstart -k gui/$(id -u)/com.john.pi-orchestrator
node ~/system/tools/mc.js list --status open --limit 10
5.7 Email Agent Not Fetching
export NODE_TLS_REJECT_UNAUTHORIZED=0
node ~/system/daemons/email-agent.js --test
tail -20 ~/system/logs/email-agent.log
5.8 SSH IP Update
az network nsg rule update -g rg-alai-support --nsg-name nsg-alai-support \
-n AllowSSH --source-address-prefixes "NEW_IP"
6. Security
- All services behind Cloudflare Access (Zero Trust)
- SSH restricted to office IP
- Docker .env (chmod 600) with secrets
- Let's Encrypt TLS on all domains
- Gitleaks pre-commit + CI on all 6 products
7. Monthly Cost
| Item | Cost |
|---|---|
| Azure VM (B2als_v2) | ~$35/mo |
| Cloudflare | Free |
| Total | ~$36/mo (Azure Founders Hub credits until Nov 2026) |
System Map — Infrastructure & Services
ALAI System Map
Updated: 2026-03-16
Author: John (AI Director, AI-first OS)
☁️ Azure VM — Supporting Services (Production)
VM: vm-alai-support | Azure Founders Hub | Sweden Central
Specs: Standard_B2als_v2 — 2 vCPU / 4GB RAM / 30GB SSD | IP: 4.223.110.181
Compose: /opt/alai/docker-compose.yml
SSH port 22 is closed (firewalled); access is only through Caddy/Cloudflare
| Service | URL | Status |
|---|---|---|
| BookStack (wiki/docs) | https://docs.alai.no | ✅ |
| Vaultwarden (passwords) | https://vault.basicconsulting.no | ✅ |
| Documenso (e-sign) | https://sign.basicconsulting.no | ✅ |
| Grafana (monitoring) | https://grafana.basicconsulting.no | ✅ |
| Planka (kanban) | https://boards.basicconsulting.no | ✅ |
| Baikal (CalDAV) | https://cal.basicconsulting.no | ❌ down |
| Prometheus | (internal, no public URL) | ? |
| Caddy | (reverse proxy for all of the above) | ✅ |
🖥️ ANVIL (MacBook Pro M3 Max) — Local Dev
Docker containers (dev databases for the products)
| Container | Port | Project |
|---|---|---|
| lumiscare-postgres | 5432 | Lumiscare |
| lumiscare-redis | 6379 | Lumiscare |
| plock-db | 5434 | Plock |
| plock-redis | 6380 | Plock |
| backend-postgres | 5435 | (shared backend) |
| backend-redis | 6381 | (shared backend) |
| bilko-postgres | 5436 | Bilko |
| bilko-redis | 6382 | Bilko |
| drop-postgres | 5433 | Drop |
| lobby-postgres | 5437 | Lobby |
| qdrant | 6333-6334 | RAG vector search |
| sonarqube | 9000 | Code quality |
| bookstack (local) | 6875 | ⚠️ Dev/sync copy, prod=Azure |
| bookstack_db | 3306 | (local BookStack DB) |
⚠️ These are DEV databases; production services run on Azure or with cloud providers
Local services (non-Docker)
| Service | Port | Details |
|---|---|---|
| Ollama ANVIL | 11434 | 10 models (qwen2.5-coder:32b, llama3.1:8b, llama-guard...) |
| N8N | 5678 | Workflow automation (local, via LaunchAgent) |
| MC Dashboard | (internal) | Mission Control web UI |
| Caddy Vault | (internal) | Secret proxy |
| Tender Dashboard | (internal) | Tender (anbud) tracking UI |
| Tool Shed | (internal) | Tool registry API |
Ollama Models
| Host | Models | Largest |
|---|---|---|
| ANVIL (localhost:11434) | 10 | qwen2.5-coder:32b (23GB), llama-guard3:8b |
| FORGE (10.0.0.2:11434) | 5 | deepseek-r1:70b (42GB), qwen3:32b (20GB) |
⚙️ Active LaunchAgent Daemons (~33)
ALAI Kernel
agent-timeout-monitor · idle-learning-daemon · ram-monitor · task-router
John's Agents
browser-worker · caddy-vault · cloudflared · comms-agent · documenso-webhook · draft-sender · email-tracker · event-dispatcher · hook-daemon · intake-watcher · mc-dashboard · n8n · network-watchdog · ops-watchdog · outbox-processor · pi-orchestrator · pipeline-watcher · slack-bot · telegram-agent · tender-dashboard · tool-shed · vault-keeper · vault-proxy
Product Monitoring
drop.health-check
🗄️ Active SQLite Databases (~54) — ~/system/databases/
| Database | Purpose |
|---|---|
| mission-control.db (10MB) | All MC tasks (3,847 done, 36 open) |
| hivemind.db (52MB) | Intel, knowledge, sessions, events |
| knowledge.db (187MB) | RAG knowledge base |
| flywheel.db (36MB) | RAG cache |
| events.db (11MB) | Event bus log |
| guardrails-audit.db (9.6MB) | AI safety audit |
| bee-index.db (3.4MB) | Code/file index |
| tenders.db (184KB) | Tender (anbud) tracker |
| leads.db (224KB) | CRM leads |
| contacts.db (96KB) | CRM contacts |
| hivemind-archive.db (5.9MB) | HiveMind archive |
| email-inbox.db (164KB) | Email inbox |
| drafts.db (292KB) | Email drafts |
| routing-outcomes.db (64KB) | AI routing metrics |
| tool-audit.db (900KB) | Tool usage audit |
| bih-tenders.db (284KB) | BiH tender scraper |
| strategy-tracker.db (128KB) | Strategy/OKR |
| teams.db (40KB) | Teams |
| projects.db (40KB) | Projects |
| pipeline.db (56KB) | Sales pipeline |
| sprint-pipeline.db (32KB) | Sprint tracker |
| goals.db (44KB) | Goals |
| invoices.db (36KB) | Invoices |
| baikal-caldav.db (108KB) | Calendar (CalDAV backup) |
| + ~30 more smaller databases | contacts, emails, tickets, vcr, distill... |
🌐 External Services
| Service | Purpose |
|---|---|
| Anthropic API | Claude (claude-3-5-sonnet, claude-opus) |
| Fiken | Accounting, invoices, payroll (Norway) |
| Cloudflare | DNS, Tunnel, DDoS protection |
| Slack (basicconsulting) | Internal communication |
| Telegram | Notifications, bot |
| Dropbox | File sync |
| one.com | Email hosting (SMTP/IMAP) |
| GitHub | Code repos |
| Azure Founders Hub | VM hosting |
🔧 Tools & Scripts — ~/system/tools/
- Total: 1,310 scripts
- JS: 1,248 | SH: 58 | PY: 4
📁 Key Directories
~/system/
  tools/ ← 1,310 JS/SH scripts
  databases/ ← ~54 active SQLite databases
  config/ ← JSON configuration, daemon registry
  agents/ ← hivemind, agent definitions
  notes/ ← this file and other notes
  backups/ ← daily backup of every database
  services/ ← docker-compose per service
~/ALAI/
  products/ ← Drop, Bilko, Plock, Gotiva, Lobby, Lumiscare...
  internal/ ← configs, tools, docs
  legal/ ← contracts, compliance, templates
🚦 Mission Control Status (2026-03-16)
| Status | Count |
|---|---|
| ✅ done | 3,847 |
| ⏸️ paused | 664 |
| 🔴 blocked | 120 |
| 🔵 open | 36 |
ALAI Domain Migration — basicconsulting.no → alai.no
Context
The ALAI rebrand did not include migrating the support stack, so 11 subdomains remain on the legacy basicconsulting.no domain.
Current Live State (by Zone)
basicconsulting.no (Cloudflare zone 4670dbd0acfeab4174ac0d4746d11ea0)
- 30+ DNS records
- Main support stack hosting
- Active services: docs, sign, bilko-demo, www
- Inactive subdomains: status, support, monitor, alerts, help, wiki
alai.no (Cloudflare zone 3dc40d9c37fee79c4281f7e86870c0b5)
- Status: PENDING — Nameservers on one.com not yet changed
- Required NS change: ns01/02.one.com → aspen.ns.cloudflare.com + wells.ns.cloudflare.com
- 18 DNS records pre-created (A/CNAME for 15 services + root)
- Blocker: Alem must update NS on one.com dashboard (5 min task, blocks 15 subdomain migrations)
snowit.ba (AWS Route53 zone Z04121493CAJZ75TQUPIW)
- 2026-04-19 added: A record root → 76.76.21.21 (Vercel)
- CNAME www → cname.vercel-dns.com
- Change ID: C065644119MEENZWSSKW3
Cloudflare Tunnel Config
- Tunnel ID: 3315a609-7934-45c5-ad0c-56d86d16374d (named "mattermost")
- Host VM: Azure 4.223.110.181 (swedencentral)
- Ingress rules: Multiple service routes (see tunnel config for details)
Incident: sign.basicconsulting.no 404 (2026-04-18)
Symptom: DNS resolved to Cloudflare proxy but returned 404.
Root cause: Tunnel ingress had route sign.basicconsulting.no → localhost:3003 but cloudflared could not reach backend.
Fix: Changed DNS from tunnel CNAME to direct A record → 4.223.110.181 (proxied).
Result: Documenso Sign In page now live.
Alem TODO
- Log into one.com domain panel
- Select alai.no domain
- Change nameservers from ns01/02.one.com to:
- aspen.ns.cloudflare.com
- wells.ns.cloudflare.com
- Wait 5-30 minutes for propagation
- Verify: dig alai.no NS should return the Cloudflare nameservers
See Also
Created: 2026-04-19 | Source: ~/ALAI/products/Bilko/docs/runbooks/alai-support-stack-migration.md
AWS CLI Setup — john-deploy IAM
Credentials Location
~/.aws/credentials
[default]
aws_access_key_id = AKIAUXDEHCNUHFX472XL
aws_secret_access_key = (stored in Vault: "AWS CLI - john-deploy IAM")
IAM User Details
- User: john-deploy
- AWS Account: 324480209768
- ARN: arn:aws:iam::324480209768:user/john-deploy
- Access Key ID: AKIAUXDEHCNUHFX472XL
- Secret Key: DO NOT print in docs — reference Bitwarden/Vault item "AWS CLI - john-deploy IAM"
- Primary Region: eu-central-1 (Frankfurt)
Permissions
Known permissions (unverified full list):
- Route53 (zone management, record creation)
- S3 (bucket operations)
- SES (email sending)
- ECR (container registry)
- App Runner (serverless containers)
Validated Usage
- 2026-04-14: Credentials confirmed working
- 2026-04-19: Route53 change for snowit.ba (Change ID: C065644119MEENZWSSKW3)
Usage Pattern
# Export credentials as env vars
export AWS_ACCESS_KEY_ID=AKIAUXDEHCNUHFX472XL
export AWS_SECRET_ACCESS_KEY="(from Vault)"
export AWS_DEFAULT_REGION=eu-central-1
# Example: Route53 change
aws route53 change-resource-record-sets \
--hosted-zone-id Z04121493CAJZ75TQUPIW \
--change-batch file://change-batch.json
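The change-batch.json file passed to the command above is not reproduced on this page. As a minimal sketch, assuming the two snowit.ba records listed in the Domain Migration page (root A record to 76.76.21.21, www CNAME to cname.vercel-dns.com), the batch could be generated with Node. The UPSERT action, the 300-second TTL, the comment text, and the helper name buildChangeBatch are assumptions, not values recorded for the original change.

```javascript
// Hypothetical helper: builds a Route53 change batch for a list of simple records.
// Assumptions: UPSERT action and a TTL of 300 seconds (neither is recorded in this runbook).
function buildChangeBatch(records, ttl = 300) {
  return {
    Comment: "snowit.ba records pointing at Vercel",
    Changes: records.map(({ name, type, value }) => ({
      Action: "UPSERT",
      ResourceRecordSet: {
        Name: name,
        Type: type,
        TTL: ttl,
        ResourceRecords: [{ Value: value }],
      },
    })),
  };
}

const batch = buildChangeBatch([
  { name: "snowit.ba.", type: "A", value: "76.76.21.21" },
  { name: "www.snowit.ba.", type: "CNAME", value: "cname.vercel-dns.com." },
]);

// Print the JSON so it can be redirected into change-batch.json.
console.log(JSON.stringify(batch, null, 2));
```

Saved under a hypothetical name such as build-change-batch.js, the output can be redirected to change-batch.json before running the aws route53 command.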
MCP Docker AWS Tool
Tool: mcp__MCP_DOCKER__call_aws
Note: This tool has its own config and uses environment variables. May not share the same credentials as CLI.
Security Notes
- Secret key NEVER committed to git
- Stored in Vault: "AWS CLI - john-deploy IAM" item
- Keychain fallback on macOS
- If rotating keys: update Vault + ~/.aws/credentials + env vars
See Also
- ALAI Domain Migration (uses Route53)
Created: 2026-04-19 | Validated: 2026-04-14 + 2026-04-19
Slack alaiops Bot — Backend Architecture
Basic Info
- Workspace: alai-talk.slack.com
- Bot user: @alaiops (U0AEMU81LBG)
- Channels: 11 public + 6 private (manual invite required for private)
- Mode: Socket Mode (no public webhook needed)
Tokens Location
- Primary: macOS Keychain (slack-bot/slack-bot-token and slack-bot/slack-app-token)
- Fallback 1: Bitwarden/Vault
- Fallback 2: Environment variables
Daemon
- LaunchAgent: com.john.slack-bot
- PID lookup: pgrep -f slack-bot.js
- Code: ~/system/tools/slack-bot.js
- Logs: ~/system/logs/slack-bot.log
Backend Chain (via comms-responder.js)
Priority-based fallback system (lower number = higher priority, faster response):
- Groq (priority 5, ~100-500ms) — PRIMARY
  - Model: llama-3.1-8b-instant
  - Added: 2026-04-18
  - Requires: GROQ_API_KEY env var
  - Adapter: ~/system/tools/adapters/groq.js
- Claude API (priority 10, ~2s)
- Claude CLI (priority 20, ~20s)
- Ollama (priority 30, ~40s) — FALLBACK ONLY
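The priority chain above can be sketched as follows. This is an illustration only, not the code in comms-responder.js: the respond function and the backend objects with a send(prompt) method are hypothetical, and only the priority numbers are taken from the list above.

```javascript
// Hypothetical sketch of a priority-based fallback chain.
// Backends are tried in ascending priority order; the first success wins.
async function respond(prompt, backends) {
  const ordered = [...backends].sort((a, b) => a.priority - b.priority);
  const errors = [];
  for (const backend of ordered) {
    try {
      const text = await backend.send(prompt);
      return { backend: backend.name, text };
    } catch (err) {
      // Record the failure and fall through to the next backend.
      errors.push(`${backend.name}: ${err.message}`);
    }
  }
  throw new Error(`All backends failed (${errors.join("; ")})`);
}

// Stub backends standing in for the real adapters (priorities from the list above).
const demoBackends = [
  { name: "ollama", priority: 30, send: async (p) => `ollama: ${p}` },
  { name: "groq", priority: 5, send: async () => { throw new Error("rate limited"); } },
  { name: "claude-api", priority: 10, send: async (p) => `claude-api: ${p}` },
  { name: "claude-cli", priority: 20, send: async (p) => `claude-cli: ${p}` },
];

// Groq fails in this demo, so the next priority (Claude API) answers.
respond("ping", demoBackends).then((r) => console.log(r.backend)); // logs "claude-api"
```

Sorting a copy of the list keeps the registration order irrelevant, and collecting the per-backend errors gives the final "All backends failed" message enough detail to debug from the log.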
Groq Adapter
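// Registered in ~/system/tools/adapters/index.js
const groq = require("./groq.js");

// Usage: await needs an async context in a CommonJS module
async function main() {
  const response = await groq.send("prompt", {
    model: "llama-3.1-8b-instant",
    temperature: 0.7,
    max_tokens: 512
  });
  console.log(response);
}

main().catch(console.error);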
Event Subscriptions
Status: Re-enabled 2026-04-18 after scope fix
Critical fix: Bot NO LONGER requires admin scopes (caused "Enterprise only" error). Removed admin scopes from User token, kept 15 bot scopes.
Active bot scopes (15):
- app_mentions:read
- channels:history
- channels:read
- chat:write
- groups:history
- groups:read
- im:history
- im:read
- im:write
- mpim:history
- mpim:read
- reactions:read
- reactions:write
- users:read
- users:read.email
Dead Pattern Warning
If bot stops responding, check logs first:
tail -100 ~/system/logs/slack-bot.log
Benign pattern (ignore): "Dedup: skipping" — message already processed
Error patterns (investigate):
- "Socket mode error"
- "Token invalid"
- "Groq API error"
- "All backends failed"
Test Commands
# Send test message
node ~/system/tools/slack.js send general "Test from John"
# Read channel history
node ~/system/tools/slack.js read general 10
# Check bot status
pgrep -f slack-bot.js && echo "Running" || echo "Stopped"
See Also
- Groq adapter source: ~/system/tools/adapters/groq.js
- Bot source: ~/system/tools/slack-bot.js
- Comms responder: ~/system/tools/comms-responder.js
Created: 2026-04-19 | Last updated: 2026-04-18 (Groq backend added)
Documenso Self-Hosted — sign.basicconsulting.no
Service Details
- Service: Documenso v2.x (open-source document signing)
- URL: https://sign.basicconsulting.no
- DNS: A record → 4.223.110.181 (Azure VM, proxied via Cloudflare)
- Hosting: Azure VM (swedencentral)
Admin Credentials
- Email: alem@alai.no
- Password: stored in Vault item "Documenso - sign.basicconsulting.no" (do not record the value on this page)
API Integration
- API Token: api_xn907c9xczrteoba (created 2026-04-19 for Bilko Sign integration)
- API Base URL: https://sign.basicconsulting.no/api/v1
Test cURL
curl -H "Authorization: api_xn907c9xczrteoba" \
https://sign.basicconsulting.no/api/v1/documents
# Expected response:
{"documents":[],"totalPages":0}
Bilko Sign Integration
Documenso is used as the signing backend for Bilko (accounting SaaS).
- Spec: ~/ALAI/products/Bilko/docs/product/BILKO-SIGN-SPEC.md
- Integration team: Skybound (mobile + frontend specialists)
GCP Secret Manager
- Secret name: bilko-documenso-api-key
- Value: api_xn907c9xczrteoba
- Bound to: bilko-api Cloud Run service (revision 00045-flz)
- Environment variable: DOCUMENSO_API_KEY
bilko-api Environment Variables
DOCUMENSO_API_URL=https://sign.basicconsulting.no
DOCUMENSO_API_KEY=(from GCP Secret Manager)
Incident History
2026-04-18: 404 Error
Symptom: sign.basicconsulting.no returned 404 Not Found
Root cause: Cloudflare Tunnel ingress had route to localhost:3003 but cloudflared could not reach backend
Fix: Changed DNS from tunnel CNAME to direct A record → 4.223.110.181 (proxied)
Result: Documenso Sign In page now live
Maintenance
Backup API Tokens
- Store all API tokens in Vault immediately after creation
- Documenso does NOT allow viewing tokens after creation (one-time display)
Version Updates
# Check current version
curl -s https://sign.basicconsulting.no/api/health | jq .version
# Update (on Azure VM)
ssh -i ~/.ssh/azure_alai alai-admin@4.223.110.181
cd /path/to/documenso
docker-compose pull
docker-compose up -d
Future Migration
Target: sign.alai.no (part of ALAI domain migration)
- See ALAI Domain Migration runbook
- Requires: alai.no NS change on one.com (pending as of 2026-04-19)
See Also
- ALAI Domain Migration
- Bilko Sign spec: ~/ALAI/products/Bilko/docs/product/BILKO-SIGN-SPEC.md
Created: 2026-04-19 | API token created: 2026-04-19 | Incident fixed: 2026-04-18
Azure Blob Offsite Backup Setup
Overview
Purpose: Offsite backup for ALAI system databases and git bundles
Region: North Europe (Dublin) — geographic separation from primary Sweden Central VM
Retention: 365 days with lifecycle policies (Hot → Cool → Archive → Delete)
Recovery Time Objective: 4 hours (manual restore)
Azure Resources
| Resource Type | Name | Purpose |
|---|---|---|
| Resource Group | alai-backups-rg | Isolation boundary for backup storage |
| Storage Account | alaibackups0ebb | Blob storage (LRS, Standard tier) |
| Container | system-db-backups | SQLite databases (hivemind.db, mission-control.db, etc.) |
| Container | system-git-bundles | Git repository bundles |
| Service Principal | alai-backup-writer | Access scoped to the storage account only (Storage Blob Data Contributor: read, write, and delete on blobs) |
Service Principal Setup
# Create service principal
az ad sp create-for-rbac --name alai-backup-writer --skip-assignment
# Assign Storage Blob Data Contributor to SA only (not subscription)
STORAGE_ID=$(az storage account show --name alaibackups0ebb --query id -o tsv)
az role assignment create \
--assignee <service-principal-app-id> \
--role "Storage Blob Data Contributor" \
--scope "$STORAGE_ID"
# Store credentials in ~/system/config/azure-backup.env
cat > ~/system/config/azure-backup.env <
Lifecycle Policy
Hot → Cool: more than 30 days after last modification
Cool → Archive: more than 90 days after last modification
Archive → Delete: more than 365 days after last modification
az storage account management-policy create \
--account-name alaibackups0ebb \
--policy @lifecycle-policy.json
lifecycle-policy.json:
{
"rules": [
{
"enabled": true,
"name": "archive-old-backups",
"type": "Lifecycle",
"definition": {
"actions": {
"baseBlob": {
"tierToCool": {"daysAfterModificationGreaterThan": 30},
"tierToArchive": {"daysAfterModificationGreaterThan": 90},
"delete": {"daysAfterModificationGreaterThan": 365}
}
},
"filters": {"blobTypes": ["blockBlob"]}
}
}
]
}
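To make the policy thresholds concrete, here is a small sketch that maps a blob's age to the tier the policy would eventually move it to. It assumes blobs are uploaded to the Hot tier and never modified afterwards; the function name expectedTier is hypothetical.

```javascript
// Sketch: tier a blob should end up in, given days since last modification.
// Thresholds mirror lifecycle-policy.json (30 / 90 / 365 days, strictly greater than).
function expectedTier(daysSinceModified) {
  if (daysSinceModified > 365) return "deleted";
  if (daysSinceModified > 90) return "archive";
  if (daysSinceModified > 30) return "cool";
  return "hot";
}

for (const days of [10, 30, 31, 120, 400]) {
  console.log(days, expectedTier(days));
}
```

Because the policy uses daysAfterModificationGreaterThan, a blob that is exactly 30 days old is still Hot; it becomes eligible for Cool on day 31.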
Backup Scripts
LightRAG to Azure Blob
#!/bin/bash
# ~/system/tools/migrate-lightrag-to-azure.sh
set -euo pipefail
source ~/system/config/azure-backup.env
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
BACKUP_FILE="/tmp/lightrag-backup-$TIMESTAMP.tar.gz"
# Archive relative to ~/system so that restoring with -C ~/system/ recreates ~/system/lightrag/
tar -czf "$BACKUP_FILE" -C "$HOME/system" lightrag
az storage blob upload \
  --account-name alaibackups0ebb \
  --container-name system-db-backups \
  --name "lightrag-$TIMESTAMP.tar.gz" \
  --file "$BACKUP_FILE" \
  --auth-mode login
rm "$BACKUP_FILE"
Ollama Models Export
#!/bin/bash
# ~/system/tools/ollama-models-export.sh --azure
# Exports the Modelfile of every installed model (not the model weights)
source ~/system/config/azure-backup.env
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
EXPORT_DIR="/tmp/ollama-export-$TIMESTAMP"
mkdir -p "$EXPORT_DIR"
ollama list | tail -n +2 | awk '{print $1}' > "$EXPORT_DIR/model-list.txt"
while read -r model; do
  # Model names contain ":" and sometimes "/", so make them filename-safe
  safe_name=${model//[\/:]/_}
  ollama show "$model" --modelfile > "$EXPORT_DIR/$safe_name.modelfile" < /dev/null
done < "$EXPORT_DIR/model-list.txt"
tar -czf "$EXPORT_DIR.tar.gz" "$EXPORT_DIR"
az storage blob upload \
  --account-name alaibackups0ebb \
  --container-name system-db-backups \
  --name "ollama-models-$TIMESTAMP.tar.gz" \
  --file "$EXPORT_DIR.tar.gz"
rm -rf "$EXPORT_DIR" "$EXPORT_DIR.tar.gz"
Disaster Recovery Path
- List available backups:
az storage blob list \
--account-name alaibackups0ebb \
--container-name system-db-backups \
--output table
- Download latest backup:
az storage blob download \
--account-name alaibackups0ebb \
--container-name system-db-backups \
--name "lightrag-20260420-143000.tar.gz" \
--file /tmp/restore-lightrag.tar.gz
- Verify SHA-256 checksum:
shasum -a 256 /tmp/restore-lightrag.tar.gz
- Restore to target system:
tar -xzf /tmp/restore-lightrag.tar.gz -C ~/system/
Monitoring
- Cron: Hourly backup at :15 (15 * * * *)
- Log: ~/system/logs/azure-backup.log
- Alert: HiveMind alert if backup fails 2 consecutive runs
node ~/system/agents/hivemind/hivemind.js post john alert \
"Azure backup failed 2 consecutive runs — check ~/system/logs/azure-backup.log"
ANVIL Memory Troubleshooting — Mac Studio
ANVIL Memory Troubleshooting — Mac Studio (M2 Ultra 192GB)
Incident Summary
Date: 2026-04-20
Symptom: System freezes, Chrome/Claude unresponsive, OOM kernel panics
Root Cause: Zombie Ollama runner processes + duplicate launchd agents + runaway grep processes
Resolution: Ollama config tuning, duplicate agent removal, zombie cleanup daemon, Ollama 0.21.0 upgrade
Root Causes
- Ollama zombie runners: ollama ps reports 0 models loaded, but pgrep -fl ollama_llama_server shows 4-6 GB processes still resident
- Duplicate launchd agents: Both com.alai.ollama-serve.plist and com.alai.ollama-serve-v2.plist running simultaneously → 2x Ollama daemons
- grep memory leak: grep -rn commands on large codebases hang and consume 8+ GB RAM each
- Preload warmup bloat: com.john.ollama-warmup.plist loading 3 models on boot → 48 GB baseline before any work
Permanent Fix — Ollama Config
File: ~/Library/LaunchAgents/com.alai.ollama-serve-v2.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.alai.ollama-serve-v2</string>
<key>ProgramArguments</key>
<array>
<string>/usr/local/bin/ollama</string>
<string>serve</string>
</array>
<key>EnvironmentVariables</key>
<dict>
<key>OLLAMA_HOST</key>
<string>0.0.0.0:11434</string>
<key>OLLAMA_KEEP_ALIVE</key>
<string>60s</string>
<key>OLLAMA_MAX_LOADED_MODELS</key>
<string>1</string>
<key>OLLAMA_NUM_PARALLEL</key>
<string>1</string>
</dict>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>StandardOutPath</key>
<string>/tmp/ollama-serve.log</string>
<key>StandardErrorPath</key>
<string>/tmp/ollama-serve-error.log</string>
</dict>
</plist>
Key parameters:
- OLLAMA_KEEP_ALIVE=60s: unload model after 60s idle (default 5m causes bloat)
- OLLAMA_MAX_LOADED_MODELS=1: only one model resident at a time
- OLLAMA_NUM_PARALLEL=1: no parallel inference (reduces contention)
Zombie Cleanup Daemon
File: ~/Library/LaunchAgents/com.alai.zombie-cleanup.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.alai.zombie-cleanup</string>
<key>ProgramArguments</key>
<array>
<string>/bin/bash</string>
<string>/Users/makinja/system/tools/zombie-proc-cleanup.sh</string>
</array>
<key>StartInterval</key>
<integer>3600</integer>
<key>StandardOutPath</key>
<string>/tmp/zombie-cleanup.log</string>
</dict>
</plist>
Script: ~/system/tools/zombie-proc-cleanup.sh
#!/bin/bash
# Kill zombie Ollama runners (no parent process or disconnected from ollama serve)
pgrep -fl ollama_llama_server | while read -r pid rest; do
  parent=$(ps -o ppid= -p "$pid" | xargs)
  if [[ -z "$parent" ]] || ! ps -p "$parent" | grep -q ollama; then
    echo "$(date): Killing zombie Ollama runner $pid"
    kill -9 "$pid"
  fi
done
# Kill grep processes that have run for 5 minutes or more (likely hung)
# The '[g]rep -rn' pattern keeps this pipeline's own grep out of the match
ps -eo pid,etime,command | grep '[g]rep -rn' | while read -r pid etime rest; do
  # etime is [[dd-]hh:]mm:ss; convert it to whole minutes
  minutes=$(echo "$etime" | awk -F'[-:]' '{ d = (NF >= 4) ? $(NF-3) : 0; h = (NF >= 3) ? $(NF-2) : 0; print d*1440 + h*60 + $(NF-1) }')
  if [[ "$minutes" -ge 5 ]]; then
    echo "$(date): Killing hung grep process $pid (runtime: $etime)"
    kill -9 "$pid"
  fi
done
Disabled Agents
launchctl unload ~/Library/LaunchAgents/com.alai.ollama-serve.plist
launchctl unload ~/Library/LaunchAgents/com.john.ollama-warmup.plist
rm ~/Library/LaunchAgents/com.alai.ollama-serve.plist
rm ~/Library/LaunchAgents/com.john.ollama-warmup.plist
Ollama Upgrade
brew upgrade ollama # 0.19.0 → 0.21.0
# Changelog: Fixed memory leak in runner cleanup (issue #4821)
OOM Symptom Recognition
Command:
vm_stat | awk '/Pages free/ {printf "%.1f GB\n", $3*16384/1024/1024/1024}'
Thresholds:
- < 5 GB free: Alert — investigate top memory consumers
- < 2 GB free: Critical — kill non-essential processes immediately
- < 500 MB free: Imminent OOM — force quit Claude/Chrome, restart Ollama
Quick triage:
ps aux | sort -nrk 4 | head -10 # Top 10 memory hogs
pgrep -fl ollama_llama_server # Zombie Ollama runners
pgrep -fl grep # Hung grep processes
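The thresholds above can also be applied programmatically, for example from a Node-based monitor. The sketch below parses the "Pages free" line of vm_stat output and classifies the result. It assumes the same 16384-byte page size as the awk command above; the function names and the level labels are hypothetical.

```javascript
// Sketch: convert the "Pages free" line of vm_stat output into GB and classify it.
// Assumes a 16384-byte page size (as in the awk command above).
const PAGE_SIZE = 16384;

function freeGbFromVmStat(vmStatOutput) {
  const match = vmStatOutput.match(/Pages free:\s+(\d+)\./);
  if (!match) throw new Error("Pages free line not found");
  return (Number(match[1]) * PAGE_SIZE) / 1024 ** 3;
}

// Levels follow the thresholds listed above: 5 GB, 2 GB, 500 MB.
function classifyFreeMemory(freeGb) {
  if (freeGb < 0.5) return "imminent-oom";
  if (freeGb < 2) return "critical";
  if (freeGb < 5) return "alert";
  return "ok";
}

const sample = "Pages free:                          100000.\n";
const gb = freeGbFromVmStat(sample);
console.log(gb.toFixed(1), classifyFreeMemory(gb)); // prints "1.5 critical"
```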
Prevention Checklist
- Monitor free RAM hourly: vm_stat check in cron
- Zombie cleanup daemon running: launchctl list | grep zombie-cleanup
- Only one Ollama launchd agent: launchctl list | grep ollama → expect 1 line
- No warmup preload agents: launchctl list | grep warmup → empty
- Grep with timeout: timeout 60 grep -rn ... instead of bare grep -rn
Email Pipeline + Edita PA — Runbook
Overview
The email pipeline classifies incoming emails and routes them to Mission Control, HiveMind, or archive. Edita PA is the autonomous email assistant operating in phased rollout (currently Phase 1).
Architecture
- Daemon: ~/system/daemons/email-agent.js
- LaunchAgent: com.john.email-agent (via wrapper email-agent-wrapper.sh)
- Vault: Bitwarden session (/tmp/bw-session) required for IMAP credentials
- Triage LLM: llama3.1:8b (Ollama ANVIL, preloaded via ollama-triage-preload.sh)
OWN Classifier Logic
The OWN classifier identifies machine-generated emails from ALAI's own systems to prevent task spam.
Constants (email-agent.js lines 118-123)
const OWN_SYSTEM_PREFIXES = [
'noreply@', 'no-reply@', 'sentinel@', 'alerts@', 'auto@', 'daemon@',
'mailer@', 'notification@', 'notifications@', 'bounces@', 'bounce@',
'donotreply@', 'do-not-reply@', 'system@'
];
const OWN_SYSTEM_DOMAINS = ['@alai.no', '@basicconsulting.no'];
isOwnSystemEmail() Function (lines 446-456)
Two-tier check:
- Exact match: OWN_ADDRESSES array (hardcoded machine addresses)
- Prefix + domain: Any prefix in OWN_SYSTEM_PREFIXES on domains in OWN_SYSTEM_DOMAINS
Critical: alem@alai.no is NEVER in this list. VIP check runs FIRST (line 464), bypassing OWN classifier entirely.
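The two-tier check described above could look roughly like this. It is a sketch, not the code from email-agent.js: the OWN_ADDRESSES entry is a hypothetical placeholder (the real list is not reproduced in this runbook), and the function assumes it receives a bare address rather than a "Name <address>" header value.

```javascript
// Sketch of the two-tier OWN check; not the implementation in email-agent.js.
const OWN_SYSTEM_PREFIXES = [
  'noreply@', 'no-reply@', 'sentinel@', 'alerts@', 'auto@', 'daemon@',
  'mailer@', 'notification@', 'notifications@', 'bounces@', 'bounce@',
  'donotreply@', 'do-not-reply@', 'system@'
];
const OWN_SYSTEM_DOMAINS = ['@alai.no', '@basicconsulting.no'];
// Hypothetical placeholder: the real OWN_ADDRESSES entries are not listed in this runbook.
const OWN_ADDRESSES = ['backup-bot@example.invalid'];

function isOwnSystemEmail(from) {
  const address = from.trim().toLowerCase();
  // Tier 1: exact match against the hardcoded machine addresses.
  if (OWN_ADDRESSES.includes(address)) return true;
  // Tier 2: known system prefix AND one of ALAI's own domains.
  const at = address.lastIndexOf('@');
  if (at < 0) return false;
  const prefix = address.slice(0, at + 1); // e.g. "noreply@"
  const domain = address.slice(at);        // e.g. "@alai.no"
  return OWN_SYSTEM_PREFIXES.includes(prefix) && OWN_SYSTEM_DOMAINS.includes(domain);
}

console.log(isOwnSystemEmail('noreply@alai.no'));    // true
console.log(isOwnSystemEmail('alem@alai.no'));       // false
console.log(isOwnSystemEmail('noreply@github.com')); // false
```

Requiring both the prefix and the domain is what keeps external noreply senders (GitHub, Cloudflare) out of the OWN category.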
TLDR_SKIP Routing
// line 126
const TLDR_SENDER = 'dan@tldrnewsletter.com';
// line 474
if (lowerFrom.includes(TLDR_SENDER)) {
return { category: 'TLDR_SKIP', priority: 'low', summary: 'TLDR newsletter — handled by tldr-briefing.js', action: '' };
}
VIP Ordering
Classification priority (lines 464-481):
- VIP: CEO/family → bypass ALL filters, force ACTION/high, skip Ollama
- TLDR_SKIP: Newsletter → skip MC INTAKE, route to tldr-briefing.js
- OWN: System emails → archive, no task
- Other: Spam allowlist check → Ollama classification
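The ordering above matters more than any single rule, so here is a sketch of the order alone. The predicates isVip, isOwn, and classifyOther are injected, hypothetical stand-ins for the real checks, and the "OWN" priority and the "FYI" placeholder result are assumptions; only the sequence and the VIP and TLDR_SKIP outcomes come from the text above.

```javascript
// Sketch of the classification order only; not the code from email-agent.js.
const TLDR_SENDER = "dan@tldrnewsletter.com";

function classify(from, { isVip, isOwn, classifyOther }) {
  const lowerFrom = from.toLowerCase();
  // 1. VIP first: bypasses every later filter and skips the LLM.
  if (isVip(lowerFrom)) return { category: "ACTION", priority: "high" };
  // 2. TLDR newsletter: handled by tldr-briefing.js, no MC task.
  if (lowerFrom.includes(TLDR_SENDER)) return { category: "TLDR_SKIP", priority: "low" };
  // 3. OWN system mail: archive, no task (priority value is an assumption).
  if (isOwn(lowerFrom)) return { category: "OWN", priority: "low" };
  // 4. Everything else: allowlist check and LLM classification.
  return classifyOther(lowerFrom);
}

const checks = {
  isVip: (from) => from === "alem@alai.no",
  isOwn: (from) => from.endsWith("@alai.no"), // deliberately broad to show that VIP wins
  classifyOther: () => ({ category: "FYI", priority: "normal" }), // placeholder result
};

console.log(classify("Alem@alai.no", checks).category);    // ACTION
console.log(classify("noreply@alai.no", checks).category); // OWN
```

With the deliberately broad isOwn stub, the CEO address would be archived if the OWN check ran first, which is the failure the VIP-first ordering prevents.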
Edita PA Phases
Phase 0: --dry-run (Log-Only)
Classification + logging only. No archive, no escalate, no respond.
node ~/system/daemons/email-agent.js --dry-run
Phase 1: --allow-archive (CURRENT)
Archive low-priority emails only. Escalate and respond actions are held (logged but not executed).
node ~/system/daemons/email-agent.js --allow-archive
Plist config: com.john.email-agent calls email-agent-wrapper.sh, which passes no flags → defaults to Phase 1 (archive-only mode is internal default in daemon code).
Phase 2: Full Live (NOT YET APPROVED)
Archive + escalate + respond. Requires CEO explicit approval.
node ~/system/daemons/email-agent.js --allow-all
Unit Testing
Test classifier without IMAP/Vault dependencies:
node ~/system/daemons/test-email-classifier.js
Scenarios (16 total):
- VIP bypass (alem@alai.no, CEO family)
- TLDR_SKIP routing
- OWN system emails (noreply@alai.no, sentinel@basicconsulting.no)
- Spam patterns with allowlist exceptions (GitHub, Cloudflare, Anthropic)
Rollback
Revert to dry-run mode:
launchctl unload ~/Library/LaunchAgents/com.john.email-agent.plist
# Edit ~/system/tools/email-agent-wrapper.sh so that its exec line passes --dry-run.
# Do not append a second exec line with >>: anything after an existing exec never runs.
# Target line:
#   exec /opt/homebrew/bin/node "$HOME/system/daemons/email-agent.js" --dry-run
launchctl load ~/Library/LaunchAgents/com.john.email-agent.plist
Monitoring
- Logs: ~/system/logs/email-agent-launchd.log
- Errors: ~/system/logs/email-agent-launchd-error.log
- MC tasks: node ~/system/tools/mc.js list --owner edita
- DLQ: Failed vault sessions stored in email-agent.js in-memory DLQ (logged only, no persistence)
Generated by Skillforge | ALAI, 2026
Contact Form Handlers
This section documents all contact forms across ALAI properties and their email delivery mechanisms.
alai.no Contact Form
- Frontend: https://alai.no/contact (Cloudflare Pages)
- Handler: CF Pages Function /functions/contact.js
- Endpoint: POST https://alai.no/api/contact
- Email provider: Resend API
- Recipient: info@alai.no
- Credentials: Bitwarden item "Resend API Key" → CF Pages env var RESEND_API_KEY
- Status: LIVE (deployed 2026-04-21, MC #8587)
Test procedure:
curl -X POST https://alai.no/api/contact \
-H "Content-Type: application/json" \
-d '{"name": "Test User", "email": "test@example.com", "message": "E2E test 2026-04-21 14:00"}'
# Verify inbox:
himalaya search --account info-alai --folder INBOX "subject:Contact Form"
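The handler source is not reproduced here. As a rough sketch of what such a handler has to do before calling Resend, the function below validates a submission and builds the JSON body for Resend's POST https://api.resend.com/emails endpoint (sent with an Authorization: Bearer header carrying RESEND_API_KEY). The sender address, the subject format, and the function name buildContactEmail are assumptions, not taken from /functions/contact.js.

```javascript
// Hypothetical sketch: validate a contact-form submission and build the Resend request body.
// Assumptions: sender address, subject format, and the function name.
function buildContactEmail(form) {
  const name = String(form.name ?? "").trim();
  const email = String(form.email ?? "").trim();
  const message = String(form.message ?? "").trim();
  if (!name || !message) throw new Error("name and message are required");
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) throw new Error("invalid email address");
  return {
    from: "noreply@alai.no", // assumed sender
    to: ["info@alai.no"],    // recipient from the list above
    subject: `Contact Form: ${name}`,
    text: `From: ${name} <${email}>\n\n${message}`,
  };
}

const payload = buildContactEmail({
  name: "Test User",
  email: "test@example.com",
  message: "E2E test",
});
console.log(payload.subject); // Contact Form: Test User
```

Rejecting invalid input with an error (and a non-2xx response in the real handler) avoids the "false success" failure mode described for snowit.ba, where the form reports success but no email is delivered.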
snowit.ba Contact Form
- Frontend: https://snowit.ba/contact
- Handler: BROKEN (Vercel API route not migrated to CF Pages, MC #8591)
- Endpoint: POST https://api.basicconsulting.no/contact (hijacked by documenso-webhook, returns false success)
- Recipient: info@snowit.ba (LumisCare side, not ALAI-managed)
- Status: BROKEN, awaiting CodeCraft fix
getdrop.no Waitlist
- Frontend: https://getdrop.no (Cloudflare Pages)
- Handler: CF Pages Function /functions/waitlist.js
- Endpoint: POST https://getdrop.no/api/waitlist
- Storage: Cloudflare D1 database drop-waitlist
- Email provider: None (DB-only storage, no email sent)
- Status: LIVE
Test procedure:
wrangler d1 execute drop-waitlist --command "SELECT * FROM submissions ORDER BY created_at DESC LIMIT 5"
merdzanovic.ba Contact Form
- Status: UNKNOWN — needs audit (likely same risk as snowit.ba)
- MC Task: #8593 (audit all ALAI-managed contact forms)
Form Handler Migration Checklist
When migrating sites from Vercel/Netlify to Cloudflare Pages:
- Inventory: Identify all POST endpoints (forms, webhooks, API routes)
- Port handlers: Rewrite Vercel API routes as CF Pages Functions (/functions/*.js)
- Environment variables: Copy SMTP/API credentials to CF Pages env vars
- Update form actions: Change form targets to new CF Pages routes (e.g., /api/contact)
- E2E test: Follow Forms E2E Testing Protocol (HTTP + inbox check MANDATORY)
- Monitor: Check inbox/DB for 24 hours post-migration to catch silent failures
Reference incident: 2026-04-21 alai.no Contact Form Failure
Himalaya IMAP Setup (for Form Testing)
Himalaya CLI provides rapid inbox verification without browser login.
Install
brew install himalaya
Configure Account
Add to ~/.config/himalaya/config.toml:
[accounts.info-alai]
default = false
email = "info@alai.no"
display-name = "ALAI Info"
[accounts.info-alai.imap]
host = "imap.one.com"
port = 993
encryption = "tls"
login = "info@alai.no"
passwd.cmd = "bw get password 'Email - info@alai.no' --session $(cat /tmp/bw-session)"
[accounts.info-alai.smtp]
host = "send.one.com"
port = 587
encryption = "start-tls"
login = "info@alai.no"
passwd.cmd = "bw get password 'Email - info@alai.no' --session $(cat /tmp/bw-session)"
Usage
# Unlock Bitwarden first, and keep the session token readable by the current user only
bw unlock --raw > /tmp/bw-session
chmod 600 /tmp/bw-session
# List recent messages
himalaya list --account info-alai --folder INBOX --page-size 20
# Search for form submissions
himalaya search --account info-alai --folder INBOX "from:noreply@alai.no"
# Search by date range
himalaya search --account info-alai --folder INBOX "since:2026-04-21"
Credentials: Bitwarden item "Email - info@alai.no"
Updated: 2026-04-21 | Skillforge
Email Pipeline + Edita PA — Runbook
Overview
The email pipeline classifies incoming emails and routes them to Mission Control, HiveMind, or archive. Edita PA is the autonomous email assistant operating in phased rollout (currently Phase 1).
Architecture
- Daemon: ~/system/daemons/email-agent.js
- LaunchAgent: com.john.email-agent (via wrapper email-agent-wrapper.sh)
- Vault: Bitwarden session (/tmp/bw-session) required for IMAP credentials
- Triage LLM: llama3.1:8b (Ollama ANVIL, preloaded via ollama-triage-preload.sh)
OWN Classifier Logic
The OWN classifier identifies machine-generated emails from ALAI's own systems to prevent task spam.
Constants (email-agent.js lines 118-123)
const OWN_SYSTEM_PREFIXES = [
'noreply@', 'no-reply@', 'sentinel@', 'alerts@', 'auto@', 'daemon@',
'mailer@', 'notification@', 'notifications@', 'bounces@', 'bounce@',
'donotreply@', 'do-not-reply@', 'system@'
];
const OWN_SYSTEM_DOMAINS = ['@alai.no', '@basicconsulting.no'];
isOwnSystemEmail() Function (lines 446-456)
Two-tier check:
- Exact match: OWN_ADDRESSES array (hardcoded machine addresses)
- Prefix + domain: Any prefix in OWN_SYSTEM_PREFIXES on domains in OWN_SYSTEM_DOMAINS
Critical: alem@alai.no is NEVER in this list. VIP check runs FIRST (line 464), bypassing OWN classifier entirely.
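The two-tier check above can be sketched as follows. This is a minimal illustration, not the code in email-agent.js: the single OWN_ADDRESSES entry is a hypothetical placeholder (the real array is not reproduced in this runbook), and the function is assumed to receive a bare address rather than a full From header.

```javascript
// Sketch of the two-tier OWN check. Not the daemon's actual implementation.
const OWN_SYSTEM_PREFIXES = [
  'noreply@', 'no-reply@', 'sentinel@', 'alerts@', 'auto@', 'daemon@',
  'mailer@', 'notification@', 'notifications@', 'bounces@', 'bounce@',
  'donotreply@', 'do-not-reply@', 'system@'
];
const OWN_SYSTEM_DOMAINS = ['@alai.no', '@basicconsulting.no'];
// Hypothetical placeholder: the real OWN_ADDRESSES list is not shown here.
const OWN_ADDRESSES = ['mc-digest@alai.no'];

function isOwnSystemEmail(address) {
  const addr = String(address).trim().toLowerCase();
  // Tier 1: exact match against known machine addresses.
  if (OWN_ADDRESSES.includes(addr)) return true;
  // Tier 2: a system prefix AND an own domain must both match.
  const hasPrefix = OWN_SYSTEM_PREFIXES.some((p) => addr.startsWith(p));
  const hasDomain = OWN_SYSTEM_DOMAINS.some((d) => addr.endsWith(d));
  return hasPrefix && hasDomain;
}

console.log(isOwnSystemEmail('noreply@alai.no'));    // true
console.log(isOwnSystemEmail('noreply@github.com')); // false
console.log(isOwnSystemEmail('alem@alai.no'));       // false
```

Requiring both prefix and domain in tier 2 is what keeps external no-reply senders (GitHub, Cloudflare) out of the OWN bucket.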
TLDR_SKIP Routing
// line 126
const TLDR_SENDER = 'dan@tldrnewsletter.com';
// line 474
if (lowerFrom.includes(TLDR_SENDER)) {
return { category: 'TLDR_SKIP', priority: 'low', summary: 'TLDR newsletter — handled by tldr-briefing.js', action: '' };
}
VIP Ordering
Classification priority (lines 464-481):
- VIP: CEO/family → bypass ALL filters, force ACTION/high, skip Ollama
- TLDR_SKIP: Newsletter → skip MC INTAKE, route to tldr-briefing.js
- OWN: System emails → archive, no task
- Other: Spam allowlist check → Ollama classification
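The ordering above matters more than any single check, so a small sketch may help. It assumes a one-entry VIP list and takes the OWN predicate as a parameter; the category label PENDING_LLM and the `via` field are invented for illustration and are not names from email-agent.js.

```javascript
// Sketch of the classification order. Lists and labels are simplified stand-ins.
const VIP_ADDRESSES = ['alem@alai.no'];        // assumption: illustrative only
const TLDR_SENDER = 'dan@tldrnewsletter.com';  // value quoted in this runbook

function classifyInOrder(from, isOwn) {
  const lowerFrom = String(from).toLowerCase();
  // 1. VIP wins over everything, including the OWN classifier.
  if (VIP_ADDRESSES.some((v) => lowerFrom.includes(v))) {
    return { category: 'ACTION', priority: 'high', via: 'VIP' };
  }
  // 2. The TLDR newsletter is skipped before any OWN or LLM work.
  if (lowerFrom.includes(TLDR_SENDER)) {
    return { category: 'TLDR_SKIP', priority: 'low', via: 'TLDR_SKIP' };
  }
  // 3. Own system mail is archived without creating a task.
  if (isOwn(lowerFrom)) {
    return { category: 'OWN', priority: 'low', via: 'OWN' };
  }
  // 4. Everything else continues to the spam allowlist and Ollama.
  return { category: 'PENDING_LLM', priority: null, via: 'OTHER' };
}

// Even a predicate that calls everything OWN cannot demote a VIP sender.
console.log(classifyInOrder('Alem <alem@alai.no>', () => true).via); // VIP
```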
Edita PA Phases
Phase 0: --dry-run (Log-Only)
Classification + logging only. No archive, no escalate, no respond.
node ~/system/daemons/email-agent.js --dry-run
Phase 1: --allow-archive (CURRENT)
Archive low-priority emails only. Escalate and respond actions are held (logged but not executed).
node ~/system/daemons/email-agent.js --allow-archive
Plist config: com.john.email-agent calls email-agent-wrapper.sh, which passes no flags → defaults to Phase 1 (archive-only mode is internal default in daemon code).
Phase 2: Full Live (NOT YET APPROVED)
Archive + escalate + respond. Requires CEO explicit approval.
node ~/system/daemons/email-agent.js --allow-all
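As a rough sketch of how the three phases map to permitted actions: the flag names are the ones documented above, but the returned object shape is an assumption, as is the choice to let --dry-run win when flags are combined.

```javascript
// Sketch: map CLI flags to the actions Edita PA may execute.
// The return shape and the flag precedence are assumptions, not daemon code.
function resolvePhase(argv) {
  if (argv.includes('--dry-run')) {
    return { phase: 0, archive: false, escalate: false, respond: false };
  }
  if (argv.includes('--allow-all')) {
    return { phase: 2, archive: true, escalate: true, respond: true };
  }
  // --allow-archive, or no flag at all: archive-only is the internal default.
  return { phase: 1, archive: true, escalate: false, respond: false };
}

console.log(resolvePhase([]).phase);            // 1
console.log(resolvePhase(['--dry-run']).phase); // 0
```

Checking the most restrictive flag first means a stray --allow-all cannot override an intended dry run.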
Unit Testing
Test classifier without IMAP/Vault dependencies:
node ~/system/daemons/test-email-classifier.js
Scenarios (16 total):
- VIP bypass (alem@alai.no, CEO family)
- TLDR_SKIP routing
- OWN system emails (noreply@alai.no, sentinel@basicconsulting.no)
- Spam patterns with allowlist exceptions (GitHub, Cloudflare, Anthropic)
Rollback
Revert to dry-run mode:
launchctl unload ~/Library/LaunchAgents/com.john.email-agent.plist
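# Add --dry-run to the wrapper's existing launch line. Appending a new exec line to the
# end of the file has no effect, because the existing launch line runs first.
# Assumes that line ends with: "$HOME/system/daemons/email-agent.js"
sed -i.bak 's|email-agent.js"$|email-agent.js" --dry-run|' ~/system/tools/email-agent-wrapper.sh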
launchctl load ~/Library/LaunchAgents/com.john.email-agent.plist
Monitoring
- Logs: ~/system/logs/email-agent-launchd.log
- Errors: ~/system/logs/email-agent-launchd-error.log
- MC tasks: node ~/system/tools/mc.js list --owner edita
- DLQ: Failed vault sessions stored in email-agent.js in-memory DLQ (logged only, no persistence)
Generated by Skillforge | ALAI, 2026
Ollama Fleet Architecture
Overview
ALAI operates a two-node Ollama fleet: ANVIL (local dev Mac) and FORGE (Ubuntu 22.04 GPU workstation). ANVIL handles triage workloads (email, TLDR, quick classification), FORGE handles heavy inference (32B+ models, RAG pipelines).
ANVIL Ollama Configuration
Capacity Limits
- MAX_LOADED_MODELS: 2 (prevents RAM exhaustion)
- KEEP_ALIVE: 30s (default for on-demand models)
- Hardware: M1 Pro, 32GB RAM, 5GB reserved for triage model
LaunchAgent: com.alai.ollama-serve-v2
Label: com.alai.ollama-serve-v2
Plist: ~/Library/LaunchAgents/com.alai.ollama-serve-v2.plist
Port: 11434
Environment:
OLLAMA_FLASH_ATTENTION=1
OLLAMA_KV_CACHE_TYPE=q8_0
OLLAMA_MAX_LOADED_MODELS=2
OLLAMA_KEEP_ALIVE=30s
Triage Preload Pattern
MC #8477 — Prevent qwen2.5-coder:32b (23GB) from blocking email/TLDR triage.
Strategy
Preload llama3.1:8b with keep_alive=-1 (indefinite) so it's always resident for fast triage operations. 5GB footprint.
LaunchAgent: com.john.ollama-triage-preload
Label: com.john.ollama-triage-preload
Script: ~/system/tools/ollama-triage-preload.sh
Trigger: RunAtLoad + StartInterval 300s (every 5 min)
Log: ~/system/logs/ollama-triage-preload-stdout.log
Script Logic (ollama-triage-preload.sh)
- Check if llama3.1:8b is already loaded via /api/ps
- If not loaded, send minimal prompt with keep_alive=-1
- Log success/skip
curl -sf -X POST "$OLLAMA_URL/api/generate" \
-H "Content-Type: application/json" \
-d "{
\"model\": \"llama3.1:8b\",
\"prompt\": \"ready\",
\"stream\": false,
\"keep_alive\": -1,
\"options\": {
\"num_predict\": 1
}
}"
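The "already loaded" decision in the first step can be sketched as a pure function over the /api/ps response. This assumes the response has a `models` array whose entries carry `name` and `model` fields; the function name is made up for illustration and is not part of ollama-triage-preload.sh.

```javascript
// Sketch: decide from an /api/ps response whether the triage model needs a preload.
function needsPreload(psResponse, model) {
  const loaded = (psResponse && Array.isArray(psResponse.models))
    ? psResponse.models
    : [];
  // A missing or empty list means nothing is resident, so a preload is needed.
  return !loaded.some((m) => m.name === model || m.model === model);
}

console.log(needsPreload({ models: [{ name: 'llama3.1:8b' }] }, 'llama3.1:8b')); // false
console.log(needsPreload({ models: [] }, 'llama3.1:8b'));                        // true
```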
Model Tier System
| Tier | Model | Size | Use Case | Keep Alive | Node |
|---|---|---|---|---|---|
| Triage | llama3.1:8b | 5GB | Email classification, TLDR summarization, quick routing | -1 (indefinite) | ANVIL |
| Heavy | qwen2.5-coder:32b | 23GB | Code generation, architecture review, complex reasoning | 30s (on-demand) | ANVIL |
| Primary | devstral:24b | ~15GB | Agent orchestration, planning, context routing | 300s | FORGE |
FORGE Failover
Consumers (email-agent.js, tldr-briefing.js, YouTube daemon) can set the FORGE_FIRST=0 environment variable to skip FORGE and use ANVIL directly.
# Force ANVIL-only
export FORGE_FIRST=0
node ~/system/daemons/youtube-daemon.js
Default behavior: Try FORGE (10.0.0.2:11434), fallback to ANVIL (localhost:11434) on timeout.
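A minimal sketch of that ordering, assuming consumers read FORGE_FIRST as a string from the environment. The helper names are hypothetical, and `call` stands for whatever request function the consumer already uses.

```javascript
// Sketch of FORGE-first failover. Helper names are illustrative, not from the daemons.
const FORGE_URL = 'http://10.0.0.2:11434';
const ANVIL_URL = 'http://localhost:11434';

function ollamaEndpoints(env) {
  // FORGE_FIRST=0 skips FORGE entirely; anything else keeps the default order.
  if (env.FORGE_FIRST === '0') return [ANVIL_URL];
  return [FORGE_URL, ANVIL_URL];
}

// Try each endpoint in order. `call` is a caller-supplied async function that
// rejects on timeout or connection failure.
async function withFailover(env, call) {
  let lastError;
  for (const url of ollamaEndpoints(env)) {
    try {
      return await call(url);
    } catch (err) {
      lastError = err; // remember the failure and fall through to the next node
    }
  }
  throw lastError;
}

console.log(ollamaEndpoints({ FORGE_FIRST: '0' })); // [ 'http://localhost:11434' ]
```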
Vault-Keeper Watchdog (MC #8471 — PENDING)
Monitors ~/system/.cache/vault-keeper-heartbeat file. If stale > 1 hour, SENTINEL alerts.
Implementation
LaunchAgent: com.john.vault-keeper-watchdog
Interval: 600s (10 min)
Script: ~/system/daemons/vault-keeper-watchdog.sh
Alert: Slack #sentinel-alerts
Logic
- Read heartbeat file timestamp
- Compare with current time
- If > 3600s, send SENTINEL alert with vault-keeper logs
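Since MC #8471 is still pending, there is no implementation to quote. The staleness test itself is simple enough to sketch, assuming the heartbeat file's modification time is available in milliseconds; the function name is hypothetical.

```javascript
// Sketch of the staleness comparison for the vault-keeper heartbeat.
function isHeartbeatStale(heartbeatMtimeMs, nowMs, maxAgeSeconds = 3600) {
  const ageSeconds = (nowMs - heartbeatMtimeMs) / 1000;
  // Strictly greater than: a heartbeat exactly one hour old is still accepted.
  return ageSeconds > maxAgeSeconds;
}

const now = Date.now();
console.log(isHeartbeatStale(now - 10 * 60 * 1000, now));     // false
console.log(isHeartbeatStale(now - 2 * 60 * 60 * 1000, now)); // true
```

With a 600s check interval and a 3600s threshold, a dead vault-keeper would be reported between 60 and 70 minutes after its last heartbeat.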
YouTube Daemon Lesson (MC #8472)
Log redirection corruption: tee + subshell arithmetic capture caused output mangling.
Anti-Pattern
# WRONG — tee inside $() breaks arithmetic
NEW_COUNT=$(node ~/system/daemons/youtube-processor.js | tee -a "$LOG")
Correct Pattern
# RIGHT — separate logging stream
node ~/system/daemons/youtube-processor.js >> "$LOG" 2>&1
LaunchAgent Duplication
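Never combine KeepAlive and StartInterval in the same plist. KeepAlive tells launchd to relaunch the job whenever it exits, while StartInterval schedules it on a timer; with both set, a job meant to run once per interval is restarted continuously and runs overlap.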
<!-- WRONG -->
<key>KeepAlive</key>
<true/>
<key>StartInterval</key>
<integer>3600</integer>
<!-- RIGHT (pick one) -->
<key>StartInterval</key>
<integer>3600</integer>
Fleet Monitoring
ANVIL
curl http://localhost:11434/api/ps
curl http://localhost:11434/api/tags
tail -f ~/system/logs/ollama-triage-preload-stdout.log
FORGE
curl http://10.0.0.2:11434/api/ps
ssh forge "tail -f /var/log/ollama.log"
Mission Control
node ~/system/tools/mc.js list --tag ollama
node ~/system/tools/cost-tracker.js summary --service ollama
Generated by Skillforge | ALAI, 2026
Static Hosting Migration — Progress Log
MC: #8523 (tracking), #8482 (basicconsulting.no), #8489 (bilko.io) | Date: 2026-04-20
Overview
ALAI is migrating 8 static sites from Vercel/Azure VM to Cloudflare Pages for cost savings (€0 vs €12-14/mo), operational simplification, and DDoS/WAF coverage. See full blueprint at ~/system/specs/ALAI-STATIC-HOSTING-BLUEPRINT.md.
Migration Log
| Date | Domain | From | To | Downtime | TTFB Before | TTFB After | Notes |
|---|---|---|---|---|---|---|---|
| 2026-04-20 | basicconsulting.no | Vercel (76.76.21.21) | CF Pages | ~60s | 114ms | 51ms (warm avg) | MC #8482. DNS: A→CNAME. Validation required domain re-add. TTFB improved 55%. Proveo pilot validated #8490. |
| 2026-04-20 | bilko.io | one.com (down) | CF Pages | N/A (site was down) | N/A | 68ms (warm avg) | MC #8489. Apex CNAME not possible on one.com free tier (paid feature). Switched to Cloudflare NS (ana.ns.cloudflare.com, bob.ns.cloudflare.com). CF Pages zone ID: 62d89b79f0648d3fa1d045335a989ea7. DNS: CNAME flattening bilko.io → bilko-io.pages.dev (proxied), www → bilko-io.pages.dev. |
Paused Migrations
MC #8483 — basicfakta.no
Reason: Inventory error. Site has serverless functions (Vercel Edge), not pure static. Requires CodeCraft assessment before migration path can be determined.
MC #8484 — snowit.no
Reason: Inventory error. Site has API routes (Next.js), not pure static. Requires CodeCraft assessment for static export viability or alternate hosting.
Audit Verdict: bilko-demo.alai.no (MC #8486)
Decision: Stays on GCP Cloud Run. Not eligible for CF Pages migration.
Reason: Full-stack Next.js app with dynamic API routes and server-side rendering. Static export would break functionality. Current platform (Cloud Run) is correct fit.
Lessons Learned
one.com Apex CNAME Limitation
one.com free tier does NOT support apex CNAME (requires paid plan). For domains registered at one.com, the migration path is:
- Switch nameservers to Cloudflare (ana.ns.cloudflare.com, bob.ns.cloudflare.com)
- Import DNS records via Cloudflare zone scan
- Set up CNAME flattening in Cloudflare (apex → CF Pages project, proxied)
Propagation time: 15 minutes to 4 hours for .no domains.
Inventory Validation Pre-Migration
Before scheduling a migration, verify the site is truly static:
- Check for pages/api/ or app/api/ directories (Next.js API routes)
- Check for Vercel Edge Functions (middleware.ts, edge-config)
- Check for ISR/SSR (getServerSideProps, revalidate in Next.js)
- Run npm run build and verify output is out/ or dist/ (static export)
If any of the above exist, the site is NOT static and requires CodeCraft review.
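The path-based checks above can be partly automated. The following is a sketch under clear limits: it only looks at file paths (API route directories and middleware files), so the ISR/SSR and build-output checks still need to be done by hand. The function name and return shape are assumptions.

```javascript
// Sketch: flag repo paths that suggest a site is not purely static.
// Path checks only; it does not read file contents or run the build.
function findDynamicIndicators(paths) {
  const checks = [
    { reason: 'Next.js API routes', test: (p) => /(^|\/)(pages|app)\/api\//.test(p) },
    { reason: 'Edge middleware', test: (p) => /(^|\/)middleware\.(ts|js)$/.test(p) },
  ];
  const hits = [];
  for (const p of paths) {
    for (const c of checks) {
      if (c.test(p)) hits.push({ path: p, reason: c.reason });
    }
  }
  return hits;
}

const hits = findDynamicIndicators([
  'pages/index.tsx',
  'pages/api/contact.ts',
  'middleware.ts',
]);
console.log(hits.length); // 2
```

An empty result does not prove the site is static; it only means these two indicators were not found.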
TTFB Improvements
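On the one site with before/after measurements (basicconsulting.no, 114ms → 51ms warm average), Cloudflare Pages with CDN caching (orange-cloud proxy) cut TTFB by about 55% compared with Vercel. Treat 50-60% as the expected range for static sites until more migrations are measured. Cold start overhead is negligible (CF edge network vs Vercel edge).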
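For reference, the improvement figure in the migration log comes from a plain relative-change calculation. A small sketch, using the basicconsulting.no numbers from the table above (the function name is illustrative):

```javascript
// Relative TTFB improvement, as a percentage of the "before" value.
function improvementPercent(beforeMs, afterMs) {
  return ((beforeMs - afterMs) / beforeMs) * 100;
}

// basicconsulting.no: 114ms on Vercel, 51ms warm average on CF Pages.
console.log(improvementPercent(114, 51).toFixed(1)); // 55.3
```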
Remaining Migrations
| Domain | Current Host | Status | MC Task |
|---|---|---|---|
| alai.no | CF Pages | ✅ Complete (already on target platform) | N/A |
| basicconsulting.no | CF Pages | ✅ Complete (2026-04-20) | #8482 |
| bilko.io | CF Pages | ✅ Complete (2026-04-20) | #8489 |
| basicfakta.no | Vercel | ⏸ Paused (serverless functions found) | #8483 |
| snowit.no | Vercel | ⏸ Paused (API routes found) | #8484 |
| getdrop.no | Azure VM | 🔄 Pending (DNS on Vercel, move to CF) | #8485 |
| kenyhot.pro | Vercel | 🔄 Pending (coordinate with client) | #8487 |
| merdzanovic.ba | Vercel | 🔄 Pending (coordinate with client) | #8488 |
DNS Consolidation Status
| Domain | Registrar | Current NS | Target NS | Status |
|---|---|---|---|---|
| alai.no | one.com | Cloudflare | Cloudflare | ✅ Done |
| basicconsulting.no | one.com | Cloudflare | Cloudflare | ✅ Done |
| bilko.io | one.com | Cloudflare | Cloudflare | ✅ Done (2026-04-20) |
| getdrop.no | one.com | Vercel | Cloudflare | 🔄 Pending |
| basicfakta.no | one.com | Vercel | Cloudflare | 🔄 Pending |
| snowit.no | one.com | Unknown | Cloudflare | 🔄 Pending |
Generated by Skillforge | ALAI, 2026
ANVIL DR Bootstrap Runbook (Mac Air)
When to use
This runbook is for recovering the ALAI AI factory infrastructure when:
- ANVIL (Mac Studio, 100.103.49.98) is dead, stolen, or inaccessible
- Hardware failure requiring complete rebuild on new Mac
- Setting up FORGE (disaster recovery clone) on fresh hardware
- Provisioning a new MacBook Air for Alem with minimal AI factory capabilities
SPOF Context: As of 2026-04-20, ANVIL is the single Mac Studio hosting 112 LaunchAgent daemons, 68 SQLite databases (litestream-replicated), Ollama (8 models), and the entire ~/system + ~/.claude infrastructure. This runbook enables recovery to any fresh Mac with admin access.
Prerequisites
Before starting bootstrap, ensure you have:
- Fresh Mac with admin account (macOS Sonoma or later, Apple Silicon preferred)
- Tailscale app installed + logged into alembasic@tailnet (download from tailscale.com/download)
- GitHub account with read access to:
  - github.com/johnatbasicas/clawd (~/system repo, auto-backup branch)
  - github.com/johnatbasicas/claude-config (~/.claude repo)
- Bitwarden account unlocked with master password ready (Alem's personal vault: alembasic@gmail.com)
- Internet connection (stable; Homebrew packages are 2-3 GB, and pulling the full Ollama model set adds ~150 GB)
Step-by-step Bootstrap
Phase 1: Foundation
1. Install Xcode Command Line Tools
xcode-select --install
Expected: GUI dialog appears. Click "Install" and wait 5-10 minutes. Verify with:
xcode-select -p
# Should output: /Library/Developer/CommandLineTools
2. Install Homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Expected: Homebrew installs to /opt/homebrew. Add to shell profile:
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"
# Verify:
brew --version
# Should show: Homebrew 4.x.x
3. Install Bitwarden CLI + unlock vault
brew install bitwarden-cli
# Unlock vault (enter master password when prompted):
bw login alembasic@gmail.com
export BW_SESSION=$(bw unlock --raw)
# Verify:
bw status | jq .status
# Should show: "unlocked"
Note: Keep this terminal window open. BW_SESSION is needed for bootstrap script.
Phase 2: Clone Infrastructure Repos
4. Clone ~/system (clawd repo)
# If using SSH (recommended if SSH keys already set up):
git clone git@github.com:johnatbasicas/clawd.git ~/system
# OR if using HTTPS with GitHub PAT:
git clone https://github.com/johnatbasicas/clawd.git ~/system
# Switch to auto-backup branch (contains latest portability artifacts):
cd ~/system
git checkout auto-backup
git pull
Expected:
ls ~/system/
# Should show: Brewfile, bootstrap.sh, config/, databases/, tools/, etc.
5. Clone ~/.claude (claude-config repo)
git clone git@github.com:johnatbasicas/claude-config.git ~/.claude
# Verify:
ls ~/.claude/
# Should show: CLAUDE.md, hooks/, agents/, skills/, projects/
Phase 3: Run Bootstrap Script
6. Execute bootstrap (with BW_SESSION active)
cd ~/system
bash bootstrap.sh workstation
Role options:
- anvil: Full primary node (all daemons, Ollama, heavy workloads)
- forge: DR clone (continuous restore from Azure, lighter load)
- workstation: Minimal setup (SSH relay to ANVIL for heavy ops)
What the script does:
- Re-checks Xcode CLT + Homebrew (idempotent)
- Installs ~70 brew packages from Brewfile (15-30 min depending on connection)
- Copies 112 LaunchAgent plists from ~/system/config/launchagents/ to ~/Library/LaunchAgents/
- Rehydrates BW:<item> placeholders in plists by calling bw get password <item>
- Loads all LaunchAgents via launchctl bootstrap
- Verifies core services (Ollama, litestream)
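The placeholder rehydration step is the one most likely to leave a plist broken, so here is a sketch of the idea. It is not bootstrap.sh: `lookup` stands in for `bw get password <item>`, the allowed characters in item names are an assumption, and the XML escaping is added here because an unescaped secret is one way a substitution can produce invalid plist XML.

```javascript
// Sketch: replace BW:<item> placeholders in plist text.
// `lookup` stands in for `bw get password <item>`; it returns undefined when missing.
function rehydratePlist(plistText, lookup) {
  const missing = [];
  const text = plistText.replace(/BW:([A-Za-z0-9._-]+)/g, (match, item) => {
    const secret = lookup(item);
    if (secret === undefined) {
      missing.push(item);
      return match; // leave the placeholder in place for a manual fix
    }
    return escapeXml(secret);
  });
  return { text, missing };
}

// Escape the characters that would otherwise break a <string> element.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

const vault = { 'groq-api-key': 'a&b' };
const result = rehydratePlist(
  '<string>BW:groq-api-key</string><string>BW:slack-app-token</string>',
  (item) => vault[item]
);
console.log(result.missing); // [ 'slack-app-token' ]
```

Returning the list of missing items mirrors the WARN lines described below: the plist is still written, and the gaps are reported for manual repair.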
Expected output (tail of bootstrap.log):
[bootstrap] Bootstrap COMPLETE. Next steps:
[bootstrap] - Verify SSH: ssh makinja@100.103.49.98
[bootstrap] - Check MC: node ~/system/tools/mc.js list
[bootstrap] - Log: /Users/makinja/bootstrap.log
LaunchAgents loaded: 112
Ollama models available: 8
Litestream: RUNNING
If BW rehydration fails: You'll see warnings like:
WARN: Bitwarden item 'groq-api-key' not found — com.alai.groq-model-benchmark.plist will need manual fix
Fix manually after bootstrap completes (see Troubleshooting section).
Phase 4: Database Restore (if DBs lost/corrupt)
When to run: Only if ~/system/databases/ is empty or you need to restore from Azure backups (e.g., ANVIL disk died).
7. Set Azure auth environment variables
export AZURE_CLIENT_ID="1a0b3018-0c31-474b-918f-531b0a29a669"
export AZURE_CLIENT_SECRET=$(bw get password alai-backup-writer-secret)
export AZURE_TENANT_ID="cd0a7929-1d14-4f81-820d-b36e45f72cf7"
8. Restore P0 critical databases
mkdir -p ~/system/databases
# Mission Control:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/mission-control.db
# HiveMind:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/hivemind.db
# Tasks:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/tasks.db
# Costs:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/costs.db
# Events:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/events.db
9. Restore P0 financial databases
# Fiken (accounting cache):
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/fiken.db
# Invoices:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/invoices.db
# Contracts:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/contracts.db
# Leads:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/leads.db
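Note: The -if-replica-exists flag makes litestream restore exit with code 0 when no replica is found for a database, instead of failing. It does not compare the local copy with the Azure backup. litestream restore refuses to write over an existing file, so to force a restore, move or delete the local .db file first; add -if-db-not-exists to skip databases that already exist locally.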
Bulk restore all 68 DBs (if needed):
for db in mission-control hivemind tasks costs events fiken invoices contracts leads \
orchestrator-queue orchestrator-workers durable-runner session-index knowledge \
emails email-inbox alem-directives agent-routing bee-index companies contacts \
deploy-registry design-reviews distill documents drafts drift email-audit \
email-briefing email-index email-tracking escalations facts flywheel goals \
guardrails-audit health-events hivemind-archive master-control mc minions \
observability orchestrator-events pipeline projects routing-outcomes skill-improvements \
skill-registry sprint-pipeline strategy-tracker teams tenders tickets tool-audit \
tool-registry trace-events applications-tracker baikal-caldav prompt-cache \
prompt-metrics semantic-reuse-index stbs telemetry token-cost usage vcr bih-tenders browser-tasks; do
echo "Restoring $db..."
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/$db.db || echo "WARN: $db restore failed or skipped"
done
Verify restores:
ls -lh ~/system/databases/*.db | wc -l
# Should show: 68 (or close, depending on which DBs had replicas)
# Check specific DB integrity:
sqlite3 ~/system/databases/mission-control.db "PRAGMA integrity_check;"
# Should output: ok
Bitwarden Items Required
The following Bitwarden vault items MUST exist in Alem's vault before running bootstrap. These are referenced as BW:<item> placeholders in LaunchAgent plists:
| Item Name | Used By | Purpose |
|---|---|---|
| alai-backup-writer-secret | litestream, Azure backups | Azure SP client secret for Storage Blob write access |
| cf-access-client-secret | BookStack sync, CF-protected APIs | Cloudflare Access client secret for docs.basicconsulting.no |
| groq-api-key | Groq model benchmark daemon | Groq API key for LLM model testing |
| slack-app-token | Slack integration | Slack app-level token for socket mode |
| slack-bot-token | Slack integration | Slack bot user OAuth token (xoxb-...) |
How to verify items exist:
bw get item alai-backup-writer-secret --session $BW_SESSION
bw get item cf-access-client-secret --session $BW_SESSION
bw get item groq-api-key --session $BW_SESSION
bw get item slack-app-token --session $BW_SESSION
bw get item slack-bot-token --session $BW_SESSION
If missing: Contact Alem or check Vaultwarden (https://vault.basicconsulting.no) for backup credentials. These secrets are also in ANVIL's Keychain if ANVIL is still accessible.
Post-Bootstrap Verification
10. Check LaunchAgents loaded
launchctl list | grep -E "com\.(alai|john)\." | wc -l
# Expected: ~110-112 (depending on role)
11. Verify Ollama running
curl -s http://localhost:11434/api/tags | jq -r '.models[].name'
# Expected (ANVIL): qwen2.5-coder:32b, llama3.3, deepseek-r1, etc.
12. Verify litestream replication
pgrep -fl litestream
# Should show: litestream replicate -config /Users/makinja/system/config/litestream.yml
# Check logs:
tail -f ~/system/logs/litestream.log
# Should show periodic sync messages (every 1-30s depending on DB tier)
13. Test Mission Control
node ~/system/tools/mc.js stats
# Should show task counts, agents, recent activity
node ~/system/tools/mc.js list --limit 5
# Should show recent tasks
14. Test SSH to original ANVIL (if still alive)
ssh makinja@100.103.49.98 "hostname && uptime"
# Expected: ANVIL + uptime if machine is reachable
Troubleshooting
Error: "brew: command not found" after install
Cause: Homebrew not in PATH.
Fix:
eval "$(/opt/homebrew/bin/brew shellenv)"
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
Error: "bw: command not found"
Cause: Bitwarden CLI not installed or not in PATH.
Fix:
brew install bitwarden-cli
hash -r # Refresh shell PATH cache
LaunchAgent fails to load
Symptoms: launchctl bootstrap returns error code 119, 122, or 125.
Debug:
# Check specific agent status:
launchctl print gui/$(id -u)/com.alai.litestream
# Look for "state = waiting" or "last exit code"
# Check agent logs:
tail -f ~/system/logs/litestream.log
tail -f ~/Library/Logs/com.alai.*.log
Common exit codes (likely causes; run launchctl error <code> for launchd's own description):
- 119: Invalid plist XML (malformed after sed replacement)
- 122: Path not found (e.g., /Users/makinja hardcoded but new user is /Users/alem)
- 125: Permission denied (env var secret not readable)
Secret rehydration failed
Symptoms: Bootstrap log shows "WARN: Bitwarden item 'X' not found".
Fix manually:
# Get secret from Bitwarden:
SECRET=$(bw get password groq-api-key --session $BW_SESSION)
# Edit plist:
vi ~/Library/LaunchAgents/com.alai.groq-model-benchmark.plist
# Replace BW:groq-api-key with actual value in <string> tag
# Reload:
launchctl bootout gui/$(id -u)/com.alai.groq-model-benchmark
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.alai.groq-model-benchmark.plist
Hardcoded /Users/makinja path mismatch
Cause: LaunchAgent plists contain hardcoded paths to /Users/makinja, but new Mac has different username (e.g., /Users/alem).
Fix (bulk replace):
NEW_USER=$(whoami)
cd ~/Library/LaunchAgents
for plist in com.alai.*.plist com.john.*.plist; do
sed -i.bak "s|/Users/makinja|/Users/$NEW_USER|g" "$plist"
done
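# Reload only the ALAI agents, one plist at a time. Do not run
# "launchctl bootout gui/$(id -u)" without a service path: that targets the
# entire GUI session domain, not just these agents.
for plist in com.alai.*.plist com.john.*.plist; do
  launchctl bootout gui/$(id -u) "$PWD/$plist" 2>/dev/null
  launchctl bootstrap gui/$(id -u) "$PWD/$plist"
done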
Ollama models missing
Cause: Fresh install has no models cached. Models are NOT in git repos (too large).
Fix (pull from Ollama registry):
ollama pull qwen2.5-coder:32b
ollama pull llama3.3:70b
ollama pull deepseek-r1:32b
ollama pull deepseek-r1:70b
ollama pull devstral:24b
ollama pull mistral-small
ollama pull llama3.2-vision:90b
ollama pull qwq:32b
# Verify:
ollama list
Expected download size: ~150 GB total for all models. This takes 2-6 hours on good connection.
Database restore fails with "replica not found"
Cause: Azure credentials invalid, or DB was never replicated (new DB created after litestream setup).
Debug:
# Test Azure auth:
az login --service-principal \
--username $AZURE_CLIENT_ID \
--password $AZURE_CLIENT_SECRET \
--tenant $AZURE_TENANT_ID
# List backups:
litestream snapshots -config ~/system/config/litestream.yml ~/system/databases/mission-control.db
# Should show timestamps of snapshots in Azure Blob Storage
If no snapshots: DB is new or replication was broken. Accept data loss or restore from other source (e.g., Time Machine if on ANVIL).
Known Limitations
- Untested end-to-end: bootstrap.sh has NOT been tested on a completely fresh Mac. Code paths for Xcode install prompt, Homebrew first-run, and BW unlock flow are based on best practices but unverified in production DR scenario.
- User rename not handled: If new Mac username != "makinja", LaunchAgent plists will fail due to hardcoded /Users/makinja paths. Manual sed replacement required (see Troubleshooting).
- npm install layer incomplete: ~/system/tools/ contains 1,310 scripts, some requiring npm install in subdirs. bootstrap.sh does NOT auto-install these deps. Expect some tools to fail until deps are installed manually.
- Ollama models not in backup: Models are fetched from Ollama registry on first use. Expect 2-6 hour delay to repopulate model cache (~150 GB).
- GitHub auth assumed: Script assumes SSH keys or PAT for GitHub already configured. If not, git clone will prompt interactively.
- No Keychain sync: macOS Keychain items (SSH keys, app passwords, etc.) are NOT part of this backup. Alem must re-enter credentials for Mail.app, Calendar, etc.
- No ~/felles or ~/Documents: User data directories are NOT backed up by this system. Rely on Time Machine or iCloud for personal files.
Testing Recommendations
Before trusting this runbook in a real disaster:
- Spin up a fresh Mac VM (UTM or Parallels) with macOS Sonoma
- Run through Steps 1-6 end-to-end without looking at ANVIL
- Verify LaunchAgent load count matches expected (~112)
- Verify DB restore works for at least mission-control.db and hivemind.db
- Document any new errors or missing secrets in this runbook
Assigned to: Petter Graff (CodeCraft) — MC task #8534
Last updated: 2026-04-20 | MC Task: #8534 | Tags: status=draft-untested, type=runbook, severity=critical
Incident — 2026-04-21 alai.no Contact Form Failure
2026-04-21 — alai.no Contact Form Silent Failure
Incident Classification
Severity: HIGH — Silent data loss (potential lead loss)
Duration: 2026-04-19 19:00 → 2026-04-21 11:30 (40.5 hours)
Detection: Manual inspection via Himalaya IMAP client
Status: RESOLVED (form handler redeployed to CF Pages Functions)
Timeline
- 2026-04-19 19:00 — alai.no migrated from Vercel to Cloudflare Pages (MC #8576)
- 2026-04-19 19:00 → 2026-04-21 11:30 — Contact form submissions received HTTP 200 OK but no emails delivered
- 2026-04-21 11:30 — CEO (Alem) noticed no inquiry emails received in days, requested investigation
- 2026-04-21 11:35 — John inspected info@alai.no IMAP (via himalaya search --folder INBOX from:noreply) — zero messages from contact form
- 2026-04-21 11:45 — Root cause identified: CF Tunnel routing hijack + documenso-webhook false-positive response
- 2026-04-21 12:15 — CodeCraft dispatched to deploy dedicated contact handler as CF Pages Function (MC #8587)
- 2026-04-21 14:00 — Fix deployed and verified (E2E browser test + inbox check)
Impact Assessment
- Lost inquiries: Unknown (no form submission logging). Estimated 0-5 potential leads during 40-hour window.
- User experience: Users received "success" feedback but no confirmation email. No error notification.
- Business risk: Medium — alai.no is not yet primary sales channel; minimal active marketing campaigns during incident window.
Root Cause Analysis
Technical Chain of Failure
- alai.no contact form POSTs to https://api.basicconsulting.no/contact (hardcoded Vercel pattern from pre-migration code)
- Cloudflare Tunnel ingress rule matches api.basicconsulting.no/* → routes ALL POST requests to localhost:3001
- documenso-webhook.js listens on port 3001, designed for Documenso signature events
- Webhook handler has catch-all route: app.post('/*', (req, res) => res.json({ok: true}))
- Contact form receives HTTP 200 + {ok: true} → assumes success, displays "Message sent"
- No email handler ever invoked → no SMTP call → no delivery
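The catch-all route is the link in this chain that turned a misrouted request into a false success. A sketch of the alternative, written as a plain function rather than Express code: the '/documenso' path is hypothetical (the webhook's real path is not given here), and the response shape is illustrative.

```javascript
// Sketch: a webhook router that answers only the paths it owns.
// '/documenso' is a hypothetical path used for illustration.
function routeWebhook(method, path) {
  if (method === 'POST' && path === '/documenso') {
    return { status: 200, body: { ok: true } };
  }
  // Unknown paths fail loudly instead of returning a false success.
  return { status: 404, body: { ok: false, error: 'no handler for ' + path } };
}

console.log(routeWebhook('POST', '/contact').status);   // 404
console.log(routeWebhook('POST', '/documenso').status); // 200
```

With a 404 on unknown paths, the contact form would have shown an error on the first submission after the migration instead of "Message sent".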
Root Cause Categories
- Architectural: Assumed serverless runtime (Vercel Functions) but deployed to static hosting (CF Pages) without serverless equivalent
- Migration process: No pre-deployment checklist for "dynamic endpoints" (forms, APIs, webhooks)
- Testing gap: No E2E validation of email delivery — only HTTP response validated (curl 200 != email delivered)
- Monitoring gap: No alerting on zero-message rate for info@alai.no INBOX (expected rate: ~1-3/week)
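The monitoring gap comes down to one comparison: how many contact-form messages arrived in the last week against the expected minimum. A sketch, assuming message arrival times are available as millisecond timestamps; the function name and defaults are illustrative and are not an existing alert rule.

```javascript
// Sketch: detect a suspiciously quiet inbox over a rolling window.
function isInboxQuiet(messageTimesMs, nowMs, windowDays = 7, minMessages = 1) {
  const windowStart = nowMs - windowDays * 24 * 60 * 60 * 1000;
  const recent = messageTimesMs.filter((t) => t >= windowStart && t <= nowMs);
  return recent.length < minMessages;
}

const DAY = 24 * 60 * 60 * 1000;
const now = Date.now();
console.log(isInboxQuiet([now - 2 * DAY], now));  // false
console.log(isInboxQuiet([now - 10 * DAY], now)); // true
```

At an expected rate of only 1-3 messages per week, a 7-day window will sometimes be empty by chance, so an alert like this is a prompt to run a test submission rather than proof of failure.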
Detection Method
Manual IMAP inspection using Himalaya CLI:
himalaya search --account info-alai --folder INBOX "from:noreply" "since:2026-04-19"
# Result: No messages found
Lesson: HTTP 200 is NOT proof of delivery. Always verify end-to-end (inbox check, log inspection, user confirmation email).
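That lesson can be expressed as a two-signal verdict: the HTTP status and the inbox check are evaluated together, and a 2xx without a matching message is its own failure class. This is a sketch only; the verdict labels and the idea of matching a unique marker string from the test submission are assumptions, not the Forms E2E Testing Protocol itself.

```javascript
// Sketch: combine the HTTP result and the inbox check into one verdict.
// `inboxTexts` is whatever text was pulled from the inbox (subjects or bodies).
function verifyFormDelivery(httpStatus, inboxTexts, marker) {
  const httpOk = httpStatus >= 200 && httpStatus < 300;
  const delivered = inboxTexts.some((t) => t.includes(marker));
  if (httpOk && delivered) return 'PASS';
  if (httpOk && !delivered) return 'SILENT_FAILURE';
  return 'HTTP_FAILURE';
}

// The incident in one line: HTTP 200, nothing in the inbox.
console.log(verifyFormDelivery(200, [], 'E2E test 2026-04-21 14:00')); // SILENT_FAILURE
```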
Fix Summary
- CodeCraft deployed /functions/contact.js as CF Pages Function
- Handler uses Resend API (RESEND_API_KEY in Bitwarden → CF Pages env vars)
- Form target updated to https://alai.no/api/contact (CF Pages Functions route: /functions/ → /api/)
- Proveo validated: submit test form → received at info@alai.no within 5 seconds
MC Task: #8587
Lessons Learned
What Went Well
- CEO noticed absence of expected emails (operational intuition)
- Himalaya CLI provided rapid IMAP audit without browser login
- Root cause identified within 15 minutes of investigation start
What Went Wrong
- Migration checklist did NOT include "verify all POST endpoints have backend handlers"
- No E2E test protocol for forms (HTTP 200 assumed sufficient)
- No monitoring/alerting on email delivery rates (silent failure undetected for 40 hours)
- Cloudflare Tunnel routing too broad (/* catch-all dangerous for multi-service proxy)
Prevention Actions
| Action | Owner | MC Task | Status |
|---|---|---|---|
| Update site migration checklist: "Verify form handlers migrated" | Skillforge | #8587 | DONE (this doc) |
| Create Forms E2E Testing Protocol (HTTP + inbox check required) | Skillforge | #8587 | DONE (BookStack QA section) |
| Add Grafana alert: info@alai.no message rate < 1/week → notify #ops | FlowForge | #8588 | OPEN |
| Audit all CF Tunnel ingress rules for overly-broad /* patterns | Securion | #8589 | OPEN |
| Migrate snowit.ba contact form (same silent failure risk) | CodeCraft | #8591 | OPEN |
| Add form submission logging to all contact handlers (track volume even if email fails) | CodeCraft | #8592 | OPEN |
Related Incidents
- snowit.ba contact form: Same root cause (Vercel pattern, no CF Pages handler). Bouncing to info@snowit.ba (LumisCare side, not ALAI). MC #8591 tracks.
- getdrop.no waitlist: Already migrated correctly (CF Pages Function + D1 storage). No issue.
References
- Email Pipeline Runbook
- Forms E2E Testing Protocol (new)
- Static Hosting Migration — Progress Log
- Himalaya setup: ~/.config/himalaya/config.toml (info@alai.no IMAP credentials in Bitwarden)
Incident Postmortem — Bilko Deploy Fix 2026-04-22
Date: 2026-04-22
Severity: High (CEO time wasted + security leak)
Status: Resolved
Type: Blameless Postmortem
Summary
A 2-hour bug fix sprint (MC tasks #8626, #8627, #8628) aimed at fixing 3 bugs in Bilko demo resulted in ZERO live changes reaching the production demo URL (bilko-demo.alai.no). All code changes were pushed to the wrong branch (feat/intesa-bih-demo instead of main), CI pipeline was silently broken for 7 days, and client-specific content (Intesa BiH pitch) leaked to the public demo URL.
Timeline (UTC+1)
| Time | Event | Actor |
|---|---|---|
| 2026-04-21 13:32 | MC #8626 created (invoice template save button broken) | John |
| 2026-04-21 13:33 | MC #8627 created (invoice PDF download fails on unsaved invoice) | John |
| 2026-04-21 13:33 | MC #8628 created (settings logo upload missing) | John |
| 2026-04-21 13:46 | All 3 tasks marked ready_for_review (commit d408cc6 + 53fe1d6) | Brad Frost (Vizu) |
| 2026-04-22 09:00 | CEO: "The Bilko demo hasn't been updated, the bugs are still there" | Alem |
| 2026-04-22 09:10 | Discovery: All fixes pushed to feat/intesa-bih-demo (no CI on that branch) | John |
| 2026-04-22 09:15 | Verification via curl + git log: main unchanged, bilko-demo.alai.no serving old code | John |
| 2026-04-22 09:36 | MC #8678 created: /intesa-bridge leak discovered (HTTP 200 on public demo) | John |
| 2026-04-22 10:00 | CI investigation: Last 5 runs all failed (since 2026-04-15) | Kelsey (FlowForge) |
| 2026-04-22 10:36 | MC #8696 created: ZAKON PI2 Deploy Verification Protocol | John |
| 2026-04-22 12:00 | Manual deploy attempt: GitHub PAT missing workflow scope (can't trigger CI fix) | FlowForge |
| 2026-04-22 12:50 | Manual docker build + push (CEO hands off to FlowForge) | Alem + FlowForge |
| 2026-04-22 21:41 | MC #8730 done: fix-bugs-22apr deployed, all 4 evidence checks pass | FlowForge |
| 2026-04-22 21:50 | MC #8678 code fix pushed (66d2220): intesa routes deleted from main | Brad Frost |
Impact
User-Facing
- Bilko demo bugs: Persisted for 1 extra day (low severity — internal demo, no external users)
- Intesa content leak: Unknown duration (potentially days) — BiH bank integration pitch content publicly accessible at /intesa-bridge on bilko-demo.alai.no
Internal
- CEO time lost: ~2 hours (debugging + manual deploy)
- Trust erosion: "Validation isn't working" feedback from the CEO; John claimed done without verifying live state
- CI health invisible: 7 days of broken deploys undetected
Root Causes (5 Failures)
1. Branch Assumption (No Pre-Flight Verification)
What happened: John inferred target branch from memory (assumed feat/intesa-bih-demo based on last session), dispatched builder without running curl -sI + git log to verify which branch serves bilko-demo.alai.no.
Why it matters: Wrong branch = wrong deploy target. All fixes landed on isolated feature branch with no CI and no domain mapping.
Prevention: ZAKON PI2 Check 2 — 4 pre-flight commands mandatory BEFORE code changes.
2. CI Broken for 7 Days Undetected
What happened: GitHub Actions workflow failing since 2026-04-15. No one noticed because:
- No daily CI health check in boot.sh
- Manual deploys used as workaround without logging CI status
- gh run list not part of standard deploy checklist
Root cause:
- GitHub Actions quota exhausted (monthly minutes limit)
- --no-traffic flag on line 206 of gcp-deploy.yml prevents traffic promotion on existing services
Prevention: ZAKON PI2 Check 4 — gh run list --limit 5 before any push. If 5/5 = failure, STOP and fix CI first.
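Check 4 can be sketched as a small gate over the JSON that `gh run list --limit 5 --json conclusion,status` prints. The helper name `ciGate` and the returned shape are illustrative assumptions, not part of ZAKON PI2 itself; only the "all recent runs failed means STOP" rule comes from the text.

```javascript
// Hypothetical helper: decides whether a push may proceed, given the parsed
// output of `gh run list --limit 5 --json conclusion,status`.
function ciGate(runs) {
  // Only completed runs carry a conclusion; in-progress runs are ignored.
  const completed = runs.filter((r) => r.status === 'completed');
  const failed = completed.filter((r) => r.conclusion === 'failure');
  // Rule from Check 4: if every recent run failed, stop and fix CI first.
  const allFailed = completed.length > 0 && failed.length === completed.length;
  return {
    proceed: !allFailed,
    summary: `${failed.length}/${completed.length} recent runs failed`,
  };
}

// Example: the state found on 2026-04-22 (last 5 runs all failed).
const broken = Array.from({ length: 5 }, () => ({
  status: 'completed',
  conclusion: 'failure',
}));
console.log(ciGate(broken).summary); // 5/5 recent runs failed
```

A stricter variant could also block on a partial failure rate; the threshold is a policy choice the source does not specify.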
3. Intesa Content Leaked to Public URL
What happened: Commit 13c2efb merged /intesa-bridge and /intesa-cockpit routes to main branch. These were pitch-specific features for Dženana Hardaga (Intesa BiH IT director) and should have remained isolated on bilko-intesa-demo Cloud Run service.
Why it matters: Client-specific content (including BiH bank integration mockups) publicly visible on generic demo. Potential NDA violation + confusing UX for non-Intesa visitors.
Prevention:
- ZAKON PI2 Check 3 — Branch Purity CI check (.github/workflows/branch-purity.yml)
- Client prefix registry in ~/system/rules/client-prefix-registry.md
- Automated check blocks PR merge if intesa-*, corpint-*, etc. routes detected on main
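The route detection behind the branch-purity check might look like the sketch below. It is an assumption about how such a check could work, not the contents of branch-purity.yml: the function name `findClientRoutes` is hypothetical, and the prefix list is limited to the two examples named above.

```javascript
// Assumed prefixes, taken from the examples in the text; the real list lives
// in ~/system/rules/client-prefix-registry.md.
const CLIENT_PREFIXES = ['intesa-', 'corpint-'];

// Returns the route paths that start with a client prefix, e.g. "/intesa-bridge".
function findClientRoutes(routes, prefixes = CLIENT_PREFIXES) {
  return routes.filter((route) => {
    // Compare against the first path segment only, without the leading slash.
    const firstSegment = route.replace(/^\/+/, '').split('/')[0];
    return prefixes.some((p) => firstSegment.startsWith(p));
  });
}

const leaked = findClientRoutes([
  '/invoices',
  '/intesa-bridge',
  '/intesa-cockpit',
  '/settings',
]);
console.log(leaked); // [ '/intesa-bridge', '/intesa-cockpit' ]
```

A CI job would fail the PR whenever the returned list is non-empty on a branch targeting main.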
4. PAT Missing workflow Scope
What happened: GitHub Personal Access Token used for CI fixes lacked workflow scope. FlowForge couldn't push branch-purity.yml or fix gcp-deploy.yml via automation.
Why it matters: Blocked automated CI repair. Forced manual workarounds + CEO paste-copy anti-pattern.
Prevention: ZAKON PI2 Check 6 — gh auth status --show-token at session start. Verify repo, workflow, write:packages scopes present.
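The scope comparison in Check 6 can be sketched as follows. For classic tokens, GitHub reports granted scopes as a comma-separated list (for example in the `X-OAuth-Scopes` response header); `write:packages` is the classic-PAT name for package write access. The helper name `missingScopes` is hypothetical.

```javascript
// Scopes this runbook expects on the deploy token (classic PAT scope names).
const REQUIRED_SCOPES = ['repo', 'workflow', 'write:packages'];

// Takes the comma-separated scope list GitHub reports for a classic token
// and returns the required scopes that are not granted.
function missingScopes(scopeList, required = REQUIRED_SCOPES) {
  const granted = new Set(
    scopeList
      .split(',')
      .map((s) => s.trim())
      .filter((s) => s.length > 0)
  );
  return required.filter((scope) => !granted.has(scope));
}

// A token like the one in this incident: no workflow scope.
console.log(missingScopes('repo, read:org')); // [ 'workflow', 'write:packages' ]
```

An empty result means the session may proceed; anything else should stop the deploy before code is touched.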
5. Manual Paste-Copy Anti-Pattern
What happened: CEO built docker image locally, pasted output to John, who pasted to FlowForge agent. FlowForge took over from "image already built" state instead of owning full build→push→deploy flow.
Why it matters: Process fragmentation = more failure points. Agent can't verify build context, dockerfile, or .dockerignore changes if it didn't run the build.
Prevention: Always dispatch FlowForge BEFORE build step. Agent owns entire flow or none of it.
What Went Well
- Kelsey persona diagnosis: FlowForge correctly identified --no-traffic flag as root cause within 10 minutes of investigation
- ZAKON PI2 authored mid-incident: Turned incident into system improvement without waiting for postmortem
- .dockerignore fix: Reduced build context from 4.1GB → 50MB (about 82× smaller, a ~98.8% reduction) during incident resolution
- Evidence gate upheld: MC #8730 not marked done until curl + Playwright + revision checks passed
- Blameless culture: No punishment for agents; root cause analysis focused on system gaps
Action Items
| Action | Owner | MC Task | Deadline | Status |
|---|---|---|---|---|
| Sync ZAKON PI2 to BookStack | pi-orchestrator | #8718 | 2026-04-23 | PAUSED |
| Create DEPLOY-MAP.md in Bilko repo | Skillforge | #8715 | 2026-04-23 | DONE |
| Bake PI2 checks into pi-orchestrator v2 | pi-orchestrator | #8696 (item 3) | 2026-04-29 | IN PROGRESS |
| Add pre-deploy hook (~/.claude/hooks/pre-deploy-check.sh) | pi-orchestrator | #8696 (item 4) | 2026-04-29 | DONE |
| Patch mc.js done with evidence gate for H-priority deploy tasks | pi-orchestrator | #8696 (item 5) | 2026-04-29 | DONE |
| Create client-prefix-registry.md | pi-orchestrator | #8696 (item 7) | 2026-04-29 | DONE |
| Fix GitHub Actions quota (upgrade plan or optimize workflows) | John | TBD | 2026-05-01 | OPEN |
| Remove --no-traffic flag from gcp-deploy.yml for existing services | FlowForge | TBD | 2026-04-30 | OPEN |
| Upgrade GitHub PAT with workflow scope | John | TBD | 2026-04-25 | OPEN |
| Weekly CEO audit of mc.js --ceo-override usage | John | #8696 (item 8) | Ongoing | OPEN |
Lessons Learned
For John (Orchestrator)
- Never infer deploy target from memory. Always run curl + git log + gh run list before dispatching builder.
- CI health = system health. Broken CI for 7 days = broken deployment capability. Monitor actively.
- Claim verification: "Task done" without live URL verification = hallucination. CEO was right: "validation isn't working."
For Builder Agents (Brad Frost, Vizu)
- Ready for review ≠ deployed. Code pushed to branch ≠ code live on target URL. Always verify deploy target match.
- Client-specific routes: If building intesa-*, corpint-*, etc. — verify target branch is NOT main before merging.
For FlowForge (DevOps)
- Own the full flow. If dispatched for deploy, own build→push→deploy→verify. Don't take over mid-stream from CEO paste-copy.
- --no-traffic flag: Only use on first-ever deploy. Never on existing services (blocks traffic promotion).
System-Level
- ZAKON PI2 works. All 5 root causes preventable with 6 hard checks. Enforce at agent level + hook level + MC gate level.
- Evidence gates prevent false claims. mc.js enforcement (item 5 of #8696) blocks "done" without verification.json.
- Blameless postmortems → system rules. This incident produced ZAKON PI2, DEPLOY-MAP.md standard, and client-prefix-registry. Net positive.
Related Rules Created
- ZAKON PI2: ~/system/rules/zakon-pi2-deploy-verification.md (BookStack synced)
- Client Prefix Registry: ~/system/rules/client-prefix-registry.md
- Pre-Deploy Hook: ~/.claude/hooks/pre-deploy-check.sh
- Feedback Log: ~/.claude/projects/-Users-makinja/memory/feedback_verify_deploy_target_before_code.md
Metrics
- Incident duration: 32 hours (2026-04-21 13:46 → 2026-04-22 21:41)
- CEO time lost: ~2 hours
- Root causes identified: 5
- New rules created: 4
- MC tasks spawned: 10 (parent #8696 + 7 subtasks + 3 original bugs)
- Lines of ZAKON PI2: 136
- Evidence files generated: 11 (verification.json + 4 PNG + 6 TXT)
Follow-Up
Next review: 2026-04-29 (PI2 implementation deadline)
Owner: John
Success criteria: All 8 items in MC #8696 marked done + CI health green for 7 consecutive days
Postmortem by ALAI Skillforge, 2026-04-22
Credit: ALAI, 2026
pi-orchestrator: john H-task auto-pause root cause + #10063 reconcile
Created: 2026-05-02
MC References: #10063 (phantom fix), #10517 (true fix)
Daemon: com.john.pi-orchestrator (currently STOPPED, reactivation pending CEO Step 3)
Symptom
John's H-priority tasks were being auto-paused without user action. The pi-orchestrator daemon would intercept high-priority john tasks and route them through queueForHuman instead of executing them, creating a silent work-stoppage pattern.
Investigation Finding — Phantom Fix in MC #10063
MC #10063 (2026-04-XX) claimed to fix the auto-pause behavior by adding configuration flags:
- skip_interactive_owners: ["john", "alem"]
- interactive_grace_seconds: 300
Problem: These config keys were specified in the task's acceptance criteria and marked COMPLETE, but were never actually written to ~/system/config/pi-orchestrator-config.json.
Anti-pattern identified: "Proveo PASS but code doesn't match documentation" — the validation passed based on spec intent rather than verifying actual configuration state.
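A state-based check, as opposed to a spec-based one, can be as small as the sketch below. The function name `verifyConfigState` is hypothetical; in practice the `config` argument would be the parsed contents of ~/system/config/pi-orchestrator-config.json rather than an inline object.

```javascript
// Compares expected key/value pairs against the parsed config object and
// reports every key whose actual value differs or is absent.
function verifyConfigState(config, expected) {
  const mismatches = [];
  for (const [key, want] of Object.entries(expected)) {
    const have = config[key];
    // JSON.stringify is enough here because the values are plain JSON data.
    if (JSON.stringify(have) !== JSON.stringify(want)) {
      mismatches.push({ key, want, have });
    }
  }
  return mismatches;
}

const expectedFlags = {
  skip_interactive_owners: ['john', 'alem'],
  interactive_grace_seconds: 300,
};

// State as it was after MC #10063: the flags were never written.
console.log(verifyConfigState({}, expectedFlags).length); // 2
```

A validator that runs this against the file on disk cannot pass on intent alone, which is the gap the anti-pattern describes.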
True Root Cause
The mechanism actually auto-pausing john H-tasks was a dead fallback block in ~/system/kernel/pi-orchestrator.js:
// Original lines 3409-3421 (13 lines, now removed)
if (!selectedTask) {
  // Fallback: check for john tasks
  const johnTask = execSync(
    'node ~/system/tools/mc.js next-task --owner john',
    { encoding: 'utf8' }
  ).trim();
  if (johnTask) {
    queueForHuman(johnTask);
    return null;
  }
}
When task selection failed (empty queue or filter mismatch), this fallback would:
- Synchronously fetch the next john task via mc.js next-task --owner john
- Queue it for human review via queueForHuman()
- Return null, preventing execution
This created the observed auto-pause behavior regardless of the missing config flags.
Fix Applied — MC #10517
Date: 2026-05-02
Builder: Codecraft
Validator: Proveo
Changes:
- Configuration reconciliation — Added missing flags to ~/system/config/pi-orchestrator-config.json at lines 93-94:
  "skip_interactive_owners": ["john", "alem"],
  "interactive_grace_seconds": 300
- Dead fallback removal — Replaced the 13-line execSync fallback block in ~/system/kernel/pi-orchestrator.js (original lines 3409-3421) with a 4-line comment + null return:
  // No fallback to john tasks — auto-pause removed per MC #10517.
  // Configuration now controls interactive routing via skip_interactive_owners.
  log('No task selected; returning null.');
  return null;
Verification
Proveo validation: APPROVED 2026-05-02
Acceptance Criteria: 4/4 PASS
- AC1: pi-orchestrator-config.json contains skip_interactive_owners: ["john", "alem"] ✅
- AC2: pi-orchestrator-config.json contains interactive_grace_seconds: 300 ✅
- AC3: Dead fallback block removed from pi-orchestrator.js (lines 3409-3421 replaced) ✅
- AC4: No execSync call to mc.js next-task --owner john in the selection path ✅
Evidence:
- Config diff: git diff ~/system/config/pi-orchestrator-config.json
- Code diff: git diff ~/system/kernel/pi-orchestrator.js
- No remaining queueForHuman calls in fallback path: grep -n "queueForHuman" ~/system/kernel/pi-orchestrator.js shows only intentional usage in interactive routing logic
Daemon State
Current state: com.john.pi-orchestrator is STOPPED (unloaded via launchctl unload).
Reactivation: Pending CEO Step 3 directive. DO NOT restart daemon until explicitly approved — this is part of a phased rollout to validate the fix does not introduce regression.
To check status:
launchctl list | grep pi-orchestrator
# Empty output = daemon not loaded
To restart (when authorized):
launchctl load ~/Library/LaunchAgents/com.john.pi-orchestrator.plist
tail -f ~/system/logs/pi-orchestrator.log
Cross-References
- MC #10063: Original task claiming fix (phantom — config never written, Proveo validated spec not state)
- MC #10517: True fix reconciling config + removing dead fallback (Proveo APPROVED 2026-05-02)
- Related pattern: Feedback memo feedback_task_description_state_verify.md — agents must tool-verify state before writing it into MC descriptions or acceptance criteria
Lessons
- Proveo must verify actual state, not spec intent. A config flag in the task description ≠ the flag exists in the file.
- Dead code can be the true mechanism. The "fix" in #10063 was irrelevant because the real culprit was a fallback block that ran regardless of config.
- Daemon restart ≠ verification. Stopping the daemon masked the symptom but didn't prove the fix. Reactivation under observation is the true test.
Generated by Skillforge for MC #10517 documentation deliverable. HiveMind sync pending.
Pi-Orchestrator: GOTCHA Fabrication Removal (MC #10549)
Context
Problem: Pi-orchestrator was auto-generating GOTCHA docs at two sites, bypassing ZAKON #25 quality gate (H/BLOCKER → /prompt-forge → /mehanik). Pi-orch is NOT the authority for /prompt-forge work.
The Two Sites Removed
Site 1: Pre-Spawn Auto-Gen (Step 4.55)
- Fabricated GOTCHA before spawn so spawn-gate would permit task dispatch
- Violated /prompt-forge exclusivity for H/BLOCKER tasks
- REMOVED
Site 2: Post-Spawn Synthesis
- Fabricated GOTCHA after agent ran, based on proof-of-work artifacts
- Papered over agent omissions (agent's failure → pi-orch's rescue)
- Rationale for removal: Agent omission IS agent failure; pi-orch should not mask it
- REMOVED
Replacement Behavior
GOTCHA Missing Pre-Spawn
- mc.js blocks task with reason: "awaiting_forge: GOTCHA doc missing — run /prompt-forge {id} first, then unblock"
- Task stays blocked until human review unblocks
GOTCHA Missing Post-Spawn
- mc.js blocks task with reason: "agent omitted GOTCHA file — needs /prompt-forge and human review"
- Task stays blocked until human review unblocks
Status Note
mc.js does NOT have awaiting_forge as a first-class status, so these tasks use the blocked status with reason-prefixed text. Future enhancement: add an awaiting_forge status (track in a separate MC if scope warrants).
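Until a dedicated status exists, tooling that needs to find these tasks has to match on the reason prefix. A minimal sketch, assuming a task object with `status` and `reason` fields (the actual mc.js task shape is not shown in this page):

```javascript
const AWAITING_FORGE_PREFIX = 'awaiting_forge:';

// Hypothetical task shape: { id, status, reason }.
// True only for blocked tasks whose reason carries the awaiting_forge prefix.
function isAwaitingForge(task) {
  return (
    task.status === 'blocked' &&
    typeof task.reason === 'string' &&
    task.reason.startsWith(AWAITING_FORGE_PREFIX)
  );
}

const tasks = [
  { id: 1, status: 'blocked', reason: 'awaiting_forge: GOTCHA doc missing' },
  { id: 2, status: 'blocked', reason: 'agent omitted GOTCHA file' },
  { id: 3, status: 'open', reason: null },
];
console.log(tasks.filter(isAwaitingForge).map((t) => t.id)); // [ 1 ]
```

Note that the post-spawn reason text does not carry the prefix, so a filter like this only finds the pre-spawn case.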
Current State
- Daemon STOPPED
- Code lands cold
- No production behavior change yet
Test Plan
- 7 tests at ~/system/tests/pi-orch-await-forge.test.js
- 23 regression tests at ~/system/tests/spawn-gate.test.js
- Run: node --test ~/system/tests/pi-orch-await-forge.test.js
Change Genesis
- Pi-orch hardening Talas 3 (parent thread #10043 reform)
- Depends on α #10548 (Spawn Gate Node-Side Parity)
Cross-Reference
- MC #10548: Spawn Gate Node-Side Parity
- Parent: Pi-orchestrator auto-pause root cause + #10063 reconcile
Last updated: 2026-05-04 | Part of pi-orch hardening Talas 3
Email-Agent Ingest Gap Postmortem (2026-05-23) — MC #101887
TL;DR
Email-agent.js silently dropped SEEN-flagged messages for 9+ days (2026-05-14 → 2026-05-23) due to HIMALAYA_DISABLED=1 forcing a fallback code path that filtered { seen: false }. This caused 17 missed messages across 5 accounts, including 2 paying-client-class emails (Asmir Merdžanović SEO work, cynthia.li medical contact). Fixed by replacing SEEN filter with date-range + DB dedup. Backfilled all missed messages, added audit tool, deployed hourly monitoring LaunchAgent.
Incident Timeline (UTC)
- 2026-05-14 → Newest alai/INBOX DB row before gap
- 2026-05-23 13:26 → Asmir Merdžanović email arrives at alai/INBOX uid=6, server already flags SEEN
- 2026-05-23 18:49 (CEST 20:49) → John boot detects DB:0 IMAP:1 gap during inbox-pending sweep
- 2026-05-23 ~21:00 → MC #101887 created, gate cleared, ST1-ST4 dispatched
- 2026-05-23 ~21:22 → ST3 backfill complete, 17 messages ingested
- 2026-05-23 ~21:26 → ST6 (this documentation) initiated
Root Cause
File: /Users/makinja/system/daemons/email-agent.js
Original code (lines 638-644, pre-fix): The fetchUnseenLegacy function used { seen: false } as its IMAP fetch filter, which translates to an IMAP SEARCH UNSEEN query. Any message already flagged \Seen on the server (e.g., by mobile client, webmail, or Outlook auto-marking) was invisible to this query.
const messages = client.fetch(
  { seen: false },  // ← PROBLEM: excludes SEEN messages
  { uid: true, envelope: true }
);
Trigger chain:
- LaunchAgent plist /Users/makinja/Library/LaunchAgents/com.john.email-agent.plist sets HIMALAYA_DISABLED=1 as hard environment variable
- This forces all accounts to fall back to fetchUnseenLegacy instead of the safer fetchAllRecent path (which was introduced in MC #6832 to solve exactly this class of problem)
- When alem@alai.no is also accessed via mobile/web client, incoming messages are auto-flagged \Seen before daemon's next 5-minute cycle
- Daemon runs every 5 minutes, sees 0 unseen, logs "alai: 0 unseen envelopes fetched", and continues — no alarm, no visibility
Why it went undetected: The daemon logs showed normal execution (no errors, no timeouts), just consistently 0 results for the alai account. The pattern looked like "no new email" rather than "email silently dropped."
Fixed code (lines 638-684, post-fix): Replaced { seen: false } with the date-range filter { since: sinceDate } plus DB deduplication by UID set lookup:
// MC #101887 fix: SEEN filter caused 9-day gap. Switched to date-range + DB dedup.
const lookbackDays = parseInt(process.env.EMAIL_AGENT_LOOKBACK_DAYS || '7', 10);
const sinceDate = new Date(Date.now() - lookbackDays * 24 * 60 * 60 * 1000);

// Load existing UIDs for this account from DB to enable dedup
const db = emailInbox.getDb();
const existingUids = new Set(
  db.prepare("SELECT message_id FROM emails WHERE account = ?").all(boxLabel).map(r => {
    const m = r.message_id.match(/-uid-(\d+)$/);
    return m ? parseInt(m[1], 10) : null;
  }).filter(Boolean)
);

// Fetch envelopes only — date-range avoids SEEN-flag blind spot
const messages = client.fetch(
  { since: sinceDate },  // ← FIX: fetch all messages in date range
  { uid: true, envelope: true }
);
for await (const msg of messages) {
  // Dedup: skip if UID already in DB
  if (existingUids.has(msg.uid)) continue;
  // ... insert logic
}
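The selection logic of the fix can be exercised without an IMAP server or database. The sketch below is a stand-alone illustration, not the daemon code: `selectNewMessages` and `uidFromMessageId` are hypothetical names, and the `alai-INBOX-uid-7` message_id format is an assumption (only the `-uid-<n>` suffix is implied by the regex in the fix).

```javascript
// Extracts the numeric IMAP UID from a stored message_id that ends in "-uid-<n>".
function uidFromMessageId(messageId) {
  const m = messageId.match(/-uid-(\d+)$/);
  return m ? parseInt(m[1], 10) : null;
}

// Keeps only messages inside the lookback window whose UID is not in the DB yet.
// The SEEN flag is deliberately ignored.
function selectNewMessages(messages, storedMessageIds, sinceDate) {
  const existingUids = new Set(
    storedMessageIds.map(uidFromMessageId).filter((uid) => uid !== null)
  );
  return messages.filter(
    (msg) => msg.date >= sinceDate && !existingUids.has(msg.uid)
  );
}

const since = new Date('2026-05-16T00:00:00Z');
const onServer = [
  { uid: 5, date: new Date('2026-05-14T09:00:00Z'), seen: true }, // outside window
  { uid: 6, date: new Date('2026-05-23T13:26:00Z'), seen: true }, // SEEN, not in DB
  { uid: 7, date: new Date('2026-05-23T14:00:00Z'), seen: false }, // already ingested
];
const fresh = selectNewMessages(onServer, ['alai-INBOX-uid-7'], since);
console.log(fresh.map((m) => m.uid)); // [ 6 ]
```

UID 6 is the case the old `{ seen: false }` filter dropped: already flagged \Seen on the server, but never stored.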
Impact Assessment
- Total missed: 17 messages in the 30-day lookback window; of the 5 accounts in the audit table, only alai, dev and john had misses
- Paying-client-class misses:
  - Asmir Merdžanović (asmirmc@gmail.com) — "Potrebne informacije." re: 2 new SEO clients (alai/INBOX uid=6, john/INBOX uid=134)
  - cynthia.li@jamrmed.com (Shenzhen Jamr Medical) — "New contact-Shenzhen Jamr" (john/INBOX uid=114)
- Informational/system misses: 13+ messages including Google Cloud alerts, TLDR newsletters, GitHub notifications, Cloudflare alerts
- Duration:
  - alai account: 9 days (2026-05-14 → 2026-05-23)
  - alem account: 11+ days (2026-05-13 → ongoing, separate IMAP connection failure)
- Accounts affected: alai (1 missed), dev (3 missed), john (13 missed); info/alem had no IMAP-side new messages in window (alem broken for separate reason)
Fix Applied
- Code fix: ~/system/daemons/email-agent.js lines 638-725 — replaced { seen: false } with { since: sinceDate } + DB dedup via UID set lookup (idempotent, safe for overlapping runs)
- Backfill: 17 missed messages ingested via ~/system/tools/email-backfill-from-audit.js — used audit JSON as source of truth, patched subject/from metadata in 14 cases where IMAP envelope fetch failed (tool is idempotent, safe to re-run)
- New audit tool: ~/system/tools/email-imap-db-audit.js — enumerates IMAP UIDs vs DB UIDs per account+folder for configurable N-day window, outputs JSON diff with missed UID samples
- Monitoring LaunchAgent: ~/Library/LaunchAgents/com.alai.email-ingest-monitor.plist + wrapper ~/system/tools/email-ingest-monitor.sh — runs hourly, executes audit tool, fires Slack #exec alarm when total_missed > 0
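The core of the audit is a set difference between server-side UIDs and stored UIDs. The sketch below shows that idea under assumptions: the report shape and the names `missedUids` and `buildAuditSummary` are hypothetical, and only the `summary.total_missed` field is taken from the commands and trigger condition in this page.

```javascript
// Returns UIDs present on the IMAP server but absent from the DB, ascending.
function missedUids(imapUids, dbUids) {
  const inDb = new Set(dbUids);
  return imapUids.filter((uid) => !inDb.has(uid)).sort((a, b) => a - b);
}

// Hypothetical report shape; only `summary.total_missed` is named in the source.
function buildAuditSummary(perAccount) {
  const accounts = {};
  let total = 0;
  for (const [name, { imapUids, dbUids }] of Object.entries(perAccount)) {
    const missed = missedUids(imapUids, dbUids);
    accounts[name] = {
      missed_count: missed.length,
      sample_uids: missed.slice(0, 5),
    };
    total += missed.length;
  }
  return { summary: { total_missed: total }, accounts };
}

const report = buildAuditSummary({
  alai: { imapUids: [1, 2, 6], dbUids: [1, 2] },
  dev: { imapUids: [4, 7, 11], dbUids: [] },
});
// Monitor trigger condition from the text: alarm when total_missed > 0.
const shouldAlarm = report.summary.total_missed > 0;
console.log(report.summary.total_missed, shouldAlarm); // 4 true
```

Because the comparison uses UIDs rather than flags, it catches the SEEN-flag blind spot regardless of which fetch path the daemon used.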
Remaining Open Items (NOT yet fixed)
- alem@alai.no IMAP connection broken since 2026-05-13 — credentials load OK from Vault, but server rejects connection with "Command failed" (no detailed error exposed by ImapFlow). Needs separate MC task for IMAP diagnostics + credential rotation test.
- Monitor LaunchAgent NOT auto-loaded — file exists at correct path, but launchctl does not auto-load new plists without manual intervention. CEO must run: launchctl load -w ~/Library/LaunchAgents/com.alai.email-ingest-monitor.plist (permission constraint, cannot be automated without sudo/TCC access).
- HIMALAYA_DISABLED env flag still active in com.john.email-agent.plist — the fix made fetchUnseenLegacy safe, but ideally the himalaya path should be vetted and re-enabled to reduce IMAP connection load.
- 3 john/INBOX uids (61, 69, 71) backfilled with placeholder metadata — IMAP fetchOne returned "Command failed" for envelope fetch, so subject/from are "(no subject)" / empty. These need separate IMAP range-fetch backfill to recover actual metadata from server.
Reproduction / Detection Commands
# Detect the gap
node ~/system/tools/email-imap-db-audit.js
cat /tmp/alai/email-ingest-gap/imap-db-diff-30d.json | jq .summary
# Trigger monitor manually
launchctl kickstart -k gui/$(id -u)/com.alai.email-ingest-monitor
# Re-run backfill (idempotent)
node ~/system/tools/email-backfill-from-audit.js
# Check daemon status
launchctl list | grep email
tail -100 ~/system/logs/email-agent.log
# Test audit in verbose mode
node ~/system/tools/email-imap-db-audit.js --verbose
Lessons / Preventive Actions
- Silent skips are P0: Any code path that filters IMAP results without an alarm when count drops to 0 unexpectedly = future incident. The daemon should have emitted a warning when alai account returned 0 unseen for >7 consecutive cycles (35+ minutes) given its historical delivery rate.
- SEEN flag is not under our control: Any mobile/web client can pre-read messages and set \Seen before the daemon polls. The ingest pipeline must not assume UNSEEN = unread-by-us. Date-range + DB dedup is the only reliable pattern.
- Audit > trust: ST2 audit revealed a 2nd unrelated paying-client miss (cynthia.li) we wouldn't have known about without full IMAP-vs-DB enumeration. Periodic audits should be part of email-agent health checks.
- Fallback paths are production code: The fetchUnseenLegacy path was treated as a temporary fallback but ran in production for weeks/months with HIMALAYA_DISABLED=1. All fallback paths must have equal quality gates (logging, alarms, safety checks) as primary paths.
- Monitoring must be fail-closed: The new monitor LaunchAgent is valuable, but it's not yet loaded (manual step required). For future daemons, the deploy checklist must verify LaunchAgent is loaded AND firing test alarms.
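The warning described in the first lesson (an account returning 0 results for more than 7 consecutive cycles) could be tracked with a small counter. This is a sketch under assumptions: the class name `ZeroStreakWatch` is hypothetical, and a fixed threshold stands in for the "historical delivery rate" comparison the lesson mentions.

```javascript
// Tracks consecutive zero-result polls per account and flags long streaks.
class ZeroStreakWatch {
  constructor(maxZeroCycles = 7) {
    this.maxZeroCycles = maxZeroCycles;
    this.streaks = new Map();
  }

  // Records one poll result; returns true when a warning should be emitted.
  record(account, fetchedCount) {
    const previous = this.streaks.get(account) || 0;
    const streak = fetchedCount === 0 ? previous + 1 : 0;
    this.streaks.set(account, streak);
    return streak > this.maxZeroCycles;
  }
}

const watch = new ZeroStreakWatch();
let warned = false;
for (let cycle = 1; cycle <= 8; cycle++) {
  warned = watch.record('alai', 0);
}
console.log(warned); // true: the 8th consecutive empty cycle exceeds the threshold
```

With a 5-minute poll interval, the default threshold corresponds to the 35+ minutes named above; quiet accounts would need a higher per-account value to avoid false alarms.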
Related Artifacts
- MC: #101887 (this fix), supersedes #101886
- Triggering email evidence: /tmp/alai/john-boot-20260523T1441/asmir-search.log
- RCA: /tmp/alai/email-ingest-gap/root-cause.md
- Audit JSON: /tmp/alai/email-ingest-gap/imap-db-diff-30d.json
- Backfill log: /tmp/alai/email-ingest-gap/backfill-run.log
- Monitor runs: /tmp/alai/email-ingest-gap/monitor-runs.log
- Code fix: ~/system/daemons/email-agent.js lines 638-725
- Tools created:
  - ~/system/tools/email-imap-db-audit.js (audit)
  - ~/system/tools/email-backfill-from-audit.js (backfill)
  - ~/system/tools/email-ingest-monitor.sh (monitor wrapper)
- LaunchAgent: ~/Library/LaunchAgents/com.alai.email-ingest-monitor.plist
Technical Details
Missed Messages Breakdown (30-day window, all accounts)
| Account | Folder | Missed Count | Sample UIDs | Notes |
|---|---|---|---|---|
| alai | INBOX | 1 | 6 | Asmir email re: SEO clients |
| dev | INBOX | 3 | 4, 7, 11 | Google Cloud Logging alerts |
| john | INBOX | 13 | 61, 69, 71, 72, 79, 80, 82, 83, 88, 99, 102, 114, 134 | Mix: GitHub, TLDR, Cloudflare, cynthia.li, Asmir |
| info | INBOX | 0 | — | No new IMAP messages in window |
| alem | INBOX | N/A | — | IMAP connection broken, cannot audit |
Backfill Execution Summary
- Total inserted: 17 (first run)
- Total patched: 14 (second run — corrected subject/from metadata)
- Total skipped: 3 (UIDs 61, 69, 71 had no audit sample metadata, kept placeholder)
- Tool runs: 3 (idempotent, each run refined metadata)
Monitor Configuration
LaunchAgent: com.alai.email-ingest-monitor
- Schedule: Hourly (StartCalendarInterval)
- Command: ~/system/tools/email-ingest-monitor.sh
- Output: ~/system/logs/email-ingest-monitor.log
- Alarm channel: Slack #exec
- Trigger condition: total_missed > 0 in audit JSON
- Status: Plist exists, NOT loaded (manual load required)
Sign-off
Documented by: Skillforge (ALAI agent)
Date: 2026-05-23
MC Task: #101887 ST6
Status: Fix deployed, backfill complete, monitoring deployed (pending manual load)
ALAI Mail Topology — Migadu Domains, Mailbox Inventory, John's 19-Account Ingest Loop (2026-06-08)
ALAI Mail Topology & John's Email Ingest Loop
Last updated: 2026-06-08 (v2 — 19 accounts, daemon-path docs, himalaya touch-points) | MC: #103182 | Built by: FlowForge | Validated by: Proveo (Angie Jones) — PASS
1. Mail Infrastructure — Migadu (Single Account)
All ALAI product domains are hosted on one Migadu account. MX records for every domain point to the same two servers:
- aspmx1.migadu.com (priority 10)
- aspmx2.migadu.com (priority 20)
Domains on this account: alai.no, bilko.io, bilko.cloud, bilko.company, snowit.ba, basicconsulting.no, basicfakta.no, lumiscare.com
Migadu Admin Access
| Item | Value / Location |
|---|---|
| Admin login | alem@alai.no |
| API key | Vaultwarden item "migadu keyy" (86-char token — do NOT print) |
| IMAP host | imap.migadu.com |
| SMTP host | smtp.migadu.com |
| Web UI | https://admin.migadu.com |
Migadu API Quirks (DO NOT FORGET)
- GET aliases — response key is address_aliases, not aliases.
- Create alias — must send JSON body {"local_part": "...", "destinations": ["..."]} with header Accept: application/json (omitting Accept = HTML response, silent fail).
- Alias destinations MUST be same-domain. Cross-domain targets (e.g. info@alai.no → john@basicconsulting.no) return HTTP 400. Route to a real mailbox on the same domain instead.
- No catch-all rewrites — verified via /rewrites endpoint (empty on all domains). Any email to a non-existent local-part that has no alias bounces.
- App-passwords for new mailboxes are created via PUT /v1/domains/{domain}/mailboxes/{local_part} and stored as Vaultwarden items (never in logs).
- Migadu catch-all copy (alem@alai.no): alem@alai.no is configured as a global catch-all copy recipient for all outgoing ALAI-managed-domain mail. This means emails sent FROM any ALAI account will also appear in alem's INBOX. Because alem iterates before product accounts in the daemon list, it ingests those Message-IDs first; the UNIQUE(message_id) constraint causes product-account inserts to be no-ops. This affects ingest attribution for ALAI-origin probes only — external (non-ALAI) mail is not affected. See Section 6 for forwarding removal note.
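The alias-creation quirks above can be encoded in a request builder that fails fast before anything is sent. This is a sketch, not a tested client: the base URL and the `/aliases` path are assumptions to check against the Migadu API documentation, `buildCreateAliasRequest` is a hypothetical name, and authentication is left out on purpose.

```javascript
// Assumed base URL for the Migadu admin API; verify against the current docs.
const MIGADU_API = 'https://api.migadu.com/v1';

// Builds (but does not send) a create-alias request that respects the quirks above.
function buildCreateAliasRequest(domain, localPart, destinations) {
  // Quirk: destinations must live on the same domain, otherwise HTTP 400.
  const foreign = destinations.filter(
    (addr) => addr.split('@')[1]?.toLowerCase() !== domain.toLowerCase()
  );
  if (foreign.length > 0) {
    throw new Error(`cross-domain alias destinations rejected: ${foreign.join(', ')}`);
  }
  return {
    method: 'POST',
    url: `${MIGADU_API}/domains/${domain}/aliases`,
    headers: {
      // Quirk: without Accept, the API answers with HTML and the call fails silently.
      Accept: 'application/json',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ local_part: localPart, destinations }),
  };
}

const req = buildCreateAliasRequest('alai.no', 'info', ['john@alai.no']);
console.log(req.url); // https://api.migadu.com/v1/domains/alai.no/aliases
```

Calling it with `['john@basicconsulting.no']` as the destination throws locally instead of producing the HTTP 400 described above.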
2. Real Mailbox Inventory
These are the real mailboxes that exist in Migadu (verified 2026-06-08 via admin API). Only real mailboxes can be used as alias destinations.
| Domain | Real mailboxes (local parts) |
|---|---|
| alai.no | john, alem, dev, post, admin |
| bilko.io | admin, sales, privacy |
| bilko.cloud | admin, sales |
| bilko.company | admin, sales |
| snowit.ba | admin, info, asmir, enis |
| basicconsulting.no | john, info |
| lumiscare.com | hello, admin |
Note: basicfakta.no is on this Migadu account but has no actively polled mailboxes in John's loop.
Note: lumiscare.com is ALAI's Migadu domain (our infrastructure). It is distinct from caresafetyinnovations.com, which remains a hard-stop boundary (see Section 6).
3. John's Email Ingest — All 19 Monitored Accounts
John's email ingest is managed by ~/system/tools/email-inbox.js and polled by ~/system/daemons/email-agent.js. As of MC #103182 final state (2026-06-08), 19 accounts are registered in email-inbox.db → email_accounts.
Original 6 Accounts (pre-MC #103182)
| Account name (DB key) | Email address | Vault item |
|---|---|---|
| john | john@basicconsulting.no | existing |
| info | info@basicconsulting.no | existing |
| alai | john@alai.no | existing |
| dev | dev@alai.no | existing |
| alem | alem@alai.no | existing |
| gmail | alembasic@gmail.com | existing |
11 Product/Role Accounts (added MC #103182 round 1)
| Account name (DB key) | Email address | Vault item name |
|---|---|---|
| post-alai | post@alai.no | Migadu — post@alai.no |
| admin-alai | admin@alai.no | Migadu — admin@alai.no |
| sales-bilko-io | sales@bilko.io | Migadu — sales@bilko.io |
| privacy-bilko-io | privacy@bilko.io | Migadu — privacy@bilko.io |
| admin-bilko-io | admin@bilko.io | Migadu — admin@bilko.io |
| sales-bilko-cloud | sales@bilko.cloud | Migadu — sales@bilko.cloud |
| admin-bilko-cloud | admin@bilko.cloud | Migadu — admin@bilko.cloud |
| sales-bilko-company | sales@bilko.company | Migadu — sales@bilko.company |
| admin-bilko-company | admin@bilko.company | Migadu — admin@bilko.company |
| info-snowit | info@snowit.ba | info@snowit.ba IMAP |
| admin-snowit | admin@snowit.ba | Migadu — admin@snowit.ba |
2 LumisCare Accounts (added MC #103182 round 2 — CEO directive 2026-06-08)
CEO directive: LumisCare must be in John's reading loop. lumiscare.com is ALAI's own Migadu domain — these are operational mailboxes, not CareSafety-boundary addresses.
| Account name (DB key) | Email address | Vault item name |
|---|---|---|
| hello-lumiscare | hello@lumiscare.com | Migadu — hello@lumiscare.com |
| admin-lumiscare | admin@lumiscare.com | Migadu — admin@lumiscare.com |
Note on hello@lumiscare.com forwarding: A Migadu direct forward from hello@lumiscare.com → alem@alai.no was active since 2026-05-24. This was removed 2026-06-08 so the mailbox is polled directly under hello-lumiscare with clean labeling. Before removal, LumisCare contact mail appeared in the DB under alem (Migadu ingested the forwarded copy first). After removal, external mail to hello@lumiscare.com is stored under hello-lumiscare only. Confirmed behaviourally: gmail-origin probe stored as DB id=9195 under hello-lumiscare, not duplicated under alem.
App-passwords for the 5 newly created admin@* mailboxes (round 1) were generated via the Migadu API and stored as Vaultwarden items. Vault IDs: 558181ec, 8dfe8d2d, 2f38a16a, 7d0f9216, 2fb07c20.
4. Alias Map — Dead-Address Fixes (2026-06-08)
The following addresses were previously advertised (on websites, legal pages, landing pages) but did not correspond to any real mailbox — all mail to them was silently bouncing. Migadu aliases were created to route them to the nearest real same-domain mailbox.
| Dead address (was bouncing) | Now routes to | Why |
|---|---|---|
| info@alai.no | john@alai.no | alai.no contact form was sending to this dead address — all website contact submissions were lost |
| support@bilko.io | sales@bilko.io | bilko.io landing mailto link |
| podrska@bilko.io | sales@bilko.io | bilko.io Bosnian support address on legal/terms pages |
| legal@bilko.io | admin@bilko.io | bilko.io legal/terms page |
| security@bilko.io | admin@bilko.io | bilko.io security disclosure address |
| support@bilko.cloud | sales@bilko.cloud | bilko.cloud landing mailto |
| support@bilko.company | sales@bilko.company | bilko.company landing mailto |
Pre-fix state: Only postmaster@{domain} → admin@{domain} aliases existed. No rewrites, no catch-all. All other non-existent local-parts bounced.
Post-fix: All advertised addresses now deliver to a real monitored mailbox. Nothing bounces.
5. Contact-Form Routing
| Product | Contact form path | Where mail ends up |
|---|---|---|
| alai.no website | Vercel serverless: ~/business/ALAI-Holding-AS/web/api/contact.js (nodemailer) | Sends to info@alai.no (which now aliases to john@alai.no — monitored). Was dead before 2026-06-08 fix. |
| Bilko landing pages | Cloudflare Pages function: apps/landing-*/functions/api/lead.js | Posts to Slack #ceo channel (C0AFJDP9V6U) + writes to Cloudflare KV (BILKO_LEADS). No email path — separate from IMAP polling. |
6. Boundary Accounts — NOT Polled (intentional)
| Address | Reason not polled |
|---|---|
| asmir@snowit.ba | Personal mailbox belonging to Asmir (SnowIT partner). He reads his own mail. |
| enis@snowit.ba | Personal mailbox belonging to Enis. Same reason. |
| Any *@caresafetyinnovations.com | CareSafety hard-stop boundary — health/patient-adjacent service under external ownership. NOT on ALAI's Migadu account. Never poll. See CareSafety boundary memo in MEMORY. |
Important distinction: lumiscare.com (ALAI's Migadu domain — hello@, admin@) IS polled. caresafetyinnovations.com (external operator) is the hard boundary, not lumiscare.com.
7. Daemon Architecture — Production Path
Understanding the daemon path is critical when debugging ingest issues or adding accounts.
Production Execution Path
- LaunchAgent: com.john.email-agent starts email-agent-wrapper.sh and sets HIMALAYA_DISABLED=1 via the plist EnvironmentVariables key.
- Wrapper: ~/system/daemons/email-agent-wrapper.sh is a thin shell wrapper; it does not set HIMALAYA_DISABLED itself.
- Daemon: ~/system/daemons/email-agent.js. When HIMALAYA_DISABLED=1, all 19 accounts use the legacy unseen-fetch IMAP path (direct node-imap, proven stable).
Himalaya Layer — Present but Bypassed in Production
Even with HIMALAYA_DISABLED=1, the daemon still routes account resolution through himalaya-adapter.js ACCOUNT_MAP. If an account name is missing from ACCOUNT_MAP, the daemon throws Unknown account: <name> and the account is skipped entirely.
- himalaya-adapter.js ACCOUNT_MAP must list all 19 accounts (currently L34–56).
- ~/.config/himalaya/config.toml must have 19 [accounts.*] stanzas (verified: grep count = 19).
- When run without HIMALAYA_DISABLED=1 (bare wrapper invocation), the himalaya binary is called and times out after 120s per account (~82 min total for 19 accounts). This is expected and non-destructive but slow. The production LaunchAgent always sets the env flag.
Validated (2026-06-08T13:15Z): Zero "Unknown account" errors in both daemon runs (wrapper + legacy). All 19 accounts have last_checked_at = 2026-06-08T13:09:39Z.
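The lookup failure described in this section can be illustrated with a small sketch. It is not the code in himalaya-adapter.js: the function name resolveAccount is hypothetical, and the map below is a two-entry subset assumed for illustration (the real ACCOUNT_MAP lists all 19 accounts).

```javascript
// Illustrative two-entry subset; the real ACCOUNT_MAP has 19 entries.
const ACCOUNT_MAP = {
  "hello-lumiscare": "hello@lumiscare.com",
  "admin-lumiscare": "admin@lumiscare.com",
};

// Hypothetical resolver: maps an account name to its email address and
// fails with the same message the daemon reports for missing names.
function resolveAccount(name, accountMap = ACCOUNT_MAP) {
  if (!Object.prototype.hasOwnProperty.call(accountMap, name)) {
    throw new Error(`Unknown account: ${name}`);
  }
  return accountMap[name];
}

// A caller that catches the error and moves on reproduces the
// "account is skipped entirely" behaviour described above.
function resolveAll(names, accountMap = ACCOUNT_MAP) {
  const resolved = {};
  const skipped = [];
  for (const name of names) {
    try {
      resolved[name] = resolveAccount(name, accountMap);
    } catch (err) {
      skipped.push(name);
    }
  }
  return { resolved, skipped };
}
```

Because the skip happens at resolution time, an account missing from the map never reaches the IMAP fetch, regardless of the HIMALAYA_DISABLED setting.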
8. Components — All 8 Touch-Points
Adding any new account requires updating all 8 of the following. Missing any one will cause silent failures or "Unknown account" errors.
| # | File | What to change |
|---|---|---|
| 1 | ~/system/tools/email-inbox.js | (a) Add INSERT OR IGNORE INTO email_accounts (name, email) VALUES ('<name>', '<email>') seed row. (b) Add a guarded migration block to extend the emails table CHECK constraint to include the new account name. The CHECK constraint is hardcoded and cannot be altered without rebuilding the table (SQLite limitation). The guard must use a unique string from the new account name (e.g. !ddlRow.sql.includes("'<name>'")). All existing rows and all 25 columns must be preserved in the rebuilt table. This is the most error-prone step; see Section 9 for the gotcha detail. |
| 2 | ~/system/tools/mail-native.js | Add account-name → Vaultwarden item-name entry in VAULT_NAMES map. |
| 3 | ~/system/tools/himalaya-adapter.js | Add account-name → email entry in ACCOUNT_MAP (L34–56 area). Without this, the daemon throws "Unknown account" and skips the account entirely even in legacy mode. |
| 4 | ~/.config/himalaya/config.toml | Add a new [accounts.<name>] stanza. Required even when HIMALAYA_DISABLED=1. |
| 5 | ~/system/daemons/email-agent.js | Add account to counts map (L2459 area). Also confirm it is present in the fetch loop and last_checked_at update loop (both must be mirrored). |
| 6 | ~/system/tools/email-imap-db-audit.js | Add account to ACCOUNTS constant. |
| 7 | ~/system/tools/email-action-hard-check.js | Add account to ALL_MONITORED_ACCOUNTS constant. |
| 8 | Vaultwarden (via bw CLI) | Create app-password item named Migadu — <email> with the IMAP/SMTP password. New admin@ mailboxes require a new app-password generated via Migadu API (PUT /v1/domains/{d}/mailboxes/{lp}). Existing sales@/privacy@/info@ mailboxes may already have creds in Vaultwarden; check before creating. |
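Since a gap in any single touch-point fails quietly, a cross-check of the account lists can catch it before the daemon is restarted. The sketch below is one possible shape for such a check. The function findTouchPointGaps is hypothetical, and how the name lists are extracted from each file is left open; the labels reuse the constant names from the table above.

```javascript
// Hypothetical helper. `expected` is the full list of account names;
// `touchPoints` maps a touch-point label to the names found there.
// Returns only the touch-points that are missing at least one name.
function findTouchPointGaps(expected, touchPoints) {
  const gaps = {};
  for (const [label, names] of Object.entries(touchPoints)) {
    const present = new Set(names);
    const missing = expected.filter((name) => !present.has(name));
    if (missing.length > 0) {
      gaps[label] = missing;
    }
  }
  return gaps;
}

// Example: a new account added everywhere except ACCOUNT_MAP.
const exampleGaps = findTouchPointGaps(
  ["hello-lumiscare", "admin-lumiscare"],
  {
    VAULT_NAMES: ["hello-lumiscare", "admin-lumiscare"],
    ACCOUNT_MAP: ["hello-lumiscare"],
    ACCOUNTS: ["hello-lumiscare", "admin-lumiscare"],
    ALL_MONITORED_ACCOUNTS: ["hello-lumiscare", "admin-lumiscare"],
  }
);
console.log(exampleGaps);
```

An empty result object means every touch-point that was supplied lists every expected account; it says nothing about touch-points that were not passed in, such as the Vaultwarden item.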
Files Changed in MC #103182 (round 1 — 11 accounts)
All files modified additively. Round 1 changed 5 files (himalaya touch-points were added in round 2 as BLOCKER-2 fix).
| File | Lines changed |
|---|---|
| email-inbox.js | L159–172 (seeds) + L141–208 (CHECK migration, 17-account guard) |
| mail-native.js | L76–88 (11 VAULT_NAMES entries) |
| email-imap-db-audit.js | L51 (ACCOUNTS 5→16) |
| email-action-hard-check.js | L14–22 (ALL_MONITORED_ACCOUNTS 17 accounts) |
| email-agent.js | L1853–1861 (fetch loop), L1889–1895 (last_checked_at loop) |
Files Changed in MC #103182 (round 2 — LumisCare + BLOCKER-2 fix)
| File | Lines changed |
|---|---|
| email-inbox.js | L212–311 (second guarded CHECK migration, 19-account guard: !ddlRow2.sql.includes("'hello-lumiscare'")); 2 new email_accounts seed rows |
| mail-native.js | L90–91 (hello-lumiscare + admin-lumiscare VAULT_NAMES) |
| himalaya-adapter.js | L34–56 (ACCOUNT_MAP expanded to 19 entries) |
| ~/.config/himalaya/config.toml | 2 new [accounts.*] stanzas (19 total) |
| email-agent.js | L1862 (fetch loop), L1899 (last_checked_at loop), L2459–2468 (counts map) |
| email-action-hard-check.js | L24 (hello-lumiscare + admin-lumiscare in ALL_MONITORED_ACCOUNTS) |
| email-imap-db-audit.js | L60 (both accounts in ACCOUNTS array) |
Known Minor Issue (pre-existing, non-blocking)
After SMTP send via mail-native.js, the IMAP post-send copy to Sent folder times out with ETIMEOUT. Delivery succeeds (Message-ID is logged). This is a cosmetic issue in the IMAP cleanup code — pre-existing, unrelated to MC #103182. Separate MC recommended.
9. GOTCHA — emails Table CHECK Constraint
This is the most dangerous footgun when adding new accounts. Read before touching email-inbox.js.
The emails table in ~/system/databases/email-inbox.db has a hardcoded SQLite CHECK constraint:
account TEXT NOT NULL CHECK(account IN ('john','info','alai','dev','alem','gmail',
'post-alai','admin-alai',
'sales-bilko-io','privacy-bilko-io','admin-bilko-io',
'sales-bilko-cloud','admin-bilko-cloud',
'sales-bilko-company','admin-bilko-company',
'info-snowit','admin-snowit',
'hello-lumiscare','admin-lumiscare'
))
The trap: INSERT OR IGNORE silently discards rows that violate CHECK constraints — no exception is thrown, no warning is logged. If a new account name is not in this list, every email received by that account is permanently lost at ingest time. In MC #103182 this caused 27 real emails to be silently dropped before the issue was caught by Proveo.
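One way to make this failure loud instead of silent is to compare the account name against the CHECK list before the INSERT OR IGNORE runs. The following is a minimal sketch under assumptions: parseAllowedAccounts and assertAccountAllowed are hypothetical helpers, not functions in email-inbox.js, and the DDL string is assumed to come from the sqlite_master query shown later in this section.

```javascript
// Hypothetical helper: extracts the account names from a
// CHECK(account IN ('a','b',...)) clause in a CREATE TABLE statement.
// Returns null when no such clause is found.
function parseAllowedAccounts(ddl) {
  const match = /CHECK\s*\(\s*account\s+IN\s*\(([^)]*)\)/i.exec(ddl);
  if (!match) {
    return null;
  }
  const names = Array.from(match[1].matchAll(/'([^']+)'/g), (m) => m[1]);
  return new Set(names);
}

// Hypothetical pre-insert check: throws instead of letting
// INSERT OR IGNORE discard the row without any error.
function assertAccountAllowed(ddl, account) {
  const allowed = parseAllowedAccounts(ddl);
  if (allowed !== null && !allowed.has(account)) {
    throw new Error(
      `Account '${account}' is not in the emails CHECK constraint; ` +
        "INSERT OR IGNORE would drop its rows"
    );
  }
}
```

Calling assertAccountAllowed once per account at daemon start, rather than per row, would be enough to surface a missing migration before any mail is fetched.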
The fix: SQLite does not support ALTER TABLE ... MODIFY COLUMN with a new CHECK constraint. The only way to extend it is to rebuild the table:
- Read current DDL: SELECT sql FROM sqlite_master WHERE type='table' AND name='emails'
- Guard the migration: check that the new account name is NOT already in the DDL (idempotency).
- In a transaction: CREATE TABLE emails_new (...same schema + extended CHECK...) → INSERT INTO emails_new SELECT * FROM emails → assert row count matches → DROP TABLE emails → ALTER TABLE emails_new RENAME TO emails → recreate indexes → COMMIT
- Roll back on any error or row count mismatch.
The pattern already exists in email-inbox.js — follow it exactly. All 25 columns must be listed explicitly, including the post-migration additions: delegated_to, delegated_at, deadline, body, triaged_at, auto_forwarded.
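The guard and the DDL rewrite can be sketched as a pure string transformation, shown below. This is an illustration only and not the migration in email-inbox.js: the function name extendAccountCheck is hypothetical, the account-name pattern is an assumption based on the existing names, and the sketch derives the new DDL from the stored one instead of listing columns by hand as the existing pattern does.

```javascript
// Hypothetical helper. Given the DDL read from sqlite_master and a new
// account name, returns the CREATE TABLE statement for emails_new with
// the name appended to the CHECK list, or null when the guard finds the
// name already present (migration already applied).
function extendAccountCheck(currentDdl, newAccount) {
  // Assumption: account names are lowercase letters, digits, and hyphens.
  if (!/^[a-z0-9-]+$/.test(newAccount)) {
    throw new Error(`Refusing unexpected account name: ${newAccount}`);
  }
  const quoted = `'${newAccount}'`;
  if (currentDdl.includes(quoted)) {
    return null; // idempotency guard
  }
  const checkList = /(CHECK\s*\(\s*account\s+IN\s*\()([^)]*)(\))/i;
  if (!checkList.test(currentDdl)) {
    throw new Error("account CHECK constraint not found in DDL");
  }
  return currentDdl
    .replace(
      checkList,
      (_match, open, list, close) => `${open}${list.trimEnd()},${quoted}${close}`
    )
    .replace(/CREATE TABLE\s+"?emails"?/i, "CREATE TABLE emails_new");
}
```

The returned statement covers only the CREATE TABLE emails_new step; the row copy, row-count assertion, drop, rename, and index recreation still have to run inside the same transaction as described above.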
10. Runbook — How to Add a New Mailbox to John's Loop
1. Verify the mailbox exists in Migadu. Check via GET /v1/domains/{domain}/mailboxes using the admin API key ("migadu keyy" in Vaultwarden). If it does not exist, create it via the admin UI or API first.
2. Create an app-password for the mailbox. Use Migadu admin UI (Mailbox settings > App Passwords) or PUT /v1/domains/{domain}/mailboxes/{local_part}. Store the password as a new Vaultwarden item named Migadu — {email}.
3. [Touch-point 2] Add to mail-native.js VAULT_NAMES map. Key = your chosen account name (e.g. sales-newdomain), value = the Vaultwarden item name.
4. [Touch-point 3] Add to himalaya-adapter.js ACCOUNT_MAP. Add '<name>': '<email>' in the ACCOUNT_MAP object. Without this step the daemon throws "Unknown account" and the account is silently skipped.
5. [Touch-point 4] Add stanza to ~/.config/himalaya/config.toml. Follow the existing pattern for a Migadu account stanza.
6. [Touch-point 1a] Add the email_accounts seed to email-inbox.js. Append INSERT OR IGNORE INTO email_accounts (name, email) VALUES ('<name>', '<email>') in the seed block.
7. [Touch-point 1b, CRITICAL] Add a guarded CHECK migration to email-inbox.js getDb(). Read Section 9 first. Guard: !ddlRow.sql.includes("'<name>'"). Extend CHECK to include the new account. Rebuild the table in a transaction preserving all 25 columns. Test idempotency.
8. [Touch-point 6] Add to email-imap-db-audit.js ACCOUNTS array.
9. [Touch-point 7] Add to email-action-hard-check.js ALL_MONITORED_ACCOUNTS array.
10. [Touch-point 5] Add to email-agent.js counts map, fetch loop, and last_checked_at loop. All three locations must be mirrored.
11. Run syntax checks on all modified files: node --check ~/system/tools/email-inbox.js && node --check ~/system/tools/mail-native.js && node --check ~/system/tools/himalaya-adapter.js && node --check ~/system/daemons/email-agent.js
12. Test connectivity: node ~/system/tools/mail-native.js test --account <name> (expect IMAP OK + SMTP OK).
13. Restart the email-agent daemon (LaunchAgent: com.john.email-agent) so the updated accounts array and config take effect.
14. Proveo ingest probe. Send a test email from a non-ALAI sender (e.g. gmail account) with subject INGEST-PROBE-<name>-<timestamp>. This avoids the Migadu catch-all pre-emption issue (see Section 1 API quirks). Trigger one daemon cycle. Confirm the row appears under the correct account name via node ~/system/tools/email-inbox.js search "INGEST-PROBE".
15. If adding a new alias (not a real mailbox): create the Migadu alias first (same-domain destination only, with Accept: application/json header). Then proceed from step 3.
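The ingest probe in the runbook above relies on a subject that is unique per account and per attempt. A small sketch of generating that subject and checking the result follows. Both helpers are hypothetical, the compact UTC timestamp format is an assumption (the runbook only specifies <timestamp>), and the { subject, account } row shape is assumed for illustration.

```javascript
// Hypothetical helper: builds INGEST-PROBE-<name>-<timestamp> using a
// compact UTC timestamp such as 20260608T131500Z (format is an assumption).
function buildProbeSubject(accountName, now = new Date()) {
  const stamp = now
    .toISOString()
    .replace(/[-:]/g, "")
    .replace(/\.\d{3}Z$/, "Z");
  return `INGEST-PROBE-${accountName}-${stamp}`;
}

// Hypothetical check: true only when the probe was stored under the
// expected account, which also rules out storage under another mailbox.
function probeLanded(rows, subject, accountName) {
  return rows.some(
    (row) => row.subject === subject && row.account === accountName
  );
}
```

Checking the account column as well as the subject matters because a probe stored under a different mailbox would otherwise look like a pass.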
11. Validation Evidence (MC #103182 — Final)
Round 1 (17 accounts — 2026-06-08T11:24Z)
| Check | Result |
|---|---|
| Code changes (5 files) verified by Proveo | PASS |
| DB registry — 17 rows in email_accounts | PASS |
| IMAP/SMTP connectivity — 11/11 new accounts | PASS |
| emails table CHECK migration (emails_new rebuild) | PASS — DDL confirmed, 4697 rows preserved |
| Ingest probes — 4/4 probe accounts persist to DB | PASS (round 2 probes after schema fix; DB ids 9052/9056/9057/9059/9062/9063/9064) |
| Regression — original 6 accounts | PASS — counts growing, timestamps advancing |
| No-loop / alias dedup (UNIQUE on message_id) | PASS — 0 duplicate message_ids |
| email-action-hard-check.js exit code | PASS — exit 0, 17 accounts in scope |
Blocker found and fixed during round 1 validation: The emails table had a hardcoded CHECK covering only the original 6 accounts. INSERT OR IGNORE silently dropped 27 real emails before the migration was applied. See Section 9 for the full gotcha description.
Round 2 (19 accounts — LumisCare + daemon path — 2026-06-08T13:15Z)
| Check | Result |
|---|---|
| ACCOUNT_MAP (himalaya-adapter.js) has 19 entries | PASS — L34–56 confirmed |
| config.toml has 19 [accounts.*] stanzas | PASS — grep count = 19 |
| email-agent.js counts map has 19 accounts | PASS — L2460–2468 |
| Zero "Unknown account" errors (wrapper run) | PASS — grep -c = 0 / 40 lines |
| Zero "Unknown account" errors (legacy/production run) | PASS — grep -c = 0 |
| Zero silent drops / CHECK failures (production run) | PASS |
| admin-lumiscare ingest proof | PASS — DB id=9070 under admin-lumiscare |
| hello-lumiscare ingest proof (external sender) | PASS — DB id=9195 under hello-lumiscare (gmail-origin probe) |
| sales-bilko-cloud ingest proof | PASS — DB id=9193 |
| sales-bilko-company ingest proof | PASS — DB id=9194 |
| hello@lumiscare.com forwarding removal (behavioural) | PASS — gmail-origin stored only under hello-lumiscare, not duplicated under alem |
| All 19 last_checked_at fresh | PASS — 2026-06-08T13:09:39Z all accounts |
| No duplicate message_ids | PASS — 0 rows |
| Regression (orig 6 + prior 11) | PASS — row counts growing, timestamps fresh |
Evidence files: /tmp/evidence-103182/flowforge-build.md, /tmp/evidence-103182/proveo-validation.md, /tmp/evidence-103182/daemon-wrapper-run.log, /tmp/evidence-103182/daemon-legacy-run.log