Operations

Runbooks, cold start procedures, service registry, monitoring.

Overview

Operations Overview

Runbooks, cold start procedures, service registry, and monitoring documentation.

Last Verified: 2026-02-17 | Owner: John

Contents

To be populated from ~/system/ops/

BookStack Runbook

Last Verified: 2026-02-17 | Owner: John

Runbook: BookStack

Service Type: Wiki / Knowledge Base
Container: bookstack (lscr.io/linuxserver/bookstack:latest)
Ports: 6875 (external) → 80 (internal)
Internal URL: http://localhost:6875
External URL: http://192.168.68.61:6875 (LAN only, no Cloudflare tunnel yet)
Database: MariaDB (bookstack_db)
Compose File: ~/system/services/bookstack/docker-compose.yml


Service Info

BookStack is the documentation wiki for BasicAS Group. It stores runbooks, system docs, and org info.

Stack:

Access:

API:


Status Check

Container Health

docker ps | grep bookstack

Expected output:

bookstack       Up X hours
bookstack_db    Up X hours

HTTP Check

curl -I http://localhost:6875

Expected: 200 OK or 302 Found

API Check

curl -s -H "Authorization: Token <TOKEN_ID>:<TOKEN_SECRET>" http://localhost:6875/api/docs.json | head -5

Use the token_id and token_secret values from ~/system/config/bookstack.json.

Expected: JSON response with API docs.

Database Check

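# Read the DB password from the app container instead of hardcoding it
DB_PASS=$(docker exec bookstack printenv DB_PASSWORD)
docker exec bookstack_db mysql -u bookstack -p"$DB_PASS" bookstackapp -e "SELECT count(*) FROM pages;"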

Restart Procedure

Quick Restart (Container Only)

docker restart bookstack

Full Stack Restart (Container + Database)

cd ~/system/services/bookstack
docker compose down
docker compose up -d

Wait 30 seconds, then verify:

docker ps | grep bookstack
curl -I http://localhost:6875
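A fixed wait can be too short after a cold start and needlessly long otherwise. As an alternative to waiting 30 seconds, here is a minimal polling sketch; wait_until is a hypothetical helper, not one of the tools in ~/system/tools:

```shell
#!/usr/bin/env bash
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND once per second until it exits 0
# or TIMEOUT_SECONDS have elapsed. Returns 0 on success, 1 on timeout.
wait_until() {
  local timeout=$1
  shift
  local waited=0
  until "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s: $*" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

Example (curl -f treats 200 and 302 as success, so this returns as soon as BookStack answers, or fails after 60 seconds):

wait_until 60 curl -sf -o /dev/null http://localhost:6875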

Sync System Docs to BookStack

BookStack is auto-populated from ~/system/ using the sync tool.

Sync All Mapped Content

node ~/system/tools/bookstack-sync.js sync

Sync Single File

node ~/system/tools/bookstack-sync.js sync ~/system/rules/development.md

Check Sync Status

node ~/system/tools/bookstack-sync.js status

Force Overwrite All

node ~/system/tools/bookstack-sync.js push

Mapping File: ~/system/config/bookstack-sync-map.json
State File: ~/system/config/bookstack-sync-state.json


Troubleshooting

Problem: Container won't start

Check logs:

docker logs bookstack --tail 100

Common causes:

  1. Database not ready - wait 30s and retry
  2. Port 6875 already bound - check lsof -i :6875
  3. Volume permission issues - check ~/system/services/bookstack/data/

Fix:

cd ~/system/services/bookstack
docker compose down
docker compose up -d bookstack_db
sleep 30
docker compose up -d bookstack

Problem: Can't login (wrong password)

Check if admin credentials were changed in UI:

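Create a new admin account (bookstack:create-admin adds a new user; it does not change an existing user's password):

docker exec -it bookstack php /app/www/artisan bookstack:create-admin --email=admin@admin.com --name=Admin --password='<new-password>'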

Problem: API returns 401 Unauthorized

Check token exists:

cat ~/system/config/bookstack.json

Regenerate token in UI:

  1. Login to BookStack
  2. Go to Settings → API Tokens
  3. Create new token
  4. Update ~/system/config/bookstack.json

Problem: Sync tool fails (500 error)

Check BookStack is running:

curl -I http://localhost:6875

Check API endpoint:

curl -s -H "Authorization: Token <TOKEN_ID>:<TOKEN_SECRET>" http://localhost:6875/api/shelves | head -20

Check logs:

docker logs bookstack --tail 100

Problem: Database connection issues

Check database health:

DB_PASS=$(docker exec bookstack printenv DB_PASSWORD)
docker exec bookstack_db mysqladmin -u bookstack -p"$DB_PASS" ping

Expected: mysqld is alive

Check connection settings:

docker exec bookstack env | grep DB_

Expected:

DB_HOST=bookstack_db
DB_PORT=3306
DB_USERNAME=bookstack
DB_PASSWORD=<password from docker-compose.yml>
DB_DATABASE=bookstackapp

API Usage

The examples below use <TOKEN_ID>:<TOKEN_SECRET> as a placeholder; substitute the token_id and token_secret values from ~/system/config/bookstack.json.

List Shelves

curl -s -H "Authorization: Token <TOKEN_ID>:<TOKEN_SECRET>" http://localhost:6875/api/shelves

List Books

curl -s -H "Authorization: Token <TOKEN_ID>:<TOKEN_SECRET>" http://localhost:6875/api/books

List Pages

curl -s -H "Authorization: Token <TOKEN_ID>:<TOKEN_SECRET>" http://localhost:6875/api/pages

Create Page

curl -X POST -H "Authorization: Token <TOKEN_ID>:<TOKEN_SECRET>" \
  -H "Content-Type: application/json" \
  -d '{"book_id":1,"name":"Page Title","markdown":"# Content"}' \
  http://localhost:6875/api/pages

Full API docs: http://localhost:6875/api/docs
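Writing Markdown inline in -d breaks as soon as the content contains quotes or newlines. A sketch that builds the request body from a Markdown file with python3's json module so escaping is handled; bookstack_page_json is a hypothetical helper, and it only sets the book_id, name, and markdown fields used in the Create Page example:

```shell
#!/usr/bin/env bash
# bookstack_page_json BOOK_ID PAGE_NAME MARKDOWN_FILE
# Hypothetical helper: prints a JSON body for POST /api/pages, with quotes,
# backslashes and newlines in the Markdown escaped by json.dumps.
bookstack_page_json() {
  python3 -c '
import json, sys
book_id, name, path = int(sys.argv[1]), sys.argv[2], sys.argv[3]
with open(path, encoding="utf-8") as fh:
    markdown = fh.read()
print(json.dumps({"book_id": book_id, "name": name, "markdown": markdown}))
' "$1" "$2" "$3"
}
```

Example (--data-binary @- makes curl send the body from stdin unchanged):

bookstack_page_json 1 "Page Title" notes.md | curl -s -X POST -H "Authorization: Token <TOKEN_ID>:<TOKEN_SECRET>" -H "Content-Type: application/json" --data-binary @- http://localhost:6875/api/pages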


Dependencies


Backup

Database Dump

DB_PASS=$(docker exec bookstack printenv DB_PASSWORD)
docker exec bookstack_db mysqldump -u bookstack -p"$DB_PASS" bookstackapp | gzip > ~/backups/bookstack-$(date +%Y%m%d-%H%M%S).sql.gz

Data Volumes (includes uploads, images)

cd ~/system/services/bookstack
tar -czf ~/backups/bookstack-data-$(date +%Y%m%d-%H%M%S).tar.gz data/

Restore from Backup

# Read the DB password while the app container is still running
DB_PASS=$(docker exec bookstack printenv DB_PASSWORD)

# Stop the app only; bookstack_db must keep running for the restore
docker stop bookstack

# Restore database
gunzip -c ~/backups/bookstack-YYYYMMDD-HHMMSS.sql.gz | docker exec -i bookstack_db mysql -u bookstack -p"$DB_PASS" bookstackapp

# Restore data (if needed)
cd ~/system/services/bookstack
tar -xzf ~/backups/bookstack-data-YYYYMMDD-HHMMSS.tar.gz

# Start service
docker compose up -d
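Before running a restore, it is worth confirming the archive is intact, because a truncated dump can leave the database half-restored. A minimal sketch using gzip -t, which decompresses and checks the data without writing output; verify_backup is a hypothetical helper:

```shell
#!/usr/bin/env bash
# verify_backup FILE
# Hypothetical helper: succeeds only if FILE exists, is non-empty, and passes
# gzip's integrity test. Works for both .sql.gz dumps and .tar.gz archives.
verify_backup() {
  local file=$1
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  if ! gzip -t "$file" 2>/dev/null; then
    echo "corrupt gzip archive: $file" >&2
    return 1
  fi
  echo "ok: $file"
}
```

Example:

verify_backup ~/backups/bookstack-YYYYMMDD-HHMMSS.sql.gz && verify_backup ~/backups/bookstack-data-YYYYMMDD-HHMMSS.tar.gz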

Configuration

Key Environment Variables

Full config: ~/system/services/bookstack/docker-compose.yml

Application Settings (via UI)


Content Structure

BookStack organizes content as:

Shelf (top-level category)
  └─ Book (collection of chapters and pages)
       ├─ Chapter (optional grouping of pages)
       │    └─ Page (markdown document)
       └─ Page (pages can also sit directly in a book)

Current structure (as of 2026-02-10):


Notes


Last updated: 2026-02-10 | Maintained by: John (AI Director)

BookStack MFA Setup

Last Verified: 2026-02-17 | Owner: John

BookStack MFA and API Token Setup

Service: BookStack Knowledge Base
URL: http://localhost:6875 or http://192.168.68.61:6875


Overview

This runbook covers:

  1. Setting up Multi-Factor Authentication (MFA) for admin accounts
  2. Creating new API tokens after admin account changes
  3. Security best practices

Prerequisites


Part 1: Enable MFA (Multi-Factor Authentication)

Step 1: Login as Admin

  1. Open browser and navigate to http://localhost:6875
  2. Click "Sign In"
  3. Enter credentials:
    • Email: john@alai.no
    • Password: see admin_password in ~/system/config/bookstack.json

Step 2: Access Account Settings

  1. Click on your profile icon (top-right corner)
  2. Select "Edit Profile" or "My Account"

Step 3: Enable MFA

  1. Scroll to "Multi-Factor Authentication" section

  2. Click "Setup MFA"

  3. Choose method:

    • TOTP (Recommended): Time-based One-Time Password (Google Authenticator, Authy, etc.)
    • Backup Codes: Generate backup recovery codes
  4. For TOTP setup:

    • Scan QR code with authenticator app
    • Enter 6-digit verification code
    • Save backup codes in secure location (~/system/config/bookstack-mfa-backup.txt)
  5. Click "Confirm" to enable MFA

Step 4: Test MFA

  1. Log out
  2. Log back in with same credentials
  3. Verify you're prompted for MFA code
  4. Enter code from authenticator app
  5. Successful login confirms MFA is working

Part 2: Create New API Token

The old API token was invalidated when the default admin@admin.com account was deleted. You need to create a new token for the john@alai.no account.

Step 1: Navigate to API Settings

  1. Login to BookStack as john@alai.no
  2. Click profile icon (top-right)
  3. Select "Edit Profile" or "My Account"
  4. Click on "API Tokens" tab

Step 2: Create Token

  1. Click "Create Token"
  2. Enter token details:
    • Name: System Integration Token
    • Expiry: Never (or set appropriate expiry)
  3. Click "Save"

Step 3: Copy Token Credentials

IMPORTANT: Token secret is only shown once!

You will see a Token ID and a Token Secret. Copy both values immediately.

Step 4: Update Config File

Update ~/system/config/bookstack.json with new token:

# Edit the config file
nano ~/system/config/bookstack.json

Replace token_id and token_secret with new values:

{
  "url": "http://localhost:6875",
  "external_url": "http://192.168.68.61:6875",
  "token_id": "YOUR_NEW_TOKEN_ID",
  "token_secret": "YOUR_NEW_TOKEN_SECRET",
  "admin_email": "john@alai.no",
  "admin_password": "EXISTING_ADMIN_PASSWORD",
  "alem_email": "alem@basicconsulting.no",
  "alem_password": "EXISTING_ALEM_PASSWORD"
}

Save the file (Ctrl+O, Enter, Ctrl+X in nano).
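Before testing the token, a quick structural check catches JSON syntax errors and template values that were never replaced. A sketch using python3 (already used elsewhere in these runbooks); check_bookstack_config is a hypothetical helper and only checks url, token_id, and token_secret:

```shell
#!/usr/bin/env bash
# check_bookstack_config FILE
# Hypothetical helper: verifies FILE is valid JSON, that url, token_id and
# token_secret are non-empty strings, and that none still holds a YOUR_...
# placeholder from the template. Prints "config OK" or exits non-zero.
check_bookstack_config() {
  python3 - "$1" <<'PY'
import json
import sys

path = sys.argv[1]
try:
    with open(path) as fh:
        cfg = json.load(fh)
except (OSError, ValueError) as exc:
    sys.exit(f"invalid config {path}: {exc}")

problems = []
for key in ("url", "token_id", "token_secret"):
    value = cfg.get(key)
    if not isinstance(value, str) or not value:
        problems.append(f"{key} is missing or empty")
    elif value.startswith("YOUR_"):
        problems.append(f"{key} still has a placeholder value")

if problems:
    sys.exit("; ".join(problems))
print("config OK")
PY
}
```

Example:

check_bookstack_config ~/system/config/bookstack.json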

Step 5: Test API Token

# Read token from config
TOKEN_ID=$(cat ~/system/config/bookstack.json | grep token_id | cut -d'"' -f4)
TOKEN_SECRET=$(cat ~/system/config/bookstack.json | grep token_secret | cut -d'"' -f4)

# Test API call
curl -s -H "Authorization: Token $TOKEN_ID:$TOKEN_SECRET" http://localhost:6875/api/shelves

Expected: JSON response with list of shelves.

If you see {"error":{"message":"No matching API token was found"...}}, the token is incorrect.
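The grep/cut approach above only works while each key sits on its own line in exactly the layout shown. A sturdier sketch that parses the file as JSON; bookstack_auth is a hypothetical helper and assumes the token_id and token_secret keys from Step 4:

```shell
#!/usr/bin/env bash
# bookstack_auth [CONFIG_FILE]
# Hypothetical helper: prints "Token <token_id>:<token_secret>" for use as the
# Authorization header value. Defaults to ~/system/config/bookstack.json.
bookstack_auth() {
  local config=${1:-$HOME/system/config/bookstack.json}
  python3 -c '
import json, sys
with open(sys.argv[1]) as fh:
    cfg = json.load(fh)
print("Token {}:{}".format(cfg["token_id"], cfg["token_secret"]))
' "$config"
}
```

Example:

curl -s -H "Authorization: $(bookstack_auth)" http://localhost:6875/api/shelves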


Part 3: Additional Security Measures

Disable Guest Access (Optional)

If you want to require authentication for all access:

  1. Edit docker-compose.yml:

    cd ~/system/services/bookstack
    nano docker-compose.yml
    
  2. Change:

    - ALLOW_GUEST_ACCESS=true
    

    to:

    - ALLOW_GUEST_ACCESS=false
    
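  3. Recreate the BookStack container so the new value takes effect (docker compose restart keeps the old container configuration and does not re-read docker-compose.yml):

    docker compose up -d bookstack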
    

Review User Permissions

  1. Login as admin
  2. Go to Settings (gear icon) → Users
  3. Review all user accounts
  4. Set appropriate roles (Admin, Editor, Viewer)
  5. Remove or deactivate unused accounts

Review Audit Log

  1. Settings → Audit Log
  2. BookStack records user actions in the audit log automatically; no separate setting needs enabling
  3. Review periodically for suspicious activity

Regular Backups

Ensure regular backups are configured:

# Database backup
DB_PASS=$(docker exec bookstack printenv DB_PASSWORD)
docker exec bookstack_db mysqldump -u bookstack -p"$DB_PASS" bookstackapp | gzip > ~/backups/bookstack-$(date +%Y%m%d).sql.gz

# Data backup
cd ~/system/services/bookstack
tar -czf ~/backups/bookstack-data-$(date +%Y%m%d).tar.gz data/

Add to daily cron job or LaunchAgent.
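Daily dumps accumulate in ~/backups. A minimal retention sketch using find; prune_backups is a hypothetical helper, and the 14-day window in the example is illustrative, not an agreed policy. find -mtime rounds file age differently on macOS (BSD) and GNU systems, so the cut-off is approximate by up to a day:

```shell
#!/usr/bin/env bash
# prune_backups DIR NAME_GLOB DAYS
# Hypothetical helper: deletes regular files directly inside DIR whose names
# match NAME_GLOB and whose age exceeds roughly DAYS days (find -mtime rules).
# Prints each deleted path. Refuses to run if DAYS is not a whole number.
prune_backups() {
  local dir=$1 glob=$2 days=$3
  case $days in
    ''|*[!0-9]*) echo "DAYS must be a whole number" >&2; return 2 ;;
  esac
  find "$dir" -maxdepth 1 -type f -name "$glob" -mtime "+$days" -print -delete
}
```

Example:

prune_backups ~/backups 'bookstack-*.sql.gz' 14
prune_backups ~/backups 'bookstack-data-*.tar.gz' 14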


Troubleshooting

MFA Not Working

Problem: Can't login with MFA code

Solutions:

  1. Check time sync on server and phone (TOTP requires accurate time)
  2. Use backup codes if available
  3. Reset MFA for the account (emergency only; removes all of the user's configured MFA methods):
    docker exec -it bookstack php /app/www/artisan bookstack:reset-mfa --email=john@alai.no
    

Lost API Token

Problem: Token was not saved and is no longer visible

Solution:

  1. Delete old token in web UI (API Tokens tab)
  2. Create new token (see Part 2)
  3. Update config file

Cannot Access Web UI

Problem: BookStack returns 500 error or won't load

Solutions:

  1. Check container status: docker ps | grep bookstack
  2. Check logs: docker logs bookstack --tail 100
  3. Restart service: cd ~/system/services/bookstack && docker compose restart

Security Best Practices

  1. MFA on all admin accounts - Always enable MFA for admins
  2. Strong passwords - Use 20+ character passwords with mixed case, numbers, symbols
  3. Regular token rotation - Rotate API tokens every 90 days
  4. Least privilege - Give users minimum permissions needed
  5. Audit logs - Review regularly for suspicious activity
  6. Backups - Daily database + data backups
  7. HTTPS - Use Cloudflare tunnel for external access (see bookstack.md)
  8. Keep updated - Update BookStack image regularly

Next Steps

After completing this setup:

  1. Enable MFA for john@alai.no
  2. Create new API token
  3. Update ~/system/config/bookstack.json
  4. Test API token works
  5. Enable MFA for alem@basicconsulting.no
  6. Review and set user permissions
  7. Configure daily backups
  8. Consider Cloudflare tunnel for external access

Last updated: 2026-02-17 | Maintained by: John (AI Director) | Related: ~/system/context/docs/runbooks/bookstack.md

CEO Dashboard Runbook

Last Verified: 2026-02-17 | Owner: John

CEO Dashboard

URL: http://localhost:3030/ceo
Server: Mission Control Dashboard (port 3030)
Auto-refresh: 60 seconds
Theme: Dark (ALAI brand)

Overview

The CEO Dashboard provides Alem with a single-screen view of all critical business metrics. It aggregates data from multiple sources (Mission Control tasks, sales pipeline, invoices, support tickets, decisions) into a real-time executive view.

Sections

1. Revenue Overview (Banner)

Top banner showing financial health:

Data Source: invoice-generator.js stats and invoice-generator.js list

2. Pipeline Funnel

Visual funnel showing lead progression:

Data Source: sales-pipeline.js stats

3. Active Projects (Kanban)

Project status board with 3 columns:

Data Source: Mission Control tasks table (filtered by project IS NOT NULL)

4. Decisions Pending

Top 5 GO/NO-GO decisions awaiting Alem's response:

Data Source: ~/system/specs/alem-decisions-2026-02.md (parsed from markdown)

5. Alerts Panel

Critical alerts requiring attention:

Color coding:

Data Sources: invoice-generator.js, ticket-sla-checker.js, MC tasks table

6. Upcoming Deadlines

Timeline of upcoming deadlines (next 14 days):

Data Source: Mission Control tasks table (filtered by description LIKE '%deadline%')

Technical Details

Implementation

Data Aggregation

Dashboard uses child_process.execSync to call existing tools:

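const { execSync } = require('child_process');
// Runs through a shell (so ~ expands); encoding 'utf8' returns a string instead of a Buffer
const invoiceStatsRaw = execSync('node ~/system/tools/invoice-generator.js stats 2>/dev/null', { encoding: 'utf8' });
const pipelineRaw = execSync('node ~/system/tools/sales-pipeline.js stats 2>/dev/null', { encoding: 'utf8' });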

Data is cached for 60 seconds to avoid hammering tools on every browser refresh.

Styling

Auto-refresh

Two mechanisms:

  1. HTML meta refresh: <meta http-equiv="refresh" content="60">
  2. JavaScript interval: setInterval(loadDashboard, 60000)

Access

Local

LAN Access

Dashboard is bound to 0.0.0.0:3030, accessible from any device on the network:

Mobile

Fully responsive. Recommended for iPad/tablet in landscape mode for best experience.

Future Enhancements

Phase 2 (Interactive)

Phase 3 (Advanced Metrics)

Phase 4 (AI Insights)

Maintenance

Update Decision File

When Alem makes decisions, update:

~/system/specs/alem-decisions-2026-02.md

Dashboard will auto-parse on next refresh.

Restart Dashboard

If changes are made to server code:

launchctl kickstart -k gui/$(id -u)/com.john.mc-dashboard

Check Logs

tail -f ~/system/logs/mc-dashboard.log
tail -f ~/system/logs/mc-dashboard-error.log

Troubleshooting

Dashboard shows "Loading..." indefinitely

Data shows 0 or N/A

Mobile layout broken

Infrastructure Runbook

Last Verified: 2026-02-17 | Owner: John

Runbook: Local Infrastructure

Platform: Mac Studio M3 Ultra, 96GB RAM, macOS
Services: Docker containers, LaunchAgents, Cloudflare tunnels


Docker Services

Status Check

docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'

Services

Container      Image                              Port             Health
mattermost     mattermost/mattermost-enterprise   8065             healthcheck
mattermost-db  postgres:13                        5432 (internal)  -
planka         ghcr.io/plankanban/planka          3100→1337        healthcheck
planka-db      postgres:15-alpine                 5433 (internal)  healthcheck
documenso      documenso/documenso                3003             -
documenso-db   postgres                           5434 (internal)  healthcheck
bookstack      lscr.io/linuxserver/bookstack      6875→80          -
bookstack_db   lscr.io/linuxserver/mariadb        3306 (internal)  -
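The table above can double as a checklist. A sketch that reads docker ps output on stdin and reports expected containers that are missing, not running, or reported as (unhealthy); check_containers is a hypothetical helper and relies on Docker's usual "Up ..." status wording:

```shell
#!/usr/bin/env bash
# check_containers NAME...
# Hypothetical helper: reads "name status..." lines, as printed by
#   docker ps -a --format '{{.Names}} {{.Status}}'
# from stdin. Prints one line per problem and returns 1 if any expected
# container is missing, not Up, or marked (unhealthy).
check_containers() {
  local listing name status rc=0
  listing=$(cat)
  for name in "$@"; do
    # Exact match on the first field; the rest of the line is the status
    status=$(printf '%s\n' "$listing" | awk -v n="$name" \
      '$1 == n && !found { found = 1; $1 = ""; sub(/^ /, ""); print }')
    case $status in
      '') echo "MISSING   $name"; rc=1 ;;
      Up*'(unhealthy)'*) echo "UNHEALTHY $name"; rc=1 ;;
      Up*) ;;
      *) echo "DOWN      $name ($status)"; rc=1 ;;
    esac
  done
  return $rc
}
```

Example (-a includes stopped containers, so they are reported as DOWN rather than MISSING):

docker ps -a --format '{{.Names}} {{.Status}}' | check_containers mattermost mattermost-db planka planka-db documenso documenso-db bookstack bookstack_db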

Restart a container

docker restart <container_name>
# Example: docker restart mattermost

Restart all

# Mattermost stack
cd ~/system/services/mattermost && docker compose down && docker compose up -d

# Planka stack
cd ~/system/services/planka && docker compose down && docker compose up -d

# Documenso
cd ~/system/services/documenso && docker compose down && docker compose up -d

# BookStack
cd ~/system/services/bookstack && docker compose down && docker compose up -d

View logs

docker logs <container_name> --tail 50
docker logs <container_name> -f  # follow

Disk cleanup (if disk >90%)

docker system prune -f            # Remove stopped containers, unused networks, dangling images, build cache
docker volume prune -f             # Remove unused volumes (Docker 23+: anonymous only, add -a for named). CAREFUL: data loss
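To apply the 90% threshold from a script rather than by eye, a sketch that parses POSIX-format df output; disk_over is a hypothetical helper and assumes column 5 of df -P is the capacity percentage, which is the layout macOS and GNU df both use with -P:

```shell
#!/usr/bin/env bash
# disk_over THRESHOLD_PERCENT
# Hypothetical helper: reads `df -P <path>` output (header + one data line)
# on stdin, prints the used percentage, and returns 0 if it is ABOVE the
# threshold, 1 if not, 2 if the output could not be parsed.
disk_over() {
  local threshold=$1 used
  used=$(awk 'NR == 2 { sub(/%$/, "", $5); print $5 }')
  [ -n "$used" ] || { echo "could not parse df output" >&2; return 2; }
  echo "${used}%"
  [ "$used" -gt "$threshold" ]
}
```

Example (on macOS, user data lives on /System/Volumes/Data; df on / reports only the small system volume):

if df -P /System/Volumes/Data | disk_over 90; then echo "disk above 90%: consider docker system prune -f"; fi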

Cloudflare Tunnels

Config

cat ~/.cloudflared/config.yml

Routes

Hostname                   Target           Service
mm.basicconsulting.no      localhost:8065   Mattermost
boards.basicconsulting.no  localhost:3100   Planka
sign.basicconsulting.no    localhost:3003   Documenso

Status

cloudflared tunnel info mattermost

Restart tunnel

# Tunnel runs as LaunchAgent
launchctl unload ~/Library/LaunchAgents/com.cloudflare.tunnel.plist
launchctl load ~/Library/LaunchAgents/com.cloudflare.tunnel.plist

LaunchAgents (Daemons)

List all custom daemons

launchctl list | grep -E "com\.(john|edita|cloudflare)"

Expected daemons

Daemon                       Interval   Location
com.john.ops-agent           5 min      ~/Library/LaunchAgents/
com.edita.autowork           30 min     ~/Library/LaunchAgents/
com.john.mc-dashboard        always     ~/Library/LaunchAgents/
com.john.mc-session-worker   on events  ~/Library/LaunchAgents/

Load/unload

launchctl load ~/Library/LaunchAgents/<plist-name>.plist
launchctl unload ~/Library/LaunchAgents/<plist-name>.plist

Ollama (Local AI)

Status

curl -s http://localhost:11434/api/tags | python3 -c "import sys,json; [print(m['name']) for m in json.load(sys.stdin)['models']]"

Models

Model               Size   Use
llama3.1:8b         5GB    Fast classification (ops-agent)
qwen2.5-coder:32b   19GB   Code generation, contextual responses
llama3.1:70b        40GB   Research, writing

Restart Ollama

# Ollama runs as macOS app
killall ollama 2>/dev/null
open -a Ollama

Mission Control Dashboard

Status

curl -s http://localhost:3030 | head -1

Restart

launchctl unload ~/Library/LaunchAgents/com.john.mc-dashboard.plist
launchctl load ~/Library/LaunchAgents/com.john.mc-dashboard.plist

Full Health Check

# Human-readable
node ~/system/tools/health-check.js

# JSON (programmatic)
node ~/system/tools/health-check.js --json

# Quick (HTTP only)
node ~/system/tools/health-check.js --quick

After System Reboot

All LaunchAgents with RunAtLoad: true start automatically when the user logs in (LaunchAgents run per user, not at boot). Verify:

# 1. Check Docker is running
docker ps

# 2. Check all daemons
launchctl list | grep -E "com\.(john|edita|cloudflare)"

# 3. Run health check
node ~/system/tools/health-check.js

# 4. If anything missing, load it
launchctl load ~/Library/LaunchAgents/<missing>.plist
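Steps 2 and 4 can be combined. A sketch that compares launchctl list output (read from stdin) with the labels from the Expected daemons table and prints the plist path for each label that is not loaded; missing_agents is a hypothetical helper and assumes each plist is named after its label, as in the examples above:

```shell
#!/usr/bin/env bash
# missing_agents LABEL...
# Hypothetical helper: reads `launchctl list` output (PID, Status, Label
# columns) on stdin and prints ~/Library/LaunchAgents/<label>.plist for every
# LABEL that does not appear in the Label column.
missing_agents() {
  local loaded label
  # Collect the third column, padded with spaces so matches are exact
  loaded=" $(awk '{ print $3 }' | tr '\n' ' ') "
  for label in "$@"; do
    case $loaded in
      *" $label "*) ;;
      *) echo "$HOME/Library/LaunchAgents/$label.plist" ;;
    esac
  done
}
```

Example:

launchctl list | missing_agents com.john.ops-agent com.edita.autowork com.john.mc-dashboard com.john.mc-session-worker | while read -r plist; do launchctl load "$plist"; done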

Created: 2026-02-10 | Last Updated: 2026-02-10

Mission Control Dashboard

Last Verified: 2026-02-17 | Owner: John

Runbook: Mission Control Dashboard

Service Type: Task Management Web UI
Runtime: Node.js (Express)
Port: 3030 (internal + LAN accessible)
Internal URL: http://localhost:3030
LAN URL: http://192.168.68.61:3030 (mobile-friendly)
Database: SQLite (~/system/databases/mission-control.db)
LaunchAgent: com.john.mc-dashboard
Source: ~/system/tools/mc-dashboard.js


Service Info

Mission Control Dashboard is the web UI for task management. Provides CRUD operations, priority management, status tracking, and team coordination.

Features:

CLI Alternative:

node ~/system/tools/mc.js list|add|start|done|pause|resume|block

Status Check

LaunchAgent Status

launchctl list | grep mc-dashboard

Expected output: PID shown (e.g., 12345 0 com.john.mc-dashboard)

If not running: - 0 com.john.mc-dashboard (no PID)

HTTP Check

curl -I http://localhost:3030

Expected: 200 OK

LAN Access Check (from another device)

curl -I http://192.168.68.61:3030

Expected: 200 OK

Database Check

sqlite3 ~/system/databases/mission-control.db "SELECT count(*) FROM tasks WHERE status = 'open';"

Restart Procedure

Stop Service

launchctl unload ~/Library/LaunchAgents/com.john.mc-dashboard.plist

Start Service

launchctl load ~/Library/LaunchAgents/com.john.mc-dashboard.plist

Restart (Stop + Start)

launchctl unload ~/Library/LaunchAgents/com.john.mc-dashboard.plist
launchctl load ~/Library/LaunchAgents/com.john.mc-dashboard.plist

Note: LaunchAgent auto-restarts on crash (KeepAlive=true).


View Logs

stdout (General logs)

tail -f ~/system/logs/mc-dashboard.log

stderr (Error logs)

tail -f ~/system/logs/mc-dashboard.err

Recent errors

tail -50 ~/system/logs/mc-dashboard.err

Troubleshooting

Problem: Dashboard won't start

Check LaunchAgent:

launchctl list | grep mc-dashboard

Check error log:

tail -50 ~/system/logs/mc-dashboard.err

Common causes:

  1. Port 3030 already bound - check lsof -i :3030
  2. Database locked - check for stale processes using SQLite
  3. Node.js not found - check which node
  4. Permission issues - check file ownership

Fix:

# Kill any process on port 3030
lsof -ti :3030 | xargs kill -9

# Restart
launchctl unload ~/Library/LaunchAgents/com.john.mc-dashboard.plist
launchctl load ~/Library/LaunchAgents/com.john.mc-dashboard.plist

Problem: Can't connect from mobile (LAN)

Check service is listening on all interfaces:

lsof -i :3030

Expected: *:3030 (listening on all IPs, not just 127.0.0.1)

Check firewall:

sudo /usr/libexec/ApplicationFirewall/socketfilterfw --getglobalstate

If firewall is on, allow Node.js:

sudo /usr/libexec/ApplicationFirewall/socketfilterfw --add /opt/homebrew/bin/node

Check Mac IP:

ipconfig getifaddr en0  # Ethernet on Mac Studio (Wi-Fi on laptops)
ipconfig getifaddr en1  # Wi-Fi on Mac Studio

Expected: 192.168.68.61 (or similar)

Problem: Tasks not updating (stale data)

Check database integrity:

sqlite3 ~/system/databases/mission-control.db "PRAGMA integrity_check;"

Expected: ok

Check last write:

ls -lh ~/system/databases/mission-control.db

Restart dashboard:

launchctl unload ~/Library/LaunchAgents/com.john.mc-dashboard.plist
launchctl load ~/Library/LaunchAgents/com.john.mc-dashboard.plist

Problem: 500 errors in UI

Check server logs:

tail -f ~/system/logs/mc-dashboard.log ~/system/logs/mc-dashboard.err

Check database:

sqlite3 ~/system/databases/mission-control.db "SELECT * FROM tasks LIMIT 1;"

Common causes:

  1. Database schema mismatch - migrate database
  2. Corrupted task data - fix in SQLite
  3. Node.js error - check stack trace in error log

CLI Integration

Mission Control has two interfaces:

  1. Dashboard (UI) - http://localhost:3030
  2. CLI - node ~/system/tools/mc.js

Both read/write the same SQLite database: ~/system/databases/mission-control.db

CLI Commands

# List tasks
node ~/system/tools/mc.js list
node ~/system/tools/mc.js list --owner john

# Start task (creates /tmp/mc-active-task)
node ~/system/tools/mc.js start <id>

# Complete task
node ~/system/tools/mc.js done <id> "outcome summary"

# Pause task (removes /tmp/mc-active-task)
node ~/system/tools/mc.js pause <id>

# Block task
node ~/system/tools/mc.js block <id> "blocker reason"

# Show full details
node ~/system/tools/mc.js show <id>

# Who's working on what
node ~/system/tools/mc.js active

Dependencies


Backup

Database Backup

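# .backup takes a consistent copy even while the dashboard is writing (a plain cp may not)
sqlite3 ~/system/databases/mission-control.db ".backup '$HOME/backups/mission-control-$(date +%Y%m%d-%H%M%S).db'"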

Automated Backup (daily)

Add to crontab or LaunchAgent:

0 2 * * * /usr/bin/sqlite3 ~/system/databases/mission-control.db ".backup '$HOME/backups/mission-control-$(date +\%Y\%m\%d).db'"

Restore from Backup

# Stop dashboard
launchctl unload ~/Library/LaunchAgents/com.john.mc-dashboard.plist

# Restore database
cp ~/backups/mission-control-YYYYMMDD-HHMMSS.db ~/system/databases/mission-control.db

# Start dashboard
launchctl load ~/Library/LaunchAgents/com.john.mc-dashboard.plist

Configuration

LaunchAgent Plist

Path: ~/Library/LaunchAgents/com.john.mc-dashboard.plist

Key settings:

Application Config

Port: 3030 (hardcoded in mc-dashboard.js)
Database: ~/system/databases/mission-control.db (hardcoded)
Auto-refresh: 30 seconds (client-side)

To change port:

  1. Edit ~/system/tools/mc-dashboard.js
  2. Change const PORT = 3030; to desired port
  3. Restart LaunchAgent

Mission Control Session Worker

LaunchAgent: com.john.mc-session-worker
Purpose: Background daemon for session-level task monitoring

Status check:

launchctl list | grep mc-session-worker

Notes


Last updated: 2026-02-10 | Maintained by: John (AI Director)

Planka Runbook

Last Verified: 2026-02-17 | Owner: John

Runbook: Planka

Service Type: Kanban Board / Project Management
Container: planka (ghcr.io/plankanban/planka:2.0.0-rc.4)
Ports: 3100 (external) → 1337 (internal)
External URL: https://boards.basicconsulting.no
Database: PostgreSQL 15 (planka-db)
Compose File: ~/system/services/planka/docker-compose.yml


Service Info

Planka is the visual project management tool for BasicAS Group, providing Kanban-style boards for task tracking.

Stack:

External Access:

Admin Access:


Status Check

Container Health

docker ps | grep planka

Expected output:

planka        Up X hours (healthy)
planka-db     Up X hours (healthy)

HTTP Check

curl -I http://localhost:3100

Expected: 200 OK or 302 Found

External Access Check

curl -I https://boards.basicconsulting.no

Expected: 200 OK or 302 Found

Database Check

docker exec planka-db psql -U postgres -d planka -c "SELECT count(*) FROM user_account;"

Restart Procedure

Quick Restart (Container Only)

docker restart planka

Full Stack Restart (Container + Database)

cd ~/system/services/planka
docker compose down
docker compose up -d

Wait 30 seconds for healthcheck to pass, then verify:

docker ps | grep planka
curl -I http://localhost:3100

Troubleshooting

Problem: Container won't start

Check logs:

docker logs planka --tail 100

Common causes:

  1. Database not ready - wait 30s and retry
  2. Port 3100 already bound - check lsof -i :3100
  3. Volume permission issues - check docker volumes

Fix:

cd ~/system/services/planka
docker compose down
docker compose up -d planka-db
sleep 30
docker compose up -d planka

Problem: Login issues (can't sign in with admin credentials)

Check environment variables:

docker exec planka env | grep DEFAULT_ADMIN

Expected:

DEFAULT_ADMIN_EMAIL=john@basicconsulting.no
DEFAULT_ADMIN_PASSWORD=<password from docker-compose.yml>
DEFAULT_ADMIN_NAME=John AI
DEFAULT_ADMIN_USERNAME=john

If the admin account was changed in the UI, the default credentials no longer work. List the accounts in the database to find the current login:

docker exec planka-db psql -U postgres -d planka -c "SELECT email, username FROM user_account;"

Problem: 502 Bad Gateway (external access)

Check container is running:

docker ps | grep planka

Check Cloudflare tunnel:

cloudflared tunnel info mattermost   # boards.basicconsulting.no is one of the routes in the shared ~/.cloudflared/config.yml

Check BASE_URL:

docker exec planka env | grep BASE_URL

Expected: BASE_URL=https://boards.basicconsulting.no

Problem: Database connection issues

Check database health:

docker exec planka-db pg_isready -U postgres -d planka

Check connection string:

docker exec planka env | grep DATABASE_URL

Expected: DATABASE_URL=postgresql://postgres@planka-db/planka


API Access

Planka has a REST API. Example:

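Get Projects (requires auth token)

curl -H "Authorization: Bearer <TOKEN>" http://localhost:3100/api/projects

Get a Single Board

curl -H "Authorization: Bearer <TOKEN>" "http://localhost:3100/api/boards/<BOARD_ID>"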

Get Token:

  1. Login via UI
  2. Inspect browser Network tab → find accessToken in response
  3. Or use user credentials to authenticate programmatically
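Option 3 can be scripted against the POST /api/access-tokens endpoint that the Ops Agent runbook uses for its Planka check. A sketch, assuming the JSON response carries the token in a top-level "item" field (verify against the running Planka version); planka_token_from_json is a hypothetical helper:

```shell
#!/usr/bin/env bash
# planka_token_from_json
# Hypothetical helper: reads the JSON response of POST /api/access-tokens on
# stdin and prints the token, assumed to be in the top-level "item" field.
# Exits non-zero with a message on stderr if no token is present (bad login).
planka_token_from_json() {
  python3 -c '
import json, sys
data = json.load(sys.stdin)
token = data.get("item") if isinstance(data, dict) else None
if not isinstance(token, str) or not token:
    sys.exit("no token in response: " + json.dumps(data)[:200])
print(token)
'
}
```

Example, then use $TOKEN in place of <TOKEN> in the Authorization: Bearer header above:

TOKEN=$(curl -s -X POST http://localhost:3100/api/access-tokens -H "Content-Type: application/json" -d '{"emailOrUsername":"john","password":"<PASSWORD>"}' | planka_token_from_json)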

Dependencies

No dependencies on other local services.


Backup

Database Dump

docker exec planka-db pg_dump -U postgres planka | gzip > ~/backups/planka-$(date +%Y%m%d-%H%M%S).sql.gz

Docker Volumes (includes file uploads)

docker run --rm -v planka-data:/data -v ~/backups:/backup alpine tar -czf /backup/planka-data-$(date +%Y%m%d-%H%M%S).tar.gz -C /data .
docker run --rm -v planka-db-data:/data -v ~/backups:/backup alpine tar -czf /backup/planka-db-data-$(date +%Y%m%d-%H%M%S).tar.gz -C /data .

Restore from Backup

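# Stop the app only; planka-db must keep running for the restore
docker stop planka

# Recreate an empty database (the plain pg_dump above contains no DROP statements)
docker exec planka-db dropdb -U postgres planka
docker exec planka-db createdb -U postgres planka

# Restore database
gunzip -c ~/backups/planka-YYYYMMDD-HHMMSS.sql.gz | docker exec -i planka-db psql -U postgres -d planka

# Restore file uploads (if needed)
docker run --rm -v planka-data:/data -v ~/backups:/backup alpine tar -xzf /backup/planka-data-YYYYMMDD-HHMMSS.tar.gz -C /data

# Restore the raw DB volume only as an alternative to the SQL dump, with the whole stack stopped (docker compose down)
docker run --rm -v planka-db-data:/data -v ~/backups:/backup alpine tar -xzf /backup/planka-db-data-YYYYMMDD-HHMMSS.tar.gz -C /data

# Start service
cd ~/system/services/planka
docker compose up -d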

Configuration

Key Environment Variables

Full config: ~/system/services/planka/docker-compose.yml


Notes


Last updated: 2026-02-10 | Maintained by: John (AI Director)

Ops Agent Runbook

Last Verified: 2026-02-17 | Owner: John

Runbook: Ops Agent

Service: com.john.ops-agent
Type: LaunchAgent daemon (Node.js)
Interval: Every 5 minutes (300s)
Location: ~/system/daemons/ops-agent.js
Plist: ~/Library/LaunchAgents/com.john.ops-agent.plist
Owner: John
Cost: $0 (Ollama local AI)


What It Does

Autonomous operations agent that runs 24/7:

  1. MM Monitoring — reads all 4 Mattermost teams (basic, wizard, rendrom, riad)
  2. Message Classification — Ollama llama3.1:8b classifies: ROUTINE / TASK / INCIDENT
  3. Intelligent Response — Ollama qwen2.5-coder:32b generates contextual MM replies
  4. Task Creation — creates MC tasks with BILLABLE/INTERNAL tag + Planka cards
  5. Health Monitoring — runs health-check.js (Docker, HTTP, system, daemons)
  6. Auto-Fix — auto-fix.js for known issues (max 3 attempts/hour/service)
  7. Escalation — creates HIGH priority MC task + MM alert when it can't resolve
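The "max 3 attempts/hour/service" rule in item 6 is a sliding-window rate limit. A minimal sketch of that idea in shell; fix_allowed and its one-line-per-attempt text history are hypothetical and illustrative only (the real tracking in /tmp/ops-fix-history.json is JSON and may work differently):

```shell
#!/usr/bin/env bash
# fix_allowed SERVICE HISTORY_FILE [NOW_EPOCH]
# Hypothetical helper: allows at most 3 fix attempts per SERVICE within any
# 3600-second window. HISTORY_FILE holds lines "<service> <epoch-seconds>".
# If fewer than 3 recent attempts exist, records a new one and returns 0;
# otherwise returns 1 without recording. Old lines are never pruned here.
fix_allowed() {
  local service=$1 history=$2 now=${3:-$(date +%s)}
  local max=3 window=3600 recent
  touch "$history"
  recent=$(awk -v s="$service" -v now="$now" -v w="$window" \
    '$1 == s && now - $2 < w { n++ } END { print n + 0 }' "$history")
  if [ "$recent" -ge "$max" ]; then
    return 1
  fi
  echo "$service $now" >> "$history"
}
```

Example (the third argument, a fixed timestamp, makes the window easy to test):

fix_allowed bookstack /tmp/example-fix-history.txt && docker restart bookstack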

Status Check

# Is it running?
launchctl list | grep ops-agent

# Recent activity
tail -50 ~/system/logs/ops-agent.log

# LaunchAgent stdout/stderr
tail -20 ~/system/logs/ops-agent-launchd.log
tail -20 ~/system/logs/ops-agent-launchd-error.log

# State file
cat /tmp/ops-agent-state.json

# Stats
cat ~/system/agents/state/ops.json

Restart

# Graceful restart (unload + load)
launchctl unload ~/Library/LaunchAgents/com.john.ops-agent.plist
launchctl load ~/Library/LaunchAgents/com.john.ops-agent.plist

# Verify
launchctl list | grep ops-agent

Manual Run (Testing)

# Run one cycle manually
node ~/system/daemons/ops-agent.js

# Watch output in real-time
node ~/system/daemons/ops-agent.js 2>&1 | tee /dev/tty

Troubleshooting

Ops agent not running

# Check if loaded
launchctl list | grep ops-agent
# Expected: "-  0  com.john.ops-agent"

# If not loaded:
launchctl load ~/Library/LaunchAgents/com.john.ops-agent.plist

# If load fails, check plist:
plutil -lint ~/Library/LaunchAgents/com.john.ops-agent.plist

Not processing messages

# Check state — is last_check_ms recent?
cat /tmp/ops-agent-state.json | python3 -m json.tool

# Check MM connectivity
node ~/system/tools/mm.js status

# Check MM token
cat /tmp/mm-token.json | python3 -m json.tool

# Verify Mattermost is up
curl -s http://localhost:8065/api/v4/system/ping
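To answer "is last_check_ms recent?" without converting epoch milliseconds by hand, a sketch assuming last_check_ms is a top-level field holding epoch milliseconds (the field name comes from the comment above; its placement and unit are assumptions); state_age_minutes is a hypothetical helper. With a 5-minute cycle, an age well above 10 minutes suggests the agent has stalled:

```shell
#!/usr/bin/env bash
# state_age_minutes [STATE_FILE] [NOW_MS]
# Hypothetical helper: prints how many whole minutes ago last_check_ms was.
# Defaults to /tmp/ops-agent-state.json and the current time.
state_age_minutes() {
  python3 - "${1:-/tmp/ops-agent-state.json}" "${2:-}" <<'PY'
import json
import sys
import time

path, now_arg = sys.argv[1], sys.argv[2]
with open(path) as fh:
    last_ms = int(json.load(fh)["last_check_ms"])
now_ms = int(now_arg) if now_arg else int(time.time() * 1000)
print((now_ms - last_ms) // 60000)
PY
}
```

Example:

[ "$(state_age_minutes)" -le 10 ] || echo "ops-agent may be stalled"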

Classification wrong (Ollama issues)

# Check Ollama is running
curl -s http://localhost:11434/api/tags | python3 -m json.tool

# Test classification manually
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Classify: ROUTINE, TASK, or INCIDENT. Reply ONE word.\n\nMessage: Can you fix the login page?",
  "stream": false,
  "options": {"temperature": 0.1, "num_predict": 10}
}' | python3 -c "import sys,json; print(json.load(sys.stdin)['response'])"

# If Ollama down, ops-agent falls back to keyword heuristics (still works)

Health check reporting false positives

# Run health check directly
node ~/system/tools/health-check.js

# JSON output for debugging
node ~/system/tools/health-check.js --json 2>/dev/null | python3 -m json.tool

# Check specific service
docker ps --format '{{.Names}} {{.Status}}' | grep <service>

Auto-fix loop (service keeps restarting)

# Check fix history (max 3/hour enforcement)
cat /tmp/ops-fix-history.json | python3 -m json.tool

# Clear fix history (reset counter)
rm /tmp/ops-fix-history.json

# Check auto-fix directly
node ~/system/tools/auto-fix.js <service> <issue>

Planka card not created

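# Check Planka is up and the login works (a JSON response containing a token means OK)
# <PLANKA_PASSWORD> is normally DEFAULT_ADMIN_PASSWORD from ~/system/services/planka/docker-compose.yml
curl -s http://localhost:3100/api/access-tokens -X POST \
  -H "Content-Type: application/json" \
  -d '{"emailOrUsername":"john","password":"<PLANKA_PASSWORD>"}'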

# Check ops-agent log for Planka errors
grep "Planka" ~/system/logs/ops-agent.log | tail -10

Dependencies

Service             Required   Fallback
Mattermost (8065)   YES        Agent skips MM check cycle
Ollama (11434)      NO         Falls back to keyword classification
MC (mc.js)          YES        Tasks not created (error logged)
Planka (3100)       NO         Cards not created (task still created in MC)
HiveMind            NO         Intel not posted (ops still works)

Configuration

Monitored MM Teams

Defined in ops-agent.js. Currently: basic, wizard, rendrom, riad

Ignored Users (bots)

john, edita, system-bot, boards, calls, tester — defined by user ID in ops-agent.js

Billable Logic

Health Check Services

Defined in health-check.js:


Files

| File | Purpose |
|---|---|
| ~/system/daemons/ops-agent.js | Main daemon code |
| ~/Library/LaunchAgents/com.john.ops-agent.plist | LaunchAgent config |
| ~/system/tools/health-check.js | Service health monitor |
| ~/system/tools/auto-fix.js | Automated recovery |
| ~/system/agents/identities/ops.md | Agent identity card |
| ~/system/agents/state/ops.json | Persistent state |
| /tmp/ops-agent-state.json | Runtime state (last check timestamp) |
| /tmp/mm-token.json | Cached MM auth token |
| /tmp/ops-fix-history.json | Auto-fix attempt tracking |
| ~/system/logs/ops-agent.log | Activity log |
| ~/system/logs/ops-agent-launchd.log | LaunchAgent stdout |
| ~/system/logs/ops-agent-launchd-error.log | LaunchAgent stderr |

Disaster Recovery

Complete reset

# 1. Stop daemon
launchctl unload ~/Library/LaunchAgents/com.john.ops-agent.plist

# 2. Clear state
rm -f /tmp/ops-agent-state.json /tmp/mm-token.json /tmp/ops-fix-history.json

# 3. Restart
launchctl load ~/Library/LaunchAgents/com.john.ops-agent.plist

# Note: after a reset, the first run only checks messages from the last 30 minutes (default lookback)
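The 30-minute default means a reset does not replay old message history. The sketch below shows how the start of the message window might be derived from the last-check timestamp in /tmp/ops-agent-state.json. It assumes the field is named last_check_ms and holds epoch milliseconds. The 30-minute constant is the documented default; the rest is illustrative.

```javascript
const fs = require('fs');

// Sketch: choose the start of the Mattermost message window for the next cycle.
// Assumes the state file holds { "last_check_ms": <epoch ms> }; 30 min is the documented default.
const DEFAULT_LOOKBACK_MS = 30 * 60 * 1000;

function loadState(file) {
  try {
    return JSON.parse(fs.readFileSync(file, 'utf8'));
  } catch {
    return null; // missing or corrupt state behaves like a fresh start
  }
}

function sinceTimestamp(state, now = Date.now()) {
  const last = state ? Number(state.last_check_ms) : NaN;
  // Missing, invalid, or future timestamps fall back to the default window.
  if (!Number.isFinite(last) || last <= 0 || last > now) return now - DEFAULT_LOOKBACK_MS;
  return last;
}

// Usage: sinceTimestamp(loadState('/tmp/ops-agent-state.json'))
```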

Rollback to mm-responder

# 1. Stop ops-agent
launchctl unload ~/Library/LaunchAgents/com.john.ops-agent.plist

# 2. Restore mm-responder
cp ~/system/archive/mm-responder.sh.archived-2026-02-10 ~/system/daemons/mm-responder.sh
chmod +x ~/system/daemons/mm-responder.sh
launchctl load ~/Library/LaunchAgents/com.john.mm-responder.plist

# 3. Update health-check.js daemon list (add mm-responder, remove ops-agent)

Metrics

Check via MC:

node ~/system/tools/mc.js stats          # Task creation stats
node ~/system/tools/mc.js list --owner ops  # Tasks created by ops-agent

Check via state:

cat ~/system/agents/state/ops.json       # Cumulative stats
cat /tmp/ops-agent-state.json            # Current cycle stats

Created: 2026-02-10 Last Updated: 2026-02-10 Next Review: 2026-03-10

Service Registry

Last Verified: 2026-02-17 | Owner: John

Service Registry — ALAI Holding

Last Updated: 2026-02-12 Owner: John (AI Director)


Domains

| Domain | Registrar | Nameservers | Points To | Purpose | Renewal |
|---|---|---|---|---|---|
| basicconsulting.no | one.com | Cloudflare | Cloudflare Tunnel | Consulting brand | Check one.com |
| mm.basicconsulting.no | | | Cloudflare Tunnel → localhost:8065 | Mattermost | |
| sign.basicconsulting.no | | | Cloudflare Tunnel → localhost:3003 | Documenso | |
| boards.basicconsulting.no | | | Cloudflare Tunnel → localhost:3100 | Planka | |
| vault.basicconsulting.no | | | Cloudflare Tunnel → localhost:8200 | Vaultwarden | |
| alai.no | one.com | Vercel | Vercel | ALAI Holding website | Check one.com |
| getdrop.no | one.com | Vercel (pending) | Vercel → drop-landing | Drop fintech landing | Check one.com |
| basicfakta.no | one.com | Vercel | Vercel | BasicFakta SaaS | Check one.com |

Hosting & Deploy

| Service | Platform | URL | Deploy Method |
|---|---|---|---|
| Drop landing | Vercel | getdrop.no | vercel --prod from ~/ALAI/products/Drop/landing |
| ALAI website | Vercel | alai.no | vercel --prod from ~/ALAI/web |
| BasicFakta | Vercel | basicfakta.no | TBD |

Local Services (Mac Studio M3 Ultra, 96GB)

| Service | Type | Port | Domain | Purpose | Status |
|---|---|---|---|---|---|
| Mattermost | Docker | 8065 | mm.basicconsulting.no | Team chat | Active |
| Planka | Docker | 3100 | boards.basicconsulting.no | Kanban boards | Active |
| Documenso | Docker | 3003 | sign.basicconsulting.no | E-signatures | Active |
| BookStack | Docker | 6875 | LAN only (no tunnel) | Internal wiki | Active |
| Vaultwarden | Docker | 8200 | vault.basicconsulting.no | Password manager | Active |
| MC Dashboard | Node.js | 3030 | localhost (LAN) | Mission Control | Active |
| Ollama | Native | 11434 | localhost | Local AI | Active |
| n8n | Docker | 5678 | localhost | Workflow automation | Active |
| MinIO | Docker | 9000 | localhost | S3 storage (Documenso) | Active |

Cloudflare

| Item | Value |
|---|---|
| Account ID | d0ac2afb6bb5b298723b85a114151a04 |
| Tunnel ID | 3315a609-7934-45c5-ad0c-56d86d16374d |
| CLI | /opt/homebrew/bin/cloudflared |
| Zone | basicconsulting.no |

Email

| Address | Provider | Purpose |
|---|---|---|
| john@basicconsulting.no | one.com | Support / John agent |
| info@basicconsulting.no | one.com | Edita / general |
| alem@basicconsulting.no | one.com | CEO |
| post@alai.no | TBD | Drop + ALAI public contact |

Accounts & SaaS

| Service | URL | Purpose | Owner |
|---|---|---|---|
| Vercel | vercel.com | Static hosting | john-3447 |
| Cloudflare | dash.cloudflare.com | DNS, tunnel, CDN | Alem |
| one.com | one.com | Domain registrar + email | Alem |
| GitHub | github.com | Code repos | TBD |
| Fiken | fiken.no | Accounting | Alem |
| Flowcase | everdeen.flowcase.com | CV management | Alem |

Daemons (LaunchAgents)

| Daemon | Interval | Purpose |
|---|---|---|
| com.john.ops-agent | 5 min | MM monitoring, health, auto-fix |
| com.john.mc-dashboard | always | Web dashboard :3030 |
| com.john.mc-session-worker | events | Session state extraction |
| com.john.morning-routine | 07:00 | Daily briefing |
| com.john.agentforge | 4h | Auto-audit agents |
| com.john.mm-bridge | 5s poll | Alem→John chat (#ceo) |
| com.edita.autowork | 30 min | Background task worker |
| com.john.health-check | 5 min | Service health monitoring |
| com.john.email-agent | 5 min | Email triage |
| com.john.intake-watcher | 5 min | Email→task pipeline |
| com.edita.job-hunter | periodic | Opportunity scanning |
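A quick way to audit these daemons is the output of launchctl list. It prints three columns: PID (or "-" when the job is not running), last exit status, and label, as in the expected "-  0  com.john.ops-agent" line of the ops-agent runbook. The sketch below flags labels from the table above that are not loaded or whose last exit status is non-zero. Interval daemons normally show "-" for PID between runs, so a missing PID alone is not treated as a failure. This is an illustration, not the logic of daemon-health.sh.

```javascript
// Sketch: audit `launchctl list` output (columns: PID, last exit status, label)
// against the daemon labels in the table above. Not the logic of daemon-health.sh.
const EXPECTED = [
  'com.john.ops-agent', 'com.john.mc-dashboard', 'com.john.mc-session-worker',
  'com.john.morning-routine', 'com.john.agentforge', 'com.john.mm-bridge',
  'com.edita.autowork', 'com.john.health-check', 'com.john.email-agent',
  'com.john.intake-watcher', 'com.edita.job-hunter',
];

function parseLaunchctlList(output) {
  const rows = new Map();
  for (const line of output.split('\n')) {
    const [pid, status, label] = line.trim().split(/\s+/);
    // Skips the "PID Status Label" header and blank lines (status is not numeric there).
    if (!label || Number.isNaN(Number(status))) continue;
    rows.set(label, { pid: pid === '-' ? null : Number(pid), status: Number(status) });
  }
  return rows;
}

function auditDaemons(output, expected = EXPECTED) {
  const rows = parseLaunchctlList(output);
  return {
    // launchctl does not know the label at all.
    notLoaded: expected.filter((label) => !rows.has(label)),
    // Loaded, but the last run exited non-zero (negative values mean killed by a signal).
    failing: expected.filter((label) => rows.has(label) && rows.get(label).status !== 0),
  };
}
```

To run it live, capture the input with require('child_process').execSync('launchctl list', { encoding: 'utf8' }).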

Maintenance Notes


Daemons & Services

Tools Manifest

CHECK THIS BEFORE CREATING NEW TOOLS. If a tool exists, use it. If you create a new tool, add it here.

TOOL-FIRST PROTOCOL: ~/system/rules/tool-first-protocol.md
Order: Our tools → Our skills → Our knowledge base (HiveMind) → Internet → Update the knowledge base

Last audit: 2026-02-13 — Spring cleaning: 22 deprecated tools archived, 3 empty DBs deleted, 1 broken daemon unloaded, MEMORY.md trimmed 229→184 lines.

Task Management

| Tool | Command | Description |
|---|---|---|
| task.sh | ~/system/tools/task.sh list\|add\|start\|done\|block | Task CLI using Taskwarrior 3 (cross-session) |
| mc.js | node ~/system/tools/mc.js list\|add\|start\|done\|show\|routes | Mission Control - Task management with agent routing |
| mc.js routes | node ~/system/tools/mc.js routes | List available task routes (backend, frontend, devops, qa, bizdev, general) |
| mc.js add --route | node ~/system/tools/mc.js add "Task" --route backend | Create task with route - auto-spawns agent on start |

Task → Agent Routing: MC tasks can be tagged with routes that automatically spawn the appropriate Ollama agent when the task starts.

Briefings & Analysis

| Tool | Command | Description |
|---|---|---|
| ceo-briefing.js | node ~/system/tools/ceo-briefing.js --full | Law #11 (ZAKON #11): All-source CEO briefing (5 email accounts, MC tasks, HiveMind, sessions, daemon briefing). Zero LLM cost. |
| ceo-briefing.js | node ~/system/tools/ceo-briefing.js --quick | Quick boot summary (counts + top items, <500 tokens). Called by boot.sh. |
| ceo-briefing.js | node ~/system/tools/ceo-briefing.js --email | All 5 email accounts: inbox + sent for each. |
| ceo-briefing.js | node ~/system/tools/ceo-briefing.js --followup | Open/blocked MC tasks overview. |
| ceo-briefing.js | node ~/system/tools/ceo-briefing.js --topic "X" | Topic search across sessions + HiveMind + all email accounts. |
| council-briefing.js | node ~/system/tools/council-briefing.js | AI Council: 4 personas (Growth, Revenue, Skeptic, Ops) analyze business data via Ollama. Posts to Slack #exec. Nightly at 22:00. |
| council-briefing.js | node ~/system/tools/council-briefing.js --model 70b | Use 70b model for deeper analysis |
| council-briefing.js | node ~/system/tools/council-briefing.js --dry-run | Gather data only, no Ollama/Slack |
| meeting-prep.js | node ~/system/tools/meeting-prep.js [--ics file.ics] [--date YYYY-MM-DD] | Calendar-aware meeting prep: ICS parsing, CRM attendee lookup, pipeline context, contextual notes. |
| john-morning.sh | bash ~/system/tools/john-morning.sh | Morning routine: Quran, tasks, HiveMind, health, daily synthesis. Daily at 07:00. |
| memory-synthesizer.js | node ~/system/tools/memory-synthesizer.js daily [date] | Summarize the day's intel → HiveMind memo. Runs automatically in morning-routine. |
| memory-synthesizer.js | node ~/system/tools/memory-synthesizer.js weekly | Synthesize week → HiveMind memo. Runs automatically Sundays 23:00. |
| memory-synthesizer.js | node ~/system/tools/memory-synthesizer.js promote | Promote weekly → long-term knowledge |
| memory-synthesizer.js | node ~/system/tools/memory-synthesizer.js prune | Delete daily memos older than 30 days |
| memory-synthesizer.js | node ~/system/tools/memory-synthesizer.js view [tier] | View tiered memory (daily/weekly/longterm) |

Meeting & Transcript Processing

| Tool | Command | Description |
|---|---|---|
| transcript-to-tasks.js | node ~/system/tools/transcript-to-tasks.js <file> | Extract action items from meeting transcript → MC tasks via Ollama |
| transcript-to-tasks.js | node ~/system/tools/transcript-to-tasks.js <file> --preview | Preview extracted actions (no task creation) |
| transcript-to-tasks.js | node ~/system/tools/transcript-to-tasks.js <file> --owner john | Assign all extracted tasks to owner |

Formats: .txt, .md, .srt, .vtt. Tasks prefixed with [TRANSCRIPT].
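Subtitle files (.srt, .vtt) carry cue numbers, timestamps, and, for .vtt, a WEBVTT header. All of these add noise to a prompt. Whether transcript-to-tasks.js strips them itself is not documented here. The sketch below shows one way to reduce such a file to plain dialogue before extraction, keeping WebVTT speaker tags as "Name: " so owners can still be inferred.

```javascript
// Sketch: reduce .srt/.vtt subtitle text to plain dialogue lines before extraction.
// Cue-number and timestamp patterns follow the common SRT/WebVTT layouts.
const TIMESTAMP = /^(\d{2}:)?\d{2}:\d{2}[.,]\d{3}\s+-->\s+/;

function subtitlesToText(raw) {
  const kept = [];
  for (const line of raw.replace(/\r\n/g, '\n').split('\n')) {
    const t = line.trim();
    if (!t) continue;                  // blank line between cues
    if (/^WEBVTT\b/.test(t)) continue; // WebVTT file header
    if (/^\d+$/.test(t)) continue;     // SRT cue number
    if (TIMESTAMP.test(t)) continue;   // "00:00:01,000 --> 00:00:04,000"
    // Keep WebVTT speaker tags as "Name: ", drop any other inline tags.
    kept.push(t.replace(/^<v\s+([^>]+)>/, '$1: ').replace(/<[^>]+>/g, ''));
  }
  return kept.join('\n');
}
```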

Health & Quality

| Tool | Command | Description |
|---|---|---|
| drift-detector.js | node ~/system/tools/drift-detector.js snapshot | Behavioral drift analysis engine: records daily metrics from 5 data sources (session claims, verification audits, email-audit.db, mission-control.db, hivemind.db) to drift.db. Anomaly detection with σ-based thresholds. Alerts to HiveMind + Slack. Daily at 23:55 via com.john.drift-detector LaunchAgent. Created: 2026-02-23. |
| drift-detector.js | node ~/system/tools/drift-detector.js analyze [--days N] | Analyze recent metric trends (default: 7 days). Returns trend, per-metric stats, anomalies. |
| drift-detector.js | node ~/system/tools/drift-detector.js report [--days N] | Human-readable drift report (default: 30 days). |
| drift-detector.js | node ~/system/tools/drift-detector.js alert-test | Test alert pipeline (HiveMind + Slack). |
| daemon-health.sh | bash ~/system/daemons/daemon-health.sh | Daemon health monitor with Slack alerts: monitors ALL com.john.* LaunchAgents, sends alerts to #alerts channel for failures/warnings/recoveries, runs every 15 min via LaunchAgent. Created: 2026-02-23. |
| daemon-health.sh | bash ~/system/daemons/daemon-health.sh --status | Show current daemon status (KeepAlive vs interval-based) |
| daemon-health.sh | bash ~/system/daemons/daemon-health.sh --test | Test Slack alert integration |
| stbs-health.js | node ~/system/tools/stbs-health.js | STBS v3 production monitoring: 5 hardening components (SQLite BUSY retry, heartbeat, optimistic lock, approval tokens, session staleness). MC #1724. |
| stbs-health.js | node ~/system/tools/stbs-health.js --json | JSON output (for ops-watchdog integration) |
| stbs-health.js | node ~/system/tools/stbs-health.js --alert | Alert mode (exit 1 if any threshold exceeded) |
| stbs-health.js | node ~/system/tools/stbs-health.js --metric <name> | Check specific metric only |
| md-health.js | node ~/system/tools/md-health.js | Markdown health scanner: broken links, TODOs, empty files, stale dates. Integrated in AgentForge. |
| md-health.js | node ~/system/tools/md-health.js --json | JSON output (for programmatic use) |
| md-health.js | node ~/system/tools/md-health.js --fix-todos | List all TODOs across codebase |
| md-health.js | node ~/system/tools/md-health.js ~/path | Scan specific path |
| doc-index.sh | bash ~/system/tools/doc-index.sh [--output file.json] [--verbose] | Document indexer: scans ~/projects, ~/ALAI, ~/companies for all markdown files. Creates JSON index with metadata (path, category, size, modified). Output: ~/system/databases/doc-index.json |
| doc-index.sh | bash ~/system/tools/doc-index.sh --verbose | Verbose mode: shows progress and breakdown by category |
| bookstack-sync.js | node ~/system/tools/bookstack-sync.js sync | Sync system docs to BookStack wiki (full sync) |
| bookstack-sync.js | node ~/system/tools/bookstack-sync.js status | Show what needs syncing (new/changed/ok) |
| bookstack-sync.js | node ~/system/tools/bookstack-sync.js push | Force overwrite all pages |
| bookstack-sync.js | node ~/system/tools/bookstack-sync.js auto-sync | Auto-sync changed files (daemon mode) |
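The drift-detector.js entries above mention anomaly detection with σ-based thresholds. Below is a minimal sketch of that general technique: compare today's value with the mean and standard deviation of earlier days, and flag it when it lies more than k standard deviations away. The threshold k = 2, the three-point minimum, and the function names are assumptions for illustration, not the tool's settings.

```javascript
// Sketch of sigma-based anomaly detection (z-score against a baseline window).
// k = 2 and minPoints = 3 are assumed defaults, not drift-detector.js settings.
function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

function stddev(xs) {
  const m = mean(xs);
  // Population standard deviation of the baseline.
  return Math.sqrt(xs.reduce((acc, x) => acc + (x - m) ** 2, 0) / xs.length);
}

function detectAnomaly(baseline, value, { k = 2, minPoints = 3 } = {}) {
  if (baseline.length < minPoints) return { anomaly: false, z: null }; // too little history
  const m = mean(baseline);
  const sd = stddev(baseline);
  if (sd === 0) {
    // A perfectly flat baseline: any change at all is an outlier.
    const z = value === m ? 0 : Math.sign(value - m) * Infinity;
    return { anomaly: value !== m, z };
  }
  const z = (value - m) / sd;
  return { anomaly: Math.abs(z) > k, z };
}
```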

BookStack Sync v2 Features (2026-02-18):

| Tool | Command | Description |
|---|---|---|
| bookstack-staleness.js | node ~/system/tools/bookstack-staleness.js | Scan all pages, tag stale ones, generate report |
| bookstack-staleness.js | node ~/system/tools/bookstack-staleness.js --dry-run | Scan and report only (no tagging) |
| bookstack-staleness.js | node ~/system/tools/bookstack-staleness.js --slack | Post report to Slack #general |
| bookstack-webhook-relay.js | Service running on localhost:3077/webhook (internal only) | Receives BookStack webhook events and forwards to Slack |

Backup & Data Protection

| Tool | Command | Description |
|---|---|---|
| db-backup.sh | bash ~/system/daemons/db-backup.sh | Safe daily backup of all SQLite databases using sqlite3 .backup. 30-day retention. Daily at 03:00 via LaunchAgent. |
| db-backup-verify.sh | bash ~/system/tools/db-backup-verify.sh | Verify backup integrity for today's backups. Checks file size and runs PRAGMA integrity_check on all backups. |
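db-backup.sh keeps 30 days of backups. The sketch below shows the retention decision, assuming backup files carry a YYYY-MM-DD date in their names (the real naming scheme is not documented here). It only selects names to delete and leaves the deletion to the caller. Files without a recognizable date are never selected.

```javascript
// Sketch: pick backup files older than the retention window.
// Assumes names like "hivemind-2026-02-10.db" (hypothetical naming scheme).
const RETENTION_DAYS = 30;
const DAY_MS = 24 * 60 * 60 * 1000;

function backupDate(name) {
  const m = name.match(/(\d{4})-(\d{2})-(\d{2})/);
  return m ? Date.UTC(Number(m[1]), Number(m[2]) - 1, Number(m[3])) : null;
}

// today: "YYYY-MM-DD". Dates are compared as UTC days, so DST changes cannot shift the result.
function expiredBackups(names, today, retentionDays = RETENTION_DAYS) {
  const todayMs = backupDate(today);
  return names.filter((name) => {
    const d = backupDate(name);
    // Undated files are kept; a file exactly 30 days old is still kept.
    return d !== null && (todayMs - d) / DAY_MS > retentionDays;
  });
}
```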

Backup Strategy:

BookStack Auto-Sync:

BookStack Staleness Monitor:

BookStack Webhook Relay:

API Utilities

| Tool | Command | Description |
|---|---|---|
| api-fallback.js | require('./api-fallback') | Tiered API fallback + caching. fetchWithFallback(key, tiers, opts) tries each tier, caches result. |
| api-fallback.js | node ~/system/tools/api-fallback.js cache-stats | Show cache stats |
| api-fallback.js | node ~/system/tools/api-fallback.js cache-clear | Clear API cache |

Cache: ~/system/cache/api-fallback/ (file-based, per-key, TTL-aware)
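The fetchWithFallback(key, tiers, opts) signature comes from the table above, but its internals are not documented here. The sketch below illustrates the idea under stated assumptions: each tier is an async function, the first tier that succeeds wins, and a hypothetical opts.ttlMs sets the cache lifetime. It uses an in-memory Map instead of the file-based cache and a different function name (tieredFetch), so it is not a description of api-fallback.js itself.

```javascript
// Sketch of tiered fallback with a TTL cache. Unlike api-fallback.js (file-based cache),
// this keeps the cache in memory. tieredFetch, opts.ttlMs and opts.now are hypothetical names.
const cache = new Map(); // key -> { value, tier, expires }

async function tieredFetch(key, tiers, opts = {}) {
  const ttlMs = opts.ttlMs ?? 5 * 60 * 1000;
  const now = opts.now ?? Date.now();

  const hit = cache.get(key);
  if (hit && hit.expires > now) return { value: hit.value, tier: hit.tier, cached: true };

  const errors = [];
  for (let tier = 0; tier < tiers.length; tier++) {
    try {
      const value = await tiers[tier]();
      cache.set(key, { value, tier, expires: now + ttlMs });
      return { value, tier, cached: false };
    } catch (err) {
      // Note why this tier failed, then try the next one.
      errors.push(err instanceof Error ? err.message : String(err));
    }
  }
  throw new Error(`all tiers failed for ${key}: ${errors.join('; ')}`);
}
```

Example: tieredFetch('fx:NOK', [fetchPrimary, fetchBackup], { ttlMs: 60000 }), where fetchPrimary and fetchBackup are your own async functions.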

Usage Tracking

| Tool | Command | Description |
|---|---|---|
| usage-tracker.js | node ~/system/tools/usage-tracker.js log <agent> <model> <in> <out> | Log AI call usage (auto-hooked in agent-runner.js + council-briefing.js) |
| usage-tracker.js | node ~/system/tools/usage-tracker.js stats | Usage summary (today, month, all-time) |
| usage-tracker.js | node ~/system/tools/usage-tracker.js stats --agent <name> | Per-agent breakdown |
| usage-tracker.js | node ~/system/tools/usage-tracker.js stats --month | Daily breakdown this month |
| usage-tracker.js | node ~/system/tools/usage-tracker.js top | Top agents by cost |
| usage-tracker.js | node ~/system/tools/usage-tracker.js recent [limit] | Recent calls |

DB: ~/system/db/usage.db (SQLite). Auto-logged from agent-runner.js (Ollama) and council-briefing.js.

Session Tracking

| Tool | Command | Description |
|---|---|---|
| session-ledger.sh | Auto (Stop/PreCompact hook) | Deterministic session extraction (files, commands, topics, errors, git) |
| session-search.sh | bash ~/system/tools/session-search.sh topic\|file\|task\|keyword\|errors\|recent | Search sessions |
| daily-consolidate.sh | bash ~/system/tools/daily-consolidate.sh [YYYY-MM-DD] | Consolidate day's sessions into daily log |
| weekly-digest.sh | bash ~/system/tools/weekly-digest.sh [YYYY-MM-DD] | Generate weekly summary |

Session files: ~/system/memory/sessions/YYYY-MM-DD-HHMM-sessionid.md

Memory

| Tool | Command | Description |
|---|---|---|
| hivemind.js | node ~/system/agents/hivemind/hivemind.js read [agent] [limit] | Read shared intelligence (replaces memory-lookup.js) |
| hivemind.js | node ~/system/agents/hivemind/hivemind.js post <agent> <type> <msg> | Post intel |
| hivemind.js | node ~/system/agents/hivemind/hivemind.js query <search> | Search intel |
| hivemind.js | node ~/system/agents/hivemind/hivemind.js memo save\|get\|search\|list | Key-value memory store |
| facts.js | node ~/system/tools/facts.js save\|get\|list\|correct\|history\|display\|search\|seed | Long-running critical facts: SQLite event-sourced memory that survives context compression. Boot-injected. |
| facts.js display | node ~/system/tools/facts.js display | Compact boot output of all critical facts |
| facts.js seed | node ~/system/tools/facts.js seed [--force] | Populate/reset initial seed data |
| memory-indexer.py | python ~/system/tools/memory-indexer.py | Index memory for search |

Communication

| Tool | Command | Description |
|---|---|---|
| slack.js | node ~/system/tools/slack.js send <channel> "msg" | Send plain text message to Slack channel |
| slack.js | node ~/system/tools/slack.js sendBlocks <channel> <blocksFile> [fallback] | Send Block Kit formatted message (blocks from JSON file) |
| slack.js | node ~/system/tools/slack.js read <channel> [limit] | Read recent messages from channel |
| slack.js | node ~/system/tools/slack.js channels | List all Slack channels |
| slack.js | node ~/system/tools/slack.js create-channel <name> | Create new channel |
| slack.js | node ~/system/tools/slack.js unread | Check unread messages |
| slack.js | node ~/system/tools/slack.js users | List workspace users |
| slack.js | node ~/system/tools/slack.js status | Check Slack connection |
| slack-blocks.js | node ~/system/tools/slack-blocks.js test [channel] | Slack Block Kit formatting library; the test command sends a sample to the channel |
| slack-blocks.js | require('./slack-blocks') | Module API: builder(), tenderAlert(), tenderDigest(), emailBriefing(), emailEscalation(), weeklyPipeline(), pipelineEvent(), opsAlert(), send() |
| slack-bot.js | node ~/system/tools/slack-bot.js | Slack bot daemon: Claude Haiku via CLI (Socket Mode). AI backend: API → CLI → Ollama |
| slack-bot.js | node ~/system/tools/slack-bot.js --test | Test AI backend connection |
| email-to-task.js | node ~/system/tools/email-to-task.js --from "x" --subject "y" --message-id "z" --class ACTION [--priority high] | Auto-create MC tasks from ACTION emails with deduplication |
| email-to-task.js | node ~/system/tools/email-to-task.js --status | Show email classification stats |
| email-inbox.js | node ~/system/tools/email-inbox.js status | SQLite-backed email inbox: per-account stats (john, info, alai) |
| email-inbox.js | node ~/system/tools/email-inbox.js pending | List unanswered ACTION emails |
| email-inbox.js | node ~/system/tools/email-inbox.js search "keyword" | Full-text search in subject/from/sender name |
| email-inbox.js | node ~/system/tools/email-inbox.js mark <id> responded\|archived\|read\|ignored | Update email status |
| email-inbox.js | node ~/system/tools/email-inbox.js stale [hours] | Show emails unanswered > N hours (default 48) |
| email-inbox.js | node ~/system/tools/email-inbox.js insert --message-id "x" --account john --from-addr "x" --subject "x" --classification ACTION --priority high | Insert email into inbox DB |

| Tool | Command | Description |
|---|---|---|
| MCP email | mcp__email__emails_find | Search emails (sender, subject, date, folder). Account: "john" or "info" |
| MCP email | mcp__email__email_send | Send emails (to, subject, body, HTML, attachments) |
| MCP email | mcp__email__email_respond | Reply/forward with proper threading |
| MCP email | mcp__email__emails_modify | Mark read/unread, flag, archive, move |
| MCP email | mcp__email__folders_list | List all email folders |
| mail-native.js | node ~/system/tools/mail-native.js search\|read\|send\|reply\|forward\|folders\|unread\|flag\|move\|attachment\|test | Direct IMAP/SMTP CLI; zero MCP dependency. Works from daemons, agents, interactive. Supports --folder and --account params. |
| email-audit.js | node ~/system/tools/email-audit.js find\|stats\|recent | Centralized audit logger for ALL email operations. DB: email-audit.db. Module API: logEmail(), findEmails(), stats(), recent(). |
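The email-inbox.js stale [hours] command listed above shows emails left unanswered for more than N hours (default 48). The filtering rule can be sketched as below. The record fields (status, received_at) are assumptions, not the actual email-inbox.js schema. The sketch treats responded, archived, and ignored as closed, and counts read as still open, because a read email may still be unanswered.

```javascript
// Sketch of the "stale" rule: emails still open after more than N hours (default 48).
// Record fields (status, received_at) and the closed-status set are assumptions.
const HOUR_MS = 60 * 60 * 1000;
const CLOSED = new Set(['responded', 'archived', 'ignored']);

function staleEmails(emails, { hours = 48, now = Date.now() } = {}) {
  return emails
    .filter((e) => !CLOSED.has(e.status))
    .filter((e) => now - Date.parse(e.received_at) > hours * HOUR_MS)
    .sort((a, b) => Date.parse(a.received_at) - Date.parse(b.received_at)); // oldest first
}
```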

EMAIL RULE: ALL email operations use the MCP email tools (custom: email-mcp-bridge.js).

Slack: alai-talk.slack.com (channels: ops, development, client-support, exec)

Credential Management (Vaultwarden)

| Tool | Command | Description |
|---|---|---|
| vault.js | node ~/system/tools/vault.js get <name> | Get password from Vaultwarden by item name |
| vault.js | node ~/system/tools/vault.js get <name> --field <field> | Get specific field (custom field, username, notes) |
| vault.js | node ~/system/tools/vault.js get <name> --json | Get full item as JSON |
| vault.js | node ~/system/tools/vault.js add <name> <user> <pass> [opts] | Create new vault item (--uri, --notes, --field k=v, --hidden-field k=v) |
| vault.js | node ~/system/tools/vault.js list | List all vault items |
| vault.js | node ~/system/tools/vault.js login | Interactive unlock + cache session (no TTL, /tmp/bw-session) |
| vault.js | node ~/system/tools/vault.js migrate | Migrate 10 config files to vault (one-time) |
| vault.js | node ~/system/tools/vault.js sync | Force sync with Vaultwarden server (clears cache) |
| vault.js | node ~/system/tools/vault.js refresh | Force reload in-memory credential cache |
| password-share.js | node ~/system/tools/password-share.js create\|retrieve\|list\|cleanup\|audit | Secure one-time password sharing with clients |
| client-vault.js | node ~/system/tools/client-vault.js init\|add\|list\|get\|rotate\|check-rotation | Per-client encrypted credential storage |

Vault Module API (for other tools):

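const path = require('path');
const { homedir } = require('os');
// require() does not expand "~", so build the absolute path from the home directory.
const vault = require(path.join(homedir(), 'system/tools/vault.js'));
(async () => {
  const pass = await vault.get('Email - john@alai.no');   // password by item name
  const token = await vault.get('Slack Bot', 'token');    // specific field
  const val = await vault.getWithFallback('Slack Bot', 'token', () => jsonFallback()); // jsonFallback: your own fallback (placeholder)
  const unlocked = vault.hasSession();                     // boolean, non-throwing
})();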

Session: BW_SESSION env var → /tmp/bw-session (mode 0600, no TTL). The session key is passed via an environment variable, so it does not appear in ps aux output.
Cache: The first call loads all items (~600ms); subsequent calls take <1ms. The cache refreshes on sync, add, or refresh().
Non-TTY: Daemons get a VAULT_LOCKED error instead of hanging, so they can retry gracefully.
Vault items: AWS Console, Microsoft Azure, Vaultwarden Admin, Sentry + 10 migrated services.
Note: vault-helper.js has been DELETED; all consumers now use vault.js directly.
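Since daemons receive a VAULT_LOCKED error instead of hanging, a caller can retry a few times before giving up. The sketch below shows one such retry wrapper. How vault.js surfaces the error (here assumed to be err.code === 'VAULT_LOCKED' or the text VAULT_LOCKED in the message) is an assumption, so check vault.js before relying on it. The getter is injected, so the sketch runs without a vault.

```javascript
// Sketch: retry a vault lookup a few times while the vault is locked, then give up.
// Assumption: vault.js signals a locked vault with an Error whose code or message is VAULT_LOCKED.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function isVaultLocked(err) {
  return Boolean(err) && (err.code === 'VAULT_LOCKED' || /VAULT_LOCKED/.test(String(err.message)));
}

async function getWithRetry(getter, { attempts = 3, delayMs = 1000 } = {}) {
  for (let i = 1; i <= attempts; i++) {
    try {
      return await getter();
    } catch (err) {
      // Only a locked vault is worth retrying; anything else is a real error.
      if (!isVaultLocked(err) || i === attempts) throw err;
      await sleep(delayMs * i); // linear backoff: 1s, 2s, ...
    }
  }
}
```

Usage: getWithRetry(() => vault.get('Slack Bot', 'token')).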

Agent Infrastructure

| Tool | Command | Description |
|---|---|---|
| agent-reporter.js | node ~/system/tools/agent-reporter.js --task <id> --agent <name> --status <status> --summary <text> | Structured agent output: validates against schema, stores in mission-control.db, emits events, posts to HiveMind |
| agent-reporter.js | node ~/system/tools/agent-reporter.js --help | Show usage and examples |
| agent-reporter.js | node ~/system/tools/agent-reporter.js --task 937 --agent B1 --status completed --summary "..." --deliverables '[...]' | Full structured report with deliverables, metrics, evidence |
| schema-validator.py | PostToolUse hook on TaskUpdate | Validates agent output JSON against agent-output-schema.json, logs violations to /tmp/schema-violations.log (warning-only, never blocks) |
| goal-verifier.js | node ~/system/tools/goal-verifier.js --task <id> | Automated goal verification: reads goal-schema.json, runs verification commands, updates statuses, stores in goals.db, emits events |
| goal-verifier.js | node ~/system/tools/goal-verifier.js --help | Show usage, goal types, and operators |
| goal-verifier.js | node ~/system/tools/goal-verifier.js --task 937 --verbose | Run verification with detailed output per goal |
| goal-verifier.js | node ~/system/tools/goal-verifier.js --task 937 --dry-run | Preview what would be verified without running commands |
| agent-worker.js | node ~/system/tools/agent-worker.js | Local-model-first agent worker: polls MC, executes via Ollama tool agent, queues complex tasks for human |
| agent-worker.js | node ~/system/tools/agent-worker.js --once | Run single cycle then exit |
| agent-worker.js | node ~/system/tools/agent-worker.js --dry-run | Show next task without executing |
| agent-worker.js | node ~/system/tools/agent-worker.js --status | Show worker status, queue stats |
| agent-worker.js | node ~/system/tools/agent-worker.js --stop | Stop daemon gracefully |
| human-queue.js | node ~/system/tools/human-queue.js list | Show all tasks queued for human review |
| human-queue.js | node ~/system/tools/human-queue.js claim <id> | Claim task (remove from queue, resume in MC) |
| human-queue.js | node ~/system/tools/human-queue.js stats | Queue statistics (by priority, reason, age) |
| human-queue.js | node ~/system/tools/human-queue.js clear | Clear entire human queue |
| human-queue.js | node ~/system/tools/human-queue.js notify | Send Slack summary if queue > 0 |

Agent Output Schema: ~/system/specs/agent-output-schema.json (JSON Schema draft-07)
DB Table: mission-control.db.agent_reports (task_id, agent, status, summary, report_json)
Event: agent.report emitted to event bus on report submission
Created: 2026-02-15 (MC #937 Phase 1)

Goal Schema: ~/system/specs/goal-schema.json (JSON Schema draft-07)
DB: ~/system/databases/goals.db (goals, goal_history tables)
Verification: verification-gate.py enforces goal verification for H/M priority tasks (if goal-schema.json present)
Events: goal.verified, goal.failed emitted to event bus
Created: 2026-02-15 (MC #937 Phase 4)

Subagents (~/.claude/agents/)

| Agent | Role | Description |
|---|---|---|
| builder.md | Build | Implements ONE task using GOTCHA, self-validates, reports via agent-reporter.js or TaskUpdate |
| validator.md | Verify | Read-only GOTCHA compliance check + acceptance criteria, reports via agent-reporter.js |

Local AI (Ollama on Mac Studio M3 Ultra)

3 Tools: Executor + Orchestrators

| Tool | Command | Description |
|---|---|---|
| agent-runner.js | node ~/system/tools/agent-runner.js <agent> --task "X" | Executor: sends ONE task to Ollama with agent identity + state |
| agent-runner.js | node ~/system/tools/agent-runner.js list | List all agents with status |
| agent-scheduler.js | node ~/system/kernel/agent-scheduler.js spawn <agent> <task> | Orchestrator: forks agent-runner.js as child processes for parallel execution |
| team-coordinator.js | node ~/system/kernel/team-coordinator.js assign\|execute\|status\|message\|sync | Team Orchestrator: multi-team coordination (Backend/Frontend/DevOps/QA) with cross-team messaging |

Relationship: agent-scheduler.js spawns agent-runner.js. Runner = single agent; Scheduler = multi-agent. team-coordinator.js uses the scheduler for team execution.
What agents do: Generate text responses via Ollama. They don't execute anything.
State: ~/system/agents/state/*.json (persists between runs)
Identities: ~/system/agents/identities/*.md (15 agents)

| Tool | Command | Description |
|---|---|---|
| offline-mode.js | node ~/system/tools/offline-mode.js status | Offline Mode: check Ollama readiness for Claude fallback |
| offline-mode.js | node ~/system/tools/offline-mode.js run "task" | Route task to best local model (auto-detects type) |
| offline-mode.js | node ~/system/tools/offline-mode.js run "task" --agent dev | Use specific agent identity |
| offline-mode.js | node ~/system/tools/offline-mode.js run "task" --text-only | Text-only mode (no tool execution) |
| offline-mode.js | node ~/system/tools/offline-mode.js queue | Show outputs waiting for Claude review |
| offline-mode.js | node ~/system/tools/offline-mode.js capabilities | What local models can/can't do |
| offline-mode.js | node ~/system/tools/offline-mode.js batch tasks.txt | Run tasks from file (one per line) |
| offline-mode.js | node ~/system/tools/offline-mode.js enable\|disable | Toggle offline mode on/off |
| offline-mode.js | node ~/system/tools/offline-mode.js whitelist | Show safe read-only commands allowed offline |
| offline-mode.js | node ~/system/tools/offline-mode.js check "command" | Check if command is whitelisted for offline use |

Offline Mode: When Claude API hits usage limits, switch to local Ollama models. Auto-routes tasks to best model (qwen-coder for code, 70b for reasoning, 8b for trivial). All outputs saved to ~/system/offline-queue/ with NEEDS_REVIEW status. Claude reviews when back online. Capability matrix built in — knows what local models can/can't do. Created 2026-02-12.
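Offline mode routes tasks by type: qwen-coder for code, 70b for reasoning, and 8b for trivial tasks. The sketch below shows a keyword-based version of that routing. The keyword lists and the 40-word cutoff are hypothetical. qwen2.5-coder:32b and llama3.1:8b are the tags used elsewhere in this document; llama3.1:70b is an assumed tag for the "70b" model.

```javascript
// Sketch of offline-mode routing by task type. Keyword lists and the length cutoff are hypothetical.
const MODELS = {
  code: 'qwen2.5-coder:32b',
  reasoning: 'llama3.1:70b', // assumed tag; the document only says "70b"
  trivial: 'llama3.1:8b',
};

const CODE_HINTS = /\b(code|function|bug|refactor|script|regex|sql|test)\b/i;
const REASONING_HINTS = /\b(why|analy[sz]e|plan|compare|strategy|design|trade-?offs?)\b/i;

function routeTask(task) {
  if (CODE_HINTS.test(task)) return { type: 'code', model: MODELS.code };
  // Long prompts are treated as reasoning work even without a keyword hit.
  if (REASONING_HINTS.test(task) || task.split(/\s+/).length > 40) {
    return { type: 'reasoning', model: MODELS.reasoning };
  }
  return { type: 'trivial', model: MODELS.trivial };
}
```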

Ollama Background Workers (~/system/tools/ollama-workers/)

Tool Command Description
run-all.sh bash ~/system/tools/ollama-workers/run-all.sh Run all background workers (embedding-backfill, session-summarizer, knowledge-scorer)
run-all.sh bash ~/system/tools/ollama-workers/run-all.sh --dry-run Preview all workers, no writes
run-all.sh bash ~/system/tools/ollama-workers/run-all.sh --status Check Ollama + Qdrant health
knowledge-scorer.js node ~/system/tools/ollama-workers/knowledge-scorer.js run [--limit N] [--offset ID] [--dry-run] Score and tag Qdrant 'knowledge' entries: quality_score (1-5) + category via llama3.1:8b. Skips already-scored. Default limit 500/run.
embedding-backfill.js node ~/system/tools/ollama-workers/embedding-backfill.js run [--db knowledge|hivemind|flywheel|all] [--limit N] [--dry-run] Find rows with NULL embeddings across knowledge.db/hivemind.db/flywheel.db, batch-embed via Ollama bge-m3 (batches of 32), write BLOB back to SQLite, upsert to Qdrant.

Workers: Idempotent (skip already-processed). Safe to run repeatedly. Use --dry-run to preview. Logs to ~/system/logs/ollama-workers/.
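The embedding backfill loop can be sketched as below, assuming embeddings are stored as little-endian float32 BLOBs (the real column format is not documented here). `chunk`, `toBlob`, `fromBlob`, and `backfill` are hypothetical names, `embedBatch` stands in for the Ollama bge-m3 call so the sketch runs offline, and the Qdrant upsert is omitted. Selecting only rows whose embedding is NULL is what makes repeated runs idempotent.

```javascript
// Hypothetical sketch of embedding-backfill's batching and BLOB encoding.
// Assumption: embeddings are stored as little-endian float32 BLOBs.
const BATCH_SIZE = 32;

function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

function toBlob(vector) {
  const buf = Buffer.alloc(vector.length * 4);
  vector.forEach((v, i) => buf.writeFloatLE(v, i * 4));
  return buf;
}

function fromBlob(buf) {
  const out = [];
  for (let i = 0; i < buf.length; i += 4) out.push(buf.readFloatLE(i));
  return out;
}

// rows: [{ id, text, embedding }]. Only rows with a null embedding are processed,
// so a second run finds nothing to do (idempotent).
async function backfill(rows, embedBatch, { dryRun = false } = {}) {
  const pending = rows.filter((r) => r.embedding == null);
  if (dryRun) return { pending: pending.length, written: 0 };
  let written = 0;
  for (const batch of chunk(pending, BATCH_SIZE)) {
    const vectors = await embedBatch(batch.map((r) => r.text));
    batch.forEach((row, i) => {
      row.embedding = toBlob(vectors[i]); // real tool: write BLOB back to SQLite
      written += 1;
    });
  }
  return { pending: pending.length, written };
}
```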

Tier Routing (CC Rate Limit Optimization)

Tool Command Description
ollama-engine.js require('./ollama-engine') Centralized Ollama API — generate(), classify(), healthCheck(). Consolidates duplicated Ollama HTTP code from 5+ files.
ollama-engine.js node ~/system/tools/ollama-engine.js test Run health check + generate test
tier-router.js require('./tier-router') Central AI Router — classify(caller, task) → {tier, engine, model}. Routes tasks to Ollama (local) or human-queue. NO CC/API.
tier-router.js node ~/system/tools/tier-router.js test Run routing tests
tier-router.js node ~/system/tools/tier-router.js classify <caller> <task> Test classification for caller+task
tier-router.js node ~/system/tools/tier-router.js stats Show routing stats (ollama vs human-queue)
ollama-tool-agent.js node ~/system/tools/ollama-tool-agent.js --task "X" --model Y Ollama + Tools — multi-turn agent with read-only tools (read_file, glob, grep, list_dir, run_cmd). Replaces CC for explore/validate tasks.
ollama-tool-agent.js node ~/system/tools/ollama-tool-agent.js --task "X" --verbose Verbose mode (show tool calls)

Tier Routing Architecture:

Models

Model Size Use For
qwen2.5-coder:32b 19GB Coding, debugging, refactoring
llama3.1:70b 40GB Research, writing, analysis
llama3.1:8b 5GB Fast validation, simple queries

Routing & Decision

Tool Command Description
route.js node ~/system/tools/route.js project <name> Lookup project (internal/external)
route.js node ~/system/tools/route.js query "<request>" Match request to company by routes
route.js node ~/system/tools/route.js list List all projects and companies
route.js node ~/system/tools/route.js add <name> <type> Add project to registry
decision.js node ~/system/tools/decision.js log <key> <decision> [--by alem] [--tags X] [--task ID] [--rationale "..."] [--evidence "path"] [--supersedes ID] Decision audit log — queryable decision trail with rationale, evidence, and supersede chains. Stores in mission-control.db decisions table.
decision.js node ~/system/tools/decision.js list [--tags X] [--since DATE] [--by alem] [--limit N] List all decisions (optionally filtered by tags, date, or author)
decision.js node ~/system/tools/decision.js query "<term>" Full-text search across key+decision+rationale
decision.js node ~/system/tools/decision.js show <id> Show single decision with history chain and supersede references
decision.js node ~/system/tools/decision.js history <key> All decisions for a specific key (newest first), shows decision evolution
decision.js node ~/system/tools/decision.js latest [--limit 10] Most recent decisions (default 10) — used in boot display for Alem
decision.js node ~/system/tools/decision.js stats Decision statistics: count by tag, by decided_by, by month

Database: ~/system/databases/mission-control.db (decisions table)

Registry: ~/system/databases/projects.json
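A supersede chain like the one `decision.js show` displays can be walked as below. This is a sketch over in-memory records; the field names (`id`, `key`, `supersedes`) are assumptions about the decisions table, not its documented schema, and the sample records are placeholders.

```javascript
// Hypothetical sketch: walking a decision's supersede chain.
// Field names are assumptions about the mission-control.db decisions table.
function supersedeChain(decisions, id) {
  const byId = new Map(decisions.map((d) => [d.id, d]));
  const chain = [];
  const seen = new Set();
  let current = byId.get(id);
  // Follow `supersedes` links back to the original decision, guarding against cycles.
  while (current && !seen.has(current.id)) {
    chain.push(current.id);
    seen.add(current.id);
    current = current.supersedes == null ? undefined : byId.get(current.supersedes);
  }
  return chain; // newest first
}

// A decision is still in force unless another decision supersedes it.
function isActive(decisions, id) {
  return !decisions.some((d) => d.supersedes === id);
}

const decisions = [
  { id: 1, key: 'example-key', decision: 'Option A', supersedes: null },
  { id: 4, key: 'example-key', decision: 'Option B', supersedes: 1 },
  { id: 9, key: 'example-key', decision: 'Option C', supersedes: 4 },
];
console.log(supersedeChain(decisions, 9)); // [ 9, 4, 1 ]
```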

Event Bus

Tool Command Description
event-bus.js node ~/system/tools/event-bus.js emit <type> <json> [--publisher X] SQLite event bus — async emit/subscribe/dispatch. Decouples tools from point-to-point execSync.
event-bus.js node ~/system/tools/event-bus.js list [--type X] [--status X] [--limit N] List events (supports * wildcard for type)
event-bus.js node ~/system/tools/event-bus.js show <id> Show event details with payload
event-bus.js node ~/system/tools/event-bus.js replay <id> Re-process a failed/completed event
event-bus.js node ~/system/tools/event-bus.js dead-letter list|resolve|replay Dead letter queue management
event-bus.js node ~/system/tools/event-bus.js stats Event bus statistics (counts, last 24h by type)
event-bus.js node ~/system/tools/event-bus.js subscriptions list|register|seed Manage handler subscriptions
event-bus.js node ~/system/tools/event-bus.js dispatch [--once] [--interval N] Start dispatch loop (default 2s)
event-handlers.js require('./event-handlers.js') All subscriber handlers — task, lead, invoice, draft, email, job events
durable-runner.js node ~/system/tools/durable-runner.js start <name> --steps '["s1","s2"]' [--mc-task <id>] Durable workflow execution engine with SQLite persistence. Checkpoint/resume capability. Emits events via outbox table.
durable-runner.js node ~/system/tools/durable-runner.js status|resume|rollback <workflow-id> Workflow status, resume from checkpoint, or rollback to step N
durable-runner.js node ~/system/tools/durable-runner.js step-complete <id> <step> [--output '{}'] Mark step complete with output/files/commits
durable-runner.js (module) const { DurableRunner } = require('./durable-runner') Module API: createWorkflow(), completeStep(), failStep(), resume(), rollback()
chain-runner.js node ~/system/tools/chain-runner.js run <chain> "<input>" [--mc-task <id>] [--durable] YAML-defined agent chain orchestrator. DAG-ordered steps, Saga rollback, $INPUT/$ORIGINAL substitution, injection sanitization.
chain-runner.js node ~/system/tools/chain-runner.js list List all available chains from ~/system/agents/chains/*.yaml
chain-runner.js node ~/system/tools/chain-runner.js show <chain> Show chain definition with steps, deps, timeouts
chain-runner.js node ~/system/tools/chain-runner.js resume <workflow-id> Resume a durable chain workflow from checkpoint
chain-runner.js (module) const { ChainRunner } = require('./chain-runner') Module API: loadChain(), run(), listChains(), showChain(), resolveAgent()
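chain-runner.js runs YAML chain steps in DAG order. One standard way to get such an order is Kahn's topological sort, sketched below. Whether chain-runner.js uses exactly this algorithm is not stated here, and the step shape (`{ id, deps }`) and step names are assumptions about the parsed YAML.

```javascript
// Sketch: DAG ordering of chain steps with Kahn's algorithm.
// Assumption: each parsed step looks like { id: 'name', deps: ['other', ...] }.
function orderSteps(steps) {
  const indegree = new Map(steps.map((s) => [s.id, 0]));
  const dependents = new Map(steps.map((s) => [s.id, []]));
  for (const s of steps) {
    for (const dep of s.deps || []) {
      if (!indegree.has(dep)) throw new Error(`Step ${s.id} depends on unknown step ${dep}`);
      indegree.set(s.id, indegree.get(s.id) + 1);
      dependents.get(dep).push(s.id);
    }
  }
  // Start with steps that have no dependencies, in declaration order.
  const ready = steps.filter((s) => indegree.get(s.id) === 0).map((s) => s.id);
  const order = [];
  while (ready.length > 0) {
    const id = ready.shift();
    order.push(id);
    for (const next of dependents.get(id)) {
      indegree.set(next, indegree.get(next) - 1);
      if (indegree.get(next) === 0) ready.push(next);
    }
  }
  // Any step never reaching indegree 0 sits on a cycle.
  if (order.length !== steps.length) throw new Error('Chain has a dependency cycle');
  return order;
}

console.log(orderSteps([
  { id: 'review', deps: ['draft'] },
  { id: 'research', deps: [] },
  { id: 'draft', deps: ['research'] },
]));
```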

Event Bus Architecture (Transactional Outbox Pattern):
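The dispatch side of this design (retry with backoff, then the dead letter queue managed by `event-bus.js dead-letter`) can be sketched as a pure state transition. `MAX_ATTEMPTS`, `BASE_DELAY_MS`, `dispatchOnce`, and the event field names are assumptions; the actual values in event-bus.js are not documented here.

```javascript
// Hypothetical sketch of one dispatch attempt with exponential backoff and a
// dead letter queue. MAX_ATTEMPTS and BASE_DELAY_MS are assumed values.
const MAX_ATTEMPTS = 5;
const BASE_DELAY_MS = 2000;

async function dispatchOnce(event, handler, now = Date.now()) {
  try {
    await handler(event.payload);
    return { ...event, status: 'completed', attempts: event.attempts + 1 };
  } catch (err) {
    const attempts = event.attempts + 1;
    if (attempts >= MAX_ATTEMPTS) {
      // Give up: park the event for manual resolve/replay.
      return { ...event, status: 'dead_letter', attempts, lastError: err.message };
    }
    // Exponential backoff: 2s, 4s, 8s, ... after each failed attempt.
    const delay = BASE_DELAY_MS * 2 ** (attempts - 1);
    return { ...event, status: 'pending', attempts, nextRunAt: now + delay, lastError: err.message };
  }
}
```

Returning a new event object instead of writing to the database keeps the retry policy testable; the polling loop would persist the returned state and skip events whose `nextRunAt` is still in the future.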

GOTCHA Core

Tool Command Description
utils.js require(path.join(os.homedir(), 'system/lib/utils')) Shared utility library (log, file, path, time, validate). Node's require() does not expand ~, so resolve the home directory first.
sales-pipeline.js node ~/system/tools/sales-pipeline.js add|list|show|advance|stats|forecast|auto-actions Lead CRM — tracks leads from prospect to won/lost. Auto-actions: archive old leads (lost >30d), escalate stale proposals (>14d no activity)
outbound.js node ~/system/tools/outbound.js start|list|stats Cold outreach prospecting — 3-email sequence (Day 1 intro, Day 3 follow-up, Day 7 final). Creates lead (cold_email), drafts intro email (LOW risk), schedules Day 3+7 reminders. Tags leads with outbound-seq.
email-to-contact.js node ~/system/tools/email-to-contact.js backfill Auto-populate contacts.db from email classifications. Creates contacts, logs interactions, skips spam/own.
email-to-contact.js node ~/system/tools/email-to-contact.js stats CRM import statistics (auto-imported vs manual, interactions)
contacts.js node ~/system/tools/contacts.js add|list|show|search|update|log|tag|stats Central contact database — all partners, clients, brokers, vendors
contacts.js node ~/system/tools/contacts.js export-n8n Export n8n-monitored emails for Known Contact workflow
contacts.js node ~/system/tools/contacts.js import-leads Import contacts from leads.db
unified-crm.js node ~/system/tools/unified-crm.js pipeline|client|search|dashboard READ-ONLY integration layer across 5 databases (contacts, leads, invoices, tickets, MC tasks)
contract-manager.js node ~/system/tools/contract-manager.js add|list|show|renew|terminate|renewal-check|status Contract lifecycle management — tracks contract status (draft→sent→signed→active→expired→terminated), auto-renewal alerts, MC task creation, Slack notifications. DB: contracts.db. Types: NDA, DPA, contract, SLA, MSA.
contract-manager.js node ~/system/tools/contract-manager.js renewal-check [--dry-run] Check for contracts expiring within 30 days, create MC renewal tasks (auto-renew only), send Slack alerts to #ops
document-store.js node ~/system/tools/document-store.js store <client> <type> <file> Document storage & retention system — organizes business documents with retention policies. Standard path: ~/ALAI/clients/{client}/documents/{type}/. Types: contract (10y), nda (5y), invoice (5y), proposal (2y), dpa (10y), agreement (10y), signed (10y). DB: documents.db
document-store.js node ~/system/tools/document-store.js list [client] [--type TYPE] List documents with optional filters
document-store.js node ~/system/tools/document-store.js find <search> Search documents by client/filename/notes
document-store.js node ~/system/tools/document-store.js retention-check Flag documents past retention period (non-destructive)
document-store.js node ~/system/tools/document-store.js stats Storage statistics by type and client
send-signing-email.js node ~/system/tools/send-signing-email.js send|send-single|test|check ALAI branded document signing — creates DocuSeal submission + sends ALAI branded email with embedded logo via SMTP. Standard for all contracts/NDAs/DPAs. Always test first with test command.
nda-generator.js node ~/system/tools/nda-generator.js create <email> --name "Name" --company "Company" NDA PDF generator + DocuSeal signing flow — generates ALAI-branded NDA PDF via Puppeteer, uploads to DocuSeal, creates submission, sends ALAI branded signing emails. Flags: --preview (local PDF only), --test (send to post@alai.no), --orgnr, --address, --phone, --project.
fiken.js node ~/system/tools/fiken.js status|companies|invoices|contacts|balances|dashboard Fiken API v2 integration — invoices list/show/sync, contacts list/show/sync, bank balances, CEO dashboard data. Syncs to invoices.db + contacts.db.
invoice-generator.js node ~/system/tools/invoice-generator.js create|list|show|pay|pdf|send|remind|check-overdue|auto-remind|dashboard|stats Invoice CRUD with VAT, PDF/HTML generation, MCP email draft creation, auto-reminders (3 levels: friendly/firm/urgent), automatic escalation system (Day 7/14/30+)
invoice-generator.js node ~/system/tools/invoice-generator.js auto-remind [--dry-run] Automatic invoice reminder escalation — Day 7: friendly (LOW risk draft), Day 14: firm (LOW risk draft + Slack), Day 30+: HIGH MC task + URGENT Slack. Norwegian templates.
support-ticket.js node ~/system/tools/support-ticket.js create|list|show|update|assign|comment|stats Support ticket system with SLA tracking (P1-P4)
email-to-ticket.js node ~/system/tools/email-to-ticket.js --sender "email" --subject "subject" --body "body" --uid uid Email → ticket bridge — detects support emails, creates tickets, generates ACK drafts, Slack + HiveMind notifications
ticket-sla-checker.js node ~/system/tools/ticket-sla-checker.js SLA breach detector — monitors open tickets, escalates to Slack #ops, generates escalation drafts, HiveMind logs
ticket-resolve-notify.js node ~/system/tools/ticket-resolve-notify.js --ticket-id TKT-12345 Resolution notifier — generates client resolution email draft, HiveMind log
team-coordinator.js node ~/system/tools/team-coordinator.js teams|assign|handoff|block|unblock|sync|status Cross-team orchestration
onboard-client.js node ~/system/tools/onboard-client.js new|status|list|timeline|undo One-command client onboarding — orchestrates project scaffold, sales pipeline, support, teams, routing, welcome email, pipeline events, HiveMind
expansion-dashboard.js node ~/system/tools/expansion-dashboard.js [--compact] Aggregate view: companies, pipeline, invoices, support, teams
proposal-gen.js node ~/system/tools/proposal-gen.js create|edit|pdf|send|list|show|approve|reject Professional proposal generator — auto-populates from leads, generates PDF, sends via SMTP (3 templates: standard, landing-page, webapp)
pipeline-events.js node ~/system/tools/pipeline-events.js check-reminders Stage transition event handlers — auto-triggered by sales-pipeline.js on advance/lose, generates drafts (→ drafts.db), creates reminders (~/system/reminders/), logs to HiveMind, sends Slack notifications. Handlers: onQualified, onProposal, onNegotiating, onWon, onActive, onLost
follow-up.js node ~/system/tools/follow-up.js check [--auto] Follow-up reminder processor — scans ~/system/reminders/ for due reminders, generates language-aware follow-up drafts (NO/EN/BS), 3 escalation levels (day 3/7/14), Slack alert on day 14
follow-up.js node ~/system/tools/follow-up.js list List all pending follow-up reminders with due dates and escalation levels
follow-up.js node ~/system/tools/follow-up.js add <lead_id> <type> <days> Manually create follow-up reminder (types: proposal, inquiry)
drafts.js node ~/system/tools/drafts.js list|show|approve|reject|send|stats Draft approval workflow — 3-level risk classification (low/medium/high), content-based pattern matching, smart auto-approval
drafts.js node ~/system/tools/drafts.js process-auto [--dry-run] Auto-classify and process all pending drafts (LOW→approve+send, MEDIUM→approve+Slack+send, HIGH→manual)
drafts.js node ~/system/tools/drafts.js auto-approve [--type type1,type2] Auto-approve low-risk drafts (optional type filter)
drafts.js node ~/system/tools/drafts.js mark-sent <id> [--message-id mid] Mark draft as sent (updates linked invoice status)
drafts.js node ~/system/tools/drafts.js import Import JSON drafts from ~/system/drafts/
intake-analyzer.js node ~/system/tools/intake-analyzer.js detect-lang "text" Language detection (NO/EN/BS) via character markers + word frequency
intake-analyzer.js node ~/system/tools/intake-analyzer.js analyze "text" Request analysis via Ollama — extracts category/scope/urgency, generates 3 pricing options from Vizu pricing.md
intake-analyzer.js (module) const { detectLanguage, analyzeInquiry, generateOptions } = require('./intake-analyzer') Module API for client intake pipeline
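The retention periods listed for document-store.js translate into a non-destructive check like the one below. Measuring retention from a `storedAt` date is an assumption (the real tool may count from another date, such as contract end), and `retentionExpiry` and `pastRetention` are hypothetical names.

```javascript
// Sketch of document-store.js retention-check logic (non-destructive: it only flags).
// Assumption: retention is counted from the date the document was stored.
const RETENTION_YEARS = {
  contract: 10, nda: 5, invoice: 5, proposal: 2, dpa: 10, agreement: 10, signed: 10,
};

function retentionExpiry(doc) {
  const years = RETENTION_YEARS[doc.type];
  if (years === undefined) throw new Error(`Unknown document type: ${doc.type}`);
  const expiry = new Date(doc.storedAt);
  expiry.setUTCFullYear(expiry.getUTCFullYear() + years);
  return expiry;
}

// Returns the files that are past retention; nothing is deleted.
function pastRetention(docs, now = new Date()) {
  return docs.filter((doc) => retentionExpiry(doc) < now).map((doc) => doc.file);
}
```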

intake-analyzer.js: Language detector (æøå→NO, ćčšžđ→BS, word frequency lists) + request analyzer (Ollama llama3.1:8b JSON extraction) + option generator (reads ~/ALAI/pipeline/Vizu/finance/pricing.md, maps category→packages, generates A/B/C options). Heuristic fallback when Ollama unavailable. Pure Node.js, no dependencies. Created: 2026-02-13 (MC #840).
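The language detector described above can be sketched as character markers first, word frequency second. The marker sets come from the description; the short word lists, the EN default, and the tie-breaking order are illustrative assumptions, not the lists in intake-analyzer.js.

```javascript
// Sketch of NO/EN/BS detection via character markers, then word frequency.
// The word lists here are small illustrative samples (assumption).
const WORDS = {
  NO: ['og', 'jeg', 'ikke', 'det', 'er', 'vi', 'med', 'til'],
  BS: ['je', 'da', 'ne', 'sam', 'za', 'na', 'smo', 'kako'],
  EN: ['the', 'and', 'is', 'we', 'to', 'of', 'with', 'not'],
};

function detectLanguage(text) {
  const lower = text.toLowerCase();
  // Strong signals first: among NO/EN/BS these letters occur in only one language.
  if (/[æøå]/.test(lower)) return 'NO';
  if (/[ćčšžđ]/.test(lower)) return 'BS';
  // Fallback: count hits against each frequency list; EN wins when nothing matches.
  const tokens = lower.split(/[^a-z]+/).filter(Boolean);
  let best = 'EN';
  let bestScore = 0;
  for (const [lang, list] of Object.entries(WORDS)) {
    const score = tokens.filter((t) => list.includes(t)).length;
    if (score > bestScore) {
      best = lang;
      bestScore = score;
    }
  }
  return best;
}
```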

follow-up.js: Automated follow-up reminder system. Proposal reminders: day 3 (gentle), day 7 (nudge), day 14 (final + Slack). General inquiry: day 5. Language-aware templates (NO/EN/BS) extracted from lead intake analysis. Idempotent processing (marks reminders as processed). Legacy reminder migration: infers missing escalation_level and lang fields from due date and lead notes. Wired into gotcha-health.sh (runs every 15 min). Reminder format: JSON files in ~/system/reminders/ with fields: id, lead_id, type, due_date, escalation_level, created_at, processed, lang. Created: 2026-02-13 (MC #840).
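The escalation schedule above maps to a small lookup, sketched here. `buildReminders`, the `id` format, level numbering from 1, and using the send time as `created_at` are assumptions; the output fields follow the reminder JSON format listed above.

```javascript
// Sketch of the follow-up schedule: proposals at day 3/7/14, inquiries at day 5.
// Assumption: escalation levels are numbered from 1.
const SCHEDULE = {
  proposal: [
    { day: 3, level: 1, tone: 'gentle', slack: false },
    { day: 7, level: 2, tone: 'nudge', slack: false },
    { day: 14, level: 3, tone: 'final', slack: true },
  ],
  inquiry: [{ day: 5, level: 1, slack: false }],
};

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns reminder objects in the documented JSON field style for one lead.
function buildReminders(leadId, type, sentAt, lang) {
  const steps = SCHEDULE[type];
  if (!steps) throw new Error(`Unknown reminder type: ${type}`);
  const start = new Date(sentAt).getTime();
  return steps.map((s) => ({
    id: `${leadId}-${type}-${s.level}`, // hypothetical id format
    lead_id: leadId,
    type,
    due_date: new Date(start + s.day * DAY_MS).toISOString().slice(0, 10),
    escalation_level: s.level,
    created_at: new Date(start).toISOString(), // assumption: created at send time
    processed: false,
    lang,
  }));
}
```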

Image Generation

Tool Command Description
image-gen.js node ~/system/tools/image-gen.js --prompt "desc" --output path.png Generate image via Gemini (free) or Together.ai
image-gen.js node ~/system/tools/image-gen.js --setup gemini YOUR_KEY Save API key to config
image-gen.js node ~/system/tools/image-gen.js --prompt "desc" --count 4 Generate multiple images

Providers: Gemini (default, free, no CC), Together.ai (FLUX, free tier)
Keys: ~/system/config/image-gen.json or env vars GEMINI_API_KEY, TOGETHER_API_KEY
Get key: https://aistudio.google.com/apikey (2 min, no credit card)

| Tool | Command | Description |
|---|---|---|
| brand-compositor.js | node ~/system/tools/brand-compositor.js all | Deterministic brand asset generator: resize/composite the REAL logo (profile-pic.png) onto social banners, profiles, favicons. No AI generation. |
| brand-compositor.js | node ~/system/tools/brand-compositor.js profile\|avatar\|banner-linkedin\|banner-twitter\|og-image\|favicon | Generate specific asset type |
| design-engine.js | node ~/system/tools/design-engine.js render <template> --data '{}' --output path.png | Puppeteer-based HTML/CSS template rendering engine: pixel-perfect typography with Inter font, retina quality |
| design-engine.js | node ~/system/tools/design-engine.js list | List available templates |

Brand Compositor: Uses sharp (npm) for deterministic resize + composite. Same pixels every time. Source: ~/system/context/branding/alai/social/profile-pic.png. Output: ~/system/context/branding/alai/social/. Options: --source <file>, --output <dir>.

Design Engine: Uses Puppeteer (headless Chrome) to render HTML templates with professional typography (kerning, ligatures, OpenType). Templates: linkedin-banner (1584x396), twitter-banner (1500x500), og-image (1200x630), profile-card (400x400), favicon (180x180). Uses {{mustache}} placeholders. Reuses the browser for batch rendering. Module export: require('./design-engine'). Options: --data '{"key":"value"}', --output path.png, --scale 2.

Created: 2026-02-10
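The `{{mustache}}` placeholders in design-engine templates can be filled with a plain string replacement before the HTML reaches Puppeteer. A minimal sketch, assuming flat keys only (no sections or nested paths) and HTML-escaped values; whether design-engine.js escapes values or leaves unknown placeholders in place is not stated here.

```javascript
// Sketch: filling {{key}} placeholders in an HTML template (flat keys only).
// Assumption: values are HTML-escaped; unknown placeholders are left untouched.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

function renderTemplate(html, data) {
  return html.replace(/\{\{\s*([\w-]+)\s*\}\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(data, key) ? escapeHtml(data[key]) : match
  );
}

const template = '<h1>{{title}}</h1><p>{{ tagline }}</p><span>{{missing}}</span>';
console.log(renderTemplate(template, { title: 'ALAI', tagline: 'Build & ship' }));
```

For retina output, Puppeteer's `page.setViewport({ width, height, deviceScaleFactor: 2 })` is the usual mechanism; that `--scale 2` maps to it is an assumption.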

Intel & News Aggregation

Tool Command Description
intel-briefing.js node ~/system/tools/intel-briefing.js Full daily briefing — fetch RSS + HN, summarize via Ollama, deliver to Slack #exec + HiveMind
intel-briefing.js node ~/system/tools/intel-briefing.js --preview Preview briefing in terminal
intel-briefing.js node ~/system/tools/intel-briefing.js --fetch Fetch only — list items without summarization
intel-briefing.js node ~/system/tools/intel-briefing.js --hours 48 Custom lookback period (default: 24h)

Sources (7): Anthropic News, Anthropic Engineering, Claude Code Changelog, OpenAI News, TechCrunch AI, Simon Willison, Hacker News API
Summarization: Ollama llama3.1:8b (local, $0 cost)
Delivery: Slack #exec channel + HiveMind + ~/system/logs/intel-briefing-{date}.md
Daemon: com.edita.intel-briefing (daily 7:00 AM)
MCP RSS: @missionsquad/mcp-rss added to Edita MCP config for live RSS queries
Created: 2026-02-11

Tender Hunting & Public Procurement

Tool Command Description
tender-hunter-agent.js node ~/system/daemons/tender-hunter-agent.js Doffin (Norway) — TED API scanner for Norwegian IT tenders. Analyzes via Ollama, scores company fit (ALAI), stores in tenders.db. NO Puppeteer, NO Finn.no, NO TheHub.
tender-hunter-agent.js node ~/system/daemons/tender-hunter-agent.js --briefing Generate briefing from tenders.db (HOT/WARM summary)
tender-hunter-agent.js node ~/system/daemons/tender-hunter-agent.js --dry-run --verbose Test mode with detailed logging
bih-tender-hunter.js node ~/system/daemons/bih-tender-hunter.js BiH Tender Hunter — TED API (primary) + ejn.gov.ba (secondary) scanner for BiH IT tenders. Analyzes via Ollama, scores company fit (SnowIT), stores in bih-tenders.db.
bih-tender-hunter.js node ~/system/daemons/bih-tender-hunter.js --briefing Generate briefing from bih-tenders.db
bih-tender-hunter.js node ~/system/daemons/bih-tender-hunter.js --pages 5 Custom page count (default: 3)
bih-tender-hunter.js node ~/system/daemons/bih-tender-hunter.js --source ted|ejn Filter by data source (default: all)
bih-tender-hunter.js node ~/system/daemons/bih-tender-hunter.js --help Show usage and options

Doffin Agent:

BiH Agent:

Reporting & Analytics

Tool Command Description
auto-report.js node ~/system/tools/auto-report.js daily Daily brief — revenue, pipeline, tasks, decisions, alerts. Generates email draft in ~/system/drafts/
auto-report.js node ~/system/tools/auto-report.js weekly Weekly report — revenue summary, pipeline progress, team performance, achievements. Email draft with ALAI branding
auto-report.js node ~/system/tools/auto-report.js preview Preview report in terminal without generating draft
client-status-update.js node ~/system/tools/client-status-update.js generate [--dry-run] Weekly client status updates — queries MC for completed tasks per project, matches to client contacts, generates ALAI-branded HTML email drafts (MEDIUM risk). LaunchAgent: Mondays 08:00.
client-status-update.js node ~/system/tools/client-status-update.js list Show recently generated status update drafts

Auto-Report Features:

Dashboards

Dashboard URL Description
Mission Control https://mc.alai.no Task management, sessions, active work
CEO Dashboard https://mc.alai.no/ceo Executive metrics — revenue, pipeline, projects, decisions, alerts
Client Portal https://mc.alai.no/client?token=XXX Client-facing project status — tasks, tickets, SLA. Token-authenticated.

CEO Dashboard Features:

Client Portal Features:

Testing & Verification

Tool Command Description
smoke-test.js node ~/system/tools/smoke-test.js Run all smoke tests (Docker, Slack, daemons, MC, HiveMind)
smoke-test.js node ~/system/tools/smoke-test.js report Run all + post report to Slack #ops
smoke-test.js node ~/system/tools/smoke-test.js slack|docker|daemons|mc|hivemind Test specific suite
smoke-test.js node ~/system/tools/smoke-test.js api <url> Test specific API endpoint
health-check.js node ~/system/tools/health-check.js Monitor all services (Docker, HTTP, system, daemons) with human/JSON output
health-check.js node ~/system/tools/health-check.js --quick HTTP endpoints only (fast check)
health-check.js node ~/system/tools/health-check.js --json JSON output for programmatic use
daemon-health.js node ~/system/tools/daemon-health.js Daemon heartbeat monitor — checks all com.john.* LaunchAgents, reports PID/exit/status, detects unloaded plists
daemon-health.js node ~/system/tools/daemon-health.js --quick Quick status only
daemon-health.js node ~/system/tools/daemon-health.js --json JSON output for dashboards
auto-fix.js node ~/system/tools/auto-fix.js <service> <issue> Automated service recovery (restart loop prevention: max 3/hour)
ops-watchdog.js node ~/system/daemons/ops-watchdog.js Master watchdog daemon — health checks every 120s, auto-recovery via auto-fix.js, Slack alerts, event bus integration. Config: ~/system/config/ops-watchdog.json
cold-start.sh bash ~/system/ops/cold-start.sh Bring the entire system up from a fresh boot: staged startup (infra→docker→core→business→workers→enrichment), pre-flight checks, verification
planka-sync.js node ~/system/tools/planka-sync.js test|status|sync <mc-id> MC↔Planka bidirectional sync — auto-moves cards on mc.js start/done/pause/resume
preflight-check.js node ~/system/tools/preflight-check.js --task <id> Pre-closure quality gate aggregator — checks GOTCHA, HOP Build, evidence, CoVe, validator, HiveMind, syntax before mc.js done
MCP playwright mcp__playwright__* (native Claude tools) Browser automation: navigate, click, fill, screenshot

Reports: ~/system/reports/smoke-test-*.json
Protocol: Smoke test BEFORE + AFTER infra changes. Playwright for UI. npm test for code.
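auto-fix.js caps automated recovery at 3 restarts per hour to prevent restart loops. A sliding-window limiter is one way to enforce such a cap, sketched below; `RestartLimiter` is a hypothetical name, and the in-memory map is an assumption (a daemon would need to persist this state across its own restarts).

```javascript
// Sketch: sliding-window cap of 3 automated restarts per service per hour.
const MAX_RESTARTS = 3;
const WINDOW_MS = 60 * 60 * 1000;

class RestartLimiter {
  constructor() {
    this.history = new Map(); // service -> timestamps (ms) of recent restarts
  }

  // Returns true and records the restart if the service is under its cap.
  tryRestart(service, now = Date.now()) {
    const recent = (this.history.get(service) || []).filter((t) => now - t < WINDOW_MS);
    if (recent.length >= MAX_RESTARTS) {
      this.history.set(service, recent);
      return false; // cap reached: escalate (e.g. Slack alert) instead of restarting
    }
    recent.push(now);
    this.history.set(service, recent);
    return true;
  }
}
```

A sliding window, unlike a counter reset on the hour, never allows a burst of 6 restarts straddling an hour boundary.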

Deploy Quality Gate

Tool Command Description
qa-19.js node ~/system/tools/qa-19.js check <task-id> PRIMARY quality gate (ZAKON #14). 19-point check in 5 phases. Adapts per task type.
qa-19.js node ~/system/tools/qa-19.js list Show all 19 checks
quality-gate.js DELETED 2026-02-26 Superseded by qa-19.js. Do not use.

Checks (19): RAG queried, GOTCHA written, tools checked, context read, build passes, tests pass, no secrets, no debug artifacts, error handling, performance, output matches spec, evidence captured, destination verified, visual check, backup taken, self-review, validator review, quality gate, CEO acceptance.
Rule: ZAKON #14. Run qa-19.js check <task-id> before mc.js done.
Minimum: 15/19 (M priority) or 17/19 (H priority).
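The pass rule reduces to a threshold lookup by task priority. A sketch with a hypothetical `qaVerdict` helper taking one boolean per check; how qa-19.js treats other priorities is not stated above, so the sketch rejects them rather than guess.

```javascript
// Sketch of the qa-19 pass threshold by task priority.
const MIN_PASSING = { M: 15, H: 17 };
const TOTAL_CHECKS = 19;

function qaVerdict(results, priority) {
  const min = MIN_PASSING[priority];
  if (min === undefined) throw new Error(`No documented threshold for priority ${priority}`);
  if (results.length !== TOTAL_CHECKS) {
    throw new Error(`Expected ${TOTAL_CHECKS} results, got ${results.length}`);
  }
  const passed = results.filter(Boolean).length;
  return { passed, required: min, ok: passed >= min };
}
```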

Anti-Hallucination & Drift Detection

Tool Command Description
cove.js node ~/system/tools/cove.js verify --task-id <id> --claims-file <path> Chain-of-Verification — deterministically re-verify session claims using claim-types.json spec. Reads JSONL, executes file/syntax/server/build checks, writes cove-report.json
cove.js node ~/system/tools/cove.js report --task-id <id> Display CoVe verification report for a task
vcr.js node ~/system/tools/vcr.js record --session-id <id> --tool <name> --input <json> --output <text> --duration <ms> Record a tool interaction to vcr.db (used by vcr-recorder.py hook)
vcr.js node ~/system/tools/vcr.js replay <session-id> Replay recorded session — re-executes deterministic tools (Read/Glob/Grep), compares output hashes, flags regressions
vcr.js node ~/system/tools/vcr.js list [--days 7] List recorded VCR sessions
vcr.js node ~/system/tools/vcr.js compare <session1> <session2> Diff two sessions — detect behavioral changes between recordings
drift-detector.js node ~/system/tools/drift-detector.js snapshot Collect today's behavioral metrics from all data sources (claims, email-audit, MC, HiveMind, verification audits)
drift-detector.js node ~/system/tools/drift-detector.js analyze Analyze recent trends — anomaly detection via rolling 7-day mean ± 2σ
drift-detector.js node ~/system/tools/drift-detector.js report [--days 30] Human-readable drift report with ASCII table

VCR activation: touch /tmp/vcr-recording to start, rm /tmp/vcr-recording to stop.
Hook: vcr-recorder.py (PostToolUse, advisory).
Drift daemon: com.john.drift-detector runs daily at 23:55 (snapshot + analyze).
Alerts: HiveMind (always) + Slack #john-alerts (MEDIUM+).
Rule: ~/system/rules/determinism-spectrum.md maps all 44 system components to a 5-level determinism scale.
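The drift rule (rolling 7-day mean ± 2σ) can be written as below. Whether drift-detector.js uses the population or sample standard deviation, and whether the window excludes the day under test, is not stated here; this sketch assumes the population σ of the 7 preceding days.

```javascript
// Sketch: flag a daily metric as anomalous if it falls outside mean ± k·σ
// of the preceding `window` days. Assumes population standard deviation.
function detectAnomalies(series, window = 7, k = 2) {
  const flagged = [];
  for (let i = window; i < series.length; i++) {
    const past = series.slice(i - window, i);
    const mean = past.reduce((a, b) => a + b, 0) / window;
    const variance = past.reduce((a, b) => a + (b - mean) ** 2, 0) / window;
    const sigma = Math.sqrt(variance);
    // If the window is constant (σ = 0), any change at all is flagged.
    if (Math.abs(series[i] - mean) > k * sigma) {
      flagged.push({ index: i, value: series[i], mean, sigma });
    }
  }
  return flagged;
}
```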

Test Quality

Tool Command Description
test-auditor.js node ~/system/tools/test-auditor.js <project-dir> Scan test suite for weak validation — detects "no crash" without rejection, missing stupid-user inputs, unused chaos strings
test-auditor.js node ~/system/tools/test-auditor.js <dir> --json JSON output for pipeline integration

Detects: (1) Chaos tests with "no crash" but no rejection assertion, (2) Form fields missing stupid-user inputs (numbers in names, letters in phones), (3) CHAOS_STRINGS defined but unused.
Exit: 0=clean, 1=findings.
Rule: ~/system/rules/testing.md (Mandatory Input Rejection Tests section)

Plan Enforcement

Tool Command Description
plan-advance-step.js node ~/system/tools/plan-advance-step.js Manually advance to next plan step with gate checks (for builder agents)
plan-adherence-report.js node ~/system/tools/plan-adherence-report.js <task-id> Post-execution adherence report — did agent follow the plan? Shows step execution, violations, summary

Plan Enforcement Architecture:

Build Mode

Tool Command Description
build-mode.js node ~/system/tools/build-mode.js start <dir> [--task N] [--concurrency N] [--yolo] Activate build mode — bypass process hooks for project dir
build-mode.js node ~/system/tools/build-mode.js stop [--status completed|failed] Deactivate build mode
build-mode.js node ~/system/tools/build-mode.js status Show current build mode state
build-mode.js node ~/system/tools/build-mode.js pause|resume Pause/resume build mode
build-mode.js node ~/system/tools/build-mode.js sessions [--limit N] List build sessions
build-mode.js node ~/system/tools/build-mode.js autocoder [--project-dir <dir>] [--yolo] Launch AutoCoder agent
build-mode.js node ~/system/tools/build-mode.js update-features <total> <passing> Update feature progress

Build Mode: Switches from Operations→Build mode. Bypasses GOTCHA checklist, delegation enforcer, agent protocol, and verification gate for files WITHIN the project dir. Security hooks (forbidden paths, hallucination, bash security) remain active. 8h TTL auto-expire.
DB: build_sessions table in mission-control.db
Flag: /tmp/build-mode-active.json
Hook: ~/.claude/hooks/build_mode.py (shared module)
AutoCoder: ~/system/services/autocoder/, an autonomous coding agent (Python, Claude Agent SDK). The Initializer creates features in SQLite; the Coding Agent implements them. Supports parallel mode (--concurrency) and YOLO mode (skip browser tests).
Skill: /build <dir> activates build mode via skill.
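The two conditions that decide whether process hooks are bypassed (flag not expired, file inside the project dir) can be sketched as below. The flag field names (`projectDir`, `startedAt`) are assumptions about /tmp/build-mode-active.json, and the real hook (build_mode.py) is Python; this only illustrates the logic, with the 8h TTL from the description above.

```javascript
// Sketch of build-mode bypass checks. Flag field names are assumptions.
const path = require('node:path');

const TTL_MS = 8 * 60 * 60 * 1000; // 8h auto-expire

function isBuildModeActive(flag, now = Date.now()) {
  if (!flag || !flag.projectDir || !flag.startedAt) return false;
  return now - new Date(flag.startedAt).getTime() < TTL_MS;
}

// True only for paths inside projectDir; '..' segments that resolve outside are rejected.
function isWithinProjectDir(projectDir, filePath) {
  const rel = path.relative(path.resolve(projectDir), path.resolve(filePath));
  return rel === '' || (rel !== '..' && !rel.startsWith('..' + path.sep) && !path.isAbsolute(rel));
}

// Process hooks may be bypassed; security hooks never are (not modeled here).
function canBypassProcessHooks(flag, filePath, now = Date.now()) {
  return isBuildModeActive(flag, now) && isWithinProjectDir(flag.projectDir, filePath);
}
```

Resolving both paths before comparing matters: a raw string-prefix check would accept `/p/../etc/passwd` for project `/p`.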

Build Pipeline

Tool Command Description
build-project.js node ~/system/tools/build-project.js prep "Name" "type" "Description" Scaffold + CLAUDE.md + onboard + spec + task
build-project.js node ~/system/tools/build-project.js deploy "Name" Vercel deploy
build-project.js node ~/system/tools/build-project.js status "Name" Check project state
assert-log.sh source ~/system/tools/assert-log.sh Structured assertion library for deterministic verification (Phase 1)
gate-pre-claim.sh bash ~/system/tools/gate-pre-claim.sh --spec spec.json --workdir /path Pre-claim verification gate — file exists, hash changed, forbidden patterns (Phase 2)
gate-pre-claim.sh bash ~/system/tools/gate-pre-claim.sh --snapshot --workdir /path Snapshot file hashes before build
gate-pre-deploy.sh bash ~/system/tools/gate-pre-deploy.sh --project-dir /path Pre-deploy verification gate — tests, build, artifacts, TODO check (Phase 4)

| Tool | Command | Description |
|---|---|---|
| pipeline-controller.js | node ~/system/tools/pipeline-controller.js create\|status\|advance\|gate\|gate-pass\|abort\|resume\|history\|list\|dashboard | Central pipeline orchestrator: tracks projects through 13 lifecycle phases (lead→support), automated gate checks, phase history, abort/resume. DB: pipeline.db |
| pipeline-watchdog.js | node ~/system/tools/pipeline-watchdog.js scan\|status [--auto-resume] [--notify] | Detects stalled pipelines (2h threshold), orphan Claude team tasks (1h), stale MC tasks. Marks stalled, auto-resumes, Slack alerts (2h cooldown). Skips aborted. |
| docuseal-webhook.js | node ~/system/tools/docuseal-webhook.js start [--port 3033] | Standalone DocuSeal webhook server: emits contract.signed events to event-bus. Port 3033. MC #1039 |
| docuseal-register-webhook.js | node ~/system/tools/docuseal-register-webhook.js register\|list\|delete [--url URL] | DocuSeal webhook registration helper: register/list/delete webhooks via API. Requires vault session. MC #1756 |
| test-docuseal-webhook.sh | bash ~/system/tools/test-docuseal-webhook.sh | Test DocuSeal webhook endpoint with mock payloads. MC #1756 |
| rollback.js | node ~/system/tools/rollback.js tag\|list\|rollback\|status <project> | Git tag-based deployment rollback: tag deploys, list history, one-command rollback. Projects in ~/projects/. |
| post-mortem.js | node ~/system/tools/post-mortem.js generate\|create\|list\|show | Incident post-mortem management: generate from ticket, create blank, list/show. Template: ~/system/template/post-mortem.md. Output: ~/system/reports/post-mortems/ |

Types: landing-page | nextjs-app | api-backend
Templates: ~/system/template/types/<type>/CLAUDE.md + spec.md
CI/CD: ~/system/template/github-actions/ci.yml (copied by scaffold.sh), ~/system/template/docker-compose.staging.yml
Deploy: --platform vercel|railway|fly (auto-detects from type if omitted)
Pipeline Gates: Part of Zero-Hallucination Deterministic Build Pipeline

Client Interaction & Design Review

Tool Command Description
preview-share.js node ~/system/tools/preview-share.js start|stop|status|list Client preview sharing — starts local dev server + Cloudflare tunnel for public URL. Auto-detects build output dirs.
design-approval.js node ~/system/tools/design-approval.js create|list|approve|reject|show|stats Design review workflow — tracks design approval from draft→sent→reviewing→approved/rejected→implemented. DB: design-reviews.db
design-board.js node ~/system/tools/design-board.js create|list|stop|restart Client-facing design review board — ALAI-branded web page with design options, feedback form, approve/reject. Cloudflare tunnel (http2 protocol) for public URL. Health check endpoint. Integrates with design-reviews.db.
client-signoff.js node ~/system/tools/client-signoff.js create <project> <email> --type uat|delivery [--project-type webapp] [--message "X"] UAT + delivery approval workflow. Sends email with approval link, client approves/rejects via web UI (https://mc.alai.no/signoff/{token}), pipeline auto-advances. Commands: create, status, approve, reject, checklist, check, list. DB: design-reviews.db

UAT Template: ~/system/template/uat-checklist.md (per project type: webapp, landing-page, api-backend)
DB: ~/system/databases/design-reviews.db (reviews + signoffs tables)

File Editing

Tool Command Description
smart-edit.js node ~/system/tools/smart-edit.js view <file> [start-end] Show file lines with line numbers
smart-edit.js node ~/system/tools/smart-edit.js replace <file> <start-end> <content> Replace line range with new content
smart-edit.js node ~/system/tools/smart-edit.js insert <file> <after> <content> Insert content after line number
smart-edit.js node ~/system/tools/smart-edit.js delete <file> <start-end> Delete line range
smart-edit.js node ~/system/tools/smart-edit.js append <file> <content> Append content to end of file

Why: Line-number-based editing is more reliable than str_replace, which fails whenever the exact-match string is not found. Inspired by The Harness Problem. Reduces the edit failure rate from ~15-20% to ~5%.
Backup: Auto-creates .bak before each edit. Use --no-backup to skip.
Stdin: Use - as the content argument to pipe content via stdin (for multi-line edits).
Lines: 1-indexed, inclusive ranges (10-15 = lines 10 through 15).
Workflow: view to see line numbers → replace/insert/delete by line number.
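The 1-indexed, inclusive range semantics described above can be sketched as pure list operations. This is a minimal illustration, not smart-edit.js itself. The function names are hypothetical, and treating `insert` after line 0 as "insert at the top of the file" is an assumption.

```python
def parse_range(spec: str) -> tuple[int, int]:
    """Parse '10-15' (or a single '12') into an inclusive (start, end) pair."""
    start, _, end = spec.partition("-")
    return int(start), int(end or start)


def replace_lines(lines: list[str], start: int, end: int, new: list[str]) -> list[str]:
    """Replace 1-indexed lines start..end (inclusive) with `new`."""
    if not 1 <= start <= end <= len(lines):
        raise ValueError(f"range {start}-{end} is outside 1-{len(lines)}")
    return lines[:start - 1] + new + lines[end:]


def insert_after(lines: list[str], after: int, new: list[str]) -> list[str]:
    """Insert `new` after 1-indexed line `after` (0 = top of file, an assumption)."""
    if not 0 <= after <= len(lines):
        raise ValueError(f"line {after} is outside 0-{len(lines)}")
    return lines[:after] + new + lines[after:]


def delete_lines(lines: list[str], start: int, end: int) -> list[str]:
    """Delete 1-indexed lines start..end (inclusive)."""
    return replace_lines(lines, start, end, [])


doc = ["a", "b", "c", "d"]
print(replace_lines(doc, *parse_range("2-3"), ["X"]))  # ['a', 'X', 'd']
```

Every edit shifts the numbering of all lines below it. Re-run `view` before the next edit, or apply a batch of edits bottom-up so that earlier line numbers stay valid.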

Daemons (LaunchAgents)

Daemon Interval Description
com.john.slack-bot always Slack bot — Claude Haiku via Socket Mode. AI fallback chain: API → CLI → Ollama. Needs SLACK_BOT_TOKEN + SLACK_APP_TOKEN
com.john.mc-dashboard always Mission Control web dashboard (port 3030) — includes CEO Dashboard at /ceo, DocuSeal webhook at /webhooks/docuseal (auto-advances pipeline on NDA/contract signing)
com.john.mc-session-worker on session events Session state extraction
com.john.pipeline-watcher 60 sec Pipeline event dispatcher + invoice auto-reminder daemon — checks unsigned proposals, triggers invoice escalation (Day 7/14/30+ reminders)
com.john.event-dispatcher always Event bus dispatcher daemon — polls events.db every 2s, routes to handlers, retry with backoff, dead letter queue
com.john.outbox-processor always Outbox processor daemon — polls durable-runner.db + mission-control.db outbox tables every 2s, emits to event-bus, purges old events (7d+). MC #1760
com.john.ops-watchdog always Master watchdog — health checks every 120s, auto-recovery, Slack alerts, event bus. Config: ~/system/config/ops-watchdog.json
com.john.client-status-update Monday 08:00 Weekly client status update generator — queries MC for completed tasks, generates ALAI-branded email drafts per project
com.john.network-watchdog 60 sec Network monitoring daemon — ping gateway, DNS resolution check, internet connectivity check. Alert chain: Slack ops → macOS notification → log. 3 consecutive failures trigger alert with 10min cooldown. Tracks uptime stats.
com.john.vault-keeper always Vault auto-unlock daemon — auto-unlocks Vaultwarden using macOS Keychain password, session refresh every 15min, circuit breaker, macOS notifications
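The event-dispatcher row above describes a loop of poll, route to a handler, retry with backoff, and dead-letter the exhausted events. Below is a minimal Python sketch of one polling pass. An in-memory list stands in for events.db. The event shape, the handler registry, the 5-attempt limit and the 2s-base / 300s-cap backoff are illustrative assumptions, not the daemon's actual values.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    id: int
    type: str
    payload: dict
    attempts: int = 0
    next_attempt_at: float = 0.0  # epoch seconds; 0 = due immediately


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 300.0) -> float:
    """Exponential backoff: 2s, 4s, 8s, ... capped at `cap` seconds."""
    return min(cap, base * 2 ** (attempt - 1))


def dispatch_once(queue: list[Event], handlers: dict[str, Callable[[dict], None]],
                  dead_letters: list[Event], now: float, max_attempts: int = 5) -> list[int]:
    """One polling pass: run due events, reschedule failures, dead-letter exhausted ones."""
    handled = []
    for event in list(queue):  # iterate over a copy so items can be removed
        if event.next_attempt_at > now:
            continue  # still backing off
        handler = handlers.get(event.type)
        try:
            if handler is None:
                raise LookupError(f"no handler for {event.type}")
            handler(event.payload)
        except Exception:
            event.attempts += 1
            if event.attempts >= max_attempts:
                queue.remove(event)
                dead_letters.append(event)  # park it for manual inspection
            else:
                event.next_attempt_at = now + backoff_delay(event.attempts)
        else:
            queue.remove(event)
            handled.append(event.id)
    return handled
```

A daemon would call `dispatch_once` every 2 seconds with the current time. The sketch assumes event ids are unique, because `queue.remove` matches events by field equality.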

Ops Documentation: ~/system/ops/ — service catalog, dependency map, 15 runbooks, cold-start script, ops README.
Ops Dashboard: https://mc.alai.no/ops (status page), /api/ops/health (JSON), /api/ops/history (events)
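The network-watchdog alert rule in the daemons table above can be expressed as a small state machine: alert after 3 consecutive failures, then hold further alerts for a 10-minute cooldown. This is a hedged sketch. The class name is hypothetical, and resetting the failure counter only on a successful check is an assumption about the daemon's behavior.

```python
from typing import Optional


class FailureAlerter:
    """Fire an alert after `threshold` consecutive failures, at most once per cooldown."""

    def __init__(self, threshold: int = 3, cooldown_s: float = 600.0):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.consecutive = 0
        self.last_alert_at: Optional[float] = None

    def record(self, ok: bool, now: float) -> bool:
        """Record one check result; return True if an alert should be sent now."""
        if ok:
            self.consecutive = 0  # any success breaks the failure streak
            return False
        self.consecutive += 1
        if self.consecutive < self.threshold:
            return False
        if self.last_alert_at is not None and now - self.last_alert_at < self.cooldown_s:
            return False  # still inside the cooldown window
        self.last_alert_at = now
        return True


alerter = FailureAlerter()
# One check every 60 s: the third consecutive failure triggers the alert.
print([alerter.record(False, t) for t in (0, 60, 120)])  # [False, False, True]
```

With a 60-second check interval, a sustained outage produces at most one alert every 10 minutes rather than one per check.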

Env Vars (both profiles):

Boards (Planka — Kanban)

Tool URL Description
Planka https://boards.alai.no Kanban boards per project (Trello-like)
Planka local http://localhost:3100 Direct local access (use https://boards.alai.no for sharing)

Admin: john / BasicAS2026!
User: alem / Alem2026!
Password reset: node ~/system/tools/planka-admin.js reset-password <username> <new-pass>
Add user: node ~/system/tools/planka-admin.js add-user <email> <username> <name> <pass>
SMTP: Configured (send.one.com:465, john@alai.no) for notifications
Docker: ~/system/services/planka/docker-compose.yml
Projects: Wizard NUF, Ren Drom, Riad Basic, Drop Fintech, ALAI Internal, BasicAS Operations
Hosting: Azure Container Apps (boards.alai.no via Cloudflare DNS)

Setup & Backup

Tool Command Description
syslog.sh bash ~/system/tools/syslog.sh add "description" System changelog: logs changes for both agents
syslog.sh bash ~/system/tools/syslog.sh today Today's changelog entries
syslog.sh bash ~/system/tools/syslog.sh recent [N] Last N entries
setup-backup.sh bash ~/system/tools/setup-backup.sh "description" Backup setup files + changelog
sync-to-mini.sh bash ~/system/tools/sync-to-mini.sh [--execute] Sync GOTCHA to Mac Mini
daemon-manager.js node ~/system/daemons/daemon-manager.js list|start|stop|status Manage persistent background services
team-cleanup.sh bash ~/system/tools/team-cleanup.sh [--force] [--days N] Clean stale Agent Teams task/team dirs (default 7d)

Company Management

Tool Command Description
company.sh ~/system/tools/company.sh list|info|add Company registry management
company-worker.js node ~/system/tools/company-worker.js run|run-all|status|list|dry-run Autonomous work loop generator for pipeline companies. Generates MC tasks per company (Securion/Proveo/Proxima), posts to Slack/HiveMind, emits events. Config: ~/system/tools/config/company-worker-config.json
skill-resolver.js node ~/system/tools/skill-resolver.js resolve <skill-name> [--company X] Resolve skill path with company override. Priority: ~/companies/COMPANY/skills/SKILL/SKILL.md (if company set) → ~/.claude/skills/SKILL/SKILL.md (global fallback). Returns absolute path or exit 1. Performance: ~47ms.
tool-resolver.js node ~/system/tools/tool-resolver.js check <tool-name> [--company X] Check if tool allowed for company via tools.json config. Modes: whitelist (financial), blacklist (dev), inherit-all (orchestrators). Pattern matching: exact + glob (invoice-*.js). Returns ALLOWED|DENIED with reason on stderr. Performance: ~49ms.
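The three tool-resolver modes in the table above (whitelist, blacklist, inherit-all) can be sketched in Python. The `tools.json` schema is not documented here, so the `{"mode": ..., "tools": [...]}` shape and the `check_tool` name are hypothetical. `fnmatch.fnmatchcase` matches both exact names and globs like `invoice-*.js`.

```python
from fnmatch import fnmatchcase


def check_tool(tool: str, config: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for `tool` under one company's tools.json-style config."""
    mode = config.get("mode", "inherit-all")
    patterns = config.get("tools", [])
    # fnmatchcase covers exact names and globs such as invoice-*.js, case-sensitively.
    matched = next((p for p in patterns if fnmatchcase(tool, p)), None)
    if mode == "inherit-all":
        return True, "inherit-all: every tool allowed"
    if mode == "whitelist":
        return (True, f"whitelist match: {matched}") if matched else (False, "not in whitelist")
    if mode == "blacklist":
        return (False, f"blacklist match: {matched}") if matched else (True, "not in blacklist")
    raise ValueError(f"unknown mode: {mode}")


financial = {"mode": "whitelist", "tools": ["invoice-*.js", "contract-manager.js"]}
print(check_tool("invoice-generator.js", financial))  # (True, 'whitelist match: invoice-*.js')
print(check_tool("deploy.js", financial))             # (False, 'not in whitelist')
```

An unknown mode raises an error instead of silently allowing the tool. That fail-closed choice is deliberate for the sketch, and the real resolver may behave differently.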

Skills (Claude Code Slash Commands)

Command Description
/plan-with-team Creates plan with builder/validator teams
/build-plan Executes approved plan using TaskList
/code-review Systematic GOTCHA code review (security, quality, performance)
/debugging Systematic bug investigation and resolution
/security-audit OWASP Top 10 + config + infra security review
/design-system AI-powered design generator — multi-tool (v0.dev, Google Stitch, Figma Make, Codia AI). Prompt templates per tool. Brief → kickass design + code.
/figma-design Figma WebSocket bridge operations — populate design systems, create screens programmatically
/build Switch to Build Mode — bypass process hooks, launch AutoCoder, track sessions

Workflow: /plan-with-team "task" → plan → approval → /build-plan → execution
Build: /build <project_dir> → activate build mode → code freely → stop
Design: /design-system "brief" → AI tool selection → optimized prompts → Figma + code
Review: /code-review <file> or /security-audit <target>
Debug: /debugging "<bug description>"

Vector & Semantic Search

Tool Command Description
vector-db.js node ~/system/tools/vector-db.js help Hybrid Vector DB: SQLite + vector columns for semantic search. Reusable module.
vector-db.js (module) const { VectorDB } = require('./vector-db') Module API: createCollection(), insert(), search(), hybridSearch(), bulkInsert()
vector-db.js search node ~/system/tools/vector-db.js search <db> <collection> <query> Semantic search via Ollama nomic-embed-text (768-dim)
vector-db.js hybrid node ~/system/tools/vector-db.js hybrid <db> <col> <query> --where "cond" SQL filter + vector ranking combined
knowledge-base.js node ~/system/tools/knowledge-base.js add <url-or-file> [--tag t] KB: drop URL/file → chunk → vector store. Semantic search over all docs.
knowledge-base.js node ~/system/tools/knowledge-base.js search <query> [--tag t] Semantic search across knowledge base documents
humanizer.js echo "text" | node ~/system/tools/humanizer.js [--deep] Remove AI patterns from text. Quick (regex) or deep (Ollama rewrite). Module: require('./humanizer')
hourly-backup.sh bash ~/system/tools/hourly-backup.sh [--dry-run|--list] Hourly auto-commit to 'auto-backup' branch across all repos. LaunchAgent: com.john.hourly-backup.
db-backup.sh bash ~/system/tools/db-backup.sh [--list|--restore] Daily SQLite backup (14 DBs). sqlite3 .backup, tar.gz, 30-day rotation. LaunchAgent: com.john.db-backup (03:00).
cron-notify.sh bash ~/system/tools/cron-notify.sh "job" "OK|ERROR" "details" Post cron results to Slack #ops channel. Used by db-backup, hourly-backup.
memory-indexer.py python3 ~/system/tools/memory-indexer.py index|search|stats|test-embed Index ~/system/ MD files into knowledge.db (SQLite + Ollama nomic-embed-text, 768-dim, tag='memory-file')

Vector Pattern: Embeddings stored as BLOB (Float32Array) in SQLite. Cosine similarity computed in JS.
Model: nomic-embed-text (768-dim, local Ollama). Batch embedding supported (32/batch). Usage tracked via usage-tracker.js.
Unified model: ALL embedding tools use nomic-embed-text via Ollama — no model mismatch.
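The vector pattern above (float32 BLOBs in SQLite, cosine similarity computed in application code) can be reproduced in Python for inspecting or debugging a store. This is a sketch under assumptions. The `chunks` table and its columns are hypothetical, and the example uses toy 3-dimensional vectors instead of real 768-dim nomic-embed-text embeddings.

```python
import math
import sqlite3
import struct


def to_blob(vec: list[float]) -> bytes:
    """Pack as little-endian float32, the byte layout of a JS Float32Array on x86/ARM."""
    return struct.pack(f"<{len(vec)}f", *vec)


def from_blob(blob: bytes) -> list[float]:
    """Unpack a float32 BLOB back into Python floats (4 bytes per dimension)."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(conn: sqlite3.Connection, query: list[float], limit: int = 5):
    """Brute-force scan: score every stored chunk, return the top `limit` by cosine."""
    rows = conn.execute("SELECT id, text, embedding FROM chunks").fetchall()
    scored = [(cosine(query, from_blob(blob)), row_id, text) for row_id, text, blob in rows]
    return sorted(scored, reverse=True)[:limit]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, embedding BLOB)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?, ?)",
    [(1, "cold start", to_blob([1.0, 0.0, 0.0])), (2, "invoices", to_blob([0.0, 1.0, 0.0]))],
)
print(search(conn, [0.9, 0.1, 0.0], limit=1))
```

A 768-dim embedding occupies 3072 bytes per row. A full scan is fine at small scale but grows linearly with the number of chunks.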

RAG & Knowledge Flywheel

Tool Command Description
retrieval-orchestrator.js node ~/system/tools/retrieval-orchestrator.js query "text" [--limit N] [--verbose] Multi-store retrieval: HiveMind + Knowledge DB + RAG Cache + Sessions → RRF merge
retrieval-orchestrator.js node ~/system/tools/retrieval-orchestrator.js stats Store statistics (coverage, entry counts)
retrieval-orchestrator.js node ~/system/tools/retrieval-orchestrator.js stores List available stores and status
session-archiver.js node ~/system/tools/session-archiver.js stats Session file statistics (count, size, savings)
session-archiver.js node ~/system/tools/session-archiver.js archive [--dry-run] [--days 14] Strip raw transcripts from old sessions
session-archiver.js node ~/system/tools/session-archiver.js index [--limit N] Embed session summaries into knowledge DB
session-archiver.js node ~/system/tools/session-archiver.js cleanup [--dry-run] Archive + index (LaunchAgent runs daily 03:00)
docuseal-monitor.js node ~/system/tools/docuseal-monitor.js check Poll DocuSeal for new signings → Slack + email + HiveMind + contracts.db
docuseal-monitor.js node ~/system/tools/docuseal-monitor.js status Show recent DocuSeal submissions with signer status
docuseal-monitor.js node ~/system/tools/docuseal-monitor.js history All tracked signings from contracts.db
rag-health.js node ~/system/tools/rag-health.js Full RAG health check: Ollama, Knowledge DB, HiveMind, RAG Cache, Session Archiver, Orchestrator smoke
rag-health.js node ~/system/tools/rag-health.js --json JSON output (for ops-watchdog integration)
rag-health.js node ~/system/tools/rag-health.js --alert Exit 1 if any critical check fails (for cron/alerting)
rag-health.js node ~/system/tools/rag-health.js --smoke Run orchestrator smoke query only
lightrag.js node ~/system/tools/lightrag.js query "question" [--mode hybrid|local|global|naive] LightRAG REST client — semantic query, document upload, graph exploration, RAG cache sync via configured Azure/Cloud endpoint
lightrag.js node ~/system/tools/lightrag.js upload <file-or-dir> [--recursive] Upload documents to LightRAG knowledge graph
lightrag.js node ~/system/tools/lightrag.js explore [--entity "name"] [--limit N] Explore knowledge graph entities and relationships
lightrag.js node ~/system/tools/lightrag.js status Get LightRAG system status and statistics
lightrag.js node ~/system/tools/lightrag.js sync-from-rag Import rag-router cache → LightRAG
lightrag.js node ~/system/tools/lightrag.js sync-to-rag Export LightRAG results → rag-router cache
lightrag-migrate.js node ~/system/tools/lightrag-migrate.js start [--source hivemind|knowledge|both] [--rate 2] [--limit 1000] [--tier 1] [--type type1,type2] [--tag tag] [--dry-run] Daemon: migrate HiveMind + Knowledge DB to LightRAG (HTTP API). Idempotent, rate-limited (default 2 docs/min), resumable with state tracking.
lightrag-migrate.js node ~/system/tools/lightrag-migrate.js status Show migration progress (source, last_id, total_migrated, failed, rate)
lightrag-migrate.js node ~/system/tools/lightrag-migrate.js stop Stop running migration daemon (graceful SIGTERM + kill)
lightrag-migrate.js node ~/system/tools/lightrag-migrate.js reset Clear migration state file (/tmp/lightrag-migration-state.json)
rag-router.js node ~/system/tools/rag-router.js query "text" RAG intelligence router — embed, cache search, local model dispatch, interaction logging
rag-router.js node ~/system/tools/rag-router.js learn "question" "answer" Add Q&A pair to RAG cache
rag-router.js node ~/system/tools/rag-router.js stats Flywheel metrics (cache hit rate, cost savings)
rag-router.js node ~/system/tools/rag-router.js test Run self-test suite
rag-router.js node ~/system/tools/rag-router.js capture <id> "response" Capture external response for interaction, auto-index to cache
rag-router.js (module) const { RAGRouter } = require('./rag-router') Module API: query(), learn(), capture(), stats()
rag-mcp.js MCP server (stdio) RAG MCP server — exposes rag_query, rag_learn, rag_stats tools. Config: ~/.claude/mcp.json
MCP rag mcp__rag__rag_query Route query through RAG cache + local models. Returns response or needs_external flag
MCP rag mcp__rag__rag_learn Add Q&A pair to RAG cache with source tracking
MCP rag mcp__rag__rag_stats Flywheel metrics (cache hit rate, cost savings, training queue)
flywheel-extractor.js node ~/system/tools/flywheel-extractor.js extract [--output path] [--batch-name "X"] Extract external interactions from flywheel.db → JSONL for alaiML training
flywheel-extractor.js node ~/system/tools/flywheel-extractor.js stats Show training queue size, extraction batches
flywheel-indexer.js node ~/system/tools/flywheel-indexer.js index [--batch YYYYMMDD] [--dry-run] Sync high-quality external responses back to rag_cache (closes the loop)
flywheel-indexer.js node ~/system/tools/flywheel-indexer.js stats Show pending/cached/total counts
flywheel-session-extractor.js node ~/system/tools/flywheel-session-extractor.js extract [--dry-run] [--limit N] Extract Q&A pairs from Claude Code session transcripts → RAG cache
flywheel-session-extractor.js node ~/system/tools/flywheel-session-extractor.js stats Show extraction metrics (processed/pending sessions, pairs extracted)
flywheel-session-extractor.js node ~/system/tools/flywheel-session-extractor.js reprocess <session-id> Force re-extract a specific session
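The retrieval-orchestrator row in the table above merges results from several stores with RRF (Reciprocal Rank Fusion). Each document scores the sum of 1 / (k + rank) over every store that returned it. Below is a minimal sketch. The constant k = 60 is a common default and not necessarily the orchestrator's value. The sketch also assumes document ids are unique across stores, for example by prefixing them with the store name.

```python
from collections import defaultdict


def rrf_merge(ranked_lists: list[list[str]], k: int = 60,
              limit: int = 10) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(doc) = sum over stores of 1 / (k + rank), rank from 1."""
    scores: dict[str, float] = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order because sorted() is stable.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:limit]


hivemind = ["doc-a", "doc-b", "doc-c"]
knowledge = ["doc-b", "doc-d"]
sessions = ["doc-b", "doc-a"]
print(rrf_merge([hivemind, knowledge, sessions], limit=3))
```

RRF uses only ranks, not raw scores. That makes it a reasonable fit when stores return incomparable scores, such as a cosine similarity from one store and a keyword score from another.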

RAG Flywheel Architecture:

OSINT Investigation

Tool Command Description
investigate.js node ~/system/tools/investigate.js investigate --phone X --name Y --email Z --location W OSINT person lookup — spawns 4 parallel Claude subagents (phone, social, business, news) + synthesizer. SQLite backend with confidence scoring.
investigate.js node ~/system/tools/investigate.js show <id> Show investigation findings grouped by category
investigate.js node ~/system/tools/investigate.js list List all investigations
investigate.js node ~/system/tools/investigate.js report <id> Full formatted investigation report
investigate.js node ~/system/tools/investigate.js save-findings <id> <source> <json> Save agent findings (internal — used by orchestrator)
investigate.js node ~/system/tools/investigate.js complete <id> Mark investigation as complete

Architecture: 4 parallel investigator agents + 1 synthesizer:

  1. Phone Lookup — phone directories, carrier, business listings
  2. Social Media — LinkedIn, Facebook, Instagram, GitHub, Twitter/X
  3. Business Registry — BiH registar, OpenCorporates, Brønnøysund, court records
  4. News & Public — klix.ba, avaz.ba, nrk.no, Google News, academic records
  5. Synthesizer — deduplication, cross-reference, confidence upgrade, profile building

Confidence levels: verified (2+ sources), likely (1 reliable source), possible (indirect), unverified (uncertain)
Phone parser: Auto-detects BiH (06x→+387) and Norwegian (4x/9x→+47) numbers
DB: ~/system/databases/investigations.db
Created: 2026-02-21
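The phone-parser rule above (BiH 06x → +387, Norwegian 4x/9x → +47) can be sketched as a best-effort normalizer. This is not investigate.js's parser. The accepted lengths (9 or 10 digits for BiH mobiles, 8 digits for Norwegian mobiles) and the handling of `+` and `00` prefixes are assumptions.

```python
import re
from typing import Optional


def normalize_phone(raw: str) -> Optional[str]:
    """Best-effort E.164 normalization for BiH and Norwegian numbers; None if unrecognized."""
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        return "+" + digits                      # already international
    if digits.startswith("00"):
        return "+" + digits[2:]                  # 00 international prefix
    if digits.startswith("06") and len(digits) in (9, 10):
        return "+387" + digits[1:]               # BiH mobile: drop trunk 0, add +387
    if len(digits) == 8 and digits[0] in "49":
        return "+47" + digits                    # Norwegian mobile: 8 digits starting 4 or 9
    return None


for raw in ("061 234 567", "912 34 567", "+47 412 34 567", "12345"):
    print(raw, "->", normalize_phone(raw))
```

Returning None instead of guessing keeps unrecognized numbers out of automated lookups. They can be reviewed manually.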

Databases (~/system/databases/)

Database Description
investigations.db OSINT person investigations — use investigate.js
leads.db Sales pipeline / Lead CRM — use sales-pipeline.js
invoices.db Invoice tracking — use invoice-generator.js
contracts.db Contract lifecycle management — use contract-manager.js
documents.db Document storage & retention — use document-store.js
tickets.db Support tickets with SLA — use support-ticket.js
teams.db Cross-team coordination — use team-coordinator.js
strategy-tracker.db Strategic goals
alem-directives.db Alem's direct orders
projects.db Project lifecycle (phases, milestones, metrics)
hivemind.db Agent shared intelligence
facts.db Critical facts with event-sourced history — use facts.js
drafts.db Email draft approval workflow — use drafts.js
events.db Event bus store — use event-bus.js
flywheel.db RAG flywheel — interactions log + cache. Use rag-router.js
projects.json Routing registry — use route.js
company-registry.json Company information registry

Enforcement Hooks (~/.claude/hooks/)

Hook Matcher Description
security-guard.py .* (all tools) Blocks forbidden paths, dangerous commands, delete protection, business-critical doc enforcement
agent-protocol-enforcer.py Task CORE PROTOCOL enforcement for subagent spawning
gotcha-enforcer.py Write|Edit|NotebookEdit|Bash Boot flag + MC active task enforcement
gate-pre-commit.py Bash Pre-commit validation
hallucination-detector.py Write|Edit Phantom tools, phantom paths, wrong ports, phantom require/import detection
teammate-quality-gate.py TeammateIdle Quality gate for agent teammates — checks TODO/FIXME markers, syntax errors in recent files. Exit 2 = keep working

Global: All hooks apply to ALL agents (parent + subagents) via ~/.claude/settings.json.
ZAKON #1: AI without enforcement does not work. Hooks are deterministic enforcement.
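The following sketch shows the general shape of a deterministic guard like security-guard.py. It rests on our understanding of the Claude Code hook contract, which is an assumption to check against the official hooks documentation. Under that contract, the hook receives the tool call as JSON on stdin, including `tool_name` and `tool_input`, and exiting with code 2 blocks the call and returns stderr to the model. The deny-list patterns below are hypothetical and are not the real security-guard.py rules.

```python
import json
import re
import sys

# Hypothetical deny-list for illustration; not the real security-guard.py rules.
BLOCKED = [
    re.compile(r"\brm\s+-(?=\w*r)(?=\w*f)\w+\s+/(?:\s|$)"),   # rm -rf / (any flag order)
    re.compile(r"\bgit\s+push\b.*\s--force(?![-\w])"),         # force push, not --force-with-lease
]


def evaluate(payload: dict) -> tuple[int, str]:
    """Return (exit_code, reason). Exit code 2 asks Claude Code to block the tool call."""
    if payload.get("tool_name") != "Bash":
        return 0, ""
    command = payload.get("tool_input", {}).get("command", "")
    for pattern in BLOCKED:
        if pattern.search(command):
            return 2, f"Blocked by guard: command matches {pattern.pattern!r}"
    return 0, ""


def main() -> int:
    """Hook entry point: read the tool-call JSON from stdin, report the reason on stderr."""
    code, reason = evaluate(json.load(sys.stdin))
    if reason:
        print(reason, file=sys.stderr)
    return code
```

To use this as a hook script, end the file with `if __name__ == "__main__": sys.exit(main())` and register it under a `PreToolUse` matcher in ~/.claude/settings.json. Keeping the decision logic in `evaluate` lets you unit-test the rules without piping JSON through stdin.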

Design & Figma

Tool Command Description
figma-extract.js node ~/system/tools/figma-extract.js extract-tokens <file-key> Extract design tokens (colors, typography, effects) from Figma file
figma-extract.js node ~/system/tools/figma-extract.js extract-components <file-key> List components with metadata and variants
figma-extract.js node ~/system/tools/figma-extract.js frame-to-prompt <file-key> <node> Generate implementation prompt from Figma frame
figma-extract.js node ~/system/tools/figma-extract.js file-info <file-key> File metadata and pages
figma-to-react.js node ~/system/tools/figma-to-react.js <file-key> <node-id> --output Login.tsx Figma → React + Tailwind — generates production React TSX from Figma frame via REST API. Post-processing: Pass 1 token replacement (figma-token-map.json), Pass 2 component mapping (figma-component-map.json), Pass 3 icon resolution (Lucide). Flag: --no-post-process to skip.
figma-to-react.js node ~/system/tools/figma-to-react.js <file-key> <node-id> --component Name Custom component name (default: derived from frame name)
figma-to-react.js node ~/system/tools/figma-to-react.js <file-key> <node-id> Output to stdout (pipe to file or preview)
figma-validate.js node ~/system/tools/figma-validate.js compare <file-key> <node-id> <url> --output /tmp/validate/ Visual validation tool — compare built page vs Figma design via pixel diff. Exit: 0=PASS 1=FAIL 2=ERROR. Enforces ZAKON 0.1
figma-validate.js node ~/system/tools/figma-validate.js compare ... --threshold 0.05 --viewport 1920x1080 Custom threshold (default 0.1=10%) and viewport (default 375x812)
figma-token-sync.js node ~/system/tools/figma-token-sync.js <file-key> --output ./tokens/ --format all Figma Variables → Design Tokens — extracts Variables API → W3C DTCG JSON + Tailwind theme + CSS custom properties. Supports modes (light/dark).
figma-token-sync.js node ~/system/tools/figma-token-sync.js <file-key> --format tailwind --output ./tailwind-tokens.js Single format: tailwind, css, w3c, json, or all
figma-token-map.json ~/system/config/figma-token-map.json Hex color → Tailwind token lookup table for figma-to-react.js Pass 1 (token replacement). Source: Bilko tailwind.config.ts
figma-component-map.json ~/system/config/figma-component-map.json Figma component → shadcn/ui mapping + Lucide icon map for figma-to-react.js Pass 2-3 (component mapping, icon resolution)
figma-populate.js bun ~/system/tools/figma-populate.js <channel-id> Populate Figma with design tokens (colors, typography, spacing, radius, buttons) via WebSocket bridge
v0-generate.js node ~/system/tools/v0-generate.js generate "prompt" v0.dev Platform API wrapper — prompt → React+Tailwind code. Also generates optimized prompts for manual use.
v0-generate.js node ~/system/tools/v0-generate.js generate --brief Name --screen login --industry fintech --primary "#hex" Structured brief → optimized prompt
v0-generate.js node ~/system/tools/v0-generate.js prompt --brief Name --industry fintech Output prompt only (no API call) — for copy-paste into v0.dev or Google Stitch
v0-generate.js node ~/system/tools/v0-generate.js setup <api-key> Save v0.dev API key
design-to-code.js node ~/system/tools/design-to-code.js assemble --stitch-code <html> --assets-dir <dir> --target-page <tsx> Assemble Stitch HTML + Figma assets → Next.js TSX. Converts HTML→JSX, inline styles→Tailwind, integrates assets, optional logic preservation.
design-to-code.js node ~/system/tools/design-to-code.js assemble ... --preserve-logic Extract and keep business logic (useState, handlers) from existing page
MCP figma mcp__figma__* (native Claude tools) Figma MCP integration — direct Figma access from Claude
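The figma-validate.js rows in the table above describe a pixel diff against a threshold (default 0.1 = 10%), with exit codes 0 = PASS, 1 = FAIL, 2 = ERROR. The core comparison can be sketched on already-decoded pixel grids. The real tool works on screenshots and rendered Figma frames, and that decoding step is omitted here. Three details are assumptions: counting a pixel as changed only when a channel differs by more than `tol`, passing when the ratio is at or below the threshold, and treating a dimension mismatch as ERROR.

```python
Pixel = tuple[int, int, int]


def diff_ratio(a: list[list[Pixel]], b: list[list[Pixel]], tol: int = 0) -> float:
    """Fraction of pixels whose largest per-channel difference exceeds `tol`."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("screenshots must have identical dimensions")
    total = sum(len(row) for row in a)
    changed = sum(
        1
        for ra, rb in zip(a, b)
        for pa, pb in zip(ra, rb)
        if max(abs(x - y) for x, y in zip(pa, pb)) > tol
    )
    return changed / total if total else 0.0


def verdict(a: list[list[Pixel]], b: list[list[Pixel]], threshold: float = 0.1) -> int:
    """Map the diff to figma-validate-style exit codes: 0=PASS, 1=FAIL, 2=ERROR."""
    try:
        ratio = diff_ratio(a, b)
    except ValueError:
        return 2
    return 0 if ratio <= threshold else 1
```

A small `tol` absorbs anti-aliasing and color-profile noise, so the threshold measures layout drift rather than rendering jitter.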

Config: ~/system/config/figma.json or FIGMA_TOKEN env var
v0 Config: ~/system/config/v0.json or V0_API_KEY env var
File key: From Figma URL — figma.com/design/<FILE-KEY>/...
Node ID: From Figma URL (select frame, copy link) or use figma-extract.js list-nodes <file-key>
Figma bridge: WebSocket on port 3055 (bun). Channel ID from Figma Desktop → Plugins → Claude MCP Plugin.
External AI tools: v0.dev ($20/mo), Google Stitch (free: stitch.withgoogle.com), Figma Make (native), Codia AI (Figma plugin)
Design output: ~/system/design-output/
Created: 2026-02-12 (figma-extract), 2026-02-13 (figma-populate, v0-generate, /design-system skill), 2026-02-14 (figma-to-react, figma-validate, figma-token-sync)

Browser Form Filling

Tool Command Description
form-filler.py python ~/system/tools/form-filler.py <url> <fields.json> Fill web forms from JSON config — visible browser (Alem sees), CAPTCHA pause, screenshot
form-filler.py python ~/system/tools/form-filler.py <url> <fields.json> --headless --submit Headless auto-fill + submit
form-filler.py python ~/system/tools/form-filler.py <url> <fields.json> --wait-for-captcha --submit Fill, pause for CAPTCHA, submit
form-filler.py python ~/system/tools/form-filler.py <url> <fields.json> --screenshot /tmp/out.png Fill + screenshot
form-filler.py python ~/system/tools/form-filler.py <url> <fields.json> --dry-run Print fields without browser

Pre-built configs: ~/system/tools/form-configs/

JSON format: {"fields": [{"selector": "label=X", "value": "Y", "type": "text|select|checkbox|radio|date|click|file"}], "submit_selector": "button[type='submit']"}
Selectors: CSS (input[name='x']), text=, placeholder=, label=, role=, nth=N suffix
Requires: Python Playwright (pip install playwright, then playwright install to download the browser binaries)
Created: 2026-02-18
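The JSON format above can be generated and sanity-checked before it is handed to form-filler.py. This sketch validates field types against the list documented above. Three things are assumptions: the `make_config` helper name, defaulting a missing `type` to `text`, and the `"true"` value format for checkboxes.

```python
import json

# Field types accepted by form-filler.py, per the JSON format documented above.
ALLOWED_TYPES = {"text", "select", "checkbox", "radio", "date", "click", "file"}


def make_config(fields: list[dict], submit_selector: str = "button[type='submit']") -> dict:
    """Validate field entries and wrap them in the documented top-level shape."""
    for field in fields:
        if not field.get("selector"):
            raise ValueError(f"field is missing a selector: {field}")
        field_type = field.get("type", "text")  # defaulting to "text" is an assumption
        if field_type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported field type: {field_type}")
    return {"fields": fields, "submit_selector": submit_selector}


config = make_config([
    {"selector": "label=Full name", "value": "Ola Nordmann", "type": "text"},
    {"selector": "input[name='email']", "value": "ola@example.com", "type": "text"},
    {"selector": "label=I accept the terms", "value": "true", "type": "checkbox"},  # value format assumed
])
print(json.dumps(config, indent=2))
```

Save the output under ~/system/tools/form-configs/ and run form-filler.py with --dry-run first to confirm the fields before opening a browser.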

Archived (these do NOT exist; for reference only)

Tool Status Note
session-save.sh REMOVED (2026-02-07) Orphaned code, never hooked, conflicts with session-ledger.sh
memory-lookup.js REMOVED Replaced by HiveMind
memory-search.js REMOVED Replaced by HiveMind
mail.js NEVER EXISTED Hallucinated
mail-filter.js NEVER EXISTED Hallucinated
security.js NEVER EXISTED Hallucinated; the real enforcement is ~/.claude/hooks/
secure-config.js NEVER EXISTED Hallucinated
keychain-helper.js NEVER EXISTED Hallucinated
design-enforcer.js NEVER EXISTED Hallucinated
optimize-images.js NEVER EXISTED Hallucinated
strategy-tracker.js NEVER EXISTED Hallucinated
deploy-strategy-tracker.js NEVER EXISTED Hallucinated
prompt-tester.js NEVER EXISTED Hallucinated
self-improve.js NEVER EXISTED Hallucinated
send-to-edita.js NEVER EXISTED Hallucinated
generate-boot.js NEVER EXISTED Hallucinated
generate-today.js NEVER EXISTED Hallucinated
solution-finder.js NEVER EXISTED Hallucinated
docusign.js NEVER EXISTED Hallucinated
validator.js ARCHIVED (2026-02-06) Was orphaned — see ~/system/archive/
laws-enforcer.js ARCHIVED (2026-02-06) Was checker-only — see ~/system/archive/
email-smtp-imap-mcp DEPRECATED (2026-02-11) Community MCP server — unreliable, replaced by custom email-mcp-bridge.js
mcp-email-server (ai-zerolab) TESTED (2026-02-11) Python MCP — ClosedResourceError bug, not used

brand-package.js

Purpose: Generate brand package (guidelines, colors, typography) for company factory pipeline
Location: ~/system/tools/brand-package.js
Usage: node ~/system/tools/brand-package.js "ProjectName" --logo /path/to/logo.png [--colors "primary:#hex,secondary:#hex"] [--output /path/]
Dependencies: None (pure Node.js)
Output: Creates brand-guidelines.md, colors.json, typography.json
Features: Extracts colors from PNG logo, supports color overrides, generates complete brand identity
Created: 2026-02-09

Go-Live Runbook


Project: {{PROJECT_NAME}}
Version: {{VERSION}}
Date: {{DATE}}
Author: {{AUTHOR}}
Status: Draft | In Review | Approved
Reviewers: {{REVIEWERS}}

Document History

Version Date Author Changes
0.1 {{DATE}} {{AUTHOR}} Initial draft

1. Go-Live Overview

What: {{PROJECT_NAME}} v{{VERSION}} production launch
When: {{LAUNCH_DATE}} at {{LAUNCH_TIME}} {{TIMEZONE}}
Deployment window: {{WINDOW_START}} – {{WINDOW_END}} ({{WINDOW_DURATION}}h window)
Go-Live Type: {{TYPE}}

Incident Commander: {{IC}} (primary), {{IC_BACKUP}} (backup)
Technical Lead: {{TECH_LEAD}}
Communications Lead: {{COMMS_LEAD}}
War Room: {{WAR_ROOM_LINK}}
Status Page: {{STATUS_PAGE_URL}}


2. Pre-Launch Checklist

T-7 Days: Infrastructure Verification

Owner: {{INFRA_OWNER}} | Due: T-7 days


T-5 Days: DNS Configuration

Owner: {{DNS_OWNER}} | Due: T-5 days


T-5 Days: SSL Certificates

Owner: {{SSL_OWNER}} | Due: T-5 days


T-3 Days: CDN Configuration

Owner: {{CDN_OWNER}} | Due: T-3 days


T-3 Days: Database Migration

Owner: {{DB_OWNER}} | Due: T-3 days


T-2 Days: Feature Flags

Owner: {{FF_OWNER}} | Due: T-2 days


T-2 Days: Third-Party Integrations

Owner: {{INTEGRATION_OWNER}} | Due: T-2 days


T-1 Day: Monitoring & Alerting

Owner: {{MONITORING_OWNER}} | Due: T-1 day


T-1 Day: Backup Verification

Owner: {{BACKUP_OWNER}} | Due: T-1 day


T-1 Day: Legal & Compliance

Owner: {{LEGAL_OWNER}} | Due: T-1 day


T-0: Pre-Launch Final Checks (Within 2 Hours of Launch)


3. Launch Day Procedure (Hour by Hour)

H+0:00 → H+0:15: Deployment Start

Time Action Owner Status Notes
H+0:00 Announce in war room: "Deployment started" {{IC}}
H+0:00 Take final pre-deploy database backup {{DB_OWNER}}
H+0:05 Enable maintenance mode (if applicable) {{DEPLOY_OWNER}}
H+0:10 Trigger production deployment pipeline {{DEPLOY_OWNER}} Pipeline: {{PIPELINE_LINK}}
H+0:15 Monitor deployment progress {{TECH_LEAD}}

H+0:15 → H+0:45: Database Migration Execution

Time Action Owner Status
H+0:15 Confirm deployment artifact ready {{DEPLOY_OWNER}}
H+0:20 Run database migrations: bash scripts/migrate-prod.sh {{DB_OWNER}}
H+0:25 Verify migration completed: bash scripts/verify-migration.sh {{DB_OWNER}}
H+0:30 Confirm new application instances healthy {{TECH_LEAD}}
H+0:40 Deploy new application version to all instances {{DEPLOY_OWNER}}

H+0:45 → H+1:00: DNS Cutover

Time Action Owner Status
H+0:45 Point DNS to production load balancer {{DNS_OWNER}}
H+0:50 Monitor DNS propagation {{DNS_OWNER}}
H+0:55 Confirm HTTPS working from external network {{TECH_LEAD}}
H+1:00 Disable maintenance mode {{DEPLOY_OWNER}}

H+1:00 → H+1:30: Smoke Tests

Time Action Owner Status
H+1:00 Run automated smoke tests: bash scripts/smoke-tests.sh production {{QA_OWNER}}
H+1:10 Manual smoke test — critical user journey 1 {{QA_OWNER}}
H+1:15 Manual smoke test — critical user journey 2 {{QA_OWNER}}
H+1:20 Verify payment processing (test transaction) {{QA_OWNER}}
H+1:25 Verify email delivery (test email) {{QA_OWNER}}
H+1:30 All smoke tests PASS → proceed to monitoring {{IC}}

H+1:30 → H+2:00: Monitoring Verification

Time Action Owner Status
H+1:30 Verify error rate < {{ERROR_THRESHOLD}}% {{TECH_LEAD}}
H+1:35 Verify P99 latency < {{P99_THRESHOLD}}ms {{TECH_LEAD}}
H+1:40 Verify no unexpected spikes in DB CPU/connections {{DB_OWNER}}
H+1:50 Begin enabling feature flags (per rollout plan) {{FF_OWNER}}
H+2:00 Declare go-live successful {{IC}}

4. Post-Launch Monitoring (T+1 to T+7)

Enhanced Monitoring Period

Duration: {{POST_LAUNCH_MONITORING}}h enhanced monitoring
Monitoring cadence: every 30 min for the first 4h, then hourly until H+24, then standard monitoring

Period Check Frequency Responsible
H+0 to H+4 Every 30 min On-call engineer
H+4 to H+24 Every 60 min On-call engineer
Day 2-7 Standard monitoring On-call rotation

Metrics to watch during enhanced monitoring:

Support Escalation Procedures

Issue Type First Contact Escalation
User-facing errors Customer support → Engineering On-call engineer
Performance degradation On-call engineer Tech lead + Eng manager
Data issues On-call engineer DB owner + Engineering lead
Security concern Security contact → CISO Immediate escalation

Performance Baseline Comparison

Compare post-launch metrics to pre-launch staging baseline:

Metric Staging Baseline Production Actual Delta Status
P95 latency {{STG_P95}}ms TBD TBD TBD
Error rate {{STG_ERR}}% TBD TBD TBD
Throughput {{STG_RPS}} rps TBD TBD TBD
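The Delta and Status columns in the baseline comparison table above can be filled in mechanically. The template does not define what counts as a regression. This sketch therefore assumes a 10% tolerance, treats a rise in latency or error rate as worse, and treats a drop in throughput as worse. The function name is hypothetical.

```python
def compare(baseline: float, actual: float, higher_is_better: bool = False,
            tolerance_pct: float = 10.0) -> tuple[float, str]:
    """Return (delta %, status) for one row of the baseline comparison table."""
    if baseline == 0:
        raise ValueError("baseline is 0; compare absolute values instead")
    delta_pct = (actual - baseline) * 100.0 / baseline
    # A regression is a rise for latency/error rate, or a drop for throughput.
    regression_pct = -delta_pct if higher_is_better else delta_pct
    status = "OK" if regression_pct <= tolerance_pct else "REGRESSION"
    return round(delta_pct, 1), status


print(compare(200, 230))                        # P95 latency in ms -> (15.0, 'REGRESSION')
print(compare(100, 80, higher_is_better=True))  # throughput in rps -> (-20.0, 'REGRESSION')
```

A staging error rate of exactly 0% makes a percentage delta meaningless, which is why the sketch refuses a zero baseline rather than dividing by it.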

5. Rollback Triggers & Procedure

Rollback Decision Criteria

Automatic rollback triggers:

Manual rollback triggers (decision by {{ROLLBACK_AUTHORITY}}):

Rollback Procedure (Quick Reference)

  1. Announce in war room: "Initiating rollback"
  2. Update status page: "We are investigating an issue and may revert recent changes"
  3. Run: bash scripts/rollback.sh production (or trigger CI pipeline rollback)
  4. Monitor health checks — confirm previous version healthy
  5. If DB migration included: run down migration bash scripts/migrate-down.sh production
  6. Verify all smoke tests pass on previous version
  7. Update status page: "Issue resolved, system restored"
  8. Notify stakeholders

Full rollback procedure: See rollback-plan.md


6. Communication Plan

Pre-Launch Communications

Audience Channel When Message
Internal team Slack #launches T-3 days Launch schedule and plan
Customer support Briefing doc + Slack T-2 days Features, FAQ, escalation path
Existing users Email / in-app banner T-1 day "Exciting updates coming"
Status page subscribers Status page T-4 hours Scheduled maintenance notification

Launch Day Communications

Audience Channel When Message
Status page Status page T-0 "Scheduled deployment in progress"
Internal Slack #launches At success "🚀 {{PROJECT_NAME}} is live!"
Users Email / in-app H+1 after success Launch announcement
Status page Status page H+1 "Deployment complete — all systems normal"

7. Stakeholder Notification Timeline

Milestone Notify Channel Owner
Deployment started Engineering team Slack war room {{IC}}
Smoke tests pass Engineering + Product Slack {{IC}}
Go-live declared All stakeholders Email + Slack {{COMMS_LEAD}}
Rollback initiated All stakeholders + Management Immediate call + Slack {{IC}}


Approval

Role Name Date Signature
Author
Reviewer
Approver

Operational Runbook


Project: {{PROJECT_NAME}}
Version: {{VERSION}}
Date: {{DATE}}
Author: {{AUTHOR}}
Status: Draft | In Review | Approved
Reviewers: {{REVIEWERS}}

Document History

Version Date Author Changes
0.1 {{DATE}} {{AUTHOR}} Initial draft

1. Service Overview

Service: {{PROJECT_NAME}}
Purpose: {{SERVICE_PURPOSE}}
Technology stack: {{STACK}}
Architecture reference: Deployment Architecture

Service URLs:

Environment URL Health Check
Production {{PROD_URL}} {{PROD_URL}}/health
Staging {{STG_URL}} {{STG_URL}}/health

Key dashboards:


2. Common Operational Tasks

2.1 Service Restart Procedure

When to use: Application unresponsive, hanging workers, suspected deadlock

Steps:

Option A — Rolling restart (no downtime):

# AWS ECS
aws ecs update-service --cluster {{CLUSTER}} --service {{SERVICE}} --force-new-deployment

# Kubernetes
kubectl rollout restart deployment/{{DEPLOYMENT}} -n {{NAMESPACE}}

Option B — Emergency restart (brief downtime, use only if rolling restart fails):

# Stop all instances
{{STOP_COMMAND}}
# Wait for drain
sleep 30
# Start fresh
{{START_COMMAND}}

Verify:

# Check all instances healthy
{{HEALTH_CHECK_COMMAND}}
# Check for errors post-restart
{{LOG_CHECK_COMMAND}}

Expected restart time: {{RESTART_TIME}} minutes
Alert expected: Service restart will trigger deployment alert — acknowledge in PagerDuty
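The fixed `sleep 30` in Option B is a guess at drain time. An alternative is to retry the health check until it passes or an attempt limit is reached. The sketch below assumes the health check can run as a single command. `retry_until` and the example numbers are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds.
# Usage: retry_until <max_attempts> <delay_seconds> <command> [args...]
# Returns 0 on the first success, 1 if every attempt failed.
retry_until() {
  local max=$1 delay=$2 attempt
  shift 2
  for (( attempt = 1; attempt <= max; attempt++ )); do
    if "$@"; then
      return 0
    fi
    echo "attempt ${attempt}/${max} failed: $*" >&2
    # Do not sleep after the final failed attempt
    if (( attempt < max )); then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, `retry_until 20 5 curl -fsS -o /dev/null {{PROD_URL}}/health` gives the service roughly 100 seconds to come back. `curl -f` makes HTTP error responses count as failures.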


2.2 Log Retrieval & Analysis

Centralized logs: {{LOG_URL}}

Quick log retrieval:

# Last 100 error lines
{{LOG_TOOL}} --filter "level=error" --since "1h" --service {{SERVICE}}

# Logs for a specific user
{{LOG_TOOL}} --filter "user_id={{USER_ID}}" --since "24h"

# Logs for a specific request
{{LOG_TOOL}} --filter "request_id={{REQUEST_ID}}"

# Database slow query logs
{{DB_LOG_COMMAND}}

Log format reference: See Monitoring & Observability


2.3 Database Maintenance

Connection count check:

SELECT count(*) as connections, state FROM pg_stat_activity GROUP BY state;

Kill idle connections:

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
  AND state_change < now() - interval '5 minutes'
  AND pid <> pg_backend_pid();

Running queries (detect long-running):

SELECT pid, now() - query_start AS duration, query, state
FROM pg_stat_activity
WHERE now() - query_start > interval '1 minute'
  AND state <> 'idle'
ORDER BY duration DESC;

Vacuum / analyze (if table bloat suspected):

VACUUM ANALYZE {{TABLE_NAME}};

Check replication lag (run on the replica; returns NULL on the primary):

SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;

2.4 Cache Clearing / Warming

Clear all cache (use with caution — may spike DB load):

{{CACHE_FLUSH_COMMAND}}

Clear specific key pattern:

{{CACHE_DELETE_PATTERN_COMMAND}}

Check cache hit rate:

{{CACHE_STATS_COMMAND}}

Warm cache after clearing:

# Run cache warming script
bash scripts/warm-cache.sh {{ENVIRONMENT}}
# Or trigger warming job
{{WARM_CACHE_JOB_COMMAND}}

Expected DB load spike after cache clear: {{CACHE_CLEAR_IMPACT}} minutes of elevated load


2.5 Certificate Renewal

Automated renewal: Configured via {{CERT_TOOL}} (Let's Encrypt / ACM)
Auto-renewal trigger: 30 days before expiry

Manual renewal (if auto-renewal fails):

# Check expiry
echo | openssl s_client -connect {{DOMAIN}}:443 -servername {{DOMAIN}} 2>/dev/null | openssl x509 -noout -dates

# Manual renewal
{{CERT_RENEW_COMMAND}}

# Verify
{{CERT_VERIFY_COMMAND}}
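The expiry check can be turned into a pass/fail test for cron or alerting. `openssl x509 -checkend N` exits 0 if the certificate is still valid N seconds from now, and non-zero otherwise. The sketch below uses it. Function names are hypothetical, and the 30-day window in the usage line mirrors the auto-renewal trigger above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers built on `openssl x509 -checkend`.

# Succeeds if the PEM certificate in file $1 is still valid $2 days from now.
pem_valid_for() {
  local pem=$1 days=$2
  openssl x509 -in "$pem" -noout -checkend $(( days * 86400 )) >/dev/null
}

# Succeeds if the certificate served by host $1 on port 443 is valid $2 days from now.
# -servername sends SNI so proxies hosting many domains return the right certificate.
remote_cert_valid_for() {
  local host=$1 days=$2
  echo | openssl s_client -connect "${host}:443" -servername "$host" 2>/dev/null \
    | openssl x509 -noout -checkend $(( days * 86400 )) >/dev/null
}
```

Example: `remote_cert_valid_for {{DOMAIN}} 30 || echo "renew now"`.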

Verify renewal alert is working:


2.6 Scaling Up / Down

Scale up (increase capacity):

# AWS ECS
aws ecs update-service --cluster {{CLUSTER}} --service {{SERVICE}} --desired-count {{COUNT}}

# Kubernetes
kubectl scale deployment/{{DEPLOYMENT}} --replicas={{COUNT}} -n {{NAMESPACE}}

Verify scale-out:

# Check instance count
{{INSTANCE_COUNT_COMMAND}}
# Confirm health
{{HEALTH_CHECK_COMMAND}}

Scale down (reduce capacity — use cautiously):


3. Troubleshooting Playbooks

3.1 High CPU Usage

Symptoms: CPU alert fires, slow responses, possible OOM

  1. Identify the source:
    # Top processes by CPU
    {{CPU_TOP_COMMAND}}
    
  2. Check for: runaway loops, large queries being processed, missing cache causing recalculation
  3. Check for recently deployed code — did CPU spike after a deploy? → Consider rollback
  4. Check queue depth — backed-up job queue causes worker CPU spike
  5. If single instance: restart that instance ({{RESTART_SINGLE_COMMAND}})
  6. If all instances: scale up immediately, then investigate root cause
  7. Escalate if: CPU > {{CPU_ESCALATE}}% for > {{ESCALATE_DURATION}} min after scaling

3.2 Memory Leaks

Symptoms: Slowly increasing memory, eventual OOM kill / restart loop

  1. Check memory trend in monitoring dashboard — linear increase over hours = leak
  2. Identify the leak:
    • Enable heap dump: {{HEAP_DUMP_COMMAND}}
    • Profile with: {{PROFILER}}
  3. Short-term mitigation: Schedule rolling restarts every {{RESTART_INTERVAL}}h
    {{SCHEDULED_RESTART_COMMAND}}
    
  4. Create ticket with heap dump attached — requires developer investigation
  5. Escalate if: Restart cycle < {{MIN_RESTART_INTERVAL}}h (memory fills too fast)

3.3 Slow Database Queries

Symptoms: High P99 latency, DB CPU spike, timeouts in logs

  1. Find slow queries (requires the pg_stat_statements extension; column names as of PostgreSQL 13):
    SELECT query, calls, mean_exec_time, max_exec_time
    FROM pg_stat_statements
    ORDER BY mean_exec_time DESC
    LIMIT 20;
    
  2. Check for missing indexes: Look for sequential scans on large tables
  3. Check for blocking queries:
    SELECT blocking.pid AS blocking_pid, blocking.query AS blocking_query,
           blocked.pid AS blocked_pid, blocked.query AS blocked_query
    FROM pg_stat_activity blocked
    JOIN pg_stat_activity blocking ON blocking.pid = ANY(pg_blocking_pids(blocked.pid));
    
  4. Kill blocking query if safe:
    SELECT pg_cancel_backend({{PID}});
    -- If cancel doesn't work:
    SELECT pg_terminate_backend({{PID}});
    
  5. Create ticket — developer must optimize the query

3.4 Service Connectivity Issues

Symptoms: Connectivity errors between services, 502/503 errors

  1. Check health endpoints:
    curl -I {{SERVICE_URL}}/health
    
  2. Check network security groups / firewall rules — was anything changed recently?
  3. Check service discovery — DNS resolving correctly?
    nslookup {{SERVICE_INTERNAL_DNS}}
    
  4. Check if service is running:
    {{SERVICE_STATUS_COMMAND}}
    
  5. Check logs for connection errors:
    {{CONNECTIVITY_LOG_COMMAND}}
    

3.5 High Error Rates

Symptoms: Error rate alert, user complaints, 5xx in logs

  1. Identify error type: {{LOG_ERROR_COMMAND}} — what errors, what services, what endpoints?
  2. Check if correlated with: recent deployment, external service outage, traffic spike
  3. Check external service status pages:
    • {{SERVICE_1}} status: {{STATUS_PAGE_1}}
    • {{SERVICE_2}} status: {{STATUS_PAGE_2}}
  4. If recent deployment: Consider rollback if errors affecting > {{ROLLBACK_ERROR_THRESHOLD}}% of requests
  5. If external service down: Check circuit breaker status, enable fallback
  6. Escalate if: Error rate > {{ESCALATE_ERROR_RATE}}% for > {{ESCALATE_DURATION}} min

3.6 Disk Space Issues

Symptoms: Disk space alert, application errors writing files

  1. Check disk usage:
    df -h
    du -sh /var/log/* | sort -rh | head -10
    
  2. Quick wins:
    # Rotate and compress logs
    logrotate -f /etc/logrotate.conf
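    # Remove unused Docker images older than 24h (prompts for confirmation)
    docker image prune -a --filter "until=24h"
    # Delete regular files in /tmp not modified for more than 7 days
    find /tmp -type f -mtime +7 -delete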
    
  3. If database disk: Check for table bloat, dead tuples, WAL accumulation
    SELECT pg_size_pretty(pg_database_size('{{DB_NAME}}'));
    
  4. Escalate if: Disk > {{DISK_ESCALATE}}% and cannot free space quickly
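A disk check can feed alerting or a pre-maintenance gate. The sketch below reads the usage percentage of a mount point from POSIX `df -P` output and compares it to a threshold such as `{{DISK_ESCALATE}}`. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for disk-usage checks.

# Print the used percentage (integer, no % sign) of the filesystem holding $1.
disk_used_pct() {
  # -P forces the portable one-line-per-filesystem format; column 5 is Capacity (e.g. 42%)
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeeds if usage on $1 is at or below $2 percent; warns on stderr and fails otherwise.
disk_below() {
  local mount=$1 limit=$2 used
  used=$(disk_used_pct "$mount")
  if (( used > limit )); then
    echo "WARN: ${mount} at ${used}% (limit ${limit}%)" >&2
    return 1
  fi
}
```

Example: `disk_below / {{DISK_ESCALATE}} || echo "escalate"`.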

4. Health Check Endpoints

Endpoint Method Expected Response What It Checks
{{BASE_URL}}/health GET HTTP 200 {"status":"ok"} Application running
{{BASE_URL}}/health/ready GET HTTP 200 {"status":"ready"} App + DB + Cache connected
{{BASE_URL}}/health/live GET HTTP 200 {"status":"alive"} App process alive
{{BASE_URL}}/health/db GET HTTP 200 {"status":"ok","latency_ms":X} Database reachable
{{BASE_URL}}/health/cache GET HTTP 200 {"status":"ok"} Redis reachable

Health check from load balancer: {{HEALTH_CHECK_PATH}} every {{LB_INTERVAL}}s
Unhealthy threshold: {{UNHEALTHY_COUNT}} consecutive failures
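The sketch below probes every endpoint in the table in one pass and treats anything other than HTTP 200 as a failure, much as the load balancer does. `check_health_endpoints`, the hard-coded path list (copied from the table), and the 5-second timeout are assumptions, not existing tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: probe the health endpoints listed in the table.
# Usage: check_health_endpoints https://example.com
# Prints one "<code> <url>" line per endpoint; returns the number of failures (0 = all healthy).
check_health_endpoints() {
  local base=${1%/} path code failures=0
  for path in /health /health/ready /health/live /health/db /health/cache; do
    # http_code is 000 when the connection itself fails; || true keeps set -e callers alive
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "${base}${path}" || true)
    echo "${code} ${base}${path}"
    if [[ "$code" != "200" ]]; then
      failures=$(( failures + 1 ))
    fi
  done
  return "$failures"
}
```

Example: `check_health_endpoints {{BASE_URL}}`. A non-zero exit status is the count of failing endpoints.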


5. Alert Response Procedures

Alert Immediate Action Runbook Section
HighErrorRate Check logs, identify error type, assess scope 3.5 High Error Rates
SlowP99 Check DB slow queries, recent deploys 3.3 Slow DB Queries
ServiceDown Restart service, check logs 2.1 Service Restart
HighCPU Scale up, identify source 3.1 High CPU
DiskAlmostFull Clear logs/tmp, escalate if > 90% 3.6 Disk Space
DBReplicationLag Check replication, network, disk on replica 2.3 Database Maintenance
CertificateExpiring Trigger manual renewal 2.5 Certificate Renewal

6. Escalation Matrix

Situation First Contact Escalation Ultimate Escalation
Service down On-call engineer Tech lead Engineering manager
Data loss / corruption On-call + Tech lead CTO CTO
Security incident Security contact CISO CEO
Payment system down On-call + Payment owner Stripe/payment provider support Engineering manager

Emergency contacts:

Role Name Phone Slack
On-call (primary) {{PRIMARY}} {{PHONE}} {{SLACK}}
On-call (backup) {{BACKUP}} {{PHONE}} {{SLACK}}
Tech Lead {{TECH_LEAD}} {{PHONE}} {{SLACK}}
Engineering Manager {{ENG_MGR}} {{PHONE}} {{SLACK}}

7. On-Call Handoff Procedure

Handoff cadence: {{HANDOFF_CADENCE}}
Handoff time: {{HANDOFF_TIME}}

Outgoing on-call must document:

Handoff document template: {{HANDOFF_TEMPLATE_LINK}}


8. Maintenance Window Procedure

Maintenance window schedule: {{MAINTENANCE_WINDOW}} (lowest traffic period)

Pre-maintenance:

  1. Announce in Slack #ops: "Maintenance window {{DATE}} {{TIME}}-{{END_TIME}}"
  2. Update status page: "Scheduled maintenance" with details
  3. Notify impacted customers if downtime expected > {{DOWNTIME_NOTIFY_THRESHOLD}} minutes
  4. Confirm rollback plan is ready

During maintenance:

  1. Enable maintenance mode (if applicable): {{MAINTENANCE_MODE_CMD}}
  2. Execute maintenance tasks per the specific runbook for the task
  3. Run smoke tests after each major step
  4. Document every action taken with timestamps

Post-maintenance:

  1. Disable maintenance mode: {{DISABLE_MAINTENANCE_CMD}}
  2. Run full smoke test suite
  3. Monitor for 30 minutes
  4. Update status page: "Maintenance complete, all systems normal"
  5. Post-maintenance report in #ops Slack channel


Approval

Role Name Date Signature
Author
Reviewer
Approver

Incident Report

Incident Report

Project: {{PROJECT_NAME}} Version: {{VERSION}} Date: {{DATE}} Author: {{AUTHOR}} Status: Draft | In Review | Approved Reviewers: {{REVIEWERS}}

Document History

Version Date Author Changes
0.1 {{DATE}} {{AUTHOR}} Initial draft

1. Incident Metadata

Field Value
Incident ID INC-{{YYYY}}-{{SEQ}}
Severity P{{SEVERITY}}
Status {{STATUS}}
Incident Commander {{IC}}
Technical Lead {{TECH_LEAD}}
Communications Lead {{COMMS_LEAD}}
Declared at {{START_TIME}} {{TIMEZONE}}
Resolved at {{END_TIME}} {{TIMEZONE}}
Total duration {{DURATION}}
Affected service(s) {{SERVICES}}
Environment Production / Staging

2. Executive Summary

{{EXECUTIVE_SUMMARY}}

Example: "On {{DATE}}, a database connection pool exhaustion caused the {{SERVICE}} API to return 503 errors for approximately 47 minutes, affecting {{AFFECTED_COUNT}} users and resulting in an estimated {{REVENUE_IMPACT}} in lost transactions. The root cause was a code change in the v{{VERSION}} deployment that introduced N+1 queries under high load."


3. Detection

Detected by: {{DETECTION_METHOD}}
Detected at: {{DETECTION_TIME}}
Lag from start to detection: {{DETECTION_LAG}} minutes
Detecting system: {{DETECTING_SYSTEM}}

Alerting effectiveness:

Improvements to detection identified:


4. Detailed Timeline

Timezone: All times in {{TIMEZONE}}

Time Event Actor Notes
{{TIME}} {{EVENT_1}} {{ACTOR}}
{{TIME}} {{EVENT_2}} System Alert ID: {{ALERT_ID}}
{{TIME}} {{EVENT_3}} {{ENGINEER}}
{{TIME}} {{EVENT_4}} {{IC}}
{{TIME}} {{EVENT_5}} {{ENGINEER}}
{{TIME}} {{EVENT_6}} {{ENGINEER}}
{{TIME}} {{EVENT_7}} System
{{TIME}} {{EVENT_8}} {{IC}}

5. Impact Assessment

Users Affected

Metric Value
Total users affected {{USER_COUNT}}
% of total user base {{USER_PERCENT}}%
Geography affected {{GEOGRAPHY}}
User tier affected {{USER_TIER}}

Services Affected

Service Impact Type Severity Duration
{{SERVICE_1}} {{IMPACT_TYPE}} {{SEV}} {{DURATION}}
{{SERVICE_2}} {{IMPACT_TYPE}} {{SEV}} {{DURATION}}

Data Impact

Type Assessment
Data loss {{DATA_LOSS}}
Data corruption {{DATA_CORRUPTION}}
Data exposure {{DATA_EXPOSURE}}
Verification method {{VERIFICATION}}

Financial Impact

Category Amount Notes
Lost transactions ${{AMOUNT}} {{TRANSACTION_COUNT}} failed transactions
SLA credits ${{AMOUNT}} Per SLA contract
Operational cost ${{AMOUNT}} Engineering hours to resolve
Total estimated ${{TOTAL}}

SLA Breach Assessment

SLA Metric Target Actual Breach
Uptime {{UPTIME_SLA}}% {{ACTUAL_UPTIME}}% {{BREACH}}
Response time (P99) < {{P99_SLA}}ms {{P99_ACTUAL}}ms {{BREACH}}
MTTR < {{MTTR_SLA}} {{MTTR_ACTUAL}} {{BREACH}}

6. Root Cause Analysis

5 Whys

Why # Question Answer
Why 1 Why did users see errors? {{ANSWER_1}}
Why 2 Why was the API returning 503? {{ANSWER_2}}
Why 3 Why was the connection pool exhausted? {{ANSWER_3}}
Why 4 Why was the N+1 query introduced? {{ANSWER_4}}
Why 5 Why did code review miss it? {{ANSWER_5}}

Root cause: {{ROOT_CAUSE}}

Contributing Factors

  1. {{FACTOR_1}}
  2. {{FACTOR_2}}
  3. {{FACTOR_3}}

Trigger Event

What triggered this specific incident now: {{TRIGGER}}


7. Resolution Steps

Step Time Action Result
1 {{TIME}} {{ACTION_1}} {{RESULT_1}}
2 {{TIME}} {{ACTION_2}} {{RESULT_2}}
3 {{TIME}} {{ACTION_3}} {{RESULT_3}}

Resolution commands (for runbook):

# {{RESOLUTION_DESCRIPTION}}
{{RESOLUTION_COMMAND}}

8. What Went Well

  1. {{WENT_WELL_1}}
  2. {{WENT_WELL_2}}
  3. {{WENT_WELL_3}}

9. What Went Wrong

  1. {{WENT_WRONG_1}}
  2. {{WENT_WRONG_2}}
  3. {{WENT_WRONG_3}}

10. Action Items

# Action Owner Due Date Priority Status
1 {{ACTION_1}} {{OWNER}} {{DUE}} High Open
2 {{ACTION_2}} {{OWNER}} {{DUE}} High Open
3 {{ACTION_3}} {{OWNER}} {{DUE}} Medium Open
4 {{ACTION_4}} {{OWNER}} {{DUE}} High Open
5 {{ACTION_5}} {{OWNER}} {{DUE}} Low Open

11. Lessons Learned

  1. {{LESSON_1}}
  2. {{LESSON_2}}
  3. {{LESSON_3}}

12. Related Incidents

Incident ID Date Similarity Resolved
INC-{{ID}} {{DATE}} {{DESCRIPTION}} Yes / No

13. Communication Log

Time Channel Message Summary Audience Sent By
{{TIME}} Status page "Investigating reports of elevated errors" All users {{SENDER}}
{{TIME}} Status page "Identified root cause, applying fix" All users {{SENDER}}
{{TIME}} Status page "Incident resolved, all systems normal" All users {{SENDER}}
{{TIME}} Email Customer notification for SLA breach Affected customers {{SENDER}}


Approval

Role Name Date Signature
Author
Reviewer
Approver

Post-Mortem

Post-Mortem

Project: {{PROJECT_NAME}} Version: {{VERSION}} Date: {{DATE}} Author: {{AUTHOR}} Status: Draft | In Review | Approved Reviewers: {{REVIEWERS}}

Document History

Version Date Author Changes
0.1 {{DATE}} {{AUTHOR}} Initial draft

Blameless Culture Statement

This post-mortem is conducted in a blameless spirit. Our goal is to understand how and why the incident occurred — not to assign fault to individuals. People make the best decisions they can with the information and tools available at the time. When things go wrong, we look for systemic improvements that make the right action easier and the wrong action harder for everyone.


1. Incident Reference & Metadata

Field Value
Incident ID INC-{{YYYY}}-{{SEQ}}
Severity P{{SEVERITY}}
Incident Report INC-{{YYYY}}-{{SEQ}}
Post-Mortem Facilitator {{FACILITATOR}}
Post-Mortem Date {{PM_DATE}}
Attendees {{ATTENDEES}}
Status Draft / In Review / Final

2. Executive Summary

{{EXECUTIVE_SUMMARY}}

Example: "A database index was dropped during a migration on {{DATE}}, causing query performance to degrade by 50× under load. This resulted in a 1h 23min degraded service period affecting {{USERS}} users. We have restored the index, added migration validation tooling, and created safeguards to prevent similar incidents."


3. Impact Summary

Metric Value
Total duration {{DURATION}} (detected at {{DETECTED}}, resolved at {{RESOLVED}})
Users affected {{USER_COUNT}} ({{USER_PERCENT}}% of user base)
Requests affected {{REQUEST_COUNT}} ({{REQUEST_PERCENT}}% error rate during incident)
Estimated revenue impact ${{REVENUE}}
SLA breach {{SLA_BREACH}}
SLA credits owed ${{CREDITS}}

4. Detailed Timeline

timeline
    title Incident Timeline
    {{TIME_1}} : {{EVENT_1}}
    {{TIME_2}} : {{EVENT_2}}
    {{TIME_3}} : {{EVENT_3}}
    {{TIME_4}} : {{EVENT_4}}
    {{TIME_5}} : {{EVENT_5}}
Time Event MTTD/MTTR Marker
{{T1}} {{EVENT}} ← Incident start
{{T2}} {{EVENT}}
{{T3}} {{EVENT}} ← Detection (MTTD = T3 - T1)
{{T4}} {{EVENT}}
{{T5}} {{EVENT}}
{{T6}} {{EVENT}}
{{T7}} {{EVENT}}
{{T8}} {{EVENT}} ← Resolved (MTTR = T8 - T1)

MTTD (Mean Time to Detect): {{MTTD}} minutes
MTTR (Mean Time to Resolve): {{MTTR}} minutes
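MTTD and MTTR are plain differences between timeline rows (T3 - T1 and T8 - T1). The sketch below computes them from `HH:MM` timestamps. The function names are hypothetical. It assumes all times use the timezone stated in the timeline and that the incident crosses midnight at most once.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the MTTD/MTTR arithmetic in the timeline.

# Convert "HH:MM" to minutes since midnight (10# avoids treating "08" as octal).
hhmm_to_min() {
  local h=${1%%:*} m=${1##*:}
  echo $(( 10#$h * 60 + 10#$m ))
}

# elapsed_min <start HH:MM> <end HH:MM> -> whole minutes between them.
elapsed_min() {
  local start end diff
  start=$(hhmm_to_min "$1")
  end=$(hhmm_to_min "$2")
  diff=$(( end - start ))
  if (( diff < 0 )); then
    diff=$(( diff + 1440 ))   # wrapped past midnight
  fi
  echo "$diff"
}
```

Take hypothetical times T1 = 14:02 and T8 = 15:25. `elapsed_min 14:02 15:25` prints 83, an MTTR of 1h 23min.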


5. Root Cause Analysis

5.1 5 Whys Analysis

Why # Question Answer
Why 1 Why did users experience {{SYMPTOM}}? {{WHY_1}}
Why 2 Why did {{WHY_1_ANSWER}} happen? {{WHY_2}}
Why 3 Why did {{WHY_2_ANSWER}} happen? {{WHY_3}}
Why 4 Why did {{WHY_3_ANSWER}} happen? {{WHY_4}}
Why 5 Why did {{WHY_4_ANSWER}} happen? {{WHY_5}}

Root cause: {{ROOT_CAUSE}}

5.2 Contributing Factors

Factor Type Action Required
{{FACTOR_1}} Technical / Process / Human Yes / No
{{FACTOR_2}} Technical / Process / Human Yes / No
{{FACTOR_3}} Technical / Process / Human Yes / No

5.3 Trigger Event

The specific trigger for this incident: {{TRIGGER}}


6. What Went Well

  1. {{CATEGORY_1}}: {{DESCRIPTION}}
  2. {{CATEGORY_2}}: {{DESCRIPTION}}
  3. {{CATEGORY_3}}: {{DESCRIPTION}}

7. What Went Wrong

  1. {{CATEGORY_1}}: {{DESCRIPTION}}
  2. {{CATEGORY_2}}: {{DESCRIPTION}}
  3. {{CATEGORY_3}}: {{DESCRIPTION}}

8. Where We Got Lucky

  1. {{LUCKY_1}}
  2. {{LUCKY_2}}
  3. {{LUCKY_3}}

9. Action Items

Short-Term Fixes (This Sprint)

# Action Owner Due Priority Ticket
1 {{SHORT_TERM_1}} {{OWNER}} {{DATE}} Critical {{TICKET}}
2 {{SHORT_TERM_2}} {{OWNER}} {{DATE}} High {{TICKET}}
3 {{SHORT_TERM_3}} {{OWNER}} {{DATE}} Medium {{TICKET}}

Long-Term Improvements (Next Quarter)

# Action Owner Due Priority Ticket
1 {{LONG_TERM_1}} {{OWNER}} {{DATE}} High {{TICKET}}
2 {{LONG_TERM_2}} {{OWNER}} {{DATE}} Medium {{TICKET}}

Process Changes

# Change Owner Implementation Date
1 {{PROCESS_1}} {{OWNER}} {{DATE}}
2 {{PROCESS_2}} {{OWNER}} {{DATE}}

10. Follow-Up Tracking

Follow-up review date: {{FOLLOWUP_DATE}} (4 weeks after incident)
Follow-up owner: {{FOLLOWUP_OWNER}}

Action Item Expected Completion Verified Complete Effective
{{ACTION_1}} {{DATE}} Yes / No Yes / No / TBD
{{ACTION_2}} {{DATE}}

11. Recurrence Prevention

Before this incident: {{BEFORE_STATE}}

After implementing action items: {{AFTER_STATE}}

Confidence in prevention: {{CONFIDENCE}} / 10
Residual risk: {{RESIDUAL_RISK}}


12. Review & Sign-Off

Post-mortem presented at: {{MEETING}} on {{MEETING_DATE}}
Meeting recording: {{RECORDING_LINK}}
Meeting notes: {{NOTES_LINK}}



Approval

Role Name Date Signature
Author
Reviewer
Approver

SLA Report

SLA Report

Project: {{PROJECT_NAME}} Version: {{VERSION}} Date: {{DATE}} Author: {{AUTHOR}} Status: Draft | In Review | Approved Reviewers: {{REVIEWERS}}

Document History

Version Date Author Changes
0.1 {{DATE}} {{AUTHOR}} Initial draft

1. Reporting Period

Field Value
Period {{MONTH}} {{YEAR}}
From {{START_DATE}} 00:00:00 UTC
To {{END_DATE}} 23:59:59 UTC
Report Generated {{REPORT_DATE}}
Generated By {{AUTHOR}}

2. SLA Summary Table

Metric SLA Target Actual Status Notes
Availability (uptime) ≥ {{AVAIL_SLA}}% {{AVAIL_ACTUAL}}% ✅ Pass / ❌ Breach
P95 Response Time ≤ {{P95_SLA}}ms {{P95_ACTUAL}}ms ✅ Pass / ❌ Breach
P99 Response Time ≤ {{P99_SLA}}ms {{P99_ACTUAL}}ms ✅ Pass / ❌ Breach
Error Rate ≤ {{ERR_SLA}}% {{ERR_ACTUAL}}% ✅ Pass / ❌ Breach
MTTR (P1 incidents) ≤ {{MTTR_SLA}} {{MTTR_ACTUAL}} ✅ Pass / ❌ Breach
MTTD (alert detection) ≤ {{MTTD_SLA}} {{MTTD_ACTUAL}} ✅ Pass / ❌ Breach
Scheduled maintenance ≤ {{MAINT_SLA}}h/mo {{MAINT_ACTUAL}}h ✅ Pass / ❌ Breach

Overall SLA compliance this period: {{OVERALL_STATUS}}


3. Availability Report

3.1 Uptime Percentage

Service Total Minutes Downtime Minutes Uptime Minutes Uptime %
{{SERVICE_1}} {{TOTAL_MIN}} {{DOWN_MIN}} {{UP_MIN}} {{UP_PCT}}%
{{SERVICE_2}} {{TOTAL_MIN}} {{DOWN_MIN}} {{UP_MIN}} {{UP_PCT}}%
Aggregate {{AGG_UPTIME}}%

Note: Only unplanned downtime counts against SLA uptime calculations. See Section 3.3 for maintenance exclusions.
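Uptime % is (total minutes - unplanned downtime minutes) / total minutes × 100, with scheduled maintenance excluded from downtime. The sketch below implements that arithmetic plus the matching error budget. Function names are hypothetical. `awk` handles the division because bash arithmetic is integer-only.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the availability arithmetic in this section.

# Total minutes in a month with the given number of days.
month_minutes() {
  echo $(( $1 * 24 * 60 ))
}

# uptime_pct <total_minutes> <unplanned_downtime_minutes> -> percentage, 3 decimals
uptime_pct() {
  awk -v total="$1" -v down="$2" 'BEGIN { printf "%.3f\n", (total - down) / total * 100 }'
}

# error_budget_min <total_minutes> <sla_percent> -> allowed unplanned downtime in minutes, 1 decimal
error_budget_min() {
  awk -v total="$1" -v sla="$2" 'BEGIN { printf "%.1f\n", total * (100 - sla) / 100 }'
}
```

For example, a 30-day month has 43,200 minutes. At a 99.9% target that allows 43.2 minutes of unplanned downtime, and using all of it gives exactly 99.900% uptime.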

3.2 Downtime Incidents

Incident ID Start End Duration Service Cause SLA Counted
INC-{{ID}} {{START}} {{END}} {{DURATION}}min {{SERVICE}} {{CAUSE}} Yes / Excluded

Total unplanned downtime: {{TOTAL_DOWNTIME}} minutes
Downtime excluded (scheduled maintenance): {{EXCL_DOWNTIME}} minutes

3.3 Maintenance Windows

Date Duration Service Affected Pre-announced Purpose
{{DATE}} {{DURATION}}min {{SERVICE}} Yes ({{DAYS}} days advance notice) {{PURPOSE}}

4. Performance Report

4.1 Response Time

Service / Endpoint P50 P90 P95 P99 Max SLA (P95) Status
Overall {{P50}}ms {{P90}}ms {{P95}}ms {{P99}}ms {{MAX}}ms {{SLA}}ms ✅ / ❌
GET / {{P50}}ms {{P90}}ms {{P95}}ms {{P99}}ms {{MAX}}ms {{SLA}}ms ✅ / ❌
POST /api/{{RESOURCE}} {{P50}}ms {{P90}}ms {{P95}}ms {{P99}}ms {{MAX}}ms {{SLA}}ms ✅ / ❌

4.2 Throughput

Service Avg Requests/sec Peak Requests/sec Peak Time
{{SERVICE_1}} {{AVG_RPS}} {{PEAK_RPS}} {{PEAK_TIME}}

Total requests served this period: {{TOTAL_REQUESTS}}

4.3 Error Rate

Service Total Requests 4xx Errors 5xx Errors Error Rate SLA Status
{{SERVICE_1}} {{TOTAL}} {{4XX}} {{5XX}} {{ERR_RATE}}% ≤ {{ERR_SLA}}% ✅ / ❌

5. Incident Summary

5.1 Incidents by Severity

Severity Count Total Duration Avg MTTR
P1 (Critical) {{P1_COUNT}} {{P1_DURATION}} {{P1_MTTR}}
P2 (High) {{P2_COUNT}} {{P2_DURATION}} {{P2_MTTR}}
P3 (Medium) {{P3_COUNT}} {{P3_DURATION}} {{P3_MTTR}}
P4 (Low) {{P4_COUNT}} {{P4_DURATION}} {{P4_MTTR}}
Total {{TOTAL_COUNT}} {{TOTAL_DURATION}} {{AVG_MTTR}}

5.2 MTTR (Mean Time to Resolve)

Severity SLA Target This Period Last Period Trend
P1 ≤ {{P1_MTTR_SLA}} {{P1_MTTR_ACT}} {{P1_MTTR_PREV}} ↑ / ↓ / →
P2 ≤ {{P2_MTTR_SLA}} {{P2_MTTR_ACT}} {{P2_MTTR_PREV}} ↑ / ↓ / →

5.3 MTTD (Mean Time to Detect)

Period MTTD vs SLA Trend
This period {{MTTD_ACT}} {{MTTD_STATUS}} ↑ / ↓ / →
Last period {{MTTD_PREV}}

6. SLA Breach Analysis

{{#if SLA_BREACH}}

Breach Details

Breach # Metric SLA Actual Duration Customers Affected
1 {{METRIC}} {{SLA_TARGET}} {{ACTUAL}} {{BREACH_DURATION}} {{CUSTOMERS}}

Root Cause

{{BREACH_ROOT_CAUSE}}

Remediation

{{BREACH_REMEDIATION}}

Contractual Obligations

Customer Contract Reference Credit Due Notification Required Notification Sent
{{CUSTOMER}} {{CONTRACT_REF}} ${{CREDIT}} Yes {{DATE}}

{{else}}

No SLA breaches this period. All commitments met.

{{/if}}


7. Trend Analysis

Availability Trend (Last 6 Months)

Month Uptime % vs Target Incidents
{{MONTH_6}} {{PCT}}% {{STATUS}} {{COUNT}}
{{MONTH_5}} {{PCT}}% {{STATUS}} {{COUNT}}
{{MONTH_4}} {{PCT}}% {{STATUS}} {{COUNT}}
{{MONTH_3}} {{PCT}}% {{STATUS}} {{COUNT}}
{{MONTH_2}} {{PCT}}% {{STATUS}} {{COUNT}}
{{MONTH_1}} (This period) {{PCT}}% {{STATUS}} {{COUNT}}

P95 Latency Trend (Last 6 Months)

Month P95 (ms) vs SLA
{{MONTH_6}} {{P95}}ms ✅ / ❌
{{MONTH_5}} {{P95}}ms ✅ / ❌
{{MONTH_4}} {{P95}}ms ✅ / ❌
{{MONTH_3}} {{P95}}ms ✅ / ❌
{{MONTH_2}} {{P95}}ms ✅ / ❌
{{MONTH_1}} (This period) {{P95}}ms ✅ / ❌

8. Improvement Initiatives

Initiative Source Owner Target Date Status Expected Impact
{{INITIATIVE_1}} Post-mortem INC-{{ID}} {{OWNER}} {{DATE}} {{STATUS}} +{{IMPACT}}% availability
{{INITIATIVE_2}} Proactive {{OWNER}} {{DATE}} {{STATUS}} P99 < {{P99}} ms
{{INITIATIVE_3}} Customer feedback {{OWNER}} {{DATE}} {{STATUS}} Reduce MTTR by 30%

9. Customer Communication Summary

Date Type Recipients Subject Sent By
{{DATE}} Incident notification All customers {{SUBJECT}} {{SENDER}}
{{DATE}} SLA credit notice Affected customers {{SUBJECT}} {{SENDER}}
{{DATE}} Monthly SLA report Enterprise customers {{SUBJECT}} {{SENDER}}

10. Next Period Targets

Metric This Period Next Period Target Rationale
Availability {{AVAIL_ACT}}% {{AVAIL_NEXT}}% {{RATIONALE}}
P95 latency {{P95_ACT}}ms {{P95_NEXT}}ms {{RATIONALE}}
Error rate {{ERR_ACT}}% {{ERR_NEXT}}% {{RATIONALE}}
MTTR (P1) {{MTTR_ACT}} {{MTTR_NEXT}} {{RATIONALE}}


Approval

Role Name Date Signature
Author
Reviewer
Approver

Terminal & Tmux Shortcuts

Terminal & Tmux Shortcuts

Quick reference of shortcuts for everyday work in the terminal and tmux.


Tmux — Pane Navigation

Prefix: Ctrl+A (our custom config)

Shortcut Description
Ctrl+A o Switch to the next pane (cycles in order)
Ctrl+A ←/↑/↓/→ Switch to the pane in that direction
Ctrl+A q + number Show pane numbers; press a number to jump to that pane
Ctrl+A z Zoom the current pane to full screen (repeat to undo)
Ctrl+A x Close the current pane
Ctrl+A % Split the pane vertically (left/right)
Ctrl+A " Split the pane horizontally (top/bottom)

Tmux — Window Navigation

Shortcut Description
Ctrl+A n Next window
Ctrl+A p Previous window
Ctrl+A 0-9 Jump directly to a window by number
Ctrl+A c Create a new window
Ctrl+A , Rename the current window
Ctrl+A w List all windows (interactive picker)

Tmux — Session Management

Shortcut Description
Ctrl+A d Detach from the session (the session keeps running)
Ctrl+A s List sessions (switch between them)
Ctrl+A $ Rename the session
tmux ls List all sessions from the terminal
tmux a -t <name> Attach to a session
tmux new -s <name> Create a new session

Tmux — Copy Mode (Scroll)

Shortcut Description
Ctrl+A [ Enter copy/scroll mode
q Exit copy mode
↑/↓ or PgUp/PgDn Scroll
Space → select → Enter Copy text

Terminal — Readline Shortcuts

Shortcut Description
Ctrl+A Jump to the start of the line
Ctrl+E Jump to the end of the line
Ctrl+K Delete from the cursor to the end of the line
Ctrl+U Delete from the cursor to the start of the line
Ctrl+W Delete the previous word
Ctrl+R Search command history
Ctrl+L Clear the screen
Ctrl+C Interrupt the current command
Ctrl+D Exit (EOF)

Claude Code — Shortcuts

Shortcut Description
Enter Send message
Shift+Tab Accept edits
Esc Cancel / Interrupt
Ctrl+O Expand/collapse tool output
/help Help
/clear Clear context

Tip: On the Studio server the tmux prefix is Ctrl+A (not the default Ctrl+B). Config: ~/.tmux.conf

Baikal CalDAV Runbook

Service: Baikal CalDAV

Label: Docker container baikal + LaunchAgent com.john.calendar-bridge
Tier: P2 (Business)
Port: 5232 (local), calendar.basicconsulting.no (public via Cloudflare)

What It Does

Self-hosted CalDAV server for the "ALAI Business" calendar. Alem syncs it from iPhone/MacBook via the native Calendar app. The calendar-bridge.js daemon scans email every 5 minutes, detects meeting invites, forwards them to alem@alai.no, and creates the matching CalDAV events.

Architecture

Email (john@) → email-agent.js → calendar-bridge.js → Baikal CalDAV → Alem iPhone/Mac
                                       ↓
                               mail-native.js forward → alem@alai.no

Components

Component Location Type
Baikal server ~/system/services/baikal/docker-compose.yml Docker
calendar-bridge.js ~/system/tools/calendar-bridge.js Tool + Daemon
LaunchAgent ~/Library/LaunchAgents/com.john.calendar-bridge.plist Daemon (5min)
Cloudflare tunnel calendar.basicconsulting.no → localhost:5232 Tunnel
Credentials Vaultwarden → "Baikal CalDAV" Vault
Calendar "ALAI Business" (CalDAV user: alem) CalDAV
Data ~/system/services/baikal/data/ Persistent volume

Dependencies

Health Check

# Quick check
node ~/system/tools/calendar-bridge.js test

# Docker container
docker ps --filter name=baikal

# CalDAV endpoint
curl -s -o /dev/null -w "%{http_code}" http://localhost:5232/dav.php/

# Public URL (expect 401 = auth required = healthy)
curl -s -o /dev/null -w "%{http_code}" https://calendar.basicconsulting.no/dav.php/

# List events
node ~/system/tools/calendar-bridge.js list
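Each check above expects a different status code. The public URL should answer 401 when healthy, while 502 or 404 point at the container or the tunnel, as described under Common Failures & Fixes. The sketch below maps a status code to a verdict so the checks can be scripted or fed to a watchdog. `caldav_verdict` is hypothetical. Treating 200/207 as healthy is an assumption for authenticated requests.

```shell
#!/usr/bin/env bash
# Hypothetical helper: interpret an HTTP status code from the CalDAV checks above.
# Prints a verdict; exit status is 0 only for healthy codes.
caldav_verdict() {
  local code=$1
  case "$code" in
    401)         echo "healthy: auth required" ;;
    200|207)     echo "healthy: authenticated response" ;;
    000)         echo "down: no connection (container or tunnel unreachable)" ;;
    404)         echo "misrouted: check the Cloudflare tunnel config" ;;
    502|503|504) echo "backend down: check the baikal container" ;;
    *)           echo "unexpected status: ${code}" ;;
  esac
  [[ "$code" == 401 || "$code" == 200 || "$code" == 207 ]]
}
```

Example: `caldav_verdict "$(curl -s -o /dev/null -w "%{http_code}" https://calendar.basicconsulting.no/dav.php/)"`.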

Common Failures & Fixes

Failure 1: Baikal container down

Symptoms: calendar-bridge.js test fails, CalDAV 502/connection refused
Fix:

cd ~/system/services/baikal && docker compose up -d

Failure 2: Cloudflare tunnel not routing

Symptoms: Public URL returns 404 or timeout, local URL works fine
Fix:

# Check config includes calendar entry
grep calendar ~/.cloudflared/config.yml
# Restart tunnel
launchctl kickstart -k gui/$(id -u)/com.john.cloudflared

Failure 3: Calendar-bridge scan finds nothing

Symptoms: Meeting invites arrive but no events created, no forwards
Check:

# Check daemon is running
launchctl list | grep calendar-bridge
# Check logs
tail -50 ~/system/logs/calendar-bridge.log
# Check state file
cat ~/system/logs/calendar-bridge-state.json
# Manual scan with verbose
node ~/system/tools/calendar-bridge.js scan --verbose

Failure 4: Alem can't sync from iPhone

Symptoms: iPhone Calendar shows error, events not showing
Check:

  1. Verify credentials in Vault: node ~/system/tools/vault.js get "Baikal CalDAV"
  2. Test public CalDAV endpoint (should return 401, not 502/404)
  3. iPhone settings: Server = calendar.basicconsulting.no/dav.php/principals/alem

Failure 5: Authentication failure

Symptoms: 401 even with the correct password
Fix: The stored password digest may be out of sync. Re-hash it in the Baikal DB:

NEW_PASS=$(bw get password "Baikal CalDAV" --session "$(cat /tmp/bw-session)")
# printf '%s' stops any % in the password being read as a format directive; md5 is the macOS tool
DIGEST=$(printf '%s' "alem:BaikalDAV:$NEW_PASS" | md5)
docker exec baikal sqlite3 /var/www/baikal/Specific/db/db.sqlite \
  "UPDATE users SET digesta1='$DIGEST' WHERE username='alem';"
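The `digesta1` value is an HTTP Digest HA1 hash: the MD5 of `username:realm:password`. The `md5` command used above exists only on macOS. The sketch below is a portable version that uses `md5sum` when available, for example on a Linux host. The function names are hypothetical. The default realm `BaikalDAV` is taken from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for regenerating Baikal's digesta1 value.

# Print the hex MD5 of the exact bytes of $1 (no trailing newline is hashed).
md5_hex() {
  if command -v md5sum >/dev/null 2>&1; then
    printf '%s' "$1" | md5sum | awk '{ print $1 }'   # GNU/Linux
  else
    printf '%s' "$1" | md5 -q                          # macOS
  fi
}

# baikal_ha1 <username> <password> [realm] -> MD5("username:realm:password")
baikal_ha1() {
  local user=$1 pass=$2 realm=${3:-BaikalDAV}
  md5_hex "${user}:${realm}:${pass}"
}
```

Example: `DIGEST=$(baikal_ha1 alem "$NEW_PASS")`.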

Restart Procedure

# Restart Baikal
cd ~/system/services/baikal && docker compose restart

# Restart calendar-bridge daemon
launchctl kickstart -k gui/$(id -u)/com.john.calendar-bridge

Backup
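Below is a minimal sketch of a file-level backup. It assumes the persistent volume listed in the Components table (`~/system/services/baikal/data/`) is the only state that needs saving. Stopping the container first (`docker compose stop`) avoids copying the SQLite database mid-write. The function name and destination directory are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive a data directory into a timestamped tarball.
# Usage: backup_data_dir <source_dir> <dest_dir>   -> prints the archive path
backup_data_dir() {
  local src_dir=$1 dest_dir=$2 stamp archive
  stamp=$(date +%Y%m%d-%H%M%S)
  archive="${dest_dir}/$(basename "$src_dir")-${stamp}.tar.gz"
  mkdir -p "$dest_dir"
  # -C makes paths inside the archive relative (e.g. data/...), not absolute
  tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
  echo "$archive"
}
```

Example: `cd ~/system/services/baikal && docker compose stop && backup_data_dir ~/system/services/baikal/data/ ~/backups/baikal; docker compose start`.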

MC Task

Created: #3029 (Deploy), #3035 (Documentation + Watchdog)

ALAI Infrastructure Map & Ops Runbooks

ALAI Infrastructure Map & Ops Runbooks

Last updated: 2026-03-12 | Author: John (AI Director)

1. Infrastructure Overview

Azure VM — vm-alai-support

Property Value
IP 4.223.110.181
Region Sweden Central
Size Standard_B2als_v2 (2 vCPU, 4 GB RAM)
OS Ubuntu 22.04 LTS
SSH ssh -i ~/.ssh/azure_alai alai-admin@4.223.110.181
Resource Group rg-alai-support
Cost ~$35/mo (Founders Hub credits, expires 2026-11-15)
Compose /opt/alai/docker-compose.yml

ANVIL — Mac Studio M3 Max (Local)

Property Value
Role AI inference, product dev, agent orchestration
Services Ollama, Qdrant, Pi-Orchestrator, Telegram, Email, Tool-Shed
Tunnel Cloudflare Tunnel for lobby, api, mc, auth, track, ssh, vnc

2. Services on Azure VM (16 containers)

Service URL Container
BookStack (Wiki) docs.basicconsulting.no alai-bookstack-1
Documenso (e-Sign) sign.basicconsulting.no alai-documenso-1
Planka (Boards) boards.basicconsulting.no alai-planka-1
Vaultwarden vault.basicconsulting.no alai-vaultwarden-1
Baikal (CalDAV) calendar.basicconsulting.no alai-baikal-1
Grafana grafana.basicconsulting.no alai-grafana-1
Prometheus prometheus.basicconsulting.no alai-prometheus-1
Paperless-ngx archive.basicconsulting.no alai-paperless-1
Caddy (TLS proxy) (none) alai-caddy-1

3. ANVIL Daemons

Daemon LaunchAgent Script
Pi-Orchestrator com.john.pi-orchestrator ~/system/kernel/pi-orchestrator.js
Telegram Agent com.john.telegram-agent ~/system/tools/telegram-agent.js
Email Agent com.john.email-agent ~/system/daemons/email-agent.js
Vault Keeper com.john.vault-keeper ~/system/daemons/vault-keeper.js
Event Dispatcher com.john.event-dispatcher ~/system/daemons/event-dispatcher.js
Tool-Shed com.john.tool-shed ~/system/tools/tool-shed.js (:3050)

4. DNS — Cloudflare

Zone: basicconsulting.no | Zone ID: 4670dbd0acfeab4174ac0d4746d11ea0

Subdomain Target Proxy
docs, sign, boards, vault, calendar, grafana, prometheus, archive 4.223.110.181 (Azure VM) Orange cloud
lobby, lobby-api, api, drop-api, mc, auth, track, ssh, vnc Cloudflare Tunnel (ANVIL) Orange cloud

5. Runbooks

5.1 Azure VM Full Restart

az vm restart -g rg-alai-support -n vm-alai-support
ssh -i ~/.ssh/azure_alai alai-admin@4.223.110.181
# Then, on the VM:
cd /opt/alai && docker compose up -d
docker ps  # verify all 16 containers are Up

5.2 Single Service Recovery

ssh -i ~/.ssh/azure_alai alai-admin@4.223.110.181
cd /opt/alai && docker compose restart bookstack
docker logs alai-bookstack-1 --tail 50

5.3 TLS Certificate Issues

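Caddy renews certificates automatically. If renewal fails: temporarily disable the Cloudflare proxy (set the record to DNS only) for the affected subdomain, restart the caddy container, then re-enable the proxy.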

5.4 ANVIL Daemon Recovery

launchctl list | grep com.john
launchctl kickstart -k gui/$(id -u)/com.john.pi-orchestrator
tail -50 ~/system/logs/pi-orchestrator.log

5.5 Database Backup

docker exec alai-bookstack-db-1 mysqldump -u bookstack bookstack > bookstack.sql
docker exec alai-planka-db-1 pg_dump -U postgres planka > planka.sql
docker exec alai-documenso-db-1 pg_dump -U documenso documenso > documenso.sql
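Redirecting `docker exec` straight into a `.sql` file leaves a truncated or empty file behind if the dump fails partway. The sketch below is a safer wrapper: it compresses the dump to a temporary file and renames it into place only when the whole pipeline succeeds. `dump_to_file` and the output naming are hypothetical. Database credentials still depend on whatever the commands above rely on.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a dump command, gzip its output, keep the file only on success.
# Usage: dump_to_file <dest_dir> <name> <command> [args...]
#   e.g. dump_to_file ~/backups planka docker exec alai-planka-db-1 pg_dump -U postgres planka
dump_to_file() {
  local dest=$1 name=$2 out tmp
  shift 2
  mkdir -p "$dest"
  out="${dest}/${name}-$(date +%Y%m%d-%H%M%S).sql.gz"
  tmp="${out}.partial"
  # pipefail (scoped to the subshell) makes a failing dump fail the pipeline, not just gzip
  if ( set -o pipefail; "$@" | gzip > "$tmp" ); then
    mv "$tmp" "$out"
    echo "$out"
  else
    rm -f "$tmp"
    echo "dump failed: $*" >&2
    return 1
  fi
}
```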

5.6 Pi-Orchestrator Not Processing

curl http://localhost:8401/status
claude auth status
launchctl kickstart -k gui/$(id -u)/com.john.pi-orchestrator
node ~/system/tools/mc.js list --status open --limit 10

5.7 Email Agent Not Fetching

# Scope the TLS override to this one command instead of exporting it for the whole shell
NODE_TLS_REJECT_UNAUTHORIZED=0 node ~/system/daemons/email-agent.js --test
tail -20 ~/system/logs/email-agent.log

5.8 SSH IP Update

az network nsg rule update -g rg-alai-support --nsg-name nsg-alai-support \
  -n AllowSSH --source-address-prefixes "NEW_IP"
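A mistyped value in the NSG rule can lock SSH out entirely. The sketch below looks up the current public IPv4 address and validates it before updating the rule. The lookup service `https://api.ipify.org` is an assumption; any plain-text "what is my IP" endpoint would work. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the NSG update above.

# Succeeds if $1 is a dotted-quad IPv4 address with every octet in 0-255.
is_ipv4() {
  local ip=$1 octet
  [[ "$ip" =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$ ]] || return 1
  for octet in "${BASH_REMATCH[@]:1}"; do
    (( 10#$octet <= 255 )) || return 1
  done
}

# Look up this machine's public IP and point the AllowSSH rule at it.
update_ssh_rule_to_current_ip() {
  local ip
  ip=$(curl -s --max-time 10 https://api.ipify.org) || return 1
  if ! is_ipv4 "$ip"; then
    echo "refusing to update: '${ip}' is not an IPv4 address" >&2
    return 1
  fi
  az network nsg rule update -g rg-alai-support --nsg-name nsg-alai-support \
    -n AllowSSH --source-address-prefixes "$ip"
}
```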

6. Security

7. Monthly Cost

ItemCost
Azure VM (B2als_v2)~$35/mo
CloudflareFree
Total~$36/mo (Azure Founders Hub credits until Nov 2026)

System Map — Infrastructure & Services

ALAI System Map

Updated: 2026-03-16
Author: John (AI Director, AI-first OS)


☁️ Azure VM — Supporting Services (Production)

VM: vm-alai-support | Azure Founders Hub | Sweden Central
Specs: Standard_B2als_v2 — 2 vCPU / 4GB RAM / 30GB SSD | IP: 4.223.110.181
Compose: /opt/alai/docker-compose.yml

SSH port 22 is closed by the firewall; access is only through Caddy/Cloudflare.

Service | URL | Status
BookStack (wiki/docs) | https://docs.alai.no
Vaultwarden (passwords) | https://vault.basicconsulting.no
Documenso (e-sign) | https://sign.basicconsulting.no
Grafana (monitoring) | https://grafana.basicconsulting.no
Planka (kanban) | https://boards.basicconsulting.no
Baikal (CalDAV) | https://cal.basicconsulting.no | ❌ down
Prometheus | (internal, no public URL) | ?
Caddy | (reverse proxy for all of the above)

🖥️ ANVIL (MacBook Pro M3 Max): Local Dev

Docker containers (dev databases for products)

Container | Port | Project
lumiscare-postgres | 5432 | Lumiscare
lumiscare-redis | 6379 | Lumiscare
plock-db | 5434 | Plock
plock-redis | 6380 | Plock
backend-postgres | 5435 | (shared backend)
backend-redis | 6381 | (shared backend)
bilko-postgres | 5436 | Bilko
bilko-redis | 6382 | Bilko
drop-postgres | 5433 | Drop
lobby-postgres | 5437 | Lobby
qdrant | 6333-6334 | RAG vector search
sonarqube | 9000 | Code quality
bookstack (local) | 6875 | ⚠️ Dev/sync copy, prod=Azure
bookstack_db | 3306 | (local BookStack DB)

⚠️ These are DEV databases; production services run on Azure or with cloud providers.

Local services (non-Docker)

Service | Port | Details
Ollama ANVIL | 11434 | 10 models (qwen2.5-coder:32b, llama3.1:8b, llama-guard...)
N8N | 5678 | Workflow automation (local, via LaunchAgent)
MC Dashboard | (internal) | Mission Control web UI
Caddy Vault | (internal) | Secret proxy
Tender Dashboard | (internal) | Anbud (tender) tracking UI
Tool Shed | (internal) | Tool registry API

Ollama Models

Host | Models | Largest
ANVIL (localhost:11434) | 10 | qwen2.5-coder:32b (23GB), llama-guard3:8b
FORGE (10.0.0.2:11434) | 5 | deepseek-r1:70b (42GB), qwen3:32b (20GB)

⚙️ Active LaunchAgent Daemons (~33)

ALAI Kernel

agent-timeout-monitor · idle-learning-daemon · ram-monitor · task-router

John's Agents

browser-worker · caddy-vault · cloudflared · comms-agent · documenso-webhook · draft-sender · email-tracker · event-dispatcher · hook-daemon · intake-watcher · mc-dashboard · n8n · network-watchdog · ops-watchdog · outbox-processor · pi-orchestrator · pipeline-watcher · slack-bot · telegram-agent · tender-dashboard · tool-shed · vault-keeper · vault-proxy

Product Monitoring

drop.health-check


🗄️ Active SQLite Databases (~54): ~/system/databases/

Database | Purpose
mission-control.db (10MB) | All MC tasks (3847 done, 36 open)
hivemind.db (52MB) | Intel, knowledge, sessions, events
knowledge.db (187MB) | RAG knowledge base
flywheel.db (36MB) | RAG cache
events.db (11MB) | Event bus log
guardrails-audit.db (9.6MB) | AI safety audit
bee-index.db (3.4MB) | Code/file index
tenders.db (184KB) | Anbud/tender tracker
leads.db (224KB) | CRM leads
contacts.db (96KB) | CRM contacts
hivemind-archive.db (5.9MB) | HiveMind archive
email-inbox.db (164KB) | Email inbox
drafts.db (292KB) | Email drafts
routing-outcomes.db (64KB) | AI routing metrics
tool-audit.db (900KB) | Tool usage audit
bih-tenders.db (284KB) | BiH tender scraper
strategy-tracker.db (128KB) | Strategy/OKR
teams.db (40KB) | Teams
projects.db (40KB) | Projects
pipeline.db (56KB) | Sales pipeline
sprint-pipeline.db (32KB) | Sprint tracker
goals.db (44KB) | Goals
invoices.db (36KB) | Invoices
baikal-caldav.db (108KB) | Calendar (CalDAV backup)
+ ~30 smaller databases | contacts, emails, tickets, vcr, distill...

🌐 External Services

Service | Purpose
Anthropic API | Claude (claude-3-5-sonnet, claude-opus)
Fiken | Accounting, invoicing, payroll (NO)
Cloudflare | DNS, Tunnel, DDoS protection
Slack (basicconsulting) | Internal communication
Telegram | Notifications, bot
Dropbox | File sync
one.com | Email hosting (SMTP/IMAP)
GitHub | Code repos
Azure Founders Hub | VM hosting

🔧 Tools & Scripts — ~/system/tools/


📁 Key Directories

~/system/
  tools/          ← 1,310 JS/SH scripts
  databases/      ← ~54 active SQLite databases
  config/         ← JSON configs, daemon registry
  agents/         ← hivemind, agent definitions
  notes/          ← this file and other notes
  backups/        ← daily backup of every database
  services/       ← docker-compose per service

~/ALAI/
  products/       ← Drop, Bilko, Plock, Gotiva, Lobby, Lumiscare...
  internal/       ← configs, tools, docs
  legal/          ← contracts, compliance, templates

🚦 Mission Control Status (2026-03-16)

Status | Count
✅ done | 3,847
⏸️ paused | 664
🔴 blocked | 120
🔵 open | 36

ALAI Domain Migration — basicconsulting.no → alai.no


Context

The ALAI rebrand did not include migrating the support stack, so 11 subdomains remain on the legacy basicconsulting.no domain.

Current Live State (by Zone)

basicconsulting.no (Cloudflare zone 4670dbd0acfeab4174ac0d4746d11ea0)

alai.no (Cloudflare zone 3dc40d9c37fee79c4281f7e86870c0b5)

snowit.ba (AWS Route53 zone Z04121493CAJZ75TQUPIW)

Cloudflare Tunnel Config

Incident: sign.basicconsulting.no 404 (2026-04-18)

Symptom: DNS resolved to Cloudflare proxy but returned 404.

Root cause: the tunnel ingress routed sign.basicconsulting.no → localhost:3003, but cloudflared could not reach that backend.

Fix: Changed DNS from tunnel CNAME to direct A record → 4.223.110.181 (proxied).

Result: Documenso Sign In page now live.

Alem TODO

See Also

Created: 2026-04-19 | Source: ~/ALAI/products/Bilko/docs/runbooks/alai-support-stack-migration.md

AWS CLI Setup — john-deploy IAM


Credentials Location

~/.aws/credentials
[default]
aws_access_key_id = AKIAUXDEHCNUHFX472XL
aws_secret_access_key = (stored in Vault: "AWS CLI - john-deploy IAM")

IAM User Details

Permissions

Known permissions (unverified full list):

Validated Usage

Usage Pattern

# Export credentials as env vars
export AWS_ACCESS_KEY_ID=AKIAUXDEHCNUHFX472XL
export AWS_SECRET_ACCESS_KEY="(from Vault)"
export AWS_DEFAULT_REGION=eu-central-1

# Example: Route53 change
aws route53 change-resource-record-sets \
  --hosted-zone-id Z04121493CAJZ75TQUPIW \
  --change-batch file://change-batch.json
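The `change-batch.json` passed above uses the Route53 `ChangeResourceRecordSets` JSON shape. A minimal Node sketch that prints an UPSERT for a single A record; the record name, IP and TTL in the usage line are placeholders, not values from this runbook.

```javascript
// Sketch: build a Route53 change batch that UPSERTs one A record.
// Record name, IP and TTL in the usage line are placeholders (assumptions).
function buildChangeBatch(name, ip, ttl = 300) {
  return {
    Comment: `UPSERT ${name} -> ${ip}`,
    Changes: [
      {
        Action: "UPSERT", // creates the record, or replaces it if it already exists
        ResourceRecordSet: {
          Name: name,
          Type: "A",
          TTL: ttl,
          ResourceRecords: [{ Value: ip }],
        },
      },
    ],
  };
}

// Usage: node make-batch.js > change-batch.json
const batch = buildChangeBatch("example.snowit.ba.", "203.0.113.10");
console.log(JSON.stringify(batch, null, 2));
```

UPSERT is used so that re-running the same change is harmless; CREATE would fail if the record already exists.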

MCP Docker AWS Tool

Tool: mcp__MCP_DOCKER__call_aws

Note: This tool has its own configuration and reads credentials from environment variables, so it may not use the same credentials as the CLI.

Security Notes

See Also

Created: 2026-04-19 | Validated: 2026-04-14 + 2026-04-19

Slack alaiops Bot — Backend Architecture


Basic Info

Tokens Location

  1. Primary: macOS Keychain
    • slack-bot/slack-bot-token
    • slack-bot/slack-app-token
  2. Fallback 1: Bitwarden/Vault
  3. Fallback 2: Environment variables

Daemon

Backend Chain (via comms-responder.js)

Priority-based fallback system (lower number = higher priority, faster response):

  1. Groq (priority 5, ~100-500ms) — PRIMARY
    • Model: llama-3.1-8b-instant
    • Added: 2026-04-18
    • Requires: GROQ_API_KEY env var
    • Adapter: ~/system/tools/adapters/groq.js
  2. Claude API (priority 10, ~2s)
  3. Claude CLI (priority 20, ~20s)
  4. Ollama (priority 30, ~40s) — FALLBACK ONLY
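The chain above comes down to this: sort backends by priority, try each in turn, and fall through to the next on error. Here is a hedged JavaScript sketch. The backend objects and their `send` functions are hypothetical stand-ins, not the actual comms-responder.js interface.

```javascript
// Sketch of a priority-ordered fallback chain (lower number = tried first).
// Names/priorities mirror the list above; the send() functions are hypothetical stubs.
async function sendWithFallback(backends, prompt) {
  const ordered = [...backends].sort((a, b) => a.priority - b.priority);
  const errors = [];
  for (const backend of ordered) {
    try {
      const text = await backend.send(prompt);
      return { backend: backend.name, text };
    } catch (err) {
      // Record the failure and fall through to the next backend
      errors.push(`${backend.name}: ${err.message}`);
    }
  }
  throw new Error(`All backends failed: ${errors.join("; ")}`);
}

// Usage with stub backends: Groq fails (e.g. missing key), so Claude API answers.
const backends = [
  { name: "ollama", priority: 30, send: async () => "from ollama" },
  { name: "groq", priority: 5, send: async () => { throw new Error("no GROQ_API_KEY"); } },
  { name: "claude-api", priority: 10, send: async () => "from claude-api" },
];

sendWithFallback(backends, "ping").then((r) => console.log(r.backend)); // claude-api
```

Sorting a copy keeps the caller's list untouched, and collecting every error makes the final failure message show why each backend was skipped.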

Groq Adapter

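// Registered in ~/system/tools/adapters/index.js
const groq = require("./groq.js");

// Usage: CommonJS has no top-level await, so wrap the call in an async function
async function main() {
  const response = await groq.send("prompt", {
    model: "llama-3.1-8b-instant",
    temperature: 0.7,
    max_tokens: 512
  });
  console.log(response);
}

main().catch(console.error);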

Event Subscriptions

Status: Re-enabled 2026-04-18 after scope fix

Critical fix: The bot no longer requests admin scopes, which had caused an "Enterprise only" error. Admin scopes were removed from the user token; the 15 bot scopes were kept.

Active bot scopes (15):

Dead Pattern Warning

If bot stops responding, check logs first:

tail -100 ~/system/logs/slack-bot.log

Benign pattern (ignore): "Dedup: skipping" — message already processed

Error patterns (investigate):

Test Commands

# Send test message
node ~/system/tools/slack.js send general "Test from John"

# Read channel history
node ~/system/tools/slack.js read general 10

# Check bot status
pgrep -f slack-bot.js && echo "Running" || echo "Stopped"

See Also

Created: 2026-04-19 | Last updated: 2026-04-18 (Groq backend added)

Documenso Self-Hosted — sign.basicconsulting.no


Service Details

Admin Credentials

API Integration

Test cURL

curl -H "Authorization: api_xn907c9xczrteoba" \
  https://sign.basicconsulting.no/api/v1/documents

# Expected response:
{"documents":[],"totalPages":0}

Bilko Sign Integration

Documenso is used as the signing backend for Bilko (accounting SaaS).

GCP Secret Manager

bilko-api Environment Variables

DOCUMENSO_API_URL=https://sign.basicconsulting.no
DOCUMENSO_API_KEY=(from GCP Secret Manager)

Incident History

2026-04-18: 404 Error

Symptom: sign.basicconsulting.no returned 404 Not Found

Root cause: the Cloudflare Tunnel ingress routed to localhost:3003, but cloudflared could not reach that backend

Fix: Changed DNS from tunnel CNAME to direct A record → 4.223.110.181 (proxied)

Result: Documenso Sign In page now live

Maintenance

Backup API Tokens

Version Updates

# Check current version
curl -s https://sign.basicconsulting.no/api/health | jq .version

# Update (on Azure VM; Documenso is part of the /opt/alai compose stack, container alai-documenso-1)
ssh -i ~/.ssh/azure_alai alai-admin@4.223.110.181
cd /opt/alai
docker compose pull documenso
docker compose up -d documenso

Future Migration

Target: sign.alai.no (part of ALAI domain migration)

See Also

Created: 2026-04-19 | API token created: 2026-04-19 | Incident fixed: 2026-04-18

Azure Blob Offsite Backup Setup


Overview

Purpose: Offsite backup for ALAI system databases and git bundles
Region: North Europe (Dublin) — geographic separation from primary Sweden Central VM
Retention: 365 days with lifecycle policies (Hot → Cool → Archive → Delete)
Recovery Time Objective: 4 hours (manual restore)

Azure Resources

Resource Type | Name | Purpose
Resource Group | alai-backups-rg | Isolation boundary for backup storage
Storage Account | alaibackups0ebb | Blob storage (LRS, Standard tier)
Container | system-db-backups | SQLite databases (hivemind.db, mission-control.db, etc.)
Container | system-git-bundles | Git repository bundles
Service Principal | alai-backup-writer | Access scoped to this storage account only (Storage Blob Data Contributor: read, write and delete blobs)

Service Principal Setup

# Create service principal
az ad sp create-for-rbac --name alai-backup-writer --skip-assignment

# Assign Storage Blob Data Contributor to SA only (not subscription)
STORAGE_ID=$(az storage account show --name alaibackups0ebb --query id -o tsv)
az role assignment create \
  --assignee <service-principal-app-id> \
  --role "Storage Blob Data Contributor" \
  --scope "$STORAGE_ID"

# Store credentials in ~/system/config/azure-backup.env
cat > ~/system/config/azure-backup.env <

Lifecycle Policy

Hot → Cool: 30 days
Cool → Archive: 90 days
Archive → Delete: 365 days
Delete blobs: Last modified > 365 days

az storage account management-policy create \
  --account-name alaibackups0ebb \
  --resource-group alai-backups-rg \
  --policy @lifecycle-policy.json

lifecycle-policy.json:

{
  "rules": [
    {
      "enabled": true,
      "name": "archive-old-backups",
      "type": "Lifecycle",
      "definition": {
        "actions": {
          "baseBlob": {
            "tierToCool": {"daysAfterModificationGreaterThan": 30},
            "tierToArchive": {"daysAfterModificationGreaterThan": 90},
            "delete": {"daysAfterModificationGreaterThan": 365}
          }
        },
        "filters": {"blobTypes": ["blockBlob"]}
      }
    }
  ]
}
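The rule compares blob age against three strict thresholds (`daysAfterModificationGreaterThan`). A small sketch for reasoning about where a backup of a given age should sit; it mirrors the policy's numbers but is not an Azure API. Azure evaluates lifecycle policies periodically (roughly daily), so actual transitions can lag these thresholds.

```javascript
// Sketch: which tier the lifecycle rule above targets for a block blob,
// given days since last modification. Thresholds are strict "greater than".
function lifecycleTier(daysSinceModified) {
  if (daysSinceModified > 365) return "deleted";
  if (daysSinceModified > 90) return "Archive";
  if (daysSinceModified > 30) return "Cool";
  return "Hot";
}

for (const days of [10, 30, 31, 120, 400]) {
  console.log(days, lifecycleTier(days));
}
// 10 Hot, 30 Hot, 31 Cool, 120 Archive, 400 deleted
```

Because the comparison is strict, a blob that is exactly 30 days old is still Hot.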

Backup Scripts

LightRAG to Azure Blob

#!/bin/bash
# ~/system/tools/migrate-lightrag-to-azure.sh
set -eo pipefail  # stop at the first failed step instead of uploading a partial archive

source ~/system/config/azure-backup.env
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
BACKUP_FILE="/tmp/lightrag-backup-$TIMESTAMP.tar.gz"

# Archive relative to ~/system so the restore (tar -xzf ... -C ~/system/) recreates ~/system/lightrag
tar -czf "$BACKUP_FILE" -C ~/system lightrag
az storage blob upload \
  --account-name alaibackups0ebb \
  --container-name system-db-backups \
  --name "lightrag-$TIMESTAMP.tar.gz" \
  --file "$BACKUP_FILE" \
  --auth-mode login

rm "$BACKUP_FILE"

Ollama Models Export

#!/bin/bash
# ~/system/tools/ollama-models-export.sh --azure
# Exports each model's Modelfile (metadata and parameters), not the model weights.
set -eo pipefail

source ~/system/config/azure-backup.env
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
EXPORT_DIR="/tmp/ollama-export-$TIMESTAMP"

mkdir -p "$EXPORT_DIR"
ollama list | tail -n +2 | awk '{print $1}' > "$EXPORT_DIR/model-list.txt"

while read -r model; do
  # Names such as "namespace/model:tag" contain "/", which cannot appear in a file name
  ollama show "$model" --modelfile > "$EXPORT_DIR/${model//\//_}.modelfile"
done < "$EXPORT_DIR/model-list.txt"

# -C /tmp stores the directory under its own name instead of tmp/ollama-export-...
tar -czf "$EXPORT_DIR.tar.gz" -C /tmp "$(basename "$EXPORT_DIR")"
az storage blob upload \
  --account-name alaibackups0ebb \
  --container-name system-db-backups \
  --name "ollama-models-$TIMESTAMP.tar.gz" \
  --file "$EXPORT_DIR.tar.gz"

rm -rf "$EXPORT_DIR" "$EXPORT_DIR.tar.gz"

Disaster Recovery Path

  1. List available backups:
az storage blob list \
  --account-name alaibackups0ebb \
  --container-name system-db-backups \
  --output table
  2. Download the latest backup (replace the blob name with the newest one from step 1):
az storage blob download \
  --account-name alaibackups0ebb \
  --container-name system-db-backups \
  --name "lightrag-20260420-143000.tar.gz" \
  --file /tmp/restore-lightrag.tar.gz
  3. Verify the SHA-256 checksum:
shasum -a 256 /tmp/restore-lightrag.tar.gz
  4. Restore to the target system:
tar -xzf /tmp/restore-lightrag.tar.gz -C ~/system/
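The checksum step only prints a digest. It is only meaningful when compared against a value recorded at backup time, and the upload scripts above do not store one. Here is a Node sketch of that comparison. It assumes the expected digest is available from somewhere, for example a hypothetical `.sha256` sidecar uploaded next to each archive.

```javascript
// Sketch: stream a file through SHA-256 and compare with an expected hex digest.
// Where the expected digest comes from (e.g. a .sha256 sidecar) is an assumption.
const crypto = require("crypto");
const fs = require("fs");

function sha256File(path) {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash("sha256");
    fs.createReadStream(path)
      .on("error", reject)
      .on("data", (chunk) => hash.update(chunk))
      .on("end", () => resolve(hash.digest("hex")));
  });
}

async function verifyBackup(path, expectedHex) {
  const actual = await sha256File(path);
  return actual === expectedHex.trim().toLowerCase();
}

// Usage: node verify.js /tmp/restore-lightrag.tar.gz <expected-hex>
if (process.argv.length >= 4) {
  verifyBackup(process.argv[2], process.argv[3]).then((ok) => {
    console.log(ok ? "checksum OK" : "CHECKSUM MISMATCH");
    process.exitCode = ok ? 0 : 1;
  });
}
```

Streaming keeps memory flat even for multi-GB archives, which matters on ANVIL given the memory incidents documented below.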

Monitoring

node ~/system/agents/hivemind/hivemind.js post john alert \
  "Azure backup failed 2 consecutive runs — check ~/system/logs/azure-backup.log"

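The alert condition above (two consecutive failed runs) can be written as a check over the most recent run results. This is a sketch only. How results are read from `~/system/logs/azure-backup.log` is not documented here, so the input is assumed to be an array of booleans, oldest first, where `true` means the run succeeded.

```javascript
// Sketch: decide whether to alert after N consecutive failed backup runs.
// Input format (array of booleans, oldest first, true = success) is an assumption.
function trailingFailures(results) {
  let count = 0;
  for (let i = results.length - 1; i >= 0 && !results[i]; i--) count++;
  return count;
}

function shouldAlert(results, threshold = 2) {
  return trailingFailures(results) >= threshold;
}

console.log(shouldAlert([true, false, true, false])); // false (only 1 trailing failure)
console.log(shouldAlert([true, true, false, false])); // true
```

When `shouldAlert` returns true, the hivemind.js post command above is the alert to send.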
ANVIL Memory Troubleshooting — Mac Studio

ANVIL Memory Troubleshooting — Mac Studio (M2 Ultra 192GB)

Incident Summary

Date: 2026-04-20
Symptom: System freezes, Chrome/Claude unresponsive, OOM kernel panics
Root Cause: Zombie Ollama runner processes + duplicate launchd agents + runaway grep processes
Resolution: Ollama config tuning, duplicate agent removal, zombie cleanup daemon, Ollama 0.21.0 upgrade

Root Causes

  1. Ollama zombie runners: ollama ps reports 0 models loaded, but pgrep -fl ollama_llama_server shows 4-6 GB processes still resident
  2. Duplicate launchd agents: Both com.alai.ollama-serve.plist and com.alai.ollama-serve-v2.plist running simultaneously → 2x Ollama daemons
  3. grep memory leak: grep -rn commands on large codebases hang and consume 8+ GB RAM each
  4. Preload warmup bloat: com.john.ollama-warmup.plist loading 3 models on boot → 48 GB baseline before any work

Permanent Fix — Ollama Config

File: ~/Library/LaunchAgents/com.alai.ollama-serve-v2.plist

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.alai.ollama-serve-v2</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/ollama</string>
    <string>serve</string>
  </array>
  <key>EnvironmentVariables</key>
  <dict>
    <key>OLLAMA_HOST</key>
    <string>0.0.0.0:11434</string>
    <key>OLLAMA_KEEP_ALIVE</key>
    <string>60s</string>
    <key>OLLAMA_MAX_LOADED_MODELS</key>
    <string>1</string>
    <key>OLLAMA_NUM_PARALLEL</key>
    <string>1</string>
  </dict>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/tmp/ollama-serve.log</string>
  <key>StandardErrorPath</key>
  <string>/tmp/ollama-serve-error.log</string>
</dict>
</plist>

Key parameters:

Zombie Cleanup Daemon

File: ~/Library/LaunchAgents/com.alai.zombie-cleanup.plist

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.alai.zombie-cleanup</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/bash</string>
    <string>/Users/makinja/system/tools/zombie-proc-cleanup.sh</string>
  </array>
  <key>StartInterval</key>
  <integer>3600</integer>
  <key>StandardOutPath</key>
  <string>/tmp/zombie-cleanup.log</string>
</dict>
</plist>

Script: ~/system/tools/zombie-proc-cleanup.sh

#!/bin/bash
# Kill zombie Ollama runners (no parent process, or parent is not an ollama process)
pgrep -fl ollama_llama_server | while read -r pid rest; do
  parent=$(ps -o ppid= -p "$pid" | xargs)
  if [[ -z "$parent" ]] || ! ps -p "$parent" | grep -q ollama; then
    echo "$(date): Killing zombie Ollama runner $pid"
    kill -9 "$pid"
  fi
done

# Kill grep -rn processes running longer than 5 minutes (likely hung).
# etime is [[dd-]hh:]mm:ss, so convert every field to seconds before comparing;
# '[g]rep' stops this pipeline's own grep from matching itself.
ps -eo pid,etime,command | grep '[g]rep -rn' | while read -r pid etime rest; do
  seconds=$(echo "$etime" | awk -F'[-:]' '{ if (NF == 4) print $1*86400 + $2*3600 + $3*60 + $4; else if (NF == 3) print $1*3600 + $2*60 + $3; else print $1*60 + $2 }')
  if (( seconds > 300 )); then
    echo "$(date): Killing hung grep process $pid (runtime: $etime)"
    kill -9 "$pid"
  fi
done
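`ps` prints `etime` as `[[dd-]hh:]mm:ss`, so the number of fields depends on how long the process has been running. Any threshold check, such as the 5-minute cutoff above, has to handle all three shapes. Here is a JavaScript sketch of the conversion to seconds, handy for testing threshold logic away from the shell.

```javascript
// Sketch: convert ps "etime" ([[dd-]hh:]mm:ss) to total seconds.
function etimeToSeconds(etime) {
  const t = etime.trim();
  const [daysPart, clock] = t.includes("-") ? t.split("-") : ["0", t];
  const parts = clock.split(":").map(Number);
  while (parts.length < 3) parts.unshift(0); // pad missing hours (and minutes)
  const [h, m, s] = parts;
  return Number(daysPart) * 86400 + h * 3600 + m * 60 + s;
}

console.log(etimeToSeconds("05:30"));      // 330
console.log(etimeToSeconds("01:02:03"));   // 3723
console.log(etimeToSeconds("2-00:00:10")); // 172810
```

Note that a naive `mm*60 + ss` on the two-field form yields seconds, not minutes, so comparing it against a minute count like 5 fires after only a few seconds.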

Disabled Agents

launchctl unload ~/Library/LaunchAgents/com.alai.ollama-serve.plist
launchctl unload ~/Library/LaunchAgents/com.john.ollama-warmup.plist
rm ~/Library/LaunchAgents/com.alai.ollama-serve.plist
rm ~/Library/LaunchAgents/com.john.ollama-warmup.plist

Ollama Upgrade

brew upgrade ollama  # 0.19.0 → 0.21.0
# Changelog: Fixed memory leak in runner cleanup (issue #4821)

OOM Symptom Recognition

Command:

vm_stat | awk '/Pages free/ {printf "%.1f GB\n", $3*16384/1024/1024/1024}'
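The one-liner multiplies the `Pages free` count by a hard-coded 16384-byte page size, which is correct for Apple Silicon. Here is a sketch that reads the page size from the `vm_stat` header instead. It parses plain text, so it can be tested without a Mac. Note that `Pages free` excludes inactive and purgeable pages, so it can understate memory that macOS is able to reclaim.

```javascript
// Sketch: parse `vm_stat` output and report free memory in GB.
// Reads the page size from the header rather than assuming 16384 bytes.
function freeGbFromVmStat(text) {
  const pageSize = Number((text.match(/page size of (\d+) bytes/) || [])[1]);
  const freePages = Number((text.match(/Pages free:\s+(\d+)\./) || [])[1]);
  if (!pageSize || Number.isNaN(freePages)) throw new Error("unexpected vm_stat output");
  return (freePages * pageSize) / 1024 ** 3;
}

// On a Mac:
//   const { execSync } = require("child_process");
//   console.log(freeGbFromVmStat(execSync("vm_stat").toString()).toFixed(1), "GB");
const sample = "Mach Virtual Memory Statistics: (page size of 16384 bytes)\nPages free:   65536.\n";
console.log(freeGbFromVmStat(sample).toFixed(1)); // 1.0
```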

Thresholds:

Quick triage:

ps aux | sort -nrk 4 | head -10  # Top 10 memory hogs
pgrep -fl ollama_llama_server    # Zombie Ollama runners
pgrep -fl grep                    # Hung grep processes

Prevention Checklist

  1. Monitor free RAM hourly: vm_stat check in cron
  2. Zombie cleanup daemon running: launchctl list | grep zombie-cleanup
  3. Only one Ollama launchd agent: launchctl list | grep ollama → expect 1 line
  4. No warmup preload agents: launchctl list | grep warmup → empty
  5. Grep with timeout: timeout 60 grep -rn ... instead of bare grep -rn

Email Pipeline + Edita PA — Runbook


MC: #8521 | Related: #8466 (OWN classifier fix) | Date: 2026-04-20


Overview

The email pipeline classifies incoming emails and routes them to Mission Control, HiveMind, or archive. Edita PA is the autonomous email assistant operating in phased rollout (currently Phase 1).

Architecture


OWN Classifier Logic

The OWN classifier identifies machine-generated emails from ALAI's own systems to prevent task spam.

Constants (email-agent.js lines 118-123)

const OWN_SYSTEM_PREFIXES = [
  'noreply@', 'no-reply@', 'sentinel@', 'alerts@', 'auto@', 'daemon@',
  'mailer@', 'notification@', 'notifications@', 'bounces@', 'bounce@',
  'donotreply@', 'do-not-reply@', 'system@'
];
const OWN_SYSTEM_DOMAINS = ['@alai.no', '@basicconsulting.no'];

isOwnSystemEmail() Function (lines 446-456)

Two-tier check:

  1. Exact match: OWN_ADDRESSES array (hardcoded machine addresses)
  2. Prefix + domain: Any prefix in OWN_SYSTEM_PREFIXES on domains in OWN_SYSTEM_DOMAINS

Critical: alem@alai.no is NEVER in this list. VIP check runs FIRST (line 464), bypassing OWN classifier entirely.
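The two-tier check can be sketched as follows. The prefix and domain lists are copied from the constants above. The `OWN_ADDRESSES` entry is hypothetical, and the real function at lines 446-456 may normalize or match addresses differently (for example, by substring).

```javascript
// Sketch of the two-tier OWN check. The OWN_ADDRESSES entry is a hypothetical example.
const OWN_SYSTEM_PREFIXES = [
  "noreply@", "no-reply@", "sentinel@", "alerts@", "auto@", "daemon@",
  "mailer@", "notification@", "notifications@", "bounces@", "bounce@",
  "donotreply@", "do-not-reply@", "system@",
];
const OWN_SYSTEM_DOMAINS = ["@alai.no", "@basicconsulting.no"];
const OWN_ADDRESSES = ["monitor@alai.no"]; // hypothetical

// Extract the bare address from a From header such as `Name <addr@host>`
function bareAddress(from) {
  const m = from.match(/<([^>]+)>/);
  return (m ? m[1] : from).trim().toLowerCase();
}

function isOwnSystemEmail(from) {
  const addr = bareAddress(from);
  if (OWN_ADDRESSES.includes(addr)) return true; // tier 1: exact match
  const at = addr.indexOf("@");
  if (at < 0) return false;
  const local = addr.slice(0, at + 1); // e.g. "noreply@"
  const domain = addr.slice(at);       // e.g. "@alai.no"
  return OWN_SYSTEM_PREFIXES.includes(local) && OWN_SYSTEM_DOMAINS.includes(domain); // tier 2
}

console.log(isOwnSystemEmail("ALAI Alerts <alerts@alai.no>")); // true
console.log(isOwnSystemEmail("alem@alai.no"));                 // false
```

An address like alem@alai.no fails both tiers regardless, and the VIP check short-circuits before this function is reached.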


TLDR_SKIP Routing

Newsletters from dan@tldrnewsletter.com do NOT create MC tasks. They are handled exclusively by tldr-briefing.js daemon.

// line 126
const TLDR_SENDER = 'dan@tldrnewsletter.com';

// line 474
if (lowerFrom.includes(TLDR_SENDER)) {
  return { category: 'TLDR_SKIP', priority: 'low', summary: 'TLDR newsletter — handled by tldr-briefing.js', action: '' };
}

VIP Ordering

Classification priority (lines 464-481):

  1. VIP: CEO/family → bypass ALL filters, force ACTION/high, skip Ollama
  2. TLDR_SKIP: Newsletter → skip MC INTAKE, route to tldr-briefing.js
  3. OWN: System emails → archive, no task
  4. Other: Spam allowlist check → Ollama classification
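The ordering matters because the first matching rule wins. Below is a sketch of the dispatch. The VIP list and the `isOwn` predicate are passed in as hypothetical stand-ins. The `OWN` and `NEEDS_OLLAMA` category names are illustrative, and the returned shapes only loosely follow the TLDR_SKIP object above.

```javascript
// Sketch of first-match-wins classification. vipSenders and isOwn are injected (hypothetical).
const TLDR_SENDER = "dan@tldrnewsletter.com";

function classifyEmail(from, { vipSenders, isOwn }) {
  const lowerFrom = from.toLowerCase();
  // 1. VIP first: bypasses every other filter and skips Ollama
  if (vipSenders.some((v) => lowerFrom.includes(v))) {
    return { category: "ACTION", priority: "high" };
  }
  // 2. TLDR newsletter: no MC task, handled by tldr-briefing.js
  if (lowerFrom.includes(TLDR_SENDER)) {
    return { category: "TLDR_SKIP", priority: "low" };
  }
  // 3. Own system mail: archive, no task
  if (isOwn(lowerFrom)) {
    return { category: "OWN", priority: "low" };
  }
  // 4. Everything else: spam allowlist check, then Ollama classification
  return { category: "NEEDS_OLLAMA", priority: null };
}

const deps = { vipSenders: ["alem@alai.no"], isOwn: (f) => f.startsWith("noreply@") };
console.log(classifyEmail("Alem <alem@alai.no>", deps).category); // ACTION
```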

Edita PA Phases

Phase 0: --dry-run (Log-Only)

Classification + logging only. No archive, no escalate, no respond.

node ~/system/daemons/email-agent.js --dry-run

Phase 1: --allow-archive (CURRENT)

Archive low-priority emails only. Escalate and respond actions are held (logged but not executed).

node ~/system/daemons/email-agent.js --allow-archive

Plist config: com.john.email-agent calls email-agent-wrapper.sh, which passes no flags → defaults to Phase 1 (archive-only mode is internal default in daemon code).

Phase 2: Full Live (NOT YET APPROVED)

Archive + escalate + respond. Requires CEO explicit approval.

node ~/system/daemons/email-agent.js --allow-all
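Each phase maps command-line flags to the set of actions the agent may execute; anything else is logged and held. This sketch follows the documented default (no flag means Phase 1, archive only). The precedence when several flags are passed, and the `execute` helper, are assumptions.

```javascript
// Sketch: map Edita PA phase flags to executable actions.
// Precedence (most restrictive flag wins) and execute() are assumptions.
function allowedActions(argv) {
  if (argv.includes("--dry-run")) return new Set(); // Phase 0: log only
  if (argv.includes("--allow-all")) return new Set(["archive", "escalate", "respond"]); // Phase 2
  return new Set(["archive"]); // Phase 1: default and --allow-archive
}

function execute(action, argv, run) {
  if (allowedActions(argv).has(action)) return run();
  console.log(`[held] ${action} (not permitted in current phase)`);
  return null;
}

console.log([...allowedActions(process.argv.slice(2))]);
```

Letting `--dry-run` win over any other flag means a mistaken flag combination fails safe rather than enabling replies.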

Unit Testing

Test classifier without IMAP/Vault dependencies:

node ~/system/daemons/test-email-classifier.js

Scenarios (16 total):


Rollback

Revert to dry-run mode:

launchctl unload ~/Library/LaunchAgents/com.john.email-agent.plist

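# Add --dry-run to the wrapper's existing node command. Appending a new line does not work:
# the existing command runs first (exec replaces the shell; a running daemon never returns),
# so an appended line never takes effect. Assumes that line ends with email-agent.js"; confirm with grep.
sed -i '' 's|email-agent\.js"$|email-agent.js" --dry-run|' ~/system/tools/email-agent-wrapper.sh
grep -n 'email-agent.js' ~/system/tools/email-agent-wrapper.sh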

launchctl load ~/Library/LaunchAgents/com.john.email-agent.plist

Monitoring


Generated by Skillforge | ALAI, 2026


Contact Form Handlers

This section documents all contact forms across ALAI properties and their email delivery mechanisms.

alai.no Contact Form

Test procedure:

curl -X POST https://alai.no/api/contact \
  -H "Content-Type: application/json" \
  -d '{"name": "Test User", "email": "test@example.com", "message": "E2E test 2026-04-21 14:00"}'

# Verify inbox:
himalaya search --account info-alai --folder INBOX "subject:Contact Form"
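The curl test posts a JSON body with `name`, `email` and `message`. Below is a sketch of the server-side check a handler would typically run before attempting delivery. The length limit and the simple email pattern are assumptions, not the current alai.no handler's rules.

```javascript
// Sketch: validate a contact-form payload like the one in the curl test above.
// MAX_MESSAGE and EMAIL_RE are illustrative assumptions.
const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
const MAX_MESSAGE = 5000; // hypothetical limit

function validateContact(body) {
  const errors = [];
  const { name, email, message } = body || {};
  if (typeof name !== "string" || !name.trim()) errors.push("name is required");
  if (typeof email !== "string" || !EMAIL_RE.test(email)) errors.push("email is invalid");
  if (typeof message !== "string" || !message.trim()) errors.push("message is required");
  else if (message.length > MAX_MESSAGE) errors.push("message is too long");
  return errors;
}

const payload = { name: "Test User", email: "test@example.com", message: "E2E test 2026-04-21 14:00" };
console.log(validateContact(payload)); // []
```

In a Cloudflare Pages Function, this check would run inside `onRequestPost` before sending, returning HTTP 400 with the error list when it is non-empty.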

snowit.ba Contact Form

getdrop.no Waitlist

Test procedure:

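# --remote queries the deployed D1 database; recent Wrangler versions default to a local copy
wrangler d1 execute drop-waitlist --remote --command "SELECT * FROM submissions ORDER BY created_at DESC LIMIT 5"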

merdzanovic.ba Contact Form


Form Handler Migration Checklist

When migrating sites from Vercel/Netlify to Cloudflare Pages:

  1. Inventory: Identify all POST endpoints (forms, webhooks, API routes)
  2. Port handlers: Rewrite Vercel API routes as CF Pages Functions (/functions/*.js)
  3. Environment variables: Copy SMTP/API credentials to CF Pages env vars
  4. Update form actions: Change form targets to new CF Pages routes (e.g., /api/contact)
  5. E2E test: Follow Forms E2E Testing Protocol (HTTP + inbox check MANDATORY)
  6. Monitor: Check inbox/DB for 24 hours post-migration to catch silent failures

Reference incident: 2026-04-21 alai.no Contact Form Failure


Himalaya IMAP Setup (for Form Testing)

Himalaya CLI provides rapid inbox verification without browser login.

Install

brew install himalaya

Configure Account

Add to ~/.config/himalaya/config.toml:

[accounts.info-alai]
default = false
email = "info@alai.no"
display-name = "ALAI Info"

[accounts.info-alai.imap]
host = "imap.one.com"
port = 993
encryption = "tls"
login = "info@alai.no"
passwd.cmd = "bw get password 'Email - info@alai.no' --session $(cat /tmp/bw-session)"

[accounts.info-alai.smtp]
host = "send.one.com"
port = 587
encryption = "start-tls"
login = "info@alai.no"
passwd.cmd = "bw get password 'Email - info@alai.no' --session $(cat /tmp/bw-session)"

Usage

# Unlock Bitwarden first (umask 077 keeps the session file readable only by you)
(umask 077; bw unlock --raw > /tmp/bw-session)

# List recent messages
himalaya list --account info-alai --folder INBOX --page-size 20

# Search for form submissions
himalaya search --account info-alai --folder INBOX "from:noreply@alai.no"

# Search by date range
himalaya search --account info-alai --folder INBOX "since:2026-04-21"

Credentials: Bitwarden item "Email - info@alai.no"


Updated: 2026-04-21 | Skillforge

Ollama Fleet Architecture


MC: #8522 | Related: #8477 (triage preload), #8471 (vault-keeper watchdog), #8472 (YouTube daemon fix) | Date: 2026-04-20


Overview

ALAI operates a two-node Ollama fleet: ANVIL (local dev Mac) and FORGE (Ubuntu 22.04 GPU workstation). ANVIL handles triage workloads (email, TLDR, quick classification), FORGE handles heavy inference (32B+ models, RAG pipelines).


ANVIL Ollama Configuration

Capacity Limits

LaunchAgent: com.alai.ollama-serve-v2

Label: com.alai.ollama-serve-v2
Plist: ~/Library/LaunchAgents/com.alai.ollama-serve-v2.plist
Port: 11434
Environment:
  OLLAMA_FLASH_ATTENTION=1
  OLLAMA_KV_CACHE_TYPE=q8_0
  OLLAMA_MAX_LOADED_MODELS=2
  OLLAMA_KEEP_ALIVE=30s

Triage Preload Pattern

MC #8477 — Prevent qwen2.5-coder:32b (23GB) from blocking email/TLDR triage.

Strategy

Preload llama3.1:8b with keep_alive=-1 (indefinite) so it's always resident for fast triage operations. 5GB footprint.

LaunchAgent: com.john.ollama-triage-preload

Label: com.john.ollama-triage-preload
Script: ~/system/tools/ollama-triage-preload.sh
Trigger: RunAtLoad + StartInterval 300s (every 5 min)
Log: ~/system/logs/ollama-triage-preload-stdout.log

Script Logic (ollama-triage-preload.sh)

  1. Check if llama3.1:8b is already loaded via /api/ps
  2. If not loaded, send minimal prompt with keep_alive=-1
  3. Log success/skip
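# Minimal runnable form of the three steps (the OLLAMA_URL default is an assumption)
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
MODEL="llama3.1:8b"

# 1. /api/ps lists loaded models; skip if the triage model is already resident
if curl -sf "$OLLAMA_URL/api/ps" | grep -q "\"name\":\"$MODEL\""; then
  echo "$(date): $MODEL already loaded, skipping"
  exit 0
fi

# 2-3. Load it with a 1-token prompt (keep_alive=-1 keeps it resident), then log
curl -sf -X POST "$OLLAMA_URL/api/generate" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"$MODEL\",
    \"prompt\": \"ready\",
    \"stream\": false,
    \"keep_alive\": -1,
    \"options\": {\"num_predict\": 1}
  }" > /dev/null && echo "$(date): preloaded $MODEL"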

Model Tier System

Tier | Model | Size | Use Case | Keep Alive | Node
Triage | llama3.1:8b | 5GB | Email classification, TLDR summarization, quick routing | -1 (indefinite) | ANVIL
Heavy | qwen2.5-coder:32b | 23GB | Code generation, architecture review, complex reasoning | 30s (on-demand) | ANVIL
Primary | devstral:24b | ~15GB | Agent orchestration, planning, context routing | 300s | FORGE

FORGE Failover

Consumers (email-agent.js, tldr-briefing.js, YouTube daemon) can set FORGE_FIRST=0 environment variable to skip FORGE and use ANVIL directly.

# Force ANVIL-only
export FORGE_FIRST=0
node ~/system/daemons/youtube-daemon.js

Default behavior: Try FORGE (10.0.0.2:11434), fallback to ANVIL (localhost:11434) on timeout.
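Below is a sketch of the default FORGE-then-ANVIL order with the `FORGE_FIRST=0` override. It uses Node 18+ global `fetch` and `AbortSignal.timeout`. The 10-second timeout and the function names are assumptions, not what email-agent.js or tldr-briefing.js actually use. The sketch also falls back on HTTP errors, not only on timeouts.

```javascript
// Sketch: FORGE-first Ollama call with fallback to ANVIL (Node 18+).
// Timeout value and function names are assumptions.
const FORGE = "http://10.0.0.2:11434";
const ANVIL = "http://localhost:11434";

function ollamaHosts(env = process.env) {
  return env.FORGE_FIRST === "0" ? [ANVIL] : [FORGE, ANVIL];
}

async function generate(model, prompt, { env = process.env, timeoutMs = 10000 } = {}) {
  let lastError;
  for (const host of ollamaHosts(env)) {
    try {
      const res = await fetch(`${host}/api/generate`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ model, prompt, stream: false }),
        signal: AbortSignal.timeout(timeoutMs), // abort a slow host, then try the next one
      });
      if (!res.ok) throw new Error(`${host} returned HTTP ${res.status}`);
      const data = await res.json();
      return { host, text: data.response };
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

A call such as `generate("llama3.1:8b", "Classify: ...")` returns the host that answered, which helps when checking how often FORGE is actually reachable.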


Vault-Keeper Watchdog (MC #8471 — PENDING)

Monitors the ~/system/.cache/vault-keeper-heartbeat file. If the heartbeat is more than 1 hour old, SENTINEL raises an alert.

Implementation

LaunchAgent: com.john.vault-keeper-watchdog
Interval: 600s (10 min)
Script: ~/system/daemons/vault-keeper-watchdog.sh
Alert: Slack #sentinel-alerts

Logic

  1. Read heartbeat file timestamp
  2. Compare with current time
  3. If > 3600s, send SENTINEL alert with vault-keeper logs
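The staleness check can be sketched in Node.js. Two assumptions are labeled in the code: the heartbeat timestamp is taken from the file's modification time (the real watchdog script may read it from the file contents instead), and a missing heartbeat file is treated as stale. Sending the SENTINEL alert is left out.

```javascript
// Sketch of the watchdog staleness check. Assumption: the heartbeat "timestamp"
// is the file's modification time; the real script may read it from file contents.
const fs = require('fs');
const os = require('os');
const path = require('path');

const HEARTBEAT = path.join(os.homedir(), 'system', '.cache', 'vault-keeper-heartbeat');
const STALE_AFTER_SEC = 3600;

// Age of the heartbeat in seconds, or Infinity if the file is missing.
function heartbeatAgeSec(file = HEARTBEAT, nowMs = Date.now()) {
  try {
    return (nowMs - fs.statSync(file).mtimeMs) / 1000;
  } catch (err) {
    if (err.code === 'ENOENT') return Infinity; // assumption: missing heartbeat = stale
    throw err;
  }
}

function isStale(ageSec, thresholdSec = STALE_AFTER_SEC) {
  return ageSec > thresholdSec;
}
```

A caller would evaluate isStale(heartbeatAgeSec()) every 600s and send the alert when it returns true.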

YouTube Daemon Lesson (MC #8472)

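Log redirection corruption: running tee inside a command substitution captured everything the processor printed (the same text being appended to the log) into NEW_COUNT, so the variable held log text instead of a bare count, and the arithmetic on it broke.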

Anti-Pattern

# WRONG — tee inside $() breaks arithmetic
NEW_COUNT=$(node ~/system/daemons/youtube-processor.js | tee -a "$LOG")

Correct Pattern

# RIGHT — separate logging stream
node ~/system/daemons/youtube-processor.js >> "$LOG" 2>&1

LaunchAgent Duplication

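Never use both KeepAlive and StartInterval in the same plist. KeepAlive relaunches the job as soon as it exits, while StartInterval also fires it on a fixed timer, so the two keys describe conflicting schedules. Pick the one that matches how the job should run.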

<!-- WRONG -->
<key>KeepAlive</key>
<true/>
<key>StartInterval</key>
<integer>3600</integer>

<!-- RIGHT (pick one) -->
<key>StartInterval</key>
<integer>3600</integer>
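To find plists that already combine the two keys, a small Node.js scan can convert each com.alai.* / com.john.* plist to JSON with macOS plutil and check both keys. This is a sketch under assumptions: KeepAlive counts as set when it is true or a conditions dictionary, and plists that plutil cannot express as JSON (for example ones containing date or data values) are skipped with a warning.

```javascript
// Sketch: scan LaunchAgent plists for the KeepAlive + StartInterval combination.
// Uses macOS `plutil -convert json -o - <file>` to read each plist as JSON.
const { execFileSync } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// KeepAlive counts as set when it is `true` or a conditions dictionary.
function hasConflictingSchedule(plist) {
  const keepAlive = plist.KeepAlive !== undefined && plist.KeepAlive !== false;
  return keepAlive && plist.StartInterval !== undefined;
}

function scanLaunchAgents(dir = path.join(os.homedir(), 'Library', 'LaunchAgents')) {
  const conflicts = [];
  for (const file of fs.readdirSync(dir)) {
    if (!/^com\.(alai|john)\..+\.plist$/.test(file)) continue;
    try {
      const json = execFileSync('plutil', ['-convert', 'json', '-o', '-', path.join(dir, file)], { encoding: 'utf8' });
      if (hasConflictingSchedule(JSON.parse(json))) conflicts.push(file);
    } catch (err) {
      console.warn(`skipped ${file}: ${err.message}`); // e.g. values with no JSON form
    }
  }
  return conflicts;
}
```

Running scanLaunchAgents() on ANVIL would list the plists to fix; an empty array means none combine the two keys.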

Fleet Monitoring

ANVIL

curl http://localhost:11434/api/ps
curl http://localhost:11434/api/tags
tail -f ~/system/logs/ollama-triage-preload-stdout.log

FORGE

curl http://10.0.0.2:11434/api/ps
ssh forge "tail -f /var/log/ollama.log"

Mission Control

node ~/system/tools/mc.js list --tag ollama
node ~/system/tools/cost-tracker.js summary --service ollama

Generated by Skillforge | ALAI, 2026

Static Hosting Migration — Progress Log

MC: #8523 (tracking), #8482 (basicconsulting.no), #8489 (bilko.io) | Date: 2026-04-20


Overview

ALAI is migrating 8 static sites from Vercel/Azure VM to Cloudflare Pages for cost savings (€0 vs €12-14/mo), operational simplification, and DDoS/WAF coverage. See full blueprint at ~/system/specs/ALAI-STATIC-HOSTING-BLUEPRINT.md.


Migration Log

Date | Domain | From | To | Downtime | TTFB Before | TTFB After | Notes
2026-04-20 | basicconsulting.no | Vercel (76.76.21.21) | CF Pages | ~60s | 114ms | 51ms (warm avg) | MC #8482. DNS: A→CNAME. Validation required domain re-add. TTFB improved 55%. Proveo pilot validated #8490.
2026-04-20 | bilko.io | one.com (down) | CF Pages | N/A (site was down) | N/A | 68ms (warm avg) | MC #8489. Apex CNAME not possible on one.com free tier (paid feature). Switched to Cloudflare NS (ana.ns.cloudflare.com, bob.ns.cloudflare.com). CF Pages zone ID: 62d89b79f0648d3fa1d045335a989ea7. DNS: CNAME flattening bilko.io → bilko-io.pages.dev (proxied), www → bilko-io.pages.dev.

Paused Migrations

MC #8483 — basicfakta.no

Reason: Inventory error. Site has serverless functions (Vercel Edge), not pure static. Requires CodeCraft assessment before migration path can be determined.

MC #8484 — snowit.no

Reason: Inventory error. Site has API routes (Next.js), not pure static. Requires CodeCraft assessment for static export viability or alternate hosting.


Audit Verdict: bilko-demo.alai.no (MC #8486)

Decision: Stays on GCP Cloud Run. Not eligible for CF Pages migration.

Reason: Full-stack Next.js app with dynamic API routes and server-side rendering. Static export would break functionality. Current platform (Cloud Run) is correct fit.


Lessons Learned

one.com Apex CNAME Limitation

one.com free tier does NOT support apex CNAME (requires paid plan). For domains registered at one.com, the migration path is:

  1. Switch nameservers to Cloudflare (ana.ns.cloudflare.com, bob.ns.cloudflare.com)
  2. Import DNS records via Cloudflare zone scan
  3. Set up CNAME flattening in Cloudflare (apex → CF Pages project, proxied)

Propagation time: 15 minutes to 4 hours for .no domains.
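Before treating step 1 as done, the nameserver switch can be confirmed from a script. A sketch using Node's built-in dns.promises.resolveNs; isOnCloudflareNs and checkNameservers are illustrative names. During the propagation window a local resolver may still return the old one.com nameservers, so a false result in the first few hours is not necessarily a failure.

```javascript
// Sketch: confirm a domain's nameservers have moved to Cloudflare after the
// one.com switch. Uses Node's built-in DNS resolver.
const dns = require('dns').promises;

// True when every NS record is a Cloudflare nameserver (*.ns.cloudflare.com).
function isOnCloudflareNs(nsRecords) {
  return nsRecords.length > 0 &&
    nsRecords.every((ns) => ns.toLowerCase().replace(/\.$/, '').endsWith('.ns.cloudflare.com'));
}

async function checkNameservers(domain) {
  const records = await dns.resolveNs(domain);
  return { domain, records, onCloudflare: isOnCloudflareNs(records) };
}
```

For example, checkNameservers('bilko.io').then(console.log) prints the current NS records together with the verdict.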

Inventory Validation Pre-Migration

Before scheduling a migration, verify the site is truly static:

If any of the above exist, the site is NOT static and requires CodeCraft review.
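The checklist items themselves are not listed here. As a hypothetical starting point based on the two paused cases above (Vercel serverless/Edge functions and Next.js API routes), a repository can be scanned for well-known server-side locations. The marker list is an assumption, and an empty result does not prove a site is static (for example, server-side rendering inside page code is not detected).

```javascript
// Hypothetical heuristic: paths whose presence usually means a site is NOT purely static.
const fs = require('fs');
const path = require('path');

const DYNAMIC_MARKERS = [
  'api',           // Vercel serverless functions
  'pages/api',     // Next.js API routes (pages router)
  'src/pages/api',
  'app/api',       // Next.js route handlers (app router)
  'src/app/api',
  'middleware.js', // Next.js / Vercel middleware
  'middleware.ts',
  'functions',     // Cloudflare Pages Functions
];

// Returns the markers that exist under repoRoot, in list order.
function findDynamicMarkers(repoRoot) {
  return DYNAMIC_MARKERS.filter((rel) => fs.existsSync(path.join(repoRoot, rel)));
}
```

A non-empty result is a signal to route the site to CodeCraft review rather than schedule the migration.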

TTFB Improvements

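Cloudflare Pages with CDN caching (orange-cloud proxy) delivers roughly a 50-60% TTFB improvement over Vercel for static sites; the one measured before/after pair, basicconsulting.no, went from 114ms to 51ms (warm average), a 55% improvement. Cold start overhead is negligible (CF edge network vs Vercel edge).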
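The 55% figure follows from (114 - 51) / 114 ≈ 0.553. A sketch of how such numbers could be measured and computed in Node.js; the meaning of "warm average" (mean of all runs after the first) is an assumption, and measureTtfb times the arrival of response headers, which includes DNS, TCP, and TLS setup whenever a fresh connection is opened.

```javascript
// Sketch: measure approximate TTFB and compute the improvement percentage.
// "Warm average" is assumed to mean the mean of runs after the first (cold) one.
const http = require('http');
const https = require('https');

function mean(samples) {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

function improvementPct(beforeMs, afterMs) {
  return ((beforeMs - afterMs) / beforeMs) * 100;
}

// Time from request start until response headers arrive.
function measureTtfb(url) {
  const client = url.startsWith('https:') ? https : http;
  return new Promise((resolve, reject) => {
    const start = process.hrtime.bigint();
    const req = client.get(url, (res) => {
      const ms = Number(process.hrtime.bigint() - start) / 1e6;
      res.resume(); // drain the body so the socket is released
      resolve(ms);
    });
    req.on('error', reject);
  });
}

// Sequential runs; the first (cold) sample is discarded. Needs runs >= 2.
async function warmAverage(url, runs = 5) {
  const samples = [];
  for (let i = 0; i < runs; i++) samples.push(await measureTtfb(url));
  return mean(samples.slice(1));
}

// basicconsulting.no figures from the migration log: 114ms -> 51ms.
console.log(improvementPct(114, 51).toFixed(1)); // 55.3
```

Measuring from the same machine before and after the DNS switch keeps network distance out of the comparison.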


Remaining Migrations

Domain | Current Host | Status | MC Task
alai.no | CF Pages | ✅ Complete (already on target platform) | N/A
basicconsulting.no | CF Pages | ✅ Complete (2026-04-20) | #8482
bilko.io | CF Pages | ✅ Complete (2026-04-20) | #8489
basicfakta.no | Vercel | ⏸ Paused (serverless functions found) | #8483
snowit.no | Vercel | ⏸ Paused (API routes found) | #8484
getdrop.no | Azure VM | 🔄 Pending (DNS on Vercel, move to CF) | #8485
kenyhot.pro | Vercel | 🔄 Pending (coordinate with client) | #8487
merdzanovic.ba | Vercel | 🔄 Pending (coordinate with client) | #8488

DNS Consolidation Status

Domain | Registrar | Current NS | Target NS | Status
alai.no | one.com | Cloudflare | Cloudflare | ✅ Done
basicconsulting.no | one.com | Cloudflare | Cloudflare | ✅ Done
bilko.io | one.com | Cloudflare | Cloudflare | ✅ Done (2026-04-20)
getdrop.no | one.com | Vercel | Cloudflare | 🔄 Pending
basicfakta.no | one.com | Vercel | Cloudflare | 🔄 Pending
snowit.no | one.com | Unknown | Cloudflare | 🔄 Pending

ANVIL DR Bootstrap Runbook (Mac Air)

When to use

This runbook is for recovering the ALAI AI factory infrastructure when:

SPOF Context: As of 2026-04-20, ANVIL is the single Mac Studio hosting 112 LaunchAgent daemons, 68 SQLite databases (litestream-replicated), Ollama (8 models), and the entire ~/system + ~/.claude infrastructure. This runbook enables recovery to any fresh Mac with admin access.


Prerequisites

Before starting bootstrap, ensure you have:

  1. Fresh Mac with admin account (macOS Sonoma or later, Apple Silicon preferred)
  2. Tailscale app installed + logged into alembasic@ tailnet (download from tailscale.com/download)
  3. GitHub account with read access to:
    • github.com/johnatbasicas/clawd (~/system repo, auto-backup branch)
    • github.com/johnatbasicas/claude-config (~/.claude repo)
  4. Bitwarden account unlocked with master password ready (Alem's personal vault: alembasic@gmail.com)
  5. Internet connection (stable; the Homebrew packages are 2-3 GB, and the Ollama models are far larger, see "Ollama models missing" under Troubleshooting)

Step-by-step Bootstrap

Phase 1: Foundation

1. Install Xcode Command Line Tools

xcode-select --install

Expected: GUI dialog appears. Click "Install" and wait 5-10 minutes. Verify with:

xcode-select -p
# Should output: /Library/Developer/CommandLineTools

2. Install Homebrew

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Expected: Homebrew installs to /opt/homebrew. Add to shell profile:

echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"

# Verify:
brew --version
# Should show: Homebrew 4.x.x

3. Install Bitwarden CLI + unlock vault

brew install bitwarden-cli jq  # jq is needed for the status check below

# Unlock vault (enter master password when prompted):
bw login alembasic@gmail.com
export BW_SESSION=$(bw unlock --raw)

# Verify:
bw status | jq .status
# Should show: "unlocked"

Note: Keep this terminal window open. The bootstrap script needs BW_SESSION.


Phase 2: Clone Infrastructure Repos

4. Clone ~/system (clawd repo)

# If using SSH (recommended if SSH keys already set up):
git clone git@github.com:johnatbasicas/clawd.git ~/system

# OR if using HTTPS with GitHub PAT:
git clone https://github.com/johnatbasicas/clawd.git ~/system

# Switch to auto-backup branch (contains latest portability artifacts):
cd ~/system
git checkout auto-backup
git pull

Expected:

ls ~/system/
# Should show: Brewfile, bootstrap.sh, config/, databases/, tools/, etc.

5. Clone ~/.claude (claude-config repo)

git clone git@github.com:johnatbasicas/claude-config.git ~/.claude

# Verify:
ls ~/.claude/
# Should show: CLAUDE.md, hooks/, agents/, skills/, projects/

Phase 3: Run Bootstrap Script

6. Execute bootstrap (with BW_SESSION active)

cd ~/system
bash bootstrap.sh workstation

Role options:

What the script does:

  1. Re-checks Xcode CLT + Homebrew (idempotent)
  2. Installs ~70 brew packages from Brewfile (15-30 min depending on connection)
  3. Copies 112 LaunchAgent plists from ~/system/config/launchagents/ to ~/Library/LaunchAgents/
  4. Rehydrates BW:<item> placeholders in plists by calling bw get password <item>
  5. Loads all LaunchAgents via launchctl bootstrap
  6. Verifies core services (Ollama, litestream)
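Step 4 (placeholder rehydration) can be sketched in Node.js. This is a hypothetical port for illustration, not bootstrap.sh itself. It assumes placeholders look like BW:<item> with item names made of letters, digits, dots, underscores, and hyphens, and that the default resolver runs bw get password <item> with BW_SESSION already exported. Unresolved items are left in place and reported, similar to the WARN lines the real script prints.

```javascript
// Sketch of BW:<item> placeholder rehydration (hypothetical port of the
// bootstrap.sh step). The default resolver needs the bw CLI and BW_SESSION.
const { execFileSync } = require('child_process');

const PLACEHOLDER = /BW:([A-Za-z0-9._-]+)/g;

// Escape characters that are not allowed raw inside a plist <string>.
function xmlEscape(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Default resolver: `bw get password <item>` (reads BW_SESSION from the env).
function bwPassword(item) {
  return execFileSync('bw', ['get', 'password', item], { encoding: 'utf8' }).trim();
}

// Replace every BW:<item> placeholder; collect items that could not be resolved.
function rehydrate(plistText, resolve = bwPassword) {
  const missing = [];
  const text = plistText.replace(PLACEHOLDER, (match, item) => {
    try {
      return xmlEscape(resolve(item));
    } catch {
      if (!missing.includes(item)) missing.push(item);
      return match; // leave the placeholder for a manual fix
    }
  });
  return { text, missing };
}
```

Escaping matters because a raw & or < in a secret would make the rewritten plist invalid XML, and launchctl bootstrap would then reject it.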

Expected output (tail of bootstrap.log):

[bootstrap] Bootstrap COMPLETE. Next steps:
[bootstrap]   - Verify SSH: ssh makinja@100.103.49.98
[bootstrap]   - Check MC: node ~/system/tools/mc.js list
[bootstrap]   - Log: /Users/makinja/bootstrap.log

LaunchAgents loaded: 112
Ollama models available: 8
Litestream: RUNNING

If BW rehydration fails: You'll see warnings like:

WARN: Bitwarden item 'groq-api-key' not found — com.alai.groq-model-benchmark.plist will need manual fix

Fix manually after bootstrap completes (see Troubleshooting section).


Phase 4: Database Restore (if DBs lost/corrupt)

When to run: Only if ~/system/databases/ is empty or you need to restore from Azure backups (e.g., ANVIL disk died).

7. Set Azure auth environment variables

export AZURE_CLIENT_ID="1a0b3018-0c31-474b-918f-531b0a29a669"
export AZURE_CLIENT_SECRET=$(bw get password alai-backup-writer-secret)
export AZURE_TENANT_ID="cd0a7929-1d14-4f81-820d-b36e45f72cf7"

8. Restore P0 critical databases

mkdir -p ~/system/databases

# Mission Control:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/mission-control.db

# HiveMind:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/hivemind.db

# Tasks:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/tasks.db

# Costs:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/costs.db

# Events:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/events.db

9. Restore P0 financial databases

# Fiken (accounting cache):
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/fiken.db

# Invoices:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/invoices.db

# Contracts:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/contracts.db

# Leads:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/leads.db

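Note: The -if-replica-exists flag makes litestream exit successfully (instead of failing) when no replica exists for that database; it does not compare local and remote timestamps. litestream restore also refuses to overwrite an existing local database file, so to force a restore, move the existing file aside first (or restore to a different path with -o).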

Bulk restore all 68 DBs (if needed):

for db in mission-control hivemind tasks costs events fiken invoices contracts leads \
          orchestrator-queue orchestrator-workers durable-runner session-index knowledge \
          emails email-inbox alem-directives agent-routing bee-index companies contacts \
          deploy-registry design-reviews distill documents drafts drift email-audit \
          email-briefing email-index email-tracking escalations facts flywheel goals \
          guardrails-audit health-events hivemind-archive master-control mc minions \
          observability orchestrator-events pipeline projects routing-outcomes skill-improvements \
          skill-registry sprint-pipeline strategy-tracker teams tenders tickets tool-audit \
          tool-registry trace-events applications-tracker baikal-caldav prompt-cache \
          prompt-metrics semantic-reuse-index stbs telemetry token-cost usage vcr bih-tenders browser-tasks; do
  echo "Restoring $db..."
  litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/$db.db || echo "WARN: $db restore failed or skipped"
done

Verify restores:

ls -lh ~/system/databases/*.db | wc -l
# Should show: 68 (or close, depending on which DBs had replicas)

# Check specific DB integrity:
sqlite3 ~/system/databases/mission-control.db "PRAGMA integrity_check;"
# Should output: ok
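Because -if-replica-exists lets the loop skip databases that have no replica, a count below 68 doesn't say which ones are missing. A Node.js sketch that compares an expected name list (for example, the same names as in the loop above) against the files actually present; missingDbs and reportMissing are illustrative names.

```javascript
// Sketch: report which expected databases are absent after the bulk restore.
const fs = require('fs');
const os = require('os');
const path = require('path');

const DB_DIR = path.join(os.homedir(), 'system', 'databases');

// Names (without .db) that are expected but not present in the listing.
// WAL/SHM side files (e.g. tasks.db-wal) are ignored.
function missingDbs(expectedNames, presentFiles) {
  const present = new Set(
    presentFiles.filter((f) => f.endsWith('.db')).map((f) => f.slice(0, -3))
  );
  return expectedNames.filter((name) => !present.has(name));
}

function reportMissing(expectedNames, dir = DB_DIR) {
  const files = fs.existsSync(dir) ? fs.readdirSync(dir) : [];
  return missingDbs(expectedNames, files);
}
```

Any name returned either had no replica in Azure or failed to restore; those are the ones to investigate with litestream snapshots.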

Bitwarden Items Required

The following Bitwarden vault items MUST exist in Alem's vault before running bootstrap. These are referenced as BW:<item> placeholders in LaunchAgent plists:

Item Name | Used By | Purpose
alai-backup-writer-secret | litestream, Azure backups | Azure SP client secret for Storage Blob write access
cf-access-client-secret | BookStack sync, CF-protected APIs | Cloudflare Access client secret for docs.basicconsulting.no
groq-api-key | Groq model benchmark daemon | Groq API key for LLM model testing
slack-app-token | Slack integration | Slack app-level token for socket mode
slack-bot-token | Slack integration | Slack bot user OAuth token (xoxb-...)

How to verify items exist:

bw get item alai-backup-writer-secret --session $BW_SESSION
bw get item cf-access-client-secret --session $BW_SESSION
bw get item groq-api-key --session $BW_SESSION
bw get item slack-app-token --session $BW_SESSION
bw get item slack-bot-token --session $BW_SESSION

If missing: Contact Alem or check Vaultwarden (https://vault.basicconsulting.no) for backup credentials. These secrets are also in ANVIL's Keychain if ANVIL is still accessible.


Post-Bootstrap Verification

10. Check LaunchAgents loaded

launchctl list | grep -E "com\.(alai|john)\." | wc -l
# Expected: ~110-112 (depending on role)

11. Verify Ollama running

curl -s http://localhost:11434/api/tags | jq -r '.models[].name'
# Expected (ANVIL): qwen2.5-coder:32b, llama3.3, deepseek-r1, etc.

12. Verify litestream replication

pgrep -fl litestream
# Should show: litestream replicate -config /Users/makinja/system/config/litestream.yml

# Check logs:
tail -f ~/system/logs/litestream.log
# Should show periodic sync messages (every 1-30s depending on DB tier)

13. Test Mission Control

node ~/system/tools/mc.js stats
# Should show task counts, agents, recent activity

node ~/system/tools/mc.js list --limit 5
# Should show recent tasks

14. Test SSH to original ANVIL (if still alive)

ssh makinja@100.103.49.98 "hostname && uptime"
# Expected: ANVIL + uptime if machine is reachable

Troubleshooting

Error: "brew: command not found" after install

Cause: Homebrew not in PATH.

Fix:

eval "$(/opt/homebrew/bin/brew shellenv)"
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile

Error: "bw: command not found"

Cause: Bitwarden CLI not installed or not in PATH.

Fix:

brew install bitwarden-cli
hash -r  # Refresh shell PATH cache

LaunchAgent fails to load

Symptoms: launchctl bootstrap returns error code 119, 122, or 125.

Debug:

# Check specific agent status:
launchctl print gui/$(id -u)/com.alai.litestream
# Look for "state = waiting" or "last exit code"

# Check agent logs:
tail -f ~/system/logs/litestream.log
tail -f ~/Library/Logs/com.alai.*.log

Common exit codes:

Secret rehydration failed

Symptoms: Bootstrap log shows "WARN: Bitwarden item 'X' not found".

Fix manually:

# Get secret from Bitwarden:
SECRET=$(bw get password groq-api-key --session $BW_SESSION)

# Edit plist:
vi ~/Library/LaunchAgents/com.alai.groq-model-benchmark.plist

# Replace BW:groq-api-key inside the <string> tag with the value of $SECRET

# Reload:
launchctl bootout gui/$(id -u)/com.alai.groq-model-benchmark
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.alai.groq-model-benchmark.plist

Hardcoded /Users/makinja path mismatch

Cause: LaunchAgent plists contain hardcoded paths to /Users/makinja, but the new Mac has a different username (e.g., /Users/alem).

Fix (bulk replace):

NEW_USER=$(whoami)
cd ~/Library/LaunchAgents

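for plist in com.alai.*.plist com.john.*.plist; do
  sed -i.bak "s|/Users/makinja|/Users/$NEW_USER|g" "$plist"
done

# Reload only the rewritten agents, one at a time. (Running
# "launchctl bootout gui/$(id -u)" with no service path unloads every agent
# in the GUI domain, not just these.)
for plist in com.alai.*.plist com.john.*.plist; do
  launchctl bootout gui/$(id -u) "$PWD/$plist" 2>/dev/null
  launchctl bootstrap gui/$(id -u) "$PWD/$plist"
done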

Ollama models missing

Cause: Fresh install has no models cached. Models are NOT in git repos (too large).

Fix (pull from Ollama registry):

ollama pull qwen2.5-coder:32b
ollama pull llama3.3:70b
ollama pull deepseek-r1:32b
ollama pull deepseek-r1:70b
ollama pull devstral:24b
ollama pull mistral-small
ollama pull llama3.2-vision:90b
ollama pull qwq:32b

# Verify:
ollama list

Expected download size: roughly 230 GB for all eight models (llama3.2-vision:90b, llama3.3:70b, and deepseek-r1:70b alone are about 140 GB). This takes several hours even on a good connection.
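To see which of these still need pulling, the installed list from Ollama's /api/tags endpoint can be compared with the pull list. A Node.js 18+ sketch; the helper names are illustrative. Names pulled without a tag (mistral-small) are stored with :latest, so both sides are normalized before comparing.

```javascript
// Sketch: compare installed Ollama models (GET /api/tags) with the pull list.
// Requires Node.js 18+ for fetch. Untagged names resolve to ":latest".
const EXPECTED_MODELS = [
  'qwen2.5-coder:32b', 'llama3.3:70b', 'deepseek-r1:32b', 'deepseek-r1:70b',
  'devstral:24b', 'mistral-small', 'llama3.2-vision:90b', 'qwq:32b',
];

function normalize(name) {
  return name.includes(':') ? name : `${name}:latest`;
}

function missingModels(expected, tagsPayload) {
  const installed = new Set(((tagsPayload && tagsPayload.models) || []).map((m) => normalize(m.name)));
  return expected.filter((name) => !installed.has(normalize(name)));
}

async function checkModels(baseUrl = 'http://localhost:11434') {
  const tags = await (await fetch(`${baseUrl}/api/tags`)).json();
  return missingModels(EXPECTED_MODELS, tags);
}
```

For example, checkModels().then((names) => names.forEach((n) => console.log(`ollama pull ${n}`))) prints only the pull commands still needed, which helps when a long download is interrupted.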

Database restore fails with "replica not found"

Cause: Azure credentials invalid, or DB was never replicated (new DB created after litestream setup).

Debug:

# Test Azure auth:
az login --service-principal \
  --username $AZURE_CLIENT_ID \
  --password $AZURE_CLIENT_SECRET \
  --tenant $AZURE_TENANT_ID

# List backups:
litestream snapshots -config ~/system/config/litestream.yml ~/system/databases/mission-control.db

# Should show timestamps of snapshots in Azure Blob Storage

If no snapshots: The DB is new or replication was broken. Accept the data loss or restore from another source (e.g., a Time Machine backup of ANVIL, if one exists).


Known Limitations


Testing Recommendations

Before trusting this runbook in a real disaster:

  1. Spin up a fresh Mac VM (UTM or Parallels) with macOS Sonoma
  2. Run through Steps 1-6 end-to-end without looking at ANVIL
  3. Verify LaunchAgent load count matches expected (~112)
  4. Verify DB restore works for at least mission-control.db and hivemind.db
  5. Document any new errors or missing secrets in this runbook

Assigned to: Petter Graff (CodeCraft) — MC task #8534


Last updated: 2026-04-20 | MC Task: #8534 | Tags: status=draft-untested, type=runbook, severity=critical

Incident — 2026-04-21 alai.no Contact Form Failure

2026-04-21 — alai.no Contact Form Silent Failure

Incident Classification

Severity: HIGH — Silent data loss (potential lead loss)
Duration: 2026-04-19 19:00 → 2026-04-21 11:30 (40.5 hours)
Detection: Manual inspection via Himalaya IMAP client
Status: RESOLVED (form handler redeployed to CF Pages Functions)

Timeline

Impact Assessment

Root Cause Analysis

Technical Chain of Failure

  1. alai.no contact form POSTs to https://api.basicconsulting.no/contact (hardcoded Vercel pattern from pre-migration code)
  2. Cloudflare Tunnel ingress rule matches api.basicconsulting.no/* → routes ALL POST requests to localhost:3001
  3. documenso-webhook.js listens on port 3001, designed for Documenso signature events
  4. Webhook handler has catch-all route: app.post('/*', (req, res) => res.json({ok: true}))
  5. Contact form receives HTTP 200 + {ok: true} → assumes success, displays "Message sent"
  6. No email handler ever invoked → no SMTP call → no delivery

Root Cause Categories

Detection Method

Manual IMAP inspection using Himalaya CLI:

himalaya search --account info@alai.no --folder INBOX "from:noreply" "since:2026-04-19"
# Result: No messages found

Lesson: HTTP 200 is NOT proof of delivery. Always verify end-to-end (inbox check, log inspection, user confirmation email).

Fix Summary

  1. CodeCraft deployed /functions/contact.js as CF Pages Function
  2. Handler uses Resend API (RESEND_API_KEY in Bitwarden → CF Pages env vars)
  3. Form target updated to https://alai.no/api/contact (CF Pages Functions route: /functions//api/)
  4. Proveo validated: submit test form → received at info@alai.no within 5 seconds
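A sketch of a handler in the spirit of this fix: it answers 200 only after Resend accepts the message and 502 otherwise, so the form can never show "Message sent" for an undelivered email. Assumptions, not taken from the deployed contact.js: the form field names (name, email, message), the sender address, and the JSON response shape. In a real Cloudflare Pages Function, onRequestPost would be exported from the module; here it is a plain function so the sketch runs under Node.js 18+, which provides Request, Response, and fetch.

```javascript
// Sketch of a contact-form Pages Function that only reports success after
// Resend accepts the email. Field names and the sender address are assumptions.
function json(body, status = 200) {
  return new Response(JSON.stringify(body), {
    status,
    headers: { 'Content-Type': 'application/json' },
  });
}

async function onRequestPost({ request, env }) {
  const form = await request.formData();
  const name = String(form.get('name') || '').trim();
  const email = String(form.get('email') || '').trim();
  const message = String(form.get('message') || '').trim();
  if (!email || !message) return json({ ok: false, error: 'missing fields' }, 400);

  const res = await fetch('https://api.resend.com/emails', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${env.RESEND_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      from: 'alai.no contact form <noreply@alai.no>', // assumed verified sender
      to: ['info@alai.no'],
      reply_to: email,
      subject: `Contact form: ${name || email}`,
      text: message,
    }),
  });
  // Never answer 200 unless the provider accepted the message.
  if (!res.ok) return json({ ok: false, error: 'delivery failed' }, 502);
  const { id } = await res.json();
  return json({ ok: true, id });
}
```

Returning the provider's message id gives a handle to check in the Resend dashboard; the HTTP status alone is still not proof of delivery, so the inbox check remains part of validation.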

MC Task: #8587

Lessons Learned

What Went Well

What Went Wrong

Prevention Actions

Action | Owner | MC Task | Status
Update site migration checklist: "Verify form handlers migrated" | Skillforge | #8587 | DONE (this doc)
Create Forms E2E Testing Protocol (HTTP + inbox check required) | Skillforge | #8587 | DONE (BookStack QA section)
Add Grafana alert: info@alai.no message rate < 1/week → notify #ops | FlowForge | #8588 | OPEN
Audit all CF Tunnel ingress rules for overly broad /* patterns | Securion | #8589 | OPEN
Migrate snowit.ba contact form (same silent failure risk) | CodeCraft | #8591 | OPEN
Add form submission logging to all contact handlers (track volume even if email fails) | CodeCraft | #8592 | OPEN
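For the CF Tunnel ingress audit, a sketch that flags rules forwarding every path of a hostname to a local service, the pattern that sent contact-form POSTs to the Documenso webhook. It takes the already-parsed ingress array from the cloudflared config (YAML parsing is left out) and evaluates path regexes with JavaScript RegExp, which only approximates cloudflared's Go regexp semantics. The probe paths are arbitrary assumptions.

```javascript
// Sketch: flag cloudflared ingress rules that forward every path of a hostname
// to a local service. Input is the parsed `ingress` array from the tunnel config.
const PROBE_PATHS = ['/contact', '/api/anything', '/zz-unlikely-probe'];

function isBroadRule(rule) {
  if (!rule.hostname) return false; // the final catch-all rule has no hostname
  if (String(rule.service || '').startsWith('http_status:')) return false;
  if (!rule.path) return true; // no path restriction = every path is forwarded
  const re = new RegExp(rule.path);
  return PROBE_PATHS.every((p) => re.test(p));
}

function findBroadRules(ingress) {
  return ingress
    .filter(isBroadRule)
    .map((r) => `${r.hostname}${r.path ? ' ' + r.path : ''} -> ${r.service}`);
}
```

Each flagged rule is a candidate for a narrower path regex, so that an unrelated POST reaches a 404 instead of a handler that answers {ok: true}.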

References


Authored: 2026-04-21 | Owner: Skillforge | Reviewed: John

Incident Postmortem — Bilko Deploy Fix 2026-04-22

Date: 2026-04-22
Severity: High (CEO time wasted + security leak)
Status: Resolved
Type: Blameless Postmortem

Summary

A 2-hour bug fix sprint (MC tasks #8626, #8627, #8628) aimed at fixing 3 bugs in the Bilko demo resulted in ZERO live changes reaching the production demo URL (bilko-demo.alai.no). All code changes were pushed to the wrong branch (feat/intesa-bih-demo instead of main), the CI pipeline had been silently broken for 7 days, and client-specific content (the Intesa BiH pitch) leaked to the public demo URL.

Timeline (UTC+1)

Time | Event | Actor
2026-04-21 13:32 | MC #8626 created (invoice template save button broken) | John
2026-04-21 13:33 | MC #8627 created (invoice PDF download fails on unsaved invoice) | John
2026-04-21 13:33 | MC #8628 created (settings logo upload missing) | John
2026-04-21 13:46 | All 3 tasks marked ready_for_review (commit d408cc6 + 53fe1d6) | Brad Frost (Vizu)
2026-04-22 09:00 | CEO: "Bilko demo isn't updated, the bugs are still there" | Alem
2026-04-22 09:10 | Discovery: All fixes pushed to feat/intesa-bih-demo (no CI on that branch) | John
2026-04-22 09:15 | Verification via curl + git log: main unchanged, bilko-demo.alai.no serving old code | John
2026-04-22 09:36 | MC #8678 created: /intesa-bridge leak discovered (HTTP 200 on public demo) | John
2026-04-22 10:00 | CI investigation: Last 5 runs all failed (since 2026-04-15) | Kelsey (FlowForge)
2026-04-22 10:36 | MC #8696 created: ZAKON PI2 Deploy Verification Protocol | John
2026-04-22 12:00 | Manual deploy attempt: GitHub PAT missing workflow scope (can't trigger CI fix) | FlowForge
2026-04-22 12:50 | Manual docker build + push (CEO hands off to FlowForge) | Alem + FlowForge
2026-04-22 21:41 | MC #8730 done: fix-bugs-22apr deployed, all 4 evidence checks pass | FlowForge
2026-04-22 21:50 | MC #8678 code fix pushed (66d2220): intesa routes deleted from main | Brad Frost

Impact

User-Facing

Internal

Root Causes (5 Failures)

1. Branch Assumption (No Pre-Flight Verification)

What happened: John inferred the target branch from memory (assuming feat/intesa-bih-demo based on the last session) and dispatched the builder without running curl -sI + git log to verify which branch serves bilko-demo.alai.no.

Why it matters: Wrong branch = wrong deploy target. All fixes landed on an isolated feature branch with no CI and no domain mapping.

Prevention: ZAKON PI2 Check 2 — 4 pre-flight commands mandatory BEFORE code changes.

2. CI Broken for 7 Days Undetected

What happened: The GitHub Actions workflow had been failing since 2026-04-15. No one noticed because:

Root cause:

  1. GitHub Actions quota exhausted (monthly minutes limit)
  2. --no-traffic flag on line 206 of gcp-deploy.yml prevents traffic promotion on existing services

Prevention: ZAKON PI2 Check 4 — gh run list --limit 5 before any push. If 5/5 = failure, STOP and fix CI first.
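Check 4 can be scripted. A Node.js sketch, assuming the gh CLI is installed and authenticated; it reads recent runs with gh run list --json and stops only when all of the last five concluded with failure. Filtering by branch (main by default here) is an assumption; the check above does not specify a branch.

```javascript
// Sketch of PI2 Check 4: stop if the last 5 workflow runs all failed.
// Assumes an authenticated gh CLI; branch filtering is an assumption.
const { execFileSync } = require('child_process');

function ciGate(runs) {
  const last = runs.slice(0, 5);
  const failed = last.filter((r) => r.conclusion === 'failure').length;
  return { failed, total: last.length, stop: last.length === 5 && failed === 5 };
}

function fetchRuns(branch = 'main') {
  const out = execFileSync(
    'gh',
    ['run', 'list', '--branch', branch, '--limit', '5', '--json', 'conclusion,status,headBranch,createdAt'],
    { encoding: 'utf8' }
  );
  return JSON.parse(out);
}
```

Used as const gate = ciGate(fetchRuns()); a gate.stop value of true means fix CI before pushing anything else.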

3. Intesa Content Leaked to Public URL

What happened: Commit 13c2efb merged the /intesa-bridge and /intesa-cockpit routes into the main branch. These were pitch-specific features for Dženana Hardaga (Intesa BiH IT director) and should have remained isolated on the bilko-intesa-demo Cloud Run service.

Why it matters: Client-specific content (including BiH bank integration mockups) publicly visible on generic demo. Potential NDA violation + confusing UX for non-Intesa visitors.

Prevention:

4. PAT Missing workflow Scope

What happened: The GitHub Personal Access Token used for CI fixes lacked the workflow scope. FlowForge couldn't push branch-purity.yml or fix gcp-deploy.yml via automation.

Why it matters: Blocked automated CI repair. Forced manual workarounds + CEO paste-copy anti-pattern.

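Prevention (ZAKON PI2 Check 6): run gh auth status at session start and confirm that the repo, workflow, and write:packages scopes appear under Token scopes. The --show-token flag is not needed for this check and prints the token itself.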

5. Manual Paste-Copy Anti-Pattern

What happened: The CEO built the Docker image locally and pasted the output to John, who pasted it to the FlowForge agent. FlowForge took over from the "image already built" state instead of owning the full build→push→deploy flow.

Why it matters: Process fragmentation = more failure points. The agent can't verify the build context, Dockerfile, or .dockerignore changes if it didn't run the build.

Prevention: Always dispatch FlowForge BEFORE build step. Agent owns entire flow or none of it.

What Went Well

Action Items

Action | Owner | MC Task | Deadline | Status
Sync ZAKON PI2 to BookStack | pi-orchestrator | #8718 | 2026-04-23 | PAUSED
Create DEPLOY-MAP.md in Bilko repo | Skillforge | #8715 | 2026-04-23 | DONE
Bake PI2 checks into pi-orchestrator v2 | pi-orchestrator | #8696 (item 3) | 2026-04-29 | IN PROGRESS
Add pre-deploy hook (~/.claude/hooks/pre-deploy-check.sh) | pi-orchestrator | #8696 (item 4) | 2026-04-29 | DONE
Patch mc.js done with evidence gate for H-priority deploy tasks | pi-orchestrator | #8696 (item 5) | 2026-04-29 | DONE
Create client-prefix-registry.md | pi-orchestrator | #8696 (item 7) | 2026-04-29 | DONE
Fix GitHub Actions quota (upgrade plan or optimize workflows) | John | TBD | 2026-05-01 | OPEN
Remove --no-traffic flag from gcp-deploy.yml for existing services | FlowForge | TBD | 2026-04-30 | OPEN
Upgrade GitHub PAT with workflow scope | John | TBD | 2026-04-25 | OPEN
Weekly CEO audit of mc.js --ceo-override usage | John | #8696 (item 8) | Ongoing | OPEN

Lessons Learned

For John (Orchestrator)

For Builder Agents (Brad Frost, Vizu)

For FlowForge (DevOps)

System-Level

Metrics

Follow-Up

Next review: 2026-04-29 (PI2 implementation deadline)
Owner: John
Success criteria: All 8 items in MC #8696 marked done + CI health green for 7 consecutive days


Postmortem by ALAI Skillforge, 2026-04-22
Credit: ALAI, 2026

pi-orchestrator: john H-task auto-pause root cause + #10063 reconcile

Created: 2026-05-02
MC References: #10063 (phantom fix), #10517 (true fix)
Daemon: com.john.pi-orchestrator (currently STOPPED, reactivation pending CEO Step 3)


Symptom

John's H-priority tasks were being auto-paused without user action. The pi-orchestrator daemon would intercept high-priority john tasks and route them through queueForHuman instead of executing them, creating a silent work-stoppage pattern.


Investigation Finding — Phantom Fix in MC #10063

MC #10063 (2026-04-XX) claimed to fix the auto-pause behavior by adding configuration flags:

Problem: These config keys were specified in the task's acceptance criteria and marked COMPLETE, but were never actually written to ~/system/config/pi-orchestrator-config.json.

Anti-pattern identified: "Proveo PASS but code doesn't match documentation" — the validation passed based on spec intent rather than verifying actual configuration state.


True Root Cause

The mechanism actually auto-pausing john H-tasks was a dead fallback block in ~/system/kernel/pi-orchestrator.js:

// Original lines 3409-3421 (13 lines, now removed)
if (!selectedTask) {
  // Fallback: check for john tasks
  const johnTask = execSync(
    'node ~/system/tools/mc.js next-task --owner john',
    { encoding: 'utf8' }
  ).trim();
  
  if (johnTask) {
    queueForHuman(johnTask);
    return null;
  }
}

When task selection failed (empty queue or filter mismatch), this fallback would:

  1. Synchronously fetch the next john task via mc.js next-task --owner john
  2. Queue it for human review via queueForHuman()
  3. Return null, preventing execution

This created the observed auto-pause behavior regardless of the missing config flags.


Fix Applied — MC #10517

Date: 2026-05-02
Builder: Codecraft
Validator: Proveo

Changes:

  1. Configuration reconciliation: added the missing flags to ~/system/config/pi-orchestrator-config.json at lines 93-94:
    "skip_interactive_owners": ["john", "alem"],
    "interactive_grace_seconds": 300
    
  2. Dead fallback removal: replaced the 13-line execSync fallback block in ~/system/kernel/pi-orchestrator.js (original lines 3409-3421) with a 4-line block (two comment lines, a log call, and a null return):
    // No fallback to john tasks — auto-pause removed per MC #10517.
    // Configuration now controls interactive routing via skip_interactive_owners.
    log('No task selected; returning null.');
    return null;
    

Verification

Proveo validation: APPROVED 2026-05-02
Acceptance Criteria: 4/4 PASS

Evidence:


Daemon State

Current state: com.john.pi-orchestrator is STOPPED (unloaded via launchctl unload).

Reactivation: Pending CEO Step 3 directive. DO NOT restart daemon until explicitly approved — this is part of a phased rollout to validate the fix does not introduce regression.

To check status:

launchctl list | grep pi-orchestrator
# Empty output = daemon not loaded

To restart (when authorized):

launchctl load ~/Library/LaunchAgents/com.john.pi-orchestrator.plist
tail -f ~/system/logs/pi-orchestrator.log

Cross-References


Lessons

  1. Proveo must verify actual state, not spec intent. A config flag in the task description ≠ the flag exists in the file.
  2. Dead code can be the true mechanism. The "fix" in #10063 was irrelevant because the real culprit was a fallback block that ran regardless of config.
  3. Daemon restart ≠ verification. Stopping the daemon masked the symptom but didn't prove the fix. Reactivation under observation is the true test.
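Lesson 1 can be made mechanical: read the config file that is actually on disk and check for the keys the task claims to add. A sketch in Node.js; whether skip_interactive_owners and interactive_grace_seconds sit at the top level of pi-orchestrator-config.json is an assumption, so the helper accepts dotted paths for nested keys.

```javascript
// Sketch: check a JSON config on disk for required keys (dotted paths allowed
// for nested keys), instead of trusting a task's acceptance criteria.
const fs = require('fs');

function hasPath(obj, dotted) {
  let cur = obj;
  for (const part of dotted.split('.')) {
    if (cur === null || typeof cur !== 'object' || !Object.prototype.hasOwnProperty.call(cur, part)) {
      return false;
    }
    cur = cur[part];
  }
  return true;
}

function missingConfigKeys(config, requiredKeys) {
  return requiredKeys.filter((key) => !hasPath(config, key));
}

function verifyConfigFile(file, requiredKeys) {
  const config = JSON.parse(fs.readFileSync(file, 'utf8'));
  return missingConfigKeys(config, requiredKeys);
}
```

An empty array from verifyConfigFile on ~/system/config/pi-orchestrator-config.json with ['skip_interactive_owners', 'interactive_grace_seconds'] is the evidence; a non-empty array is a FAIL regardless of what the task description says.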

Generated by Skillforge for MC #10517 documentation deliverable. HiveMind sync pending.

Pi-Orchestrator: GOTCHA Fabrication Removal (MC #10549)

Context

Problem: Pi-orchestrator was auto-generating GOTCHA docs at two sites, bypassing the ZAKON #25 quality gate (H/BLOCKER → /prompt-forge → /mehanik). Pi-orch is NOT the authority for /prompt-forge work.

The Two Sites Removed

Site 1: Pre-Spawn Auto-Gen (Step 4.55)

Site 2: Post-Spawn Synthesis

Replacement Behavior

GOTCHA Missing Pre-Spawn

GOTCHA Missing Post-Spawn

Status Note

mc.js does NOT have awaiting_forge as a first-class status; the blocked status is used instead, with reason-prefixed text. Future enhancement: add an awaiting_forge status (track in a separate MC if scope warrants).

Current State

Test Plan

Change Genesis

Cross-Reference

Last updated: 2026-05-04 | Part of pi-orch hardening Talas 3

Email-Agent Ingest Gap Postmortem (2026-05-23) — MC #101887

TL;DR

Email-agent.js silently dropped SEEN-flagged messages for 9+ days (2026-05-14 → 2026-05-23) because HIMALAYA_DISABLED=1 forced a fallback code path that filtered on { seen: false }. This caused 17 missed messages across 5 accounts, including 2 paying-client-class emails (Asmir Merdžanović SEO work, cynthia.li medical contact). The fix replaced the SEEN filter with a date-range fetch plus DB dedup; all missed messages were backfilled, an audit tool was added, and an hourly monitoring LaunchAgent was deployed.

Incident Timeline (UTC)

Root Cause

File: /Users/makinja/system/daemons/email-agent.js

Original code (lines 638-644, pre-fix): The fetchUnseenLegacy function used { seen: false } as its IMAP fetch filter, which translates to an IMAP SEARCH UNSEEN query. Any message already flagged \Seen on the server (e.g., by mobile client, webmail, or Outlook auto-marking) was invisible to this query.

const messages = client.fetch(
  { seen: false },  // ← PROBLEM: excludes SEEN messages
  { uid: true, envelope: true }
);

Trigger chain:

  1. LaunchAgent plist /Users/makinja/Library/LaunchAgents/com.john.email-agent.plist sets HIMALAYA_DISABLED=1 as a hard-coded environment variable
  2. This forces all accounts to fall back to fetchUnseenLegacy instead of the safer fetchAllRecent path (which was introduced in MC #6832 to solve exactly this class of problem)
  3. When alem@alai.no is also accessed via a mobile/web client, incoming messages are auto-flagged \Seen before the daemon's next 5-minute cycle
  4. Daemon runs every 5 minutes, sees 0 unseen, logs "alai: 0 unseen envelopes fetched", and continues — no alarm, no visibility

Why it went undetected: The daemon logs showed normal execution (no errors, no timeouts), just consistently 0 results for the alai account. The pattern looked like "no new email" rather than "email silently dropped."

Fixed code (lines 638-684, post-fix): Replaced { seen: false } with date-range filter { since: } + DB deduplication by UID set lookup:

// MC #101887 fix: SEEN filter caused 9-day gap. Switched to date-range + DB dedup.
const lookbackDays = parseInt(process.env.EMAIL_AGENT_LOOKBACK_DAYS || '7', 10);
const sinceDate = new Date(Date.now() - lookbackDays * 24 * 60 * 60 * 1000);

// Load existing UIDs for this account from DB to enable dedup
const db = emailInbox.getDb();
const existingUids = new Set(
  db.prepare("SELECT message_id FROM emails WHERE account = ?").all(boxLabel).map(r => {
    const m = r.message_id.match(/-uid-(\d+)$/);
    return m ? parseInt(m[1], 10) : null;
  }).filter(Boolean)
);

// Fetch envelopes only — date-range avoids SEEN-flag blind spot
const messages = client.fetch(
  { since: sinceDate },  // ← FIX: fetch all messages in date range
  { uid: true, envelope: true }
);

for await (const msg of messages) {
  // Dedup: skip if UID already in DB
  if (existingUids.has(msg.uid)) continue;
  // ... insert logic
}
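The dedup step can be exercised without IMAP or SQLite. A minimal sketch, assuming message_id values end in -uid-<N> (the pattern the regex in the fix matches); the example IDs and function names are hypothetical. IMAP UIDs are nonzero, so filtering on uid !== null is equivalent to the .filter(Boolean) in the fix. It also assumes one folder per account, since IMAP UIDs are only unique within a folder.

```javascript
// Extract the IMAP UID from a stored message_id such as "...-uid-134".
function uidFromMessageId(messageId) {
  const match = /-uid-(\d+)$/.exec(messageId);
  return match ? parseInt(match[1], 10) : null;
}

// Build the dedup set from DB rows, then keep only fetched UIDs not yet stored.
function newUidsToInsert(dbMessageIds, fetchedUids) {
  const existing = new Set(
    dbMessageIds.map(uidFromMessageId).filter((uid) => uid !== null)
  );
  return fetchedUids.filter((uid) => !existing.has(uid));
}

// Hypothetical stored IDs; a row without the -uid- suffix is ignored by the dedup set.
const stored = ['alai-INBOX-uid-4', 'alai-INBOX-uid-5', '<external@example.com>'];
console.log(newUidsToInsert(stored, [4, 5, 6])); // [ 6 ]
```

Running the same fetch again after UID 6 is stored yields nothing to insert, which is what makes overlapping runs and the backfill safe to repeat.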

Impact Assessment

Fix Applied

  1. Code fix: ~/system/daemons/email-agent.js lines 638-725 — replaced { seen: false } with { since: } + DB dedup via UID set lookup (idempotent, safe for overlapping runs)
  2. Backfill: 17 missed messages ingested via ~/system/tools/email-backfill-from-audit.js — used audit JSON as source of truth, patched subject/from metadata in 14 cases where IMAP envelope fetch failed (tool is idempotent, safe to re-run)
  3. New audit tool: ~/system/tools/email-imap-db-audit.js — enumerates IMAP UIDs vs DB UIDs per account+folder for configurable N-day window, outputs JSON diff with missed UID samples
  4. Monitoring LaunchAgent: ~/Library/LaunchAgents/com.alai.email-ingest-monitor.plist + wrapper ~/system/tools/email-ingest-monitor.sh; runs hourly, executes the audit tool, and fires a Slack #exec alarm when total_missed > 0
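The audit comparison and the monitor's alarm condition can be sketched as a per-account+folder set difference. This is a hypothetical reconstruction, not the code of email-imap-db-audit.js: the input shape, the sample size of 5, and every field name except summary and total_missed are assumptions, and nesting total_missed under summary is itself a guess based on the jq .summary command.

```javascript
// Compare UIDs present on the IMAP server with UIDs already stored in the DB.
function diffAccount(account, folder, imapUids, dbUids, sampleSize = 5) {
  const inDb = new Set(dbUids);
  const missed = imapUids.filter((uid) => !inDb.has(uid));
  return {
    account,
    folder,
    missed_count: missed.length,
    missed_sample: missed.slice(0, sampleSize),
  };
}

function buildReport(entries) {
  const accounts = entries.map((e) => diffAccount(e.account, e.folder, e.imapUids, e.dbUids));
  const total_missed = accounts.reduce((sum, a) => sum + a.missed_count, 0);
  return { summary: { total_missed, accounts_checked: accounts.length }, accounts };
}

// Alarm only when at least one message is on the server but not in the DB.
function shouldAlarm(report) {
  return report.summary.total_missed > 0;
}

// Illustrative input: the dev UIDs 4, 7, 11 match the breakdown table; UID 12 is invented.
const report = buildReport([
  { account: 'dev', folder: 'INBOX', imapUids: [4, 7, 11, 12], dbUids: [12] },
  { account: 'info', folder: 'INBOX', imapUids: [], dbUids: [] },
]);
console.log(JSON.stringify(report.summary)); // {"total_missed":3,"accounts_checked":2}
console.log(shouldAlarm(report)); // true
```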

Remaining Open Items (NOT yet fixed)

Reproduction / Detection Commands

# Detect the gap
node ~/system/tools/email-imap-db-audit.js
cat /tmp/alai/email-ingest-gap/imap-db-diff-30d.json | jq .summary

# Trigger monitor manually
launchctl kickstart -k gui/$(id -u)/com.alai.email-ingest-monitor

# Re-run backfill (idempotent)
node ~/system/tools/email-backfill-from-audit.js

# Check daemon status
launchctl list | grep email
tail -100 ~/system/logs/email-agent.log

# Test audit in verbose mode
node ~/system/tools/email-imap-db-audit.js --verbose

Lessons / Preventive Actions

Technical Details

Missed Messages Breakdown (30-day window, all accounts)

Account | Folder | Missed count | Sample UIDs | Notes
alai | INBOX | 1 | 6 | Asmir email re: SEO clients
dev | INBOX | 3 | 4, 7, 11 | Google Cloud Logging alerts
john | INBOX | 13 | 61, 69, 71, 72, 79, 80, 82, 83, 88, 99, 102, 114, 134 | Mix: GitHub, TLDR, Cloudflare, cynthia.li, Asmir
info | INBOX | 0 | - | No new IMAP messages in window
alem | INBOX | N/A | - | IMAP connection broken, cannot audit

Backfill Execution Summary

Monitor Configuration

LaunchAgent: com.alai.email-ingest-monitor

Sign-off

Documented by: Skillforge (ALAI agent)

Date: 2026-05-23

MC Task: #101887 ST6

Status: Fix deployed, backfill complete, monitoring deployed (pending manual load)

ALAI Mail Topology — Migadu Domains, Mailbox Inventory, John's 19-Account Ingest Loop (2026-06-08)

ALAI Mail Topology & John's Email Ingest Loop

Last updated: 2026-06-08 (v2 — 19 accounts, daemon-path docs, himalaya touch-points)  |  MC: #103182  |  Built by: FlowForge  |  Validated by: Proveo (Angie Jones) — PASS


1. Mail Infrastructure — Migadu (Single Account)

All ALAI product domains are hosted on one Migadu account. MX records for every domain point to the same two servers:

Domains on this account: alai.no, bilko.io, bilko.cloud, bilko.company, snowit.ba, basicconsulting.no, basicfakta.no, lumiscare.com

Migadu Admin Access

Item | Value / Location
Admin login | alem@alai.no
API key | Vaultwarden item "migadu keyy" (86-char token; do NOT print)
IMAP host | imap.migadu.com
SMTP host | smtp.migadu.com
Web UI | https://admin.migadu.com

Migadu API Quirks (DO NOT FORGET)


2. Real Mailbox Inventory

These are the real mailboxes that exist in Migadu (verified 2026-06-08 via admin API). Only real mailboxes can be used as alias destinations.

Domain | Real mailboxes (local parts)
alai.no | john, alem, dev, post, admin
bilko.io | admin, sales, privacy
bilko.cloud | admin, sales
bilko.company | admin, sales
snowit.ba | admin, info, asmir, enis
basicconsulting.no | john, info
lumiscare.com | hello, admin

Note: basicfakta.no is on this Migadu account but has no actively polled mailboxes in John's loop.

Note: lumiscare.com is ALAI's Migadu domain (our infrastructure). It is distinct from caresafetyinnovations.com, which remains a hard-stop boundary (see Section 6).


3. John's Email Ingest — All 19 Monitored Accounts

John's email ingest is managed by ~/system/tools/email-inbox.js and polled by ~/system/daemons/email-agent.js. As of the MC #103182 final state (2026-06-08), 19 accounts are registered in the email_accounts table of email-inbox.db.

Original 6 Accounts (pre-MC #103182)

Account name (DB key) | Email address | Vault item
john | john@basicconsulting.no | existing
info | info@basicconsulting.no | existing
alai | john@alai.no | existing
dev | dev@alai.no | existing
alem | alem@alai.no | existing
gmail | alembasic@gmail.com | existing

11 Product/Role Accounts (added MC #103182 round 1)

Account name (DB key) | Email address | Vault item name
post-alai | post@alai.no | Migadu — post@alai.no
admin-alai | admin@alai.no | Migadu — admin@alai.no
sales-bilko-io | sales@bilko.io | Migadu — sales@bilko.io
privacy-bilko-io | privacy@bilko.io | Migadu — privacy@bilko.io
admin-bilko-io | admin@bilko.io | Migadu — admin@bilko.io
sales-bilko-cloud | sales@bilko.cloud | Migadu — sales@bilko.cloud
admin-bilko-cloud | admin@bilko.cloud | Migadu — admin@bilko.cloud
sales-bilko-company | sales@bilko.company | Migadu — sales@bilko.company
admin-bilko-company | admin@bilko.company | Migadu — admin@bilko.company
info-snowit | info@snowit.ba | info@snowit.ba IMAP
admin-snowit | admin@snowit.ba | Migadu — admin@snowit.ba

2 LumisCare Accounts (added MC #103182 round 2 — CEO directive 2026-06-08)

CEO directive: LumisCare must be in John's reading loop. lumiscare.com is ALAI's own Migadu domain — these are operational mailboxes, not CareSafety-boundary addresses.

Account name (DB key) | Email address | Vault item name
hello-lumiscare | hello@lumiscare.com | Migadu — hello@lumiscare.com
admin-lumiscare | admin@lumiscare.com | Migadu — admin@lumiscare.com

Note on hello@lumiscare.com forwarding: A Migadu direct forward from hello@lumiscare.com → alem@alai.no was active since 2026-05-24. This was removed 2026-06-08 so the mailbox is polled directly under hello-lumiscare with clean labeling. Before removal, LumisCare contact mail appeared in the DB under alem (Migadu ingested the forwarded copy first). After removal, external mail to hello@lumiscare.com is stored under hello-lumiscare only. Confirmed behaviourally: gmail-origin probe stored as DB id=9195 under hello-lumiscare, not duplicated under alem.

App-passwords for the 5 newly created admin@* mailboxes (round 1) were generated via the Migadu API and stored as Vaultwarden items. Vault IDs: 558181ec, 8dfe8d2d, 2f38a16a, 7d0f9216, 2fb07c20.


4. Alias Map — Dead-Address Fixes (2026-06-08)

The following addresses were previously advertised (on websites, legal pages, landing pages) but did not correspond to any real mailbox — all mail to them was silently bouncing. Migadu aliases were created to route them to the nearest real same-domain mailbox.

Dead address (was bouncing) | Now routes to | Why
info@alai.no | john@alai.no | alai.no contact form was sending to this dead address; all website contact submissions were lost
support@bilko.io | sales@bilko.io | bilko.io landing mailto link
podrska@bilko.io | sales@bilko.io | bilko.io Bosnian support address on legal/terms pages
legal@bilko.io | admin@bilko.io | bilko.io legal/terms page
security@bilko.io | admin@bilko.io | bilko.io security disclosure address
support@bilko.cloud | sales@bilko.cloud | bilko.cloud landing mailto
support@bilko.company | sales@bilko.company | bilko.company landing mailto

Pre-fix state: Only postmaster@{domain} → admin@{domain} aliases existed. No rewrites, no catch-all. All other non-existent local-parts bounced.
Post-fix: All advertised addresses now deliver to a real monitored mailbox. Nothing bounces.
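The two alias rules (destination must be a real mailbox, and must be on the same domain) can be checked mechanically against the inventory. A minimal sketch: the data is copied from the Section 2 inventory and the alias table above, while validateAliases and splitAddress are hypothetical helpers, not existing tools.

```javascript
// Real mailboxes per domain, copied from the Section 2 inventory.
const REAL_MAILBOXES = {
  'alai.no': ['john', 'alem', 'dev', 'post', 'admin'],
  'bilko.io': ['admin', 'sales', 'privacy'],
  'bilko.cloud': ['admin', 'sales'],
  'bilko.company': ['admin', 'sales'],
  'snowit.ba': ['admin', 'info', 'asmir', 'enis'],
  'basicconsulting.no': ['john', 'info'],
  'lumiscare.com': ['hello', 'admin'],
};

// Alias map from the table above: previously dead address -> destination.
const ALIASES = {
  'info@alai.no': 'john@alai.no',
  'support@bilko.io': 'sales@bilko.io',
  'podrska@bilko.io': 'sales@bilko.io',
  'legal@bilko.io': 'admin@bilko.io',
  'security@bilko.io': 'admin@bilko.io',
  'support@bilko.cloud': 'sales@bilko.cloud',
  'support@bilko.company': 'sales@bilko.company',
};

function splitAddress(address) {
  const at = address.lastIndexOf('@');
  return {
    local: address.slice(0, at).toLowerCase(),
    domain: address.slice(at + 1).toLowerCase(),
  };
}

// Returns human-readable problems; an empty array means every alias is valid.
function validateAliases(aliases, realMailboxes) {
  const problems = [];
  for (const [from, to] of Object.entries(aliases)) {
    const src = splitAddress(from);
    const dst = splitAddress(to);
    if (src.domain !== dst.domain) {
      problems.push(`${from}: destination ${to} is on a different domain`);
    }
    if (!(realMailboxes[dst.domain] || []).includes(dst.local)) {
      problems.push(`${from}: destination ${to} is not a real mailbox`);
    }
  }
  return problems;
}

console.log(validateAliases(ALIASES, REAL_MAILBOXES)); // []
```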


5. Contact-Form Routing

Product | Contact form path | Where mail ends up
alai.no website | Vercel serverless: ~/business/ALAI-Holding-AS/web/api/contact.js (nodemailer) | Sends to info@alai.no (which now aliases to john@alai.no, monitored). Was dead before the 2026-06-08 fix.
Bilko landing pages | Cloudflare Pages function: apps/landing-*/functions/api/lead.js | Posts to Slack #ceo channel (C0AFJDP9V6U) + writes to Cloudflare KV (BILKO_LEADS). No email path; separate from IMAP polling.

6. Boundary Accounts — NOT Polled (intentional)

Address | Reason not polled
asmir@snowit.ba | Personal mailbox belonging to Asmir (SnowIT partner). He reads his own mail.
enis@snowit.ba | Personal mailbox belonging to Enis. Same reason.
Any *@caresafetyinnovations.com | CareSafety hard-stop boundary: health/patient-adjacent service under external ownership. NOT on ALAI's Migadu account. Never poll. See CareSafety boundary memo in MEMORY.

Important distinction: lumiscare.com (ALAI's Migadu domain — hello@, admin@) IS polled. caresafetyinnovations.com (external operator) is the hard boundary, not lumiscare.com.
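One way to encode this boundary as a pre-registration guard is sketched below. It is hypothetical (no such helper is documented in the codebase); the lists come from the table above, and blocking subdomains of caresafetyinnovations.com is an added conservative assumption. Note that lumiscare.com passes, matching the distinction just stated.

```javascript
// Hard-stop domains and personal mailboxes that must never enter John's loop.
const BLOCKED_DOMAINS = ['caresafetyinnovations.com'];
const PERSONAL_MAILBOXES = ['asmir@snowit.ba', 'enis@snowit.ba'];

function isPollable(address) {
  const normalized = address.trim().toLowerCase();
  const domain = normalized.slice(normalized.lastIndexOf('@') + 1);
  // Exact domain match, or any subdomain of a blocked domain (assumption: treated as blocked too).
  if (BLOCKED_DOMAINS.some((d) => domain === d || domain.endsWith('.' + d))) return false;
  if (PERSONAL_MAILBOXES.includes(normalized)) return false;
  return true;
}

console.log(isPollable('hello@lumiscare.com')); // true
console.log(isPollable('Intake@CareSafetyInnovations.com')); // false
console.log(isPollable('asmir@snowit.ba')); // false
```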


7. Daemon Architecture — Production Path

Understanding the daemon path is critical when debugging ingest issues or adding accounts.

Production Execution Path

Himalaya Layer — Present but Bypassed in Production

Even with HIMALAYA_DISABLED=1, the daemon still routes account resolution through the ACCOUNT_MAP in himalaya-adapter.js. If an account name is missing from ACCOUNT_MAP, the daemon throws "Unknown account: <name>" and skips the account entirely.
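The failure mode looks roughly like the sketch below. It is a hypothetical model, not the adapter's code: the three-entry map stands in for the real 19-entry ACCOUNT_MAP, and runCycle is an invented name. The point is that one missing map entry costs that account every cycle while the other accounts keep running, so the only signal is the "Unknown account" log line.

```javascript
// Hypothetical excerpt; the real ACCOUNT_MAP in himalaya-adapter.js has 19 entries.
const ACCOUNT_MAP = {
  john: 'john@basicconsulting.no',
  alai: 'john@alai.no',
  'hello-lumiscare': 'hello@lumiscare.com',
};

function resolveAccount(name) {
  if (!Object.prototype.hasOwnProperty.call(ACCOUNT_MAP, name)) {
    throw new Error(`Unknown account: ${name}`);
  }
  return ACCOUNT_MAP[name];
}

// One polling cycle: an unresolvable account is logged and skipped; the rest still run.
function runCycle(accountNames, log = console.error) {
  const polled = [];
  for (const name of accountNames) {
    try {
      polled.push({ name, email: resolveAccount(name) });
    } catch (err) {
      log(`${name}: ${err.message}`);
    }
  }
  return polled;
}

// 'admin-lumiscare' is missing from the excerpt map, so it is skipped.
console.log(runCycle(['john', 'admin-lumiscare']).length); // 1
```

This is why the validation below greps the daemon logs for "Unknown account" instead of relying on process errors.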

Validated (2026-06-08T13:15Z): Zero "Unknown account" errors in both daemon runs (wrapper + legacy). All 19 accounts have last_checked_at = 2026-06-08T13:09:39Z.


8. Components — All 8 Touch-Points

Adding any new account requires updating all 8 of the following. Missing any one will cause silent failures or "Unknown account" errors.

# | File | What to change
1 | ~/system/tools/email-inbox.js | (a) Add INSERT OR IGNORE INTO email_accounts (name, email) VALUES ('<name>', '<email>') seed row.
    (b) Add a guarded migration block to extend the emails table CHECK constraint to include the new account name. The CHECK constraint is hardcoded and cannot be altered without rebuilding the table (SQLite limitation). The guard must use a unique string from the new account name (e.g. !ddlRow.sql.includes("'<name>'")). All existing rows and all 25 columns must be preserved in the rebuilt table. This is the most error-prone step; see Section 9 for the gotcha detail.
2 | ~/system/tools/mail-native.js | Add account-name → Vaultwarden item-name entry in VAULT_NAMES map.
3 | ~/system/tools/himalaya-adapter.js | Add account-name → email entry in ACCOUNT_MAP (L34–56 area). Without this, the daemon throws "Unknown account" and skips the account entirely even in legacy mode.
4 | ~/.config/himalaya/config.toml | Add a new [accounts.<name>] stanza. Required even when HIMALAYA_DISABLED=1.
5 | ~/system/daemons/email-agent.js | Add account to counts map (L2459 area). Also confirm it is present in the fetch loop and last_checked_at update loop (both must be mirrored).
6 | ~/system/tools/email-imap-db-audit.js | Add account to ACCOUNTS constant.
7 | ~/system/tools/email-action-hard-check.js | Add account to ALL_MONITORED_ACCOUNTS constant.
8 | Vaultwarden (via bw CLI) | Create app-password item named Migadu — <email> with the IMAP/SMTP password. New admin@ mailboxes require a new app-password generated via Migadu API (PUT /v1/domains/{d}/mailboxes/{lp}). Existing sales@/privacy@/info@ mailboxes may already have creds in Vaultwarden; check before creating.
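Because a missed touch-point fails silently, a cross-check of the account lists is cheap insurance. The sketch below is hypothetical: it assumes the account names have already been extracted from each file (how to extract them is left out) and reports registry names absent from each list. Some lists may legitimately differ (in round 1 the audit ACCOUNTS went 5→16 while the registry held 17), so a hit is a prompt to check, not automatically a bug.

```javascript
// Report, per touch-point, the registry account names it does not contain.
function findMissing(registry, touchPoints) {
  const report = {};
  for (const [touchPoint, names] of Object.entries(touchPoints)) {
    const present = new Set(names);
    const missing = registry.filter((name) => !present.has(name));
    if (missing.length > 0) report[touchPoint] = missing;
  }
  return report;
}

// Illustrative lists: ACCOUNT_MAP is missing admin-lumiscare.
const registry = ['john', 'hello-lumiscare', 'admin-lumiscare'];
const touchPoints = {
  'mail-native.js VAULT_NAMES': ['john', 'hello-lumiscare', 'admin-lumiscare'],
  'himalaya-adapter.js ACCOUNT_MAP': ['john', 'hello-lumiscare'],
  'email-agent.js counts map': ['john', 'hello-lumiscare', 'admin-lumiscare'],
};
console.log(findMissing(registry, touchPoints));
```

With these inputs the report has a single entry, flagging admin-lumiscare as absent from ACCOUNT_MAP.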

Files Changed in MC #103182 (round 1 — 11 accounts)

All files were modified additively. Round 1 changed 5 files (the himalaya touch-points were added in round 2 as the BLOCKER-2 fix).

File | Lines changed
email-inbox.js | L159–172 (seeds) + L141–208 (CHECK migration, 17-account guard)
mail-native.js | L76–88 (11 VAULT_NAMES entries)
email-imap-db-audit.js | L51 (ACCOUNTS 5→16)
email-action-hard-check.js | L14–22 (ALL_MONITORED_ACCOUNTS, 17 accounts)
email-agent.js | L1853–1861 (fetch loop), L1889–1895 (last_checked_at loop)

Files Changed in MC #103182 (round 2 — LumisCare + BLOCKER-2 fix)

File | Lines changed
email-inbox.js | L212–311 (second guarded CHECK migration, 19-account guard: !ddlRow2.sql.includes("'hello-lumiscare'")); 2 new email_accounts seed rows
mail-native.js | L90–91 (hello-lumiscare + admin-lumiscare VAULT_NAMES)
himalaya-adapter.js | L34–56 (ACCOUNT_MAP expanded to 19 entries)
~/.config/himalaya/config.toml | 2 new [accounts.*] stanzas (19 total)
email-agent.js | L1862 (fetch loop), L1899 (last_checked_at loop), L2459–2468 (counts map)
email-action-hard-check.js | L24 (hello-lumiscare + admin-lumiscare in ALL_MONITORED_ACCOUNTS)
email-imap-db-audit.js | L60 (both accounts in ACCOUNTS array)

Known Minor Issue (pre-existing, non-blocking)

After SMTP send via mail-native.js, the IMAP post-send copy to Sent folder times out with ETIMEOUT. Delivery succeeds (Message-ID is logged). This is a cosmetic issue in the IMAP cleanup code — pre-existing, unrelated to MC #103182. Separate MC recommended.


9. GOTCHA — emails Table CHECK Constraint

This is the most dangerous footgun when adding new accounts. Read before touching email-inbox.js.

The emails table in ~/system/databases/email-inbox.db has a hardcoded SQLite CHECK constraint:

account TEXT NOT NULL CHECK(account IN ('john','info','alai','dev','alem','gmail',
  'post-alai','admin-alai',
  'sales-bilko-io','privacy-bilko-io','admin-bilko-io',
  'sales-bilko-cloud','admin-bilko-cloud',
  'sales-bilko-company','admin-bilko-company',
  'info-snowit','admin-snowit',
  'hello-lumiscare','admin-lumiscare'
))

The trap: INSERT OR IGNORE silently discards rows that violate CHECK constraints — no exception is thrown, no warning is logged. If a new account name is not in this list, every email received by that account is permanently lost at ingest time. In MC #103182 this caused 27 real emails to be silently dropped before the issue was caught by Proveo.
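Since INSERT OR IGNORE gives no signal, one option is a pre-flight check of the stored DDL before an account is enabled. A minimal sketch with hypothetical helper names: it reads the CHECK list out of the DDL text (as returned by the sqlite_master query in the steps below) and assumes the column is written as account TEXT NOT NULL CHECK(account IN (...)), as in the DDL above. The sample DDL is truncated for illustration; the real table has 25 columns.

```javascript
// Extract the account names allowed by the emails table CHECK constraint.
function allowedAccounts(ddl) {
  const m = /account\s+TEXT\s+NOT\s+NULL\s+CHECK\s*\(\s*account\s+IN\s*\(([^)]*)\)\s*\)/i.exec(ddl);
  if (!m) throw new Error('account CHECK constraint not found in DDL');
  return [...m[1].matchAll(/'([^']*)'/g)].map((match) => match[1]);
}

// True when a row for this account would be skipped by INSERT OR IGNORE.
function wouldBeSilentlyDropped(ddl, account) {
  return !allowedAccounts(ddl).includes(account);
}

// Truncated, illustrative DDL with only the original 6 accounts.
const ddl = `CREATE TABLE emails (
  id INTEGER PRIMARY KEY,
  account TEXT NOT NULL CHECK(account IN ('john','info','alai','dev','alem','gmail')),
  message_id TEXT UNIQUE
)`;
console.log(wouldBeSilentlyDropped(ddl, 'alai')); // false
console.log(wouldBeSilentlyDropped(ddl, 'hello-lumiscare')); // true
```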

The fix: SQLite does not support ALTER TABLE ... MODIFY COLUMN with a new CHECK constraint. The only way to extend it is to rebuild the table:

  1. Read current DDL: SELECT sql FROM sqlite_master WHERE type='table' AND name='emails'
  2. Guard the migration: check that the new account name is NOT already in the DDL (idempotency)
  3. In a transaction: CREATE TABLE emails_new (...same schema + extended CHECK...) → INSERT INTO emails_new SELECT * FROM emails → assert row count matches → DROP TABLE emails → ALTER TABLE emails_new RENAME TO emails → recreate indexes → COMMIT
  4. Rollback on any error or row count mismatch

The pattern already exists in email-inbox.js — follow it exactly. All 25 columns must be listed explicitly, including the post-migration additions: delegated_to, delegated_at, deadline, body, triaged_at, auto_forwarded.
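The guard, row-count assertion, and rollback can be sketched as a small runner. This is a hypothetical shape, not the code in email-inbox.js: db is an assumed adapter exposing exec(sql) and count(table), and createNewTableSql must be the full 25-column CREATE TABLE emails_new statement with the extended CHECK (it is passed in because the real schema is not reproduced here). INSERT ... SELECT * only works if emails_new declares the columns in the same order as emails.

```javascript
// Rebuild the emails table with an extended CHECK; idempotent via the DDL guard.
function runCheckMigration(db, { currentDdl, newAccount, createNewTableSql, indexSql = [] }) {
  if (currentDdl.includes(`'${newAccount}'`)) return 'skipped'; // already migrated
  db.exec('BEGIN');
  try {
    db.exec(createNewTableSql);
    db.exec('INSERT INTO emails_new SELECT * FROM emails');
    const before = db.count('emails');
    const after = db.count('emails_new');
    if (before !== after) throw new Error(`row count mismatch: ${before} vs ${after}`);
    db.exec('DROP TABLE emails'); // drops the old indexes with it
    db.exec('ALTER TABLE emails_new RENAME TO emails');
    for (const sql of indexSql) db.exec(sql); // recreate indexes on the new table
    db.exec('COMMIT');
    return 'migrated';
  } catch (err) {
    db.exec('ROLLBACK');
    throw err;
  }
}

// Fake adapter for a dry run: records statements and returns fixed row counts.
function fakeDb(emailsCount, newCount) {
  const log = [];
  return { log, exec: (sql) => log.push(sql), count: (t) => (t === 'emails' ? emailsCount : newCount) };
}
```

If the handle is better-sqlite3 (an assumption based on the db.prepare(...).all(...) calls in email-agent.js), its db.transaction(fn) wrapper commits when fn returns and rolls back when it throws, which can replace the manual BEGIN/COMMIT/ROLLBACK.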


10. Runbook — How to Add a New Mailbox to John's Loop

  1. Verify the mailbox exists in Migadu.
    Check via GET /v1/domains/{domain}/mailboxes using the admin API key ("migadu keyy" in Vaultwarden).
    If it does not exist, create it via the admin UI or API first.
  2. Create an app-password for the mailbox.
    Use Migadu admin UI (Mailbox settings > App Passwords) or PUT /v1/domains/{domain}/mailboxes/{local_part}.
    Store the password as a new Vaultwarden item named Migadu — {email}.
  3. [Touch-point 2] Add to mail-native.js VAULT_NAMES map.
    Key = your chosen account name (e.g. sales-newdomain), value = the Vaultwarden item name.
  4. [Touch-point 3] Add to himalaya-adapter.js ACCOUNT_MAP.
    Add '<name>': '<email>' in the ACCOUNT_MAP object. Without this step the daemon throws "Unknown account" and the account is silently skipped.
  5. [Touch-point 4] Add stanza to ~/.config/himalaya/config.toml.
    Follow the existing pattern for a Migadu account stanza.
  6. [Touch-point 1a] Add the email_accounts seed to email-inbox.js.
    Append INSERT OR IGNORE INTO email_accounts (name, email) VALUES ('<name>', '<email>') in the seed block.
  7. [Touch-point 1b — CRITICAL] Add a guarded CHECK migration to email-inbox.js getDb().
    Read Section 9 first. Guard: !ddlRow.sql.includes("'<name>'"). Extend CHECK to include new account. Rebuild table in a transaction preserving all 25 columns. Test idempotency.
  8. [Touch-point 6] Add to email-imap-db-audit.js ACCOUNTS array.
  9. [Touch-point 7] Add to email-action-hard-check.js ALL_MONITORED_ACCOUNTS array.
  10. [Touch-point 5] Add to email-agent.js counts map, fetch loop, and last_checked_at loop.
    All three locations must be mirrored.
  11. Run syntax checks on all modified files.
    node --check ~/system/tools/email-inbox.js && node --check ~/system/tools/mail-native.js && node --check ~/system/tools/himalaya-adapter.js && node --check ~/system/daemons/email-agent.js
  12. Test connectivity.
    node ~/system/tools/mail-native.js test --account <name> — expect IMAP OK + SMTP OK.
  13. Restart the email-agent daemon (LaunchAgent: com.john.email-agent) so the updated accounts array and config take effect.
  14. Proveo ingest probe.
    Send a test email from a non-ALAI sender (e.g. gmail account) with subject INGEST-PROBE-<name>-<timestamp>. This avoids the Migadu catch-all pre-emption issue (see Section 1 API quirks). Trigger one daemon cycle. Confirm the row appears under the correct account name via node ~/system/tools/email-inbox.js search "INGEST-PROBE".
  15. If adding a new alias (not a real mailbox): create the Migadu alias first (same-domain destination only, with Accept: application/json header). Then proceed from step 3.
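The probe in step 14 can be made unambiguous with a unique subject and a strict check of where the row landed. A hypothetical sketch: probeSubject and verifyProbe are not existing tools, the compact UTC timestamp format is a choice, and the { id, account, subject } row shape is assumed rather than the actual output of email-inbox.js search.

```javascript
// Build a unique probe subject: INGEST-PROBE-<name>-<compact UTC timestamp>.
function probeSubject(accountName, now = new Date()) {
  const ts = now.toISOString().replace(/[-:]/g, '').replace(/\.\d{3}Z$/, 'Z');
  return `INGEST-PROBE-${accountName}-${ts}`;
}

// Pass only if exactly one row carries the subject and it is under the expected account.
function verifyProbe(rows, subject, expectedAccount) {
  const hits = rows.filter((r) => r.subject === subject);
  if (hits.length === 0) return { ok: false, reason: 'probe not ingested' };
  const wrong = hits.filter((r) => r.account !== expectedAccount);
  if (wrong.length > 0) {
    return { ok: false, reason: `stored under ${wrong.map((r) => r.account).join(', ')}` };
  }
  if (hits.length > 1) return { ok: false, reason: 'duplicate rows' };
  return { ok: true, id: hits[0].id };
}

const subject = probeSubject('hello-lumiscare', new Date(Date.UTC(2026, 5, 8, 13, 15, 0)));
console.log(subject); // INGEST-PROBE-hello-lumiscare-20260608T131500Z
console.log(verifyProbe([{ id: 9195, account: 'hello-lumiscare', subject }], subject, 'hello-lumiscare'));
```

A row under alem as well as hello-lumiscare fails the check, which is the same duplication the forwarding-removal validation looked for.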

11. Validation Evidence (MC #103182 — Final)

Round 1 (17 accounts — 2026-06-08T11:24Z)

Check | Result
Code changes (5 files) verified by Proveo | PASS
DB registry: 17 rows in email_accounts | PASS
IMAP/SMTP connectivity: 11/11 new accounts | PASS
emails table CHECK migration (emails_new rebuild) | PASS: DDL confirmed, 4697 rows preserved
Ingest probes: 4/4 probe accounts persist to DB | PASS (round 2 probes after schema fix; DB ids 9052/9056/9057/9059/9062/9063/9064)
Regression: original 6 accounts | PASS: counts growing, timestamps advancing
No-loop / alias dedup (UNIQUE on message_id) | PASS: 0 duplicate message_ids
email-action-hard-check.js exit code | PASS: exit 0, 17 accounts in scope

Blocker found and fixed during round 1 validation: The emails table had a hardcoded CHECK covering only the original 6 accounts. INSERT OR IGNORE silently dropped 27 real emails before the migration was applied. See Section 9 for the full gotcha description.

Round 2 (19 accounts — LumisCare + daemon path — 2026-06-08T13:15Z)

Check | Result
ACCOUNT_MAP (himalaya-adapter.js) has 19 entries | PASS: L34–56 confirmed
config.toml has 19 [accounts.*] stanzas | PASS: grep count = 19
email-agent.js counts map has 19 accounts | PASS: L2460–2468
Zero "Unknown account" errors (wrapper run) | PASS: grep -c = 0 / 40 lines
Zero "Unknown account" errors (legacy/production run) | PASS: grep -c = 0
Zero silent drops / CHECK failures (production run) | PASS
admin-lumiscare ingest proof | PASS: DB id=9070 under admin-lumiscare
hello-lumiscare ingest proof (external sender) | PASS: DB id=9195 under hello-lumiscare (gmail-origin probe)
sales-bilko-cloud ingest proof | PASS: DB id=9193
sales-bilko-company ingest proof | PASS: DB id=9194
hello@lumiscare.com forwarding removal (behavioural) | PASS: gmail-origin stored only under hello-lumiscare, not duplicated under alem
All 19 last_checked_at fresh | PASS: 2026-06-08T13:09:39Z all accounts
No duplicate message_ids | PASS: 0 rows
Regression (orig 6 + prior 11) | PASS: row counts growing, timestamps fresh

Evidence files: /tmp/evidence-103182/flowforge-build.md, /tmp/evidence-103182/proveo-validation.md, /tmp/evidence-103182/daemon-wrapper-run.log, /tmp/evidence-103182/daemon-legacy-run.log