Operations

Runbooks, cold start procedures, service registry, monitoring.

Overview
BookStack Runbook
BookStack MFA Setup
CEO Dashboard Runbook
Infrastructure Runbook
Mission Control Dashboard
Planka Runbook
Ops Agent Runbook
Service Registry
Ops Agent
Daemons & Services
Go-Live Runbook
Operational Runbook
Incident Report
Post-Mortem
SLA Report
Terminal & Tmux Shortcuts
Baikal CalDAV Runbook
ALAI Infrastructure Map & Ops Runbooks
System Map — Infrastructure & Services
ALAI Domain Migration — basicconsulting.no → alai.no
AWS CLI Setup — john-deploy IAM
Slack alaiops Bot — Backend Architecture
Documenso Self-Hosted — sign.basicconsulting.no
Azure Blob Offsite Backup Setup
ANVIL Memory Troubleshooting — Mac Studio
Email Pipeline + Edita PA — Runbook
Ollama Fleet Architecture
Static Hosting Migration — Progress Log
ANVIL DR Bootstrap Runbook (Mac Air)
Incident — 2026-04-21 alai.no Contact Form Failure
Incident Postmortem — Bilko Deploy Fix 2026-04-22
pi-orchestrator: john H-task auto-pause root cause + #10063 reconcile
Pi-Orchestrator: GOTCHA Fabrication Removal (MC #10549)
Email-Agent Ingest Gap Postmortem (2026-05-23) — MC #101887
ALAI Mail Topology — Migadu Domains, Mailbox Inventory, John's 19-Account Ingest Loop (2026-06-08)

Overview

Operations Overview

Runbooks, cold start procedures, service registry, and monitoring documentation.

Last Verified: 2026-02-17 | Owner: John

To be populated from ~/system/ops/

BookStack Runbook

Last Verified: 2026-02-17 | Owner: John

Runbook: BookStack

Service Type: Wiki / Knowledge Base
Container: bookstack (lscr.io/linuxserver/bookstack:latest)
Ports: 6875 (external) → 80 (internal)
Internal URL: http://localhost:6875
External URL: http://192.168.68.61:6875 (LAN only, no Cloudflare tunnel yet)
Database: MariaDB (bookstack_db)
Compose File: ~/system/services/bookstack/docker-compose.yml

Service Info

BookStack is the documentation wiki for BasicAS Group. It stores runbooks, system documentation, and organizational information.

Stack:

bookstack - Main app (LinuxServer.io build)
bookstack_db - MariaDB (LinuxServer.io build)

Access:

Admin URL: http://localhost:6875 or http://192.168.68.61:6875
Admin Email: admin@admin.com
Admin Password: password
WARNING: Default admin credentials! Change immediately after first login.

API:

Token ID: jpipe2-c33b96497a61ca91
Token Secret: 100527aa211096463db2f775c9a267c816d11d54b1ec3e038b2b41ee2ae6c6c4
Config: ~/system/config/bookstack.json
Sync Tool: node ~/system/tools/bookstack-sync.js sync

Status Check

Container Health

docker ps | grep bookstack

Expected output:

bookstack       Up X hours
bookstack_db    Up X hours

HTTP Check

curl -I http://localhost:6875

Expected: 200 OK or 302 Found
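
The HTTP check can be turned into a pass/fail test for scripts. The sketch below treats 200 and 302 as healthy, matching the expected responses above. The function name http_status_ok is illustrative and not part of any existing tool.

```shell
# Succeeds (exit 0) when the status code is one this runbook lists as healthy.
http_status_ok() {
  case "$1" in
    200|302) return 0 ;;
    *)       return 1 ;;
  esac
}

# Usage (assumes BookStack is reachable on localhost:6875):
#   code=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:6875)
#   http_status_ok "$code" && echo "BookStack OK" || echo "BookStack check failed ($code)"
```

Any other code, including an empty result when curl cannot connect, is reported as a failure.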

API Check

curl -s -H "Authorization: Token jpipe2-c33b96497a61ca91:100527aa211096463db2f775c9a267c816d11d54b1ec3e038b2b41ee2ae6c6c4" http://localhost:6875/api/docs.json | head -5

Expected: JSON response with API docs.

Database Check

docker exec bookstack_db mysql -u bookstack -pB4s1cAS_w1k1_2026! bookstackapp -e "SELECT count(*) FROM pages;"

Restart Procedure

Quick Restart (Container Only)

docker restart bookstack

Full Stack Restart (Container + Database)

cd ~/system/services/bookstack
docker compose down
docker compose up -d

Wait 30 seconds, then verify:

docker ps | grep bookstack
curl -I http://localhost:6875

Sync System Docs to BookStack

BookStack is auto-populated from ~/system/ using the sync tool.

Sync All Mapped Content

node ~/system/tools/bookstack-sync.js sync

Sync Single File

node ~/system/tools/bookstack-sync.js sync ~/system/rules/development.md

Check Sync Status

node ~/system/tools/bookstack-sync.js status

Force Overwrite All

node ~/system/tools/bookstack-sync.js push

Mapping File: ~/system/config/bookstack-sync-map.json
State File: ~/system/config/bookstack-sync-state.json

Troubleshooting

Problem: Container won't start

Check logs:

docker logs bookstack --tail 100

Common causes:

Database not ready - wait 30s and retry
Port 6875 already bound - check lsof -i :6875
Volume permission issues - check ~/system/services/bookstack/data/

Fix:

cd ~/system/services/bookstack
docker compose down
docker compose up -d bookstack_db
sleep 30
docker compose up -d bookstack

Problem: Cannot log in as admin

Check if admin credentials were changed in UI:

Default: admin@admin.com / password
If changed, use new credentials or reset via database

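Regain admin access by creating a new admin account (bookstack:create-admin adds a user, so it may reject an email that is already registered):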

docker exec -it bookstack php /app/www/artisan bookstack:create-admin --email=admin@admin.com --name=Admin --password=newpassword

Problem: API returns 401 Unauthorized

Check token exists:

cat ~/system/config/bookstack.json

Regenerate the token in the UI (see BookStack MFA Setup, Part 2: Create New API Token).

Problem: Sync tool fails (500 error)

Check BookStack is running:

curl -I http://localhost:6875

Check API endpoint:

curl -s -H "Authorization: Token jpipe2-c33b96497a61ca91:100527aa211096463db2f775c9a267c816d11d54b1ec3e038b2b41ee2ae6c6c4" http://localhost:6875/api/shelves | head -20

Check logs:

docker logs bookstack --tail 100

Problem: Database connection issues

Check database health:

docker exec bookstack_db mysqladmin -u bookstack -pB4s1cAS_w1k1_2026! ping

Expected: mysqld is alive

Check connection settings:

docker exec bookstack env | grep DB_

Expected:

DB_HOST=bookstack_db
DB_PORT=3306
DB_USERNAME=bookstack
DB_PASSWORD=B4s1cAS_w1k1_2026!
DB_DATABASE=bookstackapp

API Usage

List Shelves

curl -s -H "Authorization: Token jpipe2-c33b96497a61ca91:100527aa211096463db2f775c9a267c816d11d54b1ec3e038b2b41ee2ae6c6c4" http://localhost:6875/api/shelves

List Books

curl -s -H "Authorization: Token jpipe2-c33b96497a61ca91:100527aa211096463db2f775c9a267c816d11d54b1ec3e038b2b41ee2ae6c6c4" http://localhost:6875/api/books

List Pages

curl -s -H "Authorization: Token jpipe2-c33b96497a61ca91:100527aa211096463db2f775c9a267c816d11d54b1ec3e038b2b41ee2ae6c6c4" http://localhost:6875/api/pages

Create Page

curl -X POST -H "Authorization: Token jpipe2-c33b96497a61ca91:100527aa211096463db2f775c9a267c816d11d54b1ec3e038b2b41ee2ae6c6c4" \
  -H "Content-Type: application/json" \
  -d '{"book_id":1,"name":"Page Title","markdown":"# Content"}' \
  http://localhost:6875/api/pages

Full API docs: http://localhost:6875/api/docs

Dependencies

Docker - Service runtime
No external dependencies - LAN-only access

Backup

Database Dump

docker exec bookstack_db mysqldump -u bookstack -pB4s1cAS_w1k1_2026! bookstackapp | gzip > ~/backups/bookstack-$(date +%Y%m%d-%H%M%S).sql.gz

Data Volumes (includes uploads, images)

cd ~/system/services/bookstack
tar -czf ~/backups/bookstack-data-$(date +%Y%m%d-%H%M%S).tar.gz data/
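
A backup is only useful if the archive can be read back. As a minimal sketch, the helper below checks that a .gz file exists, is non-empty, and passes gzip's built-in integrity test. The name backup_is_valid is hypothetical, and the check does not prove that the dump inside is complete.

```shell
# Succeeds when the file exists, is non-empty, and passes gzip's integrity test.
backup_is_valid() {
  [ -s "$1" ] && gzip -t "$1" 2>/dev/null
}

# Usage (path is a placeholder):
#   backup_is_valid ~/backups/bookstack-YYYYMMDD-HHMMSS.sql.gz || echo "backup is missing or corrupt"
```

The same check works for the data archive, since a .tar.gz file is also a gzip stream.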

Restore from Backup

# Stop the app only (bookstack_db must stay running for the database restore)
cd ~/system/services/bookstack
docker compose stop bookstack

# Restore database
gunzip -c ~/backups/bookstack-YYYYMMDD-HHMMSS.sql.gz | docker exec -i bookstack_db mysql -u bookstack -pB4s1cAS_w1k1_2026! bookstackapp

# Restore data (if needed)
cd ~/system/services/bookstack
tar -xzf ~/backups/bookstack-data-YYYYMMDD-HHMMSS.tar.gz

# Start service
docker compose up -d

Configuration

Key Environment Variables

APP_URL - Public URL (http://192.168.68.61:6875)
APP_KEY - Laravel encryption key (base64-encoded)
DB_HOST - Database host (bookstack_db)
DB_USERNAME - Database user (bookstack)
DB_PASSWORD - Database password
DB_DATABASE - Database name (bookstackapp)
QUEUE_CONNECTION - Job queue driver (database)
PUID/PGID - User/group IDs (1000/1000)
TZ - Timezone (Europe/Sarajevo)

Full config: ~/system/services/bookstack/docker-compose.yml

Application Settings (via UI)

Access: Settings (gear icon, top-right)
Customize: Branding, registration, auth, permissions

Content Structure

BookStack organizes content as:

Shelf (top-level category)
  └─ Book (collection of pages)
       └─ Chapter (optional grouping)
            └─ Page (markdown document)

Current structure (as of 2026-02-10):

2 shelves (BasicAS System, Organization)
15 books (System Architecture, Operations, Runbooks, etc.)
43 pages (GOTCHA framework, rules, agent docs, runbooks, etc.)

Notes

Admin password: Default is password - MUST be changed!
External access: LAN-only (no Cloudflare tunnel) - consider adding tunnel for remote access
API token: Stored in plaintext in config file - secure via file permissions (chmod 600)
Sync tool: Auto-updates BookStack from ~/system/ markdown files
Timezone: Europe/Sarajevo (BiH time)
LinuxServer.io build: Community-maintained, not official BookStack image
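
The chmod 600 recommendation for the config file can be verified rather than assumed. The sketch below reads the permission string from ls, which works on both macOS and Linux. The function name is_owner_only is illustrative.

```shell
# Succeeds when the path is a regular file with mode exactly rw------- (600).
is_owner_only() {
  perms=$(ls -ld "$1" 2>/dev/null | cut -c1-10)
  [ "$perms" = "-rw-------" ]
}

# Usage:
#   is_owner_only ~/system/config/bookstack.json || chmod 600 ~/system/config/bookstack.json
```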

Last updated: 2026-02-10 | Maintained by: John (AI Director)

BookStack MFA Setup

Last Verified: 2026-02-17 | Owner: John

BookStack MFA and API Token Setup

Service: BookStack Knowledge Base
URL: http://localhost:6875 or http://192.168.68.61:6875

Overview

This runbook covers:

Setting up Multi-Factor Authentication (MFA) for admin accounts
Creating new API tokens after admin account changes
Security best practices

Prerequisites

BookStack is running and accessible
Admin account: john@alai.no (password: BkStk_J0hn_2026!Secure)
Browser access to BookStack web interface

Part 1: Enable MFA (Multi-Factor Authentication)

Step 1: Log In

Open a browser and navigate to http://localhost:6875
Click "Sign In"
Enter credentials:
- Email: john@alai.no
- Password: BkStk_J0hn_2026!Secure

Step 2: Access Account Settings

Click on your profile icon (top-right corner)
Select "Edit Profile" or "My Account"

Step 3: Enable MFA

Scroll to "Multi-Factor Authentication" section
Click "Setup MFA"
Choose method:
- TOTP (Recommended): Time-based One-Time Password (Google Authenticator, Authy, etc.)
- Backup Codes: Generate backup recovery codes
For TOTP setup:
- Scan QR code with authenticator app
- Enter 6-digit verification code
- Save backup codes in secure location (~/system/config/bookstack-mfa-backup.txt)
Click "Confirm" to enable MFA

Step 4: Test MFA

Log out
Log back in with same credentials
Verify you're prompted for MFA code
Enter code from authenticator app
Successful login confirms MFA is working

Part 2: Create New API Token

The old API token was invalidated when the default admin@admin.com account was deleted. You need to create a new token for the john@alai.no account.

Step 1: Navigate to API Settings

Step 2: Create Token

Click "Create Token"
Enter token details:
- Name: System Integration Token
- Expiry: Never (or set appropriate expiry)
Click "Save"

Step 3: Copy Token Credentials

IMPORTANT: Token secret is only shown once!

You will see:

Token ID: (example: jpipe2-abc123xyz)
Token Secret: (long hexadecimal string)

Copy both values immediately.

Step 4: Update Config File

Update ~/system/config/bookstack.json with new token:

# Edit the config file
nano ~/system/config/bookstack.json

Replace token_id and token_secret with new values:

{
  "url": "http://localhost:6875",
  "external_url": "http://192.168.68.61:6875",
  "token_id": "YOUR_NEW_TOKEN_ID",
  "token_secret": "YOUR_NEW_TOKEN_SECRET",
  "admin_email": "john@alai.no",
  "admin_password": "BkStk_J0hn_2026!Secure",
  "alem_email": "alem@basicconsulting.no",
  "alem_password": "V4YawdA13PdsRBIOtFz9"
}

Save the file (Ctrl+O, Enter, Ctrl+X in nano).

Step 5: Test API Token

# Read token from config
TOKEN_ID=$(grep '"token_id"' ~/system/config/bookstack.json | cut -d'"' -f4)
TOKEN_SECRET=$(grep '"token_secret"' ~/system/config/bookstack.json | cut -d'"' -f4)

# Test API call
curl -s -H "Authorization: Token $TOKEN_ID:$TOKEN_SECRET" http://localhost:6875/api/shelves

Expected: JSON response with list of shelves.

If you see {"error":{"message":"No matching API token was found"...}}, the token is incorrect.
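
For scripts that call the API repeatedly, reading the token can be wrapped in small helpers. This is a sketch that assumes the config keeps one key per line, as in the example above. The function names json_string_value and bookstack_auth_header are hypothetical, and the sed pattern is not a general JSON parser.

```shell
# Print the string value of a key from a flat, one-key-per-line JSON file.
# $1 = file, $2 = key
json_string_value() {
  sed -n 's/.*"'"$2"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Build the Authorization header value in the "Token <id>:<secret>" form used above.
# $1 = config file
bookstack_auth_header() {
  printf 'Token %s:%s' "$(json_string_value "$1" token_id)" "$(json_string_value "$1" token_secret)"
}

# Usage:
#   curl -s -H "Authorization: $(bookstack_auth_header ~/system/config/bookstack.json)" \
#     http://localhost:6875/api/shelves
```

Matching on the quoted key name keeps a key such as token_id from being confused with a longer key that merely contains it.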

Part 3: Additional Security Measures

Disable Guest Access (Optional)

If you want to require authentication for all access:

Edit docker-compose.yml:

cd ~/system/services/bookstack
nano docker-compose.yml

Change:

- ALLOW_GUEST_ACCESS=true

to:

- ALLOW_GUEST_ACCESS=false

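Recreate BookStack so the changed environment variable takes effect (docker compose restart does not re-read docker-compose.yml):
```
docker compose up -d bookstack
```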

Review User Permissions

Enable Audit Log

Settings → Audit Log
Enable logging of user actions
Review periodically for suspicious activity

Regular Backups

Ensure regular backups are configured:

# Database backup
docker exec bookstack_db mysqldump -u bookstack -p8CdydCxVBD7wBoCVRXZE bookstackapp | gzip > ~/backups/bookstack-$(date +%Y%m%d).sql.gz

# Data backup
cd ~/system/services/bookstack
tar -czf ~/backups/bookstack-data-$(date +%Y%m%d).tar.gz data/

Add to daily cron job or LaunchAgent.

Troubleshooting

MFA Not Working

Solutions:

Check time sync on server and phone (TOTP requires accurate time)
Use backup codes if available

Reset MFA from the BookStack CLI (emergency only):

docker exec -it bookstack php /app/www/artisan bookstack:reset-mfa --email=john@alai.no

Lost API Token

Problem: Token was not saved and is no longer visible

Solution:

Delete old token in web UI (API Tokens tab)
Create new token (see Part 2)
Update config file

Cannot Access Web UI

Problem: BookStack returns 500 error or won't load

Solutions:

Check container status: docker ps | grep bookstack
Check logs: docker logs bookstack --tail 100
Restart service: cd ~/system/services/bookstack && docker compose restart

Security Best Practices

MFA on all admin accounts - Always enable MFA for admins
Strong passwords - Use 20+ character passwords with mixed case, numbers, symbols
Regular token rotation - Rotate API tokens every 90 days
Least privilege - Give users minimum permissions needed
Audit logs - Review regularly for suspicious activity
Backups - Daily database + data backups
HTTPS - Use Cloudflare tunnel for external access (see bookstack.md)
Keep updated - Update BookStack image regularly
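
The 90-day rotation rule is easy to check mechanically once the token's creation time is recorded somewhere. The sketch below assumes both timestamps are supplied as epoch seconds. The function name token_needs_rotation and the idea of storing a creation timestamp are illustrative, not something the current config file provides.

```shell
# Succeeds when a token is at least max_days old.
# $1 = creation time (epoch seconds), $2 = current time (epoch seconds),
# $3 = max age in days (defaults to 90)
token_needs_rotation() {
  tr_created=$1
  tr_now=$2
  tr_max_days=${3:-90}
  tr_age_days=$(( (tr_now - tr_created) / 86400 ))
  [ "$tr_age_days" -ge "$tr_max_days" ]
}

# Usage (created_at is a placeholder for a recorded timestamp):
#   token_needs_rotation "$created_at" "$(date +%s)" && echo "Rotate the BookStack API token"
```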

Next Steps

After completing this setup:

Enable MFA for john@alai.no
Create new API token
Update ~/system/config/bookstack.json
Test API token works
Enable MFA for alem@basicconsulting.no
Review and set user permissions
Configure daily backups
Consider Cloudflare tunnel for external access

Last updated: 2026-02-17 | Maintained by: John (AI Director) | Related: ~/system/context/docs/runbooks/bookstack.md

CEO Dashboard Runbook

Last Verified: 2026-02-17 | Owner: John

CEO Dashboard

URL: http://localhost:3030/ceo
Server: Mission Control Dashboard (port 3030)
Auto-refresh: 60 seconds
Theme: Dark (ALAI brand)

Overview

The CEO Dashboard provides Alem with a single-screen view of all critical business metrics. It aggregates data from multiple sources (Mission Control tasks, sales pipeline, invoices, support tickets, decisions) into a real-time executive view.

Sections

1. Revenue Overview (Banner)

MRR (Monthly Recurring Revenue) — Estimated from total invoiced / months
Outstanding — Total unpaid invoices
3-Month Trend — Revenue trend (TODO: implement calculation)
Next Invoice Due — Next upcoming payment deadline

Data Source: invoice-generator.js stats and invoice-generator.js list
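
The MRR estimate is a simple division of total invoiced by the number of months. As a worked sketch with made-up figures, 120000 invoiced over 6 months gives an estimate of 20000 per month. The function name estimate_mrr is hypothetical, amounts are assumed to be whole numbers, and how invoice-generator.js actually computes the figure is not shown here.

```shell
# Estimated MRR = total invoiced / number of months (integer division).
# $1 = total invoiced, $2 = number of months
estimate_mrr() {
  em_total=$1
  em_months=$2
  [ "$em_months" -gt 0 ] || return 1   # refuse to divide by zero
  echo $(( em_total / em_months ))
}

# Usage:
#   estimate_mrr 120000 6
```

Integer division truncates, so small totals over many months round down.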

2. Pipeline Funnel

Visual funnel showing lead progression:

Prospect → Qualified → Proposal Sent → Negotiating → Won
Each stage shows count of active leads

Data Source: sales-pipeline.js stats

3. Active Projects (Kanban)

Project status board with 3 columns:

Active — In progress tasks with project tag
Pending — Paused tasks with project tag
Stalled — Blocked tasks with project tag

Data Source: Mission Control tasks table (filtered by project IS NOT NULL)

4. Decisions Pending

Top 5 GO/NO-GO decisions awaiting Alem's response:

Title of decision
Recommendation (MUST GO / GO / CONDITIONAL GO / NO-GO)
Visual badge indicating action needed

Data Source: ~/system/specs/alem-decisions-2026-02.md (parsed from markdown)

5. Alerts Panel

Critical alerts requiring attention:

Overdue invoices (from invoice-generator.js check-overdue)
SLA breaches (from ticket-sla-checker.js)
Stale tasks (open >7 days from MC)

Color coding:

🔴 Critical (red) — SLA breaches
⚠️ Warning (yellow) — Overdue invoices
ℹ️ Info (blue) — Stale tasks

Data Sources: invoice-generator.js, ticket-sla-checker.js, MC tasks table
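
The color coding amounts to a fixed mapping from alert type to severity. A minimal sketch of that mapping follows. The type names sla_breach, overdue_invoice, and stale_task are assumptions for illustration and may not match the identifiers used in mc-dashboard.js.

```shell
# Map an alert type to the severity level used for color coding.
alert_severity() {
  case "$1" in
    sla_breach)      echo critical ;;   # red
    overdue_invoice) echo warning ;;    # yellow
    stale_task)      echo info ;;       # blue
    *)               echo unknown; return 1 ;;
  esac
}

# Usage:
#   alert_severity overdue_invoice
```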

6. Upcoming Deadlines

Timeline of upcoming deadlines (next 14 days):

Tasks with "deadline" keyword in description
Sorted by creation date (proxy for urgency)

Data Source: Mission Control tasks table (filtered by description LIKE '%deadline%')

Technical Details

Implementation

Added as route /ceo to existing MC dashboard server
Server file: ~/system/tools/mc-dashboard.js
HTML file: ~/system/tools/ceo-dashboard.html
API endpoint: GET /api/ceo/dashboard (JSON)

Data Aggregation

Dashboard uses child_process.execSync to call existing tools:

const invoiceStatsRaw = execSync('node ~/system/tools/invoice-generator.js stats 2>/dev/null');
const pipelineRaw = execSync('node ~/system/tools/sales-pipeline.js stats 2>/dev/null');

Data is cached for 60 seconds to avoid hammering tools on every browser refresh.
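
The caching idea can be illustrated outside the server as well. The sketch below is a shell analogue, not the dashboard's actual JavaScript: it stores a command's output in a file together with a timestamp and reuses it until the TTL expires. The function name cached_run and the cache file layout are assumptions.

```shell
# cached_run <cache_file> <ttl_seconds> <command...>
# Line 1 of the cache file holds the epoch time of the last run; the rest is the output.
cached_run() {
  cr_file=$1
  cr_ttl=$2
  shift 2
  cr_now=$(date +%s)
  if [ -f "$cr_file" ]; then
    read -r cr_stamp < "$cr_file"
    if [ $(( cr_now - cr_stamp )) -lt "$cr_ttl" ]; then
      tail -n +2 "$cr_file"          # cache hit: replay the stored output
      return 0
    fi
  fi
  cr_out=$("$@") || return 1         # cache miss: run the command
  { echo "$cr_now"; printf '%s\n' "$cr_out"; } > "$cr_file"
  printf '%s\n' "$cr_out"
}

# Usage:
#   cached_run /tmp/invoice-stats.cache 60 node ~/system/tools/invoice-generator.js stats
```

A failed command leaves the previous cache file untouched, so the next call retries instead of serving an error.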

Styling

Pure CSS (no frameworks)
ALAI brand colors:
- Background: #09090b
- Accent: #00E5A0
- Cards: #18181b
- Borders: #27272a
- Text: #e4e4e7
Responsive grid layout
Mobile-friendly (single column on mobile)

Auto-refresh

Two mechanisms:

HTML meta refresh: <meta http-equiv="refresh" content="60">
JavaScript interval: setInterval(loadDashboard, 60000)

Access

Local

Direct: http://localhost:3030/ceo
From MC dashboard: Click "CEO Dashboard" link (TODO: add link to MC dashboard)

LAN Access

Dashboard is bound to 0.0.0.0:3030, accessible from any device on the network:

Find Mac Studio IP: ifconfig | grep "inet " | grep -v 127.0.0.1
Access from phone/tablet: http://[MAC_IP]:3030/ceo

Mobile

Fully responsive. Recommended for iPad/tablet in landscape mode for best experience.

Future Enhancements

Phase 2 (Interactive)

Click on decisions to mark GO/NO-GO (updates alem-decisions file)
Click on alerts to take action (send reminder, escalate ticket)
Filter pipeline by source/date range
Drill-down from project kanban to task list

Phase 3 (Advanced Metrics)

Revenue trend calculation (3-month moving average)
Pipeline conversion rates (qualified → won)
Task velocity (tasks closed per week)
SLA compliance percentage over time
Contract expiration warnings

Phase 4 (AI Insights)

Weekly digest summary (Ollama-generated)
Anomaly detection (sudden drop in pipeline, spike in alerts)
Predictive revenue forecasting
Recommendations engine (which decision to prioritize)

Maintenance

Update Decision File

When Alem makes decisions, update:

~/system/specs/alem-decisions-2026-02.md

Dashboard will auto-parse on next refresh.

Restart Dashboard

If changes are made to server code:

launchctl kickstart -k gui/$(id -u)/com.john.mc-dashboard

Check Logs

tail -f ~/system/logs/mc-dashboard.log
tail -f ~/system/logs/mc-dashboard-error.log

Troubleshooting

Dashboard shows "Loading..." indefinitely

Check API endpoint: curl http://localhost:3030/api/ceo/dashboard
Check browser console for JavaScript errors
Verify MC dashboard daemon is running: launchctl list | grep mc-dashboard

Data shows 0 or N/A

Verify tool outputs: node ~/system/tools/invoice-generator.js stats
Check tool paths in mc-dashboard.js API route
Ensure database files exist in ~/system/databases/

Mobile layout broken

Clear browser cache
Test responsive design in browser dev tools
Check CSS media queries in ceo-dashboard.html

Server: /Users/makinja/system/tools/mc-dashboard.js
HTML: /Users/makinja/system/tools/ceo-dashboard.html
Daemon: ~/Library/LaunchAgents/com.john.mc-dashboard.plist
Manifest: ~/system/tools/manifest.md
Decisions: ~/system/specs/alem-decisions-2026-02.md

Infrastructure Runbook

Last Verified: 2026-02-17 | Owner: John

Runbook: Local Infrastructure

Platform: Mac Studio M3 Ultra, 96GB RAM, macOS
Services: Docker containers, LaunchAgents, Cloudflare tunnels

Docker Services

Status Check

docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'

Services

Container	Image	Port	Health
mattermost	mattermost/mattermost-enterprise	8065	healthcheck
mattermost-db	postgres:13	5432 (internal)	—
planka	ghcr.io/plankanban/planka	3100→1337	healthcheck
planka-db	postgres:15-alpine	5433 (internal)	healthcheck
documenso	documenso/documenso	3003	—
documenso-db	postgres	5434 (internal)	healthcheck
bookstack	lscr.io/linuxserver/bookstack	6875→80	—
bookstack_db	lscr.io/linuxserver/mariadb	3306 (internal)	—
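
The table above doubles as a checklist. As a sketch, the helper below compares a list of expected container names against the names currently running and prints the ones that are missing. The function name missing_containers is illustrative.

```shell
# Print every expected container name that does not appear in the list on stdin.
# stdin: one running container name per line, e.g. from: docker ps --format '{{.Names}}'
# args:  the expected container names
missing_containers() {
  mc_running=$(cat)
  for mc_name in "$@"; do
    printf '%s\n' "$mc_running" | grep -qxF -- "$mc_name" || printf '%s\n' "$mc_name"
  done
}

# Usage:
#   docker ps --format '{{.Names}}' | missing_containers \
#     mattermost mattermost-db planka planka-db documenso documenso-db bookstack bookstack_db
```

Whole-line matching means a running planka-db does not count as planka.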

Restart a container

docker restart <container_name>
# Example: docker restart mattermost

Restart all

# Mattermost stack
cd ~/system/services/mattermost && docker compose down && docker compose up -d

# Planka stack
cd ~/system/services/planka && docker compose down && docker compose up -d

# Documenso
cd ~/system/services/documenso && docker compose down && docker compose up -d

# BookStack
cd ~/system/services/bookstack && docker compose down && docker compose up -d

View logs

docker logs <container_name> --tail 50
docker logs <container_name> -f  # follow

Disk cleanup (if disk >90%)

docker system prune -f            # Remove stopped containers, unused networks, dangling images, build cache
docker volume prune -f             # Remove unused volumes (CAREFUL: data loss)
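
The 90% threshold can be checked before pruning. The sketch below parses the POSIX-format output of df and compares the use percentage against a threshold. The function names df_use_percent and disk_needs_cleanup are hypothetical.

```shell
# Read `df -P <path>` output on stdin and print the use percentage without the % sign.
df_use_percent() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeeds when the percentage is above the threshold.
# $1 = use percentage, $2 = threshold (defaults to 90)
disk_needs_cleanup() {
  [ "$1" -gt "${2:-90}" ]
}

# Usage:
#   pct=$(df -P / | df_use_percent)
#   disk_needs_cleanup "$pct" && docker system prune -f
```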

Cloudflare Tunnels

Config

cat ~/.cloudflared/config.yml

Routes

Hostname	Target	Service
mm.basicconsulting.no	localhost:8065	Mattermost
boards.basicconsulting.no	localhost:3100	Planka
sign.basicconsulting.no	localhost:3003	Documenso

Status

cloudflared tunnel info mattermost

Restart tunnel

# Tunnel runs as LaunchAgent
launchctl unload ~/Library/LaunchAgents/com.cloudflare.tunnel.plist
launchctl load ~/Library/LaunchAgents/com.cloudflare.tunnel.plist

LaunchAgents (Daemons)

List all custom daemons

launchctl list | grep -E "com\.(john|edita|cloudflare)"

Expected daemons

Daemon	Interval	Location
com.john.ops-agent	5 min	~/Library/LaunchAgents/
com.edita.autowork	30 min	~/Library/LaunchAgents/
com.john.mc-dashboard	always	~/Library/LaunchAgents/
com.john.mc-session-worker	on events	~/Library/LaunchAgents/

Load/unload

launchctl load ~/Library/LaunchAgents/<plist-name>.plist
launchctl unload ~/Library/LaunchAgents/<plist-name>.plist

Ollama (Local AI)

Status

curl -s http://localhost:11434/api/tags | python3 -c "import sys,json; [print(m['name']) for m in json.load(sys.stdin)['models']]"

Models

Model	Size	Use
llama3.1:8b	5GB	Fast classification (ops-agent)
qwen2.5-coder:32b	19GB	Code generation, contextual responses
llama3.1:70b	40GB	Research, writing

Restart Ollama

# Ollama runs as macOS app
killall ollama 2>/dev/null
open -a Ollama

Mission Control Dashboard

Status

curl -s http://localhost:3030 | head -1

Restart

launchctl unload ~/Library/LaunchAgents/com.john.mc-dashboard.plist
launchctl load ~/Library/LaunchAgents/com.john.mc-dashboard.plist

Full Health Check

# Human-readable
node ~/system/tools/health-check.js

# JSON (programmatic)
node ~/system/tools/health-check.js --json

# Quick (HTTP only)
node ~/system/tools/health-check.js --quick

After System Reboot

All LaunchAgents with RunAtLoad: true start automatically. Verify:

# 1. Check Docker is running
docker ps

# 2. Check all daemons
launchctl list | grep -E "com\.(john|edita|cloudflare)"

# 3. Run health check
node ~/system/tools/health-check.js

# 4. If anything missing, load it
launchctl load ~/Library/LaunchAgents/<missing>.plist

Created: 2026-02-10 | Last Updated: 2026-02-10

Mission Control Dashboard

Last Verified: 2026-02-17 | Owner: John

Runbook: Mission Control Dashboard

Service Type: Task Management Web UI
Runtime: Node.js (Express)
Port: 3030 (internal + LAN accessible)
Internal URL: http://localhost:3030
LAN URL: http://192.168.68.61:3030 (mobile-friendly)
Database: SQLite (~/system/databases/mission-control.db)
LaunchAgent: com.john.mc-dashboard
Source: ~/system/tools/mc-dashboard.js

Service Info

Mission Control Dashboard is the web UI for task management. It provides CRUD operations, priority management, status tracking, and team coordination.

Features:

Task list with filters (open/closed, owner, priority)
Create/edit/delete tasks
Start/pause/resume tasks
Priority management (H/M/L)
Owner assignment (john/edita/—)
Real-time status updates
Mobile-responsive design
Auto-refresh every 30 seconds

CLI Alternative:

node ~/system/tools/mc.js list|add|start|done|pause|resume|block

Status Check

LaunchAgent Status

launchctl list | grep mc-dashboard

Expected output: PID shown (e.g., 12345 0 com.john.mc-dashboard)

If not running: - 0 com.john.mc-dashboard (no PID)
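
The two cases above (a PID in the first column, or a dash) can be told apart in a script. The sketch below reads launchctl list output on stdin and reports one of three states for a label. The function name daemon_state and the state strings are illustrative.

```shell
# Report the state of a LaunchAgent label from `launchctl list` output on stdin.
# Columns are: PID, last exit status, label.
daemon_state() {
  awk -v label="$1" '
    $3 == label { found = 1; if ($1 == "-") print "not running"; else print "running" }
    END { if (!found) print "not loaded" }
  '
}

# Usage:
#   launchctl list | daemon_state com.john.mc-dashboard
```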

HTTP Check

curl -I http://localhost:3030

Expected: 200 OK

LAN Access Check (from another device)

curl -I http://192.168.68.61:3030

Expected: 200 OK

Database Check

sqlite3 ~/system/databases/mission-control.db "SELECT count(*) FROM tasks WHERE status = 'open';"

Restart Procedure

Stop Service

launchctl unload ~/Library/LaunchAgents/com.john.mc-dashboard.plist

Start Service

launchctl load ~/Library/LaunchAgents/com.john.mc-dashboard.plist

Restart (Stop + Start)

launchctl unload ~/Library/LaunchAgents/com.john.mc-dashboard.plist
launchctl load ~/Library/LaunchAgents/com.john.mc-dashboard.plist

Note: LaunchAgent auto-restarts on crash (KeepAlive=true).

View Logs

stdout (General logs)

tail -f ~/system/logs/mc-dashboard.log

stderr (Error logs)

tail -f ~/system/logs/mc-dashboard.err

Recent errors

tail -50 ~/system/logs/mc-dashboard.err

Troubleshooting

Problem: Dashboard won't start

Check LaunchAgent:

launchctl list | grep mc-dashboard

Check error log:

tail -50 ~/system/logs/mc-dashboard.err

Common causes:

Port 3030 already bound - check lsof -i :3030
Database locked - check for stale processes using SQLite
Node.js not found - check which node
Permission issues - check file ownership

Fix:

# Kill the process listening on port 3030 (-sTCP:LISTEN leaves connected clients such as browsers alone)
lsof -ti tcp:3030 -sTCP:LISTEN | xargs kill -9

# Restart
launchctl unload ~/Library/LaunchAgents/com.john.mc-dashboard.plist
launchctl load ~/Library/LaunchAgents/com.john.mc-dashboard.plist

Problem: Can't connect from mobile (LAN)

Check service is listening on all interfaces:

lsof -i :3030

Expected: *:3030 (listening on all IPs, not just 127.0.0.1)

Check firewall:

sudo /usr/libexec/ApplicationFirewall/socketfilterfw --getglobalstate

If firewall is on, allow Node.js:

sudo /usr/libexec/ApplicationFirewall/socketfilterfw --add /opt/homebrew/bin/node

Check Mac IP:

ipconfig getifaddr en0  # WiFi
ipconfig getifaddr en1  # Ethernet

Expected: 192.168.68.61 (or similar)

Problem: Tasks not updating (stale data)

Check database integrity:

sqlite3 ~/system/databases/mission-control.db "PRAGMA integrity_check;"

Expected: ok

Check last write:

ls -lh ~/system/databases/mission-control.db

Restart dashboard:

launchctl unload ~/Library/LaunchAgents/com.john.mc-dashboard.plist
launchctl load ~/Library/LaunchAgents/com.john.mc-dashboard.plist

Problem: 500 errors in UI

Check server logs:

tail -f ~/system/logs/mc-dashboard.log ~/system/logs/mc-dashboard.err

Check database:

sqlite3 ~/system/databases/mission-control.db "SELECT * FROM tasks LIMIT 1;"

Common causes:

Database schema mismatch - migrate database
Corrupted task data - fix in SQLite
Node.js error - check stack trace in error log

CLI Integration

Mission Control has two interfaces:

Dashboard (UI) - http://localhost:3030
CLI - node ~/system/tools/mc.js

Both read/write the same SQLite database: ~/system/databases/mission-control.db

CLI Commands

# List tasks
node ~/system/tools/mc.js list
node ~/system/tools/mc.js list --owner john

# Start task (creates /tmp/mc-active-task)
node ~/system/tools/mc.js start <id>

# Complete task
node ~/system/tools/mc.js done <id> "outcome summary"

# Pause task (removes /tmp/mc-active-task)
node ~/system/tools/mc.js pause <id>

# Block task
node ~/system/tools/mc.js block <id> "blocker reason"

# Show full details
node ~/system/tools/mc.js show <id>

# Who's working on what
node ~/system/tools/mc.js active

Dependencies

Node.js - Runtime (/opt/homebrew/bin/node)
SQLite3 - Database (built-in with Node.js)
LaunchAgent - Auto-start on login
No external services - Fully local

Backup

Database Backup

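# Use SQLite's online backup so a copy taken while the dashboard is writing stays consistent
sqlite3 ~/system/databases/mission-control.db ".backup '$HOME/backups/mission-control-$(date +%Y%m%d-%H%M%S).db'"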

Automated Backup (daily)

Add to crontab or LaunchAgent:

0 2 * * * cp ~/system/databases/mission-control.db ~/backups/mission-control-$(date +\%Y\%m\%d).db

Restore from Backup

# Stop dashboard
launchctl unload ~/Library/LaunchAgents/com.john.mc-dashboard.plist

# Restore database
cp ~/backups/mission-control-YYYYMMDD-HHMMSS.db ~/system/databases/mission-control.db

# Start dashboard
launchctl load ~/Library/LaunchAgents/com.john.mc-dashboard.plist

Configuration

LaunchAgent Plist

Path: ~/Library/LaunchAgents/com.john.mc-dashboard.plist

Key settings:

KeepAlive: true - Auto-restart on crash
RunAtLoad: true - Start on login
StandardOutPath - Log stdout
StandardErrorPath - Log stderr
EnvironmentVariables: HOME - User home directory

Application Config

Port: 3030 (hardcoded in mc-dashboard.js)
Database: ~/system/databases/mission-control.db (hardcoded)
Auto-refresh: 30 seconds (client-side)

To change port:

Edit ~/system/tools/mc-dashboard.js
Change const PORT = 3030; to desired port
Restart LaunchAgent

Mission Control Session Worker

LaunchAgent: com.john.mc-session-worker
Purpose: Background daemon for session-level task monitoring

Status check:

launchctl list | grep mc-session-worker

Notes

Access: LAN-accessible (no auth) - consider adding auth for remote access
Mobile-friendly: Responsive design, touch-optimized
No auth: Anyone on LAN can create/modify tasks - secure network required
Auto-refresh: Dashboard auto-refreshes every 30s
Active task enforcement: ~/system/.claude/hooks/gotcha-enforcer.py checks /tmp/mc-active-task before Write/Edit
CLI vs UI: Both interfaces are equal - use whichever is convenient

Last updated: 2026-02-10 | Maintained by: John (AI Director)

Planka Runbook

Last Verified: 2026-02-17 | Owner: John

Runbook: Planka

Service Type: Kanban Board / Project Management
Container: planka (ghcr.io/plankanban/planka:2.0.0-rc.4)
Ports: 3100 (external) → 1337 (internal)
External URL: https://boards.basicconsulting.no
Database: PostgreSQL 15 (planka-db)
Compose File: ~/system/services/planka/docker-compose.yml

Service Info

Planka is the visual project management tool for BasicAS Group. It provides Kanban-style boards for task tracking.

Stack:

planka - Main app (RC4)
planka-db - PostgreSQL 15 (alpine)

External Access:

Exposed via Cloudflare Tunnel: boards.basicconsulting.no
Trust proxy enabled for correct client IPs

Admin Access:

Web UI: http://localhost:3100 (local) or https://boards.basicconsulting.no
Username: john
Password: BasicAS2026!
Email: john@basicconsulting.no
Database: postgresql://postgres@planka-db/planka (internal only, no auth)

Status Check

Container Health

docker ps | grep planka

Expected output:

planka        Up X hours (healthy)
planka-db     Up X hours (healthy)

HTTP Check

curl -I http://localhost:3100

Expected: 200 OK or 302 Found

External Access Check

curl -I https://boards.basicconsulting.no

Expected: 200 OK or 302 Found

Database Check

docker exec planka-db psql -U postgres -d planka -c "SELECT count(*) FROM \"user\";"

Restart Procedure

Quick Restart (Container Only)

docker restart planka

Full Stack Restart (Container + Database)

cd ~/system/services/planka
docker compose down
docker compose up -d

Wait 30 seconds for healthcheck to pass, then verify:

docker ps | grep planka
curl -I http://localhost:3100
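
A fixed 30-second wait can be replaced by polling until the service answers. The sketch below retries a command a limited number of times with a delay between attempts. The function name wait_until is hypothetical, and the attempt count and delay in the usage line are arbitrary examples.

```shell
# wait_until <attempts> <delay_seconds> <command...>
# Runs the command until it succeeds or the attempts run out.
wait_until() {
  wu_attempts=$1
  wu_delay=$2
  shift 2
  wu_i=1
  while [ "$wu_i" -le "$wu_attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$wu_i" -lt "$wu_attempts" ]; then
      sleep "$wu_delay"              # no sleep after the final attempt
    fi
    wu_i=$(( wu_i + 1 ))
  done
  return 1
}

# Usage (curl -f fails on HTTP errors, so a 200 or 302 counts as ready):
#   wait_until 10 3 curl -sf -o /dev/null http://localhost:3100 || echo "Planka did not come up"
```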

Troubleshooting

Problem: Container won't start

Check logs:

docker logs planka --tail 100

Common causes:

Database not ready - wait 30s and retry
Port 3100 already bound - check lsof -i :3100
Volume permission issues - check docker volumes

Fix:

cd ~/system/services/planka
docker compose down
docker compose up -d planka-db
sleep 30
docker compose up -d planka

Problem: Cannot log in as admin

Check environment variables:

docker exec planka env | grep DEFAULT_ADMIN

Expected:

DEFAULT_ADMIN_EMAIL=john@basicconsulting.no
DEFAULT_ADMIN_PASSWORD=BasicAS2026!
DEFAULT_ADMIN_NAME=John AI
DEFAULT_ADMIN_USERNAME=john

If the admin was changed in the UI, the default credentials won't work. List the current admin accounts in the database to find the right login:

docker exec planka-db psql -U postgres -d planka -c "SELECT email, username FROM \"user\" WHERE \"isAdmin\" = true;"

Problem: 502 Bad Gateway (external access)

Check container is running:

docker ps | grep planka

Check Cloudflare tunnel:

cloudflared tunnel info boards

Check BASE_URL:

docker exec planka env | grep BASE_URL

Expected: BASE_URL=https://boards.basicconsulting.no

Problem: Database connection issues

Check database health:

docker exec planka-db pg_isready -U postgres -d planka

Check connection string:

docker exec planka env | grep DATABASE_URL

Expected: DATABASE_URL=postgresql://postgres@planka-db/planka

API Access

Planka has a REST API. Example:

Get Boards (requires auth token)

curl -H "Authorization: Bearer <TOKEN>" http://localhost:3100/api/boards

Get Token:
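
A token can be requested from the /api/access-tokens endpoint (the same call is used in the Ops Agent runbook). A minimal sketch, assuming the response has the shape {"item": "<token>"} and that the password is supplied in a PLANKA_PASSWORD environment variable; the variable and both function names are hypothetical:

```shell
# Extract the token from an /api/access-tokens response read on stdin.
# Assumption: the response body looks like {"item": "<token>"}.
planka_token_from_json() {
  python3 -c 'import sys, json; print(json.load(sys.stdin)["item"])'
}

# Request a token for a user. The password comes from PLANKA_PASSWORD so it is
# not typed on the command line; it must not contain double quotes or backslashes,
# because the JSON body is built by plain string interpolation.
planka_get_token() {
  local user="${1:-john}" base="${2:-http://localhost:3100}"
  curl -s -X POST "$base/api/access-tokens" \
    -H "Content-Type: application/json" \
    -d "{\"emailOrUsername\":\"$user\",\"password\":\"$PLANKA_PASSWORD\"}" \
    | planka_token_from_json
}
```

Usage: `TOKEN="$(planka_get_token john)"`, then pass it as the Bearer token in the boards request above.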

Dependencies

Docker - Service runtime
Cloudflare Tunnel - External access (boards.basicconsulting.no)

No dependencies on other local services.

Backup

Database Dump

docker exec planka-db pg_dump -U postgres planka | gzip > ~/backups/planka-$(date +%Y%m%d-%H%M%S).sql.gz
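
A dump is only useful if it can be read back. A minimal sketch of a post-backup check, assuming it is enough to confirm the file is non-empty and a valid gzip stream (it does not validate the SQL itself); verify_dump is an illustrative name:

```shell
# Verify that a compressed dump exists, is non-empty, and passes gzip's integrity test.
verify_dump() {
  local file="$1"
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  gzip -t "$file" 2>/dev/null || { echo "corrupt gzip: $file" >&2; return 1; }
  echo "ok: $file"
}
```

Note that if pg_dump fails, the pipeline above can still write a small valid gzip file, so checking the dump's size against a previous backup is a sensible extra step.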

Docker Volumes (includes file uploads)

docker run --rm -v planka-data:/data -v ~/backups:/backup alpine tar -czf /backup/planka-data-$(date +%Y%m%d-%H%M%S).tar.gz -C /data .
docker run --rm -v planka-db-data:/data -v ~/backups:/backup alpine tar -czf /backup/planka-db-data-$(date +%Y%m%d-%H%M%S).tar.gz -C /data .

Restore from Backup

# Stop service
cd ~/system/services/planka
docker compose down

# Restore volumes (if needed) while the containers are stopped
docker run --rm -v planka-data:/data -v ~/backups:/backup alpine tar -xzf /backup/planka-data-YYYYMMDD-HHMMSS.tar.gz -C /data
docker run --rm -v planka-db-data:/data -v ~/backups:/backup alpine tar -xzf /backup/planka-db-data-YYYYMMDD-HHMMSS.tar.gz -C /data

# Start only the database; psql needs a running planka-db container
docker compose up -d planka-db
sleep 30

# Restore database
gunzip -c ~/backups/planka-YYYYMMDD-HHMMSS.sql.gz | docker exec -i planka-db psql -U postgres -d planka

# Start service
docker compose up -d

Configuration

Key Environment Variables

BASE_URL - External URL (https://boards.basicconsulting.no)
DATABASE_URL - PostgreSQL connection string
SECRET_KEY - Encryption key for sessions/tokens
TOKEN_EXPIRES_IN - JWT token expiry (365 days)
DEFAULT_LANGUAGE - UI language (en-US)
DEFAULT_ADMIN_* - Initial admin user credentials
TRUST_PROXY - Enable for correct IPs behind Cloudflare

Full config: ~/system/services/planka/docker-compose.yml

Notes

Version: 2.0.0-rc.4 (release candidate, not stable)
Auth method: Password-based (no SSO/LDAP yet)
Database: Uses PostgreSQL with trust auth (no password); acceptable only because planka-db is reachable solely from the internal Docker network
Token expiry: 365 days (1 year), which is very long; consider a shorter expiry for security
Admin password: Stored in plaintext in docker-compose.yml; consider secrets management

Last updated: 2026-02-10 | Maintained by: John (AI Director)

Ops Agent Runbook

Last Verified: 2026-02-17 | Owner: John

Runbook: Ops Agent

Service: com.john.ops-agent
Type: LaunchAgent daemon (Node.js)
Interval: Every 5 minutes (300s)
Location: ~/system/daemons/ops-agent.js
Plist: ~/Library/LaunchAgents/com.john.ops-agent.plist
Owner: John
Cost: $0 (Ollama local AI)

What It Does

Autonomous operations agent that runs 24/7:

MM Monitoring — reads all 4 Mattermost teams (basic, wizard, rendrom, riad)
Message Classification — Ollama llama3.1:8b classifies: ROUTINE / TASK / INCIDENT
Intelligent Response — Ollama qwen2.5-coder:32b generates contextual MM replies
Task Creation — creates MC tasks with BILLABLE/INTERNAL tag + Planka cards
Health Monitoring — runs health-check.js (Docker, HTTP, system, daemons)
Auto-Fix — auto-fix.js for known issues (max 3 attempts/hour/service)
Escalation — creates HIGH priority MC task + MM alert when it can't resolve
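
The auto-fix limit of 3 attempts per hour per service can be pictured as a sliding-window counter. The sketch below is not the auto-fix.js implementation; it assumes a hypothetical plain-text history file with one "epoch-seconds service" line per attempt, and the function names are illustrative:

```shell
# Record one fix attempt. Usage: record_fix <service> <history_file> [epoch_seconds]
record_fix() {
  echo "${3:-$(date +%s)} $1" >> "$2"
}

# Succeed only if the service has had fewer than <max> attempts in the last hour.
# Usage: can_attempt_fix <service> <history_file> [now_epoch] [max]
can_attempt_fix() {
  local service="$1" history="$2" now="${3:-$(date +%s)}" max="${4:-3}"
  local cutoff=$((now - 3600)) count
  # Count attempts for this service that are newer than the cutoff
  count="$(awk -v s="$service" -v c="$cutoff" \
    '$2 == s && $1 > c { n++ } END { print n + 0 }' "$history" 2>/dev/null)"
  [ "${count:-0}" -lt "$max" ]
}
```

When can_attempt_fix fails, the agent's documented behaviour is to escalate rather than retry.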

Status Check

# Is it running?
launchctl list | grep ops-agent

# Recent activity
tail -50 ~/system/logs/ops-agent.log

# LaunchAgent stdout/stderr
tail -20 ~/system/logs/ops-agent-launchd.log
tail -20 ~/system/logs/ops-agent-launchd-error.log

# State file
cat /tmp/ops-agent-state.json

# Stats
cat ~/system/agents/state/ops.json

Restart

# Graceful restart (unload + load)
launchctl unload ~/Library/LaunchAgents/com.john.ops-agent.plist
launchctl load ~/Library/LaunchAgents/com.john.ops-agent.plist

# Verify
launchctl list | grep ops-agent

Manual Run (Testing)

# Run one cycle manually
node ~/system/daemons/ops-agent.js

# Watch output in real-time
node ~/system/daemons/ops-agent.js 2>&1 | tee /dev/tty

Troubleshooting

Ops agent not running

# Check if loaded
launchctl list | grep ops-agent
# Expected: "-  0  com.john.ops-agent"

# If not loaded:
launchctl load ~/Library/LaunchAgents/com.john.ops-agent.plist

# If load fails, check plist:
plutil -lint ~/Library/LaunchAgents/com.john.ops-agent.plist
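
The `launchctl list` line has three columns: PID, last exit status, and label. A "-" in the PID column is normal for an interval job that is idle between runs. A small sketch that interprets one such line; describe_launchd_line is an illustrative helper, not an existing tool:

```shell
# Interpret one line of `launchctl list` output ("PID  Status  Label").
# Returns non-zero when the job is not loaded or its last exit status was non-zero.
describe_launchd_line() {
  # Intentional word splitting: break the line into its three columns
  set -- $1
  if [ $# -ne 3 ]; then
    echo "not loaded"
    return 1
  fi
  if [ "$2" != "0" ]; then
    echo "$3: last exit status $2"
    return 1
  fi
  if [ "$1" = "-" ]; then
    echo "$3: loaded, idle between runs"
  else
    echo "$3: running (pid $1)"
  fi
}
```

Usage: `describe_launchd_line "$(launchctl list | grep ops-agent)"`.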

Not processing messages

# Check state — is last_check_ms recent?
cat /tmp/ops-agent-state.json | python3 -m json.tool

# Check MM connectivity
node ~/system/tools/mm.js status

# Check MM token
cat /tmp/mm-token.json | python3 -m json.tool

# Verify Mattermost is up
curl -s http://localhost:8065/api/v4/system/ping
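
To judge whether last_check_ms is recent, the timestamp can be converted to an age in seconds. A minimal sketch, assuming last_check_ms is a top-level key in the state file holding epoch milliseconds; state_age_seconds is an illustrative name:

```shell
# Print how many seconds ago the agent last checked messages.
# Usage: state_age_seconds <state_file> [now_in_epoch_ms]
state_age_seconds() {
  local file="$1" now_ms="${2:-$(( $(date +%s) * 1000 ))}"
  python3 -c '
import json, sys
state = json.load(open(sys.argv[1]))
print((int(sys.argv[2]) - int(state["last_check_ms"])) // 1000)
' "$file" "$now_ms"
}
```

With a 5-minute interval, an age well above 300 seconds from `state_age_seconds /tmp/ops-agent-state.json` suggests the agent has stopped cycling.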

Classification wrong (Ollama issues)

# Check Ollama is running
curl -s http://localhost:11434/api/tags | python3 -m json.tool

# Test classification manually
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Classify: ROUTINE, TASK, or INCIDENT. Reply ONE word.\n\nMessage: Can you fix the login page?",
  "stream": false,
  "options": {"temperature": 0.1, "num_predict": 10}
}' | python3 -c "import sys,json; print(json.load(sys.stdin)['response'])"

# If Ollama down, ops-agent falls back to keyword heuristics (still works)
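
The keyword fallback can be pictured roughly as follows. This is an illustration only: the keyword lists are hypothetical and are not taken from ops-agent.js.

```shell
# Classify a message as INCIDENT, TASK, or ROUTINE using simple keyword matching.
# The keywords below are made up for illustration.
classify_by_keywords() {
  local msg
  msg="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$msg" in
    *down*|*outage*|*urgent*|*"not working"*) echo INCIDENT ;;
    *"can you"*|*please*|*fix*|*add*|*update*) echo TASK ;;
    *) echo ROUTINE ;;
  esac
}
```

Incident keywords are checked first so that a message such as "please fix, the server is down" is treated as an incident rather than a task.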

Health check reporting false positives

# Run health check directly
node ~/system/tools/health-check.js

# JSON output for debugging
node ~/system/tools/health-check.js --json 2>/dev/null | python3 -m json.tool

# Check specific service
docker ps --format '{{.Names}} {{.Status}}' | grep <service>

Auto-fix loop (service keeps restarting)

# Check fix history (max 3/hour enforcement)
cat /tmp/ops-fix-history.json | python3 -m json.tool

# Clear fix history (reset counter)
rm /tmp/ops-fix-history.json

# Check auto-fix directly
node ~/system/tools/auto-fix.js <service> <issue>

Planka card not created

# Check Planka is up
curl -s http://localhost:3100/api/access-tokens -X POST \
  -H "Content-Type: application/json" \
  -d '{"emailOrUsername":"john","password":"BasicAS2026!"}'

# Check ops-agent log for Planka errors
grep "Planka" ~/system/logs/ops-agent.log | tail -10

Dependencies

Service	Required	Fallback
Mattermost (8065)	YES	Agent skips MM check cycle
Ollama (11434)	NO	Falls back to keyword classification
MC (mc.js)	YES	Tasks not created (error logged)
Planka (3100)	NO	Cards not created (task still created in MC)
HiveMind	NO	Intel not posted (ops still works)

Configuration

Monitored MM Teams

Defined in ops-agent.js. Currently: basic, wizard, rendrom, riad

Ignored Users (bots)

john, edita, system-bot, boards, calls, tester — defined by user ID in ops-agent.js

Billable Logic

basic team = INTERNAL (not billable)
wizard, rendrom, riad = BILLABLE (client teams)
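
Expressed as code, the billable rule is a direct team-to-tag mapping. A sketch with an illustrative function name; the UNKNOWN branch for unlisted teams is an assumption, not documented behaviour:

```shell
# Map a Mattermost team name to its billing tag.
billing_tag() {
  case "$1" in
    basic) echo INTERNAL ;;
    wizard|rendrom|riad) echo BILLABLE ;;
    *) echo UNKNOWN; return 1 ;;  # assumption: unlisted teams are flagged, not billed
  esac
}
```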

Health Check Services

Defined in health-check.js:

8 Docker containers
6 HTTP endpoints
2 system checks (disk, memory)
4 LaunchAgent daemons

Files

File	Purpose
~/system/daemons/ops-agent.js	Main daemon code
~/Library/LaunchAgents/com.john.ops-agent.plist	LaunchAgent config
~/system/tools/health-check.js	Service health monitor
~/system/tools/auto-fix.js	Automated recovery
~/system/agents/identities/ops.md	Agent identity card
~/system/agents/state/ops.json	Persistent state
/tmp/ops-agent-state.json	Runtime state (last check timestamp)
/tmp/mm-token.json	Cached MM auth token
/tmp/ops-fix-history.json	Auto-fix attempt tracking
~/system/logs/ops-agent.log	Activity log
~/system/logs/ops-agent-launchd.log	LaunchAgent stdout
~/system/logs/ops-agent-launchd-error.log	LaunchAgent stderr

Disaster Recovery

Complete reset

# 1. Stop daemon
launchctl unload ~/Library/LaunchAgents/com.john.ops-agent.plist

# 2. Clear state
rm -f /tmp/ops-agent-state.json /tmp/mm-token.json /tmp/ops-fix-history.json

# 3. Restart
launchctl load ~/Library/LaunchAgents/com.john.ops-agent.plist

# Note: First run will check messages from last 30 minutes only (default)

Rollback to mm-responder

# 1. Stop ops-agent
launchctl unload ~/Library/LaunchAgents/com.john.ops-agent.plist

# 2. Restore mm-responder
cp ~/system/archive/mm-responder.sh.archived-2026-02-10 ~/system/daemons/mm-responder.sh
chmod +x ~/system/daemons/mm-responder.sh
launchctl load ~/Library/LaunchAgents/com.john.mm-responder.plist

# 3. Update health-check.js daemon list (add mm-responder, remove ops-agent)

Metrics

Check via MC:

node ~/system/tools/mc.js stats          # Task creation stats
node ~/system/tools/mc.js list --owner ops  # Tasks created by ops-agent

Check via state:

cat ~/system/agents/state/ops.json       # Cumulative stats
cat /tmp/ops-agent-state.json            # Current cycle stats

Created: 2026-02-10 | Last Updated: 2026-02-10 | Next Review: 2026-03-10

Service Registry

Last Verified: 2026-02-17 | Owner: John

Service Registry — ALAI Holding

Last Updated: 2026-02-12 | Owner: John (AI Director)

Domains

Domain	Registrar	Nameservers	Points To	Purpose	Renewal
basicconsulting.no	one.com	Cloudflare	Cloudflare Tunnel	Consulting brand	Check one.com
mm.basicconsulting.no	—	Cloudflare	Tunnel → localhost:8065	Mattermost	—
sign.basicconsulting.no	—	Cloudflare	Tunnel → localhost:3003	Documenso	—
boards.basicconsulting.no	—	Cloudflare	Tunnel → localhost:3100	Planka	—
vault.basicconsulting.no	—	Cloudflare	Tunnel → localhost:8200	Vaultwarden	—
alai.no	one.com	Vercel	Vercel	ALAI Holding website	Check one.com
getdrop.no	one.com	Vercel (pending)	Vercel → drop-landing	Drop fintech landing	Check one.com
basicfakta.no	one.com	Vercel	Vercel	BasicFakta SaaS	Check one.com

Hosting & Deploy

Service	Platform	URL	Deploy Method
Drop landing	Vercel	getdrop.no	`vercel --prod` from ~/ALAI/products/Drop/landing
ALAI website	Vercel	alai.no	`vercel --prod` from ~/ALAI/web
BasicFakta	Vercel	basicfakta.no	TBD

Local Services (Mac Studio M3 Ultra, 96GB)

Service	Type	Port	Domain	Purpose	Status
Mattermost	Docker	8065	mm.basicconsulting.no	Team chat	Active
Planka	Docker	3100	boards.basicconsulting.no	Kanban boards	Active
Documenso	Docker	3003	sign.basicconsulting.no	E-signatures	Active
BookStack	Docker	6875	localhost only	Internal wiki	Active
Vaultwarden	Docker	8200	vault.basicconsulting.no	Password manager	Active
MC Dashboard	Node.js	3030	localhost (LAN)	Mission Control	Active
Ollama	Native	11434	localhost	Local AI	Active
n8n	Docker	5678	localhost	Workflow automation	Active
MinIO	Docker	9000	localhost	S3 storage (Documenso)	Active

Cloudflare

Item	Value
Account ID	d0ac2afb6bb5b298723b85a114151a04
Tunnel ID	3315a609-7934-45c5-ad0c-56d86d16374d
CLI	`/opt/homebrew/bin/cloudflared`
Zone	basicconsulting.no

Email

Address	Provider	Purpose
john@basicconsulting.no	one.com	Support / John agent
info@basicconsulting.no	one.com	Edita / general
alem@basicconsulting.no	one.com	CEO
post@alai.no	TBD	Drop + ALAI public contact

Accounts & SaaS

Service	URL	Purpose	Owner
Vercel	vercel.com	Static hosting	john-3447
Cloudflare	dash.cloudflare.com	DNS, tunnel, CDN	Alem
one.com	one.com	Domain registrar + email	Alem
GitHub	github.com	Code repos	TBD
Fiken	fiken.no	Accounting	Alem
Flowcase	everdeen.flowcase.com	CV management	Alem

Daemons (LaunchAgents)

Daemon	Interval	Purpose
com.john.ops-agent	5 min	MM monitoring, health, auto-fix
com.john.mc-dashboard	always	Web dashboard :3030
com.john.mc-session-worker	events	Session state extraction
com.john.morning-routine	07:00	Daily briefing
com.john.agentforge	4h	Auto-audit agents
com.john.mm-bridge	5s poll	Alem→John chat (#ceo)
com.edita.autowork	30 min	Background task worker
com.john.health-check	5 min	Service health monitoring
com.john.email-agent	5 min	Email triage
com.john.intake-watcher	5 min	Email→task pipeline
com.edita.job-hunter	periodic	Opportunity scanning

Maintenance Notes

Domain renewals: All on one.com — check annually
SSL: Vercel = auto (Let's Encrypt), Cloudflare = auto
Docker updates: docker compose pull in ~/system/services/{service}/
Backups: bash ~/system/tools/db-backup.sh (daily via daemon)
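
For the Docker updates note, the service directories can be enumerated rather than visited by hand. A minimal sketch, assuming each service keeps a docker-compose.yml directly under ~/system/services/{service}/ (true for the BookStack and Planka runbooks; not verified for every service); list_compose_dirs is an illustrative name:

```shell
# Print every service directory under the root that contains a docker-compose.yml.
# Usage: list_compose_dirs [root]
list_compose_dirs() {
  local root="${1:-$HOME/system/services}" dir
  for dir in "$root"/*/; do
    [ -f "${dir}docker-compose.yml" ] && printf '%s\n' "${dir%/}"
  done
  return 0
}
```

Usage: `list_compose_dirs | while read -r d; do (cd "$d" && docker compose pull); done`. Pulling only downloads images; containers keep running the old image until they are recreated.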

Daemons & Services

Tools Manifest

CHECK THIS BEFORE CREATING NEW TOOLS. If a tool exists, use it. If you create a new tool, add it here.

TOOL-FIRST PROTOCOL: ~/system/rules/tool-first-protocol.md
Order: our tools → our skills → our knowledge base (HiveMind) → Internet → update the knowledge base

Last audit: 2026-02-13 — Spring cleaning: 22 deprecated tools archived, 3 empty DBs deleted, 1 broken daemon unloaded, MEMORY.md trimmed 229→184 lines.

Task Management

Tool	Command	Description
task.sh	`~/system/tools/task.sh list|add|start|done|block`	Task CLI using Taskwarrior 3 (cross-session)
mc.js	`node ~/system/tools/mc.js list|add|start|done|show|routes`	Mission Control - Task management with agent routing
mc.js routes	`node ~/system/tools/mc.js routes`	List available task routes (backend, frontend, devops, qa, bizdev, general)
mc.js add --route	`node ~/system/tools/mc.js add "Task" --route backend`	Create task with route - auto-spawns agent on start

Task → Agent Routing: MC tasks can be tagged with routes that automatically spawn appropriate Ollama agents when task starts.

Routes: backend (dev), frontend (designer+dev), devops (devops), qa (auditor), bizdev (marketer), general (dev)
Agent output is captured and stored in task.agent_output field
Visible in mc.js show <id> command
If Ollama unavailable, gracefully degrades (logs error, doesn't block task)
Agent runs in background via exec() - non-blocking
Logs to HiveMind on spawn/completion/error
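
The route list above amounts to a lookup from route name to agent roles. A sketch of that mapping; agents_for_route is an illustrative name and this is not the mc.js code:

```shell
# Print the agent role(s) spawned for a task route, space-separated.
agents_for_route() {
  case "$1" in
    backend|general) echo "dev" ;;
    frontend) echo "designer dev" ;;
    devops) echo "devops" ;;
    qa) echo "auditor" ;;
    bizdev) echo "marketer" ;;
    *) echo "unknown route: $1" >&2; return 1 ;;
  esac
}
```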

Briefings & Analysis

Tool	Command	Description
ceo-briefing.js	`node ~/system/tools/ceo-briefing.js --full`	ZAKON #11: All-source CEO briefing (5 email accounts, MC tasks, HiveMind, sessions, daemon briefing). Zero LLM cost.
ceo-briefing.js	`node ~/system/tools/ceo-briefing.js --quick`	Quick boot summary (counts + top items, <500 tokens). Called by boot.sh.
ceo-briefing.js	`node ~/system/tools/ceo-briefing.js --email`	All 5 email accounts: inbox + sent for each.
ceo-briefing.js	`node ~/system/tools/ceo-briefing.js --followup`	Open/blocked MC tasks overview.
ceo-briefing.js	`node ~/system/tools/ceo-briefing.js --topic "X"`	Topic search across sessions + HiveMind + all email accounts.
council-briefing.js	`node ~/system/tools/council-briefing.js`	AI Council: 4 personas (Growth, Revenue, Skeptic, Ops) analyze business data via Ollama. Posts to Slack #exec. Nightly at 22:00.
council-briefing.js	`node ~/system/tools/council-briefing.js --model 70b`	Use 70b model for deeper analysis
council-briefing.js	`node ~/system/tools/council-briefing.js --dry-run`	Gather data only, no Ollama/Slack
meeting-prep.js	`node ~/system/tools/meeting-prep.js [--ics file.ics] [--date YYYY-MM-DD]`	Calendar-aware meeting prep: ICS parsing, CRM attendee lookup, pipeline context, contextual notes.
john-morning.sh	`bash ~/system/tools/john-morning.sh`	Morning routine: Quran, tasks, HiveMind, health, daily synthesis. Daily at 07:00.
memory-synthesizer.js	`node ~/system/tools/memory-synthesizer.js daily [date]`	Summarize day's intel → HiveMind memo. Auto in morning-routine.
memory-synthesizer.js	`node ~/system/tools/memory-synthesizer.js weekly`	Synthesize week → HiveMind memo. Auto Sundays 23:00.
memory-synthesizer.js	`node ~/system/tools/memory-synthesizer.js promote`	Promote weekly → long-term knowledge
memory-synthesizer.js	`node ~/system/tools/memory-synthesizer.js prune`	Delete daily memos >30 days
memory-synthesizer.js	`node ~/system/tools/memory-synthesizer.js view [tier]`	View tiered memory (daily/weekly/longterm)

Meeting & Transcript Processing

Tool	Command	Description
transcript-to-tasks.js	`node ~/system/tools/transcript-to-tasks.js <file>`	Extract action items from meeting transcript → MC tasks via Ollama
transcript-to-tasks.js	`node ~/system/tools/transcript-to-tasks.js <file> --preview`	Preview extracted actions (no task creation)
transcript-to-tasks.js	`node ~/system/tools/transcript-to-tasks.js <file> --owner john`	Assign all extracted tasks to owner

Formats: .txt, .md, .srt, .vtt. Tasks prefixed with [TRANSCRIPT].

Health & Quality

Tool	Command	Description
drift-detector.js	`node ~/system/tools/drift-detector.js snapshot`	Behavioral drift analysis engine — records daily metrics from 5 data sources (session claims, verification audits, email-audit.db, mission-control.db, hivemind.db) to drift.db. Anomaly detection with σ-based thresholds. Alerts to HiveMind + Slack. Daily at 23:55 via com.john.drift-detector LaunchAgent. Created: 2026-02-23.
drift-detector.js	`node ~/system/tools/drift-detector.js analyze [--days N]`	Analyze recent metric trends (default: 7 days). Returns trend, per-metric stats, anomalies.
drift-detector.js	`node ~/system/tools/drift-detector.js report [--days N]`	Human-readable drift report (default: 30 days).
drift-detector.js	`node ~/system/tools/drift-detector.js alert-test`	Test alert pipeline (HiveMind + Slack).
daemon-health.sh	`bash ~/system/daemons/daemon-health.sh`	Daemon health monitor with Slack alerts — monitors ALL com.john.* LaunchAgents, sends alerts to #alerts channel for failures/warnings/recoveries, runs every 15 min via LaunchAgent. Created: 2026-02-23.
daemon-health.sh	`bash ~/system/daemons/daemon-health.sh --status`	Show current daemon status (KeepAlive vs interval-based)
daemon-health.sh	`bash ~/system/daemons/daemon-health.sh --test`	Test Slack alert integration
stbs-health.js	`node ~/system/tools/stbs-health.js`	STBS v3 production monitoring — 5 hardening components (SQLite BUSY retry, heartbeat, optimistic lock, approval tokens, session staleness). MC #1724.
stbs-health.js	`node ~/system/tools/stbs-health.js --json`	JSON output (for ops-watchdog integration)
stbs-health.js	`node ~/system/tools/stbs-health.js --alert`	Alert mode (exit 1 if any threshold exceeded)
stbs-health.js	`node ~/system/tools/stbs-health.js --metric <name>`	Check specific metric only
md-health.js	`node ~/system/tools/md-health.js`	Markdown health scanner: broken links, TODOs, empty files, stale dates. Integrated in AgentForge.
md-health.js	`node ~/system/tools/md-health.js --json`	JSON output (for programmatic use)
md-health.js	`node ~/system/tools/md-health.js --fix-todos`	List all TODOs across codebase
md-health.js	`node ~/system/tools/md-health.js ~/path`	Scan specific path
doc-index.sh	`bash ~/system/tools/doc-index.sh [--output file.json] [--verbose]`	Document indexer — scans ~/projects, ~/ALAI, ~/companies for all markdown files. Creates JSON index with metadata (path, category, size, modified). Output: ~/system/databases/doc-index.json
doc-index.sh	`bash ~/system/tools/doc-index.sh --verbose`	Verbose mode — shows progress and breakdown by category
bookstack-sync.js	`node ~/system/tools/bookstack-sync.js sync`	Sync system docs to BookStack wiki (full sync)
bookstack-sync.js	`node ~/system/tools/bookstack-sync.js status`	Show what needs syncing (new/changed/ok)
bookstack-sync.js	`node ~/system/tools/bookstack-sync.js push`	Force overwrite all pages
bookstack-sync.js	`node ~/system/tools/bookstack-sync.js auto-sync`	Auto-sync changed files (daemon mode)

BookStack Sync v2 Features (2026-02-18):

Glob expansion: Sources can use "glob": "~/.claude/skills/*/SKILL.md" patterns
Chapter support: Books can have "chapters" array with grouped sources
Metadata headers: Auto-prepends source path to synced pages
Stale page cleanup: Detects deleted source files, removes BookStack pages
New books: Skills Catalog (113 skills), Hooks Reference (24 hooks), Agent Catalog (35 agents)

Backup & Data Protection

Tool	Command	Description
db-backup.sh	`bash ~/system/daemons/db-backup.sh`	Safe daily backup of all SQLite databases using sqlite3 .backup. 30-day retention. Daily at 03:00 via LaunchAgent.
db-backup-verify.sh	`bash ~/system/tools/db-backup-verify.sh`	Verify backup integrity for today's backups. Checks file size and runs PRAGMA integrity_check on all backups.

Backup Strategy:

Location: ~/system/backups/databases/
Format: Individual .db files (not compressed) for granular restore
Naming: {db-name}-{YYYY-MM-DD}.db
Integrity: Each backup verified with PRAGMA integrity_check after creation
Retention: Automatic cleanup of backups older than 30 days
Logging: ~/system/logs/db-backup.log
Daemon: com.john.db-backup (LaunchAgent) runs at 03:00 daily
Databases: 33 SQLite DBs (flywheel, mission-control, knowledge, hivemind, leads, etc.)
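
The strategy above (sqlite3 .backup, dated file names, integrity check, 30-day retention) can be sketched as follows. This is not the contents of db-backup.sh; the function names are illustrative and the sketch assumes the sqlite3 CLI is on the PATH:

```shell
# Build the backup file path: <dest_dir>/<db-name>-<YYYY-MM-DD>.db
backup_path() {
  local dest_dir="$1" src="$2" day="${3:-$(date +%Y-%m-%d)}" name
  name="$(basename "$src" .db)"
  printf '%s/%s-%s.db\n' "$dest_dir" "$name" "$day"
}

# Copy one database with SQLite's online backup, then verify the copy.
backup_one() {
  local src="$1" dest
  dest="$(backup_path "$2" "$src")"
  sqlite3 "$src" ".backup '$dest'" || return 1
  [ "$(sqlite3 "$dest" 'PRAGMA integrity_check;')" = "ok" ]
}

# Delete backups older than 30 days.
prune_backups() {
  find "$1" -name '*.db' -mtime +30 -delete
}
```

Using .backup rather than cp gives a consistent snapshot even if a daemon is writing to the database at the time.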

BookStack Auto-Sync:

Daemon: com.john.bookstack-sync (LaunchAgent, runs every 5 min)
Rate limiting: Max 10 API calls per run
Lock file: /tmp/bookstack-sync.lock (prevents concurrent runs)
Last sync tracking: ~/system/services/bookstack/.last-sync
Logging: ~/system/logs/bookstack-sync.log
Map: ~/system/config/bookstack-sync-map.json
State: ~/system/config/bookstack-sync-state.json
API: https://docs.alai.no (local fallback: http://localhost:6875, via vault.js)
Created: 2026-02-17 — Auto-syncs ~/system/ docs to BookStack on file changes
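
A lock file only prevents concurrent runs if it is created atomically. One common shell technique, shown here as a sketch and not as the daemon's actual code, is the noclobber option, which makes the redirect fail when the file already exists:

```shell
# Try to take the lock; fails (non-zero) if the lock file already exists.
acquire_lock() {
  # noclobber turns ">" into create-only, so check and create happen in one step
  ( set -o noclobber; echo "$$" > "$1" ) 2>/dev/null
}

# Release the lock.
release_lock() {
  rm -f "$1"
}
```

Usage: `acquire_lock /tmp/bookstack-sync.lock || exit 0` at the start of a run, and `trap 'release_lock /tmp/bookstack-sync.lock' EXIT` so the lock is removed even when the run fails.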

BookStack Staleness Monitor:

Daemon: com.john.bookstack-staleness (LaunchAgent, Sunday 22:00)
Thresholds: Current (<30d), Needs Review (30-90d), Outdated (>90d)
Tagging: Applies "staleness" tag to stale pages via API
Reporting: Weekly Slack report to #general
Logging: ~/system/logs/bookstack-staleness-launchd.log
Created: 2026-02-17 — Task #1272 BookStack Activation
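
The staleness thresholds translate into a simple classification by page age in days. A sketch with an illustrative function name; treating exactly 30 and exactly 90 days as "Needs Review" is an assumption about the boundary handling:

```shell
# Classify a page by the number of days since it was last updated.
staleness_label() {
  local days="$1"
  if [ "$days" -lt 30 ]; then
    echo "Current"
  elif [ "$days" -le 90 ]; then
    echo "Needs Review"
  else
    echo "Outdated"
  fi
}
```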

BookStack Webhook Relay:

Daemon: com.john.bookstack-webhook-relay (LaunchAgent, auto-start)
Port: localhost:3077/webhook (internal relay, not user-facing)
Function: Receives BookStack webhook POST → formats message → posts to Slack #all-alai
Events: page_create, page_update, page_delete, chapter/book/shelf events
Logging: ~/system/logs/bookstack-webhook.log
Setup: Configure webhook in BookStack UI → Settings → Webhooks → Add webhook with endpoint localhost:3077/webhook
Created: 2026-02-17 — Task #1272 BookStack Activation

API Utilities

Tool	Command	Description
api-fallback.js	`require('./api-fallback')`	Tiered API fallback + caching. `fetchWithFallback(key, tiers, opts)` tries each tier, caches result.
api-fallback.js	`node ~/system/tools/api-fallback.js cache-stats`	Show cache stats
api-fallback.js	`node ~/system/tools/api-fallback.js cache-clear`	Clear API cache

Cache: ~/system/cache/api-fallback/ (file-based, per-key, TTL-aware)

Usage Tracking

Tool	Command	Description
usage-tracker.js	`node ~/system/tools/usage-tracker.js log <agent> <model> <in> <out>`	Log AI call usage (auto-hooked in agent-runner.js + council-briefing.js)
usage-tracker.js	`node ~/system/tools/usage-tracker.js stats`	Usage summary (today, month, all-time)
usage-tracker.js	`node ~/system/tools/usage-tracker.js stats --agent <name>`	Per-agent breakdown
usage-tracker.js	`node ~/system/tools/usage-tracker.js stats --month`	Daily breakdown this month
usage-tracker.js	`node ~/system/tools/usage-tracker.js top`	Top agents by cost
usage-tracker.js	`node ~/system/tools/usage-tracker.js recent [limit]`	Recent calls

DB: ~/system/db/usage.db (SQLite). Auto-logged from agent-runner.js (Ollama) and council-briefing.js.

Session Tracking

Tool	Command	Description
session-ledger.sh	Auto (Stop/PreCompact hook)	Deterministic session extraction (files, commands, topics, errors, git)
session-search.sh	`bash ~/system/tools/session-search.sh topic|file|task|keyword|errors|recent`	Search sessions
daily-consolidate.sh	`bash ~/system/tools/daily-consolidate.sh [YYYY-MM-DD]`	Consolidate day's sessions into daily log
weekly-digest.sh	`bash ~/system/tools/weekly-digest.sh [YYYY-MM-DD]`	Generate weekly summary

Session files: ~/system/memory/sessions/YYYY-MM-DD-HHMM-sessionid.md

Memory

Tool	Command	Description
hivemind.js	`node ~/system/agents/hivemind/hivemind.js read [agent] [limit]`	Read shared intelligence (replaces memory-lookup.js)
hivemind.js	`node ~/system/agents/hivemind/hivemind.js post <agent> <type> <msg>`	Post intel
hivemind.js	`node ~/system/agents/hivemind/hivemind.js query <search>`	Search intel
hivemind.js	`node ~/system/agents/hivemind/hivemind.js memo save|get|search|list`	Key-value memory store
facts.js	`node ~/system/tools/facts.js save|get|list|correct|history|display|search|seed`	Long-running critical facts — SQLite event-sourced memory that survives context compression. Boot-injected.
facts.js display	`node ~/system/tools/facts.js display`	Compact boot output of all critical facts
facts.js seed	`node ~/system/tools/facts.js seed [--force]`	Populate/reset initial seed data
memory-indexer.py	`python ~/system/tools/memory-indexer.py`	Index memory for search

Communication

Tool	Command	Description
slack.js	`node ~/system/tools/slack.js send <channel> "msg"`	Send plain text message to Slack channel
slack.js	`node ~/system/tools/slack.js sendBlocks <channel> <blocksFile> [fallback]`	Send Block Kit formatted message (blocks from JSON file)
slack.js	`node ~/system/tools/slack.js read <channel> [limit]`	Read recent messages from channel
slack.js	`node ~/system/tools/slack.js channels`	List all Slack channels
slack.js	`node ~/system/tools/slack.js create-channel <name>`	Create new channel
slack.js	`node ~/system/tools/slack.js unread`	Check unread messages
slack.js	`node ~/system/tools/slack.js users`	List workspace users
slack.js	`node ~/system/tools/slack.js status`	Check Slack connection
slack-blocks.js	`node ~/system/tools/slack-blocks.js test [channel]`	Slack Block Kit formatting library — test command sends sample to channel
slack-blocks.js	`require('./slack-blocks')`	Module API: builder(), tenderAlert(), tenderDigest(), emailBriefing(), emailEscalation(), weeklyPipeline(), pipelineEvent(), opsAlert(), send()
slack-bot.js	`node ~/system/tools/slack-bot.js`	Slack bot daemon — Claude Haiku via CLI (Socket Mode). AI backend: API → CLI → Ollama
slack-bot.js	`node ~/system/tools/slack-bot.js --test`	Test AI backend connection
email-to-task.js	`node ~/system/tools/email-to-task.js --from "x" --subject "y" --message-id "z" --class ACTION [--priority high]`	Auto-create MC tasks from ACTION emails with deduplication
email-to-task.js	`node ~/system/tools/email-to-task.js --status`	Show email classification stats
email-inbox.js	`node ~/system/tools/email-inbox.js status`	SQLite-backed email inbox — per-account stats (john, info, alai)
email-inbox.js	`node ~/system/tools/email-inbox.js pending`	List unanswered ACTION emails
email-inbox.js	`node ~/system/tools/email-inbox.js search "keyword"`	Full-text search in subject/from/sender name
email-inbox.js	`node ~/system/tools/email-inbox.js mark <id> responded\|archived\|read\|ignored`	Update email status
email-inbox.js	`node ~/system/tools/email-inbox.js stale [hours]`	Show emails unanswered > N hours (default 48)
email-inbox.js	`node ~/system/tools/email-inbox.js insert --message-id "x" --account john --from-addr "x" --subject "x" --classification ACTION --priority high`	Insert email into inbox DB

EMAIL RULE: ALL email operations use the MCP email tools (custom: email-mcp-bridge.js).

Two accounts: john@alai.no (account="john"), info@alai.no (account="info")
Server: ~/system/tools/email-mcp-bridge.js (ImapFlow + Nodemailer, wraps our proven stack)
Configured in ~/.claude/mcp.json mcpServers.email
Credentials: Vaultwarden (vault.js), vault items "Email - john@alai.no", "Email - info@alai.no"
CLI fallback: ~/system/tools/mail-native.js (for daemons and background agents that have no MCP)
Audit trail: Every sent email is logged to ~/system/databases/email-audit.db via email-audit.js

Slack: alai-talk.slack.com (channels: ops, development, client-support, exec)

Credential Management (Vaultwarden)

Tool	Command	Description
vault.js	`node ~/system/tools/vault.js get <name>`	Get password from Vaultwarden by item name
vault.js	`node ~/system/tools/vault.js get <name> --field <field>`	Get specific field (custom field, username, notes)
vault.js	`node ~/system/tools/vault.js get <name> --json`	Get full item as JSON
vault.js	`node ~/system/tools/vault.js add <name> <user> <pass> [opts]`	Create new vault item (--uri, --notes, --field k=v, --hidden-field k=v)
vault.js	`node ~/system/tools/vault.js list`	List all vault items
vault.js	`node ~/system/tools/vault.js login`	Interactive unlock + cache session (no TTL, /tmp/bw-session)
vault.js	`node ~/system/tools/vault.js migrate`	Migrate 10 config files to vault (one-time)
vault.js	`node ~/system/tools/vault.js sync`	Force sync with Vaultwarden server (clears cache)
vault.js	`node ~/system/tools/vault.js refresh`	Force reload in-memory credential cache
password-share.js	`node ~/system/tools/password-share.js create\|retrieve\|list\|cleanup\|audit`	Secure one-time password sharing with clients
client-vault.js	`node ~/system/tools/client-vault.js init\|add\|list\|get\|rotate\|check-rotation`	Per-client encrypted credential storage

Vault Module API (for other tools):

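const os = require('os');
const path = require('path');
// Node's require() does not expand "~", so resolve the home directory explicitly.
const vault = require(path.join(os.homedir(), 'system/tools/vault.js'));
// The calls below use await, so run them inside an async function.
const pass = await vault.get('Email - john@alai.no');
const token = await vault.get('Slack Bot', 'token');
const val = await vault.getWithFallback('Slack Bot', 'token', () => jsonFallback());
vault.hasSession(); // boolean, non-throwing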

Session: BW_SESSION env → /tmp/bw-session (0600, no TTL). Session key is passed via env var (NOT visible in ps aux).
Cache: First call loads all items (~600ms), subsequent calls <1ms. Refreshes on sync/add/refresh().
Non-TTY: Daemons get a VAULT_LOCKED error (no hang). Graceful retry pattern.
Vault items: AWS Console, Microsoft Azure, Vaultwarden Admin, Sentry + 10 migrated services.
Note: vault-helper.js DELETED; all consumers now use vault.js directly.
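The graceful retry pattern for daemons could look roughly like the sketch below. It is not taken from vault.js: the wrapper name `getWithRetry`, the stand-in `fakeVaultGet`, and the assumption that a locked vault surfaces as an Error whose message contains `VAULT_LOCKED` are all illustrative.

```javascript
// Sketch of a retry wrapper for daemons that hit a locked vault.
// Assumption: the locked state surfaces as an Error whose message
// contains "VAULT_LOCKED"; check vault.js for the real error shape.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function getWithRetry(getSecret, { attempts = 3, delayMs = 1000 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await getSecret();
    } catch (err) {
      // Only retry the locked-vault case; anything else is a real failure.
      if (!String(err.message).includes('VAULT_LOCKED')) throw err;
      lastError = err;
      if (i < attempts - 1) await sleep(delayMs);
    }
  }
  throw lastError;
}

// Hypothetical stand-in for vault.get(): locked twice, then unlocked.
let calls = 0;
const fakeVaultGet = async () => {
  calls += 1;
  if (calls < 3) throw new Error('VAULT_LOCKED');
  return 'secret-value';
};

const retryDemo = getWithRetry(fakeVaultGet, { attempts: 5, delayMs: 10 });
retryDemo.then((value) => console.log(value, 'after', calls, 'calls'));
```

In a real daemon the first argument would be something like `() => vault.get('Slack Bot', 'token')`, with a longer delay so an operator has time to run `vault.js login`.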

Agent Infrastructure

Tool	Command	Description
agent-reporter.js	`node ~/system/tools/agent-reporter.js --task <id> --agent <name> --status <status> --summary <text>`	Structured agent output — validates against schema, stores in mission-control.db, emits events, posts to HiveMind
agent-reporter.js	`node ~/system/tools/agent-reporter.js --help`	Show usage and examples
agent-reporter.js	`node ~/system/tools/agent-reporter.js --task 937 --agent B1 --status completed --summary "..." --deliverables '[...]'`	Full structured report with deliverables, metrics, evidence
schema-validator.py	PostToolUse hook on TaskUpdate	Validates agent output JSON against agent-output-schema.json, logs violations to /tmp/schema-violations.log (warning-only, never blocks)
goal-verifier.js	`node ~/system/tools/goal-verifier.js --task <id>`	Automated goal verification — reads goal-schema.json, runs verification commands, updates statuses, stores in goals.db, emits events
goal-verifier.js	`node ~/system/tools/goal-verifier.js --help`	Show usage, goal types, and operators
goal-verifier.js	`node ~/system/tools/goal-verifier.js --task 937 --verbose`	Run verification with detailed output per goal
goal-verifier.js	`node ~/system/tools/goal-verifier.js --task 937 --dry-run`	Preview what would be verified without running commands
agent-worker.js	`node ~/system/tools/agent-worker.js`	Local-model-first agent worker — polls MC, executes via Ollama tool agent, queues complex tasks for human
agent-worker.js	`node ~/system/tools/agent-worker.js --once`	Run single cycle then exit
agent-worker.js	`node ~/system/tools/agent-worker.js --dry-run`	Show next task without executing
agent-worker.js	`node ~/system/tools/agent-worker.js --status`	Show worker status, queue stats
agent-worker.js	`node ~/system/tools/agent-worker.js --stop`	Stop daemon gracefully
human-queue.js	`node ~/system/tools/human-queue.js list`	Show all tasks queued for human review
human-queue.js	`node ~/system/tools/human-queue.js claim <id>`	Claim task (remove from queue, resume in MC)
human-queue.js	`node ~/system/tools/human-queue.js stats`	Queue statistics (by priority, reason, age)
human-queue.js	`node ~/system/tools/human-queue.js clear`	Clear entire human queue
human-queue.js	`node ~/system/tools/human-queue.js notify`	Send Slack summary if queue > 0

Agent Output Schema: ~/system/specs/agent-output-schema.json (JSON Schema draft-07)
DB Table: mission-control.db.agent_reports (task_id, agent, status, summary, report_json)
Event: agent.report emitted to event bus on report submission
Created: 2026-02-15 (MC #937 Phase 1)

Goal Schema: ~/system/specs/goal-schema.json (JSON Schema draft-07)
DB: ~/system/databases/goals.db (goals, goal_history tables)
Verification: verification-gate.py enforces goal verification for H/M priority tasks (if goal-schema.json present)
Events: goal.verified, goal.failed emitted to event bus
Created: 2026-02-15 (MC #937 Phase 4)

Subagents (~/.claude/agents/)

Agent	Role	Description
builder.md	Build	Implements ONE task using GOTCHA, self-validates, reports via agent-reporter.js or TaskUpdate
validator.md	Verify	Read-only GOTCHA compliance check + acceptance criteria, reports via agent-reporter.js

Local AI (Ollama on Mac Studio M3 Ultra)

2 Tools — Executor + Orchestrator

Tool	Command	Description
agent-runner.js	`node ~/system/tools/agent-runner.js <agent> --task "X"`	Executor — sends ONE task to Ollama with agent identity + state
agent-runner.js	`node ~/system/tools/agent-runner.js list`	List all agents with status
agent-scheduler.js	`node ~/system/kernel/agent-scheduler.js spawn <agent> <task>`	Orchestrator — forks agent-runner.js as child processes for parallel execution
team-coordinator.js	`node ~/system/kernel/team-coordinator.js assign\|execute\|status\|message\|sync`	Team Orchestrator — multi-team coordination (Backend/Frontend/DevOps/QA) with cross-team messaging

Relationship: agent-scheduler.js spawns agent-runner.js. Runner = single agent. Scheduler = multi-agent. team-coordinator.js uses scheduler for team execution.
What agents do: Generate text responses via Ollama. They don't execute anything.
State: ~/system/agents/state/*.json (persists between runs)
Identities: ~/system/agents/identities/*.md (15 agents)

Offline Mode: When Claude API hits usage limits, switch to local Ollama models. Auto-routes tasks to best model (qwen-coder for code, 70b for reasoning, 8b for trivial). All outputs saved to ~/system/offline-queue/ with NEEDS_REVIEW status. Claude reviews when back online. Capability matrix built in — knows what local models can/can't do. Created 2026-02-12.

Ollama Background Workers (~/system/tools/ollama-workers/)

Tool	Command	Description
run-all.sh	`bash ~/system/tools/ollama-workers/run-all.sh`	Run all background workers (embedding-backfill, session-summarizer, knowledge-scorer)
run-all.sh	`bash ~/system/tools/ollama-workers/run-all.sh --dry-run`	Preview all workers, no writes
run-all.sh	`bash ~/system/tools/ollama-workers/run-all.sh --status`	Check Ollama + Qdrant health
knowledge-scorer.js	`node ~/system/tools/ollama-workers/knowledge-scorer.js run [--limit N] [--offset ID] [--dry-run]`	Score and tag Qdrant 'knowledge' entries: quality_score (1-5) + category via llama3.1:8b. Skips already-scored. Default limit 500/run.
embedding-backfill.js	`node ~/system/tools/ollama-workers/embedding-backfill.js run [--db knowledge\|hivemind\|flywheel\|all] [--limit N] [--dry-run]`	Find rows with NULL embeddings across knowledge.db/hivemind.db/flywheel.db, batch-embed via Ollama bge-m3 (batches of 32), write BLOB back to SQLite, upsert to Qdrant.

Workers: Idempotent (skip already-processed). Safe to run repeatedly. Use --dry-run to preview. Logs to ~/system/logs/ollama-workers/.

Tier Routing (CC Rate Limit Optimization)

Tool	Command	Description
ollama-engine.js	`require('./ollama-engine')`	Centralized Ollama API — generate(), classify(), healthCheck(). Consolidates duplicated Ollama HTTP code from 5+ files.
ollama-engine.js	`node ~/system/tools/ollama-engine.js test`	Run health check + generate test
tier-router.js	`require('./tier-router')`	Central AI Router — classify(caller, task) → {tier, engine, model}. Routes tasks to Ollama (local) or human-queue. NO CC/API.
tier-router.js	`node ~/system/tools/tier-router.js test`	Run routing tests
tier-router.js	`node ~/system/tools/tier-router.js classify <caller> <task>`	Test classification for caller+task
tier-router.js	`node ~/system/tools/tier-router.js stats`	Show routing stats (ollama vs human-queue)
ollama-tool-agent.js	`node ~/system/tools/ollama-tool-agent.js --task "X" --model Y`	Ollama + Tools — multi-turn agent with read-only tools (read_file, glob, grep, list_dir, run_cmd). Replaces CC for explore/validate tasks.
ollama-tool-agent.js	`node ~/system/tools/ollama-tool-agent.js --task "X" --verbose`	Verbose mode (show tool calls)

Tier Routing Architecture:

Tier 1 (Ollama 8b): classify, filter, extract, triage
Tier 2 (Ollama 72b): summarize, draft, analyze, research, review
Tier 2c (Ollama coder:32b): code review, debug, simple fix
Tier 3 (CC Sonnet): multi-file coding, architecture
Tier 4 (CC Opus): interactive sessions only
Config: ~/system/config/tier-routing.json (caller→tier mapping, keywords, fallback)
Integration: agent-worker.js routes tasks through tier-router before execution
Fallback: Ollama failure → auto-escalate to CC
Created: 2026-02-16
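
The tier list above can be read as a keyword lookup. The sketch below is not the tier-router.js implementation: `classifyTask`, the substring matching, and the human-queue fallback for unmatched tasks are assumptions, and the caller→tier mapping from tier-routing.json is left out. Model labels are copied from the tier list as written.

```javascript
// Rules follow the tier list above. Order matters: "code review" must be
// checked before the generic "review" keyword of Tier 2.
const RULES = [
  { tier: '2c', engine: 'ollama', model: 'coder:32b', keywords: ['code review', 'debug', 'simple fix'] },
  { tier: '1', engine: 'ollama', model: '8b', keywords: ['classify', 'filter', 'extract', 'triage'] },
  { tier: '2', engine: 'ollama', model: '72b', keywords: ['summarize', 'draft', 'analyze', 'research', 'review'] },
  { tier: '3', engine: 'cc', model: 'sonnet', keywords: ['multi-file', 'architecture'] },
];

// Assumption: tasks with no keyword match go to the human queue.
const FALLBACK = { tier: null, engine: 'human-queue', model: null };

// Returns { tier, engine, model } for the first rule whose keyword
// appears in the task text (case-insensitive substring match).
function classifyTask(task) {
  const text = task.toLowerCase();
  for (const { keywords, ...route } of RULES) {
    if (keywords.some((keyword) => text.includes(keyword))) return route;
  }
  return FALLBACK;
}

console.log(classifyTask('Triage inbound email'));
console.log(classifyTask('Code review for the invoice module'));
console.log(classifyTask('Book a flight'));
```

Tier 4 has no rule because the list reserves it for interactive sessions.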

Models

Model	Size	Use For
qwen2.5-coder:32b	19GB	Coding, debugging, refactoring
llama3.1:70b	40GB	Research, writing, analysis
llama3.1:8b	5GB	Fast validation, simple queries

Routing & Decision

Tool	Command	Description
route.js	`node ~/system/tools/route.js project <name>`	Lookup project (internal/external)
route.js	`node ~/system/tools/route.js query "<request>"`	Match request to company by routes
route.js	`node ~/system/tools/route.js list`	List all projects and companies
route.js	`node ~/system/tools/route.js add <name> <type>`	Add project to registry
decision.js	`node ~/system/tools/decision.js log <key> <decision> [--by alem] [--tags X] [--task ID] [--rationale "..."] [--evidence "path"] [--supersedes ID]`	Decision audit log — queryable decision trail with rationale, evidence, and supersede chains. Stores in mission-control.db decisions table.
decision.js	`node ~/system/tools/decision.js list [--tags X] [--since DATE] [--by alem] [--limit N]`	List all decisions (optionally filtered by tags, date, or author)
decision.js	`node ~/system/tools/decision.js query "<term>"`	Full-text search across key+decision+rationale
decision.js	`node ~/system/tools/decision.js show <id>`	Show single decision with history chain and supersede references
decision.js	`node ~/system/tools/decision.js history <key>`	All decisions for a specific key (newest first), shows decision evolution
decision.js	`node ~/system/tools/decision.js latest [--limit 10]`	Most recent decisions (default 10) — used in boot display for Alem
decision.js	`node ~/system/tools/decision.js stats`	Decision statistics: count by tag, by decided_by, by month

Database: ~/system/databases/mission-control.db (decisions table)

Registry: ~/system/databases/projects.json

Event Bus

Tool	Command	Description
event-bus.js	`node ~/system/tools/event-bus.js emit <type> <json> [--publisher X]`	SQLite event bus — async emit/subscribe/dispatch. Decouples tools from point-to-point execSync.
event-bus.js	`node ~/system/tools/event-bus.js list [--type X] [--status X] [--limit N]`	List events (supports * wildcard for type)
event-bus.js	`node ~/system/tools/event-bus.js show <id>`	Show event details with payload
event-bus.js	`node ~/system/tools/event-bus.js replay <id>`	Re-process a failed/completed event
event-bus.js	`node ~/system/tools/event-bus.js dead-letter list\|resolve\|replay`	Dead letter queue management
event-bus.js	`node ~/system/tools/event-bus.js stats`	Event bus statistics (counts, last 24h by type)
event-bus.js	`node ~/system/tools/event-bus.js subscriptions list\|register\|seed`	Manage handler subscriptions
event-bus.js	`node ~/system/tools/event-bus.js dispatch [--once] [--interval N]`	Start dispatch loop (default 2s)
event-handlers.js	`require('./event-handlers.js')`	All subscriber handlers — task, lead, invoice, draft, email, job events
durable-runner.js	`node ~/system/tools/durable-runner.js start <name> --steps '["s1","s2"]' [--mc-task <id>]`	Durable workflow execution engine with SQLite persistence. Checkpoint/resume capability. Emits events via outbox table.
durable-runner.js	`node ~/system/tools/durable-runner.js status\|resume\|rollback <workflow-id>`	Workflow status, resume from checkpoint, or rollback to step N
durable-runner.js	`node ~/system/tools/durable-runner.js step-complete <id> <step> [--output '{}']`	Mark step complete with output/files/commits
durable-runner.js (module)	`const { DurableRunner } = require('./durable-runner')`	Module API: createWorkflow(), completeStep(), failStep(), resume(), rollback()
chain-runner.js	`node ~/system/tools/chain-runner.js run <chain> "<input>" [--mc-task <id>] [--durable]`	YAML-defined agent chain orchestrator. DAG-ordered steps, Saga rollback, $INPUT/$ORIGINAL substitution, injection sanitization.
chain-runner.js	`node ~/system/tools/chain-runner.js list`	List all available chains from ~/system/agents/chains/*.yaml
chain-runner.js	`node ~/system/tools/chain-runner.js show <chain>`	Show chain definition with steps, deps, timeouts
chain-runner.js	`node ~/system/tools/chain-runner.js resume <workflow-id>`	Resume a durable chain workflow from checkpoint
chain-runner.js (module)	`const { ChainRunner } = require('./chain-runner')`	Module API: loadChain(), run(), listChains(), showChain(), resolveAgent()

Event Bus Architecture (Transactional Outbox Pattern):

Domain tools (mc.js, sales-pipeline.js, invoice-generator.js, drafts.js, durable-runner.js) write events to outbox table in their own domain DB — same transaction as domain data. Atomic: if domain write succeeds, event is guaranteed.
Daemon tools (email-agent.js, job-hunter-agent.js) use direct bus.emit() — no domain DB, fire-and-forget.
Two-daemon pipeline:
1. outbox-processor.js (2s poll): reads outbox tables from durable-runner.db + mission-control.db → emits to event-bus → marks processed. Purges old events (7d+).
2. event-dispatcher.js (2s poll): relays outbox from legacy domain DBs (leads, invoices, drafts, tenders) → dispatches all events.db events to handlers.
Handlers in event-handlers.js process events (Slack, HiveMind, Planka, leads, MC tasks, etc.)
Retry: 3 attempts with backoff (0s → 30s → 2min) → dead letter queue → Slack alert
DB: ~/system/databases/events.db (central store, separate from domain DBs)
Outbox tables: durable-runner.db, mission-control.db, leads.db, invoices.db, drafts.db, tenders.db
Daemons: com.john.outbox-processor (durable-runner + MC), com.john.event-dispatcher (legacy DBs + dispatch)
Event types: task.*, lead.*, invoice.*, draft.*, workflow.*, step.*, email.*, job.*, tender.*, intake.*, proposal.*, follow_up.*, contract.*
Integrated tools: durable-runner.js, mc.js, sales-pipeline.js, invoice-generator.js, drafts.js (outbox), email-agent.js, job-hunter-agent.js (direct emit)
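
The retry rule above (3 attempts with 0s → 30s → 2min backoff, then dead letter) can be sketched as follows. This is an illustration, not the event-dispatcher.js code: `dispatchWithRetry`, the `flaky` handler, and the `task.created` event name are hypothetical, and it assumes the three delays are the waits before attempts 1, 2, and 3.

```javascript
// Backoff schedule from the retry rule: 0s, then 30s, then 2min.
const BACKOFF_MS = [0, 30_000, 120_000];

// Runs a handler up to three times. Delays are recorded instead of slept
// so the sketch stays synchronous; a real dispatcher would wait.
function dispatchWithRetry(event, handler) {
  const delays = [];
  for (let attempt = 0; attempt < BACKOFF_MS.length; attempt++) {
    delays.push(BACKOFF_MS[attempt]);
    try {
      handler(event);
      return { status: 'processed', attempts: attempt + 1, delays };
    } catch (err) {
      // Swallow the error and move on to the next attempt.
    }
  }
  // All attempts failed: hand the event to the dead letter queue.
  return { status: 'dead-letter', attempts: BACKOFF_MS.length, delays };
}

// Hypothetical handler that fails once, then succeeds.
let seen = 0;
const flaky = () => {
  seen += 1;
  if (seen < 2) throw new Error('Slack timeout');
};

console.log(dispatchWithRetry({ type: 'task.created' }, flaky));
console.log(dispatchWithRetry({ type: 'task.created' }, () => { throw new Error('down'); }));
```

Events that end in `dead-letter` are the ones that `event-bus.js dead-letter list|resolve|replay` operates on.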

GOTCHA Core

Tool	Command	Description
utils.js	`require('~/system/lib/utils')`	Shared utility library (log, file, path, time, validate)
sales-pipeline.js	`node ~/system/tools/sales-pipeline.js add\|list\|show\|advance\|stats\|forecast\|auto-actions`	Lead CRM — tracks leads from prospect to won/lost. Auto-actions: archive old leads (lost >30d), escalate stale proposals (>14d no activity)
outbound.js	`node ~/system/tools/outbound.js start\|list\|stats`	Cold outreach prospecting — 3-email sequence (Day 1 intro, Day 3 follow-up, Day 7 final). Creates lead (cold_email), drafts intro email (LOW risk), schedules Day 3+7 reminders. Tags leads with outbound-seq.
email-to-contact.js	`node ~/system/tools/email-to-contact.js backfill`	Auto-populate contacts.db from email classifications. Creates contacts, logs interactions, skips spam/own.
email-to-contact.js	`node ~/system/tools/email-to-contact.js stats`	CRM import statistics (auto-imported vs manual, interactions)
contacts.js	`node ~/system/tools/contacts.js add\|list\|show\|search\|update\|log\|tag\|stats`	Central contact database — all partners, clients, brokers, vendors
contacts.js	`node ~/system/tools/contacts.js export-n8n`	Export n8n-monitored emails for Known Contact workflow
contacts.js	`node ~/system/tools/contacts.js import-leads`	Import contacts from leads.db
unified-crm.js	`node ~/system/tools/unified-crm.js pipeline\|client\|search\|dashboard`	READ-ONLY integration layer across 5 databases (contacts, leads, invoices, tickets, MC tasks)
contract-manager.js	`node ~/system/tools/contract-manager.js add\|list\|show\|renew\|terminate\|renewal-check\|status`	Contract lifecycle management — tracks contract status (draft→sent→signed→active→expired→terminated), auto-renewal alerts, MC task creation, Slack notifications. DB: contracts.db. Types: NDA, DPA, contract, SLA, MSA.
contract-manager.js	`node ~/system/tools/contract-manager.js renewal-check [--dry-run]`	Check for contracts expiring within 30 days, create MC renewal tasks (auto-renew only), send Slack alerts to #ops
document-store.js	`node ~/system/tools/document-store.js store <client> <type> <file>`	Document storage & retention system — organizes business documents with retention policies. Standard path: ~/ALAI/clients/{client}/documents/{type}/. Types: contract (10y), nda (5y), invoice (5y), proposal (2y), dpa (10y), agreement (10y), signed (10y). DB: documents.db
document-store.js	`node ~/system/tools/document-store.js list [client] [--type TYPE]`	List documents with optional filters
document-store.js	`node ~/system/tools/document-store.js find <search>`	Search documents by client/filename/notes
document-store.js	`node ~/system/tools/document-store.js retention-check`	Flag documents past retention period (non-destructive)
document-store.js	`node ~/system/tools/document-store.js stats`	Storage statistics by type and client
send-signing-email.js	`node ~/system/tools/send-signing-email.js send\|send-single\|test\|check`	ALAI branded document signing — creates DocuSeal submission + sends ALAI branded email with embedded logo via SMTP. Standard for all contracts/NDAs/DPAs. Always test first with `test` command.
nda-generator.js	`node ~/system/tools/nda-generator.js create <email> --name "Name" --company "Company"`	NDA PDF generator + DocuSeal signing flow — generates ALAI-branded NDA PDF via Puppeteer, uploads to DocuSeal, creates submission, sends ALAI branded signing emails. Flags: --preview (local PDF only), --test (send to post@alai.no), --orgnr, --address, --phone, --project.
fiken.js	`node ~/system/tools/fiken.js status\|companies\|invoices\|contacts\|balances\|dashboard`	Fiken API v2 integration — invoices list/show/sync, contacts list/show/sync, bank balances, CEO dashboard data. Syncs to invoices.db + contacts.db.
invoice-generator.js	`node ~/system/tools/invoice-generator.js create\|list\|show\|pay\|pdf\|send\|remind\|check-overdue\|auto-remind\|dashboard\|stats`	Invoice CRUD with VAT, PDF/HTML generation, MCP email draft creation, auto-reminders (3 levels: friendly/firm/urgent), automatic escalation system (Day 7/14/30+)
invoice-generator.js	`node ~/system/tools/invoice-generator.js auto-remind [--dry-run]`	Automatic invoice reminder escalation — Day 7: friendly (LOW risk draft), Day 14: firm (LOW risk draft + Slack), Day 30+: HIGH MC task + URGENT Slack. Norwegian templates.
support-ticket.js	`node ~/system/tools/support-ticket.js create\|list\|show\|update\|assign\|comment\|stats`	Support ticket system with SLA tracking (P1-P4)
email-to-ticket.js	`node ~/system/tools/email-to-ticket.js --sender "email" --subject "subject" --body "body" --uid uid`	Email → ticket bridge — detects support emails, creates tickets, generates ACK drafts, Slack + HiveMind notifications
ticket-sla-checker.js	`node ~/system/tools/ticket-sla-checker.js`	SLA breach detector — monitors open tickets, escalates to Slack #ops, generates escalation drafts, HiveMind logs
ticket-resolve-notify.js	`node ~/system/tools/ticket-resolve-notify.js --ticket-id TKT-12345`	Resolution notifier — generates client resolution email draft, HiveMind log
team-coordinator.js	`node ~/system/tools/team-coordinator.js teams\|assign\|handoff\|block\|unblock\|sync\|status`	Cross-team orchestration
onboard-client.js	`node ~/system/tools/onboard-client.js new\|status\|list\|timeline\|undo`	One-command client onboarding — orchestrates project scaffold, sales pipeline, support, teams, routing, welcome email, pipeline events, HiveMind
expansion-dashboard.js	`node ~/system/tools/expansion-dashboard.js [--compact]`	Aggregate view: companies, pipeline, invoices, support, teams
proposal-gen.js	`node ~/system/tools/proposal-gen.js create\|edit\|pdf\|send\|list\|show\|approve\|reject`	Professional proposal generator — auto-populates from leads, generates PDF, sends via SMTP (3 templates: standard, landing-page, webapp)
pipeline-events.js	`node ~/system/tools/pipeline-events.js check-reminders`	Stage transition event handlers — auto-triggered by sales-pipeline.js on advance/lose, generates drafts (→ drafts.db), creates reminders (~/system/reminders/), logs to HiveMind, sends Slack notifications. Handlers: onQualified, onProposal, onNegotiating, onWon, onActive, onLost
follow-up.js	`node ~/system/tools/follow-up.js check [--auto]`	Follow-up reminder processor — scans ~/system/reminders/ for due reminders, generates language-aware follow-up drafts (NO/EN/BS), 3 escalation levels (day 3/7/14), Slack alert on day 14
follow-up.js	`node ~/system/tools/follow-up.js list`	List all pending follow-up reminders with due dates and escalation levels
follow-up.js	`node ~/system/tools/follow-up.js add <lead_id> <type> <days>`	Manually create follow-up reminder (types: proposal, inquiry)
drafts.js	`node ~/system/tools/drafts.js list\|show\|approve\|reject\|send\|stats`	Draft approval workflow — 3-level risk classification (low/medium/high), content-based pattern matching, smart auto-approval
drafts.js	`node ~/system/tools/drafts.js process-auto [--dry-run]`	Auto-classify and process all pending drafts (LOW→approve+send, MEDIUM→approve+Slack+send, HIGH→manual)
drafts.js	`node ~/system/tools/drafts.js auto-approve [--type type1,type2]`	Auto-approve low-risk drafts (optional type filter)
drafts.js	`node ~/system/tools/drafts.js mark-sent <id> [--message-id mid]`	Mark draft as sent (updates linked invoice status)
drafts.js	`node ~/system/tools/drafts.js import`	Import JSON drafts from ~/system/drafts/
intake-analyzer.js	`node ~/system/tools/intake-analyzer.js detect-lang "text"`	Language detection (NO/EN/BS) via character markers + word frequency
intake-analyzer.js	`node ~/system/tools/intake-analyzer.js analyze "text"`	Request analysis via Ollama — extracts category/scope/urgency, generates 3 pricing options from Vizu pricing.md
intake-analyzer.js (module)	`const { detectLanguage, analyzeInquiry, generateOptions } = require('./intake-analyzer')`	Module API for client intake pipeline

intake-analyzer.js: Language detector (æøå→NO, ćčšžđ→BS, word frequency lists) + request analyzer (Ollama llama3.1:8b JSON extraction) + option generator (reads ~/ALAI/pipeline/Vizu/finance/pricing.md, maps category→packages, generates A/B/C options). Heuristic fallback when Ollama unavailable. Pure Node.js, no dependencies. Created: 2026-02-13 (MC #840).
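
The detection order described above (character markers first, word frequency second) might look like the sketch below. It is not the intake-analyzer.js source: `detectLanguageSketch`, the tiny word lists, and the default to EN on ties are assumptions for illustration.

```javascript
// Hypothetical mini word lists; the real tool uses its own frequency lists.
const WORDS = {
  NO: ['og', 'jeg', 'ikke', 'det', 'vi', 'har'],
  BS: ['je', 'da', 'za', 'sam', 'nije', 'ili'],
  EN: ['the', 'and', 'is', 'we', 'for', 'you'],
};

function detectLanguageSketch(text) {
  const lower = text.toLowerCase();
  // Step 1: character markers decide immediately.
  if (/[æøå]/.test(lower)) return 'NO';
  if (/[ćčšžđ]/.test(lower)) return 'BS';
  // Step 2: count hits against each word list.
  const words = lower.split(/[^\p{L}]+/u).filter(Boolean);
  let best = 'EN'; // assumption: ties and zero hits default to English
  let bestHits = words.filter((w) => WORDS.EN.includes(w)).length;
  for (const lang of ['NO', 'BS']) {
    const hits = words.filter((w) => WORDS[lang].includes(w)).length;
    if (hits > bestHits) {
      best = lang;
      bestHits = hits;
    }
  }
  return best;
}

console.log(detectLanguageSketch('Vi trenger en ny nettside på norsk'));
console.log(detectLanguageSketch('Da li je sajt gotov'));
console.log(detectLanguageSketch('We need a new website for the shop'));
```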

follow-up.js: Automated follow-up reminder system. Proposal reminders: day 3 (gentle), day 7 (nudge), day 14 (final + Slack). General inquiry: day 5. Language-aware templates (NO/EN/BS) extracted from lead intake analysis. Idempotent processing (marks reminders as processed). Legacy reminder migration: infers missing escalation_level and lang fields from due date and lead notes. Wired into gotcha-health.sh (runs every 15 min). Reminder format: JSON files in ~/system/reminders/ with fields: id, lead_id, type, due_date, escalation_level, created_at, processed, lang. Created: 2026-02-13 (MC #840).
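
As an illustration of the reminder format and schedule described above, the sketch below builds reminder objects with the listed fields. The numeric `escalation_level` values, the `id` format, the ISO date strings, and the names `buildReminders` and `dueReminders` are assumptions, not the follow-up.js implementation.

```javascript
// Schedule from the description: proposal day 3/7/14, general inquiry day 5.
// Assumption: escalation levels are numbered 1..3.
const SCHEDULE = {
  proposal: [
    { day: 3, level: 1 },  // gentle
    { day: 7, level: 2 },  // nudge
    { day: 14, level: 3 }, // final + Slack alert
  ],
  inquiry: [{ day: 5, level: 1 }],
};

// Builds one reminder object per scheduled step, using the documented fields.
function buildReminders(leadId, type, createdAt, lang) {
  return SCHEDULE[type].map((step) => {
    const due = new Date(createdAt); // "YYYY-MM-DD" parses as UTC midnight
    due.setUTCDate(due.getUTCDate() + step.day);
    return {
      id: `${leadId}-${type}-d${step.day}`, // hypothetical id format
      lead_id: leadId,
      type,
      due_date: due.toISOString().slice(0, 10),
      escalation_level: step.level,
      created_at: createdAt,
      processed: false,
      lang,
    };
  });
}

// Unprocessed reminders whose due date has arrived (ISO dates sort as strings).
function dueReminders(reminders, today) {
  return reminders.filter((r) => !r.processed && r.due_date <= today);
}

const reminders = buildReminders(42, 'proposal', '2026-02-13', 'NO');
console.log(reminders.map((r) => r.due_date));
console.log(dueReminders(reminders, '2026-02-20').length);
```

Setting `processed` to true after a draft is generated is what would make repeated runs idempotent in this sketch.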

Image Generation

Tool	Command	Description
image-gen.js	`node ~/system/tools/image-gen.js --prompt "desc" --output path.png`	Generate image via Gemini (free) or Together.ai
image-gen.js	`node ~/system/tools/image-gen.js --setup gemini YOUR_KEY`	Save API key to config
image-gen.js	`node ~/system/tools/image-gen.js --prompt "desc" --count 4`	Generate multiple images

Providers: Gemini (default, free, no CC), Together.ai (FLUX, free tier)
Keys: ~/system/config/image-gen.json or env vars GEMINI_API_KEY, TOGETHER_API_KEY
Get key: https://aistudio.google.com/apikey (2 min, no credit card)

Tool	Command	Description
brand-compositor.js	`node ~/system/tools/brand-compositor.js all`	Deterministic brand asset generator: resize/composite REAL logo (profile-pic.png) onto social banners, profiles, favicons. No AI generation.
brand-compositor.js	`node ~/system/tools/brand-compositor.js profile\|avatar\|banner-linkedin\|banner-twitter\|og-image\|favicon`	Generate specific asset type
design-engine.js	`node ~/system/tools/design-engine.js render <template> --data '{}' --output path.png`	Puppeteer-based HTML/CSS template rendering engine: pixel-perfect typography with Inter font, retina quality
design-engine.js	`node ~/system/tools/design-engine.js list`	List available templates

Brand Compositor: Uses sharp (npm) for deterministic resize + composite. Same pixels every time.
Source: ~/system/context/branding/alai/social/profile-pic.png
Output: ~/system/context/branding/alai/social/
Options: --source <file>, --output <dir>

Design Engine: Uses Puppeteer (headless Chrome) to render HTML templates with professional typography (kerning, ligatures, OpenType).
Templates: linkedin-banner (1584x396), twitter-banner (1500x500), og-image (1200x630), profile-card (400x400), favicon (180x180)
Placeholders: {{mustache}} syntax
Batch rendering: reuses the browser instance
Module export: require('./design-engine')
Options: --data '{"key":"value"}', --output path.png, --scale 2
Created: 2026-02-10
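
The {{mustache}} placeholders used by the Design Engine templates amount to a string substitution over the `--data` JSON. The sketch below shows the idea only: `renderTemplate` is a hypothetical helper, and leaving unknown keys untouched is an assumption about behavior, not something the design-engine.js source confirms.

```javascript
// Replaces {{key}} placeholders with values from a data object.
// Assumption: placeholders without a matching key are left as-is.
function renderTemplate(html, data) {
  return html.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(data, key) ? String(data[key]) : match
  );
}

const template = '<h1>{{title}}</h1><p>{{ tagline }}</p>';
console.log(renderTemplate(template, { title: 'ALAI', tagline: 'AI consulting' }));
```

The rendered HTML is what a headless browser would then screenshot at the template's pixel size.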

Intel & News Aggregation

Tool	Command	Description
intel-briefing.js	`node ~/system/tools/intel-briefing.js`	Full daily briefing — fetch RSS + HN, summarize via Ollama, deliver to Slack #exec + HiveMind
intel-briefing.js	`node ~/system/tools/intel-briefing.js --preview`	Preview briefing in terminal
intel-briefing.js	`node ~/system/tools/intel-briefing.js --fetch`	Fetch only — list items without summarization
intel-briefing.js	`node ~/system/tools/intel-briefing.js --hours 48`	Custom lookback period (default: 24h)

Sources (7): Anthropic News, Anthropic Engineering, Claude Code Changelog, OpenAI News, TechCrunch AI, Simon Willison, Hacker News API
Summarization: Ollama llama3.1:8b (local, $0 cost)
Delivery: Slack #exec channel + HiveMind + ~/system/logs/intel-briefing-{date}.md
Daemon: com.edita.intel-briefing (daily 7:00 AM)
MCP RSS: @missionsquad/mcp-rss added to Edita MCP config for live RSS queries
Created: 2026-02-11

Tender Hunting & Public Procurement

Tool	Command	Description
tender-hunter-agent.js	`node ~/system/daemons/tender-hunter-agent.js`	Doffin (Norway) — TED API scanner for Norwegian IT tenders. Analyzes via Ollama, scores company fit (ALAI), stores in tenders.db. NO Puppeteer, NO Finn.no, NO TheHub.
tender-hunter-agent.js	`node ~/system/daemons/tender-hunter-agent.js --briefing`	Generate briefing from tenders.db (HOT/WARM summary)
tender-hunter-agent.js	`node ~/system/daemons/tender-hunter-agent.js --dry-run --verbose`	Test mode with detailed logging
bih-tender-hunter.js	`node ~/system/daemons/bih-tender-hunter.js`	BiH Tender Hunter — TED API (primary) + ejn.gov.ba (secondary) scanner for BiH IT tenders. Analyzes via Ollama, scores company fit (SnowIT), stores in bih-tenders.db.
bih-tender-hunter.js	`node ~/system/daemons/bih-tender-hunter.js --briefing`	Generate briefing from bih-tenders.db
bih-tender-hunter.js	`node ~/system/daemons/bih-tender-hunter.js --pages 5`	Custom page count (default: 3)
bih-tender-hunter.js	`node ~/system/daemons/bih-tender-hunter.js --source ted\|ejn`	Filter by data source (default: all)
bih-tender-hunter.js	`node ~/system/daemons/bih-tender-hunter.js --help`	Show usage and options

Doffin Agent:

Data Source: TED API (buyer-country = "NOR")
Keywords: Norwegian + English IT terms
Scoring: 0-100 (75+ HOT, 55-74 WARM, <55 COLD) — remote, English, tech stack match, framework, team size bonuses; security clearance, on-site, Norwegian-only penalties
DB: ~/system/databases/tenders.db (tenders + outbox tables)
Events: tender.hot, tender.warm → event bus
Delivery: Slack #exec
Daemon: com.john.tender-hunter (30 min interval)
Created: 2026-02-15

BiH Agent:

Data Sources: Tier 1 (TED API buyer-country = "BIH"), Tier 2 (ejn.gov.ba — needs Puppeteer scraper)
Keywords: Bosnian + English IT terms (digitalizacija, e-usluge, softver, etc.)
Scoring: 0-100 (75+ HOT, 55-74 WARM, <55 COLD) — BiH-specific bonuses: digitalizacija (+15), transport/railway sector (+10), BAM currency (+10)
DB: ~/system/databases/bih-tenders.db (tenders + outbox tables with source field: 'ted' or 'ejn')
Events: tender.hot, tender.warm → event bus
Delivery: Email reports (primary) + Slack #exec (fallback)
Daemons: com.snowit.bih-tender-hunter (30 min), com.snowit.bih-tender-briefing (daily 07:30)
Created: 2026-02-16 (MC #1057)
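
Both agents share the same 0-100 thresholds (75+ HOT, 55-74 WARM, <55 COLD). The sketch below applies the BiH-specific bonuses listed above to a base score and maps the result to a bucket and event name. The flag names, `classifyTender`, the clamping to 0-100, and the absence of an event for COLD tenders are assumptions; how the base score itself is computed is not shown here.

```javascript
// Bonus points from the BiH scoring line; the flag names are hypothetical.
const BIH_BONUSES = { digitalizacija: 15, transportSector: 10, bamCurrency: 10 };

// Thresholds shared by the Doffin and BiH agents.
function bucket(score) {
  if (score >= 75) return 'HOT';
  if (score >= 55) return 'WARM';
  return 'COLD';
}

function classifyTender(baseScore, flags) {
  let score = baseScore;
  for (const [flag, bonus] of Object.entries(BIH_BONUSES)) {
    if (flags[flag]) score += bonus;
  }
  score = Math.max(0, Math.min(100, score)); // keep within the 0-100 scale
  const level = bucket(score);
  // Only tender.hot and tender.warm are listed as events.
  const event = level === 'COLD' ? null : `tender.${level.toLowerCase()}`;
  return { score, level, event };
}

console.log(classifyTender(50, { digitalizacija: true, bamCurrency: true }));
console.log(classifyTender(40, { transportSector: true }));
```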

Reporting & Analytics

Tool	Command	Description
auto-report.js	`node ~/system/tools/auto-report.js daily`	Daily brief — revenue, pipeline, tasks, decisions, alerts. Generates email draft in ~/system/drafts/
auto-report.js	`node ~/system/tools/auto-report.js weekly`	Weekly report — revenue summary, pipeline progress, team performance, achievements. Email draft with ALAI branding
auto-report.js	`node ~/system/tools/auto-report.js preview`	Preview report in terminal without generating draft
client-status-update.js	`node ~/system/tools/client-status-update.js generate [--dry-run]`	Weekly client status updates — queries MC for completed tasks per project, matches to client contacts, generates ALAI-branded HTML email drafts (MEDIUM risk). LaunchAgent: Mondays 08:00.
client-status-update.js	`node ~/system/tools/client-status-update.js list`	Show recently generated status update drafts

Auto-Report Features:

Aggregates data from: invoice-generator, sales-pipeline, mc.js, support-ticket, decisions doc
ALAI brand styling (dark #09090b, accent #00E5A0)
Mobile-friendly HTML emails
Text + HTML versions in JSON draft
Daemon config: ~/system/daemons/auto-report-config.json
Recipient: alembasic@gmail.com
Schedule: Daily 7:00 AM, Weekly Monday 8:00 AM

Dashboards

Dashboard	URL	Description
Mission Control	https://mc.alai.no	Task management, sessions, active work
CEO Dashboard	https://mc.alai.no/ceo	Executive metrics — revenue, pipeline, projects, decisions, alerts
Client Portal	https://mc.alai.no/client?token=XXX	Client-facing project status — tasks, tickets, SLA. Token-authenticated.

CEO Dashboard Features:

Revenue Overview: MRR, outstanding invoices, 3-month trend, next due date
Pipeline Funnel: Visual funnel from prospect to won (data from sales-pipeline.js)
Active Projects: Kanban board (active/pending/stalled) from MC tasks
Decisions Pending: GO/NO-GO decisions from ~/system/specs/alem-decisions-2026-02.md
Alerts Panel: Overdue invoices, SLA breaches, stale tasks (>7 days)
Upcoming Timeline: Next 14 days deadlines from MC tasks
Dark theme (ALAI brand: #09090b background, #00E5A0 accent)
Auto-refresh: 60 seconds
Mobile responsive

Client Portal Features:

Token auth: POST /api/client/tokens (local network only) to generate tokens
Summary: active tasks, completed count, open tickets, blocked items
Task list: filtered by client project, shows priority/status
Ticket list: from tickets.db, shows SLA compliance
ALAI dark theme, auto-refresh 60s, mobile responsive
Token management: create/list/revoke via local API

Testing & Verification

Tool	Command	Description
smoke-test.js	`node ~/system/tools/smoke-test.js`	Run all smoke tests (Docker, Slack, daemons, MC, HiveMind)
smoke-test.js	`node ~/system/tools/smoke-test.js report`	Run all + post report to Slack #ops
smoke-test.js	`node ~/system/tools/smoke-test.js slack\|docker\|daemons\|mc\|hivemind`	Test specific suite
smoke-test.js	`node ~/system/tools/smoke-test.js api <url>`	Test specific API endpoint
health-check.js	`node ~/system/tools/health-check.js`	Monitor all services (Docker, HTTP, system, daemons) with human/JSON output
health-check.js	`node ~/system/tools/health-check.js --quick`	HTTP endpoints only (fast check)
health-check.js	`node ~/system/tools/health-check.js --json`	JSON output for programmatic use
daemon-health.js	`node ~/system/tools/daemon-health.js`	Daemon heartbeat monitor — checks all com.john.* LaunchAgents, reports PID/exit/status, detects unloaded plists
daemon-health.js	`node ~/system/tools/daemon-health.js --quick`	Quick status only
daemon-health.js	`node ~/system/tools/daemon-health.js --json`	JSON output for dashboards
auto-fix.js	`node ~/system/tools/auto-fix.js <service> <issue>`	Automated service recovery (restart loop prevention: max 3/hour)
ops-watchdog.js	`node ~/system/daemons/ops-watchdog.js`	Master watchdog daemon — health checks every 120s, auto-recovery via auto-fix.js, Slack alerts, event bus integration. Config: ~/system/config/ops-watchdog.json
cold-start.sh	`bash ~/system/ops/cold-start.sh`	Bring entire system up from fresh boot — 5-layer startup (infra→docker→core→business→workers→enrichment), pre-flight checks, verification
planka-sync.js	`node ~/system/tools/planka-sync.js test\|status\|sync <mc-id>`	MC↔Planka bidirectional sync — auto-moves cards on mc.js start/done/pause/resume
preflight-check.js	`node ~/system/tools/preflight-check.js --task <id>`	Pre-closure quality gate aggregator — checks GOTCHA, HOP Build, evidence, CoVe, validator, HiveMind, syntax before mc.js done
MCP playwright	`mcp__playwright__*` (native Claude tools)	Browser automation: navigate, click, fill, screenshot

Reports: ~/system/reports/smoke-test-*.json
Protocol: Run smoke tests BEFORE and AFTER infra changes. Use Playwright for UI checks and npm test for code.
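The restart loop prevention noted for auto-fix.js (max 3 restarts per hour) could be implemented as a sliding-window counter. The sketch below is an assumption about the approach, not the tool's actual code; the class name and in-memory storage are hypothetical.

```javascript
// Sliding-window limiter: at most `maxRestarts` per service in any `windowMs` span.
class RestartLimiter {
  constructor(maxRestarts = 3, windowMs = 60 * 60 * 1000) {
    this.maxRestarts = maxRestarts;
    this.windowMs = windowMs;
    this.history = new Map(); // service name -> restart timestamps (ms)
  }

  // Returns true and records the restart if the service is under the limit.
  tryRestart(service, now = Date.now()) {
    const recent = (this.history.get(service) || [])
      .filter((ts) => now - ts < this.windowMs);
    if (recent.length >= this.maxRestarts) {
      this.history.set(service, recent);
      return false; // refuse: restarting again would risk a loop
    }
    recent.push(now);
    this.history.set(service, recent);
    return true;
  }
}
```

A real daemon would need to persist the history across process restarts; this sketch keeps it in memory only.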

Deploy Quality Gate

Tool	Command	Description
qa-19.js	`node ~/system/tools/qa-19.js check <task-id>`	PRIMARY quality gate (ZAKON #14). 19-point check in 5 phases. Adapts per task type.
qa-19.js	`node ~/system/tools/qa-19.js list`	Show all 19 checks
quality-gate.js	DELETED 2026-02-26	Superseded by qa-19.js. Do not use.

Checks (19): RAG queried, GOTCHA written, tools checked, context read, build passes, tests pass, no secrets, no debug artifacts, error handling, performance, output matches spec, evidence captured, destination verified, visual check, backup taken, self-review, validator review, quality gate, CEO acceptance.
Rule (ZAKON #14): Run qa-19.js check <task-id> before mc.js done. Minimum 15/19 (M priority) or 17/19 (H priority).
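The pass thresholds in the rule above reduce to a small lookup. This is a sketch only: `meetsGate` is a hypothetical name, and priorities other than M and H are rejected because no minimum is documented for them here.

```javascript
const TOTAL_CHECKS = 19;
// Documented minimums: 15/19 for M priority, 17/19 for H priority.
const MIN_PASSING = { M: 15, H: 17 };

function meetsGate(passed, priority) {
  const required = MIN_PASSING[priority];
  if (required === undefined) {
    throw new Error(`No documented minimum for priority "${priority}"`);
  }
  if (!Number.isInteger(passed) || passed < 0 || passed > TOTAL_CHECKS) {
    throw new RangeError(`passed must be an integer between 0 and ${TOTAL_CHECKS}`);
  }
  return { passed, required, ok: passed >= required };
}
```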

Anti-Hallucination & Drift Detection

Tool	Command	Description
cove.js	`node ~/system/tools/cove.js verify --task-id <id> --claims-file <path>`	Chain-of-Verification — deterministically re-verify session claims using claim-types.json spec. Reads JSONL, executes file/syntax/server/build checks, writes cove-report.json
cove.js	`node ~/system/tools/cove.js report --task-id <id>`	Display CoVe verification report for a task
vcr.js	`node ~/system/tools/vcr.js record --session-id <id> --tool <name> --input <json> --output <text> --duration <ms>`	Record a tool interaction to vcr.db (used by vcr-recorder.py hook)
vcr.js	`node ~/system/tools/vcr.js replay <session-id>`	Replay recorded session — re-executes deterministic tools (Read/Glob/Grep), compares output hashes, flags regressions
vcr.js	`node ~/system/tools/vcr.js list [--days 7]`	List recorded VCR sessions
vcr.js	`node ~/system/tools/vcr.js compare <session1> <session2>`	Diff two sessions — detect behavioral changes between recordings
drift-detector.js	`node ~/system/tools/drift-detector.js snapshot`	Collect today's behavioral metrics from all data sources (claims, email-audit, MC, HiveMind, verification audits)
drift-detector.js	`node ~/system/tools/drift-detector.js analyze`	Analyze recent trends — anomaly detection via rolling 7-day mean ± 2σ
drift-detector.js	`node ~/system/tools/drift-detector.js report [--days 30]`	Human-readable drift report with ASCII table

VCR activation: touch /tmp/vcr-recording to start, rm /tmp/vcr-recording to stop.
Hook: vcr-recorder.py (PostToolUse, advisory).
Drift daemon: com.john.drift-detector runs daily at 23:55 (snapshot + analyze).
Alerts: HiveMind (always) + Slack #john-alerts (MEDIUM+).
Rule: ~/system/rules/determinism-spectrum.md maps all 44 system components to a 5-level determinism scale.
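The anomaly rule used by drift-detector.js analyze (rolling 7-day mean ± 2σ) can be illustrated as below. This is a minimal sketch under stated assumptions: it uses the population standard deviation and reports no anomaly until a full window of history exists. The source does not say which variant the tool uses, and the function names are hypothetical.

```javascript
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Population standard deviation (an assumption; the tool may use the sample form).
function stddev(xs) {
  const m = mean(xs);
  return Math.sqrt(mean(xs.map((x) => (x - m) ** 2)));
}

// Flags `today` if it falls outside mean ± k·σ of the last `window` daily values.
function isAnomaly(history, today, window = 7, k = 2) {
  const recent = history.slice(-window);
  if (recent.length < window) return false; // not enough data yet
  return Math.abs(today - mean(recent)) > k * stddev(recent);
}
```

With a history of 8, 12, 8, 12, 8, 12, 10 the mean is 10 and 2σ is about 3.7, so a value of 13 passes while 14 is flagged.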

Test Quality

Tool	Command	Description
test-auditor.js	`node ~/system/tools/test-auditor.js <project-dir>`	Scan test suite for weak validation — detects "no crash" without rejection, missing stupid-user inputs, unused chaos strings
test-auditor.js	`node ~/system/tools/test-auditor.js <dir> --json`	JSON output for pipeline integration

Detects: (1) Chaos tests with "no crash" but no rejection assertion, (2) Form fields missing stupid-user inputs (numbers in names, letters in phones), (3) CHAOS_STRINGS defined but unused.
Exit: 0=clean, 1=findings.
Rule: ~/system/rules/testing.md (Mandatory Input Rejection Tests section)

Plan Enforcement

Tool	Command	Description
plan-advance-step.js	`node ~/system/tools/plan-advance-step.js`	Manually advance to next plan step with gate checks (for builder agents)
plan-adherence-report.js	`node ~/system/tools/plan-adherence-report.js <task-id>`	Post-execution adherence report — did agent follow the plan? Shows step execution, violations, summary

Plan Enforcement Architecture:

Hook: ~/.claude/hooks/plan-enforcer.py (PreToolUse) gates Write/Edit/Bash based on current plan step
Plan files: /tmp/plan-{task-id}.json (machine-readable plan), /tmp/plan-state-{task-id}.json (execution state)
Audit log: /tmp/plan-audit-{task-id}.jsonl (every hook decision logged)
Graceful degradation: If no plan file exists, hook warns but allows (not all tasks have plans)
Manual step advance: Builder calls plan-advance-step.js when ready to move forward
Validator check: Validator runs plan-adherence-report.js to verify compliance
Created: 2026-02-13 (MC #845)
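The gating behaviour described above (gate Write/Edit/Bash by current step, warn but allow when no plan exists, log every decision) might look like the sketch below. The real hook is plan-enforcer.py; JavaScript is used here only for illustration. The plan schema (`steps[].name`, `steps[].allowedTools`) and state field (`currentStep`) are hypothetical, since the source does not document the file formats.

```javascript
const GATED_TOOLS = new Set(['Write', 'Edit', 'Bash']);

// plan:  parsed /tmp/plan-{task-id}.json, or null if the file is missing
// state: parsed /tmp/plan-state-{task-id}.json (hypothetical shape)
function decide(plan, state, toolName) {
  if (!GATED_TOOLS.has(toolName)) {
    return { allow: true, warn: false, reason: 'tool is not gated' };
  }
  if (!plan) {
    // Graceful degradation: not all tasks have plans.
    return { allow: true, warn: true, reason: 'no plan file found' };
  }
  const step = plan.steps[state.currentStep];
  if (!step) {
    return { allow: false, warn: false, reason: 'plan has no step at the current index' };
  }
  const allow = step.allowedTools.includes(toolName);
  return {
    allow,
    warn: false,
    reason: allow
      ? `allowed in step "${step.name}"`
      : `${toolName} not permitted in step "${step.name}"`,
  };
}

// One JSONL record per decision, suitable for appending to the audit log.
function auditLine(taskId, toolName, decision, now = new Date()) {
  return JSON.stringify({ ts: now.toISOString(), taskId, tool: toolName, ...decision });
}
```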

Build Mode

Tool	Command	Description
build-mode.js	`node ~/system/tools/build-mode.js start <dir> [--task N] [--concurrency N] [--yolo]`	Activate build mode — bypass process hooks for project dir
build-mode.js	`node ~/system/tools/build-mode.js stop [--status completed\|failed]`	Deactivate build mode
build-mode.js	`node ~/system/tools/build-mode.js status`	Show current build mode state
build-mode.js	`node ~/system/tools/build-mode.js pause\|resume`	Pause/resume build mode
build-mode.js	`node ~/system/tools/build-mode.js sessions [--limit N]`	List build sessions
build-mode.js	`node ~/system/tools/build-mode.js autocoder [--project-dir <dir>] [--yolo]`	Launch AutoCoder agent
build-mode.js	`node ~/system/tools/build-mode.js update-features <total> <passing>`	Update feature progress

Build Mode: Switches from Operations mode to Build mode. Bypasses the GOTCHA checklist, delegation enforcer, agent protocol, and verification gate for files WITHIN the project dir. Security hooks (forbidden paths, hallucination, bash security) remain active. Auto-expires after an 8h TTL.
DB: build_sessions table in mission-control.db.
Flag: /tmp/build-mode-active.json.
Hook: ~/.claude/hooks/build_mode.py (shared module).
AutoCoder: ~/system/services/autocoder/ is an autonomous coding agent (Python, Claude Agent SDK). The Initializer creates features in SQLite and the Coding Agent implements them. Supports parallel mode (--concurrency) and YOLO mode (skip browser tests).
Skill: /build <dir> activates build mode via skill.
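The bypass conditions above (flag present, within the 8h TTL, file inside the project dir) can be sketched as a single predicate. The flag fields `projectDir` and `startedAt` are hypothetical, since the contents of /tmp/build-mode-active.json are not documented here. The shared hook module is Python, so this JavaScript version is illustrative only.

```javascript
const path = require('node:path');

const TTL_MS = 8 * 60 * 60 * 1000; // documented 8h auto-expire

// True if filePath resolves to projectDir itself or something beneath it.
function isInside(projectDir, filePath) {
  const rel = path.relative(path.resolve(projectDir), path.resolve(filePath));
  if (rel === '') return true;
  return !path.isAbsolute(rel) && rel.split(path.sep)[0] !== '..';
}

// flag: parsed flag file, or null when build mode is off.
// Applies to process hooks only; security hooks are never bypassed.
function shouldBypass(flag, filePath, now = Date.now()) {
  if (!flag) return false;
  if (now - flag.startedAt >= TTL_MS) return false; // expired
  return isInside(flag.projectDir, filePath);
}
```

Resolving both paths before comparing prevents a path such as `<project>/../other` from slipping through the check.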

Build Pipeline

Tool	Command	Description
build-project.js	`node ~/system/tools/build-project.js prep "Name" "type" "Description"`	Scaffold + CLAUDE.md + onboard + spec + task
build-project.js	`node ~/system/tools/build-project.js deploy "Name"`	Vercel deploy
build-project.js	`node ~/system/tools/build-project.js status "Name"`	Check project state
assert-log.sh	`source ~/system/tools/assert-log.sh`	Structured assertion library for deterministic verification (Phase 1)
gate-pre-claim.sh	`bash ~/system/tools/gate-pre-claim.sh --spec spec.json --workdir /path`	Pre-claim verification gate — file exists, hash changed, forbidden patterns (Phase 2)
gate-pre-claim.sh	`bash ~/system/tools/gate-pre-claim.sh --snapshot --workdir /path`	Snapshot file hashes before build
gate-pre-deploy.sh	`bash ~/system/tools/gate-pre-deploy.sh --project-dir /path`	Pre-deploy verification gate — tests, build, artifacts, TODO check (Phase 4)

Types: landing-page | nextjs-app | api-backend
Templates: ~/system/template/types/<type>/CLAUDE.md + spec.md
CI/CD: ~/system/template/github-actions/ci.yml (copied by scaffold.sh), ~/system/template/docker-compose.staging.yml
Deploy: --platform vercel|railway|fly (auto-detects from type if omitted)
Pipeline Gates: Part of Zero-Hallucination Deterministic Build Pipeline

Client Interaction & Design Review

Tool	Command	Description
preview-share.js	`node ~/system/tools/preview-share.js start\|stop\|status\|list`	Client preview sharing — starts local dev server + Cloudflare tunnel for public URL. Auto-detects build output dirs.
design-approval.js	`node ~/system/tools/design-approval.js create\|list\|approve\|reject\|show\|stats`	Design review workflow — tracks design approval from draft→sent→reviewing→approved/rejected→implemented. DB: design-reviews.db
design-board.js	`node ~/system/tools/design-board.js create\|list\|stop\|restart`	Client-facing design review board — ALAI-branded web page with design options, feedback form, approve/reject. Cloudflare tunnel (http2 protocol) for public URL. Health check endpoint. Integrates with design-reviews.db.
client-signoff.js	`node ~/system/tools/client-signoff.js create <project> <email> --type uat\|delivery [--project-type webapp] [--message "X"]`	UAT + delivery approval workflow. Sends email with approval link, client approves/rejects via web UI (https://mc.alai.no/signoff/{token}), pipeline auto-advances. Commands: create, status, approve, reject, checklist, check, list. DB: design-reviews.db

UAT Template: ~/system/template/uat-checklist.md (per project type: webapp, landing-page, api-backend)
DB: ~/system/databases/design-reviews.db (reviews + signoffs tables)

File Editing

Tool	Command	Description
smart-edit.js	`node ~/system/tools/smart-edit.js view <file> [start-end]`	Show file lines with line numbers
smart-edit.js	`node ~/system/tools/smart-edit.js replace <file> <start-end> <content>`	Replace line range with new content
smart-edit.js	`node ~/system/tools/smart-edit.js insert <file> <after> <content>`	Insert content after line number
smart-edit.js	`node ~/system/tools/smart-edit.js delete <file> <start-end>`	Delete line range
smart-edit.js	`node ~/system/tools/smart-edit.js append <file> <content>`	Append content to end of file

Why: Line-number based editing is more reliable than str_replace, which fails on inexact matches. Inspired by The Harness Problem. Reduces edit fail rate from ~15-20% to ~5%.
Backup: Auto-creates .bak before each edit. Use --no-backup to skip.
Stdin: Use - as content arg to pipe content via stdin (for multi-line edits).
Lines: 1-indexed, inclusive ranges (10-15 = lines 10 through 15).
Workflow: view to see lines → replace/insert/delete by line number.
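The 1-indexed, inclusive range semantics can be made concrete with a short sketch. This is not smart-edit.js itself. The function names are hypothetical, and accepting a bare single number such as `10` as a one-line range is an assumption.

```javascript
// Parse "10-15" (or, by assumption, a single "10") into 1-indexed inclusive bounds.
function parseRange(spec) {
  const m = /^(\d+)(?:-(\d+))?$/.exec(spec);
  if (!m) throw new Error(`Invalid range: ${spec}`);
  const start = Number(m[1]);
  const end = m[2] === undefined ? start : Number(m[2]);
  return { start, end };
}

// Replace lines start..end (inclusive) with `content`, which may span several lines.
function replaceLines(text, start, end, content) {
  const lines = text.split('\n');
  if (start < 1 || end < start || end > lines.length) {
    throw new RangeError(`Range ${start}-${end} is outside 1-${lines.length}`);
  }
  // Array indices are 0-based, so line N lives at index N - 1.
  lines.splice(start - 1, end - start + 1, ...content.split('\n'));
  return lines.join('\n');
}
```

Replacing lines 2-3 of a four-line file with one line leaves a three-line file, which is why line numbers should be re-read with view after each edit.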

Daemons (LaunchAgents)

Daemon	Interval	Description
com.john.slack-bot	always	Slack bot — Claude Haiku via Socket Mode. AI: API → CLI → Ollama. Needs SLACK_BOT_TOKEN + SLACK_APP_TOKEN
com.john.mc-dashboard	always	Mission Control web dashboard (port 3030) — includes CEO Dashboard at /ceo, DocuSeal webhook at /webhooks/docuseal (auto-advances pipeline on NDA/contract signing)
com.john.mc-session-worker	on session events	Session state extraction
com.john.pipeline-watcher	60 sec	Pipeline event dispatcher + invoice auto-reminder daemon — checks unsigned proposals, triggers invoice escalation (Day 7/14/30+ reminders)
com.john.event-dispatcher	always	Event bus dispatcher daemon — polls events.db every 2s, routes to handlers, retry with backoff, dead letter queue
com.john.outbox-processor	always	Outbox processor daemon — polls durable-runner.db + mission-control.db outbox tables every 2s, emits to event-bus, purges old events (7d+). MC #1760
com.john.ops-watchdog	always	Master watchdog — health checks every 120s, auto-recovery, Slack alerts, event bus. Config: ~/system/config/ops-watchdog.json
com.john.client-status-update	Monday 08:00	Weekly client status update generator — queries MC for completed tasks, generates ALAI-branded email drafts per project
com.john.network-watchdog	60 sec	Network monitoring daemon — ping gateway, DNS resolution check, internet connectivity check. Alert chain: Slack ops → macOS notification → log. 3 consecutive failures trigger alert with 10min cooldown. Tracks uptime stats.
com.john.vault-keeper	always	Vault auto-unlock daemon — auto-unlocks Vaultwarden using macOS Keychain password, session refresh every 15min, circuit breaker, macOS notifications

Ops Documentation: ~/system/ops/ (service catalog, dependency map, 15 runbooks, cold-start script, ops README).
Ops Dashboard: https://mc.alai.no/ops (status page), /api/ops/health (JSON), /api/ops/history (events)
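The alert rule listed for com.john.network-watchdog in the daemon table (3 consecutive failures trigger an alert, with a 10-minute cooldown) can be sketched as a small state machine. The class and method names are hypothetical, and the daemon's real implementation may differ.

```javascript
class FailureAlerter {
  constructor(threshold = 3, cooldownMs = 10 * 60 * 1000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.consecutiveFailures = 0;
    this.lastAlertAt = null;
  }

  // Record one check result; returns true when an alert should be sent.
  record(ok, now = Date.now()) {
    if (ok) {
      this.consecutiveFailures = 0; // any success resets the streak
      return false;
    }
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures < this.threshold) return false;
    const coolingDown =
      this.lastAlertAt !== null && now - this.lastAlertAt < this.cooldownMs;
    if (coolingDown) return false;
    this.lastAlertAt = now;
    return true;
  }
}
```

With checks every 60 seconds, a sustained outage would alert on the third failed check and then at most once every 10 minutes.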

Env Vars (both profiles):

enableToolSearch=true — lazy-load MCP tools
CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=true — agent teams
DISABLE_AUTOUPDATER=1 — prevent auto-update breaking custom setup
CLAUDE_CODE_DISABLE_AUTO_COMPACT=true — manual compaction control

Boards (Planka — Kanban)

Tool	URL	Description
Planka	https://boards.alai.no	Kanban boards per project (Trello-like)
Planka local	http://localhost:3100	Direct local access (use https://boards.alai.no for sharing)

Admin: john / BasicAS2026!
User: alem / Alem2026!
Password reset: node ~/system/tools/planka-admin.js reset-password <username> <new-pass>
Add user: node ~/system/tools/planka-admin.js add-user <email> <username> <name> <pass>
SMTP: Configured for notifications (send.one.com:465, john@alai.no)
Docker: ~/system/services/planka/docker-compose.yml
Projects: Wizard NUF, Ren Drom, Riad Basic, Drop Fintech, ALAI Internal, BasicAS Operations
Hosting: Azure Container Apps (boards.alai.no via Cloudflare DNS)

Setup & Backup

Tool	Command	Description
syslog.sh	`bash ~/system/tools/syslog.sh add "description"`	System Changelog: logs changes for both agents
syslog.sh	`bash ~/system/tools/syslog.sh today`	Today's changelog entries
syslog.sh	`bash ~/system/tools/syslog.sh recent [N]`	Last N entries
setup-backup.sh	`bash ~/system/tools/setup-backup.sh "description"`	Backup setup files + changelog
sync-to-mini.sh	`bash ~/system/tools/sync-to-mini.sh [--execute]`	Sync GOTCHA to Mac Mini
daemon-manager.js	`node ~/system/daemons/daemon-manager.js list\|start\|stop\|status`	Manage persistent background services
team-cleanup.sh	`bash ~/system/tools/team-cleanup.sh [--force] [--days N]`	Clean stale Agent Teams task/team dirs (default 7d)

Company Management

Tool	Command	Description
company.sh	`~/system/tools/company.sh list\|info\|add`	Company registry management
company-worker.js	`node ~/system/tools/company-worker.js run\|run-all\|status\|list\|dry-run`	Autonomous work loop generator for pipeline companies. Generates MC tasks per company (Securion/Proveo/Proxima), posts to Slack/HiveMind, emits events. Config: `~/system/tools/config/company-worker-config.json`
skill-resolver.js	`node ~/system/tools/skill-resolver.js resolve <skill-name> [--company X]`	Resolve skill path with company override. Priority: `~/companies/COMPANY/skills/SKILL/SKILL.md` (if company set) → `~/.claude/skills/SKILL/SKILL.md` (global fallback). Returns absolute path or exit 1. Performance: ~47ms.
tool-resolver.js	`node ~/system/tools/tool-resolver.js check <tool-name> [--company X]`	Check if tool allowed for company via tools.json config. Modes: whitelist (financial), blacklist (dev), inherit-all (orchestrators). Pattern matching: exact + glob (invoice-*.js). Returns ALLOWED\|DENIED with reason on stderr. Performance: ~49ms.
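The three tool-resolver modes and the exact-plus-glob matching described in the table above could work along these lines. This is a sketch under assumptions: the config shape (`mode`, `tools`) is hypothetical because the tools.json schema is not shown here, and `*` is the only wildcard handled.

```javascript
// Convert a simple glob ("invoice-*.js") into an anchored RegExp.
function globToRegExp(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

function matchesAny(patterns, toolName) {
  return patterns.some((p) => globToRegExp(p).test(toolName));
}

// config: { mode: 'whitelist' | 'blacklist' | 'inherit-all', tools: string[] }
function checkTool(config, toolName) {
  const listed = matchesAny(config.tools || [], toolName);
  switch (config.mode) {
    case 'inherit-all':
      return { allowed: true, reason: 'inherit-all mode' };
    case 'whitelist':
      return { allowed: listed, reason: listed ? 'on whitelist' : 'not on whitelist' };
    case 'blacklist':
      return { allowed: !listed, reason: listed ? 'on blacklist' : 'not on blacklist' };
    default:
      throw new Error(`Unknown mode: ${config.mode}`);
  }
}
```

Escaping the pattern before expanding `*` keeps the dot in `invoice-*.js` literal, so a name like `invoice-xjs` does not match.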

Skills (Claude Code Slash Commands)

Command	Description
`/plan-with-team`	Creates plan with builder/validator teams
`/build-plan`	Executes approved plan using TaskList
`/code-review`	Systematic GOTCHA code review (security, quality, performance)
`/debugging`	Systematic bug investigation and resolution
`/security-audit`	OWASP Top 10 + config + infra security review
`/design-system`	AI-powered design generator — multi-tool (v0.dev, Google Stitch, Figma Make, Codia AI). Prompt templates per tool. Brief → kickass design + code.
`/figma-design`	Figma WebSocket bridge operations — populate design systems, create screens programmatically
`/build`	Switch to Build Mode — bypass process hooks, launch AutoCoder, track sessions

Workflow: /plan-with-team "task" → plan → approval → /build-plan → execution
Build: /build <project_dir> → activate build mode → code freely → stop
Design: /design-system "brief" → AI tool selection → optimized prompts → Figma + code
Review: /code-review <file> or /security-audit <target>
Debug: /debugging "<bug description>"

Vector & Semantic Search

Tool	Command	Description
vector-db.js	`node ~/system/tools/vector-db.js help`	Hybrid Vector DB: SQLite + vector columns for semantic search. Reusable module.
vector-db.js (module)	`const { VectorDB } = require('./vector-db')`	Module API: createCollection(), insert(), search(), hybridSearch(), bulkInsert()
vector-db.js search	`node ~/system/tools/vector-db.js search <db> <collection> <query>`	Semantic search via Ollama nomic-embed-text (768-dim)
vector-db.js hybrid	`node ~/system/tools/vector-db.js hybrid <db> <col> <query> --where "cond"`	SQL filter + vector ranking combined
knowledge-base.js	`node ~/system/tools/knowledge-base.js add <url-or-file> [--tag t]`	KB: drop URL/file → chunk → vector store. Semantic search over all docs.
knowledge-base.js	`node ~/system/tools/knowledge-base.js search <query> [--tag t]`	Semantic search across knowledge base documents
humanizer.js	`echo "text" \| node ~/system/tools/humanizer.js [--deep]`	Remove AI patterns from text. Quick (regex) or deep (Ollama rewrite). Module: require('./humanizer')
hourly-backup.sh	`bash ~/system/tools/hourly-backup.sh [--dry-run\|--list]`	Hourly auto-commit to 'auto-backup' branch across all repos. LaunchAgent: com.john.hourly-backup.
db-backup.sh	`bash ~/system/tools/db-backup.sh [--list\|--restore]`	Daily SQLite backup (14 DBs). sqlite3 .backup, tar.gz, 30-day rotation. LaunchAgent: com.john.db-backup (03:00).
cron-notify.sh	`bash ~/system/tools/cron-notify.sh "job" "OK\|ERROR" "details"`	Post cron results to Slack #ops channel. Used by db-backup, hourly-backup.
memory-indexer.py	`python3 ~/system/tools/memory-indexer.py index\|search\|stats\|test-embed`	Index ~/system/ MD files into knowledge.db (SQLite + Ollama nomic-embed-text, 768-dim, tag='memory-file')

Vector Pattern: Embeddings stored as BLOB (Float32Array) in SQLite. Cosine similarity computed in JS. Model: nomic-embed-text (768-dim, local Ollama). Batch embedding supported (32/batch). Usage tracked via usage-tracker.js. Unified model: ALL embedding tools use nomic-embed-text via Ollama — no model mismatch.
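The storage and ranking pattern above (Float32Array serialized to a BLOB, cosine similarity computed in JS) can be illustrated as follows. This is a minimal sketch, not the vector-db.js source. The helper names and the `rows` shape are hypothetical, and real rows would come from a SQLite query.

```javascript
// Serialize a numeric vector to a Buffer suitable for a SQLite BLOB column.
function toBlob(vector) {
  const f32 = Float32Array.from(vector);
  return Buffer.from(f32.buffer, f32.byteOffset, f32.byteLength);
}

// Deserialize; copy first so the Float32Array view is 4-byte aligned.
function fromBlob(blob) {
  const copy = new Uint8Array(blob);
  return new Float32Array(copy.buffer, 0, copy.length / 4);
}

function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector has no direction
  return dot / Math.sqrt(normA * normB);
}

// rows: [{ id, embedding: Buffer }]; returns the k most similar rows.
function topK(queryVector, rows, k = 5) {
  return rows
    .map((row) => ({ id: row.id, score: cosineSimilarity(queryVector, fromBlob(row.embedding)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A brute-force scan like this is linear in the number of rows, which is usually acceptable for a local knowledge base of modest size.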

RAG & Knowledge Flywheel

Tool	Command	Description
retrieval-orchestrator.js	`node ~/system/tools/retrieval-orchestrator.js query "text" [--limit N] [--verbose]`	Multi-store retrieval: HiveMind + Knowledge DB + RAG Cache + Sessions → RRF merge
retrieval-orchestrator.js	`node ~/system/tools/retrieval-orchestrator.js stats`	Store statistics (coverage, entry counts)
retrieval-orchestrator.js	`node ~/system/tools/retrieval-orchestrator.js stores`	List available stores and status
session-archiver.js	`node ~/system/tools/session-archiver.js stats`	Session file statistics (count, size, savings)
session-archiver.js	`node ~/system/tools/session-archiver.js archive [--dry-run] [--days 14]`	Strip raw transcripts from old sessions
session-archiver.js	`node ~/system/tools/session-archiver.js index [--limit N]`	Embed session summaries into knowledge DB
session-archiver.js	`node ~/system/tools/session-archiver.js cleanup [--dry-run]`	Archive + index (LaunchAgent runs daily 03:00)
docuseal-monitor.js	`node ~/system/tools/docuseal-monitor.js check`	Poll DocuSeal for new signings → Slack + email + HiveMind + contracts.db
docuseal-monitor.js	`node ~/system/tools/docuseal-monitor.js status`	Show recent DocuSeal submissions with signer status
docuseal-monitor.js	`node ~/system/tools/docuseal-monitor.js history`	All tracked signings from contracts.db
rag-health.js	`node ~/system/tools/rag-health.js`	Full RAG health check: Ollama, Knowledge DB, HiveMind, RAG Cache, Session Archiver, Orchestrator smoke
rag-health.js	`node ~/system/tools/rag-health.js --json`	JSON output (for ops-watchdog integration)
rag-health.js	`node ~/system/tools/rag-health.js --alert`	Exit 1 if any critical check fails (for cron/alerting)
rag-health.js	`node ~/system/tools/rag-health.js --smoke`	Run orchestrator smoke query only
lightrag.js	`node ~/system/tools/lightrag.js query "question" [--mode hybrid\|local\|global\|naive]`	LightRAG REST client — semantic query, document upload, graph exploration, RAG cache sync via configured Azure/Cloud endpoint
lightrag.js	`node ~/system/tools/lightrag.js upload <file-or-dir> [--recursive]`	Upload documents to LightRAG knowledge graph
lightrag.js	`node ~/system/tools/lightrag.js explore [--entity "name"] [--limit N]`	Explore knowledge graph entities and relationships
lightrag.js	`node ~/system/tools/lightrag.js status`	Get LightRAG system status and statistics
lightrag.js	`node ~/system/tools/lightrag.js sync-from-rag`	Import rag-router cache → LightRAG
lightrag.js	`node ~/system/tools/lightrag.js sync-to-rag`	Export LightRAG results → rag-router cache
lightrag-migrate.js	`node ~/system/tools/lightrag-migrate.js start [--source hivemind\|knowledge\|both] [--rate 2] [--limit 1000] [--tier 1] [--type type1,type2] [--tag tag] [--dry-run]`	Daemon: migrate HiveMind + Knowledge DB to LightRAG (HTTP API). Idempotent, rate-limited (default 2 docs/min), resumable with state tracking.
lightrag-migrate.js	`node ~/system/tools/lightrag-migrate.js status`	Show migration progress (source, last_id, total_migrated, failed, rate)
lightrag-migrate.js	`node ~/system/tools/lightrag-migrate.js stop`	Stop running migration daemon (graceful SIGTERM + kill)
lightrag-migrate.js	`node ~/system/tools/lightrag-migrate.js reset`	Clear migration state file (/tmp/lightrag-migration-state.json)
rag-router.js	`node ~/system/tools/rag-router.js query "text"`	RAG intelligence router — embed, cache search, local model dispatch, interaction logging
rag-router.js	`node ~/system/tools/rag-router.js learn "question" "answer"`	Add Q&A pair to RAG cache
rag-router.js	`node ~/system/tools/rag-router.js stats`	Flywheel metrics (cache hit rate, cost savings)
rag-router.js	`node ~/system/tools/rag-router.js test`	Run self-test suite
rag-router.js	`node ~/system/tools/rag-router.js capture <id> "response"`	Capture external response for interaction, auto-index to cache
rag-router.js (module)	`const { RAGRouter } = require('./rag-router')`	Module API: query(), learn(), capture(), stats()
rag-mcp.js	MCP server (stdio)	RAG MCP server — exposes rag_query, rag_learn, rag_stats tools. Config: ~/.claude/mcp.json
MCP rag	`mcp__rag__rag_query`	Route query through RAG cache + local models. Returns response or needs_external flag
MCP rag	`mcp__rag__rag_learn`	Add Q&A pair to RAG cache with source tracking
MCP rag	`mcp__rag__rag_stats`	Flywheel metrics (cache hit rate, cost savings, training queue)
flywheel-extractor.js	`node ~/system/tools/flywheel-extractor.js extract [--output path] [--batch-name "X"]`	Extract external interactions from flywheel.db → JSONL for alaiML training
flywheel-extractor.js	`node ~/system/tools/flywheel-extractor.js stats`	Show training queue size, extraction batches
flywheel-indexer.js	`node ~/system/tools/flywheel-indexer.js index [--batch YYYYMMDD] [--dry-run]`	Sync high-quality external responses back to rag_cache (closes the loop)
flywheel-indexer.js	`node ~/system/tools/flywheel-indexer.js stats`	Show pending/cached/total counts
flywheel-session-extractor.js	`node ~/system/tools/flywheel-session-extractor.js extract [--dry-run] [--limit N]`	Extract Q&A pairs from Claude Code session transcripts → RAG cache
flywheel-session-extractor.js	`node ~/system/tools/flywheel-session-extractor.js stats`	Show extraction metrics (processed/pending sessions, pairs extracted)
flywheel-session-extractor.js	`node ~/system/tools/flywheel-session-extractor.js reprocess <session-id>`	Force re-extract a specific session
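
The RRF merge that retrieval-orchestrator.js applies across stores can be sketched as standard Reciprocal Rank Fusion. The constant k = 60 is the commonly used default and is an assumption here; the source does not state the value or any per-store weighting.

```javascript
// rankedLists: one array of document ids per store, best match first.
// Each document scores the sum of 1 / (k + rank) over the lists it appears in.
function rrfMerge(rankedLists, k = 60, limit = 10) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([id, score]) => ({ id, score }));
}
```

Because only ranks are used, stores with incomparable raw scores (for example cosine similarity versus keyword match counts) can be merged without normalization.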

RAG Flywheel Architecture:

Cache: Embedding-based semantic cache (0.85 similarity threshold). Hit → instant response
Local: Tier-router dispatch to Ollama models (tier 2: qwen2.5:72b). Hit → fast local response
External: Falls back to Claude Code when cache miss + local unavailable
Session Capture: Q&A pairs from session transcripts auto-extracted every 5min (daemon)
Response Capture: External responses can be captured back via capture() → auto-index to cache
Learning: Every interaction logged to flywheel.db. High-quality Q&A pairs added to cache
DB: ~/system/databases/flywheel.db (interactions + rag_cache tables)
Integration: Uses vector-db.js (embeddings) + tier-router.js (local dispatch)
Cost Savings: Tracks queries answered locally vs externally, cumulative savings
Created: 2026-02-21 (MC #1610)
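The cache → local → external fallback order described above can be sketched as a small router. This is an illustration, not rag-router.js itself. `searchCache` and `askLocal` are hypothetical injected functions, and treating a similarity exactly equal to 0.85 as a hit is an assumption.

```javascript
const CACHE_THRESHOLD = 0.85; // documented similarity threshold

// searchCache(query) -> { answer, similarity } or null
// askLocal(query)    -> answer string, or throws if the local model is unavailable
async function route(query, { searchCache, askLocal }) {
  const hit = await searchCache(query);
  if (hit && hit.similarity >= CACHE_THRESHOLD) {
    return { source: 'cache', answer: hit.answer };
  }
  try {
    const answer = await askLocal(query);
    if (answer) return { source: 'local', answer };
  } catch (err) {
    // Local model unavailable: fall through to the external tier.
  }
  // Mirrors the needs_external flag returned by the rag_query MCP tool.
  return { source: 'external', needsExternal: true };
}
```

Injecting the two lookups keeps the routing order testable without a running Ollama instance or a populated cache.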

OSINT Investigation

Tool	Command	Description
investigate.js	`node ~/system/tools/investigate.js investigate --phone X --name Y --email Z --location W`	OSINT person lookup — spawns 4 parallel Claude subagents (phone, social, business, news) + synthesizer. SQLite backend with confidence scoring.
investigate.js	`node ~/system/tools/investigate.js show <id>`	Show investigation findings grouped by category
investigate.js	`node ~/system/tools/investigate.js list`	List all investigations
investigate.js	`node ~/system/tools/investigate.js report <id>`	Full formatted investigation report
investigate.js	`node ~/system/tools/investigate.js save-findings <id> <source> <json>`	Save agent findings (internal — used by orchestrator)
investigate.js	`node ~/system/tools/investigate.js complete <id>`	Mark investigation as complete

Architecture: 4 parallel investigator agents + 1 synthesizer:

Phone Lookup — phone directories, carrier, business listings
Social Media — LinkedIn, Facebook, Instagram, GitHub, Twitter/X
Business Registry — BiH registar, OpenCorporates, Brønnøysund, court records
News & Public — klix.ba, avaz.ba, nrk.no, Google News, academic records
Synthesizer — deduplication, cross-reference, confidence upgrade, profile building

Confidence levels: verified (2+ sources), likely (1 reliable), possible (indirect), unverified (uncertain)
Phone parser: Auto-detects BiH (06x→+387) and Norwegian (4x/9x→+47) numbers
DB: ~/system/databases/investigations.db
Created: 2026-02-21
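The phone auto-detection rule above can be sketched as a normalizer. The exact digit counts are assumptions made for illustration (BiH mobile numbers as 06 followed by 7 or 8 digits, Norwegian numbers as 8 digits starting with 4 or 9), as is the function name. The parser in investigate.js may apply different rules.

```javascript
// Returns an E.164-style string, or null if the number is not recognized.
function normalizePhone(raw) {
  const trimmed = raw.trim();
  const digits = trimmed.replace(/\D/g, '');
  if (trimmed.startsWith('+')) return '+' + digits; // already international
  if (digits.startsWith('00')) return '+' + digits.slice(2); // 00 prefix form
  // BiH mobile: drop the leading 0 and prefix +387 (06x → +3876x).
  if (/^06\d{7,8}$/.test(digits)) return '+387' + digits.slice(1);
  // Norwegian mobile: 8 digits starting with 4 or 9, prefix +47.
  if (/^[49]\d{7}$/.test(digits)) return '+47' + digits;
  return null;
}
```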

Databases (~/system/databases/)

Database	Description
investigations.db	OSINT person investigations — use `investigate.js`
leads.db	Sales pipeline / Lead CRM — use `sales-pipeline.js`
invoices.db	Invoice tracking — use `invoice-generator.js`
contracts.db	Contract lifecycle management — use `contract-manager.js`
documents.db	Document storage & retention — use `document-store.js`
tickets.db	Support tickets with SLA — use `support-ticket.js`
teams.db	Cross-team coordination — use `team-coordinator.js`
strategy-tracker.db	Strategic goals
alem-directives.db	Alem's direct orders
projects.db	Project lifecycle (phases, milestones, metrics)
hivemind.db	Agent shared intelligence
facts.db	Critical facts with event-sourced history — use `facts.js`
drafts.db	Email draft approval workflow — use `drafts.js`
events.db	Event bus store — use `event-bus.js`
flywheel.db	RAG flywheel — interactions log + cache. Use `rag-router.js`
projects.json	Routing registry — use `route.js`
company-registry.json	Company information registry

Enforcement Hooks (~/.claude/hooks/)

Hook	Matcher	Description
security-guard.py	`.*` (all tools)	Blocks forbidden paths, dangerous commands, delete protection, business-critical doc enforcement
agent-protocol-enforcer.py	`Task`	CORE PROTOCOL enforcement for subagent spawning
gotcha-enforcer.py	`Write\|Edit\|NotebookEdit\|Bash`	Boot flag + MC active task enforcement
gate-pre-commit.py	`Bash`	Pre-commit validation
hallucination-detector.py	`Write\|Edit`	Phantom tools, phantom paths, wrong ports, phantom require/import detection
teammate-quality-gate.py	`TeammateIdle`	Quality gate for agent teammates — checks TODO/FIXME markers, syntax errors in recent files. Exit 2 = keep working

Global: All hooks apply to ALL agents (parent + subagents) via ~/.claude/settings.json.
ZAKON #1: AI without enforcement does not work. Hooks are the deterministic enforcement.

Design & Figma

Tool	Command	Description
figma-extract.js	`node ~/system/tools/figma-extract.js extract-tokens <file-key>`	Extract design tokens (colors, typography, effects) from Figma file
figma-extract.js	`node ~/system/tools/figma-extract.js extract-components <file-key>`	List components with metadata and variants
figma-extract.js	`node ~/system/tools/figma-extract.js frame-to-prompt <file-key> <node>`	Generate implementation prompt from Figma frame
figma-extract.js	`node ~/system/tools/figma-extract.js file-info <file-key>`	File metadata and pages
figma-to-react.js	`node ~/system/tools/figma-to-react.js <file-key> <node-id> --output Login.tsx`	Figma → React + Tailwind — generates production React TSX from Figma frame via REST API. Post-processing: Pass 1 token replacement (figma-token-map.json), Pass 2 component mapping (figma-component-map.json), Pass 3 icon resolution (Lucide). Flag: --no-post-process to skip.
figma-to-react.js	`node ~/system/tools/figma-to-react.js <file-key> <node-id> --component Name`	Custom component name (default: derived from frame name)
figma-to-react.js	`node ~/system/tools/figma-to-react.js <file-key> <node-id>`	Output to stdout (pipe to file or preview)
figma-validate.js	`node ~/system/tools/figma-validate.js compare <file-key> <node-id> <url> --output /tmp/validate/`	Visual validation tool — compare built page vs Figma design via pixel diff. Exit: 0=PASS 1=FAIL 2=ERROR. Enforces ZAKON 0.1
figma-validate.js	`node ~/system/tools/figma-validate.js compare ... --threshold 0.05 --viewport 1920x1080`	Custom threshold (default 0.1=10%) and viewport (default 375x812)
figma-token-sync.js	`node ~/system/tools/figma-token-sync.js <file-key> --output ./tokens/ --format all`	Figma Variables → Design Tokens — extracts Variables API → W3C DTCG JSON + Tailwind theme + CSS custom properties. Supports modes (light/dark).
figma-token-sync.js	`node ~/system/tools/figma-token-sync.js <file-key> --format tailwind --output ./tailwind-tokens.js`	Single format: tailwind, css, w3c, json, or all
figma-token-map.json	`~/system/config/figma-token-map.json`	Hex color → Tailwind token lookup table for figma-to-react.js Pass 1 (token replacement). Source: Bilko tailwind.config.ts
figma-component-map.json	`~/system/config/figma-component-map.json`	Figma component → shadcn/ui mapping + Lucide icon map for figma-to-react.js Pass 2-3 (component mapping, icon resolution)
figma-populate.js	`bun ~/system/tools/figma-populate.js <channel-id>`	Populate Figma with design tokens (colors, typography, spacing, radius, buttons) via WebSocket bridge
v0-generate.js	`node ~/system/tools/v0-generate.js generate "prompt"`	v0.dev Platform API wrapper — prompt → React+Tailwind code. Also generates optimized prompts for manual use.
v0-generate.js	`node ~/system/tools/v0-generate.js generate --brief Name --screen login --industry fintech --primary "#hex"`	Structured brief → optimized prompt
v0-generate.js	`node ~/system/tools/v0-generate.js prompt --brief Name --industry fintech`	Output prompt only (no API call) — for copy-paste into v0.dev or Google Stitch
v0-generate.js	`node ~/system/tools/v0-generate.js setup <api-key>`	Save v0.dev API key
design-to-code.js	`node ~/system/tools/design-to-code.js assemble --stitch-code <html> --assets-dir <dir> --target-page <tsx>`	Assemble Stitch HTML + Figma assets → Next.js TSX. Converts HTML→JSX, inline styles→Tailwind, integrates assets, optional logic preservation.
design-to-code.js	`node ~/system/tools/design-to-code.js assemble ... --preserve-logic`	Extract and keep business logic (useState, handlers) from existing page
MCP figma	`mcp__figma__*` (native Claude tools)	Figma MCP integration — direct Figma access from Claude

Config: ~/system/config/figma.json or FIGMA_TOKEN env var
v0 Config: ~/system/config/v0.json or V0_API_KEY env var
File key: From Figma URL (figma.com/design/<FILE-KEY>/...)
Node ID: From Figma URL (select frame, copy link) or use figma-extract.js list-nodes <file-key>
Figma bridge: WebSocket on port 3055 (bun). Channel ID from Figma Desktop → Plugins → Claude MCP Plugin.
External AI tools: v0.dev ($20/mo), Google Stitch (free: stitch.withgoogle.com), Figma Make (native), Codia AI (Figma plugin)
Design output: ~/system/design-output/
Created: 2026-02-12 (figma-extract), 2026-02-13 (figma-populate, v0-generate, /design-system skill), 2026-02-14 (figma-to-react, figma-validate, figma-token-sync)
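
The file key can also be pulled out of a Figma URL in a script. The following is a minimal sketch, assuming the URL has the figma.com/design/<FILE-KEY>/... shape noted above; `extract_file_key` is a hypothetical helper and not part of figma-extract.js, and the URL in the example is made up.

```python
from urllib.parse import urlparse


def extract_file_key(figma_url: str) -> str:
    """Return the path segment that follows /design/ in a Figma URL."""
    parts = [p for p in urlparse(figma_url).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "design":
        raise ValueError(f"not a figma.com/design/<FILE-KEY>/... URL: {figma_url}")
    return parts[1]


if __name__ == "__main__":
    # Made-up key and file name, for illustration only.
    print(extract_file_key("https://www.figma.com/design/AbC123xyz/Login-Screens"))
```

The returned value is what the `<file-key>` argument of the tools in the table expects.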

Browser Form Filling

Tool	Command	Description
form-filler.py	`python ~/system/tools/form-filler.py <url> <fields.json>`	Fill web forms from JSON config — visible browser (Alem sees), CAPTCHA pause, screenshot
form-filler.py	`python ~/system/tools/form-filler.py <url> <fields.json> --headless --submit`	Headless auto-fill + submit
form-filler.py	`python ~/system/tools/form-filler.py <url> <fields.json> --wait-for-captcha --submit`	Fill, pause for CAPTCHA, submit
form-filler.py	`python ~/system/tools/form-filler.py <url> <fields.json> --screenshot /tmp/out.png`	Fill + screenshot
form-filler.py	`python ~/system/tools/form-filler.py <url> <fields.json> --dry-run`	Print fields without browser

Pre-built configs: ~/system/tools/form-configs/

anthropic-startup.json — Anthropic Claude Startup Program ($25K-$100K)
aws-activate.json — AWS Activate Founders ($1K-$100K)
google-cloud-startups.json — Google Cloud for Startups ($2K-$200K)
microsoft-founders-hub.json — Microsoft Founders Hub ($1K-$150K)

JSON format: {"fields": [{"selector": "label=X", "value": "Y", "type": "text|select|checkbox|radio|date|click|file"}], "submit_selector": "button[type='submit']"}
Selectors: CSS (input[name='x']), text=, placeholder=, label=, role=, nth=N suffix
Requires: Python Playwright (pip install playwright)
Created: 2026-02-18
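
A config in this format can be sanity-checked before it is handed to form-filler.py. The sketch below is illustrative only: `validate_config` is a hypothetical helper, and which keys form-filler.py itself requires is an assumption based on the JSON format line above.

```python
import json

# Field types listed in the JSON format note above.
ALLOWED_TYPES = {"text", "select", "checkbox", "radio", "date", "click", "file"}


def validate_config(config: dict) -> list[str]:
    """Return a list of problems found in a form-filler style config."""
    fields = config.get("fields")
    if not isinstance(fields, list) or not fields:
        return ["'fields' must be a non-empty list"]
    problems = []
    for i, field in enumerate(fields):
        if not isinstance(field, dict):
            problems.append(f"fields[{i}]: must be an object")
            continue
        if not field.get("selector"):
            problems.append(f"fields[{i}]: missing selector")
        if field.get("type") not in ALLOWED_TYPES:
            problems.append(f"fields[{i}]: unknown type {field.get('type')!r}")
    return problems


if __name__ == "__main__":
    # Example values are invented for illustration.
    example = {
        "fields": [
            {"selector": "label=Company name", "value": "Example AS", "type": "text"},
            {"selector": "input[name='terms']", "value": "true", "type": "checkbox"},
        ],
        "submit_selector": "button[type='submit']",
    }
    print(json.dumps(example, indent=2))
    print(validate_config(example) or "config looks well-formed")
```

Running the real tool with --dry-run remains the authoritative check; this only catches structural mistakes early.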

Archived (DO NOT EXIST; for reference only)

Tool	Status	Note
~~session-save.sh~~	REMOVED (2026-02-07)	Orphaned code, never hooked, conflicts with session-ledger.sh
~~memory-lookup.js~~	REMOVED	Replaced by HiveMind
~~memory-search.js~~	REMOVED	Replaced by HiveMind
~~mail.js~~	NEVER EXISTED	Hallucinated
~~mail-filter.js~~	NEVER EXISTED	Hallucinated
~~security.js~~	NEVER EXISTED	Hallucinated; the real enforcement is ~/.claude/hooks/
~~secure-config.js~~	NEVER EXISTED	Hallucinated
~~keychain-helper.js~~	NEVER EXISTED	Hallucinated
~~design-enforcer.js~~	NEVER EXISTED	Hallucinated
~~optimize-images.js~~	NEVER EXISTED	Hallucinated
~~strategy-tracker.js~~	NEVER EXISTED	Hallucinated
~~deploy-strategy-tracker.js~~	NEVER EXISTED	Hallucinated
~~prompt-tester.js~~	NEVER EXISTED	Hallucinated
~~self-improve.js~~	NEVER EXISTED	Hallucinated
~~send-to-edita.js~~	NEVER EXISTED	Hallucinated
~~generate-boot.js~~	NEVER EXISTED	Hallucinated
~~generate-today.js~~	NEVER EXISTED	Hallucinated
~~solution-finder.js~~	NEVER EXISTED	Hallucinated
~~docusign.js~~	NEVER EXISTED	Hallucinated
~~validator.js~~	ARCHIVED (2026-02-06)	Was orphaned — see ~/system/archive/
~~laws-enforcer.js~~	ARCHIVED (2026-02-06)	Was checker-only — see ~/system/archive/
~~email-smtp-imap-mcp~~	DEPRECATED (2026-02-11)	Community MCP server — unreliable, replaced by custom email-mcp-bridge.js
~~mcp-email-server (ai-zerolab)~~	TESTED (2026-02-11)	Python MCP — ClosedResourceError bug, not used

brand-package.js

Purpose: Generate brand package (guidelines, colors, typography) for company factory pipeline
Location: ~/system/tools/brand-package.js
Usage: node ~/system/tools/brand-package.js "ProjectName" --logo /path/to/logo.png [--colors "primary:#hex,secondary:#hex"] [--output /path/]
Dependencies: None (pure Node.js)
Output: Creates brand-guidelines.md, colors.json, typography.json
Features: Extracts colors from PNG logo, supports color overrides, generates complete brand identity
Created: 2026-02-09

Go-Live Runbook

Project: {{PROJECT_NAME}}
Version: {{VERSION}}
Date: {{DATE}}
Author: {{AUTHOR}}
Status: Draft | In Review | Approved
Reviewers: {{REVIEWERS}}

Document History

Version	Date	Author	Changes
0.1	{{DATE}}	{{AUTHOR}}	Initial draft

1. Go-Live Overview

What: {{PROJECT_NAME}} v{{VERSION}} production launch
When: {{LAUNCH_DATE}} at {{LAUNCH_TIME}} {{TIMEZONE}}
Deployment window: {{WINDOW_START}} – {{WINDOW_END}} ({{WINDOW_DURATION}}h window)
Go-Live Type: {{TYPE}}

Incident Commander: {{IC}} (primary), {{IC_BACKUP}} (backup)
Technical Lead: {{TECH_LEAD}}
Communications Lead: {{COMMS_LEAD}}
War Room: {{WAR_ROOM_LINK}}
Status Page: {{STATUS_PAGE_URL}}

2. Pre-Launch Checklist

T-7 Days: Infrastructure Verification

All production infrastructure provisioned and tested
Load balancer health checks passing for all instances
Auto-scaling groups configured and tested (scale-up + scale-down)
Database replicas in sync and replication lag < {{REPLICATION_LAG}}s
Backup jobs running successfully (last backup verified: {{VERIFY_DATE}})
CDN configured and serving assets correctly
All IAM roles and permissions verified
Infrastructure monitoring dashboards showing green
Estimated cost reviewed and within budget

Owner: {{INFRA_OWNER}} | Due: T-7 days

T-5 Days: DNS Configuration

DNS records created/updated in {{DNS_PROVIDER}}
- {{DOMAIN}} → Load balancer (TTL set to {{LOW_TTL}} for easy rollback)
- api.{{DOMAIN}} → API load balancer
- www.{{DOMAIN}} → Redirect to {{DOMAIN}}
DNS propagation verified (check from multiple regions)
DNS failover routing configured (if applicable)
Old DNS records documented (for rollback reference)

Owner: {{DNS_OWNER}} | Due: T-5 days

T-5 Days: SSL Certificates

TLS certificates provisioned for all domains
- {{DOMAIN}} ✅
- *.{{DOMAIN}} ✅
Certificate expiry > 90 days from go-live date
HTTPS redirect configured (HTTP → HTTPS)
HSTS header configured
SSL Labs test: Grade A or better ({{SSL_TEST_LINK}})

Owner: {{SSL_OWNER}} | Due: T-5 days

T-3 Days: CDN Configuration

CDN distribution pointing to production origin
Cache behaviors configured per specification
Static asset cache headers correct (1yr for fingerprinted assets)
CDN WAF rules enabled and tested
CDN purge command tested and documented
CDN performance verified from target geographies

Owner: {{CDN_OWNER}} | Due: T-3 days

T-3 Days: Database Migration

Final migration scripts reviewed and approved
Migration tested on staging with production-sized data (timing recorded: {{MIGRATION_TIME}}min)
Rollback/down migration tested
Migration script idempotent (safe to run twice)
Database backup taken immediately before migration window
Data integrity checks script prepared (scripts/verify-migration.sh)

Owner: {{DB_OWNER}} | Due: T-3 days

T-2 Days: Feature Flags

All new features behind feature flags
Feature flags defaulting to OFF in production
Flag rollout plan documented (which flags, in what order, with what criteria)
Kill switch flags configured (disable any feature immediately if needed)

Owner: {{FF_OWNER}} | Due: T-2 days

T-2 Days: Third-Party Integrations

{{INTEGRATION_1}} — live API keys configured in secrets manager
{{INTEGRATION_2}} — live API keys configured in secrets manager
Payment gateway: live mode activated and tested with real card (refunded)
Email service: sending domain authenticated (SPF, DKIM, DMARC)
All integrations tested in production with smoke tests
Webhook URLs updated to production endpoints

Owner: {{INTEGRATION_OWNER}} | Due: T-2 days

T-1 Day: Monitoring & Alerting

All alert rules deployed to production monitoring
Alert routing configured — PagerDuty / on-call active
Dashboards showing production data
Log aggregation capturing production logs
Distributed tracing enabled
Synthetic monitoring configured (uptime checks every 1 min)
Alert test fired and received by on-call

Owner: {{MONITORING_OWNER}} | Due: T-1 day

T-1 Day: Backup Verification

Production backup job running on schedule
Last backup restored to test environment and verified
Backup storage has sufficient capacity (> {{BACKUP_DAYS}} days)
Point-in-time recovery tested

Owner: {{BACKUP_OWNER}} | Due: T-1 day

T-1 Day: Legal / Compliance Sign-off

Privacy policy published and linked
Terms of service published and linked
Cookie consent banner implemented (if required by jurisdiction)
GDPR data processing inventory updated
Security assessment completed and any findings resolved or accepted
Legal sign-off obtained: {{LEGAL_SIGNOFF}} on {{DATE}}

Owner: {{LEGAL_OWNER}} | Due: T-1 day

T-0: Pre-Launch Final Checks (Within 2 Hours of Launch)

Staging smoke tests passing (last run: {{TIMESTAMP}})
All engineers briefed and available
War room open and all participants joined
Rollback procedure rehearsed mentally
Monitoring dashboards open
Status page updated: "Scheduled maintenance: {{TIME}} - {{END_TIME}}"
Customer support briefed on launch features and potential issues
Deployment script / CI pipeline ready to trigger

3. Launch Day Procedure (Hour by Hour)

H-0: Deployment Start

Time	Action	Owner	Notes
H+0:00	Announce in war room: "Deployment started"	{{IC}}
H+0:00	Take final pre-deploy database backup	{{DB_OWNER}}
H+0:05	Enable maintenance mode (if applicable)	{{DEPLOY_OWNER}}
H+0:10	Trigger production deployment pipeline	{{DEPLOY_OWNER}}	Pipeline: {{PIPELINE_LINK}}
H+0:15	Monitor deployment progress	{{TECH_LEAD}}

H+0:15 → H+0:45: Database Migration Execution

Time	Action	Owner
H+0:15	Confirm deployment artifact ready	{{DEPLOY_OWNER}}
H+0:20	Run database migrations: `bash scripts/migrate-prod.sh`	{{DB_OWNER}}
H+0:25	Verify migration completed: `bash scripts/verify-migration.sh`	{{DB_OWNER}}
H+0:30	Confirm new application instances healthy	{{TECH_LEAD}}
H+0:40	Deploy new application version to all instances	{{DEPLOY_OWNER}}

H+0:45 → H+1:00: DNS Cutover

Time	Action	Owner
H+0:45	Point DNS to production load balancer	{{DNS_OWNER}}
H+0:50	Monitor DNS propagation	{{DNS_OWNER}}
H+0:55	Confirm HTTPS working from external network	{{TECH_LEAD}}
H+1:00	Disable maintenance mode	{{DEPLOY_OWNER}}

H+1:00 → H+1:30: Smoke Tests

Time	Action	Owner
H+1:00	Run automated smoke tests: `bash scripts/smoke-tests.sh production`	{{QA_OWNER}}
H+1:10	Manual smoke test — critical user journey 1	{{QA_OWNER}}
H+1:15	Manual smoke test — critical user journey 2	{{QA_OWNER}}
H+1:20	Verify payment processing (test transaction)	{{QA_OWNER}}
H+1:25	Verify email delivery (test email)	{{QA_OWNER}}
H+1:30	All smoke tests PASS → proceed to monitoring	{{IC}}

H+1:30 → H+2:00: Monitoring Verification

Time	Action	Owner
H+1:30	Verify error rate < {{ERROR_THRESHOLD}}%	{{TECH_LEAD}}
H+1:35	Verify P99 latency < {{P99_THRESHOLD}}ms	{{TECH_LEAD}}
H+1:40	Verify no unexpected spikes in DB CPU/connections	{{DB_OWNER}}
H+1:50	Begin enabling feature flags (per rollout plan)	{{FF_OWNER}}
H+2:00	Declare go-live successful	{{IC}}

4. Post-Launch Monitoring (T+1 to T+7)

Enhanced Monitoring Period

Duration: {{POST_LAUNCH_MONITORING}}h enhanced monitoring
Monitoring cadence: Every 30 min for the first 4h, then hourly until H+24, then standard monitoring

Period	Check Frequency	Responsible
H+0 to H+4	Every 30 min	On-call engineer
H+4 to H+24	Every 60 min	On-call engineer
Day 2-7	Standard monitoring	On-call rotation
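
The cadence in the table above can be expanded into a concrete list of check times for the on-call engineer. This is a minimal sketch; the function names are hypothetical, and treating each period as starting at its lower bound (H+0:00, H+4:00) is an assumption.

```python
def enhanced_check_offsets() -> list[int]:
    """Minutes after go-live (H+0) at which a manual check is due.

    Every 30 min from H+0 up to H+4, then every 60 min up to H+24.
    """
    first_phase = list(range(0, 4 * 60, 30))         # H+0:00 ... H+3:30
    second_phase = list(range(4 * 60, 24 * 60, 60))  # H+4:00 ... H+23:00
    return first_phase + second_phase


def format_offset(minutes: int) -> str:
    """Render an offset in the H+h:mm style used in this runbook."""
    return f"H+{minutes // 60}:{minutes % 60:02d}"


if __name__ == "__main__":
    for offset in enhanced_check_offsets():
        print(format_offset(offset))
```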

Metrics to watch during enhanced monitoring:

Error rate (target: < {{ERROR_THRESHOLD}}%)
P99 latency (target: < {{P99_THRESHOLD}}ms)
DB connection pool utilization (target: < {{DB_POOL}}%)
Cache hit rate (target: > {{CACHE_HIT}}%)
Memory trend (should be stable, not growing)

Support Escalation Procedures

Issue Type	First Contact	Escalation
User-facing errors	Customer support → Engineering	On-call engineer
Performance degradation	On-call engineer	Tech lead + Eng manager
Data issues	On-call engineer	DB owner + Engineering lead
Security concern	Security contact → CISO	Immediate escalation

Performance Baseline Comparison

Compare post-launch metrics to pre-launch staging baseline:

Metric	Staging Baseline	Production Actual	Delta	Status
P95 latency	{{STG_P95}}ms	TBD	TBD	TBD
Error rate	{{STG_ERR}}%	TBD	TBD	TBD
Throughput	{{STG_RPS}} rps	TBD	TBD	TBD

5. Rollback Triggers & Procedure

Rollback Decision Criteria

Automatic rollback triggers:

Smoke tests fail after deployment
Error rate > {{ROLLBACK_ERROR_RATE}}% for {{ROLLBACK_DURATION}} consecutive minutes
Database migration causes data integrity issues

Manual rollback triggers (decision by {{ROLLBACK_AUTHORITY}}):

P99 latency > {{ROLLBACK_P99}}ms sustained for {{ROLLBACK_LATENCY_DURATION}} min
Critical feature broken with no quick fix available
Security vulnerability discovered in new release
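
The "error rate above a threshold for N consecutive minutes" trigger can be expressed as a small check over per-minute samples. The sketch below is one possible reading, assuming one error-rate sample per minute and that only the most recent window matters; `should_auto_rollback` and its parameters are hypothetical names standing in for {{ROLLBACK_ERROR_RATE}} and {{ROLLBACK_DURATION}}.

```python
def should_auto_rollback(error_rates: list[float], threshold_pct: float, duration_min: int) -> bool:
    """True if the most recent `duration_min` per-minute samples all exceed the threshold."""
    if duration_min <= 0 or len(error_rates) < duration_min:
        return False
    return all(rate > threshold_pct for rate in error_rates[-duration_min:])


if __name__ == "__main__":
    # Invented samples: error rate in percent, one value per minute, oldest first.
    samples = [0.5, 6.0, 7.2, 5.1]
    print(should_auto_rollback(samples, threshold_pct=5.0, duration_min=3))
```

A single sample at or below the threshold inside the window resets the decision, which keeps a brief spike from triggering a rollback.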

Rollback Procedure (Quick Reference)

Announce in war room: "Initiating rollback"
Update status page: "We are investigating an issue and may revert recent changes"
Run: bash scripts/rollback.sh production (or trigger CI pipeline rollback)
Monitor health checks — confirm previous version healthy
If a DB migration was included, run the down migration: bash scripts/migrate-down.sh production
Verify all smoke tests pass on previous version
Update status page: "Issue resolved, system restored"
Notify stakeholders

Full rollback procedure: See rollback-plan.md

6. Communication Plan

Pre-Launch Communications

Audience	Channel	When	Message
Internal team	Slack #launches	T-3 days	Launch schedule and plan
Customer support	Briefing doc + Slack	T-2 days	Features, FAQ, escalation path
Existing users	Email / in-app banner	T-1 day	"Exciting updates coming"
Status page subscribers	Status page	T-4 hours	Scheduled maintenance notification

Launch Day Communications

Audience	Channel	When	Message
Status page	status page	T-0	"Scheduled deployment in progress"
Internal	Slack #launches	At success	"🚀 {{PROJECT_NAME}} is live!"
Users	Email / in-app	H+1 after success	Launch announcement
Status page	status page	H+1	"Deployment complete — all systems normal"

7. Stakeholder Notification Timeline

Milestone	Notify	Channel	Owner
Deployment started	Engineering team	Slack war room	{{IC}}
Smoke tests pass	Engineering + Product	Slack	{{IC}}
Go-live declared	All stakeholders	Email + Slack	{{COMMS_LEAD}}
Rollback initiated	All stakeholders + Management	Immediate call + Slack	{{IC}}

Approval

Role	Name	Date	Signature
Author
Reviewer
Approver

Operational Runbook

Project: {{PROJECT_NAME}}
Version: {{VERSION}}
Date: {{DATE}}
Author: {{AUTHOR}}
Status: Draft | In Review | Approved
Reviewers: {{REVIEWERS}}

Document History

Version	Date	Author	Changes
0.1	{{DATE}}	{{AUTHOR}}	Initial draft

1. Service Overview

Service: {{PROJECT_NAME}}
Purpose: {{SERVICE_PURPOSE}}
Technology stack: {{STACK}}
Architecture reference: Deployment Architecture

Service URLs:

Environment	URL	Health Check
Production	`{{PROD_URL}}`	`{{PROD_URL}}/health`
Staging	`{{STG_URL}}`	`{{STG_URL}}/health`

Key dashboards:

System overview: {{DASHBOARD_LINK}}
Service metrics: {{SERVICE_DASHBOARD_LINK}}
Logs: {{LOG_DASHBOARD_LINK}}

2. Common Operational Tasks

2.1 Service Restart Procedure

When to use: Application unresponsive, hanging workers, suspected deadlock

Steps:

Option A — Rolling restart (no downtime):

# AWS ECS
aws ecs update-service --cluster {{CLUSTER}} --service {{SERVICE}} --force-new-deployment

# Kubernetes
kubectl rollout restart deployment/{{DEPLOYMENT}} -n {{NAMESPACE}}

Option B — Emergency restart (brief downtime, use only if rolling restart fails):

# Stop all instances
{{STOP_COMMAND}}
# Wait for drain
sleep 30
# Start fresh
{{START_COMMAND}}

Verify:

# Check all instances healthy
{{HEALTH_CHECK_COMMAND}}
# Check for errors post-restart
{{LOG_CHECK_COMMAND}}

Expected restart time: {{RESTART_TIME}} minutes
Alert expected: Service restart will trigger a deployment alert; acknowledge it in PagerDuty

2.2 Log Retrieval & Analysis

Centralized logs: {{LOG_URL}}

Quick log retrieval:

# Error lines from the last hour
{{LOG_TOOL}} --filter "level=error" --since "1h" --service {{SERVICE}}

# Logs for a specific user
{{LOG_TOOL}} --filter "user_id={{USER_ID}}" --since "24h"

# Logs for a specific request
{{LOG_TOOL}} --filter "request_id={{REQUEST_ID}}"

# Database slow query logs
{{DB_LOG_COMMAND}}

Log format reference: See Monitoring & Observability

2.3 Database Maintenance

Connection count check:

SELECT count(*) as connections, state FROM pg_stat_activity GROUP BY state;

Kill idle connections:

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
  AND state_change < now() - interval '5 minutes'
  AND pid <> pg_backend_pid();

Running queries (detect long-running):

-- pg_stat_activity has no duration column; derive it from query_start
SELECT pid, now() - query_start AS duration, query, state
FROM pg_stat_activity
WHERE now() - query_start > interval '1 minute'
  AND state != 'idle';

Vacuum / analyze (if table bloat suspected):

VACUUM ANALYZE {{TABLE_NAME}};

Check replication lag:

SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;

2.4 Cache Clearing / Warming

Clear all cache (use with caution — may spike DB load):

{{CACHE_FLUSH_COMMAND}}

Clear specific key pattern:

{{CACHE_DELETE_PATTERN_COMMAND}}

Check cache hit rate:

{{CACHE_STATS_COMMAND}}

Warm cache after clearing:

# Run cache warming script
bash scripts/warm-cache.sh {{ENVIRONMENT}}
# Or trigger warming job
{{WARM_CACHE_JOB_COMMAND}}

Expected DB load spike after cache clear: {{CACHE_CLEAR_IMPACT}} minutes of elevated load

2.5 Certificate Renewal

Automated renewal: Configured via {{CERT_TOOL}} (Let's Encrypt / ACM)
Auto-renewal trigger: 30 days before expiry

Manual renewal (if auto-renewal fails):

# Check expiry
echo | openssl s_client -connect {{DOMAIN}}:443 2>/dev/null | openssl x509 -noout -dates

# Manual renewal
{{CERT_RENEW_COMMAND}}

# Verify
{{CERT_VERIFY_COMMAND}}

Verify renewal alert is working:

Alert configured: "Certificate expiring in < 30 days" → {{ALERT_CHANNEL}}
Test certificate: curl -I https://{{DOMAIN}} and check Strict-Transport-Security header
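
The 30-day window can be checked against the notAfter line printed by the openssl command above. The following is a sketch under stated assumptions: `days_until_expiry` and `needs_renewal` are hypothetical helpers, and the timestamp is assumed to be in the "Mon DD HH:MM:SS YYYY GMT" form that openssl prints.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` and an openssl-style notAfter timestamp."""
    timestamp = not_after.strip().removeprefix("notAfter=")
    expiry_epoch = ssl.cert_time_to_seconds(timestamp)
    expiry = datetime.fromtimestamp(expiry_epoch, tz=timezone.utc)
    return (expiry - now).days


def needs_renewal(not_after: str, now: datetime, window_days: int = 30) -> bool:
    """True once the certificate is inside the renewal window."""
    return days_until_expiry(not_after, now) < window_days


if __name__ == "__main__":
    # Invented expiry date, for illustration only.
    line = "notAfter=Jan  1 00:00:00 2030 GMT"
    print(days_until_expiry(line, datetime.now(timezone.utc)))
```

Passing `now` explicitly keeps the check easy to test and avoids mixing naive and timezone-aware datetimes.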

2.6 Scaling Up / Down

Scale up (increase capacity):

# AWS ECS
aws ecs update-service --cluster {{CLUSTER}} --service {{SERVICE}} --desired-count {{COUNT}}

# Kubernetes
kubectl scale deployment/{{DEPLOYMENT}} --replicas={{COUNT}} -n {{NAMESPACE}}

Verify scale-out:

# Check instance count
{{INSTANCE_COUNT_COMMAND}}
# Confirm health
{{HEALTH_CHECK_COMMAND}}

Scale down (reduce capacity — use cautiously):

Do NOT scale below {{MIN_INSTANCES}} instances
Scale down during off-peak hours only ({{OFF_PEAK_HOURS}})
Monitor for 10 minutes after scaling down to confirm stability
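
The first two scale-down rules lend themselves to a pre-flight guard in whatever script issues the scale command. This is a minimal sketch; the function names are hypothetical, and representing {{OFF_PEAK_HOURS}} as a start and end hour (0-23, possibly wrapping past midnight) is an assumption.

```python
def in_off_peak(hour: int, start_hour: int, end_hour: int) -> bool:
    """True if `hour` is in [start_hour, end_hour), allowing a window that wraps past midnight."""
    if start_hour <= end_hour:
        return start_hour <= hour < end_hour
    return hour >= start_hour or hour < end_hour


def can_scale_down(target: int, min_instances: int, hour: int, start_hour: int, end_hour: int) -> bool:
    """Allow a scale-down only above the instance floor and inside off-peak hours."""
    return target >= min_instances and in_off_peak(hour, start_hour, end_hour)


if __name__ == "__main__":
    # Invented values: floor of 2 instances, off-peak window 22:00-06:00.
    print(can_scale_down(target=3, min_instances=2, hour=23, start_hour=22, end_hour=6))
```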

3. Troubleshooting Playbooks

3.1 High CPU Usage

Symptoms: CPU alert fires, slow responses, possible OOM

Identify the source:

# Top processes by CPU
{{CPU_TOP_COMMAND}}

Check for: runaway loops, large queries being processed, missing cache causing recalculation
Check for recently deployed code — did CPU spike after a deploy? → Consider rollback
Check queue depth — backed-up job queue causes worker CPU spike
If single instance: restart that instance ({{RESTART_SINGLE_COMMAND}})
If all instances: scale up immediately, then investigate root cause
Escalate if: CPU > {{CPU_ESCALATE}}% for > {{ESCALATE_DURATION}} min after scaling

3.2 Memory Leaks

Symptoms: Slowly increasing memory, eventual OOM kill / restart loop

Check memory trend in monitoring dashboard — linear increase over hours = leak
Identify the leak:
- Enable heap dump: {{HEAP_DUMP_COMMAND}}
- Profile with: {{PROFILER}}
Short-term mitigation: Schedule rolling restarts every {{RESTART_INTERVAL}}h
```
{{SCHEDULED_RESTART_COMMAND}}
```
Create ticket with heap dump attached — requires developer investigation
Escalate if: Restart cycle < {{MIN_RESTART_INTERVAL}}h (memory fills too fast)

3.3 Slow Database Queries

Symptoms: High P99 latency, DB CPU spike, timeouts in logs

Find slow queries:

SELECT query, calls, mean_exec_time, max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

Check for missing indexes: Look for sequential scans on large tables

Check for blocking queries:

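-- Alias the columns so blocker and blocked sessions are distinguishable in the output
SELECT blocking.pid AS blocking_pid, blocking.query AS blocking_query,
       blocked.pid AS blocked_pid, blocked.query AS blocked_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking ON blocking.pid = ANY(pg_blocking_pids(blocked.pid));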

Kill blocking query if safe:

SELECT pg_cancel_backend({{PID}});
-- If cancel doesn't work:
SELECT pg_terminate_backend({{PID}});

Create ticket — developer must optimize the query

3.4 Service Connectivity Issues

Symptoms: Connectivity errors between services, 502/503 errors

Check health endpoints:
```
curl -I {{SERVICE_URL}}/health
```
Check network security groups / firewall rules — was anything changed recently?
Check service discovery — DNS resolving correctly?
```
nslookup {{SERVICE_INTERNAL_DNS}}
```
Check if service is running:
```
{{SERVICE_STATUS_COMMAND}}
```
Check logs for connection errors:
```
{{CONNECTIVITY_LOG_COMMAND}}
```

3.5 High Error Rates

Symptoms: Error rate alert, user complaints, 5xx in logs

Identify error type: {{LOG_ERROR_COMMAND}} — what errors, what services, what endpoints?
Check if correlated with: recent deployment, external service outage, traffic spike
Check external service status pages:
- {{SERVICE_1}} status: {{STATUS_PAGE_1}}
- {{SERVICE_2}} status: {{STATUS_PAGE_2}}
If recent deployment: Consider rollback if errors affecting > {{ROLLBACK_ERROR_THRESHOLD}}% of requests
If external service down: Check circuit breaker status, enable fallback
Escalate if: Error rate > {{ESCALATE_ERROR_RATE}}% for > {{ESCALATE_DURATION}} min

3.6 Disk Space Issues

Symptoms: Disk space alert, application errors writing files

Check disk usage:

df -h
du -sh /var/log/* | sort -rh | head -10

Quick wins:

# Rotate and compress logs
logrotate -f /etc/logrotate.conf
# Clear old Docker images
docker image prune -a --filter "until=24h"
# Clear /tmp
find /tmp -mtime +7 -delete

If database disk: Check for table bloat, dead tuples, WAL accumulation
```
SELECT pg_size_pretty(pg_database_size('{{DB_NAME}}'));
```
Escalate if: Disk > {{DISK_ESCALATE}}% and cannot free space quickly
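
The escalation rule can be evaluated in a script instead of reading df output by eye. The sketch below uses only the Python standard library; `percent_used` and `disk_status` are hypothetical helpers, and the threshold argument stands in for {{DISK_ESCALATE}}. The percentage may differ slightly from what df reports, since df accounts for reserved blocks.

```python
import shutil


def percent_used(total_bytes: int, used_bytes: int) -> float:
    """Used space as a percentage of total space."""
    return 100.0 * used_bytes / total_bytes


def disk_status(path: str, escalate_pct: float) -> str:
    """Summarize usage for the filesystem holding `path` against the escalation threshold."""
    usage = shutil.disk_usage(path)
    pct = percent_used(usage.total, usage.used)
    level = "ESCALATE" if pct > escalate_pct else "ok"
    return f"{path}: {pct:.1f}% used ({level})"


if __name__ == "__main__":
    # 90.0 is an example threshold, not a value taken from this runbook's config.
    print(disk_status("/", escalate_pct=90.0))
```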

4. Health Check Endpoints

Endpoint	Method	Expected Response	What It Checks
`{{BASE_URL}}/health`	GET	HTTP 200 `{"status":"ok"}`	Application running
`{{BASE_URL}}/health/ready`	GET	HTTP 200 `{"status":"ready"}`	App + DB + Cache connected
`{{BASE_URL}}/health/live`	GET	HTTP 200 `{"status":"alive"}`	App process alive
`{{BASE_URL}}/health/db`	GET	HTTP 200 `{"status":"ok","latency_ms":X}`	Database reachable
`{{BASE_URL}}/health/cache`	GET	HTTP 200 `{"status":"ok"}`	Redis reachable

Health check from load balancer: {{HEALTH_CHECK_PATH}} every {{LB_INTERVAL}}s
Unhealthy threshold: {{UNHEALTHY_COUNT}} consecutive failures
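
To make the unhealthy threshold concrete: a target is taken out of rotation only after the configured number of failures in a row, and any success resets the count. The sketch below models that rule over a list of probe results; `first_unhealthy_index` is a hypothetical name, and real load balancers may add further conditions such as timeouts or a separate healthy threshold.

```python
from typing import Optional


def first_unhealthy_index(probe_results: list[bool], unhealthy_count: int) -> Optional[int]:
    """Index of the probe at which the target would be marked unhealthy, or None."""
    streak = 0
    for i, ok in enumerate(probe_results):
        streak = 0 if ok else streak + 1  # a success resets the failure streak
        if streak >= unhealthy_count:
            return i
    return None


if __name__ == "__main__":
    # Invented probe history: True = health check passed, False = failed.
    history = [True, False, False, True, False, False, False]
    print(first_unhealthy_index(history, unhealthy_count=3))
```

With an interval of {{LB_INTERVAL}}s, the time to detection is roughly the returned index plus one, multiplied by the interval.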

5. Alert Response Procedures

Alert	Immediate Action	Runbook Section
`HighErrorRate`	Check logs, identify error type, assess scope	3.5 High Error Rates
`SlowP99`	Check DB slow queries, recent deploys	3.3 Slow DB Queries
`ServiceDown`	Restart service, check logs	2.1 Service Restart
`HighCPU`	Scale up, identify source	3.1 High CPU
`DiskAlmostFull`	Clear logs/tmp, escalate if > 90%	3.6 Disk Space
`DBReplicationLag`	Check replication, network, disk on replica	2.3 Database Maintenance
`CertificateExpiring`	Trigger manual renewal	2.5 Certificate Renewal

6. Escalation Matrix

Situation	First Contact	Escalation	Ultimate Escalation
Service down	On-call engineer	Tech lead	Engineering manager
Data loss / corruption	On-call + Tech lead	CTO	CTO
Security incident	Security contact	CISO	CEO
Payment system down	On-call + Payment owner	Stripe/payment provider support	Engineering manager

Emergency contacts:

Role	Name	Phone	Slack
On-call (primary)	{{PRIMARY}}	{{PHONE}}	{{SLACK}}
On-call (backup)	{{BACKUP}}	{{PHONE}}	{{SLACK}}
Tech Lead	{{TECH_LEAD}}	{{PHONE}}	{{SLACK}}
Engineering Manager	{{ENG_MGR}}	{{PHONE}}	{{SLACK}}

7. On-Call Handoff Procedure

Handoff cadence: {{HANDOFF_CADENCE}}
Handoff time: {{HANDOFF_TIME}}

Outgoing on-call must document:

Any open incidents or ongoing issues
Any monitoring anomalies (elevated error rates, slow queries not yet resolved)
Any upcoming events that may affect the system (marketing campaigns, scheduled maintenance)
Any temporary mitigations in place that need permanent fixes
Context on any unusual alerts that fired and were noise

Handoff document template: {{HANDOFF_TEMPLATE_LINK}}

8. Maintenance Window Procedure

Maintenance window schedule: {{MAINTENANCE_WINDOW}} (lowest traffic period)

Pre-maintenance:

Announce in Slack #ops: "Maintenance window {{DATE}} {{TIME}}-{{END_TIME}}"
Update status page: "Scheduled maintenance" with details
Notify impacted customers if downtime expected > {{DOWNTIME_NOTIFY_THRESHOLD}} minutes
Confirm rollback plan is ready

During maintenance:

Enable maintenance mode (if applicable): {{MAINTENANCE_MODE_CMD}}
Execute maintenance tasks per the specific runbook for the task
Run smoke tests after each major step
Document every action taken with timestamps

Post-maintenance:

Disable maintenance mode: {{DISABLE_MAINTENANCE_CMD}}
Run full smoke test suite
Monitor for 30 minutes
Update status page: "Maintenance complete, all systems normal"
Post-maintenance report in #ops Slack channel

Approval

Role	Name	Date	Signature
Author
Reviewer
Approver

Incident Report

Project: {{PROJECT_NAME}}
Version: {{VERSION}}
Date: {{DATE}}
Author: {{AUTHOR}}
Status: Draft | In Review | Approved
Reviewers: {{REVIEWERS}}

Document History

Version	Date	Author	Changes
0.1	{{DATE}}	{{AUTHOR}}	Initial draft

1. Incident Metadata

Field	Value
Incident ID	INC-{{YYYY}}-{{SEQ}}
Severity	P{{SEVERITY}}
Status	{{STATUS}}
Incident Commander	{{IC}}
Technical Lead	{{TECH_LEAD}}
Communications Lead	{{COMMS_LEAD}}
Declared at	{{START_TIME}} {{TIMEZONE}}
Resolved at	{{END_TIME}} {{TIMEZONE}}
Total duration	{{DURATION}}
Affected service(s)	{{SERVICES}}
Environment	Production / Staging

2. Executive Summary

Example: "On {{DATE}}, a database connection pool exhaustion caused the {{SERVICE}} API to return 503 errors for approximately 47 minutes, affecting {{AFFECTED_COUNT}} users and resulting in an estimated {{REVENUE_IMPACT}} in lost transactions. The root cause was a code change in the v{{VERSION}} deployment that introduced N+1 queries under high load."

3. Detection

Detected by: {{DETECTION_METHOD}}
Detected at: {{DETECTION_TIME}}
Lag from start to detection: {{DETECTION_LAG}} minutes
Detecting system: {{DETECTING_SYSTEM}}

Alerting effectiveness:

Alert fired within the expected window (< {{ALERT_SLA}} minutes)
Alert delivered to on-call without delay
Alert contained sufficient context to begin investigation

Improvements to detection identified:

{{DETECTION_IMPROVEMENT_1}}

4. Detailed Timeline

Timezone: All times in {{TIMEZONE}}

Time	Event	Actor	Notes
{{TIME}}	{{EVENT_1}}	{{ACTOR}}
{{TIME}}	{{EVENT_2}}	System	Alert ID: {{ALERT_ID}}
{{TIME}}	{{EVENT_3}}	{{ENGINEER}}
{{TIME}}	{{EVENT_4}}	{{IC}}
{{TIME}}	{{EVENT_5}}	{{ENGINEER}}
{{TIME}}	{{EVENT_6}}	{{ENGINEER}}
{{TIME}}	{{EVENT_7}}	System
{{TIME}}	{{EVENT_8}}	{{IC}}

5. Impact Assessment

Users Affected

Metric	Value
Total users affected	{{USER_COUNT}}
% of total user base	{{USER_PERCENT}}%
Geography affected	{{GEOGRAPHY}}
User tier affected	{{USER_TIER}}

Services Affected

Service	Impact Type	Severity	Duration
{{SERVICE_1}}	{{IMPACT_TYPE}}	{{SEV}}	{{DURATION}}
{{SERVICE_2}}	{{IMPACT_TYPE}}	{{SEV}}	{{DURATION}}

Data Impact

Type	Assessment
Data loss	{{DATA_LOSS}}
Data corruption	{{DATA_CORRUPTION}}
Data exposure	{{DATA_EXPOSURE}}
Verification method	{{VERIFICATION}}

Financial Impact

Category	Amount	Notes
Lost transactions	${{AMOUNT}}	{{TRANSACTION_COUNT}} failed transactions
SLA credits	${{AMOUNT}}	Per SLA contract
Operational cost	${{AMOUNT}}	Engineering hours to resolve
Total estimated	${{TOTAL}}

SLA Breach Assessment

SLA Metric	Target	Actual	Breach
Uptime	{{UPTIME_SLA}}%	{{ACTUAL_UPTIME}}%	{{BREACH}}
Response time (P99)	< {{P99_SLA}}ms	{{P99_ACTUAL}}ms	{{BREACH}}
MTTR	< {{MTTR_SLA}}	{{MTTR_ACTUAL}}	{{BREACH}}
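
The Uptime row can be filled in from the incident duration. The following is a worked sketch, assuming uptime is measured over a fixed period in minutes and that only this incident's downtime counts; the actual measurement rules come from the SLA contract. The 30-day period and 99.9% target are example values, and the 47 minutes echoes the executive summary example.

```python
def uptime_percent(period_minutes: int, downtime_minutes: float) -> float:
    """Share of the period during which the service was up, in percent."""
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


def is_breach(actual_pct: float, target_pct: float) -> bool:
    """An uptime SLA is breached when actual uptime falls below the target."""
    return actual_pct < target_pct


if __name__ == "__main__":
    period = 30 * 24 * 60  # 30-day measurement period, in minutes
    actual = uptime_percent(period, downtime_minutes=47)
    print(f"{actual:.3f}% uptime, breach of 99.9% target: {is_breach(actual, 99.9)}")
```

For a 99.9% target over 30 days the downtime budget is 43.2 minutes, so a 47-minute outage alone exceeds it.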

6. Root Cause Analysis

5 Whys

Why #	Question	Answer
Why 1	Why did users see errors?	{{ANSWER_1}}
Why 2	Why was the API returning 503?	{{ANSWER_2}}
Why 3	Why was the connection pool exhausted?	{{ANSWER_3}}
Why 4	Why was the N+1 query introduced?	{{ANSWER_4}}
Why 5	Why did code review miss it?	{{ANSWER_5}}

Root cause: {{ROOT_CAUSE}}

Contributing Factors

{{FACTOR_1}}
{{FACTOR_2}}
{{FACTOR_3}}

Trigger Event

What triggered this specific incident now: {{TRIGGER}}

7. Resolution Steps

Step	Time	Action	Result
1	{{TIME}}	{{ACTION_1}}	{{RESULT_1}}
2	{{TIME}}	{{ACTION_2}}	{{RESULT_2}}
3	{{TIME}}	{{ACTION_3}}	{{RESULT_3}}

Resolution commands (for runbook):

# {{RESOLUTION_DESCRIPTION}}
{{RESOLUTION_COMMAND}}

8. What Went Well

{{WENT_WELL_1}}
{{WENT_WELL_2}}
{{WENT_WELL_3}}

9. What Went Wrong

{{WENT_WRONG_1}}
{{WENT_WRONG_2}}
{{WENT_WRONG_3}}

10. Action Items

#	Action	Owner	Due Date	Priority	Status
1	{{ACTION_1}}	{{OWNER}}	{{DUE}}	High	Open
2	{{ACTION_2}}	{{OWNER}}	{{DUE}}	High	Open
3	{{ACTION_3}}	{{OWNER}}	{{DUE}}	Medium	Open
4	{{ACTION_4}}	{{OWNER}}	{{DUE}}	High	Open
5	{{ACTION_5}}	{{OWNER}}	{{DUE}}	Low	Open

11. Lessons Learned

{{LESSON_1}}
{{LESSON_2}}
{{LESSON_3}}

12. Related Incidents

Incident ID	Date	Similarity	Resolved
INC-{{ID}}	{{DATE}}	{{DESCRIPTION}}	Yes / No

13. Communication Log

Time	Channel	Message Summary	Audience	Sent By
{{TIME}}	Status page	"Investigating reports of elevated errors"	All users	{{SENDER}}
{{TIME}}	Status page	"Identified root cause, applying fix"	All users	{{SENDER}}
{{TIME}}	Status page	"Incident resolved, all systems normal"	All users	{{SENDER}}
{{TIME}}	Email	Customer notification for SLA breach	Affected customers	{{SENDER}}

Approval

Role	Name	Date	Signature
Author
Reviewer
Approver

Post-Mortem

Project: {{PROJECT_NAME}}
Version: {{VERSION}}
Date: {{DATE}}
Author: {{AUTHOR}}
Status: Draft | In Review | Approved
Reviewers: {{REVIEWERS}}

Document History

Version	Date	Author	Changes
0.1	{{DATE}}	{{AUTHOR}}	Initial draft

Blameless Culture Statement

This post-mortem is conducted in a blameless spirit. Our goal is to understand how and why the incident occurred — not to assign fault to individuals. People make the best decisions they can with the information and tools available at the time. When things go wrong, we look for systemic improvements that make the right action easier and the wrong action harder for everyone.

1. Incident Reference & Metadata

Field	Value
Incident ID	INC-{{YYYY}}-{{SEQ}}
Severity	P{{SEVERITY}}
Incident Report	INC-{{YYYY}}-{{SEQ}}
Post-Mortem Facilitator	{{FACILITATOR}}
Post-Mortem Date	{{PM_DATE}}
Attendees	{{ATTENDEES}}
Status	Draft / In Review / Final

2. Executive Summary

Example: "A database index was dropped during a migration on {{DATE}}, causing query performance to degrade by 50× under load. This resulted in a 1h 23min degraded service period affecting {{USERS}} users. We have restored the index, added migration validation tooling, and created safeguards to prevent similar incidents."

3. Impact Summary

Metric	Value
Total duration	{{DURATION}} (detected at {{DETECTED}}, resolved at {{RESOLVED}})
Users affected	{{USER_COUNT}} ({{USER_PERCENT}}% of user base)
Requests affected	{{REQUEST_COUNT}} ({{REQUEST_PERCENT}}% error rate during incident)
Estimated revenue impact	${{REVENUE}}
SLA breach	{{SLA_BREACH}}
SLA credits owed	${{CREDITS}}

4. Detailed Timeline

timeline
    title Incident Timeline
    {{TIME_1}} : {{EVENT_1}}
    {{TIME_2}} : {{EVENT_2}}
    {{TIME_3}} : {{EVENT_3}}
    {{TIME_4}} : {{EVENT_4}}
    {{TIME_5}} : {{EVENT_5}}

Time	Event	MTTD/MTTR Marker
{{T1}}	{{EVENT}}	← Incident start
{{T2}}	{{EVENT}}
{{T3}}	{{EVENT}}	← Detection (MTTD = T3 - T1)
{{T4}}	{{EVENT}}
{{T5}}	{{EVENT}}
{{T6}}	{{EVENT}}
{{T7}}	{{EVENT}}
{{T8}}	{{EVENT}}	← Resolved (MTTR = T8 - T1)

MTTD (Mean Time to Detect): {{MTTD}} minutes
MTTR (Mean Time to Resolve): {{MTTR}} minutes
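The two markers in the timeline table (MTTD = T3 - T1, MTTR = T8 - T1) reduce to simple minute arithmetic. The sketch below assumes all timestamps are same-day HH:MM values; the function names and the sample times are hypothetical and not part of the template.

```bash
# Convert an HH:MM timestamp to minutes since midnight.
# awk treats "08" as decimal 8, so leading zeros are safe.
to_minutes() {
  echo "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

# Minutes elapsed between two same-day HH:MM timestamps (start, end).
minutes_between() {
  echo $(( $(to_minutes "$2") - $(to_minutes "$1") ))
}

# Hypothetical sample timeline (not taken from a real incident).
T1="14:02"   # incident start
T3="14:19"   # detection
T8="15:25"   # resolved

echo "MTTD: $(minutes_between "$T1" "$T3") minutes"   # MTTD: 17 minutes
echo "MTTR: $(minutes_between "$T1" "$T8") minutes"   # MTTR: 83 minutes
```

Incidents that cross midnight would need full date handling, which this sketch deliberately leaves out.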

5. Root Cause Analysis

5.1 5 Whys Analysis

Why #	Question	Answer
Why 1	Why did users experience {{SYMPTOM}}?	{{WHY_1}}
Why 2	Why did {{WHY_1_ANSWER}} happen?	{{WHY_2}}
Why 3	Why did {{WHY_2_ANSWER}} happen?	{{WHY_3}}
Why 4	Why did {{WHY_3_ANSWER}} happen?	{{WHY_4}}
Why 5	Why did {{WHY_4_ANSWER}} happen?	{{WHY_5}}

Root cause: {{ROOT_CAUSE}}

5.2 Contributing Factors

Factor	Type	Action Required
{{FACTOR_1}}	Technical / Process / Human	Yes / No
{{FACTOR_2}}	Technical / Process / Human	Yes / No
{{FACTOR_3}}	Technical / Process / Human	Yes / No

5.3 Trigger Event

The specific trigger for this incident: {{TRIGGER}}

6. What Went Well

{{CATEGORY_1}}: {{DESCRIPTION}}
{{CATEGORY_2}}: {{DESCRIPTION}}
{{CATEGORY_3}}: {{DESCRIPTION}}

7. What Went Wrong

{{CATEGORY_1}}: {{DESCRIPTION}}
{{CATEGORY_2}}: {{DESCRIPTION}}
{{CATEGORY_3}}: {{DESCRIPTION}}

8. Where We Got Lucky

{{LUCKY_1}}
{{LUCKY_2}}
{{LUCKY_3}}

9. Action Items

Short-Term Fixes (This Sprint)

#	Action	Owner	Due	Priority	Ticket
1	{{SHORT_TERM_1}}	{{OWNER}}	{{DATE}}	Critical	{{TICKET}}
2	{{SHORT_TERM_2}}	{{OWNER}}	{{DATE}}	High	{{TICKET}}
3	{{SHORT_TERM_3}}	{{OWNER}}	{{DATE}}	Medium	{{TICKET}}

Long-Term Improvements (Next Quarter)

#	Action	Owner	Due	Priority	Ticket
1	{{LONG_TERM_1}}	{{OWNER}}	{{DATE}}	High	{{TICKET}}
2	{{LONG_TERM_2}}	{{OWNER}}	{{DATE}}	Medium	{{TICKET}}

Process Changes

#	Change	Owner	Implementation Date
1	{{PROCESS_1}}	{{OWNER}}	{{DATE}}
2	{{PROCESS_2}}	{{OWNER}}	{{DATE}}

10. Follow-Up Tracking

Follow-up review date: {{FOLLOWUP_DATE}} (4 weeks after incident)
Follow-up owner: {{FOLLOWUP_OWNER}}

Action Item	Expected Completion	Verified Complete	Effective
{{ACTION_1}}	{{DATE}}	Yes / No	Yes / No / TBD
{{ACTION_2}}	{{DATE}}

11. Recurrence Prevention

Before this incident: {{BEFORE_STATE}}

After implementing action items: {{AFTER_STATE}}

Confidence in prevention: {{CONFIDENCE}} / 10
Residual risk: {{RESIDUAL_RISK}}

12. Review & Sign-Off

Post-mortem presented at: {{MEETING}} on {{MEETING_DATE}}
Meeting recording: {{RECORDING_LINK}}
Meeting notes: {{NOTES_LINK}}

Approval

Role	Name	Date	Signature
Author
Reviewer
Approver

SLA Report

Project: {{PROJECT_NAME}}
Version: {{VERSION}}
Date: {{DATE}}
Author: {{AUTHOR}}
Status: Draft | In Review | Approved
Reviewers: {{REVIEWERS}}

Document History

Version	Date	Author	Changes
0.1	{{DATE}}	{{AUTHOR}}	Initial draft

1. Reporting Period

Field	Value
Period	{{MONTH}} {{YEAR}}
From	{{START_DATE}} 00:00:00 UTC
To	{{END_DATE}} 23:59:59 UTC
Report Generated	{{REPORT_DATE}}
Generated By	{{AUTHOR}}

2. SLA Summary Table

Metric	SLA Target	Actual	Status
Availability (uptime)	≥ {{AVAIL_SLA}}%	{{AVAIL_ACTUAL}}%	✅ Pass / ❌ Breach
P95 Response Time	≤ {{P95_SLA}}ms	{{P95_ACTUAL}}ms	✅ Pass / ❌ Breach
P99 Response Time	≤ {{P99_SLA}}ms	{{P99_ACTUAL}}ms	✅ Pass / ❌ Breach
Error Rate	≤ {{ERR_SLA}}%	{{ERR_ACTUAL}}%	✅ Pass / ❌ Breach
MTTR (P1 incidents)	≤ {{MTTR_SLA}}	{{MTTR_ACTUAL}}	✅ Pass / ❌ Breach
MTTD (alert detection)	≤ {{MTTD_SLA}}	{{MTTD_ACTUAL}}	✅ Pass / ❌ Breach
Scheduled maintenance	≤ {{MAINT_SLA}}h/mo	{{MAINT_ACTUAL}}h	✅ Pass / ❌ Breach

Overall SLA compliance this period: {{OVERALL_STATUS}}

3. Availability Report

3.1 Uptime Percentage

Service	Total Minutes	Downtime Minutes	Uptime Minutes	Uptime %
{{SERVICE_1}}	{{TOTAL_MIN}}	{{DOWN_MIN}}	{{UP_MIN}}	{{UP_PCT}}%
{{SERVICE_2}}	{{TOTAL_MIN}}	{{DOWN_MIN}}	{{UP_MIN}}	{{UP_PCT}}%
Aggregate				{{AGG_UPTIME}}%

Note: Only unplanned downtime counts against SLA uptime calculations. See Section 3.3 for maintenance exclusions.
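As a worked example of the uptime figure, the sketch below computes the percentage from total minutes and unplanned downtime minutes. It assumes scheduled maintenance is simply left out of the downtime figure and that the period total is not reduced; contracts differ on this point, so treat the formula as an assumption. The function name uptime_pct and the sample numbers are hypothetical.

```bash
# uptime_pct TOTAL_MINUTES UNPLANNED_DOWNTIME_MINUTES
# Prints uptime as a percentage with three decimals.
uptime_pct() {
  awk -v total="$1" -v down="$2" \
    'BEGIN { printf "%.3f\n", (total - down) / total * 100 }'
}

# Hypothetical 30-day period: 30 * 24 * 60 = 43200 minutes,
# with 43 minutes of unplanned downtime.
uptime_pct 43200 43   # 99.900
```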

3.2 Downtime Incidents

Incident ID	Start	End	Duration	Service	Cause	SLA Counted
INC-{{ID}}	{{START}}	{{END}}	{{DURATION}}min	{{SERVICE}}	{{CAUSE}}	Yes / Excluded

Total unplanned downtime: {{TOTAL_DOWNTIME}} minutes
Downtime excluded (scheduled maintenance): {{EXCL_DOWNTIME}} minutes

3.3 Maintenance Windows

Date	Duration	Service Affected	Pre-announced	Purpose
{{DATE}}	{{DURATION}}min	{{SERVICE}}	Yes ({{DAYS}} days advance notice)	{{PURPOSE}}

4. Performance Report

4.1 Response Time

Service / Endpoint	P50	P90	P95	P99	Max	SLA (P95)	Status
Overall	{{P50}}ms	{{P90}}ms	{{P95}}ms	{{P99}}ms	{{MAX}}ms	{{SLA}}ms	✅ / ❌
`GET /`	{{P50}}ms	{{P90}}ms	{{P95}}ms	{{P99}}ms	{{MAX}}ms	{{SLA}}ms	✅ / ❌
`POST /api/{{RESOURCE}}`	{{P50}}ms	{{P90}}ms	{{P95}}ms	{{P99}}ms	{{MAX}}ms	{{SLA}}ms	✅ / ❌

4.2 Throughput

Service	Avg Requests/sec	Peak Requests/sec	Peak Time
{{SERVICE_1}}	{{AVG_RPS}}	{{PEAK_RPS}}	{{PEAK_TIME}}

Total requests served this period: {{TOTAL_REQUESTS}}

4.3 Error Rate

Service	Total Requests	4xx Errors	5xx Errors	Error Rate	SLA	Status
{{SERVICE_1}}	{{TOTAL}}	{{4XX}}	{{5XX}}	{{ERR_RATE}}%	≤ {{ERR_SLA}}%	✅ / ❌

5. Incident Summary

5.1 Incidents by Severity

Severity	Count	Total Duration	Avg MTTR
P1 (Critical)	{{P1_COUNT}}	{{P1_DURATION}}	{{P1_MTTR}}
P2 (High)	{{P2_COUNT}}	{{P2_DURATION}}	{{P2_MTTR}}
P3 (Medium)	{{P3_COUNT}}	{{P3_DURATION}}	{{P3_MTTR}}
P4 (Low)	{{P4_COUNT}}	{{P4_DURATION}}	{{P4_MTTR}}
Total	{{TOTAL_COUNT}}	{{TOTAL_DURATION}}	{{AVG_MTTR}}

5.2 MTTR (Mean Time to Resolve)

Severity	SLA Target	This Period	Last Period	Trend
P1	≤ {{P1_MTTR_SLA}}	{{P1_MTTR_ACT}}	{{P1_MTTR_PREV}}	↑ / ↓ / →
P2	≤ {{P2_MTTR_SLA}}	{{P2_MTTR_ACT}}	{{P2_MTTR_PREV}}	↑ / ↓ / →

5.3 MTTD (Mean Time to Detect)

Period	MTTD	vs SLA	Trend
This period	{{MTTD_ACT}}	{{MTTD_STATUS}}	↑ / ↓ / →
Last period	{{MTTD_PREV}}

6. SLA Breach Analysis

Breach Details

Breach #	Metric	SLA	Actual	Duration	Customers Affected
1	{{METRIC}}	{{SLA_TARGET}}	{{ACTUAL}}	{{BREACH_DURATION}}	{{CUSTOMERS}}

Root Cause

Remediation

Contractual Obligations

Customer	Contract Reference	Credit Due	Notification Required	Notification Sent
{{CUSTOMER}}	{{CONTRACT_REF}}	${{CREDIT}}	Yes	{{DATE}}

If no breaches occurred this period, replace the subsections above with: "No SLA breaches this period. All commitments met."

7. Trend Analysis

Availability Trend (Last 6 Months)

Month	Uptime %	vs Target	Incidents
{{MONTH_6}}	{{PCT}}%	{{STATUS}}	{{COUNT}}
{{MONTH_5}}	{{PCT}}%	{{STATUS}}	{{COUNT}}
{{MONTH_4}}	{{PCT}}%	{{STATUS}}	{{COUNT}}
{{MONTH_3}}	{{PCT}}%	{{STATUS}}	{{COUNT}}
{{MONTH_2}}	{{PCT}}%	{{STATUS}}	{{COUNT}}
{{MONTH_1}} (This period)	{{PCT}}%	{{STATUS}}	{{COUNT}}

P95 Latency Trend (Last 6 Months)

Month	P95 (ms)	vs SLA
{{MONTH_6}}	{{P95}}ms	✅ / ❌
{{MONTH_5}}	{{P95}}ms	✅ / ❌
{{MONTH_4}}	{{P95}}ms	✅ / ❌
{{MONTH_3}}	{{P95}}ms	✅ / ❌
{{MONTH_2}}	{{P95}}ms	✅ / ❌
{{MONTH_1}} (This period)	{{P95}}ms	✅ / ❌

8. Improvement Initiatives

Initiative	Source	Owner	Target Date	Status	Expected Impact
{{INITIATIVE_1}}	Post-mortem INC-{{ID}}	{{OWNER}}	{{DATE}}	{{STATUS}}	+{{IMPACT}}% availability
{{INITIATIVE_2}}	Proactive	{{OWNER}}	{{DATE}}	{{STATUS}}	P99 < {{P99}} ms
{{INITIATIVE_3}}	Customer feedback	{{OWNER}}	{{DATE}}	{{STATUS}}	Reduce MTTR by 30%

9. Customer Communication Summary

Date	Type	Recipients	Subject	Sent By
{{DATE}}	Incident notification	All customers	{{SUBJECT}}	{{SENDER}}
{{DATE}}	SLA credit notice	Affected customers	{{SUBJECT}}	{{SENDER}}
{{DATE}}	Monthly SLA report	Enterprise customers	{{SUBJECT}}	{{SENDER}}

10. Next Period Targets

Metric	This Period	Next Period Target	Rationale
Availability	{{AVAIL_ACT}}%	{{AVAIL_NEXT}}%	{{RATIONALE}}
P95 latency	{{P95_ACT}}ms	{{P95_NEXT}}ms	{{RATIONALE}}
Error rate	{{ERR_ACT}}%	{{ERR_NEXT}}%	{{RATIONALE}}
MTTR (P1)	{{MTTR_ACT}}	{{MTTR_NEXT}}	{{RATIONALE}}

Approval

Role	Name	Date	Signature
Author
Reviewer
Approver

Terminal & Tmux Shortcuts

Quick reference for everyday terminal and tmux shortcuts.

Tmux: Pane Navigation

Prefix: Ctrl+A (our custom config)

Shortcut	Description
`Ctrl+A` → `o`	Switch to the next pane (cycles in order)
`Ctrl+A` → `←` `→` `↑` `↓`	Switch to the pane in that direction
`Ctrl+A` → `q` + number	Show pane numbers; press a number to jump to that pane
`Ctrl+A` → `z`	Zoom (fullscreen) the current pane (repeat to undo)
`Ctrl+A` → `x`	Close the current pane
`Ctrl+A` → `%`	Split the pane vertically (left/right)
`Ctrl+A` → `"`	Split the pane horizontally (top/bottom)

Tmux: Window Navigation

Shortcut	Description
`Ctrl+A` → `n`	Next window
`Ctrl+A` → `p`	Previous window
`Ctrl+A` → `0-9`	Go directly to a window by number
`Ctrl+A` → `c`	Create a new window
`Ctrl+A` → `,`	Rename the current window
`Ctrl+A` → `w`	List all windows (interactive picker)

Tmux: Session Management

Shortcut	Description
`Ctrl+A` → `d`	Detach from the session (the session stays alive)
`Ctrl+A` → `s`	List sessions (switch between them)
`Ctrl+A` → `$`	Rename the session
`tmux ls`	List all sessions from the terminal
`tmux a -t <name>`	Attach to a session
`tmux new -s <name>`	New session

Tmux: Copy Mode (Scroll)

Shortcut	Description
`Ctrl+A` → `[`	Enter copy/scroll mode
`q`	Exit copy mode
`↑` `↓` or `PgUp` `PgDn`	Scroll
`Space` → select → `Enter`	Copy text

Terminal: Readline Shortcuts

Shortcut	Description
`Ctrl+A`	Jump to the start of the line
`Ctrl+E`	Jump to the end of the line
`Ctrl+K`	Delete from the cursor to the end of the line
`Ctrl+U`	Delete from the cursor to the start of the line
`Ctrl+W`	Delete the previous word
`Ctrl+R`	Search command history
`Ctrl+L`	Clear the screen
`Ctrl+C`	Interrupt the current command
`Ctrl+D`	Exit (EOF)

Claude Code: Shortcuts

Shortcut	Description
`Enter`	Send message
`Shift+Tab`	Accept edits
`Esc`	Cancel / Interrupt
`Ctrl+O`	Expand/collapse tool output
`/help`	Help
`/clear`	Clear context

Tip: On the Studio server the tmux prefix is Ctrl+A (not the default Ctrl+B). Config: ~/.tmux.conf

Baikal CalDAV Runbook

Service: Baikal CalDAV

Label: Docker container baikal + LaunchAgent com.john.calendar-bridge
Tier: P2 (Business)
Port: 5232 (local), calendar.basicconsulting.no (public via Cloudflare)

What It Does

Self-hosted CalDAV server for the ALAI Business calendar. Alem syncs from iPhone and MacBook through the native Calendar app. The calendar-bridge.js daemon scans email every 5 minutes, detects meeting invites, forwards them to alem@alai.no, and creates the matching CalDAV events.

Architecture

Email (john@) → email-agent.js → calendar-bridge.js → Baikal CalDAV → Alem iPhone/Mac
                                       ↓
                               mail-native.js forward → alem@alai.no

Components

Component	Location	Type
Baikal server	~/system/services/baikal/docker-compose.yml	Docker
calendar-bridge.js	~/system/tools/calendar-bridge.js	Tool + Daemon
LaunchAgent	~/Library/LaunchAgents/com.john.calendar-bridge.plist	Daemon (5min)
Cloudflare tunnel	calendar.basicconsulting.no → localhost:5232	Tunnel
Credentials	Vaultwarden → "Baikal CalDAV"	Vault
Calendar	"ALAI Business" (CalDAV user: alem)	CalDAV
Data	~/system/services/baikal/data/	Persistent volume

Dependencies

Docker (container: baikal)
Cloudflare tunnel (com.john.cloudflared)
Vaultwarden (credentials)
mail-native.js (email forwarding)
email-agent.js (inline meeting detection)

Health Check

# Quick check
node ~/system/tools/calendar-bridge.js test

# Docker container
docker ps --filter name=baikal

# CalDAV endpoint
curl -s -o /dev/null -w "%{http_code}" http://localhost:5232/dav.php/

# Public URL (expect 401 = auth required = healthy)
curl -s -o /dev/null -w "%{http_code}" https://calendar.basicconsulting.no/dav.php/

# List events
node ~/system/tools/calendar-bridge.js list
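The "401 means healthy" rule for the public URL can be wrapped in a small helper so the status code is interpreted the same way every time. This is a sketch and not part of calendar-bridge.js; the function name classify_caldav_status and its message wording are assumptions. The 000 case relies on curl reporting 000 for %{http_code} when no HTTP response was received.

```bash
# classify_caldav_status HTTP_CODE
# Maps the status code from the public CalDAV check to a verdict.
classify_caldav_status() {
  case "$1" in
    401)     echo "healthy (auth required)" ;;
    502|404) echo "unhealthy (proxy or routing problem)" ;;
    000)     echo "unreachable (connection failed)" ;;
    *)       echo "unexpected ($1)" ;;
  esac
}

# Offline demonstration with a literal code.
classify_caldav_status 401   # healthy (auth required)
```

In practice the argument would come from the existing check, for example classify_caldav_status "$(curl -s -o /dev/null -w '%{http_code}' https://calendar.basicconsulting.no/dav.php/)".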

Common Failures & Fixes

Failure 1: Baikal container down

Symptoms: calendar-bridge.js test fails; CalDAV returns 502 or connection refused
Fix:

cd ~/system/services/baikal && docker compose up -d

Failure 2: Cloudflare tunnel not routing

Symptoms: Public URL returns 404 or times out; the local URL works
Fix:

# Check config includes calendar entry
grep calendar ~/.cloudflared/config.yml
# Restart tunnel
launchctl kickstart -k gui/$(id -u)/com.john.cloudflared

Failure 3: Calendar-bridge scan finds nothing

Symptoms: Meeting invites arrive, but no events are created and nothing is forwarded
Check:

# Check daemon is running
launchctl list | grep calendar-bridge
# Check logs
tail -50 ~/system/logs/calendar-bridge.log
# Check state file
cat ~/system/logs/calendar-bridge-state.json
# Manual scan with verbose
node ~/system/tools/calendar-bridge.js scan --verbose

Failure 4: Alem can't sync from iPhone

Symptoms: iPhone Calendar shows an error; events do not appear
Check:

Verify credentials in Vault: node ~/system/tools/vault.js get "Baikal CalDAV"
Test public CalDAV endpoint (should return 401, not 502/404)
iPhone settings: Server = calendar.basicconsulting.no/dav.php/principals/alem

Failure 5: Authentication failure

Symptoms: 401 despite the correct password
Fix: The digest stored in Baikal may be out of sync with the Vault password. Recompute it and update the Baikal DB:

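# Read the current password from the Vault
NEW_PASS=$(bw get password "Baikal CalDAV" --session "$(cat /tmp/bw-session)")
# Digest is MD5 of user:realm:password; printf '%s' keeps any % in the password literal
DIGEST=$(printf '%s' "alem:BaikalDAV:$NEW_PASS" | md5)
# Write the new digest for user alem
docker exec baikal sqlite3 /var/www/baikal/Specific/db/db.sqlite \
  "UPDATE users SET digesta1='$DIGEST' WHERE username='alem';"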

Restart Procedure

# Restart Baikal
cd ~/system/services/baikal && docker compose restart

# Restart calendar-bridge daemon
launchctl kickstart -k gui/$(id -u)/com.john.calendar-bridge

Backup

SQLite DB: ~/system/services/baikal/data/Specific/db/db.sqlite
Config: ~/system/services/baikal/data/config/baikal.yaml
Included in daily db-backup.sh via Docker volume mount

MC Task

Created: #3029 (Deploy), #3035 (Documentation + Watchdog)

ALAI Infrastructure Map & Ops Runbooks

Last updated: 2026-03-12 | Author: John (AI Director)

1. Infrastructure Overview

Azure VM — vm-alai-support

Property	Value
IP	4.223.110.181
Region	Sweden Central
Size	Standard_B2als_v2 (2 vCPU, 4GB RAM)
OS	Ubuntu 22.04 LTS
SSH	`ssh -i ~/.ssh/azure_alai alai-admin@4.223.110.181`
Resource Group	rg-alai-support
Cost	~$35/mo (Founders Hub credits, expires 2026-11-15)
Compose	/opt/alai/docker-compose.yml

ANVIL — Mac Studio M3 Max (Local)

Property	Value
Role	AI inference, product dev, agent orchestration
Services	Ollama, Qdrant, Pi-Orchestrator, Telegram, Email, Tool-Shed
Tunnel	Cloudflare Tunnel for lobby, api, mc, auth, track, ssh, vnc

2. Services on Azure VM (16 containers)

Service	URL	Container
BookStack (Wiki)	docs.basicconsulting.no	alai-bookstack-1
Documenso (e-Sign)	sign.basicconsulting.no	alai-documenso-1
Planka (Boards)	boards.basicconsulting.no	alai-planka-1
Vaultwarden	vault.basicconsulting.no	alai-vaultwarden-1
Baikal (CalDAV)	calendar.basicconsulting.no	alai-baikal-1
Grafana	grafana.basicconsulting.no	alai-grafana-1
Prometheus	prometheus.basicconsulting.no	alai-prometheus-1
Paperless-ngx	archive.basicconsulting.no	alai-paperless-1
Caddy (TLS proxy)	—	alai-caddy-1

3. ANVIL Daemons

Daemon	LaunchAgent	Script
Pi-Orchestrator	com.john.pi-orchestrator	~/system/kernel/pi-orchestrator.js
Telegram Agent	com.john.telegram-agent	~/system/tools/telegram-agent.js
Email Agent	com.john.email-agent	~/system/daemons/email-agent.js
Vault Keeper	com.john.vault-keeper	~/system/daemons/vault-keeper.js
Event Dispatcher	com.john.event-dispatcher	~/system/daemons/event-dispatcher.js
Tool-Shed	com.john.tool-shed	~/system/tools/tool-shed.js (:3050)

4. DNS — Cloudflare

Zone: basicconsulting.no | Zone ID: 4670dbd0acfeab4174ac0d4746d11ea0

Subdomain	Target	Proxy
docs, sign, boards, vault, calendar, grafana, prometheus, archive	4.223.110.181 (Azure VM)	Orange cloud
lobby, lobby-api, api, drop-api, mc, auth, track, ssh, vnc	Cloudflare Tunnel (ANVIL)	Orange cloud

5. Runbooks

5.1 Azure VM Full Restart

az vm restart -g rg-alai-support -n vm-alai-support
ssh -i ~/.ssh/azure_alai alai-admin@4.223.110.181
cd /opt/alai && docker compose up -d
docker ps  # verify 16 containers
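The "verify 16 containers" step can be made explicit instead of counting rows by eye. The helper below is a sketch: the name check_container_count is hypothetical, and it expects one container name per line on stdin, for example from docker ps --format '{{.Names}}'.

```bash
# check_container_count EXPECTED
# Reads one container name per line on stdin and compares the count.
check_container_count() {
  local expected=$1 actual
  # grep -c exits non-zero on zero matches, so guard it with || true.
  actual=$(grep -c . || true)
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: $actual containers running"
  else
    echo "MISMATCH: expected $expected, found $actual" >&2
    return 1
  fi
}

# Offline demonstration with two sample names from the services table.
printf 'alai-bookstack-1\nalai-caddy-1\n' | check_container_count 2
```

On the VM the call would be docker ps --format '{{.Names}}' | check_container_count 16.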

5.2 Single Service Recovery

ssh -i ~/.ssh/azure_alai alai-admin@4.223.110.181
cd /opt/alai && docker compose restart bookstack
docker logs alai-bookstack-1 --tail 50

5.3 TLS Certificate Issues

Caddy renews certificates automatically. If renewal fails, temporarily disable the Cloudflare proxy (grey cloud) for the affected record, restart the caddy service, then re-enable the proxy.

5.4 ANVIL Daemon Recovery

launchctl list | grep com.john
launchctl kickstart -k gui/$(id -u)/com.john.pi-orchestrator
tail -50 ~/system/logs/pi-orchestrator.log

5.5 Database Backup

docker exec alai-bookstack-db-1 mysqldump -u bookstack bookstack > bookstack.sql
docker exec alai-planka-db-1 pg_dump -U postgres planka > planka.sql
docker exec alai-documenso-db-1 pg_dump -U documenso documenso > documenso.sql

5.6 Pi-Orchestrator Not Processing

curl http://localhost:8401/status
claude auth status
launchctl kickstart -k gui/$(id -u)/com.john.pi-orchestrator
node ~/system/tools/mc.js list --status open --limit 10

5.7 Email Agent Not Fetching

# Disables TLS certificate verification for this one diagnostic run only
NODE_TLS_REJECT_UNAUTHORIZED=0 node ~/system/daemons/email-agent.js --test
tail -20 ~/system/logs/email-agent.log

5.8 SSH IP Update

az network nsg rule update -g rg-alai-support --nsg-name nsg-alai-support \
  -n AllowSSH --source-address-prefixes "NEW_IP"

6. Security

All services behind Cloudflare Access (Zero Trust)
SSH restricted to office IP
Docker .env (chmod 600) with secrets
Let's Encrypt TLS on all domains
Gitleaks pre-commit + CI on all 6 products

7. Monthly Cost

Item	Cost
Azure VM (B2als_v2)	~$35/mo
Cloudflare	Free
Total	~$35/mo (covered by Azure Founders Hub credits until Nov 2026)

System Map — Infrastructure & Services

ALAI System Map

Updated: 2026-03-16
Author: John (AI Director, AI-first OS)

☁️ Azure VM — Supporting Services (Production)

VM: vm-alai-support | Azure Founders Hub | Sweden Central
Specs: Standard_B2als_v2 — 2 vCPU / 4GB RAM / 30GB SSD | IP: 4.223.110.181
Compose: /opt/alai/docker-compose.yml

SSH port 22 is closed/firewalled; access is only through Caddy/Cloudflare

Service	URL	Status
BookStack (wiki/docs)	https://docs.alai.no	✅
Vaultwarden (passwords)	https://vault.basicconsulting.no	✅
Documenso (e-sign)	https://sign.basicconsulting.no	✅
Grafana (monitoring)	https://grafana.basicconsulting.no	✅
Planka (kanban)	https://boards.basicconsulting.no	✅
Baikal (CalDAV)	https://cal.basicconsulting.no	❌ down
Prometheus	(internal, no public URL)	?
Caddy	(reverse proxy for all of the above)	✅

🖥️ ANVIL (MacBook Pro M3 Max): Local Dev

Docker containers (dev databases for products)

Container	Port	Project
lumiscare-postgres	5432	Lumiscare
lumiscare-redis	6379	Lumiscare
plock-db	5434	Plock
plock-redis	6380	Plock
backend-postgres	5435	(shared backend)
backend-redis	6381	(shared backend)
bilko-postgres	5436	Bilko
bilko-redis	6382	Bilko
drop-postgres	5433	Drop
lobby-postgres	5437	Lobby
qdrant	6333-6334	RAG vector search
sonarqube	9000	Code quality
bookstack (local)	6875	⚠️ Dev/sync copy, prod=Azure
bookstack_db	3306	(local BookStack DB)

⚠️ These are DEV databases; production services run on Azure or with cloud providers

Local services (non-Docker)

Service	Port	Details
Ollama ANVIL	11434	10 models (qwen2.5-coder:32b, llama3.1:8b, llama-guard...)
N8N	5678	Workflow automation (local, via LaunchAgent)
MC Dashboard	(internal)	Mission Control web UI
Caddy Vault	(internal)	Secret proxy
Tender Dashboard	(internal)	Tender (anbud) tracking UI
Tool Shed	(internal)	Tool registry API

Ollama Models

Host	Models	Largest
ANVIL (localhost:11434)	10	qwen2.5-coder:32b (23GB), llama-guard3:8b
FORGE (10.0.0.2:11434)	5	deepseek-r1:70b (42GB), qwen3:32b (20GB)

⚙️ Active LaunchAgent Daemons (~33)

ALAI Kernel

agent-timeout-monitor · idle-learning-daemon · ram-monitor · task-router

John's Agents

browser-worker · caddy-vault · cloudflared · comms-agent · documenso-webhook · draft-sender · email-tracker · event-dispatcher · hook-daemon · intake-watcher · mc-dashboard · n8n · network-watchdog · ops-watchdog · outbox-processor · pi-orchestrator · pipeline-watcher · slack-bot · telegram-agent · tender-dashboard · tool-shed · vault-keeper · vault-proxy

Product Monitoring

drop.health-check

🗄️ Active SQLite Databases (~54): `~/system/databases/`

Database	Purpose
mission-control.db (10MB)	All MC tasks (3847 done, 36 open)
hivemind.db (52MB)	Intel, knowledge, sessions, events
knowledge.db (187MB)	RAG knowledge base
flywheel.db (36MB)	RAG cache
events.db (11MB)	Event bus log
guardrails-audit.db (9.6MB)	AI safety audit
bee-index.db (3.4MB)	Code/file index
tenders.db (184KB)	Tender (anbud) tracker
leads.db (224KB)	CRM leads
contacts.db (96KB)	CRM contacts
hivemind-archive.db (5.9MB)	HiveMind archive
email-inbox.db (164KB)	Email inbox
drafts.db (292KB)	Email drafts
routing-outcomes.db (64KB)	AI routing metrics
tool-audit.db (900KB)	Tool usage audit
bih-tenders.db (284KB)	BiH tender scraper
strategy-tracker.db (128KB)	Strategy/OKR
teams.db (40KB)	Teams
projects.db (40KB)	Projects
pipeline.db (56KB)	Sales pipeline
sprint-pipeline.db (32KB)	Sprint tracker
goals.db (44KB)	Goals
invoices.db (36KB)	Invoices
baikal-caldav.db (108KB)	Calendar (CalDAV backup)
+ ~30 more smaller databases	contacts, emails, tickets, vcr, distill...

🌐 External Services

Service	Purpose
Anthropic API	Claude (claude-3-5-sonnet, claude-opus)
Fiken	Accounting, invoices, payroll (NO)
Cloudflare	DNS, Tunnel, DDoS protection
Slack (basicconsulting)	Internal communication
Telegram	Notifications, bot
Dropbox	File sync
one.com	Email hosting (SMTP/IMAP)
GitHub	Code repos
Azure Founders Hub	VM hosting

🔧 Tools & Scripts — `~/system/tools/`

Total: 1,310 scripts
JS: 1,248 | SH: 58 | PY: 4

📁 Key Directories

~/system/
  tools/          ← 1,310 JS/SH/PY scripts
  databases/      ← ~54 active SQLite databases
  config/         ← JSON configs, daemon registry
  agents/         ← hivemind, agent definitions
  notes/          ← this file and other notes
  backups/        ← daily backup of every database
  services/       ← docker-compose per service

~/ALAI/
  products/       ← Drop, Bilko, Plock, Gotiva, Lobby, Lumiscare...
  internal/       ← configs, tools, docs
  legal/          ← contracts, compliance, templates

🚦 Mission Control Status (2026-03-16)

Status	Count
✅ done	3,847
⏸️ paused	664
🔴 blocked	120
🔵 open	36

ALAI Domain Migration — basicconsulting.no → alai.no

Context

The ALAI rebrand did not include migrating the support stack. 11 subdomains remain on the legacy basicconsulting.no domain.

Current Live State (by Zone)

basicconsulting.no (Cloudflare zone 4670dbd0acfeab4174ac0d4746d11ea0)

30+ DNS records
Main support stack hosting
Active services: docs, sign, bilko-demo, www
Inactive subdomains: status, support, monitor, alerts, help, wiki

alai.no (Cloudflare zone 3dc40d9c37fee79c4281f7e86870c0b5)

Status: PENDING — Nameservers on one.com not yet changed
Required NS change: ns01/02.one.com → aspen.ns.cloudflare.com + wells.ns.cloudflare.com
18 DNS records pre-created (A/CNAME for 15 services + root)
Blocker: Alem must update NS on one.com dashboard (5 min task, blocks 15 subdomain migrations)

snowit.ba (AWS Route53 zone Z04121493CAJZ75TQUPIW)

2026-04-19 added: A record root → 76.76.21.21 (Vercel)
CNAME www → cname.vercel-dns.com
Change ID: C065644119MEENZWSSKW3

Cloudflare Tunnel Config

Tunnel ID: 3315a609-7934-45c5-ad0c-56d86d16374d (named "mattermost")
Host VM: Azure 4.223.110.181 (swedencentral)
Ingress rules: Multiple service routes (see tunnel config for details)

Incident: sign.basicconsulting.no 404 (2026-04-18)

Symptom: DNS resolved to Cloudflare proxy but returned 404.

Root cause: Tunnel ingress had route sign.basicconsulting.no → localhost:3003 but cloudflared could not reach backend.

Fix: Changed DNS from tunnel CNAME to direct A record → 4.223.110.181 (proxied).

Result: Documenso Sign In page now live.

Alem TODO

Log into one.com domain panel
Select alai.no domain
Change nameservers from ns01/02.one.com to:
- aspen.ns.cloudflare.com
- wells.ns.cloudflare.com
Wait 5-30 minutes for propagation
Verify: dig alai.no NS should show Cloudflare nameservers
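The final verification step can be scripted so that it passes only when both Cloudflare nameservers are present. This is a sketch: the function name ns_on_cloudflare is hypothetical, and it reads the output of dig +short NS alai.no on stdin, so it can be tried offline with sample text.

```bash
# ns_on_cloudflare reads `dig +short NS <domain>` output on stdin and
# succeeds only if both expected Cloudflare nameservers are listed.
ns_on_cloudflare() {
  local out
  out=$(cat)
  echo "$out" | grep -qiE '^aspen\.ns\.cloudflare\.com\.?$' &&
    echo "$out" | grep -qiE '^wells\.ns\.cloudflare\.com\.?$'
}

# Sample input in the shape dig prints (trailing dots included).
printf 'aspen.ns.cloudflare.com.\nwells.ns.cloudflare.com.\n' | ns_on_cloudflare \
  && echo "alai.no is on Cloudflare nameservers"
```

Against live DNS the call would be dig +short NS alai.no | ns_on_cloudflare. A failure shortly after the change may only mean propagation is still in progress.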

AWS CLI Setup — john-deploy IAM

Credentials Location

~/.aws/credentials
[default]
aws_access_key_id = AKIAUXDEHCNUHFX472XL
aws_secret_access_key = (stored in Vault: "AWS CLI - john-deploy IAM")

IAM User Details

User: john-deploy
AWS Account: 324480209768
ARN: arn:aws:iam::324480209768:user/john-deploy
Access Key ID: AKIAUXDEHCNUHFX472XL
Secret Key: DO NOT print in docs — reference Bitwarden/Vault item "AWS CLI - john-deploy IAM"
Primary Region: eu-central-1 (Frankfurt)

Permissions

Known permissions (unverified full list):

Route53 (zone management, record creation)
S3 (bucket operations)
SES (email sending)
ECR (container registry)
App Runner (serverless containers)

Validated Usage

2026-04-14: Credentials confirmed working
2026-04-19: Route53 change for snowit.ba (Change ID: C065644119MEENZWSSKW3)

Usage Pattern

# Export credentials as env vars
export AWS_ACCESS_KEY_ID=AKIAUXDEHCNUHFX472XL
export AWS_SECRET_ACCESS_KEY="(from Vault)"
export AWS_DEFAULT_REGION=eu-central-1

# Example: Route53 change
aws route53 change-resource-record-sets \
  --hosted-zone-id Z04121493CAJZ75TQUPIW \
  --change-batch file://change-batch.json
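For reference, a change-batch.json for the snowit.ba records described in this wiki could look like the sketch below. The record names and values come from the 2026-04-19 change notes; the UPSERT action, the TTL of 300, the comment text, and putting both records in one batch are assumptions, not a copy of the file that was actually submitted.

```bash
# Write a Route53 change batch for the snowit.ba root A record and www CNAME.
# Assumed values: Action UPSERT, TTL 300, Comment text.
cat > change-batch.json <<'EOF'
{
  "Comment": "Point snowit.ba at Vercel",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "snowit.ba.",
        "Type": "A",
        "TTL": 300,
        "ResourceRecords": [{ "Value": "76.76.21.21" }]
      }
    },
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "www.snowit.ba.",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [{ "Value": "cname.vercel-dns.com" }]
      }
    }
  ]
}
EOF
```

UPSERT creates the record if it is missing and overwrites it otherwise, which makes the batch safe to re-run.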

MCP Docker AWS Tool

Tool: mcp__MCP_DOCKER__call_aws

Note: This tool has its own config and reads credentials from environment variables, so it may not share credentials with the CLI.

Security Notes

Secret key NEVER committed to git
Stored in Vault: "AWS CLI - john-deploy IAM" item
Keychain fallback on macOS
If rotating keys: update Vault + ~/.aws/credentials + env vars

Slack alaiops Bot — Backend Architecture

Basic Info

Workspace: alai-talk.slack.com
Bot user: @alaiops (U0AEMU81LBG)
Channels: 11 public + 6 private (manual invite required for private)
Mode: Socket Mode (no public webhook needed)

Tokens Location

Primary: macOS Keychain
- slack-bot/slack-bot-token
- slack-bot/slack-app-token
Fallback 1: Bitwarden/Vault
Fallback 2: Environment variables

Daemon

LaunchAgent: com.john.slack-bot
PID lookup: pgrep -f slack-bot.js
Code: ~/system/tools/slack-bot.js
Logs: ~/system/logs/slack-bot.log

Backend Chain (via comms-responder.js)

Priority-based fallback system (lower number = higher priority, faster response):

Groq (priority 5, ~100-500ms) — PRIMARY
- Model: llama-3.1-8b-instant
- Added: 2026-04-18
- Requires: GROQ_API_KEY env var
- Adapter: ~/system/tools/adapters/groq.js
Claude API (priority 10, ~2s)
Claude CLI (priority 20, ~20s)
Ollama (priority 30, ~40s) — FALLBACK ONLY
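The priority ordering above can be illustrated with a small selection routine. This is a sketch of the idea only: comms-responder.js is a Node.js script whose code is not shown here, and the function names, the claude-api and claude-cli identifiers, and the approach of passing failing backends as arguments are all assumptions made for the example.

```bash
# Backends as name:priority pairs, taken from the list above.
BACKENDS="ollama:30 groq:5 claude-cli:20 claude-api:10"

# Print backend names ordered by ascending priority number.
ordered_backends() {
  for entry in $BACKENDS; do
    echo "${entry##*:} ${entry%%:*}"
  done | sort -n | awk '{ print $2 }'
}

# pick_backend [FAILING_NAME...]
# Prints the highest-priority backend that is not listed as failing.
pick_backend() {
  local failing=" $* " name
  for name in $(ordered_backends); do
    case "$failing" in
      *" $name "*) continue ;;
    esac
    echo "$name"
    return 0
  done
  echo "All backends failed" >&2
  return 1
}

pick_backend                   # groq
pick_backend groq claude-api   # claude-cli
```

Keeping the priorities as data rather than as a hard-coded if/else chain means a new backend can be added by appending one name:priority pair.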

Groq Adapter

// Registered in ~/system/tools/adapters/index.js
const groq = require("./groq.js");

// Usage (in CommonJS, await must run inside an async function)
async function main() {
  const response = await groq.send("prompt", {
    model: "llama-3.1-8b-instant",
    temperature: 0.7,
    max_tokens: 512
  });
  return response;
}

Event Subscriptions

Status: Re-enabled 2026-04-18 after scope fix

Critical fix: The bot no longer requests admin scopes, which caused an "Enterprise only" error. The admin scopes were removed from the user token; the 15 bot scopes below were kept.

Active bot scopes (15):

app_mentions:read
channels:history
channels:read
chat:write
groups:history
groups:read
im:history
im:read
im:write
mpim:history
mpim:read
reactions:read
reactions:write
users:read
users:read.email

Dead Pattern Warning

If bot stops responding, check logs first:

tail -100 ~/system/logs/slack-bot.log

Benign pattern (ignore): "Dedup: skipping" — message already processed

Error patterns (investigate):

"Socket mode error"
"Token invalid"
"Groq API error"
"All backends failed"

Test Commands

# Send test message
node ~/system/tools/slack.js send general "Test from John"

# Read channel history
node ~/system/tools/slack.js read general 10

# Check bot status
pgrep -f slack-bot.js && echo "Running" || echo "Stopped"

Documenso Self-Hosted — sign.basicconsulting.no

Service Details

Service: Documenso v2.x (open-source document signing)
URL: https://sign.basicconsulting.no
DNS: A record → 4.223.110.181 (Azure VM, proxied via Cloudflare)
Hosting: Azure VM (swedencentral)

Admin Credentials

Email: alem@alai.no
Password: (stored in Vault: "Documenso - sign.basicconsulting.no")
Vault item password: Cemerika_!950

API Integration

API Token: api_xn907c9xczrteoba (created 2026-04-19 for Bilko Sign integration)
API Base URL: https://sign.basicconsulting.no/api/v1

Test cURL

curl -H "Authorization: api_xn907c9xczrteoba" \
  https://sign.basicconsulting.no/api/v1/documents

# Expected response:
{"documents":[],"totalPages":0}

Bilko Sign Integration

Documenso is used as the signing backend for Bilko (accounting SaaS).

Spec: ~/ALAI/products/Bilko/docs/product/BILKO-SIGN-SPEC.md
Integration team: Skybound (mobile + frontend specialists)

GCP Secret Manager

Secret name: bilko-documenso-api-key
Value: api_xn907c9xczrteoba
Bound to: bilko-api Cloud Run service (revision 00045-flz)
Environment variable: DOCUMENSO_API_KEY

bilko-api Environment Variables

DOCUMENSO_API_URL=https://sign.basicconsulting.no
DOCUMENSO_API_KEY=(from GCP Secret Manager)

Incident History

2026-04-18: 404 Error

Symptom: sign.basicconsulting.no returned 404 Not Found

Root cause: Cloudflare Tunnel ingress had route to localhost:3003 but cloudflared could not reach backend

Fix: Changed DNS from tunnel CNAME to direct A record → 4.223.110.181 (proxied)

Result: Documenso Sign In page now live

Maintenance

Backup API Tokens

Store all API tokens in Vault immediately after creation
Documenso does NOT allow viewing tokens after creation (one-time display)

Version Updates

# Check current version
curl -s https://sign.basicconsulting.no/api/health | jq .version

# Update (on Azure VM)
ssh -i ~/.ssh/azure_alai alai-admin@4.223.110.181
cd /path/to/documenso
docker-compose pull
docker-compose up -d

Future Migration

Target: sign.alai.no (part of ALAI domain migration)

See ALAI Domain Migration runbook
Requires: alai.no NS change on one.com (pending as of 2026-04-19)

Azure Blob Offsite Backup Setup

Overview

Purpose: Offsite backup for ALAI system databases and git bundles
Region: North Europe (Dublin) — geographic separation from primary Sweden Central VM
Retention: 365 days with lifecycle policies (Hot → Cool → Archive → Delete)
Recovery Time Objective: 4 hours (manual restore)

Azure Resources

Resource Type	Name	Purpose
Resource Group	`alai-backups-rg`	Isolation boundary for backup storage
Storage Account	`alaibackups0ebb`	Blob storage (LRS, Standard tier)
Container	`system-db-backups`	SQLite databases (hivemind.db, mission-control.db, etc.)
Container	`system-git-bundles`	Git repository bundles
Service Principal	`alai-backup-writer`	Access scoped to this storage account only (Storage Blob Data Contributor: read, write, and delete blobs)

Service Principal Setup

# Create service principal
az ad sp create-for-rbac --name alai-backup-writer --skip-assignment

# Assign Storage Blob Data Contributor to SA only (not subscription)
STORAGE_ID=$(az storage account show --name alaibackups0ebb --query id -o tsv)
az role assignment create \
  --assignee <service-principal-app-id> \
  --role "Storage Blob Data Contributor" \
  --scope "$STORAGE_ID"

# Store credentials in ~/system/config/azure-backup.env
cat > ~/system/config/azure-backup.env <

Lifecycle Policy

Hot → Cool: 30 days after last modification
Cool → Archive: 90 days after last modification
Archive → Delete: 365 days after last modification

az storage account management-policy create \
  --account-name alaibackups0ebb \
  --policy @lifecycle-policy.json

lifecycle-policy.json:

{
  "rules": [
    {
      "enabled": true,
      "name": "archive-old-backups",
      "type": "Lifecycle",
      "definition": {
        "actions": {
          "baseBlob": {
            "tierToCool": {"daysAfterModificationGreaterThan": 30},
            "tierToArchive": {"daysAfterModificationGreaterThan": 90},
            "delete": {"daysAfterModificationGreaterThan": 365}
          }
        },
        "filters": {"blobTypes": ["blockBlob"]}
      }
    }
  ]
}
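
To reason about which tier a given backup should be in, the policy thresholds can be expressed as a small function. This is a sketch only: `tierForAge` is a hypothetical helper, and it models the rule definitions, not Azure's actual evaluation schedule, so real transitions may lag the thresholds.

```javascript
// Thresholds mirror lifecycle-policy.json above.
// "daysAfterModificationGreaterThan" means strictly more than N days.
function tierForAge(daysSinceModified) {
  if (daysSinceModified > 365) return "deleted";
  if (daysSinceModified > 90) return "archive";
  if (daysSinceModified > 30) return "cool";
  return "hot";
}
```

For example, a blob last modified exactly 30 days ago is still expected in Hot, and one modified 91 days ago in Archive.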

Backup Scripts

LightRAG to Azure Blob

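#!/bin/bash
# ~/system/tools/migrate-lightrag-to-azure.sh
# Stop on the first failed step so a failed archive or upload is not silently ignored
set -euo pipefail

source ~/system/config/azure-backup.env
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
BACKUP_FILE="/tmp/lightrag-backup-$TIMESTAMP.tar.gz"

# Archive relative to ~/system so that extracting with -C ~/system/ recreates ~/system/lightrag
tar -czf "$BACKUP_FILE" -C ~/system lightrag
az storage blob upload \
  --account-name alaibackups0ebb \
  --container-name system-db-backups \
  --name "lightrag-$TIMESTAMP.tar.gz" \
  --file "$BACKUP_FILE" \
  --auth-mode login

# Only reached when the upload succeeded
rm "$BACKUP_FILE"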

Ollama Models Export

#!/bin/bash
# ~/system/tools/ollama-models-export.sh --azure

source ~/system/config/azure-backup.env
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
EXPORT_DIR="/tmp/ollama-export-$TIMESTAMP"

mkdir -p "$EXPORT_DIR"
# Skip the header row of `ollama list` and keep the model name column
ollama list | tail -n +2 | awk '{print $1}' > "$EXPORT_DIR/model-list.txt"

while read -r model; do
  # Namespaced model names contain "/", which would break the output path
  safe_name="${model//\//_}"
  ollama show "$model" --modelfile > "$EXPORT_DIR/$safe_name.modelfile"
done < "$EXPORT_DIR/model-list.txt"

# Archive relative to /tmp so the tarball holds only the export directory
tar -czf "$EXPORT_DIR.tar.gz" -C /tmp "$(basename "$EXPORT_DIR")"
az storage blob upload \
  --account-name alaibackups0ebb \
  --container-name system-db-backups \
  --name "ollama-models-$TIMESTAMP.tar.gz" \
  --file "$EXPORT_DIR.tar.gz"

rm -rf "$EXPORT_DIR" "$EXPORT_DIR.tar.gz"

Disaster Recovery Path

List available backups:

az storage blob list \
  --account-name alaibackups0ebb \
  --container-name system-db-backups \
  --output table

Download latest backup:

az storage blob download \
  --account-name alaibackups0ebb \
  --container-name system-db-backups \
  --name "lightrag-20260420-143000.tar.gz" \
  --file /tmp/restore-lightrag.tar.gz

Compute the SHA-256 checksum and compare it against the value recorded at backup time, if one was kept:

shasum -a 256 /tmp/restore-lightrag.tar.gz

Restore to target system:

tar -xzf /tmp/restore-lightrag.tar.gz -C ~/system/

Monitoring

Cron: Hourly backup at :15 (15 * * * *)
Log: ~/system/logs/azure-backup.log
Alert: HiveMind alert if backup fails 2 consecutive runs
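
The two-consecutive-failures rule can be sketched as follows. This is a minimal illustration, assuming the run history is available as an oldest-to-newest list of booleans; `consecutiveFailures` and `shouldAlert` are hypothetical names, not functions from the backup tooling.

```javascript
// results: oldest-to-newest array of booleans, true = backup run succeeded.
function consecutiveFailures(results) {
  let count = 0;
  // Walk backwards from the most recent run until the first success.
  for (let i = results.length - 1; i >= 0 && results[i] === false; i--) {
    count++;
  }
  return count;
}

// Alert once the trailing failure streak reaches the threshold (2 on this page).
function shouldAlert(results, threshold = 2) {
  return consecutiveFailures(results) >= threshold;
}
```

Counting only the trailing streak means a single failed run between two good ones never alerts, which matches the intent of the rule.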

node ~/system/agents/hivemind/hivemind.js post john alert \
  "Azure backup failed 2 consecutive runs — check ~/system/logs/azure-backup.log"

ANVIL Memory Troubleshooting — Mac Studio

ANVIL Memory Troubleshooting — Mac Studio (M2 Ultra 192GB)

Incident Summary

Date: 2026-04-20
Symptom: System freezes, Chrome/Claude unresponsive, OOM kernel panics
Root Cause: Zombie Ollama runner processes + duplicate launchd agents + runaway grep processes
Resolution: Ollama config tuning, duplicate agent removal, zombie cleanup daemon, Ollama 0.21.0 upgrade

Root Causes

Ollama zombie runners: ollama ps reports 0 models loaded, but pgrep -fl ollama_llama_server shows 4-6 GB processes still resident
Duplicate launchd agents: Both com.alai.ollama-serve.plist and com.alai.ollama-serve-v2.plist running simultaneously → 2x Ollama daemons
grep memory leak: grep -rn commands on large codebases hang and consume 8+ GB RAM each
Preload warmup bloat: com.john.ollama-warmup.plist loading 3 models on boot → 48 GB baseline before any work

Permanent Fix — Ollama Config

File: ~/Library/LaunchAgents/com.alai.ollama-serve-v2.plist

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.alai.ollama-serve-v2</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/ollama</string>
    <string>serve</string>
  </array>
  <key>EnvironmentVariables</key>
  <dict>
    <key>OLLAMA_HOST</key>
    <string>0.0.0.0:11434</string>
    <key>OLLAMA_KEEP_ALIVE</key>
    <string>60s</string>
    <key>OLLAMA_MAX_LOADED_MODELS</key>
    <string>1</string>
    <key>OLLAMA_NUM_PARALLEL</key>
    <string>1</string>
  </dict>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/tmp/ollama-serve.log</string>
  <key>StandardErrorPath</key>
  <string>/tmp/ollama-serve-error.log</string>
</dict>
</plist>

Key parameters:

OLLAMA_KEEP_ALIVE=60s — unload model after 60s idle (default 5m causes bloat)
OLLAMA_MAX_LOADED_MODELS=1 — only one model resident at a time
OLLAMA_NUM_PARALLEL=1 — no parallel inference (reduces contention)

Zombie Cleanup Daemon

File: ~/Library/LaunchAgents/com.alai.zombie-cleanup.plist

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.alai.zombie-cleanup</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/bash</string>
    <string>/Users/makinja/system/tools/zombie-proc-cleanup.sh</string>
  </array>
  <key>StartInterval</key>
  <integer>3600</integer>
  <key>StandardOutPath</key>
  <string>/tmp/zombie-cleanup.log</string>
</dict>
</plist>

Script: ~/system/tools/zombie-proc-cleanup.sh

#!/bin/bash
# Kill zombie Ollama runners (no parent process or disconnected from ollama serve)
pgrep -fl ollama_llama_server | while read -r pid rest; do
  parent=$(ps -o ppid= -p "$pid" | xargs)
  if [[ -z "$parent" ]] || ! ps -p "$parent" | grep -q ollama; then
    echo "$(date): Killing zombie Ollama runner $pid"
    kill -9 "$pid"
  fi
done

# Kill grep processes older than 5 minutes (likely hung)
# The '[g]rep -rn' pattern keeps this filter from matching its own process
ps -eo pid,etime,command | grep '[g]rep -rn' | while read -r pid etime rest; do
  # etime is [[dd-]hh:]mm:ss; convert it to whole minutes (the +0 strips leading zeros)
  minutes=$(echo "$etime" | awk -F'[-:]' '{ if (NF==4) print $1*1440+$2*60+$3; else if (NF==3) print $1*60+$2; else print $1+0 }')
  if [[ "$minutes" -gt 5 ]]; then
    echo "$(date): Killing hung grep process $pid (runtime: $etime)"
    kill -9 "$pid"
  fi
done
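
The age check in the cleanup script depends on reading the ps etime column correctly. Its format is [[dd-]hh:]mm:ss, so the first field is minutes for short-lived processes, hours for longer ones, and days once a day has passed. A reference sketch of the conversion, useful when checking the script's arithmetic; `etimeToMinutes` is a hypothetical helper and not part of zombie-proc-cleanup.sh.

```javascript
// Convert a ps "etime" value ([[dd-]hh:]mm:ss) to whole elapsed minutes.
function etimeToMinutes(etime) {
  let days = 0;
  let clock = etime.trim();
  if (clock.includes("-")) {
    const [d, rest] = clock.split("-");
    days = Number(d);
    clock = rest;
  }
  const parts = clock.split(":").map(Number); // [hh,] mm, ss
  parts.pop(); // seconds are dropped: only whole minutes matter here
  const minutes = parts.pop();
  const hours = parts.length ? parts.pop() : 0;
  return days * 1440 + hours * 60 + minutes;
}
```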

Disabled Agents

launchctl unload ~/Library/LaunchAgents/com.alai.ollama-serve.plist
launchctl unload ~/Library/LaunchAgents/com.john.ollama-warmup.plist
rm ~/Library/LaunchAgents/com.alai.ollama-serve.plist
rm ~/Library/LaunchAgents/com.john.ollama-warmup.plist

Ollama Upgrade

brew upgrade ollama  # 0.19.0 → 0.21.0
# Changelog: Fixed memory leak in runner cleanup (issue #4821)

OOM Symptom Recognition

Command:

vm_stat | awk '/Pages free/ {printf "%.1f GB\n", $3*16384/1024/1024/1024}'

Thresholds:

< 5 GB free: Alert — investigate top memory consumers
< 2 GB free: Critical — kill non-essential processes immediately
< 500 MB free: Imminent OOM — force quit Claude/Chrome, restart Ollama

Quick triage:

ps aux | sort -nrk 4 | head -10  # Top 10 memory hogs
pgrep -fl ollama_llama_server    # Zombie Ollama runners
pgrep -fl grep                    # Hung grep processes
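
The free-memory command and thresholds in this section can be combined into one check, for example inside a monitoring script. A minimal sketch, assuming vm_stat text is passed in as a string; `freeGigabytes` and `memoryLevel` are hypothetical names. Like the awk command above, it reads only the "Pages free" line, and it treats 500 MB as 0.5 GB.

```javascript
// Parse the "Pages free" line of vm_stat output.
// pageSize defaults to 16384 bytes, the value used in the command above.
function freeGigabytes(vmStatOutput, pageSize = 16384) {
  const match = vmStatOutput.match(/Pages free:\s+(\d+)/);
  if (!match) throw new Error("Pages free line not found");
  return (Number(match[1]) * pageSize) / 1024 ** 3;
}

// Map free memory to the runbook's three thresholds.
function memoryLevel(freeGb) {
  if (freeGb < 0.5) return "imminent-oom";
  if (freeGb < 2) return "critical";
  if (freeGb < 5) return "alert";
  return "ok";
}
```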

Prevention Checklist

Monitor free RAM hourly: vm_stat check in cron
Zombie cleanup daemon running: launchctl list | grep zombie-cleanup
Only one Ollama serve agent: launchctl list | grep ollama-serve → expect 1 line
No warmup preload agents: launchctl list | grep warmup → empty
Grep with timeout: timeout 60 grep -rn ... instead of bare grep -rn (timeout is not part of stock macOS; Homebrew coreutils provides it as gtimeout)

Email Pipeline + Edita PA — Runbook

Overview

The email pipeline classifies incoming emails and routes them to Mission Control, HiveMind, or archive. Edita PA is the autonomous email assistant operating in phased rollout (currently Phase 1).

Architecture

Daemon: ~/system/daemons/email-agent.js
LaunchAgent: com.john.email-agent (via wrapper email-agent-wrapper.sh)
Vault: Bitwarden session (/tmp/bw-session) required for IMAP credentials
Triage LLM: llama3.1:8b (Ollama ANVIL, preloaded via ollama-triage-preload.sh)

OWN Classifier Logic

The OWN classifier identifies machine-generated emails from ALAI's own systems to prevent task spam.

Constants (email-agent.js lines 118-123)

const OWN_SYSTEM_PREFIXES = [
  'noreply@', 'no-reply@', 'sentinel@', 'alerts@', 'auto@', 'daemon@',
  'mailer@', 'notification@', 'notifications@', 'bounces@', 'bounce@',
  'donotreply@', 'do-not-reply@', 'system@'
];
const OWN_SYSTEM_DOMAINS = ['@alai.no', '@basicconsulting.no'];

isOwnSystemEmail() Function (lines 446-456)

Two-tier check:

Exact match: OWN_ADDRESSES array (hardcoded machine addresses)
Prefix + domain: Any prefix in OWN_SYSTEM_PREFIXES on domains in OWN_SYSTEM_DOMAINS

Critical: alem@alai.no is NEVER in this list. VIP check runs FIRST (line 464), bypassing OWN classifier entirely.
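
The two-tier check can be sketched as follows. This is an illustration under stated assumptions, not the code at lines 446-456: the contents of OWN_ADDRESSES are not shown on this page, so the entry below is a made-up placeholder, and the sketch assumes `from` is a bare address with no display name.

```javascript
const OWN_SYSTEM_PREFIXES = [
  "noreply@", "no-reply@", "sentinel@", "alerts@", "auto@", "daemon@",
  "mailer@", "notification@", "notifications@", "bounces@", "bounce@",
  "donotreply@", "do-not-reply@", "system@",
];
const OWN_SYSTEM_DOMAINS = ["@alai.no", "@basicconsulting.no"];
// Hypothetical placeholder: the real list lives in email-agent.js.
const OWN_ADDRESSES = ["example-bot@alai.no"];

function isOwnSystemEmail(from) {
  const addr = from.trim().toLowerCase();
  // Tier 1: exact match against known machine addresses.
  if (OWN_ADDRESSES.includes(addr)) return true;
  // Tier 2: a system prefix on one of ALAI's own domains.
  return OWN_SYSTEM_DOMAINS.some(
    (domain) =>
      addr.endsWith(domain) &&
      OWN_SYSTEM_PREFIXES.some((prefix) => addr.startsWith(prefix))
  );
}
```

Requiring both the prefix and the domain means noreply@ senders from third parties are not treated as ALAI's own system mail.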

TLDR_SKIP Routing

Newsletters from dan@tldrnewsletter.com do NOT create MC tasks. They are handled exclusively by tldr-briefing.js daemon.

// line 126
const TLDR_SENDER = 'dan@tldrnewsletter.com';

// line 474
if (lowerFrom.includes(TLDR_SENDER)) {
  return { category: 'TLDR_SKIP', priority: 'low', summary: 'TLDR newsletter — handled by tldr-briefing.js', action: '' };
}

VIP Ordering

Classification priority (lines 464-481):

VIP: CEO/family → bypass ALL filters, force ACTION/high, skip Ollama
TLDR_SKIP: Newsletter → skip MC INTAKE, route to tldr-briefing.js
OWN: System emails → archive, no task
Other: Spam allowlist check → Ollama classification
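
The ordering matters because the first matching rule wins. A minimal sketch of that short-circuit order, with the helper functions injected so the example stays self-contained; the `via` field, the OWN category label and its priority, and the helper signatures are assumptions made for illustration, not the shapes used in email-agent.js.

```javascript
const TLDR_SENDER = "dan@tldrnewsletter.com";

// helpers: { isVip, isOwnSystemEmail, classifyWithOllama } are hypothetical stand-ins.
function classify(from, helpers) {
  const lowerFrom = from.toLowerCase();
  // 1. VIP first: bypasses every later filter and never reaches Ollama.
  if (helpers.isVip(lowerFrom)) {
    return { category: "ACTION", priority: "high", via: "VIP" };
  }
  // 2. TLDR newsletter: handled by tldr-briefing.js, no MC task.
  if (lowerFrom.includes(TLDR_SENDER)) {
    return { category: "TLDR_SKIP", priority: "low", via: "TLDR_SKIP" };
  }
  // 3. ALAI's own system mail: archived, no task.
  if (helpers.isOwnSystemEmail(lowerFrom)) {
    return { category: "OWN", priority: "low", via: "OWN" };
  }
  // 4. Everything else goes to the allowlist check and LLM classification.
  return helpers.classifyWithOllama(lowerFrom);
}
```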

Edita PA Phases

Phase 0: --dry-run (Log-Only)

Classification + logging only. No archive, no escalate, no respond.

node ~/system/daemons/email-agent.js --dry-run

Phase 1: --allow-archive (CURRENT)

Archive low-priority emails only. Escalate and respond actions are held (logged but not executed).

node ~/system/daemons/email-agent.js --allow-archive

Plist config: com.john.email-agent calls email-agent-wrapper.sh, which passes no flags → defaults to Phase 1 (archive-only mode is internal default in daemon code).

Phase 2: Full Live (NOT YET APPROVED)

Archive + escalate + respond. Requires CEO explicit approval.

node ~/system/daemons/email-agent.js --allow-all
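
The three phases amount to a mapping from command-line flag to permitted actions. A sketch of that mapping, assuming flags are checked on the argument list; the flag names come from this page, while `allowedActions`, its return shape, and the rule that --dry-run wins when several flags are present are assumptions.

```javascript
// argv: the daemon's command-line arguments, e.g. process.argv.slice(2).
function allowedActions(argv) {
  // Phase 0: log only. Assumed to take precedence as the safest option.
  if (argv.includes("--dry-run")) {
    return { archive: false, escalate: false, respond: false };
  }
  // Phase 2: full live.
  if (argv.includes("--allow-all")) {
    return { archive: true, escalate: true, respond: true };
  }
  // Phase 1: --allow-archive, or no flag at all (the documented default).
  return { archive: true, escalate: false, respond: false };
}
```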

Unit Testing

Test classifier without IMAP/Vault dependencies:

node ~/system/daemons/test-email-classifier.js

Scenarios (16 total):

VIP bypass (alem@alai.no, CEO family)
TLDR_SKIP routing
OWN system emails (noreply@alai.no, sentinel@basicconsulting.no)
Spam patterns with allowlist exceptions (GitHub, Cloudflare, Anthropic)

Rollback

Revert to dry-run mode:

launchctl unload ~/Library/LaunchAgents/com.john.email-agent.plist

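# Edit the wrapper so its existing exec line passes --dry-run. Appending a second
# exec line with >> has no effect if the wrapper already execs the daemon, because
# exec replaces the shell and later lines never run. The exec line should read:
#   exec /opt/homebrew/bin/node "$HOME/system/daemons/email-agent.js" --dry-run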

launchctl load ~/Library/LaunchAgents/com.john.email-agent.plist

Monitoring

Logs: ~/system/logs/email-agent-launchd.log
Errors: ~/system/logs/email-agent-launchd-error.log
MC tasks: node ~/system/tools/mc.js list --owner edita
DLQ: Failed vault sessions stored in email-agent.js in-memory DLQ (logged only, no persistence)

Generated by Skillforge | ALAI, 2026

Contact Form Handlers

This section documents all contact forms across ALAI properties and their email delivery mechanisms.

alai.no Contact Form

Frontend: https://alai.no/contact (Cloudflare Pages)
Handler: CF Pages Function /functions/contact.js
Endpoint: POST https://alai.no/api/contact
Email provider: Resend API
Recipient: info@alai.no
Credentials: Bitwarden item "Resend API Key" → CF Pages env var RESEND_API_KEY
Status: LIVE (deployed 2026-04-21, MC #8587)

Test procedure:

curl -X POST https://alai.no/api/contact \
  -H "Content-Type: application/json" \
  -d '{"name": "Test User", "email": "test@example.com", "message": "E2E test 2026-04-21 14:00"}'

# Verify inbox:
himalaya search --account info-alai --folder INBOX "subject:Contact Form"

snowit.ba Contact Form

Frontend: https://snowit.ba/contact
Handler: BROKEN — Vercel API route not migrated to CF Pages (MC #8591)
Endpoint: POST https://api.basicconsulting.no/contact (hijacked by documenso-webhook, returns false success)
Recipient: info@snowit.ba (LumisCare side, not ALAI-managed)
Status: BROKEN — awaiting CodeCraft fix

getdrop.no Waitlist

Frontend: https://getdrop.no (Cloudflare Pages)
Handler: CF Pages Function /functions/waitlist.js
Endpoint: POST https://getdrop.no/api/waitlist
Storage: Cloudflare D1 database drop-waitlist
Email provider: None (DB-only storage, no email sent)
Status: LIVE

Test procedure:

wrangler d1 execute drop-waitlist --command "SELECT * FROM submissions ORDER BY created_at DESC LIMIT 5"

merdzanovic.ba Contact Form

Status: UNKNOWN — needs audit (likely same risk as snowit.ba)
MC Task: #8593 (audit all ALAI-managed contact forms)

Form Handler Migration Checklist

When migrating sites from Vercel/Netlify to Cloudflare Pages:

Inventory: Identify all POST endpoints (forms, webhooks, API routes)
Port handlers: Rewrite Vercel API routes as CF Pages Functions (/functions/*.js)
Environment variables: Copy SMTP/API credentials to CF Pages env vars
Update form actions: Change form targets to new CF Pages routes (e.g., /api/contact)
E2E test: Follow Forms E2E Testing Protocol (HTTP + inbox check MANDATORY)
Monitor: Check inbox/DB for 24 hours post-migration to catch silent failures

Reference incident: 2026-04-21 alai.no Contact Form Failure

Himalaya IMAP Setup (for Form Testing)

Himalaya CLI provides rapid inbox verification without browser login.

Install

brew install himalaya

Configure Account

Add to ~/.config/himalaya/config.toml:

[accounts.info-alai]
default = false
email = "info@alai.no"
display-name = "ALAI Info"

[accounts.info-alai.imap]
host = "imap.one.com"
port = 993
encryption = "tls"
login = "info@alai.no"
passwd.cmd = "bw get password 'Email - info@alai.no' --session $(cat /tmp/bw-session)"

[accounts.info-alai.smtp]
host = "send.one.com"
port = 587
encryption = "start-tls"
login = "info@alai.no"
passwd.cmd = "bw get password 'Email - info@alai.no' --session $(cat /tmp/bw-session)"

Usage

# Unlock Bitwarden first
bw unlock --raw > /tmp/bw-session

# List recent messages
himalaya list --account info-alai --folder INBOX --page-size 20

# Search for form submissions
himalaya search --account info-alai --folder INBOX "from:noreply@alai.no"

# Search by date range
himalaya search --account info-alai --folder INBOX "since:2026-04-21"

Credentials: Bitwarden item "Email - info@alai.no"

Updated: 2026-04-21 | Skillforge

Ollama Fleet Architecture

Overview

ALAI operates a two-node Ollama fleet: ANVIL (local dev Mac) and FORGE (Ubuntu 22.04 GPU workstation). ANVIL handles triage workloads (email, TLDR, quick classification), FORGE handles heavy inference (32B+ models, RAG pipelines).

ANVIL Ollama Configuration

Capacity Limits

MAX_LOADED_MODELS: 2 (prevents RAM exhaustion)
KEEP_ALIVE: 30s (default for on-demand models)
Hardware: M1 Pro, 32GB RAM, 5GB reserved for triage model

LaunchAgent: com.alai.ollama-serve-v2

Label: com.alai.ollama-serve-v2
Plist: ~/Library/LaunchAgents/com.alai.ollama-serve-v2.plist
Port: 11434
Environment:
  OLLAMA_FLASH_ATTENTION=1
  OLLAMA_KV_CACHE_TYPE=q8_0
  OLLAMA_MAX_LOADED_MODELS=2
  OLLAMA_KEEP_ALIVE=30s

Triage Preload Pattern

MC #8477 — Prevent qwen2.5-coder:32b (23GB) from blocking email/TLDR triage.

Strategy

Preload llama3.1:8b with keep_alive=-1 (indefinite) so it's always resident for fast triage operations. 5GB footprint.

LaunchAgent: com.john.ollama-triage-preload

Label: com.john.ollama-triage-preload
Script: ~/system/tools/ollama-triage-preload.sh
Trigger: RunAtLoad + StartInterval 300s (every 5 min)
Log: ~/system/logs/ollama-triage-preload-stdout.log

Script Logic (ollama-triage-preload.sh)

Check if llama3.1:8b is already loaded via /api/ps
If not loaded, send minimal prompt with keep_alive=-1
Log success/skip
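
The check-then-preload logic above can be sketched in JavaScript. This is an illustration, not a translation of ollama-triage-preload.sh: `needsPreload` and `preloadRequestBody` are hypothetical names, and the sketch assumes the /api/ps response has the shape { models: [{ name: ... }] }.

```javascript
// psResponse: parsed JSON from GET /api/ps.
function needsPreload(psResponse, model = "llama3.1:8b") {
  const loaded = (psResponse.models || []).map((m) => m.name);
  return !loaded.includes(model);
}

// Body for POST /api/generate: a one-token prompt that pins the model in memory.
function preloadRequestBody(model = "llama3.1:8b") {
  return JSON.stringify({
    model,
    prompt: "ready",
    stream: false,
    keep_alive: -1, // keep the model resident indefinitely
    options: { num_predict: 1 },
  });
}
```

Building the body with JSON.stringify avoids the quote escaping needed when the same payload is written inline in a shell string.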

curl -sf -X POST "$OLLAMA_URL/api/generate" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"llama3.1:8b\",
    \"prompt\": \"ready\",
    \"stream\": false,
    \"keep_alive\": -1,
    \"options\": {
      \"num_predict\": 1
    }
  }"

Model Tier System

Tier	Model	Size	Use Case	Keep Alive	Node
Triage	llama3.1:8b	5GB	Email classification, TLDR summarization, quick routing	-1 (indefinite)	ANVIL
Heavy	qwen2.5-coder:32b	23GB	Code generation, architecture review, complex reasoning	30s (on-demand)	ANVIL
Primary	devstral:24b	~15GB	Agent orchestration, planning, context routing	300s	FORGE

FORGE Failover

Consumers (email-agent.js, tldr-briefing.js, YouTube daemon) can set FORGE_FIRST=0 environment variable to skip FORGE and use ANVIL directly.

# Force ANVIL-only
export FORGE_FIRST=0
node ~/system/daemons/youtube-daemon.js

Default behavior: Try FORGE (10.0.0.2:11434), fallback to ANVIL (localhost:11434) on timeout.
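
The host selection described above can be sketched as two small functions. The URLs and the FORGE_FIRST variable come from this page; `ollamaHostOrder`, `firstReachable`, and the injected `probe` function are hypothetical and only illustrate the order of attempts.

```javascript
const FORGE_URL = "http://10.0.0.2:11434";
const ANVIL_URL = "http://localhost:11434";

// FORGE_FIRST=0 skips FORGE entirely; anything else keeps the default order.
function ollamaHostOrder(env) {
  return env.FORGE_FIRST === "0" ? [ANVIL_URL] : [FORGE_URL, ANVIL_URL];
}

// probe(url) is a caller-supplied async function that rejects on timeout or error.
async function firstReachable(hosts, probe) {
  for (const url of hosts) {
    try {
      await probe(url);
      return url;
    } catch (err) {
      // This host failed; fall through to the next one.
    }
  }
  throw new Error("No Ollama host reachable");
}
```

A consumer would call ollamaHostOrder(process.env) and pass the result to firstReachable with its own timeout-aware probe.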

Vault-Keeper Watchdog (MC #8471 — PENDING)

Monitors ~/system/.cache/vault-keeper-heartbeat file. If stale > 1 hour, SENTINEL alerts.

Implementation

LaunchAgent: com.john.vault-keeper-watchdog
Interval: 600s (10 min)
Script: ~/system/daemons/vault-keeper-watchdog.sh
Alert: Slack #sentinel-alerts

Logic

Read heartbeat file timestamp
Compare with current time
If > 3600s, send SENTINEL alert with vault-keeper logs
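
The staleness test reduces to comparing the heartbeat file's modification time with the current time. A minimal sketch with timestamps passed in as milliseconds, so it can be exercised without touching the filesystem; `heartbeatAgeSeconds` and `isHeartbeatStale` are hypothetical names, and the planned watchdog itself is a shell script.

```javascript
// Age of the heartbeat in whole seconds; clock skew into the future counts as 0.
function heartbeatAgeSeconds(mtimeMs, nowMs) {
  return Math.max(0, Math.floor((nowMs - mtimeMs) / 1000));
}

// Stale means strictly older than the limit (3600s on this page).
function isHeartbeatStale(mtimeMs, nowMs, maxAgeSeconds = 3600) {
  return heartbeatAgeSeconds(mtimeMs, nowMs) > maxAgeSeconds;
}
```

With a 600s check interval, a heartbeat that stops updating is reported between 60 and 70 minutes after the last write.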

YouTube Daemon Lesson (MC #8472)

Log redirection corruption: tee + subshell arithmetic capture caused output mangling.

Anti-Pattern

# WRONG — tee inside $() breaks arithmetic
NEW_COUNT=$(node ~/system/daemons/youtube-processor.js | tee -a "$LOG")

Correct Pattern

# RIGHT — separate logging stream
node ~/system/daemons/youtube-processor.js >> "$LOG" 2>&1

LaunchAgent Duplication

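Never use both KeepAlive and StartInterval in the same plist. With KeepAlive set to true, launchd relaunches the job as soon as it exits, so a script meant to run once per interval ends up running back to back and StartInterval no longer controls the schedule.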

# WRONG
<key>KeepAlive</key>
<true/>
<key>StartInterval</key>
<integer>3600</integer>

# RIGHT (pick one)
<key>StartInterval</key>
<integer>3600</integer>

Fleet Monitoring

ANVIL

curl http://localhost:11434/api/ps
curl http://localhost:11434/api/tags
tail -f ~/system/logs/ollama-triage-preload-stdout.log

FORGE

curl http://10.0.0.2:11434/api/ps
ssh forge "tail -f /var/log/ollama.log"

Mission Control

node ~/system/tools/mc.js list --tag ollama
node ~/system/tools/cost-tracker.js summary --service ollama

Static Hosting Migration — Progress Log

MC: #8523 (tracking), #8482 (basicconsulting.no), #8489 (bilko.io) | Date: 2026-04-20

Overview

ALAI is migrating 8 static sites from Vercel/Azure VM to Cloudflare Pages for cost savings (€0 vs €12-14/mo), operational simplification, and DDoS/WAF coverage. See full blueprint at ~/system/specs/ALAI-STATIC-HOSTING-BLUEPRINT.md.

Migration Log

Date	Domain	From	To	Downtime	TTFB Before	TTFB After	Notes
2026-04-20	basicconsulting.no	Vercel (76.76.21.21)	CF Pages	~60s	114ms	51ms (warm avg)	MC #8482. DNS: A→CNAME. Validation required domain re-add. TTFB improved 55%. Proveo pilot validated #8490.
2026-04-20	bilko.io	one.com (down)	CF Pages	N/A (site was down)	N/A	68ms (warm avg)	MC #8489. Apex CNAME not possible on one.com free tier (paid feature). Switched to Cloudflare NS (ana.ns.cloudflare.com, bob.ns.cloudflare.com). CF Pages zone ID: `62d89b79f0648d3fa1d045335a989ea7`. DNS: CNAME flattening bilko.io → bilko-io.pages.dev (proxied), www → bilko-io.pages.dev.

Paused Migrations

MC #8483 — basicfakta.no

Reason: Inventory error. Site has serverless functions (Vercel Edge), not pure static. Requires CodeCraft assessment before migration path can be determined.

MC #8484 — snowit.no

Reason: Inventory error. Site has API routes (Next.js), not pure static. Requires CodeCraft assessment for static export viability or alternate hosting.

Audit Verdict: bilko-demo.alai.no (MC #8486)

Decision: Stays on GCP Cloud Run. Not eligible for CF Pages migration.

Reason: Full-stack Next.js app with dynamic API routes and server-side rendering. Static export would break functionality. Current platform (Cloud Run) is correct fit.

Lessons Learned

one.com Apex CNAME Limitation

one.com free tier does NOT support apex CNAME (requires paid plan). For domains registered at one.com, the migration path is:

Switch nameservers to Cloudflare (ana.ns.cloudflare.com, bob.ns.cloudflare.com)
Import DNS records via Cloudflare zone scan
Set up CNAME flattening in Cloudflare (apex → CF Pages project, proxied)

Propagation time: 15 minutes to 4 hours for .no domains.

Inventory Validation Pre-Migration

Before scheduling a migration, verify the site is truly static:

Check for pages/api/ or app/api/ directories (Next.js API routes)
Check for Vercel Edge Functions (middleware.ts, edge-config)
Check for ISR/SSR (getServerSideProps, revalidate in Next.js)
Run npm run build and verify output is out/ or dist/ (static export)

If any of the above exist, the site is NOT static and requires CodeCraft review.
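
The first three checks can be automated against a list of repository files. This is a rough sketch rather than a complete audit: `findDynamicFeatures` is a hypothetical helper, the regular expressions are simple heuristics, and the build-output check still needs to be run by hand.

```javascript
// paths: repo-relative file paths.
// sources: optional map of path -> file text, used only for the SSR/ISR check.
function findDynamicFeatures(paths, sources = {}) {
  const reasons = [];
  if (paths.some((p) => /(^|\/)(pages|app)\/api\//.test(p))) {
    reasons.push("API routes");
  }
  if (paths.some((p) => /(^|\/)middleware\.(ts|js)$/.test(p))) {
    reasons.push("middleware / edge functions");
  }
  if (Object.values(sources).some((src) => /getServerSideProps|\brevalidate\b/.test(src))) {
    reasons.push("SSR/ISR");
  }
  return reasons; // an empty array means no dynamic features were detected
}
```

A non-empty result is a reason to pause the migration and request a CodeCraft review; an empty result is not proof that the site is static.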

TTFB Improvements

In the basicconsulting.no migration above, Cloudflare Pages with CDN caching (orange-cloud proxy) cut warm TTFB from 114ms to 51ms, about 55% lower than on Vercel. Treat this as a single-site measurement rather than a general benchmark. Cold start overhead was negligible.

Remaining Migrations

Domain	Current Host	Status	MC Task
alai.no	CF Pages	✅ Complete (already on target platform)	N/A
basicconsulting.no	CF Pages	✅ Complete (2026-04-20)	#8482
bilko.io	CF Pages	✅ Complete (2026-04-20)	#8489
basicfakta.no	Vercel	⏸ Paused (serverless functions found)	#8483
snowit.no	Vercel	⏸ Paused (API routes found)	#8484
getdrop.no	Azure VM	🔄 Pending (DNS on Vercel, move to CF)	#8485
kenyhot.pro	Vercel	🔄 Pending (coordinate with client)	#8487
merdzanovic.ba	Vercel	🔄 Pending (coordinate with client)	#8488

DNS Consolidation Status

Domain	Registrar	Current NS	Target NS	Status
alai.no	one.com	Cloudflare	Cloudflare	✅ Done
basicconsulting.no	one.com	Cloudflare	Cloudflare	✅ Done
bilko.io	one.com	Cloudflare	Cloudflare	✅ Done (2026-04-20)
getdrop.no	one.com	Vercel	Cloudflare	🔄 Pending
basicfakta.no	one.com	Vercel	Cloudflare	🔄 Pending
snowit.no	one.com	Unknown	Cloudflare	🔄 Pending

ANVIL DR Bootstrap Runbook (Mac Air)

When to use

This runbook is for recovering the ALAI AI factory infrastructure when:

ANVIL (Mac Studio, 100.103.49.98) is dead, stolen, or inaccessible
Hardware failure requiring complete rebuild on new Mac
Setting up FORGE (disaster recovery clone) on fresh hardware
Provisioning a new MacBook Air for Alem with minimal AI factory capabilities

SPOF Context: As of 2026-04-20, ANVIL is the single Mac Studio hosting 112 LaunchAgent daemons, 68 SQLite databases (litestream-replicated), Ollama (8 models), and the entire ~/system + ~/.claude infrastructure. This runbook enables recovery to any fresh Mac with admin access.

Prerequisites

Before starting bootstrap, ensure you have:

Fresh Mac with admin account (macOS Sonoma or later, Apple Silicon preferred)
Tailscale app installed + logged into alembasic@ tailnet (download from tailscale.com/download)
GitHub account with read access to:
- github.com/johnatbasicas/clawd (~/system repo, auto-backup branch)
- github.com/johnatbasicas/claude-config (~/.claude repo)
Bitwarden account unlocked with master password ready (Alem's personal vault: alembasic@gmail.com)
Internet connection (stable, for 2-3 GB of Homebrew packages + Ollama models)

Step-by-step Bootstrap

Phase 1: Foundation

1. Install Xcode Command Line Tools

xcode-select --install

Expected: GUI dialog appears. Click "Install" and wait 5-10 minutes. Verify with:

xcode-select -p
# Should output: /Library/Developer/CommandLineTools

2. Install Homebrew

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Expected: Homebrew installs to /opt/homebrew. Add to shell profile:

echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"

# Verify:
brew --version
# Should show: Homebrew 4.x.x

3. Install Bitwarden CLI + unlock vault

brew install bitwarden-cli jq  # jq is needed for the verify step below

# Unlock vault (enter master password when prompted):
bw login alembasic@gmail.com
export BW_SESSION=$(bw unlock --raw)

# Verify:
bw status | jq .status
# Should show: "unlocked"

Note: Keep this terminal window open. BW_SESSION is needed for bootstrap script.

Phase 2: Clone Infrastructure Repos

4. Clone ~/system (clawd repo)

# If using SSH (recommended if SSH keys already set up):
git clone git@github.com:johnatbasicas/clawd.git ~/system

# OR if using HTTPS with GitHub PAT:
git clone https://github.com/johnatbasicas/clawd.git ~/system

# Switch to auto-backup branch (contains latest portability artifacts):
cd ~/system
git checkout auto-backup
git pull

Expected:

ls ~/system/
# Should show: Brewfile, bootstrap.sh, config/, databases/, tools/, etc.

5. Clone ~/.claude (claude-config repo)

git clone git@github.com:johnatbasicas/claude-config.git ~/.claude

# Verify:
ls ~/.claude/
# Should show: CLAUDE.md, hooks/, agents/, skills/, projects/

Phase 3: Run Bootstrap Script

6. Execute bootstrap (with BW_SESSION active)

cd ~/system
bash bootstrap.sh workstation

Role options:

anvil: Full primary node (all daemons, Ollama, heavy workloads)
forge: DR clone (continuous restore from Azure, lighter load)
workstation: Minimal setup (SSH relay to ANVIL for heavy ops)

What the script does:

Re-checks Xcode CLT + Homebrew (idempotent)
Installs ~70 brew packages from Brewfile (15-30 min depending on connection)
Copies 112 LaunchAgent plists from ~/system/config/launchagents/ to ~/Library/LaunchAgents/
Rehydrates BW:<item> placeholders in plists by calling bw get password <item>
Loads all LaunchAgents via launchctl bootstrap
Verifies core services (Ollama, litestream)
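
The rehydration step above can be sketched in a few lines. This is an illustration under stated assumptions, not the logic in bootstrap.sh: the names rehydrate and escapeXml are hypothetical, each placeholder is assumed to be the entire value of a <string> element, and the lookup callback stands in for bw get password <item>.

```javascript
// Hypothetical sketch of BW:<item> placeholder rehydration (not the real bootstrap.sh logic).

// Escape characters that would otherwise produce malformed plist XML.
function escapeXml(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Replace <string>BW:item</string> values using lookup(item).
// lookup returns the secret, or null/undefined when the vault item is missing.
function rehydrate(plistXml, lookup) {
  const missing = [];
  const xml = plistXml.replace(
    /<string>BW:([A-Za-z0-9._-]+)<\/string>/g,
    (match, item) => {
      const secret = lookup(item);
      if (secret == null) {
        missing.push(item); // keep the placeholder so it can be fixed by hand
        return match;
      }
      return `<string>${escapeXml(secret)}</string>`;
    }
  );
  return { xml, missing };
}
```

Escaping matters because a secret containing & or < would otherwise leave the plist malformed. Items the lookup cannot resolve come back in missing, which corresponds to the WARN lines described below.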

Expected output (tail of bootstrap.log):

[bootstrap] Bootstrap COMPLETE. Next steps:
[bootstrap]   - Verify SSH: ssh makinja@100.103.49.98
[bootstrap]   - Check MC: node ~/system/tools/mc.js list
[bootstrap]   - Log: /Users/makinja/bootstrap.log

LaunchAgents loaded: 112
Ollama models available: 8
Litestream: RUNNING

If BW rehydration fails: You'll see warnings like:

WARN: Bitwarden item 'groq-api-key' not found — com.alai.groq-model-benchmark.plist will need manual fix

Fix manually after bootstrap completes (see Troubleshooting section).

Phase 4: Database Restore (if DBs lost/corrupt)

When to run: Only if ~/system/databases/ is empty or you need to restore from Azure backups (e.g., ANVIL disk died).

7. Set Azure auth environment variables

export AZURE_CLIENT_ID="1a0b3018-0c31-474b-918f-531b0a29a669"
export AZURE_CLIENT_SECRET=$(bw get password alai-backup-writer-secret)
export AZURE_TENANT_ID="cd0a7929-1d14-4f81-820d-b36e45f72cf7"

8. Restore P0 critical databases

mkdir -p ~/system/databases

# Mission Control:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/mission-control.db

# HiveMind:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/hivemind.db

# Tasks:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/tasks.db

# Costs:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/costs.db

# Events:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/events.db

9. Restore P0 financial databases

# Fiken (accounting cache):
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/fiken.db

# Invoices:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/invoices.db

# Contracts:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/contracts.db

# Leads:
litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/leads.db

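Note: The -if-replica-exists flag makes litestream exit with code 0 when no backup exists for that DB; it does not compare local and remote versions. litestream restore refuses to write over an existing database file, so to force a restore, move the local file aside first. Add -if-db-not-exists to skip DBs that are already present without raising an error.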

Bulk restore all 68 DBs (if needed):

for db in mission-control hivemind tasks costs events fiken invoices contracts leads \
          orchestrator-queue orchestrator-workers durable-runner session-index knowledge \
          emails email-inbox alem-directives agent-routing bee-index companies contacts \
          deploy-registry design-reviews distill documents drafts drift email-audit \
          email-briefing email-index email-tracking escalations facts flywheel goals \
          guardrails-audit health-events hivemind-archive master-control mc minions \
          observability orchestrator-events pipeline projects routing-outcomes skill-improvements \
          skill-registry sprint-pipeline strategy-tracker teams tenders tickets tool-audit \
          tool-registry trace-events applications-tracker baikal-caldav prompt-cache \
          prompt-metrics semantic-reuse-index stbs telemetry token-cost usage vcr bih-tenders browser-tasks; do
  echo "Restoring $db..."
  litestream restore -config ~/system/config/litestream.yml -if-replica-exists ~/system/databases/$db.db || echo "WARN: $db restore failed or skipped"
done

Verify restores:

ls -lh ~/system/databases/*.db | wc -l
# Should show: 68 (or close, depending on which DBs had replicas)

# Check specific DB integrity:
sqlite3 ~/system/databases/mission-control.db "PRAGMA integrity_check;"
# Should output: ok
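
A file count alone does not say which DBs are absent. The sketch below compares the names you expected to restore with the files actually present. The helper name missingDatabases is hypothetical, and the file list is assumed to come from something like fs.readdirSync on ~/system/databases.

```javascript
// Hypothetical helper: which expected databases have no matching .db file?
function missingDatabases(expectedNames, presentFiles) {
  const present = new Set(
    presentFiles
      .filter((file) => file.endsWith('.db')) // ignores -wal/-shm sidecars and other files
      .map((file) => file.slice(0, -'.db'.length))
  );
  return expectedNames.filter((name) => !present.has(name));
}

// Example with an in-memory listing:
const stillMissing = missingDatabases(
  ['mission-control', 'hivemind', 'tasks'],
  ['mission-control.db', 'tasks.db', 'tasks.db-wal']
);
console.log(stillMissing);
```

A non-empty result is not necessarily a failure: DBs that never had a replica are skipped by the restore loop and will show up here.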

Bitwarden Items Required

The following Bitwarden vault items MUST exist in Alem's vault before running bootstrap. These are referenced as BW:<item> placeholders in LaunchAgent plists:

Item Name	Used By	Purpose
`alai-backup-writer-secret`	litestream, Azure backups	Azure SP client secret for Storage Blob write access
`cf-access-client-secret`	BookStack sync, CF-protected APIs	Cloudflare Access client secret for docs.basicconsulting.no
`groq-api-key`	Groq model benchmark daemon	Groq API key for LLM model testing
`slack-app-token`	Slack integration	Slack app-level token for socket mode
`slack-bot-token`	Slack integration	Slack bot user OAuth token (xoxb-...)

How to verify items exist:

bw get item alai-backup-writer-secret --session $BW_SESSION
bw get item cf-access-client-secret --session $BW_SESSION
bw get item groq-api-key --session $BW_SESSION
bw get item slack-app-token --session $BW_SESSION
bw get item slack-bot-token --session $BW_SESSION

If missing: Contact Alem or check Vaultwarden (https://vault.basicconsulting.no) for backup credentials. These secrets are also in ANVIL's Keychain if ANVIL is still accessible.

Post-Bootstrap Verification

10. Check LaunchAgents loaded

launchctl list | grep -E "com\.(alai|john)\." | wc -l
# Expected: ~110-112 (depending on role)
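
The count says how many agents are loaded, not whether they are healthy. As a rough sketch, the three-column output of launchctl list (PID, last exit status, label) can be summarised as below. The function name summariseAgents is hypothetical, and a non-zero last exit status is only a hint to look at the agent, not proof of a fault.

```javascript
// Hypothetical helper: summarise `launchctl list` output for com.alai.* / com.john.* agents.
function summariseAgents(launchctlOutput) {
  const agents = [];
  for (const line of launchctlOutput.split('\n')) {
    const [pid, status, label] = line.trim().split(/\s+/);
    // Skips the header row, blank lines, and agents from other vendors.
    if (!label || !/^com\.(alai|john)\./.test(label)) continue;
    agents.push({ label, running: pid !== '-', lastExit: Number(status) });
  }
  return {
    loaded: agents.length,
    failed: agents.filter((agent) => agent.lastExit !== 0).map((agent) => agent.label),
  };
}
```

Feeding it the captured output of launchctl list gives both the loaded count and a short list of labels worth inspecting with launchctl print.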

11. Verify Ollama running

curl -s http://localhost:11434/api/tags | jq -r '.models[].name'
# Expected (ANVIL): qwen2.5-coder:32b, llama3.3, deepseek-r1, etc.
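
To check the list against the models a role needs, rather than by eye, a small comparison helps. In this sketch the helper names are hypothetical, the response shape ({ models: [{ name }] }) is the one the jq filter above relies on, and names without a tag are assumed to resolve to :latest, which is Ollama's default tag.

```javascript
// Hypothetical helper: report expected models absent from an /api/tags response.
function withDefaultTag(name) {
  return name.includes(':') ? name : `${name}:latest`;
}

function missingModels(tagsResponse, expected) {
  const installed = new Set((tagsResponse.models || []).map((model) => model.name));
  return expected.filter((name) => !installed.has(withDefaultTag(name)));
}

// Example with an in-memory response:
const sampleTags = { models: [{ name: 'qwen2.5-coder:32b' }, { name: 'mistral-small:latest' }] };
console.log(missingModels(sampleTags, ['qwen2.5-coder:32b', 'mistral-small', 'qwq:32b']));
```

On Node 18 or later the live response can be obtained with the built-in fetch, for example fetch('http://localhost:11434/api/tags').then((res) => res.json()).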

12. Verify litestream replication

pgrep -fl litestream
# Should show: litestream replicate -config /Users/makinja/system/config/litestream.yml

# Check logs:
tail -f ~/system/logs/litestream.log
# Should show periodic sync messages (every 1-30s depending on DB tier)

13. Test Mission Control

node ~/system/tools/mc.js stats
# Should show task counts, agents, recent activity

node ~/system/tools/mc.js list --limit 5
# Should show recent tasks

14. Test SSH to original ANVIL (if still alive)

ssh makinja@100.103.49.98 "hostname && uptime"
# Expected: ANVIL + uptime if machine is reachable

Troubleshooting

Error: "brew: command not found" after install

Cause: Homebrew not in PATH.

Fix:

eval "$(/opt/homebrew/bin/brew shellenv)"
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile

Error: "bw: command not found"

Cause: Bitwarden CLI not installed or not in PATH.

Fix:

brew install bitwarden-cli
hash -r  # Refresh shell PATH cache

LaunchAgent fails to load

Symptoms: launchctl bootstrap returns error code 119, 122, or 125.

Debug:

# Check specific agent status:
launchctl print gui/$(id -u)/com.alai.litestream
# Look for "state = waiting" or "last exit code"

# Check agent logs:
tail -f ~/system/logs/litestream.log
tail -f ~/Library/Logs/com.alai.*.log

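Common exit codes (launchctl error <code> prints launchd's own description of a code, which is worth checking against this list):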

119: Invalid plist XML (malformed after sed replacement)
122: Path not found (e.g., /Users/makinja hardcoded but new user is /Users/alem)
125: Permission denied (env var secret not readable)

Secret rehydration failed

Symptoms: Bootstrap log shows "WARN: Bitwarden item 'X' not found".

Fix manually:

# Get secret from Bitwarden:
SECRET=$(bw get password groq-api-key --session $BW_SESSION)

# Edit plist:
vi ~/Library/LaunchAgents/com.alai.groq-model-benchmark.plist

# Replace BW:groq-api-key with actual value in <string> tag

# Reload:
launchctl bootout gui/$(id -u)/com.alai.groq-model-benchmark
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.alai.groq-model-benchmark.plist

Hardcoded /Users/makinja path mismatch

Cause: LaunchAgent plists contain hardcoded paths to /Users/makinja, but new Mac has different username (e.g., /Users/alem).

Fix (bulk replace):

NEW_USER=$(whoami)
cd ~/Library/LaunchAgents

for plist in com.alai.*.plist com.john.*.plist; do
  sed -i.bak "s|/Users/makinja|/Users/$NEW_USER|g" "$plist"
done

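# Reload each agent by label. Do not run "launchctl bootout gui/$(id -u)" on its own:
# that tears down the entire login session. Assumes each label matches its plist filename.
for plist in com.alai.*.plist com.john.*.plist; do
  launchctl bootout "gui/$(id -u)/${plist%.plist}" 2>/dev/null
  launchctl bootstrap "gui/$(id -u)" "$HOME/Library/LaunchAgents/$plist"
done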

Ollama models missing

Cause: Fresh install has no models cached. Models are NOT in git repos (too large).

Fix (pull from Ollama registry):

ollama pull qwen2.5-coder:32b
ollama pull llama3.3:70b
ollama pull deepseek-r1:32b
ollama pull deepseek-r1:70b
ollama pull devstral:24b
ollama pull mistral-small
ollama pull llama3.2-vision:90b
ollama pull qwq:32b

# Verify:
ollama list

Expected download size: ~150 GB total for all models. This takes 2-6 hours on a good connection.

Database restore fails with "replica not found"

Cause: Azure credentials invalid, or DB was never replicated (new DB created after litestream setup).

Debug:

# Test Azure auth:
az login --service-principal \
  --username $AZURE_CLIENT_ID \
  --password $AZURE_CLIENT_SECRET \
  --tenant $AZURE_TENANT_ID

# List backups:
litestream snapshots -config ~/system/config/litestream.yml ~/system/databases/mission-control.db

# Should show timestamps of snapshots in Azure Blob Storage

If no snapshots: DB is new or replication was broken. Accept data loss or restore from other source (e.g., Time Machine if on ANVIL).

Known Limitations

Untested end-to-end: bootstrap.sh has NOT been tested on a completely fresh Mac. The code paths for the Xcode install prompt, the Homebrew first run, and the BW unlock flow follow best practices but are unverified in a real DR scenario.
User rename not handled: If new Mac username != "makinja", LaunchAgent plists will fail due to hardcoded /Users/makinja paths. Manual sed replacement required (see Troubleshooting).
npm install layer incomplete: ~/system/tools/ contains 1,310 scripts, some requiring npm install in subdirs. bootstrap.sh does NOT auto-install these deps. Expect some tools to fail until deps are installed manually.
Ollama models not in backup: Models are fetched from Ollama registry on first use. Expect 2-6 hour delay to repopulate model cache (~150 GB).
GitHub auth assumed: Script assumes SSH keys or PAT for GitHub already configured. If not, git clone will prompt interactively.
No Keychain sync: macOS Keychain items (SSH keys, app passwords, etc.) are NOT part of this backup. Alem must re-enter credentials for Mail.app, Calendar, etc.
No ~/felles or ~/Documents: User data directories are NOT backed up by this system. Rely on Time Machine or iCloud for personal files.

Testing Recommendations

Before trusting this runbook in a real disaster:

Spin up a fresh Mac VM (UTM or Parallels) with macOS Sonoma
Run through Steps 1-6 end-to-end without looking at ANVIL
Verify LaunchAgent load count matches expected (~112)
Verify DB restore works for at least mission-control.db and hivemind.db
Document any new errors or missing secrets in this runbook

Assigned to: Petter Graff (CodeCraft) — MC task #8534

Last updated: 2026-04-20 | MC Task: #8534 | Tags: status=draft-untested, type=runbook, severity=critical

Incident — 2026-04-21 alai.no Contact Form Failure

2026-04-21 — alai.no Contact Form Silent Failure

Incident Classification

Severity: HIGH — Silent data loss (potential lead loss)
Duration: 2026-04-19 19:00 → 2026-04-21 11:30 (40.5 hours)
Detection: Manual inspection via Himalaya IMAP client
Status: RESOLVED (form handler redeployed to CF Pages Functions)

Timeline

2026-04-19 19:00 — alai.no migrated from Vercel to Cloudflare Pages (MC #8576)
2026-04-19 19:00 → 2026-04-21 11:30 — Contact form submissions received HTTP 200 OK but no emails delivered
2026-04-21 11:30 — CEO (Alem) noticed no inquiry emails received in days, requested investigation
2026-04-21 11:35 — John inspected info@alai.no IMAP (via himalaya search --folder INBOX from:noreply) — zero messages from contact form
2026-04-21 11:45 — Root cause identified: CF Tunnel routing hijack + documenso-webhook false-positive response
2026-04-21 12:15 — CodeCraft dispatched to deploy dedicated contact handler as CF Pages Function (MC #8587)
2026-04-21 14:00 — Fix deployed and verified (E2E browser test + inbox check)

Impact Assessment

Lost inquiries: Unknown (no form submission logging). Estimated 0-5 potential leads during 40-hour window.
User experience: Users received "success" feedback but no confirmation email. No error notification.
Business risk: Medium — alai.no is not yet primary sales channel; minimal active marketing campaigns during incident window.

Root Cause Analysis

Technical Chain of Failure

alai.no contact form POSTs to https://api.basicconsulting.no/contact (hardcoded Vercel pattern from pre-migration code)
Cloudflare Tunnel ingress rule matches api.basicconsulting.no/* → routes ALL POST requests to localhost:3001
documenso-webhook.js listens on port 3001, designed for Documenso signature events
Webhook handler has catch-all route: app.post('/*', (req, res) => res.json({ok: true}))
Contact form receives HTTP 200 + {ok: true} → assumes success, displays "Message sent"
No email handler ever invoked → no SMTP call → no delivery
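
The weak link in this chain is the catch-all that answers 200 to anything. A minimal sketch of the alternative is shown below: the service answers only the routes it owns and fails loudly on everything else. The route path /webhook/documenso and the function name route are assumptions for illustration and are not taken from documenso-webhook.js.

```javascript
// Hypothetical router sketch: answer only the routes this service owns.
// '/webhook/documenso' is an assumed path, not the real one.
const OWNED_ROUTES = new Set(['POST /webhook/documenso']);

function route(method, path) {
  if (OWNED_ROUTES.has(`${method} ${path}`)) {
    return { status: 200, body: { ok: true } };
  }
  // Unknown paths return an error so a misrouted client cannot mistake it for success.
  return { status: 404, body: { ok: false, error: 'not found' } };
}

console.log(route('POST', '/contact').status);
```

With this shape, the misrouted contact form would have received a 404 and shown an error instead of "Message sent".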

Root Cause Categories

Architectural: Assumed serverless runtime (Vercel Functions) but deployed to static hosting (CF Pages) without serverless equivalent
Migration process: No pre-deployment checklist for "dynamic endpoints" (forms, APIs, webhooks)
Testing gap: No E2E validation of email delivery — only HTTP response validated (curl 200 != email delivered)
Monitoring gap: No alerting on zero-message rate for info@alai.no INBOX (expected rate: ~1-3/week)
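
The monitoring gap can be expressed as a simple silence check. The sketch below is an assumption about how such a check might look, not the alert that was later built: the function name inboxIsSilent and the 7-day default are illustrative, chosen to match the expected rate of roughly 1-3 messages per week.

```javascript
// Hypothetical check: has the inbox been silent for longer than the allowed gap?
const DAY_MS = 24 * 60 * 60 * 1000;

function inboxIsSilent(messageDates, now, maxGapDays = 7) {
  if (messageDates.length === 0) return true; // no mail at all counts as silent
  const newest = Math.max(...messageDates.map((date) => date.getTime()));
  return now.getTime() - newest > maxGapDays * DAY_MS;
}
```

At this message volume a quiet week can be legitimate, so a positive result is a prompt to send a test submission rather than proof of an outage.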

Detection Method

Manual IMAP inspection using Himalaya CLI:

himalaya search --account info@alai.no --folder INBOX "from:noreply" "since:2026-04-19"
# Result: No messages found

Lesson: HTTP 200 is NOT proof of delivery. Always verify end-to-end (inbox check, log inspection, user confirmation email).

Fix Summary

CodeCraft deployed /functions/contact.js as CF Pages Function
Handler uses Resend API (RESEND_API_KEY in Bitwarden → CF Pages env vars)
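Form target updated to https://alai.no/api/contact. CF Pages Functions map files under /functions/ directly to routes (functions/api/contact.js serves /api/contact); there is no implicit /api/ prefix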
Proveo validated: submit test form → received at info@alai.no within 5 seconds

MC Task: #8587
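
For orientation, the request such a handler sends to Resend might be built as follows. This is a sketch under assumptions, not the deployed contact handler: the function name buildResendRequest, the form field names (name, email, message), and the sender address are all hypothetical. The endpoint and the Bearer authorization header are Resend's documented REST interface.

```javascript
// Hypothetical sketch of the request a contact handler could send to Resend.
// The sender address and form field names are assumptions.
function buildResendRequest(form, apiKey) {
  for (const field of ['name', 'email', 'message']) {
    if (!form[field] || !String(form[field]).trim()) {
      throw new Error(`missing field: ${field}`);
    }
  }
  return {
    url: 'https://api.resend.com/emails',
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        from: 'noreply@alai.no', // assumed sender; must belong to a domain verified in Resend
        to: ['info@alai.no'],
        subject: `Contact form: ${form.name}`,
        text: `From: ${form.name} <${form.email}>\n\n${form.message}`,
      }),
    },
  };
}
```

The handler would pass url and init to fetch and treat any non-2xx response as a failure that is reported back to the form, so the user never sees "Message sent" unless Resend accepted the message.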

Lessons Learned

What Went Well

CEO noticed absence of expected emails (operational intuition)
Himalaya CLI provided rapid IMAP audit without browser login
Root cause identified within 15 minutes of investigation start

What Went Wrong

Migration checklist did NOT include "verify all POST endpoints have backend handlers"
No E2E test protocol for forms (HTTP 200 assumed sufficient)
No monitoring/alerting on email delivery rates (silent failure undetected for 40 hours)
Cloudflare Tunnel routing too broad (/* catch-all dangerous for multi-service proxy)

Prevention Actions

Action	Owner	MC Task	Status
Update site migration checklist: "Verify form handlers migrated"	Skillforge	#8587	DONE (this doc)
Create Forms E2E Testing Protocol (HTTP + inbox check required)	Skillforge	#8587	DONE (BookStack QA section)
Add Grafana alert: `info@alai.no` message rate < 1/week → notify #ops	FlowForge	#8588	OPEN
Audit all CF Tunnel ingress rules for overly-broad `/*` patterns	Securion	#8589	OPEN
Migrate snowit.ba contact form (same silent failure risk)	CodeCraft	#8591	OPEN
Add form submission logging to all contact handlers (track volume even if email fails)	CodeCraft	#8592	OPEN

snowit.ba contact form: Same root cause (Vercel pattern, no CF Pages handler). Bouncing to info@snowit.ba (LumisCare side, not ALAI). MC #8591 tracks.
getdrop.no waitlist: Already migrated correctly (CF Pages Function + D1 storage). No issue.

References

Email Pipeline Runbook
Forms E2E Testing Protocol (new)
Static Hosting Migration — Progress Log
Himalaya setup: ~/.config/himalaya/config.toml (info@alai.no IMAP credentials in Bitwarden)

Authored: 2026-04-21 | Owner: Skillforge | Reviewed: John

Incident Postmortem — Bilko Deploy Fix 2026-04-22

Date: 2026-04-22
Severity: High (CEO time wasted + security leak)
Status: Resolved
Type: Blameless Postmortem

Summary

A 2-hour bug fix sprint (MC tasks #8626, #8627, #8628) aimed at fixing 3 bugs in Bilko demo resulted in ZERO live changes reaching the production demo URL (bilko-demo.alai.no). All code changes were pushed to the wrong branch (feat/intesa-bih-demo instead of main), CI pipeline was silently broken for 7 days, and client-specific content (Intesa BiH pitch) leaked to the public demo URL.

Timeline (UTC+1)

Time	Event	Actor
2026-04-21 13:32	MC #8626 created (invoice template save button broken)	John
2026-04-21 13:33	MC #8627 created (invoice PDF download fails on unsaved invoice)	John
2026-04-21 13:33	MC #8628 created (settings logo upload missing)	John
2026-04-21 13:46	All 3 tasks marked ready_for_review (commit d408cc6 + 53fe1d6)	Brad Frost (Vizu)
2026-04-22 09:00	CEO: "The Bilko demo isn't updated, the bugs are still there"	Alem
2026-04-22 09:10	Discovery: All fixes pushed to feat/intesa-bih-demo (no CI on that branch)	John
2026-04-22 09:15	Verification via curl + git log: main unchanged, bilko-demo.alai.no serving old code	John
2026-04-22 09:36	MC #8678 created: /intesa-bridge leak discovered (HTTP 200 on public demo)	John
2026-04-22 10:00	CI investigation: Last 5 runs all failed (since 2026-04-15)	Kelsey (FlowForge)
2026-04-22 10:36	MC #8696 created: ZAKON PI2 Deploy Verification Protocol	John
2026-04-22 12:00	Manual deploy attempt: GitHub PAT missing workflow scope (can't trigger CI fix)	FlowForge
2026-04-22 12:50	Manual docker build + push (CEO hands off to FlowForge)	Alem + FlowForge
2026-04-22 21:41	MC #8730 done: fix-bugs-22apr deployed, all 4 evidence checks pass	FlowForge
2026-04-22 21:50	MC #8678 code fix pushed (66d2220): intesa routes deleted from main	Brad Frost

Impact

User-Facing

Bilko demo bugs: Persisted for 1 extra day (low severity — internal demo, no external users)
Intesa content leak: Unknown duration (potentially days) — BiH bank integration pitch content publicly accessible at /intesa-bridge on bilko-demo.alai.no

Internal

CEO time lost: ~2 hours (debugging + manual deploy)
Trust erosion: "Validation doesn't work" feedback, because John claimed done without verifying live state
CI health invisible: 7 days of broken deploys undetected

Root Causes (5 Failures)

1. Branch Assumption (No Pre-Flight Verification)

What happened: John inferred target branch from memory (assumed feat/intesa-bih-demo based on last session), dispatched builder without running curl -sI + git log to verify which branch serves bilko-demo.alai.no.

Why it matters: Wrong branch = wrong deploy target. All fixes landed on isolated feature branch with no CI and no domain mapping.

Prevention: ZAKON PI2 Check 2 — 4 pre-flight commands mandatory BEFORE code changes.

2. CI Broken for 7 Days Undetected

What happened: GitHub Actions workflow failing since 2026-04-15. No one noticed because:

No daily CI health check in boot.sh
Manual deploys used as workaround without logging CI status
gh run list not part of standard deploy checklist

Root cause:

GitHub Actions quota exhausted (monthly minutes limit)
--no-traffic flag on line 206 of gcp-deploy.yml prevents traffic promotion on existing services

Prevention: ZAKON PI2 Check 4 — gh run list --limit 5 before any push. If 5/5 = failure, STOP and fix CI first.
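
The "5/5 = failure, STOP" rule is easy to mechanise. The sketch below assumes the input is the parsed JSON from gh run list --limit 5 --json conclusion; the function name ciGate is hypothetical and this is not the check shipped in the pre-deploy hook.

```javascript
// Hypothetical gate for the "5/5 = failure, STOP" rule.
// Input: parsed JSON array from `gh run list --limit 5 --json conclusion`.
function ciGate(runs) {
  const failures = runs.filter((run) => run.conclusion === 'failure').length;
  if (runs.length > 0 && failures === runs.length) {
    return { ok: false, reason: `${failures}/${runs.length} recent runs failed: fix CI first` };
  }
  return { ok: true, reason: `${failures}/${runs.length} recent runs failed` };
}

console.log(ciGate([{ conclusion: 'failure' }, { conclusion: 'success' }]).ok); // true
```

A stricter variant could block on any failure in the most recent run; the threshold here simply mirrors the rule as written.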

3. Intesa Content Leaked to Public URL

What happened: Commit 13c2efb merged /intesa-bridge and /intesa-cockpit routes to main branch. These were pitch-specific features for Dženana Hardaga (Intesa BiH IT director) and should have remained isolated on bilko-intesa-demo Cloud Run service.

Why it matters: Client-specific content (including BiH bank integration mockups) publicly visible on generic demo. Potential NDA violation + confusing UX for non-Intesa visitors.

Prevention:

ZAKON PI2 Check 3 — Branch Purity CI check (.github/workflows/branch-purity.yml)
Client prefix registry in ~/system/rules/client-prefix-registry.md
Automated check blocks PR merge if intesa-*, corpint-*, etc. routes detected on main
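
The core of such a purity check is a prefix match on route segments. The following sketch is illustrative only: the function name findClientRoutes is hypothetical, the prefixes are the two named above, and the real format of client-prefix-registry.md is not shown in this document.

```javascript
// Hypothetical branch-purity check: flag client-specific routes headed for main.
function findClientRoutes(routePaths, clientPrefixes) {
  return routePaths.filter((path) =>
    path
      .split('/')
      .some((segment) => clientPrefixes.some((prefix) => segment.startsWith(`${prefix}-`)))
  );
}

const flagged = findClientRoutes(
  ['/invoices', '/intesa-bridge', '/settings/logo', '/intesa-cockpit'],
  ['intesa', 'corpint']
);
console.log(flagged);
```

A CI job would fail the PR whenever the returned list is non-empty for a merge targeting main.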

4. PAT Missing `workflow` Scope

What happened: GitHub Personal Access Token used for CI fixes lacked workflow scope. FlowForge couldn't push branch-purity.yml or fix gcp-deploy.yml via automation.

Why it matters: Blocked automated CI repair. Forced manual workarounds + CEO paste-copy anti-pattern.

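Prevention: ZAKON PI2 Check 6: gh auth status at session start. Verify that the repo, workflow, and write:packages scopes are present (write:packages is GitHub's name for the package write scope). The --show-token flag is not needed for this check and prints the token in clear text.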
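
The scope check can be automated by reading the "Token scopes:" line that gh auth status prints for classic tokens. This is a sketch under assumptions: the function name missingScopes is hypothetical, and the scope list uses GitHub's own names (repo, workflow, write:packages).

```javascript
// Hypothetical parser for the "Token scopes:" line printed by `gh auth status`.
function missingScopes(authStatusOutput, required) {
  const marker = 'Token scopes:';
  const line = authStatusOutput.split('\n').find((l) => l.includes(marker));
  if (!line) return [...required]; // no scopes line: treat every scope as unverified
  const granted = new Set(
    line
      .slice(line.indexOf(marker) + marker.length)
      .split(',')
      .map((scope) => scope.trim().replace(/^'|'$/g, '')) // newer gh versions quote each scope
      .filter(Boolean)
  );
  return required.filter((scope) => !granted.has(scope));
}
```

An empty result means all required scopes are present; anything else should stop the session before CI files are touched.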

5. Manual Paste-Copy Anti-Pattern

What happened: CEO built docker image locally, pasted output to John, who pasted to FlowForge agent. FlowForge took over from "image already built" state instead of owning full build→push→deploy flow.

Why it matters: Process fragmentation = more failure points. Agent can't verify build context, dockerfile, or .dockerignore changes if it didn't run the build.

Prevention: Always dispatch FlowForge BEFORE build step. Agent owns entire flow or none of it.

What Went Well

Kelsey persona diagnosis: FlowForge correctly identified --no-traffic flag as root cause within 10 minutes of investigation
ZAKON PI2 authored mid-incident: Turned incident into system improvement without waiting for postmortem
.dockerignore fix: Reduced build context from 4.1 GB to 50 MB (about 82× smaller, a ~98.8% reduction) during incident resolution
Evidence gate upheld: MC #8730 not marked done until curl + Playwright + revision checks passed
Blameless culture: No punishment for agents; root cause analysis focused on system gaps

Action Items

Action	Owner	MC Task	Deadline	Status
Sync ZAKON PI2 to BookStack	pi-orchestrator	#8718	2026-04-23	PAUSED
Create DEPLOY-MAP.md in Bilko repo	Skillforge	#8715	2026-04-23	DONE
Bake PI2 checks into pi-orchestrator v2	pi-orchestrator	#8696 (item 3)	2026-04-29	IN PROGRESS
Add pre-deploy hook (~/.claude/hooks/pre-deploy-check.sh)	pi-orchestrator	#8696 (item 4)	2026-04-29	DONE
Patch mc.js done with evidence gate for H-priority deploy tasks	pi-orchestrator	#8696 (item 5)	2026-04-29	DONE
Create client-prefix-registry.md	pi-orchestrator	#8696 (item 7)	2026-04-29	DONE
Fix GitHub Actions quota (upgrade plan or optimize workflows)	John	TBD	2026-05-01	OPEN
Remove --no-traffic flag from gcp-deploy.yml for existing services	FlowForge	TBD	2026-04-30	OPEN
Upgrade GitHub PAT with workflow scope	John	TBD	2026-04-25	OPEN
Weekly CEO audit of mc.js --ceo-override usage	John	#8696 (item 8)	Ongoing	OPEN

Lessons Learned

For John (Orchestrator)

Never infer deploy target from memory. Always run curl + git log + gh run list before dispatching builder.
CI health = system health. Broken CI for 7 days = broken deployment capability. Monitor actively.
Claim verification: "Task done" without live URL verification = hallucination. CEO was right: "validation doesn't work."

For Builder Agents (Brad Frost, Vizu)

Ready for review ≠ deployed. Code pushed to branch ≠ code live on target URL. Always verify deploy target match.
Client-specific routes: If building intesa-*, corpint-*, etc. — verify target branch is NOT main before merging.

For FlowForge (DevOps)

Own the full flow. If dispatched for deploy, own build→push→deploy→verify. Don't take over mid-stream from CEO paste-copy.
--no-traffic flag: Only use on first-ever deploy. Never on existing services (blocks traffic promotion).

System-Level

ZAKON PI2 works. All 5 root causes preventable with 6 hard checks. Enforce at agent level + hook level + MC gate level.
Evidence gates prevent false claims. mc.js enforcement (item 5 of #8696) blocks "done" without verification.json.
Blameless postmortems → system rules. This incident produced ZAKON PI2, DEPLOY-MAP.md standard, and client-prefix-registry. Net positive.

ZAKON PI2: ~/system/rules/zakon-pi2-deploy-verification.md (BookStack synced)
Client Prefix Registry: ~/system/rules/client-prefix-registry.md
Pre-Deploy Hook: ~/.claude/hooks/pre-deploy-check.sh
Feedback Log: ~/.claude/projects/-Users-makinja/memory/feedback_verify_deploy_target_before_code.md

Metrics

Incident duration: 32 hours (2026-04-21 13:46 → 2026-04-22 21:41)
CEO time lost: ~2 hours
Root causes identified: 5
New rules created: 4
MC tasks spawned: 11 (parent #8696 + 7 subtasks + 3 original bugs)
Lines of ZAKON PI2: 136
Evidence files generated: 11 (verification.json + 4 PNG + 6 TXT)

Follow-Up

Next review: 2026-04-29 (PI2 implementation deadline)
Owner: John
Success criteria: All 8 items in MC #8696 marked done + CI health green for 7 consecutive days

Postmortem by ALAI Skillforge, 2026-04-22
Credit: ALAI, 2026

pi-orchestrator: john H-task auto-pause root cause + #10063 reconcile

Created: 2026-05-02
MC References: #10063 (phantom fix), #10517 (true fix)
Daemon: com.john.pi-orchestrator (currently STOPPED, reactivation pending CEO Step 3)

Symptom

John's H-priority tasks were being auto-paused without user action. The pi-orchestrator daemon would intercept high-priority john tasks and route them through queueForHuman instead of executing them, creating a silent work-stoppage pattern.

Investigation Finding — Phantom Fix in MC #10063

MC #10063 (2026-04-XX) claimed to fix the auto-pause behavior by adding configuration flags:

skip_interactive_owners: ["john", "alem"]
interactive_grace_seconds: 300

Problem: These config keys were specified in the task's acceptance criteria and marked COMPLETE, but were never actually written to ~/system/config/pi-orchestrator-config.json.

Anti-pattern identified: "Proveo PASS but code doesn't match documentation" — the validation passed based on spec intent rather than verifying actual configuration state.

True Root Cause

The mechanism actually auto-pausing john H-tasks was a dead fallback block in ~/system/kernel/pi-orchestrator.js:

// Original lines 3409-3421 (13 lines, now removed)
if (!selectedTask) {
  // Fallback: check for john tasks
  const johnTask = execSync(
    'node ~/system/tools/mc.js next-task --owner john',
    { encoding: 'utf8' }
  ).trim();
  
  if (johnTask) {
    queueForHuman(johnTask);
    return null;
  }
}

When task selection failed (empty queue or filter mismatch), this fallback would:

Synchronously fetch the next john task via mc.js next-task --owner john
Queue it for human review via queueForHuman()
Return null, preventing execution

This created the observed auto-pause behavior regardless of the missing config flags.

Fix Applied — MC #10517

Date: 2026-05-02
Builder: Codecraft
Validator: Proveo

Changes:

Configuration reconciliation — Added missing flags to ~/system/config/pi-orchestrator-config.json at lines 93-94:
```
"skip_interactive_owners": ["john", "alem"],
"interactive_grace_seconds": 300
```

Dead fallback removal — Replaced the 13-line execSync fallback block in ~/system/kernel/pi-orchestrator.js (original lines 3409-3421) with a 4-line comment + null return:

// No fallback to john tasks — auto-pause removed per MC #10517.
// Configuration now controls interactive routing via skip_interactive_owners.
log('No task selected; returning null.');
return null;

Verification

Proveo validation: APPROVED 2026-05-02
Acceptance Criteria: 4/4 PASS

AC1: pi-orchestrator-config.json contains skip_interactive_owners: ["john", "alem"] ✅
AC2: pi-orchestrator-config.json contains interactive_grace_seconds: 300 ✅
AC3: Dead fallback block removed from pi-orchestrator.js (lines 3409-3421 replaced) ✅
AC4: No execSync call to mc.js next-task --owner john in the selection path ✅
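
AC1 and AC2 are the kind of check that should be run against the file itself. A minimal state check might look like the sketch below. The function name configMismatches is hypothetical; the expected keys and values are the ones listed in the acceptance criteria.

```javascript
// Hypothetical state check: which expected keys are absent or different in the parsed config?
function configMismatches(config, expected) {
  return Object.keys(expected).filter(
    (key) => JSON.stringify(config[key]) !== JSON.stringify(expected[key])
  );
}

const expectedFlags = {
  skip_interactive_owners: ['john', 'alem'],
  interactive_grace_seconds: 300,
};

// An empty config reproduces the #10063 situation: both flags are reported.
console.log(configMismatches({}, expectedFlags));
```

In practice the first argument would be JSON.parse of the contents of ~/system/config/pi-orchestrator-config.json, so the check reads actual state rather than the task description.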

Evidence:

Config diff: git diff ~/system/config/pi-orchestrator-config.json
Code diff: git diff ~/system/kernel/pi-orchestrator.js
No remaining queueForHuman calls in fallback path: grep -n "queueForHuman" ~/system/kernel/pi-orchestrator.js shows only intentional usage in interactive routing logic

Daemon State

Current state: com.john.pi-orchestrator is STOPPED (unloaded via launchctl unload).

Reactivation: Pending CEO Step 3 directive. DO NOT restart daemon until explicitly approved — this is part of a phased rollout to validate the fix does not introduce regression.

To check status:

launchctl list | grep pi-orchestrator
# Empty output = daemon not loaded

To restart (when authorized):

launchctl load ~/Library/LaunchAgents/com.john.pi-orchestrator.plist
tail -f ~/system/logs/pi-orchestrator.log

Cross-References

MC #10063: Original task claiming fix (phantom — config never written, Proveo validated spec not state)
MC #10517: True fix reconciling config + removing dead fallback (Proveo APPROVED 2026-05-02)
Related pattern: Feedback memo feedback_task_description_state_verify.md — agents must tool-verify state before writing it into MC descriptions or acceptance criteria

Lessons

Proveo must verify actual state, not spec intent. A config flag in the task description ≠ the flag exists in the file.
Dead code can be the true mechanism. The "fix" in #10063 was irrelevant because the real culprit was a fallback block that ran regardless of config.
A stopped daemon ≠ a verified fix. Stopping the daemon masked the symptom but didn't prove the fix. Reactivation under observation is the true test.

Generated by Skillforge for MC #10517 documentation deliverable. HiveMind sync pending.

Pi-Orchestrator: GOTCHA Fabrication Removal (MC #10549)

Context

Problem: Pi-orchestrator was auto-generating GOTCHA docs at two sites, bypassing ZAKON #25 quality gate (H/BLOCKER → /prompt-forge → /mehanik). Pi-orch is NOT the authority for /prompt-forge work.

The Two Sites Removed

Site 1: Pre-Spawn Auto-Gen (Step 4.55)

Fabricated GOTCHA before spawn so spawn-gate would permit task dispatch
Violated /prompt-forge exclusivity for H/BLOCKER tasks
REMOVED

Site 2: Post-Spawn Synthesis

Fabricated GOTCHA after agent ran, based on proof-of-work artifacts
Papered over agent omissions (agent's failure → pi-orch's rescue)
Rationale for removal: Agent omission IS agent failure; pi-orch should not mask it
REMOVED

Replacement Behavior

GOTCHA Missing Pre-Spawn

mc.js blocks task with reason: "awaiting_forge: GOTCHA doc missing — run /prompt-forge {id} first, then unblock"
Task stays blocked until human review unblocks

GOTCHA Missing Post-Spawn

mc.js blocks task with reason: "agent omitted GOTCHA file — needs /prompt-forge and human review"
Task stays blocked until human review unblocks

Status Note

mc.js does NOT have awaiting_forge as a first-class status, so the blocked status is used with reason-prefixed text. Future enhancement: add an awaiting_forge status (track in a separate MC task if scope warrants).
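
Until a first-class status exists, tooling has to recognise the convention by its prefix. The sketch below shows one way to do that. The function name isAwaitingForge and the task shape ({ status, reason }) are assumptions, since the mc.js task schema is not shown here.

```javascript
// Hypothetical helper around the reason-prefix convention; the task shape is assumed.
const AWAITING_FORGE_PREFIX = 'awaiting_forge:';

function isAwaitingForge(task) {
  return (
    task.status === 'blocked' &&
    String(task.reason || '').startsWith(AWAITING_FORGE_PREFIX)
  );
}

console.log(isAwaitingForge({ status: 'blocked', reason: 'awaiting_forge: GOTCHA doc missing' })); // true
```

Note that only the pre-spawn reason carries this prefix; the post-spawn reason text quoted above starts differently and would need its own match.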

Current State

Daemon STOPPED
Code lands cold
No production behavior change yet

Test Plan

7 tests at ~/system/tests/pi-orch-await-forge.test.js
23 regression tests at ~/system/tests/spawn-gate.test.js
Run: node --test ~/system/tests/pi-orch-await-forge.test.js

Change Genesis

Pi-orch hardening Talas 3 (parent thread #10043 reform)
Depends on α #10548 (Spawn Gate Node-Side Parity)

Cross-Reference

Last updated: 2026-05-04 | Part of pi-orch hardening Talas 3

Email-Agent Ingest Gap Postmortem (2026-05-23) — MC #101887

TL;DR

Email-agent.js silently dropped SEEN-flagged messages for 9+ days (2026-05-14 → 2026-05-23) due to HIMALAYA_DISABLED=1 forcing a fallback code path that filtered { seen: false }. This caused 17 missed messages across 5 accounts, including 2 paying-client-class emails (Asmir Merdžanović SEO work, cynthia.li medical contact). Fixed by replacing SEEN filter with date-range + DB dedup. Backfilled all missed messages, added audit tool, deployed hourly monitoring LaunchAgent.

Incident Timeline (UTC)

2026-05-14 → Newest alai/INBOX DB row before gap
2026-05-23 13:26 → Asmir Merdžanović email arrives at alai/INBOX uid=6, server already flags SEEN
2026-05-23 18:49 (CEST 20:49) → John boot detects DB:0 IMAP:1 gap during inbox-pending sweep
2026-05-23 ~21:00 → MC #101887 created, gate cleared, ST1-ST4 dispatched
2026-05-23 ~21:22 → ST3 backfill complete, 17 messages ingested
2026-05-23 ~21:26 → ST6 (this documentation) initiated

Root Cause

File: /Users/makinja/system/daemons/email-agent.js

Original code (lines 638-644, pre-fix): The fetchUnseenLegacy function used { seen: false } as its IMAP fetch filter, which translates to an IMAP SEARCH UNSEEN query. Any message already flagged \Seen on the server (e.g., by mobile client, webmail, or Outlook auto-marking) was invisible to this query.

const messages = client.fetch(
  { seen: false },  // ← PROBLEM: excludes SEEN messages
  { uid: true, envelope: true }
);

Trigger chain:

LaunchAgent plist /Users/makinja/Library/LaunchAgents/com.john.email-agent.plist sets HIMALAYA_DISABLED=1 as a hard-coded environment variable
This forces all accounts to fall back to fetchUnseenLegacy instead of the safer fetchAllRecent path (which was introduced in MC #6832 to solve exactly this class of problem)
When alem@alai.no is also accessed via a mobile/web client, incoming messages are auto-flagged \Seen before the daemon's next 5-minute cycle
The daemon runs every 5 minutes, sees 0 unseen, logs "alai: 0 unseen envelopes fetched", and continues with no alarm and no visibility

Why it went undetected: The daemon logs showed normal execution (no errors, no timeouts), just consistently 0 results for the alai account. The pattern looked like "no new email" rather than "email silently dropped."

Fixed code (lines 638-684, post-fix): Replaced { seen: false } with the date-range filter { since: sinceDate } plus DB deduplication via a UID set lookup:

// MC #101887 fix: SEEN filter caused 9-day gap. Switched to date-range + DB dedup.
const lookbackDays = parseInt(process.env.EMAIL_AGENT_LOOKBACK_DAYS || '7', 10);
const sinceDate = new Date(Date.now() - lookbackDays * 24 * 60 * 60 * 1000);

// Load existing UIDs for this account from DB to enable dedup
const db = emailInbox.getDb();
const existingUids = new Set(
  db.prepare("SELECT message_id FROM emails WHERE account = ?").all(boxLabel).map(r => {
    const m = r.message_id.match(/-uid-(\d+)$/);
    return m ? parseInt(m[1], 10) : null;
  }).filter(Boolean)
);

// Fetch envelopes only — date-range avoids SEEN-flag blind spot
const messages = client.fetch(
  { since: sinceDate },  // ← FIX: fetch all messages in date range
  { uid: true, envelope: true }
);

for await (const msg of messages) {
  // Dedup: skip if UID already in DB
  if (existingUids.has(msg.uid)) continue;
  // ... insert logic
}
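
The dedup step can be exercised without an IMAP server or database. The sketch below is a minimal illustration, assuming stored message_id values end in -uid-<n> as the regex in the fix implies; the helper names and the sample message_id strings are hypothetical and are not taken from email-agent.js.

```javascript
// Hypothetical helpers; only the "-uid-<n>" suffix convention comes from the fix above.

// Extract the numeric IMAP UID from a stored message_id such as "alai-INBOX-uid-6".
function uidFromMessageId(messageId) {
  const m = /-uid-(\d+)$/.exec(messageId);
  return m ? Number.parseInt(m[1], 10) : null;
}

// Compute the cutoff date used for the { since: ... } search.
function sinceDateFor(nowMs, lookbackDays) {
  return new Date(nowMs - lookbackDays * 24 * 60 * 60 * 1000);
}

// Return only the fetched messages whose UID is not already stored.
function selectNewMessages(storedMessageIds, fetchedMessages) {
  const existing = new Set(
    storedMessageIds.map(uidFromMessageId).filter((uid) => uid !== null)
  );
  return fetchedMessages.filter((msg) => !existing.has(msg.uid));
}

// Usage: uid 6 is already stored, so only uid 7 is new, regardless of the \Seen flag.
const fresh = selectNewMessages(
  ["alai-INBOX-uid-5", "alai-INBOX-uid-6"],
  [{ uid: 6, seen: true }, { uid: 7, seen: true }]
);
console.log(fresh.map((m) => m.uid)); // [ 7 ]
```

Because the selection depends only on UIDs already stored, re-running it over an overlapping date range returns nothing new, which is what makes the lookback window safe to repeat.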

Impact Assessment

Total missed: 17 messages across the 5 accounts in scope for the 30-day lookback window (all 17 fell in alai, dev, and john)
Paying-client-class misses:
- Asmir Merdžanović (asmirmc@gmail.com) — "Potrebne informacije." re: 2 new SEO clients (alai/INBOX uid=6, john/INBOX uid=134)
- cynthia.li@jamrmed.com (Shenzhen Jamr Medical) — "New contact-Shenzhen Jamr" (john/INBOX uid=114)
Informational/system misses: 13+ messages including Google Cloud alerts, TLDR newsletters, GitHub notifications, Cloudflare alerts
Duration:
- alai account: 9 days (2026-05-14 → 2026-05-23)
- alem account: 11+ days (2026-05-13 → ongoing, separate IMAP connection failure)
Accounts affected: alai (1 missed), dev (3 missed), john (13 missed); info/alem had no IMAP-side new messages in window (alem broken for separate reason)

Fix Applied

Code fix: ~/system/daemons/email-agent.js lines 638-725, replaced { seen: false } with { since: sinceDate } + DB dedup via UID set lookup (idempotent, safe for overlapping runs)
Backfill: 17 missed messages ingested via ~/system/tools/email-backfill-from-audit.js — used audit JSON as source of truth, patched subject/from metadata in 14 cases where IMAP envelope fetch failed (tool is idempotent, safe to re-run)
New audit tool: ~/system/tools/email-imap-db-audit.js — enumerates IMAP UIDs vs DB UIDs per account+folder for configurable N-day window, outputs JSON diff with missed UID samples
Monitoring LaunchAgent: ~/Library/LaunchAgents/com.alai.email-ingest-monitor.plist + wrapper ~/system/tools/email-ingest-monitor.sh — runs hourly, executes audit tool, fires Slack #exec alarm when total_missed > 0
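
The audit and alarm logic described above reduces to a set difference per mailbox plus a sum. The following is a minimal sketch under stated assumptions: the input and report shapes are hypothetical (the real JSON layout of email-imap-db-audit.js is not shown in this runbook); only the total_missed > 0 trigger comes from the text.

```javascript
// Hypothetical shapes; the real audit tool's JSON layout is not shown in this runbook.

// UIDs the IMAP server reports that are absent from the DB, in ascending order.
function diffUids(imapUids, dbUids) {
  const stored = new Set(dbUids);
  return imapUids.filter((uid) => !stored.has(uid)).sort((a, b) => a - b);
}

// Build a per-mailbox report plus a summary carrying total_missed.
function buildAuditReport(mailboxes) {
  const results = mailboxes.map(({ account, folder, imapUids, dbUids }) => {
    const missed = diffUids(imapUids, dbUids);
    return { account, folder, missed_count: missed.length, missed_uids: missed };
  });
  const totalMissed = results.reduce((sum, r) => sum + r.missed_count, 0);
  return { summary: { total_missed: totalMissed }, results };
}

// Monitor trigger condition from this postmortem: alarm when total_missed > 0.
function shouldAlarm(report) {
  return report.summary.total_missed > 0;
}

const report = buildAuditReport([
  { account: "alai", folder: "INBOX", imapUids: [1, 2, 6], dbUids: [1, 2] },
  { account: "info", folder: "INBOX", imapUids: [3], dbUids: [3] },
]);
console.log(report.summary.total_missed, shouldAlarm(report)); // 1 true
```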

Remaining Open Items (NOT yet fixed)

alem@alai.no IMAP connection broken since 2026-05-13 — credentials load OK from Vault, but server rejects connection with "Command failed" (no detailed error exposed by ImapFlow). Needs separate MC task for IMAP diagnostics + credential rotation test.
Monitor LaunchAgent NOT auto-loaded — file exists at correct path, but launchctl does not auto-load new plists without manual intervention. CEO must run: launchctl load -w ~/Library/LaunchAgents/com.alai.email-ingest-monitor.plist (permission constraint, cannot be automated without sudo/TCC access).
HIMALAYA_DISABLED env flag still active in com.john.email-agent.plist — the fix made fetchUnseenLegacy safe, but ideally the himalaya path should be vetted and re-enabled to reduce IMAP connection load.
3 john/INBOX uids (61, 69, 71) backfilled with placeholder metadata — IMAP fetchOne returned "Command failed" for envelope fetch, so subject/from are "(no subject)" / empty. These need separate IMAP range-fetch backfill to recover actual metadata from server.

Reproduction / Detection Commands

# Detect the gap
node ~/system/tools/email-imap-db-audit.js
jq .summary /tmp/alai/email-ingest-gap/imap-db-diff-30d.json

# Trigger monitor manually
launchctl kickstart -k gui/$(id -u)/com.alai.email-ingest-monitor

# Re-run backfill (idempotent)
node ~/system/tools/email-backfill-from-audit.js

# Check daemon status
launchctl list | grep email
tail -100 ~/system/logs/email-agent.log

# Test audit in verbose mode
node ~/system/tools/email-imap-db-audit.js --verbose

Lessons / Preventive Actions

Silent skips are P0: Any code path that filters IMAP results without an alarm when count drops to 0 unexpectedly = future incident. The daemon should have emitted a warning when alai account returned 0 unseen for >7 consecutive cycles (35+ minutes) given its historical delivery rate.
SEEN flag is not under our control: Any mobile/web client can pre-read messages and set \Seen before the daemon polls. The ingest pipeline must not assume UNSEEN = unread-by-us. Date-range + DB dedup is the only reliable pattern.
Audit > trust: ST2 audit revealed a 2nd unrelated paying-client miss (cynthia.li) we wouldn't have known about without full IMAP-vs-DB enumeration. Periodic audits should be part of email-agent health checks.
Fallback paths are production code: The fetchUnseenLegacy path was treated as a temporary fallback but ran in production for weeks/months with HIMALAYA_DISABLED=1. All fallback paths must have equal quality gates (logging, alarms, safety checks) as primary paths.
Monitoring must be fail-closed: The new monitor LaunchAgent is valuable, but it's not yet loaded (manual step required). For future daemons, the deploy checklist must verify LaunchAgent is loaded AND firing test alarms.
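
The first lesson (warn when an account returns 0 results for more than 7 consecutive cycles) could be sketched as a small per-account streak counter. This is an illustration only: the function names are hypothetical, and a fixed threshold is a simplification of "given its historical delivery rate".

```javascript
// Hypothetical names; the ">7 consecutive cycles" figure is from the lesson above.
function createZeroStreakTracker(maxZeroCycles) {
  const streaks = new Map(); // account name -> consecutive zero-result cycles

  // Record one poll cycle; returns a warning string once the streak exceeds the limit.
  return function recordCycle(account, fetchedCount) {
    if (fetchedCount > 0) {
      streaks.set(account, 0);
      return null;
    }
    const streak = (streaks.get(account) || 0) + 1;
    streaks.set(account, streak);
    return streak > maxZeroCycles
      ? `${account}: 0 messages for ${streak} consecutive cycles`
      : null;
  };
}

// Usage: eight empty 5-minute cycles in a row trip the warning on the eighth.
const record = createZeroStreakTracker(7);
let warning = null;
for (let cycle = 0; cycle < 8; cycle++) {
  warning = record("alai", 0);
}
console.log(warning); // alai: 0 messages for 8 consecutive cycles
```

A quiet mailbox would trip a fixed threshold often, so a real check would likely scale the limit per account; that tuning is outside this sketch.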

MC: #101887 (this fix), supersedes #101886
Triggering email evidence: /tmp/alai/john-boot-20260523T1441/asmir-search.log
RCA: /tmp/alai/email-ingest-gap/root-cause.md
Audit JSON: /tmp/alai/email-ingest-gap/imap-db-diff-30d.json
Backfill log: /tmp/alai/email-ingest-gap/backfill-run.log
Monitor runs: /tmp/alai/email-ingest-gap/monitor-runs.log
Code fix: ~/system/daemons/email-agent.js lines 638-725
Tools created:
- ~/system/tools/email-imap-db-audit.js (audit)
- ~/system/tools/email-backfill-from-audit.js (backfill)
- ~/system/tools/email-ingest-monitor.sh (monitor wrapper)
LaunchAgent: ~/Library/LaunchAgents/com.alai.email-ingest-monitor.plist

Technical Details

Missed Messages Breakdown (30-day window, all accounts)

Account	Folder	Missed Count	Sample UIDs	Notes
alai	INBOX	1	6	Asmir email re: SEO clients
dev	INBOX	3	4, 7, 11	Google Cloud Logging alerts
john	INBOX	13	61, 69, 71, 72, 79, 80, 82, 83, 88, 99, 102, 114, 134	Mix: GitHub, TLDR, Cloudflare, cynthia.li, Asmir
info	INBOX	0	—	No new IMAP messages in window
alem	INBOX	N/A	—	IMAP connection broken, cannot audit

Backfill Execution Summary

Total inserted: 17 (first run)
Total patched: 14 (second run — corrected subject/from metadata)
Total skipped: 3 (UIDs 61, 69, 71 had no audit sample metadata, kept placeholder)
Tool runs: 3 (idempotent, each run refined metadata)

Monitor Configuration

LaunchAgent: com.alai.email-ingest-monitor

Schedule: Hourly (StartCalendarInterval)
Command: ~/system/tools/email-ingest-monitor.sh
Output: ~/system/logs/email-ingest-monitor.log
Alarm channel: Slack #exec
Trigger condition: total_missed > 0 in audit JSON
Status: Plist exists, NOT loaded (manual load required)

Sign-off

Documented by: Skillforge (ALAI agent)

Date: 2026-05-23

MC Task: #101887 ST6

Status: Fix deployed, backfill complete, monitoring deployed (pending manual load)

ALAI Mail Topology — Migadu Domains, Mailbox Inventory, John's 19-Account Ingest Loop (2026-06-08)

ALAI Mail Topology & John's Email Ingest Loop

Last updated: 2026-06-08 (v2 — 19 accounts, daemon-path docs, himalaya touch-points) | MC: #103182 | Built by: FlowForge | Validated by: Proveo (Angie Jones) — PASS

1. Mail Infrastructure — Migadu (Single Account)

All ALAI product domains are hosted on one Migadu account. MX records for every domain point to the same two servers:

aspmx1.migadu.com (priority 10)
aspmx2.migadu.com (priority 20)

Domains on this account: alai.no, bilko.io, bilko.cloud, bilko.company, snowit.ba, basicconsulting.no, basicfakta.no, lumiscare.com

Migadu Admin Access

Item	Value / Location
Admin login	`alem@alai.no`
API key	Vaultwarden item "migadu keyy" (86-char token — do NOT print)
IMAP host	`imap.migadu.com`
SMTP host	`smtp.migadu.com`
Web UI	https://admin.migadu.com

Migadu API Quirks (DO NOT FORGET)

GET aliases — response key is address_aliases, not aliases.
Create alias — must send JSON body {"local_part": "...", "destinations": ["..."]} with header Accept: application/json (omitting Accept = HTML response, silent fail).
Alias destinations MUST be same-domain. Cross-domain targets (e.g. info@alai.no → john@basicconsulting.no) return HTTP 400. Route to a real mailbox on the same domain instead.
No catch-all rewrites — verified via /rewrites endpoint (empty on all domains). Any email to a non-existent local-part that has no alias bounces.
App-passwords for new mailboxes are created via PUT /v1/domains/{domain}/mailboxes/{local_part} and stored as Vaultwarden items (never in logs).
Migadu catch-all copy (alem@alai.no): alem@alai.no is configured as a global catch-all copy recipient for all outgoing ALAI-managed-domain mail. This means emails sent FROM any ALAI account will also appear in alem's INBOX. Because alem iterates before product accounts in the daemon list, it ingests those Message-IDs first; the UNIQUE(message_id) constraint causes product-account inserts to be no-ops. This affects ingest attribution for ALAI-origin probes only; external (non-ALAI) mail is not affected. See Section 3 for the hello@lumiscare.com forwarding removal note.
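
The alias quirks above can be captured in a request builder that fails locally before Migadu returns HTTP 400. This is a minimal sketch, not the tooling used in MC #103182: the base URL and the /aliases path are assumptions (this runbook only names the /mailboxes and /rewrites endpoints), authentication is omitted, and the function names are hypothetical.

```javascript
// Assumptions: the base URL and the "/aliases" path are not given in this runbook.
const MIGADU_API_BASE = "https://api.migadu.com/v1";

// Lower-cased domain part of an email address.
function domainOf(address) {
  const at = address.lastIndexOf("@");
  if (at < 1 || at === address.length - 1) {
    throw new Error(`Not an email address: ${address}`);
  }
  return address.slice(at + 1).toLowerCase();
}

// Build (but do not send) the create-alias request described above.
function buildCreateAliasRequest(domain, localPart, destinations) {
  const crossDomain = destinations.filter((d) => domainOf(d) !== domain.toLowerCase());
  if (crossDomain.length > 0) {
    // Migadu would answer HTTP 400; fail locally with a clearer message.
    throw new Error(`Cross-domain alias destinations: ${crossDomain.join(", ")}`);
  }
  return {
    url: `${MIGADU_API_BASE}/domains/${domain}/aliases`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json", // omitting this yields an HTML response
    },
    body: JSON.stringify({ local_part: localPart, destinations }),
  };
}

// Read the alias list from a GET response: the key is address_aliases, not aliases.
function aliasesFromResponse(json) {
  return json.address_aliases || [];
}

const example = buildCreateAliasRequest("alai.no", "info", ["john@alai.no"]);
console.log(example.method, example.headers.Accept); // POST application/json
```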

2. Real Mailbox Inventory

These are the real mailboxes that exist in Migadu (verified 2026-06-08 via admin API). Only real mailboxes can be used as alias destinations.

Domain	Real mailboxes (local parts)
`alai.no`	john, alem, dev, post, admin
`bilko.io`	admin, sales, privacy
`bilko.cloud`	admin, sales
`bilko.company`	admin, sales
`snowit.ba`	admin, info, asmir, enis
`basicconsulting.no`	john, info
`lumiscare.com`	hello, admin

Note: basicfakta.no is on this Migadu account but has no actively polled mailboxes in John's loop.

Note: lumiscare.com is ALAI's Migadu domain (our infrastructure). It is distinct from caresafetyinnovations.com, which remains a hard-stop boundary (see Section 6).

3. John's Email Ingest — All 19 Monitored Accounts

John's email ingest is managed by ~/system/tools/email-inbox.js and polled by ~/system/daemons/email-agent.js. As of MC #103182 final state (2026-06-08), 19 accounts are registered in email-inbox.db → email_accounts.

Original 6 Accounts (pre-MC #103182)

Account name (DB key)	Email address	Vault item
`john`	john@basicconsulting.no	existing
`info`	info@basicconsulting.no	existing
`alai`	john@alai.no	existing
`dev`	dev@alai.no	existing
`alem`	alem@alai.no	existing
`gmail`	alembasic@gmail.com	existing

11 Product/Role Accounts (added MC #103182 round 1)

Account name (DB key)	Email address	Vault item name
`post-alai`	post@alai.no	Migadu — post@alai.no
`admin-alai`	admin@alai.no	Migadu — admin@alai.no
`sales-bilko-io`	sales@bilko.io	Migadu — sales@bilko.io
`privacy-bilko-io`	privacy@bilko.io	Migadu — privacy@bilko.io
`admin-bilko-io`	admin@bilko.io	Migadu — admin@bilko.io
`sales-bilko-cloud`	sales@bilko.cloud	Migadu — sales@bilko.cloud
`admin-bilko-cloud`	admin@bilko.cloud	Migadu — admin@bilko.cloud
`sales-bilko-company`	sales@bilko.company	Migadu — sales@bilko.company
`admin-bilko-company`	admin@bilko.company	Migadu — admin@bilko.company
`info-snowit`	info@snowit.ba	info@snowit.ba IMAP
`admin-snowit`	admin@snowit.ba	Migadu — admin@snowit.ba

2 LumisCare Accounts (added MC #103182 round 2 — CEO directive 2026-06-08)

CEO directive: LumisCare must be in John's reading loop. lumiscare.com is ALAI's own Migadu domain — these are operational mailboxes, not CareSafety-boundary addresses.

Account name (DB key)	Email address	Vault item name
`hello-lumiscare`	hello@lumiscare.com	Migadu — hello@lumiscare.com
`admin-lumiscare`	admin@lumiscare.com	Migadu — admin@lumiscare.com

Note on hello@lumiscare.com forwarding: A Migadu direct forward from hello@lumiscare.com → alem@alai.no was active since 2026-05-24. This was removed 2026-06-08 so the mailbox is polled directly under hello-lumiscare with clean labeling. Before removal, LumisCare contact mail appeared in the DB under alem (Migadu ingested the forwarded copy first). After removal, external mail to hello@lumiscare.com is stored under hello-lumiscare only. Confirmed behaviourally: gmail-origin probe stored as DB id=9195 under hello-lumiscare, not duplicated under alem.
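
The attribution effect described in this note is first-writer-wins under UNIQUE(message_id). The sketch below models it with an in-memory map standing in for the emails table; the function name and the sample Message-ID are hypothetical.

```javascript
// Hypothetical in-memory stand-in for the emails table with UNIQUE(message_id).
function simulateIngest(accountOrder, inboxes) {
  const attribution = new Map(); // Message-ID -> account that stored it first
  for (const account of accountOrder) {
    for (const messageId of inboxes[account] || []) {
      // INSERT OR IGNORE semantics: a later duplicate Message-ID is a no-op.
      if (!attribution.has(messageId)) attribution.set(messageId, account);
    }
  }
  return attribution;
}

// With the forward in place, the same Message-ID sits in both inboxes and alem wins.
const withForward = simulateIngest(["alem", "hello-lumiscare"], {
  alem: ["<probe-1@gmail.com>"],
  "hello-lumiscare": ["<probe-1@gmail.com>"],
});
console.log(withForward.get("<probe-1@gmail.com>")); // alem

// Without the forward, only the product mailbox holds the message.
const withoutForward = simulateIngest(["alem", "hello-lumiscare"], {
  alem: [],
  "hello-lumiscare": ["<probe-1@gmail.com>"],
});
console.log(withoutForward.get("<probe-1@gmail.com>")); // hello-lumiscare
```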

App-passwords for the 5 newly created admin@* mailboxes (round 1) were generated via the Migadu API and stored as Vaultwarden items. Vault IDs: 558181ec, 8dfe8d2d, 2f38a16a, 7d0f9216, 2fb07c20.

4. Alias Map — Dead-Address Fixes (2026-06-08)

The following addresses were previously advertised (on websites, legal pages, landing pages) but did not correspond to any real mailbox — all mail to them was silently bouncing. Migadu aliases were created to route them to the nearest real same-domain mailbox.

Dead address (was bouncing)	Now routes to	Why
`info@alai.no`	`john@alai.no`	alai.no contact form was sending to this dead address — all website contact submissions were lost
`support@bilko.io`	`sales@bilko.io`	bilko.io landing mailto link
`podrska@bilko.io`	`sales@bilko.io`	bilko.io Bosnian support address on legal/terms pages
`legal@bilko.io`	`admin@bilko.io`	bilko.io legal/terms page
`security@bilko.io`	`admin@bilko.io`	bilko.io security disclosure address
`support@bilko.cloud`	`sales@bilko.cloud`	bilko.cloud landing mailto
`support@bilko.company`	`sales@bilko.company`	bilko.company landing mailto

Pre-fix state: Only postmaster@{domain} → admin@{domain} aliases existed. No rewrites, no catch-all. All other non-existent local-parts bounced.
Post-fix: All advertised addresses now deliver to a real monitored mailbox. Nothing bounces.

5. Contact-Form Routing

Product	Contact form path	Where mail ends up
alai.no website	Vercel serverless: `~/business/ALAI-Holding-AS/web/api/contact.js` (nodemailer)	Sends to `info@alai.no` (which now aliases to `john@alai.no` — monitored). Was dead before 2026-06-08 fix.
Bilko landing pages	Cloudflare Pages function: `apps/landing-*/functions/api/lead.js`	Posts to Slack #ceo channel (C0AFJDP9V6U) + writes to Cloudflare KV (`BILKO_LEADS`). No email path — separate from IMAP polling.

6. Boundary Accounts — NOT Polled (intentional)

Address	Reason not polled
`asmir@snowit.ba`	Personal mailbox belonging to Asmir (SnowIT partner). He reads his own mail.
`enis@snowit.ba`	Personal mailbox belonging to Enis. Same reason.
Any `*@caresafetyinnovations.com`	CareSafety hard-stop boundary — health/patient-adjacent service under external ownership. NOT on ALAI's Migadu account. Never poll. See CareSafety boundary memo in MEMORY.

Important distinction: lumiscare.com (ALAI's Migadu domain — hello@, admin@) IS polled. caresafetyinnovations.com (external operator) is the hard boundary, not lumiscare.com.

7. Daemon Architecture — Production Path

Understanding the daemon path is critical when debugging ingest issues or adding accounts.

Production Execution Path

LaunchAgent: com.john.email-agent — starts email-agent-wrapper.sh, sets HIMALAYA_DISABLED=1 via plist EnvironmentVariables key.
Wrapper: ~/system/daemons/email-agent-wrapper.sh — thin shell wrapper, does not set HIMALAYA_DISABLED itself.
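Daemon: ~/system/daemons/email-agent.js. When HIMALAYA_DISABLED=1, all 19 accounts use the legacy fetchUnseenLegacy IMAP path (direct IMAP from Node, proven stable). Since the MC #101887 fix, this path fetches by date range with DB dedup rather than filtering on the UNSEEN flag.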

Himalaya Layer — Present but Bypassed in Production

Even with HIMALAYA_DISABLED=1, the daemon still routes account resolution through himalaya-adapter.js ACCOUNT_MAP. If an account name is missing from ACCOUNT_MAP, the daemon throws Unknown account: <name> and the account is skipped entirely.

himalaya-adapter.js ACCOUNT_MAP — must list all 19 accounts (currently L34–56).
~/.config/himalaya/config.toml — must have 19 [accounts.*] stanzas (verified: grep count = 19).
When run without HIMALAYA_DISABLED=1 (bare wrapper invocation), the himalaya binary is called and times out after 120s per account (~82 min total for 19 accounts). This is expected and non-destructive but slow. Production LaunchAgent always sets the env flag.

Validated (2026-06-08T13:15Z): Zero "Unknown account" errors in both daemon runs (wrapper + legacy). All 19 accounts have last_checked_at = 2026-06-08T13:09:39Z.

8. Components — All 8 Touch-Points

Adding any new account requires updating all 8 of the following. Missing any one will cause silent failures or "Unknown account" errors.

#	File	What to change
1	`~/system/tools/email-inbox.js`	(a) Add `INSERT OR IGNORE INTO email_accounts (name, email) VALUES ('<name>', '<email>')` seed row. (b) Add a guarded migration block to extend the `emails` table CHECK constraint to include the new account name. The CHECK constraint is hardcoded and cannot be altered without rebuilding the table (SQLite limitation). The guard must use a unique string from the new account name (e.g. `!ddlRow.sql.includes("'<name>'")`). All existing rows and all 25 columns must be preserved in the rebuilt table. This is the most error-prone step — see Section 9 for the gotcha detail.
2	`~/system/tools/mail-native.js`	Add account-name → Vaultwarden item-name entry in `VAULT_NAMES` map.
3	`~/system/tools/himalaya-adapter.js`	Add account-name → email entry in `ACCOUNT_MAP` (L34–56 area). Without this, the daemon throws "Unknown account" and skips the account entirely even in legacy mode.
4	`~/.config/himalaya/config.toml`	Add a new `[accounts.<name>]` stanza. Required even when HIMALAYA_DISABLED=1.
5	`~/system/daemons/email-agent.js`	Add account to counts map (L2459 area). Also confirm it is present in the fetch loop and `last_checked_at` update loop (both must be mirrored).
6	`~/system/tools/email-imap-db-audit.js`	Add account to `ACCOUNTS` constant.
7	`~/system/tools/email-action-hard-check.js`	Add account to `ALL_MONITORED_ACCOUNTS` constant.
8	Vaultwarden (via `bw` CLI)	Create app-password item named `Migadu — <email>` with the IMAP/SMTP password. New admin@ mailboxes require a new app-password generated via Migadu API (`PUT /v1/domains/{d}/mailboxes/{lp}`). Existing sales@/privacy@/info@ mailboxes may already have creds in Vaultwarden — check before creating.
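
Since a single missing touch-point fails silently, a pre-deploy consistency check is one way to catch gaps. The sketch below assumes each touch-point has already been read into a plain array of account names; how those arrays are extracted from the files, and the registry labels used here, are hypothetical.

```javascript
// Hypothetical: each touch-point is modelled as a plain array of account names.
function findMissingRegistrations(accountName, registries) {
  return Object.entries(registries)
    .filter(([, names]) => !names.includes(accountName))
    .map(([registryName]) => registryName);
}

// Usage: the new account was added everywhere except two places.
const registries = {
  "email_accounts seed": ["john", "sales-newdomain"],
  "emails CHECK constraint": ["john"],
  VAULT_NAMES: ["john", "sales-newdomain"],
  ACCOUNT_MAP: ["john", "sales-newdomain"],
  "config.toml stanzas": ["john", "sales-newdomain"],
  "email-agent.js counts map": ["john", "sales-newdomain"],
  "audit ACCOUNTS": ["john"],
  ALL_MONITORED_ACCOUNTS: ["john", "sales-newdomain"],
};
console.log(findMissingRegistrations("sales-newdomain", registries));
// [ 'emails CHECK constraint', 'audit ACCOUNTS' ]
```

An empty result means every modelled touch-point lists the account; it does not prove the Vaultwarden item exists or that the credentials work.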

Files Changed in MC #103182 (round 1 — 11 accounts)

All files modified additively. Round 1 changed 5 files (himalaya touch-points were added in round 2 as BLOCKER-2 fix).

File	Lines changed
`email-inbox.js`	L159–172 (seeds) + L141–208 (CHECK migration, 17-account guard)
`mail-native.js`	L76–88 (11 VAULT_NAMES entries)
`email-imap-db-audit.js`	L51 (ACCOUNTS 5→16)
`email-action-hard-check.js`	L14–22 (ALL_MONITORED_ACCOUNTS 17 accounts)
`email-agent.js`	L1853–1861 (fetch loop), L1889–1895 (last_checked_at loop)

Files Changed in MC #103182 (round 2 — LumisCare + BLOCKER-2 fix)

File	Lines changed
`email-inbox.js`	L212–311 (second guarded CHECK migration, 19-account guard: `!ddlRow2.sql.includes("'hello-lumiscare'")`); 2 new email_accounts seed rows
`mail-native.js`	L90–91 (hello-lumiscare + admin-lumiscare VAULT_NAMES)
`himalaya-adapter.js`	L34–56 (ACCOUNT_MAP expanded to 19 entries)
`~/.config/himalaya/config.toml`	2 new [accounts.*] stanzas (19 total)
`email-agent.js`	L1862 (fetch loop), L1899 (last_checked_at loop), L2459–2468 (counts map)
`email-action-hard-check.js`	L24 (hello-lumiscare + admin-lumiscare in ALL_MONITORED_ACCOUNTS)
`email-imap-db-audit.js`	L60 (both accounts in ACCOUNTS array)

Known Minor Issue (pre-existing, non-blocking)

After SMTP send via mail-native.js, the IMAP post-send copy to Sent folder times out with ETIMEOUT. Delivery succeeds (Message-ID is logged). This is a cosmetic issue in the IMAP cleanup code — pre-existing, unrelated to MC #103182. Separate MC recommended.

9. GOTCHA — emails Table CHECK Constraint

This is the most dangerous footgun when adding new accounts. Read before touching email-inbox.js.

The emails table in ~/system/databases/email-inbox.db has a hardcoded SQLite CHECK constraint:

account TEXT NOT NULL CHECK(account IN ('john','info','alai','dev','alem','gmail',
  'post-alai','admin-alai',
  'sales-bilko-io','privacy-bilko-io','admin-bilko-io',
  'sales-bilko-cloud','admin-bilko-cloud',
  'sales-bilko-company','admin-bilko-company',
  'info-snowit','admin-snowit',
  'hello-lumiscare','admin-lumiscare'
))

The trap: INSERT OR IGNORE silently discards rows that violate CHECK constraints — no exception is thrown, no warning is logged. If a new account name is not in this list, every email received by that account is permanently lost at ingest time. In MC #103182 this caused 27 real emails to be silently dropped before the issue was caught by Proveo.

The fix: SQLite does not support ALTER TABLE ... MODIFY COLUMN with a new CHECK constraint. The only way to extend it is to rebuild the table:

Read current DDL: SELECT sql FROM sqlite_master WHERE type='table' AND name='emails'
Guard the migration: check that the new account name is NOT already in the DDL (idempotency)
In a transaction: CREATE TABLE emails_new (...same schema + extended CHECK...) → INSERT INTO emails_new SELECT * FROM emails → assert row count matches → DROP TABLE emails → ALTER TABLE emails_new RENAME TO emails → recreate indexes → COMMIT
Rollback on any error or row count mismatch

The pattern already exists in email-inbox.js — follow it exactly. All 25 columns must be listed explicitly, including the post-migration additions: delegated_to, delegated_at, deadline, body, triaged_at, auto_forwarded.
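
The guard and the extended CHECK clause are plain string operations on the DDL, so they can be tested without touching the database. This is a minimal sketch with hypothetical function names, not the migration in email-inbox.js; it covers only the guard, the clause construction, and a loud pre-insert check, and leaves the table rebuild itself to the existing pattern.

```javascript
// Parse the account names out of a CHECK(account IN (...)) clause in the table DDL.
function allowedAccountsFromDdl(ddlSql) {
  const m = /CHECK\s*\(\s*account\s+IN\s*\(([^)]*)\)/i.exec(ddlSql);
  if (!m) return [];
  return [...m[1].matchAll(/'([^']+)'/g)].map((q) => q[1]);
}

// Guard: true only when at least one new account is absent from the DDL.
function needsCheckMigration(ddlSql, newAccounts) {
  return newAccounts.some((name) => !ddlSql.includes(`'${name}'`));
}

// Build the extended CHECK clause; returns null when the migration already ran.
function buildExtendedCheck(ddlSql, newAccounts) {
  if (!needsCheckMigration(ddlSql, newAccounts)) return null;
  const merged = [...new Set([...allowedAccountsFromDdl(ddlSql), ...newAccounts])];
  const list = merged.map((name) => `'${name}'`).join(",");
  return `CHECK(account IN (${list}))`;
}

// Fail loudly before INSERT OR IGNORE can silently drop a row.
function assertAccountAllowed(ddlSql, account) {
  if (!allowedAccountsFromDdl(ddlSql).includes(account)) {
    throw new Error(`Account '${account}' is not in the emails CHECK constraint`);
  }
}

const ddl = "CREATE TABLE emails (account TEXT NOT NULL CHECK(account IN ('john','info')))";
console.log(buildExtendedCheck(ddl, ["sales-newdomain"]));
// CHECK(account IN ('john','info','sales-newdomain'))
```

Calling a check like assertAccountAllowed before each insert would turn the silent drop into a visible error, at the cost of reading the DDL once per daemon start.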

10. Runbook — How to Add a New Mailbox to John's Loop

1. Verify the mailbox exists in Migadu.
   Check via GET /v1/domains/{domain}/mailboxes using the admin API key ("migadu keyy" in Vaultwarden).
   If it does not exist, create it via the admin UI or API first.
2. Create an app-password for the mailbox.
   Use Migadu admin UI (Mailbox settings > App Passwords) or PUT /v1/domains/{domain}/mailboxes/{local_part}.
   Store the password as a new Vaultwarden item named Migadu — {email}.
3. [Touch-point 2] Add to mail-native.js VAULT_NAMES map.
   Key = your chosen account name (e.g. sales-newdomain), value = the Vaultwarden item name.
4. [Touch-point 3] Add to himalaya-adapter.js ACCOUNT_MAP.
   Add '<name>': '<email>' in the ACCOUNT_MAP object. Without this step the daemon throws "Unknown account" and the account is silently skipped.
5. [Touch-point 4] Add stanza to ~/.config/himalaya/config.toml.
   Follow the existing pattern for a Migadu account stanza.
6. [Touch-point 1a] Add the email_accounts seed to email-inbox.js.
   Append INSERT OR IGNORE INTO email_accounts (name, email) VALUES ('<name>', '<email>') in the seed block.
7. [Touch-point 1b, CRITICAL] Add a guarded CHECK migration to email-inbox.js getDb().
   Read Section 9 first. Guard: !ddlRow.sql.includes("'<name>'"). Extend CHECK to include new account. Rebuild table in a transaction preserving all 25 columns. Test idempotency.
8. [Touch-point 6] Add to email-imap-db-audit.js ACCOUNTS array.
9. [Touch-point 7] Add to email-action-hard-check.js ALL_MONITORED_ACCOUNTS array.
10. [Touch-point 5] Add to email-agent.js counts map, fetch loop, and last_checked_at loop.
   All three locations must be mirrored.
11. Run syntax checks on all modified files.
   node --check ~/system/tools/email-inbox.js && node --check ~/system/tools/mail-native.js && node --check ~/system/tools/himalaya-adapter.js && node --check ~/system/tools/email-imap-db-audit.js && node --check ~/system/tools/email-action-hard-check.js && node --check ~/system/daemons/email-agent.js
12. Test connectivity.
   node ~/system/tools/mail-native.js test --account <name> (expect IMAP OK + SMTP OK).
13. Restart the email-agent daemon (LaunchAgent: com.john.email-agent) so the updated accounts array and config take effect.
14. Proveo ingest probe.
   Send a test email from a non-ALAI sender (e.g. gmail account) with subject INGEST-PROBE-<name>-<timestamp>. This avoids the Migadu catch-all pre-emption issue (see Section 1 API quirks). Trigger one daemon cycle. Confirm the row appears under the correct account name via node ~/system/tools/email-inbox.js search "INGEST-PROBE".
If adding a new alias (not a real mailbox): create the Migadu alias first (same-domain destination only, with Accept: application/json header). Then proceed from step 3.
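
For the ingest probe step, two details are easy to get wrong: the sender must not be on an ALAI-managed domain, and the subject must be unique enough to search for. The sketch below checks both, using the domain list from Section 1; the function names and the compact timestamp format are assumptions.

```javascript
// Domains hosted on the ALAI Migadu account, as listed in Section 1.
const ALAI_DOMAINS = [
  "alai.no", "bilko.io", "bilko.cloud", "bilko.company",
  "snowit.ba", "basicconsulting.no", "basicfakta.no", "lumiscare.com",
];

// A probe sender is usable only if its domain is not ALAI-managed.
function isExternalProbeSender(address) {
  const domain = address.slice(address.lastIndexOf("@") + 1).toLowerCase();
  return !ALAI_DOMAINS.includes(domain);
}

// Subject in the INGEST-PROBE-<name>-<timestamp> shape; the timestamp format is an assumption.
function probeSubject(accountName, date) {
  const stamp = date.toISOString().replace(/[-:]/g, "").replace(/\.\d+Z$/, "Z");
  return `INGEST-PROBE-${accountName}-${stamp}`;
}

console.log(isExternalProbeSender("alembasic@gmail.com")); // true
console.log(isExternalProbeSender("john@alai.no")); // false
console.log(probeSubject("sales-newdomain", new Date(Date.UTC(2026, 5, 8, 13, 15, 0))));
// INGEST-PROBE-sales-newdomain-20260608T131500Z
```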

11. Validation Evidence (MC #103182 — Final)

Round 1 (17 accounts — 2026-06-08T11:24Z)

Check	Result
Code changes (5 files) verified by Proveo	PASS
DB registry — 17 rows in email_accounts	PASS
IMAP/SMTP connectivity — 11/11 new accounts	PASS
emails table CHECK migration (emails_new rebuild)	PASS — DDL confirmed, 4697 rows preserved
Ingest probes — 4/4 probe accounts persist to DB	PASS (round 2 probes after schema fix; DB ids 9052/9056/9057/9059/9062/9063/9064)
Regression — original 6 accounts	PASS — counts growing, timestamps advancing
No-loop / alias dedup (UNIQUE on message_id)	PASS — 0 duplicate message_ids
email-action-hard-check.js exit code	PASS — exit 0, 17 accounts in scope

Blocker found and fixed during round 1 validation: The emails table had a hardcoded CHECK covering only the original 6 accounts. INSERT OR IGNORE silently dropped 27 real emails before the migration was applied. See Section 9 for the full gotcha description.

Round 2 (19 accounts — LumisCare + daemon path — 2026-06-08T13:15Z)

Check	Result
ACCOUNT_MAP (himalaya-adapter.js) has 19 entries	PASS — L34–56 confirmed
config.toml has 19 [accounts.*] stanzas	PASS — grep count = 19
email-agent.js counts map has 19 accounts	PASS — L2460–2468
Zero "Unknown account" errors (wrapper run)	PASS — grep -c = 0 / 40 lines
Zero "Unknown account" errors (legacy/production run)	PASS — grep -c = 0
Zero silent drops / CHECK failures (production run)	PASS
admin-lumiscare ingest proof	PASS — DB id=9070 under admin-lumiscare
hello-lumiscare ingest proof (external sender)	PASS — DB id=9195 under hello-lumiscare (gmail-origin probe)
sales-bilko-cloud ingest proof	PASS — DB id=9193
sales-bilko-company ingest proof	PASS — DB id=9194
hello@lumiscare.com forwarding removal (behavioural)	PASS — gmail-origin stored only under hello-lumiscare, not duplicated under alem
All 19 last_checked_at fresh	PASS — 2026-06-08T13:09:39Z all accounts
No duplicate message_ids	PASS — 0 rows
Regression (orig 6 + prior 11)	PASS — row counts growing, timestamps fresh

Evidence files: /tmp/evidence-103182/flowforge-build.md, /tmp/evidence-103182/proveo-validation.md, /tmp/evidence-103182/daemon-wrapper-run.log, /tmp/evidence-103182/daemon-legacy-run.log

Operations

Overview

Operations Overview

Contents

BookStack Runbook

Runbook: BookStack

Service Info

Status Check

Container Health

HTTP Check

API Check

Database Check

Restart Procedure

Quick Restart (Container Only)

Full Stack Restart (Container + Database)

Sync System Docs to BookStack

Sync All Mapped Content

Sync Single File

Check Sync Status

Force Overwrite All

Troubleshooting

Problem: Container won't start

Problem: Can't login (wrong password)

Problem: API returns 401 Unauthorized

Problem: Sync tool fails (500 error)

Problem: Database connection issues

API Usage

List Shelves

List Books

List Pages

Create Page

Dependencies

Backup

Database Dump

Data Volumes (includes uploads, images)

Restore from Backup

Configuration

Key Environment Variables

Application Settings (via UI)

Content Structure

Notes

BookStack MFA Setup

BookStack MFA and API Token Setup

Overview

Prerequisites

Part 1: Enable MFA (Multi-Factor Authentication)

Step 1: Login as Admin

Step 2: Access Account Settings

Step 3: Enable MFA

Step 4: Test MFA

Part 2: Create New API Token

Step 1: Navigate to API Settings

Step 2: Create Token

Step 3: Copy Token Credentials

Step 4: Update Config File

Step 5: Test API Token

Part 3: Additional Security Measures

Disable Guest Access (Optional)

Review User Permissions

Enable Audit Log

Regular Backups

Troubleshooting

MFA Not Working

Lost API Token

Cannot Access Web UI

Security Best Practices

Next Steps

CEO Dashboard Runbook

CEO Dashboard

Overview

Sections

1. Revenue Overview (Banner)

2. Pipeline Funnel

3. Active Projects (Kanban)

4. Decisions Pending

5. Alerts Panel

6. Upcoming Deadlines

Technical Details

Implementation

Data Aggregation