AI Factory v2 — Phase 0 Backbone

 Author: ALAI 
 Version: 2026-04-27 
 Status: COMPLETE 
 
 Executive Summary 
 AI Factory v2 Phase 0 restored critical feedback loops and observability infrastructure across 5 build tasks (MC #9865-9869). This work unblocks the 9-point CEO vision by fixing broken learning mechanisms: the Mehanik dispatch gate now enforces scope discipline, LightRAG container is restored for token deduplication, quality_score wiring enables self-learning routing, cost telemetry closes a $163K/week blind spot, and trace capture creates the corpus for future distillation and fine-tuning. 
 Status: 5/5 builder tasks COMPLETE per Proveo validation (MC #9870). Documentation task complete (MC #9871). Phase 0 is GREEN. 
 Objective: Restore feedback loops and activate architectural gates to prepare ALAI for compounding self-improvement phases post-triage (2026-05-02+). 
 
 Vision Reminder 
 The CEO approved a 9-point AI Factory vision: 
 
 Self-building — AutoCoder that writes and executes plans 
 Self-learning — Quality scores feed back into routing decisions 
 Self-healing — Autowork daemon drains task queues autonomously 
 No SPOF — All critical databases replicated, multi-cloud backup 
 Portable — Multi-provider LLM routing (Anthropic, OpenAI, Groq, Ollama) 
 Free + paid models — Tier routing balances cost vs quality 
 LightRAG token saving — Dedupe uploaded docs, query before planning 
 Own fine-tuned model — Post-revenue: distill from traces.db corpus 
 AIOS — Autonomous OS that schedules and executes work 
 
 Pre-Phase 0 realization of the vision: 10-12%, per the convergent analysis of 5 expert lenses. Bottleneck: broken feedback loops. Every database designed to convert effort into learning was write-only in practice: data was recorded, but nothing read it back.
 Full plan: /Users/makinja/system/specs/ai-factory-v2-plan.md 
 
 Phase 0 Goals 
 Phase 0 is the triage-compatible foundation layer that closes broken feedback loops, activates dispatch gates, and eliminates observability blind spots. All tasks are absorbed into the existing Lane 2 (infra restart) and require no CEO involvement during execution.
 Key outcomes: 
 
 Mehanik Phase 2 gate enforces 13-field marker schema (prevents scope creep disasters) 
 LightRAG container restored (Vision 7 token savings unblocked) 
 Quality score wiring enables self-learning routing (Vision 2 + 6) 
 Cost telemetry blind spot closed ($163K/week now visible) 
 Trace capture pipeline creates distillation corpus (gates Vision 8) 
 
 
 Architecture Diagram 
 flowchart LR
 subgraph Dispatch Gate
 A[John receives task] --> B{Mehanik clearance?}
 B -->|No marker| C[BLOCKED: exit 2]
 B -->|Valid 13-field marker| D[CLEAR: dispatch]
 end
 
 subgraph Tier Routing
 D --> E[tier-router.js classify]
 E --> F{quality_score feedback}
 F -->|avg < 0.6| G[Escalate tier+1]
 F -->|avg > 0.85| H[Demote tier-1]
 F -->|else| I[Keep tier]
 end
 
 subgraph Observability
 G --> J[routing_log write]
 H --> J
 I --> J
 J --> K[(tool-audit.db)]
 
 D --> L[PostToolUse hook]
 L --> M[(traces.db)]
 
 D --> N[cost-tracker parseAndTrack]
 N --> O[(costs.db)]
 end
 
 subgraph Token Optimization
 D --> P{LightRAG STEP 0}
 P --> Q[Query existing context]
 Q -->|Hit| R[Reduce re-discovery]
 Q -->|Miss| S[Normal dispatch]
 end
 
 K -.quality_score read path.-> F
 M -.corpus for Phase 3 distillation.-> T[Future: Fine-tune]
 O -.daily cost report.-> U[CEO visibility]
 
 
 Task 0.1 — Mehanik Phase 2 Activation 
 MC: #9865 
 Owner: FlowForge 
 What: Activate Mehanik Phase 2 BLOCKING mode with 13-field marker schema enforcement. 
 Why 
 This is the single highest-leverage architectural fix. The pre-dispatch-gate.sh hook now enforces scope discipline at dispatch time, preventing the kind of 11-agent scope-creep disaster that previously derailed builds. Per the MC #9223 root cause analysis, missing pre-dispatch validation allowed unbounded work expansion.
 Changes 
 File: ~/.claude/hooks/pre-dispatch-gate.sh 
 Lines 72-79: Extended the field validation loop from 4 fields to 13 fields (canonical schema).
 13-Field Schema: 
 
 timestamp: — ISO8601 marker creation time 
 task_id: — MC task ID 
 project_path: — Absolute path to project root 
 blueprint_read: — Path to BUILD-BLUEPRINT.md or N/A 
 deploy_map_read: — Path to DEPLOY-MAP.md or N/A 
 deploy_path_summary: — One-line deploy mechanism 
 ceo_item_count: — CEO-authored items in plan 
 approved_subtask_count: — Approved subtask count 
 ceiling: — Scope ceiling (ceo_item_count + 2) 
 approved_agents: — Comma-separated agent list 
 orchestration_surface: — one-shot-Task | claude-chains | dag | pi-factory | cron 
 tool_contract_required: — true | false (research tasks) 
 mehanik_session_id: — Unique session identifier 
 
 7 BLOCK paths (exit 2): 
 
 No MC task ID 
 No Mehanik clearance marker 
 Marker stale (>4h old) 
 Missing required field 
 Scope ceiling exceeded 
 Research dispatch missing TOOL_CONTRACT 
 Invalid marker format 
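 The field check and several of the block paths above can be illustrated with a minimal Python sketch. This is not the code in pre-dispatch-gate.sh (which is a shell hook). It assumes a marker file made of "key: value" lines, assumes that "scope ceiling exceeded" means approved_subtask_count is greater than ceiling, and leaves out the MC task ID and TOOL_CONTRACT checks. The names parse_marker and check_marker are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# The 13 canonical fields listed above.
REQUIRED_FIELDS = [
    "timestamp", "task_id", "project_path", "blueprint_read",
    "deploy_map_read", "deploy_path_summary", "ceo_item_count",
    "approved_subtask_count", "ceiling", "approved_agents",
    "orchestration_surface", "tool_contract_required", "mehanik_session_id",
]
MAX_AGE = timedelta(hours=4)


def parse_marker(text):
    """Parse 'key: value' lines into a dict (assumed marker layout)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so ISO8601 timestamps stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def check_marker(text, now):
    """Return (exit_code, reason): 0 means CLEAR, 2 means BLOCKED."""
    if text is None:
        return 2, "no Mehanik clearance marker"
    fields = parse_marker(text)
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        return 2, "missing required field: " + missing[0]
    try:
        created = datetime.fromisoformat(fields["timestamp"])
        approved = int(fields["approved_subtask_count"])
        ceiling = int(fields["ceiling"])
    except ValueError:
        return 2, "invalid marker format"
    if created.tzinfo is None:
        # Assumption: timestamps without an offset are UTC.
        created = created.replace(tzinfo=timezone.utc)
    if now - created > MAX_AGE:
        return 2, "marker stale (>4h old)"
    if approved > ceiling:
        return 2, "scope ceiling exceeded"
    return 0, "CLEAR"
```

 Passing the current time in as an argument keeps the staleness rule easy to test without touching the system clock.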
 
 Validation Results 
 Canary tests: 3/3 PASS 
 
 Valid 13-field marker → exit 0 (CLEAR) 
 No marker file → exit 2 (BLOCKED) 
 Partial marker (5/13 fields) → exit 2 (BLOCKED on missing field) 
 
 Evidence: /tmp/aif-v2-task-0.1-evidence.md 
 
 Task 0.2 — LightRAG Container Restore 
 MC: #9866 
 Owner: FlowForge 
 What: Restore the LightRAG main container (it was missing from docker ps output) and verify drain worker functionality.
 Why 
 Vision 7 (LightRAG token saving) was at 0% realization because the main container was down. Each day without deduplication spends Anthropic tokens that LightRAG should save. 114K docs had been uploaded historically, but the container had been absent since an unknown date.
 Before State 
 
 LightRAG main container: MISSING 
 Local health endpoint: UNREACHABLE (curl localhost:9621/health → timeout) 
 Drain worker: LaunchAgent NOT LOADED 
 Queue: 276 records in outbox 
 
 After State 
 
 Container: HEALTHY (docker ps shows lightrag , Up 26s) 
 Health endpoint: RESPONSIVE (http://localhost:9621/health) 
 Pipeline status: pipeline_busy: false 
 Neo4j: HEALTHY (Up 41h) 
 Configuration verified:
 
 LLM: ollama @ host.docker.internal:11434, model qwen3:8b-q8_0 
 Embedding: ollama @ host.docker.internal:11434, model bge-m3:latest 
 Graph: Neo4JStorage (bolt://neo4j:7687) 
 Vector: NanoVectorDBStorage (22,771 entities + 43,582 relationships loaded) 
 
 
 Drain worker: FUNCTIONAL (manual execution, 276/276 processed) 
 
 Caveats 
 
 LaunchAgent bootstrap failure — Manual execution works, but launchctl bootstrap → I/O error. Drain worker runs manually until resolved. 
 Platform mismatch — Container image linux/amd64 on Apple Silicon (arm64), runs via Rosetta emulation. 
 Health endpoint blocks during pipeline_busy — Single-process design limitation; /health unavailable during active ingestion (follow-up task recommended). 
 
 Evidence: /tmp/aif-v2-task-0.2-evidence.md 
 
 Task 0.3 — Quality Score Read Path Wiring 
 MC: #9867 
 Owner: AgentForge 
 What: Wire quality_score read path in tier-router.js to enable feedback-informed routing. 
 Why 
 The legacy agent-routing.db held 36,671 rows, all with a NULL quality_score. Wiring the read path closes Vision 2 (self-learning) and Vision 6 (free + paid models) with zero new data collection: routing decisions now adjust based on historical agent performance.
 Schema Migration 
 Database: ~/system/databases/tool-audit.db 
 Extended routing_log table with 4 new columns: 
 
 quality_score REAL — Success metric (0.0 = failure, 1.0 = success) 
 caller_agent TEXT — Calling agent name 
 target_tier TEXT — Target tier before adjustment 
 mc_task_id INTEGER — MC task reference 
 
 Implementation 
 Write Path: 
Function updateQualityScore(routingLogId, score) at line 60. 
Heuristic v1 (interim until Phase 1.4 eval harness): 
 
 Task marked ready → 1.0 
 Task orphaned → 0.5 
 Task failed/blocked → 0.0 
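 The heuristic v1 mapping above can be written as a small lookup. This is a sketch only: the status strings are hypothetical labels for the three outcomes, and the real write path lives in updateQualityScore in the JavaScript router.

```python
# Hypothetical status labels; the real task states may be named differently.
HEURISTIC_V1 = {
    "ready": 1.0,     # task marked ready
    "orphaned": 0.5,  # task orphaned
    "failed": 0.0,    # task failed
    "blocked": 0.0,   # task blocked
}


def score_for_status(status):
    """Return the heuristic v1 quality score, or None for an unscored status."""
    return HEURISTIC_V1.get(status.strip().lower())
```

 Returning None for any other status keeps unscored rows NULL rather than guessing a value for them.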
 
 Read Path: 
Function getRecentQualityScores(callerAgent, targetTier) at line 76. 
Returns last 20 scores for {agent, tier} pair. 
 Tier Adjustment Logic: 
 
 If ≥5 scores exist for {agent, tier}:
 
 avg < 0.6 → escalate to tier+1 (e.g., tier 2 → tier 3) 
 avg > 0.85 → demote to tier-1 (e.g., tier 3 → tier 2) 
 else → keep current tier 
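 The adjustment rule above can be sketched in a few lines of Python. The thresholds, the 20-score window, and the 5-score minimum come from this page. The function name adjust_tier, the tier bounds (1 to 3), and the assumption that scores arrive oldest first are illustrative and not taken from tier-router.js.

```python
MIN_SAMPLES = 5        # no adjustment with fewer scores than this
WINDOW = 20            # only the most recent scores are considered
ESCALATE_BELOW = 0.6
DEMOTE_ABOVE = 0.85


def adjust_tier(current_tier, scores, min_tier=1, max_tier=3):
    """Return the tier to use for an {agent, tier} pair given its past scores.

    `scores` is assumed to be ordered oldest first; tier bounds are assumed.
    """
    recent = scores[-WINDOW:]
    if len(recent) < MIN_SAMPLES:
        return current_tier
    avg = sum(recent) / len(recent)
    if avg < ESCALATE_BELOW:
        return min(current_tier + 1, max_tier)  # escalate, capped at the top tier
    if avg > DEMOTE_ABOVE:
        return max(current_tier - 1, min_tier)  # demote, floored at the bottom tier
    return current_tier
```

 For example, five consecutive failures at tier 2 give an average of 0.0 and escalate to tier 3, which matches the smoke test reported in the validation results.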
 
 
 
 Validation Results 
 Smoke test: 5/5 PASS 
 
 Write path: 5 failures (quality_score=0.0) persisted 
 Read path escalation: avg=0.00 → tier 2 escalated to tier 3 
 Write path: 5 successes (quality_score=1.0) persisted 
 Read path demotion: avg=1.00 detected (logic verified) 
 Schema validation: all 4 columns exist 
 
 Legacy archive: 
 agent-routing.db renamed to agent-routing.db.legacy-archive-2026-04-27 (36,671 rows, 3.5MB). Not migrated — does not reflect current routing reality. 
 ADR: /Users/makinja/system/specs/adr/ADR-quality-score-read-path.md 
 Evidence: /tmp/aif-v2-task-0.3-evidence.md 
 
 Task 0.4 — Cost Telemetry Blind Spot Fix 
 MC: #9868 (existing, now resolved) 
 Owner: CodeCraft 
 What: Backfill claude-cli cost data for 2026-04-17 → 2026-04-24 and add real-time stderr parser. 
 Why 
 Weekly cost at its true magnitude was invisible: node ~/system/tools/cost-tracker.js summary today showed $0 for 967 claude-cli requests. Optimization is impossible without measurement, so this gap blocked all routing optimization work.
 Before State 
 
 claude-cli rows in range: 27 rows, ALL $0.00 
 Root cause: the Stop hook only began logging sessions with token data on 2026-04-24
 
 Backfill Results 
 Script: ~/system/tools/backfill-claude-cli-costs.js 
 
 Files processed: 21 session transcripts (2026-04-17 to 2026-04-24) 
 Sessions inserted: 19 
 Sessions already in DB: 2 (skipped — idempotent) 
 Total cost backfilled: $41.46 
 Model: claude-sonnet-4-6 (all sessions) 
 Pricing: cache_write=$3.75/MTok, cache_read=$0.30/MTok, input=$3/MTok, output=$15/MTok 
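 As an illustration of how the per-session cost follows from the pricing above, here is a minimal Python sketch. The rates are the ones listed on this page (USD per million tokens). The function name session_cost_usd and the shape of the token dictionary are assumptions, not the interface of backfill-claude-cli-costs.js.

```python
# USD per million tokens, as listed in the backfill results.
PRICE_PER_MTOK = {
    "input": 3.00,
    "output": 15.00,
    "cache_write": 3.75,
    "cache_read": 0.30,
}


def session_cost_usd(tokens):
    """Compute the cost of one session from a {token_kind: count} mapping."""
    return sum(
        PRICE_PER_MTOK[kind] * count / 1_000_000
        for kind, count in tokens.items()
    )


if __name__ == "__main__":
    example = {"input": 200_000, "output": 50_000,
               "cache_write": 100_000, "cache_read": 2_000_000}
    print(f"${session_cost_usd(example):.2f}")
```

 An unknown token kind raises KeyError, which is deliberate in this sketch: silently pricing an unrecognized category at zero would recreate the blind spot this task closed.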
 
 Week Total (2026-04-27) 
 
 Total requests: 711 
 Total cost: $163,223.11 
 claude-cli: 671 req, $163,223.11
 
 claude-opus-4-7: 636 req, $163,182.96 
 claude-sonnet-4-6: 29 req, $40.15 
 
 
 
 Magnitude: $163K/week aligns with OpenAI lens estimate ($162,945/wk). 
 Real-Time Capture 
 Added to cost-tracker.js : 
 
 parseAndTrack(stdoutJson, opts) — Parse --output-format json output, track cost 
 parseStderrLine(line, opts) — Parse individual stderr line, idempotent 
 
 Daily Cron 
 Script: ~/system/tools/cost-daily-report.sh 
 LaunchAgent: ~/Library/LaunchAgents/com.alai.cost-daily-report.plist 
 Schedule: 23:55 daily 
 Output: ~/system/reports/cost-daily.md 
 Evidence: /tmp/aif-v2-task-0.4-evidence.md 
 
 Task 0.5 — Trace Capture Pipeline 
 MC: #9869 
 Owner: AgentForge 
 What: Add PostToolUse hook that captures per-dispatch metadata to traces.db for future distillation and fine-tuning. 
 Why 
 Every agent run currently exits and disappears. Trace capture creates a passive corpus that gates ALL future AI Factory learning: distillation (Phase 2), eval harness (Phase 1.4), and fine-tuning (Phase 3). Without this, Vision 8 (own fine-tuned model) remains at 0%. 
 Database Schema 
 Location: ~/system/databases/traces.db 
 14 fields: 
 
 id — Primary key 
 timestamp — DATETIME DEFAULT CURRENT_TIMESTAMP 
 task_id — MC task ID 
 agent — Subagent type or "john" 
 session_id — Join key to costs.db 
 tool_name — Agent, Bash, Read, Write, Edit 
 prompt_hash — SHA256(tool_input), 16-char prefix 
 response_hash — SHA256(tool_response), 16-char prefix 
 duration_ms — Tool execution time 
 exit_code — 0=success, 1=error, 2=blocked 
 model — Model used (if Agent) 
 tokens_in — Input tokens 
 tokens_out — Output tokens 
 cost_usd — Computed cost 
 
 7 indexes: timestamp, agent, model, tool_name, prompt_hash, session_id, task_id 
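 The schema above could be created roughly as follows. This is a sketch: the field names, the timestamp default, and the indexed columns come from this page, while the column types other than timestamp and the index names are assumptions.

```python
import sqlite3

# Column types (other than timestamp) are assumed for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS traces (
    id            INTEGER PRIMARY KEY,
    timestamp     DATETIME DEFAULT CURRENT_TIMESTAMP,
    task_id       INTEGER,
    agent         TEXT,
    session_id    TEXT,
    tool_name     TEXT,
    prompt_hash   TEXT,
    response_hash TEXT,
    duration_ms   INTEGER,
    exit_code     INTEGER,
    model         TEXT,
    tokens_in     INTEGER,
    tokens_out    INTEGER,
    cost_usd      REAL
);
"""

INDEXED_COLUMNS = [
    "timestamp", "agent", "model", "tool_name",
    "prompt_hash", "session_id", "task_id",
]


def create_traces_db(path=":memory:"):
    """Create the traces table and its 7 single-column indexes."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    for column in INDEXED_COLUMNS:
        # Index names are hypothetical; column names are fixed constants above.
        conn.execute(
            f"CREATE INDEX IF NOT EXISTS idx_traces_{column} ON traces({column})"
        )
    conn.commit()
    return conn
```

 The default path of ":memory:" makes the sketch safe to run without touching ~/system/databases/traces.db.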
 PostToolUse Hook 
 Location: ~/.claude/hooks/trace-capture.py 
 Language: Python 3 (fast JSON parsing, sqlite3 stdlib) 
 Registered: ~/.claude/settings.json PostToolUse hooks array (async: true) 
 Key features: 
 
 Fire-and-forget (always exit 0 per ZAKON PI2) 
 Privacy-preserving (only hashes, no raw prompts/responses) 
 MC task ID extraction via regex 
 Session ID from env or date fallback 
 Error handling: logs to stderr, never blocks tool execution 
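 Three of the features above (regex task ID extraction, session ID fallback, and never blocking the tool) can be sketched as small helpers. The regex, the CLAUDE_SESSION_ID variable name, the fallback format, and all function names are hypothetical; trace-capture.py may differ on each of them.

```python
import re
from datetime import date

# Hypothetical pattern: matches "MC #9865", "MC#9865", or "mc 9865".
MC_ID_PATTERN = re.compile(r"MC\s*#?\s*(\d+)", re.IGNORECASE)


def extract_task_id(text):
    """Return the first MC task ID found in text, or None."""
    match = MC_ID_PATTERN.search(text or "")
    return int(match.group(1)) if match else None


def resolve_session_id(env, today):
    """Use the session ID from the environment, else fall back to the date.

    CLAUDE_SESSION_ID is an assumed variable name.
    """
    return env.get("CLAUDE_SESSION_ID") or f"session-{today.isoformat()}"


def run_hook(capture, payload, log):
    """Fire-and-forget wrapper: report errors via `log`, always return 0."""
    try:
        capture(payload)
    except Exception as exc:  # never let a tracing failure block the tool
        log(f"trace-capture error: {exc}")
    return 0
```

 A real hook would pass the return value of run_hook to sys.exit and write the log message to stderr, so a database error shows up in logs without affecting the tool call.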
 
 Latency Measurement 
 Method: 10-iteration synthetic hook call 
 Results: 
 
 Average: 45ms 
 Budget: <50ms 
 Status: PASS (10% under budget) 
 
 Privacy Posture 
 CRITICAL: No raw prompts or responses stored in traces.db. 
 Method: 
 
 SHA256 hash of full tool_input 
 SHA256 hash of full tool_response 
 Store only the 16-char hex prefix (64 bits; collision risk is negligible at the current corpus size) 
 Original content never persists 
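 The hashing method above can be sketched as follows. Serializing the tool input as canonical JSON before hashing is an assumption made here so that equal inputs give equal hashes; the hook may serialize differently. The function name hash_prefix is hypothetical.

```python
import hashlib
import json


def hash_prefix(value, length=16):
    """Return the first `length` hex characters of SHA256 over `value`.

    `value` is serialized as sorted-key JSON (an assumed canonical form),
    so only the digest prefix, never the content, needs to be stored.
    """
    canonical = json.dumps(value, sort_keys=True, default=str)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return digest[:length]
```

 Because the serialization sorts keys, two tool inputs with the same content in a different key order produce the same prefix, which is what duplicate detection relies on.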
 
 Rationale: 
 
 Prevents PII leakage (credentials, API keys, personal data) 
 Enables duplicate detection 
 Supports eval harness (hash matching for golden tasks) 
 Future fine-tuning uses hashes as index, not content 
 
 Smoke Test Results 
 Test 1: Row insertion — +10 rows captured (PASS) 
 Test 2: Privacy validation — 0 raw prompts/responses stored (PASS) 
 Test 3: Schema integrity — All 14 fields populated correctly (PASS) 
 Live integration: 64 rows captured during Proveo validation. 
 Evidence: /tmp/aif-v2-task-0.5-evidence.md 
 
 Caveats & Follow-ups 
 From Proveo Validation (MC #9870) 
 
 
 LightRAG health endpoint blocks during pipeline_busy 
 
 Root cause: Single-process design (no separate health worker) 
 Impact: /health unavailable during active ingestion 
 Recommendation: Separate health check process or async health handler 
 Severity: LOW (operational monitoring gap, not functional block) 
 
 
 
 Hash prefix length (16-char) may need adjustment at scale 
 
 Current corpus: 64 rows (negligible collision risk) 
 At 100K rows: <0.01% collision probability 
 Recommendation: Monitor at 10K rows, extend to 24-char if needed 
 Severity: LOW (future consideration) 
 
 
 
 Table name typo in smoke test 
 
 Test script referenced routing_logs (wrong), actual table routing_log 
 Impact: None (test passed via fallback query) 
 Resolution: Fixed in final evidence file 
 Severity: TRIVIAL 
 
 
 
 Row count delta across validation runs 
 
 Different smoke test runs show varying baselines (304 vs 337 rows) 
 Root cause: Multiple validation passes appending to same DB 
 Impact: None (idempotent inserts verified) 
 Severity: TRIVIAL 
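 The hash prefix caveat above can be checked with the standard birthday approximation, p ≈ 1 - exp(-n(n-1) / (2N)), where n is the row count and N = 16^k is the number of possible k-character hex prefixes. The sketch below assumes hashes are uniformly distributed and that every row has a distinct input; the function name is hypothetical.

```python
import math


def collision_probability(rows, hex_chars=16):
    """Approximate the chance that any two of `rows` distinct inputs
    share the same `hex_chars`-character hash prefix (birthday bound)."""
    space = 16 ** hex_chars
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-rows * (rows - 1) / (2 * space))


if __name__ == "__main__":
    for n in (64, 10_000, 100_000):
        print(n, collision_probability(n))
```

 Under these assumptions the estimate at 100K rows is on the order of 1e-10, far below the 0.01% figure quoted above, so the 16-char prefix has considerable headroom before the 24-char extension becomes necessary.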
 
 
 
 
 How To Verify 
 Run these commands to validate Phase 0 backbone functionality: 
 Task 0.1 — Mehanik Gate 
 # Verify 7 exit-2 block paths exist
grep -c "exit 2" ~/.claude/hooks/pre-dispatch-gate.sh
# Expected: 7

# Test BLOCK path (no marker)
MC_TASK_ID=9999 ~/.claude/hooks/pre-dispatch-gate.sh; echo "exit=$?"
# Expected: error message, then exit=2

# Test ALLOW path (valid marker)
# (Requires /mehanik clearance file in /tmp/)
MC_TASK_ID=9865 ~/.claude/hooks/pre-dispatch-gate.sh; echo "exit=$?"
# Expected: exit=0
 
 Task 0.2 — LightRAG 
 # Verify container running
docker ps | grep lightrag
# Expected: 2 containers (lightrag, lightrag-neo4j)

# Verify health endpoint
curl -s http://localhost:9621/health | jq .
# Expected: {"pipeline_busy": false, ...}

# Check vector/graph load
docker logs lightrag 2>&1 | grep "Loaded"
# Expected: 22,771 entity vectors, 43,582 relationship vectors
 
 Task 0.3 — Quality Score 
 # Verify schema extended
sqlite3 ~/system/databases/tool-audit.db ".schema routing_log"
# Expected: quality_score, caller_agent, target_tier, mc_task_id columns

# Check non-NULL quality scores
sqlite3 ~/system/databases/tool-audit.db \
 "SELECT COUNT(*) FROM routing_log WHERE quality_score IS NOT NULL"
# Expected: >0 (any recent dispatches)

# Verify legacy DB archived
ls -lh ~/system/databases/agent-routing.db.legacy-archive-2026-04-27
# Expected: 3.5MB file
 
 Task 0.4 — Cost Telemetry 
 # Verify today's cost non-zero
node ~/system/tools/cost-tracker.js summary today | grep claude
# Expected: non-zero cost for claude-cli

# Verify week magnitude
node ~/system/tools/cost-tracker.js summary week
# Expected: ~$163K total

# Verify daily report cron loaded
launchctl list | grep cost-daily-report
# Expected: com.alai.cost-daily-report with PID or status 0
 
 Task 0.5 — Trace Capture 
 # Verify traces.db exists and has rows
sqlite3 ~/system/databases/traces.db "SELECT COUNT(*) FROM traces"
# Expected: >10 (grows with each dispatch)

# Verify hook registered
grep -A3 "trace-capture.py" ~/.claude/settings.json
# Expected: PostToolUse hook entry with async:true

# Verify privacy (no raw content)
sqlite3 ~/system/databases/traces.db \
 "SELECT prompt_hash, response_hash FROM traces LIMIT 5"
# Expected: Only 16-char hex strings, no full text
 
 
 References 
 Parent Plan 
 
 AI Factory v2 Full Plan: /Users/makinja/system/specs/ai-factory-v2-plan.md 
 CEO Approval: 2026-04-27 (option B, override DA-BLOCKED + triage-mode) 
 
 Lens Reports (convergent analysis by 5 experts) 
 
 /tmp/ai-factory-v2-petter.md — Architecture (Petter Graff) 
 /tmp/ai-factory-v2-anthropic.md — Token economics (Anthropic Chief AI Architect) 
 /tmp/ai-factory-v2-openai.md — Multi-provider/distillation (OpenAI Chief Architect) 
 /tmp/ai-factory-v2-alem-clone.md — CEO reality check (Alem-Clone) 
 /tmp/ai-factory-v2-da.md — Risk audit (Devil's Advocate) 
 
 Root Cause Analysis 
 
 MC #9223 Final Synthesis: Mehanik Phase 2 architectural decision 
 Scope creep incident 2026-04-24: 11-agent dispatch without gate (pre-Mehanik) 
 
 Architecture Decision Records 
 
 ADR — Quality Score Read Path: /Users/makinja/system/specs/adr/ADR-quality-score-read-path.md 
 
 Evidence Files 
 
 /tmp/aif-v2-task-0.1-evidence.md — Mehanik Phase 2 activation 
 /tmp/aif-v2-task-0.2-evidence.md — LightRAG container restore 
 /tmp/aif-v2-task-0.3-evidence.md — Quality score integration 
 /tmp/aif-v2-task-0.4-evidence.md — Cost telemetry backfill 
 /tmp/aif-v2-task-0.5-evidence.md — Trace capture pipeline 
 /tmp/aif-v2-task-0.8-evidence.md — This documentation task 
 
 Proveo Validation 
 
 MC #9870: Cross-validation of all 5 builder tasks (COMPLETE) 
 
 
 Next Steps 
 Immediate (Phase 0 closure) 
 
 Proveo validates this BookStack page exists and is discoverable 
 John marks MC #9870 and #9871 done 
 Phase 0 declared COMPLETE 
 
 Phase 1 — Token Economics Wiring (Post-2026-05-02) 
 Gate: CEO must explicitly close triage mode before Phase 1 begins. 
 6 tasks planned: 
 
 Anthropic prompt caching wire-up (50-70% input token reduction) 
 Sub-agent context isolation (prevents 7M token bleed) 
 LightRAG STEP 0 injection in 8 active agents 
 Eval harness with 25 golden tasks (gates all future routing changes) 
 Multi-provider fallback chain (Groq adapter wire-up) 
 Proveo E2E + Skillforge docs (ZAKON PLAN mandatory) 
 
 Expected savings: $144-240/week conservative (prompt caching alone). Upper bound: $14,778/week (sub-agent isolation). 
 Phase 2 — Capability Expansion (Weeks 2-4) 
 Gate: Phase 1 must show measurable token savings (≥$3K/week) AND eval harness green. 
 7 tasks planned: 
 
 AutoCoder.js Phase 1 (dry-run mode) 
 ANVIL SPOF: replicate 13 P0 databases to Azure 
 MCP tool schema portability 
 Distillation candidate scoring 
 Archive 44 orphan agents 
 TTL sweep on hivemind.db 
 Phase 2 Proveo E2E + Skillforge docs 
 
 Phase 3 — Strategic Horizon (Q3 2026+) 
 Gate: ALAI must have ≥1 paid AI Services engagement closed. 
 5 tasks planned: 
 
 Fine-tune candidate review 
 AIOS competitor evaluation (Cursor, Devin, OpenAI Operator, Gemini Extensions) 
 Operator-style browser agents 
 Anti-lying enforcement hooks 
 Multimodal expansion (Realtime API, OCR) 
 
 
 Last Updated: 2026-04-27 
 Maintained By: ALAI 
 Document Version: 1.0 
 BookStack Path: Engineering / AI Factory v2 — Phase 0 Backbone