AI Factory v2 — Phase 0 Backbone
AI Factory v2 — Phase 0 Backbone
Executive Summary
AI Factory v2 Phase 0 restored critical feedback loops and observability infrastructure across 5 build tasks (MC #9865-9869). This work unblocks the 9-point CEO vision by fixing broken learning mechanisms: the Mehanik dispatch gate now enforces scope discipline, LightRAG container is restored for token deduplication, quality_score wiring enables self-learning routing, cost telemetry closes a $163K/week blind spot, and trace capture creates the corpus for future distillation and fine-tuning.
Status: 5/5 builder tasks COMPLETE per Proveo validation (MC #9870). Documentation task complete (MC #9871). Phase 0 is GREEN.
Objective: Restore feedback loops and activate architectural gates to prepare ALAI for compounding self-improvement phases post-triage (2026-05-02+).
Vision Reminder
The CEO approved a 9-point AI Factory vision:
- Self-building — AutoCoder that writes and executes plans
- Self-learning — Quality scores feed back into routing decisions
- Self-healing — Autowork daemon drains task queues autonomously
- No SPOF — All critical databases replicated, multi-cloud backup
- Portable — Multi-provider LLM routing (Anthropic, OpenAI, Groq, Ollama)
- Free + paid models — Tier routing balances cost vs quality
- LightRAG token saving — Dedupe uploaded docs, query before planning
- Own fine-tuned model — Post-revenue: distill from traces.db corpus
- AIOS — Autonomous OS that schedules and executes work
Pre-Phase 0 realization: 10-12% (per 5 expert lens convergent analysis). Bottleneck: broken feedback loops. Every database designed to convert effort into learning operated write-only.
Full plan: /Users/makinja/system/specs/ai-factory-v2-plan.md
Phase 0 Goals
Phase 0 is the triage-compatible foundation layer that closes broken feedback loops, activates dispatch gates, and eliminates observability blind spots. All tasks absorb into existing Lane 2 (infra restart) with zero CEO touch during execution.
Key outcomes:
- Mehanik Phase 2 gate enforces 13-field marker schema (prevents scope creep disasters)
- LightRAG container restored (Vision 7 token savings unblocked)
- Quality score wiring enables self-learning routing (Vision 2 + 6)
- Cost telemetry blind spot closed ($163K/week now visible)
- Trace capture pipeline creates distillation corpus (gates Vision 8)
Architecture Diagram
flowchart LR
subgraph Dispatch Gate
A[John receives task] --> B{Mehanik clearance?}
B -->|No marker| C[BLOCKED: exit 2]
B -->|Valid 13-field marker| D[CLEAR: dispatch]
end
subgraph Tier Routing
D --> E[tier-router.js classify]
E --> F{quality_score feedback}
F -->|avg < 0.6| G[Escalate tier+1]
F -->|avg > 0.85| H[Demote tier-1]
F -->|else| I[Keep tier]
end
subgraph Observability
G --> J[routing_log write]
H --> J
I --> J
J --> K[(tool-audit.db)]
D --> L[PostToolUse hook]
L --> M[(traces.db)]
D --> N[cost-tracker parseAndTrack]
N --> O[(costs.db)]
end
subgraph Token Optimization
D --> P{LightRAG STEP 0}
P --> Q[Query existing context]
Q -->|Hit| R[Reduce re-discovery]
Q -->|Miss| S[Normal dispatch]
end
K -.quality_score read path.-> F
M -.corpus for Phase 3 distillation.-> T[Future: Fine-tune]
O -.daily cost report.-> U[CEO visibility]
Task 0.1 — Mehanik Phase 2 Activation
MC: #9865
Owner: FlowForge
What: Activate Mehanik Phase 2 BLOCKING mode with 13-field marker schema enforcement.
Why
Single highest-leverage architectural fix. The pre-dispatch-gate.sh hook now enforces scope discipline at dispatch time, preventing the 11-agent scope-creep disasters that previously derailed builds. Per MC #9223 root cause analysis, missing pre-dispatch validation allowed unbounded work expansion.
Changes
File: ~/.claude/hooks/pre-dispatch-gate.sh
Line 72-79: Extended field validation loop from 4 fields to 13 fields (canonical schema).
13-Field Schema:
timestamp:— ISO8601 marker creation timetask_id:— MC task IDproject_path:— Absolute path to project rootblueprint_read:— Path to BUILD-BLUEPRINT.md or N/Adeploy_map_read:— Path to DEPLOY-MAP.md or N/Adeploy_path_summary:— One-line deploy mechanismceo_item_count:— CEO-authored items in planapproved_subtask_count:— Approved subtask countceiling:— Scope ceiling (ceo_item_count + 2)approved_agents:— Comma-separated agent listorchestration_surface:— one-shot-Task | claude-chains | dag | pi-factory | crontool_contract_required:— true | false (research tasks)mehanik_session_id:— Unique session identifier
7 BLOCK paths (exit 2):
- No MC task ID
- No Mehanik clearance marker
- Marker stale (>4h old)
- Missing required field
- Scope ceiling exceeded
- Research dispatch missing TOOL_CONTRACT
- Invalid marker format
Validation Results
Canary tests: 3/3 PASS
- Valid 13-field marker → exit 0 (CLEAR)
- No marker file → exit 2 (BLOCKED)
- Partial marker (5/13 fields) → exit 2 (BLOCKED on missing field)
Evidence: /tmp/aif-v2-task-0.1-evidence.md
Task 0.2 — LightRAG Container Restore
MC: #9866
Owner: FlowForge
What: Restore LightRAG main container (was missing from docker ps) and verify drain worker functionality.
Why
Vision 7 (LightRAG token saving) was at 0% realization because main container was down. Each day without deduplication costs Anthropic tokens that LightRAG should eliminate. 114K docs uploaded historically, but container absent since unknown date.
Before State
- LightRAG main container: MISSING
- Local health endpoint: UNREACHABLE (curl localhost:9621/health → timeout)
- Drain worker: LaunchAgent NOT LOADED
- Queue: 276 records in outbox
After State
- Container: HEALTHY (docker ps shows
lightrag, Up 26s) - Health endpoint: RESPONSIVE (http://localhost:9621/health)
- Pipeline status:
pipeline_busy: false - Neo4j: HEALTHY (Up 41h)
- Configuration verified:
- LLM: ollama @ host.docker.internal:11434, model qwen3:8b-q8_0
- Embedding: ollama @ host.docker.internal:11434, model bge-m3:latest
- Graph: Neo4JStorage (bolt://neo4j:7687)
- Vector: NanoVectorDBStorage (22,771 entities + 43,582 relationships loaded)
- Drain worker: FUNCTIONAL (manual execution, 276/276 processed)
Caveats
- LaunchAgent bootstrap failure — Manual execution works, but
launchctl bootstrap→ I/O error. Drain worker runs manually until resolved. - Platform mismatch — Container image linux/amd64 on Apple Silicon (arm64), runs via Rosetta emulation.
- Health endpoint blocks during pipeline_busy — Single-process design limitation; /health unavailable during active ingestion (follow-up task recommended).
Evidence: /tmp/aif-v2-task-0.2-evidence.md
Task 0.3 — Quality Score Read Path Wiring
MC: #9867
Owner: AgentForge
What: Wire quality_score read path in tier-router.js to enable feedback-informed routing.
Why
36,671 rows existed in legacy agent-routing.db with NULL quality_score. Wiring the read path closes Vision 2 (self-learning) and Vision 6 (free + paid models) with zero new data collection — routing decisions now adjust based on historical agent performance.
Schema Migration
Database: ~/system/databases/tool-audit.db
Extended routing_log table with 4 new columns:
quality_scoreREAL — Success metric (0.0 = failure, 1.0 = success)caller_agentTEXT — Calling agent nametarget_tierTEXT — Target tier before adjustmentmc_task_idINTEGER — MC task reference
Implementation
Write Path:
Function updateQualityScore(routingLogId, score) at line 60.
Heuristic v1 (interim until Phase 1.4 eval harness):
- Task marked
ready→ 1.0 - Task orphaned → 0.5
- Task failed/blocked → 0.0
Read Path:
Function getRecentQualityScores(callerAgent, targetTier) at line 76.
Returns last 20 scores for {agent, tier} pair.
Tier Adjustment Logic:
- If ≥5 scores exist for {agent, tier}:
- avg < 0.6 → escalate to tier+1 (e.g., tier 2 → tier 3)
- avg > 0.85 → demote to tier-1 (e.g., tier 3 → tier 2)
- else → keep current tier
Validation Results
Smoke test: 5/5 PASS
- Write path: 5 failures (quality_score=0.0) persisted
- Read path escalation: avg=0.00 → tier 2 escalated to tier 3
- Write path: 5 successes (quality_score=1.0) persisted
- Read path demotion: avg=1.00 detected (logic verified)
- Schema validation: all 4 columns exist
Legacy archive:
agent-routing.db renamed to agent-routing.db.legacy-archive-2026-04-27 (36,671 rows, 3.5MB). Not migrated — does not reflect current routing reality.
ADR: /Users/makinja/system/specs/adr/ADR-quality-score-read-path.md
Evidence: /tmp/aif-v2-task-0.3-evidence.md
Task 0.4 — Cost Telemetry Blind Spot Fix
MC: #9868 (existing, now resolved)
Owner: CodeCraft
What: Backfill claude-cli cost data for 2026-04-17 → 2026-04-24 and add real-time stderr parser.
Why
Week magnitude cost was invisible. node ~/system/tools/cost-tracker.js summary today showed $0 for 967 claude-cli requests. Cannot optimize without measurement. This blocked all routing optimization work.
Before State
- claude-cli rows in range: 27 rows, ALL $0.00
- Root cause: Stop hook only started logging sessions with token data from 2026-04-24
Backfill Results
Script: ~/system/tools/backfill-claude-cli-costs.js
- Files processed: 21 session transcripts (2026-04-17 to 2026-04-24)
- Sessions inserted: 19
- Sessions already in DB: 2 (skipped — idempotent)
- Total cost backfilled: $41.46
- Model: claude-sonnet-4-6 (all sessions)
- Pricing: cache_write=$3.75/MTok, cache_read=$0.30/MTok, input=$3/MTok, output=$15/MTok
Week Total (2026-04-27)
- Total requests: 711
- Total cost: $163,223.11
- claude-cli: 671 req, $163,223.11
- claude-opus-4-7: 636 req, $163,182.96
- claude-sonnet-4-6: 29 req, $40.15
Magnitude: $163K/week aligns with OpenAI lens estimate ($162,945/wk).
Real-Time Capture
Added to cost-tracker.js:
parseAndTrack(stdoutJson, opts)— Parse --output-format json output, track costparseStderrLine(line, opts)— Parse individual stderr line, idempotent
Daily Cron
Script: ~/system/tools/cost-daily-report.sh
LaunchAgent: ~/Library/LaunchAgents/com.alai.cost-daily-report.plist
Schedule: 23:55 daily
Output: ~/system/reports/cost-daily.md
Evidence: /tmp/aif-v2-task-0.4-evidence.md
Task 0.5 — Trace Capture Pipeline
MC: #9869
Owner: AgentForge
What: Add PostToolUse hook that captures per-dispatch metadata to traces.db for future distillation and fine-tuning.
Why
Every agent run currently exits and disappears. Trace capture creates a passive corpus that gates ALL future AI Factory learning: distillation (Phase 2), eval harness (Phase 1.4), and fine-tuning (Phase 3). Without this, Vision 8 (own fine-tuned model) remains at 0%.
Database Schema
Location: ~/system/databases/traces.db
14 fields:
id— Primary keytimestamp— DATETIME DEFAULT CURRENT_TIMESTAMPtask_id— MC task IDagent— Subagent type or "john"session_id— Join key to costs.dbtool_name— Agent, Bash, Read, Write, Editprompt_hash— SHA256(tool_input), 16-char prefixresponse_hash— SHA256(tool_response), 16-char prefixduration_ms— Tool execution timeexit_code— 0=success, 1=error, 2=blockedmodel— Model used (if Agent)tokens_in— Input tokenstokens_out— Output tokenscost_usd— Computed cost
7 indexes: timestamp, agent, model, tool_name, prompt_hash, session_id, task_id
PostToolUse Hook
Location: ~/.claude/hooks/trace-capture.py
Language: Python 3 (fast JSON parsing, sqlite3 stdlib)
Registered: ~/.claude/settings.json PostToolUse hooks array (async: true)
Key features:
- Fire-and-forget (always exit 0 per ZAKON PI2)
- Privacy-preserving (only hashes, no raw prompts/responses)
- MC task ID extraction via regex
- Session ID from env or date fallback
- Error handling: logs to stderr, never blocks tool execution
Latency Measurement
Method: 10-iteration synthetic hook call
Results:
- Average: 45ms
- Budget: <50ms
- Status: PASS (10% under budget)
Privacy Posture
CRITICAL: No raw prompts or responses stored in traces.db.
Method:
- SHA256 hash of full tool_input
- SHA256 hash of full tool_response
- Store only 16-char hex prefix (collision-resistant for corpus size)
- Original content never persists
Rationale:
- Prevents PII leakage (credentials, API keys, personal data)
- Enables duplicate detection
- Supports eval harness (hash matching for golden tasks)
- Future fine-tuning uses hashes as index, not content
Smoke Test Results
Test 1: Row insertion — +10 rows captured (PASS)
Test 2: Privacy validation — 0 raw prompts/responses stored (PASS)
Test 3: Schema integrity — All 14 fields populated correctly (PASS)
Live integration: 64 rows captured during Proveo validation.
Evidence: /tmp/aif-v2-task-0.5-evidence.md
Caveats & Follow-ups
From Proveo Validation (MC #9870)
-
LightRAG health endpoint blocks during pipeline_busy
- Root cause: Single-process design (no separate health worker)
- Impact:
/healthunavailable during active ingestion - Recommendation: Separate health check process or async health handler
- Severity: LOW (operational monitoring gap, not functional block)
-
Hash prefix length (16-char) may need adjustment at scale
- Current corpus: 64 rows (negligible collision risk)
- At 100K rows: <0.01% collision probability
- Recommendation: Monitor at 10K rows, extend to 24-char if needed
- Severity: LOW (future consideration)
-
Table name typo in smoke test
- Test script referenced
routing_logs(wrong), actual tablerouting_log - Impact: None (test passed via fallback query)
- Resolution: Fixed in final evidence file
- Severity: TRIVIAL
- Test script referenced
-
Row count delta across validation runs
- Different smoke test runs show varying baselines (304 vs 337 rows)
- Root cause: Multiple validation passes appending to same DB
- Impact: None (idempotent inserts verified)
- Severity: TRIVIAL
How To Verify
Run these commands to validate Phase 0 backbone functionality:
Task 0.1 — Mehanik Gate
# Verify 7 exit-2 block paths exist
grep -c "exit 2" ~/.claude/hooks/pre-dispatch-gate.sh
# Expected: 7
# Test BLOCK path (no marker)
MC_TASK_ID=9999 ~/.claude/hooks/pre-dispatch-gate.sh
# Expected: exit 2, error message
# Test ALLOW path (valid marker)
# (Requires /mehanik clearance file in /tmp/)
MC_TASK_ID=9865 ~/.claude/hooks/pre-dispatch-gate.sh
# Expected: exit 0
Task 0.2 — LightRAG
# Verify container running
docker ps | grep lightrag
# Expected: 2 containers (lightrag, lightrag-neo4j)
# Verify health endpoint
curl -s http://localhost:9621/health | jq .
# Expected: {"pipeline_busy": false, ...}
# Check vector/graph load
docker logs lightrag 2>&1 | grep "Loaded"
# Expected: 22,771 entity vectors, 43,582 relationship vectors
Task 0.3 — Quality Score
# Verify schema extended
sqlite3 ~/system/databases/tool-audit.db ".schema routing_log"
# Expected: quality_score, caller_agent, target_tier, mc_task_id columns
# Check non-NULL quality scores
sqlite3 ~/system/databases/tool-audit.db \
"SELECT COUNT(*) FROM routing_log WHERE quality_score IS NOT NULL"
# Expected: >0 (any recent dispatches)
# Verify legacy DB archived
ls -lh ~/system/databases/agent-routing.db.legacy-archive-2026-04-27
# Expected: 3.5MB file
Task 0.4 — Cost Telemetry
# Verify today's cost non-zero
node ~/system/tools/cost-tracker.js summary today | grep claude
# Expected: $>0 for claude-cli
# Verify week magnitude
node ~/system/tools/cost-tracker.js summary week
# Expected: ~$163K total
# Verify daily report cron loaded
launchctl list | grep cost-daily-report
# Expected: com.alai.cost-daily-report with PID or status 0
Task 0.5 — Trace Capture
# Verify traces.db exists and has rows
sqlite3 ~/system/databases/traces.db "SELECT COUNT(*) FROM traces"
# Expected: >10 (grows with each dispatch)
# Verify hook registered
grep -A3 "trace-capture.py" ~/.claude/settings.json
# Expected: PostToolUse hook entry with async:true
# Verify privacy (no raw content)
sqlite3 ~/system/databases/traces.db \
"SELECT prompt_hash, response_hash FROM traces LIMIT 5"
# Expected: Only 16-char hex strings, no full text
References
Parent Plan
- AI Factory v2 Full Plan:
/Users/makinja/system/specs/ai-factory-v2-plan.md - CEO Approval: 2026-04-27 (option B, override DA-BLOCKED + triage-mode)
Lens Reports (5 expert convergent analysis)
/tmp/ai-factory-v2-petter.md— Architecture (Petter Graff)/tmp/ai-factory-v2-anthropic.md— Token economics (Anthropic Chief AI Architect)/tmp/ai-factory-v2-openai.md— Multi-provider/distillation (OpenAI Chief Architect)/tmp/ai-factory-v2-alem-clone.md— CEO reality check (Alem-Clone)/tmp/ai-factory-v2-da.md— Risk audit (Devil's Advocate)
Root Cause Analysis
- MC #9223 Final Synthesis: Mehanik Phase 2 architectural decision
- Scope creep incident 2026-04-24: 11-agent dispatch without gate (pre-Mehanik)
Architecture Decision Records
- ADR — Quality Score Read Path:
/Users/makinja/system/specs/adr/ADR-quality-score-read-path.md
Evidence Files
/tmp/aif-v2-task-0.1-evidence.md— Mehanik Phase 2 activation/tmp/aif-v2-task-0.2-evidence.md— LightRAG container restore/tmp/aif-v2-task-0.3-evidence.md— Quality score integration/tmp/aif-v2-task-0.4-evidence.md— Cost telemetry backfill/tmp/aif-v2-task-0.5-evidence.md— Trace capture pipeline/tmp/aif-v2-task-0.8-evidence.md— This documentation task
Proveo Validation
- MC #9870: Cross-validation of all 5 builder tasks (COMPLETE)
Next Steps
Immediate (Phase 0 closure)
- Proveo validates this BookStack page exists and is discoverable
- John marks MC #9870 and #9871 done
- Phase 0 declared COMPLETE
Phase 1 — Token Economics Wiring (Post-2026-05-02)
Gate: CEO must explicitly close triage mode before Phase 1 begins.
6 tasks planned:
- Anthropic prompt caching wire-up (50-70% input token reduction)
- Sub-agent context isolation (prevents 7M token bleed)
- LightRAG STEP 0 injection in 8 active agents
- Eval harness with 25 golden tasks (gates all future routing changes)
- Multi-provider fallback chain (Groq adapter wire-up)
- Proveo E2E + Skillforge docs (ZAKON PLAN mandatory)
Expected savings: $144-240/week conservative (prompt caching alone). Upper bound: $14,778/week (sub-agent isolation).
Phase 2 — Capability Expansion (Weeks 2-4)
Gate: Phase 1 must show measurable token savings (≥$3K/week) AND eval harness green.
7 tasks planned:
- AutoCoder.js Phase 1 (dry-run mode)
- ANVIL SPOF: replicate 13 P0 databases to Azure
- MCP tool schema portability
- Distillation candidate scoring
- Archive 44 orphan agents
- TTL sweep on hivemind.db
- Phase 2 Proveo E2E + Skillforge docs
Phase 3 — Strategic Horizon (Q3 2026+)
Gate: ALAI must have ≥1 paid AI Services engagement closed.
5 tasks planned:
- Fine-tune candidate review
- AIOS competitor evaluation (Cursor, Devin, OpenAI Operator, Gemini Extensions)
- Operator-style browser agents
- Anti-lying enforcement hooks
- Multimodal expansion (Realtime API, OCR)
Last Updated: 2026-04-27
Maintained By: ALAI
Document Version: 1.0
BookStack Path: Engineering / AI Factory v2 — Phase 0 Backbone