Task Metrics & Learning Agent

AAOS tracks every task's execution metrics and learns from patterns via a nightly learning agent.

Task Metrics Schema

Database: ~/system/databases/mission-control.db
Table: task_metrics

CREATE TABLE task_metrics (
    task_id INTEGER PRIMARY KEY,
    qa_score INTEGER,              -- /19 from qa-19.js
    token_cost_usd REAL,           -- Anthropic API cost
    duration_seconds INTEGER,       -- Wall time from start to done
    cache_hits INTEGER,             -- RAG cache hits
    agents_spawned INTEGER,         -- How many subagents
    rework_count INTEGER,           -- How many review cycles
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (task_id) REFERENCES tasks(id)
);

Field Descriptions

Field	Type	Purpose
`task_id`	INTEGER	Foreign key to `tasks.id`
`qa_score`	INTEGER	Score out of 19 from `qa-19.js` check
`token_cost_usd`	REAL	Total Anthropic API cost for this task
`duration_seconds`	INTEGER	Wall time from `mc.js start` to `mc.js done`
`cache_hits`	INTEGER	How many RAG queries hit cache vs miss
`agents_spawned`	INTEGER	How many specialist agents were spawned
`rework_count`	INTEGER	How many times REVIEW sent it back to BUILD

Purpose: Track task efficiency. Learning agent analyzes these to flag inefficient patterns.

QA-19 Score

Tool: ~/system/tools/qa-19.js

Usage:

node ~/system/tools/qa-19.js check <task-id>

19-Point Quality Gate — inspired by Quran 74:30 ("Nad njim je devetnaest" — "Over it is nineteen").

5 Phases, 19 Checks

Phase	Checks	Description
1. Preparation	1-4	GOTCHA gate, RAG query, MC task, blueprint read
2. Construction	5-10	Code quality, tests, schema adherence, dependencies
3. Verification	11-15	Functional tests, exit codes, HTTP responses, DOM
4. Validation	16-18	Validator review, security audit, evidence artifacts
5. Seal	19	HiveMind posting (learnings extracted)

Minimum thresholds:

M priority — 15/19 to pass
H priority — 17/19 to pass
CRIT priority — 19/19 to pass

Adaptive: Checks adapt by task type (web/api/script/document/email/trivial).

Rule: ~/system/rules/19-point-quality-gate.md

Learning Agent

Tool: ~/system/tools/learning-agent.js

Runs: Nightly at 02:00 via cron

Cron entry:

0 2 * * * cd ~/system && node tools/learning-agent.js >> logs/learning-agent.log 2>&1

Capabilities

Analyze task metrics
- High token cost (> $5 per task)
- Low QA score (< 15/19)
- Many rework cycles (> 2)
- Long duration (> 2 hours)
Flag patterns to HiveMind
- "kotlin-architect frequently gets database schema wrong → suggest RAG query for schema before coding"
- "nextjs-specialist high token cost on forms → suggest pre-built form component library"
Update flywheel cache
- Identifies common RAG queries that miss cache
- Pre-computes answers for top 100 queries
- Saves to ~/system/databases/flywheel.db
Suggest agent improvements
- Analyzes which agents have high rework_count
- Suggests prompt updates or new hard prompts
- Posts to ~/system/agents/improvement-suggestions.md
Generate weekly summary report
- Total tasks completed
- Average QA score
- Total token cost
- Top 5 inefficient patterns
- Top 5 most efficient agents
- Saves to ~/system/reports/learning-agent-YYYY-WW.md

Example Output

LEARNING AGENT REPORT — Week 14, 2026

Total tasks: 47
Average QA score: 16.8/19
Total token cost: $89.40
Average duration: 42 minutes

TOP INEFFICIENCIES:
1. kotlin-architect: 3 tasks with rework_count > 2 (schema mismatch)
2. nextjs-specialist: High token cost on form tasks (avg $4.20 vs $1.80)
3. qa-specialist: Missing DOM visibility checks (5 false passes)

SUGGESTED FIXES:
1. Add "read schema before coding" to kotlin-architect GOTCHA template
2. Create form component library RAG entry
3. Update qa-specialist to run Playwright visibility assertions

TOP PERFORMERS:
1. devops-specialist: 12 tasks, avg 8 min, avg $0.40, 18.5/19 QA
2. database-specialist: 9 tasks, avg 15 min, avg $1.20, 17.8/19 QA
3. api-architect: 7 tasks, avg 22 min, avg $1.80, 18.1/19 QA

HiveMind Integration

Learning agent posts findings to HiveMind:

node ~/system/agents/hivemind/hivemind.js post \
  --type learning \
  --category pattern \
  --tags "kotlin-architect,schema,rework" \
  --content "kotlin-architect frequently misses database schema — suggest RAG query for schema before coding"

Result: Future kotlin-architect spawns will RAG query schema files before writing migrations.

RAG Flywheel

Flow:

Question → SHA256 hash → flywheel.db lookup → (HIT: return cached answer) → (MISS: query HiveMind FTS → query Qdrant → query Ollama → query Anthropic → save answer to flywheel.db)

Databases

Database	Size	Purpose
`flywheel.db`	36MB	SHA256-keyed cache, fast hits
`knowledge.db`	187MB	Full RAG knowledge base
`hivemind.db`	—	Structured intel + memory (14K+ entries)

Models

ANVIL (localhost:11434) — Mac Studio M3 Ultra, 96GB

qwen3:32b
deepseek-r1:8b
Local inference, no API cost

FORGE (10.0.0.2:11434) — Remote LAN GPU server

deepseek-r1:70b
qwen3:32b
Heavier models for complex queries

Anthropic (claude.ai API) — Cloud fallback

claude-sonnet-4-6
Used when local models fail or for critical tasks

Cache Hit Rate

As of 2026-02-24: 61%

Goal: 80% by end of Q2 2026

Strategy:

Learning agent pre-computes top 100 queries nightly
John runs RAG query before EVERY action (ZAKON #12)
Specialist agents must query RAG before implementation

Metrics Dashboard (Future)

Planned: https://metrics.basicconsulting.no

Features:

Real-time QA score trend
Token cost per agent
Rework count heatmap
Cache hit rate graph
Agent efficiency leaderboard

Status: Spec'd, not yet built. Part of PLATFORM group roadmap.

Source Files

File	Purpose
`~/system/databases/mission-control.db`	task_metrics table
`~/system/tools/learning-agent.js`	Nightly analysis + HiveMind posting
`~/system/tools/qa-19.js`	19-point quality gate
`~/system/databases/flywheel.db`	RAG cache
`~/system/databases/knowledge.db`	RAG knowledge base
`~/system/agents/hivemind/hivemind.js`	HiveMind CLI
`~/system/rules/19-point-quality-gate.md`	QA-19 protocol

Cron Schedule

# Learning agent — nightly at 02:00
0 2 * * * cd ~/system && node tools/learning-agent.js >> logs/learning-agent.log 2>&1

# Flywheel cache precompute — nightly at 03:00
0 3 * * * cd ~/system && node tools/flywheel-precompute.js >> logs/flywheel.log 2>&1

# HiveMind embeddings backfill — weekly on Sunday at 04:00
0 4 * * 0 cd ~/system/agents/hivemind && node hivemind.js backfill-embeddings >> ../../logs/hivemind-backfill.log 2>&1

LaunchAgents: All cron jobs also have LaunchAgent equivalents for macOS persistence.

validator

backend-builder

frontend-builder

design-builder

backend-dev

frontend-dev

fullstack-dev

database-dev

devops-dev

integration-dev

sentinel-architect

sentinel-ba

sentinel-developer

sentinel-tester

sentinel-validator

researcher

accessibility-agent

api-architect

compliance-agent

database-specialist

design-system-agent

e2e-agent

mutation-tester

pentest-agent

performance-agent

security-auditor

supply-chain-agent

Task Metrics & Learning Agent

Task Metrics & Learning Agent

Task Metrics Schema

Field Descriptions

QA-19 Score

5 Phases, 19 Checks

Learning Agent

Capabilities

Example Output

HiveMind Integration

RAG Flywheel

Databases

Models

Cache Hit Rate

Metrics Dashboard (Future)

Source Files

Cron Schedule