Task Metrics & Learning Agent
AAOS tracks every task's execution metrics and learns from patterns via a nightly learning agent.
Task Metrics Schema
Database: ~/system/databases/mission-control.db
Table: task_metrics
CREATE TABLE task_metrics (
  task_id INTEGER PRIMARY KEY,
  qa_score INTEGER,           -- /19 from qa-19.js
  token_cost_usd REAL,        -- Anthropic API cost
  duration_seconds INTEGER,   -- Wall time from start to done
  cache_hits INTEGER,         -- RAG cache hits
  agents_spawned INTEGER,     -- How many subagents
  rework_count INTEGER,       -- How many review cycles
  created_at TEXT DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (task_id) REFERENCES tasks(id)
);
Field Descriptions
| Field | Type | Purpose |
|---|---|---|
| task_id | INTEGER | Foreign key to tasks.id |
| qa_score | INTEGER | Score out of 19 from qa-19.js check |
| token_cost_usd | REAL | Total Anthropic API cost for this task |
| duration_seconds | INTEGER | Wall time from mc.js start to mc.js done |
| cache_hits | INTEGER | How many RAG queries hit cache vs miss |
| agents_spawned | INTEGER | How many specialist agents were spawned |
| rework_count | INTEGER | How many times REVIEW sent it back to BUILD |
Purpose: Track task efficiency. The learning agent analyzes these metrics to flag inefficient patterns.
QA-19 Score
Tool: ~/system/tools/qa-19.js
Usage:
node ~/system/tools/qa-19.js check <task-id>
19-Point Quality Gate — inspired by Quran 74:30 ("Nad njim je devetnaest" — "Over it is nineteen").
5 Phases, 19 Checks
| Phase | Checks | Description |
|---|---|---|
| 1. Preparation | 1-4 | GOTCHA gate, RAG query, MC task, blueprint read |
| 2. Construction | 5-10 | Code quality, tests, schema adherence, dependencies |
| 3. Verification | 11-15 | Functional tests, exit codes, HTTP responses, DOM |
| 4. Validation | 16-18 | Validator review, security audit, evidence artifacts |
| 5. Seal | 19 | HiveMind posting (learnings extracted) |
Minimum thresholds:
- M priority — 15/19 to pass
- H priority — 17/19 to pass
- CRIT priority — 19/19 to pass
Adaptive: Checks adapt by task type (web/api/script/document/email/trivial).
Rule: ~/system/rules/19-point-quality-gate.md
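The minimum thresholds above amount to a lookup by priority. The sketch below is illustrative only: `passesGate` and `MIN_SCORE` are hypothetical names, not part of qa-19.js, and throwing on priorities without a documented threshold is an assumption.

```javascript
// Minimum QA-19 scores per priority, as listed under "Minimum thresholds".
const MIN_SCORE = new Map([
  ["M", 15],
  ["H", 17],
  ["CRIT", 19],
]);

// Hypothetical helper (not part of qa-19.js): returns true when the score
// meets the minimum for the given priority.
function passesGate(priority, qaScore) {
  if (!MIN_SCORE.has(priority)) {
    // Assumption: priorities without a documented threshold are rejected.
    throw new Error(`No threshold documented for priority "${priority}"`);
  }
  if (!Number.isInteger(qaScore) || qaScore < 0 || qaScore > 19) {
    throw new RangeError(`qa_score must be an integer from 0 to 19, got ${qaScore}`);
  }
  return qaScore >= MIN_SCORE.get(priority);
}

console.log(passesGate("M", 16));    // true
console.log(passesGate("CRIT", 18)); // false
```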
Learning Agent
Tool: ~/system/tools/learning-agent.js
Runs: Nightly at 02:00 via cron
Cron entry:
0 2 * * * cd ~/system && node tools/learning-agent.js >> logs/learning-agent.log 2>&1
Capabilities
- Analyze task metrics
  - High token cost (> $5 per task)
  - Low QA score (< 15/19)
  - Many rework cycles (> 2)
  - Long duration (> 2 hours)
- Flag patterns to HiveMind
  - "kotlin-architect frequently gets database schema wrong → suggest RAG query for schema before coding"
  - "nextjs-specialist high token cost on forms → suggest pre-built form component library"
- Update flywheel cache
  - Identifies common RAG queries that miss cache
  - Pre-computes answers for top 100 queries
  - Saves to ~/system/databases/flywheel.db
- Suggest agent improvements
  - Analyzes which agents have high rework_count
  - Suggests prompt updates or new hard prompts
  - Posts to ~/system/agents/improvement-suggestions.md
- Generate weekly summary report
  - Total tasks completed
  - Average QA score
  - Total token cost
  - Top 5 inefficient patterns
  - Top 5 most efficient agents
  - Saves to ~/system/reports/learning-agent-YYYY-WW.md
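The four limits under "Analyze task metrics" can be read as simple per-row checks against task_metrics. This is a minimal sketch, not the code in learning-agent.js: `LIMITS`, `flagTask`, and the flag names are hypothetical, and the row is assumed to be a plain object whose keys match the table columns.

```javascript
// Limits taken from the "Analyze task metrics" list above.
const LIMITS = {
  maxTokenCostUsd: 5,              // flag when cost > $5 per task
  minQaScore: 15,                  // flag when score < 15/19
  maxReworkCount: 2,               // flag when rework cycles > 2
  maxDurationSeconds: 2 * 60 * 60, // flag when duration > 2 hours
};

// Hypothetical helper: takes one task_metrics row and returns the reasons
// it would be flagged. An empty array means nothing to flag.
function flagTask(row) {
  const flags = [];
  if (row.token_cost_usd > LIMITS.maxTokenCostUsd) flags.push("high_token_cost");
  if (row.qa_score < LIMITS.minQaScore) flags.push("low_qa_score");
  if (row.rework_count > LIMITS.maxReworkCount) flags.push("many_rework_cycles");
  if (row.duration_seconds > LIMITS.maxDurationSeconds) flags.push("long_duration");
  return flags;
}

// Made-up sample row for illustration.
const sample = {
  task_id: 1,
  qa_score: 14,
  token_cost_usd: 6.1,
  duration_seconds: 1800,
  rework_count: 3,
};
console.log(flagTask(sample));
// [ 'high_token_cost', 'low_qa_score', 'many_rework_cycles' ]
```

All comparisons are strict, so a task that sits exactly on a limit (for example, a $5.00 cost or 2 rework cycles) is not flagged.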
Example Output
LEARNING AGENT REPORT — Week 14, 2026
Total tasks: 47
Average QA score: 16.8/19
Total token cost: $89.40
Average duration: 42 minutes
TOP INEFFICIENCIES:
1. kotlin-architect: 3 tasks with rework_count > 2 (schema mismatch)
2. nextjs-specialist: High token cost on form tasks (avg $4.20 vs $1.80)
3. qa-specialist: Missing DOM visibility checks (5 false passes)
SUGGESTED FIXES:
1. Add "read schema before coding" to kotlin-architect GOTCHA template
2. Create form component library RAG entry
3. Update qa-specialist to run Playwright visibility assertions
TOP PERFORMERS:
1. devops-specialist: 12 tasks, avg 8 min, avg $0.40, 18.5/19 QA
2. database-specialist: 9 tasks, avg 15 min, avg $1.20, 17.8/19 QA
3. api-architect: 7 tasks, avg 22 min, avg $1.80, 18.1/19 QA
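The headline figures in a report like this are plain aggregates over the week's task_metrics rows. The sketch below shows one way to compute them. `summarize` is a hypothetical helper, the input rows are invented for illustration, and the per-agent rankings are left out because task_metrics has no agent column.

```javascript
// Hypothetical helper: aggregates task_metrics rows (plain objects) into the
// headline numbers of the weekly report.
function summarize(rows) {
  if (rows.length === 0) {
    return { totalTasks: 0, avgQaScore: null, totalTokenCostUsd: 0, avgDurationMinutes: null };
  }
  const sum = (key) => rows.reduce((acc, row) => acc + row[key], 0);
  return {
    totalTasks: rows.length,
    avgQaScore: sum("qa_score") / rows.length,
    totalTokenCostUsd: sum("token_cost_usd"),
    avgDurationMinutes: sum("duration_seconds") / rows.length / 60,
  };
}

// Invented rows for illustration only.
const week = [
  { task_id: 1, qa_score: 16, token_cost_usd: 1.5, duration_seconds: 600 },
  { task_id: 2, qa_score: 18, token_cost_usd: 2.5, duration_seconds: 1800 },
];
console.log(summarize(week));
// { totalTasks: 2, avgQaScore: 17, totalTokenCostUsd: 4, avgDurationMinutes: 20 }
```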
HiveMind Integration
The learning agent posts its findings to HiveMind:
node ~/system/agents/hivemind/hivemind.js post \
--type learning \
--category pattern \
--tags "kotlin-architect,schema,rework" \
--content "kotlin-architect frequently misses database schema — suggest RAG query for schema before coding"
Result: Future kotlin-architect spawns will RAG query schema files before writing migrations.
RAG Flywheel
Flow:
Question → SHA256 hash → flywheel.db lookup
- HIT: return cached answer
- MISS: query HiveMind FTS → query Qdrant → query Ollama → query Anthropic → save answer to flywheel.db
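The hash-then-lookup flow can be sketched with Node's built-in `crypto` module. Several things here are assumptions: an in-memory `Map` stands in for flywheel.db, the names `cacheKey`, `answer`, and `stages` are hypothetical, and the MISS path is modelled as a fallback chain where the first stage to return an answer wins. The source does not say whether the stages instead feed into one another.

```javascript
const { createHash } = require("node:crypto");

// In-memory stand-in for flywheel.db (assumption for this sketch).
const cache = new Map();

// SHA256 of the question text, hex-encoded, used as the cache key.
function cacheKey(question) {
  return createHash("sha256").update(question, "utf8").digest("hex");
}

// `stages` is an ordered list of { name, query } objects. Each query is an
// async function returning an answer string or null. They stand in for
// HiveMind FTS, Qdrant, Ollama, and Anthropic.
async function answer(question, stages) {
  const key = cacheKey(question);
  if (cache.has(key)) {
    return { answer: cache.get(key), source: "cache" }; // HIT
  }
  for (const { name, query } of stages) {                // MISS
    const result = await query(question);
    if (result != null) {
      cache.set(key, result); // save so the next identical question hits
      return { answer: result, source: name };
    }
  }
  return { answer: null, source: "none" };
}

// Demo with fake stages: the first returns nothing, the second answers.
const stages = [
  { name: "hivemind-fts", query: async () => null },
  { name: "qdrant", query: async (q) => `stub answer for: ${q}` },
];

answer("What is the task_metrics schema?", stages)
  .then((first) => {
    console.log(first.source); // qdrant
    return answer("What is the task_metrics schema?", stages);
  })
  .then((second) => console.log(second.source)); // cache
```

Because the key is a hash of the exact question text, two differently worded versions of the same question produce different keys and are cached separately in this sketch.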
Databases
| Database | Size | Purpose |
|---|---|---|
| flywheel.db | 36MB | SHA256-keyed cache, fast hits |
| knowledge.db | 187MB | Full RAG knowledge base |
| hivemind.db | unspecified | Structured intel + memory (14K+ entries) |
Models
ANVIL (localhost:11434): Mac Studio M3 Ultra, 96GB
- Models: qwen3:32b, deepseek-r1:8b
- Local inference, no API cost
FORGE (10.0.0.2:11434): Remote LAN GPU server
- Models: deepseek-r1:70b, qwen3:32b
- Heavier models for complex queries
Anthropic (claude.ai API): Cloud fallback
- Model: claude-sonnet-4-6
- Used when local models fail or for critical tasks
Cache Hit Rate
As of 2026-02-24: 61%
Goal: 80% by end of Q2 2026
Strategy:
- The learning agent pre-computes the top 100 queries nightly
- John runs a RAG query before EVERY action (ZAKON #12)
- Specialist agents must query RAG before implementation
Metrics Dashboard (Future)
Planned: https://metrics.basicconsulting.no
Features:
- Real-time QA score trend
- Token cost per agent
- Rework count heatmap
- Cache hit rate graph
- Agent efficiency leaderboard
Status: Spec'd, not yet built. Part of PLATFORM group roadmap.
Source Files
| File | Purpose |
|---|---|
| ~/system/databases/mission-control.db | task_metrics table |
| ~/system/tools/learning-agent.js | Nightly analysis + HiveMind posting |
| ~/system/tools/qa-19.js | 19-point quality gate |
| ~/system/databases/flywheel.db | RAG cache |
| ~/system/databases/knowledge.db | RAG knowledge base |
| ~/system/agents/hivemind/hivemind.js | HiveMind CLI |
| ~/system/rules/19-point-quality-gate.md | QA-19 protocol |
Cron Schedule
# Learning agent — nightly at 02:00
0 2 * * * cd ~/system && node tools/learning-agent.js >> logs/learning-agent.log 2>&1
# Flywheel cache precompute — nightly at 03:00
0 3 * * * cd ~/system && node tools/flywheel-precompute.js >> logs/flywheel.log 2>&1
# HiveMind embeddings backfill — weekly on Sunday at 04:00
0 4 * * 0 cd ~/system/agents/hivemind && node hivemind.js backfill-embeddings >> ../../logs/hivemind-backfill.log 2>&1
LaunchAgents: All cron jobs also have LaunchAgent equivalents for macOS persistence.