Verifier Autonomy Audit

AI Factory Audit — Plan Task 2.2: Verifier Autonomy

Date: 2026-05-09
Auditor: Martin Kleppmann (CodeCraft)
Classification: AUDIT-ONLY (read-only, no mutation, no live invocation)


VERDICT SUMMARY (up front)

Autonomy verdict: ABSENT

The /verify-fix-loop skill is fully specified and internally consistent, but it has zero wiring into any automated trigger path. CEO is the de-facto verifier for every task that reaches mc.js ready. The skill exists only as a manually-invoked slash command.


1. End-to-End Trace of /verify-fix-loop

Source: ~/.claude/skills/verify-fix-loop/SKILL.md

Flow map

Caller (John / human) invokes: /verify-fix-loop mc_id=<N> spec_path=<path>
    │
    ▼
SKILL orchestrates in main conversation thread (not a sub-agent itself)
    │
    ├─ mkdir -p /tmp/verify-fix-loop-<mc_id>/    (EVIDENCE_DIR)
    │
    ▼
LOOP (max 3 iterations):
    │
    ├─ Step A: Task(subagent_type=verifier OR general-purpose+persona)
    │     prompt = verifier brief template (inline in SKILL.md)
    │     verifier writes: EVIDENCE_DIR/verifier-loop<N>.md  (mandatory)
    │                       /tmp/verifier-feedback-<mc_id>.md (if CONFIDENCE=FEEDBACK)
    │
    ├─ Step B: Parse STATUS + CONFIDENCE from verifier output
    │
    ├─ Step C: Branch
    │     PERFECT / VERIFIED → write SUMMARY.md (SUCCESS), exit
    │     PARTIAL            → if high_stakes: ESCALATE; else: SUCCESS_WITH_NOTES, exit
    │     FAILED             → ESCALATE (harness broken)
    │     FEEDBACK:
    │         if high_stakes or budget exhausted → ESCALATE
    │         else →
    │
    ├─ Step D: Task(subagent_type=fix-builder OR general-purpose+persona)
    │     reads /tmp/verifier-feedback-<mc_id>.md
    │     applies prescribed edits to spec_path via Edit tool
    │     returns APPLIED:<N> / PARTIAL:<N>/<M> / COULD_NOT_APPLY:<reason>
    │
    └─ LOOP_INDEX += 1 → back to Step A
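
Step B hinges on pulling two fields out of the verifier's output. A minimal Python sketch of that parse follows. It assumes the verifier emits one `STATUS: <value>` and one `CONFIDENCE: <value>` line; the exact output format is not reproduced in this audit, and the function name is hypothetical. Mapping an unrecognised value to FAILED is a design choice of the sketch, so that a malformed report escalates rather than passes.

```python
import re

# CONFIDENCE values named in the flow map above.
KNOWN_CONFIDENCE = {"PERFECT", "VERIFIED", "PARTIAL", "FAILED", "FEEDBACK"}


def parse_verifier_output(text: str) -> dict:
    """Extract STATUS and CONFIDENCE from verifier output.

    Assumes one 'KEY: VALUE' pair per line (an assumption, not the
    documented format). A missing or unknown CONFIDENCE becomes FAILED.
    """
    fields = {}
    for key in ("STATUS", "CONFIDENCE"):
        match = re.search(rf"^\s*{key}\s*:\s*(\S+)", text, flags=re.MULTILINE)
        fields[key] = match.group(1).upper() if match else None
    if fields["CONFIDENCE"] not in KNOWN_CONFIDENCE:
        fields["CONFIDENCE"] = "FAILED"
    return fields
```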

Domain escalation policy

Loop budget

Termination conditions

  1. CONFIDENCE in {PERFECT, VERIFIED} → SUCCESS
  2. CONFIDENCE == PARTIAL + not high_stakes → SUCCESS_WITH_NOTES
  3. Budget exhausted (LOOP_INDEX == MAX_LOOPS with FEEDBACK) → ESCALATE
  4. High-stakes domain with FEEDBACK on first iteration → ESCALATE
  5. Any FAILED confidence → ESCALATE (harness broken)
  6. fix-builder returns COULD_NOT_APPLY → ESCALATE
  7. MC status changes to done/cancelled mid-loop → ABORT silently
  8. Cost estimate exceeds $5 → ESCALATE before next iter
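
The termination conditions can be read as a single decision function. The Python sketch below is illustrative only: the skill itself is prompt-driven, the names `LoopState`, `decide`, and `after_fix` are hypothetical, and the order in which conditions are checked is an assumption, since the list above does not state a precedence.

```python
from dataclasses import dataclass

MAX_LOOPS = 3            # "max 3 iterations" in the flow map
COST_CEILING_USD = 5.0   # termination condition 8


@dataclass
class LoopState:
    confidence: str              # PERFECT, VERIFIED, PARTIAL, FAILED or FEEDBACK
    loop_index: int              # 1-based index of the iteration just verified
    high_stakes: bool
    mc_status: str = "review"
    cost_estimate_usd: float = 0.0


def decide(state: LoopState) -> str:
    """Return SUCCESS, SUCCESS_WITH_NOTES, ESCALATE, ABORT or CONTINUE."""
    if state.mc_status in {"done", "cancelled"}:
        return "ABORT"                                    # condition 7
    if state.confidence in {"PERFECT", "VERIFIED"}:
        return "SUCCESS"                                  # condition 1
    if state.confidence == "PARTIAL":
        # condition 2, plus the high-stakes branch from Step C
        return "ESCALATE" if state.high_stakes else "SUCCESS_WITH_NOTES"
    if state.confidence != "FEEDBACK":
        return "ESCALATE"                                 # condition 5; unknown values treated alike
    if state.high_stakes or state.loop_index >= MAX_LOOPS:
        return "ESCALATE"                                 # conditions 3 and 4
    if state.cost_estimate_usd > COST_CEILING_USD:
        return "ESCALATE"                                 # condition 8
    return "CONTINUE"                                     # dispatch fix-builder (Step D)


def after_fix(fix_result: str) -> str:
    """Condition 6: a COULD_NOT_APPLY result from fix-builder ends the loop."""
    return "ESCALATE" if fix_result.startswith("COULD_NOT_APPLY") else "CONTINUE"
```

For example, `decide(LoopState("FEEDBACK", 3, False))` escalates because the budget is spent, while the same state at `loop_index=1` continues to Step D.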

Entry points (who can call this)

The SKILL.md lists trigger phrases: "verify-fix-loop", "auto-verify and fix", "verifier loop", "ne idi preko mene", "loop until pass". All trigger phrases are designed for human invocation in a conversation. No programmatic entry points exist.


2. Auto-Invocation Analysis — The Central CEO Question

pi-orchestrator.js

Grep result: ZERO matches for verify-fix-loop, verifier, fix-builder in ~/system/kernel/pi-orchestrator.js.

The orchestrator's post-completion flow (reportCompletion function, lines ~3781–3930) does:

None of these steps call the verifier, fix-builder, or verify-fix-loop skill. The "postflight" referenced in pi-orchestrator is a file marker write, NOT the /task-postflight skill.

task-postflight skill

Grep result: ZERO matches for verify-fix-loop, verifier, fix-builder in ~/.claude/skills/task-postflight/SKILL.md.

The /task-postflight skill dispatches Angie Jones (Proveo) for AC-checklist QA, not the atomic-claim verifier. These are parallel, non-overlapping verification patterns:

Hooks directory

Grep result: Only archive files matched. No active hook in ~/.claude/hooks/ references verify-fix-loop, verifier, or fix-builder.

Active hooks audited:

Daemon fleet

Grep result: ZERO matches for verify-fix-loop, verifier, fix-builder in ~/system/daemons/.

LaunchAgents

Grep result: ZERO matches in ~/Library/LaunchAgents/.

VERDICT: ABSENT

The verify-fix-loop and its constituent agents (verifier, fix-builder) have zero automated entry points. The only invocation path is a human typing a trigger phrase in a Claude Code conversation. CEO is always in the loop because there is no loop without CEO.
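
The searches in this section can be repeated with a read-only scan. The Python sketch below is an assumption about how one might reproduce them, not the tooling used for this audit; `find_references` is a hypothetical name. Like a plain grep, it matches substrings, so "verifier" also matches names such as evidence-verifier.

```python
from pathlib import Path

TERMS = ("verify-fix-loop", "verifier", "fix-builder")


def find_references(root: Path, terms=TERMS) -> dict:
    """Map each file under root (or root itself, if it is a file) to the terms it mentions."""
    root = root.expanduser()
    if not root.exists():
        return {}
    files = [root] if root.is_file() else [p for p in root.rglob("*") if p.is_file()]
    hits = {}
    for path in files:
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it, the scan never writes
        found = [term for term in terms if term in text]
        if found:
            hits[str(path)] = found
    return hits


if __name__ == "__main__":
    # Locations named in this section.
    for location in ("~/system/kernel/pi-orchestrator.js",
                     "~/.claude/skills/task-postflight/SKILL.md",
                     "~/.claude/hooks",
                     "~/system/daemons",
                     "~/Library/LaunchAgents"):
        print(location, find_references(Path(location)))
```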


3. Tool-Surface Security Check

Verifier (read-only)

Definition file: ~/.claude/agents/verifier.md
Declared tools: tools: Read, Grep, Glob, Bash

The tools: field includes Bash. This is the critical point.

With Bash declared, the agent definition does not structurally prevent writes: any shell command can mutate files, whether or not Write, Edit, and Task are available. Read-only behavior relies entirely on prompt-level enforcement, and verifier.md states this explicitly: "Enforcement is prompt-only — this rule is yours to honor. You are the gatekeeper."

Permitted Bash commands (per prompt whitelist in verifier.md):

Escape paths documented:

Verdict on verifier tool isolation: Prompt-enforced, not API-enforced. Read-only is a behavioral constraint, not a structural one. The risk is manageable for a trusted model, but no tool-level mechanism bounds it.
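
One way to move part of this constraint from prompt to structure would be a deterministic allowlist check in front of every Bash call. The Python sketch below assumes the allowlist consists only of the three probes this audit names (curl -sI, git log, sqlite3 -readonly); the actual whitelist in verifier.md is not reproduced here, and `is_read_only_command` is a hypothetical name.

```python
import shlex

# Allowed command prefixes in argv form. Only the probes this audit names;
# the real whitelist in verifier.md may differ.
ALLOWED_PREFIXES = [
    ["curl", "-sI"],
    ["git", "log"],
    ["sqlite3", "-readonly"],
]

# Shell metacharacters that enable chaining, redirection or substitution.
FORBIDDEN_CHARS = set(";|&><`$\n")


def is_read_only_command(command: str) -> bool:
    """True only if the command starts with an allowed prefix and uses no shell operators."""
    if any(ch in FORBIDDEN_CHARS for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    return any(argv[: len(prefix)] == prefix for prefix in ALLOWED_PREFIXES)
```

A prefix check of this kind is necessarily incomplete (curl, for instance, accepts -o to write a file), so it illustrates the approach rather than providing a sufficient guard.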

Fix-builder (edit-only writes, scoped)

Definition file: ~/.claude/agents/fix-builder.md
Declared tools: tools: Read, Edit, Grep, Glob

The fix-builder tool list explicitly excludes:

This is stronger isolation than the verifier: the tools: field at the agent definition level excludes Bash and Write. If the agent framework enforces declared tools as a whitelist, fix-builder genuinely cannot run shell commands or create new files. It can only read existing files (Read, Grep, Glob) and apply edits to existing files (Edit).

Gap: Fix-builder cannot create new files even when feedback prescribes it. The skill handles this: "If the feedback prescribes creating a new file, mark that fix as COULD_NOT_APPLY" — the loop escalates. This is a by-design limitation, not a bug.

Verdict on fix-builder tool isolation: Structurally scoped (Bash and Write excluded from tools declaration). This is the correct pattern. The verifier should be refactored to match this approach.
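
The structural check described here is easy to automate. The Python sketch below assumes an agent definition carries a single-line, comma-separated `tools:` field, as quoted in this section; the function names are hypothetical, and the forbidden set holds only the exclusions this audit names.

```python
FORBIDDEN_FOR_FIX_BUILDER = {"Bash", "Write"}  # exclusions named in this audit


def declared_tools(agent_definition: str) -> set:
    """Return the tool names on the first 'tools:' line of an agent definition."""
    for line in agent_definition.splitlines():
        if line.strip().startswith("tools:"):
            value = line.split(":", 1)[1]
            return {name.strip() for name in value.split(",") if name.strip()}
    return set()


def scope_violations(agent_definition: str, forbidden=FORBIDDEN_FOR_FIX_BUILDER) -> set:
    """Return the forbidden tools that the definition nevertheless declares."""
    return declared_tools(agent_definition) & set(forbidden)
```

Applied to the two declarations quoted in this section, the fix-builder line yields no violations, while the verifier line yields `{"Bash"}`.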


4. Synthetic Dry-Trace

Selected task: MC #99389 — "Refactor /mehanik skill to progressive-disclosure pattern" (status: review, owner: pi-orchestrator)

This task was marked mc.js ready (now review) after pi-orchestrator completed it.

What WOULD have happened if /verify-fix-loop were auto-invoked:

Step 0: trigger fired when pi-orchestrator called mc.js ready #99389
         → /verify-fix-loop mc_id=99389 spec_path=~/.claude/skills/mehanik/SKILL.md
            domain=docs (inferred from skill file path)
            max_loops=3

Step A (iter 1): dispatch verifier
  - verifier reads ~/.claude/skills/mehanik/SKILL.md
  - verifier reads MC #99389 ACs via mc.js show 99389
  - verifier decomposes ACs into atomic claims:
      (a) SKILL.md exists and is < N lines (tier-1 constraint)
      (b) references/agent-brief.md exists
      (c) references/failure-modes.md exists
      (d) Skill tool callable post-refactor
  - verifier probes each atom with Read/Glob/Bash

Step B: parse CONFIDENCE
  If all files exist and SKILL.md is within limits → PERFECT → SUCCESS
  If SKILL.md exceeds the limit or a reference file is missing → FEEDBACK

Step D (if FEEDBACK): dispatch fix-builder
  - fix-builder reads /tmp/verifier-feedback-99389.md
  - applies Edit to existing files (add missing sections, correct line counts)
  - if the feedback prescribes creating a missing file → COULD_NOT_APPLY → ESCALATE (no Write tool)

Step A (iter 2): re-verify → likely PERFECT → write SUMMARY.md → SUCCESS

Actual closure path used for MC #99389: The task is in review status. Looking at the review queue (25+ tasks in review), there is no evidence of verifier invocation. The closure path was: pi-orchestrator marked ready → task sits in review queue → CEO/John is the implicit reviewer. This is the CEO-as-verifier pattern the CEO wants to eliminate.
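
Step 0 of the trace depends on inferring the domain from the spec path. The Python sketch below shows one possible shape for that inference. The single path rule is hypothetical and mirrors only the docs inference used in the trace; the HIGH_STAKES fallback follows the SKILL.md default for unknown domains.

```python
# Hypothetical path heuristics. Only the "skill file path -> docs" inference
# appears in the trace above; any other rule would be an addition.
DOMAIN_RULES = [
    (".claude/skills/", "docs"),
]
HIGH_STAKES = "HIGH_STAKES"


def infer_domain(spec_path: str) -> str:
    """Return the first matching domain, or HIGH_STAKES when nothing matches."""
    for fragment, domain in DOMAIN_RULES:
        if fragment in spec_path:
            return domain
    return HIGH_STAKES  # unknown domain: no autonomous correction
```

With a rule table this small, most paths fall through to HIGH_STAKES, which means an auto-invoked loop would escalate on the first FEEDBACK for most tasks.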


5. Comparison with Existing Patterns

liveness-claim-validator.sh

This is the fundamental architectural gap: hooks can intercept tool calls synchronously, but spinning up a verify-fix-loop requires an async agent conversation that the hook system cannot initiate.

evidence-verifier agent

File: ~/.claude/agents/evidence-verifier.md
Declared tools: not read as part of this audit (the agent's existence was confirmed)
Auto-invoked: only indirectly. mc-ready-gate.sh calls evidence-contract-validator.sh at mc.js ready time, but that validator is a pure shell script that checks JSON schema and file hashes deterministically (no LLM). It does NOT dispatch the evidence-verifier agent; the agent definition exists for manual invocation.

Pattern difference: The evidence-verifier pattern uses a shell script as the auto-invoke layer (deterministic, no LLM), with the agent definition as a fallback for edge cases. The verify-fix-loop requires LLM reasoning at every step, making shell-script auto-invocation insufficient.
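
To make the contrast concrete, here is a Python sketch of a deterministic check of the kind described: parse a JSON contract and compare file hashes, with no LLM involved. The real validator is a shell script whose contract format was not captured in this audit; the `files`, `path`, and `sha256` field names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def validate_evidence(contract_json: str) -> list:
    """Return a list of problems; an empty list means the contract passes."""
    try:
        contract = json.loads(contract_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    files = contract.get("files") if isinstance(contract, dict) else None
    if not isinstance(files, list):
        return ["missing 'files' list"]
    problems = []
    for entry in files:
        if not isinstance(entry, dict) or "path" not in entry or "sha256" not in entry:
            problems.append(f"malformed entry: {entry!r}")
            continue
        path = Path(entry["path"])
        if not path.is_file():
            problems.append(f"missing file: {path}")
            continue
        # Recompute the digest and compare it with the declared one.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest != entry["sha256"]:
            problems.append(f"hash mismatch: {path}")
    return problems
```

Every branch here is decidable from the inputs alone, which is why such a check can run inside a synchronous hook, whereas atomic-claim verification cannot.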


6. Gap Analysis and Fix Proposal (Audit-Level Only)

Root cause of the gap

The verify-fix-loop was designed top-down as a skill (manual invocation). The liveness-claim-validator was designed bottom-up as a hook (automatic). There is no bridge layer that translates "mc.js ready event" → "spawn verify-fix-loop conversation".

The missing component is a postflight agent dispatcher: something that observes the ready event and spawns a verify-fix-loop session as a sub-agent task.

Minimum wiring needed

Option A: PostToolUse hook on mc.js ready (recommended)

File to modify: ~/.claude/hooks/mc-ready-gate.sh (already fires on mc.js ready)
Addition location: After line 196 (all gates passed; currently execs mc.js directly)
Trigger: After mc.js ready succeeds, spawn verify-fix-loop as a background Task
Mechanism: mc-ready-gate.sh would write a trigger file to /tmp/vfl-trigger-<mc_id>.json containing mc_id, spec_path, and domain; a daemon polls for this file

The problem: mc-ready-gate.sh is a synchronous shell script. It cannot spawn a conversational agent (Task dispatch requires a running Claude Code session). It can only write a file.
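
The file-writing half of this mechanism is simple to sketch. The example below is in Python for illustration only (the hook itself is a shell script). The payload fields mc_id, spec_path, and domain come from the mechanism described above; the function name and the `trigger_dir` parameter are hypothetical. The write goes through a temporary file and a rename so that a poller never reads a half-written trigger.

```python
import json
import os
import tempfile
from pathlib import Path


def write_trigger(mc_id: int, spec_path: str, domain: str, trigger_dir: str = "/tmp") -> Path:
    """Write vfl-trigger-<mc_id>.json atomically and return its path."""
    target = Path(trigger_dir) / f"vfl-trigger-{mc_id}.json"
    payload = {"mc_id": mc_id, "spec_path": spec_path, "domain": domain}
    # Write to a temp file in the same directory, then rename it into place.
    # os.replace is atomic on one filesystem, so a reader sees either no
    # trigger or a complete one.
    fd, tmp_name = tempfile.mkstemp(dir=trigger_dir, prefix=".vfl-", suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
    os.replace(tmp_name, target)
    return target
```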

Option B: pi-orchestrator.js postflight hook (most natural wiring point)

File to modify: ~/system/kernel/pi-orchestrator.js
Addition location: Inside reportCompletion() function, after line ~3900 (after QA gate passes)
What to add: A call to write /tmp/vfl-trigger-<task_id>.json with task metadata
Trigger: The daemon below polls this and dispatches

Option C: /task-postflight skill modification (cleanest for H-tasks)

File to modify: ~/.claude/skills/task-postflight/SKILL.md
Addition location: After Section 2 (PROVEO VALIDATION DISPATCH), add Section 2b
What to add: Conditional: if Proveo returns PASS AND task domain is docs/system/refactor, dispatch /verify-fix-loop before writing the postflight marker
Trigger: Manual invocation of /task-postflight already exists for H/BLOCKER tasks
Advantage: Stays within the skill conversation context, so Task dispatch works naturally here

  1. Immediate (no new infrastructure): Add a Section 2b to /task-postflight SKILL.md that dispatches /verify-fix-loop when Proveo passes and domain is non-high-stakes. This works today for all tasks that go through /task-postflight.

  2. Systematic (covers tasks that bypass /task-postflight): Add a trigger file write to pi-orchestrator.js reportCompletion(). A lightweight daemon polls /tmp/vfl-trigger-*.json files and — when a pi-orchestrator session is active — dispatches the verify-fix-loop skill via the existing Claude Code session.
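
The polling side of item 2 can be sketched as follows, in Python for illustration. The glob pattern comes from the text above; the function name and the `.claimed` suffix are hypothetical. The sketch covers only claiming triggers. The dispatch step is left out because it needs an active Claude Code session, which a standalone daemon does not have.

```python
import json
from pathlib import Path


def claim_triggers(trigger_dir: str = "/tmp") -> list:
    """Claim pending trigger files (in name order) and return their payloads.

    Each file is renamed before it is read, so two pollers cannot both
    claim the same trigger.
    """
    claimed = []
    for path in sorted(Path(trigger_dir).glob("vfl-trigger-*.json")):
        working = path.with_name(path.name + ".claimed")
        try:
            path.rename(working)  # atomic claim on the same filesystem
        except FileNotFoundError:
            continue  # another poller got there first
        try:
            claimed.append(json.loads(working.read_text(encoding="utf-8")))
        except json.JSONDecodeError:
            continue  # leave the .claimed file in place for inspection
    return claimed
```

Renamed files no longer match the glob, so a second poll returns nothing until a new trigger is written.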

Loop budget recommendation

Escalation path when budget exhausted

  1. Write SUMMARY.md to EVIDENCE_DIR with full loop history
  2. Call node ~/system/tools/slack.js send alerts "[VFL-ESCALATED] MC #<id> — N/MAX loops used, last verdict: <CONFIDENCE>" (Slack, not CEO direct)
  3. Set task status to blocked via mc.js block with reason "verify-fix-loop budget exhausted — human review needed"
  4. John receives Slack alert and decides: (a) override + mark done, (b) dispatch additional builder, (c) extend budget via [CEO_APPROVED] token

Open Questions

  1. Tool-level enforcement for verifier: Should the verifier's tools: field be changed from Read, Grep, Glob, Bash to Read, Grep, Glob (removing Bash) to achieve structural isolation matching fix-builder? This would break the verifier's ability to run curl -sI, git log, sqlite3 -readonly probes — which are core to its value. The tradeoff is behavioral (current) vs structural enforcement.

  2. Conversation context for auto-dispatch: Spawning a verify-fix-loop Task requires an active Claude Code conversation. If pi-orchestrator fires after a conversation closes, there is no context to spawn into. Does the system need a persistent "factory session" that stays open to receive postflight dispatches?

  3. High-stakes domain detection: The SKILL.md defaults unknown domains to HIGH_STAKES (no autonomous correction). For auto-invocation, domain inference from spec path heuristics will frequently return unknown. Should the default be flipped to docs for auto-invoked postflight use cases?

  4. Proveo vs verifier: overlap management: /task-postflight already dispatches Proveo for AC-checklist QA. If verify-fix-loop is added as Section 2b, tasks will run both Proveo (AC checklist) AND verifier (atomic claims) sequentially. Is this the intended double-verification model, or should one replace the other for certain task types?

  5. mc.js ready event vs pi-orchestrator ready: Some tasks are marked ready by human John (node ~/system/tools/mc.js ready <id>), others by pi-orchestrator after build completion, and others by /task-postflight. The auto-invocation wiring point differs for each path. A comprehensive solution needs to intercept all three paths.


Evidence Metadata

Files read: 8
Grep/Bash tool calls: 12
Live agent invocations: 0
Mutations: 0
Wall-clock (estimated): ~18 min
Key source files: ~/.claude/skills/verify-fix-loop/SKILL.md, ~/.claude/agents/verifier.md, ~/.claude/agents/fix-builder.md, ~/.claude/skills/task-postflight/SKILL.md, ~/system/kernel/pi-orchestrator.js (lines 3730–3930), ~/.claude/hooks/mc-ready-gate.sh, ~/.claude/hooks/liveness-claim-validator.sh

Revision #2
Created 2026-05-09 19:44:21 UTC by John
Updated 2026-06-14 20:02:56 UTC by John