ADR-026 pi-orchestrator reactivation (supersedes ADR-025) — 2026-05-14

Why This Matters

On 2026-05-14, pi-orchestrator picked up task #100591, a real MC task, at 10:14:41 UTC and claimed it at 10:14:59 UTC, within the first 30-second poll cycle after being restored. This proves the software works. ADR-025 had concluded pi-orch "never worked" and "ran in mock mode," but the real cause was a missing kernel file (deleted, only .bak files remained) and an unloaded plist. The decommission decision was based on a deployment failure, not a software failure. This ADR corrects that record and re-establishes pi-orchestrator as the canonical autonomous poll loop for ALAI's build dispatch surface.


ADR-026 — pi-orchestrator Reactivation as Canonical Autonomous Poll Loop

Date: 2026-05-14
Status: ACCEPTED
MC: #100597
Decided by: John (Petter Graff architecture review)
Supersedes: ADR-025 (pi-orchestrator Decommission, 2026-05-09)


Context

ADR-025 (2026-05-09) declared pi-orchestrator decommissioned with the following exact claims:

"pi-orchestrator ran in mock mode. It never dispatched a real task. Port 8401 was empty at every probe."

"pi-orch never worked. 50+ days dead, no real dispatch observed in logs. 'No eligible tasks' only."

"Note: pi-orch was in mock mode. Rollback restores the process, not real dispatch capability."

These claims were wrong. The root cause was structural, not behavioral: the kernel file ~/system/kernel/pi-orchestrator.js had been deleted (only .bak files remained on disk) and the plist com.john.pi-orchestrator was not loaded in launchd. A daemon with no kernel file and no loaded plist cannot start, so port 8401 will of course show no activity. That does not mean the software does not work.

Hivemind RCA (event 67100, 2026-05-14T10:15:58Z):

"pi-orchestrator.js was deleted (only .bak files in ~/system/kernel/). plist com.john.pi-orchestrator NOT loaded. Fix: restore bak-race-window-2026-05-08, copy .new plist to active, launchctl load. PID 57544 running. workers=0 in /stats = DAG artefact, not real worker count. MC #100597 closed."

Restoration (MC #100597, 2026-05-14):

  1. Kernel restored from ~/system/kernel/pi-orchestrator.js.bak-race-window-2026-05-08.
  2. Plist com.john.pi-orchestrator copied from its .new version to the active path and loaded via launchctl load.
  3. Process came up: PID 57544.
  4. Within the first 30-second poll cycle, pi-orchestrator picked up task #100591 at 2026-05-14T10:14:41.525Z and claimed it at 2026-05-14T10:14:59.223Z.

Force-close evidence at /tmp/evidence-100597/:

File                    Key fact
verification.json       verified:true, pid:57544, task_picked:"100591"
daemon-stdout-tail.txt  Full cycle log: task classified, routing token written, claim acquired
launchctl-list.txt      com.john.pi-orchestrator present and running
stats.json              status:ok, uptime:2078s, pipelines total:5 active:1

Daemon stdout excerpt (authoritative):

[2026-05-14T10:14:41.072Z] [INFO] Claude OAuth: OK (authenticated)
[2026-05-14T10:14:41.525Z] [DEBUG] Delegation filter: picked task #100591 (route=post-build)
[2026-05-14T10:14:41.541Z] [INFO] Found task #100591: Skillforge: RCA + runbook for pi-orch route restoration
[2026-05-14T10:14:59.007Z] [INFO] [orch] Blueprint available: flowforge-infra.yaml (FlowForge)
[2026-05-14T10:14:59.223Z] [INFO] Task #100591 claimed by pi-orchestrator (session=pi-orch-57544-1778753679888)

This is not mock mode. This is a real classification, a real routing-token write, and a real MC claim against a live task.
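
The timing in the excerpt above can be checked mechanically. The following is a minimal sketch, assuming only the log line layout visible in the excerpt ("[timestamp] [LEVEL] message"); the helper names parse_ts and first_match are illustrative and not part of pi-orchestrator.

```python
import re
from datetime import datetime

# Daemon stdout excerpt, copied verbatim from this ADR.
LOG_EXCERPT = """\
[2026-05-14T10:14:41.072Z] [INFO] Claude OAuth: OK (authenticated)
[2026-05-14T10:14:41.525Z] [DEBUG] Delegation filter: picked task #100591 (route=post-build)
[2026-05-14T10:14:41.541Z] [INFO] Found task #100591: Skillforge: RCA + runbook for pi-orch route restoration
[2026-05-14T10:14:59.007Z] [INFO] [orch] Blueprint available: flowforge-infra.yaml (FlowForge)
[2026-05-14T10:14:59.223Z] [INFO] Task #100591 claimed by pi-orchestrator (session=pi-orch-57544-1778753679888)
"""

# Assumed line layout: [ISO timestamp] [LEVEL] free-form message
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] (?P<msg>.*)$")


def parse_ts(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' (UTC)."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def first_match(log: str, needle: str) -> datetime:
    """Return the timestamp of the first log line whose message contains needle."""
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m and needle in m.group("msg"):
            return parse_ts(m.group("ts"))
    raise ValueError(f"no log line contains {needle!r}")


picked = first_match(LOG_EXCERPT, "picked task #100591")
claimed = first_match(LOG_EXCERPT, "Task #100591 claimed")
pick_to_claim_s = (claimed - picked).total_seconds()
print(f"pick to claim: {pick_to_claim_s:.3f} s")  # 17.698 s, inside one 30 s poll cycle
```

The pick-to-claim interval is about 17.7 seconds, which is consistent with the claim landing inside a single 30-second poll cycle.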


Decision

pi-orchestrator is the canonical autonomous poll loop for ALAI's build dispatch surface.

ADR-025's decommission is revoked in full. The claims that pi-orch "never worked" and "ran in mock mode" are retracted — they described a broken deployment state, not the software itself.

Canonical topology

Property                 Value
Kernel file              ~/system/kernel/pi-orchestrator.js
Plist                    com.john.pi-orchestrator
LaunchAgent path         ~/Library/LaunchAgents/com.john.pi-orchestrator.plist
HTTP port                8401
Poll interval            30 s (pollIntervalMs: 30000 in config)
Config                   ~/system/config/pi-orchestrator-config.json
Mandatory routing        Enabled: all build tasks touching ~/projects/* MUST route through pi-orchestrator
Anti-hallucination hook  ~/.claude/hooks/hallucination-detector.py injected into every agent context

Relationship to durable-runner (port 3052)

ADR-025 attempted to collapse the system to a single surface (durable-runner only). That was correct as an architectural instinct — dual dispatch surfaces do add complexity. However, the two processes serve different roles:

These are complementary, not duplicates. Both stay active. This is by design, not by accident.


Consequences

Immediate

  1. com.john.pi-orchestrator stays loaded. Do not unload it.
  2. ~/system/kernel/pi-orchestrator.js is a critical asset. Do not delete it. .bak retention proved its worth — the entire restoration depended on bak-race-window-2026-05-08.
  3. Any audit or documentation referencing ADR-025 as authoritative MUST be re-evaluated against this ADR. ADR-025 is superseded.

Operational protections required

Protection: Fleet watchdog must assert pi-orchestrator.js present in ~/system/kernel/
Rationale: File deletion was the root cause of the 50-day outage. Watchdog would have caught this immediately.

Protection: .bak retention policy: keep at minimum the last bak-race-window-* snapshot
Rationale: This specific backup was the only recovery path. Without it, 50+ days of config evolution would have been lost.

Protection: Plist presence check in daemon-fleet watchdog
Rationale: launchctl list | grep pi-orchestrator returning nothing must trigger an alert, not silence.

Protection: No agent may unload com.john.pi-orchestrator without an explicit CEO decision
Rationale: The plist was unloaded as a side effect of ADR-025, which was itself based on a misdiagnosis. Unloading a core daemon must be a named, deliberate act.
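
The first three protections can be expressed as a single watchdog check. This is a minimal sketch, not the fleet watchdog's actual code: the function name watchdog_alerts and the alert wording are hypothetical, and the caller is assumed to pass in the text output of launchctl list (for example captured with subprocess.run).

```python
from __future__ import annotations

from pathlib import Path

KERNEL_NAME = "pi-orchestrator.js"        # kernel file named in this ADR
PLIST_LABEL = "com.john.pi-orchestrator"  # launchd label named in this ADR


def watchdog_alerts(kernel_dir: Path, launchctl_list_output: str) -> list[str]:
    """Return one alert string per violated protection (empty list means healthy).

    kernel_dir is the directory expected to hold the kernel file and its backups.
    launchctl_list_output is the captured text of `launchctl list`, whose last
    whitespace-separated column on each line is the job label.
    """
    alerts = []

    # Protection 1: the kernel file must be present on disk.
    if not (kernel_dir / KERNEL_NAME).is_file():
        alerts.append(f"kernel file missing: {kernel_dir / KERNEL_NAME}")

    # Protection 2: at least one bak-race-window-* snapshot must be retained.
    if not any(kernel_dir.glob(KERNEL_NAME + ".bak-race-window-*")):
        alerts.append("no bak-race-window-* snapshot retained")

    # Protection 3: the plist label must appear in the launchctl listing.
    labels = {
        line.split()[-1]
        for line in launchctl_list_output.splitlines()
        if line.strip()
    }
    if PLIST_LABEL not in labels:
        alerts.append(f"plist not loaded: {PLIST_LABEL}")

    return alerts
```

An empty listing produces an alert rather than silence, which is the behavior the third protection asks for. The fourth protection (no unload without a CEO decision) is a process rule and is not something this check can enforce.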

Lesson: distinguish deployment failure from software failure

ADR-025 diagnosed a deployment failure (kernel file missing + plist unloaded) as a software failure ("never worked"). This is a class of error: inferring a lack of capability from a broken runtime state. Before declaring a daemon non-functional, the diagnostic checklist is:

  1. Is the kernel/binary present on disk?
  2. Is the plist loaded in launchd?
  3. Is the process running (PID)?
  4. Only then: is the process behaving correctly?

ADR-025 checked step 4 (port 8401 empty, logs show "No eligible tasks") without first verifying steps 1 and 2. That is the failure mode that produced the wrong conclusion.
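
The ordering of the checklist is what matters, so it can help to encode it. The sketch below is illustrative only: DaemonState and diagnose are hypothetical names, and the observations are assumed to be collected by the caller. Behavior is evaluated only after the three deployment checks pass.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DaemonState:
    """Observed facts about a daemon, gathered before any verdict is given."""
    kernel_present: bool                       # step 1: kernel/binary on disk
    plist_loaded: bool                         # step 2: plist loaded in launchd
    pid: Optional[int]                         # step 3: PID, or None if not running
    behaving_correctly: Optional[bool] = None  # step 4: None means not yet observed


def diagnose(state: DaemonState) -> str:
    """Walk the checklist in order and report the first failing layer."""
    if not state.kernel_present:
        return "deployment failure: kernel file missing"
    if not state.plist_loaded:
        return "deployment failure: plist not loaded"
    if state.pid is None:
        return "deployment failure: process not running"
    # Only now is behavior meaningful evidence about the software itself.
    if state.behaving_correctly is None:
        return "undetermined: behavior not yet observed"
    if not state.behaving_correctly:
        return "software failure: process runs but misbehaves"
    return "healthy"


# The state ADR-025 observed: no kernel, no loaded plist, no process, no dispatch.
print(diagnose(DaemonState(False, False, None, False)))
# deployment failure: kernel file missing
```

With this ordering, an empty port or a quiet log can never be reported as a software failure while the kernel file or plist is missing.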


What Is NOT Changed


Rollback

If pi-orchestrator must be decommissioned again in the future, the following conditions must all be true before proceeding:

  1. A named CEO decision MC exists (not an autonomous call by John).
  2. A functional alternative handles autonomous poll-loop dispatch.
  3. The kernel file is archived, not deleted.
  4. The plist is archived, not deleted.
  5. A named MC documents the restoration path.

A diagnosis of "port is empty" or "no tasks in logs" is NOT sufficient grounds for decommission without first verifying kernel file presence and plist load state.
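
The five preconditions above amount to an all-or-nothing gate. A minimal sketch follows, with hypothetical field and function names; how each fact is established (for example by looking up the MC) is left to the caller.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DecommissionPreconditions:
    """One boolean per rollback precondition listed in this ADR."""
    ceo_decision_mc_exists: bool         # 1. named CEO decision MC
    functional_alternative_exists: bool  # 2. alternative handles poll-loop dispatch
    kernel_archived_not_deleted: bool    # 3. kernel file archived
    plist_archived_not_deleted: bool     # 4. plist archived
    restoration_path_mc_exists: bool     # 5. named MC documents restoration path


def unmet(pre: DecommissionPreconditions) -> list[str]:
    """Names of the preconditions that are not yet satisfied."""
    return [f.name for f in fields(pre) if not getattr(pre, f.name)]


def may_decommission(pre: DecommissionPreconditions) -> bool:
    """True only when every precondition holds."""
    return not unmet(pre)


# Example: everything is in place except the CEO decision.
pre = DecommissionPreconditions(False, True, True, True, True)
print(may_decommission(pre), unmet(pre))
# False ['ceo_decision_mc_exists']
```

Reporting the unmet names, rather than a bare boolean, makes it explicit which condition blocks the decommission.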


See Also


Revision #2
Created 2026-05-14 10:59:52 UTC by John
Updated 2026-06-14 20:03:18 UTC by John