Incident Postmortem — Bilko Deploy Fix 2026-04-22

Date: 2026-04-22
Severity: High (CEO time wasted + security leak)
Status: Resolved
Type: Blameless Postmortem

Summary

A 2-hour bug-fix sprint (MC tasks #8626, #8627, #8628) aimed at fixing 3 bugs in the Bilko demo resulted in ZERO live changes reaching the production demo URL (bilko-demo.alai.no). All code changes were pushed to the wrong branch (feat/intesa-bih-demo instead of main), the CI pipeline had been silently broken for 7 days, and client-specific content (the Intesa BiH pitch) leaked to the public demo URL.

Timeline (UTC+1)

Time | Event | Actor
2026-04-21 13:32 | MC #8626 created (invoice template save button broken) | John
2026-04-21 13:33 | MC #8627 created (invoice PDF download fails on unsaved invoice) | John
2026-04-21 13:33 | MC #8628 created (settings logo upload missing) | John
2026-04-21 13:46 | All 3 tasks marked ready_for_review (commits d408cc6 + 53fe1d6) | Brad Frost (Vizu)
2026-04-22 09:00 | CEO: "The Bilko demo hasn't been updated, the bugs are still there" | Alem
2026-04-22 09:10 | Discovery: all fixes were pushed to feat/intesa-bih-demo (no CI on that branch) | John
2026-04-22 09:15 | Verification via curl + git log: main unchanged, bilko-demo.alai.no still serving old code | John
2026-04-22 09:36 | MC #8678 created: /intesa-bridge leak discovered (HTTP 200 on the public demo) | John
2026-04-22 10:00 | CI investigation: last 5 runs all failed (failing since 2026-04-15) | Kelsey (FlowForge)
2026-04-22 10:36 | MC #8696 created: ZAKON PI2 Deploy Verification Protocol | John
2026-04-22 12:00 | Manual deploy attempt: GitHub PAT missing workflow scope (CI fix cannot be pushed) | FlowForge
2026-04-22 12:50 | Manual docker build + push (CEO hands off to FlowForge) | Alem + FlowForge
2026-04-22 21:41 | MC #8730 done: fix-bugs-22apr deployed, all 4 evidence checks pass | FlowForge
2026-04-22 21:50 | MC #8678 code fix pushed (66d2220): intesa routes deleted from main | Brad Frost

Impact

User-Facing

Internal

Root Causes (5 Failures)

1. Branch Assumption (No Pre-Flight Verification)

What happened: John inferred the target branch from memory (assuming feat/intesa-bih-demo because it was used in the previous session) and dispatched the builder without first running curl -sI and git log to verify which branch serves bilko-demo.alai.no.

Why it matters: The wrong branch means the wrong deploy target. All fixes landed on an isolated feature branch with no CI and no domain mapping.

Prevention: ZAKON PI2 Check 2 makes 4 pre-flight commands mandatory BEFORE any code change.
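The four Check 2 commands are not listed here. As a complementary illustration, a minimal Python sketch of one related check: confirming that the fix commits are actually reachable from the branch that deploys. It assumes the deploy branch is origin/main (main serves bilko-demo.alai.no per the summary) and that the clone was fetched first; DEPLOY_BRANCH, verify_commits and the other names are hypothetical, not part of PI2.

```python
"""Sketch: confirm fix commits are reachable from the deploy branch.

Hypothetical: DEPLOY_BRANCH is assumed to be origin/main; run `git fetch` first.
"""
import subprocess

DEPLOY_BRANCH = "origin/main"  # hypothetical; would come from a deploy map in practice


def parse_branch_list(output: str) -> list[str]:
    """Parse `git branch -r` output, skipping symbolic refs like 'origin/HEAD -> origin/main'."""
    names = [line.strip() for line in output.splitlines()]
    return [n for n in names if n and "->" not in n]


def branches_containing(commit: str) -> list[str]:
    """Remote branches whose history contains `commit`."""
    out = subprocess.run(
        ["git", "branch", "-r", "--contains", commit],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_branch_list(out)


def verify_commits(shas: list[str], deploy_branch: str = DEPLOY_BRANCH) -> None:
    """Raise SystemExit if any commit is missing from the deploy branch."""
    for sha in shas:
        found = branches_containing(sha)
        if deploy_branch not in found:
            where = ", ".join(found) or "no remote branch"
            raise SystemExit(f"{sha} is not on {deploy_branch}; found on: {where}")
```

Run after the builder pushes and before tasks move to ready_for_review, e.g. verify_commits(["d408cc6", "53fe1d6"]). Per the timeline, those commits existed only on feat/intesa-bih-demo, so a check of this kind would have failed at 13:46 instead of at the CEO's review the next morning.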

2. CI Broken for 7 Days Undetected

What happened: The GitHub Actions workflow had been failing since 2026-04-15. No one noticed because:

Root causes:

  1. GitHub Actions quota exhausted (monthly minutes limit reached)
  2. The --no-traffic flag on line 206 of gcp-deploy.yml keeps new revisions from receiving traffic when an existing service is redeployed
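The --no-traffic symptom can also be detected from outside the workflow, by checking whether the newest revision actually serves traffic. A minimal Python sketch, assuming the JSON printed by gcloud run services describe SERVICE --region REGION --format=json exposes the Knative-style fields status.latestReadyRevisionName and status.traffic[] (revisionName, percent, latestRevision); the function names are hypothetical.

```python
"""Sketch: flag a Cloud Run service whose newest revision is not serving traffic.

Assumption: input is the JSON from `gcloud run services describe ... --format=json`.
"""
import json


def latest_revision_traffic(service: dict) -> int:
    """Percent of traffic routed to the latest ready revision."""
    status = service.get("status", {})
    latest = status.get("latestReadyRevisionName")
    total = 0
    for target in status.get("traffic", []):
        # A target either follows the latest revision or pins one by name.
        pinned_to_latest = latest is not None and target.get("revisionName") == latest
        if target.get("latestRevision") or pinned_to_latest:
            total += target.get("percent", 0)
    return total


def check_traffic(describe_json: str) -> None:
    """Raise SystemExit when the newest revision is not serving all traffic."""
    pct = latest_revision_traffic(json.loads(describe_json))
    if pct < 100:
        raise SystemExit(f"latest revision serves {pct}% of traffic (deployed with --no-traffic?)")
```

If the check fires, gcloud run services update-traffic SERVICE --to-latest routes traffic to the newest revision; removing the flag from gcp-deploy.yml (see Action Items) addresses the cause.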

Prevention: ZAKON PI2 Check 4 requires running gh run list --limit 5 before any push. If all 5 runs failed, STOP and fix CI first.
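Check 4 is easy to script so it cannot be skipped. A minimal Python sketch, assuming gh is installed and authenticated and that gh run list --json conclusion returns one object per run with a conclusion string (empty while a run is still in progress). Counting startup_failure and timed_out as failures is an assumption of this sketch; the rule above only says "failure".

```python
"""Sketch of ZAKON PI2 Check 4: stop if the last 5 CI runs all failed."""
import json
import subprocess

# Assumption: conclusions treated as "failed" for the 5/5 rule.
FAILED = {"failure", "startup_failure", "timed_out"}


def recent_conclusions(limit: int = 5) -> list[str]:
    """Conclusions of the most recent workflow runs, newest first."""
    out = subprocess.run(
        ["gh", "run", "list", "--limit", str(limit), "--json", "conclusion"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [run["conclusion"] for run in json.loads(out)]


def ci_is_broken(conclusions: list[str], window: int = 5) -> bool:
    """True only when every run in a full window failed (the 5/5 rule)."""
    recent = conclusions[:window]
    return len(recent) == window and all(c in FAILED for c in recent)
```

Called from a pre-push hook as ci_is_broken(recent_conclusions()), a result of True blocks the push. An in-progress run (empty conclusion) keeps the rule from firing, which matches the strict "5/5" wording.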

3. Intesa Content Leaked to Public URL

What happened: Commit 13c2efb merged the /intesa-bridge and /intesa-cockpit routes into the main branch. These were pitch-specific features for Dženana Hardaga (Intesa BiH IT director) and should have remained isolated on the bilko-intesa-demo Cloud Run service.

Why it matters: Client-specific content (including BiH bank integration mockups) was publicly visible on the generic demo. This is a potential NDA violation and a confusing UX for non-Intesa visitors.
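Both leaked routes share a client prefix (/intesa-), which makes this class of leak mechanically detectable. A minimal Python sketch, assuming routes can be enumerated as URL path strings; CLIENT_PREFIXES is a hypothetical stand-in for whatever client-prefix-registry.md lists, and this is not the contents of branch-purity.yml. The HTTP probe mirrors the 09:36 discovery (HTTP 200 on the public demo).

```python
"""Sketch: detect client-specific routes on the public demo branch.

Hypothetical: CLIENT_PREFIXES stands in for the client prefix registry.
"""
import urllib.error
import urllib.request

CLIENT_PREFIXES = ("/intesa-",)  # hypothetical registry contents


def leaked_routes(routes: list[str], prefixes: tuple[str, ...] = CLIENT_PREFIXES) -> list[str]:
    """Routes that carry a client prefix and must not ship on main."""
    return [r for r in routes if r.startswith(prefixes)]


def status_code(url: str) -> int:
    """HTTP status of a HEAD request; error statuses are returned, not raised."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

A CI step could fail whenever leaked_routes(...) is non-empty on main. After a deploy, status_code("https://bilko-demo.alai.no/intesa-bridge") should no longer return 200, assuming the app does not serve a catch-all page for unknown paths.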

Prevention:

4. PAT Missing workflow Scope

What happened: The GitHub Personal Access Token used for CI fixes lacked the workflow scope, so FlowForge could not push branch-purity.yml or fix gcp-deploy.yml via automation.

Why it matters: This blocked automated CI repair and forced manual workarounds, including the CEO paste-copy anti-pattern.

Prevention: ZAKON PI2 Check 6 runs gh auth status --show-token at session start to verify that the repo, workflow, and write:packages scopes are present.
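The scope check can be automated by parsing gh auth status. A minimal Python sketch, assuming a classic PAT and the "Token scopes:" line that gh prints for it (fine-grained tokens list no scopes, so every scope would be reported missing). Note that the classic PAT scope for package publishing is named write:packages; packages: write is the workflow-permissions spelling. The sketch calls plain gh auth status, since the scope line does not require printing the token; it reads stdout and stderr because older gh versions write status to stderr. Function names are hypothetical.

```python
"""Sketch of ZAKON PI2 Check 6: confirm the GitHub token has the required scopes."""
import subprocess

REQUIRED_SCOPES = {"repo", "workflow", "write:packages"}


def parse_scopes(status_text: str) -> set[str]:
    """Extract scopes from a line like "- Token scopes: 'repo', 'workflow'"."""
    for line in status_text.splitlines():
        if "Token scopes:" in line:
            raw = line.split("Token scopes:", 1)[1]
            return {s.strip().strip("'\"") for s in raw.split(",") if s.strip()}
    return set()


def missing_scopes(status_text: str, required: set[str] = REQUIRED_SCOPES) -> set[str]:
    return required - parse_scopes(status_text)


def current_status_text() -> str:
    """Combined output of `gh auth status` (stderr on older gh releases)."""
    proc = subprocess.run(["gh", "auth", "status"], capture_output=True, text=True)
    return proc.stdout + proc.stderr
```

At session start, a non-empty missing_scopes(current_status_text()) means the token must be regenerated before any CI work is attempted.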

5. Manual Paste-Copy Anti-Pattern

What happened: The CEO built the Docker image locally and pasted the output to John, who pasted it to the FlowForge agent. FlowForge took over from an "image already built" state instead of owning the full build→push→deploy flow.

Why it matters: Process fragmentation means more failure points. An agent cannot verify the build context, Dockerfile, or .dockerignore changes if it did not run the build itself.

Prevention: Always dispatch FlowForge BEFORE the build step. The agent owns the entire flow or none of it.
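"Owns the entire flow" can be made concrete as one ordered sequence run by a single agent that stops at the first failure. A minimal Python sketch using standard docker and gcloud invocations; IMAGE, SERVICE and REGION are hypothetical placeholders, not the real Bilko values, and the deploy step deliberately omits --no-traffic.

```python
"""Sketch: one agent runs build -> push -> deploy end to end.

Hypothetical: IMAGE, SERVICE and REGION are placeholders.
"""
import subprocess

IMAGE = "REGION-docker.pkg.dev/PROJECT/REPO/bilko-demo:TAG"  # hypothetical
SERVICE = "bilko-demo"  # hypothetical
REGION = "europe-west1"  # hypothetical


def pipeline(image: str = IMAGE, service: str = SERVICE, region: str = REGION) -> list[list[str]]:
    """The full flow as ordered commands, all run from the same build context."""
    return [
        ["docker", "build", "-t", image, "."],
        ["docker", "push", image],
        ["gcloud", "run", "deploy", service, "--image", image, "--region", region],
    ]


def run_pipeline(steps: list[list[str]]) -> None:
    for cmd in steps:
        # check=True raises CalledProcessError, so a failed build never reaches push or deploy.
        subprocess.run(cmd, check=True)
```

Because the same agent builds the image it later deploys, it can inspect the Dockerfile and .dockerignore it actually used, which the hand-off at 12:50 made impossible.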

What Went Well

Action Items

Action | Owner | MC Task | Deadline | Status
Sync ZAKON PI2 to BookStack | pi-orchestrator | #8718 | 2026-04-23 | PAUSED
Create DEPLOY-MAP.md in Bilko repo | Skillforge | #8715 | 2026-04-23 | DONE
Bake PI2 checks into pi-orchestrator v2 | pi-orchestrator | #8696 (item 3) | 2026-04-29 | IN PROGRESS
Add pre-deploy hook (~/.claude/hooks/pre-deploy-check.sh) | pi-orchestrator | #8696 (item 4) | 2026-04-29 | DONE
Patch mc.js done with an evidence gate for H-priority deploy tasks | pi-orchestrator | #8696 (item 5) | 2026-04-29 | DONE
Create client-prefix-registry.md | pi-orchestrator | #8696 (item 7) | 2026-04-29 | DONE
Fix GitHub Actions quota (upgrade plan or optimize workflows) | John | TBD | 2026-05-01 | OPEN
Remove --no-traffic flag from gcp-deploy.yml for existing services | FlowForge | TBD | 2026-04-30 | OPEN
Upgrade GitHub PAT with workflow scope | John | TBD | 2026-04-25 | OPEN
Weekly CEO audit of mc.js --ceo-override usage | John | #8696 (item 8) | Ongoing | OPEN

Lessons Learned

For John (Orchestrator)

For Builder Agents (Brad Frost, Vizu)

For FlowForge (DevOps)

System-Level

Metrics

Follow-Up

Next review: 2026-04-29 (PI2 implementation deadline)
Owner: John
Success criteria: All 8 items in MC #8696 marked done, and CI health green for 7 consecutive days


Postmortem by ALAI Skillforge, 2026-04-22
Credit: ALAI, 2026


Revision #2
Created 2026-04-22 21:57:52 UTC by John
Updated 2026-05-24 20:03:11 UTC by John