Incident Postmortem — Bilko Deploy Fix 2026-04-22

Date: 2026-04-22
Severity: High (CEO time wasted + security leak)
Status: Resolved
Type: Blameless Postmortem

Summary

A 2-hour bug-fix sprint (MC tasks #8626, #8627, #8628) aimed at fixing three bugs in the Bilko demo resulted in zero live changes reaching the production demo URL (bilko-demo.alai.no). All code changes were pushed to the wrong branch (feat/intesa-bih-demo instead of main), the CI pipeline had been silently broken for 7 days, and client-specific content (the Intesa BiH pitch) had leaked to the public demo URL.

Timeline (UTC+1)

| Time | Event | Actor |
|---|---|---|
| 2026-04-21 13:32 | MC #8626 created (invoice template save button broken) | John |
| 2026-04-21 13:33 | MC #8627 created (invoice PDF download fails on unsaved invoice) | John |
| 2026-04-21 13:33 | MC #8628 created (settings logo upload missing) | John |
| 2026-04-21 13:46 | All 3 tasks marked ready_for_review (commit d408cc6 + 53fe1d6) | Brad Frost (Vizu) |
| 2026-04-22 09:00 | CEO: "Bilko demo isn't updated, the bugs are still there" | Alem |
| 2026-04-22 09:10 | Discovery: All fixes pushed to feat/intesa-bih-demo (no CI on that branch) | John |
| 2026-04-22 09:15 | Verification via curl + git log: main unchanged, bilko-demo.alai.no serving old code | John |
| 2026-04-22 09:36 | MC #8678 created: /intesa-bridge leak discovered (HTTP 200 on public demo) | John |
| 2026-04-22 10:00 | CI investigation: Last 5 runs all failed (since 2026-04-15) | Kelsey (FlowForge) |
| 2026-04-22 10:36 | MC #8696 created: ZAKON PI2 Deploy Verification Protocol | John |
| 2026-04-22 12:00 | Manual deploy attempt: GitHub PAT missing workflow scope (can't trigger CI fix) | FlowForge |
| 2026-04-22 12:50 | Manual docker build + push (CEO hands off to FlowForge) | Alem + FlowForge |
| 2026-04-22 21:41 | MC #8730 done: fix-bugs-22apr deployed, all 4 evidence checks pass | FlowForge |
| 2026-04-22 21:50 | MC #8678 code fix pushed (66d2220): intesa routes deleted from main | Brad Frost |

Impact

User-Facing

  • Bilko demo bugs: Persisted for 1 extra day (low severity — internal demo, no external users)
  • Intesa content leak: Unknown duration (potentially days) — BiH bank integration pitch content publicly accessible at /intesa-bridge on bilko-demo.alai.no

Internal

  • CEO time lost: ~2 hours (debugging + manual deploy)
  • Trust erosion: CEO feedback "Validacija ne radi" ("validation doesn't work") after John claimed done without verifying live state
  • CI health invisible: 7 days of broken deploys undetected

Root Causes (5 Failures)

1. Branch Assumption (No Pre-Flight Verification)

What happened: John inferred the target branch from memory (assuming feat/intesa-bih-demo, based on the last session) and dispatched the builder without first running curl -sI and git log to verify which branch serves bilko-demo.alai.no.

Why it matters: Wrong branch = wrong deploy target. All fixes landed on isolated feature branch with no CI and no domain mapping.

Prevention: ZAKON PI2 Check 2 — 4 pre-flight commands mandatory BEFORE code changes.
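
A minimal sketch of the pre-flight idea, assuming it reduces to comparing the branch a builder is about to push to against the branch known to serve the live URL. This postmortem names curl -sI and git log but does not list all four commands, so the function name and the idea of passing the serving branch as an argument (for example, read from a deploy map) are assumptions, not the actual ZAKON PI2 text.

```shell
# Hypothetical helper: refuse to dispatch a builder when the working branch
# is not the branch that serves the live URL.
# $1 = branch that serves the URL (assumed to come from a deploy map)
# $2 = branch the builder is about to push to
preflight_branch_check() {
  serving_branch=$1
  working_branch=$2
  if [ "$serving_branch" != "$working_branch" ]; then
    echo "STOP: $working_branch does not serve the live URL (expected $serving_branch)" >&2
    return 1
  fi
  echo "OK: $working_branch serves the live URL"
}

# Evidence to capture alongside the check (needs network and repo access):
#   curl -sI https://bilko-demo.alai.no    # response headers of the live demo
#   git log -1 --oneline origin/main       # latest commit on the serving branch
#   git rev-parse --abbrev-ref HEAD        # branch currently checked out
```

In this incident, `preflight_branch_check main feat/intesa-bih-demo` would have returned non-zero before any code was written.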

2. CI Broken for 7 Days Undetected

What happened: GitHub Actions workflow failing since 2026-04-15. No one noticed because:

  • No daily CI health check in boot.sh
  • Manual deploys used as workaround without logging CI status
  • gh run list not part of standard deploy checklist

Root causes:

  1. GitHub Actions quota exhausted (monthly minutes limit)
  2. --no-traffic flag on line 206 of gcp-deploy.yml prevents traffic promotion on existing services

Prevention: ZAKON PI2 Check 4 — gh run list --limit 5 before any push. If 5/5 = failure, STOP and fix CI first.
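
The "5/5 = failure" rule can be sketched as a small gate that reads run conclusions from standard input. The function name is hypothetical. Feeding it from gh run list with --json conclusion and --jq is one possible wiring and not necessarily how Check 4 is implemented.

```shell
# Hypothetical gate: reads one run conclusion per line on stdin and fails
# only when every completed run in the window failed.
ci_health_gate() {
  total=0
  failed=0
  while IFS= read -r conclusion; do
    # Runs still in progress have an empty conclusion; skip them.
    [ -n "$conclusion" ] || continue
    total=$((total + 1))
    if [ "$conclusion" = "failure" ]; then
      failed=$((failed + 1))
    fi
  done
  echo "$failed/$total recent runs failed"
  if [ "$total" -gt 0 ] && [ "$failed" -eq "$total" ]; then
    echo "STOP: fix CI before pushing" >&2
    return 1
  fi
  return 0
}

# Possible wiring (needs an authenticated gh CLI):
#   gh run list --limit 5 --json conclusion --jq '.[].conclusion' | ci_health_gate
```

The gate stays quiet on a mixed history and blocks only when the whole window is red, matching the rule as written. A stricter threshold would be a one-line change.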

3. Intesa Content Leaked to Public URL

What happened: Commit 13c2efb merged /intesa-bridge and /intesa-cockpit routes to main branch. These were pitch-specific features for Dženana Hardaga (Intesa BiH IT director) and should have remained isolated on bilko-intesa-demo Cloud Run service.

Why it matters: Client-specific content (including BiH bank integration mockups) publicly visible on generic demo. Potential NDA violation + confusing UX for non-Intesa visitors.

Prevention:

  • ZAKON PI2 Check 3 — Branch Purity CI check (.github/workflows/branch-purity.yml)
  • Client prefix registry in ~/system/rules/client-prefix-registry.md
  • Automated check blocks PR merge if intesa-*, corpint-*, etc. routes detected on main
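
A minimal sketch of what such a purity check might do, assuming client routes show up as path components that start with a registered prefix. The contents of branch-purity.yml are not shown in this postmortem, so the function name and the example paths are hypothetical.

```shell
# Hypothetical purity check: reads repo-relative paths on stdin and fails if
# any path component starts with one of the client prefixes given as arguments.
branch_purity_check() {
  leaks=0
  while IFS= read -r path; do
    for prefix in "$@"; do
      case "/$path" in
        */"$prefix"*)
          echo "client-specific path on main: $path" >&2
          leaks=$((leaks + 1))
          ;;
      esac
    done
  done
  [ "$leaks" -eq 0 ]
}

# Possible wiring inside a CI job running on main:
#   git ls-files | branch_purity_check intesa- corpint-
```

Matching on path components rather than file contents keeps the check fast, at the cost of missing a client route that is registered under a neutral file name.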

4. PAT Missing workflow Scope

What happened: GitHub Personal Access Token used for CI fixes lacked workflow scope. FlowForge couldn't push branch-purity.yml or fix gcp-deploy.yml via automation.

Why it matters: Blocked automated CI repair. Forced manual workarounds + CEO paste-copy anti-pattern.

Prevention (ZAKON PI2 Check 6): run gh auth status --show-token at session start and verify that the repo, workflow, and write:packages scopes are present.
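
The scope verification can be sketched as a filter over gh auth status output. This assumes the output contains a "Token scopes:" line with each scope in single quotes, which is what recent gh versions print; older versions may format it differently. The function name is hypothetical, and the sketch uses write:packages, which is GitHub's name for the packages write scope.

```shell
# Hypothetical checker: reads `gh auth status` output on stdin and verifies
# that every scope given as an argument appears on the "Token scopes:" line.
require_token_scopes() {
  scopes_line=$(grep 'Token scopes:' || true)
  missing=0
  for scope in "$@"; do
    case "$scopes_line" in
      *"'$scope'"*) ;;
      *)
        echo "missing scope: $scope" >&2
        missing=$((missing + 1))
        ;;
    esac
  done
  [ "$missing" -eq 0 ]
}

# Possible wiring at session start (2>&1 because some gh versions write to stderr):
#   gh auth status 2>&1 | require_token_scopes repo workflow write:packages
```

The check reads only the scopes line, so --show-token is not needed for it and the token itself never has to be printed.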

5. Manual Paste-Copy Anti-Pattern

What happened: The CEO built the Docker image locally and pasted the output to John, who pasted it to the FlowForge agent. FlowForge took over from an "image already built" state instead of owning the full build→push→deploy flow.

Why it matters: Process fragmentation = more failure points. An agent can't verify the build context, Dockerfile, or .dockerignore changes if it didn't run the build.

Prevention: Always dispatch FlowForge BEFORE build step. Agent owns entire flow or none of it.
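
A minimal sketch of a single-owner flow, assuming the service runs on Cloud Run and is deployed with docker and gcloud. The function names and the dry-run switch are hypothetical, and the image, service, and region are caller-supplied placeholders rather than values from this incident.

```shell
# Hypothetical single-owner deploy flow. With DRY_RUN=1 each command is
# printed instead of executed, so the plan can be reviewed first.
run_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# $1 = image reference, $2 = Cloud Run service, $3 = region, $4 = live URL
deploy_flow() {
  image=$1
  service=$2
  region=$3
  live_url=$4
  # Each step runs only if the previous one succeeded.
  run_step docker build -t "$image" . &&
    run_step docker push "$image" &&
    run_step gcloud run deploy "$service" --image "$image" --region "$region" &&
    run_step curl -sI "$live_url"
}
```

Because build, push, deploy, and verify sit in one function, whoever calls it owns every step, and a failure at any stage stops the chain instead of being handed to someone else mid-stream.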

What Went Well

  • Kelsey persona diagnosis: FlowForge correctly identified --no-traffic flag as root cause within 10 minutes of investigation
  • ZAKON PI2 authored mid-incident: Turned incident into system improvement without waiting for postmortem
  • .dockerignore fix: Reduced build context from 4.1GB to 50MB (roughly 82× smaller, about a 98.8% reduction) during incident resolution
  • Evidence gate upheld: MC #8730 not marked done until curl + Playwright + revision checks passed
  • Blameless culture: No punishment for agents; root cause analysis focused on system gaps
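
The evidence gate mentioned above can be sketched as a file check, assuming evidence for a task is collected in one directory that must contain verification.json. This is not the mc.js implementation, which is not shown in this postmortem. The function name and directory layout are hypothetical.

```shell
# Hypothetical gate: a task may be marked done only when its evidence
# directory contains a non-empty verification.json.
evidence_gate() {
  evidence_dir=$1
  if [ -s "$evidence_dir/verification.json" ]; then
    echo "evidence present: $evidence_dir/verification.json"
  else
    echo "BLOCKED: no verification.json in $evidence_dir" >&2
    return 1
  fi
}
```

A real gate would presumably also inspect the recorded check results. This sketch only shows the shape of "no evidence file, no done".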

Action Items

| Action | Owner | MC Task | Deadline | Status |
|---|---|---|---|---|
| Sync ZAKON PI2 to BookStack | pi-orchestrator | #8718 | 2026-04-23 | PAUSED |
| Create DEPLOY-MAP.md in Bilko repo | Skillforge | #8715 | 2026-04-23 | DONE |
| Bake PI2 checks into pi-orchestrator v2 | pi-orchestrator | #8696 (item 3) | 2026-04-29 | IN PROGRESS |
| Add pre-deploy hook (~/.claude/hooks/pre-deploy-check.sh) | pi-orchestrator | #8696 (item 4) | 2026-04-29 | DONE |
| Patch mc.js done with evidence gate for H-priority deploy tasks | pi-orchestrator | #8696 (item 5) | 2026-04-29 | DONE |
| Create client-prefix-registry.md | pi-orchestrator | #8696 (item 7) | 2026-04-29 | DONE |
| Fix GitHub Actions quota (upgrade plan or optimize workflows) | John | TBD | 2026-05-01 | OPEN |
| Remove --no-traffic flag from gcp-deploy.yml for existing services | FlowForge | TBD | 2026-04-30 | OPEN |
| Upgrade GitHub PAT with workflow scope | John | TBD | 2026-04-25 | OPEN |
| Weekly CEO audit of mc.js --ceo-override usage | John | #8696 (item 8) | Ongoing | OPEN |

Lessons Learned

For John (Orchestrator)

  • Never infer deploy target from memory. Always run curl + git log + gh run list before dispatching builder.
  • CI health = system health. Broken CI for 7 days = broken deployment capability. Monitor actively.
  • Claim verification: "Task done" without live URL verification = hallucination. CEO was right: "validacija ne radi" ("validation doesn't work").

For Builder Agents (Brad Frost, Vizu)

  • Ready for review ≠ deployed. Code pushed to branch ≠ code live on target URL. Always verify deploy target match.
  • Client-specific routes: If building intesa-*, corpint-*, etc. — verify target branch is NOT main before merging.

For FlowForge (DevOps)

  • Own the full flow. If dispatched for deploy, own build→push→deploy→verify. Don't take over mid-stream from CEO paste-copy.
  • --no-traffic flag: Only use on first-ever deploy. Never on existing services (blocks traffic promotion).
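
One way to keep the flag from going unnoticed again is to scan workflow files for it. A minimal sketch, with a hypothetical function name; it does nothing more than locate literal occurrences.

```shell
# Hypothetical audit helper: print each line (with its line number) that
# contains the literal flag --no-traffic in the given files.
# Exit status is non-zero when the flag is not found.
find_no_traffic() {
  grep -n -F -e '--no-traffic' -- "$@"
}

# Possible wiring from the repository root:
#   find_no_traffic .github/workflows/gcp-deploy.yml
```

For a service that has already stopped sending traffic to new revisions, the gcloud CLI provides `gcloud run services update-traffic SERVICE --to-latest` to route traffic back to the latest revision. SERVICE is a placeholder here.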

System-Level

  • ZAKON PI2 works. All 5 root causes preventable with 6 hard checks. Enforce at agent level + hook level + MC gate level.
  • Evidence gates prevent false claims. mc.js enforcement (item 5 of #8696) blocks "done" without verification.json.
  • Blameless postmortems → system rules. This incident produced ZAKON PI2, DEPLOY-MAP.md standard, and client-prefix-registry. Net positive.

Related Files

  • ZAKON PI2: ~/system/rules/zakon-pi2-deploy-verification.md (BookStack synced)
  • Client Prefix Registry: ~/system/rules/client-prefix-registry.md
  • Pre-Deploy Hook: ~/.claude/hooks/pre-deploy-check.sh
  • Feedback Log: ~/.claude/projects/-Users-makinja/memory/feedback_verify_deploy_target_before_code.md

Metrics

  • Incident duration: 32 hours (2026-04-21 13:46 → 2026-04-22 21:41)
  • CEO time lost: ~2 hours
  • Root causes identified: 5
  • New rules created: 4
  • MC tasks spawned: 11 (parent #8696 + 7 subtasks + 3 original bugs)
  • Lines of ZAKON PI2: 136
  • Evidence files generated: 11 (verification.json + 4 PNG + 6 TXT)

Follow-Up

Next review: 2026-04-29 (PI2 implementation deadline)
Owner: John
Success criteria: All 8 items in MC #8696 marked done + CI health green for 7 consecutive days


Postmortem by ALAI Skillforge, 2026-04-22
Credit: ALAI, 2026