MC 103057 — Bilko Demo API Hang Validation 2026-06-06

MC #103057 — Bilko demo API hang recurrence validation (3f56ab5)

Deployment

  • PR: https://github.com/johnatbasicas/bilko/pull/267
  • Merge commit: 3f56ab5198fd37b43cbbf8a917d7ce4236d42104
  • Stage Cloud Build: 7b413a82-256a-4cfc-8917-b0b06376d850 = SUCCESS
  • Stage API image promoted to demo: europe-north1-docker.pkg.dev/tribal-sign-487920-k0/bilko/api:stage-3f56ab5@sha256:5ca9a347c56375757fd563980034307f2d1d87327d1958b4fbd593c4b34741c6
  • Demo revision: bilko-api-demo-mc103057-3f56ab5
  • Demo traffic: 100% to bilko-api-demo-mc103057-3f56ab5

Changes

  • Ktor Netty groups configured: connectionGroupSize=2, workerGroupSize=4, callGroupSize=32.
  • Demo Cloud Run deploy config adjusted to concurrency=1, min-instances=1, max-instances=5, cpu-throttling=false.
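
For orientation, the Netty group sizes listed above could be set on an embedded server roughly as follows. This is a minimal sketch, not the code from PR 267: it assumes Ktor 2.x (whose embeddedServer overload accepts both port and configure), and the port and the body of the /health handler are illustrative assumptions.

```kotlin
import io.ktor.server.application.*
import io.ktor.server.engine.*
import io.ktor.server.netty.*
import io.ktor.server.response.*
import io.ktor.server.routing.*

fun main() {
    embeddedServer(
        Netty,
        port = 8080, // assumed port; the report does not state it
        configure = {
            // Values taken from the Changes list above.
            connectionGroupSize = 2 // threads accepting new connections
            workerGroupSize = 4     // threads handling socket I/O and request parsing
            callGroupSize = 32      // threads running application call handlers
        }
    ) {
        routing {
            // Illustrative handler only; the real /health route may do more.
            get("/health") { call.respondText("ok") }
        }
    }.start(wait = true)
}
```

A larger call group gives blocking handlers more threads before the engine stalls, which is consistent with this change being a mitigation while MC #103060 tracks the durable fix.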

Validation evidence

  • Pre-deploy recurrence evidence: /tmp/alai/d7bced9a/evidence-bilko-demo-flaky/health-probe-resume-20260606.json (10/20 pass, 10/20 abort/timeouts).
  • P2P pre-verifier PASS: mesh-thr-969f5997-357c-49a1-8f51-00de356f781a; evidence /tmp/alai/company-mesh-auto-responder/2026-06-06T17-38-47-783Z-mesh-msg-65457e77-674c-4e86-854f-0a9165a7c829.json.
  • Local validation: cd /tmp/bilko-wt-hang/apps/api && gradle test --tests no.alai.bilko.auth.JwtServiceTest => BUILD SUCCESSFUL.
  • GitHub Actions CI: run 27069304656 => SUCCESS.
  • Demo no-traffic smoke: /tmp/alai/d7bced9a/evidence-bilko-demo-flaky/health-probe-smoke-3f56ab5-20260606T195531Z.json => 20/20 pass.
  • Post-promote custom-domain probe: /tmp/alai/d7bced9a/evidence-bilko-demo-flaky/health-probe-post-promote-custom-20260606T195559Z.json => 40/40 pass.
  • Sustained watcher: /tmp/alai/d7bced9a/evidence-bilko-demo-flaky/sustained-health-watch-3f56ab5-20260606T195637Z.json => 149/150 pass over ~76 minutes. One client-side AbortError at 2026-06-06T21:00:27Z; Cloud Run logs for the same revision/window showed no non-200 and no >1s latency entries.
  • Cloud Run log check: /tmp/alai/d7bced9a/evidence-bilko-demo-flaky/cloudrun-logs-sustained-window-3f56ab5-20260606T211402Z.json => HTTP status counts 98 x 200, slow_or_non200=0.
  • Final custom-domain probe after recurrence window: /tmp/alai/d7bced9a/evidence-bilko-demo-flaky/health-probe-final-custom-3f56ab5-20260606T211421Z.json => 30/30 pass.
  • Final direct Cloud Run probe after recurrence window: /tmp/alai/d7bced9a/evidence-bilko-demo-flaky/health-probe-final-direct-3f56ab5-20260606T211430Z.json => 30/30 pass.
  • Final Cloud Run log check: /tmp/alai/d7bced9a/evidence-bilko-demo-flaky/cloudrun-logs-final-window-3f56ab5-20260606T211451Z.json => HTTP status counts 98 x 200, slow_or_non200=0.
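
The probe tallies above can be turned into pass rates with a few lines of arithmetic. The sketch below copies the counts from the bullets; the ProbeResult type and the combined helper are hypothetical names introduced for illustration, and pooling all post-deploy probes into one figure is a simplification, since the smoke run happened before traffic was promoted.

```kotlin
// Probe tallies copied from the evidence bullets (label, passed, total).
data class ProbeResult(val label: String, val passed: Int, val total: Int) {
    val passRatePercent: Double get() = 100.0 * passed / total
}

val preDeploy = ProbeResult("pre-deploy recurrence", 10, 20)

val postDeploy = listOf(
    ProbeResult("no-traffic smoke", 20, 20),
    ProbeResult("post-promote custom-domain", 40, 40),
    ProbeResult("sustained watcher", 149, 150),
    ProbeResult("final custom-domain", 30, 30),
    ProbeResult("final direct Cloud Run", 30, 30),
)

// Pools several probe runs into a single tally.
fun combined(results: List<ProbeResult>): ProbeResult =
    ProbeResult(
        "post-deploy combined",
        results.sumOf { it.passed },
        results.sumOf { it.total },
    )

fun main() {
    for (r in listOf(preDeploy) + postDeploy + combined(postDeploy)) {
        println("%-28s %3d/%-3d %5.1f%%".format(r.label, r.passed, r.total, r.passRatePercent))
    }
}
```

With these counts the pre-deploy rate is 50.0% (10/20) and the pooled post-deploy tally is 269/270, the single miss being the client-side AbortError in the sustained watcher.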

Verdict

READY FOR VALIDATOR REVIEW. The previous recurring /health hang/504 pattern, which failed roughly 50% of pre-deploy probes (10/20), was not observed after deploy. The 76-minute sustained watcher recorded one client-side AbortError (149/150 pass), but Cloud Run logged no matching 504, other non-200, or >1s request for the new revision, and the final custom-domain and direct probes were clean (30/30 each, 60/60 combined).

Follow-up

MC #103060 remains the durable architectural fix: migrate blocking Exposed transaction {} usage to newSuspendedTransaction(Dispatchers.IO) or a bounded DB dispatcher.
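
The migration described above might look roughly like the following. This is a sketch under stated assumptions, not the planned implementation for MC #103060: it assumes a pre-1.0 Exposed release (package org.jetbrains.exposed.sql) and kotlinx.coroutines 1.6 or later, and dbQuery, pingDatabase, and DB_PARALLELISM are hypothetical names and values.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.ExperimentalCoroutinesApi
import org.jetbrains.exposed.sql.Transaction
import org.jetbrains.exposed.sql.transactions.experimental.newSuspendedTransaction

// Hypothetical cap; in practice it would usually match the connection pool size.
private const val DB_PARALLELISM = 8

// Bounded view of Dispatchers.IO, so DB work cannot occupy every IO thread.
@OptIn(ExperimentalCoroutinesApi::class)
private val dbDispatcher = Dispatchers.IO.limitedParallelism(DB_PARALLELISM)

// Hypothetical helper: runs an Exposed transaction off the Netty call threads.
suspend fun <T> dbQuery(block: suspend Transaction.() -> T): T =
    newSuspendedTransaction(dbDispatcher, statement = block)

// Before (blocks the calling thread for the whole transaction):
//     transaction { exec("SELECT 1") }
// After (suspends the caller while the work runs on dbDispatcher):
suspend fun pingDatabase() {
    dbQuery { exec("SELECT 1") }
}
```

Passing Dispatchers.IO directly, as the follow-up text suggests, also works; the bounded variant simply limits how many transactions can run concurrently.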