MC 103057 — Bilko Demo API Hang Validation 2026-06-06
MC #103057 — Bilko demo API hang recurrence validation (3f56ab5)
Deployment
- PR: https://github.com/johnatbasicas/bilko/pull/267
- Merge commit: 3f56ab5198fd37b43cbbf8a917d7ce4236d42104
- Stage Cloud Build: 7b413a82-256a-4cfc-8917-b0b06376d850 => SUCCESS
- Stage API image promoted to demo: europe-north1-docker.pkg.dev/tribal-sign-487920-k0/bilko/api:stage-3f56ab5@sha256:5ca9a347c56375757fd563980034307f2d1d87327d1958b4fbd593c4b34741c6
- Demo revision: bilko-api-demo-mc103057-3f56ab5
- Demo traffic: 100% to bilko-api-demo-mc103057-3f56ab5
Changes
- Ktor Netty groups configured: connectionGroupSize=2,workerGroupSize=4,callGroupSize=32.
- Demo Cloud Run deploy config adjusted to concurrency=1,min-instances=1,max-instances=5,cpu-throttling=false.
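For reference, a minimal sketch of where the Netty group settings live when a Ktor server is started programmatically. It assumes the Ktor 2.x `embeddedServer(Netty, ...)` entry point; the port and the empty module body are placeholders, and the Bilko API may instead set the same values through `ktor.deployment` keys in its config file. It needs the Ktor server Netty artifact on the classpath, so treat it as a config fragment rather than a standalone program.

```kotlin
import io.ktor.server.engine.embeddedServer
import io.ktor.server.netty.Netty

fun main() {
    embeddedServer(
        Netty,
        port = 8080, // placeholder port
        configure = {
            connectionGroupSize = 2 // threads accepting new connections
            workerGroupSize = 4     // event-loop threads for connection I/O and request parsing
            callGroupSize = 32      // threads processing application calls
        },
    ) {
        // Application module (routes, plugins) would be installed here.
    }.start(wait = true)
}
```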
Validation evidence
- Pre-deploy recurrence evidence: /tmp/alai/d7bced9a/evidence-bilko-demo-flaky/health-probe-resume-20260606.json (10/20 pass, 10/20 aborts/timeouts).
- P2P pre-verifier PASS: mesh-thr-969f5997-357c-49a1-8f51-00de356f781a; evidence /tmp/alai/company-mesh-auto-responder/2026-06-06T17-38-47-783Z-mesh-msg-65457e77-674c-4e86-854f-0a9165a7c829.json.
- Local validation: cd /tmp/bilko-wt-hang/apps/api && gradle test --tests no.alai.bilko.auth.JwtServiceTest => BUILD SUCCESSFUL.
- GitHub Actions CI: run 27069304656 => SUCCESS.
- Demo no-traffic smoke: /tmp/alai/d7bced9a/evidence-bilko-demo-flaky/health-probe-smoke-3f56ab5-20260606T195531Z.json => 20/20 pass.
- Post-promote custom-domain probe: /tmp/alai/d7bced9a/evidence-bilko-demo-flaky/health-probe-post-promote-custom-20260606T195559Z.json => 40/40 pass.
- Sustained watcher: /tmp/alai/d7bced9a/evidence-bilko-demo-flaky/sustained-health-watch-3f56ab5-20260606T195637Z.json => 149/150 pass over ~76 minutes. One client-side AbortError at 2026-06-06T21:00:27Z; Cloud Run logs for the same revision and window showed no non-200 responses and no requests slower than 1s.
- Cloud Run log check: /tmp/alai/d7bced9a/evidence-bilko-demo-flaky/cloudrun-logs-sustained-window-3f56ab5-20260606T211402Z.json => HTTP status counts 98 x 200, slow_or_non200=0.
- Final custom-domain probe after recurrence window: /tmp/alai/d7bced9a/evidence-bilko-demo-flaky/health-probe-final-custom-3f56ab5-20260606T211421Z.json => 30/30 pass.
- Final direct Cloud Run probe after recurrence window: /tmp/alai/d7bced9a/evidence-bilko-demo-flaky/health-probe-final-direct-3f56ab5-20260606T211430Z.json => 30/30 pass.
- Final Cloud Run log check: /tmp/alai/d7bced9a/evidence-bilko-demo-flaky/cloudrun-logs-final-window-3f56ab5-20260606T211451Z.json => HTTP status counts 98 x 200, slow_or_non200=0.
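The probe and log-check results above boil down to two tallies: passing probes out of the total, and log entries that are either non-200 or slower than 1s. A minimal sketch of that bookkeeping, assuming hypothetical `ProbeResult` and `LogEntry` shapes (the real evidence JSON schema is not shown in this report) and the 1s threshold mentioned in the sustained-watcher note:

```kotlin
// Hypothetical record shapes; the actual evidence-file schema is not documented here.
data class ProbeResult(val ok: Boolean, val error: String? = null)
data class LogEntry(val status: Int, val latencyMs: Long)

data class ProbeSummary(val passed: Int, val total: Int) {
    // Render in the same "N/M pass" form used in the evidence list.
    override fun toString() = "$passed/$total pass"
}

// Count passing probes (e.g. a client-side AbortError counts as a failure).
fun summarizeProbes(results: List<ProbeResult>): ProbeSummary =
    ProbeSummary(passed = results.count { it.ok }, total = results.size)

// Histogram of HTTP status codes from Cloud Run request logs, e.g. {200=98}.
fun statusCounts(entries: List<LogEntry>): Map<Int, Int> =
    entries.groupingBy { it.status }.eachCount()

// Entries that are non-200 or slower than the threshold (assumed 1000 ms).
fun slowOrNon200(entries: List<LogEntry>, thresholdMs: Long = 1_000): Int =
    entries.count { it.status != 200 || it.latencyMs > thresholdMs }
```

Fed 149 passing outcomes and one AbortError, `summarizeProbes` renders as `149/150 pass`; a client-side abort with no matching slow or non-200 log entry leaves `slowOrNon200` at zero, which is the situation the sustained-watcher note describes.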
Verdict
READY FOR VALIDATOR REVIEW. The recurring /health hang/504 pattern seen before the deploy (~50% of probes failing) was not observed afterward. The 76-minute watcher recorded one client-side abort, but Cloud Run logged no matching 504, other non-200, or slow request for the new revision, and the final custom-domain and direct probes were clean (60/60).
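As a rough sanity check on the verdict, assume each probe fails independently at the pre-deploy rate p = 0.5 (10/20). Under that assumption the clean post-deploy runs would be extremely unlikely:

```latex
\begin{aligned}
P(\text{final probes: } 60/60 \text{ pass} \mid p = 0.5) &= 0.5^{60} \approx 8.7 \times 10^{-19} \\
P(\text{watcher: } \ge 149/150 \text{ pass} \mid p = 0.5) &= \frac{\binom{150}{149} + \binom{150}{150}}{2^{150}} = \frac{151}{2^{150}} \approx 1.1 \times 10^{-43}
\end{aligned}
```

Independence is a simplification: the pre-deploy failures may have come in correlated bursts, and 20 probes give only a coarse estimate of the old rate, so these figures are order-of-magnitude illustrations rather than a formal test.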
Follow-up
MC #103060 remains the durable architectural fix: migrate blocking Exposed transaction {} usage to newSuspendedTransaction(Dispatchers.IO) or a bounded DB dispatcher.
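The idea behind a bounded DB dispatcher is that blocking JDBC work runs on a small, dedicated thread pool instead of on the threads serving HTTP calls, so slow queries queue among themselves rather than occupying call threads. A minimal JDK-only sketch of that pattern, with a hypothetical `blockingQuery` standing in for an Exposed `transaction {}` body and an assumed pool size of 4. It deliberately uses `CompletableFuture` instead of coroutines so it runs without extra dependencies; in the service itself, the same fixed pool would typically be wrapped with kotlinx.coroutines' `asCoroutineDispatcher()` and passed as the context to Exposed's `newSuspendedTransaction`.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

// Assumed pool size; in practice this would be matched to the DB connection pool.
const val DB_POOL_SIZE = 4

// Track how many blocking "DB" calls run at once, to show the bound holds.
val inFlight = AtomicInteger(0)
val maxInFlight = AtomicInteger(0)

// Hypothetical stand-in for a blocking Exposed transaction {} body.
fun blockingQuery(id: Int): String {
    val now = inFlight.incrementAndGet()
    maxInFlight.accumulateAndGet(now) { a, b -> maxOf(a, b) }
    return try {
        Thread.sleep(20) // simulate JDBC latency
        "row-$id"
    } finally {
        inFlight.decrementAndGet()
    }
}

// Submit blocking DB work to the bounded pool; this call itself returns immediately.
fun queryAsync(pool: ExecutorService, id: Int): CompletableFuture<String> =
    CompletableFuture.supplyAsync({ blockingQuery(id) }, pool)

// Run `count` lookups through a pool of DB_POOL_SIZE threads and collect the results in order.
fun runQueries(count: Int): List<String> {
    val dbPool = Executors.newFixedThreadPool(DB_POOL_SIZE)
    try {
        val futures = (1..count).map { queryAsync(dbPool, it) }
        return futures.map { it.join() }
    } finally {
        dbPool.shutdown()
        dbPool.awaitTermination(5, TimeUnit.SECONDS)
    }
}
```

Calling `runQueries(20)` completes all 20 lookups while never running more than `DB_POOL_SIZE` of them at once; excess work waits in the pool's queue instead of tying up other threads.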