Bilko Set-Cookie Cross-Origin Regression — RCA + Fix Pattern
MC: #9499 (final fix), #9495 (canary discovery), #9398 (original same-origin fix)
Resolved: 2026-04-27
Final fix: bilko-web rev 00029-zkp + bilko-api rev 00062-gwx
Problem
User authentication failed on Bilko demo despite successful API login response. Symptoms:
- POST /auth/login → HTTP 200 with valid user/org/tokens payload
- refreshToken cookie NOT stored in browser
- Subsequent /auth/refresh → HTTP 401 "No refresh token"
- User remained on /login page, unable to access /dashboard
This occurred despite MC #9398 fixing the same issue 2 days earlier — indicating a regression.
Root Cause (Compound 2-Layer)
Layer 1: Cross-eTLD+1 Boundary
Frontend: bilko-demo.alai.no
Backend API (actual target): bilko-api-762788903040.europe-north1.run.app
These are different registrable domains (alai.no vs run.app), so every request from the frontend to the API is cross-site. A cookie with SameSite=Strict or SameSite=Lax that arrives in the response to a cross-site request cannot be stored.
The browser rejects the Set-Cookie header entirely: no cookie is stored, and no cookie is sent to /auth/refresh.
Fix in MC #9398: Domain mapping created bilko-demo-api.alai.no → Cloud Run service, making frontend and API share the same registrable domain (alai.no). SameSite=Lax allows same-site cookies across subdomains.
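The same-site decision described above can be sketched in TypeScript. This is only an illustration: `PUBLIC_SUFFIXES`, `registrableDomain` and `isSameSite` are hypothetical names, and the two-entry suffix set is an assumption standing in for the real Public Suffix List that browsers consult.

```typescript
// Toy same-site check. PUBLIC_SUFFIXES is a hypothetical, hard-coded stand-in
// for the real Public Suffix List; it only covers the hosts in this document.
const PUBLIC_SUFFIXES = new Set<string>(["no", "app"]);

// Returns the registrable domain (eTLD+1): the public suffix plus one label.
function registrableDomain(host: string): string | null {
  const labels = host.toLowerCase().split(".");
  // Try the longest candidate suffix first, then shorter ones.
  for (let i = 1; i < labels.length; i++) {
    const suffix = labels.slice(i).join(".");
    if (PUBLIC_SUFFIXES.has(suffix)) {
      return labels.slice(i - 1).join(".");
    }
  }
  return null; // the host is itself a suffix, or its suffix is unknown
}

// Two hosts are same-site when they share a registrable domain.
function isSameSite(hostA: string, hostB: string): boolean {
  const a = registrableDomain(hostA);
  const b = registrableDomain(hostB);
  return a !== null && a === b;
}

const frontend = "bilko-demo.alai.no";
console.log(isSameSite(frontend, "bilko-demo-api.alai.no"));
console.log(isSameSite(frontend, "bilko-api-762788903040.europe-north1.run.app"));
```

Under these assumptions the mapped subdomain is same-site with the frontend, while the direct Cloud Run host is not, which is the distinction the domain mapping relies on.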
Layer 2: Next.js NEXT_PUBLIC_* Baked at BUILD TIME
In Next.js, environment variables prefixed with NEXT_PUBLIC_ are inlined at compile time by Webpack.
// Code written by developer:
const apiUrl = process.env.NEXT_PUBLIC_API_URL
// Code in compiled bundle after build:
const apiUrl = 'https://bilko-api-762788903040.europe-north1.run.app/api/v1'
Consequence: Setting or updating NEXT_PUBLIC_API_URL at runtime (via Cloud Run service environment variables) has ZERO EFFECT. The old URL remains baked into the JavaScript bundle from the previous build.
Evidence: MC #9499 canary-postfix test showed:
- Cloud Run service env var set to NEXT_PUBLIC_API_URL=https://bilko-demo-api.alai.no/api/v1
- Deployed frontend still made requests to bilko-api-762788903040.europe-north1.run.app
- No subdomain URL found in compiled JS bundle
Fix: Docker image must be rebuilt with --build-arg NEXT_PUBLIC_API_URL=https://bilko-demo-api.alai.no/api/v1 to bake the correct URL into the bundle.
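To make the build-time behaviour concrete, here is a toy model of inlining. It is not Next.js's actual implementation; `inlinePublicEnv` is a hypothetical helper that only mimics the observable effect: the `process.env` reference is replaced by a string literal during the build, so nothing remains for a runtime variable to change.

```typescript
// Toy model of build-time inlining; NOT how Next.js is implemented internally.
function inlinePublicEnv(source: string, buildEnv: Record<string, string>): string {
  return source.replace(
    /process\.env\.(NEXT_PUBLIC_[A-Z0-9_]+)/g,
    (match: string, name: string) => {
      const value = buildEnv[name];
      // Variables unknown at build time are left untouched in this sketch.
      return value !== undefined ? JSON.stringify(value) : match;
    },
  );
}

const source = "const apiUrl = process.env.NEXT_PUBLIC_API_URL";

// "Build" with the old value: the reference becomes a string literal.
const bundle = inlinePublicEnv(source, {
  NEXT_PUBLIC_API_URL: "https://bilko-api-762788903040.europe-north1.run.app/api/v1",
});

// Changing the variable afterwards (as a Cloud Run runtime env var would)
// cannot alter the already-built bundle text.
const runtimeValue = "https://bilko-demo-api.alai.no/api/v1";
const bundleHasRuntimeValue = bundle.includes(runtimeValue);

console.log(bundle);
console.log(bundleHasRuntimeValue);
```

Only a second call to `inlinePublicEnv` with the new value, the analogue of rebuilding the image with the correct --build-arg, produces a bundle that targets the subdomain.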
Failed Attempts (Lessons Learned)
Attempt 1 — Domain Mapping Only (MC #9398)
What was done:
- Created bilko-demo-api.alai.no subdomain pointing to Cloud Run
- SameSite=Lax cookie policy on backend
- Frontend deployed with runtime env var (not rebuild)
Result: Worked initially because the previous build happened to have the correct URL. Regressed on next deploy when image was rebuilt without --build-arg, reverting to hardcoded .run.app URL.
Lesson: Domain mapping is necessary but not sufficient. Frontend bundle content matters.
Attempt 2 — Cloud Run Runtime Env Only (MC #9495 → #9499, first iteration)
What was done (Hadi Hariri):
- Set NEXT_PUBLIC_API_URL via gcloud run services update --set-env-vars
- Restarted service (but did NOT rebuild image)
Result: FAIL. Canary test showed frontend still calling .run.app direct URL.
Lesson: Runtime env vars are visible to server-side code but do NOT affect client-side code already compiled into the bundle. Next.js requires rebuild.
Final Fix
Backend (bilko-api)
Update session cookie configuration:
SESSION_COOKIE_SECURE=true
SESSION_COOKIE_SAMESITE=lax
Frontend (bilko-web) — REBUILD
Docker image must be rebuilt with build-time argument:
docker build \
--build-arg NEXT_PUBLIC_API_URL=https://bilko-demo-api.alai.no/api/v1 \
-f apps/web/Dockerfile \
-t bilko-web:00029-zkp \
.
Dockerfile must declare the ARG and set ENV:
ARG NEXT_PUBLIC_API_URL
ENV NEXT_PUBLIC_API_URL=$NEXT_PUBLIC_API_URL
Then deploy new revision to Cloud Run. Runtime env var should also be set (for server-side rendering), but rebuild is mandatory.
Verification
⚠️ CRITICAL: curl is NOT a Valid Oracle for SameSite
Testing with curl or fetch does NOT prove cookie storage. The Set-Cookie header may appear in response headers but the browser's cookie jar enforcement is separate.
SameSite restrictions apply to browser cookie storage, not HTTP-level headers. Only a real browser test with cookie jar inspection proves success.
Tools used:
- Playwright with context.cookies() API
- Browser DevTools Application → Storage → Cookies
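A cookie-jar assertion along these lines could back the browser check. This is a sketch, not the project's actual test: `StoredCookie` mirrors a subset of the cookie objects returned by Playwright's `context.cookies()`, and `checkRefreshCookie` is a hypothetical helper that encodes the attributes this document expects (sameSite=Lax, secure, httpOnly).

```typescript
// Subset of the cookie shape returned by Playwright's context.cookies().
interface StoredCookie {
  name: string;
  domain: string;
  httpOnly: boolean;
  secure: boolean;
  sameSite: "Strict" | "Lax" | "None";
}

// Returns a list of problems; an empty list means the cookie jar looks right.
function checkRefreshCookie(cookies: StoredCookie[], apiHost: string): string[] {
  const cookie = cookies.find((c) => c.name === "refreshToken");
  if (!cookie) return ["refreshToken cookie not stored"];

  const problems: string[] = [];
  // Cookie domains may carry a leading dot; strip it before comparing.
  const domain = cookie.domain.replace(/^\./, "");
  if (apiHost !== domain && !apiHost.endsWith("." + domain)) {
    problems.push(`cookie domain ${cookie.domain} does not cover ${apiHost}`);
  }
  if (cookie.sameSite !== "Lax") problems.push(`sameSite is ${cookie.sameSite}, expected Lax`);
  if (!cookie.secure) problems.push("secure flag missing");
  if (!cookie.httpOnly) problems.push("httpOnly flag missing");
  return problems;
}
```

In a Playwright test the input would come from the real browser context after the login step, for example `checkRefreshCookie(await context.cookies(), "bilko-demo-api.alai.no")`, and the test would fail if the returned list is non-empty.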
Canary Test Results
Three iterations:
- MC #9495 canary: FAIL, frontend calling .run.app URL, no cookie stored
- MC #9499 canary-postfix (runtime env only): FAIL, frontend still calling .run.app, no rebuild
- MC #9499 canary-rebuild (full fix): PASS, all 5 acceptance criteria met
Final Pass Criteria (canary-rebuild.md):
| # | Criterion | Result |
|---|---|---|
| 1 | All API URLs use bilko-demo-api.alai.no (NOT .run.app) | PASS |
| 2 | refreshToken cookie stored (sameSite=Lax, secure=true, httpOnly=true) | PASS |
| 3 | /auth/refresh returns 200 (app-initiated flow, ignoring test artefact 403) | PASS |
| 4 | Dashboard URL stays /dashboard (not redirected to /login) | PASS |
| 5 | Authenticated dashboard shows seed data (5.1M RSD cash, charts) | PASS |
Next.js Frontend Deploy Checklist
To prevent this regression in future deploys:
- ALL NEXT_PUBLIC_* env vars must be passed as --build-arg when building the Docker image
- Dockerfile MUST declare ARG + ENV:
    ARG NEXT_PUBLIC_API_URL
    ENV NEXT_PUBLIC_API_URL=$NEXT_PUBLIC_API_URL
- After deploy: Bundle inspection to verify URL baked correctly:
    # Extract and inspect JS chunks
    grep -r "bilko-demo-api.alai.no" .next/static/chunks/
- Set runtime env too (for server-side rendering and consistency)
- Cross-origin cookies: Frontend and API must share same registrable domain (e.g., *.alai.no). SameSite=Lax allows same-site, different subdomain.
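The bundle-inspection step in the checklist could also be automated. The sketch below is an assumption about how such a check might look, not an existing script: `inspectChunks` and `bundleIsCorrect` are hypothetical helpers that take chunk contents already read into memory (file name mapped to text) and report which files mention the expected host and which still mention the old one.

```typescript
interface BundleReport {
  expectedFound: string[];  // files that contain the expected host
  forbiddenFound: string[]; // files that still contain the old host
}

// chunks maps a file name to its text content, e.g. the files under
// .next/static/chunks/ after they have been read from disk.
function inspectChunks(
  chunks: Record<string, string>,
  expectedHost: string,
  forbiddenHost: string,
): BundleReport {
  const report: BundleReport = { expectedFound: [], forbiddenFound: [] };
  for (const [file, text] of Object.entries(chunks)) {
    if (text.includes(expectedHost)) report.expectedFound.push(file);
    if (text.includes(forbiddenHost)) report.forbiddenFound.push(file);
  }
  return report;
}

// The bundle passes only if the new host is present and the old one is gone.
function bundleIsCorrect(report: BundleReport): boolean {
  return report.expectedFound.length > 0 && report.forbiddenFound.length === 0;
}
```

Checking for the absence of the .run.app host as well as the presence of the subdomain catches the case where both URLs end up in different chunks.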
Cloud Build Pattern
Current cloudbuild.yaml (lines 8-11, 143-145):
substitutions:
  _API_URL: https://bilko-api-762788903040.europe-north1.run.app/api/v1 # ⚠️ WRONG
steps:
  - id: build-web
    args:
      - --build-arg NEXT_PUBLIC_API_URL=$_API_URL # Uses substitution
Good: Uses --build-arg with substitution variable.
⚠️ OPEN ISSUE: Default _API_URL is .run.app direct URL, not the subdomain. This means builds triggered from GitHub without manual substitution override will bake the wrong URL.
Required fix: Update default substitution:
substitutions:
  _API_URL: https://bilko-demo-api.alai.no/api/v1 # ✅ Correct subdomain
This requires a follow-up MC task to update cloudbuild.yaml and redeploy to verify.
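Until the default is corrected, a pre-build guard could stop a wrong URL from being baked in. The following is a hypothetical sketch, not part of the current pipeline: `validateApiUrl` is an invented helper, and wiring it in as a build step is an assumption.

```typescript
// Hypothetical pre-build guard: returns null when the URL is acceptable,
// otherwise a human-readable reason for rejecting it.
function validateApiUrl(apiUrl: string, requiredDomain: string): string | null {
  let host: string;
  try {
    host = new URL(apiUrl).hostname;
  } catch {
    return `not a valid URL: ${apiUrl}`;
  }
  // Accept the domain itself or any subdomain of it; the leading dot
  // prevents look-alike hosts such as "evilalai.no" from matching.
  if (host === requiredDomain || host.endsWith("." + requiredDomain)) {
    return null;
  }
  return `host ${host} is outside ${requiredDomain}`;
}

console.log(validateApiUrl("https://bilko-demo-api.alai.no/api/v1", "alai.no"));
console.log(
  validateApiUrl("https://bilko-api-762788903040.europe-north1.run.app/api/v1", "alai.no"),
);
```

A build step could call this with the value of the _API_URL substitution and exit non-zero on any non-null result, so an unoverridden default fails fast instead of silently shipping.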
Cross-References
- MC Tasks:
  - #9398: Original same-origin fix (domain mapping created)
  - #9495: Canary discovery (regression confirmed)
  - #9499: Final fix (rebuild + SameSite=Lax)
  - #9529: Cloud Build (contains current cloudbuild.yaml)
  - (pending): Fix cloudbuild.yaml default _API_URL substitution
- Memory:
  - feedback_curl_is_not_browser_test.md: curl HTTP 200 ≠ demo works
  - feedback_deploy_verification_protocol.md: ZAKON PI2 deploy gates
- Evidence:
  - docs/evidence/9495/canary.png: Screenshot showing unauthenticated /login
  - docs/evidence/9499/canary-postfix.md: FAIL after runtime env only
  - docs/evidence/9499/canary-rebuild.md: PASS after full rebuild
Key Takeaways
- Domain alignment is necessary but not sufficient: frontend and API must share a registrable domain, AND the frontend code must target that domain.
- Next.js NEXT_PUBLIC_* variables are build-time constants: runtime env vars do NOT update client-side code. Always rebuild when changing public env vars.
- curl/fetch tests cannot validate cookie storage: SameSite enforcement happens in the browser cookie jar, not at the HTTP layer. Use Playwright or manual browser inspection.
- SameSite=Lax is the right balance for subdomains of the same registrable domain. SameSite=Strict also withholds the cookie on top-level navigations arriving from other sites (for example, a link in an email), which can break legitimate entry flows. SameSite=None is too permissive (requires CSRF tokens everywhere).
- Regression prevention requires CI enforcement: Cloud Build substitutions must have correct defaults to avoid silent regressions on automated deploys.