Rollback Plan: Drop (Fintech Payment App)

Project: Drop (Remittance + QR Payments)
Version: 0.5.0
Date: 2026-02-23
Author: John (AI Director)
Status: Draft | In Review | Approved
Reviewers: Alem Bašić (CEO)

Document History

Version | Date | Author | Changes
0.1 | 2026-02-23 | John | Initial draft: Fly.io deployment in the Stockholm region

Rollback Summary

Field | Value
Deployment being rolled back | v0.5.0
Rollback target version | v0.4.x (previous stable)
Rollback image / artifact | registry.fly.io/drop-app:v0.4.x
DB migration reversible | Yes (Phase 0.5 adds tables only; no destructive migrations)
Estimated rollback time | 2–5 minutes (Fly.io blue/green instant rollback)
Rollback owner | John (AI Director)
Backup to restore (if needed) | Fly.io volume snapshot (taken before migration)

1. Rollback Decision Criteria

Roll back immediately if ANY of these conditions occur:

Trigger | Threshold | Measurement | Wait Before Deciding
Error rate spike | > 1% 5xx errors | Rolling 5-min average on Fly.io metrics | 5 minutes
P99 latency spike | > 2,000 ms sustained | Rolling 5-min P99 on Fly.io metrics | 5 minutes
Health check failures | Any instance unhealthy | Fly.io load balancer health checks | 0 minutes (immediate)
Smoke test failure | Any critical Playwright test fails | user-flows E2E suite | 0 minutes (immediate)
Data integrity issue | Any confirmed data corruption; balance column found in users table | Post-deploy verification (db.test.ts assertions) | 0 minutes (immediate)
Security vulnerability | Critical severity confirmed (e.g., auth bypass, JWT exposure) | Security alert | 0 minutes (immediate)
Pass-through model violation | Drop found to be holding customer funds in any DB column | Schema check | 0 minutes (immediate)
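The triggers above can be sketched as a single check. `rollback_decision` is a hypothetical helper, not an existing Drop script; it assumes its inputs are already-aggregated 5-minute values, and collecting them (Fly.io metrics, the Playwright exit code, the schema check) is out of scope. The security trigger and the "Do NOT roll back" exceptions remain human judgment calls and are not modeled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: evaluates the Section 1 rollback triggers.
# Inputs are assumed to be already-aggregated 5-minute rolling values.
set -euo pipefail

# Usage: rollback_decision ERROR_PCT P99_MS UNHEALTHY SMOKE_FAILED INVARIANT_BROKEN
#   ERROR_PCT        5xx rate in percent (decimals allowed, e.g. 1.6)
#   P99_MS           P99 latency in whole milliseconds
#   UNHEALTHY        number of instances failing health checks
#   SMOKE_FAILED     1 if any critical Playwright test failed, else 0
#   INVARIANT_BROKEN 1 if data integrity / pass-through checks failed, else 0
# Prints "ROLLBACK: <reason>" for the first trigger that fires, otherwise "HOLD".
rollback_decision() {
  local error_pct="$1" p99_ms="$2" unhealthy="$3" smoke_failed="$4" invariant_broken="$5"

  # Immediate triggers (0-minute wait) are checked first.
  if (( invariant_broken )); then echo "ROLLBACK: data integrity / pass-through invariant broken"; return 0; fi
  if (( unhealthy > 0 )); then echo "ROLLBACK: ${unhealthy} instance(s) unhealthy"; return 0; fi
  if (( smoke_failed )); then echo "ROLLBACK: critical smoke test failed"; return 0; fi

  # Sustained triggers (evaluated after the 5-minute window).
  # awk does the decimal comparison that bash integer arithmetic cannot.
  if awk -v e="$error_pct" 'BEGIN { exit !(e > 1.0) }'; then
    echo "ROLLBACK: 5xx rate ${error_pct}% > 1%"; return 0
  fi
  if (( p99_ms > 2000 )); then echo "ROLLBACK: P99 ${p99_ms}ms > 2000ms"; return 0; fi

  echo "HOLD"
}

rollback_decision 0.4 850 0 0 0   # prints: HOLD
rollback_decision 1.6 900 0 0 0   # prints: ROLLBACK: 5xx rate 1.6% > 1%
```

Immediate triggers are checked before the sustained ones so that the most severe reason is the one reported.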

Do NOT roll back for:

  • Warning-level alerts that were present pre-deployment
  • Increased error rate in non-critical paths < 0.5%
  • Expected behavior changes (verify against release notes first)
  • Cosmetic/visual issues that don't affect functionality
  • Mock BaaS timeout errors (expected in MVP; not production-blocking)

2. Rollback Authority

Situation | Authority
Automated trigger (smoke test fails) | John (AI Director); no CEO approval needed
Manual rollback (judgment call, business hours) | John (AI Director); inform Alem post-rollback
Manual rollback involving data loss risk | Alem Bašić (CEO) approval required
Off-hours manual rollback | John (AI Director); inform Alem immediately after

Authorization contact: John (AI Director) | Slack: #drop-deploy on alai-talk.slack.com
Emergency escalation: Alem Bašić, +47 40 47 42 51


3. Pre-Rollback Assessment

Data Changes Since Deployment

  • Deployment time: Recorded at time of deployment (see deployment log)
  • Data changes since deployment: Estimated from Fly.io metrics (transaction count in audit_logs)
  • Critical data at risk: User registrations, completed transactions (both are financial records)
  • Acceptable to lose transaction data? No. Transactions are financial records; if loss is possible, prefer a forward fix over rollback.

Decision framework:

  • If deployment was < 30 min ago and 0 transactions completed → Proceed with rollback
  • If deployment was 30 min ago or more, or any transactions completed → Escalate to Alem for a decision
  • Drop is a PSD2 pass-through model: no funds are stored; transaction records serve only as an audit trail
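The decision framework can be expressed as a small sketch. `pre_rollback_decision` is a hypothetical helper; it assumes the deployment time is available as a Unix timestamp from the deployment log and that the completed-transaction count comes from audit_logs, as described above. It treats exactly 30 minutes as "escalate", the conservative reading of the boundary.

```bash
#!/usr/bin/env bash
# Hypothetical helper for the Section 3 decision framework.
set -euo pipefail

# Usage: pre_rollback_decision DEPLOY_EPOCH COMPLETED_TX [NOW_EPOCH]
#   DEPLOY_EPOCH  deployment time as a Unix timestamp (from the deployment log)
#   COMPLETED_TX  transactions completed since deployment (e.g. counted in audit_logs)
#   NOW_EPOCH     optional; defaults to the current time
pre_rollback_decision() {
  local deploy_epoch="$1" completed_tx="$2" now_epoch="${3:-$(date +%s)}"
  local minutes=$(( (now_epoch - deploy_epoch) / 60 ))

  if (( minutes < 30 && completed_tx == 0 )); then
    echo "PROCEED: roll back (${minutes} min since deploy, no transactions)"
  else
    # 30 minutes or more, or any completed transaction, goes to Alem.
    echo "ESCALATE: Alem decides (${minutes} min since deploy, ${completed_tx} transaction(s))"
  fi
}

pre_rollback_decision 1700000000 0 1700000600   # 10 min after deploy, no transactions
```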

Database Migration Reversibility

Migration | Type | Reversible | Down Migration Available
0005_security_hardening.sql: add audit_logs table | Add table | Yes (DROP TABLE is safe) | Yes
0005_security_hardening.sql: add rate_limit_requests table | Add table | Yes | Yes
0005_security_hardening.sql: add transaction_locks table | Add table | Yes | Yes

Phase 0.5 migrations are all additive (add-only): no column drops, no type changes. Rolling back the schema is therefore safe, although dropping the new tables discards any rows written to them since deployment.
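For reference, the down migration for these additive tables amounts to a few DROP TABLE statements. The sketch below is hypothetical (the actual 0005 down migration is not reproduced in this plan): it prints them inside a transaction, which engines with transactional DDL such as PostgreSQL or SQLite honor, in reverse creation order so that a table created later, which might reference an earlier one, is dropped first. Dropping audit_logs discards the audit entries written since deployment, so export them first if they must be retained.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print the down migration for the additive Phase 0.5 tables.
set -euo pipefail

# Usage: down_migration_sql TABLE... (tables in the order they were created)
down_migration_sql() {
  local tables=("$@") i
  echo "BEGIN;"
  # Reverse creation order: later tables are dropped before earlier ones.
  for (( i = ${#tables[@]} - 1; i >= 0; i-- )); do
    echo "DROP TABLE IF EXISTS ${tables[i]};"
  done
  echo "COMMIT;"
}

down_migration_sql audit_logs rate_limit_requests transaction_locks
```

The authoritative path remains the project's own npm run db:migrate:down; output like this is only useful for reviewing what that step should do.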

External System State

System | Events Processed Since Deploy | Reversible | Action if Rollback
Mock BaaS (PISP) | Transaction records in DB only | N/A (mocked) | No action; mock transactions stand
Mock Sumsub KYC | KYC webhook events | N/A (mocked) | No action; mock KYC status stands
Rate limiter DB | Request count records | Yes | No action needed
Audit logs | Immutable log entries | No (by design) | No action; audit logs are compliance records (export them before the Section 4.2 down migration drops audit_logs)

4. Rollback Procedures

4.1 Application Rollback (Step by Step)

Total estimated time: 2–5 minutes

# Step 1: Announce rollback (required)
# Post in #drop-deploy Slack: "ROLLBACK initiated: v0.5.0 → v0.4.x. Reason: [state reason]"

# Step 2: Trigger rollback deployment via Fly.io
# Option A: Fly.io rollback to previous release:
flyctl releases list --app drop-app  # Find the previous release number
flyctl deploy --app drop-app --image registry.fly.io/drop-app:v0.4.x

# Option B: update Machines in place to the previous image
# (repeat for each Machine ID listed by: flyctl machine list --app drop-app)
flyctl machine update <machine-id> --app drop-app --image registry.fly.io/drop-app:v0.4.x

# Step 3: Monitor rollback progress
flyctl logs --app drop-app

# Step 4: Confirm rollback complete
curl https://getdrop.no/api/health
# Should return: {"status":"ok","db":"connected","version":"0.4.x"}

Verification commands:

# Check all instances running rollback version
flyctl status --app drop-app

# Check health
curl -i https://getdrop.no/api/health

# Run smoke tests against rolled-back version
npx playwright test --project=user-flows

# Verify error rate has dropped
flyctl metrics --app drop-app
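Step 4 and the health check can be automated with a polling loop. This is a sketch under assumptions: the helper names, attempt count, and delay are illustrative rather than part of the existing tooling, it assumes /api/health returns the JSON shape shown in Step 4, and it uses only curl and plain bash string matching (no jq).

```bash
#!/usr/bin/env bash
# Sketch: confirm the rollback by polling the health endpoint.
# Assumes the JSON shape shown in Step 4; matches strings instead of parsing JSON.
set -euo pipefail

# Succeeds if BODY reports status ok, db connected, and a version starting with PREFIX.
health_ok_for_version() {
  local body="$1" prefix="$2"
  body="$(printf '%s' "$body" | tr -d '[:space:]')"   # tolerate pretty-printed JSON
  [[ "$body" == *'"status":"ok"'* ]] || return 1
  [[ "$body" == *'"db":"connected"'* ]] || return 1
  [[ "$body" == *"\"version\":\"${prefix}"* ]] || return 1
}

# Usage: wait_for_rollback URL VERSION_PREFIX [ATTEMPTS] [DELAY_SECONDS]
wait_for_rollback() {
  local url="$1" prefix="$2" attempts="${3:-30}" delay="${4:-10}" i body
  for (( i = 1; i <= attempts; i++ )); do
    body="$(curl -fsS --max-time 5 "$url" || true)"   # empty body on HTTP/network errors
    if health_ok_for_version "$body" "$prefix"; then
      echo "rollback confirmed on attempt ${i}"
      return 0
    fi
    sleep "$delay"
  done
  echo "rollback NOT confirmed after ${attempts} attempts" >&2
  return 1
}

# Example (not run here): wait_for_rollback https://getdrop.no/api/health "0.4."
```

Matching on a version prefix such as "0.4." keeps the check valid for any v0.4.x patch release.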

4.2 Database Rollback (Migration Down)

Warning: Execute database rollback ONLY after confirming:

  1. Application rollback is complete
  2. Data loss from migration reversal is acceptable (see Section 3)
  3. Down migration is available (it is, for Phase 0.5)
# Step 1: Confirm current migration state
flyctl ssh console -a drop-app -C "npm run db:migrate:status"

# Step 2: Take emergency backup BEFORE running down migration
flyctl volumes list --app drop-app  # Get volume ID
# Manual: Create volume snapshot via Fly.io dashboard

# Step 3: Run down migration (reverses Phase 0.5 security tables)
flyctl ssh console -a drop-app -C "npm run db:migrate:down"
# Drops: audit_logs, rate_limit_requests, transaction_locks tables

# Step 4: Verify migration state
flyctl ssh console -a drop-app -C "npm run db:migrate:status"
# Should show v0.4.x migrations only

# Step 5: Verify data integrity
flyctl ssh console -a drop-app -C "npm run db:integrity"

If the down migration fails: restore from the Fly.io volume snapshot.

# Restore from volume snapshot (requires Fly.io support or volume recreation)
# Contact: https://community.fly.io/ or fly.io/docs/volumes/
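The ordering in Section 4.2 matters: the down migration must not run before a snapshot exists. A hypothetical wrapper can enforce that by refusing to start without the snapshot ID and, when called directly, stopping at the first failing command (set -e). It reuses only the flyctl commands listed above; taking the snapshot stays a manual dashboard step, and the three preconditions in the warning still have to be confirmed by the operator. DRY_RUN=1 (the default) prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the Section 4.2 steps. DRY_RUN=1 (default) only
# prints the commands; DRY_RUN=0 executes them. When called directly, set -e
# stops the sequence at the first failing command.
set -euo pipefail

APP="drop-app"

run() {
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Usage: db_rollback SNAPSHOT_ID   (ID of the snapshot taken manually in Step 2)
db_rollback() {
  local snapshot_id="${1:-}"
  if [[ -z "$snapshot_id" ]]; then
    echo "error: take the Step 2 volume snapshot first and pass its ID" >&2
    return 1
  fi
  run flyctl ssh console -a "$APP" -C "npm run db:migrate:status"   # Step 1
  run flyctl ssh console -a "$APP" -C "npm run db:migrate:down"     # Step 3
  run flyctl ssh console -a "$APP" -C "npm run db:migrate:status"   # Step 4
  run flyctl ssh console -a "$APP" -C "npm run db:integrity"        # Step 5
  echo "done; fallback snapshot: ${snapshot_id}"
}

DRY_RUN=1 db_rollback snap_example   # hypothetical snapshot ID
```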

4.3 Configuration Rollback

# Revert environment variables (if changed in this deployment)
# Phase 0.5 added: BCRYPT_ROUNDS, RATE_LIMIT_WINDOW_MS, RATE_LIMIT_MAX_AUTH, RATE_LIMIT_MAX_GENERAL
flyctl secrets set BCRYPT_ROUNDS=10 --app drop-app  # Only if bcrypt rounds are the root cause
# Note: JWT_SECRET must remain set — never remove

# Verify configuration via Fly.io secrets list
flyctl secrets list --app drop-app

Changed configuration to revert (if needed):

Variable | New Value (to revert FROM) | Previous Value (to revert TO)
BCRYPT_ROUNDS | 12 | 10 (only if bcrypt is root cause)
NEXT_PUBLIC_SERVICE_MODE | mock | mock (no change expected)
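Because JWT_SECRET must never be removed, a check of secret names after any configuration change can catch an accidental unset. `require_secrets` is a hypothetical helper, and it assumes `flyctl secrets list` prints a header row followed by one secret per row with the name in the first column; verify that against the installed flyctl version before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical check: fail if a required secret name is missing from
# `flyctl secrets list` output. Assumes a header row, then one secret per row
# with the NAME in the first column.
set -euo pipefail

# Usage: flyctl secrets list --app drop-app | require_secrets NAME...
require_secrets() {
  local names missing=0 name
  names="$(awk 'NR > 1 { print $1 }')"   # skip the header, keep the NAME column
  for name in "$@"; do
    if ! grep -qxF -- "$name" <<<"$names"; then
      echo "MISSING: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}

printf 'NAME\tDIGEST\nJWT_SECRET\tabc123\nBCRYPT_ROUNDS\tdef456\n' | require_secrets JWT_SECRET && echo "required secrets present"
```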

4.4 CDN / DNS Rollback

Drop MVP is deployed on Fly.io only (no CDN for the API; static assets for the landing page are served by Next.js on Vercel). No DNS changes are expected in Phase 0.5.

If getdrop.no DNS was changed:

# Verify current DNS
nslookup getdrop.no

# Revert via domain registrar (Domene.no or current registrar)
# TTL: 300s (5 min), so propagation is fast

5. Verification After Rollback

Health Check Verification

  • GET https://getdrop.no/api/health returns HTTP 200 with {"status":"ok","db":"connected"}
  • All Fly.io instances showing previous version: flyctl status --app drop-app
  • Load balancer health checks green for all instances (2/2 healthy in Fly.io dashboard)

Smoke Test Execution

npx playwright test --project=user-flows
  • Registration, OTP, and PIN flow completes successfully
  • Login + dashboard access verified
  • Remittance flow with mock BaaS verified

Data Integrity Verification

flyctl ssh console -a drop-app -C "npm run db:integrity"
  • users table has NO balance column (pass-through model invariant)
  • cards table has NO card_number or cvv columns (PCI-DSS invariant)
  • No orphaned sessions (FK constraint check passes)
  • Transaction types limited to remittance and qr_payment
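The two schema invariants above can be checked mechanically from a list of table and column names. `check_schema_invariants` is a hypothetical helper that reads "table,column" lines on stdin; it only covers the forbidden columns, not the FK or transaction-type checks, and the sample column names in the usage line are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical schema check for the Section 5 invariants. Reads "table,column"
# lines on stdin; prints one line per forbidden column and exits non-zero if any.
set -euo pipefail

check_schema_invariants() {
  awk -F',' '
    # Pass-through model: Drop must never store a balance on users.
    $1 == "users" && $2 == "balance" { print "VIOLATION (pass-through): users.balance"; bad = 1 }
    # PCI-DSS: no raw card data on the cards table.
    $1 == "cards" && ($2 == "card_number" || $2 == "cvv") { print "VIOLATION (PCI-DSS): cards." $2; bad = 1 }
    END { exit bad }
  '
}

printf '%s\n' users,id users,email cards,id cards,last4 | check_schema_invariants && echo "schema invariants hold"
```

If the database is PostgreSQL (an assumption; this plan does not name the engine), the input could come from psql "$DATABASE_URL" -At -F',' -c "SELECT table_name, column_name FROM information_schema.columns WHERE table_schema = 'public'", where DATABASE_URL is a hypothetical connection string.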

Monitoring Verification

  • Error rate returned to pre-deployment baseline (< 0.1%)
  • P99 latency returned to pre-deployment baseline (< 500ms standard, < 1,000ms bcrypt)
  • No unexpected log errors (flyctl logs --app drop-app)
  • Fly.io health check shows 2/2 healthy instances

6. Communication Plan

Internal Notification

Audience | Channel | When | Message
Alem Bašić (CEO) | Direct (phone: +47 40 47 42 51) | At rollback decision | "Rolling back Drop v0.5.0. Reason: [X]. ETA: 5 min"
Engineering (John) | #drop-deploy Slack | At rollback initiation | "ROLLBACK initiated v0.5.0 → v0.4.x"
Validator agent | Mission Control task | Post-rollback | "Verify rollback, run smoke tests"

External Notification

Drop MVP is pre-production (no public users). No external status page required.

For Phase 1+ production:

Audience | Channel | When | Trigger
Status page | getdrop.no/status (future) | At rollback initiation | Any production rollback
Affected users | In-app notification | At rollback + recovery | If impact > 30 min
SLA customers | Direct contact | Per contract | If SLA breach triggered

Status page message template (Phase 1+):

Vi opplever for øyeblikket et problem med Drop og har startet en tilbakerulling
for å løse det. Vi forventer at tjenesten gjenopprettes innen 10 minutter.
Vi beklager ulempen og vil gi oppdateringer hvert 15. minutt.

(Translation: "We are currently experiencing an issue with Drop and have initiated a rollback to resolve it. We expect service to be restored within 10 minutes. We apologize for the inconvenience and will provide updates every 15 minutes.")


7. Post-Rollback Analysis

Post-rollback review scheduled: Within 4 hours of resolution
Post-mortem scheduled: Within 24 hours of resolution (NFR-COMP06 / DORA incident reporting)

Analysis questions:

  1. What caused the rollback? (specific code/config/migration change)
  2. Could this have been detected earlier? (staging test coverage gap?)
  3. Was the rollback executed within the 5-minute SLA?
  4. What process change would prevent this next time?

Output: Log entry in comms/decisions/ + lessons learned entry in lessons-learned.md


8. Forward Fix vs Rollback Decision Matrix

Factor | Favors Forward Fix | Favors Rollback
Time to fix | < 30 min | > 30 min
DB migration | Not included in root cause | Included (rollback simpler)
Transaction data written since deploy | Significant (> 100 records) | Minimal (< 10 records)
User impact severity | P3/P4: cosmetic or minor | P1/P2: auth or payment broken
Fix risk | Low: isolated change | High: cascading dependencies
Team availability | Builder agent available | Builder unavailable or offline
Time of day | Business hours | Off-hours (02:00–06:00 CET)

Default guideline: When uncertain, roll back. A rollback to a known good state is safer than a rushed forward fix. Drop handles financial flows, so correctness > speed.

Drop-specific rule: If any P1 issue involves the pass-through model invariant (Drop storing money), roll back immediately without waiting for forward fix analysis.
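The matrix does not prescribe how to weigh its factors. One hypothetical way to apply it consistently is a simple tally, one vote per factor, with ties going to rollback per the default guideline and the pass-through rule overriding everything. `fix_or_rollback` and the equal weighting are assumptions for illustration, not part of the documented process.

```bash
#!/usr/bin/env bash
# Hypothetical tally of the Section 8 matrix: one vote per factor, ties go to rollback.
set -euo pipefail

# Usage: fix_or_rollback FIX_MIN MIGRATION_CAUSE RECORDS SEVERITY FIX_RISK BUILDER_UP OFF_HOURS PASS_THROUGH_P1
#   FIX_MIN          estimated minutes to a forward fix
#   MIGRATION_CAUSE  1 if a DB migration is part of the root cause, else 0
#   RECORDS          transaction records written since deploy
#   SEVERITY         P1, P2, P3 or P4
#   FIX_RISK         low or high
#   BUILDER_UP       1 if the builder agent is available, else 0
#   OFF_HOURS        1 if 02:00-06:00 CET, else 0
#   PASS_THROUGH_P1  1 if a P1 involves the pass-through invariant, else 0
fix_or_rollback() {
  local fix_min="$1" migration_cause="$2" records="$3" severity="$4"
  local fix_risk="$5" builder_up="$6" off_hours="$7" pass_through_p1="$8"
  local fix=0 rb=0

  # Drop-specific rule overrides the tally.
  if (( pass_through_p1 )); then echo "ROLLBACK (pass-through invariant)"; return 0; fi

  if (( fix_min < 30 )); then fix=$((fix + 1)); elif (( fix_min > 30 )); then rb=$((rb + 1)); fi
  if (( migration_cause )); then rb=$((rb + 1)); else fix=$((fix + 1)); fi
  if (( records > 100 )); then fix=$((fix + 1)); elif (( records < 10 )); then rb=$((rb + 1)); fi
  case "$severity" in P1|P2) rb=$((rb + 1)) ;; *) fix=$((fix + 1)) ;; esac
  if [[ "$fix_risk" == "high" ]]; then rb=$((rb + 1)); else fix=$((fix + 1)); fi
  if (( builder_up )); then fix=$((fix + 1)); else rb=$((rb + 1)); fi
  if (( off_hours )); then rb=$((rb + 1)); else fix=$((fix + 1)); fi

  # "When uncertain, roll back": only a strict majority favors a forward fix.
  if (( fix > rb )); then echo "FORWARD FIX (${fix} vs ${rb})"; else echo "ROLLBACK (${rb} vs ${fix})"; fi
}

fix_or_rollback 15 0 200 P3 low 1 0 0   # prints: FORWARD FIX (7 vs 0)
```

Values that fall between the matrix thresholds (exactly 30 min, or 10 to 100 records) cast no vote, so borderline cases lean on the remaining factors and the tie-break.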



Approval

Role | Name | Date | Signature
Author | John (AI Director) | 2026-02-23 | Approved (AI)
Tech Lead | John | 2026-02-23 | Approved
AI Director (John) | John | 2026-02-23 | Approved
CEO (Alem) | Alem Bašić | TBD |