Rollback Plan: Drop (Fintech Payment App)
Project: Drop (Remittance + QR Payments)
Version: 0.5.0
Date: 2026-02-23
Author: John (AI Director)
Status: Draft | In Review | Approved
Reviewers: Alem Bašić (CEO)
Document History
| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | | | Initial draft: Fly.io deployment in Stockholm region |
Rollback Summary
| Field | Value |
|---|---|
| Deployment being rolled back | |
| Rollback target version | |
| Rollback image / artifact | |
| DB migration reversible | |
| Estimated rollback time | |
| Rollback owner | |
| Backup to restore (if needed) | |
1. Rollback Decision Criteria
Roll back immediately if ANY of these conditions occur:
| Trigger | Threshold | Measurement | Wait Before Deciding |
|---|---|---|---|
| Error rate spike | > | Rolling 5-min average on Fly.io metrics | |
| P99 latency spike | > | Rolling 5-min P99 on Fly.io metrics | |
| Health check failures | | | 0 minutes (immediate) |
| Smoke test failure | Any critical Playwright test fails | | 0 minutes (immediate) |
| Data integrity issue | Any balance column found in users table | Post-deploy verification (db.test.ts assertions) | 0 minutes (immediate) |
| Security vulnerability | Critical severity confirmed (e.g., auth bypass, JWT exposure) | Security alert | 0 minutes (immediate) |
| Pass-through model violation | Drop found to be holding customer funds in any DB column | Schema check | 0 minutes (immediate) |
Do NOT roll back for:
- Warning-level alerts that were present pre-deployment
- Increased error rate in non-critical paths < 0.5%
- Expected behavior changes (verify against release notes first)
- Cosmetic/visual issues that don't affect functionality
- Mock BaaS timeout errors (expected in MVP; not production-blocking)
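The trigger table above can be read as a simple checklist. The following is a minimal sketch only: the `DeploySignals` and `SpikeThresholds` shapes and the trigger names are hypothetical, and the spike thresholds are passed in by the caller because the plan does not state them.

```typescript
// Hypothetical post-deploy signals; the field names are illustrative assumptions.
interface DeploySignals {
  errorRatePct: number;               // rolling 5-min average error rate (%)
  p99LatencyMs: number;               // rolling 5-min P99 latency (ms)
  healthChecksPassing: boolean;
  criticalSmokeTestsPassing: boolean;
  balanceColumnInUsersTable: boolean; // data integrity issue
  customerFundsHeldInDb: boolean;     // pass-through model violation
  criticalSecurityIssueConfirmed: boolean;
}

// The plan leaves the spike thresholds open, so the caller supplies them.
interface SpikeThresholds {
  errorRatePct: number;
  p99LatencyMs: number;
}

// Returns the name of every trigger that fired; an empty list means no rollback is called for.
function firedRollbackTriggers(s: DeploySignals, t: SpikeThresholds): string[] {
  const fired: string[] = [];
  if (s.errorRatePct > t.errorRatePct) fired.push("error-rate-spike");
  if (s.p99LatencyMs > t.p99LatencyMs) fired.push("p99-latency-spike");
  if (!s.healthChecksPassing) fired.push("health-check-failure");
  if (!s.criticalSmokeTestsPassing) fired.push("smoke-test-failure");
  if (s.balanceColumnInUsersTable) fired.push("data-integrity-issue");
  if (s.criticalSecurityIssueConfirmed) fired.push("security-vulnerability");
  if (s.customerFundsHeldInDb) fired.push("pass-through-violation");
  return fired;
}
```

Because any single trigger is enough to roll back, the function collects every fired trigger instead of stopping at the first, which also gives the rollback announcement a concrete reason to quote.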
2. Rollback Authority
| Situation | Authority |
|---|---|
| Manual rollback (judgment call) | |
| Off-hours manual rollback | |
3. Pre-Rollback Assessment
Data Changes Since Deployment
- Deployment time: Recorded at time of deployment (see deployment log)
- Data changes since deployment: Estimated from Fly.io metrics (transaction count in audit_logs)
- Critical data at risk: User registrations, completed transactions (both are financial records)
- Acceptable to lose transaction data? No. Transactions are financial records; if loss is possible, prefer forward fix over rollback
Decision framework:
- If deployment < 30 min ago and 0 transactions completed → Proceed with rollback
- If deployment > 30 min ago or transactions completed → Escalate to Alem for decision
- Drop is a PSD2 pass-through model: no funds stored; transaction records are audit trail only
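The decision framework reduces to two inputs: time since deployment and completed transactions. Here is a minimal sketch under stated assumptions: the function and type names are hypothetical, and a deployment that is exactly 30 minutes old (a case the framework does not cover) is treated conservatively as an escalation.

```typescript
type PreRollbackDecision = "proceed-with-rollback" | "escalate-to-alem";

// Assumption: exactly 30 minutes is escalated, since the framework only
// defines the "< 30 min" and "> 30 min" cases.
function preRollbackDecision(
  minutesSinceDeploy: number,
  completedTransactions: number
): PreRollbackDecision {
  if (minutesSinceDeploy < 30 && completedTransactions === 0) {
    return "proceed-with-rollback";
  }
  // Older deployment or any completed transaction: a human decides.
  return "escalate-to-alem";
}
```

For example, `preRollbackDecision(12, 0)` proceeds with the rollback, while `preRollbackDecision(12, 3)` escalates because transaction records already exist.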
Database Migration Reversibility
| Migration | Type | Reversible | Down Migration Available |
|---|---|---|---|
| Add audit_logs table | Add table | Yes | Yes |
| Add rate_limit_requests table | Add table | Yes | Yes |
| 0005_security_hardening.sql: Add transaction_locks table | Add table | Yes | Yes |
Phase 0.5 migrations are all additive (add-only). No column drops, no type changes. Rolling back schema is safe.
External System State
| System | Events Processed Since Deploy | Reversible | Action if Rollback |
|---|---|---|---|
| | | | No action: mock transactions stand |
| | | | No action |
| | | Yes | No action needed |
| Audit logs | | No (by design) | No action: audit logs are compliance records |
4. Rollback Procedures
4.1 Application Rollback (Step by Step)
Total estimated time: 2–5 minutes
# Step 1: Announce rollback (required)
# Post in #drop-deploy Slack: "ROLLBACK initiated - v0.5.0 → v0.4.x - Reason: [state reason]"
# Step 2: Trigger rollback deployment via Fly.io
# Option A: Fly.io rollback to previous release:
flyctl releases list --app drop-app # Find the previous release number
flyctl deploy --app drop-app --image registry.fly.io/drop-app:v0.4.x
# Option B: Fly.io built-in rollback command:
flyctl machine update --app drop-app --image registry.fly.io/drop-app:v0.4.x
# Step 3: Monitor rollback progress
flyctl logs --app drop-app
# Step 4: Confirm rollback complete
curl https://getdrop.no/api/health
# Should return: {"status":"ok","db":"connected","version":"0.4.x"}
Verification commands:
# Check all instances running rollback version
flyctl status --app drop-app
# Check health
curl -i https://getdrop.no/api/health
# Run smoke tests against rolled-back version
npx playwright test --project=user-flows
# Verify error rate has dropped
flyctl metrics --app drop-app
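The Step 4 confirmation can also be checked programmatically. This sketch assumes the health endpoint returns the JSON shape shown in Step 4 (`status`, `db`, `version`); the function name and the version-prefix parameter are hypothetical.

```typescript
// Returns true only when the health response matches the rolled-back release.
// expectedVersionPrefix is an assumption, e.g. "0.4." for any v0.4.x build.
function isRollbackConfirmed(
  httpStatus: number,
  rawBody: string,
  expectedVersionPrefix: string
): boolean {
  if (httpStatus !== 200) return false;

  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch (e) {
    return false; // not JSON, so the app is not serving the health payload
  }
  if (typeof parsed !== "object" || parsed === null) return false;

  const body = parsed as { status?: unknown; db?: unknown; version?: unknown };
  return (
    body.status === "ok" &&
    body.db === "connected" &&
    typeof body.version === "string" &&
    body.version.indexOf(expectedVersionPrefix) === 0
  );
}
```

Checking the version field matters here: a 200 response with `"status":"ok"` from the v0.5.0 build would otherwise look like a successful rollback.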
4.2 Database Rollback (Migration Down)
Warning: Execute database rollback ONLY after confirming:
- Application rollback is complete
- Data loss from migration reversal is acceptable (see Section 3)
- Down migration is available (it is, for Phase 0.5)
# Step 1: Confirm current migration state
flyctl ssh console -a drop-app -C "npm run db:migrate:status"
# Step 2: Take emergency backup BEFORE running down migration
flyctl volumes list --app drop-app # Get volume ID
# Manual: Create volume snapshot via Fly.io dashboard
# Step 3: Run down migration (reverses Phase 0.5 security tables)
flyctl ssh console -a drop-app -C "npm run db:migrate:down"
# Drops: audit_logs, rate_limit_requests, transaction_locks tables
# Step 4: Verify migration state
flyctl ssh console -a drop-app -C "npm run db:migrate:status"
# Should show v0.4.x migrations only
# Step 5: Verify data integrity
flyctl ssh console -a drop-app -C "npm run db:verify-integrity"
If down migration fails: Restore from Fly.io volume snapshot
# Restore from volume snapshot (requires Fly.io support or volume recreation)
# Contact: https://community.fly.io/ or fly.io/docs/volumes/
4.3 Configuration Rollback
# Revert environment variables (if changed in this deployment)
# Phase 0.5 added: BCRYPT_ROUNDS, RATE_LIMIT_WINDOW_MS, RATE_LIMIT_MAX_AUTH, RATE_LIMIT_MAX_GENERAL
flyctl secrets set BCRYPT_ROUNDS=10 --app drop-app # Only if bcrypt rounds is root cause
# Note: JWT_SECRET must remain set; never remove it
# Verify configuration via Fly.io secrets list
flyctl secrets list --app drop-app
Changed configuration to revert (if needed):
| Variable | New Value (to revert FROM) | Previous Value (to revert TO) |
|---|---|---|
| BCRYPT_ROUNDS | | 10 (only if bcrypt is root cause) |
| NEXT_PUBLIC_SERVICE_MODE | mock | mock (no change expected) |
4.4 CDN / DNS Rollback
Drop MVP is deployed on Fly.io only (no CDN for API; static assets via Next.js on Vercel for landing page). No DNS changes are expected in Phase 0.5.
If getdrop.no DNS was changed:
# Verify current DNS
nslookup getdrop.no
# Revert via domain registrar (Domene.no or current registrar)
# TTL: 300s (5 min), fast propagation
5. Verification After Rollback
Health Check Verification
- GET https://getdrop.no/api/health returns HTTP 200 with {"status":"ok","db":"connected"}
- All Fly.io instances showing previous version: flyctl status --app drop-app
- Load balancer health checks green for all instances (2/2 healthy in Fly.io dashboard)
Smoke Test Execution
npx playwright test --project=user-flows
- Registration → OTP → PIN flow completes successfully
- Login + dashboard access verified
- Remittance flow with mock BaaS verified
Data Integrity Verification
flyctl ssh console -a drop-app -C "npm run db:verify-integrity"
- users table has NO balance column (pass-through model invariant)
- cards table has NO card_number or cvv columns (PCI-DSS invariant)
- No orphaned sessions (FK constraint check passes)
- Transaction types limited to remittance and qr_payment
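The schema invariants in this checklist lend themselves to an automated check. The sketch below is illustrative and is not the project's `db:verify-integrity` script: it assumes the schema has already been read into a plain map of table name to column names, and both function names are hypothetical.

```typescript
// Table name -> column names, e.g. as read from the database catalog (assumed input shape).
type SchemaColumns = { [table: string]: string[] };

// Columns that must never exist, per the pass-through and PCI-DSS invariants.
const FORBIDDEN_COLUMNS: Array<{ table: string; columns: string[] }> = [
  { table: "users", columns: ["balance"] },
  { table: "cards", columns: ["card_number", "cvv"] },
];

// Returns every forbidden "table.column" present in the schema.
function findSchemaViolations(schema: SchemaColumns): string[] {
  const violations: string[] = [];
  for (const rule of FORBIDDEN_COLUMNS) {
    const present = schema[rule.table] || [];
    for (const column of rule.columns) {
      if (present.indexOf(column) !== -1) {
        violations.push(rule.table + "." + column);
      }
    }
  }
  return violations;
}

// Returns transaction types outside the two allowed values.
function findInvalidTransactionTypes(types: string[]): string[] {
  const allowed = ["remittance", "qr_payment"];
  return types.filter((t) => allowed.indexOf(t) === -1);
}
```

An empty result from both functions corresponds to the first, second, and fourth checklist items passing; the orphaned-session check needs a database query and is left out of this sketch.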
Monitoring Verification
- Error rate returned to pre-deployment baseline (< 0.1%)
- P99 latency returned to pre-deployment baseline (< 500ms standard, < 1,000ms bcrypt)
- No unexpected log errors (flyctl logs --app drop-app)
- Fly.io health check shows 2/2 healthy instances
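The numeric baselines above can be compared against observed metrics in one pass. This is a sketch under stated assumptions: the `PostRollbackMetrics` shape is hypothetical, and "2/2 healthy" is generalized to "all instances healthy".

```typescript
// Hypothetical snapshot of post-rollback metrics.
interface PostRollbackMetrics {
  errorRatePct: number;   // error rate in percent
  p99StandardMs: number;  // P99 latency for standard endpoints
  p99BcryptMs: number;    // P99 latency for bcrypt-bound endpoints
  healthyInstances: number;
  totalInstances: number;
}

// Returns the names of the baseline checks that failed; empty means monitoring is green.
function failedBaselineChecks(m: PostRollbackMetrics): string[] {
  const failed: string[] = [];
  if (!(m.errorRatePct < 0.1)) failed.push("error-rate");
  if (!(m.p99StandardMs < 500)) failed.push("p99-standard");
  if (!(m.p99BcryptMs < 1000)) failed.push("p99-bcrypt");
  if (m.totalInstances === 0 || m.healthyInstances !== m.totalInstances) {
    failed.push("instance-health");
  }
  return failed;
}
```

The log review in the third item still needs a human reading the output of `flyctl logs`.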
6. Communication Plan
Internal Notification
| Audience | Channel | When | Message |
|---|---|---|---|
| | | At rollback decision | "Rolling back Drop v0.5.0 - Reason: [X] - ETA: 5 min" |
| Engineering (John) | #drop-deploy Slack | At rollback initiation | |
| | | | "Verify rollback stability" |
External Notification
Drop MVP is pre-production (no public users). No external status page required.
For Phase 1+ production:
| Audience | Channel | When | Trigger |
|---|---|---|---|
| | Status page | At rollback initiation | |
| Affected users | | At rollback + recovery | If impact > |
Status page message template (Phase 1+):
Vi opplever for øyeblikket et problem med Drop og har startet en tilbakerulling
for å løse det. Vi forventer at tjenesten gjenopprettes innen 10 minutter.
Vi beklager ulempen og vil gi oppdateringer hvert 15. minutt.
(Translation: "We are currently experiencing an issue with Drop and have initiated a rollback to resolve it. We expect service to be restored within 10 minutes. We apologize for the inconvenience and will provide updates every 15 minutes.")
7. Post-Rollback Analysis
Post-rollback review scheduled: Within 4 hours of resolution
Post-mortem scheduled: Within 24 hours of resolution (NFR-COMP06 / DORA incident reporting)
Analysis questions:
- What caused the rollback? (specific code/config/migration change)
- Could this have been detected earlier? (staging test coverage gap?)
- Was the rollback executed correctly and within the 5-minute SLA?
- What process change would prevent this next time?
Output: Log entry in comms/decisions/ + lessons learned entry in lessons-learned.md
8. Forward Fix vs Rollback Decision Matrix
| Factor | Favors Forward Fix | Favors Rollback |
|---|---|---|
| Time to fix | < 30 min | > 30 min |
| DB migration | Not included in root cause | Included (rollback simpler) |
| Data changes since deployment | Significant (> 100 records) | Minimal (< 10 records) |
| User impact severity | P3/P4 — cosmetic or minor | P1/P2 — auth or payment broken |
| Fix risk | Low — isolated change | High — cascading dependencies |
| Team availability | | Off-hours |
Default guideline: When uncertain, rollback. A rollback to a known good state is safer than a rushed forward fix. Drop handles financial flows — correctness > speed.
Drop-specific rule: If any P1 issue involves the pass-through model invariant (Drop storing money), rollback immediately without waiting for forward fix analysis.
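One way to apply the matrix together with the two rules above is a simple tally. This is a sketch, not a prescribed procedure: the `IncidentFactors` shape and the one-vote-per-factor scoring are assumptions, the record-count factor is read as data changed since deployment, and team availability is omitted because the matrix row is incomplete.

```typescript
interface IncidentFactors {
  estimatedFixMinutes: number;
  migrationInRootCause: boolean;
  recordsChangedSinceDeploy: number; // assumed meaning of the record-count row
  severity: "P1" | "P2" | "P3" | "P4";
  fixRisk: "low" | "high";
  passThroughInvariantViolated: boolean;
}

function chooseRecovery(f: IncidentFactors): "rollback" | "forward-fix" {
  // Drop-specific rule: never analyze further if Drop may be storing money.
  if (f.passThroughInvariantViolated) return "rollback";

  let forward = 0;
  let rollback = 0;
  if (f.estimatedFixMinutes < 30) forward++;
  else if (f.estimatedFixMinutes > 30) rollback++;
  if (f.migrationInRootCause) rollback++;
  else forward++;
  if (f.recordsChangedSinceDeploy > 100) forward++;
  else if (f.recordsChangedSinceDeploy < 10) rollback++;
  if (f.severity === "P1" || f.severity === "P2") rollback++;
  else forward++;
  if (f.fixRisk === "high") rollback++;
  else forward++;

  // Default guideline: when uncertain (including a tie), roll back.
  return forward > rollback ? "forward-fix" : "rollback";
}
```

Values that fall between the matrix bounds, such as 10 to 100 changed records, cast no vote in this sketch, so borderline cases lean on the remaining factors and ultimately on the rollback default.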
Related Documents
- Deployment Checklist
- Release Notes
- UAT Sign-Off
- Security Audit Report
Approval
| Role | Name | Date | Signature |
|---|---|---|---|
| Author | John (AI Director) | 2026-02-23 | Approved (AI) |
| | John | 2026-02-23 | Approved |
| | John | 2026-02-23 | Approved |
| CEO (Alem) | Alem Bašić | TBD | |