Go-Live Runbook
Project: Drop
Version: 0.1.0
Date: 2026-02-23
Author: Platform Architect (AI)
Status: Draft | In Review | Approved
Reviewers: Alem Bašić (CEO)
Document History
| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | | | Initial draft |
1. Go-Live Overview
This runbook covers the full procedure for taking Drop from its current staging state to production launch. Drop is currently deployed on AWS App Runner (eu-west-1) with RDS PostgreSQL but has not yet launched publicly; the steps below open Drop to real Norwegian users.
What: Drop v{{VERSION}} production launch
When: {{LAUNCH_DATE}} at {{LAUNCH_TIME}} {{TIMEZONE}}
Deployment window: {{WINDOW_START}} – {{WINDOW_END}} ({{WINDOW_DURATION}}h window)
Type: {{TYPE}}
Go-Live Owner: Alem Bašić (CEO)
Go-Live Date: TBD
Rollback Authority: Alem Bašić
Incident Commander: {{IC}} (primary), {{IC_BACKUP}} (backup)
Technical Lead: {{TECH_LEAD}}
Communications Lead: {{COMMS_LEAD}}
War Room: {{WAR_ROOM_LINK}}
Status Page: https://drop-status.betteruptime.com
Prerequisites for go-live:
- BankID OIDC client credentials obtained and configured
- Open Banking provider selected, contracted, and integrated
- Sumsub production account configured and webhooks verified
- getdrop.no domain configured with DNS pointing to App Runner
- SSL certificate provisioned
- BetterStack monitors configured (3 monitors active)
- Slack #drop-ops webhook configured
- RDS security group locked down (no public access)
- All feature flags reviewed and set correctly
2. Pre-Launch Checklist
T-7 Days: Infrastructure Verification
- All production infrastructure provisioned and tested
- Load balancer health checks passing for all instances
- Auto-scaling groups configured and tested (scale-up + scale-down)
- Database replicas in sync and replication lag < {{REPLICATION_LAG}}s
- Backup jobs running successfully (last backup verified: {{VERIFY_DATE}})
- CDN configured and serving assets correctly
- All IAM roles and permissions verified
- Infrastructure monitoring dashboards showing green
- Estimated cost reviewed and within budget
Owner: {{INFRA_OWNER}} | Due: T-7 days
2.1 Infrastructure
- AWS App Runner service running with RUNNING status
- RDS PostgreSQL drop-db instance available (status: available)
- ECR repository contains latest production-tagged image
- AWS Secrets Manager contains all required secrets:
  - JWT_SECRET (min 32 chars, generated via openssl rand -base64 48)
  - DATABASE_URL (PostgreSQL connection string)
  - BANKID_CLIENT_ID and BANKID_CLIENT_SECRET
  - SUMSUB_APP_TOKEN and SUMSUB_SECRET_KEY
  - SLACK_WEBHOOK_URL
T-5 Days: DNS Configuration
- DNS records created/updated in {{DNS_PROVIDER}}:
  - {{DOMAIN}} → Load balancer (TTL set to {{LOW_TTL}} for easy rollback)
  - api.{{DOMAIN}} → API load balancer
  - www.{{DOMAIN}} → Redirect to {{DOMAIN}}
- DNS propagation verified (check from multiple regions)
- DNS failover routing configured (if applicable)
- Old DNS records documented (for rollback reference)
Owner: {{DNS_OWNER}} | Due: T-5 days
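One way to approximate the "verified from multiple regions" check is to ask several public resolvers for the same record and confirm they agree. The sketch below assumes `dig` is installed; the function names `answers_agree` and `check_propagation` and the resolver list are illustrative and not part of this runbook's tooling.

```shell
# Hypothetical helper: succeeds only if stdin holds at least one non-empty
# line and all non-empty lines are identical (every resolver agrees).
answers_agree() {
  local distinct
  distinct=$(grep -v '^$' | sort -u | wc -l)
  [ "$distinct" -eq 1 ]
}

# Hypothetical wrapper: query example public resolvers (Google, Cloudflare,
# Quad9) for the A record of $1 and compare the answers.
check_propagation() {
  local domain=$1 resolver answer
  for resolver in 8.8.8.8 1.1.1.1 9.9.9.9; do
    # Sort and keep one address so round-robin ordering does not matter;
    # an empty reply is recorded explicitly so it counts as a disagreement.
    answer=$(dig +short A "$domain" "@$resolver" | sort | head -n 1)
    echo "${answer:-NO_ANSWER}"
  done | answers_agree
}
```

A call such as `check_propagation getdrop.no || echo "resolvers disagree"` could be repeated until the low TTL has expired everywhere. Public resolvers are only a proxy for regional vantage points, so a failure here is a prompt to investigate rather than a definitive verdict.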
2.2 Application
- GET https://9ef3szvvsb.eu-west-1.awsapprunner.com/api/health returns {"status":"ok"}
- NEXT_PUBLIC_SERVICE_MODE=production confirmed in App Runner env
- BANKID_MOCK is NOT set (unset in production)
- All CI pipeline checks passing on main branch (lint, typecheck, tests, E2E, docker-build)
- Database schema up-to-date (all tables exist)
- Demo seed data disabled (NODE_ENV=production)
2.3 External Services
- BankID OIDC: test login with real Norwegian BankID succeeds
- BankID callback URL matches: https://getdrop.no/api/auth/bankid/callback
- Sumsub: test KYC flow with real document succeeds end-to-end
- Sumsub webhook: POST to /api/kyc/webhook with valid signature processed correctly
- Open Banking: test balance read (AISP) succeeds from real Norwegian bank account
- Open Banking: test payment initiation (PISP) succeeds in sandbox
2.4 Domain & SSL
- getdrop.no DNS A record points to App Runner URL (or CloudFront if CDN added)
- SSL certificate valid for getdrop.no and www.getdrop.no
- HTTPS redirect enabled
- HSTS header present in response: Strict-Transport-Security: max-age=63072000; includeSubDomains; preload
T-5 Days: SSL Certificates
- TLS certificates provisioned for all domains:
  - {{DOMAIN}} ✅
  - *.{{DOMAIN}} ✅
- Certificate expiry > 90 days from go-live date
- HTTPS redirect configured (HTTP → HTTPS)
- HSTS header configured
- SSL Labs test: Grade A or better ({{SSL_TEST_LINK}})
Owner: {{SSL_OWNER}} | Due: T-5 days
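The HSTS item can be checked from the command line by reading the response headers and extracting `max-age`. This is a minimal sketch assuming `curl` is available; `hsts_max_age` and `check_hsts` are hypothetical helper names, and the default minimum of one year is an assumption rather than a requirement stated in this runbook.

```shell
# Hypothetical helper: read raw HTTP response headers on stdin and print the
# HSTS max-age in seconds; returns non-zero if the header is absent.
hsts_max_age() {
  local value
  value=$(tr -d '\r' \
    | grep -i '^strict-transport-security:' \
    | head -n 1 \
    | sed -n 's/.*[Mm][Aa][Xx]-[Aa][Gg][Ee]=\([0-9][0-9]*\).*/\1/p')
  [ -n "$value" ] && echo "$value"
}

# Hypothetical wrapper: fetch headers from $1 and require max-age >= $2
# (default 31536000 seconds, i.e. one year).
check_hsts() {
  local url=$1 min_age=${2:-31536000} age
  age=$(curl -sI "$url" | hsts_max_age) || return 1
  [ "$age" -ge "$min_age" ]
}
```

For example, `check_hsts https://getdrop.no 63072000` would compare the live header against the two-year value listed under 2.4.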
2.5 Monitoring
- BetterStack: Drop Health Check monitor configured, status GREEN
- BetterStack: Drop Landing Page monitor configured, status GREEN
- BetterStack: escalation policy assigned to all monitors
- Slack #drop-ops: test alert received successfully
- CloudWatch: App Runner logs streaming to /aws/apprunner/drop-web/.../application
- Public status page live: https://drop-status.betteruptime.com
T-3 Days: CDN Configuration
- CDN distribution pointing to production origin
- Cache behaviors configured per specification
- Static asset cache headers correct (1yr for fingerprinted assets)
- CDN WAF rules enabled and tested
- CDN purge command tested and documented
- CDN performance verified from target geographies
Owner: {{CDN_OWNER}} | Due: T-3 days
T-3 Days: Database Migration
- Final migration scripts reviewed and approved
- Migration tested on staging with production-sized data (timing recorded: {{MIGRATION_TIME}}min)
- Rollback/down migration tested
- Migration script idempotent (safe to run twice)
- Database backup taken immediately before migration window
- Data integrity check script prepared (scripts/verify-migration.sh)
Owner: {{DB_OWNER}} | Due: T-3 days
T-2 Days: Feature Flags
- All new features behind feature flags
- Feature flags defaulting to OFF in production
- Flag rollout plan documented (which flags, in what order, with what criteria)
- Kill switch flags configured (disable any feature immediately if needed)
Owner: {{FF_OWNER}} | Due: T-2 days
T-2 Days: Third-Party Integrations
- {{INTEGRATION_1}} — live API keys configured in secrets manager
- {{INTEGRATION_2}} — live API keys configured in secrets manager
- Payment gateway: live mode activated and tested with real card (refunded)
- Email service: sending domain authenticated (SPF, DKIM, DMARC)
- All integrations tested in production with smoke tests
- Webhook URLs updated to production endpoints
Owner: {{INTEGRATION_OWNER}} | Due: T-2 days
T-1 Day: Monitoring & Alerting
- All alert rules deployed to production monitoring
- Alert routing configured — PagerDuty / on-call active
- Dashboards showing production data
- Log aggregation capturing production logs
- Distributed tracing enabled
- Synthetic monitoring configured (uptime checks every 1 min)
- Alert test fired and received by on-call
Owner: {{MONITORING_OWNER}} | Due: T-1 day
T-1 Day: Backup Verification
- Production backup job running on schedule
- Last backup restored to test environment and verified
- Backup storage has sufficient capacity (> {{BACKUP_DAYS}} days)
- Point-in-time recovery tested
Owner: {{BACKUP_OWNER}} | Due: T-1 day
T-1 Day: Legal / Compliance Sign-off
- Privacy policy published and linked
- Terms of service published and linked
- Cookie consent banner implemented (if required by jurisdiction)
- GDPR data processing inventory updated
- Security assessment completed and any findings resolved or accepted
- Legal sign-off obtained: {{LEGAL_SIGNOFF}} on {{DATE}}
Owner: {{LEGAL_OWNER}} | Due: T-1 day
T-0: Pre-Launch Final Checks (Within 2 Hours of Launch)
- Staging smoke tests passing (last run: {{TIMESTAMP}})
- All engineers briefed and available
- War room open and all participants joined
- Rollback procedure rehearsed mentally
- Monitoring dashboards open
- Status page (https://drop-status.betteruptime.com) updated: "Scheduled maintenance: {{TIME}} - {{END_TIME}}"
- Customer support briefed on launch features and potential issues
- Deployment script / CI pipeline ready to trigger
3. Launch Day Procedure (T-0, Hour by Hour)
Phase 1: Final Verification (T-2h)
# 1. Verify App Runner service
aws apprunner describe-service \
--service-arn arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec \
--query 'Service.Status' --output text --region eu-west-1
# Expected: RUNNING
# 2. Health check
curl -s https://getdrop.no/api/health | jq
# Expected: { "data": { "status": "ok", "checks": { "db": { "status": "pass" } } } }
# 3. Verify RDS
aws rds describe-db-instances \
--db-instance-identifier drop-db \
--query 'DBInstances[0].DBInstanceStatus' --output text --region eu-west-1
# Expected: available
# 4. Verify latest backup exists
aws rds describe-db-snapshots \
--db-instance-identifier drop-db --region eu-west-1 \
--query 'DBSnapshots[?SnapshotType==`automated`]|sort_by(@,&SnapshotCreateTime)[-1].SnapshotCreateTime' \
--output text
# Expected: timestamp from last 24h
# 5. Create pre-launch manual snapshot
aws rds create-db-snapshot \
--db-instance-identifier drop-db \
--db-snapshot-identifier drop-db-pre-launch-$(date +%Y%m%d-%H%M) \
--region eu-west-1
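Step 4 above expects a snapshot timestamp from the last 24 hours. That comparison can be automated rather than judged by eye. The sketch below is an assumption-laden example: `is_recent` is a hypothetical helper, and it relies on GNU `date -d`, so it works on typical Linux hosts but not with the BSD `date` shipped on macOS.

```shell
# Hypothetical helper: succeed if the ISO-8601 timestamp in $1 is no older
# than $2 hours (default 24). Returns 2 if the timestamp cannot be parsed.
is_recent() {
  local ts=$1 max_hours=${2:-24} then_s now_s
  then_s=$(date -u -d "$ts" +%s) || return 2
  now_s=$(date -u +%s)
  [ $(( now_s - then_s )) -le $(( max_hours * 3600 )) ]
}

# Example use with the output of step 4 (variable name is illustrative):
#   latest=$(aws rds describe-db-snapshots ... --output text)
#   is_recent "$latest" || echo "No automated snapshot in the last 24h"
```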
Phase 2: BankID Callback URL Verification (T-1h)
# Verify BankID client configuration
# Log in at BankID developer portal and confirm:
# - Callback URL: https://getdrop.no/api/auth/bankid/callback
# - Mobile callback: drop://auth/callback
# - Scopes: openid profile
Phase 3: Soft Launch (T-0, invite-only)
- Configure App Runner environment:
  - BANKID_MOCK= (unset; real BankID)
  - Set NEXT_PUBLIC_APP_URL=https://getdrop.no
- Deploy latest production image
- Verify health check: curl https://getdrop.no/api/health
- Conduct a live end-to-end test with Alem's real BankID:
  - Navigate to https://getdrop.no
  - Click "Logg inn med BankID"
  - Authenticate with real BankID
  - Verify dashboard loads with bank account balance (AISP)
  - Verify KYC status shows approved (BankID = verified)
  - Test disclosure endpoint before a remittance
  - Test QR payment scan (if merchant test account exists)
Phase 4: Close BetterStack Maintenance Window (T+30min)
After confirming the system is stable:
- Close any BetterStack maintenance window (if created for go-live)
- Verify BetterStack monitors show green
- Verify Slack #drop-ops received startup alert
Phase 5: Public Launch (T+1h, if soft launch successful)
- Remove any invite-only restriction
- Announce on social/marketing channels
- Monitor #drop-ops for the first 2 hours
- Check BetterStack every 30 minutes for the first 2 hours
H-0: Deployment Start
| Time | Action | Owner | Status | Notes |
|---|---|---|---|---|
| H+0:00 | Announce in war room: "Deployment started" | | | |
| H+0:00 | Take final pre-deploy database backup | | | |
| H+0:05 | Enable maintenance mode (if applicable) | | | |
| H+0:10 | Trigger production deployment pipeline | | | Pipeline: {{PIPELINE_LINK}} |
| H+0:15 | Monitor deployment progress | | | |
H+0:15 → H+0:45: Database Migration Execution
| Time | Action | Owner | Status |
|---|---|---|---|
| H+0:15 | Confirm deployment artifact ready | {{DEPLOY_OWNER}} | |
| H+0:20 | Run database migrations | {{DB_OWNER}} | |
| H+0:25 | Verify migration completed | {{DB_OWNER}} | |
| H+0:30 | Confirm new application instances healthy | {{TECH_LEAD}} | |
| H+0:40 | Deploy new application version to all instances | {{DEPLOY_OWNER}} | |
H+0:45 → H+1:00: DNS Cutover
| Time | Action | Owner | Status |
|---|---|---|---|
| H+0:45 | Point DNS to production load balancer | {{DNS_OWNER}} | |
| H+0:50 | Monitor DNS propagation | {{DNS_OWNER}} | |
| H+0:55 | Confirm HTTPS working from external network | {{TECH_LEAD}} | |
| H+1:00 | Disable maintenance mode | {{DEPLOY_OWNER}} | |
H+1:00 → H+1:30: Smoke Tests
| Time | Action | Owner | Status |
|---|---|---|---|
| H+1:00 | Run automated smoke tests | {{QA_OWNER}} | |
| H+1:10 | Manual smoke test — critical user journey 1 | {{QA_OWNER}} | |
| H+1:15 | Manual smoke test — critical user journey 2 | {{QA_OWNER}} | |
| H+1:20 | Verify payment processing (test transaction) | {{QA_OWNER}} | |
| H+1:25 | Verify email delivery (test email) | {{QA_OWNER}} | |
| H+1:30 | All smoke tests PASS → proceed to monitoring | {{IC}} | |
H+1:30 → H+2:00: Monitoring Verification
| Time | Action | Owner | Status |
|---|---|---|---|
| H+1:30 | Verify error rate < {{ERROR_THRESHOLD}}% | {{TECH_LEAD}} | |
| H+1:35 | Verify P99 latency < {{P99_THRESHOLD}}ms | {{TECH_LEAD}} | |
| H+1:40 | Verify no unexpected spikes in DB CPU/connections | {{DB_OWNER}} | |
| H+1:50 | Begin enabling feature flags (per rollout plan) | {{FF_OWNER}} | |
| H+2:00 | Declare go-live successful | {{IC}} | |
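The H+1:30 error-rate check is a simple ratio against {{ERROR_THRESHOLD}}. As a sketch, assuming request and error counts for the sampling window have already been pulled from the monitoring system, it could be computed as follows. `error_rate_pct` and `error_rate_ok` are hypothetical helpers, not existing scripts.

```shell
# Hypothetical helper: print the error rate as a percentage with two decimals.
# $1 = failed requests, $2 = total requests in the sampling window.
# Zero traffic prints 0.00; treat that case with suspicion after a launch.
error_rate_pct() {
  awk -v e="$1" -v t="$2" 'BEGIN {
    if (t <= 0) { print "0.00"; exit }
    printf "%.2f\n", (e / t) * 100
  }'
}

# Hypothetical gate: succeed when the rate is strictly below $3 (in percent).
error_rate_ok() {
  local rate
  rate=$(error_rate_pct "$1" "$2")
  awk -v r="$rate" -v max="$3" 'BEGIN { exit !(r < max) }'
}
```

For instance, 5 errors in 1000 requests is 0.50%, which passes a 1% threshold, while 20 errors in 1000 requests (2.00%) does not.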
4. Post-Launch Monitoring (T+1 to T+7)
Enhanced Monitoring Period
Duration: {{POST_LAUNCH_MONITORING}}h enhanced monitoring
Monitoring cadence: every 30 min for the first 4h, then hourly until H+24, then normal
| Period | Check Frequency | Responsible |
|---|---|---|
| H+0 to H+4 | Every 30 min | On-call engineer |
| H+4 to H+24 | Every 60 min | On-call engineer |
| Day 2-7 | Standard monitoring | On-call rotation |
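If the on-call checks are driven by a script or reminder bot, the cadence table above reduces to a small lookup. The sketch below uses a hypothetical function name and makes one assumption the table leaves open: the boundary hour H+4 is treated as already belonging to the hourly period.

```shell
# Hypothetical helper: print the check interval in minutes for a given number
# of whole hours since launch. Prints 0 once the enhanced period is over and
# standard monitoring applies.
check_interval_min() {
  local hours=$1
  if [ "$hours" -lt 4 ]; then
    echo 30
  elif [ "$hours" -lt 24 ]; then
    echo 60
  else
    echo 0
  fi
}
```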
Metrics to watch during enhanced monitoring:
- Error rate (target: < {{ERROR_THRESHOLD}}%)
- P99 latency (target: < {{P99_THRESHOLD}}ms)
- DB connection pool utilization (target: < {{DB_POOL}}%)
- Cache hit rate (target: > {{CACHE_HIT}}%)
- Memory trend (should be stable, not growing)
Support Escalation Procedures
| Issue Type | First Contact | Escalation |
|---|---|---|
| User-facing errors | Customer support → Engineering | On-call engineer |
| Performance degradation | On-call engineer | Tech lead + Eng manager |
| Data issues | On-call engineer | DB owner + Engineering lead |
| Security concern | Security contact → CISO | Immediate escalation |
Performance Baseline Comparison
Compare post-launch metrics to pre-launch staging baseline:
| Metric | Staging Baseline | Production Actual | Delta | Status |
|---|---|---|---|---|
| P95 latency | {{STG_P95}}ms | TBD | TBD | TBD |
| Error rate | {{STG_ERR}}% | TBD | TBD | TBD |
| Throughput | {{STG_RPS}} rps | TBD | TBD | TBD |
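The Delta column can be filled in as a relative change against the staging baseline. This is a small sketch with a hypothetical helper name; whether the team prefers relative or absolute deltas is an assumption.

```shell
# Hypothetical helper: percentage change of production versus the staging
# baseline. $1 = staging baseline, $2 = production value.
# A positive result means production is higher than staging.
delta_pct() {
  awk -v base="$1" -v prod="$2" 'BEGIN {
    if (base == 0) { print "n/a"; exit }
    printf "%+.1f\n", (prod - base) / base * 100
  }'
}
```

For example, a staging P95 of 200 ms and a production P95 of 230 ms gives `+15.0`. A zero baseline prints `n/a` because a relative change is undefined there.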
5. Rollback Triggers & Procedure
Rollback Decision Criteria
Automatic rollback triggers:
- Smoke tests fail after deployment
- Error rate > {{ROLLBACK_ERROR_RATE}}% for {{ROLLBACK_DURATION}} consecutive minutes
- Database migration causes data integrity issues
Manual rollback triggers (decision by {{ROLLBACK_AUTHORITY}}):
- P99 latency > {{ROLLBACK_P99}}ms sustained for {{ROLLBACK_LATENCY_DURATION}} min
- Critical feature broken with no quick fix available
- Security vulnerability discovered in new release
Rollback Procedure (Quick Reference)
- Announce in war room: "Initiating rollback"
- Update status page: "We are investigating an issue and may revert recent changes"
- Run: bash scripts/rollback.sh production (or trigger CI pipeline rollback)
- Monitor health checks and confirm the previous version is healthy
- If DB migration included: run down migration bash scripts/migrate-down.sh production
- Verify all smoke tests pass on previous version
- Update status page: "Issue resolved, system restored"
- Notify stakeholders
Full rollback procedure: see the Rollback Plan.
Immediate Actions (within 5 minutes)
If any critical issue is found during or after go-live:
# Option A: Rollback to previous App Runner deployment
# Note: start-deployment deploys the image the service's configured tag currently
# points to, so re-point that tag at the previous image before running this.
aws apprunner start-deployment \
--service-arn arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec \
--region eu-west-1
# Option B: If issue is in environment config, update env vars
# (via AWS Console: App Runner environment configuration)
# Then trigger new deployment
Communication
- Send Slack message to #drop-ops: "Going back to previous version: [issue]"
- Update BetterStack status page: create incident
- If users are affected: prepare user communication
Post-Rollback
- Confirm health check returns OK after rollback
- Confirm BetterStack monitors green
- Investigate root cause before re-attempting launch
- Document in incident report
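The automatic trigger "error rate above {{ROLLBACK_ERROR_RATE}}% for {{ROLLBACK_DURATION}} consecutive minutes" depends on a run of samples, not a single reading. A minimal sketch of that evaluation is shown here, assuming one error-rate sample per minute is available as text. `rollback_triggered` is a hypothetical helper, and it fires on any qualifying run in the input, not only the most recent one.

```shell
# Hypothetical helper: read one error-rate sample (percent) per line on stdin,
# oldest first, one sample per minute. Succeeds if the rate exceeded the
# threshold ($1) for at least $2 consecutive samples.
rollback_triggered() {
  awk -v max="$1" -v need="$2" '
    { run = ($1 > max) ? run + 1 : 0 }   # extend or reset the current streak
    run >= need { hit = 1 }
    END { exit hit ? 0 : 1 }
  '
}
```

With a 5% threshold and a 3-minute duration, the samples 0.2, 6.1, 7.0, 5.5 trigger a rollback, whereas 6.1, 0.4, 7.0, 5.5 do not because the streak is broken after the first sample.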
6. Communication Plan
Pre-Launch Communications
| When | Message |
|---|---|
| T-3 days | Launch schedule and plan |
| T-2 days | Features, FAQ, escalation path |
| T-1 | "Exciting updates coming" |
Launch Day Communications
| Audience | Channel | When | Message |
|---|---|---|---|
| Status page | status page | T-0 | "Scheduled deployment in ..." |
| Internal | Slack | At success | "🚀 ..." |
| Users | Email / in-app | H+1 after success | Launch announcement |
| Status page | status page | H+1 | "Deployment complete — all systems normal" |
Stakeholder Notification Timeline
| Milestone | Notify | Channel | Owner |
|---|---|---|---|
| Deployment started | Engineering team | Slack war room | {{IC}} |
| Smoke tests pass | Engineering + Product | Slack | {{IC}} |
| Go-live declared | All stakeholders | Email + Slack | {{COMMS_LEAD}} |
| Rollback initiated | All stakeholders + Management | Immediate call + Slack | {{IC}} |
7. Compliance Actions at Launch
- Notify Finanstilsynet of service go-live (if required by PISP/AISP licence)
- Confirm getdrop.no/vilkaar (terms of service) is accessible
- Confirm getdrop.no/personvern (privacy policy) is accessible
- Confirm complaint submission form accessible via app
- Confirm GDPR consent is requested on first use
- Confirm AML monitoring is active (aml_alerts table populated on large transactions)
Related Documents
- Deployment Checklist
- Rollback Plan
- Operational Runbook
- Monitoring & Observability
- Disaster Recovery Plan
- Deployment Architecture
- Environment Configuration
Approval
| Role | Name | Date | Signature |
|---|---|---|---|
| Author | | | |
| Reviewer | | | |
| Approver | | | |