Go-Live Runbook
Project: Drop
Version: 0.1.0
Date: 2026-02-23
Author: Platform Architect (AI)
Status: Draft
Reviewers: Alem Bašić (CEO)
Document History
| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | 2026-02-23 | Platform Architect (AI) | Initial draft for Drop production go-live |
1. Overview
This runbook covers the full procedure for taking Drop from its current staging state to production launch. Drop is currently deployed on AWS App Runner (eu-west-1) with RDS PostgreSQL but has not yet launched publicly. The steps below are those required to open Drop to real Norwegian users.
Go-Live Overview
Owner: Alem Bašić (CEO)
Go-Live Date: TBD
Rollback Authority: Alem Bašić
Prerequisites for go-live:
- getdrop.no domain configured with DNS pointing to App Runner
- 3 monitors active
- #drop-ops webhook configured
2. Pre-Launch Checklist (T-7 days)
2.1 Infrastructure
- AWS App Runner service running with RUNNING status
- RDS PostgreSQL drop-db instance available (status: available)
- ECR repository contains latest production-tagged image
- AWS Secrets Manager contains all required secrets:
  - JWT_SECRET (min 32 chars, generated via openssl rand -base64 48)
  - DATABASE_URL (PostgreSQL connection string)
  - BANKID_CLIENT_ID and BANKID_CLIENT_SECRET
  - SUMSUB_APP_TOKEN and SUMSUB_SECRET_KEY
  - SLACK_WEBHOOK_URL
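The 32-character minimum for JWT_SECRET can be checked before the value is stored in Secrets Manager. The following is a minimal sketch, assuming the candidate secret is held in a shell variable; the function name `secret_long_enough` is hypothetical and not part of Drop's tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the given secret has at least the
# required number of characters (default 32, per the checklist above).
secret_long_enough() {
  local secret="$1"
  local min_len="${2:-32}"
  [ "${#secret}" -ge "$min_len" ]
}

# Example: generate a candidate the way the checklist suggests, then verify it.
# candidate="$(openssl rand -base64 48)"
# secret_long_enough "$candidate" && echo "JWT_SECRET length OK"
```

The check only covers length; it says nothing about how the value was generated or where it is stored.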
2.2 Application
- GET https://9ef3szvvsb.eu-west-1.awsapprunner.com/api/health returns {"status":"ok"}
- NEXT_PUBLIC_SERVICE_MODE=production confirmed in App Runner env
- BANKID_MOCK is NOT set (unset in production)
- All CI pipeline checks passing on main branch (lint, typecheck, tests, E2E, docker-build)
- Database schema up-to-date (all tables exist)
- Demo seed data disabled (NODE_ENV=production)
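The health-check item can be scripted so the response body is tested rather than read by eye. This is a rough sketch under the assumption that a body containing "status":"ok" is sufficient evidence of health; `health_body_ok` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when a health-check response body contains
# "status":"ok" (whitespace around the colon is tolerated).
health_body_ok() {
  local body="$1"
  printf '%s' "$body" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Example (performs a real HTTP request, so it is left commented out):
# body="$(curl -fsS https://9ef3szvvsb.eu-west-1.awsapprunner.com/api/health)"
# health_body_ok "$body" && echo "health OK"
```

Because this is a plain text match, a nested "status":"ok" anywhere in the body would also satisfy it. A JSON-aware check (for example with jq) would be stricter.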
2.3 External Services
- BankID OIDC: test login with real Norwegian BankID succeeds
- BankID callback URL matches: https://getdrop.no/api/auth/bankid/callback
- Sumsub: test KYC flow with real document succeeds end-to-end
- Sumsub webhook: POST to /api/kyc/webhook with valid signature processed correctly
- Open Banking: test balance read (AISP) succeeds from real Norwegian bank account
- Open Banking: test payment initiation (PISP) succeeds in sandbox
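To exercise the webhook item with a "valid signature", a test request needs a digest computed over the raw body. The sketch below assumes the signature is a hex HMAC-SHA256 of the payload keyed with the shared secret. The algorithm and the header that carries it are assumptions here and should be confirmed against Sumsub's webhook documentation. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Computes the hex HMAC-SHA256 of a payload with a shared secret.
# The last whitespace-separated field of openssl's output is the digest.
hmac_sha256_hex() {
  local secret="$1" payload="$2"
  printf '%s' "$payload" | openssl dgst -sha256 -hmac "$secret" | awk '{print $NF}'
}

# Succeeds when the received hex digest equals the locally computed one.
signature_matches() {
  local secret="$1" payload="$2" received="$3"
  [ "$(hmac_sha256_hex "$secret" "$payload")" = "$received" ]
}
```

The comparison is an ordinary string test, which is adequate for a manual pre-launch check but is not a constant-time comparison.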
2.4 Domain & SSL
- getdrop.no DNS A record points to App Runner URL (or CloudFront if CDN added)
- DNS propagation verified (check from multiple regions)
- DNS failover routing configured (if applicable)
- Old DNS records documented (for rollback reference)
- TLS certificate valid for getdrop.no and www.getdrop.no
- Certificate expiry > 90 days from go-live date
- HTTPS redirect enabled (HTTP → HTTPS)
- HSTS header present in response: Strict-Transport-Security: max-age=63072000; includeSubDomains; preload
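The HSTS item can be checked mechanically once the header value has been captured. The following sketch assumes the value is passed in as a string; `hsts_ok` is a hypothetical helper, and the default minimum max-age is simply the value quoted in the checklist.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when an HSTS header value carries a max-age of
# at least min_age seconds and the includeSubDomains directive.
hsts_ok() {
  local value="$1"
  local min_age="${2:-63072000}"
  local max_age
  max_age="$(printf '%s\n' "$value" | sed -nE 's/.*max-age=([0-9]+).*/\1/p')"
  [ -n "$max_age" ] || return 1
  [ "$max_age" -ge "$min_age" ] || return 1
  printf '%s\n' "$value" | grep -qi 'includeSubDomains'
}

# Example (performs a real HTTP request, so it is left commented out):
# value="$(curl -sI https://getdrop.no | grep -i '^strict-transport-security:' | cut -d' ' -f2-)"
# hsts_ok "$value" && echo "HSTS OK"
```

The preload directive is not checked, since the header is valid without it.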
2.5 Monitoring
- BetterStack: Drop Health Check monitor configured, status GREEN
- BetterStack: Drop Landing Page monitor configured, status GREEN
- BetterStack: escalation policy assigned to all monitors
- Slack #drop-ops: test alert received successfully
- CloudWatch: App Runner logs streaming to /aws/apprunner/drop-web/.../application
- Public status page live: https://drop-status.betteruptime.com
3. Go-Live Day Procedure (T-0)
Phase 1: Final Verification (T-2h)
# 1. Verify App Runner service
aws apprunner describe-service \
--service-arn arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec \
--query 'Service.Status' --output text --region eu-west-1
# Expected: RUNNING
# 2. Health check
curl -s https://getdrop.no/api/health | jq
# Expected: { "data": { "status": "ok", "checks": { "db": { "status": "pass" } } } }
# 3. Verify RDS
aws rds describe-db-instances \
--db-instance-identifier drop-db \
--query 'DBInstances[0].DBInstanceStatus' --output text --region eu-west-1
# Expected: available
# 4. Verify latest backup exists
aws rds describe-db-snapshots \
--db-instance-identifier drop-db --region eu-west-1 \
--query 'DBSnapshots[?SnapshotType==`automated`]|sort_by(@,&SnapshotCreateTime)[-1].SnapshotCreateTime' \
--output text
# Expected: timestamp from last 24h
# 5. Create pre-launch manual snapshot
aws rds create-db-snapshot \
--db-instance-identifier drop-db \
--db-snapshot-identifier drop-db-pre-launch-$(date +%Y%m%d-%H%M) \
--region eu-west-1
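Step 4 expects a snapshot timestamp from the last 24 hours. That comparison can be sketched as below, assuming both times are already converted to Unix seconds; `snapshot_is_fresh` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when a snapshot taken at snapshot_epoch is no
# older than max_age_hours relative to now_epoch (both in Unix seconds).
snapshot_is_fresh() {
  local snapshot_epoch="$1" now_epoch="$2"
  local max_age_hours="${3:-24}"
  local age=$(( now_epoch - snapshot_epoch ))
  [ "$age" -ge 0 ] && [ "$age" -le $(( max_age_hours * 3600 )) ]
}

# Example with GNU date (an assumption; BSD/macOS date uses different flags):
# ts="<SnapshotCreateTime printed by step 4>"
# snapshot_is_fresh "$(date -d "$ts" +%s)" "$(date +%s)" && echo "backup is fresh"
```

Taking epoch seconds as input keeps the comparison independent of the date implementation on the operator's machine.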
Phase 2: BankID Callback URL Verification (T-1h)
# Verify BankID client configuration
# Log in at BankID developer portal and confirm:
# - Callback URL: https://getdrop.no/api/auth/bankid/callback
# - Mobile callback: drop://auth/callback
# - Scopes: openid profile
Phase 3: Soft Launch (T-0, invite-only)
- Configure App Runner environment: BANKID_MOCK= (unset; real BankID)
- Set NEXT_PUBLIC_APP_URL=https://getdrop.no
- Deploy latest production image
- Verify health check: curl https://getdrop.no/api/health
- Conduct a live end-to-end test with Alem's real BankID:
  - Navigate to https://getdrop.no
  - Click "Logg inn med BankID"
  - Authenticate with real BankID
  - Verify dashboard loads with bank account balance (AISP)
  - Verify KYC status shows approved (BankID = verified)
  - Test disclosure endpoint before a remittance
  - Test QR payment scan (if merchant test account exists)
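The two environment settings in Phase 3 lend themselves to a quick pre-flight check. The sketch below assumes the values configured in App Runner have been exported into the current shell for inspection. It does not query App Runner itself, and `launch_env_ok` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the two settings named in Phase 3.
# Succeeds only when BANKID_MOCK is unset or empty and NEXT_PUBLIC_APP_URL
# has the expected value.
launch_env_ok() {
  local expected_url="${1:-https://getdrop.no}"
  [ -z "${BANKID_MOCK:-}" ] || return 1
  [ "${NEXT_PUBLIC_APP_URL:-}" = "$expected_url" ]
}

# Example:
# launch_env_ok && echo "launch environment OK"
```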
Phase 4: BetterStack Maintenance Window — Close (T+30min)
After confirming system is stable:
- Close any BetterStack maintenance window (if created for go-live)
- Verify BetterStack monitors show green
- Verify Slack #drop-ops received startup alert
Phase 5: Public Launch (T+1h, if soft launch successful)
- Remove any invite-only restriction
- Announce on social/marketing channels
- Monitor #drop-ops for the first 2 hours
- Check BetterStack every 30 minutes for first 2 hours
4. Feature Flags at Launch
| Flag | Launch State | Notes |
|---|---|---|
| NEXT_PUBLIC_FF_NOTIFICATIONS | true | Push notifications enabled |
| NEXT_PUBLIC_FF_MERCHANT_DASHBOARD | true | Merchant dashboard enabled |
| NEXT_PUBLIC_FF_VIRTUAL_CARDS | false | Not launched; requires card partner |
| NEXT_PUBLIC_FF_PHYSICAL_CARDS | false | Future feature |
| NEXT_PUBLIC_FF_CARD_DETAILS | false | Future feature |
| NEXT_PUBLIC_FF_CARD_FREEZE | false | Future feature |
| NEXT_PUBLIC_FF_CARD_PIN | false | Future feature |
| NEXT_PUBLIC_FF_SPENDING_LIMITS | false | Future feature |
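The launch states listed above can be compared against the values actually present in an environment. The following bash sketch assumes the flags are available as shell variables. `EXPECTED_FLAGS` and `check_flags` are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Expected launch state, copied from the flag list above ("NAME=value" per line).
EXPECTED_FLAGS="
NEXT_PUBLIC_FF_NOTIFICATIONS=true
NEXT_PUBLIC_FF_MERCHANT_DASHBOARD=true
NEXT_PUBLIC_FF_VIRTUAL_CARDS=false
NEXT_PUBLIC_FF_PHYSICAL_CARDS=false
NEXT_PUBLIC_FF_CARD_DETAILS=false
NEXT_PUBLIC_FF_CARD_FREEZE=false
NEXT_PUBLIC_FF_CARD_PIN=false
NEXT_PUBLIC_FF_SPENDING_LIMITS=false
"

# Prints one line per flag whose current value differs from the expected one
# and returns non-zero if any mismatch was found.
check_flags() {
  local line name expected actual status=0
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    name="${line%%=*}"
    expected="${line#*=}"
    actual="${!name:-}"   # bash indirect expansion: value of the variable named by $name
    if [ "$actual" != "$expected" ]; then
      echo "MISMATCH $name: expected=$expected actual=${actual:-<unset>}"
      status=1
    fi
  done <<< "$EXPECTED_FLAGS"
  return "$status"
}
```

An unset flag is reported as a mismatch, which errs on the side of making every launch state explicit.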
5. Rollback Procedure
If any critical issue is found during or after go-live:
Immediate Actions (within 5 minutes)
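# Option A: Redeploy via App Runner.
# Note: start-deployment deploys the image that the service's source
# configuration currently points to, so restore the previous known-good image
# under that tag (or point the service at it) before running this.
aws apprunner start-deployment \
--service-arn arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec \
--region eu-west-1
# Option B: If the issue is in environment config, update env vars
# (via AWS Console, App Runner environment configuration),
# then trigger a new deployment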
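After a redeploy is triggered, the service takes some time to return to RUNNING. One way to wait for that is a small polling loop. This is only a sketch: `wait_for_status` is a hypothetical helper that takes the status command as arguments, and `$SERVICE_ARN` in the example stands for the ARN used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: runs the given command repeatedly until its output
# equals the expected status, or gives up after max_attempts tries.
# Usage: wait_for_status EXPECTED MAX_ATTEMPTS SLEEP_SECONDS COMMAND [ARGS...]
wait_for_status() {
  local expected="$1" max_attempts="$2" sleep_seconds="$3"
  shift 3
  local attempt status
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    status="$("$@" 2>/dev/null)"
    if [ "$status" = "$expected" ]; then
      return 0
    fi
    # Pause between attempts, but not after the final one.
    if (( attempt < max_attempts )); then
      sleep "$sleep_seconds"
    fi
  done
  return 1
}

# Example, reusing the describe-service call from Phase 1:
# wait_for_status RUNNING 30 20 aws apprunner describe-service \
#   --service-arn "$SERVICE_ARN" --query 'Service.Status' \
#   --output text --region eu-west-1
```

Passing the command in as arguments keeps the loop independent of the AWS CLI, so it can be dry-run with any command that prints a status.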
Communication
- Send Slack message to #drop-ops: "Rolling back to previous version, investigating [issue]"
- Update BetterStack status page: create incident
- If users are affected: prepare user communication
Post-Rollback
- Confirm health check returns OK after rollback
- Confirm BetterStack monitors green
- Investigate root cause before re-attempting launch
- Document in incident report
6. Post-Launch Monitoring (First 24h)
| Time | Action |
|---|---|
| T+0 to T+2h | Monitor #drop-ops Slack continuously |
| T+2h | Review CloudWatch logs for errors |
| T+4h | Check transaction volume and success rate |
| T+8h | Review Sumsub KYC queue: any stuck applicants? |
| T+24h | First SLA report: uptime, error rate, p99 latency |
| T+7 days | First weekly review |
Key metrics to watch:
- BetterStack: uptime must stay green (>= 99.9%)
- Slack #drop-ops: no critical alerts
- Error rate in logs: < 1% of requests
- BankID login success rate: > 99%
- KYC approval rate: > 80%
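The percentage thresholds above reduce to simple comparisons once request counts are known. The sketch below uses integer arithmetic only and assumes the counts have already been extracted from logs or dashboards by some other means. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Succeeds when errors/total is strictly below max_percent (integer percent).
# Integer form of the comparison: errors * 100 < total * max_percent.
error_rate_below() {
  local total="$1" errors="$2" max_percent="$3"
  [ "$total" -gt 0 ] || return 1
  [ $(( errors * 100 )) -lt $(( total * max_percent )) ]
}

# Succeeds when successes/total is strictly above min_percent (integer percent).
success_rate_above() {
  local total="$1" successes="$2" min_percent="$3"
  [ "$total" -gt 0 ] || return 1
  [ $(( successes * 100 )) -gt $(( total * min_percent )) ]
}

# Example with made-up counts:
# error_rate_below 1000 5 1      # 0.5% errors, under the 1% limit
# success_rate_above 200 199 99  # 99.5% BankID logins, over the 99% limit
```

A total of zero is treated as a failure, because no traffic means the threshold cannot be confirmed.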
7. Compliance Actions at Launch
- Notify Finanstilsynet of the service launch (if required by PISP/AISP licence)
- Confirm getdrop.no/vilkaar (terms of service) is accessible
- Confirm getdrop.no/personvern (privacy policy) is accessible
- Confirm complaint submission form is accessible via app
- Confirm GDPR consent is requested on first use
- Confirm AML monitoring is active (aml_alerts table populated on large transactions)
Related Documents
- Deployment Checklist
- Rollback Plan
- Operational Runbook
- Monitoring & Observability
- Disaster Recovery Plan
- Deployment Architecture
- Environment Configuration
Approval
| Role | Name | Date | Signature |
|---|---|---|---|
| Author | Platform Architect (AI) | 2026-02-23 | |
| Reviewer | | | |
| Approver | Alem Bašić | | |