
Go-Live Runbook


Project: Drop
Version: 0.1.0
Date: 2026-02-23
Author: Platform Architect (AI)
Status: Draft | In Review | Approved
Reviewers: Alem Bašić (CEO)

Document History

Version Date Author Changes
0.1 2026-02-23 Platform Architect (AI) Initial draft for Drop production go-live

1. Overview

This runbook covers the full procedure for taking Drop from its current staging state to production launch, that is, the steps required to open Drop to real Norwegian users. Drop is currently deployed on AWS App Runner (eu-west-1) with RDS PostgreSQL but has not yet launched publicly.

Go-Live Overview

Owner: Alem Bašić (CEO)
Go-Live Date: TBD
Rollback Authority: Alem Bašić

Prerequisites for go-live:

  •  BankID OIDC client credentials obtained and configured
  •  Open Banking provider selected, contracted, and integrated
  •  Sumsub production account configured and webhooks verified
  •  getdrop.no domain configured with DNS pointing to App Runner
  •  SSL certificate provisioned
  •  BetterStack monitors configured (3 monitors active)
  •  Slack #drop-ops webhook configured
  •  RDS security group locked down (no public access)
  •  All feature flags reviewed and set correctly
  •  Final security audit completed

2. Pre-Launch Checklist (T-7 days)

2.1 Infrastructure Verification

  •  AWS App Runner service running with RUNNING status
  •  RDS PostgreSQL drop-db instance available (status: available)
  •  ECR repository contains latest production-tagged image
  •  AWS Secrets Manager contains all required secrets:
    •  JWT_SECRET (min 32 chars, generated via openssl rand -base64 48)
    •  DATABASE_URL (PostgreSQL connection string)
    •  BANKID_CLIENT_ID and BANKID_CLIENT_SECRET
    •  SUMSUB_APP_TOKEN and SUMSUB_SECRET_KEY
    •  SLACK_WEBHOOK_URL
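
The 32-character minimum for JWT_SECRET can be checked mechanically before the value is stored. The following is a minimal sketch, not part of the Drop repository; `secret_is_long_enough` is a hypothetical helper name, and the default of 32 is taken from the checklist item above.

```shell
#!/usr/bin/env bash
# Sketch only: secret_is_long_enough is a hypothetical helper.
# Usage: secret_is_long_enough VALUE [MIN_LEN]
# Succeeds (exit 0) when VALUE has at least MIN_LEN characters (default 32).
secret_is_long_enough() {
  local value="$1"
  local min_len="${2:-32}"
  [ "${#value}" -ge "$min_len" ]
}
```

A possible use is `secret_is_long_enough "$(openssl rand -base64 48)" && echo "length ok"`. The same check could be applied to a value read back from Secrets Manager, provided the secret is stored as a plain string.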

2.2 Application

  •  GET https://9ef3szvvsb.eu-west-1.awsapprunner.com/api/health returns {"status":"ok"}
  •  NEXT_PUBLIC_SERVICE_MODE=production confirmed in App Runner env
  •  BANKID_MOCK is NOT set (unset in production)
  •  All CI pipeline checks passing on main branch (lint, typecheck, tests, E2E, docker-build)
  •  Database schema up-to-date (all tables exist)
  •  Demo seed data disabled (NODE_ENV=production)

2.3 External Services

  •  BankID OIDC: test login with real Norwegian BankID succeeds
  •  BankID callback URL matches: https://getdrop.no/api/auth/bankid/callback
  •  Sumsub: test KYC flow with real document succeeds end-to-end
  •  Sumsub webhook: POST to /api/kyc/webhook with valid signature processed correctly
  •  Open Banking: test balance read (AISP) succeeds from real Norwegian bank account
  •  Open Banking: test payment initiation (PISP) succeeds in sandbox

2.4 Domain & SSL

  •  getdrop.no DNS A record points to App Runner URL (or CloudFront if CDN added)
  • DNS propagation verified (check from multiple regions)
  •  DNS failover routing configured (if applicable)
  •  Old DNS records documented (for rollback reference)
  •  SSL certificate valid for getdrop.no and www.getdrop.no
  •  Certificate expiry > 90 days from go-live date
  •  HTTPS redirect enabled (HTTP → HTTPS)
  •  HSTS header present in response: Strict-Transport-Security: max-age=63072000; includeSubDomains; preload
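
The HSTS item above can be verified from the command line rather than by eye. This is a minimal sketch under stated assumptions: `has_hsts` is a hypothetical helper, it expects raw HTTP response headers on stdin, and it checks only for the three directives named in the checklist.

```shell
#!/usr/bin/env bash
# Sketch only: has_hsts is a hypothetical helper.
# Reads raw HTTP response headers on stdin and succeeds only if a
# Strict-Transport-Security header carries max-age=63072000,
# includeSubDomains and preload. Header names are case-insensitive,
# so everything is lowercased before matching.
has_hsts() {
  local headers line
  headers="$(tr -d '\r' | tr '[:upper:]' '[:lower:]')"
  line="$(printf '%s\n' "$headers" | grep '^strict-transport-security:')" || return 1
  case "$line" in *max-age=63072000*) ;; *) return 1 ;; esac
  case "$line" in *includesubdomains*) ;; *) return 1 ;; esac
  case "$line" in *preload*) ;; *) return 1 ;; esac
  return 0
}
```

One way to use it is `curl -sI https://getdrop.no | has_hsts && echo "HSTS ok"`, which fetches only the response headers and pipes them into the check.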

2.5 Monitoring

  •  BetterStack: Drop Health Check monitor configured, status GREEN
  •  BetterStack: Drop Landing Page monitor configured, status GREEN
  •  BetterStack: escalation policy assigned to all monitors
  •  Slack #drop-ops: test alert received successfully
  •  CloudWatch: App Runner logs streaming to /aws/apprunner/drop-web/.../application
  •  Public status page live: https://drop-status.betteruptime.com

3. Go-Live Day Procedure (T-0)

Phase 1: Final Verification (T-2h)

# 1. Verify App Runner service
aws apprunner describe-service \
  --service-arn arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec \
  --query 'Service.Status' --output text --region eu-west-1
# Expected: RUNNING

# 2. Health check
curl -s https://getdrop.no/api/health | jq
# Expected: { "data": { "status": "ok", "checks": { "db": { "status": "pass" } } } }

# 3. Verify RDS
aws rds describe-db-instances \
  --db-instance-identifier drop-db \
  --query 'DBInstances[0].DBInstanceStatus' --output text --region eu-west-1
# Expected: available

# 4. Verify latest backup exists
aws rds describe-db-snapshots \
  --db-instance-identifier drop-db --region eu-west-1 \
  --query 'DBSnapshots[?SnapshotType==`automated`]|sort_by(@,&SnapshotCreateTime)[-1].SnapshotCreateTime' \
  --output text
# Expected: timestamp from last 24h

# 5. Create pre-launch manual snapshot
aws rds create-db-snapshot \
  --db-instance-identifier drop-db \
  --db-snapshot-identifier drop-db-pre-launch-$(date +%Y%m%d-%H%M) \
  --region eu-west-1
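
Step 4 above expects the newest automated snapshot to be less than 24 hours old. The comparison can be sketched as follows, assuming GNU date (the -d flag is not available on BSD/macOS date); `is_within_hours` is a hypothetical helper and not part of the Drop tooling.

```shell
#!/usr/bin/env bash
# Sketch only: is_within_hours is a hypothetical helper; requires GNU date.
# Usage: is_within_hours TIMESTAMP [HOURS]
# Succeeds when the ISO 8601 TIMESTAMP is no older than HOURS (default 24).
# Returns 2 if the timestamp cannot be parsed.
is_within_hours() {
  local ts="$1"
  local hours="${2:-24}"
  local then_s now_s
  then_s="$(date -u -d "$ts" +%s)" || return 2
  now_s="$(date -u +%s)"
  [ $(( now_s - then_s )) -le $(( hours * 3600 )) ]
}
```

The timestamp printed by step 4 would be passed as the first argument, and a non-zero exit status would mean the backup is stale or the value could not be parsed.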

Phase 2: BankID Callback URL Verification (T-1h)

# Verify BankID client configuration
# Log in at BankID developer portal and confirm:
# - Callback URL: https://getdrop.no/api/auth/bankid/callback
# - Mobile callback: drop://auth/callback
# - Scopes: openid profile

Phase 3: Soft Launch (T-0, invite-only)

  1. Configure App Runner environment: BANKID_MOCK= (unset — real BankID)
  2. Set NEXT_PUBLIC_APP_URL=https://getdrop.no
  3. Deploy latest production image
  4. Verify health check: curl https://getdrop.no/api/health
  5. Conduct a live end-to-end test with Alem's real BankID:
    •  Navigate to https://getdrop.no
    •  Click "Logg inn med BankID"
    •  Authenticate with real BankID
    •  Verify dashboard loads with bank account balance (AISP)
    •  Verify KYC status shows approved (BankID = verified)
    •  Test disclosure endpoint before a remittance
    •  Test QR payment scan (if merchant test account exists)

Phase 4: BetterStack Maintenance Window — Close (T+30min)

After confirming system is stable:

  1. Close any BetterStack maintenance window (if created for go-live)
  2. Verify BetterStack monitors show green
  3. Verify Slack #drop-ops received startup alert

Phase 5: Public Launch (T+1h, if soft launch successful)

  1. Remove any invite-only restriction
  2. Announce on social/marketing channels
  3. Monitor #drop-ops for the first 2 hours
  4. Check BetterStack every 30 minutes for first 2 hours

4. Feature Flags at Launch

Flag                               Launch State  Notes
NEXT_PUBLIC_FF_NOTIFICATIONS       true          Push notifications enabled
NEXT_PUBLIC_FF_MERCHANT_DASHBOARD  true          Merchant dashboard enabled
NEXT_PUBLIC_FF_VIRTUAL_CARDS       false         Not launched; requires card partner
NEXT_PUBLIC_FF_PHYSICAL_CARDS      false         Future feature
NEXT_PUBLIC_FF_CARD_DETAILS        false         Future feature
NEXT_PUBLIC_FF_CARD_FREEZE         false         Future feature
NEXT_PUBLIC_FF_CARD_PIN            false         Future feature
NEXT_PUBLIC_FF_SPENDING_LIMITS     false         Future feature
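
The launch states in the table above can be compared against the environment that is actually deployed. This is a minimal sketch: `EXPECTED_FLAGS` and `check_flags` are hypothetical names, and how the NAME=value lines are obtained (for example from the App Runner environment configuration) is left as an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: EXPECTED_FLAGS and check_flags are hypothetical names.
# Expected launch state, one NAME=value per line, copied from the flag table.
EXPECTED_FLAGS="NEXT_PUBLIC_FF_NOTIFICATIONS=true
NEXT_PUBLIC_FF_MERCHANT_DASHBOARD=true
NEXT_PUBLIC_FF_VIRTUAL_CARDS=false
NEXT_PUBLIC_FF_PHYSICAL_CARDS=false
NEXT_PUBLIC_FF_CARD_DETAILS=false
NEXT_PUBLIC_FF_CARD_FREEZE=false
NEXT_PUBLIC_FF_CARD_PIN=false
NEXT_PUBLIC_FF_SPENDING_LIMITS=false"

# check_flags reads the actual NAME=value lines on stdin, prints every
# expected flag that is missing or set differently, and returns non-zero
# if at least one mismatch was found.
check_flags() {
  local actual expected_line status=0
  actual="$(cat)"
  while IFS= read -r expected_line; do
    if ! printf '%s\n' "$actual" | grep -qxF -- "$expected_line"; then
      echo "MISMATCH: expected $expected_line"
      status=1
    fi
  done <<EOF
$EXPECTED_FLAGS
EOF
  return "$status"
}
```

Inside a running container, `env | check_flags` would be one way to feed it; any other source that yields NAME=value lines works the same way.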

5. Rollback Procedure

If any critical issue is found during or after go-live:

Immediate Actions (within 5 minutes)

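# Option A: Redeploy via App Runner. Note that start-deployment deploys whatever image
# the service's configured image tag currently points to, so re-point that tag at the
# previous image in ECR first; on its own this command does not restore an older version.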
aws apprunner start-deployment \
  --service-arn arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec \
  --region eu-west-1

# Option B: If issue is in environment config, update env vars
# (via AWS Console — App Runner environment configuration)
# Then trigger new deployment

Communication

  1. Send Slack message to #drop-ops: "Going back to previous version — investigating [issue]"
  2. Update BetterStack status page: create incident
  3. If users are affected: prepare user communication

Post-Rollback

  1. Confirm health check returns OK after rollback
  2. Confirm BetterStack monitors green
  3. Investigate root cause before re-attempting launch
  4. Document in incident report

6. Post-Launch Monitoring (First 24h)

Time          Action
T+0 to T+2h   Monitor #drop-ops Slack continuously
T+2h          Review CloudWatch logs for errors
T+4h          Check transaction volume and success rate
T+8h          Review Sumsub KYC pipeline queue: any stuck applicants?
T+24h         First SLA report: uptime, error rate, p99 latency
T+7 days      First weekly review

Key metrics to watch:

  •  BetterStack: uptime must stay green (>= 99.9%)
  •  Slack #drop-ops: no critical alerts
  •  Error rate in logs: < 1% of requests
  •  BankID login success rate: > 99%
  •  KYC approval rate: > 80%
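
The percentage targets above reduce to simple ratio comparisons once the raw counts are known. A minimal sketch using integer arithmetic only; `rate_below` and `rate_above` are hypothetical helpers, and where the counts come from (CloudWatch logs, database queries) is not specified here.

```shell
#!/usr/bin/env bash
# Sketch only: rate_below and rate_above are hypothetical helpers.
# Usage: rate_below NUM DEN PCT  -> succeeds if NUM/DEN is strictly below PCT percent
#        rate_above NUM DEN PCT  -> succeeds if NUM/DEN is strictly above PCT percent
# Both fail when DEN is zero, since no rate can be computed.
rate_below() {
  [ "$2" -gt 0 ] && [ $(( $1 * 100 )) -lt $(( $3 * $2 )) ]
}

rate_above() {
  [ "$2" -gt 0 ] && [ $(( $1 * 100 )) -gt $(( $3 * $2 )) ]
}
```

For instance, `rate_below 5 1000 1` succeeds (0.5% errors), while `rate_above 990 1000 99` fails because exactly 99% does not exceed the BankID target.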

7. Compliance Actions at Launch

  •  Notify Finanstilsynet of service go-live (if required by PISP/AISP licence)
  •  Confirm getdrop.no/vilkaar (terms of service) is accessible
  •  Confirm getdrop.no/personvern (privacy policy) is accessible
  •  Confirm complaint submission form accessible via app
  •  Confirm GDPR consent is requested on first use
  •  Confirm AML monitoring is active (aml_alerts table populated on large transactions)


Approval

Role      Name                     Date        Signature
Author    Platform Architect (AI)  2026-02-23
Reviewer
Approver  Alem Bašić