
Go-Live Runbook


Project: Drop
Version: 0.1.0
Date: 2026-02-23
Author: Platform Architect (AI)
Status: Draft | In Review | Approved
Reviewers: Alem Bašić (CEO)

Document History

Version | Date | Author | Changes
0.1 | 2026-02-23 | Platform Architect (AI) | Initial draft for Drop production go-live

1. Go-Live Overview

This runbook covers the full procedure for taking Drop from its current staging state to production launch. Drop is currently deployed on AWS App Runner (eu-west-1) with RDS PostgreSQL but has not yet launched publicly; the steps below open it to real Norwegian users.

What: Drop v{{VERSION}} production launch
Go-Live Owner: Alem Bašić (CEO)
Go-Live Date: TBD ({{LAUNCH_DATE}} at {{LAUNCH_TIME}} {{TIMEZONE}})
Deployment window: {{WINDOW_START}} – {{WINDOW_END}} ({{WINDOW_DURATION}}h window)
Type: {{TYPE}}
Rollback Authority: Alem Bašić

Incident Commander: {{IC}} (primary), {{IC_BACKUP}} (backup)
Technical Lead: {{TECH_LEAD}}
Communications Lead: {{COMMS_LEAD}}
War Room: {{WAR_ROOM_LINK}}
Status Page: {{STATUS_PAGE_URL}}

Prerequisites for go-live:

  •  BankID OIDC client credentials obtained and configured
  •  Open Banking provider selected, contracted, and integrated
  •  Sumsub production account configured and webhooks verified
  •  getdrop.no domain configured with DNS pointing to App Runner
  •  SSL certificate provisioned
  •  BetterStack monitors configured (3 monitors active)
  •  Slack #drop-ops webhook configured
  •  RDS security group locked down (no public access)
  •  All feature flags reviewed and set correctly
  •  Final security audit completed


2. Pre-Launch Checklist

T-7 Days: Infrastructure Verification
  •  All production infrastructure provisioned and tested
  •  Load balancer health checks passing for all instances
  •  Auto-scaling groups configured and tested (scale-up + scale-down)
  •  Database replicas in sync and replication lag < {{REPLICATION_LAG}}s
  •  Backup jobs running successfully (last backup verified: {{VERIFY_DATE}})
  •  CDN configured and serving assets correctly
  •  All IAM roles and permissions verified
  •  Infrastructure monitoring dashboards showing green
  •  Estimated cost reviewed and within budget

Owner: {{INFRA_OWNER}} | Due: T-7 days


2.1 Infrastructure

  •  AWS App Runner service running (status: RUNNING)
  •  RDS PostgreSQL drop-db instance available (status: available)
  •  ECR repository contains latest production-tagged image
  •  AWS Secrets Manager contains all required secrets:
    •  JWT_SECRET (min 32 chars, generated via openssl rand -base64 48)
    •  DATABASE_URL (PostgreSQL connection string)
    •  BANKID_CLIENT_ID and BANKID_CLIENT_SECRET
    •  SUMSUB_APP_TOKEN and SUMSUB_SECRET_KEY
    •  SLACK_WEBHOOK_URL


T-5 Days: DNS Configuration

  •  DNS records created/updated in {{DNS_PROVIDER}}:
    •  {{DOMAIN}} → Load balancer (TTL set to {{LOW_TTL}} for easy rollback)
    •  api.{{DOMAIN}} → API load balancer
    •  www.{{DOMAIN}} → Redirect to {{DOMAIN}}
  •  DNS propagation verified (check from multiple regions)
  •  DNS failover routing configured (if applicable)
  •  Old DNS records documented (for rollback reference)

Owner: {{DNS_OWNER}} | Due: T-5 days
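The TTL and propagation checks above can be scripted. The sketch below is not part of the original runbook. It assumes `dig` is installed and reads its `+noall +answer` output, which has one record per line (name, TTL, class, type, value). The `check_dns_ttl` helper is hypothetical, and 300 seconds in the usage line is only an example value for {{LOW_TTL}}.

```bash
#!/usr/bin/env bash
# Sketch: fail if any DNS answer record has a TTL above max_ttl seconds.
# Input on stdin: `dig +noall +answer` lines: <name> <ttl> <class> <type> <value>
check_dns_ttl() {
  local max_ttl="$1"
  awk -v max="$max_ttl" '
    NF >= 5 {                          # ignore blank or malformed lines
      seen++
      if ($2 + 0 > max + 0) {
        printf "FAIL %s TTL=%s (max %s)\n", $1, $2, max
        bad++
      }
    }
    END {
      if (seen == 0) { print "FAIL no answer records"; exit 1 }
      if (bad > 0) exit 1
      print "OK"
    }'
}
```

To approximate the multi-region propagation check, run it against several public resolvers. For example, run `dig @1.1.1.1 +noall +answer getdrop.no A | check_dns_ttl 300`, then repeat with `@8.8.8.8`.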


2.2 Application

  •  GET https://9ef3szvvsb.eu-west-1.awsapprunner.com/api/health returns {"status":"ok"}
  •  NEXT_PUBLIC_SERVICE_MODE=production confirmed in App Runner env
  •  BANKID_MOCK is NOT set (unset in production)
  •  All CI pipeline checks passing on main branch (lint, typecheck, tests, E2E, docker-build)
  •  Database schema up-to-date (all tables exist)
  •  Demo seed data disabled (NODE_ENV=production)

2.3 External Services

  •  BankID OIDC: test login with real Norwegian BankID succeeds
  •  BankID callback URL matches: https://getdrop.no/api/auth/bankid/callback
  •  Sumsub: test KYC flow with real document succeeds end-to-end
  •  Sumsub webhook: POST to /api/kyc/webhook with valid signature processed correctly
  •  Open Banking: test balance read (AISP) succeeds from real Norwegian bank account
  •  Open Banking: test payment initiation (PISP) succeeds in sandbox

2.4 Domain & SSL (T-5 Days)

  •  getdrop.no DNS record points to the App Runner service domain (or CloudFront if a CDN is added)
  •  TLS certificates provisioned for all domains:
    •  {{DOMAIN}} ✅
    •  *.{{DOMAIN}} ✅
  •  SSL certificate valid for getdrop.no and www.getdrop.no
  •  Certificate expiry > 90 days from go-live date
  •  HTTPS redirect enabled (HTTP → HTTPS)
  •  HSTS header present in response: Strict-Transport-Security: max-age=63072000; includeSubDomains; preload
  •  SSL Labs test: Grade A or better ({{SSL_TEST_LINK}})

Owner: {{SSL_OWNER}} | Due: T-5 days


2.5 Monitoring

  •  BetterStack: Drop Health Check monitor configured, status GREEN
  •  BetterStack: Drop Landing Page monitor configured, status GREEN
  •  BetterStack: escalation policy assigned to all monitors
  •  Slack #drop-ops: test alert received successfully
  •  CloudWatch: App Runner logs streaming to /aws/apprunner/drop-web/.../application


T-3 Days: CDN Configuration

  •  CDN distribution pointing to production origin
  •  Cache behaviors configured per specification
  •  Static asset cache headers correct (1yr for fingerprinted assets)
  •  CDN WAF rules enabled and tested
  •  CDN purge command tested and documented
  •  CDN performance verified from target geographies

Owner: {{CDN_OWNER}} | Due: T-3 days
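Two requirements in this checklist come down to a `max-age` value. The HSTS requirement in 2.4 asks for max-age=63072000 (two years), and the CDN rule asks for one-year caching of fingerprinted assets (31536000 seconds). Below is a minimal sketch of a shared check, assuming headers are fetched with `curl -sI`. Both helper names are hypothetical.

```bash
# Print the value of header NAME (case-insensitive) from raw HTTP response
# headers on stdin, e.g. the output of `curl -sI URL`.
header_value() {
  tr -d '\r' | awk -v n="$1" '
    BEGIN { n = tolower(n) }
    {
      split($0, kv, ":")
      if (tolower(kv[1]) == n) { sub(/^[^:]*:[ \t]*/, ""); print; exit }
    }'
}

# Succeed if the header value contains max-age=<N> with N >= min_seconds.
max_age_at_least() {
  local value="$1" min="$2" age
  age=$(printf '%s\n' "$value" | tr 'A-Z' 'a-z' |
        sed -n 's/.*max-age=\([0-9][0-9]*\).*/\1/p')
  [ -n "$age" ] && [ "$age" -ge "$min" ]
}
```

For example, `max_age_at_least "$(curl -sI https://getdrop.no/ | header_value strict-transport-security)" 63072000` checks HSTS. The same call with `cache-control` and 31536000 against a fingerprinted asset URL checks the CDN rule.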


T-3 Days: Database Migration

  •  Final migration scripts reviewed and approved
  •  Migration tested on staging with production-sized data (timing recorded: {{MIGRATION_TIME}}min)
  •  Rollback/down migration tested
  •  Migration script idempotent (safe to run twice)
  •  Database backup taken immediately before migration window
  •  Data integrity checks script prepared (scripts/verify-migration.sh)

Owner: {{DB_OWNER}} | Due: T-3 days
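A rough way to exercise the "idempotent (safe to run twice)" and timing items is to run the migration twice against staging and record how long the first run takes. This is a sketch, and `check_idempotent` is a hypothetical helper. A successful second run is necessary but not sufficient evidence of idempotency. Comparing `pg_dump --schema-only` output before and after the second run would be a stronger check.

```bash
# Sketch: run a command twice; both runs must exit 0.
# Prints how long the first run took (bash SECONDS builtin).
# Usage: check_idempotent <command> [args...]
check_idempotent() {
  local start elapsed
  start=$SECONDS
  if ! "$@"; then
    echo "FAIL: first run exited non-zero"
    return 1
  fi
  elapsed=$(( SECONDS - start ))
  echo "first run took ${elapsed}s"
  if ! "$@"; then
    echo "FAIL: second run exited non-zero (not idempotent)"
    return 1
  fi
  echo "OK: second run succeeded"
}
```

A possible invocation is `check_idempotent bash scripts/migrate-prod.sh`, with the environment pointed at the staging database. This runbook does not specify how the script selects its target.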


T-2 Days: Feature Flags

  •  All new features behind feature flags
  •  Feature flags defaulting to OFF in production
  •  Flag rollout plan documented (which flags, in what order, with what criteria)
  •  Kill switch flags configured (disable any feature immediately if needed)

Owner: {{FF_OWNER}} | Due: T-2 days
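To confirm that flags default to OFF, the environment can be scanned for `NEXT_PUBLIC_FF_*` variables set to `true` that are not on an allowlist. The sketch below uses the launch states from the Feature Flags at Launch table, where only NOTIFICATIONS and MERCHANT_DASHBOARD are on. The `check_flags` helper is hypothetical. If Drop is a Next.js app, as the `NEXT_PUBLIC_` prefix suggests, these values are inlined at build time. In that case the build environment matters as well as the App Runner runtime configuration.

```bash
# Sketch: report NEXT_PUBLIC_FF_* variables that are "true" but not on the
# launch allowlist. Reads NAME=value lines on stdin (e.g. the output of `env`).
# Usage: env | check_flags <allowed_flag>...
check_flags() {
  local allowed=" $* " name value bad=0
  while IFS='=' read -r name value; do
    case "$name" in
      NEXT_PUBLIC_FF_*) ;;
      *) continue ;;
    esac
    if [ "$value" = "true" ] && [[ "$allowed" != *" $name "* ]]; then
      echo "UNEXPECTED ON: $name"
      bad=1
    fi
  done
  [ "$bad" -eq 0 ] && echo "OK"
  return "$bad"
}
```

Example: `env | check_flags NEXT_PUBLIC_FF_NOTIFICATIONS NEXT_PUBLIC_FF_MERCHANT_DASHBOARD`.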


T-2 Days: Third-Party Integrations

  •  {{INTEGRATION_1}} — live API keys configured in secrets manager
  •  {{INTEGRATION_2}} — live API keys configured in secrets manager
  •  Payment gateway: live mode activated and tested with real card (refunded)
  •  Email service: sending domain authenticated (SPF, DKIM, DMARC)
  •  All integrations tested in production with smoke tests
  •  Webhook URLs updated to production endpoints

Owner: {{INTEGRATION_OWNER}} | Due: T-2 days
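For the email item, the DMARC record can be checked from the command line. Per RFC 7489, it is a TXT record at `_dmarc.<domain>` that starts with `v=DMARC1` and carries a `p=` policy (`none`, `quarantine` or `reject`). Below is a minimal sketch, assuming `dig` is available. SPF (a `v=spf1` TXT record) and DKIM (a TXT record under a provider-specific selector) need their own lookups and are not covered here.

```bash
# Sketch: validate a DMARC TXT record read from stdin (e.g. `dig +short TXT`).
# Requires a leading v=DMARC1 tag and a p= policy of none|quarantine|reject.
check_dmarc() {
  local record policy
  # dig prints TXT data in double quotes; drop quotes and newlines.
  record=$(tr -d '"\n')
  case "$record" in
    v=DMARC1*) ;;
    *) echo "FAIL: no v=DMARC1 record"; return 1 ;;
  esac
  # Split tags on ';' (ignoring spaces, which also rejoins split strings).
  policy=$(printf '%s\n' "$record" | tr -d ' ' | tr ';' '\n' | sed -n 's/^p=//p')
  case "$policy" in
    none|quarantine|reject) echo "OK p=$policy" ;;
    *) echo "FAIL: missing or invalid p= tag"; return 1 ;;
  esac
}
```

Example, assuming getdrop.no is the sending domain: `dig +short TXT _dmarc.getdrop.no | check_dmarc`.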


T-1 Day: Monitoring & Alerting

  •  All alert rules deployed to production monitoring
  •  Alert routing configured — PagerDuty / on-call active
  •  Dashboards showing production data
  •  Log aggregation capturing production logs
  •  Distributed tracing enabled
  •  Synthetic monitoring configured (uptime checks every 1 min)
  •  Alert test fired and received by on-call

Owner: {{MONITORING_OWNER}} | Due: T-1 day
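The "alert test fired" item, and the #drop-ops test alert in 2.5, can be exercised by posting directly to the Slack incoming webhook stored as SLACK_WEBHOOK_URL. Incoming webhooks accept a JSON body with a `text` field. In this sketch, the escaping only handles backslashes and double quotes, which is enough for a one-line test message. If jq is installed, `jq -n --arg text "$msg" '{text: $text}'` builds the payload more robustly. Both helper names are hypothetical.

```bash
# Build a Slack webhook payload {"text": "..."}.
# Escapes backslashes and double quotes only (intended for single-line messages).
slack_payload() {
  local text
  text=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"text":"%s"}' "$text"
}

# POST a test message to an incoming webhook URL (e.g. SLACK_WEBHOOK_URL).
send_test_alert() {
  local webhook_url="$1" message="$2"
  curl -fsS -X POST -H 'Content-type: application/json' \
    --data "$(slack_payload "$message")" "$webhook_url"
}
```

Example: `send_test_alert "$SLACK_WEBHOOK_URL" "Drop go-live: test alert, please ignore"`.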


T-1 Day: Backup Verification

  •  Production backup job running on schedule
  •  Last backup restored to test environment and verified
  •  Backup storage has sufficient capacity (> {{BACKUP_DAYS}} days)
  •  Point-in-time recovery tested

Owner: {{BACKUP_OWNER}} | Due: T-1 day
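Point-in-time recovery can be rehearsed without touching production by restoring drop-db to a new, temporary instance. By default (DRY_RUN=1), this sketch only prints the AWS CLI command. With DRY_RUN=0 it executes the command and then waits for the new instance to become available. The target identifier and the `pitr_drill` helper are placeholders. The restored instance may also need `--db-subnet-group-name` and `--vpc-security-group-ids` to match drop-db's network setup. After verification, delete it with `aws rds delete-db-instance --db-instance-identifier <target> --skip-final-snapshot`.

```bash
# Sketch: point-in-time-recovery drill. Restores the source instance into a
# NEW instance (the source is not modified). DRY_RUN=1 (default) prints the
# command; DRY_RUN=0 runs it and waits until the new instance is available.
pitr_drill() {
  local source="$1" target="$2"
  local cmd=(aws rds restore-db-instance-to-point-in-time
    --source-db-instance-identifier "$source"
    --target-db-instance-identifier "$target"
    --use-latest-restorable-time
    --region eu-west-1)
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
    return 0
  fi
  "${cmd[@]}" &&
    aws rds wait db-instance-available \
      --db-instance-identifier "$target" --region eu-west-1
}
```

Example: `pitr_drill drop-db drop-db-pitr-drill` to review the command, then `DRY_RUN=0 pitr_drill drop-db drop-db-pitr-drill` to run it.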


T-1 Day: Legal & Compliance

  •  Privacy policy published and linked
  •  Terms of service published and linked
  •  Cookie consent banner implemented (if required by jurisdiction)
  •  GDPR data processing inventory updated
  •  Security assessment completed and any findings resolved or accepted
  •  Legal sign-off obtained: {{LEGAL_SIGNOFF}} on {{DATE}}

Owner: {{LEGAL_OWNER}} | Due: T-1 day


T-0: Pre-Launch Final Checks (Within 2 Hours of Launch)

  •  Staging smoke tests passing (last run: {{TIMESTAMP}})
  •  All engineers briefed and available
  •  War room open and all participants joined
  •  Rollback procedure rehearsed mentally
  •  Monitoring dashboards open
  •  Status page live: https://drop-status.betteruptime.com
  •  Status page updated: "Scheduled maintenance: {{TIME}} - {{END_TIME}}"
  •  Customer support briefed on launch features and potential issues
  •  Deployment script / CI pipeline ready to trigger

3. Go-Live Procedure (T-0)

Phase 1: Final Verification (T-2h)

# 1. Verify App Runner service
aws apprunner describe-service \
  --service-arn arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec \
  --query 'Service.Status' --output text --region eu-west-1
# Expected: RUNNING

# 2. Health check
curl -s https://getdrop.no/api/health | jq
# Expected: { "data": { "status": "ok", "checks": { "db": { "status": "pass" } } } }

# 3. Verify RDS
aws rds describe-db-instances \
  --db-instance-identifier drop-db \
  --query 'DBInstances[0].DBInstanceStatus' --output text --region eu-west-1
# Expected: available

# 4. Verify latest backup exists
aws rds describe-db-snapshots \
  --db-instance-identifier drop-db --region eu-west-1 \
  --query 'DBSnapshots[?SnapshotType==`automated`]|sort_by(@,&SnapshotCreateTime)[-1].SnapshotCreateTime' \
  --output text
# Expected: timestamp from last 24h

# 5. Create pre-launch manual snapshot
aws rds create-db-snapshot \
  --db-instance-identifier drop-db \
  --db-snapshot-identifier drop-db-pre-launch-$(date +%Y%m%d-%H%M) \
  --region eu-west-1
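Step 4 expects a snapshot from the last 24 hours, and that comparison can be automated rather than eyeballed. This minimal sketch assumes GNU `date`, whose `-d` option parses ISO 8601 timestamps like the ones the query returns. On macOS, `gdate` from coreutils behaves the same. The `is_recent` helper is hypothetical, and its optional third argument pins "now" so the check can be tested.

```bash
# Sketch: succeed if a timestamp is at most max_hours old.
# Assumes GNU date (-d); on macOS use gdate from coreutils.
# Usage: is_recent "<timestamp>" <max_hours> [now_epoch_seconds]
is_recent() {
  local ts="$1" max_hours="$2" now="${3:-$(date -u +%s)}" ts_epoch
  ts_epoch=$(date -u -d "$ts" +%s) || return 2   # unparseable timestamp
  [ $(( now - ts_epoch )) -le $(( max_hours * 3600 )) ]
}

# Example, reusing the query from step 4:
#   latest=$(aws rds describe-db-snapshots --db-instance-identifier drop-db ... --output text)
#   is_recent "$latest" 24 || echo "WARNING: no automated snapshot in the last 24h"
```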

Phase 2: BankID Callback URL Verification (T-1h)

# Verify BankID client configuration
# Log in at BankID developer portal and confirm:
# - Callback URL: https://getdrop.no/api/auth/bankid/callback
# - Mobile callback: drop://auth/callback
# - Scopes: openid profile
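Part of this check can be done locally before logging in to the portal. Confirm that the app's public URL produces exactly the registered callback, and that the requested scopes include `openid` and `profile`. The sketch makes a hypothetical assumption: Drop builds its callback as NEXT_PUBLIC_APP_URL plus `/api/auth/bankid/callback`. Adjust it if the app derives the callback differently. A common mistake it catches is leaving the App Runner default domain as the app URL.

```bash
# Sketch: compare the callback derived from the app URL with the registered
# one, and check that required OIDC scopes are present.
EXPECTED_CALLBACK="https://getdrop.no/api/auth/bankid/callback"

check_bankid_config() {
  local app_url="${1%/}" scopes=" $2 " ok=0 s
  local callback="$app_url/api/auth/bankid/callback"
  if [ "$callback" != "$EXPECTED_CALLBACK" ]; then
    echo "FAIL: callback $callback != $EXPECTED_CALLBACK"
    ok=1
  fi
  for s in openid profile; do
    case "$scopes" in
      *" $s "*) ;;
      *) echo "FAIL: scope '$s' missing"; ok=1 ;;
    esac
  done
  [ "$ok" -eq 0 ] && echo "OK"
  return "$ok"
}
```

Example: `check_bankid_config "$NEXT_PUBLIC_APP_URL" "openid profile"`.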

Phase 3: Soft Launch (T-0, invite-only)

  1. Configure App Runner environment: BANKID_MOCK= (unset — real BankID)
  2. Set NEXT_PUBLIC_APP_URL=https://getdrop.no
  3. Deploy latest production image
  4. Verify health check: curl https://getdrop.no/api/health
  5. Conduct a live end-to-end test with Alem's real BankID:
    •  Navigate to https://getdrop.no
    •  Click "Logg inn med BankID"
    •  Authenticate with real BankID
    •  Verify dashboard loads with bank account balance (AISP)
    •  Verify KYC status shows approved (BankID = verified)
    •  Test disclosure endpoint before a remittance
    •  Test QR payment scan (if merchant test account exists)

Phase 4: BetterStack Maintenance Window — Close (T+30min)

After confirming the system is stable:

  1. Close any BetterStack maintenance window (if created for go-live)
  2. Verify BetterStack monitors show green
  3. Verify Slack #drop-ops received startup alert

Phase 5: Public Launch (T+1h, if soft launch successful)

  1. Remove any invite-only restriction
  2. Announce on social/marketing channels
  3. Monitor #drop-ops for the first 2 hours
  4. Check BetterStack every 30 minutes for first 2 hours

Launch Day Timeline (Hour by Hour)

H+0:00 → H+0:15: Deployment Start

Time | Action | Owner | Status
H+0:00 | Announce in war room: "Deployment started" | {{IC}} |
H+0:00 | Take final pre-deploy database backup | {{DB_OWNER}} |
H+0:05 | Enable maintenance mode (if applicable) | {{DEPLOY_OWNER}} |
H+0:10 | Trigger production deployment pipeline (pipeline: {{PIPELINE_LINK}}) | {{DEPLOY_OWNER}} |
H+0:15 | Monitor deployment progress | {{TECH_LEAD}} |

H+0:15 → H+0:45: Database Migration Execution

Time | Action | Owner | Status
H+0:15 | Confirm deployment artifact ready | {{DEPLOY_OWNER}} |
H+0:20 | Run database migrations: bash scripts/migrate-prod.sh | {{DB_OWNER}} |
H+0:25 | Verify migration completed: bash scripts/verify-migration.sh | {{DB_OWNER}} |
H+0:30 | Confirm new application instances healthy | {{TECH_LEAD}} |
H+0:40 | Deploy new application version to all instances | {{DEPLOY_OWNER}} |

H+0:45 → H+1:00: DNS Cutover

Time | Action | Owner | Status
H+0:45 | Point DNS to production load balancer | {{DNS_OWNER}} |
H+0:50 | Monitor DNS propagation | {{DNS_OWNER}} |
H+0:55 | Confirm HTTPS working from external network | {{TECH_LEAD}} |
H+1:00 | Disable maintenance mode | {{DEPLOY_OWNER}} |

H+1:00 → H+1:30: Smoke Tests

Time | Action | Owner | Status
H+1:00 | Run automated smoke tests: bash scripts/smoke-tests.sh production | {{QA_OWNER}} |
H+1:10 | Manual smoke test — critical user journey 1 | {{QA_OWNER}} |
H+1:15 | Manual smoke test — critical user journey 2 | {{QA_OWNER}} |
H+1:20 | Verify payment processing (test transaction) | {{QA_OWNER}} |
H+1:25 | Verify email delivery (test email) | {{QA_OWNER}} |
H+1:30 | All smoke tests PASS → proceed to monitoring | {{IC}} |

H+1:30 → H+2:00: Monitoring Verification

Time | Action | Owner | Status
H+1:30 | Verify error rate < {{ERROR_THRESHOLD}}% | {{TECH_LEAD}} |
H+1:35 | Verify P99 latency < {{P99_THRESHOLD}}ms | {{TECH_LEAD}} |
H+1:40 | Verify no unexpected spikes in DB CPU/connections | {{DB_OWNER}} |
H+1:50 | Begin enabling feature flags (per rollout plan) | {{FF_OWNER}} |
H+2:00 | Declare go-live successful | {{IC}} |

Feature Flags at Launch

Flag | Launch State | Notes
NEXT_PUBLIC_FF_NOTIFICATIONS | true | Push notifications enabled
NEXT_PUBLIC_FF_MERCHANT_DASHBOARD | true | Merchant dashboard enabled
NEXT_PUBLIC_FF_VIRTUAL_CARDS | false | Not launched — requires card partner
NEXT_PUBLIC_FF_PHYSICAL_CARDS | false | Future feature
NEXT_PUBLIC_FF_CARD_DETAILS | false | Future feature
NEXT_PUBLIC_FF_CARD_FREEZE | false | Future feature
NEXT_PUBLIC_FF_CARD_PIN | false | Future feature
NEXT_PUBLIC_FF_SPENDING_LIMITS | false | Future feature

4. Post-Launch Monitoring (T+1 to T+7)

Enhanced Monitoring Period

Duration: {{POST_LAUNCH_MONITORING}}h enhanced monitoring
Monitoring cadence: Every 30 min for first 4h, then hourly for 24h, then normal

Period | Check Frequency | Responsible
H+0 to H+4 | Every 30 min | On-call engineer
H+4 to H+24 | Every 60 min | On-call engineer
Day 2-7 | Standard monitoring | On-call rotation

Metrics to watch during enhanced monitoring:

  • Error rate (target: < {{ERROR_THRESHOLD}}%)
  • P99 latency (target: < {{P99_THRESHOLD}}ms)
  • DB connection pool utilization (target: < {{DB_POOL}}%)
  • Cache hit rate (target: > {{CACHE_HIT}}%)
  • Memory trend (should be stable, not growing)
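When the monitoring dashboard is not at hand, P99 latency and error rate can be computed from raw samples. The sketch below uses the nearest-rank definition of a percentile: the value at rank ceil(0.99 × N) in sorted order. Dashboards may interpolate and report slightly different numbers. Where the latency samples come from (log export, load-test output) is left open, and both helper names are hypothetical.

```bash
# Nearest-rank P99 of numeric samples, one per line on stdin.
p99() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      r = int((99 * NR + 99) / 100)   # ceil(0.99 * NR) using integer arithmetic
      print v[r]
    }'
}

# Error rate as a percentage with one decimal: error_rate <errors> <total>
error_rate() {
  awk -v e="$1" -v t="$2" 'BEGIN {
    if (t + 0 == 0) exit 1
    printf "%.1f\n", 100 * e / t
  }'
}
```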

Support Escalation Procedures

Issue Type | First Contact | Escalation
User-facing errors | Customer support → Engineering | On-call engineer
Performance degradation | On-call engineer | Tech lead + Eng manager
Data issues | On-call engineer | DB owner + Engineering lead
Security concern | Security contact → CISO | Immediate escalation

Performance Baseline Comparison

Compare post-launch metrics to pre-launch staging baseline:

Metric | Staging Baseline | Production Actual | Delta | Status
P95 latency | {{STG_P95}}ms | TBD | TBD | TBD
Error rate | {{STG_ERR}}% | TBD | TBD | TBD
Throughput | {{STG_RPS}} rps | TBD | TBD | TBD

5. Rollback Triggers & Procedure

Rollback Decision Criteria

Automatic rollback triggers:

  •  Smoke tests fail after deployment
  •  Error rate > {{ROLLBACK_ERROR_RATE}}% for {{ROLLBACK_DURATION}} consecutive minutes
  •  Database migration causes data integrity issues

Manual rollback triggers (decision by {{ROLLBACK_AUTHORITY}}):

  •  P99 latency > {{ROLLBACK_P99}}ms sustained for {{ROLLBACK_LATENCY_DURATION}} min
  •  Critical feature broken with no quick fix available
  •  Security vulnerability discovered in new release

Immediate Actions (within 5 minutes)

If any critical issue is found during or after go-live:

# Option A: Roll back to the previous image.
# start-deployment redeploys the image that the service's configured tag
# currently points to; by itself it does not revert to an earlier deployment.
# First point that tag (e.g. the production tag in ECR) back at the previous
# image, then run:
aws apprunner start-deployment \
  --service-arn arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec \
  --region eu-west-1

# Option B: If the issue is in environment config, update the env vars
# (via AWS Console — App Runner environment configuration),
# then trigger a new deployment.

Rollback Procedure (Quick Reference)

  1. Announce in war room: "Initiating rollback"
  2. Update status page: "We are investigating an issue and may revert recent changes"
  3. Run: bash scripts/rollback.sh production (or trigger CI pipeline rollback)
  4. Monitor health checks — confirm previous version healthy
  5. If DB migration included: run down migration bash scripts/migrate-down.sh production
  6. Verify all smoke tests pass on previous version
  7. Update status page: "Issue resolved, system restored"
  8. Notify stakeholders

Communication

  1. Send Slack message to #drop-ops: "Going back to previous version, investigating [issue]"
  2. Update BetterStack status page: create incident
  3. If users are affected: prepare user communications

    Post-Rollback

  1. Confirm health check returns OK after rollback
  2. Confirm BetterStack monitors green
  3. Investigate root cause before re-attempting launch
  4. Document in incident report

Full rollback procedure: see rollback-plan.md
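The automatic trigger "error rate > {{ROLLBACK_ERROR_RATE}}% for {{ROLLBACK_DURATION}} consecutive minutes" can be evaluated mechanically from per-minute error rates, for example ones exported from monitoring. This sketch assumes one percentage per line, oldest first. It treats any run above the threshold that lasts at least the given number of minutes as a trigger. The `should_rollback` helper is hypothetical, and the final decision stays with the rollback authority.

```bash
# Sketch: print ROLLBACK if the per-minute error rate (one value per line,
# oldest first) exceeds threshold_pct for at least N consecutive minutes.
# Usage: should_rollback <threshold_pct> <consecutive_minutes> < rates
should_rollback() {
  awk -v th="$1" -v d="$2" '
    {
      run = ($1 + 0 > th + 0) ? run + 1 : 0   # length of current breach streak
      if (run >= d + 0) hit = 1
    }
    END { print (hit ? "ROLLBACK" : "OK") }'
}
```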


6. Post-Launch Monitoring (First 24h)

Time | Action
T+0 to T+2h | Monitor Slack #drop-ops continuously
T+2h | Review CloudWatch logs for errors
T+4h | Check transaction volume and success rate
T+8h | Review Sumsub KYC queue — any stuck applicants?
T+24h | First SLA report — uptime, error rate, p99 latency
T+7 days | First weekly review

Key metrics to watch:

  •  BetterStack: uptime must stay green (>=99.9%)
  •  Slack #drop-ops: no critical alerts
  •  Error rate in logs: < 1% of requests
  •  BankID login success rate: > 99%
  •  KYC approval rate: > 80%


7. Communication Plan

Pre-Launch Communications

Audience | Channel | When | Message
Internal team | Slack #launches | T-3 days | Launch schedule and plan
Customer support | Briefing doc + Slack | T-2 days | Features, FAQ, escalation path
Existing users | Email / in-app banner | T-1 day | "Exciting updates coming"
Status page subscribers | Status page | T-4 hours | Scheduled maintenance notification

Launch Day Communications

Audience | Channel | When | Message
Status page | Status page | T-0 | "Scheduled deployment in progress"
Internal | Slack #launches | At success | "🚀 {{PROJECT}} is live!"
Users | Email / in-app | H+1 after success | Launch announcement
Status page | Status page | H+1 | "Deployment complete — all systems normal"

8. Compliance Actions at Launch

  •  Notify Finanstilsynet of service go-live (if required by PISP/AISP licence)
  •  Confirm getdrop.no/vilkaar (terms of service) is accessible
  •  Confirm getdrop.no/personvern (privacy policy) is accessible
  •  Confirm complaint submission form accessible via app
  •  Confirm GDPR consent is requested on first use
  •  Confirm AML monitoring is active (aml_alerts table populated on large transactions)


9. Stakeholder Notification Timeline

Milestone | Notify | Channel | Owner
Deployment started | Engineering team | Slack war room | {{IC}}
Smoke tests pass | Engineering + Product | Slack | {{IC}}
Go-live declared | All stakeholders | Email + Slack | {{COMMS_LEAD}}
Rollback initiated | All stakeholders + Management | Immediate call + Slack | {{IC}}


        Approval

Role | Name | Date | Signature
Author | Platform Architect (AI) | 2026-02-23 |
Reviewer | | |
Approver | Alem Bašić | |