Disaster Recovery Plan

Project: ~~Drop~~{{PROJECT_NAME}}
Version: ~~0.1.0~~{{VERSION}}
Date: ~~2026-02-23~~{{DATE}}
Author: ~~Platform Architect (AI)~~{{AUTHOR}}
Status: Draft | In Review | Approved
Reviewers: ~~Alem Bašić (CEO)~~{{REVIEWERS}}

Document History

Version	Date	Author	Changes
0.1	~~2026-02-23~~{{DATE}}	~~Platform Architect (AI)~~{{AUTHOR}}	~~Compiled from DR-RUNBOOK.md + infrastructure analysis~~Initial draft

1. Business Continuity Overview

This plan documents the procedures to recover ~~Drop~~{{PROJECT_NAME}} services following a disaster event (~~infrastructure~~data center failure, data corruption, security breach, or ~~total region outage~~catastrophic failure). Drop is a PSD2 pass-through payment application that never holds customer funds, so a Drop infrastructure failure cannot cause customer money to be lost. The primary recovery concerns are service availability and data integrity.

Plan Owner: ~~Alem Bašić (CEO), [email protected], +47 40 47 42 51~~{{DR_OWNER}}
Plan Reviewer: ~~John (AI Director), Slack #drop-alerts~~{{DR_REVIEWER}}
Last Tested: ~~TBD (initial DR drill not yet conducted)~~{{LAST_TEST_DATE}}
Next Scheduled Test: ~~Q2 2026 (App Runner restart + RDS snapshot restore)~~{{NEXT_TEST_DATE}}

Disaster types covered:

~~App Runner service failure or crash~~Infrastructure failure (AZ/region outage)
~~RDS database failure or data corruption~~Data corruption or accidental deletion
Security incident (~~unauthorized access, credential compromise~~ransomware, data breach)
~~Full region outage (eu-west-1)~~Vendor/provider outage
Catastrophic application failure ~~(bad deployment)~~

2. RPO / RTO Targets Per Service Tier

Tier	Description	RPO	RTO	~~Drop Services~~Examples
Tier 1 - Critical	Core user-facing services; downtime has direct revenue impact	~~5 minutes (PITR)~~0 (real-time replication)	~~30 minutes~~< 15 min	~~Auth (BankID), transactions (remittance + QR), health endpoint~~Auth, checkout, core API
Tier 2 - Important	Supporting ~~features~~services; degraded experience without them	~~24 hours (snapshot)~~< 1 hour	~~1 hour~~< 4 hours	~~Merchant dashboard, notifications, history~~Notifications, reports
Tier 3 - Standard	~~Background / admin~~Background/admin services; business can operate without them temporarily	< 24 hours	< 24 hours	~~Audit logs, AML alerts, complaint records~~Analytics, admin panel
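
The RTO targets in the table above can be checked mechanically after a drill. The following is a minimal sketch, not part of the plan itself: the function names `rto_limit_minutes` and `meets_rto` are hypothetical, and the limits assume the template targets (< 15 min, < 4 hours, < 24 hours) rather than the struck-through Drop values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: RTO ceilings in minutes, assumed from the tier table
# (Tier 1 < 15 min, Tier 2 < 4 hours, Tier 3 < 24 hours).
rto_limit_minutes() {
  case "$1" in
    1) echo 15 ;;
    2) echo 240 ;;
    3) echo 1440 ;;
    *) echo "unknown tier: $1" >&2; return 1 ;;
  esac
}

# Returns 0 when a measured recovery time (whole minutes) is strictly
# under the RTO ceiling for the given tier, 1 when it is not,
# and 2 when the tier is unknown.
meets_rto() {
  local tier="$1" measured_minutes="$2" limit
  limit="$(rto_limit_minutes "$tier")" || return 2
  [ "$measured_minutes" -lt "$limit" ]
}
```

For example, `meets_rto 1 12 && echo "Tier 1 drill within target"` would print the message, because 12 minutes is under the 15-minute ceiling.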

3. Service Tier Classification

Service	Tier	~~Justification~~Owner	Rationale
~~BankID authentication~~{{SERVICE_1}}	~~1~~Tier 1	~~Users cannot transact without login~~{{OWNER}}	Core user journey
~~Remittance API (`/api/transactions/remittance`)~~{{SERVICE_2}}	~~1~~Tier 1	~~Core revenue feature~~{{OWNER}}	Authentication
~~QR payment API (`/api/transactions/qr-payment`)~~{{SERVICE_3}}	~~1~~Tier 2	~~Core revenue feature~~{{OWNER}}	Supporting
~~Bank account read (AISP)~~{{SERVICE_4}}	~~1~~Tier 3	~~Required for payment initiation~~{{OWNER}}	Admin only
~~Health endpoint (`/api/health`)~~Database - Primary	~~1~~Tier 1	~~Monitoring dependency~~Platform	All services depend on it
~~Transaction history~~Object Storage	~~2~~Tier 2	~~UX degraded, no blocking issue~~Platform	User uploads

4. Backup Strategy

4.1 Database Backups

Database	Backup Type	Frequency	Retention	Location	Verified
{{DB_PRIMARY}}	Automated snapshot	Daily	30 days	{{BACKUP_LOCATION}}	Monthly
~~Merchant dashboard, 2, Merchant ops impacted~~{{DB_PRIMARY}}	Point-in-time recovery	Continuous	7 days	{{BACKUP_LOCATION}}	Monthly
~~Notifications, 2, UX degraded~~{{DB_READ_REPLICA}}	Not backed up separately	n/a	n/a	Rebuilt from primary	n/a

Automated backup tool: {{BACKUP_TOOL}} Backup encryption: AES-256, key managed in {{KMS_TOOL}} Cross-region copy: {{CROSS_REGION}}

4.2 File / Object Storage Backups

Storage	Backup Method	Frequency	Retention	DR Copy
{{S3_BUCKET}}	S3 versioning + replication	Continuous	{{RETENTION}}	{{DR_BUCKET}}
~~Audit log, 3, Compliance, retained, not real-time~~{{FILE_STORE}}	Snapshot	Daily	30 days	Cross-region

4.3 Configuration Backups

Config	Backup Method	Location	Frequency
IaC (Terraform)	Git repository	{{GIT_REPO}}	On change
Application config	Git repository	{{GIT_REPO}}	On change
Secrets	Secrets manager replication	{{SECRETS_BACKUP}}	Real-time
~~AML alerts, 3, Reviewed periodically, not real-time~~DNS records	Export to Git	{{GIT_REPO}}	Weekly
TLS certificates	Secrets manager	{{CERTS_BACKUP}}	On renewal

4.4 Backup Testing Schedule

Backup Type	Test Frequency	Last Test	Result	Tester
Database full restore	Monthly	{{DATE}}	{{RESULT}}	{{TESTER}}
Point-in-time restore	Quarterly	{{DATE}}	{{RESULT}}	{{TESTER}}
Object storage restore	Quarterly	{{DATE}}	{{RESULT}}	{{TESTER}}
Full DR failover drill	Bi-annually	{{DATE}}	{{RESULT}}	{{TESTER}}
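
The test frequencies above only help if someone notices when a test has been skipped. As a rough sketch, an overdue check could look like the following. The function name `backup_test_overdue` and the day thresholds in the usage note (31 for monthly, 92 for quarterly) are assumptions for illustration, not values defined by this plan.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when more than max_days whole days
# have elapsed between last_epoch and now_epoch (both Unix timestamps).
backup_test_overdue() {
  local last_epoch="$1" now_epoch="$2" max_days="$3"
  local elapsed_days=$(( (now_epoch - last_epoch) / 86400 ))
  [ "$elapsed_days" -gt "$max_days" ]
}
```

With GNU date, a monthly check might be written as `backup_test_overdue "$(date -u -d 2026-01-10 +%s)" "$(date -u +%s)" 31 && echo "database restore test overdue"`. The date shown is only a placeholder for the "Last Test" column.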

4. Infrastructure Overview

Production

~~Service:~~ ~~AWS App Runner~~

~~Region:~~ ~~eu-west-1 (Ireland)~~

~~Service ARN:~~ arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec

~~Service URL:~~ https://9ef3szvvsb.eu-west-1.awsapprunner.com

~~ECR Repository:~~ 324480209768.dkr.ecr.eu-west-1.amazonaws.com/drop-web

Database (Production)

~~RDS Instance:~~ drop-db

~~Endpoint:~~ drop-db.czu2qe4quy4v.eu-west-1.rds.amazonaws.com:5432

~~DB Name:~~ dropapp

~~DB User:~~ dropuser

~~Backup Strategy:~~ ~~Automated daily snapshots, 7-day retention~~

~~Backup Window:~~ ~~23:24–23:54 UTC daily~~

~~PITR:~~ ~~Enabled (5-minute granularity)~~

Staging

~~Platform:~~ ~~Fly.io, region~~ arn ~~(Stockholm)~~

~~App Name:~~ drop-staging

~~Database:~~ ~~SQLite ephemeral volume — no automated backup~~

5. Backup Strategy

Production RDS PostgreSQL

~~Automated Snapshots:~~ ~~Daily at 23:24 UTC~~

~~Retention Period:~~ ~~7 days~~

~~Point-in-Time Recovery (PITR):~~ ~~Enabled — any point within last 7 days, 5-minute granularity~~

~~Manual Snapshots:~~ ~~Created before every major deployment or migration~~

~~Snapshot verification:~~ ~~Run quarterly~~

ECR Docker Images

~~All pushed images retained~~ ~~in ECR repository~~

~~Rollback capability:~~ ~~Redeploy any previous image tag via~~ aws apprunner start-deployment

~~Lifecycle policy:~~ ~~Delete untagged images after 7 days, keep last 10 tagged releases~~

Staging (Fly.io)

~~No automated backup~~ ~~— ephemeral SQLite storage~~

~~Manual backup procedure:~~

flyctl ssh console -a drop-staging
sqlite3 /app/data/drop.db ".backup /app/data/backup-$(date +%Y%m%d).db"

~~6. Recovery Procedures~~5. Failover Procedures

5.1 Automated Failover

Scenario 1: App Runner Service Down

~~Symptoms:~~

~~BetterStack alert:~~ Drop Health Check is DOWN

~~Slack~~ #drop-ops~~: critical alert~~

~~App Runner service status not~~ RUNNING

~~Investigation:~~

# Check service status
aws apprunner describe-service \
  --service-arn arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec \
  --region eu-west-1

# View recent logs (last 10 minutes)
aws logs tail /aws/apprunner/drop-web/8e45b0d335304487a1880f4e32d6aeec/application \
  --follow --since 10m --region eu-west-1

# Check deployment history
aws apprunner list-operations \
  --service-arn arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec \
  --region eu-west-1

~~Recovery Option A: Restart (preferred)~~

aws apprunner start-deployment \
  --service-arn arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec \
  --region eu-west-1

~~RTO:~~ ~~5–10 minutes |~~ ~~RPO:~~ ~~0 (no data loss)~~

~~Recovery Option B: Rollback to previous image~~

# List recent ECR images
aws ecr describe-images \
  --repository-name drop-web \
  --region eu-west-1 \
  --query 'sort_by(imageDetails,&imagePushedAt)[-5:]'

# Update App Runner image tag via console or update the deployment workflow
# Then trigger new deployment

~~RTO:~~ ~~15–20 minutes |~~ ~~RPO:~~ ~~0 (no data loss)~~

Scenario 2: RDS Database Failure

~~Symptoms:~~

/api/health ~~returns~~ {"status":"down"} ~~(HTTP 503)~~

~~BetterStack + Slack alerts fire~~

~~App Runner logs show connection timeout to RDS~~

~~Investigation:~~

# Check RDS status
aws rds describe-db-instances \
  --db-instance-identifier drop-db \
  --region eu-west-1 \
  --query 'DBInstances[0].DBInstanceStatus'

# Check available snapshots
aws rds describe-db-snapshots \
  --db-instance-identifier drop-db \
  --region eu-west-1 \
  --query 'DBSnapshots[?SnapshotType==`automated`] | sort_by(@, &SnapshotCreateTime)[-5:]'

# Check events
aws rds describe-events \
  --source-identifier drop-db \
  --source-type db-instance \
  --region eu-west-1 --duration 60

~~Recovery Option A: Restore from automated snapshot~~

LATEST=$(aws rds describe-db-snapshots \
  --db-instance-identifier drop-db --region eu-west-1 \
  --query 'DBSnapshots[?SnapshotType==`automated`]|sort_by(@,&SnapshotCreateTime)[-1].DBSnapshotIdentifier' \
  --output text)

# Restore into a new instance; the original drop-db instance is left untouched.
# Pass --db-subnet-group-name and --vpc-security-group-ids if the defaults do not match drop-db.
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier drop-db-restored \
  --db-snapshot-identifier "$LATEST" \
  --db-instance-class db.t4g.micro \
  --region eu-west-1

aws rds wait db-instance-available --db-instance-identifier drop-db-restored --region eu-west-1

# Look up the new endpoint, then update DATABASE_URL in the App Runner environment to point at it
NEW_EP=$(aws rds describe-db-instances --db-instance-identifier drop-db-restored \
  --query 'DBInstances[0].Endpoint.Address' --output text --region eu-west-1)
echo "New endpoint: $NEW_EP"

~~RTO:~~ ~~30 minutes |~~ ~~RPO:~~ ~~24 hours (last snapshot)~~

~~Recovery Option B: Point-in-Time Recovery (PITR)~~

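# Restore to one hour ago. date -d is GNU syntax; on macOS/BSD use: date -u -v-1H '+%Y-%m-%dT%H:%M:%SZ'
# To restore to the most recent recoverable point, replace --restore-time with --use-latest-restorable-time.
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier drop-db \
  --target-db-instance-identifier drop-db-pitr \
  --restore-time "$(date -u -d '1 hour ago' '+%Y-%m-%dT%H:%M:%SZ')" \
  --db-instance-class db.t4g.micro \
  --region eu-west-1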
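
The restore timestamp in the command above depends on GNU date. A small wrapper can produce the same ISO-8601 UTC value on both GNU and BSD/macOS systems. This is a sketch under that assumption only; the function name `restore_time_utc` is hypothetical and not part of the runbook.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the UTC timestamp N minutes ago in the format
# expected by --restore-time (for example 2026-02-23T10:15:00Z).
restore_time_utc() {
  local minutes_ago="$1"
  if date -u -d "@0" '+%s' >/dev/null 2>&1; then
    # GNU date understands relative expressions via -d
    date -u -d "${minutes_ago} minutes ago" '+%Y-%m-%dT%H:%M:%SZ'
  else
    # BSD/macOS date adjusts the current time via -v
    date -u -v "-${minutes_ago}M" '+%Y-%m-%dT%H:%M:%SZ'
  fi
}
```

It could then be used as `--restore-time "$(restore_time_utc 60)"`. The chosen time still has to fall inside the instance's PITR retention window.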

~~RTO:~~ ~~30 minutes |~~ ~~RPO:~~ ~~5 minutes (PITR granularity)~~

Scenario 3: Data Corruption

~~Symptoms:~~

~~Application reports data inconsistencies~~

~~User-reported missing or incorrect transactions~~

~~Audit log shows unexpected DELETE/UPDATE operations~~

~~Investigation:~~

# Check for soft-deleted users
psql -h drop-db.czu2qe4quy4v.eu-west-1.rds.amazonaws.com -U dropuser -d dropapp \
  -c "SELECT COUNT(*) FROM users WHERE deleted_at IS NOT NULL;"

# Check recent suspicious audit log entries
psql -h drop-db.czu2qe4quy4v.eu-west-1.rds.amazonaws.com -U dropuser -d dropapp \
  -c "SELECT * FROM audit_log WHERE action IN ('DELETE','UPDATE') ORDER BY timestamp DESC LIMIT 50;"

~~Recovery:~~ ~~Selective restore from clean snapshot (see Scenario 2 recovery steps) + merge affected tables.~~

~~RTO:~~ ~~1–2 hours (selective) |~~ ~~RPO:~~ ~~Depends on snapshot age~~

Scenario 4: Full Region Outage (eu-west-1)

~~Current State:~~ ~~No automated cross-region failover. Manual failover to eu-north-1 (Stockholm) required.~~

~~Investigation:~~

~~Check AWS Health Dashboard: https://health.aws.amazon.com/health/status~~

~~Verify RDS snapshot accessibility from eu-west-1~~

~~Manual Failover to eu-north-1:~~

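# 1. Copy latest RDS snapshot to eu-north-1
# Note: this step needs the eu-west-1 RDS API to respond; in a full outage, use a snapshot already copied to eu-north-1.
LATEST=$(aws rds describe-db-snapshots --db-instance-identifier drop-db --region eu-west-1 \
  --query 'DBSnapshots[?SnapshotType==`automated`]|sort_by(@,&SnapshotCreateTime)[-1].DBSnapshotIdentifier' \
  --output text)
# Compute the copy name once so steps 1 and 2 agree even if the date changes mid-run
COPY_ID="drop-db-failover-$(date +%Y%m%d)"

# An encrypted snapshot also needs --kms-key-id for a key in eu-north-1
aws rds copy-db-snapshot \
  --source-db-snapshot-identifier "arn:aws:rds:eu-west-1:324480209768:snapshot:$LATEST" \
  --target-db-snapshot-identifier "$COPY_ID" \
  --source-region eu-west-1 \
  --region eu-north-1

aws rds wait db-snapshot-available --db-snapshot-identifier "$COPY_ID" --region eu-north-1

# 2. Restore RDS in eu-north-1
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier drop-db-failover \
  --db-snapshot-identifier "$COPY_ID" \
  --db-instance-class db.t4g.micro \
  --region eu-north-1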

# 3. Create ECR repository in eu-north-1 and push latest image
# 4. Create App Runner service in eu-north-1
# 5. Update DNS when getdrop.no is active

~~RTO:~~ ~~2–4 hours (manual) |~~ ~~RPO:~~ ~~Up to 24 hours (last snapshot)~~

Scenario 5: Security Incident

~~Symptoms:~~

~~Suspicious audit log entries~~

~~Unauthorized access attempts~~

~~AML alerts triggered for unusual activity~~

~~Sumsub KYC bypass attempt~~

~~Investigation:~~

# Check audit log for recent suspicious activity
psql -h drop-db.czu2qe4quy4v.eu-west-1.rds.amazonaws.com -U dropuser -d dropapp \
  -c "SELECT * FROM audit_log WHERE timestamp > NOW() - INTERVAL '24 hours' ORDER BY timestamp DESC;"

# Check AML alerts
psql -h drop-db.czu2qe4quy4v.eu-west-1.rds.amazonaws.com -U dropuser -d dropapp \
  -c "SELECT * FROM aml_alerts WHERE status = 'open' OR created_at > NOW() - INTERVAL '24 hours';"

# Check CloudTrail for AWS API activity
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=ResourceName,AttributeValue=drop-db \
  --region eu-west-1 --max-results 50

~~Containment:~~

# 1. Revoke compromised sessions immediately
psql -h drop-db.czu2qe4quy4v.eu-west-1.rds.amazonaws.com -U dropuser -d dropapp \
  -c "UPDATE sessions SET revoked = 1 WHERE user_id IN (SELECT user_id FROM aml_alerts WHERE status = 'open');"

# 2. Disable affected users
psql -h drop-db.czu2qe4quy4v.eu-west-1.rds.amazonaws.com -U dropuser -d dropapp \
  -c "UPDATE users SET kyc_status = 'rejected' WHERE id IN (SELECT user_id FROM aml_alerts WHERE severity = 'critical');"

# 3. Rotate database credentials
aws rds modify-db-instance \
  --db-instance-identifier drop-db \
  --master-user-password <new-password> \
  --apply-immediately --region eu-west-1

# 4. Take forensic snapshot
aws rds create-db-snapshot \
  --db-instance-identifier drop-db \
  --db-snapshot-identifier drop-db-incident-$(date +%Y%m%d-%H%M) \
  --region eu-west-1

# 5. Rotate JWT_SECRET (invalidates all sessions)
# Generate: openssl rand -base64 48
# Update in AWS Secrets Manager + redeploy App Runner

~~Post-containment:~~

~~Analyze audit logs — identify scope of breach~~

~~File STR (Suspicious Transaction Report) if financial crime suspected~~

~~Notify Finanstilsynet if user PII compromised (GDPR requirement, 72-hour window)~~

~~User communication if required by GDPR Art. 34~~

~~RTO:~~ ~~Immediate containment (session revocation) / 24–48 hours full investigation~~

7. RTO / RPO Summary

~~Scenario~~Component	~~RTO~~Automatic Failover	~~RPO~~Mechanism	Failover Time
~~App Runner restart~~Database (Multi-AZ)	~~5–10 minutes~~Yes	~~0 (no data loss)~~RDS automatic failover	60-120 seconds
~~App Runner rollback~~Load balancer	~~15–20 minutes~~Yes	~~0 (no data loss)~~Health check → route to healthy targets	< 30 seconds
~~RDS snapshot restore~~CDN	~~30 minutes~~Yes	~~24 hours (daily snapshot)~~Origin health checks	< 60 seconds
~~RDS PITR restore~~Redis (if clustered)	~~30 minutes~~Yes	~~5 minutes~~Redis Sentinel / ElastiCache	< 30 seconds

Monitoring automatic failover:

Alert fires: MultiAZFailover CloudWatch event or equivalent

On-call notified immediately

No manual action required, but on-call must confirm recovery

5.2 Manual Failover Steps

Prerequisite: Automatic failover has NOT occurred or has failed.

Database Manual Failover (Tier 1)

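Confirm primary is unavailable: pg_isready -h {{DB_PRIMARY_HOST}} -p 5432 should report "no response" (ICMP ping is often blocked on managed databases, so a ping timeout alone is not conclusive)

Connect to standby: psql -h {{STANDBY_HOST}}

Promote standby to primary: SELECT pg_promote(); (available in PostgreSQL 12 and later)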

Update DNS record db.{{INTERNAL_DOMAIN}} → {{STANDBY_HOST}}

DNS TTL: Ensure TTL was set to 60s pre-incident (if not, wait {{DNS_TTL}} seconds)

Verify applications are reconnecting: Check application logs for successful DB connections

Page on-call to verify all services healthy
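
Promoting a standby while the primary is still alive risks a split-brain, so the first check in the list above is worth scripting. The sketch below relies on the documented pg_isready exit codes (0 accepting, 1 rejecting, 2 no response, 3 no attempt made). The function names `primary_is_down` and `check_primary` and the 5-second timeout are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: only exit code 2 ("no response") counts as proof
# that the primary is down; every other code blocks promotion.
primary_is_down() {
  [ "$1" -eq 2 ]
}

# Probe the primary and report whether manual promotion may proceed.
# A missing pg_isready binary (exit 127) is treated as inconclusive.
check_primary() {
  local host="$1" port="${2:-5432}" rc=0
  pg_isready -h "$host" -p "$port" -t 5 >/dev/null 2>&1 || rc=$?
  if primary_is_down "$rc"; then
    echo "primary unreachable: promotion may proceed"
  else
    echo "primary responded or check was inconclusive (rc=$rc): do not promote"
    return 1
  fi
}
```

The check is deliberately conservative: anything other than a clean "no response" stops the procedure and leaves the decision to the on-call engineer.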

Regional Failover (Catastrophic)

Declare DR event (approval from {{DR_AUTHORITY}})

Confirm primary region {{PRIMARY_REGION}} is unreachable

Activate standby in {{DR_REGION}}: terraform apply -var-file=envs/dr.tfvars

Restore database from latest cross-region snapshot

Update Route 53 / DNS to point to {{DR_REGION}} endpoints

Run smoke tests: bash scripts/smoke-tests.sh {{DR_REGION}}

Notify stakeholders (see Communication Plan)

Monitor enhanced metrics for {{MONITOR_PERIOD}}h
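
The regional failover steps above are easier to rehearse if they can be walked through without touching infrastructure. The following sketch wraps the two commands named in the list in a dry-run mode. The `run` and `regional_failover` functions and the `DRY_RUN` variable are hypothetical; the database restore and DNS update are left as comments because they depend on the environment.

```shell
#!/usr/bin/env bash
# Dry-run is the default; set DRY_RUN=0 to execute the commands for real.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf 'DRY-RUN: %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the scripted parts of the regional failover.
regional_failover() {
  local dr_region="$1"
  run terraform apply -var-file=envs/dr.tfvars || return 1
  # Restore the database from the latest cross-region snapshot here (manual step)
  # Update Route 53 / DNS to the DR region endpoints here (manual step)
  run bash scripts/smoke-tests.sh "$dr_region" || return 1
}
```

Running `regional_failover eu-north-1` with the default settings only prints the two commands, which makes it usable in a tabletop exercise. The region name is an example, not a value defined by this template.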

6. Recovery Procedures Per Service

Tier 1 Services

Service	Recovery Procedure	Recovery Script	Est. Time
{{SERVICE_1}}	1. Restore from snapshot 2. Verify config 3. Run smoke tests	`scripts/restore-{{SERVICE_1}}.sh`	{{TIME}}min
Authentication	1. Deploy from last known good image 2. Verify JWT keys 3. Test login flow	`scripts/restore-auth.sh`	{{TIME}}min

Tier 2 Services

Tier 3 Services

7. DR Drill Schedule & Scenarios

Drill Type	Frequency	Participants	Last Executed	Next Scheduled
Tabletop exercise	Quarterly	On-call team + engineering lead	{{DATE}}	{{DATE}}
Database failover test	Quarterly	DevOps + one developer	{{DATE}}	{{DATE}}
~~Full region failover (eu-west-1), 2–4 hours, 24 hours~~Full DR failover	Bi-annually	Entire engineering team	{{DATE}}	{{DATE}}
~~Security incident containment, Immediate (session revocation), 0 (logs preserved)~~Backup restore test	Monthly	DevOps	{{DATE}}	{{DATE}}

Drill Scenarios to Cover:

Database primary failure (automatic failover test)

Accidental data deletion (point-in-time restore)

Single AZ outage (multi-AZ failover)

Full region failure (cross-region DR)

Ransomware/data corruption (restore from offline backup)

CDN outage (origin fallback)

Secret store unavailable (cached credentials)

8. ~~Contacts~~Communication Plan During DR Event

Internal Communications

Audience	Channel	Frequency	Owner
Engineering team	Slack #incidents + war room call	Real-time	Incident commander
Engineering management	Direct message	At declaration + hourly	Incident commander
Product/Business leadership	Email + Slack	At declaration + hourly	Incident commander
Customer support	Dedicated Slack channel	At declaration + 30 min	Support lead

External Communications

Audience	Channel	Trigger	Message
Customers	Status page ({{STATUS_PAGE}})	Within 15 min of confirmed incident	"We are investigating an issue"
Customers	Status page update	Every 30 min	Progress update
Customers	Email	If impact > {{EMAIL_THRESHOLD}}h	Direct notification
SLA customers	Direct contact	Per SLA contract	As contractually required

Communication templates: See go-live-runbook.md communication section

9. War Room Setup

War Room: {{WAR_ROOM_LINK}}
Bridge Line: {{BRIDGE_NUMBER}}
Document: Live incident doc created at: {{INCIDENT_DOC_TEMPLATE}}

Roles during DR event:

Role	~~Name~~Responsibility	~~Contact~~Primary	Backup
~~Primary Incident Owner~~Incident Commander	~~Alem Bašić (CEO)~~Coordinates response, final decisions	~~[email protected] / +47 40 47 42 51~~{{IC}}	{{IC_BACKUP}}
~~AI Operations~~Technical Lead	~~John (AI Director)~~Leads technical recovery	~~Slack #drop-alerts~~{{TECH_LEAD}}	{{TECH_BACKUP}}
~~AWS Support~~Communications Lead	~~AWS~~Internal/external updates	~~Premium support via AWS Console~~{{COMMS_LEAD}}	{{COMMS_BACKUP}}
~~Fly.io Support (staging)~~Scribe	~~Fly.io~~Documents timeline, actions taken	~~[email protected]~~{{SCRIBE}}	Rotate
~~Sumsub Support	Sumsub	[email protected]~~
~~BankID Support	Vipps MobilePay (BankID operator)	Per contract~~

~~9. Runbook Maintenance~~10. Post-Recovery Verification Checklist

Review Schedule

Test Schedule

~~Q2 2026: Full DR drill (App Runner restart + RDS snapshot restore to temp instance)~~

~~Quarterly: App Runner rollback test~~

~~Monthly: Verify automated RDS snapshot creation~~

~~Change Log~~11. DR Test Results Log

Date	~~Change~~Test Type	~~Author~~Scenario	RTO Achieved	RPO Achieved	Issues Found	Resolved By
~~2026-02-23~~{{DATE}}	~~Initial version from DR runbook + infra analysis~~{{TYPE}}	~~Platform Architect (AI)~~{{SCENARIO}}	{{RTO}}	{{RPO}}	{{ISSUES}}	{{RESOLVED}}

~~Deployment Architecture~~

Post-Mortem

Approval

Role	Name	Date
Author	~~Platform Architect (AI)~~	~~2026-02-23~~
Reviewer
Approver	~~Alem Bašić~~
