Operational Runbook

Project: Drop
Version: 0.1.0
Date: 2026-02-23
Author: Platform Architect (AI)
Status: In Review
Reviewers: Alem Bašić (CEO)

Document History

Version | Date | Author | Changes
0.1 | 2026-02-23 | Platform Architect (AI) | Initial draft covering day-to-day Drop operations

1. Overview

This runbook covers day-to-day operations of Drop's production environment. Drop runs on AWS App Runner (eu-west-1) with RDS PostgreSQL.

Primary operations contact: Alem Bašić, [email protected] / +47 40 47 42 51
AI operations: John (AI Director), Slack #drop-alerts


2. Quick Reference

Production Infrastructure

Component | Identifier
App Runner service | arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec
App Runner URL | https://9ef3szvvsb.eu-west-1.awsapprunner.com
RDS instance | drop-db
RDS endpoint | drop-db.czu2qe4quy4v.eu-west-1.rds.amazonaws.com:5432
ECR repository | 324480209768.dkr.ecr.eu-west-1.amazonaws.com/drop-web
Staging | https://drop-staging.fly.dev
Status page | https://drop-status.betteruptime.com
Slack alerts | #drop-ops on alai-talk.slack.com

Quick Health Check

# Application health (production)
curl -s https://getdrop.no/api/health | jq

# App Runner status
aws apprunner describe-service \
  --service-arn arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec \
  --query 'Service.Status' --output text --region eu-west-1

# RDS status
aws rds describe-db-instances \
  --db-instance-identifier drop-db \
  --query 'DBInstances[0].DBInstanceStatus' --output text --region eu-west-1

# Live App Runner logs
aws logs tail /aws/apprunner/drop-web/8e45b0d335304487a1880f4e32d6aeec/application \
  --follow --region eu-west-1

3. Routine Operations

3.1 Daily Checks

  • BetterStack: all 3 monitors green (health, landing, US east)
  • Slack #drop-ops: no unresolved critical alerts from last 24h
  • App Runner service status: RUNNING
  • RDS snapshot from last night: exists and < 24h old

# Verify last RDS snapshot
aws rds describe-db-snapshots \
  --db-instance-identifier drop-db --region eu-west-1 \
  --query 'DBSnapshots[?SnapshotType==`automated`]|sort_by(@,&SnapshotCreateTime)[-1].{id:DBSnapshotIdentifier,time:SnapshotCreateTime}' \
  --output table
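The query above lists the newest automated snapshot but leaves the "< 24h old" judgement to the reader. A minimal sketch of that comparison, assuming GNU `date` (which the weekly storage command already relies on); the function name `snapshot_is_fresh` is hypothetical:

```shell
#!/usr/bin/env bash
# Return 0 if the ISO-8601 timestamp is younger than MAX_AGE_HOURS (default 24),
# 1 if it is older, 2 if the timestamp cannot be parsed. Requires GNU date.
snapshot_is_fresh() {
  local created=$1 max_age_hours=${2:-24}
  local created_epoch now_epoch
  created_epoch=$(date -u -d "$created" +%s) || return 2
  now_epoch=$(date -u +%s)
  (( now_epoch - created_epoch < max_age_hours * 3600 ))
}

# Hypothetical wiring:
# latest=$(aws rds describe-db-snapshots --db-instance-identifier drop-db \
#   --snapshot-type automated --region eu-west-1 \
#   --query 'sort_by(DBSnapshots,&SnapshotCreateTime)[-1].SnapshotCreateTime' --output text)
# snapshot_is_fresh "$latest" || echo "ALERT: latest automated snapshot is older than 24h"
```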

3.2 Weekly Checks

  • Review CloudWatch logs for recurring error patterns
  • Check RDS free storage space (alert if < 2GB)
  • Review AML alerts table for any open cases
  • Review pending KYC applicants (stuck in pending status > 24h)
  • Check ECR — clean up untagged images manually if lifecycle policy hasn't run

# Check RDS storage
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name FreeStorageSpace \
  --dimensions Name=DBInstanceIdentifier,Value=drop-db \
  --start-time $(date -u -d '1 hour ago' --iso-8601=seconds) \
  --end-time $(date -u --iso-8601=seconds) \
  --period 3600 \
  --statistics Average \
  --region eu-west-1

# Check pending KYC (connect to RDS first via bastion or VPN)
psql -h drop-db.czu2qe4quy4v.eu-west-1.rds.amazonaws.com -U dropuser -d dropapp \
  -c "SELECT id, email, kyc_status, created_at FROM users WHERE kyc_status = 'pending' ORDER BY created_at ASC;"
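FreeStorageSpace is reported in bytes, so the "< 2GB" rule needs a unit conversion. The sketch below assumes 2GB means 2 GiB and uses `awk` because CloudWatch averages can come back as decimals or in exponent notation; `storage_below_threshold` and `bytes_to_gib` are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Assumption: the "2GB" alert threshold is 2 GiB. FreeStorageSpace is in bytes.
FREE_STORAGE_THRESHOLD_BYTES=$((2 * 1024 * 1024 * 1024))

# Return 0 (alert) if free space is below the threshold, 1 if not,
# 2 if the value is not numeric (e.g. "None" when no datapoint was returned).
storage_below_threshold() {
  local free_bytes=$1
  [[ $free_bytes =~ ^[0-9.eE+-]+$ ]] || return 2
  awk -v b="$free_bytes" -v t="$FREE_STORAGE_THRESHOLD_BYTES" 'BEGIN { exit !(b < t) }'
}

# Convert bytes to GiB with one decimal place, for the weekly notes.
bytes_to_gib() {
  awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (1024 * 1024 * 1024) }'
}

# Hypothetical wiring: append --query 'Datapoints[0].Average' --output text
# to the get-metric-statistics command above and pass the result in.
```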

3.3 Monthly Checks

  • Review SLA report (uptime, error rate, p99 latency)
  • Test BetterStack alerts (pause monitor → verify escalation fires → resume)
  • Verify RDS snapshot restore works (restore to temp instance, verify data, delete)
  • Review secret rotation schedule — anything due?
  • Review STR reports table — any pending filings?
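The monthly restore test can be scripted. This is a sketch, not a tested procedure: it prints the commands unless `DRY_RUN=0` is set, uses a hypothetical throwaway instance name `drop-db-restore-test`, and leaves the data verification to the operator. Without `--db-subnet-group-name` and `--vpc-security-group-ids`, RDS may restore into the default VPC, so those flags would likely need production's values:

```shell
#!/usr/bin/env bash
# Monthly restore drill (sketch). Prints each command by default; DRY_RUN=0 executes them.
run() {
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: restore_drill <snapshot-id> [temp-instance-id]
restore_drill() {
  local snapshot_id=$1
  local temp_id=${2:-drop-db-restore-test}   # hypothetical throwaway instance name
  # Add --db-subnet-group-name / --vpc-security-group-ids matching production if needed.
  run aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier "$temp_id" \
    --db-snapshot-identifier "$snapshot_id" \
    --no-publicly-accessible \
    --region eu-west-1
  run aws rds wait db-instance-available \
    --db-instance-identifier "$temp_id" --region eu-west-1
  # Verify data here (e.g. psql against the temp instance endpoint, compare row counts).
  run aws rds delete-db-instance \
    --db-instance-identifier "$temp_id" --skip-final-snapshot --region eu-west-1
}
```

The snapshot identifier can be taken from the daily snapshot check in Section 3.1.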

4. Deployment Procedure

4.1 Standard Deployment (App Runner)

# 1. Ensure all CI checks pass on main branch
# 2. Build and push new Docker image to ECR, tagged with the short commit SHA
IMAGE_TAG=$(git rev-parse --short HEAD)
ECR_REPO=324480209768.dkr.ecr.eu-west-1.amazonaws.com/drop-web
# --platform produces an x86-64 image even when building on an ARM machine (e.g. Apple Silicon)
docker build --platform linux/amd64 -t drop-app .
docker tag drop-app:latest "$ECR_REPO:$IMAGE_TAG"
aws ecr get-login-password --region eu-west-1 | \
  docker login --username AWS --password-stdin 324480209768.dkr.ecr.eu-west-1.amazonaws.com
docker push "$ECR_REPO:$IMAGE_TAG"

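# 3. Create pre-deployment RDS snapshot and wait until it completes
SNAPSHOT_ID=drop-db-pre-deploy-$(date +%Y%m%d-%H%M)
aws rds create-db-snapshot \
  --db-instance-identifier drop-db \
  --db-snapshot-identifier "$SNAPSHOT_ID" \
  --region eu-west-1
aws rds wait db-snapshot-completed --db-snapshot-identifier "$SNAPSHOT_ID" --region eu-west-1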

# 4. Create BetterStack maintenance window (prevents false alerts)
# Go to BetterStack → Maintenance Windows → Create Window (30 min)

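# 5. Trigger App Runner deployment
# Note: start-deployment redeploys whatever image tag the service is configured with.
# Make sure that tag refers to the image pushed in step 2 (point the service at
# drop-web:$IMAGE_TAG in the console, or push the tag the service tracks).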
aws apprunner start-deployment \
  --service-arn arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec \
  --region eu-west-1

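# 6. Monitor deployment status (OPERATION_IN_PROGRESS while deploying)
aws apprunner describe-service \
  --service-arn arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec \
  --query 'Service.Status' --output text --region eu-west-1
# Wait for RUNNING. The service can also return to RUNNING after a failed deployment
# is rolled back, so confirm the most recent operation reports SUCCEEDED:
aws apprunner list-operations \
  --service-arn arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec \
  --max-results 1 --query 'OperationSummaryList[0].Status' --output text --region eu-west-1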

# 7. Verify health
curl -s https://getdrop.no/api/health | jq

# 8. Close BetterStack maintenance window

Typical deployment time: 3–5 minutes
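Step 6 asks the operator to poll until the service reports RUNNING. A small polling helper, sketched on the assumption that the App Runner service statuses are those in its API (RUNNING, OPERATION_IN_PROGRESS, CREATE_FAILED, PAUSED and so on); `wait_for_status`, `apprunner_status` and `SERVICE_ARN` are hypothetical names:

```shell
#!/usr/bin/env bash
# Poll a status command until it reports a settled state.
# Usage: wait_for_status <max_attempts> <interval_seconds> <command> [args...]
# Prints the final status; returns 0 only if it is RUNNING.
wait_for_status() {
  local max_attempts=$1 interval=$2
  shift 2
  local attempt status
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    status=$("$@") || status="ERROR"
    case "$status" in
      RUNNING)
        echo "$status"
        return 0
        ;;
      OPERATION_IN_PROGRESS)
        sleep "$interval"
        ;;
      *)
        # CREATE_FAILED, PAUSED, DELETED, CLI errors, etc.
        echo "$status"
        return 1
        ;;
    esac
  done
  echo "TIMEOUT"
  return 1
}

# Usage with the status command from step 6:
# apprunner_status() {
#   aws apprunner describe-service --service-arn "$SERVICE_ARN" \
#     --query 'Service.Status' --output text --region eu-west-1
# }
# wait_for_status 40 15 apprunner_status   # 40 x 15 s, about 10 minutes
```

A RUNNING result only shows the service has settled; because App Runner can also return to RUNNING after rolling back a failed deployment, it does not by itself prove the new image is live.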

4.2 Staging Deployment (Fly.io)

# Deploy to Fly.io staging
cd src/drop-app
fly deploy --app drop-staging

# Verify staging health
curl -s https://drop-staging.fly.dev/api/health | jq

4.3 Emergency Rollback

# Identify the previous tagged ECR image
# (second most recent push, assuming the failed release is the latest push)
aws ecr describe-images --repository-name drop-web --region eu-west-1 \
  --filter tagStatus=TAGGED \
  --query 'sort_by(imageDetails,&imagePushedAt)[-2].imageTags[0]' --output text

# Point App Runner at that image tag via the console,
# then trigger deployment:
aws apprunner start-deployment \
  --service-arn arn:aws:apprunner:eu-west-1:324480209768:service/drop-web/8e45b0d335304487a1880f4e32d6aeec \
  --region eu-west-1

5. Secret Rotation

5.1 Rotate JWT_SECRET

Impact: All active user sessions are invalidated as soon as the new secret is deployed, so every logged-in user is logged out.

# 1. Generate new secret
NEW_SECRET=$(openssl rand -base64 48)

# 2. Update in AWS Secrets Manager
aws secretsmanager update-secret \
  --secret-id drop/production/jwt-secret \
  --secret-string "$NEW_SECRET" \
  --region eu-west-1

# 3. Update App Runner environment variable (via console or CLI)
# Then trigger new deployment

# 4. Log rotation in audit_log
psql -h drop-db.czu2qe4quy4v.eu-west-1.rds.amazonaws.com -U dropuser -d dropapp \
  -c "INSERT INTO audit_log (id, action, resource_type, resource_id, details) VALUES (gen_random_uuid(), 'secret_rotated', 'secret', 'JWT_SECRET', '{\"rotated_at\": \"$(date -u --iso-8601=seconds)\"}');"

5.2 Rotate Database Password

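# 1. Generate new password
#    (hex output avoids '/', which base64 can emit and RDS rejects in master passwords;
#    it also needs no URL-encoding inside DATABASE_URL)
NEW_PASS=$(openssl rand -hex 32)

# 2. Update RDS master password
#    New connections using the old password fail from here until step 4 completes,
#    so run steps 2-4 back to back.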
aws rds modify-db-instance \
  --db-instance-identifier drop-db \
  --master-user-password "$NEW_PASS" \
  --apply-immediately \
  --region eu-west-1

# 3. Update DATABASE_URL in Secrets Manager with new password
# 4. Trigger App Runner redeployment to pick up new DATABASE_URL
# 5. Verify health: curl https://getdrop.no/api/health
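As the RDS documentation for PostgreSQL describes them, master passwords must be 8 to 128 printable ASCII characters and must not contain `/`, `'`, `"`, `@` or a space; check this against the current AWS docs before relying on it. A sketch of a generator plus a pre-flight check (hypothetical function names); the generator reads 20 random bytes and prints them as 40 hex characters, which can also go into DATABASE_URL without URL-encoding:

```shell
#!/usr/bin/env bash
# Generate a 40-character lowercase-hex password from 20 random bytes.
gen_db_password() {
  od -An -tx1 -N20 /dev/urandom | tr -d ' \n'
}

# Return 0 if the password meets the RDS PostgreSQL master-password rules
# as summarised above (8-128 characters, none of / ' " @ or space).
is_rds_safe_password() {
  local p=$1
  (( ${#p} >= 8 && ${#p} <= 128 )) || return 1
  ! printf '%s' "$p" | grep -q "[/@\"' ]"
}

# Example:
# NEW_PASS=$(gen_db_password)
# is_rds_safe_password "$NEW_PASS" || { echo "refusing unsafe password" >&2; exit 1; }
```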

6. Database Operations

6.1 Connect to Production Database

Note: RDS must be accessible — either via VPN, bastion host, or AWS Systems Manager Session Manager.

psql -h drop-db.czu2qe4quy4v.eu-west-1.rds.amazonaws.com \
     -U dropuser \
     -d dropapp \
     -c "SELECT 1;"
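Instead of repeating the connection flags on every `psql` call, the standard libpq environment variables can be set once per shell. A sketch; `PGSSLMODE=require` and the 10-second connect timeout are suggested values rather than settings taken from this runbook, and if the database is reached through an SSM or bastion port-forward, `PGHOST`/`PGPORT` would point at the local end of the tunnel instead:

```shell
#!/usr/bin/env bash
# Standard libpq environment variables; psql reads them, so -h/-U/-d can be omitted.
export PGHOST=drop-db.czu2qe4quy4v.eu-west-1.rds.amazonaws.com
export PGPORT=5432
export PGUSER=dropuser
export PGDATABASE=dropapp
export PGSSLMODE=require      # encrypt the connection to RDS
export PGCONNECT_TIMEOUT=10   # fail fast if the VPN/bastion/SSM path is not up
# Password: prefer ~/.pgpass or the interactive prompt over exporting PGPASSWORD.

# Connectivity check:
# psql -c "SELECT 1;"
```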

6.2 User Management Queries

-- Check user KYC status
SELECT id, email, kyc_status, auth_provider, created_at
FROM users WHERE email = '[email protected]';

-- List pending KYC users (> 24h)
SELECT id, email, kyc_status, created_at FROM users
WHERE kyc_status = 'pending'
  AND created_at < NOW() - INTERVAL '24 hours'
ORDER BY created_at ASC;

-- Revoke all sessions for a user (emergency)
UPDATE sessions SET revoked = 1
WHERE user_id = 'usr_...' AND revoked = 0;

-- Soft-delete user (GDPR erasure); run both statements in one transaction
BEGIN;
UPDATE users SET deleted_at = NOW() WHERE id = 'usr_...';
UPDATE sessions SET revoked = 1 WHERE user_id = 'usr_...';
COMMIT;

6.3 Transaction Queries

-- Recent transactions (last 24h)
SELECT id, type, status, send_amount, send_currency, created_at
FROM transactions
WHERE created_at > NOW() - INTERVAL '24 hours'
ORDER BY created_at DESC LIMIT 50;

-- Failed transactions (may need investigation)
SELECT t.*, u.email FROM transactions t
JOIN users u ON t.user_id = u.id
WHERE t.status = 'failed'
  AND t.created_at > NOW() - INTERVAL '7 days'
ORDER BY t.created_at DESC;

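-- AML: large transactions (> NOK 50,000)
-- send_amount is compared as stored; filter on send_currency = 'NOK' (or convert) if other currencies occur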
SELECT * FROM transactions
WHERE send_amount > 50000
  AND created_at > NOW() - INTERVAL '30 days'
ORDER BY send_amount DESC;

6.4 Manual RDS Snapshot

# Create manual snapshot before risky operations
# (store the identifier once so the wait command targets the same snapshot)
SNAPSHOT_ID=drop-db-manual-$(date +%Y%m%d-%H%M)
aws rds create-db-snapshot \
  --db-instance-identifier drop-db \
  --db-snapshot-identifier "$SNAPSHOT_ID" \
  --region eu-west-1

# Wait for snapshot to complete
aws rds wait db-snapshot-completed \
  --db-snapshot-identifier "$SNAPSHOT_ID" \
  --region eu-west-1
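RDS rejects snapshot identifiers that break its naming rules: 1 to 255 letters, digits or hyphens, starting with a letter, with no trailing or doubled hyphen. A hypothetical helper pair that builds identifiers in the `drop-db-<kind>-YYYYMMDD-HHMM` pattern used above (in UTC rather than local time) and validates them before calling the CLI:

```shell
#!/usr/bin/env bash
# Build a snapshot identifier such as drop-db-manual-20260223-1405 (UTC timestamp).
make_snapshot_id() {
  local kind=$1
  echo "drop-db-${kind}-$(date -u +%Y%m%d-%H%M)"
}

# Return 0 if the identifier satisfies the RDS naming rules listed above.
valid_snapshot_id() {
  local id=$1
  [[ $id =~ ^[A-Za-z][A-Za-z0-9-]{0,254}$ ]] || return 1
  [[ $id != *- && $id != *--* ]]
}

# Example:
# SNAPSHOT_ID=$(make_snapshot_id manual)
# valid_snapshot_id "$SNAPSHOT_ID" || { echo "bad snapshot id: $SNAPSHOT_ID" >&2; exit 1; }
```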

7. AML & Compliance Operations

7.1 AML Alert Review

-- View open AML alerts
SELECT a.*, u.email, t.send_amount, t.send_currency
FROM aml_alerts a
JOIN users u ON a.user_id = u.id
LEFT JOIN transactions t ON a.transaction_id = t.id
WHERE a.status = 'open'
ORDER BY a.created_at DESC;

-- Close an AML alert (after review)
UPDATE aml_alerts SET status = 'closed', reviewed_at = NOW(),
  reviewer_notes = 'Reviewed — legitimate transaction'
WHERE id = 'alert_...';

7.2 STR Filing

If financial crime is suspected:

-- File STR
INSERT INTO str_reports (
  id, user_id, transaction_id, report_type, details, filed_at, status
) VALUES (
  gen_random_uuid(), 'usr_...', 'tx_...', 'suspicious_transaction',
  '{"reason": "Unusual pattern", "amount": 50000}',
  NOW(), 'filed'
);

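Then submit the report to Økokrim's Financial Intelligence Unit (Enheten for finansiell etterretning), which receives suspicious transaction reports under hvitvaskingsloven; Finanstilsynet is the supervisory authority, not the recipient of STRs.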

7.3 GDPR Requests

Data export request:

-- User data is exported via /api/user/data-export endpoint
-- Check data_access_requests table
SELECT * FROM data_access_requests WHERE user_id = 'usr_...' ORDER BY created_at DESC;

Erasure request:

-- Account deletion (soft delete); run both statements in one transaction
BEGIN;
UPDATE users SET deleted_at = NOW() WHERE id = 'usr_...';
UPDATE sessions SET revoked = 1 WHERE user_id = 'usr_...';
COMMIT;
-- Note: data retained for 5 years per hvitvaskingsloven

8. Incident Response

8.1 Alert Triage

When a Slack alert fires in #drop-ops:

Alert | First Response | Escalation
Health check DOWN | Run the quick health check (Section 2) and check App Runner logs | After 5 min: restart App Runner by triggering a redeployment (aws apprunner start-deployment)
Error spike | Check CloudWatch logs for the error pattern | After 10 min: escalate to the primary operations contact
App startup/shutdown | Informational; no action unless unexpected | N/A

8.2 Common Issues

Issue: Health check returns 503 (DB unreachable)

# 1. Check RDS status
aws rds describe-db-instances --db-instance-identifier drop-db \
  --query 'DBInstances[0].DBInstanceStatus' --output text --region eu-west-1

# 2. If not 'available', wait for AWS to auto-recover or follow DR Scenario 2
# 3. Check connection string in App Runner environment
# 4. Restart App Runner by triggering a redeployment (aws apprunner start-deployment, as in Section 4.1 step 5)

Issue: BankID login failing

# Check App Runner logs for BankID errors
aws logs filter-log-events \
  --log-group-name /aws/apprunner/drop-web/8e45b0d335304487a1880f4e32d6aeec/application \
  --filter-pattern "BankID" --region eu-west-1

# Verify BankID environment variables are set
# Check BankID status: https://driftsstatus.vippsmobilepay.com/

Issue: KYC verification stuck in pending

# Check Sumsub dashboard for stuck applicants
# Or query:
psql -h drop-db.czu2qe4quy4v.eu-west-1.rds.amazonaws.com -U dropuser -d dropapp \
  -c "SELECT id, email, kyc_status FROM users WHERE kyc_status = 'pending' AND created_at < NOW() - INTERVAL '2 hours';"
# Force-process via Sumsub dashboard or API call

9. Monitoring Verification Commands

# 1. Full health check
curl -s https://getdrop.no/api/health | python3 -m json.tool

# 2. Database latency check
curl -s https://getdrop.no/api/health | jq '.data.checks.db.latencyMs'
# Alert if > 100ms

# 3. Check app version
curl -s https://getdrop.no/api/health | jq '.data.version'

# 4. Check uptime
curl -s https://getdrop.no/api/health | jq '.data.uptime'
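Check 2 says to alert when database latency exceeds 100 ms. A sketch of that comparison which also separates a missing value (for example `null` when the health payload lacks the field) from a healthy one; `db_latency_alert` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Exit codes: 0 = latency above threshold (alert), 1 = within threshold,
# 2 = value missing or not a plain number.
db_latency_alert() {
  local latency_ms=$1 threshold_ms=${2:-100}
  [[ $latency_ms =~ ^[0-9]+([.][0-9]+)?$ ]] || return 2
  awk -v l="$latency_ms" -v t="$threshold_ms" 'BEGIN { exit !(l > t) }'
}

# Example:
# latency=$(curl -s https://getdrop.no/api/health | jq -r '.data.checks.db.latencyMs')
# rc=0; db_latency_alert "$latency" || rc=$?
# case $rc in
#   0) echo "ALERT: DB latency ${latency}ms exceeds 100ms" ;;
#   2) echo "WARN: no DB latency in health response" ;;
# esac
```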


Approval

Role | Name | Date | Signature
Author | Platform Architect (AI) | 2026-02-23 |
Reviewer | | |
Approver | Alem Bašić | |