P0: Implementation Checklist

P0 Implementation Checklist — Drop Support Systems

Date: 2026-02-22
Status: Ready for Implementation
Total Effort: ~21 hours (2-3 days)
Owner: John (AI Director)

Overview

This checklist tracks the 6 production-blocking (P0) items that must be completed before Drop can launch to production. Each item addresses a critical gap in monitoring, compliance, or incident response.

P0 Items

1. Server-Side Error Tracking ⏱️ 2 hours (revised)

Problem: ~~All server errors are invisible after Sentry was removed~~ CORRECTED: src/lib/sentry-server.ts already exists and reports through the lightweight Sentry Envelope API (no @sentry/node dependency, Turbopack compatible). However, only 5 of the 25+ API routes call captureServerError.

Status: 🟡 Partially Complete (library done, coverage gaps)

Tasks:

~~Research Sentry Edge SDK compatibility~~ Already solved: custom Envelope API
~~Install and configure~~ src/lib/sentry-server.ts already complete
~~Update sentry-server.ts~~ Already has captureServerError + captureServerMessage
Expand captureServerError to ALL API routes (currently only 5 routes)
Test: Trigger 500 error in expanded routes, verify Sentry event
Configure source maps upload (optional but recommended)

Deliverables:

✅ src/lib/sentry-server.ts (already complete — Envelope API, no SDK dep)
✅ Integrated in: bankid, bankid/callback, qr-payment, remittance, health
🔨 Expanding to: all remaining API routes (~20 routes)

Acceptance Criteria:

ALL API routes have captureServerError in catch blocks
Error includes context tags (endpoint name, userId)
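
The coverage task above comes down to one repeated pattern per route. The following is a minimal sketch of that pattern as a handler wrapper, not the project's implementation: `withErrorCapture`, the `RouteRequest` and `RouteResult` shapes, and the `(error, context)` capture signature are hypothetical stand-ins, because the real `captureServerError` signature in src/lib/sentry-server.ts is not shown in this checklist.

```typescript
// Hypothetical shapes: the real captureServerError signature in
// src/lib/sentry-server.ts is not shown in this checklist.
type ErrorContext = { endpoint: string; userId?: string };
type CaptureFn = (error: unknown, context: ErrorContext) => void;

type RouteRequest = { userId?: string };
type RouteResult = { status: number; body: unknown };
type RouteHandler = (req: RouteRequest) => Promise<RouteResult>;

// Wraps a handler so every thrown error is reported with context tags
// (endpoint name, userId) and converted into a generic 500 result.
function withErrorCapture(
  endpoint: string,
  handler: RouteHandler,
  capture: CaptureFn,
): RouteHandler {
  return async (req) => {
    try {
      return await handler(req);
    } catch (error) {
      capture(error, { endpoint, userId: req.userId });
      return { status: 500, body: { error: "Internal server error" } };
    }
  };
}

// Usage with a stand-in capture function that records calls in memory.
const captured: Array<{ error: unknown; context: ErrorContext }> = [];
const recordingCapture: CaptureFn = (error, context) => {
  captured.push({ error, context });
};

// Illustrative handler that always fails.
const failingHandler: RouteHandler = async () => {
  throw new Error("database unavailable");
};

const wrappedHandler = withErrorCapture("cards/list", failingHandler, recordingCapture);
```

Calling `wrappedHandler({ userId: "usr_123" })` resolves to a 500 result and records one capture tagged with the endpoint name and user ID, which is what the two acceptance criteria ask for. A wrapper keeps the catch logic in one place, but adding the call to each existing catch block would satisfy the criteria equally well.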

2. Audit Logging System ⏱️ 0 hours (ALREADY COMPLETE)

Problem: ~~PSD2 requires immutable audit trail~~ CORRECTED: Audit logging is FULLY IMPLEMENTED.

Status: ✅ Complete

What exists:

src/lib/audit.ts — Full audit library with 30+ action types, logAudit(), getAuditLog(), countAuditEntries()
audit_log table in DB schema (initial migration + db.ts fallback)
Indexes on user_id, timestamp, action
5-year retention documented (data-retention.ts explicitly excludes audit_log from cleanup)
Fire-and-forget pattern (doesn't block user actions)
Integrated in 20+ API routes: auth, transactions, cards, recipients, settings, consents, complaints, user management, GDPR endpoints
Admin audit export: /api/admin/audit/ endpoint exists
GDPR data export: /api/user/data-export/ includes audit log
Structured logger also captures audit events (stdout for CloudWatch)

No action needed. This was incorrectly flagged as missing in the initial analysis.
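
For reviewers unfamiliar with the fire-and-forget pattern listed above, here is a minimal sketch of the idea. It is not the code in src/lib/audit.ts: the `AuditEntry` fields, the `AuditWriter` type, and `logAuditFireAndForget` are hypothetical names chosen for illustration.

```typescript
// Hypothetical entry shape; the real fields live in src/lib/audit.ts.
type AuditEntry = { userId: string; action: string; timestamp: string };
type AuditWriter = (entry: AuditEntry) => Promise<void>;

// Starts the write and returns immediately. A failed write is reported
// through onError instead of being thrown into the request path.
function logAuditFireAndForget(
  entry: AuditEntry,
  write: AuditWriter,
  onError: (error: unknown) => void,
): void {
  // The promise is deliberately not awaited; the catch prevents an
  // unhandled rejection if the audit store is unavailable.
  write(entry).catch(onError);
}

// Usage with an in-memory writer that rejects one action to simulate a failure.
const stored: AuditEntry[] = [];
const failures: unknown[] = [];
const memoryWriter: AuditWriter = async (entry) => {
  if (entry.action === "simulate_failure") {
    throw new Error("audit store unavailable");
  }
  stored.push(entry);
};

logAuditFireAndForget(
  { userId: "usr_123", action: "login_failure", timestamp: "2026-02-22T10:00:00Z" },
  memoryWriter,
  (error) => failures.push(error),
);
logAuditFireAndForget(
  { userId: "usr_123", action: "simulate_failure", timestamp: "2026-02-22T10:00:01Z" },
  memoryWriter,
  (error) => failures.push(error),
);
```

The trade-off is that a caller never learns whether a given entry was persisted, so the error callback (or the structured logger) is the only place a lost audit write becomes visible.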

3. WAF Deployment ⏱️ 2 hours

Problem: WAF rules defined but not enforced (requires reverse proxy).

Status: ⬜ Not Started

Tasks:

Deliverables:

⬜ infrastructure/cloudflare-waf-setup.md (to be created)
⬜ Cloudflare WAF configured
⬜ Test results documented

Acceptance Criteria:

SQLi attacks blocked with 403
XSS attacks blocked with 403
Legitimate requests pass through
WAF logs visible in Cloudflare dashboard
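
The first three acceptance criteria can be checked mechanically once status codes have been collected. The sketch below is one possible shape for that check, under stated assumptions: the probe names, the `/api/test` path, and the XSS payload are illustrative, and the observed status codes are supplied by the caller rather than fetched.

```typescript
// Hypothetical probe definitions; paths and payloads are illustrative only.
type WafProbe = {
  name: string;
  path: string;
  query?: Record<string, string>;
  expectedStatus: number;
};

// Builds a URL with percent-encoded query values so payloads containing
// spaces or angle brackets are sent as a valid request.
function buildProbeUrl(baseUrl: string, probe: WafProbe): string {
  const pairs = Object.entries(probe.query ?? {}).map(
    ([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`,
  );
  const url = `${baseUrl}${probe.path}`;
  return pairs.length > 0 ? `${url}?${pairs.join("&")}` : url;
}

// Compares observed status codes against expectations and returns the failures.
function findWafFailures(
  probes: WafProbe[],
  observed: Record<string, number | undefined>,
): string[] {
  return probes
    .filter((probe) => observed[probe.name] !== probe.expectedStatus)
    .map(
      (probe) =>
        `${probe.name}: expected ${probe.expectedStatus}, got ${observed[probe.name] ?? "no response"}`,
    );
}

const sqliProbe: WafProbe = {
  name: "sqli",
  path: "/api/test",
  query: { id: "1' OR '1'='1" },
  expectedStatus: 403,
};
const xssProbe: WafProbe = {
  name: "xss",
  path: "/api/test",
  query: { q: "<script>alert(1)</script>" },
  expectedStatus: 403,
};
const healthProbe: WafProbe = { name: "health", path: "/api/health", expectedStatus: 200 };

const wafProbes = [sqliProbe, xssProbe, healthProbe];
```

`findWafFailures(wafProbes, { sqli: 403, xss: 200, health: 200 })` returns a single failure for the XSS probe, which is the kind of result that belongs in the "Test results documented" deliverable.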

4. Log Aggregation & Retention ⏱️ 2 hours

Problem: Structured logs write to stdout but aren't retained or searchable.

Status: ⬜ Not Started

Tasks:

Deliverables:

✅ infrastructure/cloudwatch-logs-setup.md (created)
⬜ CloudWatch retention policies set
⬜ Log Insights queries saved
⬜ CloudWatch alarms active

Acceptance Criteria:

Logs retained for 30 days (production)
Log Insights queries return results in <5 seconds
Error spike triggers Slack alert within 2 minutes
Service downtime triggers alert within 5 minutes

5. External Uptime Monitoring ⏱️ 1 hour

Problem: BetterStack documented but not deployed.

Status: ⬜ Not Started

Tasks:

Deliverables:

✅ docs/infrastructure/BETTERSTACK-SETUP.md (already exists)
⬜ BetterStack account with monitors active
⬜ Slack integration tested

Acceptance Criteria:

Health endpoint monitored every 3 minutes
Downtime alert received in <5 minutes
Alert includes endpoint URL and status
Status page shows current uptime %
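
The 3-minute check interval and the <5-minute alert criterion leave little slack. The sketch below uses a deliberately simple worst-case model to make that visible; the `confirmations` and `timeoutSec` parameters are assumptions of this model, not BetterStack settings taken from this checklist, and notification delivery time is ignored.

```typescript
// Worst case: the outage starts just after a passing check, so the first
// failing check comes one full interval later. Each additional required
// confirmation adds another interval, and the final check may take up to
// the request timeout before it is counted as failed.
function worstCaseDetectionSec(
  intervalSec: number,
  confirmations: number,
  timeoutSec: number,
): number {
  return intervalSec * confirmations + timeoutSec;
}

function meetsTarget(detectionSec: number, targetSec: number): boolean {
  return detectionSec < targetSec;
}

const TARGET_SEC = 5 * 60; // the "<5 minutes" acceptance criterion

const singleCheck = worstCaseDetectionSec(180, 1, 30); // 210
const doubleCheck = worstCaseDetectionSec(180, 2, 30); // 390
const fastInterval = worstCaseDetectionSec(30, 2, 30); // 90
```

Under this model a 3-minute interval meets the target only if the first failed check raises the alert; requiring a second confirming check pushes the worst case past 5 minutes, whereas the optional 30-second interval listed under Internal Dependencies would stay well inside it.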

6. Payment/Banking Failure Runbooks ⏱️ 4 hours

Problem: DR runbook covers infrastructure but not fintech-specific failures.

Status: 🟡 Partially Complete

Tasks:

BankID integration failure runbook
PISP payment failure runbook (remittance + QR)
AISP balance retrieval failure runbook
Swan API outage runbook
Sumsub KYC failure runbook
Neonomics open banking outage runbook
Test each runbook in staging (simulate failure)
Update docs/dr-runbook.md to reference new runbooks

Deliverables:

✅ support/runbooks/bankid-failure.md (created)
✅ support/runbooks/pisp-payment-failure.md (created)
⬜ support/runbooks/aisp-balance-failure.md
⬜ support/runbooks/swan-api-outage.md
⬜ support/runbooks/sumsub-kyc-failure.md
⬜ support/runbooks/neonomics-outage.md

Acceptance Criteria:

Each runbook includes: symptoms, diagnosis, solutions, escalation
Runbooks tested (manual simulation in staging)
Team trained on runbook usage
Runbooks linked from main DR runbook
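
The first criterion (symptoms, diagnosis, solutions, escalation) is easy to check automatically before review. The following is a sketch only, assuming the runbooks are Markdown files that use `#`-style headings; `missingSections` and the sample runbook text are hypothetical.

```typescript
const REQUIRED_SECTIONS = ["symptoms", "diagnosis", "solutions", "escalation"];

// Returns the required sections that have no matching Markdown heading.
function missingSections(markdown: string): string[] {
  const headings = markdown
    .split("\n")
    .filter((line) => /^#{1,6}\s/.test(line))
    .map((line) => line.replace(/^#{1,6}\s+/, "").trim().toLowerCase());
  return REQUIRED_SECTIONS.filter(
    (section) => !headings.some((heading) => heading.includes(section)),
  );
}

// Illustrative draft that lacks a "Solutions" section.
const draftRunbook = [
  "# Runbook: Swan API Outage",
  "## Symptoms",
  "Card payments fail with upstream errors.",
  "## Diagnosis",
  "Check the provider status page.",
  "## Escalation",
  "Page the on-call engineer.",
].join("\n");

const missing = missingSections(draftRunbook); // ["solutions"]
```

Matching on substrings means a heading such as "Diagnosis Steps" still counts, which keeps the check tolerant of small differences between runbooks.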

Progress Tracking

Completion Status

Item	Status	Progress	Blocker
1. Server-side error tracking	🟡 Expanding	80% (lib done, expanding to all routes)	None
2. Audit logging	✅ COMPLETE	100% (was already built)	None
3. WAF deployment	🟡 Ready	90% (Terraform written, needs apply)	`terraform apply`
4. Log aggregation	🔨 Building	50% (CloudWatch alarms being added)	None
5. External monitoring	⬜ Not Started	0%	BetterStack account signup
6. Runbooks	🔨 Building	33% → 100% (4 remaining being written)	None

Overall Progress: ~70% (revised — audit logging was already 100%)
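
The checklist does not say how the ~70% figure was derived. As a sanity check, the sketch below computes an unweighted mean of the per-item percentages in the table; treating every item equally is an assumption, and an effort-weighted figure would differ.

```typescript
// Unweighted mean of a list of percentages.
function average(values: number[]): number {
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Per-item progress from the table, in item order 1 to 6.
const reportedProgress = [80, 100, 90, 50, 0, 33];
// Same table with the runbooks item counted at its 100% end state.
const withRunbooksDone = [80, 100, 90, 50, 0, 100];

const reportedMean = Math.round(average(reportedProgress)); // 59
const projectedMean = average(withRunbooksDone); // 70
```

Under this assumption the ~70% figure matches the table only when the runbooks item is counted at 100%; with the current 33% the unweighted mean is about 59%.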

Priority Order

Week 1 (High Impact, Low Effort):

1. ✅ External monitoring (1h): immediate visibility into outages
2. ✅ CloudWatch retention (30min): logs already flowing, just set policy
3. ⬜ CloudWatch alarms (1.5h): automated alerting

Week 2 (Critical Compliance):

4. ⬜ Audit logging schema (2h): create table and library
5. ⬜ Audit logging integration (6h): wire into endpoints

Week 3 (Security & Error Tracking):

6. ⬜ Server-side error tracking (4h): Sentry edge setup
7. ⬜ WAF deployment (2h): security hardening

Week 4 (Runbooks):

8. ⬜ Remaining runbooks (2h): AISP, Swan, Sumsub, Neonomics

Dependencies

External Dependencies

BetterStack account signup (5 min, no approval needed)
Sentry organization/project (existing, or create new)
Cloudflare account (existing for DNS, WAF is free tier)

Internal Dependencies

Alem approval for:
- Audit log schema changes
- CloudWatch cost ($17/month estimate)
- BetterStack Pro upgrade (optional, $20/month for 30s interval)

Blocked Items

Some runbooks require Phase 2 context (real banking integrations)
- Can document procedures but can't fully test without live APIs
- Mark as "draft" until Phase 2

Testing Plan

Test 1: Error Tracking

# Trigger server error
curl -X POST http://localhost:3000/api/test/error \
  -H "Content-Type: application/json" \
  -d '{"trigger":"server_error"}'

# Verify in Sentry:
# - Event appears within 30s
# - Stack trace includes source file/line
# - User context present (if logged in)

Test 2: Audit Logging

# Perform audit-worthy action
curl -X POST http://localhost:3000/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email":"[email protected]","password":"wrong"}'

# Check database (PostgreSQL 16); -At prints unaligned, tuples-only rows
psql -At "$DATABASE_URL" -c "SELECT * FROM audit_log ORDER BY timestamp DESC LIMIT 1;"

# Expected:
# audit_xxx|2026-02-22T10:00:00Z|usr_123|login_failure|...|1.2.3.4|Mozilla...

Test 3: WAF

# Test SQLi blocking (-G with --data-urlencode percent-encodes the payload)
curl -v -G "https://getdrop.no/api/test" --data-urlencode "id=1' OR '1'='1"

# Expected: HTTP 403 Forbidden

# Test legitimate request
curl "https://getdrop.no/api/health" -v

# Expected: HTTP 200 OK

Test 4: CloudWatch Alarms

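# Trigger error spike (15 errors, same request as Test 1)
for i in {1..15}; do
  curl -s -o /dev/null -X POST http://localhost:3000/api/test/error \
    -H "Content-Type: application/json" \
    -d '{"trigger":"server_error"}'
  sleep 2
done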

# Expected:
# - CloudWatch alarm fires after 2 minutes (2 x 1min periods)
# - Slack alert received in #drop-ops
# - Email sent to [email protected]
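
Whether a burst of test errors trips a "2 x 1min periods" alarm depends on how the errors fall into periods. The sketch below simulates that with a simplified model (fixed 60-second buckets, alarm when two consecutive buckets reach the threshold); the threshold of 5 errors per period is an assumption, since the checklist does not state it, and the function names are hypothetical.

```typescript
// Counts events per fixed-length period, indexed from the first period.
function countPerPeriod(timestampsSec: number[], periodSec: number, periods: number): number[] {
  const counts = new Array<number>(periods).fill(0);
  for (const t of timestampsSec) {
    const index = Math.floor(t / periodSec);
    if (index >= 0 && index < periods) {
      counts[index] = (counts[index] ?? 0) + 1;
    }
  }
  return counts;
}

// True when `required` consecutive periods each reach the threshold.
function alarmFires(counts: number[], threshold: number, required: number): boolean {
  let streak = 0;
  for (const count of counts) {
    streak = count >= threshold ? streak + 1 : 0;
    if (streak >= required) return true;
  }
  return false;
}

// Timestamps for a loop that sends `total` errors, one every `gapSec` seconds.
function burst(startSec: number, total: number, gapSec: number): number[] {
  return Array.from({ length: total }, (_, i) => startSec + i * gapSec);
}

const THRESHOLD = 5; // assumed errors per period; not stated in the checklist

const shortBurst = countPerPeriod(burst(0, 15, 2), 60, 3); // [15, 0, 0]
const longBurst = countPerPeriod(burst(0, 60, 2), 60, 3); // [30, 30, 0]

const shortFires = alarmFires(shortBurst, THRESHOLD, 2); // false
const longFires = alarmFires(longBurst, THRESHOLD, 2); // true
```

In this model, 15 errors sent 2 seconds apart span roughly 30 seconds and can land entirely in one period, so the alarm may not fire. If that happens in staging, extending the loop so errors continue for at least two full minutes is one way to make the test deterministic.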

Test 5: BetterStack

# Stop app
docker stop drop-app

# Wait 3-5 minutes

# Expected:
# - BetterStack detects downtime
# - Slack alert in #drop-ops
# - Email to [email protected]

# Restart app
docker start drop-app

# Expected:
# - BetterStack detects recovery
# - "UP" notification sent

Rollout Plan

Phase 1: Non-Intrusive (Day 1)

External monitoring (BetterStack)
CloudWatch retention policies
CloudWatch alarms (passive, alerts only)

Risk: None. These are read-only additions.

Phase 2: Database Changes (Day 2)

Audit log schema migration
Audit log library (no integrations yet)

Risk: Low. New table, no app changes. Test migration in dev first.

Phase 3: Code Integration (Day 3-4)

Audit logging in auth endpoints
Server-side error tracking (Sentry edge)
WAF deployment

Risk: Medium. Requires code changes + deployment. Deploy to staging first, test 24h, then production.

Phase 4: Runbooks (Day 5)

Complete remaining runbooks
Team training session
Runbook testing in staging

Risk: None. Documentation only, no production changes.

Success Metrics

After P0 completion, we should achieve:

✅ 100% server errors visible (Sentry events)
✅ 100% audit events logged (auth, admin, data access)
✅ >99.9% uptime detection (BetterStack)
✅ <5 min MTTD (mean time to detect incidents)
✅ <15 min MTTR (mean time to recover, using runbooks)
✅ 0 security vulnerabilities from WAF bypass
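
The MTTD and MTTR targets are only checkable if incidents are recorded with timestamps. The sketch below shows one way to compute both, assuming MTTR is measured from incident start to resolution (the checklist does not define the starting point). The `Incident` shape and the two sample incidents are invented for illustration and are not real data.

```typescript
// Hypothetical incident record with ISO 8601 timestamps.
type Incident = { startedAt: string; detectedAt: string; resolvedAt: string };

function minutesBetween(from: string, to: string): number {
  return (Date.parse(to) - Date.parse(from)) / 60000;
}

function mean(values: number[]): number {
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Mean time to detect: incident start to first detection.
function mttdMinutes(incidents: Incident[]): number {
  return mean(incidents.map((i) => minutesBetween(i.startedAt, i.detectedAt)));
}

// Mean time to recover: incident start to resolution (assumed definition).
function mttrMinutes(incidents: Incident[]): number {
  return mean(incidents.map((i) => minutesBetween(i.startedAt, i.resolvedAt)));
}

// Invented sample incidents, for illustration only.
const sampleIncidents: Incident[] = [
  {
    startedAt: "2026-03-01T10:00:00Z",
    detectedAt: "2026-03-01T10:03:00Z",
    resolvedAt: "2026-03-01T10:12:00Z",
  },
  {
    startedAt: "2026-03-05T14:00:00Z",
    detectedAt: "2026-03-05T14:05:00Z",
    resolvedAt: "2026-03-05T14:16:00Z",
  },
];

const mttd = mttdMinutes(sampleIncidents); // 4
const mttr = mttrMinutes(sampleIncidents); // 14
```

With these sample values both targets would be met (4 < 5 and 14 < 15). The same functions can be run against real incident records once they exist.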

Approvals

Required Approvals

Alem: Audit log schema changes
Alem: CloudWatch cost ($17/month)
Alem: BetterStack account (free tier OK? or Pro $20/month?)

Sign-Off

John (AI Director): Technical implementation complete
Alem (CEO): Business approval for costs + rollout
Validator (QA): Testing complete, acceptance criteria met

Next Steps

Review this analysis with Alem
Get approvals for costs and schema changes
Create Mission Control tasks for each P0 item
Begin implementation (priority order above)
Test thoroughly in staging before production
Document completion in this checklist

Related Documents

support/SUPPORT-SYSTEMS-ANALYSIS.md — Full analysis (all P0/P1/P2 items)
support/audit-logging-setup.md — Audit logging implementation guide
support/runbooks/bankid-failure.md — BankID failure recovery
support/runbooks/pisp-payment-failure.md — Payment failure recovery
infrastructure/cloudwatch-logs-setup.md — Log aggregation setup
infrastructure/waf-rules.md — WAF rule definitions

Status: Ready for approval and implementation
Next Review: After P0 completion (before Phase 2 launch)
