SLA Report
Project: Drop
Version: 0.1.0
Date: 2026-02-23
Author: Platform Architect (AI)
Status: In Review
Reviewers: Alem Bašić (CEO)
Document History
| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | 2026-02-23 | Platform Architect (AI) | Initial SLA targets and example monthly report (pre-launch) |
1. Overview
This document defines Drop's Service Level Agreements (SLAs), Service Level Objectives (SLOs), and Service Level Indicators (SLIs). It also includes an example monthly SLA report built from simulated post-launch data. Use it as a template for monthly operational reporting.
Reporting Period: February 2026 (example, pre-launch)
Environment: Production (AWS App Runner eu-west-1 + RDS PostgreSQL)
Report Author: Alem Bašić
Distribution: Internal (Alem Bašić, Platform Architect)
2. SLA Definitions
2.1 Service Level Targets
| Metric | SLO Target | SLA Commitment | Measurement Window |
|---|---|---|---|
| Availability | 99.95% | 99.9% | Monthly rolling |
| API Response Time (p50) | < 200ms | < 500ms | Monthly rolling |
| API Response Time (p99) | < 1,000ms | < 2,000ms | Monthly rolling |
| Error Rate | < 0.1% | < 1% | Monthly rolling |
| BankID Login Success Rate | > 99.5% | > 99% | Monthly rolling |
| KYC Initiation Success Rate | > 99% | > 98% | Monthly rolling |
| Payment Initiation Success Rate | > 99% | > 98% | Monthly rolling |
| Health Check Recovery Time (RTO) | < 10 min | < 30 min | Per incident |
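To fill in the Status column of the monthly report automatically, the targets above could be encoded as data and each measured value classified against both levels. The sketch below is a minimal illustration under stated assumptions: the metric keys, the `Target` shape, and the three-level status are hypothetical (not from the Drop codebase), availability is treated as "at least" the target while the other rows use the strict `<` / `>` comparisons shown in the table, and the per-incident RTO row is left out because it is not a monthly aggregate.

```typescript
// Hypothetical encoding of the section 2.1 targets (names are illustrative).
type Comparator = "atLeast" | "above" | "below";

interface Target {
  metric: string;
  slo: number; // internal objective (stricter)
  sla: number; // external commitment (looser)
  cmp: Comparator;
}

const TARGETS: readonly Target[] = [
  { metric: "availability_pct", slo: 99.95, sla: 99.9, cmp: "atLeast" },
  { metric: "latency_p50_ms", slo: 200, sla: 500, cmp: "below" },
  { metric: "latency_p99_ms", slo: 1000, sla: 2000, cmp: "below" },
  { metric: "error_rate_pct", slo: 0.1, sla: 1, cmp: "below" },
  { metric: "bankid_login_success_pct", slo: 99.5, sla: 99, cmp: "above" },
  { metric: "kyc_initiation_success_pct", slo: 99, sla: 98, cmp: "above" },
  { metric: "payment_initiation_success_pct", slo: 99, sla: 98, cmp: "above" },
];

type Status = "MEETS_SLO" | "MEETS_SLA" | "SLA_BREACH";

function passes(cmp: Comparator, actual: number, limit: number): boolean {
  switch (cmp) {
    case "atLeast":
      return actual >= limit;
    case "above":
      return actual > limit;
    case "below":
      return actual < limit;
  }
}

// Classifies a measured value: SLO met, only the looser SLA met, or SLA breached.
function evaluate(metric: string, actual: number): Status {
  const target = TARGETS.find((t) => t.metric === metric);
  if (target === undefined) throw new Error(`unknown metric: ${metric}`);
  if (passes(target.cmp, actual, target.slo)) return "MEETS_SLO";
  if (passes(target.cmp, actual, target.sla)) return "MEETS_SLA";
  return "SLA_BREACH";
}

console.log(evaluate("availability_pct", 99.93)); // MEETS_SLA
console.log(evaluate("latency_p99_ms", 420)); // MEETS_SLO
```

Keeping both levels in one record makes the gap between objective and commitment explicit: a month can pass the SLA while still signalling that the SLO error budget is being consumed.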
2.2 Availability Budget
| Availability Target | Monthly Allowed Downtime | Weekly Allowed Downtime |
|---|---|---|
| 99.9% | 43.8 minutes/month | 10.1 minutes/week |
| 99.95% | 21.9 minutes/month | 5.0 minutes/week |
| 99.99% | 4.4 minutes/month | 1.0 minutes/week |
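Current infrastructure supports the 99.9% SLA commitment. The 99.95% SLO target becomes achievable once PgBouncer and a Multi-AZ standby are in place. Monthly figures assume an average month of about 30.44 days; a 28-day month such as February allows 40.3 minutes at 99.9%.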
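Each budget is the period length multiplied by the permitted unavailability (1 − target). A minimal sketch, assuming the monthly column uses an average month of 365.25 / 12 ≈ 30.44 days (this assumption reproduces the table values); the function and constant names are illustrative:

```typescript
const MINUTES_PER_DAY = 24 * 60;
// Assumption: "monthly" figures use an average Gregorian month of 365.25 / 12 days.
const AVG_MONTH_DAYS = 365.25 / 12;

// Minutes of downtime permitted over `days` days at an availability target given in percent.
function allowedDowntimeMinutes(targetPct: number, days: number): number {
  return days * MINUTES_PER_DAY * (1 - targetPct / 100);
}

for (const target of [99.9, 99.95, 99.99]) {
  const month = allowedDowntimeMinutes(target, AVG_MONTH_DAYS).toFixed(1);
  const week = allowedDowntimeMinutes(target, 7).toFixed(1);
  console.log(`${target}%: ${month} min/month, ${week} min/week`);
}
// 99.9%: 43.8 min/month, 10.1 min/week
// 99.95%: 21.9 min/month, 5.0 min/week
// 99.99%: 4.4 min/month, 1.0 min/week
```

For a calendar-exact budget, pass the real number of days: `allowedDowntimeMinutes(99.9, 28)` gives about 40.3 minutes for February, slightly tighter than the average-month figure.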
2.3 Exclusions
The following are excluded from SLA calculations:
- Scheduled maintenance windows (posted to the BetterStack status page more than 24 hours in advance)
- BankID OIDC provider outages (upstream at Vipps MobilePay)
- Open Banking provider outages (upstream — provider TBD)
- Force majeure events (e.g., an AWS region outage)
- Sumsub KYC service outages (upstream)
3. Service Level Indicators (SLIs)
3.1 Availability
Definition: Percentage of minutes in the reporting period where GET /api/health returns HTTP 200 with {"status":"ok"}.
Measurement: BetterStack Drop Health Check monitor (1-minute check interval, 3 global locations).
Formula: (total_minutes - downtime_minutes) / total_minutes × 100
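Applied to the BetterStack data, the formula needs only the number of minutes in the period and the number of failed check minutes. A minimal sketch, assuming each failed 1-minute check counts as one minute of downtime (how BetterStack combines its 3 locations into a single result is not specified here, so the per-minute input is a hypothetical, already-aggregated series):

```typescript
const MINUTES_PER_DAY = 24 * 60;

// Availability (%) per section 3.1: (total_minutes - downtime_minutes) / total_minutes × 100.
function availabilityPct(totalMinutes: number, downtimeMinutes: number): number {
  if (totalMinutes <= 0) throw new RangeError("totalMinutes must be positive");
  if (downtimeMinutes < 0 || downtimeMinutes > totalMinutes) {
    throw new RangeError("downtimeMinutes must be between 0 and totalMinutes");
  }
  return ((totalMinutes - downtimeMinutes) / totalMinutes) * 100;
}

// Assumption: one boolean per 1-minute check, true when GET /api/health returned 200 {"status":"ok"}.
function availabilityFromChecks(minuteIsUp: readonly boolean[]): number {
  const down = minuteIsUp.filter((up) => !up).length;
  return availabilityPct(minuteIsUp.length, down);
}

const febMinutes = 28 * MINUTES_PER_DAY; // 40,320 one-minute checks in February 2026
console.log(availabilityPct(febMinutes, 28).toFixed(2)); // 99.93
```

With 28 failed checks in February this gives (40,320 − 28) / 40,320 × 100 ≈ 99.93%, the same figure BetterStack reports for the Drop Health Check monitor in section 6.3.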
3.2 Response Time
Definition: p50 and p99 response time for all API requests.
Measurement: CloudWatch App Runner request metrics (future: add APM tool).
Alert threshold: p99 > 1,000ms should trigger a Slack #drop-ops alert. This alarm is not yet configured (see action item 3 in section 9).
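CloudWatch computes the percentiles itself, but it helps to pin down the definition when checking numbers by hand. A minimal sketch using the nearest-rank method (an assumption; CloudWatch's internal percentile estimation may differ slightly) together with the 1,000ms alert threshold; the function names are illustrative:

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samplesMs: readonly number[], p: number): number {
  if (samplesMs.length === 0) throw new RangeError("no samples");
  if (!(p > 0 && p <= 100)) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b); // numeric sort, not lexicographic
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

const P99_ALERT_THRESHOLD_MS = 1000;

// True when the p99 of the sampled requests exceeds the section 3.2 threshold.
function shouldAlertOnP99(samplesMs: readonly number[]): boolean {
  return percentile(samplesMs, 99) > P99_ALERT_THRESHOLD_MS;
}

// 100 synthetic request latencies: 1 ms, 2 ms, ..., 100 ms.
const samples = Array.from({ length: 100 }, (_, i) => i + 1);
console.log(percentile(samples, 50), percentile(samples, 99)); // 50 99
```

Computing the rank as `Math.ceil(p * n / 100)` keeps integer inputs exact, avoiding floating-point surprises from `p / 100`.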
3.3 Error Rate
Definition: Percentage of HTTP requests returning 5xx status codes.
Alerting: A Slack alert fires when more than 5 errors occur within 60 seconds (implemented in src/lib/alerts.ts).
Formula: 5xx_requests / total_requests × 100
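The alert rule (more than 5 errors within 60 seconds) is a sliding-window count. The sketch below is a hypothetical illustration of that rule, not the contents of src/lib/alerts.ts. It assumes timestamps arrive in non-decreasing order and reports on every error while the window stays over the threshold (a real alerter would likely also suppress repeat notifications). It also implements the error-rate formula, returning 0 when there were no requests.

```typescript
// Hypothetical sliding-window detector for the "more than 5 errors in 60 seconds" rule.
class ErrorBurstDetector {
  private readonly timestamps: number[] = [];
  private readonly threshold: number;
  private readonly windowMs: number;

  constructor(threshold = 5, windowMs = 60_000) {
    this.threshold = threshold;
    this.windowMs = windowMs;
  }

  // Records one 5xx response at `nowMs` (epoch ms, non-decreasing) and
  // returns true when more than `threshold` errors fall inside the window.
  record(nowMs: number): boolean {
    this.timestamps.push(nowMs);
    const cutoff = nowMs - this.windowMs;
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift(); // drop errors older than the window
    }
    return this.timestamps.length > this.threshold;
  }
}

// Error rate (%) per section 3.3: 5xx_requests / total_requests × 100.
function errorRatePct(count5xx: number, totalRequests: number): number {
  return totalRequests === 0 ? 0 : (count5xx / totalRequests) * 100;
}

const detector = new ErrorBurstDetector();
const fired = [0, 1_000, 2_000, 3_000, 4_000, 5_000].map((t) => detector.record(t));
console.log(fired); // [ false, false, false, false, false, true ]
```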
3.4 BankID Login Success Rate
Definition: Percentage of BankID login initiations that result in a successful session creation.
Measurement: Application audit log (audit_log table): number of session_created events divided by number of bankid_initiate events.
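A minimal sketch of this ratio, assuming a hypothetical row shape with an `action` column holding the event names above and an `at` timestamp (the real audit_log schema may differ). The optional exclusion windows mirror the incident exclusion applied in section 4.4. It further assumes every session_created event comes from a BankID login and that each initiation and its session fall in the same reporting period.

```typescript
// Hypothetical audit_log row shape; column names are assumptions.
interface AuditEvent {
  action: string; // e.g. "bankid_initiate" or "session_created"
  at: Date;
}

interface TimeWindow {
  start: Date; // inclusive
  end: Date; // exclusive
}

// BankID login success rate (%) = session_created / bankid_initiate, ignoring
// events inside excluded windows. Returns null when nothing was initiated.
function bankIdLoginSuccessRate(
  events: readonly AuditEvent[],
  excluded: readonly TimeWindow[] = [],
): number | null {
  const isExcluded = (t: Date) =>
    excluded.some((w) => t.getTime() >= w.start.getTime() && t.getTime() < w.end.getTime());
  let initiated = 0;
  let created = 0;
  for (const e of events) {
    if (isExcluded(e.at)) continue;
    if (e.action === "bankid_initiate") initiated++;
    else if (e.action === "session_created") created++;
  }
  return initiated === 0 ? null : (created / initiated) * 100;
}

// INC-2026-001: 2026-02-20 10:30 UTC, lasting 28 minutes (section 5).
const incident: TimeWindow = {
  start: new Date("2026-02-20T10:30:00Z"),
  end: new Date("2026-02-20T10:58:00Z"),
};
```

Usage: `bankIdLoginSuccessRate(events, [incident])` yields the incident-excluded rate reported in section 4.4; calling it without the second argument gives the raw rate.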
3.5 Transaction Success Rate
Definition: Percentage of transactions (remittance + QR) that reached a terminal state and ended in completed status. Transactions still in progress are excluded.
Measurement: Database query: COUNT(*) WHERE status = 'completed' divided by COUNT(*) WHERE status IN ('completed', 'failed').
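The same calculation in application code, as a minimal sketch: it counts only the two terminal statuses named in the query, so in-flight transactions (whatever other status values exist; "pending" below is a hypothetical example) never enter the denominator. It returns null instead of dividing by zero while no transaction has finished, which is the current situation in section 4.6.

```typescript
// Transaction success rate (%) = completed / (completed + failed).
// Any other status is ignored; null means no terminal transactions yet.
function transactionSuccessRate(statuses: Iterable<string>): number | null {
  let completed = 0;
  let failed = 0;
  for (const s of statuses) {
    if (s === "completed") completed++;
    else if (s === "failed") failed++;
  }
  const terminal = completed + failed;
  return terminal === 0 ? null : (completed / terminal) * 100;
}

// "pending" is a hypothetical in-flight status and does not affect the result.
console.log(transactionSuccessRate(["completed", "completed", "failed", "pending"])?.toFixed(1)); // 66.7
```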
4. Monthly SLA Report — February 2026 (Example)
NOTE: This is a simulated example report for template purposes. Actual metrics will be populated from BetterStack, CloudWatch, and database queries after public launch.
4.1 Availability
| Metric | Target | Actual | Status |
|---|---|---|---|
| Uptime | 99.9% | 99.93% | PASS |
| Total downtime | < 40.3 min (28-day month) | 28 min (1 incident) | PASS |
| Incidents | N/A | 1 (INC-2026-001) | — |
| Maintenance windows | N/A | 0 | — |
Uptime chart (example):
Week 1 (Feb 01–09): ████████████████████ 100.00%
Week 2 (Feb 10–16): ████████████████████ 100.00%
Week 3 (Feb 17–23): ███████████████████▌ 99.72% (INC-2026-001: 28 min)
Week 4 (Feb 24–28): ████████████████████ 100.00%
─────────────────────────────────────────────────
Monthly: ████████████████████ 99.93%
4.2 Performance
| Metric | Target | Actual | Status |
|---|---|---|---|
| p50 response time | < 200ms | ~85ms | PASS |
| p99 response time | < 1,000ms | ~420ms | PASS |
| p99 during incident | N/A | Timeout (excluded) | N/A |
Per-endpoint p99 latency (example):
| Endpoint | p99 Latency | Notes |
|---|---|---|
| POST /api/transactions | ~380ms | Sumsub KYC check adds latency |
| GET /api/bank-accounts/balance | ~350ms | Open Banking API round trip |
| POST /api/auth/bankid/callback | ~180ms | DB session write |
| GET /api/health | ~45ms | DB ping |
4.3 Error Rate
| Metric | Target | Actual | Status |
|---|---|---|---|
| Overall error rate | < 1% | 0.06% | PASS |
| Error rate (excl. incident) | < 0.1% | 0.03% | PASS |
| 5xx errors | < 1% | 0.06% | PASS |
| 4xx errors | N/A | 2.1% | INFO (mostly 401 auth failures) |
Top 5xx errors (example):
| Error | Count | Root Cause |
|---|---|---|
| 503 during INC-2026-001 | ~240 | DB connection pool exhaustion |
| 500 Sumsub timeout | 3 | Sumsub API latency spike |
| 503 cold start | 1 | App Runner scale-to-zero (if applicable) |
4.4 BankID Authentication
| Metric | Target | Actual | Status |
|---|---|---|---|
| Login success rate | > 99% | 99.7% | PASS |
| Login failures | — | 0.3% | INFO |
| CSRF rejections | — | 0 | — |
| Age verification failures | — | 2 | INFO |
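Note: Login failures during INC-2026-001 are excluded from this calculation. They were caused by database connection pool exhaustion, not by the BankID flow, and the incident is already counted as downtime in section 4.1.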
4.5 KYC (Sumsub)
| Metric | Target | Actual | Status |
|---|---|---|---|
| KYC initiation success rate | > 98% | 99.2% | PASS |
| GREEN (approved) rate | N/A | 91% | INFO |
| RED (rejected) rate | N/A | 6% | INFO |
| RETRY rate | N/A | 3% | INFO |
| Average review time | N/A | ~2h | INFO |
4.6 Transactions
| Metric | Target | Actual | Status |
|---|---|---|---|
| Transaction success rate | > 98% | TBD — Open Banking provider not yet live | TBD |
| Remittance completion | > 98% | TBD | TBD |
| QR payment completion | > 98% | TBD | TBD |
5. Incident Summary
| Incident ID | Severity | Start | Duration | Impact | Root Cause | Status |
|---|---|---|---|---|---|---|
| INC-2026-001 | P1 | 2026-02-20 10:30 UTC | 28 min | 100% of users | RDS connection pool exhaustion | Closed |
SLA credit impact: 28 minutes of downtime against a 40.3-minute budget for February gives a monthly uptime of 99.93%. The 99.9% SLA commitment was met, so no SLA credit applies.
6. Infrastructure Health
6.1 App Runner
| Metric | Value | Notes |
|---|---|---|
| Service status | RUNNING | — |
| Deployments this month | 3 | Bug fix, security patch, config update |
| Deployment failures | 0 | — |
| Average deployment time | ~4 min | — |
6.2 RDS PostgreSQL
| Metric | Value | Threshold | Status |
|---|---|---|---|
| DB instance status | available | — | PASS |
| Free storage space | ~18 GB | Alert if < 2 GB | PASS |
| Backup retention | 7 days | 7 days minimum | PASS |
| Last automated snapshot | < 24h ago | < 24h | PASS |
| PITR enabled | Yes | Required | PASS |
| Peak DB connections (month) | ~45 | Alert if > 70 | PASS (alarm not yet configured) |
6.3 BetterStack Monitors
| Monitor | Checks | Passed | Failed | Uptime |
|---|---|---|---|---|
| Drop Health Check | ~40,320 | ~40,292 | ~28 (during incident) | 99.93% |
| Drop Landing Page | ~40,320 | ~40,320 | 0 | 100.00% |
| US East Health | ~40,320 | ~40,292 | ~28 (during incident) | 99.93% |
7. Security & Compliance
| Check | Status | Notes |
|---|---|---|
| No PII data exposure incidents | PASS | — |
| Audit log continuity | PASS | All events logged |
| Secret rotation (JWT_SECRET) | Not yet due | Rotation scheduled Q2 2026 |
| Secret rotation (DB password) | Not yet due | Rotation scheduled Q2 2026 |
| AML alerts reviewed | N/A | No open cases |
| Pending KYC > 24h | 0 | — |
| GDPR requests | 0 | None received |
8. SLA Trending
| Month | Uptime | p99 Latency | Error Rate | Incidents |
|---|---|---|---|---|
| Feb 2026 | 99.93% | ~420ms | 0.06% | 1 (P1) |
| Mar 2026 | TBD | TBD | TBD | — |
| Apr 2026 | TBD | TBD | TBD | — |
9. Action Items from This Report
| # | Action | Owner | Due |
|---|---|---|---|
| 1 | Configure CloudWatch alarm on DatabaseConnections > 70 | Alem | Within 1 week |
| 2 | Implement PgBouncer / RDS Proxy before public launch | Platform | Before v1.0 |
| 3 | Add p99 latency CloudWatch alarm (> 1,000ms → Slack alert) | Platform | Before v1.0 |
| 4 | Review 4xx error volume — categorize auth failures vs user errors | Alem | Within 2 weeks |
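For action item 1, the alarm can be expressed as a CloudWatch PutMetricAlarm request. A minimal sketch that only builds the request object: the DB instance identifier, the SNS topic ARN, the 60-second period, and the three evaluation periods are placeholders or assumptions, and routing the SNS notification to Slack is outside this sketch.

```typescript
// Subset of PutMetricAlarm fields used here, typed with the literal values this alarm needs.
interface DbConnectionsAlarmInput {
  AlarmName: string;
  AlarmDescription: string;
  Namespace: "AWS/RDS";
  MetricName: "DatabaseConnections";
  Dimensions: { Name: "DBInstanceIdentifier"; Value: string }[];
  Statistic: "Maximum";
  Period: number; // seconds per datapoint
  EvaluationPeriods: number; // consecutive breaching datapoints before ALARM
  Threshold: number;
  ComparisonOperator: "GreaterThanThreshold";
  AlarmActions: string[];
}

function dbConnectionsAlarm(dbInstanceId: string, snsTopicArn: string): DbConnectionsAlarmInput {
  return {
    AlarmName: `${dbInstanceId}-db-connections-high`,
    AlarmDescription: "RDS DatabaseConnections above 70 (SLA report action item 1)",
    Namespace: "AWS/RDS",
    MetricName: "DatabaseConnections",
    Dimensions: [{ Name: "DBInstanceIdentifier", Value: dbInstanceId }],
    Statistic: "Maximum",
    Period: 60,
    EvaluationPeriods: 3, // assumption: 3 minutes above threshold; lower it to alert faster
    Threshold: 70,
    ComparisonOperator: "GreaterThanThreshold",
    AlarmActions: [snsTopicArn],
  };
}

// Placeholder identifiers (hypothetical); replace with the real instance ID and topic ARN.
const input = dbConnectionsAlarm(
  "drop-prod-db",
  "arn:aws:sns:eu-west-1:123456789012:drop-ops-alerts",
);
console.log(JSON.stringify(input, null, 2));
```

With the AWS SDK for JavaScript v3 (`@aws-sdk/client-cloudwatch`), the object can be submitted with `new CloudWatchClient({ region: "eu-west-1" }).send(new PutMetricAlarmCommand(input))`.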
10. SLA Report Distribution & Cadence
| Audience | Frequency | Format |
|---|---|---|
| Alem Bašić (CEO) | Monthly | This document |
| Platform Architect | Monthly | This document |
| Finanstilsynet (if required) | Annually | Aggregated uptime report |
Report generation date: Last business day of each month. Data sources: BetterStack dashboard + CloudWatch + audit_log queries.
Related Documents
- Monitoring & Observability
- Incident Report INC-2026-001
- Post-Mortem INC-2026-001
- Operational Runbook
- Disaster Recovery Plan
Approval
| Role | Name | Date | Signature |
|---|---|---|---|
| Author | Platform Architect (AI) | 2026-02-23 | |
| Reviewer | | | |
| Approver | Alem Bašić | | |