
SLA Report


Project: Drop
Version: 0.1.0
Date: 2026-02-23
Author: Platform Architect (AI)
Status: Draft | In Review | Approved
Reviewers: Alem Bašić (CEO)

Document History

| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | 2026-02-23 | Platform Architect (AI) | Initial SLA targets and example monthly report (pre-launch) |

1. Overview

This document defines Drop's Service Level Agreements (SLAs), Service Level Objectives (SLOs), and Service Level Indicators (SLIs). It also includes an example monthly SLA report based on simulated post-launch data. Use it as a template for monthly operational reporting.

Reporting Period: February 2026 (example, pre-launch)
Environment: Production (AWS App Runner eu-west-1 + RDS PostgreSQL)
Report Author: Alem Bašić
Distribution: Internal (Alem Bašić, Platform Architect)

2. SLA Definitions

2.1 Service Level Targets

| Metric | SLO Target | SLA Commitment | Measurement Window |
|---|---|---|---|
| Availability | 99.95% | 99.9% | Monthly rolling |
| API Response Time (p50) | < 200ms | < 500ms | Monthly rolling |
| API Response Time (p99) | < 1,000ms | < 2,000ms | Monthly rolling |
| Error Rate | < 0.1% | < 1% | Monthly rolling |
| BankID Login Success Rate | > 99.5% | > 99% | Monthly rolling |
| KYC Initiation Success Rate | > 99% | > 98% | Monthly rolling |
| Payment Initiation Success Rate | > 99% | > 98% | Monthly rolling |
| Health Check Recovery Time (RTO) | < 10 min | < 30 min | Per incident |
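The two tiers in the table can be evaluated mechanically. A value that meets the SLO target is healthy. A value that misses the SLO but still meets the SLA commitment is eating into the margin. A value that misses both is an SLA breach. The sketch below encodes a few rows of the table. The names `Target`, `classify`, and `satisfies` are hypothetical and are not part of the Drop codebase. The table gives no comparator for availability, so treating that row as an inclusive floor (>=) is an assumption.

```typescript
// Minimal sketch (hypothetical names): evaluate a measured value against the
// SLO and SLA tiers of the Service Level Targets table.
type Comparator = "<" | ">" | ">=";
type Tier = "SLO met" | "SLA met, SLO missed" | "SLA breached";

interface Target {
  metric: string;
  comparator: Comparator; // how the table states the bound
  slo: number; // internal objective
  sla: number; // external commitment (looser than the SLO in every row)
}

function satisfies(value: number, cmp: Comparator, bound: number): boolean {
  if (cmp === "<") return value < bound;
  if (cmp === ">") return value > bound;
  return value >= bound; // ">=": assumed for the availability row
}

function classify(target: Target, actual: number): Tier {
  if (satisfies(actual, target.comparator, target.slo)) return "SLO met";
  if (satisfies(actual, target.comparator, target.sla)) return "SLA met, SLO missed";
  return "SLA breached";
}

// A subset of the table's rows; percentages in %, latencies in ms.
const targets: Target[] = [
  { metric: "Availability", comparator: ">=", slo: 99.95, sla: 99.9 },
  { metric: "API p50 (ms)", comparator: "<", slo: 200, sla: 500 },
  { metric: "API p99 (ms)", comparator: "<", slo: 1000, sla: 2000 },
  { metric: "Error rate", comparator: "<", slo: 0.1, sla: 1 },
  { metric: "BankID login success", comparator: ">", slo: 99.5, sla: 99 },
];
```

Applied to the example February figures, 99.94% availability lands in the middle tier (SLA met, SLO missed). A p99 latency of about 420 ms meets the SLO.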

2.2 Availability Budget

| SLA Target | Monthly Allowed Downtime | Weekly Allowed Downtime |
|---|---|---|
| 99.9% | 43.8 min | 10.1 min |
| 99.95% | 21.9 min | 5.0 min |
| 99.99% | 4.4 min | 1.0 min |

Current infrastructure capability: 99.9% (the SLA commitment). The 99.95% SLO target applies once PgBouncer and a multi-AZ standby are in place.
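The allowed-downtime figures in the Availability Budget table come from multiplying the period length by the permitted unavailability (1 − target). The monthly column matches an average month of 365 × 24 × 60 / 12 = 43,800 minutes. The weekly column matches a 10,080-minute week. The sketch below reproduces both columns. The helper names are hypothetical.

```typescript
// Period lengths that reproduce the Availability Budget table.
const MINUTES_PER_AVG_MONTH = (365 * 24 * 60) / 12; // 43,800 min
const MINUTES_PER_WEEK = 7 * 24 * 60; // 10,080 min

// Downtime allowed by an availability target (in %) over a period (in minutes).
function allowedDowntimeMinutes(targetPct: number, periodMinutes: number): number {
  return periodMinutes * (1 - targetPct / 100);
}

// Round to one decimal place, as the table does.
function round1(x: number): number {
  return Math.round(x * 10) / 10;
}

for (const target of [99.9, 99.95, 99.99]) {
  const perMonth = round1(allowedDowntimeMinutes(target, MINUTES_PER_AVG_MONTH));
  const perWeek = round1(allowedDowntimeMinutes(target, MINUTES_PER_WEEK));
  console.log(`${target}%: ${perMonth} min/month, ${perWeek} min/week`);
}
```

A calendar-exact budget is slightly smaller for short months. At 99.9%, February 2026 (28 days, 40,320 minutes) allows about 40.3 minutes rather than 43.8.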

2.3 Exclusions

The following are excluded from SLA calculations:

  • Scheduled maintenance windows (posted to the BetterStack status page more than 24 hours in advance)
  • BankID OIDC provider outages (upstream at Vipps MobilePay)
  • Open Banking provider outages (upstream — provider TBD)
  • Force majeure events (AWS region outage, etc.)
  • Sumsub KYC service outages (upstream)
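Windows covered by these exclusions are removed before downtime is compared with the budget. The sketch below assumes each downtime window has already been tagged with a cause. The `Cause` values and the `DowntimeWindow` shape are illustrative assumptions, not documented Drop behavior. So is the rule that maintenance with 24 hours' notice or less counts against the SLA.

```typescript
// Hypothetical cause tags mirroring the exclusion list.
type Cause =
  | "internal"
  | "scheduled_maintenance"
  | "bankid_upstream"
  | "open_banking_upstream"
  | "sumsub_upstream"
  | "force_majeure";

interface DowntimeWindow {
  id: string;
  minutes: number;
  cause: Cause;
  announcedHoursAhead?: number; // only used for scheduled maintenance
}

// Assumption: maintenance counts against the SLA unless it was posted more
// than 24 hours ahead; upstream outages and force majeure never count.
function countsAgainstSla(w: DowntimeWindow): boolean {
  if (w.cause === "scheduled_maintenance") {
    const notice = w.announcedHoursAhead ?? 0;
    return notice <= 24;
  }
  return w.cause === "internal";
}

// Sum of downtime minutes that count toward the SLA calculation.
function slaDowntimeMinutes(windows: DowntimeWindow[]): number {
  return windows.filter(countsAgainstSla).reduce((sum, w) => sum + w.minutes, 0);
}

const februaryWindows: DowntimeWindow[] = [
  { id: "INC-2026-001", minutes: 28, cause: "internal" },
  { id: "example-sumsub-outage", minutes: 15, cause: "sumsub_upstream" }, // hypothetical
];
console.log(slaDowntimeMinutes(februaryWindows)); // 28
```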

3. Service Level Indicators (SLIs)

3.1 Availability

Definition: Percentage of minutes in the reporting period where GET /api/health returns HTTP 200 with {"status":"ok"}.

Measurement: BetterStack Drop Health Check monitor (1-minute check interval, 3 global locations).

Formula: (total_minutes - downtime_minutes) / total_minutes × 100
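The definition and formula above translate directly into code. In the sketch below, `probePassed` applies the rule of HTTP 200 plus `{"status":"ok"}`, and `availabilityPct` implements the Section 3.1 formula. This section does not say how BetterStack combines its three locations into one verdict per minute. The sketch therefore assumes a single pass/fail result per minute. All names are hypothetical.

```typescript
interface HealthProbe {
  httpStatus: number;
  body: string; // raw response body of GET /api/health
}

// Section 3.1 rule: HTTP 200 and a JSON body whose status is "ok".
function probePassed(p: HealthProbe): boolean {
  if (p.httpStatus !== 200) return false;
  try {
    const parsed: unknown = JSON.parse(p.body);
    return (
      typeof parsed === "object" &&
      parsed !== null &&
      (parsed as { status?: unknown }).status === "ok"
    );
  } catch {
    return false; // a body that is not valid JSON counts as a failure
  }
}

// Section 3.1 formula: (total_minutes - downtime_minutes) / total_minutes × 100
function availabilityPct(totalMinutes: number, downtimeMinutes: number): number {
  return ((totalMinutes - downtimeMinutes) / totalMinutes) * 100;
}

// Assumption: exactly one aggregated probe result per minute of the period.
function availabilityFromProbes(perMinute: HealthProbe[]): number {
  const downMinutes = perMinute.filter((p) => !probePassed(p)).length;
  return availabilityPct(perMinute.length, downMinutes);
}
```

For example, a 28-minute outage in a 10,080-minute week gives `availabilityPct(10080, 28)` ≈ 99.72%. That matches the Week 3 value in the example uptime chart.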

3.2 Response Time

Definition: p50 and p99 response time for all API requests.

Measurement: CloudWatch App Runner request metrics (future: add APM tool).

Alert threshold: p99 > 1,000ms triggers Slack #drop-ops alert.
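CloudWatch computes percentile statistics itself. The following only illustrates what p50 and p99 mean for a set of request latencies and how the 1,000 ms alert threshold applies. It uses the nearest-rank definition of a percentile, which is one of several common definitions and may give slightly different results from CloudWatch's. The names `percentile` and `shouldAlertOnP99` are hypothetical.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length)); // 1-based rank
  return sorted[rank - 1];
}

const P99_ALERT_THRESHOLD_MS = 1000; // Section 3.2 alert threshold

// True when the p99 of the sampled latencies exceeds the alert threshold.
function shouldAlertOnP99(samplesMs: number[]): boolean {
  return percentile(samplesMs, 99) > P99_ALERT_THRESHOLD_MS;
}
```

With 100 samples, the p99 is the 99th-smallest value. A single slow request therefore does not trigger the alert, but two slow requests do.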

3.3 Error Rate

Definition: Percentage of HTTP requests returning 5xx status codes.

Measurement: Slack alert fires when > 5 errors occur in 60 seconds (from src/lib/alerts.ts).

Formula: 5xx_requests / total_requests × 100
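The error-rate formula and the burst alert measure different things. The formula is a ratio over the whole reporting period. The Slack alert counts raw 5xx responses in a sliding 60-second window. The sketch below shows both. It does not reproduce `src/lib/alerts.ts`, which is not included in this document. The class name, the in-memory timestamp queue, and the assumption that errors arrive in time order are all illustrative.

```typescript
// Section 3.3 formula: 5xx_requests / total_requests × 100
function errorRatePct(serverErrors: number, totalRequests: number): number {
  return totalRequests === 0 ? 0 : (serverErrors / totalRequests) * 100;
}

// Hypothetical sliding-window detector for "more than 5 errors in 60 seconds".
// Timestamps must be recorded in non-decreasing order.
class ErrorBurstDetector {
  private readonly windowMs: number;
  private readonly threshold: number;
  private timestamps: number[] = [];

  constructor(windowMs = 60000, threshold = 5) {
    this.windowMs = windowMs;
    this.threshold = threshold;
  }

  // Record one 5xx response at `nowMs`; returns true when the alert should fire.
  record(nowMs: number): boolean {
    this.timestamps.push(nowMs);
    const cutoff = nowMs - this.windowMs;
    // Drop errors that are 60 s or more older than the current one.
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift();
    }
    return this.timestamps.length > this.threshold;
  }
}
```

As written, the detector returns true for every further error during a burst. A real alerting path would presumably also debounce, so that Slack receives one message per burst.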

3.4 BankID Login Success Rate

Definition: Percentage of BankID login initiations that result in a successful session creation.

Measurement: Application audit log (audit_log table: session_created / bankid_initiate).
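From the audit log, the success rate is the number of `session_created` events divided by the number of `bankid_initiate` events. The sketch below also accepts a list of excluded time windows. This mirrors how the example report removes failures during INC-2026-001. The `AuditEvent` shape is a hypothetical simplification of an `audit_log` row. The sketch also assumes every `session_created` event comes from a BankID login.

```typescript
// Hypothetical, simplified audit_log row.
interface AuditEvent {
  event: string; // e.g. "bankid_initiate", "session_created"
  at: Date;
}

interface TimeWindow {
  start: Date; // inclusive
  end: Date; // exclusive
}

function inAnyWindow(at: Date, windows: TimeWindow[]): boolean {
  const t = at.getTime();
  return windows.some((w) => t >= w.start.getTime() && t < w.end.getTime());
}

// session_created / bankid_initiate × 100, ignoring events inside excluded
// windows. Returns null when there were no initiations to measure.
function bankIdLoginSuccessPct(events: AuditEvent[], excluded: TimeWindow[] = []): number | null {
  let initiated = 0;
  let created = 0;
  for (const e of events) {
    if (inAnyWindow(e.at, excluded)) continue;
    if (e.event === "bankid_initiate") initiated++;
    else if (e.event === "session_created") created++;
  }
  return initiated === 0 ? null : (created / initiated) * 100;
}
```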

3.5 Transaction Success Rate

Definition: Percentage of initiated transactions (remittance + QR) that reach completed status.

Measurement: Database query, computed as COUNT(*) WHERE status = 'completed' divided by COUNT(*) WHERE status IN ('completed', 'failed'). Transactions in any other status are excluded from both counts.
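The same completed-over-finished ratio can be computed per transaction type. Section 4.6 reports remittance and QR payment completion separately in this way. The sketch assumes a `kind` field on each transaction. That field, the `Transaction` shape, and the table name in the SQL comment are hypothetical. The comment shows an equivalent PostgreSQL aggregate that uses `COUNT(*) FILTER (WHERE ...)`.

```typescript
// Hypothetical transaction shape; `kind` distinguishes the two flows in Section 4.6.
interface Transaction {
  kind: "remittance" | "qr";
  status: string; // e.g. "completed", "failed", or an in-progress status
}

// completed / (completed + failed) × 100, optionally for one kind only.
// Transactions in any other status are left out of both counts.
// Equivalent PostgreSQL aggregate (table name assumed):
//   SELECT 100.0 * COUNT(*) FILTER (WHERE status = 'completed')
//        / NULLIF(COUNT(*) FILTER (WHERE status IN ('completed', 'failed')), 0)
//   FROM transactions;
function transactionSuccessPct(txs: Transaction[], kind?: Transaction["kind"]): number | null {
  let completed = 0;
  let failed = 0;
  for (const tx of txs) {
    if (kind !== undefined && tx.kind !== kind) continue;
    if (tx.status === "completed") completed++;
    else if (tx.status === "failed") failed++;
  }
  const finished = completed + failed;
  return finished === 0 ? null : (completed / finished) * 100;
}
```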


4. Monthly SLA Report — February 2026 (Example)

NOTE: This is a simulated example report for template purposes. Actual metrics will be populated from BetterStack, CloudWatch, and database queries after public launch.

4.1 Availability

| Metric | Target | Actual | Status |
|---|---|---|---|
| Uptime | 99.9% | 99.94% | PASS |
| Total downtime | < 43.8 min | 28 min (1 incident) | PASS |
| Incidents | N/A | 1 (INC-2026-001) | |
| Maintenance windows | N/A | 0 | |

Uptime chart (example):

Week 1 (Feb 03–09):  ████████████████████ 100.00%
Week 2 (Feb 10–16):  ████████████████████ 100.00%
Week 3 (Feb 17–23):  ███████████████████▌  99.72% (INC-2026-001: 28 min)
Week 4 (Feb 24–28):  ████████████████████ 100.00%
─────────────────────────────────────────────────
Monthly:              ████████████████████  99.94%

4.2 Performance

| Metric | Target | Actual | Status |
|---|---|---|---|
| p50 response time | < 200ms | ~85ms | PASS |
| p99 response time | < 1,000ms | ~420ms | PASS |
| p99 during incident | N/A | Timeout (excluded) | N/A |

Slow endpoints (example):

| Endpoint | p99 Latency | Notes |
|---|---|---|
| POST /api/transactions | ~380ms | Sumsub KYC check adds latency |
| GET /api/bank-accounts/balance | ~350ms | Open Banking API round trip |
| POST /api/auth/bankid/callback | ~180ms | DB session write |
| GET /api/health | ~45ms | DB ping |

4.3 Error Rate

| Metric | Target | Actual | Status |
|---|---|---|---|
| Overall error rate | < 1% | 0.06% | PASS |
| Error rate (excl. incident) | < 0.1% | 0.03% | PASS |
| 5xx errors | < 1% | 0.06% | PASS |
| 4xx errors | N/A | 2.1% | INFO (mostly 401 auth failures) |

Top 5xx errors (example):

| Error | Count | Root Cause |
|---|---|---|
| 503 during INC-2026-001 | ~240 | DB connection pool exhaustion |
| 500 Sumsub timeout | 3 | Sumsub API latency spike |
| 503 cold start | 1 | App Runner scale-to-zero (if applicable) |

4.4 BankID Authentication

| Metric | Target | Actual | Status |
|---|---|---|---|
| Login success rate | > 99% | 99.7% | PASS |
| Login failures | | 0.3% | INFO |
| CSRF rejections | | 0 | |
| Age verification failures | | 2 | INFO |

Note: Login failures during INC-2026-001 are excluded from the calculation (upstream service impacted by pool exhaustion).

4.5 KYC (Sumsub)

| Metric | Target | Actual | Status |
|---|---|---|---|
| KYC initiation success rate | > 98% | 99.2% | PASS |
| GREEN (approved) rate | N/A | 91% | INFO |
| RED (rejected) rate | N/A | 6% | INFO |
| RETRY rate | N/A | 3% | INFO |
| Average review time | N/A | ~2h | INFO |

4.6 Transactions

| Metric | Target | Actual | Status |
|---|---|---|---|
| Transaction success rate | > 98% | TBD (Open Banking provider not yet live) | TBD |
| Remittance completion | > 98% | TBD | TBD |
| QR payment completion | > 98% | TBD | TBD |

5. Incident Summary

| Incident ID | Severity | Start | Duration | Impact | Root Cause | Status |
|---|---|---|---|---|---|---|
| INC-2026-001 | P1 | 2026-02-20 10:30 UTC | 28 min | 100% of users | RDS connection pool exhaustion | Closed |

SLA credit impact: 28 minutes downtime. Monthly uptime = 99.94%. SLA (99.9%) met. No SLA credit applicable.


6. Infrastructure Health

6.1 App Runner

| Metric | Value | Notes |
|---|---|---|
| Service status | RUNNING | |
| Deployments this month | 3 | Bug fix, security patch, config update |
| Deployment failures | 0 | |
| Average deployment time | ~4 min | |

6.2 RDS PostgreSQL

| Metric | Value | Threshold | Status |
|---|---|---|---|
| DB instance status | available | | PASS |
| Free storage space | ~18 GB | Alert if < 2 GB | PASS |
| Backup retention | 7 days | 7 days minimum | PASS |
| Last automated snapshot | < 24h ago | < 24h | PASS |
| PITR enabled | Yes | Required | PASS |
| Max connections (month) | ~45 (peak) | Alert at 70 | PASS (alarm not yet configured) |

6.3 BetterStack Monitors

| Monitor | Checks | Passed | Failed | Uptime |
|---|---|---|---|---|
| Drop Health Check | ~40,320 | ~40,292 | ~28 (during incident) | 99.93% |
| Drop Landing Page | ~40,320 | ~40,320 | 0 | 100.00% |
| US East Health | ~40,320 | ~40,292 | ~28 (during incident) | 99.93% |

7. Security & Compliance

| Check | Status | Notes |
|---|---|---|
| No PII data exposure incidents | PASS | |
| Audit log continuity | PASS | All events logged |
| Secret rotation (JWT_SECRET) | Not yet due | Rotation scheduled Q2 2026 |
| Secret rotation (DB password) | Not yet due | Rotation scheduled Q2 2026 |
| AML alerts reviewed | N/A | No open cases |
| Pending KYC > 24h | 0 | |
| GDPR requests | 0 | None received |



8. SLA Trending


| Month | Uptime % | p99 Latency | Error Rate | Incidents |
|---|---|---|---|---|
| Feb 2026 | 99.94% | ~420ms | 0.06% | 1 (P1) |
| Mar 2026 | TBD | TBD | TBD | TBD |
| Apr 2026 | TBD | TBD | TBD | TBD |

9. Action Items from This Report

| # | Action | Owner | Due |
|---|---|---|---|
| 1 | Configure CloudWatch alarm on DatabaseConnections > 70 | Alem | Within 1 week |
| 2 | Implement PgBouncer / RDS Proxy before public launch | Platform | Before v1.0 |
| 3 | Add p99 latency CloudWatch alarm (> 1,000ms → Slack alert) | Platform | Before v1.0 |
| 4 | Review 4xx error volume: categorize auth failures vs. user errors | Alem | Within 2 weeks |

10. SLA Report Distribution & Cadence

| Audience | Frequency | Format |
|---|---|---|
| Alem Bašić (CEO) | Monthly | This document |
| Platform Architect | Monthly | This document |
| Finanstilsynet (if required) | Annually | Aggregated uptime report |

Report generation date: Last business day of each month. Data sources: BetterStack dashboard + CloudWatch + audit_log queries.
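Reports are generated on the last business day of each month. Below is a minimal sketch of that date calculation. It works in UTC and treats only Saturdays and Sundays as non-business days. Public holidays are ignored, which is an assumption the real schedule may not share. `lastBusinessDay` is a hypothetical helper.

```typescript
// Last business day (Monday to Friday) of a month, in UTC.
// Public holidays are NOT considered.
function lastBusinessDay(year: number, month1to12: number): Date {
  // Day 0 of the following month is the last calendar day of this month.
  const d = new Date(Date.UTC(year, month1to12, 0));
  // getUTCDay(): 0 = Sunday, 6 = Saturday; step back over the weekend.
  while (d.getUTCDay() === 0 || d.getUTCDay() === 6) {
    d.setUTCDate(d.getUTCDate() - 1);
  }
  return d;
}
```

In February 2026 the 28th falls on a Saturday, so that month's report would be generated on Friday 27 February.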



Approval

| Role | Name | Date | Signature |
|---|---|---|---|
| Author | Platform Architect (AI) | 2026-02-23 | |
| Reviewer | | | |
| Approver | Alem Bašić | | |