SLA Report

Project: Drop
Version: 0.1.0
Date: 2026-02-23
Author: Platform Architect (AI)
Status: Draft | In Review | Approved
Reviewers: Alem Bašić (CEO)

Document History

Version | Date | Author | Changes
0.1 | 2026-02-23 | Platform Architect (AI) | SLA targets and example monthly report (pre-launch)

1. Overview

This document defines Drop's Service Level Agreements (SLAs), Service Level Objectives (SLOs), and Service Level Indicators (SLIs). It also includes an example monthly SLA report based on simulated post-launch data. Use it as a template for monthly operational reporting.

Period: February 2026 (example, pre-launch)
Environment: Production (AWS App Runner eu-west-1 + RDS PostgreSQL)
Report Author: Alem Bašić
Distribution: Internal (Alem Bašić, Platform Architect)


2. SLA Definitions

2.1 Service Level Targets

Metric | SLO Target | SLA Commitment | Measurement Window
Availability | 99.95% | 99.9% | Monthly rolling
API Response Time (p50) | < 200ms | < 500ms | Monthly rolling
API Response Time (p99) | < 1,000ms | < 2,000ms | Monthly rolling
Error Rate | < 0.1% | < 1% | Monthly rolling
BankID Login Success Rate | > 99.5% | > 99% | Monthly rolling
KYC Initiation Success Rate | > 99% | > 98% | Monthly rolling
Payment Initiation Success Rate | > 99% | > 98% | Monthly rolling
Health Check Recovery Time (RTO) | < 10 min | < 30 min | Per incident

2.2 Availability Budget

SLA Target | Monthly Allowed Downtime | Weekly Allowed Downtime
99.9% | 43.8 minutes/month | 10.1 minutes/week
99.95% | 21.9 minutes/month | 5.0 minutes/week
99.99% | 4.4 minutes/month | 1.0 minutes/week

Current infrastructure capability: 99.9% (SLA commitment). 99.95% is the SLO target once PgBouncer and multi-AZ standby are in place.
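
The budgets in the table above can be derived from the availability target. The sketch below is illustrative only: it assumes an average month of 365.25 / 12 days (43,830 minutes), which is the basis that reproduces the monthly figures, and the function names are hypothetical.

```typescript
// Assumed window lengths: an average month (365.25 / 12 days) and a 7-day week.
const MINUTES_PER_MONTH = (365.25 / 12) * 24 * 60; // 43,830
const MINUTES_PER_WEEK = 7 * 24 * 60; // 10,080

// Allowed downtime in minutes for an availability target given in percent.
function allowedDowntimeMinutes(targetPercent: number, windowMinutes: number): number {
  return (1 - targetPercent / 100) * windowMinutes;
}

// Round to one decimal place, as in the table.
function round1(value: number): number {
  return Math.round(value * 10) / 10;
}

for (const target of [99.9, 99.95, 99.99]) {
  const monthly = round1(allowedDowntimeMinutes(target, MINUTES_PER_MONTH));
  const weekly = round1(allowedDowntimeMinutes(target, MINUTES_PER_WEEK));
  console.log(`${target}%: ${monthly} min/month, ${weekly} min/week`);
}
```

If the budget is instead computed per calendar month, the window length (and therefore the allowed downtime) varies slightly from month to month.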

2.3 Exclusions

The following are excluded from SLA calculations:

  • Scheduled maintenance windows (posted to BetterStack status page > 24h in advance)
  • BankID OIDC provider outages (upstream at Vipps MobilePay)
  • Open Banking provider outages (upstream — provider TBD)
  • Force majeure events (AWS region outage, etc.)
  • Sumsub KYC service outages (upstream)

3. Service Level Indicators (SLIs)

3.1 Availability

Definition: Percentage of minutes in the reporting period where GET /api/health returns HTTP 200 with {"status":"ok"}.

Measurement: BetterStack Drop Health Check monitor (1-minute check interval, 3 global locations).

Formula: (total_minutes - downtime_minutes) / total_minutes × 100
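
As a worked instance of this formula, the sketch below computes availability for one 7-day window containing 28 minutes of downtime. The function name is hypothetical and the inputs are illustrative.

```typescript
// Availability in percent: (total_minutes - downtime_minutes) / total_minutes × 100.
function availabilityPercent(totalMinutes: number, downtimeMinutes: number): number {
  if (totalMinutes <= 0) throw new Error("totalMinutes must be positive");
  if (downtimeMinutes < 0 || downtimeMinutes > totalMinutes) {
    throw new Error("downtimeMinutes must be between 0 and totalMinutes");
  }
  return ((totalMinutes - downtimeMinutes) / totalMinutes) * 100;
}

// Illustrative inputs: one week of 1-minute checks with 28 failed minutes.
const weekMinutes = 7 * 24 * 60; // 10,080
const weeklyUptime = availabilityPercent(weekMinutes, 28);
console.log(weeklyUptime.toFixed(2)); // prints "99.72"
```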

3.2 Response Time

Definition: p50 and p99 response time for all API requests.

Measurement: CloudWatch App Runner request metrics (future: add APM tool).

Alert threshold: p99 > 1,000ms triggers Slack #drop-ops alert.
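
To make the p50/p99 definitions concrete, here is a minimal sketch using the nearest-rank percentile method. This is an assumption for illustration: CloudWatch computes percentile statistics on its own side and may use a different estimator. The function names are hypothetical; only the 1,000ms threshold comes from this section.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Alert threshold from this section: p99 > 1,000ms.
const P99_ALERT_THRESHOLD_MS = 1000;

// True when the p99 of the given samples exceeds the alert threshold.
function shouldAlertOnP99(samplesMs: number[]): boolean {
  return percentile(samplesMs, 99) > P99_ALERT_THRESHOLD_MS;
}
```

With 100 samples, two slow requests are enough to push the p99 over the threshold, because the 99th-ranked sample is then one of the slow ones.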

3.3 Error Rate

Definition: Percentage of HTTP requests returning 5xx status codes.

Measurement: Slack alert fires when > 5 errors occur in 60 seconds (from src/lib/alerts.ts).

Formula: 5xx_requests / total_requests × 100
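
The formula can be combined with the Error Rate targets from Section 2.1 (SLO < 0.1%, SLA < 1%) to classify a reporting period. The following is a sketch with hypothetical names, not the logic in src/lib/alerts.ts.

```typescript
type ErrorRateStatus = "within SLO" | "within SLA" | "breach";

// Error rate in percent: 5xx_requests / total_requests × 100.
function errorRatePercent(serverErrors: number, totalRequests: number): number {
  if (totalRequests === 0) return 0; // no traffic means no errors to report
  return (serverErrors / totalRequests) * 100;
}

// Classify against the Section 2.1 targets: SLO < 0.1%, SLA < 1%.
function classifyErrorRate(ratePercent: number): ErrorRateStatus {
  if (ratePercent < 0.1) return "within SLO";
  if (ratePercent < 1) return "within SLA";
  return "breach";
}

// Illustrative input: 6 server errors out of 10,000 requests.
console.log(classifyErrorRate(errorRatePercent(6, 10_000))); // prints "within SLO"
```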

3.4 BankID Login Success Rate

Definition: Percentage of BankID login initiations that result in a successful session creation.

Measurement: Application audit log (audit_log table: session_created / bankid_initiate).
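
A minimal sketch of this ratio over audit log rows is shown below. The row shape and function name are hypothetical; only the event names session_created and bankid_initiate come from this document, and the real audit_log schema may differ.

```typescript
// Hypothetical row shape: the actual audit_log columns are not specified here.
interface AuditLogRow {
  event: string;
}

// Success rate in percent: session_created events divided by bankid_initiate events.
// Returns null when no logins were initiated, so the report can show N/A.
function bankIdLoginSuccessRate(rows: AuditLogRow[]): number | null {
  let initiated = 0;
  let created = 0;
  for (const row of rows) {
    if (row.event === "bankid_initiate") initiated++;
    else if (row.event === "session_created") created++;
  }
  return initiated === 0 ? null : (created / initiated) * 100;
}
```

Returning null rather than 0 or 100 for an empty period avoids reporting a misleading pass or fail when there was no login traffic.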

3.5 Transaction Success Rate

Definition: Percentage of initiated transactions (remittance + QR) that reach completed status.

Measurement: Database query: COUNT(*) WHERE status='completed' / COUNT(*) WHERE status IN ('completed','failed').
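
The same ratio can be sketched in application code. Note that the denominator counts only completed and failed transactions, so anything still in flight is excluded. The "pending" status and the function name below are assumptions for illustration.

```typescript
// "completed" and "failed" come from the query above; "pending" is an assumed
// in-flight status that is excluded from the denominator.
type TransactionStatus = "pending" | "completed" | "failed";

// Success rate in percent: completed / (completed + failed).
// Returns null when no transaction has reached a final status yet.
function transactionSuccessRate(statuses: TransactionStatus[]): number | null {
  const completed = statuses.filter((s) => s === "completed").length;
  const failed = statuses.filter((s) => s === "failed").length;
  const finished = completed + failed;
  return finished === 0 ? null : (completed / finished) * 100;
}
```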


4. Monthly SLA Report: February 2026 (Example)

NOTE: This is a simulated example report for template purposes. Actual metrics will be populated from BetterStack, CloudWatch, and database queries after public launch.

4.1 Availability

Metric | SLA Target | Actual | Status
Uptime | 99.9% | 99.94% | PASS
Total downtime | < 43.8 min | 28 min (1 incident) | PASS
Incidents | N/A | 1 (INC-2026-001) |
Maintenance windows | N/A | 0 |

Uptime chart (example):

Week 1 (Feb 03–09):  ████████████████████ 100.00%
Week 2 (Feb 10–16):  ████████████████████ 100.00%
Week 3 (Feb 17–23):  ███████████████████▌  99.72% (INC-2026-001: 28 min)
Week 4 (Feb 24–28):  ████████████████████ 100.00%
─────────────────────────────────────────────────
Monthly:              ████████████████████  99.94%

4.2 Performance

Metric | Target | Actual | Status
p50 response time | < 200ms | ~85ms | PASS
p99 response time | < 1,000ms | ~420ms | PASS
p99 during incident | N/A | Timeout (excluded) | N/A

Slow endpoints (example):

Endpoint | p99 Latency | Notes
POST /api/transactions | ~380ms | Sumsub KYC check adds latency
GET /api/bank-accounts/balance | ~350ms | Open Banking API round trip
POST /api/auth/bankid/callback | ~180ms | DB session write
GET /api/health | ~45ms | DB ping

4.3 Error Rate

Metric | Target | Actual | Status
Overall error rate | < 1% | 0.06% | PASS
Error rate (excl. incident) | < 0.1% | 0.03% | PASS
5xx errors | < 1% | 0.06% | PASS
4xx errors | N/A | 2.1% | INFO (mostly 401 auth failures)

Top 5xx errors (example):

Error | Count | Root Cause
503 during INC-2026-001 | ~240 | DB connection pool exhaustion
500 Sumsub timeout | 3 | Sumsub API latency spike
503 cold start | 1 | App Runner scale-to-zero (if applicable)

4.4 BankID Authentication

Metric | Target | Actual | Status
Login success rate | > 99% | 99.7% | PASS
Login failures | | 0.3% | INFO
CSRF rejections | | 0 |
Age verification failures | | 2 | INFO

Note: Login failures during INC-2026-001 excluded from calculation (upstream service impacted by pool exhaustion).

4.5 KYC (Sumsub)

Metric | Target | Actual | Status
KYC initiation success rate | > 98% | 99.2% | PASS
GREEN (approved) rate | N/A | 91% | INFO
RED (rejected) rate | N/A | 6% | INFO
RETRY rate | N/A | 3% | INFO
Average review time | N/A | ~2h | INFO

4.6 Transactions

Metric | Target | Actual | Status
Transaction success rate | > 98% | TBD (Open Banking provider not yet live) | TBD
Remittance completion | > 98% | TBD | TBD
QR payment completion | > 98% | TBD | TBD

5. Incident Summary

Incident ID | Severity | Start | Duration | Impact | Root Cause | Status
INC-2026-001 | P1 | 2026-02-20 10:30 UTC | 28 min | 100% users | RDS connection pool exhaustion | Closed

SLA credit impact: 28 minutes downtime. Monthly uptime = 99.94%. SLA (99.9%) met. No SLA credit applicable.
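
The credit check above amounts to comparing incident downtime with the monthly downtime budget from Section 2.2. The sketch below assumes the same average-month basis (43,830 minutes) that yields the 43.8-minute budget for 99.9%; the type and function names are hypothetical.

```typescript
// Assumed basis: an average month of 365.25 / 12 days.
const MINUTES_PER_AVERAGE_MONTH = (365.25 / 12) * 24 * 60; // 43,830

interface BudgetStatus {
  budgetMinutes: number;    // allowed downtime for the target
  usedMinutes: number;      // downtime counted against the target
  remainingMinutes: number; // negative when the budget is exceeded
  targetMet: boolean;
}

// Compare counted downtime with the monthly budget for an availability target.
function budgetStatus(targetPercent: number, downtimeMinutes: number): BudgetStatus {
  const budgetMinutes = (1 - targetPercent / 100) * MINUTES_PER_AVERAGE_MONTH;
  return {
    budgetMinutes,
    usedMinutes: downtimeMinutes,
    remainingMinutes: budgetMinutes - downtimeMinutes,
    targetMet: downtimeMinutes <= budgetMinutes,
  };
}

const sla = budgetStatus(99.9, 28);
console.log(sla.targetMet, sla.remainingMinutes.toFixed(1)); // prints: true 15.8
```

Run against the 99.95% SLO target instead, the same 28 minutes exceeds the 21.9-minute budget, so the incident meets the SLA commitment but not the SLO.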


6. Infrastructure Health

6.1 App Runner

Metric | Value | Notes
Service status | RUNNING |
Deployments this month | 3 | Bug fix, security patch, config update
Deployment failures | 0 |
Average deployment time | ~4 min |

6.2 RDS PostgreSQL

Metric | Value | Threshold | Status
DB instance status | available | | PASS
Free storage space | ~18 GB | Alert if < 2 GB | PASS
Backup retention | 7 days | 7 days minimum | PASS
Last automated snapshot | < 24h ago | < 24h | PASS
PITR enabled | Yes | Required | PASS
Max connections (month) | ~45 (peak) | Alert at 70 | PASS (alarm not yet configured)
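
Since the connection alarm is not yet configured, the sketch below shows what its definition might look like. It builds a plain object shaped like the request parameters of the CloudWatch PutMetricAlarm API for the AWS/RDS DatabaseConnections metric. The local type, the instance identifier, the SNS topic ARN, and the period and evaluation settings are all assumptions; only the threshold of 70 comes from the table above.

```typescript
// Local type mirroring a subset of the PutMetricAlarm request fields,
// so the sketch compiles without the AWS SDK installed.
interface MetricAlarmInput {
  AlarmName: string;
  Namespace: string;
  MetricName: string;
  Dimensions: { Name: string; Value: string }[];
  Statistic: "Average" | "Maximum" | "Minimum" | "Sum" | "SampleCount";
  Period: number; // seconds
  EvaluationPeriods: number;
  Threshold: number;
  ComparisonOperator: "GreaterThanThreshold" | "GreaterThanOrEqualToThreshold";
  AlarmActions: string[];
}

// Build an alarm definition that fires when connections stay above 70.
function dbConnectionsAlarm(dbInstanceId: string, snsTopicArn: string): MetricAlarmInput {
  return {
    AlarmName: `${dbInstanceId}-database-connections-high`,
    Namespace: "AWS/RDS",
    MetricName: "DatabaseConnections",
    Dimensions: [{ Name: "DBInstanceIdentifier", Value: dbInstanceId }],
    Statistic: "Maximum",
    Period: 60,           // assumed: evaluate 1-minute datapoints
    EvaluationPeriods: 3, // assumed: 3 consecutive breaching minutes
    Threshold: 70,        // "Alert at 70" from the table above
    ComparisonOperator: "GreaterThanThreshold",
    AlarmActions: [snsTopicArn],
  };
}

// Hypothetical identifiers, for illustration only.
const alarm = dbConnectionsAlarm(
  "drop-prod-db",
  "arn:aws:sns:eu-west-1:000000000000:drop-ops-alerts",
);
console.log(JSON.stringify(alarm, null, 2));
```

An object of this shape could then be passed to whatever deployment tooling the team uses (SDK call, CLI, or infrastructure-as-code), which is outside the scope of this sketch.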

6.3 BetterStack Monitors

Monitor | Checks | Passed | Failed | Uptime
Drop Health Check | ~40,320 | ~40,292 | ~28 (during incident) | 99.93%
Drop Landing Page | ~40,320 | ~40,320 | 0 | 100.00%
US East Health | ~40,320 | ~40,292 | ~28 (during incident) | 99.93%

7. Security & Compliance

Check | Status | Notes
No PII data exposure incidents | PASS |
Audit log continuity | PASS | All events logged
Secret rotation (JWT_SECRET) | Not yet due | Rotation scheduled Q2 2026
Secret rotation (DB password) | Not yet due | Rotation scheduled Q2 2026
AML alerts reviewed | N/A | No open cases
Pending KYC > 24h | 0 |
GDPR requests | 0 | None received


8. Trending

Month | Uptime | p99 Latency | Error Rate | Incidents
Feb 2026 | 99.94% | ~420ms | 0.06% | 1 (P1)
Mar 2026 | TBD | TBD | TBD |
Apr 2026 | TBD | TBD | TBD |

Note: Only unplanned downtime counts against SLA uptime calculations. See Section 2.3 for exclusions.

9. Action Items from This Report

# | Action | Owner | Due
1 | Configure CloudWatch alarm on DatabaseConnections > 70 | Alem | Within 1 week
2 | Implement PgBouncer / RDS Proxy before public launch | Platform | Before v1.0
3 | Add p99 latency CloudWatch alarm (> 1,000ms → Slack alert) | Platform | Before v1.0
4 | Review 4xx error volume: categorize auth failures vs user errors | Alem | Within 2 weeks

10. SLA Report Distribution & Cadence

Audience | Frequency | Format
Alem Bašić (CEO) | Monthly | This document
Platform Architect | Monthly | This document
Finanstilsynet (if required) | Annually | Aggregated uptime report

Report generation date: Last business day of each month. Data sources: BetterStack dashboard + CloudWatch + audit_log queries.



Approval

Role | Name | Date | Signature
Author | Platform Architect (AI) | 2026-02-23 |
Reviewer | | |
Approver | Alem Bašić | |