SLA Report
Project: Drop
Version: 0.1.0
Date: 2026-02-23
Author: Platform Architect (AI)
Status: Draft | In Review | Approved
Reviewers: Alem Bašić (CEO)
Document History
| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | | | Initial |
1. Overview
This document defines Drop's Service Level Agreements (SLAs), Service Level Objectives (SLOs), and Service Level Indicators (SLIs). It also includes an example monthly SLA report based on simulated post-launch data. Use it as a template for monthly operational reporting.
Reporting Period: February 2026 (example, pre-launch)
Environment: Production (AWS App Runner eu-west-1 + RDS PostgreSQL)
Report Author: Alem Bašić
Distribution: Internal (Alem Bašić, Platform Architect)
2. SLA Definitions
2.1 Service Level Targets
| Metric | SLO Target | SLA Commitment | Measurement Window |
|---|---|---|---|
| Availability | 99.95% | 99.9% | Monthly rolling |
| API Response Time (p50) | < 200ms | < 500ms | Monthly rolling |
| API Response Time (p99) | < 1,000ms | < 2,000ms | Monthly rolling |
| Error Rate | < 0.1% | < 1% | Monthly rolling |
| BankID Login Success Rate | > 99.5% | > 99% | Monthly rolling |
| KYC Initiation Success Rate | > 99% | > 98% | Monthly rolling |
| Payment Initiation Success Rate | > 99% | > 98% | Monthly rolling |
| Health Check Recovery Time (RTO) | < 10 min | < 30 min | Per incident |
2.2 Availability Budget
| SLA Target | Monthly Allowed Downtime | Weekly Allowed Downtime |
|---|---|---|
| 99.9% | 43.8 minutes/month | 10.1 minutes/week |
| 99.95% | 21.9 minutes/month | 5.0 minutes/week |
| 99.99% | 4.4 minutes/month | 1.0 minutes/week |

Current infrastructure capability: 99.9% (SLA commitment). 99.95% is the SLO target once PgBouncer and multi-AZ standby are in place.
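The downtime budgets above follow directly from the availability target and the length of the window. The sketch below shows one way to derive them. It assumes an average month of 365.25 / 12 days, which reproduces the 43.8-minute figure; the function and constant names are illustrative and not taken from the Drop codebase.

```typescript
// Assumption: an "average month" of 365.25 / 12 days (43,830 minutes).
const MINUTES_PER_AVG_MONTH = (365.25 / 12) * 24 * 60;
const MINUTES_PER_WEEK = 7 * 24 * 60; // 10,080

/** Minutes of downtime allowed in a window for an availability target given in percent. */
function allowedDowntimeMinutes(targetPercent: number, windowMinutes: number): number {
  if (targetPercent < 0 || targetPercent > 100) {
    throw new RangeError("targetPercent must be between 0 and 100");
  }
  return windowMinutes * (1 - targetPercent / 100);
}

// Budgets for the 99.9% SLA commitment.
const monthlyBudget = allowedDowntimeMinutes(99.9, MINUTES_PER_AVG_MONTH); // about 43.8
const weeklyBudget = allowedDowntimeMinutes(99.9, MINUTES_PER_WEEK); // about 10.1
```

Calling the same function with 99.95 or 99.99 yields the other rows of the table after rounding to one decimal place.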
2.3 Exclusions
The following are excluded from SLA calculations:
- Scheduled maintenance windows (posted to BetterStack status page > 24h in advance)
- BankID OIDC provider outages (upstream at Vipps MobilePay)
- Open Banking provider outages (upstream; provider TBD)
- Force majeure events (AWS region outage, etc.)
- Sumsub KYC service outages (upstream)
3. Service Level Indicators (SLIs)
3.1 Availability
Definition: Percentage of minutes in the reporting period where `GET /api/health` returns HTTP 200 with `{"status":"ok"}`.
Measurement: BetterStack Drop Health Check monitor (1-minute check interval, 3 global locations).
Formula: (total_minutes - downtime_minutes) / total_minutes × 100
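As a worked instance of the availability formula, the sketch below computes the percentage for a window and compares it with the 99.9% commitment. The function name and the input validation are assumptions added for illustration; the inputs are a one-week window with 28 minutes of downtime.

```typescript
/** Availability in percent: (total_minutes - downtime_minutes) / total_minutes × 100. */
function availabilityPercent(totalMinutes: number, downtimeMinutes: number): number {
  if (totalMinutes <= 0) {
    throw new RangeError("totalMinutes must be positive");
  }
  if (downtimeMinutes < 0 || downtimeMinutes > totalMinutes) {
    throw new RangeError("downtimeMinutes must be between 0 and totalMinutes");
  }
  return ((totalMinutes - downtimeMinutes) / totalMinutes) * 100;
}

// One week is 10,080 minutes; 28 minutes of downtime gives roughly 99.72%.
const weekAvailability = availabilityPercent(7 * 24 * 60, 28);
const weekLabel = weekAvailability.toFixed(2) + "%"; // "99.72%"
const weekMeetsCommitment = weekAvailability >= 99.9; // false for this single week
```

A single bad week can fall below 99.9% while the month as a whole still meets the commitment, because the SLA is measured over the monthly window.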
3.2 Response Time
Definition: p50 and p99 response time for all API requests.
Measurement: CloudWatch App Runner request metrics (future: add APM tool).
Alert threshold: p99 > 1,000ms triggers a Slack `#drop-ops` alert.
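For readers unfamiliar with percentile metrics, the sketch below computes p50 and p99 from raw latency samples using the nearest-rank method and applies the 1,000ms alert threshold. CloudWatch computes its own percentile statistics, so this is only an illustration of the concept; all names here are hypothetical.

```typescript
/** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) {
    throw new RangeError("at least one sample is required");
  }
  if (p <= 0 || p > 100) {
    throw new RangeError("p must be in the range (0, 100]");
  }
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  // 1-based rank; multiplying before dividing avoids most floating-point drift.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

const P99_ALERT_THRESHOLD_MS = 1000;

/** True when the p99 of the samples exceeds the alert threshold. */
function p99BreachesThreshold(samplesMs: number[]): boolean {
  return percentile(samplesMs, 99) > P99_ALERT_THRESHOLD_MS;
}

// Four illustrative samples: the median is 200ms, but one slow request dominates p99.
const samples = [100, 200, 300, 1500];
const p50 = percentile(samples, 50); // 200
const breached = p99BreachesThreshold(samples); // true
```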
3.3 Error Rate
Definition: Percentage of HTTP requests returning 5xx status codes.
Measurement: Slack alert fires when more than 5 errors occur within 60 seconds (from `src/lib/alerts.ts`).
Formula: 5xx_requests / total_requests × 100
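The two ideas in this subsection, the error-rate formula and the "more than 5 errors in 60 seconds" alert rule, can be sketched as follows. This is not the contents of `src/lib/alerts.ts`; the function names, the sliding-window approach, and the treatment of the window as a half-open 60-second interval are assumptions.

```typescript
/** Error rate in percent: 5xx_requests / total_requests × 100. */
function errorRatePercent(statusCodes: number[]): number {
  if (statusCodes.length === 0) {
    return 0;
  }
  const serverErrors = statusCodes.filter((code) => code >= 500 && code <= 599).length;
  return (serverErrors / statusCodes.length) * 100;
}

/** True when more than maxErrors error timestamps fall inside any window of windowMs. */
function burstAlertFires(
  errorTimestampsMs: number[],
  maxErrors: number = 5,
  windowMs: number = 60000
): boolean {
  const sorted = errorTimestampsMs.slice().sort((a, b) => a - b);
  let start = 0;
  for (let end = 0; end < sorted.length; end++) {
    // Advance the window start until the span is shorter than windowMs.
    while (sorted[end] - sorted[start] >= windowMs) {
      start++;
    }
    if (end - start + 1 > maxErrors) {
      return true;
    }
  }
  return false;
}

// One 5xx response out of four requests is a 25% error rate.
const rate = errorRatePercent([200, 200, 500, 404]); // 25
// Six errors one second apart exceed the limit of five within 60 seconds.
const fires = burstAlertFires([0, 1000, 2000, 3000, 4000, 5000]); // true
```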
3.4 BankID Login Success Rate
Definition: Percentage of BankID login initiations that result in a successful session creation.
Measurement: Application audit log (`audit_log` table: `session_created` / `bankid_initiate`).
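A minimal sketch of the ratio described above, assuming audit log rows have already been fetched and that each row exposes its event name in a field called `event`. That field name and the row shape are hypothetical; only the two event names come from this document.

```typescript
// Hypothetical shape of an audit_log row; only the event name is used here.
interface AuditLogRow {
  event: string;
}

/** session_created / bankid_initiate × 100, or null when no logins were initiated. */
function loginSuccessRatePercent(rows: AuditLogRow[]): number | null {
  let initiated = 0;
  let created = 0;
  for (const row of rows) {
    if (row.event === "bankid_initiate") {
      initiated++;
    } else if (row.event === "session_created") {
      created++;
    }
  }
  return initiated === 0 ? null : (created / initiated) * 100;
}

// Four initiations, three resulting sessions: 75%.
const loginRate = loginSuccessRatePercent([
  { event: "bankid_initiate" },
  { event: "session_created" },
  { event: "bankid_initiate" },
  { event: "session_created" },
  { event: "bankid_initiate" },
  { event: "session_created" },
  { event: "bankid_initiate" },
]);
```

Returning null for an empty period keeps "no data" distinct from a 0% success rate.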
3.5 Transaction Success Rate
Definition: Percentage of initiated transactions (remittance + QR) that reach `completed` status.
Measurement: Database query: `COUNT(*) WHERE status='completed'` / `COUNT(*) WHERE status IN ('completed','failed')`.
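The query above divides completed transactions by transactions in a terminal state, so anything still in flight is left out of the denominator. The sketch below mirrors that logic on an in-memory list of statuses. The function name is illustrative, and `pending` is a hypothetical non-terminal status used only for the example.

```typescript
/** completed / (completed + failed) × 100; any other status is ignored. */
function transactionSuccessRatePercent(statuses: string[]): number | null {
  const completed = statuses.filter((s) => s === "completed").length;
  const failed = statuses.filter((s) => s === "failed").length;
  const terminal = completed + failed;
  return terminal === 0 ? null : (completed / terminal) * 100;
}

// Three completed, one failed, one still pending (hypothetical status): 75%.
const txRate = transactionSuccessRatePercent([
  "completed",
  "completed",
  "completed",
  "failed",
  "pending",
]);
```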
4. Monthly SLA Report — February 2026 (Example)
NOTE: This is a simulated example report for template purposes. Actual metrics will be populated from BetterStack, CloudWatch, and database queries after public launch.
4.1 Availability
| Metric | Target | Actual | Status |
|---|---|---|---|
| Uptime | 99.9% | 99.94% | PASS |
| Total downtime | < 43.8 min | 28 min (1 incident) | PASS |
| Incidents | N/A | 1 (INC-2026-001) | - |
| Maintenance windows | N/A | 0 | - |

Uptime chart (example):
Week 1 (Feb 03–09): ████████████████████ 100.00%
Week 2 (Feb 10–16): ████████████████████ 100.00%
Week 3 (Feb 17–23): ███████████████████▌ 99.72% (INC-2026-001: 28 min)
Week 4 (Feb 24–28): ████████████████████ 100.00%
─────────────────────────────────────────────────
Monthly: ████████████████████ 99.94%
4.2 Performance
| Metric | Target | Actual | Status |
|---|---|---|---|
| p50 response time | < 200ms | ~85ms | PASS |
| p99 response time | < 1,000ms | ~420ms | PASS |
| p99 during incident | N/A | Timeout (excluded) | N/A |
Slow endpoints (example):
| Endpoint | p99 Latency | Notes |
|---|---|---|
| `POST /api/transactions` | ~380ms | Sumsub KYC check adds latency |
| `GET /api/bank-accounts/balance` | ~350ms | Open Banking API round trip |
| `POST /api/auth/bankid/callback` | ~180ms | DB session write |
| `GET /api/health` | ~45ms | DB ping |
4.3 Error Rate
| Metric | Target | Actual | Status |
|---|---|---|---|
| Overall error rate | < 1% | 0.06% | PASS |
| Error rate (excl. incident) | < 0.1% | 0.03% | PASS |
| 5xx errors | < 1% | 0.06% | PASS |
| 4xx errors | N/A | 2.1% | INFO (mostly 401 auth failures) |
Top 5xx errors (example):
| Error | Count | Root Cause |
|---|---|---|
| 503 during INC-2026-001 | ~240 | DB connection pool exhaustion |
| 500 Sumsub timeout | 3 | Sumsub API latency spike |
| 503 cold start | 1 | App Runner scale-to-zero (if applicable) |
4.4 BankID Authentication
| Metric | Target | Actual | Status |
|---|---|---|---|
| Login success rate | > 99% | 99.7% | PASS |
| Login failures | - | 0.3% | INFO |
| CSRF rejections | - | 0 | - |
| Age verification failures | - | 2 | INFO |

Note: Login failures during INC-2026-001 are excluded from the calculation (upstream service impacted by pool exhaustion).
4.5 KYC (Sumsub)
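| Metric | Target | Actual | Status |
|---|---|---|---|
| KYC initiation success rate | > 98% | 99.2% | PASS |
| GREEN (approved) rate | N/A | 91% | INFO |
| RED (rejected) rate | N/A | 6% | INFO |
| RETRY rate | N/A | 3% | INFO |
| Average review time | N/A | ~2h | INFO |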
4.6 Transactions
| Metric | Target | Actual | Status |
|---|---|---|---|
| Transaction success rate | > 98% | TBD (Open Banking provider not yet live) | TBD |
| Remittance completion | > 98% | TBD | TBD |
| QR payment completion | > 98% | TBD | TBD |
5. Incident Summary
| Incident ID | Severity | Start | Duration | Impact | Root Cause | Status |
|---|---|---|---|---|---|---|
| INC-2026-001 | P1 | 2026-02-20 10:30 UTC | 28 min | 100% users | RDS connection pool exhaustion | Closed |

SLA credit impact: 28 minutes downtime. Monthly uptime = 99.94%. SLA (99.9%) met. No SLA credit applicable.
6. Infrastructure Health
6.1 App Runner
| Metric | Value | Notes |
|---|---|---|
| Service status | RUNNING | - |
| Deployments this month | 3 | Bug fix, security patch, config update |
| Deployment failures | 0 | - |
| Average deployment time | ~4 min | - |
6.2 RDS PostgreSQL
| Metric | Value | Threshold | Status |
|---|---|---|---|
| DB instance status | available | - | PASS |
| Free storage space | ~18 GB | Alert if < 2 GB | PASS |
| Backup retention | 7 days | 7 days minimum | PASS |
| Last automated snapshot | < 24h ago | < 24h | PASS |
| PITR enabled | Yes | Required | PASS |
| Max connections (month) | ~45 (peak) | Alert at 70 | PASS (alarm not yet configured) |
6.3 BetterStack Monitors
| Monitor | Checks | Passed | Failed | Uptime |
|---|---|---|---|---|
| Drop Health Check | ~40,320 | ~40,292 | ~28 (during incident) | 99.93% |
| Drop Landing Page | ~40,320 | ~40,320 | 0 | 100.00% |
| US East Health | ~40,320 | ~40,292 | ~28 (during incident) | 99.93% |
7. Security & Compliance
| Check | Status | Notes |
|---|---|---|
| No PII data exposure incidents | PASS | - |
| Audit log continuity | PASS | All events logged |
| Secret rotation (JWT_SECRET) | Not yet due | Rotation scheduled Q2 2026 |
| Secret rotation (DB password) | Not yet due | Rotation scheduled Q2 2026 |
| AML alerts reviewed | N/A | No open cases |
| Pending KYC > 24h | 0 | - |
| GDPR requests | 0 | None received |
8. SLA Trending
| Month | Uptime % | p99 Latency | Error Rate | Incidents |
|---|---|---|---|---|
| Feb 2026 | 99.94% | ~420ms | 0.06% | 1 (P1) |
| Mar 2026 | TBD | TBD | TBD | - |
| Apr 2026 | TBD | TBD | TBD | - |
9. Action Items from This Report
| # | Action | Owner | Due |
|---|---|---|---|
| 1 | Configure CloudWatch alarm on DatabaseConnections > 70 | Alem | Within 1 week |
| 2 | Implement PgBouncer / RDS Proxy before public launch | Platform | Before v1.0 |
| 3 | Add p99 latency CloudWatch alarm (> 1,000ms → Slack alert) | Platform | Before v1.0 |
| 4 | Review 4xx error volume: categorize auth failures vs user errors | Alem | Within 2 weeks |
10. SLA Report Distribution & Cadence
| Audience | Frequency | Format |
|---|---|---|
| Alem Bašić (CEO) | Monthly | This document |
| Platform Architect | Monthly | This document |
| Finanstilsynet (if required) | Annually | Aggregated uptime report |

Report generation date: Last business day of each month.
Data sources: BetterStack dashboard + CloudWatch + `audit_log` queries.
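If report generation is scheduled automatically, the "last business day" rule needs a concrete definition. The sketch below assumes business days are Monday to Friday and ignores public holidays, which this document does not address; the function name is illustrative.

```typescript
/** Last Monday-to-Friday day of the given month (month is 1-12), as YYYY-MM-DD in UTC. */
function lastBusinessDay(year: number, month: number): string {
  // Day 0 of the following month is the last calendar day of this month.
  const date = new Date(Date.UTC(year, month, 0));
  // getUTCDay(): 0 is Sunday, 6 is Saturday. Step back over the weekend.
  while (date.getUTCDay() === 0 || date.getUTCDay() === 6) {
    date.setUTCDate(date.getUTCDate() - 1);
  }
  return date.toISOString().slice(0, 10);
}

// 28 February 2026 is a Saturday, so the report date is Friday the 27th.
const februaryReportDate = lastBusinessDay(2026, 2); // "2026-02-27"
```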
Related Documents
- Monitoring & Observability
- Incident Report INC-2026-001
- Post-Mortem INC-2026-001
- Operational Runbook
- Disaster Recovery Plan
Approval
| Role | Name | Date | Signature |
|---|---|---|---|
| Author | Platform Architect (AI) | 2026-02-23 | |
| Reviewer | | | |
| Approver | Alem Bašić | | |