Runbook: NBS IPS Outage Runbook: NBS IPS Outage Purpose This runbook provides step-by-step procedures for detecting, triaging, and responding to NBS IPS (Narodna Banka Srbije Instant Payment System) outages or degraded performance. Trigger Any of the following conditions indicate potential NBS IPS outage: High rejection rate: >10% of payment initiation requests returning RJCT status Timeout spike: >5 consecutive timeout errors (>30s response time) HTTP 5xx errors: NBS IPS API returning 500/502/503/504 NBS status page alert: Official communication from NBS about system maintenance or outage Preconditions Before following this runbook, ensure: You have access to production logs (CloudWatch, Datadog, or local Docker logs) You have access to NBS IPS status page: https://www.nbs.rs (check payment system availability) You have NBS contact details: platne.institucije@nbs.rs, +381 11 3027 100 You have access to Drop Srbija admin dashboard (to pause outbound payments if needed) You have access to status page publishing tool (to notify users) Step-by-Step Response Step 1: Verify the Issue (ETA: 2 minutes) Check Docker logs for NBS IPS errors: docker logs dropsrbija-api | grep NbsIpsLog | tail -50 Look for patterns: Multiple RJCT (rejection) responses with error code AM05 (duplicate submission) Multiple RJCT with error code TECH (technical error) Timeout messages: java.net.SocketTimeoutException: Read timed out HTTP 503 Service Unavailable HTTP 500 Internal Server Error Example of normal log: 2026-04-16T10:15:32Z [NbsIpsLog] transaction_id=abc123 request_type=initiate response_status=200 nbs_ips_status=ACCP Example of outage log: 2026-04-16T10:45:12Z [NbsIpsLog] transaction_id=def456 request_type=initiate response_status=503 error_message="Service Temporarily Unavailable" 2026-04-16T10:45:23Z [NbsIpsLog] transaction_id=ghi789 request_type=initiate response_status=500 error_message="Internal Server Error" Check metrics dashboard (if available): Transaction success rate (should be >95%) Average NBS IPS response time (should be <2s) Error rate by HTTP status code Step 2: Check NBS IPS Status Page (ETA: 1 minute) Official NBS status page: Go to https://www.nbs.rs Navigate to: Payment Systems → IPS → System Availability Check for announcements: Scheduled maintenance windows Incident notifications System degradation alerts If NBS confirms outage: Note the estimated resolution time (ERT) Proceed to Step 4 (Customer Communication) If NBS shows "All Systems Operational": Outage may be isolated to Drop Srbija's connection Proceed to Step 3 (Technical Diagnosis) Step 3: Technical Diagnosis (ETA: 5 minutes) Possible causes of isolated failures: Network issue between Drop Srbija and NBS: Check VPN/VPC connectivity (if applicable) Verify mTLS certificates haven't expired Test network path: curl -I https://ips.nbs.rs Rate limiting: NBS may throttle requests if Drop exceeds transaction quota Check nbs_ips_logs table for HTTP 429 (Too Many Requests) Solution: Implement exponential backoff (already in adapter) Authentication failure: mTLS client certificate may have expired API key rotation (if NBS uses API keys) Check for HTTP 401/403 errors in logs Partner bank integration issue: If Drop operates as agent under Article 24, outage may be at partner bank's IPS gateway Contact bank technical support: [Partner Bank Support Number] Run diagnostic test transaction: # Send test payment (100 RSD to known-good recipient) curl -X POST http://localhost:3003/v1/ips/initiate \ -H "Authorization: Bearer " \ -H "Content-Type: application/json" \ -d '{ "recipientPhone": "+381601234567", "amount": 100, "description": "IPS diagnostic test" }' Expected responses: Success: {"transactionId": "...", "status": "PENDING"} → NBS IPS is working Timeout: No response after 30s → Network/connectivity issue RJCT: {"status": "failed", "error": "TECH"} → NBS technical error HTTP 503: NBS IPS is down Step 4: Customer Communication (ETA: 3 minutes) If outage is confirmed (NBS or Drop technical issue): 1. Post status page update (https://status.dropsrbija.rs or in-app banner): Serbian template: 🔴 Problemi sa trenutnim plaćanjima NBS instant plaćanja su trenutno nedostupna zbog [razloga]. Radimo na rešavanju problema. Očekivano vreme povratka: [ETA ili "u najkraćem roku"] Vaša sredstva su sigurna. Pokušajte ponovo za nekoliko minuta. Ažurirano: [timestamp] English translation (for reference): 🔴 Issues with instant payments NBS instant payments are currently unavailable due to [reason]. We are working on resolving the issue. Expected resolution time: [ETA or "as soon as possible"] Your funds are safe. Please try again in a few minutes. Updated: [timestamp] 2. Send SMS to active users (optional, for extended outages >30 min): SMS template: Drop Srbija: Instant plaćanja trenutno nedostupna zbog problema sa NBS sistemom. Vaša sredstva su sigurna. Pokušajte ponovo za 30 min. Info: dropsrbija.rs/status 3. Email to high-value users (optional, for outages >2 hours): Subject: Obaveštenje o trenutnim problemima sa plaćanjima Body: [See incident-notification-procedure.md for full email template] Step 5: Escalate to NBS (if needed) (ETA: 5 minutes) When to escalate: Outage duration >15 minutes AND NBS status page shows "operational" You suspect issue is specific to Drop Srbija's integration Multiple banks reporting similar issues (check Serbian fintech community channels) NBS Contact: Email: platne.institucije@nbs.rs Phone: +381 11 3027 100 (business hours: 08:00-16:00 CET) Emergency Phone: [TBD — request during onboarding] Escalation email template: Subject: IPS Payment Failures — Drop Srbija (Urgent) Poštovani, Prijavljujemo tehničke probleme sa IPS plaćanjima preko Drop Srbija platforme. Simptomi: - Vreme početka: [timestamp] - Procenat neuspelih transakcija: [X%] - HTTP status kodovi: [500/503/timeout] - Broj pogođenih korisnika: [Y] Dijagnostika: - NBS status stranica pokazuje "operativno" - mTLS sertifikati validni do: [expiry date] - Test transakcije vraćaju: [error message] Molimo za hitnu pomoć u dijagnostici problema. Kontakt: [your name], [phone], [email] Drop Srbija d.o.o. [Company registration details] Step 6: Implement Workaround (if available) (ETA: 10 minutes) Workaround Option 1: Switch to Alternate Bank Adapter If Drop has partnerships with multiple banks: # Update environment variable (requires deployment) export BANK_PARTNER=alternate_bank_id # Restart API docker compose restart api Workaround Option 2: Queue Transactions for Retry Backend already implements exponential backoff retry for failed transactions: First retry: 30 seconds Second retry: 2 minutes Third retry: 10 minutes After 3 failures: Mark transaction as failed , user receives notification No manual intervention needed — adapter handles retries automatically. Workaround Option 3: Pause Outbound Payments If outage is prolonged (>2 hours) and retries are causing cascading failures: # Via admin API (requires admin JWT) curl -X POST http://localhost:3003/admin/payments/pause \ -H "Authorization: Bearer " \ -d '{"reason": "NBS IPS outage", "estimatedResumptionTime": "2026-04-16T14:00:00Z"}' Effect: New payment requests return HTTP 503 with message: "Plaćanja su privremeno nedostupna" Existing pending transactions continue retry attempts Users see in-app banner: "Trenutno ne primamo nova plaćanja" Resume payments: curl -X POST http://localhost:3003/admin/payments/resume \ -H "Authorization: Bearer " Step 7: Monitor Recovery (ETA: Ongoing) Once NBS IPS is back online: Check transaction backlog: # Count transactions in "processing" state psql -h localhost -p 5434 -U dropsrbija -d dropsrbija_prod -c \ "SELECT COUNT(*) FROM transactions WHERE status = 'processing' AND created_at > NOW() - INTERVAL '2 hours';" Verify retry processing: Backend retries failed transactions automatically. Monitor logs: docker logs dropsrbija-api | grep "RetryProcessor" | tail -20 Expected output: [RetryProcessor] Retrying transaction abc123 (attempt 1/3) [NbsIpsLog] transaction_id=abc123 response_status=200 nbs_ips_status=ACCP [RetryProcessor] Transaction abc123 succeeded on retry Update status page: ✅ Problemi rešeni NBS instant plaćanja su ponovo dostupna. Sva odložena plaćanja će biti procesirana automatski. Hvala na strpljenju. Ažurirano: [timestamp] Post-incident review (within 24 hours): Total downtime duration Number of affected transactions Number of users impacted Root cause (NBS outage vs Drop technical issue) Lessons learned Action items (e.g., multi-bank redundancy, better monitoring) Step 8: Report to NBS (if required) (ETA: 72 hours) If outage meets NBS incident reporting criteria: Duration >2 hours, OR 1000 failed transactions, OR Security/data breach involved Follow incident notification procedure: See incident-notification-procedure.md for full 3-track reporting protocol: Track 1: NBS initial notification (within 4 hours) Track 2: NBS detailed report (within 72 hours) Track 3: Poverenik data breach notification (if applicable) Expected Outcome Success criteria: ✅ Outage detected within 5 minutes of occurrence ✅ Root cause identified (NBS outage vs Drop technical issue) ✅ Users notified within 15 minutes (status page update) ✅ NBS contacted (if Drop-specific issue) ✅ Workaround implemented (if available) ✅ Service restored (or ETA communicated) ✅ Post-incident review completed Metrics to track: MTTD (Mean Time to Detect): <5 minutes MTTR (Mean Time to Resolve): <30 minutes for Drop issues, variable for NBS outages User notification latency: <15 minutes Escalation Path Level 1 (On-call engineer): Follow Steps 1-6 above Level 2 (Technical Lead): If unresolved after 30 minutes Contact: Petter Graff (CodeCraft, petter-graff@alai.no) Decide: Implement workaround, escalate to NBS, or wait for NBS resolution Level 3 (CEO): If outage >2 hours OR customer impact >10,000 users Contact: Alem Basic (alem@alai.no, +47 404 74 251) Decide: Public communication strategy, regulatory notification (NBS + Poverenik) Post-Outage Actions Mandatory: Update runbook — If new failure mode discovered, add to Step 3 diagnostic checklist Improve monitoring — Add alert for specific error pattern that triggered outage Document lessons learned — Add entry to Decision Log Optional (if pattern recurs): Implement multi-bank redundancy — Partner with 2+ banks for IPS access Pre-queue transactions — Buffer transactions locally during known NBS maintenance windows Enhanced monitoring — Set up synthetic transaction every 5 minutes to detect outages faster Related Documents Incident Notification Procedure — Full NBS/Poverenik reporting protocol Architecture Overview — NBS IPS integration design Decision Log — Historical incident post-mortems Last Updated: 2026-04-16 Next Review: After first real NBS IPS outage (to validate runbook effectiveness)