Skip to main content

Runbook: NBS IPS Outage

Runbook: NBS IPS Outage

Purpose

This runbook provides step-by-step procedures for detecting, triaging, and responding to NBS IPS (Narodna Banka Srbije Instant Payment System) outages or degraded performance.

Trigger

Any of the following conditions indicate potential NBS IPS outage:

  1. High rejection rate: >10% of payment initiation requests returning RJCT status
  2. Timeout spike: >5 consecutive timeout errors (>30s response time)
  3. HTTP 5xx errors: NBS IPS API returning 500/502/503/504
  4. NBS status page alert: Official communication from NBS about system maintenance or outage

Preconditions

Before following this runbook, ensure:

  • You have access to production logs (CloudWatch, Datadog, or local Docker logs)
  • You have access to NBS IPS status page: https://www.nbs.rs (check payment system availability)
  • You have NBS contact details: [email protected], +381 11 3027 100
  • You have access to Drop Srbija admin dashboard (to pause outbound payments if needed)
  • You have access to status page publishing tool (to notify users)

Step-by-Step Response

Step 1: Verify the Issue (ETA: 2 minutes)

Check Docker logs for NBS IPS errors:

docker logs dropsrbija-api | grep NbsIpsLog | tail -50

Look for patterns:

  • Multiple RJCT (rejection) responses with error code AM05 (duplicate submission)
  • Multiple RJCT with error code TECH (technical error)
  • Timeout messages: java.net.SocketTimeoutException: Read timed out
  • HTTP 503 Service Unavailable
  • HTTP 500 Internal Server Error

Example of normal log:

2026-04-16T10:15:32Z [NbsIpsLog] transaction_id=abc123 request_type=initiate response_status=200 nbs_ips_status=ACCP

Example of outage log:

2026-04-16T10:45:12Z [NbsIpsLog] transaction_id=def456 request_type=initiate response_status=503 error_message="Service Temporarily Unavailable"
2026-04-16T10:45:23Z [NbsIpsLog] transaction_id=ghi789 request_type=initiate response_status=500 error_message="Internal Server Error"

Check metrics dashboard (if available):

  • Transaction success rate (should be >95%)
  • Average NBS IPS response time (should be <2s)
  • Error rate by HTTP status code

Step 2: Check NBS IPS Status Page (ETA: 1 minute)

Official NBS status page:

  1. Go to https://www.nbs.rs
  2. Navigate to: Payment Systems → IPS → System Availability
  3. Check for announcements:
    • Scheduled maintenance windows
    • Incident notifications
    • System degradation alerts

If NBS confirms outage:

  • Note the estimated resolution time (ERT)
  • Proceed to Step 4 (Customer Communication)

If NBS shows "All Systems Operational":

  • Outage may be isolated to Drop Srbija's connection
  • Proceed to Step 3 (Technical Diagnosis)

Step 3: Technical Diagnosis (ETA: 5 minutes)

Possible causes of isolated failures:

  1. Network issue between Drop Srbija and NBS:

    • Check VPN/VPC connectivity (if applicable)
    • Verify mTLS certificates haven't expired
    • Test network path: curl -I https://ips.nbs.rs
  2. Rate limiting:

    • NBS may throttle requests if Drop exceeds transaction quota
    • Check nbs_ips_logs table for HTTP 429 (Too Many Requests)
    • Solution: Implement exponential backoff (already in adapter)
  3. Authentication failure:

    • mTLS client certificate may have expired
    • API key rotation (if NBS uses API keys)
    • Check for HTTP 401/403 errors in logs
  4. Partner bank integration issue:

    • If Drop operates as agent under Article 24, outage may be at partner bank's IPS gateway
    • Contact bank technical support: [Partner Bank Support Number]

Run diagnostic test transaction:

# Send test payment (100 RSD to known-good recipient)
curl -X POST http://localhost:3003/v1/ips/initiate \
  -H "Authorization: Bearer <admin_jwt>" \
  -H "Content-Type: application/json" \
  -d '{
    "recipientPhone": "+381601234567",
    "amount": 100,
    "description": "IPS diagnostic test"
  }'

Expected responses:

  • Success: {"transactionId": "...", "status": "PENDING"} → NBS IPS is working
  • Timeout: No response after 30s → Network/connectivity issue
  • RJCT: {"status": "failed", "error": "TECH"} → NBS technical error
  • HTTP 503: NBS IPS is down

Step 4: Customer Communication (ETA: 3 minutes)

If outage is confirmed (NBS or Drop technical issue):

1. Post status page update (https://status.dropsrbija.rs or in-app banner):

Serbian template:

🔴 Problemi sa trenutnim plaćanjima

NBS instant plaćanja su trenutno nedostupna zbog [razloga]. 
Radimo na rešavanju problema.

Očekivano vreme povratka: [ETA ili "u najkraćem roku"]

Vaša sredstva su sigurna. Pokušajte ponovo za nekoliko minuta.

Ažurirano: [timestamp]

English translation (for reference):

🔴 Issues with instant payments

NBS instant payments are currently unavailable due to [reason]. 
We are working on resolving the issue.

Expected resolution time: [ETA or "as soon as possible"]

Your funds are safe. Please try again in a few minutes.

Updated: [timestamp]

2. Send SMS to active users (optional, for extended outages >30 min):

SMS template:

Drop Srbija: Instant plaćanja trenutno nedostupna zbog problema sa NBS sistemom. Vaša sredstva su sigurna. Pokušajte ponovo za 30 min. Info: dropsrbija.rs/status

3. Email to high-value users (optional, for outages >2 hours):

Subject: Obaveštenje o trenutnim problemima sa plaćanjima

Body: [See incident-notification-procedure.md for full email template]

Step 5: Escalate to NBS (if needed) (ETA: 5 minutes)

When to escalate:

  • Outage duration >15 minutes AND NBS status page shows "operational"
  • You suspect issue is specific to Drop Srbija's integration
  • Multiple banks reporting similar issues (check Serbian fintech community channels)

NBS Contact:

  • Email: [email protected]
  • Phone: +381 11 3027 100 (business hours: 08:00-16:00 CET)
  • Emergency Phone: [TBD — request during onboarding]

Escalation email template:

Subject: IPS Payment Failures — Drop Srbija (Urgent)

Poštovani,

Prijavljujemo tehničke probleme sa IPS plaćanjima preko Drop Srbija platforme.

Simptomi:
- Vreme početka: [timestamp]
- Procenat neuspelih transakcija: [X%]
- HTTP status kodovi: [500/503/timeout]
- Broj pogođenih korisnika: [Y]

Dijagnostika:
- NBS status stranica pokazuje "operativno"
- mTLS sertifikati validni do: [expiry date]
- Test transakcije vraćaju: [error message]

Molimo za hitnu pomoć u dijagnostici problema.

Kontakt: [your name], [phone], [email]

Drop Srbija d.o.o.
[Company registration details]

Step 6: Implement Workaround (if available) (ETA: 10 minutes)

Workaround Option 1: Switch to Alternate Bank Adapter

If Drop has partnerships with multiple banks:

# Update environment variable (requires deployment)
export BANK_PARTNER=alternate_bank_id

# Restart API
docker compose restart api

Workaround Option 2: Queue Transactions for Retry

Backend already implements exponential backoff retry for failed transactions:

  • First retry: 30 seconds
  • Second retry: 2 minutes
  • Third retry: 10 minutes
  • After 3 failures: Mark transaction as failed, user receives notification

No manual intervention needed — adapter handles retries automatically.

Workaround Option 3: Pause Outbound Payments

If outage is prolonged (>2 hours) and retries are causing cascading failures:

# Via admin API (requires admin JWT)
curl -X POST http://localhost:3003/admin/payments/pause \
  -H "Authorization: Bearer <admin_jwt>" \
  -d '{"reason": "NBS IPS outage", "estimatedResumptionTime": "2026-04-16T14:00:00Z"}'

Effect:

  • New payment requests return HTTP 503 with message: "Plaćanja su privremeno nedostupna"
  • Existing pending transactions continue retry attempts
  • Users see in-app banner: "Trenutno ne primamo nova plaćanja"

Resume payments:

curl -X POST http://localhost:3003/admin/payments/resume \
  -H "Authorization: Bearer <admin_jwt>"

Step 7: Monitor Recovery (ETA: Ongoing)

Once NBS IPS is back online:

  1. Check transaction backlog:
# Count transactions in "processing" state
psql -h localhost -p 5434 -U dropsrbija -d dropsrbija_prod -c \
  "SELECT COUNT(*) FROM transactions WHERE status = 'processing' AND created_at > NOW() - INTERVAL '2 hours';"
  1. Verify retry processing:

Backend retries failed transactions automatically. Monitor logs:

docker logs dropsrbija-api | grep "RetryProcessor" | tail -20

Expected output:

[RetryProcessor] Retrying transaction abc123 (attempt 1/3)
[NbsIpsLog] transaction_id=abc123 response_status=200 nbs_ips_status=ACCP
[RetryProcessor] Transaction abc123 succeeded on retry
  1. Update status page:
✅ Problemi rešeni

NBS instant plaćanja su ponovo dostupna. Sva odložena plaćanja će biti procesirana automatski.

Hvala na strpljenju.

Ažurirano: [timestamp]
  1. Post-incident review (within 24 hours):
  • Total downtime duration
  • Number of affected transactions
  • Number of users impacted
  • Root cause (NBS outage vs Drop technical issue)
  • Lessons learned
  • Action items (e.g., multi-bank redundancy, better monitoring)

Step 8: Report to NBS (if required) (ETA: 72 hours)

If outage meets NBS incident reporting criteria:

  • Duration >2 hours, OR
  • 1000 failed transactions, OR

  • Security/data breach involved

Follow incident notification procedure:

See incident-notification-procedure.md for full 3-track reporting protocol:

  • Track 1: NBS initial notification (within 4 hours)
  • Track 2: NBS detailed report (within 72 hours)
  • Track 3: Poverenik data breach notification (if applicable)

Expected Outcome

Success criteria:

  • ✅ Outage detected within 5 minutes of occurrence
  • ✅ Root cause identified (NBS outage vs Drop technical issue)
  • ✅ Users notified within 15 minutes (status page update)
  • ✅ NBS contacted (if Drop-specific issue)
  • ✅ Workaround implemented (if available)
  • ✅ Service restored (or ETA communicated)
  • ✅ Post-incident review completed

Metrics to track:

  • MTTD (Mean Time to Detect): <5 minutes
  • MTTR (Mean Time to Resolve): <30 minutes for Drop issues, variable for NBS outages
  • User notification latency: <15 minutes

Escalation Path

Level 1 (On-call engineer): Follow Steps 1-6 above

Level 2 (Technical Lead): If unresolved after 30 minutes

  • Contact: Petter Graff (CodeCraft, [email protected])
  • Decide: Implement workaround, escalate to NBS, or wait for NBS resolution

Level 3 (CEO): If outage >2 hours OR customer impact >10,000 users

  • Contact: Alem Basic ([email protected], +47 404 74 251)
  • Decide: Public communication strategy, regulatory notification (NBS + Poverenik)

Post-Outage Actions

Mandatory:

  1. Update runbook — If new failure mode discovered, add to Step 3 diagnostic checklist
  2. Improve monitoring — Add alert for specific error pattern that triggered outage
  3. Document lessons learned — Add entry to Decision Log

Optional (if pattern recurs):

  1. Implement multi-bank redundancy — Partner with 2+ banks for IPS access
  2. Pre-queue transactions — Buffer transactions locally during known NBS maintenance windows
  3. Enhanced monitoring — Set up synthetic transaction every 5 minutes to detect outages faster

Last Updated: 2026-04-16
Next Review: After first real NBS IPS outage (to validate runbook effectiveness)