Runbook: AISP Balance Fetch Failure
Service: AISP (Account Information Service Provider)
Severity: MEDIUM (users can't see bank balance)
MTTR Target: <20 minutes
Owner: John (AI Director)
Symptoms
Users report they cannot see their bank account balance in Drop. Symptoms include:
- Dashboard shows "Balance unavailable" or stale balance
- Error message: "Could not fetch account information"
- Infinite loading spinner on balance widget
- Balance shows "0 kr" or "—" instead of actual amount
User impact: Cannot verify available funds before making payments (may lead to insufficient funds errors).
Diagnosis
1. Check Neonomics AISP Status
External status:
# Neonomics has no public status page — test via API
curl -X GET https://api.neonomics.io/health \
-H "Authorization: Bearer <api-key>" \
-v
# Expected: HTTP 200
# If 500/503: Neonomics outage
Check specific bank connectivity:
# List supported banks and their status
curl -X GET https://api.neonomics.io/banks \
-H "Authorization: Bearer <api-key>" \
| jq '.[] | select(.country == "NO") | {name, status}'
# Look for: "status": "degraded" or "offline"
2. Check Drop Logs
# CloudWatch Logs (production)
aws logs filter-log-events \
--log-group-name /aws/apprunner/drop-production \
--filter-pattern "aisp" \
--start-time $(date -u -d '15 minutes ago' +%s)000 \
--region eu-west-1
# Look for:
# - "AISP consent expired"
# - "AISP API timeout"
# - "AISP 401 Unauthorized"
# - "Bank API unavailable: DNB"
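When the filter returns many events, tallying them by the patterns listed above can show which cause dominates. The following is a minimal TypeScript sketch under stated assumptions: the function names and cause labels are hypothetical, and the patterns cover only the four messages named in this runbook.

```typescript
// Hypothetical cause labels; only the log messages come from this runbook.
type AispCause =
  | "consent_expired"
  | "timeout"
  | "bad_credentials"
  | "bank_unavailable"
  | "unknown";

// Each pattern is paired with the cause it suggests. No "g" flag, so test() is stateless.
const LOG_PATTERNS: Array<[RegExp, AispCause]> = [
  [/AISP consent expired/i, "consent_expired"],
  [/AISP API timeout/i, "timeout"],
  [/AISP 401 Unauthorized/i, "bad_credentials"],
  [/Bank API unavailable/i, "bank_unavailable"],
];

// Return the first matching cause for a single log message.
function classifyLogLine(line: string): AispCause {
  for (const [pattern, cause] of LOG_PATTERNS) {
    if (pattern.test(line)) return cause;
  }
  return "unknown";
}

// Count how many messages fall under each cause.
function tallyCauses(lines: string[]): Record<AispCause, number> {
  const tally: Record<AispCause, number> = {
    consent_expired: 0,
    timeout: 0,
    bad_credentials: 0,
    bank_unavailable: 0,
    unknown: 0,
  };
  for (const line of lines) {
    tally[classifyLogLine(line)] += 1;
  }
  return tally;
}
```

A tally dominated by one cause points to the matching "Common Causes & Solutions" entry; a large "unknown" count suggests reading the raw messages instead.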
3. Check User Consent Status
# Verify Open Banking consent hasn't expired
# Consent is valid for 90 days from last authorization
# Check database for expired consents (PostgreSQL 16)
psql "$DATABASE_URL" <<EOF
SELECT
user_id,
bank_name,
consent_expires_at,
EXTRACT(EPOCH FROM (consent_expires_at - NOW())) / 86400 AS days_remaining
FROM bank_accounts
WHERE consent_expires_at < NOW() + INTERVAL '7 days'
ORDER BY consent_expires_at ASC
LIMIT 10;
EOF
# If days_remaining < 0: consent expired
# If days_remaining < 7: warn user to renew soon
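The same thresholds can be applied in application code. This is a small sketch, assuming the expiry timestamp is available as a `Date`; the type and function names are illustrative and not taken from the Drop codebase.

```typescript
type ConsentStatus = "expired" | "expiring_soon" | "ok";

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Fractional days until expiry; negative once the consent has lapsed.
function daysRemaining(expiresAt: Date, now: Date): number {
  return (expiresAt.getTime() - now.getTime()) / MS_PER_DAY;
}

// Mirrors the rule above: < 0 is expired, < warnDays should trigger a renewal warning.
function consentStatus(expiresAt: Date, now: Date, warnDays = 7): ConsentStatus {
  const days = daysRemaining(expiresAt, now);
  if (days < 0) return "expired";
  if (days < warnDays) return "expiring_soon";
  return "ok";
}
```

Passing `now` explicitly keeps the function deterministic, which makes it easy to unit test around the 7-day boundary.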
4. Test AISP Flow
Manual test (staging):
# 1. Login
TOKEN=$(curl -X POST https://drop-staging.fly.dev/api/auth/login \
-H "Content-Type: application/json" \
-d '{"email":"[email protected]","password":"test1234"}' \
| jq -r '.data.token')
# 2. Fetch balance
curl -X GET https://drop-staging.fly.dev/api/accounts/balance \
-H "Authorization: Bearer $TOKEN" \
-v
# Expected: HTTP 200, { "balance": 15000.50, "currency": "NOK" }
# If 401: Consent expired
# If 500: AISP integration broken
5. Check Rate Limiting
# Check if Neonomics API rate limit exceeded
aws logs filter-log-events \
--log-group-name /aws/apprunner/drop-production \
--filter-pattern "rate_limit" \
--start-time $(date -u -d '10 minutes ago' +%s)000 \
| jq '.events[].message' \
| grep -E "429|X-RateLimit"
# If many 429 errors: rate limiting issue
Common Causes & Solutions
Cause 1: Expired Open Banking Consent
Probability: 40% (PSD2 consent expires after 90 days)
Symptoms:
- Error code: CONSENT_EXPIRED or CONSENT_INVALID
- Logs show: "AISP consent no longer valid"
- Specific users affected (not all users)
Solution:
- Identify affected users:
  -- PostgreSQL 16
  SELECT user_id, email, bank_name, consent_expires_at
  FROM bank_accounts
  JOIN users ON users.id = bank_accounts.user_id
  WHERE consent_expires_at < NOW();
- Notify users to re-authorize:
  Push notification (Norwegian):
  Banktilkobling utløpt
  Godkjenningen for å hente saldo fra [Bank] har utløpt. Trykk her for å fornye tilkoblingen.
  Email (Norwegian):
  Emne: Godkjenn tilgang til bankkonto på nytt
  Hei,
  Din godkjenning for å vise saldo fra [Bank] har utløpt etter 90 dager. Dette er et PSD2-sikkerhetskrav.
  Logg inn i Drop og koble til bankkontoen på nytt for å fortsette å se saldoen din.
  Mvh, Drop
- Guide user through re-consent:
  - User taps notification → redirect to "Reconnect Bank Account" screen
  - Initiate new AISP consent flow (BankID + bank authorization)
  - Update consent_expires_at = NOW() + INTERVAL '90 days'
- Automatic consent renewal reminder:
  # Cron job to warn users 7 days before expiry
  # Send reminder: "Your bank connection expires in 7 days, renew now"
ETA: Immediate (user action required)
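The reminder job described in the solution steps could select its recipients along these lines. This is only a sketch: the interface and function are hypothetical, and `reminderSent` is assumed to correspond to the `consent_renewal_reminder_sent` column used in the post-incident query.

```typescript
interface BankAccountConsent {
  userId: string;
  consentExpiresAt: Date;
  reminderSent: boolean; // assumed to map to consent_renewal_reminder_sent
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Users whose consent is still valid, expires within the window,
// and who have not yet been sent a reminder.
function usersDueReminder(
  accounts: BankAccountConsent[],
  now: Date,
  windowDays = 7,
): string[] {
  const cutoff = now.getTime() + windowDays * DAY_MS;
  return accounts
    .filter((a) => !a.reminderSent)
    .filter((a) => a.consentExpiresAt.getTime() > now.getTime())
    .filter((a) => a.consentExpiresAt.getTime() <= cutoff)
    .map((a) => a.userId);
}
```

Already expired consents are excluded on purpose, since those users need the re-authorization notification rather than a reminder.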
Cause 2: Bank API Outage or Maintenance
Probability: 15% (specific bank temporarily unavailable)
Symptoms:
- All users of specific bank (e.g., DNB, Nordea) cannot fetch balance
- Other banks work fine
- Logs show: "Bank API timeout" or "502 Bad Gateway"
Solution:
- Identify affected bank:
  # Check which bank is failing
  aws logs filter-log-events \
    --log-group-name /aws/apprunner/drop-production \
    --filter-pattern "Bank API" \
    --start-time $(date -u -d '30 minutes ago' +%s)000 \
    | jq '.events[].message' \
    | grep -o '"bank":"[^"]*"' \
    | sort | uniq -c | sort -rn
  # Example output: "bank":"DNB" appears 50 times
- Check bank status:
  - Visit bank's website: check for maintenance announcements
  - Norwegian banks often schedule maintenance 02:00-06:00 CET
  - DNB status: https://www.dnb.no/drift
  - Nordea status: https://www.nordea.no/info/driftsmeldinger
- Notify affected users (Norwegian):
  Emne: Saldo midlertidig utilgjengelig for [Bank]
  Hei,
  Vi opplever for øyeblikket problemer med å hente saldo fra [Bank]. Dette skyldes tekniske problemer hos banken.
  Du kan fortsatt gjøre betalinger, men saldoen vises ikke akkurat nå.
  Vi jobber med å gjenopprette tjenesten. Estimert løsning: [X minutter/timer]
  Mvh, Drop
- Implement graceful degradation:
  // src/app/api/accounts/balance/route.ts
  async function fetchBalance(userId: string) {
    try {
      return await neonomicsClient.getBalance(userId);
    } catch (error) {
      // Narrow the unknown error before reading its code
      if ((error as { code?: string } | null)?.code === 'BANK_API_TIMEOUT') {
        // Return cached balance with warning
        const cached = await getCachedBalance(userId);
        return {
          balance: cached?.balance ?? null, // ?? keeps a legitimate balance of 0
          currency: 'NOK',
          lastUpdated: cached?.timestamp,
          warning: 'Balance may be outdated due to bank API issues'
        };
      }
      throw error;
    }
  }
ETA: Depends on bank (typically <2 hours for maintenance, <1 hour for incidents)
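If the log messages have already been pulled into a script, the grep/sort/uniq pipeline from the first solution step can be approximated in TypeScript. This sketch assumes each message embeds a `"bank":"<name>"` fragment, as the grep pattern implies; the function name is hypothetical.

```typescript
// Count failures per bank, most affected bank first.
// Unlike `grep -o`, this counts at most one match per message.
function countFailuresByBank(messages: string[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const message of messages) {
    const match = /"bank":"([^"]*)"/.exec(message);
    const bank = match?.[1];
    if (bank === undefined) continue;
    counts.set(bank, (counts.get(bank) ?? 0) + 1);
  }
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}
```

A single bank far ahead of the rest supports the "specific bank" diagnosis; an even spread across banks points toward a provider-wide problem instead.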
Cause 3: Neonomics API Outage
Probability: 10% (Neonomics service disruption)
Symptoms:
- ALL users cannot fetch balance regardless of bank
- Logs show: "Neonomics API unreachable" or HTTP 503
- Test API call to Neonomics fails
Solution:
- Verify Neonomics outage:
  # Test Neonomics health endpoint
  curl -X GET https://api.neonomics.io/health \
    -H "Authorization: Bearer <api-key>" \
    -v
  # If timeout or 503: confirmed outage
- Contact Neonomics support:
  - Email: [email protected]
  - Slack: #neonomics-support (if available)
  - Check Neonomics Slack for incident updates
- Enable fallback mode:
  # Show cached balances to all users
  # Caution: App Runner defines runtime environment variables under
  # --source-configuration (ImageConfiguration.RuntimeEnvironmentVariables);
  # --instance-configuration covers only CPU, memory, and the instance role.
  # Confirm the exact syntax in staging before relying on this command.
  aws apprunner update-service --service-arn <ARN> \
    --instance-configuration "EnvironmentVariables={
      AISP_FALLBACK_MODE=cached,
      AISP_FALLBACK_CACHE_TTL=3600
    }"
- Communicate to users (Norwegian):
  Emne: Saldo vises med forsinkelse
  Hei,
  Vår leverandør for bankdata opplever tekniske problemer. Saldoen du ser kan være opptil 1 time gammel.
  Du kan fortsatt gjøre betalinger som normalt.
  Vi forventer at tjenesten er tilbake innen [X minutter].
  Mvh, Drop
- Monitor Neonomics status:
  - Check every 10 minutes for resolution
  - When API is back: disable fallback mode
  aws apprunner update-service --service-arn <ARN> \
    --instance-configuration "EnvironmentVariables={
      AISP_FALLBACK_MODE=live
    }"
ETA: Depends on Neonomics (typically <2 hours)
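How the application interprets the fallback variables is not documented here. One plausible reading, sketched below, is that `AISP_FALLBACK_MODE=cached` permits serving a cached balance as long as it is no older than `AISP_FALLBACK_CACHE_TTL` seconds. The types and function names are hypothetical, and the 3600-second default is an assumption taken from the example value in the solution steps.

```typescript
type FallbackMode = "live" | "cached";

interface FallbackConfig {
  mode: FallbackMode;
  cacheTtlSeconds: number;
}

interface CachedBalance {
  balance: number;
  currency: string;
  fetchedAt: Date;
}

// Parse the two variables; anything other than "cached" is treated as live.
function readFallbackConfig(env: Record<string, string | undefined>): FallbackConfig {
  const mode: FallbackMode = env["AISP_FALLBACK_MODE"] === "cached" ? "cached" : "live";
  const ttl = Number.parseInt(env["AISP_FALLBACK_CACHE_TTL"] ?? "", 10);
  // Assumed default of 3600 s when the TTL is missing or not a positive integer.
  const cacheTtlSeconds = Number.isFinite(ttl) && ttl > 0 ? ttl : 3600;
  return { mode, cacheTtlSeconds };
}

// A cached balance is served only in cached mode and only while within the TTL.
function canServeCached(
  config: FallbackConfig,
  cached: CachedBalance | null,
  now: Date,
): boolean {
  if (config.mode !== "cached" || cached === null) return false;
  const ageSeconds = (now.getTime() - cached.fetchedAt.getTime()) / 1000;
  return ageSeconds <= config.cacheTtlSeconds;
}
```

In a Node.js service the first function would typically be called with `process.env`. Treating unrecognized mode values as "live" means a typo in the variable fails toward real data rather than stale data.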
Cause 4: Invalid or Revoked API Credentials
Probability: 5% (after credential rotation or account issue)
Symptoms:
- Logs show: "401 Unauthorized" or "invalid_api_key"
- All AISP requests fail immediately
- Other Drop services work fine (auth, database, etc.)
Solution:
- Verify Neonomics API credentials:
  bw get item "Neonomics API" --session $BW_SESSION
  # Check:
  # - API key is not expired
  # - API key has AISP permissions
  # - Correct environment (production vs sandbox)
- Update App Runner environment variables:
  # Caution: App Runner defines runtime environment variables under
  # --source-configuration (ImageConfiguration.RuntimeEnvironmentVariables);
  # --instance-configuration covers only CPU, memory, and the instance role.
  # Confirm the exact syntax in staging before relying on this command.
  aws apprunner update-service --service-arn <ARN> \
    --source-configuration "ImageRepository={...}" \
    --instance-configuration "EnvironmentVariables={
      NEONOMICS_API_KEY=<correct-key>,
      NEONOMICS_ENVIRONMENT=production
    }"
- Trigger deployment:
  aws apprunner start-deployment --service-arn <ARN> --region eu-west-1
  # Wait 3-5 minutes for deployment to complete
- Test after deployment:
  # Verify AISP working
  curl -X GET https://getdrop.no/api/accounts/balance \
    -H "Authorization: Bearer <test-user-token>" \
    -v
  # Expected: HTTP 200 with balance data
ETA: 10 minutes
Cause 5: Network or Firewall Issues
Probability: 5% (AWS security group misconfiguration)
Symptoms:
- Logs show: "Connection timeout" or "ECONNREFUSED"
- AISP API requests never reach Neonomics
- Other external APIs may also fail
Solution:
- Check outbound connectivity:
  # App Runner egress is unrestricted by default
  # If using VPC connector, check security group
  aws ec2 describe-security-groups \
    --group-ids <vpc-connector-sg> \
    --region eu-west-1 \
    | jq '.SecurityGroups[].IpPermissionsEgress'
- Test DNS resolution:
  # From your local machine or bastion host
  nslookup api.neonomics.io
  # Should resolve to Neonomics IP
  # If NXDOMAIN: DNS issue
- Check AWS service health:
  # Check App Runner service events
  aws apprunner list-operations \
    --service-arn <ARN> \
    --region eu-west-1 \
    | jq '.OperationSummaryList[] | select(.Type == "CREATE_SERVICE" or .Type == "UPDATE_SERVICE")'
  # Look for recent errors
- Whitelist Neonomics IPs (if using strict firewall):
  - Contact Neonomics for IP ranges
  - Add to security group outbound rules
  - Allow HTTPS (443) to Neonomics endpoints
ETA: 15 minutes (if quick fix), 1 hour (if requires networking changes)
Cause 6: Rate Limiting (High Traffic)
Probability: 10% (during peak hours or viral event)
Symptoms:
- Logs show: HTTP 429 "Too Many Requests"
- Intermittent failures (some users see balance, others don't)
- Rate limit headers in logs
Solution:
- Check rate limit headers:
  aws logs filter-log-events \
    --log-group-name /aws/apprunner/drop-production \
    --filter-pattern "X-RateLimit" \
    --start-time $(date -u -d '5 minutes ago' +%s)000 \
    | jq -r '.events[].message' \
    | grep -E "X-RateLimit-(Limit|Remaining|Reset)"
- Implement request throttling:
  // src/lib/aisp-client.ts
  import PQueue from 'p-queue';

  const queue = new PQueue({
    concurrency: 10,  // Max 10 concurrent requests
    interval: 1000,   // Length of the rate window in milliseconds
    intervalCap: 50   // Max 50 requests started per window
  });

  export async function fetchBalance(userId: string) {
    return queue.add(() => neonomicsClient.getBalance(userId));
  }
- Cache balance aggressively during rate limit:
  // src/lib/balance-cache.ts
  // `redis` is assumed to be the app's shared Redis client.
  const CACHE_TTL_NORMAL = 60;       // 60 seconds
  const CACHE_TTL_RATE_LIMIT = 300;  // 5 minutes; lifetime of the stale copy

  export async function getBalanceWithCache(userId: string) {
    const cached = await redis.get(`balance:${userId}`);
    if (cached) return JSON.parse(cached);
    try {
      const balance = await fetchBalance(userId);
      const payload = JSON.stringify(balance);
      await redis.setex(`balance:${userId}`, CACHE_TTL_NORMAL, payload);
      // Keep a longer-lived copy to serve if Neonomics starts returning 429
      await redis.setex(`balance:stale:${userId}`, CACHE_TTL_RATE_LIMIT, payload);
      return balance;
    } catch (error) {
      if ((error as { status?: number } | null)?.status === 429) {
        // The fresh key has already expired at this point, so extending its TTL
        // would be a no-op; fall back to the stale copy instead
        const stale = await redis.get(`balance:stale:${userId}`);
        if (stale) return JSON.parse(stale);
      }
      throw error;
    }
  }
- Contact Neonomics to increase rate limit:
  - Email support with traffic stats
  - Request higher API quota for production
  - Provide justification (user growth, peak times)
ETA: 5 minutes (automatic caching), 1-2 days (if quota increase needed)
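Throttling and caching reduce how often a 429 occurs; retrying with exponential backoff is a common complement for the requests that still hit the limit. The sketch below is an illustration rather than part of the Drop codebase: the function names, the 500 ms base delay, and the 30-second cap are all assumptions, and it relies on the thrown error exposing a numeric `status` as in the caching example.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Retry `call` only when it fails with HTTP 429, up to maxAttempts total tries.
// `sleep` is injected so callers (and tests) control how waiting happens.
async function retryOn429<T>(
  call: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 4,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (error) {
      const status = (error as { status?: number } | null)?.status;
      const isLastAttempt = attempt >= maxAttempts - 1;
      if (status !== 429 || isLastAttempt) throw error;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

In production the `sleep` argument would usually wrap `setTimeout` in a promise. Adding random jitter to the delay is a common refinement when many clients retry at once.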
Emergency Workarounds
Option 1: Cached Balance Mode
Use case: AISP provider down >30 minutes, users need to see approximate balance
Steps:
- Enable cached balance fallback:
  # Caution: App Runner defines runtime environment variables under
  # --source-configuration (ImageConfiguration.RuntimeEnvironmentVariables);
  # --instance-configuration covers only CPU, memory, and the instance role.
  # Confirm the exact syntax in staging before relying on this command.
  aws apprunner update-service --service-arn <ARN> \
    --instance-configuration "EnvironmentVariables={
      AISP_MODE=cached,
      AISP_CACHE_TTL=3600
    }"
- Show warning banner in app:
  ⚠️ Saldo vises med forsinkelse
  Vi viser din sist kjente saldo fra [timestamp]. Tjenesten er tilbake til normal snart.
- Allow payments to proceed:
  - Users can still initiate payments (PISP)
  - Balance check uses cached value
  - Risk: Insufficient funds errors if balance changed
- Revert when AISP is back:
  aws apprunner update-service --service-arn <ARN> \
    --instance-configuration "EnvironmentVariables={
      AISP_MODE=live
    }"
Risk: Cached balance may be stale (up to 1 hour old). Users may attempt payments with insufficient funds.
Option 2: Hide Balance, Allow Payments
Use case: AISP down, no reliable cache, but PISP still works
Steps:
Risk: User experience degraded. May attempt failed payments.
Post-Incident Actions
- Refresh all expired consents proactively:
  -- PostgreSQL 16: send renewal reminders 7 days before expiry
  SELECT user_id, email, consent_expires_at
  FROM bank_accounts
  JOIN users ON users.id = bank_accounts.user_id
  WHERE consent_expires_at < NOW() + INTERVAL '7 days'
    AND consent_renewal_reminder_sent = FALSE;
- Document incident:
  touch ~/ALAI/products/Drop/comms/incidents/$(date +%Y-%m-%d)-aisp-failure.md
- Review caching strategy:
  - Is cache TTL appropriate?
  - Should we cache balance longer during incidents?
  - Add metrics: cache hit rate, staleness
- Update monitoring:
  - Add synthetic AISP test (fetch balance every 5 min)
  - Alert on AISP failure rate >10%
  - Track consent expiry dates
- Improve user communication:
  - Auto-notify users when AISP is degraded
  - Show balance age: "Updated 5 minutes ago"
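Two of the follow-ups above, the failure-rate alert and the balance-age label, are small enough to sketch. Both functions are hypothetical helpers; the 10% threshold and the "Updated 5 minutes ago" wording come from the list, while the hour-level wording is an assumption.

```typescript
// True when the share of failed AISP calls strictly exceeds the threshold.
function shouldAlert(failed: number, total: number, threshold = 0.1): boolean {
  if (total === 0) return false; // no traffic, nothing to alert on
  return failed / total > threshold;
}

// Human-readable age of the displayed balance.
function formatBalanceAge(lastUpdated: Date, now: Date): string {
  const minutes = Math.floor((now.getTime() - lastUpdated.getTime()) / 60000);
  if (minutes < 1) return "Updated just now";
  if (minutes < 60) {
    return `Updated ${minutes} minute${minutes === 1 ? "" : "s"} ago`;
  }
  const hours = Math.floor(minutes / 60);
  return `Updated ${hours} hour${hours === 1 ? "" : "s"} ago`;
}
```

The user-facing strings would need Norwegian translations before shipping, in line with the other customer communication in this runbook.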
Escalation
| Time | Action |
|---|---|
| 0 min | John starts diagnosis |
| 10 min | If Neonomics outage confirmed, notify Alem |
| 20 min | If not resolved, enable cached balance mode |
| 1 hour | Public communication to users (Norwegian email/push) |
| 2 hours | Contact Neonomics support via phone if no response |
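If escalation reminders are ever automated, for example from an incident bot, the table could be encoded as data. This is a sketch only; the structure and function name are hypothetical, and the actions are copied from the table above.

```typescript
interface EscalationStep {
  afterMinutes: number;
  action: string;
}

// Rows of the escalation table, in ascending order of elapsed time.
const ESCALATION_STEPS: EscalationStep[] = [
  { afterMinutes: 0, action: "John starts diagnosis" },
  { afterMinutes: 10, action: "If Neonomics outage confirmed, notify Alem" },
  { afterMinutes: 20, action: "If not resolved, enable cached balance mode" },
  { afterMinutes: 60, action: "Public communication to users (Norwegian email/push)" },
  { afterMinutes: 120, action: "Contact Neonomics support via phone if no response" },
];

// All actions that should have been considered by the given elapsed time.
function dueSteps(elapsedMinutes: number): string[] {
  return ESCALATION_STEPS
    .filter((step) => step.afterMinutes <= elapsedMinutes)
    .map((step) => step.action);
}
```

Calling `dueSteps(25)` returns the first three actions, so an incident that is 25 minutes old should already have cached balance mode under consideration.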
Contacts
- Neonomics Support: [email protected]
- Neonomics Slack: #neonomics-support (if available)
- Internal: Alem (CEO, final decision on fallback modes)
Related Documentation
- docs/architecture/open-banking.md: AISP flow diagrams
- src/app/api/accounts/balance/route.ts: Balance fetch implementation
- docs/compliance/psd2-requirements.md: PSD2 consent rules (90-day expiry)
- Vaultwarden item "Neonomics API": Credentials
Last Updated: 2026-02-22
Next Review: Before Phase 2 (Banking Integration)