Runbook: AISP Balance Fetch Failure

Service: AISP (Account Information Service Provider)
Severity: MEDIUM (users can't see bank balance)
MTTR Target: <20 minutes
Owner: John (AI Director)


Symptoms

Users report they cannot see their bank account balance in Drop. Symptoms include:

  • Dashboard shows "Balance unavailable" or stale balance
  • Error message: "Could not fetch account information"
  • Infinite loading spinner on balance widget
  • Balance shows "0 kr" or "—" instead of actual amount

User impact: Cannot verify available funds before making payments (may lead to insufficient funds errors).


Diagnosis

1. Check Neonomics AISP Status

External status:

# Neonomics has no public status page — test via API
curl -X GET https://api.neonomics.io/health \
  -H "Authorization: Bearer <api-key>" \
  -v

# Expected: HTTP 200
# If 500/503: Neonomics outage

Check specific bank connectivity:

# List supported banks and their status
curl -X GET https://api.neonomics.io/banks \
  -H "Authorization: Bearer <api-key>" \
  | jq '.[] | select(.country == "NO") | {name, status}'

# Look for: "status": "degraded" or "offline"

2. Check Drop Logs

# CloudWatch Logs (production)
aws logs filter-log-events \
  --log-group-name /aws/apprunner/drop-production \
  --filter-pattern "aisp" \
  --start-time $(date -u -d '15 minutes ago' +%s)000 \
  --region eu-west-1

# Look for:
# - "AISP consent expired"
# - "AISP API timeout"
# - "AISP 401 Unauthorized"
# - "Bank API unavailable: DNB"
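
When many log lines come back, sorting them by the four patterns listed above helps pick the matching cause quickly. The following is a minimal sketch, assuming log messages contain those strings verbatim; `classifyAispLog` and `summarizeAispLogs` are hypothetical helpers, not part of the Drop codebase.

```typescript
// Hypothetical triage helper. The message formats come from the
// "Look for" list above; the real Drop log format may differ.
type AispLogCategory =
  | "consent_expired"
  | "timeout"
  | "unauthorized"
  | "bank_unavailable"
  | "other";

interface AispLogFinding {
  category: AispLogCategory;
  bank?: string; // only set for bank_unavailable
}

function classifyAispLog(message: string): AispLogFinding {
  // "Bank API unavailable: DNB" -> capture the bank name after the colon
  const bankMatch = message.match(/Bank API unavailable: (\S+)/);
  if (bankMatch) return { category: "bank_unavailable", bank: bankMatch[1] };
  if (message.includes("AISP consent expired")) return { category: "consent_expired" };
  if (message.includes("AISP API timeout")) return { category: "timeout" };
  if (message.includes("AISP 401 Unauthorized")) return { category: "unauthorized" };
  return { category: "other" };
}

// Count how often each category appears in a batch of log messages.
function summarizeAispLogs(messages: string[]): Record<AispLogCategory, number> {
  const counts: Record<AispLogCategory, number> = {
    consent_expired: 0,
    timeout: 0,
    unauthorized: 0,
    bank_unavailable: 0,
    other: 0,
  };
  for (const message of messages) {
    counts[classifyAispLog(message).category] += 1;
  }
  return counts;
}
```

A dominant `consent_expired` count points to Cause 1, `bank_unavailable` to Cause 2, and `unauthorized` to Cause 4.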

3. Check User Consent Status

# Verify Open Banking consent hasn't expired
# Consent is valid for 90 days from last authorization

# Check database for expired consents (PostgreSQL 16; adjust to the actual Drop DB)
psql "$DATABASE_URL" <<EOF
SELECT
  user_id,
  bank_name,
  consent_expires_at,
  EXTRACT(EPOCH FROM (consent_expires_at - NOW())) / 86400 AS days_remaining
FROM bank_accounts
WHERE consent_expires_at < NOW() + INTERVAL '7 days'
ORDER BY consent_expires_at ASC
LIMIT 10;
EOF

# If days_remaining < 0: consent expired
# If days_remaining < 7: warn user to renew soon
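
The same thresholds can be applied in application code. This is a minimal sketch, assuming the expiry timestamp is already loaded as a `Date`; the function and type names are illustrative only.

```typescript
type ConsentStatus = "expired" | "expiring_soon" | "valid";

const MILLIS_PER_DAY = 24 * 60 * 60 * 1000;

// Days until expiry as a fractional number; negative once the consent has expired.
function daysRemaining(consentExpiresAt: Date, now: Date): number {
  return (consentExpiresAt.getTime() - now.getTime()) / MILLIS_PER_DAY;
}

// Mirrors the rules above: < 0 days is expired, < 7 days warrants a renewal warning.
function consentStatus(consentExpiresAt: Date, now: Date): ConsentStatus {
  const days = daysRemaining(consentExpiresAt, now);
  if (days < 0) return "expired";
  if (days < 7) return "expiring_soon";
  return "valid";
}
```

Passing `now` explicitly keeps the check deterministic, which makes it easy to test around the 7-day boundary.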

4. Test AISP Flow

Manual test (staging):

# 1. Login
TOKEN=$(curl -X POST https://drop-staging.fly.dev/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email":"[email protected]","password":"test1234"}' \
  | jq -r '.data.token')

# 2. Fetch balance
curl -X GET https://drop-staging.fly.dev/api/accounts/balance \
  -H "Authorization: Bearer $TOKEN" \
  -v

# Expected: HTTP 200, { "balance": 15000.50, "currency": "NOK" }
# If 401: Consent expired
# If 500: AISP integration broken
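
For scripted checks, the status codes above can be mapped to a next step. The sketch below is illustrative only: the 200, 401 and 500 branches follow the comments above, the 429 branch follows Cause 6, and `diagnoseBalanceStatus` is a hypothetical name.

```typescript
// Maps the HTTP status of the balance endpoint to the runbook's next step.
function diagnoseBalanceStatus(status: number): string {
  if (status === 200) return "OK: AISP flow works end to end";
  if (status === 401) return "Consent expired: see Cause 1";
  if (status === 429) return "Rate limited: see Cause 6";
  if (status === 500) return "AISP integration broken: check Neonomics and bank status";
  return `Unexpected status ${status}: inspect Drop logs`;
}
```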

5. Check Rate Limiting

# Check if Neonomics API rate limit exceeded
aws logs filter-log-events \
  --log-group-name /aws/apprunner/drop-production \
  --filter-pattern "rate_limit" \
  --start-time $(date -u -d '10 minutes ago' +%s)000 \
  | jq '.events[].message' \
  | grep -E "429|X-RateLimit"

# If many 429 errors: rate limiting issue
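
To read the rate limit headers out of a log message programmatically, something like the following could work. It is a sketch under the assumption that headers are logged as `X-RateLimit-Remaining: 0` (or with `=`); the actual log format and the unit of the reset value depend on how Drop and the provider emit them.

```typescript
interface RateLimitInfo {
  limit?: number;
  remaining?: number;
  reset?: number; // raw header value; its unit depends on the provider
}

// Extracts X-RateLimit-* values from a single log message (assumed format).
function parseRateLimitHeaders(message: string): RateLimitInfo {
  const info: RateLimitInfo = {};
  const pattern = /X-RateLimit-(Limit|Remaining|Reset)["']?\s*[:=]\s*["']?(\d+)/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(message)) !== null) {
    const key = match[1].toLowerCase() as keyof RateLimitInfo;
    info[key] = Number(match[2]);
  }
  return info;
}

// A message indicates rate limiting if it mentions 429 or shows no remaining quota.
function isRateLimited(message: string): boolean {
  const { remaining } = parseRateLimitHeaders(message);
  return /\b429\b/.test(message) || remaining === 0;
}
```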

Common Causes & Solutions

Cause 1: Expired Open Banking Consent

Probability: 40% (PSD2 consent expires after 90 days)

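Symptoms:

  • Logs show: "AISP consent expired"
  • Balance request returns HTTP 401
  • consent_expires_at is in the past for the affected users
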
Solution:

  1. Identify affected users:

    -- PostgreSQL 16
    SELECT user_id, email, bank_name, consent_expires_at
    FROM bank_accounts
    JOIN users ON users.id = bank_accounts.user_id
    WHERE consent_expires_at < NOW();
    
  2. Notify users to re-authorize:

    Push notification (Norwegian):

    Banktilkobling utløpt
    Godkjenningen for å hente saldo fra [Bank] har utløpt.
    Trykk her for å fornye tilkoblingen.
    

    Email (Norwegian):

    Emne: Godkjenn tilgang til bankkonto på nytt
    
    Hei,
    
    Din godkjenning for å vise saldo fra [Bank] har utløpt etter 90 dager.
    Dette er et PSD2-sikkerhetskrav.
    
    Logg inn i Drop og koble til bankkontoen på nytt for å fortsette å se saldoen din.
    
    Mvh,
    Drop
    
  3. Guide user through re-consent:

    • User taps notification → redirect to "Reconnect Bank Account" screen
    • Initiate new AISP consent flow (BankID + bank authorization)
    • Update consent_expires_at = NOW() + INTERVAL '90 days'
  4. Automatic consent renewal reminder:

    # Cron job to warn users 7 days before expiry
    # Send reminder: "Your bank connection expires in 7 days, renew now"
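
The selection step of such a reminder job might look like the sketch below. It assumes account rows are already loaded into memory and that a `consentRenewalReminderSent` flag exists on each row; the interface and function names are hypothetical, and scheduling and sending are left out.

```typescript
interface BankAccountConsent {
  userId: string;
  bankName: string;
  consentExpiresAt: Date;
  consentRenewalReminderSent: boolean;
}

const REMINDER_WINDOW_DAYS = 7;

// Returns accounts whose consent is still valid but expires within the
// reminder window, and that have not been reminded yet.
function accountsNeedingReminder(
  accounts: BankAccountConsent[],
  now: Date,
): BankAccountConsent[] {
  const windowEnd = now.getTime() + REMINDER_WINDOW_DAYS * 24 * 60 * 60 * 1000;
  return accounts.filter((account) => {
    const expiresAt = account.consentExpiresAt.getTime();
    return (
      !account.consentRenewalReminderSent &&
      expiresAt > now.getTime() &&
      expiresAt <= windowEnd
    );
  });
}
```

Already expired consents are excluded here on purpose: those users need the "connection expired" message rather than a reminder.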
    

ETA: Immediate (user action required)


Cause 2: Bank API Outage or Maintenance

Probability: 15% (specific bank temporarily unavailable)

Symptoms:

  • All users of specific bank (e.g., DNB, Nordea) cannot fetch balance
  • Other banks work fine
  • Logs show: "Bank API timeout" or "502 Bad Gateway"

Solution:

  1. Identify affected bank:

    # Check which bank is failing
    aws logs filter-log-events \
      --log-group-name /aws/apprunner/drop-production \
      --filter-pattern "Bank API" \
      --start-time $(date -u -d '30 minutes ago' +%s)000 \
      | jq '.events[].message' \
      | grep -o '"bank":"[^"]*"' \
      | sort | uniq -c | sort -rn
    
    # Example output: "bank":"DNB" appears 50 times
    
  2. Check bank status:

    • Visit bank's website: check for maintenance announcements
    • Norwegian banks often schedule maintenance 02:00-06:00 CET
    • DNB status: https://www.dnb.no/drift
    • Nordea status: https://www.nordea.no/info/driftsmeldinger
  3. Notify affected users (Norwegian):

    Emne: Saldo midlertidig utilgjengelig for [Bank]
    
    Hei,
    
    Vi opplever for øyeblikket problemer med å hente saldo fra [Bank].
    Dette skyldes tekniske problemer hos banken.
    
    Du kan fortsatt gjøre betalinger, men saldoen vises ikke akkurat nå.
    Vi jobber med å gjenopprette tjenesten.
    
    Estimert løsning: [X minutter/timer]
    
    Mvh,
    Drop
    
  4. Implement graceful degradation:

    // src/app/api/accounts/balance/route.ts
    async function fetchBalance(userId: string) {
      try {
        return await neonomicsClient.getBalance(userId);
      } catch (error) {
        // `error` is `unknown` in strict TypeScript, so narrow before reading `code`
        if ((error as { code?: string }).code === 'BANK_API_TIMEOUT') {
          // Return cached balance with warning
          const cached = await getCachedBalance(userId);
          return {
            // `??` keeps a legitimate balance of 0; `||` would turn it into null
            balance: cached?.balance ?? null,
            currency: 'NOK',
            lastUpdated: cached?.timestamp ?? null,
            warning: 'Balance may be outdated due to bank API issues'
          };
        }
        throw error;
      }
    }
    

ETA: Depends on bank (typically <2 hours for maintenance, <1 hour for incidents)


Cause 3: Neonomics API Outage

Probability: 10% (Neonomics service disruption)

Symptoms:

  • ALL users cannot fetch balance regardless of bank
  • Logs show: "Neonomics API unreachable" or HTTP 503
  • Test API call to Neonomics fails

Solution:

  1. Verify Neonomics outage:

    # Test Neonomics health endpoint
    curl -X GET https://api.neonomics.io/health \
      -H "Authorization: Bearer <api-key>" \
      -v
    
    # If timeout or 503: confirmed outage
    
  2. Contact Neonomics support:

    • Email: [email protected]
    • Slack: #neonomics-support (if available)
    • Check Neonomics Slack for incident updates
  3. Enable fallback mode:

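    # Show cached balances to all users.
    # App Runner sets runtime env vars under --source-configuration, not
    # --instance-configuration. <image-uri> is the currently deployed image;
    # include the service's other variables as well so they are not dropped.
    aws apprunner update-service --service-arn <ARN> --region eu-west-1 \
      --source-configuration '{"ImageRepository": {
        "ImageIdentifier": "<image-uri>", "ImageRepositoryType": "ECR",
        "ImageConfiguration": {"RuntimeEnvironmentVariables": {
          "AISP_FALLBACK_MODE": "cached", "AISP_FALLBACK_CACHE_TTL": "3600"}}}}'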
    
  4. Communicate to users (Norwegian):

    Emne: Saldo vises med forsinkelse
    
    Hei,
    
    Vår leverandør for bankdata opplever tekniske problemer.
    Saldoen du ser kan være opptil 1 time gammel.
    
    Du kan fortsatt gjøre betalinger som normalt.
    Vi forventer at tjenesten er tilbake innen [X minutter].
    
    Mvh,
    Drop
    
  5. Monitor Neonomics status:

    • Check every 10 minutes for resolution
    • When API is back: disable fallback mode
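    # Runtime env vars are set under --source-configuration;
    # <image-uri> is the currently deployed image.
    aws apprunner update-service --service-arn <ARN> --region eu-west-1 \
      --source-configuration '{"ImageRepository": {
        "ImageIdentifier": "<image-uri>", "ImageRepositoryType": "ECR",
        "ImageConfiguration": {"RuntimeEnvironmentVariables": {
          "AISP_FALLBACK_MODE": "live"}}}}'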
    

ETA: Depends on Neonomics (typically <2 hours)


Cause 4: Invalid or Revoked API Credentials

Probability: 5% (after credential rotation or account issue)

Symptoms:

  • Logs show: "401 Unauthorized" or "invalid_api_key"
  • All AISP requests fail immediately
  • Other Drop services work fine (auth, database, etc.)

Solution:

  1. Verify Neonomics API credentials:

    bw get item "Neonomics API" --session $BW_SESSION
    
    # Check:
    # - API key is not expired
    # - API key has AISP permissions
    # - Correct environment (production vs sandbox)
    
  2. Update App Runner environment variables:

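    # Runtime env vars are set under --source-configuration
    # (ImageConfiguration.RuntimeEnvironmentVariables), not
    # --instance-configuration. <image-uri> is the currently deployed image.
    aws apprunner update-service --service-arn <ARN> --region eu-west-1 \
      --source-configuration '{"ImageRepository": {
        "ImageIdentifier": "<image-uri>", "ImageRepositoryType": "ECR",
        "ImageConfiguration": {"RuntimeEnvironmentVariables": {
          "NEONOMICS_API_KEY": "<correct-key>",
          "NEONOMICS_ENVIRONMENT": "production"}}}}'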
    
  3. Trigger deployment:

    aws apprunner start-deployment --service-arn <ARN> --region eu-west-1
    
    # Wait 3-5 minutes for deployment to complete
    
  4. Test after deployment:

    # Verify AISP working
    curl -X GET https://getdrop.no/api/accounts/balance \
      -H "Authorization: Bearer <test-user-token>" \
      -v
    
    # Expected: HTTP 200 with balance data
    

ETA: 10 minutes


Cause 5: Network or Firewall Issues

Probability: 5% (AWS security group misconfiguration)

Symptoms:

  • Logs show: "Connection timeout" or "ECONNREFUSED"
  • AISP API requests never reach Neonomics
  • Other external APIs may also fail

Solution:

  1. Check outbound connectivity:

    # App Runner egress is unrestricted by default
    # If using VPC connector, check security group
    aws ec2 describe-security-groups \
      --group-ids <vpc-connector-sg> \
      --region eu-west-1 \
      | jq '.SecurityGroups[].IpPermissionsEgress'
    
  2. Test DNS resolution:

    # From your local machine or bastion host
    nslookup api.neonomics.io
    
    # Should resolve to Neonomics IP
    # If NXDOMAIN: DNS issue
    
  3. Check AWS service health:

    # Check App Runner service events
    aws apprunner list-operations \
      --service-arn <ARN> \
      --region eu-west-1 \
      | jq '.OperationSummaryList[] | select(.Type == "CREATE_SERVICE" or .Type == "UPDATE_SERVICE")'
    
    # Look for recent errors
    
  4. Whitelist Neonomics IPs (if using strict firewall):

    • Contact Neonomics for IP ranges
    • Add to security group outbound rules
    • Allow HTTPS (443) to Neonomics endpoints

ETA: 15 minutes (if quick fix), 1 hour (if requires networking changes)


Cause 6: Rate Limiting (High Traffic)

Probability: 10% (during peak hours or viral event)

Symptoms:

  • Logs show: HTTP 429 "Too Many Requests"
  • Intermittent failures (some users see balance, others don't)
  • Rate limit headers in logs

Solution:

  1. Check rate limit headers:

    aws logs filter-log-events \
      --log-group-name /aws/apprunner/drop-production \
      --filter-pattern "X-RateLimit" \
      --start-time $(date -u -d '5 minutes ago' +%s)000 \
      | jq -r '.events[].message' \
      | grep -E "X-RateLimit-(Limit|Remaining|Reset)"
    
  2. Implement request throttling:

    // src/lib/aisp-client.ts
    import PQueue from 'p-queue';
    
    const queue = new PQueue({
      concurrency: 10,      // Max 10 concurrent requests
      interval: 1000,        // Per second
      intervalCap: 50        // Max 50 requests per second
    });
    
    export async function fetchBalance(userId: string) {
      return queue.add(() => neonomicsClient.getBalance(userId));
    }
    
  3. Cache balance aggressively during rate limit:

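    // src/lib/balance-cache.ts
    const CACHE_TTL_NORMAL = 60;      // 60 seconds
    const CACHE_TTL_RATE_LIMIT = 300; // 5 minutes during rate limit
    
    export async function getBalanceWithCache(userId: string) {
      const cached = await redis.get(`balance:${userId}`);
      if (cached) return JSON.parse(cached);
    
      try {
        const balance = await fetchBalance(userId);
        const payload = JSON.stringify(balance);
        await redis.setex(`balance:${userId}`, CACHE_TTL_NORMAL, payload);
        // Keep a longer-lived copy to fall back on while rate limited
        // (the `balance:stale:` key name is illustrative)
        await redis.setex(`balance:stale:${userId}`, CACHE_TTL_RATE_LIMIT, payload);
        return balance;
      } catch (error) {
        if ((error as { status?: number }).status === 429) {
          // The fresh key has already expired at this point, so extending
          // its TTL would do nothing; serve the stale copy instead
          const stale = await redis.get(`balance:stale:${userId}`);
          if (stale) return JSON.parse(stale);
        }
        throw error;
      }
    }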
    
  4. Contact Neonomics to increase rate limit:

    • Email support with traffic stats
    • Request higher API quota for production
    • Provide justification (user growth, peak times)

ETA: 5 minutes (automatic caching), 1-2 days (if quota increase needed)


Emergency Workarounds

Option 1: Cached Balance Mode

Use case: AISP provider down >30 minutes, users need to see approximate balance

Steps:

  1. Enable cached balance fallback:

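    # Runtime env vars are set under --source-configuration, not
    # --instance-configuration. <image-uri> is the currently deployed image;
    # include the service's other variables as well so they are not dropped.
    aws apprunner update-service --service-arn <ARN> --region eu-west-1 \
      --source-configuration '{"ImageRepository": {
        "ImageIdentifier": "<image-uri>", "ImageRepositoryType": "ECR",
        "ImageConfiguration": {"RuntimeEnvironmentVariables": {
          "AISP_MODE": "cached", "AISP_CACHE_TTL": "3600"}}}}'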
    
  2. Show warning banner in app:

    ⚠️ Saldo vises med forsinkelse
    Vi viser din sist kjente saldo fra [timestamp].
    Tjenesten er tilbake til normal snart.
    
  3. Allow payments to proceed:

    • Users can still initiate payments (PISP)
    • Balance check uses cached value
    • Risk: Insufficient funds errors if balance changed
  4. Revert when AISP is back:

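    # Runtime env vars are set under --source-configuration;
    # <image-uri> is the currently deployed image.
    aws apprunner update-service --service-arn <ARN> --region eu-west-1 \
      --source-configuration '{"ImageRepository": {
        "ImageIdentifier": "<image-uri>", "ImageRepositoryType": "ECR",
        "ImageConfiguration": {"RuntimeEnvironmentVariables": {
          "AISP_MODE": "live"}}}}'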
    

Risk: Cached balance may be stale (up to 1 hour old). Users may attempt payments with insufficient funds.
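
To make the staleness risk visible, the app can compute the age of the cached balance, refuse values older than the fallback TTL, and label the rest. The following is a minimal sketch; `CachedBalance` and the helper names are assumptions, and the 3600-second limit mirrors the cache TTL set in step 1.

```typescript
interface CachedBalance {
  balance: number;
  currency: string;
  timestamp: Date; // when the balance was last fetched from the bank
}

const FALLBACK_CACHE_TTL_SECONDS = 3600; // mirrors the cache TTL configured above

// Age of the cached value in whole seconds (never negative).
function balanceAgeSeconds(cached: CachedBalance, now: Date): number {
  return Math.max(0, Math.floor((now.getTime() - cached.timestamp.getTime()) / 1000));
}

// True while the cached value is still within the fallback TTL.
function isCacheUsable(cached: CachedBalance, now: Date): boolean {
  return balanceAgeSeconds(cached, now) <= FALLBACK_CACHE_TTL_SECONDS;
}

// Human-readable age label, e.g. "Updated 5 minutes ago".
function formatBalanceAge(ageSeconds: number): string {
  if (ageSeconds < 60) return "Updated just now";
  const minutes = Math.floor(ageSeconds / 60);
  if (minutes < 60) return `Updated ${minutes} minute${minutes === 1 ? "" : "s"} ago`;
  const hours = Math.floor(minutes / 60);
  return `Updated ${hours} hour${hours === 1 ? "" : "s"} ago`;
}
```

If `isCacheUsable` returns false, Option 2 below (hide the balance, allow payments) is the safer fallback.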


Option 2: Hide Balance, Allow Payments

Use case: AISP down, no reliable cache, but PISP still works

Steps:

  1. Show "Balance unavailable" message:

    Saldo midlertidig utilgjengelig
    Du kan fortsatt gjøre betalinger som normalt.
    Banken vil avvise betalingen hvis du ikke har nok midler.
    
  2. Allow payments without balance check:

    • User enters payment amount
    • Drop initiates payment via PISP
    • Bank performs real-time balance check
    • If insufficient funds: bank rejects, user gets clear error
  3. Communicate ETA to users:

    Vi jobber med å gjenopprette saldovisning.
    Estimert tid: [X minutter]
    

Risk: User experience degraded. May attempt failed payments.


Post-Incident Actions

  1. Refresh all expired consents proactively:

    -- PostgreSQL 16: send renewal reminders 7 days before expiry
    SELECT user_id, email, consent_expires_at
    FROM bank_accounts
    JOIN users ON users.id = bank_accounts.user_id
    WHERE consent_expires_at < NOW() + INTERVAL '7 days'
    AND consent_renewal_reminder_sent = FALSE;
    
  2. Document incident:

    touch ~/ALAI/products/Drop/comms/incidents/$(date +%Y-%m-%d)-aisp-failure.md
    
  3. Review caching strategy:

    • Is cache TTL appropriate?
    • Should we cache balance longer during incidents?
    • Add metrics: cache hit rate, staleness
  4. Update monitoring:

    • Add synthetic AISP test (fetch balance every 5 min)
    • Alert on AISP failure rate >10%
    • Track consent expiry dates
  5. Improve user communication:

    • Auto-notify users when AISP is degraded
    • Show balance age: "Updated 5 minutes ago"
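
The failure-rate alert from the monitoring item can be expressed as a small check over recent synthetic probe results. This is a sketch only: the 10% threshold comes from the list above, while the 30-minute window and all names are assumptions.

```typescript
interface ProbeResult {
  timestamp: Date;
  ok: boolean;
}

const FAILURE_RATE_THRESHOLD = 0.1; // runbook: alert when the failure rate exceeds 10%

// Share of failed probes; 0 when there are no results.
function failureRate(results: ProbeResult[]): number {
  if (results.length === 0) return 0;
  const failures = results.filter((result) => !result.ok).length;
  return failures / results.length;
}

// Evaluates only probes inside the window (default 30 minutes, an assumption).
function shouldAlert(results: ProbeResult[], now: Date, windowMinutes: number = 30): boolean {
  const cutoff = now.getTime() - windowMinutes * 60 * 1000;
  const recent = results.filter((result) => result.timestamp.getTime() >= cutoff);
  return failureRate(recent) > FAILURE_RATE_THRESHOLD;
}
```

With a probe every 5 minutes, a 30-minute window holds only about six results, so a single failure already exceeds 10%; a longer window or a minimum sample size may suit production better.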

Escalation

Time      Action
0 min     John starts diagnosis
10 min    If Neonomics outage confirmed, notify Alem
20 min    If not resolved, enable cached balance mode
1 hour    Public communication to users (Norwegian email/push)
2 hours   Contact Neonomics support via phone if no response
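
For an incident bot or dashboard, the timeline can be encoded as data so that overdue steps are listed automatically. A minimal sketch with hypothetical names; the steps themselves are copied from the table above.

```typescript
interface EscalationStep {
  afterMinutes: number;
  action: string;
}

const ESCALATION_STEPS: EscalationStep[] = [
  { afterMinutes: 0, action: "John starts diagnosis" },
  { afterMinutes: 10, action: "If Neonomics outage confirmed, notify Alem" },
  { afterMinutes: 20, action: "If not resolved, enable cached balance mode" },
  { afterMinutes: 60, action: "Public communication to users (Norwegian email/push)" },
  { afterMinutes: 120, action: "Contact Neonomics support via phone if no response" },
];

// Returns every action whose deadline has been reached at the given elapsed time.
function dueActions(elapsedMinutes: number): string[] {
  return ESCALATION_STEPS
    .filter((step) => step.afterMinutes <= elapsedMinutes)
    .map((step) => step.action);
}
```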

Contacts

  • Neonomics Support: [email protected]
  • Neonomics Slack: #neonomics-support (if available)
  • Internal: Alem (CEO, final decision on fallback modes)

Related Documentation

  • docs/architecture/open-banking.md — AISP flow diagrams
  • src/app/api/accounts/balance/route.ts — Balance fetch implementation
  • docs/compliance/psd2-requirements.md — PSD2 consent rules (90-day expiry)
  • Vaultwarden item: "Neonomics API" — Credentials

Last Updated: 2026-02-22
Next Review: Before Phase 2 (Banking Integration)