# Runbook: AISP Balance Fetch Failure

**Service:** AISP (Account Information Service Provider)
**Severity:** MEDIUM (users can't see bank balance)
**MTTR Target:** <20 minutes
**Owner:** John (AI Director)

---

## Symptoms

Users report they cannot see their bank account balance in Drop. Symptoms include:

- Dashboard shows "Balance unavailable" or stale balance
- Error message: "Could not fetch account information"
- Infinite loading spinner on balance widget
- Balance shows "0 kr" or "—" instead of actual amount

**User impact:** Cannot verify available funds before making payments (may lead to insufficient funds errors).

---

## Diagnosis

### 1. Check Neonomics AISP Status

**External status:**
```bash
# Neonomics has no public status page — test via API
curl -X GET https://api.neonomics.io/health \
  -H "Authorization: Bearer <api-key>" \
  -v

# Expected: HTTP 200
# If 500/503: Neonomics outage
```

**Check specific bank connectivity:**
```bash
# List supported banks and their status
curl -X GET https://api.neonomics.io/banks \
  -H "Authorization: Bearer <api-key>" \
  | jq '.[] | select(.country == "NO") | {name, status}'

# Look for: "status": "degraded" or "offline"
```

### 2. Check Drop Logs

```bash
# CloudWatch Logs (production)
# Note: `date -d` is GNU date syntax; on macOS/BSD use `date -u -v-15M +%s` instead
aws logs filter-log-events \
  --log-group-name /aws/apprunner/drop-production \
  --filter-pattern "aisp" \
  --start-time $(date -u -d '15 minutes ago' +%s)000 \
  --region eu-west-1

# Look for:
# - "AISP consent expired"
# - "AISP API timeout"
# - "AISP 401 Unauthorized"
# - "Bank API unavailable: DNB"
```
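
When the filter returns many lines, bucketing them by the patterns listed above shows which failure mode dominates. The following is a minimal sketch, assuming log messages contain those exact substrings; `classifyAispLog`, `summarizeAispLogs`, and the category names are hypothetical and not part of Drop's codebase.

```typescript
// Hypothetical helper: buckets AISP log messages by the patterns listed above.
// Category names are illustrative only.
type AispLogCategory =
  | 'consent_expired'
  | 'timeout'
  | 'unauthorized'
  | 'bank_unavailable'
  | 'other';

export function classifyAispLog(message: string): AispLogCategory {
  if (message.includes('AISP consent expired')) return 'consent_expired';
  if (message.includes('AISP API timeout')) return 'timeout';
  if (message.includes('AISP 401 Unauthorized')) return 'unauthorized';
  if (message.includes('Bank API unavailable')) return 'bank_unavailable';
  return 'other';
}

// Counts messages per category so the dominant failure mode stands out.
export function summarizeAispLogs(messages: string[]): Record<AispLogCategory, number> {
  const counts: Record<AispLogCategory, number> = {
    consent_expired: 0,
    timeout: 0,
    unauthorized: 0,
    bank_unavailable: 0,
    other: 0,
  };
  for (const message of messages) {
    counts[classifyAispLog(message)] += 1;
  }
  return counts;
}
```

A spike in `consent_expired` points at Cause 1, `bank_unavailable` at Cause 2, and `unauthorized` at Cause 4.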

### 3. Check User Consent Status

```bash
# Verify Open Banking consent hasn't expired
# Consent is valid for 90 days from last authorization

# Check database for expired consents (PostgreSQL 16)
psql "$DATABASE_URL" <<EOF
SELECT
  user_id,
  bank_name,
  consent_expires_at,
  EXTRACT(EPOCH FROM (consent_expires_at - NOW())) / 86400 AS days_remaining
FROM bank_accounts
WHERE consent_expires_at < NOW() + INTERVAL '7 days'
ORDER BY consent_expires_at ASC
LIMIT 10;
EOF

# If days_remaining < 0: consent expired
# If days_remaining < 7: warn user to renew soon
```
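
The same thresholds can be applied in application code. This is a sketch only: `daysRemaining` and `consentStatus` are hypothetical names, and the 7-day warning window is taken from the query above.

```typescript
// Hypothetical helpers mirroring the thresholds used in the SQL query above.
type ConsentStatus = 'expired' | 'expiring_soon' | 'valid';

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Fractional days until the consent expires (negative once it has expired).
export function daysRemaining(consentExpiresAt: Date, now: Date): number {
  return (consentExpiresAt.getTime() - now.getTime()) / MS_PER_DAY;
}

export function consentStatus(consentExpiresAt: Date, now: Date): ConsentStatus {
  const days = daysRemaining(consentExpiresAt, now);
  if (days < 0) return 'expired';        // user must re-authorize
  if (days < 7) return 'expiring_soon';  // warn user to renew soon
  return 'valid';
}
```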

### 4. Test AISP Flow

**Manual test (staging):**
```bash
# 1. Login
TOKEN=$(curl -X POST https://drop-staging.fly.dev/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email":"test@example.com","password":"test1234"}' \
  | jq -r '.data.token')

# 2. Fetch balance
curl -X GET https://drop-staging.fly.dev/api/accounts/balance \
  -H "Authorization: Bearer $TOKEN" \
  -v

# Expected: HTTP 200, { "balance": 15000.50, "currency": "NOK" }
# If 401: Consent expired
# If 500: AISP integration broken
```

### 5. Check Rate Limiting

```bash
# Check if Neonomics API rate limit exceeded
aws logs filter-log-events \
  --log-group-name /aws/apprunner/drop-production \
  --filter-pattern "rate_limit" \
  --start-time $(date -u -d '10 minutes ago' +%s)000 \
  --region eu-west-1 \
  | jq -r '.events[].message' \
  | grep -E "429|X-RateLimit"

# If many 429 errors: rate limiting issue
```

---

## Common Causes & Solutions

### Cause 1: Expired Open Banking Consent

**Probability:** 40% (PSD2 consent expires after 90 days)

**Symptoms:**
- Error code: `CONSENT_EXPIRED` or `CONSENT_INVALID`
- Logs show: "AISP consent no longer valid"
- Specific users affected (not all users)

**Solution:**

1. **Identify affected users:**
   ```sql
   -- PostgreSQL 16
   SELECT user_id, email, bank_name, consent_expires_at
   FROM bank_accounts
   JOIN users ON users.id = bank_accounts.user_id
   WHERE consent_expires_at < NOW();
   ```

2. **Notify users to re-authorize:**

   **Push notification (Norwegian):**
   ```
   Banktilkobling utløpt
   Godkjenningen for å hente saldo fra [Bank] har utløpt.
   Trykk her for å fornye tilkoblingen.
   ```

   **Email (Norwegian):**
   ```
   Emne: Godkjenn tilgang til bankkonto på nytt

   Hei,

   Din godkjenning for å vise saldo fra [Bank] har utløpt etter 90 dager.
   Dette er et PSD2-sikkerhetskrav.

   Logg inn i Drop og koble til bankkontoen på nytt for å fortsette å se saldoen din.

   Mvh,
   Drop
   ```

3. **Guide user through re-consent:**
   - User taps notification → redirect to "Reconnect Bank Account" screen
   - Initiate new AISP consent flow (BankID + bank authorization)
   - Update `consent_expires_at` = NOW() + INTERVAL '90 days'

4. **Automatic consent renewal reminder:**
   ```bash
   # Cron job to warn users 7 days before expiry
   # Send reminder: "Your bank connection expires in 7 days, renew now"
   ```
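
The reminder job and the expiry update from steps 3 and 4 could look roughly like this. It is a sketch under assumptions: the `BankConsent` shape and both function names are hypothetical, with fields modelled on the `bank_accounts` columns used in this runbook; the 90-day and 7-day values come from the text above.

```typescript
// Hypothetical shape; fields are modelled on the bank_accounts columns in this runbook.
interface BankConsent {
  userId: string;
  bankName: string;
  consentExpiresAt: Date;
  reminderSent: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const CONSENT_VALIDITY_DAYS = 90; // consent lifetime stated in this runbook
const REMINDER_WINDOW_DAYS = 7;   // warn users 7 days before expiry

// Consents that are still valid, expire within the window, and have not been reminded yet.
export function consentsNeedingReminder(consents: BankConsent[], now: Date): BankConsent[] {
  const cutoff = now.getTime() + REMINDER_WINDOW_DAYS * DAY_MS;
  return consents.filter((consent) => {
    const expires = consent.consentExpiresAt.getTime();
    return !consent.reminderSent && expires > now.getTime() && expires <= cutoff;
  });
}

// New expiry after a successful re-consent: now + 90 days.
export function renewedExpiry(now: Date): Date {
  return new Date(now.getTime() + CONSENT_VALIDITY_DAYS * DAY_MS);
}
```

Already-expired consents are excluded here on purpose; they need the re-authorization notification from step 2 rather than a reminder.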

**ETA:** Immediate (user action required)

---

### Cause 2: Bank API Outage or Maintenance

**Probability:** 15% (specific bank temporarily unavailable)

**Symptoms:**
- All users of specific bank (e.g., DNB, Nordea) cannot fetch balance
- Other banks work fine
- Logs show: "Bank API timeout" or "502 Bad Gateway"

**Solution:**

1. **Identify affected bank:**
   ```bash
   # Check which bank is failing
   aws logs filter-log-events \
     --log-group-name /aws/apprunner/drop-production \
     --filter-pattern "Bank API" \
     --start-time $(date -u -d '30 minutes ago' +%s)000 \
     --region eu-west-1 \
     | jq -r '.events[].message' \
     | grep -o '"bank":"[^"]*"' \
     | sort | uniq -c | sort -rn

   # Example output: "bank":"DNB" appears 50 times
   ```

2. **Check bank status:**
   - Visit bank's website: check for maintenance announcements
   - Norwegian banks often schedule maintenance 02:00-06:00 CET
   - DNB status: https://www.dnb.no/drift
   - Nordea status: https://www.nordea.no/info/driftsmeldinger

3. **Notify affected users (Norwegian):**
   ```
   Emne: Saldo midlertidig utilgjengelig for [Bank]

   Hei,

   Vi opplever for øyeblikket problemer med å hente saldo fra [Bank].
   Dette skyldes tekniske problemer hos banken.

   Du kan fortsatt gjøre betalinger, men saldoen vises ikke akkurat nå.
   Vi jobber med å gjenopprette tjenesten.

   Estimert løsning: [X minutter/timer]

   Mvh,
   Drop
   ```

4. **Implement graceful degradation:**
   ```typescript
   // src/app/api/accounts/balance/route.ts
   async function fetchBalance(userId: string) {
     try {
       return await neonomicsClient.getBalance(userId);
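     } catch (error) {
       // `error` is `unknown` in strict TypeScript, so narrow before reading `.code`
       if ((error as { code?: string }).code === 'BANK_API_TIMEOUT') {
         // Return cached balance with warning
         const cached = await getCachedBalance(userId);
         return {
           // `??` keeps a genuine 0 balance; `||` would turn it into null
           balance: cached?.balance ?? null,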
           currency: 'NOK',
           lastUpdated: cached?.timestamp,
           warning: 'Balance may be outdated due to bank API issues'
         };
       }
       throw error;
     }
   }
   ```
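
The per-bank tally from step 1 can also be computed in code, for example inside a dashboard or alerting job. This sketch assumes each log message embeds a fragment such as `"bank":"DNB"`, as the `grep` pattern in step 1 does; `countFailuresByBank` is a hypothetical name.

```typescript
// Hypothetical helper: tallies failures per bank from raw log messages.
// Assumes each message embeds a JSON fragment like "bank":"DNB", as in step 1.
export function countFailuresByBank(messages: string[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  const pattern = /"bank":"([^"]*)"/;
  for (const message of messages) {
    const match = pattern.exec(message);
    if (!match) continue; // message does not name a bank
    counts.set(match[1], (counts.get(match[1]) ?? 0) + 1);
  }
  // Most-affected bank first, like `sort | uniq -c | sort -rn`.
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}
```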

**ETA:** Depends on bank (typically <2 hours for maintenance, <1 hour for incidents)

---

### Cause 3: Neonomics API Outage

**Probability:** 10% (Neonomics service disruption)

**Symptoms:**
- ALL users cannot fetch balance regardless of bank
- Logs show: "Neonomics API unreachable" or HTTP 503
- Test API call to Neonomics fails

**Solution:**

1. **Verify Neonomics outage:**
   ```bash
   # Test Neonomics health endpoint
   curl -X GET https://api.neonomics.io/health \
     -H "Authorization: Bearer <api-key>" \
     -v

   # If timeout or 503: confirmed outage
   ```

2. **Contact Neonomics support:**
   - Email: support@neonomics.io
   - Slack: #neonomics-support (if available)
   - Check Neonomics Slack for incident updates

3. **Enable fallback mode:**
   ```bash
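   # Show cached balances to all users.
   # App Runner env vars go in --source-configuration; --instance-configuration
   # only accepts Cpu, Memory and InstanceRoleArn. Pass every variable the
   # service needs, then confirm with `aws apprunner describe-service`.
   aws apprunner update-service --service-arn <ARN> --region eu-west-1 \
     --source-configuration '{"ImageRepository": {
       "ImageIdentifier": "<image-uri>", "ImageRepositoryType": "<ECR|ECR_PUBLIC>",
       "ImageConfiguration": {"RuntimeEnvironmentVariables": {
         "AISP_FALLBACK_MODE": "cached",
         "AISP_FALLBACK_CACHE_TTL": "3600"
       }}}}'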
   ```

4. **Communicate to users (Norwegian):**
   ```
   Emne: Saldo vises med forsinkelse

   Hei,

   Vår leverandør for bankdata opplever tekniske problemer.
   Saldoen du ser kan være opptil 1 time gammel.

   Du kan fortsatt gjøre betalinger som normalt.
   Vi forventer at tjenesten er tilbake innen [X minutter].

   Mvh,
   Drop
   ```

5. **Monitor Neonomics status:**
   - Check every 10 minutes for resolution
   - When API is back: disable fallback mode
   ```bash
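   # Env vars are set via --source-configuration, not --instance-configuration
   aws apprunner update-service --service-arn <ARN> --region eu-west-1 \
     --source-configuration '{"ImageRepository": {
       "ImageIdentifier": "<image-uri>", "ImageRepositoryType": "<ECR|ECR_PUBLIC>",
       "ImageConfiguration": {"RuntimeEnvironmentVariables": {
         "AISP_FALLBACK_MODE": "live"
       }}}}'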
   ```
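
Deciding when to switch back can be made explicit. The sketch below is one possible rule, not Drop's actual logic: it returns to `live` only after a number of consecutive healthy checks, so a single successful call during a flapping outage does not flip the mode. `nextAispMode` and the default of two consecutive checks are assumptions.

```typescript
type AispMode = 'cached' | 'live';

// Hypothetical rule: leave fallback only after N consecutive healthy checks.
export function nextAispMode(
  current: AispMode,
  recentChecksHealthy: boolean[], // oldest first, one entry per 10-minute check
  requiredConsecutive = 2,        // assumed default, tune to taste
): AispMode {
  if (current === 'live') {
    // Entering fallback stays a manual decision in this runbook.
    return 'live';
  }
  const tail = recentChecksHealthy.slice(-requiredConsecutive);
  const recovered = tail.length === requiredConsecutive && tail.every(Boolean);
  return recovered ? 'live' : 'cached';
}
```

With checks every 10 minutes, the default means fallback is lifted roughly 20 minutes after Neonomics first responds normally again.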

**ETA:** Depends on Neonomics (typically <2 hours)

---

### Cause 4: Invalid or Revoked API Credentials

**Probability:** 5% (after credential rotation or account issue)

**Symptoms:**
- Logs show: "401 Unauthorized" or "invalid_api_key"
- All AISP requests fail immediately
- Other Drop services work fine (auth, database, etc.)

**Solution:**

1. **Verify Neonomics API credentials:**
   ```bash
   bw get item "Neonomics API" --session "$BW_SESSION"

   # Check:
   # - API key is not expired
   # - API key has AISP permissions
   # - Correct environment (production vs sandbox)
   ```

2. **Update App Runner environment variables:**
   ```bash
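   # Runtime env vars belong inside --source-configuration;
   # --instance-configuration only accepts Cpu, Memory and InstanceRoleArn.
   # Pass every variable the service needs, not only the two shown here.
   aws apprunner update-service --service-arn <ARN> --region eu-west-1 \
     --source-configuration '{"ImageRepository": {
       "ImageIdentifier": "<image-uri>", "ImageRepositoryType": "<ECR|ECR_PUBLIC>",
       "ImageConfiguration": {"RuntimeEnvironmentVariables": {
         "NEONOMICS_API_KEY": "<correct-key>",
         "NEONOMICS_ENVIRONMENT": "production"
       }}}}'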
   ```

3. **Trigger deployment:**
   ```bash
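   # update-service in step 2 already rolls out a new deployment;
   # run this only if you need to redeploy the current image manually
   aws apprunner start-deployment --service-arn <ARN> --region eu-west-1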

   # Wait 3-5 minutes for deployment to complete
   ```

4. **Test after deployment:**
   ```bash
   # Verify AISP working
   curl -X GET https://getdrop.no/api/accounts/balance \
     -H "Authorization: Bearer <test-user-token>" \
     -v

   # Expected: HTTP 200 with balance data
   ```
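
A startup check can catch a missing or mis-scoped credential before the first AISP request fails. This is a sketch under assumptions: `validateNeonomicsEnv` is a hypothetical helper, the variable names are those from step 2, and the allowed environments (`production`, `sandbox`) are taken from the checklist in step 1.

```typescript
// Hypothetical startup check; variable names follow step 2 above.
interface NeonomicsEnv {
  NEONOMICS_API_KEY?: string;
  NEONOMICS_ENVIRONMENT?: string;
}

// Returns a list of human-readable problems; an empty list means the config looks sane.
export function validateNeonomicsEnv(env: NeonomicsEnv): string[] {
  const problems: string[] = [];
  const key = env.NEONOMICS_API_KEY?.trim() ?? '';
  if (key === '') {
    problems.push('NEONOMICS_API_KEY is missing or empty');
  } else if (key.startsWith('<') && key.endsWith('>')) {
    problems.push('NEONOMICS_API_KEY looks like an unreplaced placeholder');
  }
  const environment = env.NEONOMICS_ENVIRONMENT;
  if (environment !== 'production' && environment !== 'sandbox') {
    problems.push('NEONOMICS_ENVIRONMENT must be "production" or "sandbox"');
  }
  return problems;
}
```

This only checks shape; whether the key is expired or lacks AISP permissions still has to be verified against Neonomics.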

**ETA:** 10 minutes

---

### Cause 5: Network or Firewall Issues

**Probability:** 5% (AWS security group misconfiguration)

**Symptoms:**
- Logs show: "Connection timeout" or "ECONNREFUSED"
- AISP API requests never reach Neonomics
- Other external APIs may also fail

**Solution:**

1. **Check outbound connectivity:**
   ```bash
   # App Runner egress is unrestricted by default
   # If using VPC connector, check security group
   aws ec2 describe-security-groups \
     --group-ids <vpc-connector-sg> \
     --region eu-west-1 \
     | jq '.SecurityGroups[].IpPermissionsEgress'
   ```

2. **Test DNS resolution:**
   ```bash
   # From your local machine or bastion host
   nslookup api.neonomics.io

   # Should resolve to Neonomics IP
   # If NXDOMAIN: DNS issue
   ```

3. **Check AWS service health:**
   ```bash
   # Check App Runner service events
   aws apprunner list-operations \
     --service-arn <ARN> \
     --region eu-west-1 \
     | jq '.OperationSummaryList[] | select(.Type == "CREATE_SERVICE" or .Type == "UPDATE_SERVICE")'

   # Look for recent errors
   ```

4. **Whitelist Neonomics IPs (if using strict firewall):**
   - Contact Neonomics for IP ranges
   - Add to security group outbound rules
   - Allow HTTPS (443) to Neonomics endpoints
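
The error code in the logs already narrows down which of the steps above to start with. A minimal sketch, assuming the service runs on Node.js and logs the system error code; `diagnoseNetworkError` and the diagnosis labels are hypothetical.

```typescript
type NetworkDiagnosis = 'dns' | 'refused' | 'timeout' | 'unknown';

// Maps common Node.js system error codes to the check that is most likely relevant.
export function diagnoseNetworkError(code: string | undefined): NetworkDiagnosis {
  switch (code) {
    case 'ENOTFOUND':
    case 'EAI_AGAIN':
      return 'dns';      // name resolution failed: start with step 2
    case 'ECONNREFUSED':
      return 'refused';  // host reachable but nothing accepted the connection
    case 'ETIMEDOUT':
      return 'timeout';  // packets likely dropped: check egress rules in step 1
    default:
      return 'unknown';
  }
}
```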

**ETA:** 15 minutes (if quick fix), 1 hour (if requires networking changes)

---

### Cause 6: Rate Limiting (High Traffic)

**Probability:** 10% (during peak hours or viral event)

**Symptoms:**
- Logs show: HTTP 429 "Too Many Requests"
- Intermittent failures (some users see balance, others don't)
- Rate limit headers in logs

**Solution:**

1. **Check rate limit headers:**
   ```bash
   aws logs filter-log-events \
     --log-group-name /aws/apprunner/drop-production \
     --filter-pattern "X-RateLimit" \
     --start-time $(date -u -d '5 minutes ago' +%s)000 \
     | jq -r '.events[].message' \
     | grep -E "X-RateLimit-(Limit|Remaining|Reset)"
   ```

2. **Implement request throttling:**
   ```typescript
   // src/lib/aisp-client.ts
   import PQueue from 'p-queue';

   const queue = new PQueue({
     concurrency: 10,      // Max 10 concurrent requests
     interval: 1000,        // Per second
     intervalCap: 50        // Max 50 requests per second
   });

   export async function fetchBalance(userId: string) {
     return queue.add(() => neonomicsClient.getBalance(userId));
   }
   ```

3. **Cache balance aggressively during rate limit:**
   ```typescript
   // src/lib/balance-cache.ts
   const CACHE_TTL_NORMAL = 60;      // 60 seconds
   const CACHE_TTL_RATE_LIMIT = 300; // 5 minutes during rate limit

   export async function getBalanceWithCache(userId: string) {
     const cached = await redis.get(`balance:${userId}`);
     if (cached) return JSON.parse(cached);

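     try {
       const balance = await fetchBalance(userId);
       const payload = JSON.stringify(balance);
       await redis.setex(`balance:${userId}`, CACHE_TTL_NORMAL, payload);
       // Keep a longer-lived copy (key name is illustrative). Calling EXPIRE in the
       // catch block cannot work: we only get there when the 60 s entry is already gone.
       await redis.setex(`balance:stale:${userId}`, CACHE_TTL_RATE_LIMIT, payload);
       return balance;
     } catch (error) {
       if ((error as { status?: number }).status === 429) {
         // Serve the stale copy (up to 5 minutes old) instead of failing
         const stale = await redis.get(`balance:stale:${userId}`);
         if (stale) return JSON.parse(stale);
       }
       throw error;
     }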
   }
   ```

4. **Contact Neonomics to increase rate limit:**
   - Email support with traffic stats
   - Request higher API quota for production
   - Provide justification (user growth, peak times)
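
The rate limit headers from step 1 can drive how long the client backs off. This is a sketch under a loud assumption: it treats `X-RateLimit-Reset` as a Unix timestamp in seconds, which varies between providers and should be confirmed against the Neonomics documentation. `retryDelayMs` is a hypothetical helper and expects lower-cased header names.

```typescript
// Assumption: X-RateLimit-Reset is a Unix timestamp in seconds. Some providers
// send seconds-until-reset instead, so verify before relying on this.
export function retryDelayMs(
  headers: Record<string, string | undefined>, // lower-cased header names
  nowMs: number,
  fallbackMs = 1000, // used when no usable reset header is present
): number {
  const remaining = Number(headers['x-ratelimit-remaining']);
  if (Number.isFinite(remaining) && remaining > 0) return 0; // quota left, no wait

  const resetSeconds = Number(headers['x-ratelimit-reset']);
  if (!Number.isFinite(resetSeconds)) return fallbackMs;

  // Never return a negative delay if the reset time has already passed.
  return Math.max(0, resetSeconds * 1000 - nowMs);
}
```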

**ETA:** 5 minutes (automatic caching), 1-2 days (if quota increase needed)

---

## Emergency Workarounds

### Option 1: Cached Balance Mode

**Use case:** AISP provider down >30 minutes, users need to see approximate balance

**Steps:**

1. Enable cached balance fallback:
   ```bash
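   # Env vars go in --source-configuration (ImageConfiguration.RuntimeEnvironmentVariables);
   # --instance-configuration only accepts Cpu, Memory and InstanceRoleArn.
   # Cause 3 uses AISP_FALLBACK_MODE / AISP_FALLBACK_CACHE_TTL: confirm which names the build reads.
   aws apprunner update-service --service-arn <ARN> --region eu-west-1 \
     --source-configuration '{"ImageRepository": {
       "ImageIdentifier": "<image-uri>", "ImageRepositoryType": "<ECR|ECR_PUBLIC>",
       "ImageConfiguration": {"RuntimeEnvironmentVariables": {
         "AISP_MODE": "cached",
         "AISP_CACHE_TTL": "3600"
       }}}}'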
   ```

2. Show warning banner in app:
   ```
   ⚠️ Saldo vises med forsinkelse
   Vi viser din sist kjente saldo fra [timestamp].
   Tjenesten er tilbake til normal snart.
   ```

3. Allow payments to proceed:
   - Users can still initiate payments (PISP)
   - Balance check uses cached value
   - Risk: Insufficient funds errors if balance changed

4. **Revert when AISP is back:**
   ```bash
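   # Env vars are set via --source-configuration, not --instance-configuration
   aws apprunner update-service --service-arn <ARN> --region eu-west-1 \
     --source-configuration '{"ImageRepository": {
       "ImageIdentifier": "<image-uri>", "ImageRepositoryType": "<ECR|ECR_PUBLIC>",
       "ImageConfiguration": {"RuntimeEnvironmentVariables": {
         "AISP_MODE": "live"
       }}}}'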
   ```

**Risk:** Cached balance may be stale (up to 1 hour old). Users may attempt payments with insufficient funds.
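
To make the staleness risk visible to users, the banner's `[timestamp]` can be paired with an age label and a hard cutoff. A minimal sketch: `formatBalanceAge` and `isTooStale` are hypothetical helpers, and the 3600-second default mirrors the cache TTL set in step 1.

```typescript
// Hypothetical helpers for showing how old a cached balance is.
export function formatBalanceAge(lastUpdated: Date, now: Date): string {
  const minutes = Math.max(0, Math.floor((now.getTime() - lastUpdated.getTime()) / 60000));
  if (minutes < 1) return 'Updated just now';
  if (minutes === 1) return 'Updated 1 minute ago';
  return `Updated ${minutes} minutes ago`;
}

// True when the cached value is older than the TTL and should no longer be shown.
export function isTooStale(lastUpdated: Date, now: Date, ttlSeconds = 3600): boolean {
  return now.getTime() - lastUpdated.getTime() > ttlSeconds * 1000;
}
```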

---

### Option 2: Hide Balance, Allow Payments

**Use case:** AISP down, no reliable cache, but PISP still works

**Steps:**

1. Show "Balance unavailable" message:
   ```
   Saldo midlertidig utilgjengelig
   Du kan fortsatt gjøre betalinger som normalt.
   Banken vil avvise betalingen hvis du ikke har nok midler.
   ```

2. Allow payments without balance check:
   - User enters payment amount
   - Drop initiates payment via PISP
   - Bank performs real-time balance check
   - If insufficient funds: bank rejects, user gets clear error

3. Communicate ETA to users:
   ```
   Vi jobber med å gjenopprette saldovisning.
   Estimert tid: [X minutter]
   ```

**Risk:** User experience degraded. May attempt failed payments.
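
The difference between normal operation and the two workarounds comes down to how much the local balance check is trusted. The sketch below illustrates that decision under assumptions: `paymentPrecheck`, the mode names, and the idea that Drop blocks a payment locally in live mode are all hypothetical and should be checked against the actual payment flow.

```typescript
type BalanceMode = 'live' | 'cached' | 'hidden';
type Precheck = 'allow' | 'allow_with_stale_warning' | 'block_insufficient_funds';

// Hypothetical pre-payment check for normal mode and both workarounds.
export function paymentPrecheck(
  mode: BalanceMode,
  knownBalance: number | null,
  amount: number,
): Precheck {
  // Option 2, or no balance available at all: skip the local check, the bank decides.
  if (mode === 'hidden' || knownBalance === null) return 'allow';

  if (mode === 'live') {
    return knownBalance >= amount ? 'allow' : 'block_insufficient_funds';
  }

  // Option 1: a cached balance is only advisory, so warn instead of blocking.
  return knownBalance >= amount ? 'allow' : 'allow_with_stale_warning';
}
```

In both workaround modes the bank's own real-time check remains the final authority, which matches the flow described in Option 2.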

---

## Post-Incident Actions

1. **Send consent renewal reminders proactively:**
   ```sql
   -- PostgreSQL 16: send renewal reminders 7 days before expiry
   SELECT user_id, email, consent_expires_at
   FROM bank_accounts
   JOIN users ON users.id = bank_accounts.user_id
   WHERE consent_expires_at < NOW() + INTERVAL '7 days'
   AND consent_renewal_reminder_sent = FALSE;
   ```

2. **Document incident:**
   ```bash
   touch ~/ALAI/products/Drop/comms/incidents/$(date +%Y-%m-%d)-aisp-failure.md
   ```

3. **Review caching strategy:**
   - Is cache TTL appropriate?
   - Should we cache balance longer during incidents?
   - Add metrics: cache hit rate, staleness

4. **Update monitoring:**
   - Add synthetic AISP test (fetch balance every 5 min)
   - Alert on AISP failure rate >10%
   - Track consent expiry dates

5. **Improve user communication:**
   - Auto-notify users when AISP is degraded
   - Show balance age: "Updated 5 minutes ago"
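
The failure-rate alert from item 4 can be expressed as a small check over recent synthetic test results. A minimal sketch: `failureRate` and `shouldAlert` are hypothetical names, and the 10% threshold is the one stated above.

```typescript
// outcomes: true = balance fetch succeeded, false = failed
export function failureRate(outcomes: boolean[]): number {
  if (outcomes.length === 0) return 0; // no data, no failures to report
  const failures = outcomes.filter((ok) => !ok).length;
  return failures / outcomes.length;
}

// Alerts only when the rate is strictly above the threshold (>10% by default).
export function shouldAlert(outcomes: boolean[], threshold = 0.1): boolean {
  return failureRate(outcomes) > threshold;
}
```

With a synthetic test every 5 minutes, a window of the last 12 results covers one hour; two or more failures in that window would cross the threshold.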

---

## Escalation

| Time | Action |
|------|--------|
| 0 min | John starts diagnosis |
| 10 min | If Neonomics outage confirmed, notify Alem |
| 20 min | If not resolved, enable cached balance mode |
| 1 hour | Public communication to users (Norwegian email/push) |
| 2 hours | Contact Neonomics support via phone if no response |
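
If the escalation schedule is wired into tooling (for example an incident bot), the table can be encoded as data. This is a sketch only; `ESCALATION_STEPS` and `escalationStepsDue` are hypothetical names, and the entries are copied from the table above.

```typescript
interface EscalationStep {
  atMinutes: number;
  action: string;
}

// Entries copied from the escalation table in this runbook.
const ESCALATION_STEPS: EscalationStep[] = [
  { atMinutes: 0, action: 'John starts diagnosis' },
  { atMinutes: 10, action: 'If Neonomics outage confirmed, notify Alem' },
  { atMinutes: 20, action: 'If not resolved, enable cached balance mode' },
  { atMinutes: 60, action: 'Public communication to users (Norwegian email/push)' },
  { atMinutes: 120, action: 'Contact Neonomics support via phone if no response' },
];

// All steps that should have been considered by the given point in the incident.
export function escalationStepsDue(elapsedMinutes: number): EscalationStep[] {
  return ESCALATION_STEPS.filter((step) => step.atMinutes <= elapsedMinutes);
}
```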

---

## Contacts

- **Neonomics Support:** support@neonomics.io
- **Neonomics Slack:** #neonomics-support (if available)
- **Internal:** Alem (CEO, final decision on fallback modes)

---

## Related Documentation

- `docs/architecture/open-banking.md` — AISP flow diagrams
- `src/app/api/accounts/balance/route.ts` — Balance fetch implementation
- `docs/compliance/psd2-requirements.md` — PSD2 consent rules (90-day expiry)
- Vaultwarden item: "Neonomics API" — Credentials

---

**Last Updated:** 2026-02-22
**Next Review:** Before Phase 2 (Banking Integration)