# Drop Transaction Failure Handling & Recovery

**Task:** MC #1191
**Created:** 2026-02-17
**Author:** John (Software Architect Agent)
**Status:** DRAFT — Awaiting Alem approval

---

## Executive Summary

This specification defines comprehensive transaction failure handling for Drop's fintech payment system. Drop operates as a PSD2 PISP (Payment Initiation Service Provider) — we initiate payments from users' bank accounts but never hold customer money. This creates unique challenges:

- **External dependency:** Every transaction depends on the user's bank and the Open Banking provider
- **Asynchronous flow:** PISP initiation → bank processing → status callback (can take seconds to days)
- **Failure modes:** Network timeouts, bank declines, partial processing, provider outages
- **Customer impact:** Real money, real trust — failures must be handled gracefully

**Core principles:**
1. **Clear state machine** — No ambiguous states
2. **Idempotency** — Network retries never cause double-charges
3. **Automatic retry** — Transient failures self-heal
4. **User communication** — Always tell user what's happening
5. **Admin tooling** — Manual intervention when automation can't resolve

---

## 1. Current State Analysis

### 1.1 What We Have (Good)

**Idempotency keys:**
- Both `/api/transactions/remittance/route.ts` and `/api/transactions/qr-payment/route.ts` accept `idempotencyKey`
- Check for existing transaction: `SELECT ... WHERE idempotency_key = ? AND user_id = ?`
- Returns cached response for duplicate requests (prevents double-charge)
- **Status:** ✅ **Production-ready**

**Basic error handling:**
- `insufficient_balance` error caught and returned as 402
- Rate limiting: IP (10/min) + user (3/min)
- Transaction wrapped in DB transaction (atomic balance check + insert)
- **Status:** ✅ **Good foundation**

**30-second timeout:**
- PISP API calls have `AbortController` with 30s timeout
- Returns specific timeout error: "Payment request timeout"
- **Status:** ✅ **Implemented**

### 1.2 What's Missing (Critical Gaps)

❌ **State machine enforcement:**
- `transactions.status` has CHECK constraint: `'processing','completed','failed'`
- But no transition validation: nothing prevents, for example, a `completed` transaction from being set back to `processing`
- No transition audit (who/when/why status changed)

❌ **Retry logic:**
- Timeout errors return failure immediately — no retry
- No exponential backoff
- No max retry counter
- No dead letter queue for permanently failed transactions

❌ **Background reconciliation:**
- Transactions stuck in `processing` status stay there forever
- No periodic job to check PISP provider for status updates
- No admin alert when transactions are stuck

❌ **Partial failure handling:**
- FX conversion success + transfer failure → no rollback/refund flow
- No compensation logic for partial state

❌ **User communication:**
- No transaction status page showing real-time progress
- No push notification on status change
- No email on final completion/failure
- Error messages are generic (not user-friendly)

❌ **Admin tools:**
- No `/api/admin/transactions/stuck` endpoint to list limbo transactions
- No manual retry mechanism
- No manual resolution workflow

---

## 2. Transaction State Machine

### 2.1 States

```
┌─────────────┐
│  initiated  │ ──────┐
└─────────────┘       │
      │               │
      ▼               │
┌─────────────┐       │
│ processing  │       │ (timeout after 30s)
└─────────────┘       │
      │               │
      ├───────────────┴────────────┐
      │                            │
      ▼                            ▼
┌─────────────┐            ┌─────────────┐
│  completed  │            │   timeout   │
└─────────────┘            └─────────────┘
                                  │
                                  ▼
                           ┌─────────────┐
                           │   failed    │
                           └─────────────┘

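┌─────────────────────┐
│ partially_completed │  (future: FX success, transfer fail)
└─────────────────────┘
```

The diagram shows the main path only. Section 2.3 is the authoritative list of allowed transitions, including the direct moves to `failed`, `timeout` → `completed`, and the `timeout` → `processing` retry.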

### 2.2 State Definitions

| State | Meaning | Terminal? | User-Facing Message |
|-------|---------|-----------|---------------------|
| `initiated` | API request received, validation passed, DB record created | No | "Initiating payment..." |
| `processing` | PISP provider accepted request, waiting for bank confirmation | No | "Your payment is being processed" |
| `timeout` | PISP provider didn't respond within 30s, will check status later | No | "Processing your payment — we'll notify you when complete" |
| `completed` | Bank confirmed payment successful | Yes | "Payment completed" |
| `failed` | Bank declined, or PISP returned permanent error | Yes | "Payment failed: [reason]" |
| `partially_completed` | FX conversion succeeded but transfer failed (future) | No | "Processing refund..." |

**Terminal states:** `completed`, `failed` — no further transitions allowed

### 2.3 Valid Transitions

```typescript
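type TransactionStatus =
  | "initiated" | "processing" | "timeout" | "completed" | "failed" | "partially_completed";

const VALID_TRANSITIONS: Record<TransactionStatus, readonly TransactionStatus[]> = {
  initiated: ["processing", "timeout", "failed"], // timeout: no PISP response within 30s
  processing: ["completed", "timeout", "failed", "partially_completed"], // partially_completed: Section 6
  timeout: ["completed", "failed", "processing"], // "processing" = retry
  partially_completed: ["completed", "failed"], // after refund
  completed: [], // terminal
  failed: [], // terminal
};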
```

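**Enforcement:** A database CHECK constraint restricts `status` to the six values above (the current constraint allows only three, see Section 1.2). A CHECK cannot compare the old and new value, so transitions are validated in application code (or a `BEFORE UPDATE` trigger) before every status update.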
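
A minimal sketch of the application-level check. `ALLOWED_NEXT`, `canTransition` and `assertTransition` are hypothetical names; the map follows Section 2.3 and includes `initiated → timeout` (the diagram's 30s timeout arrow) and `processing → partially_completed` (needed by the Section 6.3 flow).

```typescript
type TransactionStatus =
  | "initiated"
  | "processing"
  | "timeout"
  | "completed"
  | "failed"
  | "partially_completed";

// Allowed moves per status; completed and failed are terminal (empty lists).
const ALLOWED_NEXT: Record<TransactionStatus, readonly TransactionStatus[]> = {
  initiated: ["processing", "timeout", "failed"],
  processing: ["completed", "timeout", "failed", "partially_completed"],
  timeout: ["completed", "failed", "processing"],
  partially_completed: ["completed", "failed"],
  completed: [],
  failed: [],
};

function canTransition(from: TransactionStatus, to: TransactionStatus): boolean {
  return ALLOWED_NEXT[from].includes(to);
}

// Call before issuing the UPDATE so an invalid move never reaches the database.
function assertTransition(from: TransactionStatus, to: TransactionStatus): void {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid transaction status transition: ${from} -> ${to}`);
  }
}
```

With this in place, a late PISP callback for an already-`failed` payment (`assertTransition("failed", "processing")`) throws and surfaces for investigation instead of silently reopening the transaction.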

### 2.4 Transition Audit

Every status change logged in `audit_log`:

```sql
INSERT INTO audit_log (
  id, user_id, action, resource_type, resource_id,
  details, ip_address, user_agent, request_id
) VALUES (
  'aud_xyz', 'usr_abc', 'TRANSACTION_STATUS_CHANGE',
  'transaction', 'tx_rem_123',
  '{"from": "processing", "to": "completed", "reason": "PISP callback", "external_id": "ext_456"}',
  '10.0.1.5', 'Drop-iOS/1.0', 'req_789'
);
```

**Compliance:** PSD2 requires 5-year audit trail of all payment operations

---

## 3. Idempotency

### 3.1 Current Implementation (Keep It)

✅ **Already production-ready:**

```typescript
// Check for existing transaction with this idempotency key (scoped to user)
const existing = await getOne<ExistingTx>(
  `SELECT id, type, status, amount, currency, fee, ...
   FROM transactions
   WHERE idempotency_key = ? AND user_id = ?`,
  [idempotencyKey, u.id]
);

if (existing) {
  // Return cached response (same payload as successful creation)
  return NextResponse.json({ data: existing }, { status: 200 });
}
```

**Key features:**
- Scoped to user (prevents IDOR)
- Returns the same body as the original request, with status 200 instead of 201
- No expiry — idempotency keys valid forever
- Client must generate UUID or similar unique key
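
Depending on the database's isolation level, two concurrent requests with the same key can both pass the `SELECT` before either one inserts; a `UNIQUE (user_id, idempotency_key)` index makes the second `INSERT` fail instead. A minimal sketch of the intended contract, using a `Map` as a hypothetical stand-in for the table and that index (in the real route, the unique-constraint error would be caught and answered with the existing row):

```typescript
interface StoredTx {
  id: string;
  userId: string;
  amount: number;
}

// In-memory stand-in for the transactions table plus a
// UNIQUE (user_id, idempotency_key) index.
const byUserAndKey = new Map<string, StoredTx>();
let nextId = 1;

function createOrReplay(
  userId: string,
  idempotencyKey: string,
  amount: number
): { httpStatus: 200 | 201; data: StoredTx } {
  // Composite key scoped to the user: another user's key can never match.
  const compositeKey = JSON.stringify([userId, idempotencyKey]);
  const existing = byUserAndKey.get(compositeKey);
  if (existing) {
    return { httpStatus: 200, data: existing }; // replay: original body, 200
  }
  const tx: StoredTx = { id: `tx_${nextId++}`, userId, amount };
  byUserAndKey.set(compositeKey, tx);
  return { httpStatus: 201, data: tx }; // first request: created
}
```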

### 3.2 Best Practices

**Client implementation:**
```typescript
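// Generate the idempotency key once per payment the user confirms, and reuse the
// same key if this request is retried (a new key per retry defeats idempotency)
const idempotencyKey = `${userId}_${Date.now()}_${crypto.randomUUID()}`;

// Send with every attempt of this payment request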
await fetch('/api/transactions/remittance', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    recipientId: 'rec_123',
    amount: 500,
    idempotencyKey, // ← REQUIRED
  })
});
```

**No changes needed** — current implementation is correct

---

## 4. Retry Logic

### 4.1 Classification of Errors

| Error | Type | Retry? | Example |
|-------|------|--------|---------|
| Network timeout | Transient | ✅ Yes | `AbortError`, socket timeout |
| PISP 5xx | Transient | ✅ Yes | 500 Internal Server Error, 503 Service Unavailable |
| PISP 4xx client error | Permanent | ❌ No | 400 Bad Request, 401 Unauthorized |
| Bank decline | Permanent | ❌ No | Insufficient funds (from bank), invalid IBAN |
| Validation error | Permanent | ❌ No | Amount < minimum, KYC not approved |

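**Rule:** Only retry errors that are transient (temporary network/server issues). Timeouts need extra care: the PISP may have accepted the payment before the connection dropped, so resending is only safe if every attempt carries the same provider-side idempotency key (where the provider supports one); otherwise a timed-out payment goes to status reconciliation (Section 5) instead of being resubmitted.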
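
The table can be turned into a small classifier that `isTransientError` / `isPermanentError` could delegate to. A sketch under assumptions: `PispFailure` and `classifyPispFailure` are hypothetical, and treating unrecognised failures as permanent is a design choice, not something the table states.

```typescript
type ErrorClass = "transient" | "permanent";

// Hypothetical input shape: an HTTP status from the PISP, or the name of a thrown error.
interface PispFailure {
  httpStatus?: number;
  errorName?: string;
}

function classifyPispFailure(failure: PispFailure): ErrorClass {
  // controller.abort() rejects fetch with "AbortError";
  // AbortSignal.timeout() rejects with "TimeoutError".
  if (failure.errorName === "AbortError" || failure.errorName === "TimeoutError") {
    return "transient";
  }
  const status = failure.httpStatus;
  if (status !== undefined && status >= 500 && status <= 599) return "transient";
  // 4xx (bad request, auth, bank decline) and anything unrecognised: not retried,
  // so it surfaces as a failure instead of looping.
  return "permanent";
}
```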

### 4.2 Exponential Backoff Strategy

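**Max attempts:** 3 (the initial call plus up to 2 retries)
**Delays:** 2s → 8s between attempts (base 2s, ×4 each time; a third wait of 32s would only occur with a 4th attempt)
**Jitter:** ±20% to avoid thundering herd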

```typescript
const RETRY_CONFIG = {
  maxRetries: 3, // total attempts, including the first call
  baseDelayMs: 2000, // 2 seconds
  maxDelayMs: 60000, // 1 minute cap
  jitterPercent: 0.2, // ±20%
};

function calculateDelay(attempt: number): number {
  const exponentialDelay = RETRY_CONFIG.baseDelayMs * Math.pow(4, attempt - 1);
  const cappedDelay = Math.min(exponentialDelay, RETRY_CONFIG.maxDelayMs);
  const jitter = cappedDelay * RETRY_CONFIG.jitterPercent * (Math.random() * 2 - 1);
  return Math.floor(cappedDelay + jitter);
}

// Wait after attempt 1: 2s ± 400ms = 1.6-2.4s
// Wait after attempt 2: 8s ± 1.6s  = 6.4-9.6s
// calculateDelay(3) = 32s ± 6.4s, never used while maxRetries is 3 (no wait after the last attempt)
```
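
To make the schedule checkable, the random source can be injected. A sketch (`BACKOFF`, `delayForAttempt` and `backoffSchedule` are hypothetical names) that reuses the formula of `calculateDelay` and shows that three attempts produce two waits:

```typescript
const BACKOFF = {
  maxAttempts: 3,
  baseDelayMs: 2000,
  maxDelayMs: 60000,
  jitterPercent: 0.2,
};

// Same formula as calculateDelay, with the random source injectable for tests.
function delayForAttempt(attempt: number, random: () => number = Math.random): number {
  const exponential = BACKOFF.baseDelayMs * Math.pow(4, attempt - 1);
  const capped = Math.min(exponential, BACKOFF.maxDelayMs);
  const jitter = capped * BACKOFF.jitterPercent * (random() * 2 - 1);
  return Math.floor(capped + jitter);
}

// Waits that actually happen: one after every attempt except the last.
function backoffSchedule(random: () => number = Math.random): number[] {
  const waits: number[] = [];
  for (let attempt = 1; attempt < BACKOFF.maxAttempts; attempt++) {
    waits.push(delayForAttempt(attempt, random));
  }
  return waits;
}
```

With `random` fixed at `0.5` the jitter term is zero, so `backoffSchedule(() => 0.5)` returns `[2000, 8000]`, and `delayForAttempt(4, () => 0.5)` hits the 60s cap.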

### 4.3 Retry Implementation

**Two approaches:**

#### Option A: In-Process Retry (Simpler, Recommended for MVP)

Retry within the same API request (synchronous):

```typescript
// PaymentResult, RETRY_CONFIG, calculateDelay, logAudit, isPermanentError and
// isTransientError are assumed to come from the existing payments/audit modules.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function callPispWithRetry(
  fn: () => Promise<PaymentResult>, // must send the same PISP idempotency key on every attempt
  txId: string,
  userId: string
): Promise<PaymentResult> {
  let lastError: Error | null = null;

  // RETRY_CONFIG.maxRetries counts total attempts, including the first call
  for (let attempt = 1; attempt <= RETRY_CONFIG.maxRetries; attempt++) {
    try {
      const result = await fn();

      // Success: return immediately
      if (result.success) return result;

      // Permanent error (4xx, bank decline): don't retry
      if (isPermanentError(result.error)) {
        await logAudit({
          userId,
          action: "PISP_PERMANENT_ERROR",
          resourceType: "transaction",
          resourceId: txId,
          details: { attempt, error: result.error },
        });
        return result;
      }

      // Transient error: fall through to the backoff below
      lastError = new Error(result.error || "Unknown error");
    } catch (error) {
      lastError = error as Error;

      // Non-retryable (validation error, etc.)
      if (!isTransientError(error)) throw error;
    }

    // Wait before the next attempt; there is no wait after the last one
    if (attempt < RETRY_CONFIG.maxRetries) {
      const delay = calculateDelay(attempt);
      await logAudit({
        userId,
        action: "PISP_RETRY_SCHEDULED",
        resourceType: "transaction",
        resourceId: txId,
        details: { attempt, nextAttempt: attempt + 1, delayMs: delay },
      });
      await sleep(delay);
    }
  }

  // All attempts exhausted
  await logAudit({
    userId,
    action: "PISP_ALL_RETRIES_FAILED",
    resourceType: "transaction",
    resourceId: txId,
    details: { maxRetries: RETRY_CONFIG.maxRetries, lastError: lastError?.message },
  });

  return {
    success: false,
    status: "failed",
    error: `Payment failed after ${RETRY_CONFIG.maxRetries} attempts`,
  };
}
```

**Pros:**
- Simple — no queue infrastructure needed
- User waits for final result (good UX for fast retries)
- Automatic cleanup (no orphan jobs)

**Cons:**
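- Request can stay open for up to ~100s in the worst case (3 attempts × 30s timeout plus ~12s of backoff), which may exceed client or proxy timeouts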
- If server crashes mid-retry, transaction stuck
- No visibility into retry progress

#### Option B: Background Job Queue (Production-Grade)

Move retries to background worker using job queue:

**Tech stack:**
- **Job queue:** BullMQ (Redis-backed) or pg-boss (PostgreSQL-backed, no extra infra)
- **Worker:** Separate process polls queue every 5s

**Flow:**
1. API route creates transaction with status `initiated`
2. Enqueue job: `{ type: "pisp_call", txId: "tx_rem_123", attempt: 1 }`
3. Return to user (HTTP 202): `{ status: "initiated", txId: "tx_rem_123" }`, matching the stored status
4. Worker picks job → calls PISP → updates transaction status
5. On transient failure → re-enqueue with delay + increment attempt
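6. On acceptance → store the status the PISP reports (normally `processing` until the bank confirms); on permanent failure or after the last attempt → mark `failed`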
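
Keeping the worker's decision separate from the queue library makes it identical whether BullMQ or pg-boss is chosen. A sketch with hypothetical names (`PispJob`, `PispOutcome`, `nextJobAction`); the actual delayed re-enqueue is left to whichever queue is used, and jitter is omitted for brevity:

```typescript
interface PispJob {
  type: "pisp_call";
  txId: string;
  attempt: number;
}

// Hypothetical outcome of one PISP call made by the worker.
type PispOutcome = "accepted" | "transient_error" | "permanent_error";

type JobAction =
  | { kind: "set_status"; txId: string; status: "processing" | "failed" }
  | { kind: "requeue"; job: PispJob; delayMs: number };

const MAX_ATTEMPTS = 3;
const BASE_DELAY_MS = 2000;

// Steps 4-6 of the flow: decide what to do after one attempt.
function nextJobAction(job: PispJob, outcome: PispOutcome): JobAction {
  if (outcome === "accepted") {
    // PISP accepted: the bank still has to confirm, so the row moves to processing
    return { kind: "set_status", txId: job.txId, status: "processing" };
  }
  if (outcome === "permanent_error" || job.attempt >= MAX_ATTEMPTS) {
    return { kind: "set_status", txId: job.txId, status: "failed" };
  }
  // Transient error with attempts left: same job, next attempt, exponential delay
  return {
    kind: "requeue",
    job: { ...job, attempt: job.attempt + 1 },
    delayMs: BASE_DELAY_MS * Math.pow(4, job.attempt - 1),
  };
}
```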

**Pros:**
- Non-blocking (API responds instantly)
- Survives server restarts (jobs persisted in Redis or PostgreSQL)
- Can inspect queue (show pending retries in admin dashboard)

**Cons:**
- More complex (requires job queue setup)
- More infrastructure (Redis or pg-boss tables)
- User must poll `/api/transactions/[id]` for status updates

**Recommendation:** Start with Option A (in-process) for MVP. Migrate to Option B when transaction volume increases.

### 4.4 Dead Letter Queue

After max retries exhausted:

1. **Mark transaction as `failed`** with reason: `"PISP provider unreachable after 3 attempts"`
2. **Create admin alert** in separate table:

```sql
CREATE TABLE admin_alerts (
  id TEXT PRIMARY KEY,
  alert_type TEXT NOT NULL, -- 'transaction_stuck', 'pisp_failure', etc.
  severity TEXT NOT NULL CHECK(severity IN ('low','medium','high','critical')),
  resource_type TEXT,
  resource_id TEXT,
  title TEXT NOT NULL,
  description TEXT,
  status TEXT DEFAULT 'open' CHECK(status IN ('open','investigating','resolved','dismissed')),
  created_at TEXT DEFAULT (datetime('now')),
  resolved_at TEXT,
  resolved_by TEXT
);

INSERT INTO admin_alerts (
  id, alert_type, severity, resource_type, resource_id,
  title, description
) VALUES (
  'alert_xyz', 'pisp_failure', 'high', 'transaction', 'tx_rem_123',
  'Transaction failed after 3 attempts',
  'Transaction tx_rem_123 (user: usr_abc, amount: 500 NOK) failed to process after 3 attempts. PISP provider returned: "Service Unavailable". Manual investigation required.'
);
```

3. **Send Slack/email to ops team** (via webhook or existing notification system)

4. **Admin dashboard shows alert** at `/admin/alerts` with:
   - Transaction details
   - Retry history (from audit_log)
   - Manual actions: "Retry Now", "Refund User", "Mark Resolved"

---

## 5. Timeout Recovery

### 5.1 Scenario

User initiates payment → PISP accepts request → network drops → no response after 30s → transaction stuck in `processing`

**Current behavior:** API returns error, transaction never completes

**New behavior:** Mark as `timeout`, schedule background reconciliation

### 5.2 Implementation

**Step 1:** On timeout, transition to `timeout` status

```typescript
// In payments.ts
if (error instanceof Error && error.name === "AbortError") {
  // Don't fail immediately: the PISP may have accepted the payment, so check its status later
  await updateTransactionStatus(txId, "timeout", "PISP request timeout - will check status later");

  // Enqueue background reconciliation job (runs after 2 min)
  await scheduleStatusCheck(txId, 120000); // 2 minutes

  return {
    success: true, // means "handled", not "paid": callers must check `status`
    status: "timeout",
    message: "Payment is processing — we'll notify you when complete"
  };
}
```

**Step 2:** Background worker checks status

```typescript
// reconciliation-worker.ts
async function checkTransactionStatus(txId: string) {
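  const tx = await getOne("SELECT * FROM transactions WHERE id = ?", [txId]);
  // Stop if the row is gone or a PISP callback already settled it
  if (!tx || tx.status === "completed" || tx.status === "failed") return;

  // Call PISP provider's GET /payments/{id} endpoint. Assumes external_id was stored:
  // if the 30s timeout fired before the PISP returned an ID, the payment has to be
  // looked up by Drop's own reference instead (provider-dependent).
  const status = await pispProvider.getPaymentStatus(tx.external_id);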

  if (status.completed) {
    await updateTransactionStatus(txId, "completed", "Confirmed via reconciliation");
    await notifyUser(tx.user_id, "payment_completed", { txId });
  } else if (status.failed) {
    await updateTransactionStatus(txId, "failed", status.reason);
    await notifyUser(tx.user_id, "payment_failed", { txId, reason: status.reason });
  } else {
    // Still processing — check again in 5 min
    await scheduleStatusCheck(txId, 300000); // 5 minutes
  }
}
```

**Step 3:** Periodic sweep (every 10 minutes)

Find all transactions stuck in `timeout` or `processing` for > 10 minutes:

```sql
SELECT id FROM transactions
WHERE status IN ('timeout', 'processing')
  AND created_at < datetime('now', '-10 minutes')
LIMIT 100;
```

For each: call `checkTransactionStatus(txId)`

### 5.3 User Experience

**User sees:**
1. **Immediate response:** "Processing your payment — we'll send you a notification when it's complete" (status 202)
2. **Push notification (1-2 min later):** "Your 500 NOK payment to Mama Jasmina is complete"
3. **Transaction list updates:** Polling `/api/transactions` or WebSocket push

**What if it never completes?**
- After 24 hours stuck in `timeout` → mark as `failed` + admin alert
- User can contact support via `/support` page
- Admin manually investigates + refunds if needed
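
The periodic sweep (Step 3) and the 24-hour cut-off combine into one per-row decision. A sketch with hypothetical names; it measures age from `created_at` like the Step 3 query, which approximates "24 hours stuck in `timeout`" (a timestamp of when the row entered `timeout` would be more precise):

```typescript
type SweepAction = "skip" | "recheck" | "escalate";

interface SweepCandidate {
  id: string;
  status: string;
  createdAt: Date;
}

const RECHECK_AFTER_MS = 10 * 60 * 1000; // Step 3: older than 10 minutes
const ESCALATE_AFTER_MS = 24 * 60 * 60 * 1000; // 24 hours in `timeout`

// "recheck" = call checkTransactionStatus; "escalate" = mark failed + admin alert.
function sweepAction(tx: SweepCandidate, now: Date): SweepAction {
  if (tx.status !== "timeout" && tx.status !== "processing") return "skip";
  const ageMs = now.getTime() - tx.createdAt.getTime();
  if (ageMs < RECHECK_AFTER_MS) return "skip";
  // Only `timeout` rows are escalated automatically; `processing` keeps being rechecked
  if (tx.status === "timeout" && ageMs >= ESCALATE_AFTER_MS) return "escalate";
  return "recheck";
}
```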

---

## 6. Partial Failure Handling

### 6.1 Scenario (Future — FX Conversion)

Remittance flow with FX conversion:
1. User sends 500 NOK → 5,085 RSD
2. FX conversion succeeds (NOK debited from user's bank)
3. International transfer fails (recipient bank rejects)
4. **Problem:** User's money is gone, recipient didn't receive it

**Current code:** No FX conversion step (demo uses hardcoded exchange rates)

**Future risk:** When FX provider is added, must handle partial success

### 6.2 Classification

| Scenario | Recoverable? | Action |
|----------|-------------|--------|
| FX success + transfer success | N/A | ✅ Complete |
| FX success + transfer fail | ✅ Yes | Refund converted amount back to NOK |
| FX fail + transfer not attempted | ✅ Yes | Transaction never started, return error |
| FX timeout + transfer unknown | ⚠️ Maybe | Check FX provider status, then refund or complete |

### 6.3 Compensation Flow (When FX Added)

**Database changes:**

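Add `compensation_status`, plus the `fx_completed_at` and `fx_external_id` columns that the flow below writes:

```sql
ALTER TABLE transactions ADD COLUMN compensation_status TEXT CHECK(
  compensation_status IN ('none', 'pending', 'completed', 'failed')
) DEFAULT 'none';
ALTER TABLE transactions ADD COLUMN fx_completed_at TEXT;
ALTER TABLE transactions ADD COLUMN fx_external_id TEXT;
```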

**Flow:**

```typescript
const tx = await getOne("SELECT * FROM transactions WHERE id = ?", [txId]);

// 1. Attempt FX conversion
const fxResult = await fxProvider.convert({ from: "NOK", to: "RSD", amount: tx.amount });

if (!fxResult.success) {
  await updateTransactionStatus(txId, "failed", "FX conversion failed");
  return { success: false, status: "failed", error: fxResult.error };
}

// 2. Mark FX complete
await run(
  "UPDATE transactions SET fx_completed_at = datetime('now'), fx_external_id = ? WHERE id = ?",
  [fxResult.externalId, txId]
);

// 3. Attempt international transfer
const transferResult = await pispProvider.transferInternational({ ... });

if (!transferResult.success) {
  // Transfer failed: the FX leg must be compensated
  await updateTransactionStatus(txId, "partially_completed", "Transfer failed, initiating refund");
  await run("UPDATE transactions SET compensation_status = 'pending' WHERE id = ?", [txId]);

  // 4. Initiate refund (convert RSD back to NOK + credit user's bank account)
  const refundResult = await fxProvider.refund({
    originalConversionId: fxResult.externalId,
    recipientBankAccountId: tx.from_bank_account_id, // refund goes back to the payer's account
  });

  if (refundResult.success) {
    await updateTransactionStatus(txId, "failed", "Transfer failed, refund completed");
    await run("UPDATE transactions SET compensation_status = 'completed' WHERE id = ?", [txId]);
  } else {
    // Refund also failed: stay in partially_completed (non-terminal) so the transaction
    // keeps appearing in the stuck list (Section 8.1) until an admin resolves it
    await run("UPDATE transactions SET compensation_status = 'failed' WHERE id = ?", [txId]);
    await createAdminAlert({
      type: "compensation_failed",
      severity: "critical",
      resourceType: "transaction",
      resourceId: txId,
      title: "Refund failed after partial payment",
      description: `Transaction ${txId}: FX conversion succeeded (${fxResult.externalId}) but transfer and refund both failed. The user's ${tx.amount} ${tx.currency} is stuck in limbo. URGENT MANUAL INTERVENTION REQUIRED.`,
    });
  }
}
```

**SLA:** Refund must be initiated within 24 hours (PSD2 requirement); the funds may take up to 2 business days to clear (see 6.4)

### 6.4 Edge Cases

**Q: What if refund takes 48 hours?**
A: Status remains `partially_completed` until refund clears. User sees: "Processing refund — this may take up to 2 business days"

**Q: What if user's bank account is closed?**
A: Refund fails → admin alert → manual investigation → refund via alternative method (e.g., bank transfer to new account)

**Q: What if FX provider goes down during refund?**
A: Retry with exponential backoff (same logic as Section 4). After max retries → admin alert.

---

## 7. User Communication

### 7.1 Transaction Status Page

**Route:** `/transactions/[id]`

**Content:**

```tsx
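// src/app/transactions/[id]/page.tsx
"use client"; // useSWR is a hook, so this page must be a Client Component

import useSWR from "swr";
// StatusBadge, StatusMessage, Timeline, DetailRow, fetcher and retryTransaction
// are assumed to be existing app components/helpers.

// Keep polling while the transaction is in a non-terminal state
const POLLING_STATUSES = ["initiated", "processing", "timeout", "partially_completed"];

export default function TransactionDetailPage({ params }: { params: { id: string } }) {
  // A function refreshInterval (SWR 2.x) receives the latest data, so `tx` is
  // not referenced before it is declared; polling stops at completed/failed.
  const { data: tx } = useSWR(`/api/transactions/${params.id}`, fetcher, {
    refreshInterval: (latest) => (latest && POLLING_STATUSES.includes(latest.status) ? 2000 : 0),
  });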

  if (!tx) return <div>Loading...</div>;

  return (
    <div className="p-6">
      <StatusBadge status={tx.status} />
      <h1 className="text-2xl font-semibold mt-4">{tx.type === "remittance" ? "Money Transfer" : "QR Payment"}</h1>

      {/* Real-time status */}
      <div className="mt-6">
        {tx.status === "initiated" && <StatusMessage icon="⏳" message="Initiating payment..." />}
        {tx.status === "processing" && <StatusMessage icon="🔄" message="Your payment is being processed" />}
        {tx.status === "timeout" && <StatusMessage icon="⏰" message="Processing your payment — we'll notify you when complete" />}
        {tx.status === "completed" && <StatusMessage icon="✅" message="Payment completed" />}
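        {tx.status === "failed" && <StatusMessage icon="❌" message={`Payment failed: ${tx.failure_reason || "Unknown error"}`} />}
        {tx.status === "partially_completed" && <StatusMessage icon="↩️" message="Processing refund..." />}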
      </div>

      {/* Timeline */}
      <div className="mt-8">
        <h2 className="font-medium mb-4">Timeline</h2>
        <Timeline events={tx.timeline} />
      </div>

      {/* Details */}
      <div className="mt-8 grid grid-cols-2 gap-4">
        <DetailRow label="Amount" value={`${tx.amount} ${tx.currency}`} />
        <DetailRow label="Fee" value={`${tx.fee} ${tx.currency}`} />
        {tx.type === "remittance" && (
          <>
            <DetailRow label="Recipient" value={tx.recipient_name} />
            <DetailRow label="Exchange Rate" value={tx.exchange_rate} />
            <DetailRow label="Recipient Gets" value={`${tx.receive_amount} ${tx.receive_currency}`} />
            <DetailRow label="ETA" value={tx.eta || "1-2 business days"} />
          </>
        )}
        <DetailRow label="Transaction ID" value={tx.id} />
        <DetailRow label="Created" value={new Date(tx.created_at).toLocaleString("nb-NO")} />
      </div>

      {/* Actions */}
      {tx.status === "failed" && (
        <button className="mt-6 btn-primary" onClick={() => retryTransaction(tx.id)}>
          Try Again
        </button>
      )}
    </div>
  );
}
```

**Timeline data:**

API response includes `timeline` array:

```json
{
  "id": "tx_rem_123",
  "status": "completed",
  "timeline": [
    { "timestamp": "2026-02-17T10:00:00Z", "event": "created", "message": "Payment initiated" },
    { "timestamp": "2026-02-17T10:00:02Z", "event": "processing", "message": "Sent to bank" },
    { "timestamp": "2026-02-17T10:00:45Z", "event": "completed", "message": "Payment confirmed by bank" }
  ]
}
```

Fetched from `audit_log` table where `resource_id = tx.id` and `action LIKE 'TRANSACTION_%'`
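
A sketch of turning those rows into the `timeline` array, assuming `details` holds the JSON shown in Section 2.4. It matches `action` exactly rather than with `LIKE 'TRANSACTION_%'` (in SQL `LIKE`, `_` matches any single character) and uses the target status as the event name; `buildTimeline` and the messages for `timeout`, `failed` and `partially_completed` are hypothetical.

```typescript
interface AuditRow {
  action: string;
  created_at: string; // ISO 8601 timestamp
  details: string; // JSON text, e.g. {"from":"processing","to":"completed","reason":"..."}
}

interface TimelineEvent {
  timestamp: string;
  event: string;
  message: string;
}

// User-facing copy per status
const TIMELINE_MESSAGES: Record<string, string> = {
  initiated: "Payment initiated",
  processing: "Sent to bank",
  timeout: "Waiting for bank confirmation",
  completed: "Payment confirmed by bank",
  failed: "Payment failed",
  partially_completed: "Processing refund",
};

function buildTimeline(rows: AuditRow[]): TimelineEvent[] {
  return rows
    .filter((row) => row.action === "TRANSACTION_STATUS_CHANGE")
    .map((row) => {
      // Only the target status is exposed; reason, IP and user agent stay internal
      const { to } = JSON.parse(row.details) as { to: string };
      return { timestamp: row.created_at, event: to, message: TIMELINE_MESSAGES[to] ?? "Status updated" };
    })
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
}
```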

### 7.2 Push Notifications

**When to send:**

| Status Change | Title | Body |
|--------------|-------|------|
| `processing` → `completed` | "Payment Complete" | "Your 500 NOK payment to Mama Jasmina is complete" |
| `processing` → `failed` | "Payment Failed" | "Your 500 NOK payment failed. Tap to view details" |
| `timeout` → `completed` | "Payment Complete" | "Your payment has been confirmed by the bank" |
| `partially_completed` → `failed` | "Refund Processed" | "Your 500 NOK has been refunded to your account" |

**Implementation:**

```typescript
// lib/services/notifications.ts
export async function sendPushNotification(params: {
  userId: string;
  title: string;
  body: string;
  data: Record<string, string>;
}) {
  // Check user settings
  const settings = await getOne("SELECT push_enabled FROM settings WHERE user_id = ?", [params.userId]);
  if (!settings?.push_enabled) return;

  // Get user's push tokens (stored in separate table)
  const tokens = await query<{ token: string }>(
    "SELECT token FROM push_tokens WHERE user_id = ? AND active = 1",
    [params.userId]
  );

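  // Send via Firebase Cloud Messaging; FCM also delivers to iOS through APNs when configured.
  // Assumes `fcm` is a firebase-admin Messaging instance (e.g. getMessaging()).
  for (const { token } of tokens) {
    try {
      await fcm.send({
        token,
        notification: { title: params.title, body: params.body },
        data: params.data,
      });
    } catch (err) {
      // One stale token must not stop delivery to the user's other devices
      console.error("Push send failed", { userId: params.userId, err });
    }
  }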

  // Log notification
  await run(
    "INSERT INTO notifications (id, user_id, type, title, body) VALUES (?, ?, ?, ?, ?)",
    [randomId("ntf"), params.userId, "push", params.title, params.body]
  );
}
```

**Call from status update:**

```typescript
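async function updateTransactionStatus(
  txId: string,
  newStatus: string,
  reason?: string
) {
  const tx = await getOne("SELECT * FROM transactions WHERE id = ?", [txId]);
  if (!tx) throw new Error("Transaction not found");

  // Enforce the Section 2.3 state machine before touching the row
  const allowed = (VALID_TRANSITIONS as Record<string, readonly string[]>)[tx.status] ?? [];
  if (!allowed.includes(newStatus)) {
    throw new Error(`Invalid status transition for ${txId}: ${tx.status} -> ${newStatus}`);
  }

  // Compare-and-set on the old status, so a PISP callback and the reconciliation
  // worker cannot both move the same transaction. Assumes run() resolves to a
  // result with a `changes` count, as SQLite drivers typically return.
  const result = await run(
    `UPDATE transactions
        SET status = ?,
            failure_reason = CASE WHEN ? = 'failed' THEN ? ELSE failure_reason END,
            completed_at = CASE WHEN ? = 'completed' THEN datetime('now') ELSE completed_at END,
            updated_at = datetime('now')
      WHERE id = ? AND status = ?`,
    [newStatus, newStatus, reason ?? null, newStatus, txId, tx.status]
  );
  if (result.changes === 0) {
    throw new Error(`Status of ${txId} changed concurrently; re-read and retry`);
  }

  // Log audit (TRANSACTION_STATUS_CHANGE record, see Section 2.4)
  await logAudit({ ... });

  // Send push notification on terminal states
  if (newStatus === "completed" || newStatus === "failed") {
    await notifications.sendPushNotification({
      userId: tx.user_id,
      title: newStatus === "completed" ? "Payment Complete" : "Payment Failed",
      body: newStatus === "completed"
        ? `Your ${tx.amount} ${tx.currency} payment is complete`
        : `Your ${tx.amount} ${tx.currency} payment failed${reason ? `: ${reason}` : ""}`,
      data: { txId, status: newStatus },
    });
  }
}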
```

### 7.3 Email Notifications

**When to send:** Only for terminal states (`completed`, `failed`)

**Template:**

```html
<!-- email-templates/transaction-completed.html -->
<html>
<body style="font-family: Inter, sans-serif;">
  <div style="max-width: 600px; margin: 0 auto; padding: 20px;">
    <h1>Payment Complete</h1>
    <p>Your payment of <strong>{{amount}} {{currency}}</strong> to <strong>{{recipientName}}</strong> has been completed.</p>
    <p><strong>Transaction ID:</strong> {{txId}}</p>
    <p><strong>Date:</strong> {{completedAt}}</p>
    <a href="https://getdrop.no/transactions/{{txId}}" style="display: inline-block; padding: 12px 24px; background: #00E5A0; color: #000; text-decoration: none; border-radius: 8px; margin-top: 20px;">
      View Transaction
    </a>
  </div>
</body>
</html>
```
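
`{{recipientName}}` comes from user input, so it has to be HTML-escaped before substitution. If the existing email service does not already do this, a minimal renderer could look like the sketch below (`renderTemplate` and `escapeHtml` are hypothetical names; values are escaped for HTML text and attributes, not URL-encoded, which is fine for IDs like `tx_rem_123`):

```typescript
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(value: string): string {
  return value.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

// Replaces {{name}} placeholders with HTML-escaped values; unknown names become "".
function renderTemplate(template: string, data: Record<string, string | number>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match: string, key: string) =>
    key in data ? escapeHtml(String(data[key])) : ""
  );
}
```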

**Send via existing email service:**

```typescript
// e.g. in the status-update handler, once a transaction reaches "completed"
import { email } from "@/lib/services";

await email.send({
  to: user.email,
  subject: "Payment Complete",
  template: "transaction-completed",
  data: {
    amount: tx.amount,
    currency: tx.currency,
    recipientName: tx.recipient_name,
    txId: tx.id,
    completedAt: new Date(tx.completed_at).toLocaleString("nb-NO", { timeZone: "Europe/Oslo" }), // server TZ may differ
  },
});
```

### 7.4 Error Messages (User-Friendly)

**Current:** Generic errors like "PISP API error: 500"

**New:** Human-readable messages

| Error Code | User-Facing Message (Norwegian) | English |
|------------|--------------------------------|---------|
| `insufficient_balance` | "Ikke nok dekning på bankkontoen" | "Insufficient funds in your bank account" |
| `bank_declined` | "Banken din avslo betalingen. Kontakt banken for detaljer." | "Your bank declined the payment. Contact your bank for details." |
| `invalid_iban` | "Ugyldig kontonummer. Sjekk mottakerens kontoopplysninger." | "Invalid account number. Check recipient's account details." |
| `pisp_timeout` | "Betalingen tar lengre tid enn vanlig. Vi varsler deg når den er fullført." | "Payment is taking longer than usual. We'll notify you when complete." |
| `pisp_unavailable` | "Vår betalingsleverandør er midlertidig utilgjengelig. Prøv igjen om noen minutter." | "Our payment provider is temporarily unavailable. Try again in a few minutes." |
| `max_retries_exceeded` | "Betalingen feilet etter flere forsøk. Kontakt kundestøtte." | "Payment failed after multiple attempts. Contact support." |

**Implementation:**

```typescript
// lib/error-messages.ts
const ERROR_MESSAGES: Record<string, { no: string; en: string }> = {
  insufficient_balance: {
    no: "Ikke nok dekning på bankkontoen",
    en: "Insufficient funds in your bank account"
  },
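  // ... all errors above
  default: {
    no: "Noe gikk galt. Prøv igjen senere.",
    en: "Something went wrong. Please try again later."
  },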
};

export function getUserFacingError(errorCode: string, language: "no" | "en" = "no"): string {
  return ERROR_MESSAGES[errorCode]?.[language] || ERROR_MESSAGES.default[language];
}
```

---

## 8. Admin Tools

### 8.1 Stuck Transactions Endpoint

**Route:** `GET /api/admin/transactions/stuck`

**Access:** Requires admin role (check JWT: `user.role === 'admin'`)

**Query:**

```sql
SELECT
  t.id,
  t.user_id,
  t.type,
  t.status,
  t.amount,
  t.currency,
  t.created_at,
  t.updated_at,
  u.email AS user_email,
  u.first_name || ' ' || u.last_name AS user_name,
  (julianday('now') - julianday(t.created_at)) * 24 AS hours_stuck
FROM transactions t
JOIN users u ON t.user_id = u.id
WHERE t.status IN ('processing', 'timeout', 'partially_completed')
  AND t.created_at < datetime('now', '-10 minutes')
ORDER BY t.created_at ASC
LIMIT 100;
```

**Response:**

```json
{
  "data": [
    {
      "id": "tx_rem_456",
      "userId": "usr_abc",
      "userName": "Amir Hadžić",
      "userEmail": "amir@example.com",
      "type": "remittance",
      "status": "timeout",
      "amount": 500,
      "currency": "NOK",
      "createdAt": "2026-02-17T08:00:00Z",
      "hoursStuck": 2.5
    }
  ],
  "total": 1
}
```

### 8.2 Manual Retry Endpoint

**Route:** `POST /api/admin/transactions/[id]/retry`

**Access:** Admin only

**Action:**
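1. Validate transaction is in a retryable state (`timeout`, or `failed` with a transient error). For `timeout`, run the PISP status check (Section 5.2) first, because the bank may already have accepted the payment; `failed` is terminal (Section 2.2), so reopening it here is a deliberate, audited override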
2. Reset retry counter
3. Call PISP provider again (with retry logic from Section 4)
4. Log admin action in audit_log
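
"`failed` with a transient error" needs the failure cause stored on the row. A sketch assuming a hypothetical `failure_code` column filled with the error codes from Section 7.4; which of those codes count as transient is this sketch's judgement:

```typescript
interface RetryCandidate {
  status: string;
  failureCode: string | null; // hypothetical column holding a Section 7.4 error code
}

// Codes from the Section 7.4 table that describe transient provider problems.
const TRANSIENT_FAILURE_CODES: ReadonlySet<string> = new Set([
  "pisp_timeout",
  "pisp_unavailable",
  "max_retries_exceeded",
]);

function isAdminRetryable(tx: RetryCandidate): boolean {
  if (tx.status === "timeout") return true;
  return tx.status === "failed" && tx.failureCode !== null && TRANSIENT_FAILURE_CODES.has(tx.failureCode);
}
```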

**Implementation:**

```typescript
// src/app/api/admin/transactions/[id]/retry/route.ts
export async function POST(
  request: NextRequest,
  { params }: { params: { id: string } }
) {
  const { user, error } = await requireAuth(request);
  if (error) return error;

  if (user.role !== "admin") {
    return jsonError("forbidden", "Admin access required", 403);
  }

  const txId = params.id;
  const tx = await getOne("SELECT * FROM transactions WHERE id = ?", [txId]);

  if (!tx) {
    return jsonError("not_found", "Transaction not found", 404);
  }

  if (!["timeout", "failed"].includes(tx.status)) {
    return jsonError("invalid_state", "Transaction is not retryable", 400);
  }

  // Log admin action
  await logAudit({
    userId: user.id,
    action: "ADMIN_TRANSACTION_RETRY",
    resourceType: "transaction",
    resourceId: txId,
    details: { previousStatus: tx.status },
    ipAddress: getClientIp(request),
    requestId: getRequestId(request.headers),
  });

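  // Reset transaction to initiated. `failed` is terminal, so this deliberately
  // bypasses VALID_TRANSITIONS; the ADMIN_TRANSACTION_RETRY entry above records it.
  await run("UPDATE transactions SET status = 'initiated', retry_count = 0 WHERE id = ?", [txId]);

  // Re-call PISP with retry logic (for `timeout`, only after the status check in 5.2
  // has confirmed the bank never received the payment, or it may be sent twice)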
  const result = tx.type === "remittance"
    ? await payments.initiateRemittance({ ... })
    : await payments.initiateQrPayment({ ... });

  if (result.success) {
    return NextResponse.json({ message: "Retry initiated", status: result.status });
  } else {
    return jsonError("retry_failed", result.error || "Retry failed", 500);
  }
}
```

### 8.3 Manual Resolution Endpoint

**Route:** `POST /api/admin/transactions/[id]/resolve`

**Body** (`action` is one of `mark_completed`, `mark_failed`, `initiate_refund`; `externalReference` is optional):

```json
{
  "action": "mark_completed",
  "reason": "Admin manually verified with bank",
  "externalReference": "bank_ref_12345"
}
```

**Actions:**

| Action | Effect |
|--------|--------|
| `mark_completed` | Set status to `completed`, add admin note to audit_log |
| `mark_failed` | Set status to `failed`, add reason, notify user |
| `initiate_refund` | Trigger refund flow (for partially_completed), set compensation_status to `pending` |
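
Since the route reads `action`, `reason` and `externalReference` straight from the JSON body, a small validator keeps bad input out of the `switch`. A sketch with hypothetical names; requiring a non-empty `reason` is an assumption based on it being written to the audit log and, for `mark_failed`, shown to the user:

```typescript
type ResolveAction = "mark_completed" | "mark_failed" | "initiate_refund";

interface ResolveRequest {
  action: ResolveAction;
  reason: string;
  externalReference?: string;
}

const RESOLVE_ACTIONS: readonly string[] = ["mark_completed", "mark_failed", "initiate_refund"];

// Returns the typed request, or an error message for a 400 response.
function parseResolveBody(body: unknown): ResolveRequest | { error: string } {
  if (typeof body !== "object" || body === null) return { error: "Body must be a JSON object" };
  const { action, reason, externalReference } = body as Record<string, unknown>;
  if (typeof action !== "string" || !RESOLVE_ACTIONS.includes(action)) {
    return { error: "Invalid action" };
  }
  if (typeof reason !== "string" || reason.trim() === "") {
    return { error: "reason is required" };
  }
  let ref: string | undefined;
  if (externalReference !== undefined) {
    if (typeof externalReference !== "string") return { error: "externalReference must be a string" };
    ref = externalReference;
  }
  return { action: action as ResolveAction, reason, externalReference: ref };
}
```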

**Implementation:**

```typescript
export async function POST(request: NextRequest, { params }: { params: { id: string } }) {
  const { user, error } = await requireAuth(request);
  if (error) return error;
  if (user.role !== "admin") return jsonError("forbidden", "Admin access required", 403);

  const body = await request.json();
  const { action, reason, externalReference } = body;
  if (typeof reason !== "string" || reason.trim() === "") {
    return jsonError("invalid_body", "reason is required", 400);
  }

  const txId = params.id;
  const tx = await getOne("SELECT * FROM transactions WHERE id = ?", [txId]);
  if (!tx) return jsonError("not_found", "Transaction not found", 404);

  // Manual resolution only applies to stuck, non-terminal transactions
  if (!["processing", "timeout", "partially_completed"].includes(tx.status)) {
    return jsonError("invalid_state", "Transaction is not awaiting resolution", 400);
  }

  switch (action) {
    case "mark_completed":
      await run("UPDATE transactions SET status = 'completed', completed_at = datetime('now') WHERE id = ?", [txId]);
      await logAudit({ userId: user.id, action: "ADMIN_MARK_COMPLETED", resourceType: "transaction", resourceId: txId, details: { reason, externalReference } });
      await notifications.sendPushNotification({ userId: tx.user_id, title: "Payment Complete", body: "Your payment has been confirmed", data: { txId, status: "completed" } });
      return NextResponse.json({ message: "Transaction marked as completed" });

    case "mark_failed":
      await run("UPDATE transactions SET status = 'failed', failure_reason = ? WHERE id = ?", [reason, txId]);
      await logAudit({ userId: user.id, action: "ADMIN_MARK_FAILED", resourceType: "transaction", resourceId: txId, details: { reason } });
      await notifications.sendPushNotification({ userId: tx.user_id, title: "Payment Failed", body: reason, data: { txId, status: "failed" } });
      return NextResponse.json({ message: "Transaction marked as failed" });

    case "initiate_refund":
      // Refunds apply only after a partial success (Section 6)
      if (tx.status !== "partially_completed") {
        return jsonError("invalid_state", "Refund requires a partially_completed transaction", 400);
      }
      // TODO: Call refund provider
      await run("UPDATE transactions SET compensation_status = 'pending' WHERE id = ?", [txId]);
      await logAudit({ userId: user.id, action: "ADMIN_INITIATE_REFUND", resourceType: "transaction", resourceId: txId, details: { reason } });
      return NextResponse.json({ message: "Refund initiated" });

    default:
      return jsonError("invalid_action", "Invalid action", 400);
  }
}
```

### 8.4 Admin Dashboard

**Route:** `/admin/transactions`

**Features:**

1. **Overview Cards:**
   - Stuck transactions (count)
   - Failed last 24h (count)
   - Average resolution time

2. **Stuck Transactions Table:**
   - Columns: TX ID, User, Amount, Status, Hours Stuck, Actions
   - Actions: "Retry", "Resolve", "View Audit Log"

3. **Filters:**
   - Status (processing, timeout, partially_completed)
   - Stuck > X hours
   - User search (email, ID)

**Wireframe mockup:**

```
┌────────────────────────────────────────────────────────────────────────────┐
│  Admin: Stuck Transactions                                                 │
├────────────────────────────────────────────────────────────────────────────┤
│  [ Stuck: 3 ]  [ Failed 24h: 12 ]  [ Avg: 1.2h ]                           │
├────────────────────────────────────────────────────────────────────────────┤
│  Filters: [Status: All ▼] [Stuck > 1h ▼]                                   │
├────────────────────────────────────────────────────────────────────────────┤
│ TX ID      │ User        │ Amount  │ Status     │ Hours │ Actions          │
│ tx_rem_456 │ amir@ex.com │ 500 NOK │ timeout    │ 2.5   │ [Retry][Resolve] │
│ tx_qr_789  │ sara@ex.com │ 129 NOK │ processing │ 0.8   │ [Retry][Resolve] │
└────────────────────────────────────────────────────────────────────────────┘
```
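The "Hours Stuck" column and the three filters reduce to a pure function over the rows the stuck-transactions endpoint returns. A minimal sketch, assuming each row carries the user's email and ID and an ISO-8601 `createdAt` with a timezone (e.g. `2026-02-17T10:00:00Z`); `StuckRow`, `StuckFilters` and `filterStuckRows` are hypothetical names.

```typescript
type StuckStatus = "processing" | "timeout" | "partially_completed";

interface StuckRow {
  id: string;
  userId: string;
  userEmail: string;
  amount: number;
  currency: string;
  status: StuckStatus;
  createdAt: string; // ISO-8601 with timezone
}

interface StuckFilters {
  statuses?: StuckStatus[]; // undefined = "Status: All"
  minHoursStuck: number;    // "Stuck > X hours"
  userQuery?: string;       // case-insensitive match on user email or user ID
}

// Hours since creation, rounded to one decimal place as in the mockup.
function hoursStuck(createdAt: string, now: Date): number {
  const ms = now.getTime() - new Date(createdAt).getTime();
  return Math.round((ms / 3_600_000) * 10) / 10;
}

function filterStuckRows(rows: StuckRow[], filters: StuckFilters, now: Date = new Date()) {
  const query = filters.userQuery?.toLowerCase();
  return rows
    .map((row) => ({ ...row, hoursStuck: hoursStuck(row.createdAt, now) }))
    .filter((row) => !filters.statuses || filters.statuses.includes(row.status))
    .filter((row) => row.hoursStuck > filters.minHoursStuck)
    .filter((row) => !query || row.userEmail.toLowerCase().includes(query) || row.userId.toLowerCase().includes(query))
    .sort((a, b) => b.hoursStuck - a.hoursStuck); // longest-stuck first
}
```

Sorting longest-stuck first is a design choice: it puts the rows closest to the 24h admin-alert threshold at the top of the table.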

---

## 9. Database Schema Changes

### 9.1 New Columns on `transactions` Table

```sql
-- Retry tracking
ALTER TABLE transactions ADD COLUMN retry_count INTEGER DEFAULT 0;
ALTER TABLE transactions ADD COLUMN last_retry_at TEXT;

-- External references
ALTER TABLE transactions ADD COLUMN external_id TEXT; -- PISP provider's transaction ID
ALTER TABLE transactions ADD COLUMN external_status TEXT; -- Raw status from provider

-- Failure details
ALTER TABLE transactions ADD COLUMN failure_reason TEXT;
ALTER TABLE transactions ADD COLUMN failure_code TEXT; -- Machine-readable error code

-- Compensation (for partial failures)
ALTER TABLE transactions ADD COLUMN compensation_status TEXT CHECK(
  compensation_status IN ('none', 'pending', 'completed', 'failed')
) DEFAULT 'none';
ALTER TABLE transactions ADD COLUMN compensation_completed_at TEXT;

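-- Timeline. SQLite's ADD COLUMN rejects non-constant defaults such as
-- (datetime('now')), so add the column bare and backfill existing rows
-- (the 9.2 table rebuild can declare DEFAULT (datetime('now')) for new rows)
ALTER TABLE transactions ADD COLUMN updated_at TEXT;
UPDATE transactions SET updated_at = datetime('now') WHERE updated_at IS NULL;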

-- FX tracking (future)
ALTER TABLE transactions ADD COLUMN fx_completed_at TEXT;
ALTER TABLE transactions ADD COLUMN fx_external_id TEXT;
```

### 9.2 New State: `timeout`

Update CHECK constraint:

```sql
-- Before:
status TEXT DEFAULT 'processing' CHECK(status IN ('processing','completed','failed'))

-- After:
status TEXT DEFAULT 'initiated' CHECK(status IN ('initiated','processing','timeout','completed','failed','partially_completed'))
```

**Migration (SQLite):**

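SQLite's `ALTER TABLE` cannot change an existing CHECK constraint (it supports only `RENAME TO`, `RENAME COLUMN`, `ADD COLUMN` and `DROP COLUMN`), so the table has to be rebuilt: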

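```sql
-- FK enforcement must be switched off outside the transaction (the PRAGMA is a
-- no-op inside one); otherwise DROP TABLE can fail or cascade via references
-- such as retry_history.transaction_id
PRAGMA foreign_keys = OFF;
BEGIN TRANSACTION;

-- Create new table with updated constraint
CREATE TABLE transactions_new (
  id TEXT PRIMARY KEY,
  user_id TEXT NOT NULL REFERENCES users(id),
  type TEXT NOT NULL CHECK(type IN ('remittance','qr_payment')),
  status TEXT DEFAULT 'initiated' CHECK(status IN ('initiated','processing','timeout','completed','failed','partially_completed')),
  -- ... all other columns
);

-- Copy data. SELECT * only works if transactions_new declares every column in
-- exactly the same order as transactions; an explicit column list is safer.
INSERT INTO transactions_new SELECT * FROM transactions;

-- Drop old, rename new
DROP TABLE transactions;
ALTER TABLE transactions_new RENAME TO transactions;

-- Recreate indexes
CREATE UNIQUE INDEX idx_tx_idempotency ON transactions(idempotency_key) WHERE idempotency_key IS NOT NULL;
CREATE INDEX idx_transactions_user ON transactions(user_id);
CREATE INDEX idx_transactions_merchant ON transactions(merchant_id);

-- Must return no rows before committing
PRAGMA foreign_key_check;
COMMIT;
PRAGMA foreign_keys = ON;
```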

### 9.3 New Table: `admin_alerts`

```sql
CREATE TABLE admin_alerts (
  id TEXT PRIMARY KEY,
  alert_type TEXT NOT NULL, -- 'transaction_stuck', 'pisp_failure', 'compensation_failed', etc.
  severity TEXT NOT NULL CHECK(severity IN ('low','medium','high','critical')),
  resource_type TEXT, -- 'transaction', 'user', 'merchant', etc.
  resource_id TEXT,
  title TEXT NOT NULL,
  description TEXT,
  status TEXT DEFAULT 'open' CHECK(status IN ('open','investigating','resolved','dismissed')),
  created_at TEXT DEFAULT (datetime('now')),
  resolved_at TEXT,
  resolved_by TEXT, -- user_id of admin who resolved
  resolution_notes TEXT
);

CREATE INDEX idx_admin_alerts_status ON admin_alerts(status);
CREATE INDEX idx_admin_alerts_type ON admin_alerts(alert_type);
CREATE INDEX idx_admin_alerts_created ON admin_alerts(created_at);
```
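Because the stuck-transaction sweep runs every 10 minutes, alert creation needs to avoid re-opening the same alert on every run. A minimal sketch of building `admin_alerts` rows with that de-duplication; the severity per alert type, the titles, and recording max-retry exhaustion as `'pisp_failure'` are assumptions, not decisions made by this spec, and `buildAlert` / `shouldPostToSlack` are hypothetical names.

```typescript
type Severity = "low" | "medium" | "high" | "critical";
type AlertType = "transaction_stuck" | "pisp_failure" | "compensation_failed";

interface AdminAlertRow {
  id: string;
  alert_type: AlertType;
  severity: Severity;
  resource_type: "transaction";
  resource_id: string;
  title: string;
  description: string;
  status: "open";
}

// ASSUMED severities; adjust once ops agrees on them.
const SEVERITY: Record<AlertType, Severity> = {
  transaction_stuck: "medium",     // stuck > 24h
  pisp_failure: "high",            // max retries exhausted
  compensation_failed: "critical",
};

const TITLES: Record<AlertType, string> = {
  transaction_stuck: "Transaction stuck for more than 24 hours",
  pisp_failure: "PISP retries exhausted",
  compensation_failed: "Compensation (refund) failed",
};

// Returns a new row, or null if an unresolved alert of the same type already
// exists for this transaction (the periodic sweep would otherwise re-raise it).
function buildAlert(
  type: AlertType,
  txId: string,
  description: string,
  unresolved: Pick<AdminAlertRow, "alert_type" | "resource_id">[], // rows with status 'open' or 'investigating'
  newId: () => string,
): AdminAlertRow | null {
  if (unresolved.some((a) => a.alert_type === type && a.resource_id === txId)) return null;
  return {
    id: newId(),
    alert_type: type,
    severity: SEVERITY[type],
    resource_type: "transaction",
    resource_id: txId,
    title: TITLES[type],
    description,
    status: "open",
  };
}

// Follows the 16.3 recommendation: Slack for high/critical, dashboard for everything.
const shouldPostToSlack = (severity: Severity): boolean => severity === "high" || severity === "critical";
```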

### 9.4 New Table: `retry_history`

Optional: use this if we want detailed retry logs kept separate from `audit_log`:

```sql
CREATE TABLE retry_history (
  id TEXT PRIMARY KEY,
  transaction_id TEXT NOT NULL REFERENCES transactions(id),
  attempt INTEGER NOT NULL, -- 1, 2, 3
  started_at TEXT DEFAULT (datetime('now')),
  completed_at TEXT,
  success INTEGER DEFAULT 0, -- 0 = failed, 1 = succeeded
  error_code TEXT,
  error_message TEXT,
  pisp_response TEXT -- Full JSON response from PISP provider
);

CREATE INDEX idx_retry_history_tx ON retry_history(transaction_id);
```

**Alternative:** Use `audit_log` table (already exists, sufficient for MVP)

---

## 10. File Structure & Implementation Checklist

### 10.1 Files to Create or Modify

```
src/
├── app/
│   ├── api/
│   │   ├── transactions/
│   │   │   ├── [id]/
│   │   │   │   ├── route.ts          # GET transaction by ID (add timeline)
│   │   │   │   └── retry/route.ts     # NEW: POST retry transaction (user-facing, for failed txs)
│   │   ├── admin/
│   │   │   ├── transactions/
│   │   │   │   ├── stuck/route.ts     # NEW: GET stuck transactions
│   │   │   │   └── [id]/
│   │   │   │       ├── retry/route.ts  # NEW: POST admin retry
│   │   │   │       └── resolve/route.ts # NEW: POST admin manual resolution
│   │   │   └── alerts/
│   │   │       ├── route.ts            # NEW: GET admin alerts (list)
│   │   │       └── [id]/route.ts       # NEW: PATCH resolve alert
│   ├── transactions/
│   │   └── [id]/
│   │       └── page.tsx                # NEW: Transaction detail page
│   └── admin/
│       ├── transactions/
│       │   └── page.tsx                # NEW: Admin stuck transactions dashboard
│       └── alerts/
│           └── page.tsx                # NEW: Admin alerts dashboard
├── lib/
│   ├── services/
│   │   ├── payments.ts                 # MODIFY: Add retry logic + timeout handling
│   │   ├── reconciliation.ts           # NEW: Background status checks
│   │   └── notifications.ts            # MODIFY: Add transaction notifications
│   ├── db-migrations/
│   │   └── 004-transaction-recovery.sql # NEW: Schema changes
│   ├── retry.ts                        # NEW: Retry logic (exponential backoff)
│   ├── state-machine.ts                # NEW: Transaction state transitions
│   ├── error-messages.ts               # NEW: User-friendly error messages
│   └── admin-alerts.ts                 # NEW: Admin alert creation/management
└── workers/
    └── reconciliation-worker.ts        # NEW: Background job to check stuck txs
```

### 10.2 Implementation Phases

#### Phase 1: State Machine & Audit (Week 1)

- [ ] Update `transactions` table schema (new columns + `timeout` state)
- [ ] Implement state transition validation in `lib/state-machine.ts`
- [ ] Add transition audit logging (every status change → audit_log)
- [ ] Update API routes to use state machine validation
- [ ] **Deliverable:** Status changes are validated + audited

#### Phase 2: Retry Logic (Week 2)

- [ ] Implement exponential backoff in `lib/retry.ts`
- [ ] Add error classification (transient vs permanent)
- [ ] Update `payments.ts` to use retry wrapper
- [ ] Add retry counter tracking in DB
- [ ] **Deliverable:** Transient errors auto-retry up to 3 times
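A minimal sketch of `lib/retry.ts`, named after the `callPispWithRetry` function used in the Section 11.1 unit tests. It assumes the 12.3 parameters (base delay 2s, growing ×4 per retry, with jitter) and uses the transient codes from Appendix B; the ±20% jitter width and the `RetryOptions` object are assumptions. The sketch reads "max 3" as three attempts in total, which is what the unit test and the retry-exhaustion scenario (503 three times → `failed`) imply; under that reading only the 2s and 8s delays are ever used, so raise `maxAttempts` to 4 if "3 retries" is meant literally.

```typescript
interface PispResult {
  success: boolean;
  error?: string; // machine-readable code from Appendix B, e.g. "invalid_iban"
}

interface RetryOptions {
  maxAttempts?: number;                  // total attempts, including the first call
  baseDelayMs?: number;                  // delay before the first retry
  factor?: number;                       // growth per retry: 2s → 8s → 32s
  sleep?: (ms: number) => Promise<void>; // injectable so tests don't really wait
  random?: () => number;                 // injectable for deterministic jitter
}

// Transient codes from Appendix B; any other error code is treated as permanent.
const TRANSIENT_CODES = new Set(["pisp_timeout", "pisp_unavailable", "network_error", "pisp_5xx"]);

// Nominal delay base * factor^(n-1), scaled by a random jitter factor in [0.8, 1.2).
function backoffDelayMs(retryNumber: number, base: number, factor: number, random: () => number): number {
  const nominal = base * factor ** (retryNumber - 1);
  return Math.round(nominal * (0.8 + 0.4 * random()));
}

async function callPispWithRetry(
  fn: () => Promise<PispResult>,
  opts: RetryOptions = {},
): Promise<PispResult> {
  const {
    maxAttempts = 3,
    baseDelayMs = 2000,
    factor = 4,
    sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
    random = Math.random,
  } = opts;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await fn();
      if (result.success) return result;
      const transient = result.error !== undefined && TRANSIENT_CODES.has(result.error);
      if (!transient) return result; // permanent: fail immediately, no retry
    } catch {
      // Thrown errors (network failure, AbortController timeout) count as transient.
    }
    if (attempt < maxAttempts) await sleep(backoffDelayMs(attempt, baseDelayMs, factor, random));
  }
  return { success: false, error: "max_retries_exceeded" };
}
```

Injecting `sleep` keeps the unit tests fast without fake timers; production code simply omits it.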

#### Phase 3: Timeout Recovery (Week 2-3)

- [ ] Change timeout behavior: return `timeout` status instead of failure
- [ ] Create `reconciliation-worker.ts` background job
- [ ] Implement PISP status polling (every 10 min for stuck txs)
- [ ] Add timeout → completed/failed transitions
- [ ] **Deliverable:** Timeouts self-resolve via background reconciliation
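One reconciliation sweep can be sketched as a loop over stuck rows that asks the provider for a final status and applies it. This assumes the worker receives candidate rows plus three injected functions (`fetchPispStatus`, `updateStatus`, `notify`), all hypothetical stand-ins for the PISP client, the DB write and the notification service, and that provider statuses have already been normalised to `completed`, `failed` or `pending`.

```typescript
type TxStatus = "initiated" | "processing" | "timeout" | "completed" | "failed" | "partially_completed";
type ProviderStatus = "completed" | "failed" | "pending";

interface StuckTx {
  id: string;
  status: TxStatus;
  externalId: string | null; // PISP provider's transaction ID
  createdAt: Date;
}

interface SweepDeps {
  fetchPispStatus: (externalId: string) => Promise<ProviderStatus>;
  updateStatus: (txId: string, from: TxStatus, to: TxStatus) => Promise<void>;
  notify: (txId: string, to: TxStatus) => Promise<void>;
}

const STUCK_AFTER_MS = 10 * 60 * 1000; // the 10-minute stuck threshold

// Returns the IDs of transactions resolved in this sweep.
async function reconcileOnce(txs: StuckTx[], deps: SweepDeps, now: Date = new Date()): Promise<string[]> {
  const resolved: string[] = [];
  for (const tx of txs) {
    if (tx.status !== "processing" && tx.status !== "timeout") continue;
    if (now.getTime() - tx.createdAt.getTime() < STUCK_AFTER_MS) continue;
    if (!tx.externalId) continue; // no provider reference: leave for admin review
    try {
      const providerStatus = await deps.fetchPispStatus(tx.externalId);
      if (providerStatus === "pending") continue; // check again on the next sweep
      await deps.updateStatus(tx.id, tx.status, providerStatus);
      await deps.notify(tx.id, providerStatus);
      resolved.push(tx.id);
    } catch {
      // One failing status check must not abort the rest of the sweep.
    }
  }
  return resolved;
}
```

In a real worker, `updateStatus` would also run the state-machine check and write the `audit_log` entry, so every automatic resolution is validated and audited the same way as an API-driven one.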

#### Phase 4: User Communication (Week 3)

- [ ] Create transaction detail page (`/transactions/[id]`)
- [ ] Add real-time status polling (SWR with 2s refresh)
- [ ] Implement push notifications for status changes
- [ ] Add email notifications for terminal states
- [ ] Implement user-friendly error messages
- [ ] **Deliverable:** Users always know transaction status

#### Phase 5: Admin Tools (Week 4)

- [ ] Create `admin_alerts` table
- [ ] Implement stuck transaction detection (every 10 min sweep)
- [ ] Build admin dashboard (`/admin/transactions`)
- [ ] Add manual retry endpoint (`POST /api/admin/transactions/[id]/retry`)
- [ ] Add manual resolution endpoint (`POST /api/admin/transactions/[id]/resolve`)
- [ ] **Deliverable:** Admins can intervene on stuck transactions

#### Phase 6: Partial Failure Handling (Future — After FX Provider Integration)

- [ ] Wire up `compensation_status` (the column itself is added by the Phase 1 schema migration)
- [ ] Implement refund flow for partial failures
- [ ] Add FX provider status checks
- [ ] Test compensation scenarios
- [ ] **Deliverable:** Partial failures trigger automatic refunds

---

## 11. Testing Strategy

### 11.1 Unit Tests

**Retry logic:**
```typescript
describe("Retry with exponential backoff", () => {
  test("succeeds on first attempt", async () => {
    const result = await callPispWithRetry(() => Promise.resolve({ success: true }));
    expect(result.success).toBe(true);
  });

  test("retries on transient error", async () => {
    // Backoff sleeps (2s, then 8s) exceed the default 5s test timeout; stub them
    // with fake timers or an injected delay function.
    let attempts = 0;
    const result = await callPispWithRetry(async () => {
      attempts++;
      if (attempts < 3) throw new Error("Network timeout");
      return { success: true };
    });
    expect(result.success).toBe(true);
    expect(attempts).toBe(3);
  });

  test("stops on permanent error", async () => {
    let attempts = 0;
    const result = await callPispWithRetry(async () => {
      attempts++;
      return { success: false, error: "invalid_iban" }; // permanent
    });
    expect(result.success).toBe(false);
    expect(attempts).toBe(1);
  });
});
```

**State machine:**
```typescript
describe("Transaction state machine", () => {
  test("allows initiated → processing", () => {
    expect(canTransition("initiated", "processing")).toBe(true);
  });

  test("blocks processing → initiated", () => {
    expect(canTransition("processing", "initiated")).toBe(false);
  });

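  test("blocks completed and failed → anything", () => {
    const all = ["initiated", "processing", "timeout", "completed", "failed", "partially_completed"] as const;
    for (const to of all) {
      expect(canTransition("completed", to)).toBe(false);
      expect(canTransition("failed", to)).toBe(false);
    }
  });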
});
```
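A minimal sketch of `lib/state-machine.ts` that satisfies these tests: a whitelist of allowed transitions read off the Appendix A diagram. The diagram does not show how a transaction enters `partially_completed`, so that edge is deliberately absent; `assertTransition` is a hypothetical helper for write paths.

```typescript
type TxStatus = "initiated" | "processing" | "timeout" | "completed" | "failed" | "partially_completed";

// Allowed transitions, taken from the Appendix A state diagram.
const ALLOWED: Record<TxStatus, readonly TxStatus[]> = {
  initiated: ["processing", "failed"],
  processing: ["completed", "timeout", "failed"],
  timeout: ["completed", "failed", "processing"], // processing = retry
  partially_completed: ["completed", "failed"],   // refund outcome (future)
  completed: [], // terminal
  failed: [],    // terminal
};

function canTransition(from: TxStatus, to: TxStatus): boolean {
  return ALLOWED[from].includes(to);
}

// Throwing variant for write paths, so invalid updates never reach the DB.
function assertTransition(from: TxStatus, to: TxStatus): void {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid transaction status transition: ${from} → ${to}`);
  }
}
```

Keeping the table as data rather than nested `if`s makes the whitelist easy to review against the diagram and easy to extend when `partially_completed` gets its entry edge.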

### 11.2 Integration Tests

**Scenario: Timeout recovery**
1. Mock PISP to timeout on first call
2. Initiate transaction → verify status = `timeout`
3. Run reconciliation worker
4. Mock PISP to return `completed`
5. Verify transaction status = `completed`
6. Verify push notification sent

**Scenario: Retry exhaustion**
1. Mock PISP to return 503 three times
2. Initiate transaction
3. Verify transaction status = `failed`
4. Verify admin alert created
5. Verify user notified

### 11.3 End-to-End Tests

**User journey:**
1. User initiates remittance (500 NOK → RSD)
2. PISP times out after 30s
3. User sees "Processing — we'll notify you"
4. On the next reconciliation sweep (every 10 min, for transactions older than 10 min): background worker checks status
5. PISP returns `completed`
6. User receives push notification
7. User opens transaction detail page → sees "Completed"
8. User receives email confirmation

**Admin journey:**
1. Transaction stuck in `timeout` for 2 hours
2. Admin opens `/admin/transactions` dashboard
3. Sees transaction in "Stuck" list
4. Clicks "Retry" → transaction re-attempted
5. PISP succeeds → status = `completed`
6. Admin marks alert as "Resolved"

---

## 12. Acceptance Criteria

### 12.1 State Machine

- [ ] All status transitions validated against whitelist
- [ ] Invalid transitions blocked at DB + app level
- [ ] Every status change logged in `audit_log` with timestamp + reason
- [ ] Terminal states (`completed`, `failed`) cannot transition

### 12.2 Idempotency

- [x] Duplicate requests with same `idempotencyKey` return cached response (already implemented)
- [x] Idempotency keys scoped to user (prevents IDOR) (already implemented)
- [x] Response includes identical payload + status 200 (already implemented)

### 12.3 Retry Logic

- [ ] Transient errors (5xx, timeout) trigger automatic retry
- [ ] Exponential backoff: 2s → 8s → 32s (with jitter)
- [ ] Max 3 retry attempts
- [ ] Permanent errors (4xx, bank decline) fail immediately (no retry)
- [ ] After max retries: mark as `failed` + create admin alert

### 12.4 Timeout Recovery

- [ ] Timeout returns `timeout` status (not `failed`)
- [ ] Background worker checks PISP status every 10 min
- [ ] Stuck transactions (> 10 min) swept periodically
- [ ] Timeout → completed/failed based on PISP response
- [ ] User notified when status resolves

### 12.5 Partial Failure

- [ ] `compensation_status` field added (for future FX refunds)
- [ ] Refund flow triggers on transfer failure (when FX provider added)
- [ ] Compensation failures escalate to admin alert
- [ ] User sees "Processing refund" status

### 12.6 User Communication

- [ ] Transaction detail page (`/transactions/[id]`) shows:
  - Real-time status
  - Timeline of events
  - User-friendly error messages
- [ ] Push notifications sent on status change
- [ ] Email sent on terminal status (`completed`, `failed`)
- [ ] Error messages in Norwegian (primary) + English

### 12.7 Admin Tools

- [ ] `/api/admin/transactions/stuck` returns all stuck transactions
- [ ] `/api/admin/transactions/[id]/retry` manually retries transaction
- [ ] `/api/admin/transactions/[id]/resolve` manually marks completed/failed
- [ ] Admin dashboard shows stuck transactions with action buttons
- [ ] Admin alerts created for:
  - Max retries exhausted
  - Compensation failure
  - Transaction stuck > 24 hours

---

## 13. Monitoring & Alerting

### 13.1 Metrics to Track

| Metric | Baseline (expected) | Alert If |
|--------|---------------------|----------|
| Stuck transactions (count) | 5 | > 10 |
| Average resolution time (hours) | 1 | > 4 |
| Failed transactions (last 24h) | 50 | > 100 |
| PISP timeout rate (%) | 5% | > 15% |
| Retry success rate (%) | 80% | < 60% |
| Compensation failures (count) | 0 | > 0 |

### 13.2 Dashboard Queries

**Stuck transactions:**
```sql
SELECT COUNT(*) FROM transactions
WHERE status IN ('processing', 'timeout')
  AND created_at < datetime('now', '-10 minutes');
```

**Average resolution time:**
```sql
SELECT AVG(julianday(completed_at) - julianday(created_at)) * 24 AS hours
FROM transactions
WHERE status = 'completed'
  AND completed_at > datetime('now', '-24 hours');
```

**PISP timeout rate:**
```sql
SELECT
  SUM(CASE WHEN failure_code = 'pisp_timeout' THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS timeout_pct
FROM transactions
WHERE created_at > datetime('now', '-24 hours');
```

### 13.3 Log Events

**Every transaction state change:**
```json
{
  "level": "info",
  "msg": "Transaction status changed",
  "txId": "tx_rem_123",
  "userId": "usr_abc",
  "from": "processing",
  "to": "completed",
  "reason": "PISP callback received",
  "externalId": "ext_456",
  "timestamp": "2026-02-17T10:00:45Z"
}
```

**PISP API call failures:**
```json
{
  "level": "error",
  "msg": "PISP API call failed",
  "txId": "tx_rem_123",
  "attempt": 2,
  "errorCode": "pisp_timeout",
  "errorMessage": "Request timeout after 30s",
  "willRetry": true,
  "nextRetryIn": "8000ms",
  "timestamp": "2026-02-17T10:00:30Z"
}
```

**Retry exhaustion:**
```json
{
  "level": "error",
  "msg": "All retries exhausted",
  "txId": "tx_rem_123",
  "maxRetries": 3,
  "lastError": "PISP provider unavailable",
  "adminAlertCreated": "alert_xyz",
  "timestamp": "2026-02-17T10:01:10Z"
}
```

---

## 14. Security Considerations

### 14.1 Admin Endpoints

**Access control:**
- All admin endpoints require `user.role === 'admin'` (checked via JWT)
- Audit every admin action (`ADMIN_TRANSACTION_RETRY`, `ADMIN_MARK_COMPLETED`, etc.)
- Log IP address + user agent for all admin operations

**Rate limiting:**
- Admin endpoints: 60 requests/min (higher than user endpoints)
- Admin dashboard: no rate limit (internal tool)

### 14.2 Idempotency Key Security

**No vulnerability:** Idempotency keys scoped to user → can't replay another user's transaction

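**Best practice:** The client generates the key once per payment attempt, e.g. `${userId}_${timestamp}_${random}` with a cryptographically random component (prevents guessing), and reuses that same key for every network retry of the attempt; generating a fresh key per retry would bypass the duplicate check.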
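The key only protects against double-charges if a retried request carries the same key as the original. A minimal client-side sketch of that pattern, assuming `send` wraps the POST to `/api/transactions/remittance` and resolves to `{ ok, retryable }` (a hypothetical shape); `submitPayment`, `PaymentPayload` and `recipientId` are illustrative names, and backoff between attempts is omitted for brevity. In practice `newKey` could return `${userId}_${Date.now()}_${crypto.randomUUID()}`.

```typescript
interface PaymentPayload {
  amount: number;
  currency: string;
  recipientId: string; // hypothetical field
}

interface SendResult {
  ok: boolean;
  retryable: boolean;
}

type Send = (body: PaymentPayload & { idempotencyKey: string }) => Promise<SendResult>;

async function submitPayment(
  payload: PaymentPayload,
  send: Send,
  newKey: () => string,
  maxAttempts = 3,
): Promise<SendResult & { idempotencyKey: string }> {
  // Generate the key ONCE per user action. A fresh key per retry would make the
  // server treat each retry as a new payment instead of returning the cached one.
  const body = { ...payload, idempotencyKey: newKey() };
  let result: SendResult = { ok: false, retryable: true };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      result = await send(body);
    } catch {
      // Lost response (timeout, dropped connection): resending the same key is safe.
      result = { ok: false, retryable: true };
    }
    if (result.ok || !result.retryable) break;
  }
  return { ...result, idempotencyKey: body.idempotencyKey };
}
```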

### 14.3 Transaction Status Leaks

**Risk:** User A checks `/api/transactions/tx_rem_123` → sees User B's transaction

**Mitigation (already implemented):**
```typescript
const tx = await getOne(
  "SELECT * FROM transactions WHERE id = ? AND user_id = ?",
  [txId, user.id] // ← Scoped to logged-in user
);
```

**Admin endpoints:** Bypass user_id check (admin sees all transactions)

---

## 15. Cost Analysis

### 15.1 Infrastructure

| Component | Cost | Notes |
|-----------|------|-------|
| Background worker | $0 | Same server process (cron or `setInterval`) |
| Job queue (pg-boss) | $0 | Uses existing PostgreSQL (when migrated from SQLite) |
| Job queue (BullMQ) | ~$20/mo | Redis hosting (if chosen over pg-boss) |
| Push notifications (FCM) | Free | No per-message charge (Firebase Cloud Messaging) |
| Email (SendGrid) | $15/mo | 50k emails/month (transactional tier) |

**Total:** $15-35/mo (depending on job queue choice)

### 15.2 PISP API Costs

**Retry costs:**
- 3 retries per failed transaction
- If 5% of transactions fail → 5% * 3 = 15% extra PISP API calls
- Assuming 10,000 txs/month, 5% fail = 500 failed → 1,500 retry calls
- Cost: depends on PISP provider (typically $0.01-0.05 per API call)
- **Estimated:** $15-75/mo in extra API fees

**Reconciliation costs:**
- Background worker checks status every 10 min for stuck txs
- If 1% stuck (100 txs/month) and, in the worst case, each is polled for a full 24h before escalation → 100 * 6 status checks/hour * 24h = 14,400 API calls
- **Estimated:** $144-720/mo (at $0.01-0.05 per call)

**Optimization:** Only check status for transactions stuck > 10 min (reduces unnecessary calls)
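The arithmetic above as a small function, so the estimate can be re-run when the assumptions change. All inputs are this section's own assumptions; prices are kept in integer cents to avoid floating-point drift, and `estimateExtraPispCalls` is a hypothetical name.

```typescript
interface CostInputs {
  txPerMonth: number;
  failureRatePct: number;
  retriesPerFailure: number;
  stuckRatePct: number;
  checksPerHour: number;          // polling every 10 min → 6
  hoursPolled: number;            // worst case: until the 24h admin-alert threshold
  centsPerCall: [number, number]; // price range per PISP call, in cents
}

function estimateExtraPispCalls(i: CostInputs) {
  const failed = (i.txPerMonth * i.failureRatePct) / 100;
  const retryCalls = failed * i.retriesPerFailure;
  const stuck = (i.txPerMonth * i.stuckRatePct) / 100;
  const reconciliationCalls = stuck * i.checksPerHour * i.hoursPolled;
  // Convert a call count into a [low, high] monthly cost in dollars.
  const usd = (calls: number) => i.centsPerCall.map((cents) => (calls * cents) / 100) as [number, number];
  return {
    retryCalls,
    retryCostUsd: usd(retryCalls),
    reconciliationCalls,
    reconciliationCostUsd: usd(reconciliationCalls),
  };
}

// The section's own assumptions.
const estimate = estimateExtraPispCalls({
  txPerMonth: 10_000,
  failureRatePct: 5,
  retriesPerFailure: 3,
  stuckRatePct: 1,
  checksPerHour: 6,
  hoursPolled: 24,
  centsPerCall: [1, 5],
});
// estimate.retryCalls → 1500, estimate.retryCostUsd → [15, 75]
// estimate.reconciliationCalls → 14400, estimate.reconciliationCostUsd → [144, 720]
```

Because `reconciliationCalls` scales linearly with `hoursPolled`, capping polling per transaction is the main lever: with `hoursPolled: 2`, the same inputs give 1,200 calls ($12-60/mo).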

---

## 16. Open Questions (For Alem)

### 16.1 Retry Strategy

**Q1:** Should we do in-process retry (Option A) or background job queue (Option B)?

**Recommendation:** Start with Option A (simpler, no extra infra). Migrate to Option B when transaction volume > 10k/month.

### 16.2 Notification Channels

**Q2:** Email + push notifications both? Or only one?

**Recommendation:** Both. Email is fallback (if user disabled push). Send email only for terminal states.

### 16.3 Admin Alert Delivery

**Q3:** How should admin alerts be delivered?
- Option A: Dashboard only (admin must check `/admin/alerts`)
- Option B: Email to ops team
- Option C: Slack webhook
- Option D: SMS for critical alerts

**Recommendation:** Option C (Slack) for high/critical alerts. Dashboard for all.

### 16.4 Stuck Transaction Threshold

**Q4:** When should we mark a transaction as "stuck"?
- Current spec: 10 minutes
- Alternative: 1 hour (less aggressive)

**Recommendation:** 10 min for reconciliation sweep, 24h for admin alert (gives time to self-resolve)

### 16.5 Partial Failure Compensation SLA

**Q5:** What's acceptable refund time for partial failures?
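- PSD2's firmest refund deadline (Art. 73, unauthorised transactions) is immediately, and at the latest by the end of the following business day; whether it applies to partial failures in a PISP flow needs compliance confirmation
- Faster = better UX

**Recommendation:** Initiate refund immediately, complete within 24h (which stays inside that deadline)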

---

## 17. Next Steps

1. **Review this spec** with Alem
2. **Approve/reject each section** (or request changes)
3. **Prioritize phases** (which to implement first?)
4. **Assign to builder agent** (one phase at a time)
5. **Validation after each phase** (validator agent checks implementation)

**Estimated timeline:** 4 weeks for Phases 1-5, Phase 6 (partial failure) deferred until FX provider integrated

---

## Appendix A: State Diagram (ASCII)

```
┌─────────────┐
│  initiated  │──────────────┐
└──────┬──────┘              │
       │                     │ (immediate fail: validation error)
       ▼                     ▼
┌─────────────┐        ┌─────────────┐
│ processing  │        │   failed    │ (terminal)
└──────┬──────┘        └─────────────┘
       │
       ├──────────────────┬──────────────────┐
       │                  │                  │
       │ (success)        │ (timeout)        │ (permanent error)
       ▼                  ▼                  ▼
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  completed  │    │   timeout   │    │   failed    │
│ (terminal)  │    └──────┬──────┘    └─────────────┘
└─────────────┘           │
                          │ (reconciliation)
                          ├───────────────┬───────────────┐
                          │               │               │ (retry)
                          ▼               ▼               ▼
                   ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
                   │  completed  │ │   failed    │ │ processing  │
                   └─────────────┘ └─────────────┘ └─────────────┘

Future:
┌─────────────────────────────┐
│    partially_completed      │
└──────────┬──────────────────┘
           │ (refund)
           ├───────────────┐
           ▼               ▼
    ┌─────────────┐ ┌─────────────┐
    │  completed  │ │   failed    │
    └─────────────┘ └─────────────┘
```

---

## Appendix B: Error Code Reference

| Code | Type | Retry? | User Message (NO) | User Message (EN) |
|------|------|--------|-------------------|-------------------|
| `insufficient_balance` | Permanent | No | "Ikke nok dekning på bankkontoen" | "Insufficient funds" |
| `bank_declined` | Permanent | No | "Banken din avslo betalingen" | "Your bank declined the payment" |
| `invalid_iban` | Permanent | No | "Ugyldig kontonummer" | "Invalid account number" |
| `kyc_required` | Permanent | No | "Identitetsverifisering kreves" | "Identity verification required" |
| `pisp_timeout` | Transient | Yes | "Betalingen tar lengre tid enn vanlig" | "Payment taking longer than usual" |
| `pisp_unavailable` | Transient | Yes | "Betalingsleverandør midlertidig utilgjengelig" | "Payment provider temporarily unavailable" |
| `network_error` | Transient | Yes | "Nettverksfeil — prøver igjen automatisk" | "Network error — retrying automatically" |
| `pisp_5xx` | Transient | Yes | "Betalingsleverandør har tekniske problemer" | "Payment provider experiencing technical issues" |
| `max_retries_exceeded` | Permanent | No | "Betalingen feilet etter flere forsøk" | "Payment failed after multiple attempts" |
| `validation_error` | Permanent | No | "Ugyldig forespørsel" | "Invalid request" |
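
A minimal sketch of `lib/error-messages.ts` built from this table (subset shown; the remaining rows follow the same pattern). Norwegian is the default locale because it is the primary language; the fallback message for codes missing from the table is an assumed wording, not part of this appendix, and `getUserError` is a hypothetical name.

```typescript
type Locale = "no" | "en";

interface ErrorInfo {
  retryable: boolean;
  no: string;
  en: string;
}

// Subset of Appendix B. A Map avoids accidental hits on Object.prototype keys.
const ERRORS = new Map<string, ErrorInfo>([
  ["insufficient_balance", { retryable: false, no: "Ikke nok dekning på bankkontoen", en: "Insufficient funds" }],
  ["bank_declined", { retryable: false, no: "Banken din avslo betalingen", en: "Your bank declined the payment" }],
  ["invalid_iban", { retryable: false, no: "Ugyldig kontonummer", en: "Invalid account number" }],
  ["pisp_timeout", { retryable: true, no: "Betalingen tar lengre tid enn vanlig", en: "Payment taking longer than usual" }],
  ["pisp_unavailable", { retryable: true, no: "Betalingsleverandør midlertidig utilgjengelig", en: "Payment provider temporarily unavailable" }],
  ["max_retries_exceeded", { retryable: false, no: "Betalingen feilet etter flere forsøk", en: "Payment failed after multiple attempts" }],
]);

// ASSUMED fallback wording for unknown codes (not part of Appendix B).
const FALLBACK: ErrorInfo = {
  retryable: false,
  no: "Noe gikk galt med betalingen",
  en: "Something went wrong with the payment",
};

function getUserError(code: string, locale: Locale = "no"): { message: string; retryable: boolean } {
  const info = ERRORS.get(code) ?? FALLBACK;
  return { message: info[locale], retryable: info.retryable };
}
```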

---

**End of Specification**