# QODY Architecture

**Author:** Petter Graff (CodeCraft / ALAI Architecture) | **Date:** 2026-06-22

## System Context

Three independently deployable micro-frontends (MFEs) talk to one Ktor API. The API owns Postgres, emits domain events to an internal bus, fans real-time updates out over WebSocket/SSE, reads feature flags from Unleash, and integrates with a payment provider, whose signed webhooks it receives.

### Component Diagram

```mermaid
graph TB
  subgraph Clients
    G["Guest MFE<br/>(QR menu, cart, pay)<br/>public, no-login"]
    S["Staff/Kitchen MFE<br/>(KDS, order board)<br/>JWT staff"]
    A["Admin MFE<br/>(venue dashboard,<br/>menu editor, plans)<br/>JWT admin"]
  end

  subgraph Edge
    CDN["CDN / static host<br/>per-MFE bundles"]
    GW["Reverse proxy / API gateway<br/>(TLS, CORS, rate-limit,<br/>public /guest carve-out)"]
  end

  subgraph Backend["Ktor API (Kotlin)"]
    R["Route groups:<br/>/guest /staff /admin /webhooks /health"]
    SVC["Domain services<br/>(Order, Menu, Session,<br/>Payment, Tenant)"]
    EVT["Event bus<br/>(in-proc -> Postgres outbox<br/>-> upgradeable to Kafka)"]
    RT["Real-time hub<br/>(WebSocket + SSE fallback)"]
    FF["Unleash client<br/>(per-venue/per-plan flags)"]
  end

  DB[("PostgreSQL 16<br/>RLS tenant isolation<br/>Flyway migrations")]
  PAY["Payment provider(s)<br/>Stripe / market-specific"]
  UNL["Unleash server"]
  OBS["Sentry + structured logs<br/>+ /health"]

  G --> CDN
  S --> CDN
  A --> CDN
  G --> GW
  S --> GW
  A --> GW
  GW --> R
  R --> SVC
  SVC --> DB
  SVC --> EVT
  EVT --> RT
  EVT --> DB
  RT -. "live order/table updates" .-> S
  RT -. "table status" .-> G
  SVC --> FF
  FF --> UNL
  SVC --> PAY
  PAY -- "webhook (signed)" --> R
  SVC --> OBS

```

### Why These Boundaries

- **One API, three MFEs.** The MFE split is about deploy cadence and blast radius, not about microservices. Guest menu changes ship hourly; the admin dashboard ships weekly. A bug in the menu editor must never take down table ordering.
- **Event bus starts in-process with a Postgres transactional outbox.** Order state transitions write the state change AND the outbox row in the same DB transaction (no lost events, no dual-write inconsistency). A dispatcher drains the outbox to the real-time hub. When a venue chain needs cross-service scale, the outbox drains to Kafka instead.
- **Real-time hub = WebSocket with SSE fallback.** Kitchen display systems (KDS) sit on venue Wi-Fi that is hostile (NAT, captive portals, flaky AP roaming). Design for failure: heartbeat + auto-reconnect + on-reconnect state resync.
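
The transactional-outbox bullet above can be illustrated with a small in-memory sketch. It is not the QODY implementation: `InMemoryStore`, `OutboxRow`, and `OutboxDispatcher` are hypothetical names, and a JVM lock stands in for the Postgres transaction that makes the state change and the outbox row atomic.

```kotlin
import java.util.UUID

// Hypothetical outbox row; a real one would be a table written in the same DB transaction.
data class OutboxRow(val id: Long, val orderId: UUID, val newStatus: String, var dispatched: Boolean = false)

// Stand-in for the database: holding the lock plays the role of one DB transaction.
class InMemoryStore {
    private val lock = Any()
    private val orderStatus = mutableMapOf<UUID, String>()
    private val outbox = mutableListOf<OutboxRow>()
    private var nextId = 1L

    // The state change and the outbox row are written together or not at all.
    fun transition(orderId: UUID, newStatus: String) {
        synchronized(lock) {
            orderStatus[orderId] = newStatus
            outbox.add(OutboxRow(nextId++, orderId, newStatus))
        }
    }

    fun statusOf(orderId: UUID): String? = synchronized(lock) { orderStatus[orderId] }

    fun pending(): List<OutboxRow> = synchronized(lock) { outbox.filter { !it.dispatched } }

    fun markDispatched(id: Long) {
        synchronized(lock) { outbox.first { it.id == id }.dispatched = true }
    }
}

// Drains undispatched rows in insertion order to a sink (the real-time hub today, Kafka later).
class OutboxDispatcher(private val store: InMemoryStore, private val sink: (OutboxRow) -> Unit) {
    fun drainOnce(): Int {
        val rows = store.pending()
        for (row in rows) {
            sink(row)                     // deliver first...
            store.markDispatched(row.id)  // ...then mark, so a crash in between re-delivers
        }
        return rows.size
    }
}
```

Delivering before marking gives at-least-once delivery, so consumers in this sketch would need to tolerate duplicates. Swapping the sink is the only change needed to point the same drain loop at a different transport.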

## Multi-Tenancy Model

**Tenant = Venue.** An Organization may own multiple Venues (chains); the RLS scope key is `venue_id`, with an optional `org_id` parent for chain-level admin.

Per ALAI database rules DB-05/DB-06: every tenant-scoped table carries `venue_id UUID NOT NULL` and RLS is **ENABLED + FORCED**.

```sql
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
ALTER TABLE orders FORCE  ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON orders
  USING (venue_id = current_setting('app.current_venue_id', true)::uuid);

CREATE POLICY tenant_insert ON orders
  AS RESTRICTIVE FOR INSERT
  WITH CHECK (venue_id = current_setting('app.current_venue_id', true)::uuid);

```

The Ktor layer runs `SET app.current_venue_id = '<uuid>'` at connection checkout (HikariCP) inside the request/transaction scope, and **resets it on release**. Stale tenant context left on a pooled connection is a silent cross-venue data breach.
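
A minimal sketch of the set-then-always-reset discipline follows. `PooledConnection` and `withVenueContext` are hypothetical names, not part of Ktor or HikariCP; a real version would wrap `java.sql.Connection`. The sketch uses PostgreSQL's `set_config(name, value, is_local)` function because, unlike `SET`, it accepts the value as a bind parameter.

```kotlin
import java.util.UUID

// Hypothetical minimal seam over a pooled connection.
interface PooledConnection {
    fun execute(sql: String, vararg params: String)
}

// Runs `block` with the tenant context set, and always clears the context before
// the connection goes back to the pool, even when `block` throws.
fun <T> withVenueContext(conn: PooledConnection, venueId: UUID, block: (PooledConnection) -> T): T {
    // is_local = false: the setting lasts for the session, so it must be reset explicitly.
    conn.execute("SELECT set_config('app.current_venue_id', ?, false)", venueId.toString())
    try {
        return block(conn)
    } finally {
        conn.execute("RESET app.current_venue_id")
    }
}
```

Putting the reset in `finally` is the point of the sketch: an exception in the request handler must not leave venue A's context on a connection that venue B's request will pick up next.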

### Bilko RLS Lesson — Hard Requirement (Tool-Verified 2026-06-19)

The most expensive Bilko bug was NOT a missing policy. It was that the application DB role had the `BYPASSRLS` attribute, which **silently overrides FORCE ROW LEVEL SECURITY** — RLS looked configured but isolated nothing. Mandatory for QODY:

1. The app connects as a dedicated role (e.g. `qody_app`) that **MUST NOT** have `BYPASSRLS` and **MUST NOT** be the table owner.
2. Migrations/owner DDL run as a separate privileged role used only by Flyway, never by the running app.
3. CI startup-validation query (fail-closed) on every boot:

    ```sql
    SELECT rolname, rolbypassrls FROM pg_roles WHERE rolname = 'qody_app';
    -- must return rolbypassrls = false, or the app refuses to start
    ```
4. RLS isolation E2E test (Proveo): create two venues, set context to venue A, assert venue B's orders are invisible AND uninsertable.
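
The fail-closed decision in requirements 1 and 3 could look roughly like this on the application side. `AppRoleInfo` is a hypothetical projection of the `pg_roles` lookup plus an ownership check, and `assertRlsEnforceable` is an illustrative name; how the values are queried is left out.

```kotlin
// Hypothetical result of looking up the app role; null means the role row was not found.
data class AppRoleInfo(val roleName: String, val bypassRls: Boolean, val ownsTenantTables: Boolean)

class RlsMisconfiguredException(message: String) : IllegalStateException(message)

// Fail closed: a missing role row, BYPASSRLS, or table ownership each abort startup.
fun assertRlsEnforceable(role: AppRoleInfo?) {
    if (role == null) throw RlsMisconfiguredException("app role not found in pg_roles")
    if (role.bypassRls) throw RlsMisconfiguredException("${role.roleName} has BYPASSRLS")
    if (role.ownsTenantTables) throw RlsMisconfiguredException("${role.roleName} owns tenant tables")
}
```

Treating an absent row as a failure matters: a check that only rejects `rolbypassrls = true` would pass silently if the role name were misspelled.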

### Guest Path Special-Casing

The guest MFE is anonymous (no JWT), but the guest must still be scoped to one venue and table. Scoping comes from the signed QR token, not from a login. The API resolves the QR token to `venue_id`/`table_id` server-side, sets RLS context from that, and the guest can only ever touch their own table's open session. Guest endpoints are explicitly carved out of auth at the gateway (a tight `/guest/*` allowlist).
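
One way a signed QR token could be verified and resolved server-side is sketched below. The token layout (base64url token id, a dot, base64url HMAC-SHA256 signature) and the names `QrTokenCodec`, `TableScope`, and `lookup` are assumptions for illustration; the source does not specify the token format.

```kotlin
import java.security.MessageDigest
import java.util.Base64
import java.util.UUID
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

data class TableScope(val venueId: UUID, val tableId: UUID)

// Assumed layout: base64url(qrTokenId) + "." + base64url(HMAC-SHA256(qrTokenId)).
class QrTokenCodec(secret: ByteArray) {
    private val key = SecretKeySpec(secret, "HmacSHA256")
    private val encoder = Base64.getUrlEncoder().withoutPadding()
    private val decoder = Base64.getUrlDecoder()

    private fun sign(payload: ByteArray): ByteArray {
        val mac = Mac.getInstance("HmacSHA256")
        mac.init(key)
        return mac.doFinal(payload)
    }

    fun issue(qrTokenId: UUID): String {
        val payload = qrTokenId.toString().toByteArray(Charsets.UTF_8)
        return encoder.encodeToString(payload) + "." + encoder.encodeToString(sign(payload))
    }

    // Returns a scope only if the signature verifies AND the token id is still known.
    // `lookup` stands in for the server-side table lookup by qr_token_id.
    fun resolve(token: String, lookup: (UUID) -> TableScope?): TableScope? {
        val parts = token.split(".")
        if (parts.size != 2) return null
        return try {
            val payload = decoder.decode(parts[0])
            val signature = decoder.decode(parts[1])
            // Constant-time comparison, so timing does not reveal how many bytes matched.
            if (!MessageDigest.isEqual(sign(payload), signature)) null
            else lookup(UUID.fromString(String(payload, Charsets.UTF_8)))
        } catch (e: IllegalArgumentException) {
            null // malformed base64 or UUID: treat as an invalid token
        }
    }
}
```

Because the venue and table come from the server-side lookup rather than from the token body, regenerating a table's QR code invalidates the old token simply by removing its id from the lookup.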

## Core Domain Model

UUID PKs, `NUMERIC(19,4)` money, `TIMESTAMPTZ`, `deleted_at` soft delete, `version` optimistic lock on mutable entities, `venue_id` + RLS on all tenant tables.

```mermaid
erDiagram
  ORGANIZATION ||--o{ VENUE : owns
  VENUE ||--o{ TABLE : has
  VENUE ||--o{ MENU : publishes
  VENUE ||--o{ STAFF : employs
  MENU ||--o{ CATEGORY : contains
  CATEGORY ||--o{ MENU_ITEM : lists
  MENU_ITEM ||--o{ MODIFIER_GROUP : has
  MODIFIER_GROUP ||--o{ MODIFIER : offers
  TABLE ||--o{ TABLE_SESSION : hosts
  TABLE_SESSION ||--o{ ORDER : groups
  ORDER ||--o{ ORDER_LINE : contains
  ORDER_LINE ||--o{ ORDER_LINE_MODIFIER : applies
  ORDER ||--o{ PAYMENT : settled_by
  STAFF }o--|| ROLE : assigned

```

### Key Entities

<table id="bkmrk-entity-purpose-key-f"><thead><tr><th>Entity</th><th>Purpose</th><th>Key Fields</th></tr></thead><tbody><tr><td><code>organization</code></td><td>Chain owner (optional parent)</td><td>id, name, plan_tier</td></tr><tr><td><code>venue</code></td><td>The tenant boundary</td><td>id, org_id, name, slug, branding(jsonb), timezone, currency, plan_tier</td></tr><tr><td><code>restaurant_table</code></td><td>Physical table</td><td>id, venue_id, label, qr_token_id, capacity</td></tr><tr><td><code>menu</code></td><td>Versioned menu for a venue</td><td>id, venue_id, name, is_active, valid_from/until</td></tr><tr><td><code>menu_item</code></td><td>Sellable item</td><td>id, category_id, venue_id, name, description, price NUMERIC(19,4), tax_rate, allergens(jsonb)</td></tr><tr><td><code>table_session</code></td><td>One sitting at a table</td><td>id, venue_id, table_id, status, opened_at, closed_at</td></tr><tr><td><code>order</code></td><td>A submission within a session</td><td>id, venue_id, table_session_id, status, subtotal, tax_total, tip_amount, total, version</td></tr><tr><td><code>order_line</code></td><td>Line in an order</td><td>id, order_id, venue_id, menu_item_id, qty, unit_price, line_total, note, status</td></tr><tr><td><code>payment</code></td><td>Settlement attempt/record</td><td>id, venue_id, order_id, provider, provider_ref, amount, currency, status, idempotency_key</td></tr></tbody></table>

**Money/price snapshotting.** `order_line.unit_price` and `order_line_modifier.price_delta_snapshot` are *copied at order time*. The menu price can change tomorrow; what the guest agreed to pay is frozen on the line.
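
A small sketch of the snapshot rule, under stated assumptions: `MenuItem`, `OrderLine`, and `snapshotLine` are illustrative names, scale 4 mirrors `NUMERIC(19,4)`, and `HALF_UP` is an assumed rounding mode that the source does not specify.

```kotlin
import java.math.BigDecimal
import java.math.RoundingMode

// Mutable catalogue entry: its price may change at any time.
class MenuItem(val name: String, var price: BigDecimal)

// Immutable line: unitPrice is a copy taken when the order is placed.
data class OrderLine(val itemName: String, val qty: Int, val unitPrice: BigDecimal) {
    // Computed once from the frozen unit price, never from the live catalogue.
    val lineTotal: BigDecimal = unitPrice.multiply(BigDecimal(qty)).setScale(4, RoundingMode.HALF_UP)
}

fun snapshotLine(item: MenuItem, qty: Int): OrderLine {
    require(qty > 0) { "qty must be positive" }
    return OrderLine(item.name, qty, item.price.setScale(4, RoundingMode.HALF_UP))
}
```

After `snapshotLine(burger, 2)` is called, changing `burger.price` leaves the line's `unitPrice` and `lineTotal` untouched, which is the behaviour the paragraph above requires.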

**Branding** lives in `venue.branding` (jsonb: logo, colours, accent) so white-labeling is a data concern, not a build concern.

## Order Lifecycle

States are explicit and enforced server-side by a state machine. Illegal transitions are rejected, not silently ignored. Every transition writes a row to the transactional outbox, which is then drained to the real-time hub.

```mermaid
stateDiagram-v2
  [*] --> SESSION_OPEN: QR scan resolves token -> open/attach TableSession
  SESSION_OPEN --> CART: guest adds items (client-side draft, server-validated)
  CART --> SUBMITTED: guest submits order (server validates availability + price + flags)
  SUBMITTED --> ACCEPTED: staff/kitchen accepts (or auto-accept flag)
  ACCEPTED --> IN_PREP: kitchen starts
  IN_PREP --> READY: kitchen marks ready
  READY --> SERVED: waiter serves
  SERVED --> PAID: payment captured (pay-now or pay-at-end)
  PAID --> CLOSED: session settled, table freed
  SUBMITTED --> CANCELLED: staff/guest cancels pre-accept
  ACCEPTED --> CANCELLED: staff cancels (with reason)
  CLOSED --> [*]

```
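
The diagram above translates directly into a transition table. The sketch below encodes exactly the edges drawn and nothing else; `OrderStatus`, `OrderStateMachine`, and `IllegalTransitionException` are illustrative names rather than the project's actual types.

```kotlin
enum class OrderStatus { SESSION_OPEN, CART, SUBMITTED, ACCEPTED, IN_PREP, READY, SERVED, PAID, CLOSED, CANCELLED }

class IllegalTransitionException(from: OrderStatus, to: OrderStatus) :
    IllegalStateException("illegal transition $from -> $to")

object OrderStateMachine {
    // One entry per source state; states without an entry (CLOSED, CANCELLED) are terminal.
    private val allowed: Map<OrderStatus, Set<OrderStatus>> = mapOf(
        OrderStatus.SESSION_OPEN to setOf(OrderStatus.CART),
        OrderStatus.CART to setOf(OrderStatus.SUBMITTED),
        OrderStatus.SUBMITTED to setOf(OrderStatus.ACCEPTED, OrderStatus.CANCELLED),
        OrderStatus.ACCEPTED to setOf(OrderStatus.IN_PREP, OrderStatus.CANCELLED),
        OrderStatus.IN_PREP to setOf(OrderStatus.READY),
        OrderStatus.READY to setOf(OrderStatus.SERVED),
        OrderStatus.SERVED to setOf(OrderStatus.PAID),
        OrderStatus.PAID to setOf(OrderStatus.CLOSED)
    )

    fun canTransition(from: OrderStatus, to: OrderStatus): Boolean = to in allowed[from].orEmpty()

    // Rejects illegal moves loudly instead of ignoring them.
    fun transition(from: OrderStatus, to: OrderStatus): OrderStatus {
        if (!canTransition(from, to)) throw IllegalTransitionException(from, to)
        return to
    }
}
```

Keeping the edges in one map makes the rule set reviewable at a glance. Any flow that needs a different ordering, such as payment before kitchen acceptance, would have to add its edge here explicitly.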

### Real-Time Propagation

- `SUBMITTED` event → appears instantly on Kitchen MFE order board (the demo "wow" moment)
- `IN_PREP`/`READY` → guest sees their order status on the table; waiter sees "ready for pickup"
- `SERVED`/`PAID`/`CLOSED` → table status updates on the Staff MFE floor view, flipping to free on `CLOSED`

### Payment Timing

Payment timing is a venue setting (flag-gated):

- **Pay-per-order** (fast casual / bar): each order is paid immediately, so `SUBMITTED` → `PAID` may happen before the kitchen accepts
- **Pay-at-end** (table service): orders accumulate on the `table_session`; one settlement at the end

**Idempotency.** Payment captures and webhook handlers use `payment.idempotency_key`. A retried Stripe webhook must never double-charge or double-advance state.
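
The idempotency rule can be sketched with an in-memory handler. The names `PaymentEvent`, `WebhookOutcome`, and `PaymentWebhookHandler` are hypothetical; in a database-backed version the "seen keys" set would more plausibly be a uniqueness constraint on `payment.idempotency_key`, which is an assumption here.

```kotlin
import java.math.BigDecimal

data class PaymentEvent(val idempotencyKey: String, val orderId: String, val amount: BigDecimal)

enum class WebhookOutcome { APPLIED, DUPLICATE_IGNORED }

// In-memory stand-in for the payment table.
class PaymentWebhookHandler {
    private val seenKeys = mutableSetOf<String>()
    private val captured = mutableMapOf<String, BigDecimal>()

    @Synchronized
    fun handle(event: PaymentEvent): WebhookOutcome {
        // add() returns false when the key is already present: a retry, so change nothing.
        if (!seenKeys.add(event.idempotencyKey)) return WebhookOutcome.DUPLICATE_IGNORED
        captured[event.orderId] = capturedFor(event.orderId).add(event.amount)
        return WebhookOutcome.APPLIED
    }

    @Synchronized
    fun capturedFor(orderId: String): BigDecimal = captured[orderId] ?: BigDecimal.ZERO
}
```

A retried delivery with the same key returns `DUPLICATE_IGNORED` and leaves the captured amount unchanged, so the handler can safely acknowledge the retry to the provider.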

**Reconnect resync.** On KDS reconnect the client calls `GET /staff/orders?status=open` and rebuilds its board from authoritative state.
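
On the client side, the resync could be modelled as "replace, do not merge". The sketch below is written in Kotlin for consistency with the backend examples; `KdsBoard` and `OrderView` are illustrative names, and carrying the order's `version` on each event is an assumption.

```kotlin
data class OrderView(val id: String, val status: String, val version: Long)

// Board state held by a KDS client. Live events are applied incrementally;
// a resync discards local state and adopts the server's snapshot.
class KdsBoard {
    private val orders = linkedMapOf<String, OrderView>()

    // Drop stale or duplicated events by comparing versions.
    fun applyEvent(event: OrderView) {
        val current = orders[event.id]
        if (current == null || event.version > current.version) {
            orders[event.id] = event
        }
    }

    // Called after reconnect with the body of GET /staff/orders?status=open.
    fun resync(snapshot: List<OrderView>) {
        orders.clear()
        for (order in snapshot) {
            orders[order.id] = order
        }
    }

    fun snapshot(): List<OrderView> = orders.values.toList()
}
```

Clearing before loading means orders that closed while the client was offline vanish from the board, which incremental event replay alone would not guarantee.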

## API Surface (Ktor Route Groups)

```
/health                      GET    liveness/readiness (MUST), RLS-role self-check

# ---- GUEST (public, scoped by signed QR token, no JWT) ----
/guest/resolve               POST   { qrToken } -> { venueId, tableId, sessionId, branding }
/guest/menu                  GET    active menu for resolved venue
/guest/session/{id}          GET    current session + my orders + live status
/guest/cart/validate         POST   server-side price/availability/flag re-check
/guest/order                 POST   submit order (idempotency key) -> SUBMITTED
/guest/payment/intent        POST   create payment intent
/guest/payment/confirm       POST   confirm/capture
/guest/stream                GET    SSE: my order/table status updates

# ---- STAFF / KITCHEN (JWT staff, role-gated) ----
/staff/auth/login            POST   email+password -> JWT
/staff/orders                GET    open orders board
/staff/orders/{id}/accept    POST   SUBMITTED -> ACCEPTED
/staff/orders/{id}/prep      POST   ACCEPTED -> IN_PREP
/staff/orders/{id}/ready     POST   IN_PREP -> READY
/staff/orders/{id}/serve     POST   READY -> SERVED
/staff/sessions/{id}/close   POST   settle + free table -> CLOSED
/staff/stream                WS     live order events (KDS)

# ---- ADMIN / VENUE DASHBOARD (JWT admin/owner) ----
/admin/venues                CRUD   venue + branding
/admin/tables                CRUD   tables + QR token (re)generation
/admin/menus                 CRUD   menu/category/item/modifier
/admin/staff                 CRUD   staff + roles
/admin/reports               GET    sales/orders summaries

# ---- WEBHOOKS (signature-verified) ----
/webhooks/payment/{provider} POST   signed payment events

```

## Feature-Flag Map (Unleash)

Same pattern as Bilko feature-enable (MC #102481): the **plan tier** drives a set of Unleash flags; flags are evaluated with a venue context so a flag can also be force-toggled for a single venue (pilot, demo, A/B).

<table id="bkmrk-capability-flag-key-"><thead><tr><th>Capability</th><th>Flag key</th><th>Basic</th><th>Pro</th><th>Enterprise</th></tr></thead><tbody><tr><td>QR menu + order + pay (core)</td><td>always-on</td><td>✓</td><td>✓</td><td>✓</td></tr><tr><td>Kitchen display (KDS real-time)</td><td><code>kds.realtime</code></td><td>✓</td><td>✓</td><td>✓</td></tr><tr><td>Multi-language menu</td><td><code>menu.multilang</code></td><td>–</td><td>✓</td><td>✓</td></tr><tr><td>Tipping at checkout</td><td><code>pay.tipping</code></td><td>–</td><td>✓</td><td>✓</td></tr><tr><td>Split bill</td><td><code>pay.splitbill</code></td><td>–</td><td>✓</td><td>✓</td></tr><tr><td>Pay-at-end (table tab)</td><td><code>pay.payatend</code></td><td>–</td><td>✓</td><td>✓</td></tr><tr><td>AI upsell / recommendations</td><td><code>ai.upsell</code></td><td>–</td><td>–</td><td>✓</td></tr><tr><td>White-label theming</td><td><code>brand.whitelabel</code></td><td>–</td><td>✓</td><td>✓</td></tr><tr><td>Chain dashboard</td><td><code>chain.dashboard</code></td><td>–</td><td>–</td><td>✓</td></tr></tbody></table>

Backend gates the *capability* so a flag is a real security/contract boundary, not just a UI hide. The MFE hides UI; the API enforces.
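
A sketch of API-side enforcement follows. It deliberately does not use the Unleash SDK: `CapabilityGate` is a hypothetical in-memory stand-in whose plan defaults are copied from the table above and whose per-venue overrides model a pilot or demo toggle.

```kotlin
enum class PlanTier { BASIC, PRO, ENTERPRISE }

class CapabilityDisabledException(flag: String) :
    RuntimeException("capability '$flag' is not enabled for this venue")

class CapabilityGate(private val venueOverrides: Map<String, Map<String, Boolean>> = emptyMap()) {
    // Lowest plan tier at which each flag is on by default.
    private val minimumTier: Map<String, PlanTier> = mapOf(
        "kds.realtime" to PlanTier.BASIC,
        "menu.multilang" to PlanTier.PRO,
        "pay.tipping" to PlanTier.PRO,
        "pay.splitbill" to PlanTier.PRO,
        "pay.payatend" to PlanTier.PRO,
        "brand.whitelabel" to PlanTier.PRO,
        "ai.upsell" to PlanTier.ENTERPRISE,
        "chain.dashboard" to PlanTier.ENTERPRISE
    )

    fun isEnabled(flag: String, venueId: String, tier: PlanTier): Boolean {
        // A per-venue override wins over the plan default.
        val override = venueOverrides[venueId]?.get(flag)
        if (override != null) return override
        val required = minimumTier[flag] ?: return false // unknown flag: fail closed
        return tier >= required
    }

    // Call at the top of a route handler so a disabled capability is refused, not just hidden.
    fun enforce(flag: String, venueId: String, tier: PlanTier) {
        if (!isEnabled(flag, venueId, tier)) throw CapabilityDisabledException(flag)
    }
}
```

A route handler would call `enforce` before doing any work, so a Basic venue that crafts a tipping request by hand is rejected by the API even though its MFE never showed the option.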

## Architectural Non-Negotiables

1. `qody_app` DB role MUST NOT have BYPASSRLS and MUST NOT own tables; fail-closed startup check.
2. RLS ENABLED + FORCED on every tenant table; `app.current_venue_id` set at checkout, reset on release.
3. Money is `NUMERIC(19,4)`, snapshotted on order lines; never recomputed from live catalogue.
4. Order state machine is server-enforced; illegal transitions rejected; transitions emit via transactional outbox.
5. Real-time is an optimization over an authoritative DB; clients resync on reconnect.
6. Payment webhooks signature-verified + idempotent; never double-charge/double-advance.
7. Capabilities enforced at the API (flag = contract boundary), not just hidden in the MFE.
8. Deploy verification per ZAKON PI2: verify that the new revision actually serves 100% of traffic.
9. Distribute only proven seams. Start in-process; earn Kafka/microservices, do not anticipate them.