Test Strategy

Project: Bilko Version: 1.1 Date: 2026-05-21 Author: Ops Architect / ALAI Documentation Team Status: Active Reviewers: Tech Lead, Alem Bašić

Document History

Version	Date	Author	Changes
0.1	2026-02-23	Ops Architect	Initial draft
1.0	2026-02-25	ALAI Documentation Team	Finalized — approved for production use
1.1	2026-05-21	ALAI Documentation Team	Clarified industry/Spotify-style layered strategy, real-demo smoke gate, and full-demo rehearsal policy

1. Testing Philosophy & Principles

Financial software has a higher correctness bar than typical web apps. A bug in VAT calculation or double-entry bookkeeping is not a UX inconvenience — it's a compliance failure that could expose Bilko users to tax liability or audit findings.

Core Principles:

Financial logic is P0 — VAT calculations, double-entry balance, NUMERIC precision are tested at >95% coverage before any feature ships
Tests are first-class code — reviewed, maintained, and refactored alongside production code
Test the behavior, not the implementation — tests enable safe refactoring of internals
Fast feedback — unit tests run in < 3 min; full suite < 10 min
No test = no ship — financial logic without a test is a P0 blocker for merging
Isolation — every test cleans up after itself; no test depends on another

Testing philosophy: Bilko follows an industry-standard layered strategy: focused unit coverage for financial calculations and business logic, strong integration/contract coverage for Ktor API + PostgreSQL behavior, and a small Playwright layer for critical user journeys. This is compatible with Spotify's public "testing honeycomb" direction: most confidence should come from service interaction tests and contracts, not from trying to automate every behavior through a browser. We do not aim for 100% E2E coverage.

2. Layered Test Strategy

                 E2E / Demo Smoke
        Critical browser journeys, deployed health evidence

        Integration + Contract Tests (dominant)
        Ktor routes, services, PostgreSQL/Testcontainers, auth,
        RBAC, multi-tenant isolation, frontend/backend API contract

                 Focused Unit Tests
        Financial engine, VAT, currency, validators, pure logic

Target distribution:

Unit tests: financial logic and pure business rules, especially packages/core and accounting/tax code
Integration/contract tests: API routes, DB behavior, auth/session boundaries, RBAC, org isolation, frontend/backend API compatibility
Critical Playwright E2E: a small, maintained set for invoice, expense, report, auth, and settings flows
Real-demo smoke: non-destructive deployed health check against https://bilko-demo.alai.no
Full-demo rehearsal: resettable demo tenant/environment used for stakeholder demos; not the deploy gate

3. Testing Tools

Type	Tool	Version	Purpose	Config
Unit testing	Vitest + Kotlin/JUnit	Latest	Business logic, utilities, services	Package configs / Gradle
Mocking	Vitest, MockK	n/a	Mock external deps where appropriate	Built-in / Gradle
Integration testing	Ktor test host + JUnit	Latest	API endpoint testing with PostgreSQL	`apps/api/build.gradle.kts`
Test database	PostgreSQL	15/16	Real database via local DB/Testcontainers	Gradle `integrationTest`
E2E testing	Playwright	Latest	Browser automation, critical user flows	`apps/e2e/playwright.config.ts`
Coverage	Kover + Vitest	n/a	Coverage reports and ratchets	Gradle Kover / package configs
Performance	k6	Latest	Load testing (PLANNED Phase 2)	`apps/e2e/load/`

Why Vitest (not Jest)

ESM native, Vite-based → faster
Compatible with Turborepo
Watch mode with HMR
Same API as Jest (easy migration)

Why Playwright (not Cypress)

Multi-browser: Chromium, Firefox, WebKit (Safari)
Auto-wait (reduces flaky tests caused by race conditions)
Parallel execution (workers: 4)
Video and trace on failure

4. Test Scope by Layer

4.1 Unit Tests (Vitest)

Attribute	Value
Scope	Pure functions: VAT calculation, double-entry validation, currency conversion, invoice totals, date utils, number formatting
External dependencies	Mocked — no real DB, network, or filesystem
Coverage target	> 95% for financial logic; > 90% utilities; > 80% services; > 80% overall
Execution time	< 3 minutes
Runs on	Every commit via CI on every push; the local pre-commit hook runs lint + type-check only
Written by	Developer who writes the feature

What to unit test:

calculateVAT(amount, rate, country) — Serbia 20%, BiH 17%, Croatia 25%
validateDoubleEntry(debit, credit) — must be equal, error on imbalance
convertCurrency(amount, fromCurrency, toCurrency, exchangeRate) — NUMERIC(19,4)
calculateInvoiceTotal(items) — subtotal, tax, discount, total
lockExchangeRate(date, fromCurrency, toCurrency) — historical rate, not today's
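
The first two functions in the list above could be sketched as follows. This is a minimal illustration, not Bilko's implementation. It assumes amounts are held as integer minor units rather than NUMERIC values, models only the three standard rates named in the list, and replaces the `country` parameter with a lookup table. The names `Country`, `STANDARD_VAT_RATE_PERCENT`, and `assertMinorUnits`, and the array-based signature of `validateDoubleEntry`, are hypothetical.

```typescript
// Hypothetical country codes and standard VAT rates (percent), taken from the list above.
type Country = "RS" | "BA" | "HR";

const STANDARD_VAT_RATE_PERCENT: Record<Country, number> = {
  RS: 20, // Serbia
  BA: 17, // Bosnia and Herzegovina
  HR: 25, // Croatia
};

// Assumption: money is an integer count of minor units, so no float rounding occurs.
function assertMinorUnits(value: number, label: string): void {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError(`${label} must be a non-negative safe integer, got ${value}`);
  }
}

// VAT on a net amount, rounded half up to the nearest minor unit using integer math only.
function calculateVAT(netMinor: number, ratePercent: number): number {
  assertMinorUnits(netMinor, "netMinor");
  assertMinorUnits(ratePercent, "ratePercent");
  const scaled = netMinor * ratePercent; // VAT expressed in hundredths of a minor unit
  const remainder = scaled % 100;
  const wholeUnits = (scaled - remainder) / 100;
  return remainder >= 50 ? wholeUnits + 1 : wholeUnits;
}

// Throws when the total debits and total credits of one journal entry differ.
function validateDoubleEntry(debitsMinor: number[], creditsMinor: number[]): void {
  const sum = (values: number[]): number =>
    values.reduce((total, value) => {
      assertMinorUnits(value, "line amount");
      return total + value;
    }, 0);
  const debitTotal = sum(debitsMinor);
  const creditTotal = sum(creditsMinor);
  if (debitTotal !== creditTotal) {
    throw new Error(`Unbalanced entry: debits ${debitTotal} != credits ${creditTotal}`);
  }
}
```

For example, `calculateVAT(10000, STANDARD_VAT_RATE_PERCENT.RS)` returns `2000`, and `validateDoubleEntry([1000], [900])` throws. Integer arithmetic is used here so that the rounding rule is explicit and testable; the real engine may use a decimal type instead.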

What NOT to unit test:

Framework internals (Ktor, Next.js, Exposed, JDBC)
Simple property getters/setters with no logic
Full browser journeys that belong in Playwright E2E or demo smoke

4.2 Integration + Contract Tests (Ktor test host)

Attribute	Value
Scope	Ktor API routes, service boundaries, PostgreSQL behavior, contracts
External dependencies	Real PostgreSQL via `bilko_test` local DB or Testcontainers where needed
Coverage target	All service boundaries; > 80% of integration paths
Execution time	< 10 minutes for blocking gate
Runs on	Every PR / deploy gate, blocking merge where configured
Written by	Developer who writes the API endpoint or client contract

What to integration test:

Auth flow (register, login, refresh, logout)
Invoice CRUD + status transitions (draft → sent → paid)
Expense CRUD + approval flow
Reports API (P&L, VAT, balance sheet)
Organization scoping — org A cannot read org B's data (P0 security test)
RBAC enforcement — viewer cannot create, owner can delete
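
The last two items, organization scoping and RBAC, come down to invariants that can be stated compactly. The sketch below expresses them against an in-memory store so it runs without a database. It is an illustration only: the real tests exercise Ktor routes against PostgreSQL, and `Session`, `InvoiceStore`, `PERMISSIONS`, and `requirePermission` are hypothetical names. Only the `owner` and `viewer` roles mentioned above are modeled.

```typescript
type Role = "owner" | "viewer";
type Action = "read" | "create" | "delete";

interface Session { userId: string; orgId: string; role: Role; }
interface Invoice { id: string; orgId: string; totalMinor: number; }

// Hypothetical permission matrix: viewers are read-only, owners can do everything.
const PERMISSIONS: Record<Role, ReadonlySet<Action>> = {
  owner: new Set<Action>(["read", "create", "delete"]),
  viewer: new Set<Action>(["read"]),
};

function requirePermission(session: Session, action: Action): void {
  if (!PERMISSIONS[session.role].has(action)) {
    throw new Error(`forbidden: role ${session.role} cannot ${action}`);
  }
}

class InvoiceStore {
  private readonly invoices = new Map<string, Invoice>();

  create(session: Session, id: string, totalMinor: number): Invoice {
    requirePermission(session, "create");
    // The org always comes from the session, never from caller-supplied input.
    const invoice: Invoice = { id, orgId: session.orgId, totalMinor };
    this.invoices.set(id, invoice);
    return invoice;
  }

  // Another organization's invoice is indistinguishable from a missing one.
  get(session: Session, id: string): Invoice | undefined {
    requirePermission(session, "read");
    const invoice = this.invoices.get(id);
    if (invoice === undefined || invoice.orgId !== session.orgId) {
      return undefined;
    }
    return invoice;
  }

  // Returns false when the invoice is missing or belongs to another organization.
  remove(session: Session, id: string): boolean {
    requirePermission(session, "delete");
    if (this.get(session, id) === undefined) {
      return false;
    }
    return this.invoices.delete(id);
  }
}
```

An integration test for the P0 scoping rule then has the shape: create an invoice as an owner in org A, and assert that a session in org B can neither read nor delete it.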

4.3 E2E Tests (Playwright)

Attribute	Value
Scope	Critical user journeys through deployed/staging application
External dependencies	Real staging services; production/demo only for non-destructive smoke
Coverage target	Critical journeys + smoke, not exhaustive UI coverage
Execution time	< 8 minutes for critical gate; < 1 minute for real-demo smoke
Runs on	Post-staging deploy, pre-production gate, post-deploy smoke
Written by	Developer + QA collaboration

Critical journeys:

Auth/session Flow: Login → refresh/session validation → logout or session expiry behavior
Invoice Flow: Create draft in resettable staging/demo tenant → Preview → Send/Mark Paid where safe
Expense Flow: Add → Upload Receipt → Approve/Reject/Pay where safe
Report Flow: Generate P&L/VAT report → Export PDF/XLSX where safe
Settings Flow: Organization/settings/users page loads and key controls are visible

Rule: public real-demo tests must be non-destructive. Registration, deletion, rate-limit torture, invoice/expense creation, and expected-fail regressions belong in resettable staging/nightly suites, not the live demo smoke gate.
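
One way to make this rule mechanical is to tag tests and derive the smoke set from the tags. The sketch below is runner-independent and assumes a hypothetical tagging scheme: the tag names, `TestCase`, and `selectRealDemoSmoke` are illustrative and do not describe how the Bilko suites are actually organized.

```typescript
// Hypothetical tags; the blocked ones mirror the categories named in the rule above.
type TestTag = "smoke" | "destructive" | "rate-limit" | "expected-fail";

interface TestCase { name: string; tags: TestTag[]; }

const BLOCKED_ON_REAL_DEMO: ReadonlySet<TestTag> = new Set<TestTag>([
  "destructive",   // registration, deletion, invoice/expense creation
  "rate-limit",    // rate-limit torture
  "expected-fail", // expected-fail regressions
]);

// Keeps only tests that opt in to smoke and carry no blocked tag.
function selectRealDemoSmoke(tests: TestCase[]): TestCase[] {
  return tests.filter(
    (test) =>
      test.tags.some((tag) => tag === "smoke") &&
      !test.tags.some((tag) => BLOCKED_ON_REAL_DEMO.has(tag)),
  );
}
```

A test tagged both `smoke` and `destructive` is excluded, so a mislabeled test fails safe: it is skipped on the live demo instead of mutating it.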

5. Test Data Management

Approach	Used For	Tool	Cleanup
Test factories	Unit + integration	`apps/api/src/test/factories/`	Per-test (beforeEach teardown)
Database seeding	E2E/full-demo rehearsal	Flyway fixtures / seed scripts / API factories	Per resettable environment run
PostgreSQL transactions	Integration tests	Test transaction or teardown helpers	Per test

Isolation rule: beforeEach in integration tests must create isolated organization/user fixtures and clean up deterministically. Cross-test dependence is forbidden.

Test org pattern: Each integration test creates a fresh bilko_test organization and user to prevent cross-test contamination.
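
A fixture factory for this pattern might look like the sketch below. It is an assumption-laden illustration: `createTestFixture`, the field names, and the naming scheme are hypothetical, and the real factories would insert rows through the API or database instead of returning plain objects.

```typescript
interface TestOrg { id: string; name: string; }
interface TestUser { id: string; orgId: string; email: string; role: "owner" | "viewer"; }
interface TestFixture { org: TestOrg; owner: TestUser; }

// Process-wide counter so two fixtures created in the same millisecond still differ.
let fixtureCounter = 0;

// Builds a fresh organization and owner whose identifiers are unique per call.
function createTestFixture(): TestFixture {
  fixtureCounter += 1;
  const suffix = `${Date.now().toString(36)}-${fixtureCounter}`;
  const org: TestOrg = { id: `org-${suffix}`, name: `bilko_test ${suffix}` };
  const owner: TestUser = {
    id: `user-${suffix}`,
    orgId: org.id,
    email: `owner-${suffix}@example.test`,
    role: "owner",
  };
  return { org, owner };
}
```

Calling `createTestFixture()` in each test's setup means no test can observe data created by another, even when tests run in parallel against the same database.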

6. Coverage Requirements

Layer	Lines	Branches	Functions	Enforcement
Financial logic (VAT, double-entry, currency)	≥ 95%	≥ 90%	100%	CI hard fail
Authentication utils	≥ 95%	≥ 90%	100%	CI hard fail
API handlers	≥ 80%	≥ 75%	≥ 80%	CI hard fail
Utilities	≥ 90%	≥ 85%	≥ 90%	CI hard fail
Overall minimum	≥ 80%	≥ 75%	≥ 80%	CI hard fail

Coverage enforcement: thresholds are configured in the Vitest package configs and in Gradle Kover. The CI pipeline fails if coverage falls below any threshold.
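
The gate logic implied by the table can be written out as a small check. In practice the coverage tooling enforces these numbers; the sketch below only makes the comparison explicit. The layer keys, `THRESHOLDS`, and `coverageFailures` are hypothetical names, while the percentages are copied from the table above.

```typescript
interface Coverage { lines: number; branches: number; functions: number; }

// Minimum percentages per layer, copied from the coverage requirements table.
const THRESHOLDS: Record<string, Coverage> = {
  "financial-logic": { lines: 95, branches: 90, functions: 100 },
  "auth-utils": { lines: 95, branches: 90, functions: 100 },
  "api-handlers": { lines: 80, branches: 75, functions: 80 },
  "utilities": { lines: 90, branches: 85, functions: 90 },
  "overall": { lines: 80, branches: 75, functions: 80 },
};

// Returns one message per metric that falls below its threshold; empty means the gate passes.
function coverageFailures(layer: string, actual: Coverage): string[] {
  const required = THRESHOLDS[layer];
  if (required === undefined) {
    throw new Error(`unknown layer: ${layer}`);
  }
  const metrics: Array<keyof Coverage> = ["lines", "branches", "functions"];
  return metrics
    .filter((metric) => actual[metric] < required[metric])
    .map((metric) => `${layer}: ${metric} ${actual[metric]}% < ${required[metric]}%`);
}
```

A CI step would fail the build whenever the returned list is non-empty for any layer.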

7. Quality Gates

PR Merge Gate

All unit tests pass
All integration tests pass
Coverage ≥ minimum thresholds
Linting passes (ESLint + Prettier)
Type checking passes (TypeScript strict)
No new HIGH/CRITICAL security findings

Staging Deploy Gate

All PR gates passed
Build artifact created successfully

Production Deploy Gate

Critical E2E gate passes on staging/resettable environment
Real-demo smoke passes after deploy with screenshot/video evidence
Performance baseline not degraded > 20% for relevant changes
Manual approval in CI pipeline
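
The 20% performance criterion in the list above is a relative comparison against a stored baseline. A minimal sketch of that arithmetic follows. The choice of metric (for example a latency percentile in milliseconds) is an assumption, since this document does not fix one, and `isPerformanceDegraded` is a hypothetical helper.

```typescript
// True when the current value is worse than the baseline by more than maxDegradation
// (0.2 means 20%). Assumes a "lower is better" metric such as latency in milliseconds.
function isPerformanceDegraded(
  baseline: number,
  current: number,
  maxDegradation: number = 0.2,
): boolean {
  if (!(baseline > 0)) {
    throw new RangeError("baseline must be a positive number");
  }
  const relativeChange = (current - baseline) / baseline;
  return relativeChange > maxDegradation;
}
```

With a baseline of 200 ms, a current value of 250 ms is a 25% degradation and blocks the gate, while 230 ms (15%) passes.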

8. Responsibility Matrix

Test Type	Writes	Reviews	Maintains	Signs Off
Unit tests	Developer	PR reviewer	Developer	Tech Lead
Integration tests	Developer	QA / Tech Lead	Developer	Tech Lead
E2E tests	Developer	Tech Lead	Developer	Tech Lead
Performance tests	DevOps	Tech Lead	DevOps	Alem Bašić

9. Test Reporting & Metrics

Metric	Target
Test pass rate	≥ 99% unit, ≥ 95% E2E
Flaky test rate	< 2%
Full suite execution time	< 10 min
Coverage trend	Stable or improving per sprint
Financial logic coverage	≥ 95% at all times

10. Continuous Testing in CI/CD

Stage	Tests Run	Blocking
Pre-commit (local)	lint + type-check only	Recommended (Husky)
PR open/update	unit + integration + lint + type-check	Yes, blocks merge
Staging deploy	Critical E2E (Playwright, Chromium primary; other browsers scheduled/risk-based)	Yes, blocks production
Production deploy	Real-demo smoke (`npm run test:real-demo-smoke`) with evidence	Yes, rollback/escalate on failure
Nightly / scheduled	Full E2E regression + destructive/resettable tests + performance	No, raises alerts/issues; not an automatic deploy blocker

Related Documents

Demo Testing Plan

Approval

Role	Name	Date
Author	Ops Architect	2026-02-23
Reviewer	Tech Lead
Approver	Alem Bašić
