Test Strategy

Project: Bilko
Version: 1.1
Date: 2026-05-21
Author: Ops Architect / ALAI Documentation Team
Status: Active
Reviewers: Tech Lead, Alem Bašić

Document History

Version	Date	Author	Changes
0.1	2026-02-23	Ops Architect	Initial draft
1.0	2026-02-25	ALAI Documentation Team	Finalized — approved for production use
1.1	2026-05-21	ALAI Documentation Team	Clarified industry/Spotify-style layered strategy, real-demo smoke gate, and full-demo rehearsal policy

1. Testing Philosophy & Principles

Financial software has a higher correctness bar than typical web apps. A bug in VAT calculation or double-entry bookkeeping is not a UX inconvenience — it's a compliance failure that could expose Bilko users to tax liability or audit findings.

Core Principles:

Financial logic is P0: VAT calculations, double-entry balance, and NUMERIC precision are tested at ≥ 95% coverage before any feature ships
Tests are first-class code: reviewed, maintained, and refactored alongside production code
Test the behavior, not the implementation: tests enable safe refactoring of internals
Fast feedback: unit tests run in < 3 min; full suite < 10 min
No test = no ship: financial logic without a test is a P0 blocker for merging
Isolation: every test cleans up after itself; no test depends on another

Testing philosophy: Bilko follows an industry-standard layered strategy with three parts: focused unit coverage for financial calculations and business logic, strong integration/contract coverage for Ktor API + PostgreSQL behavior, and a small Playwright layer for critical user journeys. This is compatible with Spotify's public "testing honeycomb" model, in which most confidence comes from service interaction tests and contracts rather than from automating every behavior through a browser. We do not aim for 100% E2E coverage.

2. Layered Test Strategy

                 Playwright E2E / Demo Smoke
        Critical browser journeys + deployed health evidence

              Integration + Contract Tests (dominant)
       Ktor routes, services, PostgreSQL/Testcontainers, auth,
       RBAC, multi-tenant isolation, frontend/backend API contract

                    Focused Unit Tests
       Financial engine, VAT, currency, validators, pure logic

Target distribution:

Unit tests — financial logic and pure business rules, especially packages/core and accounting/tax code
Integration/contract tests — API routes, DB behavior, auth/session boundaries, RBAC, org isolation, frontend/backend API compatibility
Critical Playwright E2E — a small, maintained set for invoice, expense, report, auth, and settings flows
Real-demo smoke — non-destructive deployed health check against https://bilko-demo.alai.no
Full-demo rehearsal — resettable demo tenant/environment used for stakeholder demos; not the deploy gate

3. Testing Tools

Type	Tool	Version	Purpose	Config
Unit testing	Vitest + Kotlin/JUnit	Latest	Business logic, utilities, services	package configs / Gradle
Mocking	Vitest, MockK	—	Mock external deps where appropriate	Built-in / Gradle
Integration testing	Ktor test host + JUnit	Latest	API endpoint testing with PostgreSQL	`apps/api/build.gradle.kts`
Test database	PostgreSQL	15/16	Real database via local DB/Testcontainers	Gradle `integrationTest`
E2E testing	Playwright	Latest	Browser automation, critical user flows	`apps/e2e/playwright.config.ts`
Coverage	Kover + Vitest	—	Coverage reports and ratchets	Gradle Kover / package configs
Performance	k6	Latest	Load testing (PLANNED Phase 2)	`apps/e2e/load/`

Why Vitest (not Jest)

ESM native, Vite-based → faster
Compatible with Turborepo
Watch mode with HMR
Largely Jest-compatible API (easy migration)

Why Playwright (not Cypress)

Multi-browser: Chromium, Firefox, WebKit (Safari)
Auto-waiting (fewer flaky tests caused by race conditions)
Parallel execution (workers: 4)
Video and trace on failure

4. Test Scope by Layer

4.1 Unit Tests (Vitest)

Attribute	Value
Scope	Pure functions: VAT calculation, double-entry validation, currency conversion, invoice totals, date utils, number formatting
External dependencies	Mocked — no real DB, network, or filesystem
Coverage target	≥ 95% for financial logic; ≥ 90% utilities; ≥ 80% services; ≥ 80% overall
Execution time	< 3 minutes
Runs on	CI on every push; the local pre-commit hook runs lint + type-check only, not unit tests
Written by	Developer who writes the feature

What to unit test:

calculateVAT(amount, rate, country) — Serbia 20%, BiH 17%, Croatia 25%
validateDoubleEntry(debit, credit) — must be equal, error on imbalance
convertCurrency(amount, fromCurrency, toCurrency, exchangeRate) — NUMERIC(19,4)
calculateInvoiceTotal(items) — subtotal, tax, discount, total
lockExchangeRate(date, fromCurrency, toCurrency) — historical rate, not today's
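
The first two functions in the list above could be sketched as follows. This is a minimal illustration under stated assumptions, not the Bilko implementation: amounts are assumed to be integer minor units, the country codes are hypothetical, and calculateVAT uses a simplified two-argument signature instead of calculateVAT(amount, rate, country).

```typescript
// Sketch only: amounts are integer minor units (10000 = 100.00), an assumption
// made here so that no floating-point rounding enters the financial path.
type Country = "RS" | "BA" | "HR"; // hypothetical codes for Serbia, BiH, Croatia

// Standard VAT rates named in the list above, in whole percent.
const STANDARD_VAT_PERCENT: Record<Country, number> = { RS: 20, BA: 17, HR: 25 };

function assertMinorUnits(value: number, label: string): void {
  if (!Number.isInteger(value) || value < 0) {
    throw new RangeError(`${label} must be a non-negative integer of minor units`);
  }
}

// Simplified, hypothetical signature: VAT on a net amount, rounded half up.
function calculateVAT(netMinor: number, country: Country): number {
  assertMinorUnits(netMinor, "net amount");
  return Math.floor((netMinor * STANDARD_VAT_PERCENT[country] + 50) / 100);
}

// Throws when total debits and total credits differ.
function validateDoubleEntry(debits: number[], credits: number[]): void {
  const sum = (xs: number[]): number => xs.reduce((acc, x) => acc + x, 0);
  const debitTotal = sum(debits);
  const creditTotal = sum(credits);
  if (debitTotal !== creditTotal) {
    throw new Error(`unbalanced entry: debit ${debitTotal} != credit ${creditTotal}`);
  }
}

calculateVAT(10000, "RS"); // 2000, i.e. 20.00 VAT on 100.00 net
validateDoubleEntry([10000, 2000], [12000]); // balanced, does not throw
```

Unit tests for such functions would typically assert both the happy path and the failure path, for example that an imbalance of a single minor unit raises an error.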

What NOT to unit test:

Framework internals (Ktor, Next.js, Exposed, JDBC)
Simple property getters/setters with no logic
Full browser journeys that belong in Playwright E2E or demo smoke

4.2 Integration + Contract Tests

Attribute	Value
Scope	Ktor API routes, service boundaries, PostgreSQL behavior, contracts
External dependencies	Real PostgreSQL via local DB or Testcontainers where needed
Coverage target	All service boundaries; > 80% of integration paths
Execution time	< 10 minutes for blocking gate
Runs on	Every PR / deploy gate, blocking merge where configured
Written by	Developer who writes the API endpoint or client contract

What to integration test:

Auth flow (register, login, refresh, logout)
Invoice CRUD + status transitions (draft → sent → paid)
Expense CRUD + approval flow
Reports API (P&L, VAT, balance sheet)
Organization scoping — org A cannot read org B's data (P0 security test)
RBAC enforcement — viewer cannot create, owner can delete
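
The last two items describe rules that are easy to state as assertions. The sketch below models them against an in-memory store purely for illustration; the real tests run against Ktor routes and PostgreSQL. The type names, the permission matrix, and the assumption that a viewer may read are all hypothetical.

```typescript
type Role = "viewer" | "owner";
type Action = "read" | "create" | "delete";

interface Session { orgId: string; role: Role }
interface Invoice { id: string; orgId: string }

// Hypothetical permission matrix covering only the two roles named above.
const PERMISSIONS: Record<Role, Record<Action, boolean>> = {
  viewer: { read: true, create: false, delete: false },
  owner: { read: true, create: true, delete: true },
};

function can(role: Role, action: Action): boolean {
  return PERMISSIONS[role][action];
}

// Scope by the session's org, never by an org id supplied by the client.
function findInvoice(session: Session, store: Invoice[], id: string): Invoice | undefined {
  if (!can(session.role, "read")) return undefined;
  return store.filter((inv) => inv.id === id && inv.orgId === session.orgId)[0];
}

const store: Invoice[] = [
  { id: "inv-a", orgId: "org-a" },
  { id: "inv-b", orgId: "org-b" },
];
const ownerA: Session = { orgId: "org-a", role: "owner" };

findInvoice(ownerA, store, "inv-b"); // undefined: org A cannot read org B's invoice
can("viewer", "create");             // false
can("owner", "delete");              // true
```

In an API-level test the same expectations would usually appear as status codes (for example a not-found or forbidden response), which this document does not specify.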

4.3 E2E Tests (Playwright)

Attribute	Value
Scope	Critical user journeys through deployed/staging application
External dependencies	Real staging services; production/demo only for non-destructive smoke
Coverage target	Critical journeys + smoke, not exhaustive UI coverage
Execution time	< 8 minutes for critical gate; < 1 minute for real-demo smoke
Runs on	Post-staging deploy, pre-production gate, post-deploy smoke
Written by	Developer + QA collaboration

Critical journeys:

Auth/session Flow: Login → refresh/session validation → logout or session expiry behavior
Invoice Flow: Create draft in resettable staging/demo tenant → Preview → Send/Mark Paid where safe
Expense Flow: Add → Upload Receipt → Approve/Reject/Pay where safe
Report Flow: Generate P&L/VAT report → Export PDF/XLSX where safe
Settings Flow: Organization/settings/users page loads and key controls are visible

Rule: public real-demo tests must be non-destructive. Registration, deletion, rate-limit torture, invoice/expense creation, and expected-fail regressions belong in resettable staging/nightly suites, not the live demo smoke gate.
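
One way to make the rule above mechanical is to tag tests and select suites by tag. The sketch below assumes hypothetical `@smoke` and `@destructive` tags and a hypothetical test catalogue; the source does not prescribe a tagging scheme.

```typescript
// Hypothetical tag names and test titles, used for illustration only.
interface TestCase { title: string; tags: string[] }

function hasTag(test: TestCase, tag: string): boolean {
  return test.tags.indexOf(tag) !== -1;
}

// Real-demo smoke: explicitly opted in AND not destructive.
function selectRealDemoSmoke(tests: TestCase[]): TestCase[] {
  return tests.filter((t) => hasTag(t, "@smoke") && !hasTag(t, "@destructive"));
}

// Everything destructive belongs in resettable staging/nightly runs.
function selectResettableOnly(tests: TestCase[]): TestCase[] {
  return tests.filter((t) => hasTag(t, "@destructive"));
}

const catalogue: TestCase[] = [
  { title: "login page loads", tags: ["@smoke"] },
  { title: "settings page shows key controls", tags: ["@smoke"] },
  { title: "register new account", tags: ["@destructive"] },
  { title: "create invoice", tags: ["@smoke", "@destructive"] }, // mis-tagged on purpose
];

selectRealDemoSmoke(catalogue).map((t) => t.title);
// ["login page loads", "settings page shows key controls"]
```

The destructive tag wins over the smoke tag, so a mis-tagged test is kept out of the live demo gate. With Playwright, a comparable split can usually be expressed through title tags combined with the `--grep` and `--grep-invert` CLI options.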

5. Test Data Management

Approach	Used For	Tool	Cleanup
Test factories	Unit + integration	`apps/api/src/test/factories/`	Per-test (beforeEach teardown)
Database seeding	E2E/full-demo rehearsal	Flyway fixtures / seed scripts / API factories	Per resettable environment run
PostgreSQL transactions	Integration tests	Test transaction or teardown helpers	Per test

Isolation rule: integration tests must create isolated organization/user fixtures and clean up deterministically. Cross-test dependence is forbidden.

Test org pattern: Each integration test creates a fresh bilko_test organization and user to prevent cross-test contamination.
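
A factory along these lines would give each test its own organization and user. This is a sketch under assumptions: it treats bilko_test as a name prefix, and the field names, email domain, and suffix scheme are hypothetical.

```typescript
// Hypothetical fixture shape; the real factories live under apps/api/src/test/factories/.
interface TestOrgFixture {
  orgName: string;
  ownerEmail: string;
}

let sequence = 0;

// Builds a unique suffix from the clock, a per-process counter, and random
// characters, so parallel workers are unlikely to collide.
function uniqueSuffix(): string {
  sequence += 1;
  const random = Math.random().toString(36).slice(2, 8);
  return `${Date.now().toString(36)}_${sequence}_${random}`;
}

// Each call returns a fresh organization/user pair for one test.
function makeTestOrg(): TestOrgFixture {
  const suffix = uniqueSuffix();
  return {
    orgName: `bilko_test_${suffix}`,
    ownerEmail: `owner_${suffix}@example.test`,
  };
}
```

Because every fixture name is unique, a test that fails before its teardown runs cannot poison the next test's assertions, and leftover rows remain identifiable by the prefix.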

6. Coverage Requirements

Layer	Lines	Branches	Functions	Enforcement
Financial logic (VAT, double-entry, currency)	≥ 95%	≥ 90%	100%	CI hard fail
Authentication utils	≥ 95%	≥ 90%	100%	CI hard fail
API handlers	≥ 80%	≥ 75%	≥ 80%	CI hard fail
Utilities	≥ 90%	≥ 85%	≥ 90%	CI hard fail
Overall minimum	≥ 80%	≥ 75%	≥ 80%	CI hard fail

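Coverage enforcement: Vitest coverage thresholds are configured in vitest.config.ts, and the CI pipeline fails when coverage drops below any threshold. Kotlin coverage is measured with Kover via Gradle (see section 3).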
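
The gate logic implied by the table can be sketched as a pure check. The percentages below are copied from the table in this section; the layer keys and function name are hypothetical, and in practice the coverage tool's own threshold settings do this work.

```typescript
interface Coverage { lines: number; branches: number; functions: number }
type Layer = "financial" | "auth" | "apiHandlers" | "utilities" | "overall";

// Minimum percentages copied from the table in this section.
const MINIMUMS: Record<Layer, Coverage> = {
  financial: { lines: 95, branches: 90, functions: 100 },
  auth: { lines: 95, branches: 90, functions: 100 },
  apiHandlers: { lines: 80, branches: 75, functions: 80 },
  utilities: { lines: 90, branches: 85, functions: 90 },
  overall: { lines: 80, branches: 75, functions: 80 },
};

const METRICS: Array<keyof Coverage> = ["lines", "branches", "functions"];

// Returns one message per metric below its minimum; an empty array means the gate passes.
function coverageViolations(layer: Layer, actual: Coverage): string[] {
  const minimum = MINIMUMS[layer];
  return METRICS
    .filter((metric) => actual[metric] < minimum[metric])
    .map((metric) => `${layer}: ${metric} ${actual[metric]}% is below ${minimum[metric]}%`);
}

coverageViolations("financial", { lines: 96, branches: 91, functions: 99 });
// ["financial: functions 99% is below 100%"]
```

A CI step built on this idea would exit non-zero whenever the returned array is non-empty for any layer.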

7. Quality Gates

PR Merge Gate

All unit tests pass
All integration tests pass
Coverage ≥ minimum thresholds
Linting passes (ESLint + Prettier)
Type checking passes (TypeScript strict)
No new HIGH/CRITICAL security findings

Staging Deploy Gate

All PR gates passed
Build artifact created successfully

Production Deploy Gate

Critical E2E gate passes on staging/resettable environment
Real-demo smoke passes after deploy with screenshot/video evidence
Performance baseline not degraded > 20% for relevant changes
Manual approval in CI pipeline
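
The 20% performance rule in the gate above reduces to a small comparison. The sketch below assumes a lower-is-better metric such as p95 latency in whole milliseconds; the source does not name the metric, so both the metric and the function name are hypothetical.

```typescript
// Hypothetical metric: p95 latency in whole milliseconds (lower is better).
function isDegraded(
  baselineMs: number,
  currentMs: number,
  maxDegradationPercent: number = 20,
): boolean {
  if (baselineMs <= 0) {
    throw new RangeError("baseline must be positive");
  }
  // Multiplying out instead of dividing keeps the comparison exact for
  // whole-millisecond inputs, including the boundary at exactly 20%.
  return (currentMs - baselineMs) * 100 > baselineMs * maxDegradationPercent;
}

isDegraded(100, 120); // false: exactly 20% slower is still within the gate
isDegraded(100, 121); // true: more than 20% slower blocks the deploy
```

Treating exactly 20% as passing follows the wording "not degraded > 20%"; a stricter team could use a greater-or-equal comparison instead.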

8. Responsibility Matrix

Test Type	Writes	Reviews	Maintains	Signs Off
Unit tests	Developer	PR reviewer	Developer	Tech Lead
Integration tests	Developer	QA / Tech Lead	Developer	Tech Lead
E2E tests	Developer	Tech Lead	Developer	Tech Lead
Performance tests	DevOps	Tech Lead	DevOps	Alem Bašić

9. Test Reporting & Metrics

Metric	Target
Test pass rate	≥ 99% unit, ≥ 95% E2E
Flaky test rate	< 2%
Full suite execution time	< 10 min
Coverage trend	Stable or improving per sprint
Financial logic coverage	≥ 95% at all times

10. Continuous Testing in CI/CD

Stage	Tests Run	Blocking
Pre-commit (local)	lint + type-check only	Recommended (Husky)
PR open/update	unit + integration + lint + type-check	Yes — blocks merge
Staging deploy	Critical E2E (Playwright, Chromium primary; other browsers scheduled/risk-based)	Yes — blocks production
Production deploy	Real-demo smoke (`npm run test:real-demo-smoke`) with evidence	Yes — rollback/escalate on failure
Nightly / scheduled	Full E2E regression + destructive/resettable tests + performance	No — alerts/issues, not automatic deploy blocker

Approval

Role	Name	Date
Author	Ops Architect	2026-02-23
Reviewer	Tech Lead
Approver	Alem Bašić
