Test Strategy

Project: Bilko Version: 1.0 Date: 2026-02-25 Author: Ops Architect Status: Final Reviewers: Tech Lead, Alem Bašić

Document History

Version	Date	Author	Changes
0.1	2026-02-23	Ops Architect	Initial draft
1.0	2026-02-25	ALAI Documentation Team	Finalized — approved for production use

1. Testing Philosophy & Principles

Financial software has a higher correctness bar than typical web apps. A bug in VAT calculation or double-entry bookkeeping is not a UX inconvenience — it's a compliance failure that could expose Bilko users to tax liability or audit findings.

Core Principles:

Financial logic is P0 — VAT calculations, double-entry balance, NUMERIC precision are tested at >95% coverage before any feature ships
Tests are first-class code — reviewed, maintained, and refactored alongside production code
Test the behavior, not the implementation — tests enable safe refactoring of internals
Fast feedback — unit tests run in < 3 min; full suite < 10 min
No test = no ship — financial logic without a test is a P0 blocker for merging
Isolation — every test cleans up after itself; no test depends on another

Testing philosophy: Bilko follows the test pyramid: heavy unit test coverage of financial calculations and business logic, targeted integration tests for API + database, and E2E tests for the 4 critical user journeys (invoice, expense, report, auth). We do not aim for 100% E2E coverage.

2. Test Pyramid

         /\
        /E2E\        ← 10%, 12 tests, Playwright
       /------\
      /  Integ \     ← 30%, 35 tests, Supertest
     /----------\
    /    Unit    \   ← 60%, 45 tests, Vitest
   /--------------\

Distribution (92 total planned; see TEST-INVENTORY.md):

60% Unit Tests (45): Financial logic, utilities, packages/core auth
30% Integration Tests (35): API endpoints, database, org-scoping
10% E2E Tests (12): Invoice, expense, report, auth flows


3. Testing Tools

Type	Tool	Version	Purpose	Config
Unit testing	Vitest	Latest	Business logic, utilities	`vitest.config.ts`
Mocking	Vitest built-in	n/a	Mock external deps (no real DB)	Built-in
Integration testing	Supertest	Latest	API endpoint testing with real PG	`apps/api/src/test/setup.ts`
Test database	PostgreSQL 15	15	Real database for integration tests	`.env.test`
E2E testing	Playwright	Latest	Browser automation, user flows	`apps/e2e/playwright.config.ts`
Coverage	c8 (Vitest built-in)	n/a	Coverage reports	`vitest.config.ts`
Performance	k6	Latest	Load testing (PLANNED Phase 2)	`apps/e2e/load/`

Why Vitest (not Jest)

ESM native, Vite-based → faster
Compatible with Turborepo
Watch mode with HMR
Largely Jest-compatible API (easy migration)

Why Playwright (not Cypress)

Multi-browser: Chromium, Firefox, WebKit (Safari)
Auto-wait (fewer flaky tests from race conditions)
Parallel execution (workers: 4)
Video and trace on failure

4. Test Scope by Layer

4.1 Unit Tests (Vitest)

Attribute	Value
Scope	Pure functions: VAT calculation, double-entry validation, currency conversion, invoice totals, date utils, number formatting
External dependencies	Mocked — no real DB, network, or filesystem
Coverage target	> 95% for financial logic; > 90% utilities; > 80% services; > 80% overall
Execution time	< 3 minutes
Runs on	Every commit, pre-commit hook (lint + type-check only), CI on every push
Written by	Developer who writes the feature

What to unit test:

calculateVAT(amount, rate, country) — Serbia 20%, BiH 17%, Croatia 25%
validateDoubleEntry(debit, credit) — must be equal, error on imbalance
convertCurrency(amount, fromCurrency, toCurrency, exchangeRate) — NUMERIC(19,4)
calculateInvoiceTotal(items) — subtotal, tax, discount, total
lockExchangeRate(date, fromCurrency, toCurrency) — historical rate, not today's
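A minimal sketch of two of the functions listed above, assuming amounts are held as integer minor units so that arithmetic stays exact. The function names and the three rates come from this document; the `CountryCode` type, the rate table, the simplified two-argument `calculateVAT` signature, and the round-half-up rule are illustrative assumptions rather than Bilko's actual implementation.

```typescript
// Standard VAT rates named in this document, in whole percent.
type CountryCode = "RS" | "BA" | "HR";
const STANDARD_VAT_RATE: Record<CountryCode, number> = { RS: 20, BA: 17, HR: 25 };

// Hypothetical simplified signature: net amount in minor units plus a country code.
function calculateVAT(netMinor: number, country: CountryCode): number {
  if (!Number.isInteger(netMinor) || netMinor < 0) {
    throw new RangeError("netMinor must be a non-negative integer");
  }
  // Integer arithmetic only; rounds half up to the nearest minor unit.
  return Math.floor((netMinor * STANDARD_VAT_RATE[country] + 50) / 100);
}

// Throws when total debits and total credits (both in minor units) differ.
function validateDoubleEntry(debitMinor: number, creditMinor: number): void {
  if (debitMinor !== creditMinor) {
    throw new Error(`Unbalanced entry: debit ${debitMinor} != credit ${creditMinor}`);
  }
}
```

Unit tests for functions of this shape need no mocks: each case is an input, an expected value, and, for the imbalance case, an expected error.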

What NOT to unit test:

Prisma ORM internals
Express framework boilerplate
Simple property getters/setters with no logic

4.2 Integration Tests (Supertest)

Attribute	Value
Scope	All API routes with real PostgreSQL 15 database
External dependencies	Real PostgreSQL (test container in CI, `bilko_test` DB local)
Coverage target	All service boundaries; > 80% of integration paths
Execution time	< 5 minutes
Runs on	Every PR, blocking merge
Written by	Developer who writes the API endpoint

What to integration test:

Auth flow (register, login, refresh, logout)
Invoice CRUD + status transitions (draft → sent → paid)
Expense CRUD + approval flow
Reports API (P&L, VAT, balance sheet)
Organization scoping — org A cannot read org B's data (P0 security test)
RBAC enforcement — viewer cannot create, owner can delete
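The organization-scoping and RBAC rules in the list above can be pictured with an in-memory stand-in. This is only a sketch of the behavior the integration tests would assert over HTTP against the real database: the `Caller`, `Invoice`, `canPerform`, and `listInvoices` names and the permission table are hypothetical, and only the viewer and owner rules are taken from the text.

```typescript
type Role = "viewer" | "owner";
type Action = "read" | "create" | "delete";

// Hypothetical permission table: viewer cannot create, owner can delete.
const PERMISSIONS: Record<Role, ReadonlySet<Action>> = {
  viewer: new Set<Action>(["read"]),
  owner: new Set<Action>(["read", "create", "delete"]),
};

interface Caller { orgId: string; role: Role; }
interface Invoice { id: string; orgId: string; }

// RBAC check: is this action allowed for the caller's role?
function canPerform(caller: Caller, action: Action): boolean {
  return PERMISSIONS[caller.role].has(action);
}

// Org scoping: a caller only ever sees rows that belong to its own organization.
function listInvoices(caller: Caller, all: Invoice[]): Invoice[] {
  if (!canPerform(caller, "read")) return [];
  return all.filter((invoice) => invoice.orgId === caller.orgId);
}
```

In the real suite the same two assertions would be made on HTTP responses: a request authenticated for org A returns none of org B's rows, and a viewer's create request is rejected.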

4.3 E2E Tests (Playwright)

Attribute	Value
Scope	4 critical user journeys through deployed application
External dependencies	Real (staging environment or production)
Coverage target	4 critical journeys + 8 sub-scenarios
Execution time	< 8 minutes
Runs on	Post-staging deploy, pre-production gate
Written by	Developer + QA collaboration

Critical journeys:

Invoice Flow: Create (6-step wizard) → Preview → Send → Mark Paid
Expense Flow: Add → Upload Receipt → Approve → Pay
Report Flow: Generate P&L → Export PDF
Auth Flow: Register → Login → 2FA → Logout

Public real-demo tests must be non-destructive. Registration, deletion, rate-limit torture, invoice/expense creation, and expected-fail regressions belong in resettable staging/nightly suites, not the live demo smoke gate.

5. Test Data Management

Approach	Used For	Tool	Cleanup
Test factories	Unit + integration	`apps/api/src/test/factories/`	Per-test (beforeEach teardown)
Database seeding	E2E tests	`packages/database/prisma/seed.ts`	Per E2E run
PostgreSQL transactions	Integration tests	Prisma `$transaction` rollback	Per test

Isolation rule: beforeEach in integration tests clears all tables via Prisma deleteMany() cascade.

Test org pattern: Each integration test creates a fresh bilko_test organization and user to prevent cross-test contamination.
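The test org pattern might be supported by small factory helpers along these lines. This is a sketch under stated assumptions: `buildTestOrg`, `buildTestUser`, the field names, and the counter-based IDs are hypothetical and are not taken from `apps/api/src/test/factories/`.

```typescript
interface TestOrg { id: string; name: string; }
interface TestUser { id: string; orgId: string; email: string; role: "owner" | "viewer"; }

// Module-level counter so that every call yields distinct identifiers.
let sequence = 0;

// Builds a fresh organization; overrides let a test pin specific fields.
function buildTestOrg(overrides: Partial<TestOrg> = {}): TestOrg {
  sequence += 1;
  return { id: `org-${sequence}`, name: `bilko_test org ${sequence}`, ...overrides };
}

// Builds a user that belongs to the given organization.
function buildTestUser(org: TestOrg, overrides: Partial<TestUser> = {}): TestUser {
  sequence += 1;
  return {
    id: `user-${sequence}`,
    orgId: org.id,
    email: `user-${sequence}@example.test`,
    role: "owner",
    ...overrides,
  };
}
```

Because each call produces new identifiers, two tests that both build an org never share rows, which is what keeps one test's data from leaking into another.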

6. Coverage Requirements

Layer	Lines	Branches	Functions	Enforcement
Financial logic (VAT, double-entry, currency)	≥ 95%	≥ 90%	100%	CI hard fail
Authentication utils	≥ 95%	≥ 90%	100%	CI hard fail
API handlers	≥ 80%	≥ 75%	≥ 80%	CI hard fail
Utilities	≥ 90%	≥ 85%	≥ 90%	CI hard fail
Overall minimum	≥ 80%	≥ 75%	≥ 80%	CI hard fail

Coverage enforcement: Vitest coverage thresholds in vitest.config.ts. CI pipeline fails if below threshold.
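As an illustration of how the table above could map onto configuration, the fragment below assumes a recent Vitest release in which thresholds live under `test.coverage.thresholds` and the built-in provider is named `v8` (older releases used `c8`). The glob paths are hypothetical placeholders, not Bilko's real directory layout.

```typescript
// vitest.config.ts (sketch). A plain object export keeps the fragment self-contained.
export default {
  test: {
    coverage: {
      provider: "v8",
      thresholds: {
        // Overall minimum from the table above.
        lines: 80,
        branches: 75,
        functions: 80,
        // Hypothetical globs for the stricter layers.
        "packages/core/src/financial/**": { lines: 95, branches: 90, functions: 100 },
        "packages/core/src/auth/**": { lines: 95, branches: 90, functions: 100 },
        "packages/core/src/utils/**": { lines: 90, branches: 85, functions: 90 },
      },
    },
  },
};
```

With thresholds set this way, the coverage run exits non-zero when any threshold is missed, which is what lets the CI step fail.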

7. Quality Gates

PR Merge Gate

All unit tests pass
All integration tests pass
Coverage ≥ minimum thresholds
Linting passes (ESLint + Prettier)
Type checking passes (TypeScript strict)
No new HIGH/CRITICAL security findings

Staging Deploy Gate

All PR gates passed
Build artifact created successfully

Production Deploy Gate

All E2E tests pass on staging
Performance baseline not degraded > 20%
Manual approval in CI pipeline
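The 20% performance rule in the gate above can be expressed as a small check. This sketch assumes a single latency figure in whole milliseconds is compared against a stored baseline; the function name, the choice of metric, and the integer comparison are illustrative assumptions.

```typescript
// Returns true when the current measurement is more than maxPercent slower than the baseline.
// Integer comparison avoids floating-point edge cases at exactly the limit.
function isPerformanceRegression(
  baselineMs: number,
  currentMs: number,
  maxPercent: number = 20,
): boolean {
  if (baselineMs <= 0) {
    throw new RangeError("baselineMs must be positive");
  }
  return (currentMs - baselineMs) * 100 > baselineMs * maxPercent;
}
```

For a 200 ms baseline, 240 ms is exactly 20% slower and still passes the gate, while 241 ms fails it.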

8. Responsibility Matrix

Test Type	Writes	Reviews	Maintains	Signs Off
Unit tests	Developer	PR reviewer	Developer	Tech Lead
Integration tests	Developer	QA / Tech Lead	Developer	Tech Lead
E2E tests	Developer	Tech Lead	Developer	Tech Lead
Performance tests	DevOps	Tech Lead	DevOps	Alem Bašić

9. Test Reporting & Metrics

Metric	Target
Test pass rate	≥ 99% unit, ≥ 95% E2E
Flaky test rate	< 2%
Full suite execution time	< 10 min
Coverage trend	Stable or improving per sprint
Financial logic coverage	≥ 95% at all times

10. Continuous Testing in CI/CD

Stage	Tests Run	Blocking
Pre-commit (local)	lint + type-check only	Recommended (Husky)
PR open/update	unit + integration + lint + type-check	Yes (blocks merge)
Staging deploy	E2E (Playwright, 3 browsers)	Yes (blocks production)
Production deploy	Smoke tests (`npm run test:real-demo-smoke`)	Yes (auto-rollback on failure)
Nightly (PLANNED)	Full E2E suite + performance	No (alerts only)


Approval

Role	Name	Date
Author	Ops Architect	2026-02-23
Reviewer	Tech Lead
Approver	Alem Bašić
