Test Strategy
- Project: Bilko
- Version: 1.1
- Date: 2026-05-21
- Author: Ops Architect / ALAI Documentation Team
- Status: Active
- Reviewers: Tech Lead, Alem Bašić
Document History
| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | 2026-02-23 | Ops Architect | Initial draft |
| 1.0 | 2026-02-25 | ALAI Documentation Team | Finalized — approved for production use |
| 1.1 | 2026-05-21 | ALAI Documentation Team | Clarified industry/Spotify-style layered strategy, real-demo smoke gate, and full-demo rehearsal policy |
1. Testing Philosophy & Principles
Financial software has a higher correctness bar than typical web apps. A bug in VAT calculation or double-entry bookkeeping is not a UX inconvenience — it's a compliance failure that could expose Bilko users to tax liability or audit findings.
Core Principles:
- Financial logic is P0 — VAT calculations, double-entry balance, NUMERIC precision are tested at >95% coverage before any feature ships
- Tests are first-class code — reviewed, maintained, and refactored alongside production code
- Test the behavior, not the implementation — tests enable safe refactoring of internals
- Fast feedback — unit tests run in < 3 min; full suite < 10 min
- No test = no ship — financial logic without a test is a P0 blocker for merging
- Isolation — every test cleans up after itself; no test depends on another
Testing philosophy: Bilko follows an industry-standard layered strategy: focused unit coverage for financial calculations and business logic, strong integration/contract coverage for Ktor API + PostgreSQL behavior, and a small Playwright layer for critical user journeys. This is compatible with Spotify's public "testing honeycomb" direction: most confidence should come from service interaction tests and contracts, not from trying to automate every behavior through a browser. We do not aim for 100% E2E coverage.
2. Layered Test Strategy
- Playwright E2E / Demo Smoke: critical browser journeys + deployed health evidence
- Integration + Contract Tests (dominant): Ktor routes, services, PostgreSQL/Testcontainers, auth, RBAC, multi-tenant isolation, frontend/backend API contract
- Focused Unit Tests: financial engine, VAT, currency, validators, pure logic
Target distribution:
- Unit tests: financial logic and pure business rules, especially packages/core and accounting/tax code
- Integration/contract tests: API routes, DB behavior, auth/session boundaries, RBAC, org isolation, frontend/backend API compatibility
- Critical Playwright E2E: a small, maintained set for invoice, expense, report, auth, and settings flows
- Real-demo smoke: non-destructive deployed health check against https://bilko-demo.alai.no
- Full-demo rehearsal: resettable demo tenant/environment used for stakeholder demos; not the deploy gate
3. Testing Tools
| Type | Tool | Version | Purpose | Config |
|---|---|---|---|---|
| Unit testing | Vitest + Kotlin/JUnit | Latest | Business logic, utilities, services | package configs / Gradle |
| Mocking | Vitest, MockK | — | Mock external deps where appropriate | Built-in / Gradle |
| Integration testing | Ktor test host + JUnit | Latest | API endpoint testing with PostgreSQL | apps/api/build.gradle.kts |
| Test database | PostgreSQL | 15/16 | Real database via local DB/Testcontainers | Gradle integrationTest |
| E2E testing | Playwright | Latest | Browser automation, critical user flows | apps/e2e/playwright.config.ts |
| Coverage | Kover + Vitest | — | Coverage reports and ratchets | Gradle Kover / package configs |
| Performance | k6 | Latest | Load testing (PLANNED Phase 2) | apps/e2e/load/ |
Why Vitest (not Jest)
- ESM native, Vite-based → faster
- Compatible with Turborepo
- Watch mode with HMR
- Same API as Jest (easy migration)
Why Playwright (not Cypress)
- Multi-browser: Chromium, Firefox, WebKit (Safari)
- Auto-wait (no flaky tests from race conditions)
- Parallel execution (workers: 4)
- Video and trace on failure
4. Test Scope by Layer
4.1 Unit Tests (Vitest)
| Attribute | Value |
|---|---|
| Scope | Pure functions: VAT calculation, double-entry validation, currency conversion, invoice totals, date utils, number formatting |
| External dependencies | Mocked — no real DB, network, or filesystem |
| Coverage target | > 95% for financial logic; > 90% utilities; > 80% services; > 80% overall |
| Execution time | < 3 minutes |
| Runs on | CI on every pushed commit; the local pre-commit hook runs lint + type-check only |
| Written by | Developer who writes the feature |
What to unit test:
- `calculateVAT(amount, rate, country)`: Serbia 20%, BiH 17%, Croatia 25%
- `validateDoubleEntry(debit, credit)`: must be equal, error on imbalance
- `convertCurrency(amount, fromCurrency, toCurrency, exchangeRate)`: NUMERIC(19,4) precision
- `calculateInvoiceTotal(items)`: subtotal, tax, discount, total
- `lockExchangeRate(date, fromCurrency, toCurrency)`: historical rate, not today's
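As a rough illustration of what these unit tests exercise, the sketch below models three of the listed functions with integer arithmetic. It is not the packages/core implementation: the signatures are simplified, the fixed-point scale, the basis-point rate table, and the half-up rounding rule are all assumptions made for the example.

```typescript
// Amounts are plain integers in 1/10000 of a currency unit, mirroring NUMERIC(19,4).
// Assumption: production code would use a decimal or big-integer type; plain numbers
// are only safe while every intermediate product stays within the safe integer range.
const SCALE = 10000;

type Country = "RS" | "BA" | "HR";

// Standard rates from the list above, in basis points (20% = 2000).
const STANDARD_RATE_BPS: Record<Country, number> = {
  RS: 2000, // Serbia 20%
  BA: 1700, // BiH 17%
  HR: 2500, // Croatia 25%
};

// Hypothetical simplified signature: the rate is looked up from the country.
function calculateVAT(amount: number, country: Country): number {
  if (Math.floor(amount) !== amount || amount < 0) {
    throw new Error("amount must be a non-negative integer in 1/10000 units");
  }
  // Assumed rounding rule: half-up. Math.round rounds halves up for non-negative input.
  return Math.round((amount * STANDARD_RATE_BPS[country]) / 10000);
}

// Throws when total debits and total credits differ.
function validateDoubleEntry(debit: number, credit: number): void {
  if (debit !== credit) {
    throw new Error("Unbalanced entry: debit " + debit + " != credit " + credit);
  }
}

// Hypothetical simplified signature: converts at an already locked historical rate.
// The rate uses the same 1/10000 scale as the amounts.
function convertCurrency(amount: number, lockedRate: number): number {
  return Math.round((amount * lockedRate) / SCALE);
}
```

With these assumptions, `calculateVAT(1000000, "RS")` (100.0000 at 20%) yields `200000` (20.0000), and `validateDoubleEntry(100, 90)` throws. Keeping amounts as integers means a test can assert exact equality instead of a floating-point tolerance.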
What NOT to unit test:
- Framework internals (Ktor, Next.js, Exposed, JDBC)
- Simple property getters/setters with no logic
- Full browser journeys that belong in Playwright E2E or demo smoke
4.2 Integration + Contract Tests
| Attribute | Value |
|---|---|
| Scope | Ktor API routes, service boundaries, PostgreSQL behavior, contracts |
| External dependencies | Real PostgreSQL via local DB or Testcontainers where needed |
| Coverage target | All service boundaries; > 80% of integration paths |
| Execution time | < 10 minutes for blocking gate |
| Runs on | Every PR / deploy gate, blocking merge where configured |
| Written by | Developer who writes the API endpoint or client contract |
What to integration test:
- Auth flow (register, login, refresh, logout)
- Invoice CRUD + status transitions (draft → sent → paid)
- Expense CRUD + approval flow
- Reports API (P&L, VAT, balance sheet)
- Organization scoping — org A cannot read org B's data (P0 security test)
- RBAC enforcement — viewer cannot create, owner can delete
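The last two items are properties that can be stated very compactly. The sketch below uses an in-memory store rather than the Ktor API and PostgreSQL, so it only illustrates the assertions an integration test would make. The `InvoiceStore` class, the permission matrix, and the choice of 404 for cross-organization reads are assumptions for the example, not Bilko's actual behavior.

```typescript
type Role = "viewer" | "owner";
type Action = "read" | "create" | "delete";

interface Session { userId: string; orgId: string; role: Role }
interface Invoice { id: string; orgId: string }

// Hypothetical permission matrix covering only the two roles named above.
const PERMISSIONS: Record<Role, Action[]> = {
  viewer: ["read"],
  owner: ["read", "create", "delete"],
};

function can(role: Role, action: Action): boolean {
  return PERMISSIONS[role].indexOf(action) !== -1;
}

// In-memory stand-in for the API; methods return HTTP-like status codes.
class InvoiceStore {
  private rows: Invoice[] = [];

  create(session: Session, id: string): number {
    if (!can(session.role, "create")) return 403;
    this.rows.push({ id: id, orgId: session.orgId });
    return 201;
  }

  // Every lookup is filtered by the caller's organization.
  read(session: Session, id: string): number {
    if (!can(session.role, "read")) return 403;
    const visible = this.rows.filter(
      (r) => r.orgId === session.orgId && r.id === id
    );
    return visible.length === 1 ? 200 : 404;
  }

  remove(session: Session, id: string): number {
    if (!can(session.role, "delete")) return 403;
    const before = this.rows.length;
    this.rows = this.rows.filter(
      (r) => !(r.orgId === session.orgId && r.id === id)
    );
    return this.rows.length < before ? 204 : 404;
  }
}
```

A test built on this shape creates an invoice as the owner of org A, then asserts that an owner of org B gets 404 on read and delete, and that a viewer in org A can read but receives 403 on create.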
4.3 E2E Tests (Playwright)
| Attribute | Value |
|---|---|
| Scope | Critical user journeys through deployed/staging application |
| External dependencies | Real staging services; production/demo only for non-destructive smoke |
| Coverage target | Critical journeys + smoke, not exhaustive UI coverage |
| Execution time | < 8 minutes for critical gate; < 1 minute for real-demo smoke |
| Runs on | Post-staging deploy, pre-production gate, post-deploy smoke |
| Written by | Developer + QA collaboration |
Critical journeys:
- Auth/session Flow: Login → refresh/session validation → logout or session expiry behavior
- Invoice Flow: Create draft in resettable staging/demo tenant → Preview → Send/Mark Paid where safe
- Expense Flow: Add → Upload Receipt → Approve/Reject/Pay where safe
- Report Flow: Generate P&L/VAT report → Export PDF/XLSX where safe
- Settings Flow: Organization/settings/users page loads and key controls are visible
Rule: public real-demo tests must be non-destructive. Registration, deletion, rate-limit torture, invoice/expense creation, and expected-fail regressions belong in resettable staging/nightly suites, not the live demo smoke gate.
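One way to make this rule mechanical is to tag each test and route it by tag. The sketch below assumes a hypothetical tagging scheme (the tag names, suite names, and `suiteFor` helper are invented for illustration and are not part of the Bilko test suite). A test reaches the real-demo smoke gate only if it is explicitly tagged read-only and carries none of the banned categories.

```typescript
// Hypothetical tags mirroring the categories named in the rule above.
type Tag =
  | "read-only"
  | "registration"
  | "deletion"
  | "rate-limit"
  | "creates-data"
  | "expected-fail";

type Suite = "real-demo-smoke" | "resettable-staging";

const BANNED_ON_REAL_DEMO: Tag[] = [
  "registration",
  "deletion",
  "rate-limit",
  "creates-data",
  "expected-fail",
];

// Untagged tests default to the resettable suite, so forgetting a tag
// can never send a destructive test to the live demo.
function suiteFor(tags: Tag[]): Suite {
  const banned = tags.filter((t) => BANNED_ON_REAL_DEMO.indexOf(t) !== -1);
  const readOnly = tags.indexOf("read-only") !== -1;
  return readOnly && banned.length === 0
    ? "real-demo-smoke"
    : "resettable-staging";
}

// Guard a smoke run can call before executing a test against the live demo.
function assertSafeForRealDemo(testName: string, tags: Tag[]): void {
  if (suiteFor(tags) !== "real-demo-smoke") {
    throw new Error(testName + " is not allowed in the real-demo smoke gate");
  }
}
```

Defaulting to the resettable suite is a deliberate choice in this sketch: the failure mode of a missing tag is a test that runs somewhere safe, not one that mutates the public demo.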
5. Test Data Management
| Approach | Used For | Tool | Cleanup |
|---|---|---|---|
| Test factories | Unit + integration | apps/api/src/test/factories/ | Per-test (beforeEach teardown) |
| Database seeding | E2E/full-demo rehearsal | Flyway fixtures / seed scripts / API factories | Per resettable environment run |
| PostgreSQL transactions | Integration tests | Test transaction or teardown helpers | Per test |
Isolation rule: integration tests must create isolated organization/user fixtures and clean up deterministically. Cross-test dependence is forbidden.
Test org pattern: Each integration test creates a fresh bilko_test organization and user to prevent cross-test contamination.
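The pattern can be sketched as a small factory plus a registry that remembers what it created. Everything here is hypothetical (the `createTestFixture` and `FixtureRegistry` names, the ID format, and the `deleteOrg` callback are not taken from apps/api/src/test/factories/); it only shows the shape of unique fixtures with deterministic cleanup.

```typescript
interface TestOrg { id: string; name: string }
interface TestUser { id: string; orgId: string; email: string }
interface TestFixture { org: TestOrg; user: TestUser }

let fixtureSequence = 0;

// Builds a fresh organization and user whose names never repeat within a run.
function createTestFixture(): TestFixture {
  fixtureSequence += 1;
  const suffix = Date.now().toString(36) + "_" + fixtureSequence;
  const org: TestOrg = { id: "org_" + suffix, name: "bilko_test_" + suffix };
  const user: TestUser = {
    id: "user_" + suffix,
    orgId: org.id,
    email: "user_" + suffix + "@example.test",
  };
  return { org: org, user: user };
}

// Tracks fixtures so teardown can remove exactly what the test created.
class FixtureRegistry {
  private created: TestFixture[] = [];
  private deleteOrg: (orgId: string) => void;

  constructor(deleteOrg: (orgId: string) => void) {
    this.deleteOrg = deleteOrg;
  }

  create(): TestFixture {
    const fixture = createTestFixture();
    this.created.push(fixture);
    return fixture;
  }

  // Deletes in reverse creation order, then forgets everything.
  cleanup(): void {
    this.created
      .slice()
      .reverse()
      .forEach((f) => this.deleteOrg(f.org.id));
    this.created = [];
  }
}
```

In a test file, `create()` would be called in the setup hook and `cleanup()` in the teardown hook, with `deleteOrg` wired to whatever removes an organization and its dependent rows from the test database.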
6. Coverage Requirements
| Layer | Lines | Branches | Functions | Enforcement |
|---|---|---|---|---|
| Financial logic (VAT, double-entry, currency) | ≥ 95% | ≥ 90% | 100% | CI hard fail |
| Authentication utils | ≥ 95% | ≥ 90% | 100% | CI hard fail |
| API handlers | ≥ 80% | ≥ 75% | ≥ 80% | CI hard fail |
| Utilities | ≥ 90% | ≥ 85% | ≥ 90% | CI hard fail |
| Overall minimum | ≥ 80% | ≥ 75% | ≥ 80% | CI hard fail |
Coverage enforcement: thresholds are configured as Vitest coverage thresholds in vitest.config.ts, and the CI pipeline fails when coverage falls below any of them.
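To make the "CI hard fail" column concrete, the table can be read as data plus a comparison. The sketch below is not the Vitest configuration itself; the layer keys and the `coverageFailures` helper are hypothetical, and the percentages are copied from the table above.

```typescript
interface Coverage { lines: number; branches: number; functions: number }

// Minimum percentages per layer, copied from the coverage table.
const THRESHOLDS: Record<string, Coverage> = {
  "financial-logic": { lines: 95, branches: 90, functions: 100 },
  "auth-utils": { lines: 95, branches: 90, functions: 100 },
  "api-handlers": { lines: 80, branches: 75, functions: 80 },
  "utilities": { lines: 90, branches: 85, functions: 90 },
  "overall": { lines: 80, branches: 75, functions: 80 },
};

// Returns one message per metric that is below its threshold.
// An empty array means the layer passes the gate.
function coverageFailures(layer: string, actual: Coverage): string[] {
  const required = THRESHOLDS[layer];
  if (!required) return ["unknown layer: " + layer];

  const failures: string[] = [];
  const metrics: (keyof Coverage)[] = ["lines", "branches", "functions"];
  metrics.forEach((metric) => {
    if (actual[metric] < required[metric]) {
      failures.push(
        layer + " " + metric + ": " + actual[metric] + "% < " + required[metric] + "%"
      );
    }
  });
  return failures;
}
```

For example, financial logic at 96% lines, 91% branches, and 99% functions fails on the functions metric alone, which is enough to fail the pipeline.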
7. Quality Gates
PR Merge Gate
- All unit tests pass
- All integration tests pass
- Coverage ≥ minimum thresholds
- Linting passes (ESLint + Prettier)
- Type checking passes (TypeScript strict)
- No new HIGH/CRITICAL security findings
Staging Deploy Gate
- All PR gates passed
- Build artifact created successfully
Production Deploy Gate
- Critical E2E gate passes on staging/resettable environment
- Real-demo smoke passes after deploy with screenshot/video evidence
- Performance baseline not degraded > 20% for relevant changes
- Manual approval in CI pipeline
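The 20% performance rule is simple arithmetic, sketched below. The source does not say which metric the baseline refers to, so the example assumes a latency-style metric in whole milliseconds where higher is worse; the `isDegraded` helper is hypothetical.

```typescript
// Returns true when the current value is worse than the baseline by more than
// the tolerance. Assumption: a latency-style metric in integer milliseconds.
function isDegraded(
  baselineMs: number,
  currentMs: number,
  tolerancePercent: number = 20
): boolean {
  if (baselineMs <= 0) {
    throw new Error("baseline must be positive");
  }
  // Cross-multiplied form of (current - baseline) / baseline > tolerance / 100,
  // which keeps the comparison exact for integer inputs.
  return (currentMs - baselineMs) * 100 > baselineMs * tolerancePercent;
}
```

With a 200 ms baseline, 240 ms is exactly 20% slower and still passes, while 241 ms fails the gate. Improvements such as 180 ms never count as degradation.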
8. Responsibility Matrix
| Test Type | Writes | Reviews | Maintains | Signs Off |
|---|---|---|---|---|
| Unit tests | Developer | PR reviewer | Developer | Tech Lead |
| Integration tests | Developer | QA / Tech Lead | Developer | Tech Lead |
| E2E tests | Developer | Tech Lead | Developer | Tech Lead |
| Performance tests | DevOps | Tech Lead | DevOps | Alem Bašić |
9. Test Reporting & Metrics
| Metric | Target |
|---|---|
| Test pass rate | ≥ 99% unit, ≥ 95% E2E |
| Flaky test rate | < 2% |
| Full suite execution time | < 10 min |
| Coverage trend | Stable or improving per sprint |
| Financial logic coverage | ≥ 95% at all times |
10. Continuous Testing in CI/CD
| Stage | Tests Run | Blocking |
|---|---|---|
| Pre-commit (local) | lint + type-check only | Recommended (Husky) |
| PR open/update | unit + integration + lint + type-check | Yes — blocks merge |
| Staging deploy | Critical E2E (Playwright, Chromium primary; other browsers scheduled/risk-based) | Yes — blocks production |
| Production deploy | Real-demo smoke (npm run test:real-demo-smoke) with evidence | Yes (rollback/escalate on failure) |
| Nightly / scheduled | Full E2E regression + destructive/resettable tests + performance | No — alerts/issues, not automatic deploy blocker |
Related Documents
- Test Plan
- E2E Test Plan
- Performance Test Plan
- Definition of Done
- CI/CD Pipeline
- TESTING-GUIDE.md
- TEST-INVENTORY.md
- Demo Testing Plan
Approval
| Role | Name | Date | Signature |
|---|---|---|---|
| Author | Ops Architect | 2026-02-23 | |
| Reviewer | Tech Lead | | |
| Approver | Alem Bašić | | |