Test Strategy
Project: Bilko Version: 1.1 Date: 2026-05-21 Author: Ops Architect / ALAI Documentation Team Status: Active Reviewers: Tech Lead, Alem Bašić
Document History
| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | 2026-02-23 | Ops Architect | Initial draft |
| 1.0 | 2026-02-25 | ALAI Documentation Team | Finalized — approved for production use |
| 1.1 | 2026-05-21 | ALAI Documentation Team | Clarified industry/Spotify-style layered strategy, real-demo smoke gate, and full-demo rehearsal policy |
1. Testing Philosophy & Principles
Financial software has a higher correctness bar than typical web apps. A bug in VAT calculation or double-entry bookkeeping is not a UX inconvenience — it's a compliance failure that could expose Bilko users to tax liability or audit findings.
Core Principles:
- Financial logic is P0 — VAT calculations, double-entry balance, NUMERIC precision are tested at >95% coverage before any feature ships
- Tests are first-class code — reviewed, maintained, and refactored alongside production code
- Test the behavior, not the implementation — tests enable safe refactoring of internals
- Fast feedback — unit tests run in < 3 min; full suite < 10 min
- No test = no ship — financial logic without a test is a P0 blocker for merging
- Isolation — every test cleans up after itself; no test depends on another
Testing philosophy: Bilko follows an industry-standard layered strategy: focused unit test coverage for financial calculations and business logic, strong integration/contract coverage for Ktor API + PostgreSQL behavior, and a small Playwright layer for the critical user journeys. This is compatible with Spotify's public "testing honeycomb" direction: most confidence should come from service interaction tests and contracts, not from trying to automate every behavior through a browser. We do not aim for 100% E2E coverage.
2. Layered Test Strategy
- E2E / Demo Smoke (top): critical browser journeys + deployed health evidence
- Integration + Contract Tests (dominant, middle): Ktor routes, services, PostgreSQL/Testcontainers, auth, RBAC, multi-tenant isolation, frontend/backend API contract
- Focused Unit Tests (base): financial engine, VAT, currency, validators, pure logic
Target distribution:
- 60% Unit tests: financial logic and pure business rules, especially packages/core and accounting/tax code
- 30% Integration/contract tests: API routes, DB behavior, auth/session boundaries, RBAC, org isolation, frontend/backend API compatibility
- 10% Critical Playwright E2E: a small, maintained set for invoice, expense, report, auth, and settings flows
- Real-demo smoke: non-destructive deployed health check against https://bilko-demo.alai.no
- Full-demo rehearsal: resettable demo tenant/environment used for stakeholder demos; not the deploy gate
3. Testing Tools
| Type | Tool | Version | Purpose | Config |
|---|---|---|---|---|
| Unit testing | Vitest + Kotlin/JUnit | Latest | Business logic | package configs / Gradle |
| Mocking | | n/a | Mock external deps | Built-in / Gradle |
| Integration testing | | Latest | API endpoint testing | apps/api/ |
| Test database | PostgreSQL | | Real database | Gradle |
| E2E testing | Playwright | Latest | Browser automation, critical user flows | apps/e2e/playwright.config.ts |
| Coverage | | n/a | Coverage reports and ratchets | Gradle Kover / package configs |
| Performance | k6 | Latest | Load testing (PLANNED Phase 2) | apps/e2e/load/ |
Why Vitest (not Jest)
- ESM native, Vite-based → faster
- Compatible with Turborepo
- Watch mode with HMR
- Same API as Jest (easy migration)
Why Playwright (not Cypress)
- Multi-browser: Chromium, Firefox, WebKit (Safari)
- Auto-wait (no flaky tests from race conditions)
- Parallel execution (workers: 4)
- Video and trace on failure
4. Test Scope by Layer
4.1 Unit Tests (Vitest)
| Attribute | Value |
|---|---|
| Scope | Pure functions: VAT calculation, double-entry validation, currency conversion, invoice totals, date utils, number formatting |
| External dependencies | Mocked — no real DB, network, or filesystem |
| Coverage target | > 95% for financial logic; > 90% utilities; > 80% services; > 80% overall |
| Execution time | < 3 minutes |
| Runs on | Every commit (CI on every push); the local pre-commit hook runs lint + type-check only, not the tests |
| Written by | Developer who writes the feature |
What to unit test:
- calculateVAT(amount, rate, country): Serbia 20%, BiH 17%, Croatia 25%
- validateDoubleEntry(debit, credit): must be equal, error on imbalance
- convertCurrency(amount, fromCurrency, toCurrency, exchangeRate): NUMERIC(19,4)
- calculateInvoiceTotal(items): subtotal, tax, discount, total
- lockExchangeRate(date, fromCurrency, toCurrency): historical rate, not today's
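As a rough illustration of the first two targets, here is a minimal sketch in TypeScript. The function names come from the list above; everything else is an assumption made for the example, not Bilko's actual implementation: the simplified signatures, the use of integer minor units instead of NUMERIC values, the country codes, and the round-half-up policy.

```typescript
// Sketch only: signatures, minor-unit amounts, and rounding policy are assumptions.
type Country = "RS" | "BA" | "HR";

// Standard VAT rates quoted in this document (Serbia, BiH, Croatia).
const VAT_PERCENT: Record<Country, number> = { RS: 20, BA: 17, HR: 25 };

// netMinor is an integer amount in minor units (e.g. 10000 = 100.00).
function calculateVAT(netMinor: number, country: Country): number {
  if (Math.floor(netMinor) !== netMinor || netMinor < 0) {
    throw new RangeError("net amount must be a non-negative integer in minor units");
  }
  // Integer arithmetic, rounded half up to the nearest minor unit.
  return Math.floor((netMinor * VAT_PERCENT[country] + 50) / 100);
}

// Throws when total debits and total credits of an entry differ.
function validateDoubleEntry(debitMinor: number, creditMinor: number): void {
  if (debitMinor !== creditMinor) {
    throw new Error(`Unbalanced entry: debit ${debitMinor} != credit ${creditMinor}`);
  }
}
```

A unit test for this layer would assert exact values at rounding boundaries (for example a net amount whose VAT lands on half a minor unit) and assert that an imbalance raises an error.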
What NOT to unit test:
- Framework internals (Ktor, Next.js, Exposed, JDBC)
- Simple property getters/setters with no logic
- Full browser journeys that belong in Playwright E2E or demo smoke
4.2 Integration + Contract Tests (Supertest)
| Attribute | Value |
|---|---|
| Scope | |
| External dependencies | Real PostgreSQL local DB |
| Coverage target | All service boundaries; > 80% of integration paths |
| Execution time | < |
| Runs on | Every |
| Written by | Developer who writes the API endpoint or client contract |
What to integration test:
- Auth flow (register, login, refresh, logout)
- Invoice CRUD + status transitions (draft → sent → paid)
- Expense CRUD + approval flow
- Reports API (P&L, VAT, balance sheet)
- Organization scoping — org A cannot read org B's data (P0 security test)
- RBAC enforcement — viewer cannot create, owner can delete
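The organization-scoping check is the one most worth spelling out. The sketch below shows the shape of the assertion using a hypothetical in-memory store; the real test would run against the Ktor API and PostgreSQL and expect a not-found or forbidden response. The type names, method names, and IDs are all invented for illustration.

```typescript
// Hypothetical in-memory stand-in for an org-scoped invoice repository.
interface Invoice {
  id: string;
  orgId: string;
  totalMinor: number;
}

class InvoiceStore {
  private rows: Invoice[] = [];

  add(invoice: Invoice): void {
    this.rows.push(invoice);
  }

  // Every read takes the caller's orgId; there is no unscoped lookup.
  findById(callerOrgId: string, id: string): Invoice | undefined {
    for (const row of this.rows) {
      if (row.id === id && row.orgId === callerOrgId) return row;
    }
    return undefined;
  }
}

// Org A can read its own invoice; org B asking for the same id gets nothing.
function orgIsolationHolds(store: InvoiceStore): boolean {
  store.add({ id: "inv-1", orgId: "org-a", totalMinor: 12000 });
  const ownRead = store.findById("org-a", "inv-1");
  const foreignRead = store.findById("org-b", "inv-1");
  return ownRead !== undefined && foreignRead === undefined;
}
```

The point of the design is that the scoping parameter is mandatory on every read path, so a missing filter shows up as a compile error or a failing test rather than as a data leak.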
4.3 E2E Tests (Playwright)
| Attribute | Value |
|---|---|
| Scope | |
| External dependencies | Real |
| Coverage target | |
| Execution time | < 8 minutes for critical gate; < 1 minute for real-demo smoke |
| Runs on | Post-staging deploy, pre-production |
| Written by | Developer + QA collaboration |
Critical journeys:
- Auth/session Flow: Login → refresh/session validation → logout or session expiry behavior
- Invoice Flow: Create draft in resettable staging/demo tenant → Preview → Send/Mark Paid where safe
- Expense Flow: Add → Upload Receipt → Approve/Reject/Pay where safe
- Report Flow: Generate P&L/VAT report → Export PDF/XLSX where safe
- Settings Flow: Organization/settings/users page loads and key controls are visible
Rule: public real-demo tests must be non-destructive. Registration, deletion, rate-limit torture, invoice/expense creation, and expected-fail regressions belong in resettable staging/nightly suites, not the live demo smoke gate.
5. Test Data Management
| Approach | Used For | Tool | Cleanup |
|---|---|---|---|
| Test factories | Unit + integration | apps/api/src/test/factories/ | Per-test (beforeEach teardown) |
| Database seeding | | Flyway fixtures / seed scripts / API factories | Per |
| PostgreSQL transactions | Integration tests | | Per test |
Isolation rule: beforeEach in integration tests must create isolated organization/user fixtures and clean up deterministically. Cross-test dependence is forbidden.
Test org pattern: Each integration test creates a fresh bilko_test organization and user to prevent cross-test contamination.
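One way to picture the fresh-organization pattern is a factory that never returns the same identifiers twice. This is a hypothetical sketch: the field names, the counter-based IDs, and the example.test email domain are assumptions, and a real factory would insert rows into the test database rather than build plain objects.

```typescript
// Hypothetical fixture factory; shapes and naming are illustrative assumptions.
interface TestOrg {
  id: string;
  name: string;
}

interface TestUser {
  id: string;
  orgId: string;
  email: string;
  role: "owner" | "viewer";
}

interface TestFixture {
  org: TestOrg;
  user: TestUser;
}

let fixtureSequence = 0;

// Each call yields a new organization and a user bound to it, so two tests
// can never share tenant data by accident.
function createTestOrgWithUser(role: "owner" | "viewer" = "owner"): TestFixture {
  fixtureSequence += 1;
  const org: TestOrg = {
    id: `org-${fixtureSequence}`,
    name: `bilko_test_${fixtureSequence}`,
  };
  const user: TestUser = {
    id: `user-${fixtureSequence}`,
    orgId: org.id,
    email: `user${fixtureSequence}@example.test`,
    role,
  };
  return { org, user };
}
```

Called from beforeEach, a factory like this gives each test its own tenant, which also makes the cleanup step easy to scope to one organization.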
6. Coverage Requirements
| Layer | Lines | Branches | Functions | Enforcement |
|---|---|---|---|---|
| Financial logic (VAT, double-entry, currency) | ≥ 95% | ≥ 90% | 100% | CI hard fail |
| Authentication utils | ≥ 95% | ≥ 90% | 100% | CI hard fail |
| API handlers | ≥ 80% | ≥ 75% | ≥ 80% | CI hard fail |
| Utilities | ≥ 90% | ≥ 85% | ≥ 90% | CI hard fail |
| Overall minimum | ≥ 80% | ≥ 75% | ≥ 80% | CI hard fail |
Coverage enforcement: Vitest coverage thresholds are set in vitest.config.ts. The CI pipeline fails if coverage falls below any threshold.
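The gate's logic can be sketched independently of any tool. The example below does not use Vitest's or Kover's configuration API; it only illustrates how measured coverage compares against the per-layer thresholds in the table above. The layer keys and the function name are hypothetical.

```typescript
interface Coverage {
  lines: number;
  branches: number;
  functions: number;
}

// Thresholds copied from the coverage table; the layer keys are hypothetical names.
const THRESHOLDS: Record<string, Coverage> = {
  financial: { lines: 95, branches: 90, functions: 100 },
  auth: { lines: 95, branches: 90, functions: 100 },
  apiHandlers: { lines: 80, branches: 75, functions: 80 },
  utilities: { lines: 90, branches: 85, functions: 90 },
  overall: { lines: 80, branches: 75, functions: 80 },
};

// Returns human-readable failures; an empty array means the gate passes.
function coverageFailures(layer: string, actual: Coverage): string[] {
  const required = THRESHOLDS[layer];
  if (required === undefined) return [`unknown layer: ${layer}`];
  const failures: string[] = [];
  const metrics: Array<keyof Coverage> = ["lines", "branches", "functions"];
  for (const metric of metrics) {
    if (actual[metric] < required[metric]) {
      failures.push(`${layer} ${metric}: ${actual[metric]}% < ${required[metric]}%`);
    }
  }
  return failures;
}
```

A CI step built on this idea would exit non-zero whenever the returned list is non-empty, which is what "CI hard fail" means in the table.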
7. Quality Gates
PR Merge Gate
- All unit tests pass
- All integration tests pass
- Coverage ≥ minimum thresholds
- Linting passes (ESLint + Prettier)
- Type checking passes (TypeScript strict)
- No new HIGH/CRITICAL security findings
Staging Deploy Gate
- All PR gates passed
- Build artifact created successfully
Production Deploy Gate
- Critical E2E gate passes on staging/resettable environment
- Real-demo smoke passes after deploy with screenshot/video evidence
- Performance baseline not degraded > 20% for relevant changes
- Manual approval in CI pipeline
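The 20% performance rule in the production gate reduces to a single comparison. The sketch below assumes the baseline and current values are latencies in milliseconds (for example a p95 figure); the document does not say which metric is used, so treat the parameter names as placeholders.

```typescript
// Returns true when the current measurement is more than maxDegradation
// (a fraction, 0.2 = 20%) worse than the baseline. Higher values are worse.
function isPerformanceDegraded(
  baselineMs: number,
  currentMs: number,
  maxDegradation: number = 0.2
): boolean {
  if (baselineMs <= 0) {
    throw new RangeError("baseline must be positive");
  }
  return (currentMs - baselineMs) / baselineMs > maxDegradation;
}
```

With a 100 ms baseline, 120 ms sits exactly on the limit and passes, while 121 ms fails the gate.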
8. Responsibility Matrix
| Test Type | Writes | Reviews | Maintains | Signs Off |
|---|---|---|---|---|
| Unit tests | Developer | PR reviewer | Developer | Tech Lead |
| Integration tests | Developer | QA / Tech Lead | Developer | Tech Lead |
| E2E tests | Developer | Tech Lead | Developer | Tech Lead |
| Performance tests | DevOps | Tech Lead | DevOps | Alem Bašić |
9. Test Reporting & Metrics
| Metric | Target |
|---|---|
| Test pass rate | ≥ 99% unit, ≥ 95% E2E |
| Flaky test rate | < 2% |
| Full suite execution time | < 10 min |
| Coverage trend | Stable or improving per sprint |
| Financial logic coverage | ≥ 95% at all times |
10. Continuous Testing in CI/CD
| Stage | Tests Run | Blocking |
|---|---|---|
| Pre-commit (local) | lint + type-check only | Recommended (Husky) |
| PR open/update | unit + integration + lint + type-check | Yes — blocks merge |
| Staging deploy | Critical E2E (Playwright) | Yes, blocks production |
| Production deploy | Real-demo smoke (npm run test:real-demo-smoke) with evidence | Yes |
| Nightly | Full E2E | No |
Related Documents
- Test Plan
- E2E Test Plan
- Performance Test Plan
- Definition of Done
- CI/CD Pipeline
- TESTING-GUIDE.md
- TEST-INVENTORY.md
- Demo Testing Plan
Approval
| Role | Name | Date | Signature |
|---|---|---|---|
| Author | Ops Architect | 2026-02-23 | |
| Reviewer | Tech Lead | | |
| Approver | Alem Bašić | | |