Test Strategy
Project: Bilko
Version: 1.1
Date: 2026-05-21
Author: Ops Architect / ALAI Documentation Team
Status: Active
Reviewers: Tech Lead, Alem Bašić
Document History
| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | 2026-02-23 | Ops Architect | Initial draft |
| 1.0 | 2026-02-25 | ALAI Documentation Team | Finalized — approved for production use |
1. Testing Philosophy & Principles
Financial software has a higher correctness bar than typical web apps. A bug in VAT calculation or double-entry bookkeeping is not a UX inconvenience — it's a compliance failure that could expose Bilko users to tax liability or audit findings.
Core Principles:
- Financial logic is P0 — VAT calculations, double-entry balance, NUMERIC precision are tested at >95% coverage before any feature ships
- Tests are first-class code — reviewed, maintained, and refactored alongside production code
- Test the behavior, not the implementation — tests enable safe refactoring of internals
- Fast feedback — unit tests run in < 3 min; full suite < 10 min
- No test = no ship — financial logic without a test is a P0 blocker for merging
- Isolation — every test cleans up after itself; no test depends on another
Testing philosophy: Bilko follows a layered test strategy, with focused unit test coverage for financial calculations and business logic, strong integration/contract coverage for Ktor API + PostgreSQL behavior, and a small Playwright layer for the critical user journeys. This is compatible with Spotify's public "testing honeycomb" direction: most confidence should come from service interaction tests and contracts, not from trying to automate every behavior through a browser. We do not aim for 100% E2E coverage.
2. Layered Test Strategy
E2E / Demo Smoke
    Critical browser journeys + deployed health evidence
Integration + Contract Tests (dominant)
    Ktor routes, services, PostgreSQL/Testcontainers, auth,
    RBAC, multi-tenant isolation, frontend/backend API contract
Focused Unit Tests
    Financial engine, VAT, currency, validators, pure logic
Target distribution:
- Unit tests: financial logic and pure business rules, especially packages/core and accounting/tax code
- Integration/contract tests: API routes, DB behavior, auth/session boundaries, RBAC, org isolation, frontend/backend API compatibility
- Critical Playwright E2E: a small, maintained set for invoice, expense, report, auth, and settings flows
- Real-demo smoke: non-destructive deployed health check against https://bilko-demo.alai.no
- Full-demo rehearsal: resettable demo tenant/environment used for stakeholder demos; not the deploy gate
3. Testing Tools
| Type | Tool | Version | Purpose | Config |
|---|---|---|---|---|
| Unit testing | Vitest | Latest | Business logic | vitest.config.ts |
| Mocking | | — | Mock external deps | Built-in |
| Integration testing | | Latest | API endpoint testing | apps/api/ |
| Test database | | PostgreSQL 15 | Real database | |
| E2E testing | Playwright | Latest | Browser automation | apps/e2e/playwright.config.ts |
| Coverage | | — | Coverage reports | vitest.config.ts |
| Performance | k6 | Latest | Load testing (PLANNED Phase 2) | apps/e2e/load/ |
Why Vitest (not Jest)
- ESM native, Vite-based → faster
- Compatible with Turborepo
- Watch mode with HMR
- Largely Jest-compatible API (easy migration)
Why Playwright (not Cypress)
- Multi-browser: Chromium, Firefox, WebKit (Safari)
- Auto-wait (fewer flaky tests from race conditions)
- Parallel execution (workers: 4)
- Video and trace on failure
4. Test Scope by Layer
4.1 Unit Tests (Vitest)
| Attribute | Value |
|---|---|
| Scope | Pure functions: VAT calculation, double-entry validation, currency conversion, invoice totals, date utils, number formatting |
| External dependencies | Mocked — no real DB, network, or filesystem |
| Coverage target | > 95% for financial logic; > 90% utilities; > 80% services; > 80% overall |
| Execution time | < 3 minutes |
| Runs on | Every commit and CI on every push (the local pre-commit hook itself runs lint + type-check only) |
| Written by | Developer who writes the feature |
What to unit test:
- calculateVAT(amount, rate, country): Serbia 20%, BiH 17%, Croatia 25%
- validateDoubleEntry(debit, credit): must be equal, error on imbalance
- convertCurrency(amount, fromCurrency, toCurrency, exchangeRate): NUMERIC(19,4)
- calculateInvoiceTotal(items): subtotal, tax, discount, total
- lockExchangeRate(date, fromCurrency, toCurrency): historical rate, not today's
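As an illustration of the kind of pure function these unit tests target, here is a minimal sketch of VAT and double-entry checks. It assumes amounts are stored as integers in minor units and that VAT is rounded half up; the simplified signatures, the country codes, and the rounding rule are assumptions for the example, not Bilko's actual implementation.

```typescript
// Sketch only: amounts are integers in minor units (e.g. 10000 = 100.00),
// so no binary floating point touches money. The signature is simplified
// from the calculateVAT(amount, rate, country) listed above.
type Country = "RS" | "BA" | "HR";

// Standard rates quoted in this section, in whole percent.
const VAT_RATE_PERCENT: Record<Country, number> = { RS: 20, BA: 17, HR: 25 };

function assertMinorUnits(value: number): void {
  if (Math.floor(value) !== value || value < 0) {
    throw new RangeError("amount must be a non-negative integer in minor units");
  }
}

// Returns VAT in minor units, rounding half up using integer arithmetic only.
function calculateVAT(netMinor: number, country: Country): number {
  assertMinorUnits(netMinor);
  const scaled = netMinor * VAT_RATE_PERCENT[country] + 50;
  return (scaled - (scaled % 100)) / 100;
}

// Throws unless total debits equal total credits.
function validateDoubleEntry(debits: number[], credits: number[]): void {
  const sum = (xs: number[]): number => xs.reduce((a, b) => a + b, 0);
  const debitTotal = sum(debits);
  const creditTotal = sum(credits);
  if (debitTotal !== creditTotal) {
    throw new Error("unbalanced entry: debit " + debitTotal + " != credit " + creditTotal);
  }
}
```

A unit test would then assert exact integer results, for example that a net amount of 10000 yields 2000 for Serbia, 1700 for BiH, and 2500 for Croatia, and that an entry with debits [1200] and credits [1000] is rejected.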
What NOT to unit test:
- Framework internals (Next.js, Exposed, JDBC)
- Simple property getters/setters with no logic
- Full browser journeys that belong in Playwright E2E or demo smoke
4.2 Integration + Contract Tests
| Attribute | Value |
|---|---|
| Scope | |
| External dependencies | Real PostgreSQL bilko_test DB |
| Coverage target | All service boundaries; > 80% of integration paths |
| Execution time | < |
| Runs on | Every |
| Written by | Developer who writes the API endpoint |
What to integration test:
- Auth flow (register, login, refresh, logout)
- Invoice CRUD + status transitions (draft → sent → paid)
- Expense CRUD + approval flow
- Reports API (P&L, VAT, balance sheet)
- Organization scoping — org A cannot read org B's data (P0 security test)
- RBAC enforcement — viewer cannot create, owner can delete
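The two access rules at the end of this list (organization scoping and RBAC) can be sketched as plain functions to show what the assertions look like. This is an in-memory illustration under assumed names (Session, PERMISSIONS, listInvoices are hypothetical); a real integration test would exercise the same rules through the API against the test database.

```typescript
// Hypothetical model of the two rules an integration test must prove:
// 1) data is scoped to the caller's organization, 2) roles gate actions.
type Role = "viewer" | "owner";
type Action = "read" | "create" | "delete";

interface Invoice { id: string; orgId: string; }
interface Session { userId: string; orgId: string; role: Role; }

// Permission matrix covering only the roles named in this section.
const PERMISSIONS: Record<Role, Record<Action, boolean>> = {
  viewer: { read: true, create: false, delete: false },
  owner: { read: true, create: true, delete: true },
};

function can(session: Session, action: Action): boolean {
  return PERMISSIONS[session.role][action];
}

// Returns only invoices that belong to the caller's organization.
function listInvoices(session: Session, all: Invoice[]): Invoice[] {
  if (!can(session, "read")) return [];
  return all.filter((invoice) => invoice.orgId === session.orgId);
}
```

The P0 assertion is the negative one: a session for org A must receive zero rows belonging to org B, not merely receive its own rows.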
4.3 E2E Tests (Playwright)
| Attribute | Value |
|---|---|
| Scope | |
| External dependencies | Real (staging |
| Coverage target | |
| Execution time | < 8 minutes |
| Runs on | Post-staging deploy, pre-production |
| Written by | Developer + QA collaboration |
Critical journeys:
- Auth/session Flow: Login → refresh/session validation → logout or session expiry behavior
- Invoice Flow: Create draft in resettable staging/demo tenant → Preview → Send/Mark Paid where safe
- Expense Flow: Add → Upload Receipt → Approve/Reject/Pay where safe
- Report Flow: Generate P&L/VAT report → Export PDF/XLSX where safe
- Settings Flow: Organization/settings/users page loads and key controls are visible
Rule: public real-demo tests must be non-destructive. Registration, deletion, rate-limit torture, invoice/expense creation, and expected-fail regressions belong in resettable staging/nightly suites, not the live demo smoke gate.
5. Test Data Management
| Approach | Used For | Tool | Cleanup |
|---|---|---|---|
| Test factories | Unit + integration | apps/api/src/test/factories/ | Per-test (beforeEach teardown) |
| Database seeding | | packages/database/prisma/seed.ts | Per |
| PostgreSQL transactions | Integration tests | $transaction | Per test |
Isolation rule: beforeEach in integration tests must create isolated organization/user fixtures and clean up deterministically. Cross-test dependence is forbidden.
Test org pattern: Each integration test creates a fresh bilko_test organization and user to prevent cross-test contamination.
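A fixture factory for the test org pattern might look like the sketch below. The names (createOrgFixture, TestOrg, TestUser) and the counter-based IDs are hypothetical; the real factories live under apps/api/src/test/factories/ and would also persist the rows.

```typescript
// Hypothetical fixture factory: every call yields a new organization and a
// user that belongs to it, so no two tests share tenant data.
interface TestOrg { id: string; name: string; }
interface TestUser { id: string; orgId: string; email: string; }
interface OrgFixture { org: TestOrg; user: TestUser; }

// Monotonic counter keeps IDs unique and deterministic within one test run.
let fixtureSequence = 0;

function createOrgFixture(): OrgFixture {
  fixtureSequence += 1;
  const org: TestOrg = {
    id: "org-" + fixtureSequence,
    name: "Test Org " + fixtureSequence,
  };
  const user: TestUser = {
    id: "user-" + fixtureSequence,
    orgId: org.id,
    email: "user" + fixtureSequence + "@example.test",
  };
  return { org, user };
}
```

Calling the factory inside beforeEach gives each test its own tenant, which is what makes the org-isolation assertions meaningful.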
6. Coverage Requirements
| Layer | Lines | Branches | Functions | Enforcement |
|---|---|---|---|---|
| Financial logic (VAT, double-entry, currency) | ≥ 95% | ≥ 90% | 100% | CI hard fail |
| Authentication utils | ≥ 95% | ≥ 90% | 100% | CI hard fail |
| API handlers | ≥ 80% | ≥ 75% | ≥ 80% | CI hard fail |
| Utilities | ≥ 90% | ≥ 85% | ≥ 90% | CI hard fail |
| Overall minimum | ≥ 80% | ≥ 75% | ≥ 80% | CI hard fail |
Coverage enforcement: Vitest coverage thresholds in vitest.config.ts. CI pipeline fails if below threshold.
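One way to express the table above is through Vitest's coverage.thresholds option, which accepts global values plus per-glob overrides and makes the run fail when a threshold is missed. The sketch below is an assumption about how vitest.config.ts could be laid out; the glob paths are hypothetical placeholders and must be replaced with the repository's real directories.

```typescript
// vitest.config.ts (sketch). Numbers mirror the coverage table; the glob
// paths are hypothetical and only illustrate the per-layer override shape.
const config = {
  test: {
    coverage: {
      thresholds: {
        // Overall minimum (API handlers share these values).
        lines: 80,
        branches: 75,
        functions: 80,
        // Financial logic: VAT, double-entry, currency.
        "packages/core/src/financial/**": { lines: 95, branches: 90, functions: 100 },
        // Authentication utils.
        "apps/api/src/auth/**": { lines: 95, branches: 90, functions: 100 },
        // Utilities.
        "apps/api/src/utils/**": { lines: 90, branches: 85, functions: 90 },
      },
    },
  },
};

export default config;
```

Keeping the thresholds in one config object means the CI job needs no extra script: running the test command with coverage enabled is enough to enforce the gate.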
7. Quality Gates
PR Merge Gate
- All unit tests pass
- All integration tests pass
- Coverage ≥ minimum thresholds
- Linting passes (ESLint + Prettier)
- Type checking passes (TypeScript strict)
- No new HIGH/CRITICAL security findings
Staging Deploy Gate
- All PR gates passed
- Build artifact created successfully
Production Deploy Gate
- Critical E2E gate passes on staging/resettable environment
- Real-demo smoke passes after deploy with screenshot/video evidence
- Performance baseline not degraded > 20% for relevant changes
- Manual approval in CI pipeline
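The 20% performance rule can be made concrete with a small check. This sketch assumes the baseline is a single latency figure in milliseconds (for example a p95 value); the metric, the function name isDegraded, and the strict "more than 20%" comparison are assumptions, since this document does not define them.

```typescript
// Returns true when the current measurement is worse than the baseline by
// more than maxDegradation (a fraction, 0.2 = 20%). Higher values are worse.
function isDegraded(baselineMs: number, currentMs: number, maxDegradation: number = 0.2): boolean {
  if (!(baselineMs > 0)) {
    throw new RangeError("baseline must be a positive number");
  }
  const relativeChange = (currentMs - baselineMs) / baselineMs;
  return relativeChange > maxDegradation;
}
```

With a 200 ms baseline, 240 ms is exactly at the limit and still passes, while 241 ms fails the gate; an improvement such as 180 ms always passes.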
8. Responsibility Matrix
| Test Type | Writes | Reviews | Maintains | Signs Off |
|---|---|---|---|---|
| Unit tests | Developer | PR reviewer | Developer | Tech Lead |
| Integration tests | Developer | QA / Tech Lead | Developer | Tech Lead |
| E2E tests | Developer | Tech Lead | Developer | Tech Lead |
| Performance tests | DevOps | Tech Lead | DevOps | Alem Bašić |
9. Test Reporting & Metrics
| Metric | Target |
|---|---|
| Test pass rate | ≥ 99% unit, ≥ 95% E2E |
| Flaky test rate | < 2% |
| Full suite execution time | < 10 min |
| Coverage trend | Stable or improving per sprint |
| Financial logic coverage | ≥ 95% at all times |
10. Continuous Testing in CI/CD
| Stage | Tests Run | Blocking |
|---|---|---|
| Pre-commit (local) | lint + type-check only | Recommended (Husky) |
| PR open/update | unit + integration + lint + type-check | Yes, blocks merge |
| Staging deploy | | Yes, blocks production |
| Production deploy | | Yes |
| Nightly | Full E2E | No |
Related Documents
- Test Plan
- E2E Test Plan
- Performance Test Plan
- Definition of Done
- CI/CD Pipeline
- TESTING-GUIDE.md
- TEST-INVENTORY.md
- Demo Testing Plan
Approval
| Role | Name | Date | Signature |
|---|---|---|---|
| Author | Ops Architect | 2026-02-23 | |
| Reviewer | Tech Lead | | |
| Approver | Alem Bašić | | |