Test Strategy

Project: Bilko
Version: 1.1
Date: 2026-05-21
Author: Ops Architect / ALAI Documentation Team
Status: Active
Reviewers: Tech Lead, Alem Bašić

Document History

| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | 2026-02-23 | Ops Architect | Initial draft |
| 1.0 | 2026-02-25 | ALAI Documentation Team | Finalized, approved for production use |
| 1.1 | 2026-05-21 | ALAI Documentation Team | Clarified industry/Spotify-style layered strategy, real-demo smoke gate, and full-demo rehearsal policy |

1. Testing Philosophy & Principles

Financial software has a higher correctness bar than typical web apps. A bug in VAT calculation or double-entry bookkeeping is not a UX inconvenience; it is a compliance failure that could expose Bilko users to tax liability or audit findings.

Core Principles:

- Financial logic is P0: VAT calculations, double-entry balance, and NUMERIC precision are tested at >95% coverage before any feature ships
- Tests are first-class code: reviewed, maintained, and refactored alongside production code
- Test the behavior, not the implementation: tests enable safe refactoring of internals
- Fast feedback: unit tests run in < 3 min; full suite < 10 min
- No test = no ship: financial logic without a test is a P0 blocker for merging
- Isolation: every test cleans up after itself; no test depends on another

Testing philosophy: Bilko follows an industry-standard layered strategy: focused unit coverage for financial calculations and business logic, strong integration/contract coverage for Ktor API + PostgreSQL behavior, and a small Playwright layer for critical user journeys. This is compatible with Spotify's public "testing honeycomb" direction: most confidence should come from service interaction tests and contracts, not from trying to automate every behavior through a browser. We do not aim for 100% E2E coverage.

2. Layered Test Strategy

Layers, from top (smallest) to bottom:

- Playwright E2E / Demo Smoke: critical browser journeys + deployed health evidence
- Integration + Contract Tests (dominant): Ktor routes, services, PostgreSQL/Testcontainers, auth, RBAC, multi-tenant isolation, frontend/backend API contract
- Focused Unit Tests: financial engine, VAT, currency, validators, pure logic

Target distribution:

- Unit tests: financial logic and pure business rules, especially packages/core and accounting/tax code
- Integration/contract tests: API routes, DB behavior, auth/session boundaries, RBAC, org isolation, frontend/backend API compatibility
- Critical Playwright E2E: a small, maintained set for invoice, expense, report, auth, and settings flows
- Real-demo smoke: non-destructive deployed health check against https://bilko-demo.alai.no
- Full-demo rehearsal: resettable demo tenant/environment used for stakeholder demos; not the deploy gate

3. Testing Tools

| Type | Tool | Version | Purpose | Config |
|---|---|---|---|---|
| Unit testing | Vitest + Kotlin/JUnit | Latest | Business logic, utilities, services | package configs / Gradle |
| Mocking | Vitest, MockK | - | Mock external deps where appropriate | Built-in / Gradle |
| Integration testing | Ktor test host + JUnit | Latest | API endpoint testing with PostgreSQL | apps/api/build.gradle.kts |
| Test database | PostgreSQL | 15/16 | Real database via local DB/Testcontainers | Gradle integrationTest |
| E2E testing | Playwright | Latest | Browser automation, critical user flows | apps/e2e/playwright.config.ts |
| Coverage | Kover + Vitest | - | Coverage reports and ratchets | Gradle Kover / package configs |
| Performance | k6 | Latest | Load testing (PLANNED Phase 2) | apps/e2e/load/ |

Why Vitest (not Jest)

- ESM native, Vite-based → faster
- Compatible with Turborepo
- Watch mode with HMR
- Jest-compatible API (easy migration)

Why Playwright (not Cypress)

- Multi-browser: Chromium, Firefox, WebKit (Safari)
- Auto-wait (fewer flaky tests from race conditions)
- Parallel execution (workers: 4)
- Video and trace on failure

4. Test Scope by Layer

4.1 Unit Tests (Vitest)

| Attribute | Value |
|---|---|
| Scope | Pure functions: VAT calculation, double-entry validation, currency conversion, invoice totals, date utils, number formatting |
| External dependencies | Mocked; no real DB, network, or filesystem |
| Coverage target | > 95% for financial logic; > 90% utilities; > 80% services; > 80% overall |
| Execution time | < 3 minutes |
| Runs on | Every commit via CI on every push; the local pre-commit hook runs lint + type-check only |
| Written by | Developer who writes the feature |

What to unit test:

- calculateVAT(amount, rate, country): Serbia 20%, BiH 17%, Croatia 25%
- validateDoubleEntry(debit, credit): must be equal, error on imbalance
- convertCurrency(amount, fromCurrency, toCurrency, exchangeRate): NUMERIC(19,4)
- calculateInvoiceTotal(items): subtotal, tax, discount, total
- lockExchangeRate(date, fromCurrency, toCurrency): historical rate, not today's

What NOT to unit test:

- Framework internals (Ktor, Next.js, Exposed, JDBC)
- Simple property getters/setters with no logic
- Full browser journeys that belong in Playwright E2E or demo smoke

4.2 Integration + Contract Tests

| Attribute | Value |
|---|---|
| Scope | Ktor API routes, service boundaries, PostgreSQL behavior, contracts |
| External dependencies | Real PostgreSQL via local DB or Testcontainers where needed |
| Coverage target | All service boundaries; > 80% of integration paths |
| Execution time | < 10 minutes for blocking gate |
| Runs on | Every PR / deploy gate, blocking merge where configured |
| Written by | Developer who writes the API endpoint or client contract |

What to integration test:

- Auth flow (register, login, refresh, logout)
- Invoice CRUD + status transitions (draft → sent → paid)
- Expense CRUD + approval flow
- Reports API (P&L, VAT, balance sheet)
- Organization scoping: org A cannot read org B's data (P0 security test)
- RBAC enforcement: viewer cannot create, owner can delete

4.3 E2E Tests (Playwright)

| Attribute | Value |
|---|---|
| Scope | Critical user journeys through deployed/staging application |
| External dependencies | Real staging services; production/demo only for non-destructive smoke |
| Coverage target | Critical journeys + smoke, not exhaustive UI coverage |
| Execution time | < 8 minutes for critical gate; < 1 minute for real-demo smoke |
| Runs on | Post-staging deploy, pre-production gate, post-deploy smoke |
| Written by | Developer + QA collaboration |

Critical journeys:

- Auth/session Flow: Login → refresh/session validation → logout or session expiry behavior
- Invoice Flow: Create draft in resettable staging/demo tenant → Preview → Send/Mark Paid where safe
- Expense Flow: Add → Upload Receipt → Approve/Reject/Pay where safe
- Report Flow: Generate P&L/VAT report → Export PDF/XLSX where safe
- Settings Flow: Organization/settings/users page loads and key controls are visible

Rule: public real-demo tests must be non-destructive. Registration, deletion, rate-limit torture, invoice/expense creation, and expected-fail regressions belong in resettable staging/nightly suites, not the live demo smoke gate.

5. Test Data Management

| Approach | Used For | Tool | Cleanup |
|---|---|---|---|
| Test factories | Unit + integration | apps/api/src/test/factories/ | Per-test (beforeEach teardown) |
| Database seeding | E2E/full-demo rehearsal | Flyway fixtures / seed scripts / API factories | Per resettable environment run |
| PostgreSQL transactions | Integration tests | Test transaction or teardown helpers | Per test |

Isolation rule: integration tests must create isolated organization/user fixtures and clean up deterministically. Cross-test dependence is forbidden.

Test org pattern: Each integration test creates a fresh bilko_test organization and user to prevent cross-test contamination.

6. Coverage Requirements

| Layer | Lines | Branches | Functions | Enforcement |
|---|---|---|---|---|
| Financial logic (VAT, double-entry, currency) | ≥ 95% | ≥ 90% | 100% | CI hard fail |
| Authentication utils | ≥ 95% | ≥ 90% | 100% | CI hard fail |
| API handlers | ≥ 80% | ≥ 75% | ≥ 80% | CI hard fail |
| Utilities | ≥ 90% | ≥ 85% | ≥ 90% | CI hard fail |
| Overall minimum | ≥ 80% | ≥ 75% | ≥ 80% | CI hard fail |

Coverage enforcement: Vitest coverage thresholds are set in vitest.config.ts. The CI pipeline fails if coverage falls below a threshold.

7. Quality Gates

PR Merge Gate

- All unit tests pass
- All integration tests pass
- Coverage ≥ minimum thresholds
- Linting passes (ESLint + Prettier)
- Type checking passes (TypeScript strict)
- No new HIGH/CRITICAL security findings

Staging Deploy Gate

- All PR gates passed
- Build artifact created successfully

Production Deploy Gate

- Critical E2E gate passes on staging/resettable environment
- Real-demo smoke passes after deploy with screenshot/video evidence
- Performance baseline not degraded > 20% for relevant changes
- Manual approval in CI pipeline

8. Responsibility Matrix

| Test Type | Writes | Reviews | Maintains | Signs Off |
|---|---|---|---|---|
| Unit tests | Developer | PR reviewer | Developer | Tech Lead |
| Integration tests | Developer | QA / Tech Lead | Developer | Tech Lead |
| E2E tests | Developer | Tech Lead | Developer | Tech Lead |
| Performance tests | DevOps | Tech Lead | DevOps | Alem Bašić |

9. Test Reporting & Metrics

| Metric | Target |
|---|---|
| Test pass rate | ≥ 99% unit, ≥ 95% E2E |
| Flaky test rate | < 2% |
| Full suite execution time | < 10 min |
| Coverage trend | Stable or improving per sprint |
| Financial logic coverage | ≥ 95% at all times |

10. Continuous Testing in CI/CD

| Stage | Tests Run | Blocking |
|---|---|---|
| Pre-commit (local) | lint + type-check only | Recommended (Husky) |
| PR open/update | unit + integration + lint + type-check | Yes, blocks merge |
| Staging deploy | Critical E2E (Playwright, Chromium primary; other browsers scheduled/risk-based) | Yes, blocks production |
| Production deploy | Real-demo smoke (npm run test:real-demo-smoke) with evidence | Yes, rollback/escalate on failure |
| Nightly / scheduled | Full E2E regression + destructive/resettable tests + performance | No: alerts/issues, not an automatic deploy blocker |

Related Documents

- Test Plan
- E2E Test Plan
- Performance Test Plan
- Definition of Done
- CI/CD Pipeline
- TESTING-GUIDE.md
- TEST-INVENTORY.md
- Demo Testing Plan

Approval

| Role | Name | Date | Signature |
|---|---|---|---|
| Author | Ops Architect | 2026-02-23 | |
| Reviewer | Tech Lead | | |
| Approver | Alem Bašić | | |
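
Example: unit-test targets from section 4.1

To make the "financial logic is P0" principle concrete, the following is a minimal sketch of two of the pure functions named in section 4.1. Only the function names and the VAT rates (Serbia 20%, BiH 17%, Croatia 25%) come from this document. The signatures, the country codes, the use of integer minor units, and the half-up rounding rule are assumptions made for illustration and may differ from the real packages/core code.

```typescript
// Hypothetical sketch of two section 4.1 targets. Function names come from the
// document; the signatures, country codes, and rounding rule are assumptions.

type Country = "RS" | "BA" | "HR";

// Standard VAT rates quoted in section 4.1, in basis points (100 bp = 1%).
const STANDARD_VAT_BP: Record<Country, number> = {
  RS: 2000, // Serbia 20%
  BA: 1700, // BiH 17%
  HR: 2500, // Croatia 25%
};

// Divide two non-negative integers, rounding halves up, using only integer steps.
function divRoundHalfUp(numerator: number, denominator: number): number {
  const remainder = numerator % denominator;
  const quotient = (numerator - remainder) / denominator;
  return remainder * 2 >= denominator ? quotient + 1 : quotient;
}

// VAT due on a net amount given in minor units (e.g. 10000 = 100.00).
function calculateVAT(netMinor: number, country: Country): number {
  if (netMinor < 0 || netMinor % 1 !== 0) {
    throw new Error("netMinor must be a non-negative integer in minor units");
  }
  return divRoundHalfUp(netMinor * STANDARD_VAT_BP[country], 10000);
}

// Throws when total debits and total credits (minor units) differ.
function validateDoubleEntry(debitMinor: number, creditMinor: number): void {
  if (debitMinor !== creditMinor) {
    throw new Error("Unbalanced entry: debit " + debitMinor + " != credit " + creditMinor);
  }
}

console.log(calculateVAT(10000, "RS")); // 2000
console.log(calculateVAT(999, "BA"));   // 170 (169.83 rounds up)
validateDoubleEntry(12000, 12000);      // balanced, returns without throwing
```

Keeping amounts as integers avoids binary floating-point drift, which is the kind of defect the ≥ 95% financial-logic coverage target is meant to catch. In a Vitest suite, each of the calls above would become an assertion, with the rounding boundary and the imbalance error covered as separate cases.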