Test Strategy

Project: Bilko
Version: 1.1
Date: 2026-05-21
Author: Ops Architect / ALAI Documentation Team
Status: Active
Reviewers: Tech Lead, Alem Bašić

Document History

Version | Date | Author | Changes
0.1 | 2026-02-23 | Ops Architect | Initial draft
1.0 | 2026-02-25 | ALAI Documentation Team | Finalized, approved for production use
1.1 | 2026-05-21 | ALAI Documentation Team | Clarified industry/Spotify-style layered strategy, real-demo smoke gate, and full-demo rehearsal policy

1. Testing Philosophy & Principles

Financial software has a higher correctness bar than typical web apps. A bug in VAT calculation or double-entry bookkeeping is not a UX inconvenience — it's a compliance failure that could expose Bilko users to tax liability or audit findings.

Core Principles:

  1. Financial logic is P0 — VAT calculations, double-entry balance, NUMERIC precision are tested at >95% coverage before any feature ships
  2. Tests are first-class code — reviewed, maintained, and refactored alongside production code
  3. Test the behavior, not the implementation — tests enable safe refactoring of internals
  4. Fast feedback — unit tests run in < 3 min; full suite < 10 min
  5. No test = no ship — financial logic without a test is a P0 blocker for merging
  6. Isolation — every test cleans up after itself; no test depends on another

Testing philosophy: Bilko follows an industry-standard layered strategy: focused unit test coverage for financial calculations and business logic, strong integration/contract coverage for Ktor API + PostgreSQL behavior, and a small Playwright layer for critical user journeys. This is compatible with Spotify's public "testing honeycomb" direction: most confidence should come from service interaction tests and contracts, not from trying to automate every behavior through a browser. We do not aim for 100% E2E coverage.


2. Layered Test Strategy

       E2E / Demo Smoke
       Critical browser journeys + deployed health evidence

       Integration + Contract Tests (dominant)
       Ktor routes, services, PostgreSQL/Testcontainers, auth,
       RBAC, multi-tenant isolation, frontend/backend API contract

       Focused Unit Tests
       Financial engine, VAT, currency, validators, pure logic

Target distribution:

  • Unit tests: financial logic and pure business rules, especially packages/core and accounting/tax code
  • Integration/contract tests: API routes, DB behavior, auth/session boundaries, RBAC, org isolation, frontend/backend API compatibility
  • Critical Playwright E2E: a small, maintained set for invoice, expense, report, auth, and settings flows
  • Real-demo smoke: non-destructive deployed health check against https://bilko-demo.alai.no
  • Full-demo rehearsal: resettable demo tenant/environment used for stakeholder demos; not the deploy gate

3. Testing Tools

Type | Tool | Version | Purpose | Config
Unit testing | Vitest + Kotlin/JUnit | Latest | Business logic, utilities, services | package configs / Gradle
Mocking | Vitest, MockK | - | Mock external deps where appropriate | Built-in / Gradle
Integration testing | Ktor test host + JUnit | Latest | API endpoint testing with real PostgreSQL | apps/api/build.gradle.kts
Test database | PostgreSQL | 15/16 | Real database via local DB/Testcontainers | Gradle integrationTest
E2E testing | Playwright | Latest | Browser automation, critical user flows | apps/e2e/playwright.config.ts
Coverage | Kover + Vitest built-in | - | Coverage reports and ratchets | Gradle Kover / package configs
Performance | k6 | Latest | Load testing (PLANNED Phase 2) | apps/e2e/load/

Why Vitest (not Jest)

  • ESM native, Vite-based → faster
  • Compatible with Turborepo
  • Watch mode with HMR
  • Same API as Jest (easy migration)

Why Playwright (not Cypress)

  • Multi-browser: Chromium, Firefox, WebKit (Safari)
  • Auto-wait (fewer flaky tests from race conditions)
  • Parallel execution (workers: 4)
  • Video and trace on failure

4. Test Scope by Layer

4.1 Unit Tests (Vitest)

Attribute | Value
Scope | Pure functions: VAT calculation, double-entry validation, currency conversion, invoice totals, date utils, number formatting
External dependencies | Mocked: no real DB, network, or filesystem
Coverage target | > 95% for financial logic; > 90% utilities; > 80% services; > 80% overall
Execution time | < 3 minutes
Runs on | Every commit and every push in CI; the local pre-commit hook runs lint + type-check only
Written by | Developer who writes the feature

What to unit test:

  • calculateVAT(amount, rate, country) — Serbia 20%, BiH 17%, Croatia 25%
  • validateDoubleEntry(debit, credit) — must be equal, error on imbalance
  • convertCurrency(amount, fromCurrency, toCurrency, exchangeRate) — NUMERIC(19,4)
  • calculateInvoiceTotal(items) — subtotal, tax, discount, total
  • lockExchangeRate(date, fromCurrency, toCurrency) — historical rate, not today's
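As an illustration of the kind of pure function this layer targets, here is a minimal sketch of two of the functions listed above. It is not Bilko's implementation: the signatures are simplified, the use of integer minor units, the half-up rounding rule, the country codes, and the JournalLine shape are all assumptions made for the example. Only the three VAT rates come from this document.

```typescript
// Hypothetical sketch: amounts are integer minor units (for example para or cents),
// so no floating-point fractions enter the calculation.
type Country = "RS" | "BA" | "HR";

// VAT rates in basis points, taken from the list above.
const VAT_RATE_BP: Record<Country, number> = {
  RS: 2000, // Serbia 20%
  BA: 1700, // BiH 17%
  HR: 2500, // Croatia 25%
};

// Returns the VAT amount in minor units, rounding half up (an assumed rounding rule).
function calculateVAT(netMinor: number, country: Country): number {
  if (netMinor < 0 || Math.floor(netMinor) !== netMinor) {
    throw new RangeError("net amount must be a non-negative integer");
  }
  const scaled = netMinor * VAT_RATE_BP[country];
  const remainder = scaled % 10000;
  const quotient = (scaled - remainder) / 10000;
  return remainder >= 5000 ? quotient + 1 : quotient;
}

// Hypothetical journal line shape used only for this example.
interface JournalLine {
  account: string;
  debitMinor: number;
  creditMinor: number;
}

// Throws when total debits and total credits differ.
function validateDoubleEntry(lines: JournalLine[]): void {
  let debit = 0;
  let credit = 0;
  for (const line of lines) {
    debit += line.debitMinor;
    credit += line.creditMinor;
  }
  if (debit !== credit) {
    throw new Error(`Unbalanced entry: debit ${debit} != credit ${credit}`);
  }
}
```

A unit test for this sketch would assert, for example, that calculateVAT(10000, "BA") returns 1700 and that validateDoubleEntry throws when a 100 debit is posted against a 90 credit. Keeping the functions free of I/O is what lets such tests run without mocks.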

What NOT to unit test:

  • Framework internals (Ktor, Next.js, Exposed, JDBC)
  • Simple property getters/setters with no logic
  • Full browser journeys that belong in Playwright E2E or demo smoke

4.2 Integration + Contract Tests (Ktor test host + JUnit)

Attribute | Value
Scope | Ktor API routes, service boundaries, PostgreSQL behavior, contracts
External dependencies | Real PostgreSQL via container in CI, local DB or Testcontainers where needed
Coverage target | All service boundaries; > 80% of integration paths
Execution time | < 10 minutes for blocking gate
Runs on | Every PR / deploy gate, blocking merge where configured
Written by | Developer who writes the API endpoint or client contract

What to integration test:

  • Auth flow (register, login, refresh, logout)
  • Invoice CRUD + status transitions (draft → sent → paid)
  • Expense CRUD + approval flow
  • Reports API (P&L, VAT, balance sheet)
  • Organization scoping — org A cannot read org B's data (P0 security test)
  • RBAC enforcement — viewer cannot create, owner can delete
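The last two items can be stated as one small rule, sketched below as a pure function. This is a hypothetical illustration, not Bilko's authorization code: the Actor shape, the action names, and the permission table are assumptions, and only the viewer and owner roles are taken from this document. A real integration test would exercise the same rule through the Ktor routes against PostgreSQL rather than call a function directly.

```typescript
// Hypothetical role and action names; only "viewer" and "owner" appear in this document.
type Role = "owner" | "viewer";
type Action = "read" | "create" | "delete";

interface Actor {
  userId: string;
  orgId: string;
  role: Role;
}

// Assumed permission table: viewers are read-only, owners may also create and delete.
const ALLOWED: Record<Role, Action[]> = {
  owner: ["read", "create", "delete"],
  viewer: ["read"],
};

// Returns true only when the actor belongs to the resource's organization
// and the actor's role permits the action.
function canAccess(actor: Actor, action: Action, resourceOrgId: string): boolean {
  // Tenant check first: no role grants access across organizations.
  if (actor.orgId !== resourceOrgId) {
    return false;
  }
  return ALLOWED[actor.role].indexOf(action) !== -1;
}
```

The assertions an integration test would mirror are: an owner in org A may delete within org A, a viewer in org A may read but not create, and no actor in org A may read anything in org B.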

4.3 E2E Tests (Playwright)

Attribute Value
Scope 4 criticalCritical user journeys through deployeddeployed/staging application
External dependencies Real (staging environmentservices; orproduction/demo production)only for non-destructive smoke
Coverage target 4 criticalCritical journeys + 8smoke, sub-scenariosnot exhaustive UI coverage
Execution time < 8 minutes for critical gate; < 1 minute for real-demo smoke
Runs on Post-staging deploy, pre-production gategate, post-deploy smoke
Written by Developer + QA collaboration

Critical journeys:

  1. Auth/session Flow: Login → refresh/session validation → logout or session expiry behavior
  2. Invoice Flow: Create draft in resettable staging/demo tenant → Preview → Send/Mark Paid where safe
  3. Expense Flow: Add → Upload Receipt → Approve/Reject/Pay where safe
  4. Report Flow: Generate P&L/VAT report → Export PDF/XLSX where safe
  5. Settings Flow: Organization/settings/users page loads and key controls are visible

Rule: public real-demo tests must be non-destructive. Registration, deletion, rate-limit torture, invoice/expense creation, and expected-fail regressions belong in resettable staging/nightly suites, not the live demo smoke gate.
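One way to keep this rule from eroding is to check the smoke plan itself before it runs. The sketch below assumes a smoke suite can be described as a list of HTTP steps; the SmokeStep shape and the example paths are hypothetical, and treating GET, HEAD, and OPTIONS as the only safe methods is a simplification (a real smoke suite that logs in would likely need an explicit allowlist for that step).

```typescript
// Hypothetical description of one step in a smoke run.
interface SmokeStep {
  name: string;
  method: string;
  path: string;
}

// HTTP methods assumed to be read-only for the purpose of this check.
const SAFE_METHODS = ["GET", "HEAD", "OPTIONS"];

// Returns the steps that could modify state and therefore do not belong in real-demo smoke.
function findDestructiveSteps(steps: SmokeStep[]): SmokeStep[] {
  return steps.filter((step) => SAFE_METHODS.indexOf(step.method.toUpperCase()) === -1);
}

// Example plan with one step that should be moved to a resettable staging suite.
const examplePlan: SmokeStep[] = [
  { name: "health check", method: "GET", path: "/health" },
  { name: "create invoice", method: "POST", path: "/invoices" },
];
const offenders = findDestructiveSteps(examplePlan);
```

Here offenders contains only the "create invoice" step, which is the kind of test the rule sends to staging or nightly runs.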


5. Test Data Management

Approach | Used For | Tool | Cleanup
Test factories | Unit + integration | apps/api/src/test/factories/ | Per-test (beforeEach teardown)
Database seeding | E2E/full-demo rehearsal | Flyway fixtures / seed scripts / API factories | Per resettable environment run
PostgreSQL transactions | Integration tests | Test transaction rollback or teardown helpers | Per test

Isolation rule: beforeEach in integration tests must create isolated organization/user fixtures and clean up deterministically. Cross-test dependence is forbidden.

Test org pattern: Each integration test creates a fresh bilko_test organization and user to prevent cross-test contamination.
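A factory for this pattern might look like the following sketch. The function name, the fixture shapes, the runId parameter, and the example.test email domain are assumptions for illustration; only the bilko_test naming comes from this document. The sketch builds in-memory values only; persisting them and cleaning them up is left to the test setup.

```typescript
// Hypothetical fixture shapes for the example.
interface TestOrg {
  id: string;
  name: string;
}

interface TestUser {
  id: string;
  orgId: string;
  email: string;
}

interface TestFixture {
  org: TestOrg;
  user: TestUser;
}

// Per-process counter so that every call yields distinct identifiers.
let fixtureSequence = 0;

// Builds a fresh organization and a user that belongs to it.
// runId distinguishes parallel CI runs that share one database (an assumption).
function createTestFixture(runId: string): TestFixture {
  fixtureSequence += 1;
  const suffix = `${runId}_${fixtureSequence}`;
  const org: TestOrg = { id: `org_${suffix}`, name: `bilko_test_${suffix}` };
  const user: TestUser = {
    id: `user_${suffix}`,
    orgId: org.id,
    email: `user_${suffix}@example.test`,
  };
  return { org, user };
}
```

Calling the factory once per test gives each test its own organization, so an assertion about org scoping in one test cannot be affected by rows another test created.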


6. Coverage Requirements

Layer | Lines | Branches | Functions | Enforcement
Financial logic (VAT, double-entry, currency) | ≥ 95% | ≥ 90% | 100% | CI hard fail
Authentication utils | ≥ 95% | ≥ 90% | 100% | CI hard fail
API handlers | ≥ 80% | ≥ 75% | ≥ 80% | CI hard fail
Utilities | ≥ 90% | ≥ 85% | ≥ 90% | CI hard fail
Overall minimum | ≥ 80% | ≥ 75% | ≥ 80% | CI hard fail

Coverage enforcement: Vitest coverage thresholds are set in vitest.config.ts, and the CI pipeline fails if coverage falls below any threshold.
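The hard-fail rule amounts to comparing measured percentages against the table in this section. The sketch below shows that comparison as a standalone function; it is not the Vitest or Kover mechanism itself, and the layer keys and the Coverage shape are hypothetical names chosen for the example. The threshold numbers are the ones listed in this section.

```typescript
// Hypothetical layer keys for the rows of the coverage table.
type Layer = "financial" | "auth" | "apiHandlers" | "utilities" | "overall";

// Coverage percentages for one layer.
interface Coverage {
  lines: number;
  branches: number;
  functions: number;
}

// Minimum percentages per layer, copied from the table in this section.
const THRESHOLDS: Record<Layer, Coverage> = {
  financial: { lines: 95, branches: 90, functions: 100 },
  auth: { lines: 95, branches: 90, functions: 100 },
  apiHandlers: { lines: 80, branches: 75, functions: 80 },
  utilities: { lines: 90, branches: 85, functions: 90 },
  overall: { lines: 80, branches: 75, functions: 80 },
};

// Returns the names of the metrics that fall below the layer's threshold.
// An empty array means the layer passes.
function coverageFailures(layer: Layer, actual: Coverage): string[] {
  const required = THRESHOLDS[layer];
  const failures: string[] = [];
  if (actual.lines < required.lines) failures.push("lines");
  if (actual.branches < required.branches) failures.push("branches");
  if (actual.functions < required.functions) failures.push("functions");
  return failures;
}
```

For example, financial logic measured at 96% lines, 89% branches, and 100% functions fails on branches only, which is enough for CI to reject the change.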


7. Quality Gates

PR Merge Gate

  • All unit tests pass
  • All integration tests pass
  • Coverage ≥ minimum thresholds
  • Linting passes (ESLint + Prettier)
  • Type checking passes (TypeScript strict)
  • No new HIGH/CRITICAL security findings

Staging Deploy Gate

  • All PR gates passed
  • Build artifact created successfully

Production Deploy Gate

  • Critical E2E gate passes on staging/resettable environment
  • Real-demo smoke passes after deploy with screenshot/video evidence
  • Performance baseline not degraded > 20% for relevant changes
  • Manual approval in CI pipeline
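The performance condition in this gate is a relative comparison against a stored baseline. A minimal sketch follows, assuming the metric is a latency in milliseconds where higher is worse; the function name and the choice of metric are assumptions, since this document does not say which measurement the baseline uses.

```typescript
// Returns true when the current measurement is more than maxIncrease above the baseline.
// Assumes a "lower is better" metric such as latency in milliseconds.
function isDegraded(baselineMs: number, currentMs: number, maxIncrease: number = 0.2): boolean {
  if (baselineMs <= 0) {
    throw new RangeError("baseline must be positive");
  }
  const relativeIncrease = (currentMs - baselineMs) / baselineMs;
  return relativeIncrease > maxIncrease;
}
```

With a 200 ms baseline, a 250 ms measurement is a 25% increase and fails the gate, while 230 ms is a 15% increase and passes.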

8. Responsibility Matrix

Test Type | Writes | Reviews | Maintains | Signs Off
Unit tests | Developer | PR reviewer | Developer | Tech Lead
Integration tests | Developer | QA / Tech Lead | Developer | Tech Lead
E2E tests | Developer | Tech Lead | Developer | Tech Lead
Performance tests | DevOps | Tech Lead | DevOps | Alem Bašić

9. Test Reporting & Metrics

Metric | Target
Test pass rate | ≥ 99% unit, ≥ 95% E2E
Flaky test rate | < 2%
Full suite execution time | < 10 min
Coverage trend | Stable or improving per sprint
Financial logic coverage | ≥ 95% at all times

10. Continuous Testing in CI/CD

Stage | Tests Run | Blocking
Pre-commit (local) | lint + type-check only | Recommended (Husky)
PR open/update | unit + integration + lint + type-check | Yes (blocks merge)
Staging deploy | Critical E2E (Playwright, Chromium primary; other browsers scheduled/risk-based) | Yes (blocks production)
Production deploy | Real-demo smoke (npm run test:real-demo-smoke) with evidence | Yes (rollback/escalate on failure)
Nightly / scheduled | Full E2E regression + destructive/resettable tests + performance | No (alerts/issues, not automatic deploy blocker)


Approval

Role | Name | Date | Signature
Author | Ops Architect | 2026-02-23 |
Reviewer | Tech Lead | |
Approver | Alem Bašić | |