Test Strategy

Project: Bilko
Version: 1.1
Date: 2026-05-21
Author: Ops Architect / ALAI Documentation Team
Status: Active
Reviewers: Tech Lead, Alem Bašić

Document History

| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | 2026-02-23 | Ops Architect | Initial draft |
| 1.0 | 2026-02-25 | ALAI Documentation Team | Finalized; approved for production use |
| 1.1 | 2026-05-21 | ALAI Documentation Team | Clarified industry/Spotify-style layered strategy, real-demo smoke gate, and full-demo rehearsal policy |

1. Testing Philosophy & Principles

Financial software has a higher correctness bar than typical web apps. A bug in VAT calculation or double-entry bookkeeping is not a UX inconvenience — it's a compliance failure that could expose Bilko users to tax liability or audit findings.

Core Principles:

  1. Financial logic is P0 — VAT calculations, double-entry balance, NUMERIC precision are tested at >95% coverage before any feature ships
  2. Tests are first-class code — reviewed, maintained, and refactored alongside production code
  3. Test the behavior, not the implementation — tests enable safe refactoring of internals
  4. Fast feedback — unit tests run in < 3 min; full suite < 10 min
  5. No test = no ship — financial logic without a test is a P0 blocker for merging
  6. Isolation — every test cleans up after itself; no test depends on another

Testing philosophy: Bilko follows an industry-standard layered strategy, with focused unit test coverage for financial calculations and business logic, strong integration/contract coverage for Ktor API + PostgreSQL behavior, and a small Playwright layer for the critical user journeys. This is compatible with Spotify's public "testing honeycomb" direction: most confidence should come from service interaction tests and contracts, not from trying to automate every behavior through a browser. We do not aim for 100% E2E coverage.


2. Layered Test Strategy

                   E2E / Demo Smoke
      Critical browser journeys + deployed health evidence

           Integration + Contract Tests (dominant)
      Ktor routes, services, PostgreSQL/Testcontainers, auth,
      RBAC, multi-tenant isolation, frontend/backend API contract

                   Focused Unit Tests
      Financial engine, VAT, currency, validators, pure logic

Target distribution:

  • Unit tests: financial logic and pure business rules, especially packages/core and accounting/tax code
  • Integration/contract tests: API routes, DB behavior, auth/session boundaries, RBAC, org isolation, frontend/backend API compatibility
  • Critical Playwright E2E: a small, maintained set for invoice, expense, report, auth, and settings flows
  • Real-demo smoke: non-destructive deployed health check against https://bilko-demo.alai.no
  • Full-demo rehearsal: resettable demo tenant/environment used for stakeholder demos; not the deploy gate

3. Testing Tools

| Type | Tool | Version | Purpose | Config |
|---|---|---|---|---|
| Unit testing | Vitest + Kotlin/JUnit | Latest | Business logic, utilities, services | package configs / Gradle |
| Mocking | Vitest, MockK | | Mock external deps where appropriate | Built-in / Gradle |
| Integration testing | Ktor test host + JUnit | Latest | API endpoint testing with PostgreSQL | apps/api/build.gradle.kts |
| Test database | PostgreSQL | 15/16 | Real database via local DB/Testcontainers | Gradle integrationTest |
| E2E testing | Playwright | Latest | Browser automation, critical user flows | apps/e2e/playwright.config.ts |
| Coverage | Kover + Vitest | | Coverage reports and ratchets | Gradle Kover / package configs |
| Performance | k6 | Latest | Load testing (PLANNED Phase 2) | apps/e2e/load/ |

Why Vitest (not Jest)

  • ESM native, Vite-based → faster
  • Compatible with Turborepo
  • Watch mode with HMR
  • Same API as Jest (easy migration)

Why Playwright (not Cypress)

  • Multi-browser: Chromium, Firefox, WebKit (Safari)
  • Auto-wait (no flaky tests from race conditions)
  • Parallel execution (workers: 4)
  • Video and trace on failure

4. Test Scope by Layer

4.1 Unit Tests (Vitest)

| Attribute | Value |
|---|---|
| Scope | Pure functions: VAT calculation, double-entry validation, currency conversion, invoice totals, date utils, number formatting |
| External dependencies | Mocked: no real DB, network, or filesystem |
| Coverage target | > 95% for financial logic; > 90% utilities; > 80% services; > 80% overall |
| Execution time | < 3 minutes |
| Runs on | Every commit, pre-commit hook (lint + type-check only), CI on every push |
| Written by | Developer who writes the feature |

What to unit test:

  • calculateVAT(amount, rate, country) — Serbia 20%, BiH 17%, Croatia 25%
  • validateDoubleEntry(debit, credit) — must be equal, error on imbalance
  • convertCurrency(amount, fromCurrency, toCurrency, exchangeRate) — NUMERIC(19,4)
  • calculateInvoiceTotal(items) — subtotal, tax, discount, total
  • lockExchangeRate(date, fromCurrency, toCurrency) — historical rate, not today's

What NOT to unit test:

  • Framework internals (Ktor, Next.js, Exposed, JDBC)
  • Simple property getters/setters with no logic
  • Full browser journeys that belong in Playwright E2E or demo smoke

4.2 Integration + Contract Tests

| Attribute | Value |
|---|---|
| Scope | Ktor API routes, service boundaries, PostgreSQL behavior, contracts |
| External dependencies | Real PostgreSQL via local DB or Testcontainers where needed |
| Coverage target | All service boundaries; > 80% of integration paths |
| Execution time | < 10 minutes for blocking gate |
| Runs on | Every PR / deploy gate, blocking merge where configured |
| Written by | Developer who writes the API endpoint or client contract |

What to integration test:

  • Auth flow (register, login, refresh, logout)
  • Invoice CRUD + status transitions (draft → sent → paid)
  • Expense CRUD + approval flow
  • Reports API (P&L, VAT, balance sheet)
  • Organization scoping — org A cannot read org B's data (P0 security test)
  • RBAC enforcement — viewer cannot create, owner can delete
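
The last two items are the ones most worth pinning down as explicit assertions. The sketch below models them against an in-memory stand-in so the expected behavior is unambiguous; real integration tests would exercise the Ktor routes and PostgreSQL instead. InMemoryInvoiceStore, Session, and the PERMISSIONS table are hypothetical, and the permission set beyond "viewer cannot create, owner can delete" is an assumption.

```typescript
// Only the two roles named above; the full role model is not specified here.
type Role = "viewer" | "owner";
type Action = "read" | "create" | "delete";

// Assumed permission table: viewer is read-only, owner may do everything.
const PERMISSIONS: Record<Role, Action[]> = {
  viewer: ["read"],
  owner: ["read", "create", "delete"],
};

interface Session {
  orgId: string;
  role: Role;
}

interface Invoice {
  id: string;
  orgId: string;
}

// Hypothetical stand-in for the API + database layer.
class InMemoryInvoiceStore {
  private invoices: Invoice[] = [];

  private authorize(session: Session, action: Action): void {
    if (PERMISSIONS[session.role].indexOf(action) === -1) {
      throw new Error(`forbidden: ${session.role} cannot ${action}`);
    }
  }

  create(session: Session, id: string): Invoice {
    this.authorize(session, "create");
    const invoice: Invoice = { id, orgId: session.orgId };
    this.invoices.push(invoice);
    return invoice;
  }

  // Reads are always filtered by the caller's organization.
  findById(session: Session, id: string): Invoice | undefined {
    this.authorize(session, "read");
    return this.invoices.filter(
      (inv) => inv.id === id && inv.orgId === session.orgId
    )[0];
  }

  // Returns true only if an invoice in the caller's organization was removed.
  delete(session: Session, id: string): boolean {
    this.authorize(session, "delete");
    const before = this.invoices.length;
    this.invoices = this.invoices.filter(
      (inv) => !(inv.id === id && inv.orgId === session.orgId)
    );
    return this.invoices.length < before;
  }
}
```

An isolation test would then create an invoice as org A, confirm that org B can neither read nor delete it, and confirm that a viewer's create call is rejected.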

4.3 E2E Tests (Playwright)

| Attribute | Value |
|---|---|
| Scope | Critical user journeys through deployed/staging application |
| External dependencies | Real services; production/demo only for non-destructive smoke |
| Coverage target | Critical journeys + smoke, not exhaustive UI coverage |
| Execution time | < 8 minutes for critical gate; < 1 minute for real-demo smoke |
| Runs on | Post-staging deploy, pre-production gate, post-deploy smoke |
| Written by | Developer + QA collaboration |

Critical journeys:

  1. Auth/session Flow: Login → refresh/session validation → logout or session expiry behavior
  2. Invoice Flow: Create draft in resettable staging/demo tenant → Preview → Send/Mark Paid where safe
  3. Expense Flow: Add → Upload Receipt → Approve/Reject/Pay where safe
  4. Report Flow: Generate P&L/VAT report → Export PDF/XLSX where safe
  5. Settings Flow: Organization/settings/users page loads and key controls are visible

Rule: public real-demo tests must be non-destructive. Registration, deletion, rate-limit torture, invoice/expense creation, and expected-fail regressions belong in resettable staging/nightly suites, not the live demo smoke gate.


5. Test Data Management

| Approach | Used For | Tool | Cleanup |
|---|---|---|---|
| Test factories | Unit + integration | apps/api/src/test/factories/ | Per-test (beforeEach teardown) |
| Database seeding | E2E/full-demo rehearsal | Flyway fixtures / seed scripts / API factories | Per resettable environment run |
| PostgreSQL transactions | Integration tests | Test transaction or teardown helpers | Per test |

Isolation rule: beforeEach in integration tests must create isolated organization/user fixtures and clean up deterministically. Cross-test dependence is forbidden.

Test org pattern: Each integration test creates a fresh bilko_test organization and user to prevent cross-test contamination.
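
A minimal sketch of the test org pattern, assuming factories are plain functions that hand out unique identifiers on every call. buildTestOrg, buildTestUser, and the field names are hypothetical and are not taken from apps/api/src/test/factories/.

```typescript
interface TestOrg {
  id: string;
  name: string;
}

interface TestUser {
  id: string;
  orgId: string;
  email: string;
  role: "owner" | "viewer";
}

// Monotonic counter so no two fixtures share an identifier within a run.
let sequence = 0;

// Builds a fresh organization; callers may override any field.
function buildTestOrg(overrides: Partial<TestOrg> = {}): TestOrg {
  sequence += 1;
  return {
    id: `org-${sequence}`,
    name: `bilko_test org ${sequence}`,
    ...overrides,
  };
}

// Builds a user that belongs to the given organization.
function buildTestUser(org: TestOrg, overrides: Partial<TestUser> = {}): TestUser {
  sequence += 1;
  return {
    id: `user-${sequence}`,
    orgId: org.id,
    email: `user-${sequence}@example.test`,
    role: "owner",
    ...overrides,
  };
}
```

Because each test builds its own organization and user, assertions never depend on rows left behind by another test, and cleanup can be scoped to the identifiers the test created.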


6. Coverage Requirements

| Layer | Lines | Branches | Functions | Enforcement |
|---|---|---|---|---|
| Financial logic (VAT, double-entry, currency) | ≥ 95% | ≥ 90% | 100% | CI hard fail |
| Authentication utils | ≥ 95% | ≥ 90% | 100% | CI hard fail |
| API handlers | ≥ 80% | ≥ 75% | ≥ 80% | CI hard fail |
| Utilities | ≥ 90% | ≥ 85% | ≥ 90% | CI hard fail |
| Overall minimum | ≥ 80% | ≥ 75% | ≥ 80% | CI hard fail |

Coverage enforcement: Vitest coverage thresholds in vitest.config.ts. CI pipeline fails if below threshold.
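
The rule the pipeline applies can be stated compactly. The sketch below is not the Vitest or Kover configuration itself; it only illustrates the comparison, using the thresholds from the table above. The layer keys and the coverageFailures function are hypothetical names.

```typescript
type Metric = "lines" | "branches" | "functions";
type Coverage = Record<Metric, number>; // percentages, 0 to 100

// Minimum percentages per layer, copied from the coverage table.
const THRESHOLDS: Record<string, Coverage> = {
  financial: { lines: 95, branches: 90, functions: 100 },
  auth: { lines: 95, branches: 90, functions: 100 },
  apiHandlers: { lines: 80, branches: 75, functions: 80 },
  utilities: { lines: 90, branches: 85, functions: 90 },
  overall: { lines: 80, branches: 75, functions: 80 },
};

// Returns one message per metric that falls below its threshold.
// An empty array means the layer passes the gate.
function coverageFailures(layer: string, actual: Coverage): string[] {
  const required = THRESHOLDS[layer];
  if (!required) {
    throw new Error(`Unknown layer: ${layer}`);
  }
  const metrics: Metric[] = ["lines", "branches", "functions"];
  return metrics
    .filter((metric) => actual[metric] < required[metric])
    .map((metric) => `${layer}.${metric}: ${actual[metric]}% < ${required[metric]}%`);
}
```

For example, coverageFailures("financial", { lines: 96, branches: 91, functions: 99 }) returns a single message, "financial.functions: 99% < 100%", which a CI step would treat as a hard fail.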


7. Quality Gates

PR Merge Gate

  • All unit tests pass
  • All integration tests pass
  • Coverage ≥ minimum thresholds
  • Linting passes (ESLint + Prettier)
  • Type checking passes (TypeScript strict)
  • No new HIGH/CRITICAL security findings

Staging Deploy Gate

  • All PR gates passed
  • Build artifact created successfully

Production Deploy Gate

  • Critical E2E gate passes on staging/resettable environment
  • Real-demo smoke passes after deploy with screenshot/video evidence
  • Performance baseline not degraded > 20% for relevant changes
  • Manual approval in CI pipeline
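
The performance item in this gate reduces to a relative comparison against a stored baseline. A minimal sketch, assuming the baseline is a single latency figure in milliseconds (for instance a percentile from a k6 run); exceedsBaseline and its parameters are hypothetical.

```typescript
// Returns true when the current measurement is more than `tolerance`
// (default 20%) slower than the baseline.
function exceedsBaseline(
  baselineMs: number,
  currentMs: number,
  tolerance: number = 0.2
): boolean {
  if (baselineMs <= 0) {
    throw new RangeError("baseline must be positive");
  }
  return (currentMs - baselineMs) / baselineMs > tolerance;
}
```

With a 100 ms baseline, a 125 ms measurement fails the gate, while 110 ms passes. A measurement of exactly 120 ms also passes, because the rule is "degraded by more than 20%".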

8. Responsibility Matrix

| Test Type | Writes | Reviews | Maintains | Signs Off |
|---|---|---|---|---|
| Unit tests | Developer | PR reviewer | Developer | Tech Lead |
| Integration tests | Developer | QA / Tech Lead | Developer | Tech Lead |
| E2E tests | Developer | Tech Lead | Developer | Tech Lead |
| Performance tests | DevOps | Tech Lead | DevOps | Alem Bašić |

9. Test Reporting & Metrics

| Metric | Target |
|---|---|
| Test pass rate | ≥ 99% unit, ≥ 95% E2E |
| Flaky test rate | < 2% |
| Full suite execution time | < 10 min |
| Coverage trend | Stable or improving per sprint |
| Financial logic coverage | ≥ 95% at all times |

10. Continuous Testing in CI/CD

| Stage | Tests Run | Blocking |
|---|---|---|
| Pre-commit (local) | lint + type-check only | Recommended (Husky) |
| PR open/update | unit + integration + lint + type-check | Yes, blocks merge |
| Staging deploy | Critical E2E (Playwright, Chromium primary; other browsers scheduled/risk-based) | Yes, blocks production |
| Production deploy | Real-demo smoke (npm run test:real-demo-smoke) with evidence | Yes, rollback/escalate on failure |
| Nightly / scheduled | Full E2E regression + destructive/resettable tests + performance | No: alerts/issues, not automatic deploy blocker |


Approval

| Role | Name | Date | Signature |
|---|---|---|---|
| Author | Ops Architect | 2026-02-23 | |
| Reviewer | Tech Lead | | |
| Approver | Alem Bašić | | |