# /plan-build-test

**Source:** `~/.claude/skills/plan-build-test/SKILL.md`

---

---
name: plan-build-test
version: "2.0"
level: 3
trigger: "plan-build-test, full-cycle test, playwright, E2E testing, run tests"
author: john
updated: 2026-03-16
description: Orchestration skill for Plan→Build→Test development cycles. Runs Playwright CLI tests (NOT MCP) against local or remote web apps. Supports E2E testing, visual regression, and mobile viewport testing.
---

# Plan-Build-Test Orchestration Skill

Automates the full development cycle: implement changes → build → test → fix → re-test.

**CRITICAL: Playwright CLI ONLY** — NEVER use MCP playwright tools. All testing via `npx playwright test` or `./scripts/test-runner.sh`.

## Modes

### Mode 1: Full Cycle (`\plan-build-test:full-cycle`)

**Purpose:** Implement feature/fix → build → test → fix failures → visual regression

**Agent workflow:**

1. **Read requirements**
   - Read task description and acceptance criteria
   - Identify files to change and expected test coverage

2. **Spawn builder subagent**
   - Use Task tool to spawn builder agent with clear file ownership
   - Wait for builder to complete implementation
   - Verify builder marked task as done

3. **Build verification**
   - Run build command: `npx next build` (or relevant for project)
   - Parse output for errors
   - If build fails → analyze errors → spawn builder to fix → re-build
   - Max 3 build iterations before escalating

4. **Start dev server (if testing locally)**
   - If TEST_BASE_URL not set, start dev server: `npx next dev &`
   - Wait for server ready (check http://localhost:3000)
   - If testing remote URL, skip this step

5. **Run E2E tests**
   - Execute: `./scripts/test-runner.sh [--project ] [--grep ]`
   - Parse JSON results from `/tmp/playwright-results.json`
   - Capture:
     - Total tests, passed, failed, skipped
     - Failure details (test title, error message)
     - Screenshot paths from `/tmp/playwright-screenshots/`

6. **Fix failures (if needed)**
   - If tests fail:
     - Analyze failure details and screenshots
     - Identify root cause
     - Spawn builder to fix issues
     - Re-run tests
   - Max 3 fix iterations before escalating

7. **Visual regression (optional)**
   - If changes affect UI:
     - Run: `./scripts/visual-regression.sh`
     - Compare against baseline
     - Report diff percentages
     - Show paths to diff images
   - If no baseline exists:
     - Capture baseline: `./scripts/visual-regression.sh --baseline`
     - Skip comparison (first run)

8. **Report summary**
   - Build status: pass/fail
   - Test results: X/Y passed
   - Failure details (if any) with screenshot references
   - Visual regression status (if run)
   - Next steps or completion confirmation

**Variables:**

- `{{TASK_DESCRIPTION}}` — What to implement
- `{{PROJECT_DIR}}` — Project root path
- `{{BASE_URL}}` — URL to test (default: http://localhost:3000)
- `{{MAX_ITERATIONS}}` — Max fix attempts (default: 3)

**Example usage:**

```
\plan-build-test:full-cycle

Task: Implement login form validation
Project: /Users/makinja/ALAI/products/Drop/src/drop-app
Base URL: http://localhost:3000
```

---

### Mode 2: Test Only (`\plan-build-test:test-only`)

**Purpose:** Run tests against existing deployment (local or remote) without building

**Agent workflow:**

1. **Accept parameters**
   - URL to test (required, default: http://localhost:3000)
   - Project filter (optional, e.g., "mobile-iphone")
   - Test grep pattern (optional, e.g., "login")

2. **Run tests**
   - Execute: `TEST_BASE_URL= ./scripts/test-runner.sh [--project ] [--grep ]`
   - Parse results from `/tmp/playwright-results.json`

3. **Report results**
   - Summary: X/Y tests passed
   - If failures:
     - Show failure details (test title + error message)
     - List screenshot paths from `/tmp/playwright-screenshots/`
   - Exit code: 0 = all pass, 1 = failures

**Variables:**

- `{{BASE_URL}}` — URL to test
- `{{PROJECT}}` — Project filter (optional)
- `{{GREP_PATTERN}}` — Test name filter (optional)

**Example usage:**

```
\plan-build-test:test-only

URL: https://staging.getdrop.no
Project: mobile-iphone
Pattern: login
```

**Mobile testing:**

- iPhone viewport: `--project mobile-iphone`
- Galaxy viewport: `--project mobile-galaxy`
- iPad viewport: `--project tablet-ipad`

---

### Mode 3: Visual Check (`\plan-build-test:visual-check`)

**Purpose:** Capture screenshots and compare against baseline for visual regression detection

**Agent workflow:**

1. **Check baseline status**
   - Check if baseline exists: `ls tests/visual/baseline/*.png`
   - If no baseline → capture baseline mode
   - If baseline exists → comparison mode

2. **Capture baseline (first run)**
   - Execute: `./scripts/visual-regression.sh --baseline`
   - Saves screenshots to `tests/visual/baseline/`
   - Report: "Baseline captured, X screenshots saved"
   - Skip comparison (nothing to compare against)

3. **Run comparison (subsequent runs)**
   - Execute: `./scripts/visual-regression.sh [--threshold ]`
   - Default threshold: 5% (customizable)
   - Compares current screenshots vs baseline
   - Generates diff images to `/tmp/visual-diffs/`

4. **Report results**
   - Per-page diff percentages
   - Overall status: pass (no diffs > threshold) or fail (diffs detected)
   - Paths to diff images for review
   - Recommendation: approve new baseline or fix regressions

**Variables:**

- `{{PROJECT_DIR}}` — Project root path
- `{{THRESHOLD}}` — Max diff percentage allowed (default: 5)

**Example usage:**

```
\plan-build-test:visual-check

Threshold: 10
```

**Workflow:**

1. First run: Capture baseline
2. Make UI changes
3. Run visual check → see diffs
4. Review diff images
5. If intentional → update baseline: `./scripts/visual-regression.sh --baseline`
6. If bugs → fix issues → re-run visual check

---

## Key Constraints

1. **Playwright CLI ONLY**
   - NEVER use MCP playwright tools
   - All tests via `npx playwright test` or wrapper scripts
   - No browser automation except through Playwright CLI

2. **URL flexibility**
   - Support local dev: http://localhost:3000
   - Support staging: https://staging.example.com
   - Support production: https://example.com
   - Use TEST_BASE_URL env var to override default

3. **Mobile testing**
   - Use `--project` flag for mobile viewports
   - Available projects: mobile-iphone, mobile-galaxy, tablet-ipad
   - See playwright.config.ts for full project list

4. **JSON results parsing**
   - Always parse `/tmp/playwright-results.json` for structured data
   - Extract: total, passed, failed, skipped, failures[]
   - Reference screenshot paths from `/tmp/playwright-screenshots/`

5. **Screenshot evidence**
   - All failure screenshots saved to `/tmp/playwright-screenshots/`
   - Visual regression diffs saved to `/tmp/visual-diffs/`
   - Include paths in reports for manual review

6. **Iterative fixing**
   - Max 3 iterations for build fixes
   - Max 3 iterations for test fixes
   - After max iterations → escalate to human with detailed failure analysis

7. **Build before test**
   - Full cycle MUST run build before tests
   - Test-only mode assumes build already done
   - Visual check mode can run independently (screenshot capture doesn't require build)

---

## File Locations

- **Test runner:** `./scripts/test-runner.sh`
- **Visual regression:** `./scripts/visual-regression.sh`
- **Playwright config:** `playwright.config.ts`
- **Test results:** `/tmp/playwright-results.json`
- **Screenshots:** `/tmp/playwright-screenshots/`
- **Visual diffs:** `/tmp/visual-diffs/`
- **Visual baseline:** `tests/visual/baseline/`

---

## Example Outputs

### Full Cycle Success

```
Task #1234 COMPLETE

Build: ✓ Passed
Tests: ✓ 15/15 passed
Visual regression: ✓ No changes detected (all diffs < 5%)

Ready for deployment.
```

### Full Cycle with Failures

```
Task #1234 — Test failures detected

Build: ✓ Passed
Tests: ✗ 12/15 passed (3 failures)

Failures:
1. "login with valid credentials" — Error: Element not found: button[type="submit"]
   Screenshot: /tmp/playwright-screenshots/login-failure-1.png
2. "dashboard loads after login" — Error: Timeout waiting for selector: h1:has-text("Dashboard")
   Screenshot: /tmp/playwright-screenshots/dashboard-timeout-2.png

Fix iteration 1/3 in progress...
```

### Test Only (Remote)

```
Testing: https://staging.getdrop.no
Project: mobile-iphone

Results: ✓ 8/8 passed

All tests passed on mobile viewport.
```

### Visual Check

```
Visual regression results:

✓ login.png — 0.2% diff (PASS)
✓ dashboard.png — 1.8% diff (PASS)
✗ profile.png — 12.5% diff (FAIL — exceeds 5% threshold)

Review diff: /tmp/visual-diffs/profile-diff.png

Action needed: Review profile page changes or update baseline if intentional.
```

---

## ⏱ Operational Limits

- **MAX TURNS:** 30 (build) | 20 (validate) | 10 (lookup)
- Exit cleanly after completing. On 5+ failures: escalate to John with full error context.
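
The iteration cap described under Key Constraints (at most 3 fix attempts, then escalate to a human) can be sketched as a bounded loop. This is a minimal illustration, not the skill's actual implementation. It assumes the results file decodes to a flat dictionary with the `total`, `passed`, `failed`, and `failures` fields listed under "JSON results parsing". `run_tests` and `apply_fix` are hypothetical callbacks standing in for running `./scripts/test-runner.sh` and spawning the builder subagent.

```python
from typing import Callable

MAX_ITERATIONS = 3  # default of the skill's {{MAX_ITERATIONS}} variable


def summarize(results: dict) -> str:
    """Format a one-line summary such as 'Tests: 12/15 passed (3 failures)'.

    Assumes a flat schema with 'total', 'passed' and 'failed' keys.
    """
    total, passed, failed = results["total"], results["passed"], results["failed"]
    if failed == 0:
        return f"Tests: {passed}/{total} passed"
    return f"Tests: {passed}/{total} passed ({failed} failures)"


def fix_loop(
    run_tests: Callable[[], dict],
    apply_fix: Callable[[list, int], None],
    max_iterations: int = MAX_ITERATIONS,
) -> tuple:
    """Run tests, then fix and re-run until green or the cap is reached.

    Returns (status, results), where status is "pass" or "escalate".
    """
    results = run_tests()  # initial run, before any fix attempt
    for iteration in range(1, max_iterations + 1):
        if results["failed"] == 0:
            return "pass", results
        # Hand the failure details to the (hypothetical) fixer, then re-test.
        apply_fix(results["failures"], iteration)
        results = run_tests()
    # The cap is exhausted: pass only if the last fix attempt went green.
    status = "pass" if results["failed"] == 0 else "escalate"
    return status, results
```

In practice `run_tests` would invoke the test runner and load `/tmp/playwright-results.json`, and an `"escalate"` status would trigger the detailed failure report for a human. The same loop shape applies to the build-fix cap.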