# /plan-build-test

**Source:** `~/.claude/skills/plan-build-test/SKILL.md`

---

---
name: plan-build-test
version: "2.0"
level: 3
trigger: "plan-build-test, full-cycle test, playwright, E2E testing, run tests"
author: john
updated: 2026-03-16
description: Orchestration skill for Plan→Build→Test development cycles. Runs Playwright CLI tests (NOT MCP) against local or remote web apps. Supports E2E testing, visual regression, and mobile viewport testing.
---

# Plan-Build-Test Orchestration Skill

Automates the full development cycle: implement changes → build → test → fix → re-test.

**CRITICAL: Playwright CLI ONLY.** NEVER use MCP Playwright tools. Run all tests via `npx playwright test` or `./scripts/test-runner.sh`.

## Modes

### Mode 1: Full Cycle (`/plan-build-test:full-cycle`)

**Purpose:** Implement feature/fix → build → test → fix failures → visual regression

**Agent workflow:**

1. **Read requirements**
   - Read task description and acceptance criteria
   - Identify files to change and expected test coverage

2. **Spawn builder subagent**
   - Use Task tool to spawn builder agent with clear file ownership
   - Wait for builder to complete implementation
   - Verify builder marked task as done

3. **Build verification**
   - Run build command: `npx next build` (or relevant for project)
   - Parse output for errors
   - If build fails → analyze errors → spawn builder to fix → re-build
   - Max 3 build iterations before escalating

4. **Start dev server (if testing locally)**
   - If `TEST_BASE_URL` is not set, start the dev server: `npx next dev &`
   - Wait until the server is ready (poll http://localhost:3000)
   - If testing remote URL, skip this step

5. **Run E2E tests**
   - Execute: `./scripts/test-runner.sh [--project <project>] [--grep <pattern>]`
   - Parse JSON results from `/tmp/playwright-results.json`
   - Capture:
     - Total tests, passed, failed, skipped
     - Failure details (test title, error message)
     - Screenshot paths from `/tmp/playwright-screenshots/`

6. **Fix failures (if needed)**
   - If tests fail:
     - Analyze failure details and screenshots
     - Identify root cause
     - Spawn builder to fix issues
     - Re-run tests
   - Max 3 fix iterations before escalating

7. **Visual regression (optional)**
   - If changes affect UI:
     - Run: `./scripts/visual-regression.sh`
     - Compare against baseline
     - Report diff percentages
     - Show paths to diff images
   - If no baseline exists:
     - Capture baseline: `./scripts/visual-regression.sh --baseline`
     - Skip comparison (first run)

8. **Report summary**
   - Build status: pass/fail
   - Test results: X/Y passed
   - Failure details (if any) with screenshot references
   - Visual regression status (if run)
   - Next steps or completion confirmation
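
Steps 3 and 6 share one shape: run a check, and on failure apply a fix and re-check, up to a fixed number of iterations. The following is a minimal TypeScript sketch of that bounded loop, not the skill's actual implementation. The names `runWithFixes`, `check`, and `fix` are hypothetical, the callbacks are synchronous stand-ins for the real (asynchronous) build, test, and builder steps, and counting only fix attempts against the limit is one possible reading of "max 3 iterations".

```typescript
// Outcome of a bounded check/fix loop.
interface LoopResult {
  passed: boolean;   // true if the check eventually succeeded
  fixesUsed: number; // how many fix attempts were spent
  escalate: boolean; // true if the limit was reached while still failing
}

// check: runs the build or the tests and reports success (hypothetical callback).
// fix: asks the builder to address the last failure (hypothetical callback).
function runWithFixes(
  check: () => boolean,
  fix: (iteration: number) => void,
  maxIterations = 3,
): LoopResult {
  let fixesUsed = 0;
  let passed = check();
  while (!passed && fixesUsed < maxIterations) {
    fixesUsed += 1;
    fix(fixesUsed);
    passed = check();
  }
  return { passed, fixesUsed, escalate: !passed };
}

// Usage: a check that starts passing once two fixes have been applied.
let fixesApplied = 0;
const result = runWithFixes(
  () => fixesApplied >= 2,
  () => { fixesApplied += 1; },
);
console.log(result); // { passed: true, fixesUsed: 2, escalate: false }
```

When `escalate` is true, the agent would stop and hand the failure analysis to a human instead of looping further.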

**Variables:**
- `{{TASK_DESCRIPTION}}` — What to implement
- `{{PROJECT_DIR}}` — Project root path
- `{{BASE_URL}}` — URL to test (default: http://localhost:3000)
- `{{MAX_ITERATIONS}}` — Max fix attempts (default: 3)

**Example usage:**
```
/plan-build-test:full-cycle
Task: Implement login form validation
Project: /Users/makinja/ALAI/products/Drop/src/drop-app
Base URL: http://localhost:3000
```
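
Step 4 of the workflow above waits for the dev server before running tests. One way to do that is to poll the base URL until it answers. The sketch below assumes Node 18+ (global `fetch`). `waitForServer`, `httpProbe`, the attempt count, and the interval are hypothetical choices, not values taken from the skill.

```typescript
// A probe resolves to true once the server answers; injectable so it can be faked.
type Probe = (url: string) => Promise<boolean>;

// Default probe: any HTTP response below 500 counts as "ready" (an assumption).
const httpProbe: Probe = async (url) => {
  try {
    const res = await fetch(url);
    return res.status < 500;
  } catch {
    return false; // connection refused: server is not up yet
  }
};

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the probe succeeds or the attempts run out.
async function waitForServer(
  url: string,
  probe: Probe = httpProbe,
  attempts = 30,
  intervalMs = 1000,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    if (await probe(url)) return true;
    if (i < attempts - 1) await sleep(intervalMs);
  }
  return false;
}
```

A call such as `await waitForServer("http://localhost:3000")` would resolve to `true` once the server responds, or `false` after roughly 30 seconds, at which point the agent could report that the dev server did not start.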

---

### Mode 2: Test Only (`/plan-build-test:test-only`)

**Purpose:** Run tests against existing deployment (local or remote) without building

**Agent workflow:**

1. **Accept parameters**
   - URL to test (defaults to http://localhost:3000 if omitted)
   - Project filter (optional, e.g., "mobile-iphone")
   - Test grep pattern (optional, e.g., "login")

2. **Run tests**
   - Execute: `TEST_BASE_URL=<url> ./scripts/test-runner.sh [--project <project>] [--grep <pattern>]`
   - Parse results from `/tmp/playwright-results.json`

3. **Report results**
   - Summary: X/Y tests passed
   - If failures:
     - Show failure details (test title + error message)
     - List screenshot paths from `/tmp/playwright-screenshots/`
   - Exit code: 0 = all pass, 1 = failures
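
The parameters in step 1 map directly onto the command line in step 2. The sketch below shows one way to assemble that invocation so that optional flags appear only when set. `buildRunnerInvocation` and its types are hypothetical names. The script path, the flags, and the `TEST_BASE_URL` variable come from the steps above.

```typescript
interface TestOnlyParams {
  baseUrl?: string; // falls back to the documented default
  project?: string; // e.g. "mobile-iphone"
  grep?: string;    // e.g. "login"
}

interface RunnerInvocation {
  env: { TEST_BASE_URL: string };
  command: string;
  args: string[];
}

// Builds the environment and argument list for the test runner script.
function buildRunnerInvocation(params: TestOnlyParams): RunnerInvocation {
  const args: string[] = [];
  if (params.project) args.push("--project", params.project);
  if (params.grep) args.push("--grep", params.grep);
  return {
    env: { TEST_BASE_URL: params.baseUrl ?? "http://localhost:3000" },
    command: "./scripts/test-runner.sh",
    args,
  };
}

// Usage: a remote run against staging on the iPhone project, filtered to login tests.
const inv = buildRunnerInvocation({
  baseUrl: "https://staging.getdrop.no",
  project: "mobile-iphone",
  grep: "login",
});
console.log(`TEST_BASE_URL=${inv.env.TEST_BASE_URL} ${inv.command} ${inv.args.join(" ")}`);
// TEST_BASE_URL=https://staging.getdrop.no ./scripts/test-runner.sh --project mobile-iphone --grep login
```

Keeping `args` as an array rather than a single string avoids shell quoting problems if the grep pattern contains spaces.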

**Variables:**
- `{{BASE_URL}}` — URL to test
- `{{PROJECT}}` — Project filter (optional)
- `{{GREP_PATTERN}}` — Test name filter (optional)

**Example usage:**
```
/plan-build-test:test-only
URL: https://staging.getdrop.no
Project: mobile-iphone
Pattern: login
```

**Mobile testing:**
- iPhone viewport: `--project mobile-iphone`
- Galaxy viewport: `--project mobile-galaxy`
- iPad viewport: `--project tablet-ipad`
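
For orientation, the project names above would typically be defined in `playwright.config.ts`. The fragment below is a hedged sketch of what such a config might look like, not the project's actual file. The specific device descriptors (`iPhone 13`, `Galaxy S9+`, `iPad Pro 11`) are assumptions picked from Playwright's built-in device list, and the reporter line assumes the JSON results file is produced by Playwright's own JSON reporter rather than by the wrapper script.

```typescript
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  use: {
    // TEST_BASE_URL overrides the local default.
    baseURL: process.env.TEST_BASE_URL ?? "http://localhost:3000",
    screenshot: "only-on-failure",
  },
  // Assumption: results are written by Playwright's built-in JSON reporter.
  reporter: [["json", { outputFile: "/tmp/playwright-results.json" }]],
  projects: [
    // Device choices are illustrative; the real config may use other models.
    { name: "mobile-iphone", use: { ...devices["iPhone 13"] } },
    { name: "mobile-galaxy", use: { ...devices["Galaxy S9+"] } },
    { name: "tablet-ipad", use: { ...devices["iPad Pro 11"] } },
  ],
});
```

With projects defined this way, `--project mobile-iphone` selects the matching entry by its `name`.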

---

### Mode 3: Visual Check (`/plan-build-test:visual-check`)

**Purpose:** Capture screenshots and compare against baseline for visual regression detection

**Agent workflow:**

1. **Check baseline status**
   - Check if baseline exists: `ls tests/visual/baseline/*.png`
   - If no baseline → capture baseline mode
   - If baseline exists → comparison mode

2. **Capture baseline (first run)**
   - Execute: `./scripts/visual-regression.sh --baseline`
   - Saves screenshots to `tests/visual/baseline/`
   - Report: "Baseline captured, X screenshots saved"
   - Skip comparison (nothing to compare against)

3. **Run comparison (subsequent runs)**
   - Execute: `./scripts/visual-regression.sh [--threshold <percent>]`
   - Default threshold: 5% (customizable)
   - Compares current screenshots vs baseline
   - Generates diff images to `/tmp/visual-diffs/`

4. **Report results**
   - Per-page diff percentages
   - Overall status: pass (no diff exceeds the threshold) or fail (at least one diff exceeds the threshold)
   - Paths to diff images for review
   - Recommendation: approve new baseline or fix regressions
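
The pass/fail decision in step 4 reduces to comparing each page's diff percentage with the threshold. Here is a small sketch of that rule. The `PageDiff` shape and the `evaluateDiffs` name are hypothetical, since the output format of `visual-regression.sh` is not described here.

```typescript
interface PageDiff {
  page: string;        // screenshot name, e.g. "login.png"
  diffPercent: number; // share of differing pixels, 0 to 100
}

interface VisualVerdict {
  status: "pass" | "fail";
  failing: PageDiff[]; // pages whose diff exceeds the threshold
}

// A page fails only when its diff is strictly greater than the threshold.
function evaluateDiffs(diffs: PageDiff[], thresholdPercent = 5): VisualVerdict {
  const failing = diffs.filter((d) => d.diffPercent > thresholdPercent);
  return { status: failing.length === 0 ? "pass" : "fail", failing };
}

// Usage with three illustrative pages and the default 5% threshold.
const verdict = evaluateDiffs([
  { page: "login.png", diffPercent: 0.2 },
  { page: "dashboard.png", diffPercent: 1.8 },
  { page: "profile.png", diffPercent: 12.5 },
]);
console.log(verdict.status, verdict.failing.map((d) => d.page));
// fail [ 'profile.png' ]
```

Raising the threshold to 15 would turn the same input into a pass, which is why intentional UI changes are better handled by updating the baseline than by loosening the threshold.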

**Variables:**
- `{{PROJECT_DIR}}` — Project root path
- `{{THRESHOLD}}` — Max diff percentage allowed (default: 5)

**Example usage:**
```
/plan-build-test:visual-check
Threshold: 10
```

**Workflow:**
1. First run: Capture baseline
2. Make UI changes
3. Run visual check → see diffs
4. Review diff images
5. If intentional → update baseline: `./scripts/visual-regression.sh --baseline`
6. If bugs → fix issues → re-run visual check

---

## Key Constraints

1. **Playwright CLI ONLY**
   - NEVER use MCP Playwright tools
   - Run all tests via `npx playwright test` or the wrapper scripts
   - No browser automation except through the Playwright CLI

2. **URL flexibility**
   - Support local dev: http://localhost:3000
   - Support staging: https://staging.example.com
   - Support production: https://example.com
   - Set the `TEST_BASE_URL` environment variable to override the default

3. **Mobile testing**
   - Use `--project` flag for mobile viewports
   - Available projects: `mobile-iphone`, `mobile-galaxy`, `tablet-ipad`
   - See `playwright.config.ts` for the full project list

4. **JSON results parsing**
   - Always parse `/tmp/playwright-results.json` for structured data
   - Extract: total, passed, failed, skipped, failures[]
   - Reference screenshot paths from `/tmp/playwright-screenshots/`

5. **Screenshot evidence**
   - All failure screenshots saved to `/tmp/playwright-screenshots/`
   - Visual regression diffs saved to `/tmp/visual-diffs/`
   - Include paths in reports for manual review

6. **Iterative fixing**
   - Max 3 iterations for build fixes
   - Max 3 iterations for test fixes
   - After max iterations → escalate to human with detailed failure analysis

7. **Build before test**
   - Full cycle MUST run build before tests
   - Test-only mode assumes build already done
   - Visual check mode can run independently (screenshot capture doesn't require build)
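
Constraint 4 can be illustrated with a short sketch that turns parsed results into the report text and exit code used by the modes above. The flat `Results` shape below (`total`, `passed`, `failed`, `skipped`, `failures[]`) is an assumption based on the fields listed in that constraint. The real file may nest these differently, and `summarize` is a hypothetical helper name.

```typescript
interface Failure {
  title: string;
  error: string;
  screenshot?: string; // path under /tmp/playwright-screenshots/, if captured
}

// Assumed shape of the parsed results file.
interface Results {
  total: number;
  passed: number;
  failed: number;
  skipped: number;
  failures: Failure[];
}

// Builds the human-readable summary and the exit code (0 = all pass, 1 = failures).
function summarize(results: Results): { text: string; exitCode: 0 | 1 } {
  const lines = [`Tests: ${results.passed}/${results.total} passed`];
  results.failures.forEach((f, i) => {
    lines.push(`${i + 1}. "${f.title}": ${f.error}`);
    if (f.screenshot) lines.push(`   Screenshot: ${f.screenshot}`);
  });
  return { text: lines.join("\n"), exitCode: results.failed === 0 ? 0 : 1 };
}

// In the skill, this string would be the contents of /tmp/playwright-results.json.
const raw = `{"total":2,"passed":1,"failed":1,"skipped":0,"failures":[
  {"title":"login with valid credentials","error":"Element not found",
   "screenshot":"/tmp/playwright-screenshots/login-failure-1.png"}]}`;
const summary = summarize(JSON.parse(raw) as Results);
console.log(summary.text);
```

Including the screenshot path next to each failure keeps the evidence required by constraint 5 in the same report.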

---

## File Locations

- **Test runner:** `./scripts/test-runner.sh`
- **Visual regression:** `./scripts/visual-regression.sh`
- **Playwright config:** `playwright.config.ts`
- **Test results:** `/tmp/playwright-results.json`
- **Screenshots:** `/tmp/playwright-screenshots/`
- **Visual diffs:** `/tmp/visual-diffs/`
- **Visual baseline:** `tests/visual/baseline/`

---

## Example Outputs

### Full Cycle Success
```
Task #1234 COMPLETE

Build: ✓ Passed
Tests: ✓ 15/15 passed
Visual regression: ✓ No regressions detected (all diffs < 5%)

Ready for deployment.
```

### Full Cycle with Failures
```
Task #1234 — Test failures detected

Build: ✓ Passed
Tests: ✗ 12/15 passed (3 failures)

Failures:
1. "login with valid credentials" — Error: Element not found: button[type="submit"]
   Screenshot: /tmp/playwright-screenshots/login-failure-1.png

2. "dashboard loads after login" — Error: Timeout waiting for selector: h1:has-text("Dashboard")
   Screenshot: /tmp/playwright-screenshots/dashboard-timeout-2.png

Fix iteration 1/3 in progress...
```

### Test Only (Remote)
```
Testing: https://staging.getdrop.no
Project: mobile-iphone
Results: ✓ 8/8 passed

All tests passed on mobile viewport.
```

### Visual Check
```
Visual regression results:

✓ login.png — 0.2% diff (PASS)
✓ dashboard.png — 1.8% diff (PASS)
✗ profile.png — 12.5% diff (FAIL — exceeds 5% threshold)

Review diff: /tmp/visual-diffs/profile-diff.png

Action needed: Review profile page changes or update baseline if intentional.
```


---
## ⏱ Operational Limits
- **MAX TURNS:** 30 (build) | 20 (validate) | 10 (lookup)
- Exit cleanly after completing. On 5+ failures: escalate to John with full error context.