/plan-build-test
Source: ~/.claude/skills/plan-build-test/SKILL.md
name: plan-build-test
version: "2.0"
level: 3
trigger: "plan-build-test, full-cycle test, playwright, E2E testing, run tests"
author: john
updated: 2026-03-16
description: Orchestration skill for Plan→Build→Test development cycles. Runs Playwright CLI tests (NOT MCP) against local or remote web apps. Supports E2E testing, visual regression, and mobile viewport testing.
Plan-Build-Test Orchestration Skill
Automates the full development cycle: implement changes → build → test → fix → re-test.
CRITICAL: Playwright CLI ONLY — NEVER use MCP playwright tools. All testing via npx playwright test or ./scripts/test-runner.sh.
Modes
Mode 1: Full Cycle (/plan-build-test:full-cycle)
Purpose: Implement feature/fix → build → test → fix failures → visual regression
Agent workflow:
- Read requirements
  - Read the task description and acceptance criteria
  - Identify files to change and expected test coverage
- Spawn builder subagent
  - Use the Task tool to spawn a builder agent with clear file ownership
  - Wait for the builder to complete the implementation
  - Verify the builder marked the task as done
- Build verification
  - Run the build command: `npx next build` (or the equivalent for the project)
  - Parse the output for errors
  - If the build fails → analyze errors → spawn builder to fix → re-build
  - Max 3 build iterations before escalating
- Start dev server (if testing locally)
  - If TEST_BASE_URL is not set, start the dev server: `npx next dev &`
  - Wait until the server is ready (check http://localhost:3000)
  - If testing a remote URL, skip this step
- Run E2E tests
  - Execute: `./scripts/test-runner.sh [--project <project>] [--grep <pattern>]`
  - Parse JSON results from `/tmp/playwright-results.json`
  - Capture:
    - Total tests, passed, failed, skipped
    - Failure details (test title, error message)
    - Screenshot paths from `/tmp/playwright-screenshots/`
- Fix failures (if needed)
  - If tests fail:
    - Analyze failure details and screenshots
    - Identify the root cause
    - Spawn builder to fix issues
    - Re-run tests
    - Max 3 fix iterations before escalating
- Visual regression (optional)
  - If changes affect UI:
    - Run `./scripts/visual-regression.sh` to compare against the baseline
    - Report diff percentages
    - Show paths to diff images
    - If no baseline exists:
      - Capture baseline: `./scripts/visual-regression.sh --baseline`
      - Skip comparison (first run)
- Report summary
  - Build status: pass/fail
  - Test results: X/Y passed
  - Failure details (if any) with screenshot references
  - Visual regression status (if run)
  - Next steps or completion confirmation
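The build-verification and fix-failures steps above share one control pattern: run a check, hand failures to a fixer, re-run, and escalate after a fixed number of attempts. A minimal TypeScript sketch of that loop, assuming hypothetical `check` and `fix` callbacks. In the real skill the check is `npx next build` or `./scripts/test-runner.sh`, and the fix is a builder subagent spawned via the Task tool; `runWithFixes` and its types are illustrative names only.

```typescript
// Hypothetical result of one build or test run.
interface StepResult {
  ok: boolean;
  details: string; // error output to hand to the fixer
}

type CycleOutcome =
  | { status: "passed"; attempts: number }
  | { status: "escalate"; attempts: number; lastDetails: string };

// Run `check`; on failure call `fix` and re-check, at most `maxIterations` times.
// If it still fails, return "escalate" so a human gets the last error output.
function runWithFixes(
  check: () => StepResult,
  fix: (details: string) => void,
  maxIterations = 3, // {{MAX_ITERATIONS}} default
): CycleOutcome {
  let result = check();
  let attempts = 0;
  while (!result.ok && attempts < maxIterations) {
    attempts++;
    fix(result.details); // e.g. hand the errors to a builder subagent
    result = check();
  }
  return result.ok
    ? { status: "passed", attempts }
    : { status: "escalate", attempts, lastDetails: result.details };
}

// Usage: a fake build that becomes clean after two fix attempts.
let remainingErrors = 2;
const buildOutcome = runWithFixes(
  () =>
    remainingErrors === 0
      ? { ok: true, details: "" }
      : { ok: false, details: `${remainingErrors} errors` },
  () => {
    remainingErrors--;
  },
);
console.log(buildOutcome); // { status: 'passed', attempts: 2 }
```

The same function serves both the build loop and the test-fix loop, which keeps the "max 3 iterations, then escalate" rule in one place.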
Variables:
- {{TASK_DESCRIPTION}}: what to implement
- {{PROJECT_DIR}}: project root path
- {{BASE_URL}}: URL to test (default: http://localhost:3000)
- {{MAX_ITERATIONS}}: max fix attempts (default: 3)
Example usage:
/plan-build-test:full-cycle
Task: Implement login form validation
Project: /Users/makinja/ALAI/products/Drop/src/drop-app
Base URL: http://localhost:3000
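When the base URL is the local default, as in this example, the dev server started in the "Start dev server" step has to answer before any test runs. One way to wait is to poll the URL until it responds. The sketch below assumes Node 18+ (global `fetch`); `waitForServer`, its timeout, and its poll interval are illustrative choices, not part of the skill's scripts.

```typescript
// Poll `url` until it answers or `timeoutMs` elapses.
// `probe` is injectable so the loop can be exercised without a real server.
async function waitForServer(
  url: string,
  timeoutMs = 60_000,
  intervalMs = 1_000,
  probe: (u: string) => Promise<boolean> = async (u) => {
    try {
      const res = await fetch(u);
      return res.status < 500; // any non-5xx response means the server is listening
    } catch {
      return false; // e.g. connection refused: server not up yet
    }
  },
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await probe(url)) return true;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}
```

Called after `npx next dev &`, a `false` result means the server never came up within the timeout, which is a startup problem to report rather than a test failure.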
Mode 2: Test Only (/plan-build-test:test-only)
Purpose: Run tests against an existing deployment (local or remote) without building
Agent workflow:
- Accept parameters
  - URL to test (defaults to http://localhost:3000 if not given)
  - Project filter (optional, e.g., "mobile-iphone")
  - Test grep pattern (optional, e.g., "login")
- Run tests
  - Execute: `TEST_BASE_URL=<url> ./scripts/test-runner.sh [--project <project>] [--grep <pattern>]`
  - Parse results from `/tmp/playwright-results.json`
- Report results
  - Summary: X/Y tests passed
  - If failures:
    - Show failure details (test title + error message)
    - List screenshot paths from `/tmp/playwright-screenshots/`
  - Exit code: 0 = all pass, 1 = failures
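The test-only workflow maps its three parameters onto the documented `TEST_BASE_URL` variable and the `--project`/`--grep` flags, then reduces the outcome to a binary exit code. A sketch of that mapping in TypeScript; `buildTestRunnerCommand` and `runTestOnly` are hypothetical helper names, and the real argument handling lives in `./scripts/test-runner.sh`.

```typescript
import { spawnSync } from "node:child_process";

interface TestOnlyParams {
  baseUrl?: string; // {{BASE_URL}}
  project?: string; // {{PROJECT}}, e.g. "mobile-iphone"
  grepPattern?: string; // {{GREP_PATTERN}}, e.g. "login"
}

// Build argv and environment for ./scripts/test-runner.sh.
// Only the flags the skill documents (--project, --grep) are emitted.
function buildTestRunnerCommand(p: TestOnlyParams) {
  const args: string[] = [];
  if (p.project) args.push("--project", p.project);
  if (p.grepPattern) args.push("--grep", p.grepPattern);
  const env = { ...process.env, TEST_BASE_URL: p.baseUrl ?? "http://localhost:3000" };
  return { command: "./scripts/test-runner.sh", args, env };
}

// Run the script and map its outcome to the documented exit code.
function runTestOnly(p: TestOnlyParams): 0 | 1 {
  const { command, args, env } = buildTestRunnerCommand(p);
  const result = spawnSync(command, args, { env, stdio: "inherit" });
  return result.status === 0 ? 0 : 1; // null status (script failed to start) also counts as 1
}
```

Treating any non-zero or missing status as 1 keeps the exit code binary, as the report step describes.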
Variables:
- {{BASE_URL}}: URL to test
- {{PROJECT}}: project filter (optional)
- {{GREP_PATTERN}}: test name filter (optional)
Example usage:
/plan-build-test:test-only
URL: https://staging.getdrop.no
Project: mobile-iphone
Pattern: login
Mobile testing:
- iPhone viewport: `--project mobile-iphone`
- Galaxy viewport: `--project mobile-galaxy`
- iPad viewport: `--project tablet-ipad`
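These project names must exist in `playwright.config.ts` for `--project` to select them. A possible config, assuming the standard `@playwright/test` `defineConfig`/`devices` API. The specific device descriptors (`iPhone 13`, `Galaxy S9+`, `iPad (gen 7)`) are assumptions, since the skill only names the projects. The sketch also shows one way the `TEST_BASE_URL` override and the `/tmp` output paths could be wired up.

```typescript
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  use: {
    // TEST_BASE_URL overrides the local default (e.g. a staging URL).
    baseURL: process.env.TEST_BASE_URL ?? "http://localhost:3000",
    screenshot: "only-on-failure",
  },
  // Test artifacts, including failure screenshots, are written under this directory.
  outputDir: "/tmp/playwright-screenshots",
  // Human-readable console output plus machine-readable results for the agent.
  reporter: [["list"], ["json", { outputFile: "/tmp/playwright-results.json" }]],
  projects: [
    { name: "mobile-iphone", use: { ...devices["iPhone 13"] } }, // assumed device
    { name: "mobile-galaxy", use: { ...devices["Galaxy S9+"] } }, // assumed device
    { name: "tablet-ipad", use: { ...devices["iPad (gen 7)"] } }, // assumed device
  ],
});
```

Playwright's iPhone and iPad descriptors default to the WebKit browser, so `npx playwright install webkit` may be needed before those projects run. Artifacts land in per-test subdirectories of `outputDir`, so screenshot paths are nested below `/tmp/playwright-screenshots/`.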
Mode 3: Visual Check (/plan-build-test:visual-check)
Purpose: Capture screenshots and compare them against the baseline to detect visual regressions
Agent workflow:
- Check baseline status
  - Check whether a baseline exists: `ls tests/visual/baseline/*.png`
  - If no baseline → capture baseline mode
  - If baseline exists → comparison mode
- Capture baseline (first run)
  - Execute: `./scripts/visual-regression.sh --baseline`
  - Saves screenshots to `tests/visual/baseline/`
  - Report: "Baseline captured, X screenshots saved"
  - Skip comparison (nothing to compare against)
- Run comparison (subsequent runs)
  - Execute: `./scripts/visual-regression.sh [--threshold <percent>]`
  - Default threshold: 5% (customizable)
  - Compares current screenshots against the baseline
  - Generates diff images in `/tmp/visual-diffs/`
- Report results
  - Per-page diff percentages
  - Overall status: pass (no diff exceeds the threshold) or fail (at least one diff exceeds it)
  - Paths to diff images for review
  - Recommendation: approve the new baseline or fix regressions
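The check-baseline and report steps reduce to two small decisions: whether any baseline PNGs exist, and whether any page's diff exceeds the threshold. A TypeScript sketch of both, assuming the per-page diff percentages have already been produced by `./scripts/visual-regression.sh`; `chooseVisualMode` and `evaluateDiffs` are hypothetical names.

```typescript
import { existsSync, readdirSync } from "node:fs";

// Any .png in the baseline directory means comparison is possible.
function chooseVisualMode(baselineDir = "tests/visual/baseline"): "capture-baseline" | "compare" {
  if (!existsSync(baselineDir)) return "capture-baseline";
  const hasPng = readdirSync(baselineDir).some((f) => f.endsWith(".png"));
  return hasPng ? "compare" : "capture-baseline";
}

interface PageDiff {
  page: string;
  diffPercent: number;
}

// A page fails only when its diff is strictly greater than the threshold ({{THRESHOLD}}, default 5).
function evaluateDiffs(diffs: PageDiff[], thresholdPercent = 5) {
  const failures = diffs.filter((d) => d.diffPercent > thresholdPercent);
  const lines = diffs.map((d) => {
    const failed = d.diffPercent > thresholdPercent;
    return `${failed ? "✗" : "✓"} ${d.page}: ${d.diffPercent}% diff (${failed ? "FAIL" : "PASS"})`;
  });
  return { pass: failures.length === 0, failures, lines };
}
```

Fed the numbers from the Visual Check example output (0.2%, 1.8%, 12.5%) with the default threshold, only `profile.png` fails. Because the comparison is strict, a page at exactly 5% still passes.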
Variables:
- {{PROJECT_DIR}}: project root path
- {{THRESHOLD}}: max diff percentage allowed (default: 5)
Example usage:
/plan-build-test:visual-check
Threshold: 10
Workflow:
- First run: Capture baseline
- Make UI changes
- Run visual check → see diffs
- Review diff images
- If intentional → update baseline: `./scripts/visual-regression.sh --baseline`
- If bugs → fix issues → re-run visual check
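How `./scripts/visual-regression.sh` computes its percentage is not specified here. One common definition is the share of pixels that changed between two same-size screenshots. For example, a 5% threshold on a 1280×720 screenshot (921,600 pixels) allows up to 46,080 changed pixels. A minimal sketch of that metric on raw RGBA buffers; decoding the PNGs into such buffers is assumed and not shown, and `diffPercent` is a hypothetical name.

```typescript
// Percentage of pixels that differ between two same-size RGBA images
// (flat byte arrays, 4 bytes per pixel).
function diffPercent(a: Uint8Array, b: Uint8Array, channelTolerance = 0): number {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error("images must have identical RGBA dimensions");
  }
  const pixels = a.length / 4;
  if (pixels === 0) return 0;
  let changed = 0;
  for (let i = 0; i < a.length; i += 4) {
    // A pixel counts as changed if any channel differs by more than the tolerance.
    for (let c = 0; c < 4; c++) {
      if (Math.abs(a[i + c] - b[i + c]) > channelTolerance) {
        changed++;
        break;
      }
    }
  }
  return (changed / pixels) * 100;
}
```

A small per-channel tolerance keeps anti-aliasing noise from counting as a change. Playwright's own `toHaveScreenshot` assertion exposes comparable knobs (`threshold`, `maxDiffPixelRatio`).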
Key Constraints
- Playwright CLI ONLY
  - NEVER use MCP playwright tools
  - All tests via `npx playwright test` or wrapper scripts
  - No browser automation except through Playwright CLI
- URL flexibility
  - Support local dev: http://localhost:3000
  - Support staging: https://staging.example.com
  - Support production: https://example.com
  - Use the TEST_BASE_URL env var to override the default
- Mobile testing
  - Use the `--project` flag for mobile viewports
  - Available projects: mobile-iphone, mobile-galaxy, tablet-ipad
  - See playwright.config.ts for the full project list
- JSON results parsing
  - Always parse `/tmp/playwright-results.json` for structured data
  - Extract: total, passed, failed, skipped, failures[]
  - Reference screenshot paths from `/tmp/playwright-screenshots/`
- Screenshot evidence
  - All failure screenshots are saved to `/tmp/playwright-screenshots/`
  - Visual regression diffs are saved to `/tmp/visual-diffs/`
  - Include paths in reports for manual review
- Iterative fixing
  - Max 3 iterations for build fixes
  - Max 3 iterations for test fixes
  - After max iterations → escalate to human with detailed failure analysis
- Build before test
  - Full cycle MUST run build before tests
  - Test-only mode assumes the build is already done
  - Visual check mode can run independently (screenshot capture doesn't require a build)
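For the JSON results constraint, the sketch below assumes `/tmp/playwright-results.json` is written by Playwright's built-in `json` reporter; if `./scripts/test-runner.sh` post-processes it into its own shape, adapt accordingly. It walks the nested suites and extracts total, passed, failed, skipped, and failures[] with screenshot attachment paths. The interfaces cover only the subset of the reporter's schema read here, and `summarize`/`readSummary` are hypothetical names.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Minimal subset of Playwright's JSON reporter schema.
interface JsonResult {
  status: string;
  error?: { message?: string };
  attachments?: { name: string; contentType: string; path?: string }[];
}
interface JsonTest {
  projectName: string;
  status: "expected" | "unexpected" | "flaky" | "skipped";
  results: JsonResult[];
}
interface JsonSpec { title: string; tests: JsonTest[] }
interface JsonSuite { title: string; specs: JsonSpec[]; suites?: JsonSuite[] }
interface JsonReport { suites: JsonSuite[] }

interface Failure { title: string; project: string; message: string; screenshots: string[] }
interface Summary { total: number; passed: number; failed: number; skipped: number; failures: Failure[] }

function summarize(report: JsonReport): Summary {
  const s: Summary = { total: 0, passed: 0, failed: 0, skipped: 0, failures: [] };
  const walk = (suite: JsonSuite): void => {
    for (const spec of suite.specs) {
      for (const test of spec.tests) {
        s.total++;
        if (test.status === "skipped") {
          s.skipped++;
        } else if (test.status === "unexpected") {
          s.failed++;
          const last = test.results[test.results.length - 1]; // final retry
          s.failures.push({
            title: spec.title,
            project: test.projectName,
            message: last?.error?.message ?? "(no error message)",
            screenshots: (last?.attachments ?? [])
              .filter((a) => a.contentType === "image/png" && a.path)
              .map((a) => a.path as string),
          });
        } else {
          s.passed++; // "expected", plus "flaky" (passed on retry)
        }
      }
    }
    for (const child of suite.suites ?? []) walk(child); // describe() blocks nest
  };
  report.suites.forEach(walk);
  return s;
}

// Returns null when no results file exists (e.g. the run never started).
function readSummary(path = "/tmp/playwright-results.json"): Summary | null {
  if (!existsSync(path)) return null;
  return summarize(JSON.parse(readFileSync(path, "utf8")) as JsonReport);
}
```

Flaky tests are counted as passed here; reporting them separately would be a reasonable alternative.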
File Locations
- Test runner: `./scripts/test-runner.sh`
- Visual regression: `./scripts/visual-regression.sh`
- Playwright config: `playwright.config.ts`
- Test results: `/tmp/playwright-results.json`
- Screenshots: `/tmp/playwright-screenshots/`
- Visual diffs: `/tmp/visual-diffs/`
- Visual baseline: `tests/visual/baseline/`
Example Outputs
Full Cycle Success
Task #1234 COMPLETE
Build: ✓ Passed
Tests: ✓ 15/15 passed
Visual regression: ✓ No changes detected (all diffs < 5%)
Ready for deployment.
Full Cycle with Failures
Task #1234 — Test failures detected
Build: ✓ Passed
Tests: ✗ 12/15 passed (3 failures)
Failures:
1. "login with valid credentials" — Error: Element not found: button[type="submit"]
Screenshot: /tmp/playwright-screenshots/login-failure-1.png
2. "dashboard loads after login" — Error: Timeout waiting for selector: h1:has-text("Dashboard")
Screenshot: /tmp/playwright-screenshots/dashboard-timeout-2.png
Fix iteration 1/3 in progress...
Test Only (Remote)
Testing: https://staging.getdrop.no
Project: mobile-iphone
Results: ✓ 8/8 passed
All tests passed on mobile viewport.
Visual Check
Visual regression results:
✓ login.png — 0.2% diff (PASS)
✓ dashboard.png — 1.8% diff (PASS)
✗ profile.png — 12.5% diff (FAIL — exceeds 5% threshold)
Review diff: /tmp/visual-diffs/profile-diff.png
Action needed: Review profile page changes or update baseline if intentional.
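The `Tests:` lines in these examples, and the test-only exit code, follow directly from the counts. A small formatter sketch; `formatTestLine` is a hypothetical helper, and its wording simply mirrors the example outputs.

```typescript
interface TestCounts {
  total: number;
  passed: number;
  failed: number;
}

// Produce the "Tests:" report line and the test-only exit code (0 = all pass, 1 = failures).
function formatTestLine(c: TestCounts): { line: string; exitCode: 0 | 1 } {
  if (c.failed === 0) {
    return { line: `Tests: ✓ ${c.passed}/${c.total} passed`, exitCode: 0 };
  }
  const noun = c.failed === 1 ? "failure" : "failures";
  return { line: `Tests: ✗ ${c.passed}/${c.total} passed (${c.failed} ${noun})`, exitCode: 1 };
}
```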
⏱ Operational Limits
- MAX TURNS: 30 (build) | 20 (validate) | 10 (lookup)
- Exit cleanly after completing. On 5+ failures: escalate to John with full error context.