/plan-build-test

Source: `~/.claude/skills/plan-build-test/SKILL.md`

name: plan-build-test
version: "2.0"
level: 3
trigger: "plan-build-test, full-cycle test, playwright, E2E testing, run tests"
author: john
updated: 2026-03-16
description: Orchestration skill for Plan→Build→Test development cycles. Runs Playwright CLI tests (NOT MCP) against local or remote web apps. Supports E2E testing, visual regression, and mobile viewport testing.

Plan-Build-Test Orchestration Skill

Automates the full development cycle: implement changes → build → test → fix → re-test.

CRITICAL: Playwright CLI ONLY — NEVER use MCP playwright tools. All testing via npx playwright test or ./scripts/test-runner.sh.

Modes

Mode 1: Full Cycle (`\plan-build-test:full-cycle`)

Purpose: Implement feature/fix → build → test → fix failures → visual regression

Agent workflow:

1. Read requirements
   - Read task description and acceptance criteria
   - Identify files to change and expected test coverage
2. Spawn builder subagent
   - Use Task tool to spawn builder agent with clear file ownership
   - Wait for builder to complete implementation
   - Verify builder marked task as done
3. Build verification
   - Run build command: npx next build (or relevant for project)
   - Parse output for errors
   - If build fails → analyze errors → spawn builder to fix → re-build
   - Max 3 build iterations before escalating
4. Start dev server (if testing locally)
   - If TEST_BASE_URL not set, start dev server: npx next dev &
   - Wait for server ready (check http://localhost:3000)
   - If testing remote URL, skip this step
5. Run E2E tests
   - Execute: ./scripts/test-runner.sh [--project <project>] [--grep <pattern>]
   - Parse JSON results from /tmp/playwright-results.json
   - Capture:
     - Total tests, passed, failed, skipped
     - Failure details (test title, error message)
     - Screenshot paths from /tmp/playwright-screenshots/
6. Fix failures (if needed)
   - If tests fail:
     - Analyze failure details and screenshots
     - Identify root cause
     - Spawn builder to fix issues
     - Re-run tests
   - Max 3 fix iterations before escalating
7. Visual regression (optional)
   - If changes affect UI:
     - Run: ./scripts/visual-regression.sh
     - Compare against baseline
     - Report diff percentages
     - Show paths to diff images
   - If no baseline exists:
     - Capture baseline: ./scripts/visual-regression.sh --baseline
     - Skip comparison (first run)
8. Report summary
   - Build status: pass/fail
   - Test results: X/Y passed
   - Failure details (if any) with screenshot references
   - Visual regression status (if run)
   - Next steps or completion confirmation
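
The bounded fix loop in the workflow above (run tests, fix, re-run, escalate after the maximum number of attempts) can be sketched roughly as follows. This is a simplified synchronous illustration, not the skill's actual implementation: `runTests` and `applyFix` are hypothetical callbacks standing in for the test runner and the builder subagent, and the result shape is reduced to a `failed` count.

```typescript
// Hypothetical result shape: only the failure count matters for the loop.
interface TestResult {
  failed: number;
}

interface CycleOutcome {
  status: "passed" | "escalate";
  fixesApplied: number;
  lastResult: TestResult;
}

// Run the tests once; while they fail, apply a fix and re-run,
// allowing at most maxIterations fix attempts before escalating.
function runFixLoop(
  runTests: () => TestResult,
  applyFix: (result: TestResult) => void,
  maxIterations: number = 3,
): CycleOutcome {
  let result = runTests();
  let fixesApplied = 0;
  while (result.failed > 0 && fixesApplied < maxIterations) {
    applyFix(result);
    fixesApplied += 1;
    result = runTests();
  }
  return {
    status: result.failed === 0 ? "passed" : "escalate",
    fixesApplied,
    lastResult: result,
  };
}
```

With the default of 3 fix attempts, the tests run at most four times: once initially and once after each fix. A real orchestrator would most likely make both callbacks asynchronous; the control flow stays the same.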

Variables:

{{TASK_DESCRIPTION}} — What to implement
{{PROJECT_DIR}} — Project root path
{{BASE_URL}} — URL to test (default: http://localhost:3000)
{{MAX_ITERATIONS}} — Max fix attempts (default: 3)

Example usage:

\plan-build-test:full-cycle
Task: Implement login form validation
Project: /Users/makinja/ALAI/products/Drop/src/drop-app
Base URL: http://localhost:3000

Mode 2: Test Only (`\plan-build-test:test-only`)

Purpose: Run tests against existing deployment (local or remote) without building

Agent workflow:

1. Accept parameters
   - URL to test (optional, defaults to http://localhost:3000)
   - Project filter (optional, e.g., "mobile-iphone")
   - Test grep pattern (optional, e.g., "login")
2. Run tests
   - Execute: TEST_BASE_URL=<url> ./scripts/test-runner.sh [--project <project>] [--grep <pattern>]
   - Parse results from /tmp/playwright-results.json
3. Report results
   - Summary: X/Y tests passed
   - If failures:
     - Show failure details (test title + error message)
     - List screenshot paths from /tmp/playwright-screenshots/
   - Exit code: 0 = all pass, 1 = failures
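
The reporting step above can be sketched as a small pure function. The top-level fields (`total`, `passed`, `failed`, `skipped`, `failures`) follow the names this skill lists for the results file; the `title` and `error` keys inside each failure are assumptions, since the exact schema written by test-runner.sh is not shown here.

```typescript
// Assumed shape of one entry in failures[]; the key names are hypothetical.
interface Failure {
  title: string;
  error: string;
}

// Top-level field names follow the skill text.
interface RunResults {
  total: number;
  passed: number;
  failed: number;
  skipped: number;
  failures: Failure[];
}

// Build the "X/Y tests passed" summary, one line per failure,
// and the exit code (0 = all pass, 1 = failures).
function summarize(results: RunResults): { lines: string[]; exitCode: number } {
  const lines: string[] = [`${results.passed}/${results.total} tests passed`];
  for (const failure of results.failures) {
    lines.push(`FAIL "${failure.title}": ${failure.error}`);
  }
  return { lines, exitCode: results.failed === 0 ? 0 : 1 };
}
```

In practice the input would come from reading and `JSON.parse`-ing /tmp/playwright-results.json; keeping the function free of file access makes it easy to check in isolation.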

Variables:

{{BASE_URL}} — URL to test
{{PROJECT}} — Project filter (optional)
{{GREP_PATTERN}} — Test name filter (optional)

Example usage:

\plan-build-test:test-only
URL: https://staging.getdrop.no
Project: mobile-iphone
Pattern: login

Mobile testing:

iPhone viewport: --project mobile-iphone
Galaxy viewport: --project mobile-galaxy
iPad viewport: --project tablet-ipad
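
As an illustration of how the URL, project, and pattern parameters map onto the runner command, here is a minimal sketch. The function name `buildRunnerInvocation` and its option names are hypothetical; only the script path, the `--project` and `--grep` flags, and the `TEST_BASE_URL` variable come from this skill.

```typescript
// Hypothetical option bag mirroring the Test Only parameters.
interface RunnerOptions {
  baseUrl?: string;
  project?: string;
  grep?: string;
}

interface RunnerInvocation {
  env: Record<string, string>;
  command: string;
  args: string[];
}

// Translate the optional parameters into an environment override
// plus the argument list for ./scripts/test-runner.sh.
function buildRunnerInvocation(options: RunnerOptions): RunnerInvocation {
  const env: Record<string, string> = {};
  if (options.baseUrl) {
    env["TEST_BASE_URL"] = options.baseUrl;
  }
  const args: string[] = [];
  if (options.project) {
    args.push("--project", options.project);
  }
  if (options.grep) {
    args.push("--grep", options.grep);
  }
  return { env, command: "./scripts/test-runner.sh", args };
}
```

For the staging example above, this yields the arguments `--project mobile-iphone --grep login` with `TEST_BASE_URL` set to the staging URL. Passing the arguments as an array rather than a concatenated string avoids shell-quoting problems when a grep pattern contains spaces.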

Mode 3: Visual Check (`\plan-build-test:visual-check`)

Purpose: Capture screenshots and compare against baseline for visual regression detection

Agent workflow:

1. Check baseline status
   - Check if baseline exists: ls tests/visual/baseline/*.png
   - If no baseline → capture baseline mode
   - If baseline exists → comparison mode
2. Capture baseline (first run)
   - Execute: ./scripts/visual-regression.sh --baseline
   - Saves screenshots to tests/visual/baseline/
   - Report: "Baseline captured, X screenshots saved"
   - Skip comparison (nothing to compare against)
3. Run comparison (subsequent runs)
   - Execute: ./scripts/visual-regression.sh [--threshold <percent>]
   - Default threshold: 5% (customizable)
   - Compares current screenshots vs baseline
   - Generates diff images to /tmp/visual-diffs/
4. Report results
   - Per-page diff percentages
   - Overall status: pass (no diff exceeds the threshold) or fail (at least one diff exceeds the threshold)
   - Paths to diff images for review
   - Recommendation: approve new baseline or fix regressions
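
The pass/fail rule in the reporting step can be sketched as below. This assumes the per-page diff percentages have already been computed by visual-regression.sh; the `PageDiff` shape and the function name are hypothetical, and a page is treated as passing when its diff does not exceed the threshold.

```typescript
// Hypothetical per-page input: a screenshot name and its diff percentage.
interface PageDiff {
  page: string;
  diffPercent: number;
}

interface PageVerdict extends PageDiff {
  pass: boolean;
}

// A page passes when its diff is at or below the threshold;
// the overall run passes only if every page passes.
function evaluateDiffs(
  diffs: PageDiff[],
  thresholdPercent: number = 5,
): { pass: boolean; pages: PageVerdict[] } {
  const pages: PageVerdict[] = diffs.map((diff) => ({
    ...diff,
    pass: diff.diffPercent <= thresholdPercent,
  }));
  return { pass: pages.every((page) => page.pass), pages };
}
```

With the default 5% threshold, diffs of 0.2% and 1.8% pass while 12.5% fails, so the overall status is fail. Raising the threshold to 15 would make the same run pass.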

Variables:

{{PROJECT_DIR}} — Project root path
{{THRESHOLD}} — Max diff percentage allowed (default: 5)

Example usage:

\plan-build-test:visual-check
Threshold: 10

Workflow:

1. First run: Capture baseline
2. Make UI changes
3. Run visual check → see diffs
4. Review diff images
5. If intentional → update baseline: ./scripts/visual-regression.sh --baseline
6. If bugs → fix issues → re-run visual check

Key Constraints

1. Playwright CLI ONLY
   - NEVER use MCP playwright tools
   - All tests via npx playwright test or wrapper scripts
   - No browser automation except through Playwright CLI
2. URL flexibility
   - Support local dev: http://localhost:3000
   - Support staging: https://staging.example.com
   - Support production: https://example.com
   - Use TEST_BASE_URL env var to override default
3. Mobile testing
   - Use --project flag for mobile viewports
   - Available projects: mobile-iphone, mobile-galaxy, tablet-ipad
   - See playwright.config.ts for full project list
4. JSON results parsing
   - Always parse /tmp/playwright-results.json for structured data
   - Extract: total, passed, failed, skipped, failures[]
   - Reference screenshot paths from /tmp/playwright-screenshots/
5. Screenshot evidence
   - All failure screenshots saved to /tmp/playwright-screenshots/
   - Visual regression diffs saved to /tmp/visual-diffs/
   - Include paths in reports for manual review
6. Iterative fixing
   - Max 3 iterations for build fixes
   - Max 3 iterations for test fixes
   - After max iterations → escalate to human with detailed failure analysis
7. Build before test
   - Full cycle MUST run build before tests
   - Test-only mode assumes build already done
   - Visual check mode can run independently (screenshot capture doesn't require build)
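
The URL-override constraint, combined with the rule that the dev server is started only when TEST_BASE_URL is not set, could be expressed along these lines. This is a sketch under assumptions: `resolveTarget` is a hypothetical helper, and an empty TEST_BASE_URL is treated the same as an unset one.

```typescript
const DEFAULT_BASE_URL = "http://localhost:3000";

// Decide which URL to test and whether a local dev server is needed.
// Assumption: an empty TEST_BASE_URL counts as "not set".
function resolveTarget(
  env: Record<string, string | undefined>,
): { baseUrl: string; startDevServer: boolean } {
  const override = env["TEST_BASE_URL"];
  if (override !== undefined && override !== "") {
    return { baseUrl: override, startDevServer: false };
  }
  return { baseUrl: DEFAULT_BASE_URL, startDevServer: true };
}
```

Called as `resolveTarget(process.env)` in a Node script, this keeps the default and the override logic in one place, so the local, staging, and production cases all go through the same code path.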

File Locations

Test runner: ./scripts/test-runner.sh
Visual regression: ./scripts/visual-regression.sh
Playwright config: playwright.config.ts
Test results: /tmp/playwright-results.json
Screenshots: /tmp/playwright-screenshots/
Visual diffs: /tmp/visual-diffs/
Visual baseline: tests/visual/baseline/

Example Outputs

Full Cycle Success

Task #1234 COMPLETE

Build: ✓ Passed
Tests: ✓ 15/15 passed
Visual regression: ✓ No changes detected (all diffs < 5%)

Ready for deployment.

Full Cycle with Failures

Task #1234 — Test failures detected

Build: ✓ Passed
Tests: ✗ 13/15 passed (2 failures)

Failures:
1. "login with valid credentials" — Error: Element not found: button[type="submit"]
   Screenshot: /tmp/playwright-screenshots/login-failure-1.png

2. "dashboard loads after login" — Error: Timeout waiting for selector: h1:has-text("Dashboard")
   Screenshot: /tmp/playwright-screenshots/dashboard-timeout-2.png

Fix iteration 1/3 in progress...

Test Only (Remote)

Testing: https://staging.getdrop.no
Project: mobile-iphone
Results: ✓ 8/8 passed

All tests passed on mobile viewport.

Visual Check

Visual regression results:

✓ login.png — 0.2% diff (PASS)
✓ dashboard.png — 1.8% diff (PASS)
✗ profile.png — 12.5% diff (FAIL — exceeds 5% threshold)

Review diff: /tmp/visual-diffs/profile-diff.png

Action needed: Review profile page changes or update baseline if intentional.

⏱ Operational Limits

MAX TURNS: 30 (build) | 20 (validate) | 10 (lookup)
Exit cleanly after completing. On 5+ failures: escalate to John with full error context.
