Specialized Patterns (sp-*)

/sp-brainstorm

Source: ~/.claude/skills/sp-brainstorm/SKILL.md


description: "You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores requirements and design before implementation." disable-model-invocation: true

Invoke the superpowers:brainstorming skill and follow it exactly as presented to you

/sp-brainstorming

Source: ~/.claude/skills/sp-brainstorming/SKILL.md


name: brainstorming description: "You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores user intent, requirements and design before implementation."

Brainstorming Ideas Into Designs

Overview

Help turn ideas into fully formed designs and specs through natural collaborative dialogue.

Start by understanding the current project context, then ask questions one at a time to refine the idea. Once you understand what you're building, present the design in small sections (200-300 words), checking after each section whether it looks right so far.

The Process

Understanding the idea:

Exploring approaches:

Presenting the design:

After the Design

Documentation:

Implementation (if continuing):

Key Principles

/sp-dispatching-parallel-agents

Source: ~/.claude/skills/sp-dispatching-parallel-agents/SKILL.md


name: dispatching-parallel-agents description: Use when facing 2+ independent tasks that can be worked on without shared state or sequential dependencies

Dispatching Parallel Agents

Overview

When you have multiple unrelated failures (different test files, different subsystems, different bugs), investigating them sequentially wastes time. Each investigation is independent and can happen in parallel.

Core principle: Dispatch one agent per independent problem domain. Let them work concurrently.

When to Use

digraph when_to_use {
    "Multiple failures?" [shape=diamond];
    "Are they independent?" [shape=diamond];
    "Single agent investigates all" [shape=box];
    "One agent per problem domain" [shape=box];
    "Can they work in parallel?" [shape=diamond];
    "Sequential agents" [shape=box];
    "Parallel dispatch" [shape=box];

    "Multiple failures?" -> "Are they independent?" [label="yes"];
    "Are they independent?" -> "Single agent investigates all" [label="no - related"];
    "Are they independent?" -> "Can they work in parallel?" [label="yes"];
    "Can they work in parallel?" -> "Parallel dispatch" [label="yes"];
    "Can they work in parallel?" -> "Sequential agents" [label="no - shared state"];
}

Use when:

Don't use when:

The Pattern

1. Identify Independent Domains

Group failures by what's broken:

Each domain is independent - fixing tool approval doesn't affect abort tests.

2. Create Focused Agent Tasks

Each agent gets:

3. Dispatch in Parallel

// In Claude Code / AI environment
Task("Fix agent-tool-abort.test.ts failures")
Task("Fix batch-completion-behavior.test.ts failures")
Task("Fix tool-approval-race-conditions.test.ts failures")
// All three run concurrently

4. Review and Integrate

When agents return:

Agent Prompt Structure

Good agent prompts are:

  1. Focused - One clear problem domain
  2. Self-contained - All context needed to understand the problem
  3. Specific about output - What should the agent return?
Fix the 3 failing tests in src/agents/agent-tool-abort.test.ts:

1. "should abort tool with partial output capture" - expects 'interrupted at' in message
2. "should handle mixed completed and aborted tools" - fast tool aborted instead of completed
3. "should properly track pendingToolCount" - expects 3 results but gets 0

These are timing/race condition issues. Your task:

1. Read the test file and understand what each test verifies
2. Identify root cause - timing issues or actual bugs?
3. Fix by:
   - Replacing arbitrary timeouts with event-based waiting
   - Fixing bugs in abort implementation if found
   - Adjusting test expectations if testing changed behavior

Do NOT just increase timeouts - find the real issue.

Return: Summary of what you found and what you fixed.

Common Mistakes

❌ Too broad: "Fix all the tests" - agent gets lost ✅ Specific: "Fix agent-tool-abort.test.ts" - focused scope

❌ No context: "Fix the race condition" - agent doesn't know where ✅ Context: Paste the error messages and test names

❌ No constraints: Agent might refactor everything ✅ Constraints: "Do NOT change production code" or "Fix tests only"

❌ Vague output: "Fix it" - you don't know what changed ✅ Specific: "Return summary of root cause and changes"

When NOT to Use

Real Example from Session

Scenario: 6 test failures across 3 files after major refactoring

Failures:

Decision: Independent domains - abort logic separate from batch completion separate from race conditions

Dispatch:

Agent 1 → Fix agent-tool-abort.test.ts
Agent 2 → Fix batch-completion-behavior.test.ts
Agent 3 → Fix tool-approval-race-conditions.test.ts

Results:

Integration: All fixes independent, no conflicts, full suite green

Time saved: 3 problems solved in parallel vs sequentially

Key Benefits

  1. Parallelization - Multiple investigations happen simultaneously
  2. Focus - Each agent has narrow scope, less context to track
  3. Independence - Agents don't interfere with each other
  4. Speed - 3 problems solved in time of 1

Verification

After agents return:

  1. Review each summary - Understand what changed
  2. Check for conflicts - Did agents edit same code?
  3. Run full suite - Verify all fixes work together
  4. Spot check - Agents can make systematic errors

Real-World Impact

From debugging session (2025-10-03):

/sp-execute-plan

Source: ~/.claude/skills/sp-execute-plan/SKILL.md


description: Execute plan in batches with review checkpoints disable-model-invocation: true

Invoke the superpowers:executing-plans skill and follow it exactly as presented to you

/sp-executing-plans

Source: ~/.claude/skills/sp-executing-plans/SKILL.md


name: executing-plans description: Use when you have a written implementation plan to execute in a separate session with review checkpoints

Executing Plans

Overview

Load plan, review critically, execute tasks in batches, report for review between batches.

Core principle: Batch execution with checkpoints for architect review.

Announce at start: "I'm using the executing-plans skill to implement this plan."

The Process

Step 1: Load and Review Plan

  1. Read plan file
  2. Review critically - identify any questions or concerns about the plan
  3. If concerns: Raise them with your human partner before starting
  4. If no concerns: Create TodoWrite and proceed

Step 2: Execute Batch

Default: First 3 tasks

For each task:

  1. Mark as in_progress
  2. Follow each step exactly (plan has bite-sized steps)
  3. Run verifications as specified
  4. Mark as completed

Step 3: Report

When batch complete:

Step 4: Continue

Based on feedback:

Step 5: Complete Development

After all tasks complete and verified:

When to Stop and Ask for Help

STOP executing immediately when:

Ask for clarification rather than guessing.

When to Revisit Earlier Steps

Return to Review (Step 1) when:

Don't force through blockers - stop and ask.

Remember

Integration

Required workflow skills:

/sp-finishing-a-development-branch

Source: ~/.claude/skills/sp-finishing-a-development-branch/SKILL.md


name: finishing-a-development-branch description: Use when implementation is complete, all tests pass, and you need to decide how to integrate the work - guides completion of development work by presenting structured options for merge, PR, or cleanup

Finishing a Development Branch

Overview

Guide completion of development work by presenting clear options and handling chosen workflow.

Core principle: Verify tests → Present options → Execute choice → Clean up.

Announce at start: "I'm using the finishing-a-development-branch skill to complete this work."

The Process

Step 1: Verify Tests

Before presenting options, verify tests pass:

# Run project's test suite
npm test / cargo test / pytest / go test ./...

If tests fail:

Tests failing (<N> failures). Must fix before completing:

[Show failures]

Cannot proceed with merge/PR until tests pass.

Stop. Don't proceed to Step 2.

If tests pass: Continue to Step 2.

Step 2: Determine Base Branch

# Try common base branches
git merge-base HEAD main 2>/dev/null || git merge-base HEAD master 2>/dev/null

Or ask: "This branch split from main - is that correct?"

Step 3: Present Options

Present exactly these 4 options:

Implementation complete. What would you like to do?

1. Merge back to <base-branch> locally
2. Push and create a Pull Request
3. Keep the branch as-is (I'll handle it later)
4. Discard this work

Which option?

Don't add explanation - keep options concise.

Step 4: Execute Choice

Option 1: Merge Locally

# Switch to base branch
git checkout <base-branch>

# Pull latest
git pull

# Merge feature branch
git merge <feature-branch>

# Verify tests on merged result
<test command>

# If tests pass
git branch -d <feature-branch>

Then: Cleanup worktree (Step 5)

Option 2: Push and Create PR

# Push branch
git push -u origin <feature-branch>

# Create PR
gh pr create --title "<title>" --body "$(cat <<'EOF'
## Summary
<2-3 bullets of what changed>

## Test Plan
- [ ] <verification steps>
EOF
)"

Then: Cleanup worktree (Step 5)

Option 3: Keep As-Is

Report: "Keeping branch . Worktree preserved at ."

Don't cleanup worktree.

Option 4: Discard

Confirm first:

This will permanently delete:
- Branch <name>
- All commits: <commit-list>
- Worktree at <path>

Type 'discard' to confirm.

Wait for exact confirmation.

If confirmed:

git checkout <base-branch>
git branch -D <feature-branch>

Then: Cleanup worktree (Step 5)

Step 5: Cleanup Worktree

For Options 1, 2, 4:

Check if in worktree:

git worktree list | grep $(git branch --show-current)

If yes:

git worktree remove <worktree-path>

For Option 3: Keep worktree.

Quick Reference

Option Merge Push Keep Worktree Cleanup Branch
1. Merge locally - -
2. Create PR - -
3. Keep as-is - - -
4. Discard - - - ✓ (force)

Common Mistakes

Skipping test verification

Open-ended questions

Automatic worktree cleanup

No confirmation for discard

Red Flags

Never:

Always:

Integration

Called by:

Pairs with:

/sp-receiving-code-review

Source: ~/.claude/skills/sp-receiving-code-review/SKILL.md


name: receiving-code-review description: Use when receiving code review feedback, before implementing suggestions, especially if feedback seems unclear or technically questionable - requires technical rigor and verification, not performative agreement or blind implementation

Code Review Reception

Overview

Code review requires technical evaluation, not emotional performance.

Core principle: Verify before implementing. Ask before assuming. Technical correctness over social comfort.

The Response Pattern

WHEN receiving code review feedback:

1. READ: Complete feedback without reacting
2. UNDERSTAND: Restate requirement in own words (or ask)
3. VERIFY: Check against codebase reality
4. EVALUATE: Technically sound for THIS codebase?
5. RESPOND: Technical acknowledgment or reasoned pushback
6. IMPLEMENT: One item at a time, test each

Forbidden Responses

NEVER:

INSTEAD:

Handling Unclear Feedback

IF any item is unclear:
  STOP - do not implement anything yet
  ASK for clarification on unclear items

WHY: Items may be related. Partial understanding = wrong implementation.

Example:

your human partner: "Fix 1-6"
You understand 1,2,3,6. Unclear on 4,5.

❌ WRONG: Implement 1,2,3,6 now, ask about 4,5 later
✅ RIGHT: "I understand items 1,2,3,6. Need clarification on 4 and 5 before proceeding."

Source-Specific Handling

From your human partner

From External Reviewers

BEFORE implementing:
  1. Check: Technically correct for THIS codebase?
  2. Check: Breaks existing functionality?
  3. Check: Reason for current implementation?
  4. Check: Works on all platforms/versions?
  5. Check: Does reviewer understand full context?

IF suggestion seems wrong:
  Push back with technical reasoning

IF can't easily verify:
  Say so: "I can't verify this without [X]. Should I [investigate/ask/proceed]?"

IF conflicts with your human partner's prior decisions:
  Stop and discuss with your human partner first

your human partner's rule: "External feedback - be skeptical, but check carefully"

YAGNI Check for "Professional" Features

IF reviewer suggests "implementing properly":
  grep codebase for actual usage

  IF unused: "This endpoint isn't called. Remove it (YAGNI)?"
  IF used: Then implement properly

your human partner's rule: "You and reviewer both report to me. If we don't need this feature, don't add it."

Implementation Order

FOR multi-item feedback:
  1. Clarify anything unclear FIRST
  2. Then implement in this order:
     - Blocking issues (breaks, security)
     - Simple fixes (typos, imports)
     - Complex fixes (refactoring, logic)
  3. Test each fix individually
  4. Verify no regressions

When To Push Back

Push back when:

How to push back:

Signal if uncomfortable pushing back out loud: "Strange things are afoot at the Circle K"

Acknowledging Correct Feedback

When feedback IS correct:

✅ "Fixed. [Brief description of what changed]"
✅ "Good catch - [specific issue]. Fixed in [location]."
✅ [Just fix it and show in the code]

❌ "You're absolutely right!"
❌ "Great point!"
❌ "Thanks for catching that!"
❌ "Thanks for [anything]"
❌ ANY gratitude expression

Why no thanks: Actions speak. Just fix it. The code itself shows you heard the feedback.

If you catch yourself about to write "Thanks": DELETE IT. State the fix instead.

Gracefully Correcting Your Pushback

If you pushed back and were wrong:

✅ "You were right - I checked [X] and it does [Y]. Implementing now."
✅ "Verified this and you're correct. My initial understanding was wrong because [reason]. Fixing."

❌ Long apology
❌ Defending why you pushed back
❌ Over-explaining

State the correction factually and move on.

Common Mistakes

Mistake Fix
Performative agreement State requirement or just act
Blind implementation Verify against codebase first
Batch without testing One at a time, test each
Assuming reviewer is right Check if breaks things
Avoiding pushback Technical correctness > comfort
Partial implementation Clarify all items first
Can't verify, proceed anyway State limitation, ask for direction

Real Examples

Performative Agreement (Bad):

Reviewer: "Remove legacy code"
❌ "You're absolutely right! Let me remove that..."

Technical Verification (Good):

Reviewer: "Remove legacy code"
✅ "Checking... build target is 10.15+, this API needs 13+. Need legacy for backward compat. Current impl has wrong bundle ID - fix it or drop pre-13 support?"

YAGNI (Good):

Reviewer: "Implement proper metrics tracking with database, date filters, CSV export"
✅ "Grepped codebase - nothing calls this endpoint. Remove it (YAGNI)? Or is there usage I'm missing?"

Unclear Item (Good):

your human partner: "Fix items 1-6"
You understand 1,2,3,6. Unclear on 4,5.
✅ "Understand 1,2,3,6. Need clarification on 4 and 5 before implementing."

GitHub Thread Replies

When replying to inline review comments on GitHub, reply in the comment thread (gh api repos/{owner}/{repo}/pulls/{pr}/comments/{id}/replies), not as a top-level PR comment.

The Bottom Line

External feedback = suggestions to evaluate, not orders to follow.

Verify. Question. Then implement.

No performative agreement. Technical rigor always.

/sp-requesting-code-review

Source: ~/.claude/skills/sp-requesting-code-review/SKILL.md


name: requesting-code-review description: Use when completing tasks, implementing major features, or before merging to verify work meets requirements

Requesting Code Review

Dispatch superpowers:code-reviewer subagent to catch issues before they cascade.

Core principle: Review early, review often.

When to Request Review

Mandatory:

Optional but valuable:

How to Request

1. Get git SHAs:

BASE_SHA=$(git rev-parse HEAD~1)  # or origin/main
HEAD_SHA=$(git rev-parse HEAD)

2. Dispatch code-reviewer subagent:

Use Task tool with superpowers:code-reviewer type, fill template at code-reviewer.md

Placeholders:

3. Act on feedback:

Example

[Just completed Task 2: Add verification function]

You: Let me request code review before proceeding.

BASE_SHA=$(git log --oneline | grep "Task 1" | head -1 | awk '{print $1}')
HEAD_SHA=$(git rev-parse HEAD)

[Dispatch superpowers:code-reviewer subagent]
  WHAT_WAS_IMPLEMENTED: Verification and repair functions for conversation index
  PLAN_OR_REQUIREMENTS: Task 2 from docs/plans/deployment-plan.md
  BASE_SHA: a7981ec
  HEAD_SHA: 3df7661
  DESCRIPTION: Added verifyIndex() and repairIndex() with 4 issue types

[Subagent returns]:
  Strengths: Clean architecture, real tests
  Issues:
    Important: Missing progress indicators
    Minor: Magic number (100) for reporting interval
  Assessment: Ready to proceed

You: [Fix progress indicators]
[Continue to Task 3]

Integration with Workflows

Subagent-Driven Development:

Executing Plans:

Ad-Hoc Development:

Red Flags

Never:

If reviewer wrong:

See template at: requesting-code-review/code-reviewer.md

/sp-subagent-driven-development

Source: ~/.claude/skills/sp-subagent-driven-development/SKILL.md


name: subagent-driven-development description: Use when executing implementation plans with independent tasks in the current session

Subagent-Driven Development

Execute plan by dispatching fresh subagent per task, with two-stage review after each: spec compliance review first, then code quality review.

Core principle: Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration

When to Use

digraph when_to_use {
    "Have implementation plan?" [shape=diamond];
    "Tasks mostly independent?" [shape=diamond];
    "Stay in this session?" [shape=diamond];
    "subagent-driven-development" [shape=box];
    "executing-plans" [shape=box];
    "Manual execution or brainstorm first" [shape=box];

    "Have implementation plan?" -> "Tasks mostly independent?" [label="yes"];
    "Have implementation plan?" -> "Manual execution or brainstorm first" [label="no"];
    "Tasks mostly independent?" -> "Stay in this session?" [label="yes"];
    "Tasks mostly independent?" -> "Manual execution or brainstorm first" [label="no - tightly coupled"];
    "Stay in this session?" -> "subagent-driven-development" [label="yes"];
    "Stay in this session?" -> "executing-plans" [label="no - parallel session"];
}

vs. Executing Plans (parallel session):

The Process

digraph process {
    rankdir=TB;

    subgraph cluster_per_task {
        label="Per Task";
        "Dispatch implementer subagent (./implementer-prompt.md)" [shape=box];
        "Implementer subagent asks questions?" [shape=diamond];
        "Answer questions, provide context" [shape=box];
        "Implementer subagent implements, tests, commits, self-reviews" [shape=box];
        "Dispatch spec reviewer subagent (./spec-reviewer-prompt.md)" [shape=box];
        "Spec reviewer subagent confirms code matches spec?" [shape=diamond];
        "Implementer subagent fixes spec gaps" [shape=box];
        "Dispatch code quality reviewer subagent (./code-quality-reviewer-prompt.md)" [shape=box];
        "Code quality reviewer subagent approves?" [shape=diamond];
        "Implementer subagent fixes quality issues" [shape=box];
        "Mark task complete in TodoWrite" [shape=box];
    }

    "Read plan, extract all tasks with full text, note context, create TodoWrite" [shape=box];
    "More tasks remain?" [shape=diamond];
    "Dispatch final code reviewer subagent for entire implementation" [shape=box];
    "Use superpowers:finishing-a-development-branch" [shape=box style=filled fillcolor=lightgreen];

    "Read plan, extract all tasks with full text, note context, create TodoWrite" -> "Dispatch implementer subagent (./implementer-prompt.md)";
    "Dispatch implementer subagent (./implementer-prompt.md)" -> "Implementer subagent asks questions?";
    "Implementer subagent asks questions?" -> "Answer questions, provide context" [label="yes"];
    "Answer questions, provide context" -> "Dispatch implementer subagent (./implementer-prompt.md)";
    "Implementer subagent asks questions?" -> "Implementer subagent implements, tests, commits, self-reviews" [label="no"];
    "Implementer subagent implements, tests, commits, self-reviews" -> "Dispatch spec reviewer subagent (./spec-reviewer-prompt.md)";
    "Dispatch spec reviewer subagent (./spec-reviewer-prompt.md)" -> "Spec reviewer subagent confirms code matches spec?";
    "Spec reviewer subagent confirms code matches spec?" -> "Implementer subagent fixes spec gaps" [label="no"];
    "Implementer subagent fixes spec gaps" -> "Dispatch spec reviewer subagent (./spec-reviewer-prompt.md)" [label="re-review"];
    "Spec reviewer subagent confirms code matches spec?" -> "Dispatch code quality reviewer subagent (./code-quality-reviewer-prompt.md)" [label="yes"];
    "Dispatch code quality reviewer subagent (./code-quality-reviewer-prompt.md)" -> "Code quality reviewer subagent approves?";
    "Code quality reviewer subagent approves?" -> "Implementer subagent fixes quality issues" [label="no"];
    "Implementer subagent fixes quality issues" -> "Dispatch code quality reviewer subagent (./code-quality-reviewer-prompt.md)" [label="re-review"];
    "Code quality reviewer subagent approves?" -> "Mark task complete in TodoWrite" [label="yes"];
    "Mark task complete in TodoWrite" -> "More tasks remain?";
    "More tasks remain?" -> "Dispatch implementer subagent (./implementer-prompt.md)" [label="yes"];
    "More tasks remain?" -> "Dispatch final code reviewer subagent for entire implementation" [label="no"];
    "Dispatch final code reviewer subagent for entire implementation" -> "Use superpowers:finishing-a-development-branch";
}

Prompt Templates

Example Workflow

You: I'm using Subagent-Driven Development to execute this plan.

[Read plan file once: docs/plans/feature-plan.md]
[Extract all 5 tasks with full text and context]
[Create TodoWrite with all tasks]

Task 1: Hook installation script

[Get Task 1 text and context (already extracted)]
[Dispatch implementation subagent with full task text + context]

Implementer: "Before I begin - should the hook be installed at user or system level?"

You: "User level (~/.config/superpowers/hooks/)"

Implementer: "Got it. Implementing now..."
[Later] Implementer:
  - Implemented install-hook command
  - Added tests, 5/5 passing
  - Self-review: Found I missed --force flag, added it
  - Committed

[Dispatch spec compliance reviewer]
Spec reviewer: ✅ Spec compliant - all requirements met, nothing extra

[Get git SHAs, dispatch code quality reviewer]
Code reviewer: Strengths: Good test coverage, clean. Issues: None. Approved.

[Mark Task 1 complete]

Task 2: Recovery modes

[Get Task 2 text and context (already extracted)]
[Dispatch implementation subagent with full task text + context]

Implementer: [No questions, proceeds]
Implementer:
  - Added verify/repair modes
  - 8/8 tests passing
  - Self-review: All good
  - Committed

[Dispatch spec compliance reviewer]
Spec reviewer: ❌ Issues:
  - Missing: Progress reporting (spec says "report every 100 items")
  - Extra: Added --json flag (not requested)

[Implementer fixes issues]
Implementer: Removed --json flag, added progress reporting

[Spec reviewer reviews again]
Spec reviewer: ✅ Spec compliant now

[Dispatch code quality reviewer]
Code reviewer: Strengths: Solid. Issues (Important): Magic number (100)

[Implementer fixes]
Implementer: Extracted PROGRESS_INTERVAL constant

[Code reviewer reviews again]
Code reviewer: ✅ Approved

[Mark Task 2 complete]

...

[After all tasks]
[Dispatch final code-reviewer]
Final reviewer: All requirements met, ready to merge

Done!

Advantages

vs. Manual execution:

vs. Executing Plans:

Efficiency gains:

Quality gates:

Cost:

Red Flags

Never:

If subagent asks questions:

If reviewer finds issues:

If subagent fails task:

Integration

Required workflow skills:

Subagents should use:

Alternative workflow:

/sp-systematic-debugging

Source: ~/.claude/skills/sp-systematic-debugging/SKILL.md


name: systematic-debugging description: Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes

Systematic Debugging

Overview

Random fixes waste time and create new bugs. Quick patches mask underlying issues.

Core principle: ALWAYS find root cause before attempting fixes. Symptom fixes are failure.

Violating the letter of this process is violating the spirit of debugging.

The Iron Law

NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST

If you haven't completed Phase 1, you cannot propose fixes.

When to Use

Use for ANY technical issue:

Use this ESPECIALLY when:

Don't skip when:

The Four Phases

You MUST complete each phase before proceeding to the next.

Phase 1: Root Cause Investigation

BEFORE attempting ANY fix:

  1. Read Error Messages Carefully

    • Don't skip past errors or warnings
    • They often contain the exact solution
    • Read stack traces completely
    • Note line numbers, file paths, error codes
  2. Reproduce Consistently

    • Can you trigger it reliably?
    • What are the exact steps?
    • Does it happen every time?
    • If not reproducible → gather more data, don't guess
  3. Check Recent Changes

    • What changed that could cause this?
    • Git diff, recent commits
    • New dependencies, config changes
    • Environmental differences
  4. Gather Evidence in Multi-Component Systems

    WHEN system has multiple components (CI → build → signing, API → service → database):

    BEFORE proposing fixes, add diagnostic instrumentation:

    For EACH component boundary:
      - Log what data enters component
      - Log what data exits component
      - Verify environment/config propagation
      - Check state at each layer
    
    Run once to gather evidence showing WHERE it breaks
    THEN analyze evidence to identify failing component
    THEN investigate that specific component
    

    Example (multi-layer system):

    # Layer 1: Workflow
    echo "=== Secrets available in workflow: ==="
    echo "IDENTITY: ${IDENTITY:+SET}${IDENTITY:-UNSET}"
    
    # Layer 2: Build script
    echo "=== Env vars in build script: ==="
    env | grep IDENTITY || echo "IDENTITY not in environment"
    
    # Layer 3: Signing script
    echo "=== Keychain state: ==="
    security list-keychains
    security find-identity -v
    
    # Layer 4: Actual signing
    codesign --sign "$IDENTITY" --verbose=4 "$APP"
    

    This reveals: Which layer fails (secrets → workflow ✓, workflow → build ✗)

  5. Trace Data Flow

    WHEN error is deep in call stack:

    See root-cause-tracing.md in this directory for the complete backward tracing technique.

    Quick version:

    • Where does bad value originate?
    • What called this with bad value?
    • Keep tracing up until you find the source
    • Fix at source, not at symptom

Phase 2: Pattern Analysis

Find the pattern before fixing:

  1. Find Working Examples

    • Locate similar working code in same codebase
    • What works that's similar to what's broken?
  2. Compare Against References

    • If implementing pattern, read reference implementation COMPLETELY
    • Don't skim - read every line
    • Understand the pattern fully before applying
  3. Identify Differences

    • What's different between working and broken?
    • List every difference, however small
    • Don't assume "that can't matter"
  4. Understand Dependencies

    • What other components does this need?
    • What settings, config, environment?
    • What assumptions does it make?

Phase 3: Hypothesis and Testing

Scientific method:

  1. Form Single Hypothesis

    • State clearly: "I think X is the root cause because Y"
    • Write it down
    • Be specific, not vague
  2. Test Minimally

    • Make the SMALLEST possible change to test hypothesis
    • One variable at a time
    • Don't fix multiple things at once
  3. Verify Before Continuing

    • Did it work? Yes → Phase 4
    • Didn't work? Form NEW hypothesis
    • DON'T add more fixes on top
  4. When You Don't Know

    • Say "I don't understand X"
    • Don't pretend to know
    • Ask for help
    • Research more

Phase 4: Implementation

Fix the root cause, not the symptom:

  1. Create Failing Test Case

    • Simplest possible reproduction
    • Automated test if possible
    • One-off test script if no framework
    • MUST have before fixing
    • Use the superpowers:test-driven-development skill for writing proper failing tests
  2. Implement Single Fix

    • Address the root cause identified
    • ONE change at a time
    • No "while I'm here" improvements
    • No bundled refactoring
  3. Verify Fix

    • Test passes now?
    • No other tests broken?
    • Issue actually resolved?
  4. If Fix Doesn't Work

    • STOP
    • Count: How many fixes have you tried?
    • If < 3: Return to Phase 1, re-analyze with new information
    • If ≥ 3: STOP and question the architecture (step 5 below)
    • DON'T attempt Fix #4 without architectural discussion
  5. If 3+ Fixes Failed: Question Architecture

    Pattern indicating architectural problem:

    • Each fix reveals new shared state/coupling/problem in different place
    • Fixes require "massive refactoring" to implement
    • Each fix creates new symptoms elsewhere

    STOP and question fundamentals:

    • Is this pattern fundamentally sound?
    • Are we "sticking with it through sheer inertia"?
    • Should we refactor architecture vs. continue fixing symptoms?

    Discuss with your human partner before attempting more fixes

    This is NOT a failed hypothesis - this is a wrong architecture.

Red Flags - STOP and Follow Process

If you catch yourself thinking:

ALL of these mean: STOP. Return to Phase 1.

If 3+ fixes failed: Question the architecture (see Phase 4.5)

your human partner's Signals You're Doing It Wrong

Watch for these redirections:

When you see these: STOP. Return to Phase 1.

Common Rationalizations

Excuse Reality
"Issue is simple, don't need process" Simple issues have root causes too. Process is fast for simple bugs.
"Emergency, no time for process" Systematic debugging is FASTER than guess-and-check thrashing.
"Just try this first, then investigate" First fix sets the pattern. Do it right from the start.
"I'll write test after confirming fix works" Untested fixes don't stick. Test first proves it.
"Multiple fixes at once saves time" Can't isolate what worked. Causes new bugs.
"Reference too long, I'll adapt the pattern" Partial understanding guarantees bugs. Read it completely.
"I see the problem, let me fix it" Seeing symptoms ≠ understanding root cause.
"One more fix attempt" (after 2+ failures) 3+ failures = architectural problem. Question pattern, don't fix again.

Quick Reference

Phase Key Activities Success Criteria
1. Root Cause Read errors, reproduce, check changes, gather evidence Understand WHAT and WHY
2. Pattern Find working examples, compare Identify differences
3. Hypothesis Form theory, test minimally Confirmed or new hypothesis
4. Implementation Create test, fix, verify Bug resolved, tests pass

When Process Reveals "No Root Cause"

If systematic investigation reveals issue is truly environmental, timing-dependent, or external:

  1. You've completed the process
  2. Document what you investigated
  3. Implement appropriate handling (retry, timeout, error message)
  4. Add monitoring/logging for future investigation

But: 95% of "no root cause" cases are incomplete investigation.

Supporting Techniques

These techniques are part of systematic debugging and available in this directory:

Real-World Impact

From debugging sessions:

/sp-test-driven-development

Source: ~/.claude/skills/sp-test-driven-development/SKILL.md


name: test-driven-development description: Use when implementing any feature or bugfix, before writing implementation code

Test-Driven Development (TDD)

Overview

Write the test first. Watch it fail. Write minimal code to pass.

Core principle: If you didn't watch the test fail, you don't know if it tests the right thing.

Violating the letter of the rules is violating the spirit of the rules.

When to Use

Always:

Exceptions (ask your human partner):

Thinking "skip TDD just this once"? Stop. That's rationalization.

The Iron Law

NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST

Write code before the test? Delete it. Start over.

No exceptions:

Implement fresh from tests. Period.

Red-Green-Refactor

digraph tdd_cycle {
    rankdir=LR;
    red [label="RED\nWrite failing test", shape=box, style=filled, fillcolor="#ffcccc"];
    verify_red [label="Verify fails\ncorrectly", shape=diamond];
    green [label="GREEN\nMinimal code", shape=box, style=filled, fillcolor="#ccffcc"];
    verify_green [label="Verify passes\nAll green", shape=diamond];
    refactor [label="REFACTOR\nClean up", shape=box, style=filled, fillcolor="#ccccff"];
    next [label="Next", shape=ellipse];

    red -> verify_red;
    verify_red -> green [label="yes"];
    verify_red -> red [label="wrong\nfailure"];
    green -> verify_green;
    verify_green -> refactor [label="yes"];
    verify_green -> green [label="no"];
    refactor -> verify_green [label="stay\ngreen"];
    verify_green -> next;
    next -> red;
}

RED - Write Failing Test

Write one minimal test showing what should happen.

```typescript test('retries failed operations 3 times', async () => { let attempts = 0; const operation = () => { attempts++; if (attempts < 3) throw new Error('fail'); return 'success'; };

const result = await retryOperation(operation);

expect(result).toBe('success'); expect(attempts).toBe(3); });

Clear name, tests real behavior, one thing
</Good>

<Bad>
```typescript
test('retry works', async () => {
  const mock = jest.fn()
    .mockRejectedValueOnce(new Error())
    .mockRejectedValueOnce(new Error())
    .mockResolvedValueOnce('success');
  await retryOperation(mock);
  expect(mock).toHaveBeenCalledTimes(3);
});

Vague name, tests mock not code </Bad>

Requirements:

Verify RED - Watch It Fail

MANDATORY. Never skip.

npm test path/to/test.test.ts

Confirm:

Test passes? You're testing existing behavior. Fix test.

Test errors? Fix error, re-run until it fails correctly.

GREEN - Minimal Code

Write simplest code to pass the test.

```typescript async function retryOperation(fn: () => Promise): Promise { for (let i = 0; i < 3; i++) { try { return await fn(); } catch (e) { if (i === 2) throw e; } } throw new Error('unreachable'); } ``` Just enough to pass ```typescript async function retryOperation( fn: () => Promise, options?: { maxRetries?: number; backoff?: 'linear' | 'exponential'; onRetry?: (attempt: number) => void; } ): Promise { // YAGNI } ``` Over-engineered

Don't add features, refactor other code, or "improve" beyond the test.

Verify GREEN - Watch It Pass

MANDATORY.

npm test path/to/test.test.ts

Confirm:

Test fails? Fix code, not test.

Other tests fail? Fix now.

REFACTOR - Clean Up

After green only:

Keep tests green. Don't add behavior.

Repeat

Next failing test for next feature.

Good Tests

Quality Good Bad
Minimal One thing. "and" in name? Split it. test('validates email and domain and whitespace')
Clear Name describes behavior test('test1')
Shows intent Demonstrates desired API Obscures what code should do

Why Order Matters

"I'll write tests after to verify it works"

Tests written after code pass immediately. Passing immediately proves nothing:

Test-first forces you to see the test fail, proving it actually tests something.

"I already manually tested all the edge cases"

Manual testing is ad-hoc. You think you tested everything but:

Automated tests are systematic. They run the same way every time.

"Deleting X hours of work is wasteful"

Sunk cost fallacy. The time is already gone. Your choice now:

The "waste" is keeping code you can't trust. Working code without real tests is technical debt.

"TDD is dogmatic, being pragmatic means adapting"

TDD IS pragmatic:

"Pragmatic" shortcuts = debugging in production = slower.

"Tests after achieve the same goals - it's spirit not ritual"

No. Tests-after answer "What does this do?" Tests-first answer "What should this do?"

Tests-after are biased by your implementation. You test what you built, not what's required. You verify remembered edge cases, not discovered ones.

Tests-first force edge case discovery before implementing. Tests-after verify you remembered everything (you didn't).

30 minutes of tests after ≠ TDD. You get coverage, lose proof tests work.

Common Rationalizations

Excuse Reality
"Too simple to test" Simple code breaks. Test takes 30 seconds.
"I'll test after" Tests passing immediately prove nothing.
"Tests after achieve same goals" Tests-after = "what does this do?" Tests-first = "what should this do?"
"Already manually tested" Ad-hoc ≠ systematic. No record, can't re-run.
"Deleting X hours is wasteful" Sunk cost fallacy. Keeping unverified code is technical debt.
"Keep as reference, write tests first" You'll adapt it. That's testing after. Delete means delete.
"Need to explore first" Fine. Throw away exploration, start with TDD.
"Test hard = design unclear" Listen to test. Hard to test = hard to use.
"TDD will slow me down" TDD faster than debugging. Pragmatic = test-first.
"Manual test faster" Manual doesn't prove edge cases. You'll re-test every change.
"Existing code has no tests" You're improving it. Add tests for existing code.

Red Flags - STOP and Start Over

All of these mean: Delete code. Start over with TDD.

Example: Bug Fix

Bug: Empty email accepted

RED

test('rejects empty email', async () => {
  const result = await submitForm({ email: '' });
  expect(result.error).toBe('Email required');
});

Verify RED

$ npm test
FAIL: expected 'Email required', got undefined

GREEN

function submitForm(data: FormData) {
  if (!data.email?.trim()) {
    return { error: 'Email required' };
  }
  // ...
}

Verify GREEN

$ npm test
PASS

REFACTOR Extract validation for multiple fields if needed.

Verification Checklist

Before marking work complete:

Can't check all boxes? You skipped TDD. Start over.

When Stuck

Problem Solution
Don't know how to test Write wished-for API. Write assertion first. Ask your human partner.
Test too complicated Design too complicated. Simplify interface.
Must mock everything Code too coupled. Use dependency injection.
Test setup huge Extract helpers. Still complex? Simplify design.

Debugging Integration

Bug found? Write failing test reproducing it. Follow TDD cycle. Test proves fix and prevents regression.

Never fix bugs without a test.

Testing Anti-Patterns

When adding mocks or test utilities, read @testing-anti-patterns.md to avoid common pitfalls:

Final Rule

Production code → test exists and failed first
Otherwise → not TDD

No exceptions without your human partner's permission.

/sp-using-git-worktrees

Source: ~/.claude/skills/sp-using-git-worktrees/SKILL.md


name: using-git-worktrees description: Use when starting feature work that needs isolation from current workspace or before executing implementation plans - creates isolated git worktrees with smart directory selection and safety verification

Using Git Worktrees

Overview

Git worktrees create isolated workspaces sharing the same repository, allowing work on multiple branches simultaneously without switching.

Core principle: Systematic directory selection + safety verification = reliable isolation.

Announce at start: "I'm using the using-git-worktrees skill to set up an isolated workspace."

Directory Selection Process

Follow this priority order:

1. Check Existing Directories

# Check in priority order
ls -d .worktrees 2>/dev/null     # Preferred (hidden)
ls -d worktrees 2>/dev/null      # Alternative

If found: Use that directory. If both exist, .worktrees wins.

2. Check CLAUDE.md

grep -i "worktree.*director" CLAUDE.md 2>/dev/null

If preference specified: Use it without asking.

3. Ask User

If no directory exists and no CLAUDE.md preference:

No worktree directory found. Where should I create worktrees?

1. .worktrees/ (project-local, hidden)
2. ~/.config/superpowers/worktrees/<project-name>/ (global location)

Which would you prefer?

Safety Verification

For Project-Local Directories (.worktrees or worktrees)

MUST verify directory is ignored before creating worktree:

# Check if directory is ignored (respects local, global, and system gitignore)
git check-ignore -q .worktrees 2>/dev/null || git check-ignore -q worktrees 2>/dev/null

If NOT ignored:

Per Jesse's rule "Fix broken things immediately":

  1. Add appropriate line to .gitignore
  2. Commit the change
  3. Proceed with worktree creation

Why critical: Prevents accidentally committing worktree contents to repository.

For Global Directory (~/.config/superpowers/worktrees)

No .gitignore verification needed - outside project entirely.

Creation Steps

1. Detect Project Name

project=$(basename "$(git rev-parse --show-toplevel)")

2. Create Worktree

# Determine full path
case $LOCATION in
  .worktrees|worktrees)
    path="$LOCATION/$BRANCH_NAME"
    ;;
  ~/.config/superpowers/worktrees/*)
    path="~/.config/superpowers/worktrees/$project/$BRANCH_NAME"
    ;;
esac

# Create worktree with new branch
git worktree add "$path" -b "$BRANCH_NAME"
cd "$path"

3. Run Project Setup

Auto-detect and run appropriate setup:

# Node.js
if [ -f package.json ]; then npm install; fi

# Rust
if [ -f Cargo.toml ]; then cargo build; fi

# Python
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
if [ -f pyproject.toml ]; then poetry install; fi

# Go
if [ -f go.mod ]; then go mod download; fi

4. Verify Clean Baseline

Run tests to ensure worktree starts clean:

# Examples - use project-appropriate command
npm test
cargo test
pytest
go test ./...

If tests fail: Report failures, ask whether to proceed or investigate.

If tests pass: Report ready.

5. Report Location

Worktree ready at <full-path>
Tests passing (<N> tests, 0 failures)
Ready to implement <feature-name>

Quick Reference

Situation Action
.worktrees/ exists Use it (verify ignored)
worktrees/ exists Use it (verify ignored)
Both exist Use .worktrees/
Neither exists Check CLAUDE.md → Ask user
Directory not ignored Add to .gitignore + commit
Tests fail during baseline Report failures + ask
No package.json/Cargo.toml Skip dependency install

Common Mistakes

Skipping ignore verification

Assuming directory location

Proceeding with failing tests

Hardcoding setup commands

Example Workflow

You: I'm using the using-git-worktrees skill to set up an isolated workspace.

[Check .worktrees/ - exists]
[Verify ignored - git check-ignore confirms .worktrees/ is ignored]
[Create worktree: git worktree add .worktrees/auth -b feature/auth]
[Run npm install]
[Run npm test - 47 passing]

Worktree ready at /Users/jesse/myproject/.worktrees/auth
Tests passing (47 tests, 0 failures)
Ready to implement auth feature

Red Flags

Never:

Always:

Integration

Called by:

Pairs with:

/sp-using-superpowers

Source: ~/.claude/skills/sp-using-superpowers/SKILL.md


name: using-superpowers description: Use when starting any conversation - establishes how to find and use skills, requiring Skill tool invocation before ANY response including clarifying questions

If you think there is even a 1% chance a skill might apply to what you are doing, you ABSOLUTELY MUST invoke the skill.

IF A SKILL APPLIES TO YOUR TASK, YOU DO NOT HAVE A CHOICE. YOU MUST USE IT.

This is not negotiable. This is not optional. You cannot rationalize your way out of this. </EXTREMELY-IMPORTANT>

How to Access Skills

In Claude Code: Use the Skill tool. When you invoke a skill, its content is loaded and presented to you—follow it directly. Never use the Read tool on skill files.

In other environments: Check your platform's documentation for how skills are loaded.

Using Skills

The Rule

Invoke relevant or requested skills BEFORE any response or action. Even a 1% chance a skill might apply means that you should invoke the skill to check. If an invoked skill turns out to be wrong for the situation, you don't need to use it.

digraph skill_flow {
    "User message received" [shape=doublecircle];
    "Might any skill apply?" [shape=diamond];
    "Invoke Skill tool" [shape=box];
    "Announce: 'Using [skill] to [purpose]'" [shape=box];
    "Has checklist?" [shape=diamond];
    "Create TodoWrite todo per item" [shape=box];
    "Follow skill exactly" [shape=box];
    "Respond (including clarifications)" [shape=doublecircle];

    "User message received" -> "Might any skill apply?";
    "Might any skill apply?" -> "Invoke Skill tool" [label="yes, even 1%"];
    "Might any skill apply?" -> "Respond (including clarifications)" [label="definitely not"];
    "Invoke Skill tool" -> "Announce: 'Using [skill] to [purpose]'";
    "Announce: 'Using [skill] to [purpose]'" -> "Has checklist?";
    "Has checklist?" -> "Create TodoWrite todo per item" [label="yes"];
    "Has checklist?" -> "Follow skill exactly" [label="no"];
    "Create TodoWrite todo per item" -> "Follow skill exactly";
}

Red Flags

These thoughts mean STOP—you're rationalizing:

Thought Reality
"This is just a simple question" Questions are tasks. Check for skills.
"I need more context first" Skill check comes BEFORE clarifying questions.
"Let me explore the codebase first" Skills tell you HOW to explore. Check first.
"I can check git/files quickly" Files lack conversation context. Check for skills.
"Let me gather information first" Skills tell you HOW to gather information.
"This doesn't need a formal skill" If a skill exists, use it.
"I remember this skill" Skills evolve. Read current version.
"This doesn't count as a task" Action = task. Check for skills.
"The skill is overkill" Simple things become complex. Use it.
"I'll just do this one thing first" Check BEFORE doing anything.
"This feels productive" Undisciplined action wastes time. Skills prevent this.
"I know what that means" Knowing the concept ≠ using the skill. Invoke it.

Skill Priority

When multiple skills could apply, use this order:

  1. Process skills first (brainstorming, debugging) - these determine HOW to approach the task
  2. Implementation skills second (frontend-design, mcp-builder) - these guide execution

"Let's build X" → brainstorming first, then implementation skills. "Fix this bug" → debugging first, then domain-specific skills.

Skill Types

Rigid (TDD, debugging): Follow exactly. Don't adapt away discipline.

Flexible (patterns): Adapt principles to context.

The skill itself tells you which.

User Instructions

Instructions say WHAT, not HOW. "Add X" or "Fix Y" doesn't mean skip workflows.

/sp-verification-before-completion

Source: ~/.claude/skills/sp-verification-before-completion/SKILL.md


name: verification-before-completion description: Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always

Verification Before Completion

Overview

Claiming work is complete without verification is dishonesty, not efficiency.

Core principle: Evidence before claims, always.

Violating the letter of this rule is violating the spirit of this rule.

The Iron Law

NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE

If you haven't run the verification command in this message, you cannot claim it passes.

The Gate Function

BEFORE claiming any status or expressing satisfaction:

1. IDENTIFY: What command proves this claim?
2. RUN: Execute the FULL command (fresh, complete)
3. READ: Full output, check exit code, count failures
4. VERIFY: Does output confirm the claim?
   - If NO: State actual status with evidence
   - If YES: State claim WITH evidence
5. ONLY THEN: Make the claim

Skip any step = lying, not verifying

Common Failures

Claim Requires Not Sufficient
Tests pass Test command output: 0 failures Previous run, "should pass"
Linter clean Linter output: 0 errors Partial check, extrapolation
Build succeeds Build command: exit 0 Linter passing, logs look good
Bug fixed Test original symptom: passes Code changed, assumed fixed
Regression test works Red-green cycle verified Test passes once
Agent completed VCS diff shows changes Agent reports "success"
Requirements met Line-by-line checklist Tests passing

Red Flags - STOP

Rationalization Prevention

Excuse Reality
"Should work now" RUN the verification
"I'm confident" Confidence ≠ evidence
"Just this once" No exceptions
"Linter passed" Linter ≠ compiler
"Agent said success" Verify independently
"I'm tired" Exhaustion ≠ excuse
"Partial check is enough" Partial proves nothing
"Different words so rule doesn't apply" Spirit over letter

Key Patterns

Tests:

✅ [Run test command] [See: 34/34 pass] "All tests pass"
❌ "Should pass now" / "Looks correct"

Regression tests (TDD Red-Green):

✅ Write → Run (pass) → Revert fix → Run (MUST FAIL) → Restore → Run (pass)
❌ "I've written a regression test" (without red-green verification)

Build:

✅ [Run build] [See: exit 0] "Build passes"
❌ "Linter passed" (linter doesn't check compilation)

Requirements:

✅ Re-read plan → Create checklist → Verify each → Report gaps or completion
❌ "Tests pass, phase complete"

Agent delegation:

✅ Agent reports success → Check VCS diff → Verify changes → Report actual state
❌ Trust agent report

Why This Matters

From 24 failure memories:

When To Apply

ALWAYS before:

Rule applies to:

The Bottom Line

No shortcuts for verification.

Run the command. Read the output. THEN claim the result.

This is non-negotiable.

/sp-write-plan

Source: ~/.claude/skills/sp-write-plan/SKILL.md


description: Create detailed implementation plan with bite-sized tasks disable-model-invocation: true

Invoke the superpowers:writing-plans skill and follow it exactly as presented to you

/sp-writing-plans

Source: ~/.claude/skills/sp-writing-plans/SKILL.md


name: writing-plans description: Use when you have a spec or requirements for a multi-step task, before touching code

Writing Plans

Overview

Write comprehensive implementation plans assuming the engineer has zero context for our codebase and questionable taste. Document everything they need to know: which files to touch for each task, code, testing, docs they might need to check, how to test it. Give them the whole plan as bite-sized tasks. DRY. YAGNI. TDD. Frequent commits.

Assume they are a skilled developer, but know almost nothing about our toolset or problem domain. Assume they don't know good test design very well.

Announce at start: "I'm using the writing-plans skill to create the implementation plan."

Context: This should be run in a dedicated worktree (created by brainstorming skill).

Save plans to: docs/plans/YYYY-MM-DD-<feature-name>.md

Bite-Sized Task Granularity

Each step is one action (2-5 minutes):

Plan Document Header

Every plan MUST start with this header:

# [Feature Name] Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** [One sentence describing what this builds]

**Architecture:** [2-3 sentences about approach]

**Tech Stack:** [Key technologies/libraries]

---

Task Structure

### Task N: [Component Name]

**Files:**
- Create: `exact/path/to/file.py`
- Modify: `exact/path/to/existing.py:123-145`
- Test: `tests/exact/path/to/test.py`

**Step 1: Write the failing test**

```python
def test_specific_behavior():
    result = function(input)
    assert result == expected

Step 2: Run test to verify it fails

Run: pytest tests/path/test.py::test_name -v Expected: FAIL with "function not defined"

Step 3: Write minimal implementation

def function(input):
    return expected

Step 4: Run test to verify it passes

Run: pytest tests/path/test.py::test_name -v Expected: PASS

Step 5: Commit

git add tests/path/test.py src/path/file.py
git commit -m "feat: add specific feature"

## Remember
- Exact file paths always
- Complete code in plan (not "add validation")
- Exact commands with expected output
- Reference relevant skills with @ syntax
- DRY, YAGNI, TDD, frequent commits

## Execution Handoff

After saving the plan, offer execution choice:

**"Plan complete and saved to `docs/plans/<filename>.md`. Two execution options:**

**1. Subagent-Driven (this session)** - I dispatch fresh subagent per task, review between tasks, fast iteration

**2. Parallel Session (separate)** - Open new session with executing-plans, batch execution with checkpoints

**Which approach?"**

**If Subagent-Driven chosen:**
- **REQUIRED SUB-SKILL:** Use superpowers:subagent-driven-development
- Stay in this session
- Fresh subagent per task + code review

**If Parallel Session chosen:**
- Guide them to open new session in worktree
- **REQUIRED SUB-SKILL:** New session uses superpowers:executing-plans

/sp-writing-skills

Source: ~/.claude/skills/sp-writing-skills/SKILL.md


name: writing-skills description: Use when creating new skills, editing existing skills, or verifying skills work before deployment

Writing Skills

Overview

Writing skills IS Test-Driven Development applied to process documentation.

Personal skills live in agent-specific directories (~/.claude/skills for Claude Code, ~/.agents/skills/ for Codex)

You write test cases (pressure scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agents comply), and refactor (close loopholes).

Core principle: If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing.

REQUIRED BACKGROUND: You MUST understand superpowers:test-driven-development before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill adapts TDD to documentation.

Official guidance: For Anthropic's official skill authoring best practices, see anthropic-best-practices.md. This document provides additional patterns and guidelines that complement the TDD-focused approach in this skill.

What is a Skill?

A skill is a reference guide for proven techniques, patterns, or tools. Skills help future Claude instances find and apply effective approaches.

Skills are: Reusable techniques, patterns, tools, reference guides

Skills are NOT: Narratives about how you solved a problem once

TDD Mapping for Skills

TDD Concept Skill Creation
Test case Pressure scenario with subagent
Production code Skill document (SKILL.md)
Test fails (RED) Agent violates rule without skill (baseline)
Test passes (GREEN) Agent complies with skill present
Refactor Close loopholes while maintaining compliance
Write test first Run baseline scenario BEFORE writing skill
Watch it fail Document exact rationalizations agent uses
Minimal code Write skill addressing those specific violations
Watch it pass Verify agent now complies
Refactor cycle Find new rationalizations → plug → re-verify

The entire skill creation process follows RED-GREEN-REFACTOR.

When to Create a Skill

Create when:

Don't create for:

Skill Types

Technique

Concrete method with steps to follow (condition-based-waiting, root-cause-tracing)

Pattern

Way of thinking about problems (flatten-with-flags, test-invariants)

Reference

API docs, syntax guides, tool documentation (office docs)

Directory Structure

skills/
  skill-name/
    SKILL.md              # Main reference (required)
    supporting-file.*     # Only if needed

Flat namespace - all skills in one searchable namespace

Separate files for:

  1. Heavy reference (100+ lines) - API docs, comprehensive syntax
  2. Reusable tools - Scripts, utilities, templates

Keep inline:

SKILL.md Structure

Frontmatter (YAML):

---
name: Skill-Name-With-Hyphens
description: Use when [specific triggering conditions and symptoms]
---

# Skill Name

## Overview
What is this? Core principle in 1-2 sentences.

## When to Use
[Small inline flowchart IF decision non-obvious]

Bullet list with SYMPTOMS and use cases
When NOT to use

## Core Pattern (for techniques/patterns)
Before/after code comparison

## Quick Reference
Table or bullets for scanning common operations

## Implementation
Inline code for simple patterns
Link to file for heavy reference or reusable tools

## Common Mistakes
What goes wrong + fixes

## Real-World Impact (optional)
Concrete results

Claude Search Optimization (CSO)

Critical for discovery: Future Claude needs to FIND your skill

1. Rich Description Field

Purpose: Claude reads description to decide which skills to load for a given task. Make it answer: "Should I read this skill right now?"

Format: Start with "Use when..." to focus on triggering conditions

CRITICAL: Description = When to Use, NOT What the Skill Does

The description should ONLY describe triggering conditions. Do NOT summarize the skill's process or workflow in the description.

Why this matters: Testing revealed that when a description summarizes the skill's workflow, Claude may follow the description instead of reading the full skill content. A description saying "code review between tasks" caused Claude to do ONE review, even though the skill's flowchart clearly showed TWO reviews (spec compliance then code quality).

When the description was changed to just "Use when executing implementation plans with independent tasks" (no workflow summary), Claude correctly read the flowchart and followed the two-stage review process.

The trap: Descriptions that summarize workflow create a shortcut Claude will take. The skill body becomes documentation Claude skips.

# ❌ BAD: Summarizes workflow - Claude may follow this instead of reading skill
description: Use when executing plans - dispatches subagent per task with code review between tasks

# ❌ BAD: Too much process detail
description: Use for TDD - write test first, watch it fail, write minimal code, refactor

# ✅ GOOD: Just triggering conditions, no workflow summary
description: Use when executing implementation plans with independent tasks in the current session

# ✅ GOOD: Triggering conditions only
description: Use when implementing any feature or bugfix, before writing implementation code

Content:

# ❌ BAD: Too abstract, vague, doesn't include when to use
description: For async testing

# ❌ BAD: First person
description: I can help you with async tests when they're flaky

# ❌ BAD: Mentions technology but skill isn't specific to it
description: Use when tests use setTimeout/sleep and are flaky

# ✅ GOOD: Starts with "Use when", describes problem, no workflow
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently

# ✅ GOOD: Technology-specific skill with explicit trigger
description: Use when using React Router and handling authentication redirects

2. Keyword Coverage

Use words Claude would search for:

3. Descriptive Naming

Use active voice, verb-first:

4. Token Efficiency (Critical)

Problem: getting-started and frequently-referenced skills load into EVERY conversation. Every token counts.

Target word counts:

Techniques:

Move details to tool help:

# ❌ BAD: Document all flags in SKILL.md
search-conversations supports --text, --both, --after DATE, --before DATE, --limit N

# ✅ GOOD: Reference --help
search-conversations supports multiple modes and filters. Run --help for details.

Use cross-references:

# ❌ BAD: Repeat workflow details
When searching, dispatch subagent with template...
[20 lines of repeated instructions]

# ✅ GOOD: Reference other skill
Always use subagents (50-100x context savings). REQUIRED: Use [other-skill-name] for workflow.

Compress examples:

# ❌ BAD: Verbose example (42 words)
your human partner: "How did we handle authentication errors in React Router before?"
You: I'll search past conversations for React Router authentication patterns.
[Dispatch subagent with search query: "React Router authentication error handling 401"]

# ✅ GOOD: Minimal example (20 words)
Partner: "How did we handle auth errors in React Router?"
You: Searching...
[Dispatch subagent → synthesis]

Eliminate redundancy:

Verification:

wc -w skills/path/SKILL.md
# getting-started workflows: aim for <150 each
# Other frequently-loaded: aim for <200 total

Name by what you DO or core insight:

Gerunds (-ing) work well for processes:

4. Cross-Referencing Other Skills

When writing documentation that references other skills:

Use skill name only, with explicit requirement markers:

Why no @ links: @ syntax force-loads files immediately, consuming 200k+ context before you need them.

Flowchart Usage

digraph when_flowchart {
    "Need to show information?" [shape=diamond];
    "Decision where I might go wrong?" [shape=diamond];
    "Use markdown" [shape=box];
    "Small inline flowchart" [shape=box];

    "Need to show information?" -> "Decision where I might go wrong?" [label="yes"];
    "Decision where I might go wrong?" -> "Small inline flowchart" [label="yes"];
    "Decision where I might go wrong?" -> "Use markdown" [label="no"];
}

Use flowcharts ONLY for:

Never use flowcharts for:

See @graphviz-conventions.dot for graphviz style rules.

Visualizing for your human partner: Use render-graphs.js in this directory to render a skill's flowcharts to SVG:

./render-graphs.js ../some-skill           # Each diagram separately
./render-graphs.js ../some-skill --combine # All diagrams in one SVG

Code Examples

One excellent example beats many mediocre ones

Choose most relevant language:

Good example:

Don't:

You're good at porting - one great example is enough.

File Organization

Self-Contained Skill

defense-in-depth/
  SKILL.md    # Everything inline

When: All content fits, no heavy reference needed

Skill with Reusable Tool

condition-based-waiting/
  SKILL.md    # Overview + patterns
  example.ts  # Working helpers to adapt

When: Tool is reusable code, not just narrative

Skill with Heavy Reference

pptx/
  SKILL.md       # Overview + workflows
  pptxgenjs.md   # 600 lines API reference
  ooxml.md       # 500 lines XML structure
  scripts/       # Executable tools

When: Reference material too large for inline

The Iron Law (Same as TDD)

NO SKILL WITHOUT A FAILING TEST FIRST

This applies to NEW skills AND EDITS to existing skills.

Write skill before testing? Delete it. Start over. Edit skill without testing? Same violation.

No exceptions:

REQUIRED BACKGROUND: The superpowers:test-driven-development skill explains why this matters. Same principles apply to documentation.

Testing All Skill Types

Different skill types need different test approaches:

Discipline-Enforcing Skills (rules/requirements)

Examples: TDD, verification-before-completion, designing-before-coding

Test with:

Success criteria: Agent follows rule under maximum pressure

Technique Skills (how-to guides)

Examples: condition-based-waiting, root-cause-tracing, defensive-programming

Test with:

Success criteria: Agent successfully applies technique to new scenario

Pattern Skills (mental models)

Examples: reducing-complexity, information-hiding concepts

Test with:

Success criteria: Agent correctly identifies when/how to apply pattern

Reference Skills (documentation/APIs)

Examples: API documentation, command references, library guides

Test with:

Success criteria: Agent finds and correctly applies reference information

Common Rationalizations for Skipping Testing

Excuse Reality
"Skill is obviously clear" Clear to you ≠ clear to other agents. Test it.
"It's just a reference" References can have gaps, unclear sections. Test retrieval.
"Testing is overkill" Untested skills have issues. Always. 15 min testing saves hours.
"I'll test if problems emerge" Problems = agents can't use skill. Test BEFORE deploying.
"Too tedious to test" Testing is less tedious than debugging bad skill in production.
"I'm confident it's good" Overconfidence guarantees issues. Test anyway.
"Academic review is enough" Reading ≠ using. Test application scenarios.
"No time to test" Deploying untested skill wastes more time fixing it later.

All of these mean: Test before deploying. No exceptions.

Bulletproofing Skills Against Rationalization

Skills that enforce discipline (like TDD) need to resist rationalization. Agents are smart and will find loopholes when under pressure.

Psychology note: Understanding WHY persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundation (Cialdini, 2021; Meincke et al., 2025) on authority, commitment, scarcity, social proof, and unity principles.

Close Every Loophole Explicitly

Don't just state the rule - forbid specific workarounds:

```markdown Write code before test? Delete it. ``` ```markdown Write code before test? Delete it. Start over.

No exceptions:

</Good>

### Address "Spirit vs Letter" Arguments

Add foundational principle early:

```markdown
**Violating the letter of the rules is violating the spirit of the rules.**

This cuts off entire class of "I'm following the spirit" rationalizations.

Build Rationalization Table

Capture rationalizations from baseline testing (see Testing section below). Every excuse agents make goes in the table:

| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |

Create Red Flags List

Make it easy for agents to self-check when rationalizing:

## Red Flags - STOP and Start Over

- Code before test
- "I already manually tested it"
- "Tests after achieve the same purpose"
- "It's about spirit not ritual"
- "This is different because..."

**All of these mean: Delete code. Start over with TDD.**

Update CSO for Violation Symptoms

Add to description: symptoms of when you're ABOUT to violate the rule:

description: use when implementing any feature or bugfix, before writing implementation code

RED-GREEN-REFACTOR for Skills

Follow the TDD cycle:

RED: Write Failing Test (Baseline)

Run pressure scenario with subagent WITHOUT the skill. Document exact behavior:

This is "watch the test fail" - you must see what agents naturally do before writing the skill.

GREEN: Write Minimal Skill

Write skill that addresses those specific rationalizations. Don't add extra content for hypothetical cases.

Run same scenarios WITH skill. Agent should now comply.

REFACTOR: Close Loopholes

Agent found new rationalization? Add explicit counter. Re-test until bulletproof.

Testing methodology: See @testing-skills-with-subagents.md for the complete testing methodology:

Anti-Patterns

❌ Narrative Example

"In session 2025-10-03, we found empty projectDir caused..." Why bad: Too specific, not reusable

❌ Multi-Language Dilution

example-js.js, example-py.py, example-go.go Why bad: Mediocre quality, maintenance burden

❌ Code in Flowcharts

step1 [label="import fs"];
step2 [label="read file"];

Why bad: Can't copy-paste, hard to read

❌ Generic Labels

helper1, helper2, step3, pattern4 Why bad: Labels should have semantic meaning

STOP: Before Moving to Next Skill

After writing ANY skill, you MUST STOP and complete the deployment process.

Do NOT:

The deployment checklist below is MANDATORY for EACH skill.

Deploying untested skills = deploying untested code. It's a violation of quality standards.

Skill Creation Checklist (TDD Adapted)

IMPORTANT: Use TodoWrite to create todos for EACH checklist item below.

RED Phase - Write Failing Test:

GREEN Phase - Write Minimal Skill:

REFACTOR Phase - Close Loopholes:

Quality Checks:

Deployment:

Discovery Workflow

How future Claude finds your skill:

  1. Encounters problem ("tests are flaky")
  2. Finds SKILL (description matches)
  3. Scans overview (is this relevant?)
  4. Reads patterns (quick reference table)
  5. Loads example (only when implementing)

Optimize for this flow - put searchable terms early and often.

The Bottom Line

Creating skills IS TDD for process documentation.

Same Iron Law: No skill without failing test first. Same cycle: RED (baseline) → GREEN (write skill) → REFACTOR (close loopholes). Same benefits: Better quality, fewer surprises, bulletproof results.

If you follow TDD for code, follow it for skills. It's the same discipline applied to documentation.