" --body "$(cat <<'EOF' ## Summary <2-3 bullets of what changed> ## Test Plan - [ ] <verification steps> EOF )" ``` Then: Cleanup worktree (Step 5) #### Option 3: Keep As-Is Report: "Keeping branch <name>. Worktree preserved at <path>." **Don't cleanup worktree.** #### Option 4: Discard **Confirm first:** ``` This will permanently delete: - Branch <name> - All commits: <commit-list> - Worktree at <path> Type 'discard' to confirm. ``` Wait for exact confirmation. If confirmed: ```bash git checkout <base-branch> git branch -D <feature-branch> ``` Then: Cleanup worktree (Step 5) ### Step 5: Cleanup Worktree **For Options 1, 2, 4:** Check if in worktree: ```bash git worktree list | grep $(git branch --show-current) ``` If yes: ```bash git worktree remove <worktree-path> ``` **For Option 3:** Keep worktree. ## Quick Reference | Option | Merge | Push | Keep Worktree | Cleanup Branch | |--------|-------|------|---------------|----------------| | 1. Merge locally | ✓ | - | - | ✓ | | 2. Create PR | - | ✓ | ✓ | - | | 3. Keep as-is | - | - | ✓ | - | | 4. Discard | - | - | - | ✓ (force) | ## Common Mistakes **Skipping test verification** - **Problem:** Merge broken code, create failing PR - **Fix:** Always verify tests before offering options **Open-ended questions** - **Problem:** "What should I do next?" → ambiguous - **Fix:** Present exactly 4 structured options **Automatic worktree cleanup** - **Problem:** Remove worktree when might need it (Option 2, 3) - **Fix:** Only cleanup for Options 1 and 4 **No confirmation for discard** - **Problem:** Accidentally delete work - **Fix:** Require typed "discard" confirmation ## Red Flags **Never:** - Proceed with failing tests - Merge without verifying tests on result - Delete work without confirmation - Force-push without explicit request **Always:** - Verify tests before offering options - Present exactly 4 options - Get typed confirmation for Option 4 - Clean up worktree for Options 1 & 4 only ## Integration **Called by:** - **subagent-driven-development** (Step 7) - After all tasks complete - **executing-plans** (Step 5) - After all batches complete **Pairs with:** - **using-git-worktrees** - Cleans up worktree created by that skill # /sp-receiving-code-review **Source:** `~/.claude/skills/sp-receiving-code-review/SKILL.md` --- --- name: receiving-code-review description: Use when receiving code review feedback, before implementing suggestions, especially if feedback seems unclear or technically questionable - requires technical rigor and verification, not performative agreement or blind implementation --- # Code Review Reception ## Overview Code review requires technical evaluation, not emotional performance. **Core principle:** Verify before implementing. Ask before assuming. Technical correctness over social comfort. ## The Response Pattern ``` WHEN receiving code review feedback: 1. READ: Complete feedback without reacting 2. UNDERSTAND: Restate requirement in own words (or ask) 3. VERIFY: Check against codebase reality 4. EVALUATE: Technically sound for THIS codebase? 5. RESPOND: Technical acknowledgment or reasoned pushback 6. IMPLEMENT: One item at a time, test each ``` ## Forbidden Responses **NEVER:** - "You're absolutely right!" (explicit CLAUDE.md violation) - "Great point!" / "Excellent feedback!" (performative) - "Let me implement that now" (before verification) **INSTEAD:** - Restate the technical requirement - Ask clarifying questions - Push back with technical reasoning if wrong - Just start working (actions > words) ## Handling Unclear Feedback ``` IF any item is unclear: STOP - do not implement anything yet ASK for clarification on unclear items WHY: Items may be related. Partial understanding = wrong implementation. ``` **Example:** ``` your human partner: "Fix 1-6" You understand 1,2,3,6. Unclear on 4,5. ❌ WRONG: Implement 1,2,3,6 now, ask about 4,5 later ✅ RIGHT: "I understand items 1,2,3,6. Need clarification on 4 and 5 before proceeding." ``` ## Source-Specific Handling ### From your human partner - **Trusted** - implement after understanding - **Still ask** if scope unclear - **No performative agreement** - **Skip to action** or technical acknowledgment ### From External Reviewers ``` BEFORE implementing: 1. Check: Technically correct for THIS codebase? 2. Check: Breaks existing functionality? 3. Check: Reason for current implementation? 4. Check: Works on all platforms/versions? 5. Check: Does reviewer understand full context? IF suggestion seems wrong: Push back with technical reasoning IF can't easily verify: Say so: "I can't verify this without [X]. Should I [investigate/ask/proceed]?" IF conflicts with your human partner's prior decisions: Stop and discuss with your human partner first ``` **your human partner's rule:** "External feedback - be skeptical, but check carefully" ## YAGNI Check for "Professional" Features ``` IF reviewer suggests "implementing properly": grep codebase for actual usage IF unused: "This endpoint isn't called. Remove it (YAGNI)?" IF used: Then implement properly ``` **your human partner's rule:** "You and reviewer both report to me. If we don't need this feature, don't add it." ## Implementation Order ``` FOR multi-item feedback: 1. Clarify anything unclear FIRST 2. Then implement in this order: - Blocking issues (breaks, security) - Simple fixes (typos, imports) - Complex fixes (refactoring, logic) 3. Test each fix individually 4. Verify no regressions ``` ## When To Push Back Push back when: - Suggestion breaks existing functionality - Reviewer lacks full context - Violates YAGNI (unused feature) - Technically incorrect for this stack - Legacy/compatibility reasons exist - Conflicts with your human partner's architectural decisions **How to push back:** - Use technical reasoning, not defensiveness - Ask specific questions - Reference working tests/code - Involve your human partner if architectural **Signal if uncomfortable pushing back out loud:** "Strange things are afoot at the Circle K" ## Acknowledging Correct Feedback When feedback IS correct: ``` ✅ "Fixed. [Brief description of what changed]" ✅ "Good catch - [specific issue]. Fixed in [location]." ✅ [Just fix it and show in the code] ❌ "You're absolutely right!" ❌ "Great point!" ❌ "Thanks for catching that!" ❌ "Thanks for [anything]" ❌ ANY gratitude expression ``` **Why no thanks:** Actions speak. Just fix it. The code itself shows you heard the feedback. **If you catch yourself about to write "Thanks":** DELETE IT. State the fix instead. ## Gracefully Correcting Your Pushback If you pushed back and were wrong: ``` ✅ "You were right - I checked [X] and it does [Y]. Implementing now." ✅ "Verified this and you're correct. My initial understanding was wrong because [reason]. Fixing." ❌ Long apology ❌ Defending why you pushed back ❌ Over-explaining ``` State the correction factually and move on. ## Common Mistakes | Mistake | Fix | |---------|-----| | Performative agreement | State requirement or just act | | Blind implementation | Verify against codebase first | | Batch without testing | One at a time, test each | | Assuming reviewer is right | Check if breaks things | | Avoiding pushback | Technical correctness > comfort | | Partial implementation | Clarify all items first | | Can't verify, proceed anyway | State limitation, ask for direction | ## Real Examples **Performative Agreement (Bad):** ``` Reviewer: "Remove legacy code" ❌ "You're absolutely right! Let me remove that..." ``` **Technical Verification (Good):** ``` Reviewer: "Remove legacy code" ✅ "Checking... build target is 10.15+, this API needs 13+. Need legacy for backward compat. Current impl has wrong bundle ID - fix it or drop pre-13 support?" ``` **YAGNI (Good):** ``` Reviewer: "Implement proper metrics tracking with database, date filters, CSV export" ✅ "Grepped codebase - nothing calls this endpoint. Remove it (YAGNI)? Or is there usage I'm missing?" ``` **Unclear Item (Good):** ``` your human partner: "Fix items 1-6" You understand 1,2,3,6. Unclear on 4,5. ✅ "Understand 1,2,3,6. Need clarification on 4 and 5 before implementing." ``` ## GitHub Thread Replies When replying to inline review comments on GitHub, reply in the comment thread (`gh api repos/{owner}/{repo}/pulls/{pr}/comments/{id}/replies`), not as a top-level PR comment. ## The Bottom Line **External feedback = suggestions to evaluate, not orders to follow.** Verify. Question. Then implement. No performative agreement. Technical rigor always. # /sp-requesting-code-review **Source:** `~/.claude/skills/sp-requesting-code-review/SKILL.md` --- --- name: requesting-code-review description: Use when completing tasks, implementing major features, or before merging to verify work meets requirements --- # Requesting Code Review Dispatch superpowers:code-reviewer subagent to catch issues before they cascade. **Core principle:** Review early, review often. ## When to Request Review **Mandatory:** - After each task in subagent-driven development - After completing major feature - Before merge to main **Optional but valuable:** - When stuck (fresh perspective) - Before refactoring (baseline check) - After fixing complex bug ## How to Request **1. Get git SHAs:** ```bash BASE_SHA=$(git rev-parse HEAD~1) # or origin/main HEAD_SHA=$(git rev-parse HEAD) ``` **2. Dispatch code-reviewer subagent:** Use Task tool with superpowers:code-reviewer type, fill template at `code-reviewer.md` **Placeholders:** - `{WHAT_WAS_IMPLEMENTED}` - What you just built - `{PLAN_OR_REQUIREMENTS}` - What it should do - `{BASE_SHA}` - Starting commit - `{HEAD_SHA}` - Ending commit - `{DESCRIPTION}` - Brief summary **3. Act on feedback:** - Fix Critical issues immediately - Fix Important issues before proceeding - Note Minor issues for later - Push back if reviewer is wrong (with reasoning) ## Example ``` [Just completed Task 2: Add verification function] You: Let me request code review before proceeding. BASE_SHA=$(git log --oneline | grep "Task 1" | head -1 | awk '{print $1}') HEAD_SHA=$(git rev-parse HEAD) [Dispatch superpowers:code-reviewer subagent] WHAT_WAS_IMPLEMENTED: Verification and repair functions for conversation index PLAN_OR_REQUIREMENTS: Task 2 from docs/plans/deployment-plan.md BASE_SHA: a7981ec HEAD_SHA: 3df7661 DESCRIPTION: Added verifyIndex() and repairIndex() with 4 issue types [Subagent returns]: Strengths: Clean architecture, real tests Issues: Important: Missing progress indicators Minor: Magic number (100) for reporting interval Assessment: Ready to proceed You: [Fix progress indicators] [Continue to Task 3] ``` ## Integration with Workflows **Subagent-Driven Development:** - Review after EACH task - Catch issues before they compound - Fix before moving to next task **Executing Plans:** - Review after each batch (3 tasks) - Get feedback, apply, continue **Ad-Hoc Development:** - Review before merge - Review when stuck ## Red Flags **Never:** - Skip review because "it's simple" - Ignore Critical issues - Proceed with unfixed Important issues - Argue with valid technical feedback **If reviewer wrong:** - Push back with technical reasoning - Show code/tests that prove it works - Request clarification See template at: requesting-code-review/code-reviewer.md # /sp-subagent-driven-development **Source:** `~/.claude/skills/sp-subagent-driven-development/SKILL.md` --- --- name: subagent-driven-development description: Use when executing implementation plans with independent tasks in the current session --- # Subagent-Driven Development Execute plan by dispatching fresh subagent per task, with two-stage review after each: spec compliance review first, then code quality review. **Core principle:** Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration ## When to Use ```dot digraph when_to_use { "Have implementation plan?" [shape=diamond]; "Tasks mostly independent?" [shape=diamond]; "Stay in this session?" [shape=diamond]; "subagent-driven-development" [shape=box]; "executing-plans" [shape=box]; "Manual execution or brainstorm first" [shape=box]; "Have implementation plan?" -> "Tasks mostly independent?" [label="yes"]; "Have implementation plan?" -> "Manual execution or brainstorm first" [label="no"]; "Tasks mostly independent?" -> "Stay in this session?" [label="yes"]; "Tasks mostly independent?" -> "Manual execution or brainstorm first" [label="no - tightly coupled"]; "Stay in this session?" -> "subagent-driven-development" [label="yes"]; "Stay in this session?" -> "executing-plans" [label="no - parallel session"]; } ``` **vs. Executing Plans (parallel session):** - Same session (no context switch) - Fresh subagent per task (no context pollution) - Two-stage review after each task: spec compliance first, then code quality - Faster iteration (no human-in-loop between tasks) ## The Process ```dot digraph process { rankdir=TB; subgraph cluster_per_task { label="Per Task"; "Dispatch implementer subagent (./implementer-prompt.md)" [shape=box]; "Implementer subagent asks questions?" [shape=diamond]; "Answer questions, provide context" [shape=box]; "Implementer subagent implements, tests, commits, self-reviews" [shape=box]; "Dispatch spec reviewer subagent (./spec-reviewer-prompt.md)" [shape=box]; "Spec reviewer subagent confirms code matches spec?" [shape=diamond]; "Implementer subagent fixes spec gaps" [shape=box]; "Dispatch code quality reviewer subagent (./code-quality-reviewer-prompt.md)" [shape=box]; "Code quality reviewer subagent approves?" [shape=diamond]; "Implementer subagent fixes quality issues" [shape=box]; "Mark task complete in TodoWrite" [shape=box]; } "Read plan, extract all tasks with full text, note context, create TodoWrite" [shape=box]; "More tasks remain?" [shape=diamond]; "Dispatch final code reviewer subagent for entire implementation" [shape=box]; "Use superpowers:finishing-a-development-branch" [shape=box style=filled fillcolor=lightgreen]; "Read plan, extract all tasks with full text, note context, create TodoWrite" -> "Dispatch implementer subagent (./implementer-prompt.md)"; "Dispatch implementer subagent (./implementer-prompt.md)" -> "Implementer subagent asks questions?"; "Implementer subagent asks questions?" -> "Answer questions, provide context" [label="yes"]; "Answer questions, provide context" -> "Dispatch implementer subagent (./implementer-prompt.md)"; "Implementer subagent asks questions?" -> "Implementer subagent implements, tests, commits, self-reviews" [label="no"]; "Implementer subagent implements, tests, commits, self-reviews" -> "Dispatch spec reviewer subagent (./spec-reviewer-prompt.md)"; "Dispatch spec reviewer subagent (./spec-reviewer-prompt.md)" -> "Spec reviewer subagent confirms code matches spec?"; "Spec reviewer subagent confirms code matches spec?" -> "Implementer subagent fixes spec gaps" [label="no"]; "Implementer subagent fixes spec gaps" -> "Dispatch spec reviewer subagent (./spec-reviewer-prompt.md)" [label="re-review"]; "Spec reviewer subagent confirms code matches spec?" -> "Dispatch code quality reviewer subagent (./code-quality-reviewer-prompt.md)" [label="yes"]; "Dispatch code quality reviewer subagent (./code-quality-reviewer-prompt.md)" -> "Code quality reviewer subagent approves?"; "Code quality reviewer subagent approves?" -> "Implementer subagent fixes quality issues" [label="no"]; "Implementer subagent fixes quality issues" -> "Dispatch code quality reviewer subagent (./code-quality-reviewer-prompt.md)" [label="re-review"]; "Code quality reviewer subagent approves?" -> "Mark task complete in TodoWrite" [label="yes"]; "Mark task complete in TodoWrite" -> "More tasks remain?"; "More tasks remain?" -> "Dispatch implementer subagent (./implementer-prompt.md)" [label="yes"]; "More tasks remain?" -> "Dispatch final code reviewer subagent for entire implementation" [label="no"]; "Dispatch final code reviewer subagent for entire implementation" -> "Use superpowers:finishing-a-development-branch"; } ``` ## Prompt Templates - `./implementer-prompt.md` - Dispatch implementer subagent - `./spec-reviewer-prompt.md` - Dispatch spec compliance reviewer subagent - `./code-quality-reviewer-prompt.md` - Dispatch code quality reviewer subagent ## Example Workflow ``` You: I'm using Subagent-Driven Development to execute this plan. [Read plan file once: docs/plans/feature-plan.md] [Extract all 5 tasks with full text and context] [Create TodoWrite with all tasks] Task 1: Hook installation script [Get Task 1 text and context (already extracted)] [Dispatch implementation subagent with full task text + context] Implementer: "Before I begin - should the hook be installed at user or system level?" You: "User level (~/.config/superpowers/hooks/)" Implementer: "Got it. Implementing now..." [Later] Implementer: - Implemented install-hook command - Added tests, 5/5 passing - Self-review: Found I missed --force flag, added it - Committed [Dispatch spec compliance reviewer] Spec reviewer: ✅ Spec compliant - all requirements met, nothing extra [Get git SHAs, dispatch code quality reviewer] Code reviewer: Strengths: Good test coverage, clean. Issues: None. Approved. [Mark Task 1 complete] Task 2: Recovery modes [Get Task 2 text and context (already extracted)] [Dispatch implementation subagent with full task text + context] Implementer: [No questions, proceeds] Implementer: - Added verify/repair modes - 8/8 tests passing - Self-review: All good - Committed [Dispatch spec compliance reviewer] Spec reviewer: ❌ Issues: - Missing: Progress reporting (spec says "report every 100 items") - Extra: Added --json flag (not requested) [Implementer fixes issues] Implementer: Removed --json flag, added progress reporting [Spec reviewer reviews again] Spec reviewer: ✅ Spec compliant now [Dispatch code quality reviewer] Code reviewer: Strengths: Solid. Issues (Important): Magic number (100) [Implementer fixes] Implementer: Extracted PROGRESS_INTERVAL constant [Code reviewer reviews again] Code reviewer: ✅ Approved [Mark Task 2 complete] ... [After all tasks] [Dispatch final code-reviewer] Final reviewer: All requirements met, ready to merge Done! ``` ## Advantages **vs. Manual execution:** - Subagents follow TDD naturally - Fresh context per task (no confusion) - Parallel-safe (subagents don't interfere) - Subagent can ask questions (before AND during work) **vs. Executing Plans:** - Same session (no handoff) - Continuous progress (no waiting) - Review checkpoints automatic **Efficiency gains:** - No file reading overhead (controller provides full text) - Controller curates exactly what context is needed - Subagent gets complete information upfront - Questions surfaced before work begins (not after) **Quality gates:** - Self-review catches issues before handoff - Two-stage review: spec compliance, then code quality - Review loops ensure fixes actually work - Spec compliance prevents over/under-building - Code quality ensures implementation is well-built **Cost:** - More subagent invocations (implementer + 2 reviewers per task) - Controller does more prep work (extracting all tasks upfront) - Review loops add iterations - But catches issues early (cheaper than debugging later) ## Red Flags **Never:** - Start implementation on main/master branch without explicit user consent - Skip reviews (spec compliance OR code quality) - Proceed with unfixed issues - Dispatch multiple implementation subagents in parallel (conflicts) - Make subagent read plan file (provide full text instead) - Skip scene-setting context (subagent needs to understand where task fits) - Ignore subagent questions (answer before letting them proceed) - Accept "close enough" on spec compliance (spec reviewer found issues = not done) - Skip review loops (reviewer found issues = implementer fixes = review again) - Let implementer self-review replace actual review (both are needed) - **Start code quality review before spec compliance is ✅** (wrong order) - Move to next task while either review has open issues **If subagent asks questions:** - Answer clearly and completely - Provide additional context if needed - Don't rush them into implementation **If reviewer finds issues:** - Implementer (same subagent) fixes them - Reviewer reviews again - Repeat until approved - Don't skip the re-review **If subagent fails task:** - Dispatch fix subagent with specific instructions - Don't try to fix manually (context pollution) ## Integration **Required workflow skills:** - **superpowers:using-git-worktrees** - REQUIRED: Set up isolated workspace before starting - **superpowers:writing-plans** - Creates the plan this skill executes - **superpowers:requesting-code-review** - Code review template for reviewer subagents - **superpowers:finishing-a-development-branch** - Complete development after all tasks **Subagents should use:** - **superpowers:test-driven-development** - Subagents follow TDD for each task **Alternative workflow:** - **superpowers:executing-plans** - Use for parallel session instead of same-session execution # /sp-systematic-debugging **Source:** `~/.claude/skills/sp-systematic-debugging/SKILL.md` --- --- name: systematic-debugging description: Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes --- # Systematic Debugging ## Overview Random fixes waste time and create new bugs. Quick patches mask underlying issues. **Core principle:** ALWAYS find root cause before attempting fixes. Symptom fixes are failure. **Violating the letter of this process is violating the spirit of debugging.** ## The Iron Law ``` NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST ``` If you haven't completed Phase 1, you cannot propose fixes. ## When to Use Use for ANY technical issue: - Test failures - Bugs in production - Unexpected behavior - Performance problems - Build failures - Integration issues **Use this ESPECIALLY when:** - Under time pressure (emergencies make guessing tempting) - "Just one quick fix" seems obvious - You've already tried multiple fixes - Previous fix didn't work - You don't fully understand the issue **Don't skip when:** - Issue seems simple (simple bugs have root causes too) - You're in a hurry (rushing guarantees rework) - Manager wants it fixed NOW (systematic is faster than thrashing) ## The Four Phases You MUST complete each phase before proceeding to the next. ### Phase 1: Root Cause Investigation **BEFORE attempting ANY fix:** 1. **Read Error Messages Carefully** - Don't skip past errors or warnings - They often contain the exact solution - Read stack traces completely - Note line numbers, file paths, error codes 2. **Reproduce Consistently** - Can you trigger it reliably? - What are the exact steps? - Does it happen every time? - If not reproducible → gather more data, don't guess 3. **Check Recent Changes** - What changed that could cause this? - Git diff, recent commits - New dependencies, config changes - Environmental differences 4. **Gather Evidence in Multi-Component Systems** **WHEN system has multiple components (CI → build → signing, API → service → database):** **BEFORE proposing fixes, add diagnostic instrumentation:** ``` For EACH component boundary: - Log what data enters component - Log what data exits component - Verify environment/config propagation - Check state at each layer Run once to gather evidence showing WHERE it breaks THEN analyze evidence to identify failing component THEN investigate that specific component ``` **Example (multi-layer system):** ```bash # Layer 1: Workflow echo "=== Secrets available in workflow: ===" echo "IDENTITY: ${IDENTITY:+SET}${IDENTITY:-UNSET}" # Layer 2: Build script echo "=== Env vars in build script: ===" env | grep IDENTITY || echo "IDENTITY not in environment" # Layer 3: Signing script echo "=== Keychain state: ===" security list-keychains security find-identity -v # Layer 4: Actual signing codesign --sign "$IDENTITY" --verbose=4 "$APP" ``` **This reveals:** Which layer fails (secrets → workflow ✓, workflow → build ✗) 5. **Trace Data Flow** **WHEN error is deep in call stack:** See `root-cause-tracing.md` in this directory for the complete backward tracing technique. **Quick version:** - Where does bad value originate? - What called this with bad value? - Keep tracing up until you find the source - Fix at source, not at symptom ### Phase 2: Pattern Analysis **Find the pattern before fixing:** 1. **Find Working Examples** - Locate similar working code in same codebase - What works that's similar to what's broken? 2. **Compare Against References** - If implementing pattern, read reference implementation COMPLETELY - Don't skim - read every line - Understand the pattern fully before applying 3. **Identify Differences** - What's different between working and broken? - List every difference, however small - Don't assume "that can't matter" 4. **Understand Dependencies** - What other components does this need? - What settings, config, environment? - What assumptions does it make? ### Phase 3: Hypothesis and Testing **Scientific method:** 1. **Form Single Hypothesis** - State clearly: "I think X is the root cause because Y" - Write it down - Be specific, not vague 2. **Test Minimally** - Make the SMALLEST possible change to test hypothesis - One variable at a time - Don't fix multiple things at once 3. **Verify Before Continuing** - Did it work? Yes → Phase 4 - Didn't work? Form NEW hypothesis - DON'T add more fixes on top 4. **When You Don't Know** - Say "I don't understand X" - Don't pretend to know - Ask for help - Research more ### Phase 4: Implementation **Fix the root cause, not the symptom:** 1. **Create Failing Test Case** - Simplest possible reproduction - Automated test if possible - One-off test script if no framework - MUST have before fixing - Use the `superpowers:test-driven-development` skill for writing proper failing tests 2. **Implement Single Fix** - Address the root cause identified - ONE change at a time - No "while I'm here" improvements - No bundled refactoring 3. **Verify Fix** - Test passes now? - No other tests broken? - Issue actually resolved? 4. **If Fix Doesn't Work** - STOP - Count: How many fixes have you tried? - If < 3: Return to Phase 1, re-analyze with new information - **If ≥ 3: STOP and question the architecture (step 5 below)** - DON'T attempt Fix #4 without architectural discussion 5. **If 3+ Fixes Failed: Question Architecture** **Pattern indicating architectural problem:** - Each fix reveals new shared state/coupling/problem in different place - Fixes require "massive refactoring" to implement - Each fix creates new symptoms elsewhere **STOP and question fundamentals:** - Is this pattern fundamentally sound? - Are we "sticking with it through sheer inertia"? - Should we refactor architecture vs. continue fixing symptoms? **Discuss with your human partner before attempting more fixes** This is NOT a failed hypothesis - this is a wrong architecture. ## Red Flags - STOP and Follow Process If you catch yourself thinking: - "Quick fix for now, investigate later" - "Just try changing X and see if it works" - "Add multiple changes, run tests" - "Skip the test, I'll manually verify" - "It's probably X, let me fix that" - "I don't fully understand but this might work" - "Pattern says X but I'll adapt it differently" - "Here are the main problems: [lists fixes without investigation]" - Proposing solutions before tracing data flow - **"One more fix attempt" (when already tried 2+)** - **Each fix reveals new problem in different place** **ALL of these mean: STOP. Return to Phase 1.** **If 3+ fixes failed:** Question the architecture (see Phase 4.5) ## your human partner's Signals You're Doing It Wrong **Watch for these redirections:** - "Is that not happening?" - You assumed without verifying - "Will it show us...?" - You should have added evidence gathering - "Stop guessing" - You're proposing fixes without understanding - "Ultrathink this" - Question fundamentals, not just symptoms - "We're stuck?" (frustrated) - Your approach isn't working **When you see these:** STOP. Return to Phase 1. ## Common Rationalizations | Excuse | Reality | |--------|---------| | "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. | | "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. | | "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. | | "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves it. | | "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. | | "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read it completely. | | "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. | | "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question pattern, don't fix again. | ## Quick Reference | Phase | Key Activities | Success Criteria | |-------|---------------|------------------| | **1. Root Cause** | Read errors, reproduce, check changes, gather evidence | Understand WHAT and WHY | | **2. Pattern** | Find working examples, compare | Identify differences | | **3. Hypothesis** | Form theory, test minimally | Confirmed or new hypothesis | | **4. Implementation** | Create test, fix, verify | Bug resolved, tests pass | ## When Process Reveals "No Root Cause" If systematic investigation reveals issue is truly environmental, timing-dependent, or external: 1. You've completed the process 2. Document what you investigated 3. Implement appropriate handling (retry, timeout, error message) 4. Add monitoring/logging for future investigation **But:** 95% of "no root cause" cases are incomplete investigation. ## Supporting Techniques These techniques are part of systematic debugging and available in this directory: - **`root-cause-tracing.md`** - Trace bugs backward through call stack to find original trigger - **`defense-in-depth.md`** - Add validation at multiple layers after finding root cause - **`condition-based-waiting.md`** - Replace arbitrary timeouts with condition polling **Related skills:** - **superpowers:test-driven-development** - For creating failing test case (Phase 4, Step 1) - **superpowers:verification-before-completion** - Verify fix worked before claiming success ## Real-World Impact From debugging sessions: - Systematic approach: 15-30 minutes to fix - Random fixes approach: 2-3 hours of thrashing - First-time fix rate: 95% vs 40% - New bugs introduced: Near zero vs common # /sp-test-driven-development **Source:** `~/.claude/skills/sp-test-driven-development/SKILL.md` --- --- name: test-driven-development description: Use when implementing any feature or bugfix, before writing implementation code --- # Test-Driven Development (TDD) ## Overview Write the test first. Watch it fail. Write minimal code to pass. **Core principle:** If you didn't watch the test fail, you don't know if it tests the right thing. **Violating the letter of the rules is violating the spirit of the rules.** ## When to Use **Always:** - New features - Bug fixes - Refactoring - Behavior changes **Exceptions (ask your human partner):** - Throwaway prototypes - Generated code - Configuration files Thinking "skip TDD just this once"? Stop. That's rationalization. ## The Iron Law ``` NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST ``` Write code before the test? Delete it. Start over. **No exceptions:** - Don't keep it as "reference" - Don't "adapt" it while writing tests - Don't look at it - Delete means delete Implement fresh from tests. Period. ## Red-Green-Refactor ```dot digraph tdd_cycle { rankdir=LR; red [label="RED\nWrite failing test", shape=box, style=filled, fillcolor="#ffcccc"]; verify_red [label="Verify fails\ncorrectly", shape=diamond]; green [label="GREEN\nMinimal code", shape=box, style=filled, fillcolor="#ccffcc"]; verify_green [label="Verify passes\nAll green", shape=diamond]; refactor [label="REFACTOR\nClean up", shape=box, style=filled, fillcolor="#ccccff"]; next [label="Next", shape=ellipse]; red -> verify_red; verify_red -> green [label="yes"]; verify_red -> red [label="wrong\nfailure"]; green -> verify_green; verify_green -> refactor [label="yes"]; verify_green -> green [label="no"]; refactor -> verify_green [label="stay\ngreen"]; verify_green -> next; next -> red; } ``` ### RED - Write Failing Test Write one minimal test showing what should happen. <Good> ```typescript test('retries failed operations 3 times', async () => { let attempts = 0; const operation = () => { attempts++; if (attempts < 3) throw new Error('fail'); return 'success'; }; const result = await retryOperation(operation); expect(result).toBe('success'); expect(attempts).toBe(3); }); ``` Clear name, tests real behavior, one thing </Good> <Bad> ```typescript test('retry works', async () => { const mock = jest.fn() .mockRejectedValueOnce(new Error()) .mockRejectedValueOnce(new Error()) .mockResolvedValueOnce('success'); await retryOperation(mock); expect(mock).toHaveBeenCalledTimes(3); }); ``` Vague name, tests mock not code </Bad> **Requirements:** - One behavior - Clear name - Real code (no mocks unless unavoidable) ### Verify RED - Watch It Fail **MANDATORY. Never skip.** ```bash npm test path/to/test.test.ts ``` Confirm: - Test fails (not errors) - Failure message is expected - Fails because feature missing (not typos) **Test passes?** You're testing existing behavior. Fix test. **Test errors?** Fix error, re-run until it fails correctly. ### GREEN - Minimal Code Write simplest code to pass the test. <Good> ```typescript async function retryOperation<T>(fn: () => Promise<T>): Promise<T> { for (let i = 0; i < 3; i++) { try { return await fn(); } catch (e) { if (i === 2) throw e; } } throw new Error('unreachable'); } ``` Just enough to pass </Good> <Bad> ```typescript async function retryOperation<T>( fn: () => Promise<T>, options?: { maxRetries?: number; backoff?: 'linear' | 'exponential'; onRetry?: (attempt: number) => void; } ): Promise<T> { // YAGNI } ``` Over-engineered </Bad> Don't add features, refactor other code, or "improve" beyond the test. ### Verify GREEN - Watch It Pass **MANDATORY.** ```bash npm test path/to/test.test.ts ``` Confirm: - Test passes - Other tests still pass - Output pristine (no errors, warnings) **Test fails?** Fix code, not test. **Other tests fail?** Fix now. ### REFACTOR - Clean Up After green only: - Remove duplication - Improve names - Extract helpers Keep tests green. Don't add behavior. ### Repeat Next failing test for next feature. ## Good Tests | Quality | Good | Bad | |---------|------|-----| | **Minimal** | One thing. "and" in name? Split it. | `test('validates email and domain and whitespace')` | | **Clear** | Name describes behavior | `test('test1')` | | **Shows intent** | Demonstrates desired API | Obscures what code should do | ## Why Order Matters **"I'll write tests after to verify it works"** Tests written after code pass immediately. Passing immediately proves nothing: - Might test wrong thing - Might test implementation, not behavior - Might miss edge cases you forgot - You never saw it catch the bug Test-first forces you to see the test fail, proving it actually tests something. **"I already manually tested all the edge cases"** Manual testing is ad-hoc. You think you tested everything but: - No record of what you tested - Can't re-run when code changes - Easy to forget cases under pressure - "It worked when I tried it" ≠ comprehensive Automated tests are systematic. They run the same way every time. **"Deleting X hours of work is wasteful"** Sunk cost fallacy. The time is already gone. Your choice now: - Delete and rewrite with TDD (X more hours, high confidence) - Keep it and add tests after (30 min, low confidence, likely bugs) The "waste" is keeping code you can't trust. Working code without real tests is technical debt. **"TDD is dogmatic, being pragmatic means adapting"** TDD IS pragmatic: - Finds bugs before commit (faster than debugging after) - Prevents regressions (tests catch breaks immediately) - Documents behavior (tests show how to use code) - Enables refactoring (change freely, tests catch breaks) "Pragmatic" shortcuts = debugging in production = slower. **"Tests after achieve the same goals - it's spirit not ritual"** No. Tests-after answer "What does this do?" Tests-first answer "What should this do?" Tests-after are biased by your implementation. You test what you built, not what's required. You verify remembered edge cases, not discovered ones. Tests-first force edge case discovery before implementing. Tests-after verify you remembered everything (you didn't). 30 minutes of tests after ≠ TDD. You get coverage, lose proof tests work. ## Common Rationalizations | Excuse | Reality | |--------|---------| | "Too simple to test" | Simple code breaks. Test takes 30 seconds. | | "I'll test after" | Tests passing immediately prove nothing. | | "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" | | "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. | | "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. | | "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. | | "Need to explore first" | Fine. Throw away exploration, start with TDD. | | "Test hard = design unclear" | Listen to test. Hard to test = hard to use. | | "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. | | "Manual test faster" | Manual doesn't prove edge cases. You'll re-test every change. | | "Existing code has no tests" | You're improving it. Add tests for existing code. | ## Red Flags - STOP and Start Over - Code before test - Test after implementation - Test passes immediately - Can't explain why test failed - Tests added "later" - Rationalizing "just this once" - "I already manually tested it" - "Tests after achieve the same purpose" - "It's about spirit not ritual" - "Keep as reference" or "adapt existing code" - "Already spent X hours, deleting is wasteful" - "TDD is dogmatic, I'm being pragmatic" - "This is different because..." **All of these mean: Delete code. Start over with TDD.** ## Example: Bug Fix **Bug:** Empty email accepted **RED** ```typescript test('rejects empty email', async () => { const result = await submitForm({ email: '' }); expect(result.error).toBe('Email required'); }); ``` **Verify RED** ```bash $ npm test FAIL: expected 'Email required', got undefined ``` **GREEN** ```typescript function submitForm(data: FormData) { if (!data.email?.trim()) { return { error: 'Email required' }; } // ... } ``` **Verify GREEN** ```bash $ npm test PASS ``` **REFACTOR** Extract validation for multiple fields if needed. ## Verification Checklist Before marking work complete: - [ ] Every new function/method has a test - [ ] Watched each test fail before implementing - [ ] Each test failed for expected reason (feature missing, not typo) - [ ] Wrote minimal code to pass each test - [ ] All tests pass - [ ] Output pristine (no errors, warnings) - [ ] Tests use real code (mocks only if unavoidable) - [ ] Edge cases and errors covered Can't check all boxes? You skipped TDD. Start over. ## When Stuck | Problem | Solution | |---------|----------| | Don't know how to test | Write wished-for API. Write assertion first. Ask your human partner. | | Test too complicated | Design too complicated. Simplify interface. | | Must mock everything | Code too coupled. Use dependency injection. | | Test setup huge | Extract helpers. Still complex? Simplify design. | ## Debugging Integration Bug found? Write failing test reproducing it. Follow TDD cycle. Test proves fix and prevents regression. Never fix bugs without a test. ## Testing Anti-Patterns When adding mocks or test utilities, read @testing-anti-patterns.md to avoid common pitfalls: - Testing mock behavior instead of real behavior - Adding test-only methods to production classes - Mocking without understanding dependencies ## Final Rule ``` Production code → test exists and failed first Otherwise → not TDD ``` No exceptions without your human partner's permission. # /sp-using-git-worktrees **Source:** `~/.claude/skills/sp-using-git-worktrees/SKILL.md` --- --- name: using-git-worktrees description: Use when starting feature work that needs isolation from current workspace or before executing implementation plans - creates isolated git worktrees with smart directory selection and safety verification --- # Using Git Worktrees ## Overview Git worktrees create isolated workspaces sharing the same repository, allowing work on multiple branches simultaneously without switching. **Core principle:** Systematic directory selection + safety verification = reliable isolation. **Announce at start:** "I'm using the using-git-worktrees skill to set up an isolated workspace." ## Directory Selection Process Follow this priority order: ### 1. Check Existing Directories ```bash # Check in priority order ls -d .worktrees 2>/dev/null # Preferred (hidden) ls -d worktrees 2>/dev/null # Alternative ``` **If found:** Use that directory. If both exist, `.worktrees` wins. ### 2. Check CLAUDE.md ```bash grep -i "worktree.*director" CLAUDE.md 2>/dev/null ``` **If preference specified:** Use it without asking. ### 3. Ask User If no directory exists and no CLAUDE.md preference: ``` No worktree directory found. Where should I create worktrees? 1. .worktrees/ (project-local, hidden) 2. ~/.config/superpowers/worktrees/<project-name>/ (global location) Which would you prefer? ``` ## Safety Verification ### For Project-Local Directories (.worktrees or worktrees) **MUST verify directory is ignored before creating worktree:** ```bash # Check if directory is ignored (respects local, global, and system gitignore) git check-ignore -q .worktrees 2>/dev/null || git check-ignore -q worktrees 2>/dev/null ``` **If NOT ignored:** Per Jesse's rule "Fix broken things immediately": 1. Add appropriate line to .gitignore 2. Commit the change 3. Proceed with worktree creation **Why critical:** Prevents accidentally committing worktree contents to repository. ### For Global Directory (~/.config/superpowers/worktrees) No .gitignore verification needed - outside project entirely. ## Creation Steps ### 1. Detect Project Name ```bash project=$(basename "$(git rev-parse --show-toplevel)") ``` ### 2. Create Worktree ```bash # Determine full path case $LOCATION in .worktrees|worktrees) path="$LOCATION/$BRANCH_NAME" ;; ~/.config/superpowers/worktrees/*) path="~/.config/superpowers/worktrees/$project/$BRANCH_NAME" ;; esac # Create worktree with new branch git worktree add "$path" -b "$BRANCH_NAME" cd "$path" ``` ### 3. Run Project Setup Auto-detect and run appropriate setup: ```bash # Node.js if [ -f package.json ]; then npm install; fi # Rust if [ -f Cargo.toml ]; then cargo build; fi # Python if [ -f requirements.txt ]; then pip install -r requirements.txt; fi if [ -f pyproject.toml ]; then poetry install; fi # Go if [ -f go.mod ]; then go mod download; fi ``` ### 4. Verify Clean Baseline Run tests to ensure worktree starts clean: ```bash # Examples - use project-appropriate command npm test cargo test pytest go test ./... ``` **If tests fail:** Report failures, ask whether to proceed or investigate. **If tests pass:** Report ready. ### 5. Report Location ``` Worktree ready at <full-path> Tests passing (<N> tests, 0 failures) Ready to implement <feature-name> ``` ## Quick Reference | Situation | Action | |-----------|--------| | `.worktrees/` exists | Use it (verify ignored) | | `worktrees/` exists | Use it (verify ignored) | | Both exist | Use `.worktrees/` | | Neither exists | Check CLAUDE.md → Ask user | | Directory not ignored | Add to .gitignore + commit | | Tests fail during baseline | Report failures + ask | | No package.json/Cargo.toml | Skip dependency install | ## Common Mistakes ### Skipping ignore verification - **Problem:** Worktree contents get tracked, pollute git status - **Fix:** Always use `git check-ignore` before creating project-local worktree ### Assuming directory location - **Problem:** Creates inconsistency, violates project conventions - **Fix:** Follow priority: existing > CLAUDE.md > ask ### Proceeding with failing tests - **Problem:** Can't distinguish new bugs from pre-existing issues - **Fix:** Report failures, get explicit permission to proceed ### Hardcoding setup commands - **Problem:** Breaks on projects using different tools - **Fix:** Auto-detect from project files (package.json, etc.) ## Example Workflow ``` You: I'm using the using-git-worktrees skill to set up an isolated workspace. [Check .worktrees/ - exists] [Verify ignored - git check-ignore confirms .worktrees/ is ignored] [Create worktree: git worktree add .worktrees/auth -b feature/auth] [Run npm install] [Run npm test - 47 passing] Worktree ready at /Users/jesse/myproject/.worktrees/auth Tests passing (47 tests, 0 failures) Ready to implement auth feature ``` ## Red Flags **Never:** - Create worktree without verifying it's ignored (project-local) - Skip baseline test verification - Proceed with failing tests without asking - Assume directory location when ambiguous - Skip CLAUDE.md check **Always:** - Follow directory priority: existing > CLAUDE.md > ask - Verify directory is ignored for project-local - Auto-detect and run project setup - Verify clean test baseline ## Integration **Called by:** - **brainstorming** (Phase 4) - REQUIRED when design is approved and implementation follows - **subagent-driven-development** - REQUIRED before executing any tasks - **executing-plans** - REQUIRED before executing any tasks - Any skill needing isolated workspace **Pairs with:** - **finishing-a-development-branch** - REQUIRED for cleanup after work complete # /sp-using-superpowers **Source:** `~/.claude/skills/sp-using-superpowers/SKILL.md` --- --- name: using-superpowers description: Use when starting any conversation - establishes how to find and use skills, requiring Skill tool invocation before ANY response including clarifying questions --- <EXTREMELY-IMPORTANT> If you think there is even a 1% chance a skill might apply to what you are doing, you ABSOLUTELY MUST invoke the skill. IF A SKILL APPLIES TO YOUR TASK, YOU DO NOT HAVE A CHOICE. YOU MUST USE IT. This is not negotiable. This is not optional. You cannot rationalize your way out of this. </EXTREMELY-IMPORTANT> ## How to Access Skills **In Claude Code:** Use the `Skill` tool. When you invoke a skill, its content is loaded and presented to you—follow it directly. Never use the Read tool on skill files. **In other environments:** Check your platform's documentation for how skills are loaded. # Using Skills ## The Rule **Invoke relevant or requested skills BEFORE any response or action.** Even a 1% chance a skill might apply means that you should invoke the skill to check. If an invoked skill turns out to be wrong for the situation, you don't need to use it. ```dot digraph skill_flow { "User message received" [shape=doublecircle]; "Might any skill apply?" [shape=diamond]; "Invoke Skill tool" [shape=box]; "Announce: 'Using [skill] to [purpose]'" [shape=box]; "Has checklist?" [shape=diamond]; "Create TodoWrite todo per item" [shape=box]; "Follow skill exactly" [shape=box]; "Respond (including clarifications)" [shape=doublecircle]; "User message received" -> "Might any skill apply?"; "Might any skill apply?" -> "Invoke Skill tool" [label="yes, even 1%"]; "Might any skill apply?" -> "Respond (including clarifications)" [label="definitely not"]; "Invoke Skill tool" -> "Announce: 'Using [skill] to [purpose]'"; "Announce: 'Using [skill] to [purpose]'" -> "Has checklist?"; "Has checklist?" -> "Create TodoWrite todo per item" [label="yes"]; "Has checklist?" -> "Follow skill exactly" [label="no"]; "Create TodoWrite todo per item" -> "Follow skill exactly"; } ``` ## Red Flags These thoughts mean STOP—you're rationalizing: | Thought | Reality | |---------|---------| | "This is just a simple question" | Questions are tasks. Check for skills. | | "I need more context first" | Skill check comes BEFORE clarifying questions. | | "Let me explore the codebase first" | Skills tell you HOW to explore. Check first. | | "I can check git/files quickly" | Files lack conversation context. Check for skills. | | "Let me gather information first" | Skills tell you HOW to gather information. | | "This doesn't need a formal skill" | If a skill exists, use it. | | "I remember this skill" | Skills evolve. Read current version. | | "This doesn't count as a task" | Action = task. Check for skills. | | "The skill is overkill" | Simple things become complex. Use it. | | "I'll just do this one thing first" | Check BEFORE doing anything. | | "This feels productive" | Undisciplined action wastes time. Skills prevent this. | | "I know what that means" | Knowing the concept ≠ using the skill. Invoke it. | ## Skill Priority When multiple skills could apply, use this order: 1. **Process skills first** (brainstorming, debugging) - these determine HOW to approach the task 2. **Implementation skills second** (frontend-design, mcp-builder) - these guide execution "Let's build X" → brainstorming first, then implementation skills. "Fix this bug" → debugging first, then domain-specific skills. ## Skill Types **Rigid** (TDD, debugging): Follow exactly. Don't adapt away discipline. **Flexible** (patterns): Adapt principles to context. The skill itself tells you which. ## User Instructions Instructions say WHAT, not HOW. "Add X" or "Fix Y" doesn't mean skip workflows. # /sp-verification-before-completion **Source:** `~/.claude/skills/sp-verification-before-completion/SKILL.md` --- --- name: verification-before-completion description: Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always --- # Verification Before Completion ## Overview Claiming work is complete without verification is dishonesty, not efficiency. **Core principle:** Evidence before claims, always. **Violating the letter of this rule is violating the spirit of this rule.** ## The Iron Law ``` NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE ``` If you haven't run the verification command in this message, you cannot claim it passes. ## The Gate Function ``` BEFORE claiming any status or expressing satisfaction: 1. IDENTIFY: What command proves this claim? 2. RUN: Execute the FULL command (fresh, complete) 3. READ: Full output, check exit code, count failures 4. VERIFY: Does output confirm the claim? - If NO: State actual status with evidence - If YES: State claim WITH evidence 5. ONLY THEN: Make the claim Skip any step = lying, not verifying ``` ## Common Failures | Claim | Requires | Not Sufficient | |-------|----------|----------------| | Tests pass | Test command output: 0 failures | Previous run, "should pass" | | Linter clean | Linter output: 0 errors | Partial check, extrapolation | | Build succeeds | Build command: exit 0 | Linter passing, logs look good | | Bug fixed | Test original symptom: passes | Code changed, assumed fixed | | Regression test works | Red-green cycle verified | Test passes once | | Agent completed | VCS diff shows changes | Agent reports "success" | | Requirements met | Line-by-line checklist | Tests passing | ## Red Flags - STOP - Using "should", "probably", "seems to" - Expressing satisfaction before verification ("Great!", "Perfect!", "Done!", etc.) - About to commit/push/PR without verification - Trusting agent success reports - Relying on partial verification - Thinking "just this once" - Tired and wanting work over - **ANY wording implying success without having run verification** ## Rationalization Prevention | Excuse | Reality | |--------|---------| | "Should work now" | RUN the verification | | "I'm confident" | Confidence ≠ evidence | | "Just this once" | No exceptions | | "Linter passed" | Linter ≠ compiler | | "Agent said success" | Verify independently | | "I'm tired" | Exhaustion ≠ excuse | | "Partial check is enough" | Partial proves nothing | | "Different words so rule doesn't apply" | Spirit over letter | ## Key Patterns **Tests:** ``` ✅ [Run test command] [See: 34/34 pass] "All tests pass" ❌ "Should pass now" / "Looks correct" ``` **Regression tests (TDD Red-Green):** ``` ✅ Write → Run (pass) → Revert fix → Run (MUST FAIL) → Restore → Run (pass) ❌ "I've written a regression test" (without red-green verification) ``` **Build:** ``` ✅ [Run build] [See: exit 0] "Build passes" ❌ "Linter passed" (linter doesn't check compilation) ``` **Requirements:** ``` ✅ Re-read plan → Create checklist → Verify each → Report gaps or completion ❌ "Tests pass, phase complete" ``` **Agent delegation:** ``` ✅ Agent reports success → Check VCS diff → Verify changes → Report actual state ❌ Trust agent report ``` ## Why This Matters From 24 failure memories: - your human partner said "I don't believe you" - trust broken - Undefined functions shipped - would crash - Missing requirements shipped - incomplete features - Time wasted on false completion → redirect → rework - Violates: "Honesty is a core value. If you lie, you'll be replaced." ## When To Apply **ALWAYS before:** - ANY variation of success/completion claims - ANY expression of satisfaction - ANY positive statement about work state - Committing, PR creation, task completion - Moving to next task - Delegating to agents **Rule applies to:** - Exact phrases - Paraphrases and synonyms - Implications of success - ANY communication suggesting completion/correctness ## The Bottom Line **No shortcuts for verification.** Run the command. Read the output. THEN claim the result. This is non-negotiable. # /sp-write-plan **Source:** `~/.claude/skills/sp-write-plan/SKILL.md` --- --- description: Create detailed implementation plan with bite-sized tasks disable-model-invocation: true --- Invoke the superpowers:writing-plans skill and follow it exactly as presented to you # /sp-writing-plans **Source:** `~/.claude/skills/sp-writing-plans/SKILL.md` --- --- name: writing-plans description: Use when you have a spec or requirements for a multi-step task, before touching code --- # Writing Plans ## Overview Write comprehensive implementation plans assuming the engineer has zero context for our codebase and questionable taste. Document everything they need to know: which files to touch for each task, code, testing, docs they might need to check, how to test it. Give them the whole plan as bite-sized tasks. DRY. YAGNI. TDD. Frequent commits. Assume they are a skilled developer, but know almost nothing about our toolset or problem domain. Assume they don't know good test design very well. **Announce at start:** "I'm using the writing-plans skill to create the implementation plan." **Context:** This should be run in a dedicated worktree (created by brainstorming skill). **Save plans to:** `docs/plans/YYYY-MM-DD-<feature-name>.md` ## Bite-Sized Task Granularity **Each step is one action (2-5 minutes):** - "Write the failing test" - step - "Run it to make sure it fails" - step - "Implement the minimal code to make the test pass" - step - "Run the tests and make sure they pass" - step - "Commit" - step ## Plan Document Header **Every plan MUST start with this header:** ```markdown # [Feature Name] Implementation Plan > **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task. **Goal:** [One sentence describing what this builds] **Architecture:** [2-3 sentences about approach] **Tech Stack:** [Key technologies/libraries] --- ``` ## Task Structure ```markdown ### Task N: [Component Name] **Files:** - Create: `exact/path/to/file.py` - Modify: `exact/path/to/existing.py:123-145` - Test: `tests/exact/path/to/test.py` **Step 1: Write the failing test** ```python def test_specific_behavior(): result = function(input) assert result == expected ``` **Step 2: Run test to verify it fails** Run: `pytest tests/path/test.py::test_name -v` Expected: FAIL with "function not defined" **Step 3: Write minimal implementation** ```python def function(input): return expected ``` **Step 4: Run test to verify it passes** Run: `pytest tests/path/test.py::test_name -v` Expected: PASS **Step 5: Commit** ```bash git add tests/path/test.py src/path/file.py git commit -m "feat: add specific feature" ``` ``` ## Remember - Exact file paths always - Complete code in plan (not "add validation") - Exact commands with expected output - Reference relevant skills with @ syntax - DRY, YAGNI, TDD, frequent commits ## Execution Handoff After saving the plan, offer execution choice: **"Plan complete and saved to `docs/plans/<filename>.md`. Two execution options:** **1. Subagent-Driven (this session)** - I dispatch fresh subagent per task, review between tasks, fast iteration **2. Parallel Session (separate)** - Open new session with executing-plans, batch execution with checkpoints **Which approach?"** **If Subagent-Driven chosen:** - **REQUIRED SUB-SKILL:** Use superpowers:subagent-driven-development - Stay in this session - Fresh subagent per task + code review **If Parallel Session chosen:** - Guide them to open new session in worktree - **REQUIRED SUB-SKILL:** New session uses superpowers:executing-plans # /sp-writing-skills **Source:** `~/.claude/skills/sp-writing-skills/SKILL.md` --- --- name: writing-skills description: Use when creating new skills, editing existing skills, or verifying skills work before deployment --- # Writing Skills ## Overview **Writing skills IS Test-Driven Development applied to process documentation.** **Personal skills live in agent-specific directories (`~/.claude/skills` for Claude Code, `~/.agents/skills/` for Codex)** You write test cases (pressure scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agents comply), and refactor (close loopholes). **Core principle:** If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing. **REQUIRED BACKGROUND:** You MUST understand superpowers:test-driven-development before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill adapts TDD to documentation. **Official guidance:** For Anthropic's official skill authoring best practices, see anthropic-best-practices.md. This document provides additional patterns and guidelines that complement the TDD-focused approach in this skill. ## What is a Skill? A **skill** is a reference guide for proven techniques, patterns, or tools. Skills help future Claude instances find and apply effective approaches. **Skills are:** Reusable techniques, patterns, tools, reference guides **Skills are NOT:** Narratives about how you solved a problem once ## TDD Mapping for Skills | TDD Concept | Skill Creation | |-------------|----------------| | **Test case** | Pressure scenario with subagent | | **Production code** | Skill document (SKILL.md) | | **Test fails (RED)** | Agent violates rule without skill (baseline) | | **Test passes (GREEN)** | Agent complies with skill present | | **Refactor** | Close loopholes while maintaining compliance | | **Write test first** | Run baseline scenario BEFORE writing skill | | **Watch it fail** | Document exact rationalizations agent uses | | **Minimal code** | Write skill addressing those specific violations | | **Watch it pass** | Verify agent now complies | | **Refactor cycle** | Find new rationalizations → plug → re-verify | The entire skill creation process follows RED-GREEN-REFACTOR. ## When to Create a Skill **Create when:** - Technique wasn't intuitively obvious to you - You'd reference this again across projects - Pattern applies broadly (not project-specific) - Others would benefit **Don't create for:** - One-off solutions - Standard practices well-documented elsewhere - Project-specific conventions (put in CLAUDE.md) - Mechanical constraints (if it's enforceable with regex/validation, automate it—save documentation for judgment calls) ## Skill Types ### Technique Concrete method with steps to follow (condition-based-waiting, root-cause-tracing) ### Pattern Way of thinking about problems (flatten-with-flags, test-invariants) ### Reference API docs, syntax guides, tool documentation (office docs) ## Directory Structure ``` skills/ skill-name/ SKILL.md # Main reference (required) supporting-file.* # Only if needed ``` **Flat namespace** - all skills in one searchable namespace **Separate files for:** 1. **Heavy reference** (100+ lines) - API docs, comprehensive syntax 2. **Reusable tools** - Scripts, utilities, templates **Keep inline:** - Principles and concepts - Code patterns (< 50 lines) - Everything else ## SKILL.md Structure **Frontmatter (YAML):** - Only two fields supported: `name` and `description` - Max 1024 characters total - `name`: Use letters, numbers, and hyphens only (no parentheses, special chars) - `description`: Third-person, describes ONLY when to use (NOT what it does) - Start with "Use when..." to focus on triggering conditions - Include specific symptoms, situations, and contexts - **NEVER summarize the skill's process or workflow** (see CSO section for why) - Keep under 500 characters if possible ```markdown --- name: Skill-Name-With-Hyphens description: Use when [specific triggering conditions and symptoms] --- # Skill Name ## Overview What is this? Core principle in 1-2 sentences. ## When to Use [Small inline flowchart IF decision non-obvious] Bullet list with SYMPTOMS and use cases When NOT to use ## Core Pattern (for techniques/patterns) Before/after code comparison ## Quick Reference Table or bullets for scanning common operations ## Implementation Inline code for simple patterns Link to file for heavy reference or reusable tools ## Common Mistakes What goes wrong + fixes ## Real-World Impact (optional) Concrete results ``` ## Claude Search Optimization (CSO) **Critical for discovery:** Future Claude needs to FIND your skill ### 1. Rich Description Field **Purpose:** Claude reads description to decide which skills to load for a given task. Make it answer: "Should I read this skill right now?" **Format:** Start with "Use when..." to focus on triggering conditions **CRITICAL: Description = When to Use, NOT What the Skill Does** The description should ONLY describe triggering conditions. Do NOT summarize the skill's process or workflow in the description. **Why this matters:** Testing revealed that when a description summarizes the skill's workflow, Claude may follow the description instead of reading the full skill content. A description saying "code review between tasks" caused Claude to do ONE review, even though the skill's flowchart clearly showed TWO reviews (spec compliance then code quality). When the description was changed to just "Use when executing implementation plans with independent tasks" (no workflow summary), Claude correctly read the flowchart and followed the two-stage review process. **The trap:** Descriptions that summarize workflow create a shortcut Claude will take. The skill body becomes documentation Claude skips. ```yaml # ❌ BAD: Summarizes workflow - Claude may follow this instead of reading skill description: Use when executing plans - dispatches subagent per task with code review between tasks # ❌ BAD: Too much process detail description: Use for TDD - write test first, watch it fail, write minimal code, refactor # ✅ GOOD: Just triggering conditions, no workflow summary description: Use when executing implementation plans with independent tasks in the current session # ✅ GOOD: Triggering conditions only description: Use when implementing any feature or bugfix, before writing implementation code ``` **Content:** - Use concrete triggers, symptoms, and situations that signal this skill applies - Describe the *problem* (race conditions, inconsistent behavior) not *language-specific symptoms* (setTimeout, sleep) - Keep triggers technology-agnostic unless the skill itself is technology-specific - If skill is technology-specific, make that explicit in the trigger - Write in third person (injected into system prompt) - **NEVER summarize the skill's process or workflow** ```yaml # ❌ BAD: Too abstract, vague, doesn't include when to use description: For async testing # ❌ BAD: First person description: I can help you with async tests when they're flaky # ❌ BAD: Mentions technology but skill isn't specific to it description: Use when tests use setTimeout/sleep and are flaky # ✅ GOOD: Starts with "Use when", describes problem, no workflow description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently # ✅ GOOD: Technology-specific skill with explicit trigger description: Use when using React Router and handling authentication redirects ``` ### 2. Keyword Coverage Use words Claude would search for: - Error messages: "Hook timed out", "ENOTEMPTY", "race condition" - Symptoms: "flaky", "hanging", "zombie", "pollution" - Synonyms: "timeout/hang/freeze", "cleanup/teardown/afterEach" - Tools: Actual commands, library names, file types ### 3. Descriptive Naming **Use active voice, verb-first:** - ✅ `creating-skills` not `skill-creation` - ✅ `condition-based-waiting` not `async-test-helpers` ### 4. Token Efficiency (Critical) **Problem:** getting-started and frequently-referenced skills load into EVERY conversation. Every token counts. **Target word counts:** - getting-started workflows: <150 words each - Frequently-loaded skills: <200 words total - Other skills: <500 words (still be concise) **Techniques:** **Move details to tool help:** ```bash # ❌ BAD: Document all flags in SKILL.md search-conversations supports --text, --both, --after DATE, --before DATE, --limit N # ✅ GOOD: Reference --help search-conversations supports multiple modes and filters. Run --help for details. ``` **Use cross-references:** ```markdown # ❌ BAD: Repeat workflow details When searching, dispatch subagent with template... [20 lines of repeated instructions] # ✅ GOOD: Reference other skill Always use subagents (50-100x context savings). REQUIRED: Use [other-skill-name] for workflow. ``` **Compress examples:** ```markdown # ❌ BAD: Verbose example (42 words) your human partner: "How did we handle authentication errors in React Router before?" You: I'll search past conversations for React Router authentication patterns. [Dispatch subagent with search query: "React Router authentication error handling 401"] # ✅ GOOD: Minimal example (20 words) Partner: "How did we handle auth errors in React Router?" You: Searching... [Dispatch subagent → synthesis] ``` **Eliminate redundancy:** - Don't repeat what's in cross-referenced skills - Don't explain what's obvious from command - Don't include multiple examples of same pattern **Verification:** ```bash wc -w skills/path/SKILL.md # getting-started workflows: aim for <150 each # Other frequently-loaded: aim for <200 total ``` **Name by what you DO or core insight:** - ✅ `condition-based-waiting` > `async-test-helpers` - ✅ `using-skills` not `skill-usage` - ✅ `flatten-with-flags` > `data-structure-refactoring` - ✅ `root-cause-tracing` > `debugging-techniques` **Gerunds (-ing) work well for processes:** - `creating-skills`, `testing-skills`, `debugging-with-logs` - Active, describes the action you're taking ### 4. Cross-Referencing Other Skills **When writing documentation that references other skills:** Use skill name only, with explicit requirement markers: - ✅ Good: `**REQUIRED SUB-SKILL:** Use superpowers:test-driven-development` - ✅ Good: `**REQUIRED BACKGROUND:** You MUST understand superpowers:systematic-debugging` - ❌ Bad: `See skills/testing/test-driven-development` (unclear if required) - ❌ Bad: `@skills/testing/test-driven-development/SKILL.md` (force-loads, burns context) **Why no @ links:** `@` syntax force-loads files immediately, consuming 200k+ context before you need them. ## Flowchart Usage ```dot digraph when_flowchart { "Need to show information?" [shape=diamond]; "Decision where I might go wrong?" [shape=diamond]; "Use markdown" [shape=box]; "Small inline flowchart" [shape=box]; "Need to show information?" -> "Decision where I might go wrong?" [label="yes"]; "Decision where I might go wrong?" -> "Small inline flowchart" [label="yes"]; "Decision where I might go wrong?" -> "Use markdown" [label="no"]; } ``` **Use flowcharts ONLY for:** - Non-obvious decision points - Process loops where you might stop too early - "When to use A vs B" decisions **Never use flowcharts for:** - Reference material → Tables, lists - Code examples → Markdown blocks - Linear instructions → Numbered lists - Labels without semantic meaning (step1, helper2) See @graphviz-conventions.dot for graphviz style rules. **Visualizing for your human partner:** Use `render-graphs.js` in this directory to render a skill's flowcharts to SVG: ```bash ./render-graphs.js ../some-skill # Each diagram separately ./render-graphs.js ../some-skill --combine # All diagrams in one SVG ``` ## Code Examples **One excellent example beats many mediocre ones** Choose most relevant language: - Testing techniques → TypeScript/JavaScript - System debugging → Shell/Python - Data processing → Python **Good example:** - Complete and runnable - Well-commented explaining WHY - From real scenario - Shows pattern clearly - Ready to adapt (not generic template) **Don't:** - Implement in 5+ languages - Create fill-in-the-blank templates - Write contrived examples You're good at porting - one great example is enough. ## File Organization ### Self-Contained Skill ``` defense-in-depth/ SKILL.md # Everything inline ``` When: All content fits, no heavy reference needed ### Skill with Reusable Tool ``` condition-based-waiting/ SKILL.md # Overview + patterns example.ts # Working helpers to adapt ``` When: Tool is reusable code, not just narrative ### Skill with Heavy Reference ``` pptx/ SKILL.md # Overview + workflows pptxgenjs.md # 600 lines API reference ooxml.md # 500 lines XML structure scripts/ # Executable tools ``` When: Reference material too large for inline ## The Iron Law (Same as TDD) ``` NO SKILL WITHOUT A FAILING TEST FIRST ``` This applies to NEW skills AND EDITS to existing skills. Write skill before testing? Delete it. Start over. Edit skill without testing? Same violation. **No exceptions:** - Not for "simple additions" - Not for "just adding a section" - Not for "documentation updates" - Don't keep untested changes as "reference" - Don't "adapt" while running tests - Delete means delete **REQUIRED BACKGROUND:** The superpowers:test-driven-development skill explains why this matters. Same principles apply to documentation. ## Testing All Skill Types Different skill types need different test approaches: ### Discipline-Enforcing Skills (rules/requirements) **Examples:** TDD, verification-before-completion, designing-before-coding **Test with:** - Academic questions: Do they understand the rules? - Pressure scenarios: Do they comply under stress? - Multiple pressures combined: time + sunk cost + exhaustion - Identify rationalizations and add explicit counters **Success criteria:** Agent follows rule under maximum pressure ### Technique Skills (how-to guides) **Examples:** condition-based-waiting, root-cause-tracing, defensive-programming **Test with:** - Application scenarios: Can they apply the technique correctly? - Variation scenarios: Do they handle edge cases? - Missing information tests: Do instructions have gaps? **Success criteria:** Agent successfully applies technique to new scenario ### Pattern Skills (mental models) **Examples:** reducing-complexity, information-hiding concepts **Test with:** - Recognition scenarios: Do they recognize when pattern applies? - Application scenarios: Can they use the mental model? - Counter-examples: Do they know when NOT to apply? **Success criteria:** Agent correctly identifies when/how to apply pattern ### Reference Skills (documentation/APIs) **Examples:** API documentation, command references, library guides **Test with:** - Retrieval scenarios: Can they find the right information? - Application scenarios: Can they use what they found correctly? - Gap testing: Are common use cases covered? **Success criteria:** Agent finds and correctly applies reference information ## Common Rationalizations for Skipping Testing | Excuse | Reality | |--------|---------| | "Skill is obviously clear" | Clear to you ≠ clear to other agents. Test it. | | "It's just a reference" | References can have gaps, unclear sections. Test retrieval. | | "Testing is overkill" | Untested skills have issues. Always. 15 min testing saves hours. | | "I'll test if problems emerge" | Problems = agents can't use skill. Test BEFORE deploying. | | "Too tedious to test" | Testing is less tedious than debugging bad skill in production. | | "I'm confident it's good" | Overconfidence guarantees issues. Test anyway. | | "Academic review is enough" | Reading ≠ using. Test application scenarios. | | "No time to test" | Deploying untested skill wastes more time fixing it later. | **All of these mean: Test before deploying. No exceptions.** ## Bulletproofing Skills Against Rationalization Skills that enforce discipline (like TDD) need to resist rationalization. Agents are smart and will find loopholes when under pressure. **Psychology note:** Understanding WHY persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundation (Cialdini, 2021; Meincke et al., 2025) on authority, commitment, scarcity, social proof, and unity principles. ### Close Every Loophole Explicitly Don't just state the rule - forbid specific workarounds: <Bad> ```markdown Write code before test? Delete it. ``` </Bad> <Good> ```markdown Write code before test? Delete it. Start over. **No exceptions:** - Don't keep it as "reference" - Don't "adapt" it while writing tests - Don't look at it - Delete means delete ``` </Good> ### Address "Spirit vs Letter" Arguments Add foundational principle early: ```markdown **Violating the letter of the rules is violating the spirit of the rules.** ``` This cuts off entire class of "I'm following the spirit" rationalizations. ### Build Rationalization Table Capture rationalizations from baseline testing (see Testing section below). Every excuse agents make goes in the table: ```markdown | Excuse | Reality | |--------|---------| | "Too simple to test" | Simple code breaks. Test takes 30 seconds. | | "I'll test after" | Tests passing immediately prove nothing. | | "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" | ``` ### Create Red Flags List Make it easy for agents to self-check when rationalizing: ```markdown ## Red Flags - STOP and Start Over - Code before test - "I already manually tested it" - "Tests after achieve the same purpose" - "It's about spirit not ritual" - "This is different because..." **All of these mean: Delete code. Start over with TDD.** ``` ### Update CSO for Violation Symptoms Add to description: symptoms of when you're ABOUT to violate the rule: ```yaml description: use when implementing any feature or bugfix, before writing implementation code ``` ## RED-GREEN-REFACTOR for Skills Follow the TDD cycle: ### RED: Write Failing Test (Baseline) Run pressure scenario with subagent WITHOUT the skill. Document exact behavior: - What choices did they make? - What rationalizations did they use (verbatim)? - Which pressures triggered violations? This is "watch the test fail" - you must see what agents naturally do before writing the skill. ### GREEN: Write Minimal Skill Write skill that addresses those specific rationalizations. Don't add extra content for hypothetical cases. Run same scenarios WITH skill. Agent should now comply. ### REFACTOR: Close Loopholes Agent found new rationalization? Add explicit counter. Re-test until bulletproof. **Testing methodology:** See @testing-skills-with-subagents.md for the complete testing methodology: - How to write pressure scenarios - Pressure types (time, sunk cost, authority, exhaustion) - Plugging holes systematically - Meta-testing techniques ## Anti-Patterns ### ❌ Narrative Example "In session 2025-10-03, we found empty projectDir caused..." **Why bad:** Too specific, not reusable ### ❌ Multi-Language Dilution example-js.js, example-py.py, example-go.go **Why bad:** Mediocre quality, maintenance burden ### ❌ Code in Flowcharts ```dot step1 [label="import fs"]; step2 [label="read file"]; ``` **Why bad:** Can't copy-paste, hard to read ### ❌ Generic Labels helper1, helper2, step3, pattern4 **Why bad:** Labels should have semantic meaning ## STOP: Before Moving to Next Skill **After writing ANY skill, you MUST STOP and complete the deployment process.** **Do NOT:** - Create multiple skills in batch without testing each - Move to next skill before current one is verified - Skip testing because "batching is more efficient" **The deployment checklist below is MANDATORY for EACH skill.** Deploying untested skills = deploying untested code. It's a violation of quality standards. ## Skill Creation Checklist (TDD Adapted) **IMPORTANT: Use TodoWrite to create todos for EACH checklist item below.** **RED Phase - Write Failing Test:** - [ ] Create pressure scenarios (3+ combined pressures for discipline skills) - [ ] Run scenarios WITHOUT skill - document baseline behavior verbatim - [ ] Identify patterns in rationalizations/failures **GREEN Phase - Write Minimal Skill:** - [ ] Name uses only letters, numbers, hyphens (no parentheses/special chars) - [ ] YAML frontmatter with only name and description (max 1024 chars) - [ ] Description starts with "Use when..." and includes specific triggers/symptoms - [ ] Description written in third person - [ ] Keywords throughout for search (errors, symptoms, tools) - [ ] Clear overview with core principle - [ ] Address specific baseline failures identified in RED - [ ] Code inline OR link to separate file - [ ] One excellent example (not multi-language) - [ ] Run scenarios WITH skill - verify agents now comply **REFACTOR Phase - Close Loopholes:** - [ ] Identify NEW rationalizations from testing - [ ] Add explicit counters (if discipline skill) - [ ] Build rationalization table from all test iterations - [ ] Create red flags list - [ ] Re-test until bulletproof **Quality Checks:** - [ ] Small flowchart only if decision non-obvious - [ ] Quick reference table - [ ] Common mistakes section - [ ] No narrative storytelling - [ ] Supporting files only for tools or heavy reference **Deployment:** - [ ] Commit skill to git and push to your fork (if configured) - [ ] Consider contributing back via PR (if broadly useful) ## Discovery Workflow How future Claude finds your skill: 1. **Encounters problem** ("tests are flaky") 3. **Finds SKILL** (description matches) 4. **Scans overview** (is this relevant?) 5. **Reads patterns** (quick reference table) 6. **Loads example** (only when implementing) **Optimize for this flow** - put searchable terms early and often. ## The Bottom Line **Creating skills IS TDD for process documentation.** Same Iron Law: No skill without failing test first. Same cycle: RED (baseline) → GREEN (write skill) → REFACTOR (close loopholes). Same benefits: Better quality, fewer surprises, bulletproof results. If you follow TDD for code, follow it for skills. It's the same discipline applied to documentation.

# Specialized Patterns (sp-*) # /sp-brainstorm **Source:** `~/.claude/skills/sp-brainstorm/SKILL.md` --- --- description: "You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores requirements and design before implementation." disable-model-invocation: true --- Invoke the superpowers:brainstorming skill and follow it exactly as presented to you # /sp-brainstorming **Source:** `~/.claude/skills/sp-brainstorming/SKILL.md` --- --- name: brainstorming description: "You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores user intent, requirements and design before implementation." --- # Brainstorming Ideas Into Designs ## Overview Help turn ideas into fully formed designs and specs through natural collaborative dialogue. Start by understanding the current project context, then ask questions one at a time to refine the idea. Once you understand what you're building, present the design in small sections (200-300 words), checking after each section whether it looks right so far. ## The Process **Understanding the idea:** - Check out the current project state first (files, docs, recent commits) - Ask questions one at a time to refine the idea - Prefer multiple choice questions when possible, but open-ended is fine too - Only one question per message - if a topic needs more exploration, break it into multiple questions - Focus on understanding: purpose, constraints, success criteria **Exploring approaches:** - Propose 2-3 different approaches with trade-offs - Present options conversationally with your recommendation and reasoning - Lead with your recommended option and explain why **Presenting the design:** - Once you believe you understand what you're building, present the design - Break it into sections of 200-300 words - Ask after each section whether it looks right so far - Cover: architecture, components, data flow, error handling, testing - Be ready to go back and clarify if something doesn't make sense ## After the Design **Documentation:** - Write the validated design to `docs/plans/YYYY-MM-DD--design.md` - Use elements-of-style:writing-clearly-and-concisely skill if available - Commit the design document to git **Implementation (if continuing):** - Ask: "Ready to set up for implementation?" - Use superpowers:using-git-worktrees to create isolated workspace - Use superpowers:writing-plans to create detailed implementation plan ## Key Principles - **One question at a time** - Don't overwhelm with multiple questions - **Multiple choice preferred** - Easier to answer than open-ended when possible - **YAGNI ruthlessly** - Remove unnecessary features from all designs - **Explore alternatives** - Always propose 2-3 approaches before settling - **Incremental validation** - Present design in sections, validate each - **Be flexible** - Go back and clarify when something doesn't make sense # /sp-dispatching-parallel-agents **Source:** `~/.claude/skills/sp-dispatching-parallel-agents/SKILL.md` --- --- name: dispatching-parallel-agents description: Use when facing 2+ independent tasks that can be worked on without shared state or sequential dependencies --- # Dispatching Parallel Agents ## Overview When you have multiple unrelated failures (different test files, different subsystems, different bugs), investigating them sequentially wastes time. Each investigation is independent and can happen in parallel. **Core principle:** Dispatch one agent per independent problem domain. Let them work concurrently. ## When to Use ```dot digraph when_to_use { "Multiple failures?" [shape=diamond]; "Are they independent?" [shape=diamond]; "Single agent investigates all" [shape=box]; "One agent per problem domain" [shape=box]; "Can they work in parallel?" [shape=diamond]; "Sequential agents" [shape=box]; "Parallel dispatch" [shape=box]; "Multiple failures?" -> "Are they independent?" [label="yes"]; "Are they independent?" -> "Single agent investigates all" [label="no - related"]; "Are they independent?" -> "Can they work in parallel?" [label="yes"]; "Can they work in parallel?" -> "Parallel dispatch" [label="yes"]; "Can they work in parallel?" -> "Sequential agents" [label="no - shared state"]; } ``` **Use when:** - 3+ test files failing with different root causes - Multiple subsystems broken independently - Each problem can be understood without context from others - No shared state between investigations **Don't use when:** - Failures are related (fix one might fix others) - Need to understand full system state - Agents would interfere with each other ## The Pattern ### 1. Identify Independent Domains Group failures by what's broken: - File A tests: Tool approval flow - File B tests: Batch completion behavior - File C tests: Abort functionality Each domain is independent - fixing tool approval doesn't affect abort tests. ### 2. Create Focused Agent Tasks Each agent gets: - **Specific scope:** One test file or subsystem - **Clear goal:** Make these tests pass - **Constraints:** Don't change other code - **Expected output:** Summary of what you found and fixed ### 3. Dispatch in Parallel ```typescript // In Claude Code / AI environment Task("Fix agent-tool-abort.test.ts failures") Task("Fix batch-completion-behavior.test.ts failures") Task("Fix tool-approval-race-conditions.test.ts failures") // All three run concurrently ``` ### 4. Review and Integrate When agents return: - Read each summary - Verify fixes don't conflict - Run full test suite - Integrate all changes ## Agent Prompt Structure Good agent prompts are: 1. **Focused** - One clear problem domain 2. **Self-contained** - All context needed to understand the problem 3. **Specific about output** - What should the agent return? ```markdown Fix the 3 failing tests in src/agents/agent-tool-abort.test.ts: 1. "should abort tool with partial output capture" - expects 'interrupted at' in message 2. "should handle mixed completed and aborted tools" - fast tool aborted instead of completed 3. "should properly track pendingToolCount" - expects 3 results but gets 0 These are timing/race condition issues. Your task: 1. Read the test file and understand what each test verifies 2. Identify root cause - timing issues or actual bugs? 3. Fix by: - Replacing arbitrary timeouts with event-based waiting - Fixing bugs in abort implementation if found - Adjusting test expectations if testing changed behavior Do NOT just increase timeouts - find the real issue. Return: Summary of what you found and what you fixed. ``` ## Common Mistakes **❌ Too broad:** "Fix all the tests" - agent gets lost **✅ Specific:** "Fix agent-tool-abort.test.ts" - focused scope **❌ No context:** "Fix the race condition" - agent doesn't know where **✅ Context:** Paste the error messages and test names **❌ No constraints:** Agent might refactor everything **✅ Constraints:** "Do NOT change production code" or "Fix tests only" **❌ Vague output:** "Fix it" - you don't know what changed **✅ Specific:** "Return summary of root cause and changes" ## When NOT to Use **Related failures:** Fixing one might fix others - investigate together first **Need full context:** Understanding requires seeing entire system **Exploratory debugging:** You don't know what's broken yet **Shared state:** Agents would interfere (editing same files, using same resources) ## Real Example from Session **Scenario:** 6 test failures across 3 files after major refactoring **Failures:** - agent-tool-abort.test.ts: 3 failures (timing issues) - batch-completion-behavior.test.ts: 2 failures (tools not executing) - tool-approval-race-conditions.test.ts: 1 failure (execution count = 0) **Decision:** Independent domains - abort logic separate from batch completion separate from race conditions **Dispatch:** ``` Agent 1 → Fix agent-tool-abort.test.ts Agent 2 → Fix batch-completion-behavior.test.ts Agent 3 → Fix tool-approval-race-conditions.test.ts ``` **Results:** - Agent 1: Replaced timeouts with event-based waiting - Agent 2: Fixed event structure bug (threadId in wrong place) - Agent 3: Added wait for async tool execution to complete **Integration:** All fixes independent, no conflicts, full suite green **Time saved:** 3 problems solved in parallel vs sequentially ## Key Benefits 1. **Parallelization** - Multiple investigations happen simultaneously 2. **Focus** - Each agent has narrow scope, less context to track 3. **Independence** - Agents don't interfere with each other 4. **Speed** - 3 problems solved in time of 1 ## Verification After agents return: 1. **Review each summary** - Understand what changed 2. **Check for conflicts** - Did agents edit same code? 3. **Run full suite** - Verify all fixes work together 4. **Spot check** - Agents can make systematic errors ## Real-World Impact From debugging session (2025-10-03): - 6 failures across 3 files - 3 agents dispatched in parallel - All investigations completed concurrently - All fixes integrated successfully - Zero conflicts between agent changes # /sp-execute-plan **Source:** `~/.claude/skills/sp-execute-plan/SKILL.md` --- --- description: Execute plan in batches with review checkpoints disable-model-invocation: true --- Invoke the superpowers:executing-plans skill and follow it exactly as presented to you # /sp-executing-plans **Source:** `~/.claude/skills/sp-executing-plans/SKILL.md` --- --- name: executing-plans description: Use when you have a written implementation plan to execute in a separate session with review checkpoints --- # Executing Plans ## Overview Load plan, review critically, execute tasks in batches, report for review between batches. **Core principle:** Batch execution with checkpoints for architect review. **Announce at start:** "I'm using the executing-plans skill to implement this plan." ## The Process ### Step 1: Load and Review Plan 1. Read plan file 2. Review critically - identify any questions or concerns about the plan 3. If concerns: Raise them with your human partner before starting 4. If no concerns: Create TodoWrite and proceed ### Step 2: Execute Batch **Default: First 3 tasks** For each task: 1. Mark as in_progress 2. Follow each step exactly (plan has bite-sized steps) 3. Run verifications as specified 4. Mark as completed ### Step 3: Report When batch complete: - Show what was implemented - Show verification output - Say: "Ready for feedback." ### Step 4: Continue Based on feedback: - Apply changes if needed - Execute next batch - Repeat until complete ### Step 5: Complete Development After all tasks complete and verified: - Announce: "I'm using the finishing-a-development-branch skill to complete this work." - **REQUIRED SUB-SKILL:** Use superpowers:finishing-a-development-branch - Follow that skill to verify tests, present options, execute choice ## When to Stop and Ask for Help **STOP executing immediately when:** - Hit a blocker mid-batch (missing dependency, test fails, instruction unclear) - Plan has critical gaps preventing starting - You don't understand an instruction - Verification fails repeatedly **Ask for clarification rather than guessing.** ## When to Revisit Earlier Steps **Return to Review (Step 1) when:** - Partner updates the plan based on your feedback - Fundamental approach needs rethinking **Don't force through blockers** - stop and ask. ## Remember - Review plan critically first - Follow plan steps exactly - Don't skip verifications - Reference skills when plan says to - Between batches: just report and wait - Stop when blocked, don't guess - Never start implementation on main/master branch without explicit user consent ## Integration **Required workflow skills:** - **superpowers:using-git-worktrees** - REQUIRED: Set up isolated workspace before starting - **superpowers:writing-plans** - Creates the plan this skill executes - **superpowers:finishing-a-development-branch** - Complete development after all tasks # /sp-finishing-a-development-branch **Source:** `~/.claude/skills/sp-finishing-a-development-branch/SKILL.md` --- --- name: finishing-a-development-branch description: Use when implementation is complete, all tests pass, and you need to decide how to integrate the work - guides completion of development work by presenting structured options for merge, PR, or cleanup --- # Finishing a Development Branch ## Overview Guide completion of development work by presenting clear options and handling chosen workflow. **Core principle:** Verify tests → Present options → Execute choice → Clean up. **Announce at start:** "I'm using the finishing-a-development-branch skill to complete this work." ## The Process ### Step 1: Verify Tests **Before presenting options, verify tests pass:** ```bash # Run project's test suite npm test / cargo test / pytest / go test ./... ``` **If tests fail:** ``` Tests failing ( failures). Must fix before completing: [Show failures] Cannot proceed with merge/PR until tests pass. ``` Stop. Don't proceed to Step 2. **If tests pass:** Continue to Step 2. ### Step 2: Determine Base Branch ```bash # Try common base branches git merge-base HEAD main 2>/dev/null || git merge-base HEAD master 2>/dev/null ``` Or ask: "This branch split from main - is that correct?" ### Step 3: Present Options Present exactly these 4 options: ``` Implementation complete. What would you like to do? 1. Merge back to locally 2. Push and create a Pull Request 3. Keep the branch as-is (I'll handle it later) 4. Discard this work Which option? ``` **Don't add explanation** - keep options concise. ### Step 4: Execute Choice #### Option 1: Merge Locally ```bash # Switch to base branch git checkout # Pull latest git pull # Merge feature branch git merge # Verify tests on merged result # If tests pass git branch -d ``` Then: Cleanup worktree (Step 5) #### Option 2: Push and Create PR ```bash # Push branch git push -u origin # Create PR gh pr create --title "" --body "$(cat <<'EOF' ## Summary <2-3 bullets of what changed> ## Test Plan - [ ] <verification steps> EOF )" ``` Then: Cleanup worktree (Step 5) #### Option 3: Keep As-Is Report: "Keeping branch <name>. Worktree preserved at <path>." **Don't cleanup worktree.** #### Option 4: Discard **Confirm first:** ``` This will permanently delete: - Branch <name> - All commits: <commit-list> - Worktree at <path> Type 'discard' to confirm. ``` Wait for exact confirmation. If confirmed: ```bash git checkout <base-branch> git branch -D <feature-branch> ``` Then: Cleanup worktree (Step 5) ### Step 5: Cleanup Worktree **For Options 1, 2, 4:** Check if in worktree: ```bash git worktree list | grep $(git branch --show-current) ``` If yes: ```bash git worktree remove <worktree-path> ``` **For Option 3:** Keep worktree. ## Quick Reference | Option | Merge | Push | Keep Worktree | Cleanup Branch | |--------|-------|------|---------------|----------------| | 1. Merge locally | ✓ | - | - | ✓ | | 2. Create PR | - | ✓ | ✓ | - | | 3. Keep as-is | - | - | ✓ | - | | 4. Discard | - | - | - | ✓ (force) | ## Common Mistakes **Skipping test verification** - **Problem:** Merge broken code, create failing PR - **Fix:** Always verify tests before offering options **Open-ended questions** - **Problem:** "What should I do next?" → ambiguous - **Fix:** Present exactly 4 structured options **Automatic worktree cleanup** - **Problem:** Remove worktree when might need it (Option 2, 3) - **Fix:** Only cleanup for Options 1 and 4 **No confirmation for discard** - **Problem:** Accidentally delete work - **Fix:** Require typed "discard" confirmation ## Red Flags **Never:** - Proceed with failing tests - Merge without verifying tests on result - Delete work without confirmation - Force-push without explicit request **Always:** - Verify tests before offering options - Present exactly 4 options - Get typed confirmation for Option 4 - Clean up worktree for Options 1 & 4 only ## Integration **Called by:** - **subagent-driven-development** (Step 7) - After all tasks complete - **executing-plans** (Step 5) - After all batches complete **Pairs with:** - **using-git-worktrees** - Cleans up worktree created by that skill # /sp-receiving-code-review **Source:** `~/.claude/skills/sp-receiving-code-review/SKILL.md` --- --- name: receiving-code-review description: Use when receiving code review feedback, before implementing suggestions, especially if feedback seems unclear or technically questionable - requires technical rigor and verification, not performative agreement or blind implementation --- # Code Review Reception ## Overview Code review requires technical evaluation, not emotional performance. **Core principle:** Verify before implementing. Ask before assuming. Technical correctness over social comfort. ## The Response Pattern ``` WHEN receiving code review feedback: 1. READ: Complete feedback without reacting 2. UNDERSTAND: Restate requirement in own words (or ask) 3. VERIFY: Check against codebase reality 4. EVALUATE: Technically sound for THIS codebase? 5. RESPOND: Technical acknowledgment or reasoned pushback 6. IMPLEMENT: One item at a time, test each ``` ## Forbidden Responses **NEVER:** - "You're absolutely right!" (explicit CLAUDE.md violation) - "Great point!" / "Excellent feedback!" (performative) - "Let me implement that now" (before verification) **INSTEAD:** - Restate the technical requirement - Ask clarifying questions - Push back with technical reasoning if wrong - Just start working (actions > words) ## Handling Unclear Feedback ``` IF any item is unclear: STOP - do not implement anything yet ASK for clarification on unclear items WHY: Items may be related. Partial understanding = wrong implementation. ``` **Example:** ``` your human partner: "Fix 1-6" You understand 1,2,3,6. Unclear on 4,5. ❌ WRONG: Implement 1,2,3,6 now, ask about 4,5 later ✅ RIGHT: "I understand items 1,2,3,6. Need clarification on 4 and 5 before proceeding." ``` ## Source-Specific Handling ### From your human partner - **Trusted** - implement after understanding - **Still ask** if scope unclear - **No performative agreement** - **Skip to action** or technical acknowledgment ### From External Reviewers ``` BEFORE implementing: 1. Check: Technically correct for THIS codebase? 2. Check: Breaks existing functionality? 3. Check: Reason for current implementation? 4. Check: Works on all platforms/versions? 5. Check: Does reviewer understand full context? IF suggestion seems wrong: Push back with technical reasoning IF can't easily verify: Say so: "I can't verify this without [X]. Should I [investigate/ask/proceed]?" IF conflicts with your human partner's prior decisions: Stop and discuss with your human partner first ``` **your human partner's rule:** "External feedback - be skeptical, but check carefully" ## YAGNI Check for "Professional" Features ``` IF reviewer suggests "implementing properly": grep codebase for actual usage IF unused: "This endpoint isn't called. Remove it (YAGNI)?" IF used: Then implement properly ``` **your human partner's rule:** "You and reviewer both report to me. If we don't need this feature, don't add it." ## Implementation Order ``` FOR multi-item feedback: 1. Clarify anything unclear FIRST 2. Then implement in this order: - Blocking issues (breaks, security) - Simple fixes (typos, imports) - Complex fixes (refactoring, logic) 3. Test each fix individually 4. Verify no regressions ``` ## When To Push Back Push back when: - Suggestion breaks existing functionality - Reviewer lacks full context - Violates YAGNI (unused feature) - Technically incorrect for this stack - Legacy/compatibility reasons exist - Conflicts with your human partner's architectural decisions **How to push back:** - Use technical reasoning, not defensiveness - Ask specific questions - Reference working tests/code - Involve your human partner if architectural **Signal if uncomfortable pushing back out loud:** "Strange things are afoot at the Circle K" ## Acknowledging Correct Feedback When feedback IS correct: ``` ✅ "Fixed. [Brief description of what changed]" ✅ "Good catch - [specific issue]. Fixed in [location]." ✅ [Just fix it and show in the code] ❌ "You're absolutely right!" ❌ "Great point!" ❌ "Thanks for catching that!" ❌ "Thanks for [anything]" ❌ ANY gratitude expression ``` **Why no thanks:** Actions speak. Just fix it. The code itself shows you heard the feedback. **If you catch yourself about to write "Thanks":** DELETE IT. State the fix instead. ## Gracefully Correcting Your Pushback If you pushed back and were wrong: ``` ✅ "You were right - I checked [X] and it does [Y]. Implementing now." ✅ "Verified this and you're correct. My initial understanding was wrong because [reason]. Fixing." ❌ Long apology ❌ Defending why you pushed back ❌ Over-explaining ``` State the correction factually and move on. ## Common Mistakes | Mistake | Fix | |---------|-----| | Performative agreement | State requirement or just act | | Blind implementation | Verify against codebase first | | Batch without testing | One at a time, test each | | Assuming reviewer is right | Check if breaks things | | Avoiding pushback | Technical correctness > comfort | | Partial implementation | Clarify all items first | | Can't verify, proceed anyway | State limitation, ask for direction | ## Real Examples **Performative Agreement (Bad):** ``` Reviewer: "Remove legacy code" ❌ "You're absolutely right! Let me remove that..." ``` **Technical Verification (Good):** ``` Reviewer: "Remove legacy code" ✅ "Checking... build target is 10.15+, this API needs 13+. Need legacy for backward compat. Current impl has wrong bundle ID - fix it or drop pre-13 support?" ``` **YAGNI (Good):** ``` Reviewer: "Implement proper metrics tracking with database, date filters, CSV export" ✅ "Grepped codebase - nothing calls this endpoint. Remove it (YAGNI)? Or is there usage I'm missing?" ``` **Unclear Item (Good):** ``` your human partner: "Fix items 1-6" You understand 1,2,3,6. Unclear on 4,5. ✅ "Understand 1,2,3,6. Need clarification on 4 and 5 before implementing." ``` ## GitHub Thread Replies When replying to inline review comments on GitHub, reply in the comment thread (`gh api repos/{owner}/{repo}/pulls/{pr}/comments/{id}/replies`), not as a top-level PR comment. ## The Bottom Line **External feedback = suggestions to evaluate, not orders to follow.** Verify. Question. Then implement. No performative agreement. Technical rigor always. # /sp-requesting-code-review **Source:** `~/.claude/skills/sp-requesting-code-review/SKILL.md` --- --- name: requesting-code-review description: Use when completing tasks, implementing major features, or before merging to verify work meets requirements --- # Requesting Code Review Dispatch superpowers:code-reviewer subagent to catch issues before they cascade. **Core principle:** Review early, review often. ## When to Request Review **Mandatory:** - After each task in subagent-driven development - After completing major feature - Before merge to main **Optional but valuable:** - When stuck (fresh perspective) - Before refactoring (baseline check) - After fixing complex bug ## How to Request **1. Get git SHAs:** ```bash BASE_SHA=$(git rev-parse HEAD~1) # or origin/main HEAD_SHA=$(git rev-parse HEAD) ``` **2. Dispatch code-reviewer subagent:** Use Task tool with superpowers:code-reviewer type, fill template at `code-reviewer.md` **Placeholders:** - `{WHAT_WAS_IMPLEMENTED}` - What you just built - `{PLAN_OR_REQUIREMENTS}` - What it should do - `{BASE_SHA}` - Starting commit - `{HEAD_SHA}` - Ending commit - `{DESCRIPTION}` - Brief summary **3. Act on feedback:** - Fix Critical issues immediately - Fix Important issues before proceeding - Note Minor issues for later - Push back if reviewer is wrong (with reasoning) ## Example ``` [Just completed Task 2: Add verification function] You: Let me request code review before proceeding. BASE_SHA=$(git log --oneline | grep "Task 1" | head -1 | awk '{print $1}') HEAD_SHA=$(git rev-parse HEAD) [Dispatch superpowers:code-reviewer subagent] WHAT_WAS_IMPLEMENTED: Verification and repair functions for conversation index PLAN_OR_REQUIREMENTS: Task 2 from docs/plans/deployment-plan.md BASE_SHA: a7981ec HEAD_SHA: 3df7661 DESCRIPTION: Added verifyIndex() and repairIndex() with 4 issue types [Subagent returns]: Strengths: Clean architecture, real tests Issues: Important: Missing progress indicators Minor: Magic number (100) for reporting interval Assessment: Ready to proceed You: [Fix progress indicators] [Continue to Task 3] ``` ## Integration with Workflows **Subagent-Driven Development:** - Review after EACH task - Catch issues before they compound - Fix before moving to next task **Executing Plans:** - Review after each batch (3 tasks) - Get feedback, apply, continue **Ad-Hoc Development:** - Review before merge - Review when stuck ## Red Flags **Never:** - Skip review because "it's simple" - Ignore Critical issues - Proceed with unfixed Important issues - Argue with valid technical feedback **If reviewer wrong:** - Push back with technical reasoning - Show code/tests that prove it works - Request clarification See template at: requesting-code-review/code-reviewer.md # /sp-subagent-driven-development **Source:** `~/.claude/skills/sp-subagent-driven-development/SKILL.md` --- --- name: subagent-driven-development description: Use when executing implementation plans with independent tasks in the current session --- # Subagent-Driven Development Execute plan by dispatching fresh subagent per task, with two-stage review after each: spec compliance review first, then code quality review. **Core principle:** Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration ## When to Use ```dot digraph when_to_use { "Have implementation plan?" [shape=diamond]; "Tasks mostly independent?" [shape=diamond]; "Stay in this session?" [shape=diamond]; "subagent-driven-development" [shape=box]; "executing-plans" [shape=box]; "Manual execution or brainstorm first" [shape=box]; "Have implementation plan?" -> "Tasks mostly independent?" [label="yes"]; "Have implementation plan?" -> "Manual execution or brainstorm first" [label="no"]; "Tasks mostly independent?" -> "Stay in this session?" [label="yes"]; "Tasks mostly independent?" -> "Manual execution or brainstorm first" [label="no - tightly coupled"]; "Stay in this session?" -> "subagent-driven-development" [label="yes"]; "Stay in this session?" -> "executing-plans" [label="no - parallel session"]; } ``` **vs. Executing Plans (parallel session):** - Same session (no context switch) - Fresh subagent per task (no context pollution) - Two-stage review after each task: spec compliance first, then code quality - Faster iteration (no human-in-loop between tasks) ## The Process ```dot digraph process { rankdir=TB; subgraph cluster_per_task { label="Per Task"; "Dispatch implementer subagent (./implementer-prompt.md)" [shape=box]; "Implementer subagent asks questions?" [shape=diamond]; "Answer questions, provide context" [shape=box]; "Implementer subagent implements, tests, commits, self-reviews" [shape=box]; "Dispatch spec reviewer subagent (./spec-reviewer-prompt.md)" [shape=box]; "Spec reviewer subagent confirms code matches spec?" [shape=diamond]; "Implementer subagent fixes spec gaps" [shape=box]; "Dispatch code quality reviewer subagent (./code-quality-reviewer-prompt.md)" [shape=box]; "Code quality reviewer subagent approves?" [shape=diamond]; "Implementer subagent fixes quality issues" [shape=box]; "Mark task complete in TodoWrite" [shape=box]; } "Read plan, extract all tasks with full text, note context, create TodoWrite" [shape=box]; "More tasks remain?" [shape=diamond]; "Dispatch final code reviewer subagent for entire implementation" [shape=box]; "Use superpowers:finishing-a-development-branch" [shape=box style=filled fillcolor=lightgreen]; "Read plan, extract all tasks with full text, note context, create TodoWrite" -> "Dispatch implementer subagent (./implementer-prompt.md)"; "Dispatch implementer subagent (./implementer-prompt.md)" -> "Implementer subagent asks questions?"; "Implementer subagent asks questions?" -> "Answer questions, provide context" [label="yes"]; "Answer questions, provide context" -> "Dispatch implementer subagent (./implementer-prompt.md)"; "Implementer subagent asks questions?" -> "Implementer subagent implements, tests, commits, self-reviews" [label="no"]; "Implementer subagent implements, tests, commits, self-reviews" -> "Dispatch spec reviewer subagent (./spec-reviewer-prompt.md)"; "Dispatch spec reviewer subagent (./spec-reviewer-prompt.md)" -> "Spec reviewer subagent confirms code matches spec?"; "Spec reviewer subagent confirms code matches spec?" -> "Implementer subagent fixes spec gaps" [label="no"]; "Implementer subagent fixes spec gaps" -> "Dispatch spec reviewer subagent (./spec-reviewer-prompt.md)" [label="re-review"]; "Spec reviewer subagent confirms code matches spec?" -> "Dispatch code quality reviewer subagent (./code-quality-reviewer-prompt.md)" [label="yes"]; "Dispatch code quality reviewer subagent (./code-quality-reviewer-prompt.md)" -> "Code quality reviewer subagent approves?"; "Code quality reviewer subagent approves?" -> "Implementer subagent fixes quality issues" [label="no"]; "Implementer subagent fixes quality issues" -> "Dispatch code quality reviewer subagent (./code-quality-reviewer-prompt.md)" [label="re-review"]; "Code quality reviewer subagent approves?" -> "Mark task complete in TodoWrite" [label="yes"]; "Mark task complete in TodoWrite" -> "More tasks remain?"; "More tasks remain?" -> "Dispatch implementer subagent (./implementer-prompt.md)" [label="yes"]; "More tasks remain?" -> "Dispatch final code reviewer subagent for entire implementation" [label="no"]; "Dispatch final code reviewer subagent for entire implementation" -> "Use superpowers:finishing-a-development-branch"; } ``` ## Prompt Templates - `./implementer-prompt.md` - Dispatch implementer subagent - `./spec-reviewer-prompt.md` - Dispatch spec compliance reviewer subagent - `./code-quality-reviewer-prompt.md` - Dispatch code quality reviewer subagent ## Example Workflow ``` You: I'm using Subagent-Driven Development to execute this plan. [Read plan file once: docs/plans/feature-plan.md] [Extract all 5 tasks with full text and context] [Create TodoWrite with all tasks] Task 1: Hook installation script [Get Task 1 text and context (already extracted)] [Dispatch implementation subagent with full task text + context] Implementer: "Before I begin - should the hook be installed at user or system level?" You: "User level (~/.config/superpowers/hooks/)" Implementer: "Got it. Implementing now..." [Later] Implementer: - Implemented install-hook command - Added tests, 5/5 passing - Self-review: Found I missed --force flag, added it - Committed [Dispatch spec compliance reviewer] Spec reviewer: ✅ Spec compliant - all requirements met, nothing extra [Get git SHAs, dispatch code quality reviewer] Code reviewer: Strengths: Good test coverage, clean. Issues: None. Approved. [Mark Task 1 complete] Task 2: Recovery modes [Get Task 2 text and context (already extracted)] [Dispatch implementation subagent with full task text + context] Implementer: [No questions, proceeds] Implementer: - Added verify/repair modes - 8/8 tests passing - Self-review: All good - Committed [Dispatch spec compliance reviewer] Spec reviewer: ❌ Issues: - Missing: Progress reporting (spec says "report every 100 items") - Extra: Added --json flag (not requested) [Implementer fixes issues] Implementer: Removed --json flag, added progress reporting [Spec reviewer reviews again] Spec reviewer: ✅ Spec compliant now [Dispatch code quality reviewer] Code reviewer: Strengths: Solid. Issues (Important): Magic number (100) [Implementer fixes] Implementer: Extracted PROGRESS_INTERVAL constant [Code reviewer reviews again] Code reviewer: ✅ Approved [Mark Task 2 complete] ... [After all tasks] [Dispatch final code-reviewer] Final reviewer: All requirements met, ready to merge Done! ``` ## Advantages **vs. Manual execution:** - Subagents follow TDD naturally - Fresh context per task (no confusion) - Parallel-safe (subagents don't interfere) - Subagent can ask questions (before AND during work) **vs. Executing Plans:** - Same session (no handoff) - Continuous progress (no waiting) - Review checkpoints automatic **Efficiency gains:** - No file reading overhead (controller provides full text) - Controller curates exactly what context is needed - Subagent gets complete information upfront - Questions surfaced before work begins (not after) **Quality gates:** - Self-review catches issues before handoff - Two-stage review: spec compliance, then code quality - Review loops ensure fixes actually work - Spec compliance prevents over/under-building - Code quality ensures implementation is well-built **Cost:** - More subagent invocations (implementer + 2 reviewers per task) - Controller does more prep work (extracting all tasks upfront) - Review loops add iterations - But catches issues early (cheaper than debugging later) ## Red Flags **Never:** - Start implementation on main/master branch without explicit user consent - Skip reviews (spec compliance OR code quality) - Proceed with unfixed issues - Dispatch multiple implementation subagents in parallel (conflicts) - Make subagent read plan file (provide full text instead) - Skip scene-setting context (subagent needs to understand where task fits) - Ignore subagent questions (answer before letting them proceed) - Accept "close enough" on spec compliance (spec reviewer found issues = not done) - Skip review loops (reviewer found issues = implementer fixes = review again) - Let implementer self-review replace actual review (both are needed) - **Start code quality review before spec compliance is ✅** (wrong order) - Move to next task while either review has open issues **If subagent asks questions:** - Answer clearly and completely - Provide additional context if needed - Don't rush them into implementation **If reviewer finds issues:** - Implementer (same subagent) fixes them - Reviewer reviews again - Repeat until approved - Don't skip the re-review **If subagent fails task:** - Dispatch fix subagent with specific instructions - Don't try to fix manually (context pollution) ## Integration **Required workflow skills:** - **superpowers:using-git-worktrees** - REQUIRED: Set up isolated workspace before starting - **superpowers:writing-plans** - Creates the plan this skill executes - **superpowers:requesting-code-review** - Code review template for reviewer subagents - **superpowers:finishing-a-development-branch** - Complete development after all tasks **Subagents should use:** - **superpowers:test-driven-development** - Subagents follow TDD for each task **Alternative workflow:** - **superpowers:executing-plans** - Use for parallel session instead of same-session execution # /sp-systematic-debugging **Source:** `~/.claude/skills/sp-systematic-debugging/SKILL.md` --- --- name: systematic-debugging description: Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes --- # Systematic Debugging ## Overview Random fixes waste time and create new bugs. Quick patches mask underlying issues. **Core principle:** ALWAYS find root cause before attempting fixes. Symptom fixes are failure. **Violating the letter of this process is violating the spirit of debugging.** ## The Iron Law ``` NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST ``` If you haven't completed Phase 1, you cannot propose fixes. ## When to Use Use for ANY technical issue: - Test failures - Bugs in production - Unexpected behavior - Performance problems - Build failures - Integration issues **Use this ESPECIALLY when:** - Under time pressure (emergencies make guessing tempting) - "Just one quick fix" seems obvious - You've already tried multiple fixes - Previous fix didn't work - You don't fully understand the issue **Don't skip when:** - Issue seems simple (simple bugs have root causes too) - You're in a hurry (rushing guarantees rework) - Manager wants it fixed NOW (systematic is faster than thrashing) ## The Four Phases You MUST complete each phase before proceeding to the next. ### Phase 1: Root Cause Investigation **BEFORE attempting ANY fix:** 1. **Read Error Messages Carefully** - Don't skip past errors or warnings - They often contain the exact solution - Read stack traces completely - Note line numbers, file paths, error codes 2. **Reproduce Consistently** - Can you trigger it reliably? - What are the exact steps? - Does it happen every time? - If not reproducible → gather more data, don't guess 3. **Check Recent Changes** - What changed that could cause this? - Git diff, recent commits - New dependencies, config changes - Environmental differences 4. **Gather Evidence in Multi-Component Systems** **WHEN system has multiple components (CI → build → signing, API → service → database):** **BEFORE proposing fixes, add diagnostic instrumentation:** ``` For EACH component boundary: - Log what data enters component - Log what data exits component - Verify environment/config propagation - Check state at each layer Run once to gather evidence showing WHERE it breaks THEN analyze evidence to identify failing component THEN investigate that specific component ``` **Example (multi-layer system):** ```bash # Layer 1: Workflow echo "=== Secrets available in workflow: ===" echo "IDENTITY: ${IDENTITY:+SET}${IDENTITY:-UNSET}" # Layer 2: Build script echo "=== Env vars in build script: ===" env | grep IDENTITY || echo "IDENTITY not in environment" # Layer 3: Signing script echo "=== Keychain state: ===" security list-keychains security find-identity -v # Layer 4: Actual signing codesign --sign "$IDENTITY" --verbose=4 "$APP" ``` **This reveals:** Which layer fails (secrets → workflow ✓, workflow → build ✗) 5. **Trace Data Flow** **WHEN error is deep in call stack:** See `root-cause-tracing.md` in this directory for the complete backward tracing technique. **Quick version:** - Where does bad value originate? - What called this with bad value? - Keep tracing up until you find the source - Fix at source, not at symptom ### Phase 2: Pattern Analysis **Find the pattern before fixing:** 1. **Find Working Examples** - Locate similar working code in same codebase - What works that's similar to what's broken? 2. **Compare Against References** - If implementing pattern, read reference implementation COMPLETELY - Don't skim - read every line - Understand the pattern fully before applying 3. **Identify Differences** - What's different between working and broken? - List every difference, however small - Don't assume "that can't matter" 4. **Understand Dependencies** - What other components does this need? - What settings, config, environment? - What assumptions does it make? ### Phase 3: Hypothesis and Testing **Scientific method:** 1. **Form Single Hypothesis** - State clearly: "I think X is the root cause because Y" - Write it down - Be specific, not vague 2. **Test Minimally** - Make the SMALLEST possible change to test hypothesis - One variable at a time - Don't fix multiple things at once 3. **Verify Before Continuing** - Did it work? Yes → Phase 4 - Didn't work? Form NEW hypothesis - DON'T add more fixes on top 4. **When You Don't Know** - Say "I don't understand X" - Don't pretend to know - Ask for help - Research more ### Phase 4: Implementation **Fix the root cause, not the symptom:** 1. **Create Failing Test Case** - Simplest possible reproduction - Automated test if possible - One-off test script if no framework - MUST have before fixing - Use the `superpowers:test-driven-development` skill for writing proper failing tests 2. **Implement Single Fix** - Address the root cause identified - ONE change at a time - No "while I'm here" improvements - No bundled refactoring 3. **Verify Fix** - Test passes now? - No other tests broken? - Issue actually resolved? 4. **If Fix Doesn't Work** - STOP - Count: How many fixes have you tried? - If < 3: Return to Phase 1, re-analyze with new information - **If ≥ 3: STOP and question the architecture (step 5 below)** - DON'T attempt Fix #4 without architectural discussion 5. **If 3+ Fixes Failed: Question Architecture** **Pattern indicating architectural problem:** - Each fix reveals new shared state/coupling/problem in different place - Fixes require "massive refactoring" to implement - Each fix creates new symptoms elsewhere **STOP and question fundamentals:** - Is this pattern fundamentally sound? - Are we "sticking with it through sheer inertia"? - Should we refactor architecture vs. continue fixing symptoms? **Discuss with your human partner before attempting more fixes** This is NOT a failed hypothesis - this is a wrong architecture. ## Red Flags - STOP and Follow Process If you catch yourself thinking: - "Quick fix for now, investigate later" - "Just try changing X and see if it works" - "Add multiple changes, run tests" - "Skip the test, I'll manually verify" - "It's probably X, let me fix that" - "I don't fully understand but this might work" - "Pattern says X but I'll adapt it differently" - "Here are the main problems: [lists fixes without investigation]" - Proposing solutions before tracing data flow - **"One more fix attempt" (when already tried 2+)** - **Each fix reveals new problem in different place** **ALL of these mean: STOP. Return to Phase 1.** **If 3+ fixes failed:** Question the architecture (see Phase 4.5) ## your human partner's Signals You're Doing It Wrong **Watch for these redirections:** - "Is that not happening?" - You assumed without verifying - "Will it show us...?" - You should have added evidence gathering - "Stop guessing" - You're proposing fixes without understanding - "Ultrathink this" - Question fundamentals, not just symptoms - "We're stuck?" (frustrated) - Your approach isn't working **When you see these:** STOP. Return to Phase 1. ## Common Rationalizations | Excuse | Reality | |--------|---------| | "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. | | "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. | | "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. | | "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves it. | | "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. | | "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read it completely. | | "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. | | "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question pattern, don't fix again. | ## Quick Reference | Phase | Key Activities | Success Criteria | |-------|---------------|------------------| | **1. Root Cause** | Read errors, reproduce, check changes, gather evidence | Understand WHAT and WHY | | **2. Pattern** | Find working examples, compare | Identify differences | | **3. Hypothesis** | Form theory, test minimally | Confirmed or new hypothesis | | **4. Implementation** | Create test, fix, verify | Bug resolved, tests pass | ## When Process Reveals "No Root Cause" If systematic investigation reveals issue is truly environmental, timing-dependent, or external: 1. You've completed the process 2. Document what you investigated 3. Implement appropriate handling (retry, timeout, error message) 4. Add monitoring/logging for future investigation **But:** 95% of "no root cause" cases are incomplete investigation. ## Supporting Techniques These techniques are part of systematic debugging and available in this directory: - **`root-cause-tracing.md`** - Trace bugs backward through call stack to find original trigger - **`defense-in-depth.md`** - Add validation at multiple layers after finding root cause - **`condition-based-waiting.md`** - Replace arbitrary timeouts with condition polling **Related skills:** - **superpowers:test-driven-development** - For creating failing test case (Phase 4, Step 1) - **superpowers:verification-before-completion** - Verify fix worked before claiming success ## Real-World Impact From debugging sessions: - Systematic approach: 15-30 minutes to fix - Random fixes approach: 2-3 hours of thrashing - First-time fix rate: 95% vs 40% - New bugs introduced: Near zero vs common # /sp-test-driven-development **Source:** `~/.claude/skills/sp-test-driven-development/SKILL.md` --- --- name: test-driven-development description: Use when implementing any feature or bugfix, before writing implementation code --- # Test-Driven Development (TDD) ## Overview Write the test first. Watch it fail. Write minimal code to pass. **Core principle:** If you didn't watch the test fail, you don't know if it tests the right thing. **Violating the letter of the rules is violating the spirit of the rules.** ## When to Use **Always:** - New features - Bug fixes - Refactoring - Behavior changes **Exceptions (ask your human partner):** - Throwaway prototypes - Generated code - Configuration files Thinking "skip TDD just this once"? Stop. That's rationalization. ## The Iron Law ``` NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST ``` Write code before the test? Delete it. Start over. **No exceptions:** - Don't keep it as "reference" - Don't "adapt" it while writing tests - Don't look at it - Delete means delete Implement fresh from tests. Period. ## Red-Green-Refactor ```dot digraph tdd_cycle { rankdir=LR; red [label="RED\nWrite failing test", shape=box, style=filled, fillcolor="#ffcccc"]; verify_red [label="Verify fails\ncorrectly", shape=diamond]; green [label="GREEN\nMinimal code", shape=box, style=filled, fillcolor="#ccffcc"]; verify_green [label="Verify passes\nAll green", shape=diamond]; refactor [label="REFACTOR\nClean up", shape=box, style=filled, fillcolor="#ccccff"]; next [label="Next", shape=ellipse]; red -> verify_red; verify_red -> green [label="yes"]; verify_red -> red [label="wrong\nfailure"]; green -> verify_green; verify_green -> refactor [label="yes"]; verify_green -> green [label="no"]; refactor -> verify_green [label="stay\ngreen"]; verify_green -> next; next -> red; } ``` ### RED - Write Failing Test Write one minimal test showing what should happen. <Good> ```typescript test('retries failed operations 3 times', async () => { let attempts = 0; const operation = () => { attempts++; if (attempts < 3) throw new Error('fail'); return 'success'; }; const result = await retryOperation(operation); expect(result).toBe('success'); expect(attempts).toBe(3); }); ``` Clear name, tests real behavior, one thing </Good> <Bad> ```typescript test('retry works', async () => { const mock = jest.fn() .mockRejectedValueOnce(new Error()) .mockRejectedValueOnce(new Error()) .mockResolvedValueOnce('success'); await retryOperation(mock); expect(mock).toHaveBeenCalledTimes(3); }); ``` Vague name, tests mock not code </Bad> **Requirements:** - One behavior - Clear name - Real code (no mocks unless unavoidable) ### Verify RED - Watch It Fail **MANDATORY. Never skip.** ```bash npm test path/to/test.test.ts ``` Confirm: - Test fails (not errors) - Failure message is expected - Fails because feature missing (not typos) **Test passes?** You're testing existing behavior. Fix test. **Test errors?** Fix error, re-run until it fails correctly. ### GREEN - Minimal Code Write simplest code to pass the test. <Good> ```typescript async function retryOperation<T>(fn: () => Promise<T>): Promise<T> { for (let i = 0; i < 3; i++) { try { return await fn(); } catch (e) { if (i === 2) throw e; } } throw new Error('unreachable'); } ``` Just enough to pass </Good> <Bad> ```typescript async function retryOperation<T>( fn: () => Promise<T>, options?: { maxRetries?: number; backoff?: 'linear' | 'exponential'; onRetry?: (attempt: number) => void; } ): Promise<T> { // YAGNI } ``` Over-engineered </Bad> Don't add features, refactor other code, or "improve" beyond the test. ### Verify GREEN - Watch It Pass **MANDATORY.** ```bash npm test path/to/test.test.ts ``` Confirm: - Test passes - Other tests still pass - Output pristine (no errors, warnings) **Test fails?** Fix code, not test. **Other tests fail?** Fix now. ### REFACTOR - Clean Up After green only: - Remove duplication - Improve names - Extract helpers Keep tests green. Don't add behavior. ### Repeat Next failing test for next feature. ## Good Tests | Quality | Good | Bad | |---------|------|-----| | **Minimal** | One thing. "and" in name? Split it. | `test('validates email and domain and whitespace')` | | **Clear** | Name describes behavior | `test('test1')` | | **Shows intent** | Demonstrates desired API | Obscures what code should do | ## Why Order Matters **"I'll write tests after to verify it works"** Tests written after code pass immediately. Passing immediately proves nothing: - Might test wrong thing - Might test implementation, not behavior - Might miss edge cases you forgot - You never saw it catch the bug Test-first forces you to see the test fail, proving it actually tests something. **"I already manually tested all the edge cases"** Manual testing is ad-hoc. You think you tested everything but: - No record of what you tested - Can't re-run when code changes - Easy to forget cases under pressure - "It worked when I tried it" ≠ comprehensive Automated tests are systematic. They run the same way every time. **"Deleting X hours of work is wasteful"** Sunk cost fallacy. The time is already gone. Your choice now: - Delete and rewrite with TDD (X more hours, high confidence) - Keep it and add tests after (30 min, low confidence, likely bugs) The "waste" is keeping code you can't trust. Working code without real tests is technical debt. **"TDD is dogmatic, being pragmatic means adapting"** TDD IS pragmatic: - Finds bugs before commit (faster than debugging after) - Prevents regressions (tests catch breaks immediately) - Documents behavior (tests show how to use code) - Enables refactoring (change freely, tests catch breaks) "Pragmatic" shortcuts = debugging in production = slower. **"Tests after achieve the same goals - it's spirit not ritual"** No. Tests-after answer "What does this do?" Tests-first answer "What should this do?" Tests-after are biased by your implementation. You test what you built, not what's required. You verify remembered edge cases, not discovered ones. Tests-first force edge case discovery before implementing. Tests-after verify you remembered everything (you didn't). 30 minutes of tests after ≠ TDD. You get coverage, lose proof tests work. ## Common Rationalizations | Excuse | Reality | |--------|---------| | "Too simple to test" | Simple code breaks. Test takes 30 seconds. | | "I'll test after" | Tests passing immediately prove nothing. | | "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" | | "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. | | "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. | | "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. | | "Need to explore first" | Fine. Throw away exploration, start with TDD. | | "Test hard = design unclear" | Listen to test. Hard to test = hard to use. | | "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. | | "Manual test faster" | Manual doesn't prove edge cases. You'll re-test every change. | | "Existing code has no tests" | You're improving it. Add tests for existing code. | ## Red Flags - STOP and Start Over - Code before test - Test after implementation - Test passes immediately - Can't explain why test failed - Tests added "later" - Rationalizing "just this once" - "I already manually tested it" - "Tests after achieve the same purpose" - "It's about spirit not ritual" - "Keep as reference" or "adapt existing code" - "Already spent X hours, deleting is wasteful" - "TDD is dogmatic, I'm being pragmatic" - "This is different because..." **All of these mean: Delete code. Start over with TDD.** ## Example: Bug Fix **Bug:** Empty email accepted **RED** ```typescript test('rejects empty email', async () => { const result = await submitForm({ email: '' }); expect(result.error).toBe('Email required'); }); ``` **Verify RED** ```bash $ npm test FAIL: expected 'Email required', got undefined ``` **GREEN** ```typescript function submitForm(data: FormData) { if (!data.email?.trim()) { return { error: 'Email required' }; } // ... } ``` **Verify GREEN** ```bash $ npm test PASS ``` **REFACTOR** Extract validation for multiple fields if needed. ## Verification Checklist Before marking work complete: - [ ] Every new function/method has a test - [ ] Watched each test fail before implementing - [ ] Each test failed for expected reason (feature missing, not typo) - [ ] Wrote minimal code to pass each test - [ ] All tests pass - [ ] Output pristine (no errors, warnings) - [ ] Tests use real code (mocks only if unavoidable) - [ ] Edge cases and errors covered Can't check all boxes? You skipped TDD. Start over. ## When Stuck | Problem | Solution | |---------|----------| | Don't know how to test | Write wished-for API. Write assertion first. Ask your human partner. | | Test too complicated | Design too complicated. Simplify interface. | | Must mock everything | Code too coupled. Use dependency injection. | | Test setup huge | Extract helpers. Still complex? Simplify design. | ## Debugging Integration Bug found? Write failing test reproducing it. Follow TDD cycle. Test proves fix and prevents regression. Never fix bugs without a test. ## Testing Anti-Patterns When adding mocks or test utilities, read @testing-anti-patterns.md to avoid common pitfalls: - Testing mock behavior instead of real behavior - Adding test-only methods to production classes - Mocking without understanding dependencies ## Final Rule ``` Production code → test exists and failed first Otherwise → not TDD ``` No exceptions without your human partner's permission. # /sp-using-git-worktrees **Source:** `~/.claude/skills/sp-using-git-worktrees/SKILL.md` --- --- name: using-git-worktrees description: Use when starting feature work that needs isolation from current workspace or before executing implementation plans - creates isolated git worktrees with smart directory selection and safety verification --- # Using Git Worktrees ## Overview Git worktrees create isolated workspaces sharing the same repository, allowing work on multiple branches simultaneously without switching. **Core principle:** Systematic directory selection + safety verification = reliable isolation. **Announce at start:** "I'm using the using-git-worktrees skill to set up an isolated workspace." ## Directory Selection Process Follow this priority order: ### 1. Check Existing Directories ```bash # Check in priority order ls -d .worktrees 2>/dev/null # Preferred (hidden) ls -d worktrees 2>/dev/null # Alternative ``` **If found:** Use that directory. If both exist, `.worktrees` wins. ### 2. Check CLAUDE.md ```bash grep -i "worktree.*director" CLAUDE.md 2>/dev/null ``` **If preference specified:** Use it without asking. ### 3. Ask User If no directory exists and no CLAUDE.md preference: ``` No worktree directory found. Where should I create worktrees? 1. .worktrees/ (project-local, hidden) 2. ~/.config/superpowers/worktrees/<project-name>/ (global location) Which would you prefer? ``` ## Safety Verification ### For Project-Local Directories (.worktrees or worktrees) **MUST verify directory is ignored before creating worktree:** ```bash # Check if directory is ignored (respects local, global, and system gitignore) git check-ignore -q .worktrees 2>/dev/null || git check-ignore -q worktrees 2>/dev/null ``` **If NOT ignored:** Per Jesse's rule "Fix broken things immediately": 1. Add appropriate line to .gitignore 2. Commit the change 3. Proceed with worktree creation **Why critical:** Prevents accidentally committing worktree contents to repository. ### For Global Directory (~/.config/superpowers/worktrees) No .gitignore verification needed - outside project entirely. ## Creation Steps ### 1. Detect Project Name ```bash project=$(basename "$(git rev-parse --show-toplevel)") ``` ### 2. Create Worktree ```bash # Determine full path case $LOCATION in .worktrees|worktrees) path="$LOCATION/$BRANCH_NAME" ;; ~/.config/superpowers/worktrees/*) path="~/.config/superpowers/worktrees/$project/$BRANCH_NAME" ;; esac # Create worktree with new branch git worktree add "$path" -b "$BRANCH_NAME" cd "$path" ``` ### 3. Run Project Setup Auto-detect and run appropriate setup: ```bash # Node.js if [ -f package.json ]; then npm install; fi # Rust if [ -f Cargo.toml ]; then cargo build; fi # Python if [ -f requirements.txt ]; then pip install -r requirements.txt; fi if [ -f pyproject.toml ]; then poetry install; fi # Go if [ -f go.mod ]; then go mod download; fi ``` ### 4. Verify Clean Baseline Run tests to ensure worktree starts clean: ```bash # Examples - use project-appropriate command npm test cargo test pytest go test ./... ``` **If tests fail:** Report failures, ask whether to proceed or investigate. **If tests pass:** Report ready. ### 5. Report Location ``` Worktree ready at <full-path> Tests passing (<N> tests, 0 failures) Ready to implement <feature-name> ``` ## Quick Reference | Situation | Action | |-----------|--------| | `.worktrees/` exists | Use it (verify ignored) | | `worktrees/` exists | Use it (verify ignored) | | Both exist | Use `.worktrees/` | | Neither exists | Check CLAUDE.md → Ask user | | Directory not ignored | Add to .gitignore + commit | | Tests fail during baseline | Report failures + ask | | No package.json/Cargo.toml | Skip dependency install | ## Common Mistakes ### Skipping ignore verification - **Problem:** Worktree contents get tracked, pollute git status - **Fix:** Always use `git check-ignore` before creating project-local worktree ### Assuming directory location - **Problem:** Creates inconsistency, violates project conventions - **Fix:** Follow priority: existing > CLAUDE.md > ask ### Proceeding with failing tests - **Problem:** Can't distinguish new bugs from pre-existing issues - **Fix:** Report failures, get explicit permission to proceed ### Hardcoding setup commands - **Problem:** Breaks on projects using different tools - **Fix:** Auto-detect from project files (package.json, etc.) ## Example Workflow ``` You: I'm using the using-git-worktrees skill to set up an isolated workspace. [Check .worktrees/ - exists] [Verify ignored - git check-ignore confirms .worktrees/ is ignored] [Create worktree: git worktree add .worktrees/auth -b feature/auth] [Run npm install] [Run npm test - 47 passing] Worktree ready at /Users/jesse/myproject/.worktrees/auth Tests passing (47 tests, 0 failures) Ready to implement auth feature ``` ## Red Flags **Never:** - Create worktree without verifying it's ignored (project-local) - Skip baseline test verification - Proceed with failing tests without asking - Assume directory location when ambiguous - Skip CLAUDE.md check **Always:** - Follow directory priority: existing > CLAUDE.md > ask - Verify directory is ignored for project-local - Auto-detect and run project setup - Verify clean test baseline ## Integration **Called by:** - **brainstorming** (Phase 4) - REQUIRED when design is approved and implementation follows - **subagent-driven-development** - REQUIRED before executing any tasks - **executing-plans** - REQUIRED before executing any tasks - Any skill needing isolated workspace **Pairs with:** - **finishing-a-development-branch** - REQUIRED for cleanup after work complete # /sp-using-superpowers **Source:** `~/.claude/skills/sp-using-superpowers/SKILL.md` --- --- name: using-superpowers description: Use when starting any conversation - establishes how to find and use skills, requiring Skill tool invocation before ANY response including clarifying questions --- <EXTREMELY-IMPORTANT> If you think there is even a 1% chance a skill might apply to what you are doing, you ABSOLUTELY MUST invoke the skill. IF A SKILL APPLIES TO YOUR TASK, YOU DO NOT HAVE A CHOICE. YOU MUST USE IT. This is not negotiable. This is not optional. You cannot rationalize your way out of this. </EXTREMELY-IMPORTANT> ## How to Access Skills **In Claude Code:** Use the `Skill` tool. When you invoke a skill, its content is loaded and presented to you—follow it directly. Never use the Read tool on skill files. **In other environments:** Check your platform's documentation for how skills are loaded. # Using Skills ## The Rule **Invoke relevant or requested skills BEFORE any response or action.** Even a 1% chance a skill might apply means that you should invoke the skill to check. If an invoked skill turns out to be wrong for the situation, you don't need to use it. ```dot digraph skill_flow { "User message received" [shape=doublecircle]; "Might any skill apply?" [shape=diamond]; "Invoke Skill tool" [shape=box]; "Announce: 'Using [skill] to [purpose]'" [shape=box]; "Has checklist?" [shape=diamond]; "Create TodoWrite todo per item" [shape=box]; "Follow skill exactly" [shape=box]; "Respond (including clarifications)" [shape=doublecircle]; "User message received" -> "Might any skill apply?"; "Might any skill apply?" -> "Invoke Skill tool" [label="yes, even 1%"]; "Might any skill apply?" -> "Respond (including clarifications)" [label="definitely not"]; "Invoke Skill tool" -> "Announce: 'Using [skill] to [purpose]'"; "Announce: 'Using [skill] to [purpose]'" -> "Has checklist?"; "Has checklist?" -> "Create TodoWrite todo per item" [label="yes"]; "Has checklist?" -> "Follow skill exactly" [label="no"]; "Create TodoWrite todo per item" -> "Follow skill exactly"; } ``` ## Red Flags These thoughts mean STOP—you're rationalizing: | Thought | Reality | |---------|---------| | "This is just a simple question" | Questions are tasks. Check for skills. | | "I need more context first" | Skill check comes BEFORE clarifying questions. | | "Let me explore the codebase first" | Skills tell you HOW to explore. Check first. | | "I can check git/files quickly" | Files lack conversation context. Check for skills. | | "Let me gather information first" | Skills tell you HOW to gather information. | | "This doesn't need a formal skill" | If a skill exists, use it. | | "I remember this skill" | Skills evolve. Read current version. | | "This doesn't count as a task" | Action = task. Check for skills. | | "The skill is overkill" | Simple things become complex. Use it. | | "I'll just do this one thing first" | Check BEFORE doing anything. | | "This feels productive" | Undisciplined action wastes time. Skills prevent this. | | "I know what that means" | Knowing the concept ≠ using the skill. Invoke it. | ## Skill Priority When multiple skills could apply, use this order: 1. **Process skills first** (brainstorming, debugging) - these determine HOW to approach the task 2. **Implementation skills second** (frontend-design, mcp-builder) - these guide execution "Let's build X" → brainstorming first, then implementation skills. "Fix this bug" → debugging first, then domain-specific skills. ## Skill Types **Rigid** (TDD, debugging): Follow exactly. Don't adapt away discipline. **Flexible** (patterns): Adapt principles to context. The skill itself tells you which. ## User Instructions Instructions say WHAT, not HOW. "Add X" or "Fix Y" doesn't mean skip workflows. # /sp-verification-before-completion **Source:** `~/.claude/skills/sp-verification-before-completion/SKILL.md` --- --- name: verification-before-completion description: Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always --- # Verification Before Completion ## Overview Claiming work is complete without verification is dishonesty, not efficiency. **Core principle:** Evidence before claims, always. **Violating the letter of this rule is violating the spirit of this rule.** ## The Iron Law ``` NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE ``` If you haven't run the verification command in this message, you cannot claim it passes. ## The Gate Function ``` BEFORE claiming any status or expressing satisfaction: 1. IDENTIFY: What command proves this claim? 2. RUN: Execute the FULL command (fresh, complete) 3. READ: Full output, check exit code, count failures 4. VERIFY: Does output confirm the claim? - If NO: State actual status with evidence - If YES: State claim WITH evidence 5. ONLY THEN: Make the claim Skip any step = lying, not verifying ``` ## Common Failures | Claim | Requires | Not Sufficient | |-------|----------|----------------| | Tests pass | Test command output: 0 failures | Previous run, "should pass" | | Linter clean | Linter output: 0 errors | Partial check, extrapolation | | Build succeeds | Build command: exit 0 | Linter passing, logs look good | | Bug fixed | Test original symptom: passes | Code changed, assumed fixed | | Regression test works | Red-green cycle verified | Test passes once | | Agent completed | VCS diff shows changes | Agent reports "success" | | Requirements met | Line-by-line checklist | Tests passing | ## Red Flags - STOP - Using "should", "probably", "seems to" - Expressing satisfaction before verification ("Great!", "Perfect!", "Done!", etc.) - About to commit/push/PR without verification - Trusting agent success reports - Relying on partial verification - Thinking "just this once" - Tired and wanting work over - **ANY wording implying success without having run verification** ## Rationalization Prevention | Excuse | Reality | |--------|---------| | "Should work now" | RUN the verification | | "I'm confident" | Confidence ≠ evidence | | "Just this once" | No exceptions | | "Linter passed" | Linter ≠ compiler | | "Agent said success" | Verify independently | | "I'm tired" | Exhaustion ≠ excuse | | "Partial check is enough" | Partial proves nothing | | "Different words so rule doesn't apply" | Spirit over letter | ## Key Patterns **Tests:** ``` ✅ [Run test command] [See: 34/34 pass] "All tests pass" ❌ "Should pass now" / "Looks correct" ``` **Regression tests (TDD Red-Green):** ``` ✅ Write → Run (pass) → Revert fix → Run (MUST FAIL) → Restore → Run (pass) ❌ "I've written a regression test" (without red-green verification) ``` **Build:** ``` ✅ [Run build] [See: exit 0] "Build passes" ❌ "Linter passed" (linter doesn't check compilation) ``` **Requirements:** ``` ✅ Re-read plan → Create checklist → Verify each → Report gaps or completion ❌ "Tests pass, phase complete" ``` **Agent delegation:** ``` ✅ Agent reports success → Check VCS diff → Verify changes → Report actual state ❌ Trust agent report ``` ## Why This Matters From 24 failure memories: - your human partner said "I don't believe you" - trust broken - Undefined functions shipped - would crash - Missing requirements shipped - incomplete features - Time wasted on false completion → redirect → rework - Violates: "Honesty is a core value. If you lie, you'll be replaced." ## When To Apply **ALWAYS before:** - ANY variation of success/completion claims - ANY expression of satisfaction - ANY positive statement about work state - Committing, PR creation, task completion - Moving to next task - Delegating to agents **Rule applies to:** - Exact phrases - Paraphrases and synonyms - Implications of success - ANY communication suggesting completion/correctness ## The Bottom Line **No shortcuts for verification.** Run the command. Read the output. THEN claim the result. This is non-negotiable. # /sp-write-plan **Source:** `~/.claude/skills/sp-write-plan/SKILL.md` --- --- description: Create detailed implementation plan with bite-sized tasks disable-model-invocation: true --- Invoke the superpowers:writing-plans skill and follow it exactly as presented to you # /sp-writing-plans **Source:** `~/.claude/skills/sp-writing-plans/SKILL.md` --- --- name: writing-plans description: Use when you have a spec or requirements for a multi-step task, before touching code --- # Writing Plans ## Overview Write comprehensive implementation plans assuming the engineer has zero context for our codebase and questionable taste. Document everything they need to know: which files to touch for each task, code, testing, docs they might need to check, how to test it. Give them the whole plan as bite-sized tasks. DRY. YAGNI. TDD. Frequent commits. Assume they are a skilled developer, but know almost nothing about our toolset or problem domain. Assume they don't know good test design very well. **Announce at start:** "I'm using the writing-plans skill to create the implementation plan." **Context:** This should be run in a dedicated worktree (created by brainstorming skill). **Save plans to:** `docs/plans/YYYY-MM-DD-<feature-name>.md` ## Bite-Sized Task Granularity **Each step is one action (2-5 minutes):** - "Write the failing test" - step - "Run it to make sure it fails" - step - "Implement the minimal code to make the test pass" - step - "Run the tests and make sure they pass" - step - "Commit" - step ## Plan Document Header **Every plan MUST start with this header:** ```markdown # [Feature Name] Implementation Plan > **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task. **Goal:** [One sentence describing what this builds] **Architecture:** [2-3 sentences about approach] **Tech Stack:** [Key technologies/libraries] --- ``` ## Task Structure ```markdown ### Task N: [Component Name] **Files:** - Create: `exact/path/to/file.py` - Modify: `exact/path/to/existing.py:123-145` - Test: `tests/exact/path/to/test.py` **Step 1: Write the failing test** ```python def test_specific_behavior(): result = function(input) assert result == expected ``` **Step 2: Run test to verify it fails** Run: `pytest tests/path/test.py::test_name -v` Expected: FAIL with "function not defined" **Step 3: Write minimal implementation** ```python def function(input): return expected ``` **Step 4: Run test to verify it passes** Run: `pytest tests/path/test.py::test_name -v` Expected: PASS **Step 5: Commit** ```bash git add tests/path/test.py src/path/file.py git commit -m "feat: add specific feature" ``` ``` ## Remember - Exact file paths always - Complete code in plan (not "add validation") - Exact commands with expected output - Reference relevant skills with @ syntax - DRY, YAGNI, TDD, frequent commits ## Execution Handoff After saving the plan, offer execution choice: **"Plan complete and saved to `docs/plans/<filename>.md`. Two execution options:** **1. Subagent-Driven (this session)** - I dispatch fresh subagent per task, review between tasks, fast iteration **2. Parallel Session (separate)** - Open new session with executing-plans, batch execution with checkpoints **Which approach?"** **If Subagent-Driven chosen:** - **REQUIRED SUB-SKILL:** Use superpowers:subagent-driven-development - Stay in this session - Fresh subagent per task + code review **If Parallel Session chosen:** - Guide them to open new session in worktree - **REQUIRED SUB-SKILL:** New session uses superpowers:executing-plans # /sp-writing-skills **Source:** `~/.claude/skills/sp-writing-skills/SKILL.md` --- --- name: writing-skills description: Use when creating new skills, editing existing skills, or verifying skills work before deployment --- # Writing Skills ## Overview **Writing skills IS Test-Driven Development applied to process documentation.** **Personal skills live in agent-specific directories (`~/.claude/skills` for Claude Code, `~/.agents/skills/` for Codex)** You write test cases (pressure scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agents comply), and refactor (close loopholes). **Core principle:** If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing. **REQUIRED BACKGROUND:** You MUST understand superpowers:test-driven-development before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill adapts TDD to documentation. **Official guidance:** For Anthropic's official skill authoring best practices, see anthropic-best-practices.md. This document provides additional patterns and guidelines that complement the TDD-focused approach in this skill. ## What is a Skill? A **skill** is a reference guide for proven techniques, patterns, or tools. Skills help future Claude instances find and apply effective approaches. **Skills are:** Reusable techniques, patterns, tools, reference guides **Skills are NOT:** Narratives about how you solved a problem once ## TDD Mapping for Skills | TDD Concept | Skill Creation | |-------------|----------------| | **Test case** | Pressure scenario with subagent | | **Production code** | Skill document (SKILL.md) | | **Test fails (RED)** | Agent violates rule without skill (baseline) | | **Test passes (GREEN)** | Agent complies with skill present | | **Refactor** | Close loopholes while maintaining compliance | | **Write test first** | Run baseline scenario BEFORE writing skill | | **Watch it fail** | Document exact rationalizations agent uses | | **Minimal code** | Write skill addressing those specific violations | | **Watch it pass** | Verify agent now complies | | **Refactor cycle** | Find new rationalizations → plug → re-verify | The entire skill creation process follows RED-GREEN-REFACTOR. ## When to Create a Skill **Create when:** - Technique wasn't intuitively obvious to you - You'd reference this again across projects - Pattern applies broadly (not project-specific) - Others would benefit **Don't create for:** - One-off solutions - Standard practices well-documented elsewhere - Project-specific conventions (put in CLAUDE.md) - Mechanical constraints (if it's enforceable with regex/validation, automate it—save documentation for judgment calls) ## Skill Types ### Technique Concrete method with steps to follow (condition-based-waiting, root-cause-tracing) ### Pattern Way of thinking about problems (flatten-with-flags, test-invariants) ### Reference API docs, syntax guides, tool documentation (office docs) ## Directory Structure ``` skills/ skill-name/ SKILL.md # Main reference (required) supporting-file.* # Only if needed ``` **Flat namespace** - all skills in one searchable namespace **Separate files for:** 1. **Heavy reference** (100+ lines) - API docs, comprehensive syntax 2. **Reusable tools** - Scripts, utilities, templates **Keep inline:** - Principles and concepts - Code patterns (< 50 lines) - Everything else ## SKILL.md Structure **Frontmatter (YAML):** - Only two fields supported: `name` and `description` - Max 1024 characters total - `name`: Use letters, numbers, and hyphens only (no parentheses, special chars) - `description`: Third-person, describes ONLY when to use (NOT what it does) - Start with "Use when..." to focus on triggering conditions - Include specific symptoms, situations, and contexts - **NEVER summarize the skill's process or workflow** (see CSO section for why) - Keep under 500 characters if possible ```markdown --- name: Skill-Name-With-Hyphens description: Use when [specific triggering conditions and symptoms] --- # Skill Name ## Overview What is this? Core principle in 1-2 sentences. ## When to Use [Small inline flowchart IF decision non-obvious] Bullet list with SYMPTOMS and use cases When NOT to use ## Core Pattern (for techniques/patterns) Before/after code comparison ## Quick Reference Table or bullets for scanning common operations ## Implementation Inline code for simple patterns Link to file for heavy reference or reusable tools ## Common Mistakes What goes wrong + fixes ## Real-World Impact (optional) Concrete results ``` ## Claude Search Optimization (CSO) **Critical for discovery:** Future Claude needs to FIND your skill ### 1. Rich Description Field **Purpose:** Claude reads description to decide which skills to load for a given task. Make it answer: "Should I read this skill right now?" **Format:** Start with "Use when..." to focus on triggering conditions **CRITICAL: Description = When to Use, NOT What the Skill Does** The description should ONLY describe triggering conditions. Do NOT summarize the skill's process or workflow in the description. **Why this matters:** Testing revealed that when a description summarizes the skill's workflow, Claude may follow the description instead of reading the full skill content. A description saying "code review between tasks" caused Claude to do ONE review, even though the skill's flowchart clearly showed TWO reviews (spec compliance then code quality). When the description was changed to just "Use when executing implementation plans with independent tasks" (no workflow summary), Claude correctly read the flowchart and followed the two-stage review process. **The trap:** Descriptions that summarize workflow create a shortcut Claude will take. The skill body becomes documentation Claude skips. ```yaml # ❌ BAD: Summarizes workflow - Claude may follow this instead of reading skill description: Use when executing plans - dispatches subagent per task with code review between tasks # ❌ BAD: Too much process detail description: Use for TDD - write test first, watch it fail, write minimal code, refactor # ✅ GOOD: Just triggering conditions, no workflow summary description: Use when executing implementation plans with independent tasks in the current session # ✅ GOOD: Triggering conditions only description: Use when implementing any feature or bugfix, before writing implementation code ``` **Content:** - Use concrete triggers, symptoms, and situations that signal this skill applies - Describe the *problem* (race conditions, inconsistent behavior) not *language-specific symptoms* (setTimeout, sleep) - Keep triggers technology-agnostic unless the skill itself is technology-specific - If skill is technology-specific, make that explicit in the trigger - Write in third person (injected into system prompt) - **NEVER summarize the skill's process or workflow** ```yaml # ❌ BAD: Too abstract, vague, doesn't include when to use description: For async testing # ❌ BAD: First person description: I can help you with async tests when they're flaky # ❌ BAD: Mentions technology but skill isn't specific to it description: Use when tests use setTimeout/sleep and are flaky # ✅ GOOD: Starts with "Use when", describes problem, no workflow description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently # ✅ GOOD: Technology-specific skill with explicit trigger description: Use when using React Router and handling authentication redirects ``` ### 2. Keyword Coverage Use words Claude would search for: - Error messages: "Hook timed out", "ENOTEMPTY", "race condition" - Symptoms: "flaky", "hanging", "zombie", "pollution" - Synonyms: "timeout/hang/freeze", "cleanup/teardown/afterEach" - Tools: Actual commands, library names, file types ### 3. Descriptive Naming **Use active voice, verb-first:** - ✅ `creating-skills` not `skill-creation` - ✅ `condition-based-waiting` not `async-test-helpers` ### 4. Token Efficiency (Critical) **Problem:** getting-started and frequently-referenced skills load into EVERY conversation. Every token counts. **Target word counts:** - getting-started workflows: <150 words each - Frequently-loaded skills: <200 words total - Other skills: <500 words (still be concise) **Techniques:** **Move details to tool help:** ```bash # ❌ BAD: Document all flags in SKILL.md search-conversations supports --text, --both, --after DATE, --before DATE, --limit N # ✅ GOOD: Reference --help search-conversations supports multiple modes and filters. Run --help for details. ``` **Use cross-references:** ```markdown # ❌ BAD: Repeat workflow details When searching, dispatch subagent with template... [20 lines of repeated instructions] # ✅ GOOD: Reference other skill Always use subagents (50-100x context savings). REQUIRED: Use [other-skill-name] for workflow. ``` **Compress examples:** ```markdown # ❌ BAD: Verbose example (42 words) your human partner: "How did we handle authentication errors in React Router before?" You: I'll search past conversations for React Router authentication patterns. [Dispatch subagent with search query: "React Router authentication error handling 401"] # ✅ GOOD: Minimal example (20 words) Partner: "How did we handle auth errors in React Router?" You: Searching... [Dispatch subagent → synthesis] ``` **Eliminate redundancy:** - Don't repeat what's in cross-referenced skills - Don't explain what's obvious from command - Don't include multiple examples of same pattern **Verification:** ```bash wc -w skills/path/SKILL.md # getting-started workflows: aim for <150 each # Other frequently-loaded: aim for <200 total ``` **Name by what you DO or core insight:** - ✅ `condition-based-waiting` > `async-test-helpers` - ✅ `using-skills` not `skill-usage` - ✅ `flatten-with-flags` > `data-structure-refactoring` - ✅ `root-cause-tracing` > `debugging-techniques` **Gerunds (-ing) work well for processes:** - `creating-skills`, `testing-skills`, `debugging-with-logs` - Active, describes the action you're taking ### 4. Cross-Referencing Other Skills **When writing documentation that references other skills:** Use skill name only, with explicit requirement markers: - ✅ Good: `**REQUIRED SUB-SKILL:** Use superpowers:test-driven-development` - ✅ Good: `**REQUIRED BACKGROUND:** You MUST understand superpowers:systematic-debugging` - ❌ Bad: `See skills/testing/test-driven-development` (unclear if required) - ❌ Bad: `@skills/testing/test-driven-development/SKILL.md` (force-loads, burns context) **Why no @ links:** `@` syntax force-loads files immediately, consuming 200k+ context before you need them. ## Flowchart Usage ```dot digraph when_flowchart { "Need to show information?" [shape=diamond]; "Decision where I might go wrong?" [shape=diamond]; "Use markdown" [shape=box]; "Small inline flowchart" [shape=box]; "Need to show information?" -> "Decision where I might go wrong?" [label="yes"]; "Decision where I might go wrong?" -> "Small inline flowchart" [label="yes"]; "Decision where I might go wrong?" -> "Use markdown" [label="no"]; } ``` **Use flowcharts ONLY for:** - Non-obvious decision points - Process loops where you might stop too early - "When to use A vs B" decisions **Never use flowcharts for:** - Reference material → Tables, lists - Code examples → Markdown blocks - Linear instructions → Numbered lists - Labels without semantic meaning (step1, helper2) See @graphviz-conventions.dot for graphviz style rules. **Visualizing for your human partner:** Use `render-graphs.js` in this directory to render a skill's flowcharts to SVG: ```bash ./render-graphs.js ../some-skill # Each diagram separately ./render-graphs.js ../some-skill --combine # All diagrams in one SVG ``` ## Code Examples **One excellent example beats many mediocre ones** Choose most relevant language: - Testing techniques → TypeScript/JavaScript - System debugging → Shell/Python - Data processing → Python **Good example:** - Complete and runnable - Well-commented explaining WHY - From real scenario - Shows pattern clearly - Ready to adapt (not generic template) **Don't:** - Implement in 5+ languages - Create fill-in-the-blank templates - Write contrived examples You're good at porting - one great example is enough. ## File Organization ### Self-Contained Skill ``` defense-in-depth/ SKILL.md # Everything inline ``` When: All content fits, no heavy reference needed ### Skill with Reusable Tool ``` condition-based-waiting/ SKILL.md # Overview + patterns example.ts # Working helpers to adapt ``` When: Tool is reusable code, not just narrative ### Skill with Heavy Reference ``` pptx/ SKILL.md # Overview + workflows pptxgenjs.md # 600 lines API reference ooxml.md # 500 lines XML structure scripts/ # Executable tools ``` When: Reference material too large for inline ## The Iron Law (Same as TDD) ``` NO SKILL WITHOUT A FAILING TEST FIRST ``` This applies to NEW skills AND EDITS to existing skills. Write skill before testing? Delete it. Start over. Edit skill without testing? Same violation. **No exceptions:** - Not for "simple additions" - Not for "just adding a section" - Not for "documentation updates" - Don't keep untested changes as "reference" - Don't "adapt" while running tests - Delete means delete **REQUIRED BACKGROUND:** The superpowers:test-driven-development skill explains why this matters. Same principles apply to documentation. ## Testing All Skill Types Different skill types need different test approaches: ### Discipline-Enforcing Skills (rules/requirements) **Examples:** TDD, verification-before-completion, designing-before-coding **Test with:** - Academic questions: Do they understand the rules? - Pressure scenarios: Do they comply under stress? - Multiple pressures combined: time + sunk cost + exhaustion - Identify rationalizations and add explicit counters **Success criteria:** Agent follows rule under maximum pressure ### Technique Skills (how-to guides) **Examples:** condition-based-waiting, root-cause-tracing, defensive-programming **Test with:** - Application scenarios: Can they apply the technique correctly? - Variation scenarios: Do they handle edge cases? - Missing information tests: Do instructions have gaps? **Success criteria:** Agent successfully applies technique to new scenario ### Pattern Skills (mental models) **Examples:** reducing-complexity, information-hiding concepts **Test with:** - Recognition scenarios: Do they recognize when pattern applies? - Application scenarios: Can they use the mental model? - Counter-examples: Do they know when NOT to apply? **Success criteria:** Agent correctly identifies when/how to apply pattern ### Reference Skills (documentation/APIs) **Examples:** API documentation, command references, library guides **Test with:** - Retrieval scenarios: Can they find the right information? - Application scenarios: Can they use what they found correctly? - Gap testing: Are common use cases covered? **Success criteria:** Agent finds and correctly applies reference information ## Common Rationalizations for Skipping Testing | Excuse | Reality | |--------|---------| | "Skill is obviously clear" | Clear to you ≠ clear to other agents. Test it. | | "It's just a reference" | References can have gaps, unclear sections. Test retrieval. | | "Testing is overkill" | Untested skills have issues. Always. 15 min testing saves hours. | | "I'll test if problems emerge" | Problems = agents can't use skill. Test BEFORE deploying. | | "Too tedious to test" | Testing is less tedious than debugging bad skill in production. | | "I'm confident it's good" | Overconfidence guarantees issues. Test anyway. | | "Academic review is enough" | Reading ≠ using. Test application scenarios. | | "No time to test" | Deploying untested skill wastes more time fixing it later. | **All of these mean: Test before deploying. No exceptions.** ## Bulletproofing Skills Against Rationalization Skills that enforce discipline (like TDD) need to resist rationalization. Agents are smart and will find loopholes when under pressure. **Psychology note:** Understanding WHY persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundation (Cialdini, 2021; Meincke et al., 2025) on authority, commitment, scarcity, social proof, and unity principles. ### Close Every Loophole Explicitly Don't just state the rule - forbid specific workarounds: <Bad> ```markdown Write code before test? Delete it. ``` </Bad> <Good> ```markdown Write code before test? Delete it. Start over. **No exceptions:** - Don't keep it as "reference" - Don't "adapt" it while writing tests - Don't look at it - Delete means delete ``` </Good> ### Address "Spirit vs Letter" Arguments Add foundational principle early: ```markdown **Violating the letter of the rules is violating the spirit of the rules.** ``` This cuts off entire class of "I'm following the spirit" rationalizations. ### Build Rationalization Table Capture rationalizations from baseline testing (see Testing section below). Every excuse agents make goes in the table: ```markdown | Excuse | Reality | |--------|---------| | "Too simple to test" | Simple code breaks. Test takes 30 seconds. | | "I'll test after" | Tests passing immediately prove nothing. | | "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" | ``` ### Create Red Flags List Make it easy for agents to self-check when rationalizing: ```markdown ## Red Flags - STOP and Start Over - Code before test - "I already manually tested it" - "Tests after achieve the same purpose" - "It's about spirit not ritual" - "This is different because..." **All of these mean: Delete code. Start over with TDD.** ``` ### Update CSO for Violation Symptoms Add to description: symptoms of when you're ABOUT to violate the rule: ```yaml description: use when implementing any feature or bugfix, before writing implementation code ``` ## RED-GREEN-REFACTOR for Skills Follow the TDD cycle: ### RED: Write Failing Test (Baseline) Run pressure scenario with subagent WITHOUT the skill. Document exact behavior: - What choices did they make? - What rationalizations did they use (verbatim)? - Which pressures triggered violations? This is "watch the test fail" - you must see what agents naturally do before writing the skill. ### GREEN: Write Minimal Skill Write skill that addresses those specific rationalizations. Don't add extra content for hypothetical cases. Run same scenarios WITH skill. Agent should now comply. ### REFACTOR: Close Loopholes Agent found new rationalization? Add explicit counter. Re-test until bulletproof. **Testing methodology:** See @testing-skills-with-subagents.md for the complete testing methodology: - How to write pressure scenarios - Pressure types (time, sunk cost, authority, exhaustion) - Plugging holes systematically - Meta-testing techniques ## Anti-Patterns ### ❌ Narrative Example "In session 2025-10-03, we found empty projectDir caused..." **Why bad:** Too specific, not reusable ### ❌ Multi-Language Dilution example-js.js, example-py.py, example-go.go **Why bad:** Mediocre quality, maintenance burden ### ❌ Code in Flowcharts ```dot step1 [label="import fs"]; step2 [label="read file"]; ``` **Why bad:** Can't copy-paste, hard to read ### ❌ Generic Labels helper1, helper2, step3, pattern4 **Why bad:** Labels should have semantic meaning ## STOP: Before Moving to Next Skill **After writing ANY skill, you MUST STOP and complete the deployment process.** **Do NOT:** - Create multiple skills in batch without testing each - Move to next skill before current one is verified - Skip testing because "batching is more efficient" **The deployment checklist below is MANDATORY for EACH skill.** Deploying untested skills = deploying untested code. It's a violation of quality standards. ## Skill Creation Checklist (TDD Adapted) **IMPORTANT: Use TodoWrite to create todos for EACH checklist item below.** **RED Phase - Write Failing Test:** - [ ] Create pressure scenarios (3+ combined pressures for discipline skills) - [ ] Run scenarios WITHOUT skill - document baseline behavior verbatim - [ ] Identify patterns in rationalizations/failures **GREEN Phase - Write Minimal Skill:** - [ ] Name uses only letters, numbers, hyphens (no parentheses/special chars) - [ ] YAML frontmatter with only name and description (max 1024 chars) - [ ] Description starts with "Use when..." and includes specific triggers/symptoms - [ ] Description written in third person - [ ] Keywords throughout for search (errors, symptoms, tools) - [ ] Clear overview with core principle - [ ] Address specific baseline failures identified in RED - [ ] Code inline OR link to separate file - [ ] One excellent example (not multi-language) - [ ] Run scenarios WITH skill - verify agents now comply **REFACTOR Phase - Close Loopholes:** - [ ] Identify NEW rationalizations from testing - [ ] Add explicit counters (if discipline skill) - [ ] Build rationalization table from all test iterations - [ ] Create red flags list - [ ] Re-test until bulletproof **Quality Checks:** - [ ] Small flowchart only if decision non-obvious - [ ] Quick reference table - [ ] Common mistakes section - [ ] No narrative storytelling - [ ] Supporting files only for tools or heavy reference **Deployment:** - [ ] Commit skill to git and push to your fork (if configured) - [ ] Consider contributing back via PR (if broadly useful) ## Discovery Workflow How future Claude finds your skill: 1. **Encounters problem** ("tests are flaky") 3. **Finds SKILL** (description matches) 4. **Scans overview** (is this relevant?) 5. **Reads patterns** (quick reference table) 6. **Loads example** (only when implementing) **Optimize for this flow** - put searchable terms early and often. ## The Bottom Line **Creating skills IS TDD for process documentation.** Same Iron Law: No skill without failing test first. Same cycle: RED (baseline) → GREEN (write skill) → REFACTOR (close loopholes). Same benefits: Better quality, fewer surprises, bulletproof results. If you follow TDD for code, follow it for skills. It's the same discipline applied to documentation.