Trail of Bits Skills

/ask-questions-if-underspecified
/audit-context-building
/algorand-vulnerability-scanner
/audit-prep-assistant
/cairo-vulnerability-scanner
/code-maturity-assessor
/cosmos-vulnerability-scanner
/guidelines-advisor
/secure-workflow-guide
/solana-vulnerability-scanner
/substrate-vulnerability-scanner
/token-integration-analyzer
/ton-vulnerability-scanner
/claude-in-chrome-troubleshooting
/constant-time-analysis
/interpreting-culture-index
/devcontainer-setup
/differential-review
/dwarf-expert
/entry-point-analyzer
/firebase-apk-scanner
/fix-review
/insecure-defaults
/modern-python
/property-based-testing
/second-opinion
/semgrep-rule-creator
/semgrep-rule-variant-creator
/sharp-edges
/spec-to-code-compliance
/codeql
/sarif-parsing
/semgrep
/address-sanitizer
/aflpp
/atheris
/cargo-fuzz
/constant-time-testing
/coverage-analysis
/fuzzing-dictionary
/fuzzing-obstacles
/harness-writing
/libafl
/libfuzzer
/ossfuzz
/ruzzy
/testing-handbook-generator
/wycheproof
/variant-analysis
/yara-rule-authoring

/ask-questions-if-underspecified

Source: `~/.claude/skills/tob-ask-questions-if-underspecified/skills/ask-questions-if-underspecified/SKILL.md`

name: ask-questions-if-underspecified
description: Clarify requirements before implementing. Use when serious doubts arise.

Ask Questions If Underspecified

When to Use

Use this skill when a request has multiple plausible interpretations or key details (objective, scope, constraints, environment, or safety) are unclear.

When NOT to Use

Do not use this skill when the request is already clear, or when a quick, low-risk discovery read can answer the missing details.

Goal

Ask the minimum set of clarifying questions needed to avoid wrong work; do not start implementing until the must-have questions are answered (or the user explicitly approves proceeding with stated assumptions).

Workflow

1) Decide whether the request is underspecified

Treat a request as underspecified if, after exploring how to perform the work, any of the following remain unclear:

Objective (what should change vs stay the same)
Definition of "done" (acceptance criteria, examples, edge cases)
Scope (which files/components/users are in/out)
Constraints (compatibility, performance, style, deps, time)
Environment (language/runtime versions, OS, build/test runner)
Safety/reversibility (data migration, rollout/rollback, risk)

If multiple plausible interpretations exist, assume it is underspecified.

2) Ask must-have questions first (keep it small)

Ask 1-5 questions in the first pass. Prefer questions that eliminate whole branches of work.

Make questions easy to answer:

Optimize for scannability (short, numbered questions; avoid paragraphs)
Offer multiple-choice options when possible
Suggest reasonable defaults when appropriate (mark them clearly as the default/recommended choice; bold the recommended choice in the list, or if you present options in a code block, put a bold "Recommended" line immediately above the block and also tag defaults inside the block)
Include a fast-path response (e.g., reply "defaults" to accept all recommended/default choices)
Include a low-friction "not sure" option when helpful (e.g., "Not sure - use default")
Separate "Need to know" from "Nice to know" if that reduces friction
Structure options so the user can respond with compact decisions (e.g., 1b 2a 3c); restate the chosen options in plain language to confirm

3) Pause before acting

Until must-have answers arrive:

Do not run commands, edit files, or produce a detailed plan that depends on unknowns
Do perform a clearly labeled, low-risk discovery step only if it does not commit you to a direction (e.g., inspect repo structure, read relevant config files)

If the user explicitly asks you to proceed without answers:

State your assumptions as a short numbered list
Ask for confirmation; proceed only after they confirm or correct them

4) Confirm interpretation, then proceed

Once you have answers, restate the requirements in 1-3 sentences (including key constraints and what success looks like), then start work.

Question templates

"Before I start, I need: (1) ..., (2) ..., (3) .... If you don't care about (2), I will assume ...."
"Which of these should it be? A) ... B) ... C) ... (pick one)"
"What would you consider 'done'? For example: ..."
"Any constraints I must follow (versions, performance, style, deps)? If none, I will target the existing project defaults."
Use numbered questions with lettered options and a clear reply format

1) Scope?
a) Minimal change (default)
b) Refactor while touching the area
c) Not sure - use default
2) Compatibility target?
a) Current project defaults (default)
b) Also support older versions: <specify>
c) Not sure - use default

Reply with: defaults (or 1a 2a)
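
A compact reply such as "1b 2a" or "defaults" is easy to turn back into explicit decisions before restating them in plain language. The following is a minimal sketch, assuming the two example questions above with option "a" as the default for both; `DEFAULTS` and `parse_reply` are hypothetical names, not part of the skill.

```python
import re

# Hypothetical defaults for the two example questions above (1a and 2a).
DEFAULTS = {1: "a", 2: "a"}


def parse_reply(reply, defaults=DEFAULTS):
    """Turn a compact reply such as '1b 2a' or 'defaults' into {question: option}."""
    choices = dict(defaults)  # unanswered questions fall back to the default
    text = reply.strip().lower()
    if text == "defaults":
        return choices
    for token in text.split():
        match = re.fullmatch(r"(\d+)([a-z])", token)
        if match is None:
            raise ValueError(f"unrecognized token: {token!r}")
        question = int(match.group(1))
        if question not in defaults:
            raise ValueError(f"unknown question number: {question}")
        choices[question] = match.group(2)
    return choices
```

A reply that answers only some questions keeps the defaults for the rest, which matches the fast-path idea described in step 2.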

Anti-patterns

Don't ask questions you can answer with a quick, low-risk discovery read (e.g., configs, existing patterns, docs).
Don't ask open-ended questions if a tight multiple-choice or yes/no would eliminate ambiguity faster.

/audit-context-building

Source: `~/.claude/skills/tob-audit-context-building/skills/audit-context-building/SKILL.md`

name: audit-context-building
description: Enables ultra-granular, line-by-line code analysis to build deep architectural context before vulnerability or bug finding.

Deep Context Builder Skill (Ultra-Granular Pure Context Mode)

1. Purpose

This skill governs how Claude thinks during the context-building phase of an audit.

Perform line-by-line / block-by-block code analysis by default.
Apply First Principles, 5 Whys, and 5 Hows at micro scale.
Continuously link insights → functions → modules → entire system.
Maintain a stable, explicit mental model that evolves with new evidence.
Identify invariants, assumptions, flows, and reasoning hazards.

This skill defines a structured analysis format (see Example: Function Micro-Analysis below) and runs before the vulnerability-hunting phase.

2. When to Use This Skill

Use when:

Deep comprehension is needed before bug or vulnerability discovery.
You want bottom-up understanding instead of high-level guessing.
Reducing hallucinations, contradictions, and context loss is critical.
Preparing for security auditing, architecture review, or threat modeling.

Do not use for:

Vulnerability findings
Fix recommendations
Exploit reasoning
Severity/impact rating

3. How This Skill Behaves

Default to ultra-granular analysis of each block and line.
Apply micro-level First Principles, 5 Whys, and 5 Hows.
Build and refine a persistent global mental model.
Update earlier assumptions when contradicted ("Earlier I thought X; now Y.").
Periodically anchor summaries to maintain stable context.
Avoid speculation; express uncertainty explicitly when needed.

Goal: deep, accurate understanding, not conclusions.

Rationalizations (Do Not Skip)

| Rationalization | Why It's Wrong | Required Action |
|---|---|---|
| "I get the gist" | Gist-level understanding misses edge cases | Line-by-line analysis required |
| "This function is simple" | Simple functions compose into complex bugs | Apply 5 Whys anyway |
| "I'll remember this invariant" | You won't. Context degrades. | Write it down explicitly |
| "External call is probably fine" | External = adversarial until proven otherwise | Jump into code or model as hostile |
| "I can skip this helper" | Helpers contain assumptions that propagate | Trace the full call chain |
| "This is taking too long" | Rushed context = hallucinated vulnerabilities later | Slow is fast |

4. Phase 1 — Initial Orientation (Bottom-Up Scan)

Before deep analysis, Claude performs a minimal mapping:

Identify major modules/files/contracts.
Note obvious public/external entrypoints.
Identify likely actors (users, owners, relayers, oracles, other contracts).
Identify important storage variables, dicts, state structs, or cells.
Build a preliminary structure without assuming behavior.

This establishes anchors for detailed analysis.

5. Phase 2 — Ultra-Granular Function Analysis (Default Mode)

Every non-trivial function receives full micro analysis.

5.1 Per-Function Microstructure Checklist

For each function:

Purpose
- Why the function exists and its role in the system.
Inputs & Assumptions
- Parameters and implicit inputs (state, sender, env).
- Preconditions and constraints.
Outputs & Effects
- Return values.
- State/storage writes.
- Events/messages.
- External interactions.
Block-by-Block / Line-by-Line Analysis. For each logical block:
- What it does.
- Why it appears here (ordering logic).
- What assumptions it relies on.
- What invariants it establishes or maintains.
- What later logic depends on it.
Apply per-block:
- First Principles
- 5 Whys
- 5 Hows

5.2 Cross-Function & External Flow Analysis

(Full Integration of Jump-Into-External-Code Rule)

When encountering calls, continue the same micro-first analysis across boundaries.

Internal Calls

Jump into the callee immediately.
Perform block-by-block analysis of relevant code.
Track flow of data, assumptions, and invariants: caller → callee → return → caller.
Note if callee logic behaves differently in this specific call context.

External Calls — Two Cases

Case A: External Call to a Contract Whose Code Exists in the Codebase. Treat as an internal call:

Jump into the target contract/function.
Continue block-by-block micro-analysis.
Propagate invariants and assumptions seamlessly.
Consider edge cases based on the actual code, not a black-box guess.

Case B: External Call Without Available Code (True External / Black Box). Analyze as adversarial:

Describe payload/value/gas or parameters sent.
Identify assumptions about the target.
Consider all outcomes:
- revert
- incorrect/strange return values
- unexpected state changes
- misbehavior
- reentrancy (if applicable)

Continuity Rule

Treat the entire call chain as one continuous execution flow. Never reset context. All invariants, assumptions, and data dependencies must propagate across calls.

5.3 Complete Analysis Example

See FUNCTION_MICRO_ANALYSIS_EXAMPLE.md for a complete walkthrough demonstrating:

Full micro-analysis of a DEX swap function
Application of First Principles, 5 Whys, and 5 Hows
Block-by-block analysis with invariants and assumptions
Cross-function dependency mapping
Risk analysis for external interactions

This example demonstrates the level of depth and structure required for all analyzed functions.

5.4 Output Requirements

When performing ultra-granular analysis, Claude MUST structure output following the format defined in OUTPUT_REQUIREMENTS.md.

Key requirements:

Purpose (2-3 sentences minimum)
Inputs & Assumptions (all parameters, preconditions, trust assumptions)
Outputs & Effects (returns, state writes, external calls, events, postconditions)
Block-by-Block Analysis (What, Why here, Assumptions, First Principles/5 Whys/5 Hows)
Cross-Function Dependencies (internal calls, external calls with risk analysis, shared state)

Quality thresholds:

Minimum 3 invariants per function
Minimum 5 assumptions documented
Minimum 3 risk considerations for external interactions
At least 1 First Principles application
At least 3 combined 5 Whys/5 Hows applications
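
The thresholds above are simple counts, so they can be checked mechanically once an analysis has been written up. This is a rough sketch only: the `FunctionAnalysis` shape and the field names are assumptions made for illustration, and the external-risk minimum is applied unconditionally even though it only matters for functions with external interactions.

```python
from dataclasses import dataclass

# Minimums copied from the quality thresholds listed above.
THRESHOLDS = {
    "invariants": 3,
    "assumptions": 5,
    "external_risks": 3,
    "first_principles": 1,
    "whys_hows": 3,
}


@dataclass
class FunctionAnalysis:
    """Counts of items recorded for one analyzed function (hypothetical shape)."""
    name: str
    invariants: int = 0
    assumptions: int = 0
    external_risks: int = 0
    first_principles: int = 0
    whys_hows: int = 0


def unmet_thresholds(analysis):
    """Return one message per threshold that the analysis falls short of."""
    gaps = []
    for field_name, minimum in THRESHOLDS.items():
        actual = getattr(analysis, field_name)
        if actual < minimum:
            gaps.append(f"{analysis.name}: {field_name} {actual} < {minimum}")
    return gaps
```

An empty result means the counts are met; it says nothing about the quality of the individual items, which still needs review.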

5.5 Completeness Checklist

Before concluding micro-analysis of a function, verify against the COMPLETENESS_CHECKLIST.md:

Structural Completeness: All required sections present (Purpose, Inputs, Outputs, Block-by-Block, Dependencies)
Content Depth: Minimum thresholds met (invariants, assumptions, risk analysis, First Principles)
Continuity & Integration: Cross-references, propagated assumptions, invariant couplings
Anti-Hallucination: Line number citations, no vague statements, evidence-based claims

Analysis is complete when all checklist items are satisfied and no unresolved "unclear" items remain.

6. Phase 3 — Global System Understanding

After sufficient micro-analysis:

State & Invariant Reconstruction
- Map reads/writes of each state variable.
- Derive multi-function and multi-module invariants.
Workflow Reconstruction
- Identify end-to-end flows (deposit, withdraw, lifecycle, upgrades).
- Track how state transforms across these flows.
- Record assumptions that persist across steps.
Trust Boundary Mapping
- Actor → entrypoint → behavior.
- Identify untrusted input paths.
- Privilege changes and implicit role expectations.
Complexity & Fragility Clustering
- Functions with many assumptions.
- High branching logic.
- Multi-step dependencies.
- Coupled state changes across modules.

These clusters help guide the vulnerability-hunting phase.
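
One way to keep the state and invariant reconstruction explicit is to record every read and write as a small tuple and derive the map from those records. The sketch below assumes accesses are noted by hand as `(function, variable, mode)` during micro-analysis; the function and variable names in the usage note are hypothetical.

```python
from collections import defaultdict


def build_state_map(accesses):
    """Map each state variable to the functions that read and write it.

    accesses: iterable of (function, variable, mode), where mode is "r" or "w".
    """
    state_map = defaultdict(lambda: {"readers": set(), "writers": set()})
    for function, variable, mode in accesses:
        if mode not in ("r", "w"):
            raise ValueError(f"mode must be 'r' or 'w', got {mode!r}")
        key = "readers" if mode == "r" else "writers"
        state_map[variable][key].add(function)
    return dict(state_map)


def multi_writer_variables(state_map):
    """Variables written by more than one function: candidates for multi-function invariants."""
    return sorted(
        variable
        for variable, access in state_map.items()
        if len(access["writers"]) > 1
    )
```

For example, recording `("deposit", "balances", "w")` and `("withdraw", "balances", "w")` would surface `balances` as a variable whose invariants span both functions.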

7. Stability & Consistency Rules

(Anti-Hallucination, Anti-Contradiction)

Claude must:

Never reshape evidence to fit earlier assumptions. When contradicted:
- Update the model.
- State the correction explicitly.
Periodically anchor key facts. Summarize core:
- invariants
- state relationships
- actor roles
- workflows
Avoid vague guesses. Use:
- "Unclear; need to inspect X."
instead of:
- "It probably…"
Cross-reference constantly. Connect new insights to previous state, flows, and invariants to maintain global coherence.

8. Subagent Usage

Claude may spawn subagents for:

Dense or complex functions.
Long data-flow or control-flow chains.
Cryptographic / mathematical logic.
Complex state machines.
Multi-module workflow reconstruction.

Subagents must:

Follow the same micro-first rules.
Return summaries that Claude integrates into its global model.

9. Relationship to Other Phases

This skill runs before:

Vulnerability discovery
Classification / triage
Report writing
Impact modeling
Exploit reasoning

It exists solely to build:

Deep understanding
Stable context
System-level clarity

10. Non-Goals

While active, Claude should NOT:

Identify vulnerabilities
Propose fixes
Generate proofs-of-concept
Model exploits
Assign severity or impact

This is pure context building only.

/algorand-vulnerability-scanner

Source: `~/.claude/skills/tob-building-secure-contracts/skills/algorand-vulnerability-scanner/SKILL.md`

name: algorand-vulnerability-scanner
description: Scans Algorand smart contracts for 11 common vulnerabilities including rekeying attacks, unchecked transaction fees, missing field validations, and access control issues. Use when auditing Algorand projects (TEAL/PyTeal).

Algorand Vulnerability Scanner

1. Purpose

Systematically scan Algorand smart contracts (TEAL and PyTeal) for platform-specific security vulnerabilities documented in Trail of Bits' "Not So Smart Contracts" database. This skill encodes 11 critical vulnerability patterns unique to Algorand's transaction model.

2. When to Use This Skill

Auditing Algorand smart contracts (stateful applications or smart signatures)
Reviewing TEAL assembly or PyTeal code
Pre-audit security assessment of Algorand projects
Validating fixes for reported Algorand vulnerabilities
Training team on Algorand-specific security patterns

3. Platform Detection

File Extensions & Indicators

TEAL files: .teal
PyTeal files: .py with PyTeal imports

Language/Framework Markers

# PyTeal indicators
from pyteal import *
from algosdk import *

# Common patterns
Txn, Gtxn, Global, InnerTxnBuilder
OnComplete, ApplicationCall, TxnType
@router.method, @Subroutine

Project Structure

approval_program.py / clear_program.py
contract.teal / signature.teal
References to Algorand SDK or Beaker framework
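
The detection rules above (a `.teal` extension, or a `.py` file that imports PyTeal) can be approximated with a small classifier. This is a heuristic sketch, not how the skill itself is implemented; `classify_source` is a hypothetical helper and it only looks at import lines, so dynamically imported or re-exported PyTeal code would be missed.

```python
import re
from pathlib import Path
from typing import Optional

# Matches "import pyteal", "import pyteal as pt", "from pyteal import ...",
# and "from pyteal.ast import ..." at the start of a line.
PYTEAL_IMPORT = re.compile(r"^\s*(?:from\s+pyteal\b|import\s+pyteal\b)", re.MULTILINE)


def classify_source(path: str, text: str) -> Optional[str]:
    """Return "teal", "pyteal", or None for files that look unrelated."""
    suffix = Path(path).suffix.lower()
    if suffix == ".teal":
        return "teal"
    if suffix == ".py" and PYTEAL_IMPORT.search(text):
        return "pyteal"
    return None
```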

Tool Support

Tealer: Trail of Bits static analyzer for Algorand
Installation: pip3 install tealer
Usage: tealer contract.teal --detect all

4. How This Skill Works

When invoked, I will:

Search your codebase for TEAL/PyTeal files
Analyze each file for the 11 vulnerability patterns
Report findings with file references and severity
Provide fixes for each identified issue
Run Tealer (if installed) for automated detection

5. Example Output

When vulnerabilities are found, you'll get a report like this:

=== ALGORAND VULNERABILITY SCAN RESULTS ===

Project: my-algorand-dapp
Files Scanned: 3 (.teal, .py)
Vulnerabilities Found: 2

---

[CRITICAL] Rekeying Attack
File: contracts/approval.py:45
Pattern: Missing RekeyTo validation

Code:
    If(Txn.type_enum() == TxnType.Payment,
        Seq([
            # Missing: Assert(Txn.rekey_to() == Global.zero_address())
            App.globalPut(Bytes("balance"), balance + Txn.amount()),
            Approve()
        ])
    )

Issue: The contract doesn't validate the RekeyTo field, allowing attackers
to change account authorization and bypass restrictions.


---

## 5. Vulnerability Patterns (11 Patterns)

I check for 11 critical vulnerability patterns unique to Algorand. For detailed detection patterns, code examples, mitigations, and testing strategies, see [VULNERABILITY_PATTERNS.md](resources/VULNERABILITY_PATTERNS.md).

### Pattern Summary:

1. **Rekeying Vulnerability** ⚠️ CRITICAL - Unchecked RekeyTo field
2. **Missing Transaction Verification** ⚠️ CRITICAL - No GroupSize/GroupIndex checks
3. **Group Transaction Manipulation** ⚠️ HIGH - Unsafe group transaction handling
4. **Asset Clawback Risk** ⚠️ HIGH - Missing clawback address checks
5. **Application State Manipulation** ⚠️ MEDIUM - Unsafe global/local state updates
6. **Asset Opt-In Missing** ⚠️ HIGH - No asset opt-in validation
7. **Minimum Balance Violation** ⚠️ MEDIUM - Account below minimum balance
8. **Close Remainder To Check** ⚠️ HIGH - Unchecked CloseRemainderTo field
9. **Application Clear State** ⚠️ MEDIUM - Unsafe clear state program
10. **Atomic Transaction Ordering** ⚠️ HIGH - Assuming transaction order
11. **Logic Signature Reuse** ⚠️ HIGH - Logic sigs without uniqueness constraints

## 5. Scanning Workflow

### Step 1: Platform Identification
1. Confirm file extensions (`.teal`, `.py`)
2. Identify framework (PyTeal, Beaker, pure TEAL)
3. Determine contract type (stateful application vs smart signature)
4. Locate approval and clear state programs

### Step 2: Static Analysis with Tealer
```bash
# Run Tealer on contract
tealer contract.teal --detect all

# Or specific detectors
tealer contract.teal --detect unprotected-rekey,group-size-check,update-application-check
```

Step 3: Manual Vulnerability Sweep

For each of the 11 vulnerabilities above:

Search for relevant transaction field usage
Verify validation logic exists
Check for bypass conditions
Validate inner transaction handling

Step 4: Transaction Field Validation Matrix

Create checklist for all transaction types used:

Payment Transactions:

RekeyTo validated
CloseRemainderTo validated
Fee validated (if smart signature)

Asset Transfers:

Asset ID validated
AssetCloseTo validated
RekeyTo validated

Application Calls:

OnComplete validated
Access controls enforced
Group size validated

Inner Transactions:

Fee explicitly set to 0
RekeyTo not user-controlled (TEAL v6+)
All fields validated
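
The matrix above can be kept as data so that outstanding checks are listed rather than remembered. A minimal sketch, assuming the reviewer records each confirmed item by hand; `MATRIX` copies the checklist text above, while `outstanding_items` is a hypothetical helper.

```python
# Checklist items per transaction type, copied from the matrix above.
MATRIX = {
    "Payment": [
        "RekeyTo validated",
        "CloseRemainderTo validated",
        "Fee validated (if smart signature)",
    ],
    "Asset Transfer": [
        "Asset ID validated",
        "AssetCloseTo validated",
        "RekeyTo validated",
    ],
    "Application Call": [
        "OnComplete validated",
        "Access controls enforced",
        "Group size validated",
    ],
    "Inner Transaction": [
        "Fee explicitly set to 0",
        "RekeyTo not user-controlled (TEAL v6+)",
        "All fields validated",
    ],
}


def outstanding_items(used_types, confirmed):
    """List (txn_type, item) pairs not yet confirmed by the reviewer.

    used_types: transaction types the contract actually handles.
    confirmed: set of (txn_type, item) pairs already verified.
    """
    missing = []
    for txn_type in used_types:
        for item in MATRIX[txn_type]:
            if (txn_type, item) not in confirmed:
                missing.append((txn_type, item))
    return missing
```

Restricting the check to `used_types` keeps the output focused on the transaction types the contract under review really processes.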

Step 5: Group Transaction Analysis

For atomic transaction groups:

Validate Global.group_size() checks
Review absolute vs relative indexing
Check for replay protection (Lease field)
Verify OnComplete fields for ApplicationCalls in group

Step 6: Access Control Review

Creator/admin privileges properly enforced
Update/delete operations protected
Sensitive functions have authorization checks

6. Reporting Format

Finding Template

## [SEVERITY] Vulnerability Name (e.g., Missing RekeyTo Validation)

**Location**: `contract.teal:45-50` or `approval_program.py:withdraw()`

**Description**:
The contract approves payment transactions without validating the RekeyTo field, allowing an attacker to rekey the account and bypass future authorization checks.

**Vulnerable Code**:
```python
# approval_program.py, line 45
If(Txn.type_enum() == TxnType.Payment,
    Approve()  # Missing RekeyTo check
)
```

Attack Scenario:

Attacker submits payment transaction with RekeyTo set to attacker's address
Contract approves transaction without checking RekeyTo
Account authorization is rekeyed to attacker
Attacker gains full control of account

Recommendation: Add explicit validation of the RekeyTo field:

If(And(
    Txn.type_enum() == TxnType.Payment,
    Txn.rekey_to() == Global.zero_address()
), Approve(), Reject())

References:

building-secure-contracts/not-so-smart-contracts/algorand/rekeying
Tealer detector: unprotected-rekey
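
To see why the recommended check matters, the two approval rules can be modeled in plain Python without PyTeal. This is a toy model under stated assumptions: `PaymentTxn`, `approve_vulnerable`, and `approve_fixed` are hypothetical names, addresses are raw 32-byte values, and only the RekeyTo condition from the finding is represented.

```python
from dataclasses import dataclass

# Stand-in for Global.zero_address(): 32 zero bytes.
ZERO_ADDRESS = bytes(32)


@dataclass
class PaymentTxn:
    """Toy payment transaction holding only the fields the check needs."""
    sender: bytes
    amount: int
    rekey_to: bytes = ZERO_ADDRESS


def approve_vulnerable(txn: PaymentTxn) -> bool:
    # Mirrors the vulnerable snippet: every payment is approved.
    return True


def approve_fixed(txn: PaymentTxn) -> bool:
    # Mirrors the recommendation: approve only if RekeyTo is the zero address.
    return txn.rekey_to == ZERO_ADDRESS


# Attack scenario from the finding: RekeyTo points at the attacker.
attacker = b"\x01" * 32
victim = b"\x02" * 32
malicious = PaymentTxn(sender=victim, amount=1, rekey_to=attacker)
honest = PaymentTxn(sender=victim, amount=1)
```

The vulnerable rule approves `malicious`, while the fixed rule rejects it and still approves `honest`.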


---

## 7. Priority Guidelines

### Critical (Immediate Fix Required)
- Rekeying attacks
- CloseRemainderTo / AssetCloseTo issues
- Access control bypasses

### High (Fix Before Deployment)
- Unchecked transaction fees
- Asset ID validation issues
- Group size validation
- Clear state transaction checks

### Medium (Address in Audit)
- Inner transaction fee issues
- Time-based replay attacks
- DoS via asset opt-in

---

## 8. Testing Recommendations

### Unit Tests Required
- Test each vulnerability scenario with PoC exploit
- Verify fixes prevent exploitation
- Test edge cases (group size = 0, empty addresses, etc.)

### Tealer Integration
```bash
# Add to CI/CD pipeline
tealer approval.teal --detect all --json > tealer-report.json

# Fail build on critical findings
tealer approval.teal --detect all --fail-on critical,high
```

Scenario Testing

Submit transactions with all critical fields manipulated
Test atomic groups with unexpected sizes
Attempt access control bypasses
Verify inner transaction fee handling

9. Additional Resources

Building Secure Contracts: building-secure-contracts/not-so-smart-contracts/algorand/
Tealer Documentation: https://github.com/crytic/tealer
Algorand Developer Docs: https://developer.algorand.org/docs/
PyTeal Documentation: https://pyteal.readthedocs.io/

10. Quick Reference Checklist

Before completing Algorand audit, verify ALL items checked:

/audit-prep-assistant

Source: `~/.claude/skills/tob-building-secure-contracts/skills/audit-prep-assistant/SKILL.md`

name: audit-prep-assistant
description: Prepares codebases for security review using Trail of Bits' checklist. Helps set review goals, runs static analysis tools, increases test coverage, removes dead code, ensures accessibility, and generates documentation (flowcharts, user stories, inline comments).

Audit Prep Assistant

Purpose

Helps prepare for a security review using Trail of Bits' checklist. A well-prepared codebase makes the review process smoother and more effective.

Use this: 1-2 weeks before your security audit

The Preparation Process

Step 1: Set Review Goals

Helps define what you want from the review:

Key Questions:

What's the overall security level you're aiming for?
What areas concern you most?
- Previous audit issues?
- Complex components?
- Fragile parts?
What's the worst-case scenario for your project?

Documents goals to share with the assessment team.

Step 2: Resolve Easy Issues

Runs static analysis and helps fix low-hanging fruit:

Run Static Analysis:

For Solidity:

slither . --exclude-dependencies

For Rust:

cargo dylint --all

For Go:

golangci-lint run

For Go/Rust/C++:

# CodeQL and Semgrep checks

Then I'll:

Triage all findings
Help fix easy issues
Document accepted risks

Increase Test Coverage:

Analyze current coverage
Identify untested code
Suggest new tests
Run full test suite

Remove Dead Code:

Find unused functions/variables
Identify unused libraries
Locate stale features
Suggest cleanup

Goal: Clean static analysis report, high test coverage, minimal dead code

Step 3: Ensure Code Accessibility

Helps make code clear and accessible:

Provide Detailed File List:

List all files in scope
Mark out-of-scope files
Explain folder structure
Document dependencies

Create Build Instructions:

Write step-by-step setup guide
Test on fresh environment
Document dependencies and versions
Verify build succeeds

Freeze Stable Version:

Identify commit hash for review
Create dedicated branch
Tag release version
Lock dependencies

Identify Boilerplate:

Mark copied/forked code
Highlight your modifications
Document third-party code
Focus review on your code

Step 4: Generate Documentation

Helps create documentation:

Flowcharts and Sequence Diagrams:

Map primary workflows
Show component relationships
Visualize data flow
Identify critical paths

User Stories:

Define user roles
Document use cases
Explain interactions
Clarify expectations

On-chain/Off-chain Assumptions:

Data validation procedures
Oracle information
Bridge assumptions
Trust boundaries

Actors and Privileges:

List all actors
Document roles
Define privileges
Map access controls

External Developer Docs:

Link docs to code
Keep synchronized
Explain architecture
Document APIs

Function Documentation:

System and function invariants
Parameter ranges (min/max values)
Arithmetic formulas and precision loss
Complex logic explanations
NatSpec for Solidity
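
As an illustration of the kind of precision loss worth documenting, integer division that happens before multiplication discards the remainder early. The sketch below uses Python integers to stand in for on-chain integer math; the basis-point fee formula and both function names are hypothetical.

```python
BPS_DENOMINATOR = 10_000  # basis points: 30 bps = 0.30%


def fee_divide_first(amount: int, fee_bps: int) -> int:
    # Dividing first truncates the remainder before it can be scaled.
    return amount // BPS_DENOMINATOR * fee_bps


def fee_multiply_first(amount: int, fee_bps: int) -> int:
    # Multiplying first keeps full precision until the final truncation.
    return amount * fee_bps // BPS_DENOMINATOR


amount, fee_bps = 19_999, 30
low = fee_divide_first(amount, fee_bps)     # 1 * 30 = 30
high = fee_multiply_first(amount, fee_bps)  # 599_970 // 10_000 = 59
```

The exact fee would be 59.997, so the divide-first ordering loses roughly half of it here. Function documentation should state which ordering is used and in whose favor the rounding goes.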

Glossary:

Define domain terms
Explain acronyms
Consistent terminology
Business logic concepts

Video Walkthroughs (optional):

Complex workflows
Areas of concern
Architecture overview

How I Work

When invoked, I will:

Help set review goals - Ask about concerns and document them
Run static analysis - Execute appropriate tools for your platform
Analyze test coverage - Identify gaps and suggest improvements
Find dead code - Search for unused code and libraries
Review accessibility - Check build instructions and scope clarity
Generate documentation - Create flowcharts, user stories, glossaries
Create prep checklist - Track what's done and what's remaining

Adapts based on:

Your platform (Solidity, Rust, Go, etc.)
Available tools
Existing documentation
Review timeline

Rationalizations (Do Not Skip)

| Rationalization | Why It's Wrong | Required Action |
|---|---|---|
| "README covers setup, no need for detailed build instructions" | READMEs assume context auditors don't have | Test build on fresh environment, document every dependency version |
| "Static analysis already ran, no need to run again" | Codebase changed since last run | Execute static analysis tools, generate fresh report |
| "Test coverage looks decent" | "Looks decent" isn't measured coverage | Run coverage tools, identify specific untested code paths |
| "Not much dead code to worry about" | Dead code hides during manual review | Use automated detection tools to find unused functions/variables |
| "Architecture is straightforward, no diagrams needed" | Text descriptions miss visual patterns | Generate actual flowcharts and sequence diagrams |
| "Can freeze version right before audit" | Last-minute freezing creates rushed handoff | Identify and document commit hash now, create dedicated branch |
| "Terms are self-explanatory" | Domain knowledge isn't universal | Create comprehensive glossary with all domain-specific terms |
| "I'll do this step later" | Steps build on each other - skipping creates gaps | Complete all 4 steps sequentially, track progress with checklist |

Example Output

When I finish helping you prepare, you'll have concrete deliverables like:

=== AUDIT PREP PACKAGE ===

Project: DeFi DEX Protocol
Audit Date: March 15, 2024
Preparation Status: Complete

---

## REVIEW GOALS DOCUMENT

Security Objectives:
- Verify economic security of liquidity pool swaps
- Validate oracle manipulation resistance
- Assess flash loan attack vectors

Areas of Concern:
1. Complex AMM pricing calculation (src/SwapRouter.sol:89-156)
2. Multi-hop swap routing logic (src/Router.sol)
3. Oracle price aggregation (src/PriceOracle.sol:45-78)

Worst-Case Scenario:
- Flash loan attack drains liquidity pools via oracle manipulation

Questions for Auditors:
- Can the AMM pricing model produce negative slippage under edge cases?
- Is the slippage protection sufficient to prevent sandwich attacks?
- How resilient is the system to temporary oracle failures?

---

## STATIC ANALYSIS REPORT

Slither Scan Results:
✓ High: 0 issues
✓ Medium: 0 issues
⚠ Low: 2 issues (triaged - documented in TRIAGE.md)
ℹ Info: 5 issues (code style, acceptable)

Tool: slither . --exclude-dependencies
Date: March 1, 2024
Status: CLEAN (all critical issues resolved)

---

## TEST COVERAGE REPORT

Overall Coverage: 94%
- Statements: 1,245 / 1,321 (94%)
- Branches: 456 / 498 (92%)
- Functions: 89 / 92 (97%)

Uncovered Areas:
- Emergency pause admin functions (tested manually)
- Governance migration path (one-time use)

Command: forge coverage
Status: EXCELLENT

---

## CODE SCOPE

In-Scope Files (8):
✓ src/SwapRouter.sol (456 lines)
✓ src/LiquidityPool.sol (234 lines)
✓ src/PairFactory.sol (389 lines)
✓ src/PriceOracle.sol (167 lines)
✓ src/LiquidityManager.sol (298 lines)
✓ src/Governance.sol (201 lines)
✓ src/FlashLoan.sol (145 lines)
✓ src/RewardsDistributor.sol (178 lines)

Out-of-Scope:
- lib/ (OpenZeppelin, external dependencies)
- test/ (test contracts)
- scripts/ (deployment scripts)

Total In-Scope: 2,068 lines of Solidity

---

## BUILD INSTRUCTIONS

Prerequisites:
- Foundry 0.2.0+
- Node.js 18+
- Git

Setup:
```bash
git clone https://github.com/project/repo.git
cd repo
git checkout audit-march-2024  # Frozen branch
forge install
forge build
forge test
```

Verification:
✓ Build succeeds without errors
✓ All 127 tests pass
✓ No warnings from compiler

---

## DOCUMENTATION

Generated Artifacts:
✓ ARCHITECTURE.md - System overview with diagrams
✓ USER_STORIES.md - 12 user interaction flows
✓ GLOSSARY.md - 34 domain terms defined
✓ docs/diagrams/contract-interactions.png
✓ docs/diagrams/swap-flow.png
✓ docs/diagrams/state-machine.png

NatSpec Coverage: 100% of public functions

---

## DEPLOYMENT INFO

Network: Ethereum Mainnet
Commit: abc123def456 (audit-march-2024 branch)
Deployed Contracts:
- SwapRouter: 0x1234...
- PriceOracle: 0x5678...
[... etc]

---

PACKAGE READY FOR AUDIT ✓
Next Step: Share with Trail of Bits assessment team


---

## What You'll Get

**Review Goals Document**:
- Security objectives
- Areas of concern
- Worst-case scenarios
- Questions for auditors

**Clean Codebase**:
- Triaged static analysis (or clean report)
- High test coverage
- No dead code
- Clear scope

**Accessibility Package**:
- File list with scope
- Build instructions
- Frozen commit/branch
- Boilerplate identified

**Documentation Suite**:
- Flowcharts and diagrams
- User stories
- Architecture docs
- Actor/privilege map
- Inline code comments
- Glossary
- Video walkthroughs (if created)

**Audit Prep Checklist**:
- [ ] Review goals documented
- [ ] Static analysis clean/triaged
- [ ] Test coverage >80%
- [ ] Dead code removed
- [ ] Build instructions verified
- [ ] Stable version frozen
- [ ] Flowcharts created
- [ ] User stories documented
- [ ] Assumptions documented
- [ ] Actors/privileges listed
- [ ] Function docs complete
- [ ] Glossary created

---

## Timeline

**2 weeks before audit**:
- Set review goals
- Run static analysis
- Start fixing issues

**1 week before audit**:
- Increase test coverage
- Remove dead code
- Freeze stable version
- Start documentation

**Few days before audit**:
- Complete documentation
- Verify build instructions
- Create final checklist
- Send package to auditors

---

## Ready to Prep

Let me know when you're ready and I'll help you prepare for your security review!

/cairo-vulnerability-scanner

Source: `~/.claude/skills/tob-building-secure-contracts/skills/cairo-vulnerability-scanner/SKILL.md`

name: cairo-vulnerability-scanner
description: Scans Cairo/StarkNet smart contracts for 6 critical vulnerabilities including felt252 arithmetic overflow, L1-L2 messaging issues, address conversion problems, and signature replay. Use when auditing StarkNet projects.

Cairo/StarkNet Vulnerability Scanner

1. Purpose

Systematically scan Cairo smart contracts on StarkNet for platform-specific security vulnerabilities related to arithmetic, cross-layer messaging, and cryptographic operations. This skill encodes 6 critical vulnerability patterns unique to Cairo/StarkNet ecosystem.

2. When to Use This Skill

Auditing StarkNet smart contracts (Cairo)
Reviewing L1-L2 bridge implementations
Pre-launch security assessment of StarkNet applications
Validating cross-layer message handling
Reviewing signature verification logic
Assessing L1 handler functions

3. Platform Detection

File Extensions & Indicators

Cairo files: .cairo

Language/Framework Markers

// Cairo contract indicators
#[contract]
mod MyContract {
    use starknet::ContractAddress;

    #[storage]
    struct Storage {
        balance: LegacyMap<ContractAddress, felt252>,
    }

    #[external(v0)]
    fn transfer(ref self: ContractState, to: ContractAddress, amount: felt252) {
        // Contract logic
    }

    #[l1_handler]
    fn handle_deposit(ref self: ContractState, from_address: felt252, amount: u256) {
        // L1 message handler
    }
}

// Common patterns
felt252, u128, u256
ContractAddress, EthAddress
#[external(v0)], #[l1_handler], #[constructor]
get_caller_address(), get_contract_address()
send_message_to_l1_syscall

Project Structure

src/contract.cairo - Main contract implementation
src/lib.cairo - Library modules
tests/ - Contract tests
Scarb.toml - Cairo project configuration

Tool Support

Caracal: Trail of Bits static analyzer for Cairo
Installation (Caracal is a Rust tool): cargo install --git https://github.com/crytic/caracal --profile release --force
Usage: caracal detect src/
cairo-test: Built-in testing framework
Starknet Foundry: Testing and development toolkit

4. How This Skill Works

When invoked, I will:

Search your codebase for Cairo files
Analyze each contract for the 6 vulnerability patterns
Report findings with file references and severity
Provide fixes for each identified issue
Check L1-L2 interactions for messaging vulnerabilities

5. Example Output

When vulnerabilities are found, you'll get a report like this:

=== CAIRO/STARKNET VULNERABILITY SCAN RESULTS ===


---

## 5. Vulnerability Patterns (6 Patterns)

I check for 6 critical vulnerability patterns unique to Cairo/StarkNet. For detailed detection patterns, code examples, mitigations, and testing strategies, see [VULNERABILITY_PATTERNS.md](resources/VULNERABILITY_PATTERNS.md).

### Pattern Summary:

1. **Unchecked from_address in L1 Handlers** ⚠️ CRITICAL - L1 handler accepts messages from any L1 contract (infinite mint)
2. **L1-L2 Address Conversion** ⚠️ CRITICAL - L1 address values outside the felt252 range (funds sent to the zero address)
3. **felt252 Arithmetic Overflow/Underflow** ⚠️ HIGH - Unchecked arithmetic on felt252 values (balance manipulation)
4. **Signature Replay** ⚠️ HIGH - Missing nonce tracking or domain separation (replay attacks)
5. **L1-L2 Message Failure** ⚠️ HIGH - No message cancellation when an L1-L2 message fails (locked funds)
6. **Overconstrained L1-L2 Interactions** ⚠️ MEDIUM - Asymmetric validation between layers (trapped funds)

For complete vulnerability patterns with code examples, see [VULNERABILITY_PATTERNS.md](resources/VULNERABILITY_PATTERNS.md).

## 5. Scanning Workflow

### Step 1: Platform Identification
1. Verify Cairo language and StarkNet framework
2. Check Cairo version (Cairo 1.0+ vs legacy Cairo 0)
3. Locate contract files (`src/*.cairo`)
4. Identify L1-L2 bridge contracts (if applicable)

### Step 2: Arithmetic Safety Sweep
```bash
# Find felt252 usage in arithmetic
rg "felt252" src/ | rg "[-+*/]"

# Find balance/amount storage using felt252
rg "felt252" src/ | rg "balance|amount|total|supply"

# Prefer u128 or u256 for these values
```
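
To see why the sweep flags felt252 in balance arithmetic, the following Python sketch models felt252 values as integers reduced modulo the StarkNet field prime. The helper names (`felt_sub`, `felt_add`, `checked_sub`) are illustrative and not part of any Cairo library; `checked_sub` only approximates how checked unsigned types such as u128/u256 reject underflow.

```python
# Sketch: felt252 arithmetic wraps silently modulo the field prime.
FELT_PRIME = 2**251 + 17 * 2**192 + 1  # StarkNet field prime


def felt_sub(a: int, b: int) -> int:
    """Subtract the way a field element does: reduce modulo the prime."""
    return (a - b) % FELT_PRIME


def felt_add(a: int, b: int) -> int:
    """Add the way a field element does: reduce modulo the prime."""
    return (a + b) % FELT_PRIME


def checked_sub(a: int, b: int) -> int:
    """Model a checked unsigned integer type: reject underflow."""
    if b > a:
        raise OverflowError("subtraction underflow")
    return a - b


# Spending 6 from a balance of 5 wraps to the largest field element.
print(felt_sub(5, 6) == FELT_PRIME - 1)  # True
```

With `checked_sub(5, 6)` the same operation raises instead of producing a huge balance, which is the behaviour the u128/u256 recommendation relies on.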

### Step 3: L1 Handler Analysis

For each `#[l1_handler]` function:

- [ ] Validates `from_address` parameter
- [ ] Checks address != zero
- [ ] Has proper access control
- [ ] Emits events for monitoring

### Step 4: Signature Verification Review

For signature-based functions:

- [ ] Includes nonce tracking
- [ ] Nonce incremented after use
- [ ] Domain separator includes chain ID and contract address
- [ ] Cannot replay signatures
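
The signature checks above can be sketched in Python as follows. This is a toy model under stated assumptions: HMAC-SHA256 stands in for a real signature scheme, and the `Verifier` class, the `sign` helper, and the chain IDs used in the usage note are hypothetical names chosen for illustration.

```python
import hashlib
import hmac


def message_hash(chain_id: str, contract: str, signer: str, nonce: int, payload: str) -> bytes:
    """Hash the payload together with a domain separator and the signer's nonce."""
    domain = f"{chain_id}|{contract}"  # domain separator: chain ID + contract address
    data = f"{domain}|{signer}|{nonce}|{payload}"
    return hashlib.sha256(data.encode()).digest()


def sign(key: bytes, chain_id: str, contract: str, signer: str, nonce: int, payload: str) -> bytes:
    """Stand-in for signing: HMAC over the message hash (assumption, not a real scheme)."""
    digest = message_hash(chain_id, contract, signer, nonce, payload)
    return hmac.new(key, digest, hashlib.sha256).digest()


class Verifier:
    """Toy verifier that tracks one nonce per signer."""

    def __init__(self, chain_id: str, contract: str, keys: dict[str, bytes]):
        self.chain_id = chain_id
        self.contract = contract
        self.keys = keys
        self.nonces: dict[str, int] = {}

    def execute(self, signer: str, nonce: int, payload: str, signature: bytes) -> bool:
        expected_nonce = self.nonces.get(signer, 0)
        if nonce != expected_nonce:
            return False  # stale or future nonce: rejects replays
        expected_sig = sign(self.keys[signer], self.chain_id, self.contract, signer, nonce, payload)
        if not hmac.compare_digest(signature, expected_sig):
            return False  # wrong key, wrong chain, or wrong contract
        self.nonces[signer] = expected_nonce + 1  # increment after use
        return True
```

For example, a signature produced with `sign(key, "SN_MAIN", "0xabc", "alice", 0, "transfer:10")` is accepted once by a `Verifier("SN_MAIN", "0xabc", ...)`, rejected on the second submission because the nonce has advanced, and rejected by a verifier configured with a different chain ID because the domain separator differs.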

### Step 5: L1-L2 Bridge Audit

If contract includes bridge functionality:

- [ ] L1 validates address < STARKNET_FIELD_PRIME
- [ ] L1 implements message cancellation
- [ ] L2 validates from_address in handlers
- [ ] Symmetric access controls L1 ↔ L2
- [ ] Test full roundtrip flows
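
The first check in this list can be illustrated with a small Python model. It assumes, as a simplification, that an unchecked 256-bit value passed from L1 ends up reduced modulo the field prime on L2; `as_felt` and `is_valid_l2_address` are hypothetical helper names, and rejecting zero reflects the "address != zero" check from Step 3.

```python
STARKNET_FIELD_PRIME = 2**251 + 17 * 2**192 + 1


def as_felt(value: int) -> int:
    """Model of what L2 sees if L1 forwards the value unchecked (assumed reduction)."""
    return value % STARKNET_FIELD_PRIME


def is_valid_l2_address(value: int) -> bool:
    """Check the L1 side should apply before sending a message to L2."""
    return 0 < value < STARKNET_FIELD_PRIME


for candidate in (0x1234, 0, STARKNET_FIELD_PRIME, STARKNET_FIELD_PRIME + 5):
    print(is_valid_l2_address(candidate), as_felt(candidate))
```

In this model a recipient equal to `STARKNET_FIELD_PRIME` collapses to address 0 on L2, which is the "funds to zero address" outcome listed under the critical priorities.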

### Step 6: Static Analysis with Caracal

```bash
# Run Caracal detectors
caracal detect src/

# Specific detectors
caracal detect src/ --detectors unchecked-felt252-arithmetic
caracal detect src/ --detectors unchecked-l1-handler-from
caracal detect src/ --detectors missing-nonce-validation

# Detector names and flags vary by release; confirm them with
# `caracal detectors` and `caracal detect --help`
```

6. Reporting Format

Finding Template

## [CRITICAL] Unchecked from_address in L1 Handler

**Location**: `src/bridge.cairo:145-155` (handle_deposit function)

**Description**:
The `handle_deposit` L1 handler function does not validate the `from_address` parameter. Any L1 contract can send messages to this function and mint tokens for arbitrary users, bypassing the intended L1 bridge access controls.

**Vulnerable Code**:
```rust
// bridge.cairo, line 145
#[l1_handler]
fn handle_deposit(
    ref self: ContractState,
    from_address: felt252,  // Not validated!
    user: ContractAddress,
    amount: u256
) {
    let current_balance = self.balances.read(user);
    self.balances.write(user, current_balance + amount);
}
```

**Attack Scenario**:
1. Attacker deploys malicious L1 contract
2. Malicious contract calls `starknetCore.sendMessageToL2(l2Contract, selector, [attacker_address, 1000000])`
3. L2 handler processes message without checking sender
4. Attacker receives 1,000,000 tokens without depositing any funds
5. Protocol suffers infinite mint vulnerability

**Recommendation**: Validate `from_address` against the authorized L1 bridge:

```rust
#[l1_handler]
fn handle_deposit(
    ref self: ContractState,
    from_address: felt252,
    user: ContractAddress,
    amount: u256
) {
    // Validate L1 sender
    let authorized_l1_bridge = self.l1_bridge_address.read();
    assert(from_address == authorized_l1_bridge, 'Unauthorized L1 sender');

    let current_balance = self.balances.read(user);
    self.balances.write(user, current_balance + amount);
}
```

**References**:
- building-secure-contracts/not-so-smart-contracts/cairo/unchecked_l1_handler_from
- Caracal detector: `unchecked-l1-handler-from`


---

## 7. Priority Guidelines

### Critical (Immediate Fix Required)
- Unchecked from_address in L1 handlers (infinite mint)
- L1-L2 address conversion issues (funds to zero address)

### High (Fix Before Deployment)
- Felt252 arithmetic overflow/underflow (balance manipulation)
- Missing signature replay protection (replay attacks)
- L1-L2 message failure without cancellation (locked funds)

### Medium (Address in Audit)
- Overconstrained L1-L2 interactions (trapped funds)

---

## 8. Testing Recommendations

### Unit Tests
```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_felt252_overflow() {
        // Test arithmetic edge cases
    }

    #[test]
    #[should_panic]
    fn test_unauthorized_l1_handler() {
        // Wrong from_address should fail
    }

    #[test]
    fn test_signature_replay_protection() {
        // Same signature twice should fail
    }
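}
```

### Integration Tests (with L1)

```rust
// Test full L1-L2 flow
#[test]
fn test_deposit_withdraw_roundtrip() {
    // 1. Deposit on L1
    // 2. Wait for L2 processing
    // 3. Verify L2 balance
    // 4. Withdraw to L1
    // 5. Verify L1 balance restored
}
```

### Caracal CI Integration

```yaml
# .github/workflows/security.yml
- name: Run Caracal
  run: |
    # Caracal is a Rust tool, so install it with cargo rather than pip
    cargo install --git https://github.com/crytic/caracal --profile release --force
    # Flag names vary by release; confirm with `caracal detect --help`
    caracal detect src/ --fail-on high,critical
```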

9. Additional Resources

Building Secure Contracts: building-secure-contracts/not-so-smart-contracts/cairo/
Caracal: https://github.com/crytic/caracal
Cairo Documentation: https://book.cairo-lang.org/
StarkNet Documentation: https://docs.starknet.io/
OpenZeppelin Cairo Contracts: https://github.com/OpenZeppelin/cairo-contracts

10. Quick Reference Checklist

Before completing Cairo/StarkNet audit:

Arithmetic Safety (HIGH):

No felt252 used for balances/amounts (use u128/u256)
OR felt252 arithmetic has explicit bounds checking
Overflow/underflow scenarios tested

L1 Handler Security (CRITICAL):

ALL #[l1_handler] functions validate from_address
from_address compared against stored L1 contract address
Cannot bypass by deploying alternate L1 contract

L1-L2 Messaging (HIGH):

L1 bridge validates addresses < STARKNET_FIELD_PRIME
L1 bridge implements message cancellation
L2 handlers check from_address
Symmetric validation rules L1 ↔ L2
Full roundtrip flows tested

Signature Security (HIGH):

Signatures include nonce tracking
Nonce incremented after each use
Domain separator includes chain ID and contract address
Signature replay tested and prevented
Cross-chain replay prevented

Tool Usage:

Caracal scan completed with no critical findings
Unit tests cover all vulnerability scenarios
Integration tests verify L1-L2 flows
Testnet deployment tested before mainnet

/code-maturity-assessor

Source: `~/.claude/skills/tob-building-secure-contracts/skills/code-maturity-assessor/SKILL.md`

name: code-maturity-assessor description: Systematic code maturity assessment using Trail of Bits' 9-category framework. Analyzes codebase for arithmetic safety, auditing practices, access controls, complexity, decentralization, documentation, MEV risks, low-level code, and testing. Produces professional scorecard with evidence-based ratings and actionable recommendations.

Code Maturity Assessor

Purpose

Systematically assesses codebase maturity using Trail of Bits' 9-category framework. Provides evidence-based ratings and actionable recommendations.

Framework: Building Secure Contracts - Code Maturity Evaluation v0.1.0

How This Works

Phase 1: Discovery

Explores the codebase to understand:

Project structure and platform
Contract/module files
Test coverage
Documentation availability

Phase 2: Analysis

For each of 9 categories, I'll:

Search the code for relevant patterns
Read key files to assess implementation
Present findings with file references
Ask clarifying questions about processes I can't see in code
Determine rating based on criteria

Phase 3: Report

Generates:

Executive summary
Maturity scorecard (ratings for all 9 categories)
Detailed analysis with evidence
Priority-ordered improvement roadmap

Rating System

Missing (0): Not present/not implemented
Weak (1): Several significant improvements needed
Moderate (2): Adequate, can be improved
Satisfactory (3): Above average, minor improvements
Strong (4): Exceptional, only small improvements possible

Rating Logic:

ANY "Weak" criteria → Weak
NO "Weak" + SOME "Moderate" unmet → Moderate
ALL "Moderate" + SOME "Satisfactory" met → Satisfactory
ALL "Satisfactory" + exceptional practices → Strong
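
The rating logic above can be expressed as a small decision function. The Python sketch below is one possible reading, not the framework's official algorithm: the `CategoryEvidence` fields are hypothetical, "Missing" is assumed to be assigned separately when a category is not implemented at all, and the case "all Moderate criteria met but no Satisfactory criteria met" is not covered by the list, so the sketch assumes it stays at Moderate.

```python
from dataclasses import dataclass


@dataclass
class CategoryEvidence:
    weak_criteria_hit: int      # number of "Weak" criteria that apply
    moderate_met: int           # "Moderate" criteria satisfied
    moderate_total: int
    satisfactory_met: int       # "Satisfactory" criteria satisfied
    satisfactory_total: int
    exceptional: bool = False   # exceptional practices observed


def rate(e: CategoryEvidence) -> str:
    if e.weak_criteria_hit > 0:
        return "Weak"                      # ANY "Weak" criteria
    if e.moderate_met < e.moderate_total:
        return "Moderate"                  # SOME "Moderate" unmet
    if e.satisfactory_met == 0:
        return "Moderate"                  # assumption: case not covered by the list
    if e.satisfactory_met == e.satisfactory_total and e.exceptional:
        return "Strong"                    # ALL "Satisfactory" + exceptional practices
    return "Satisfactory"                  # ALL "Moderate" + SOME "Satisfactory"


print(rate(CategoryEvidence(0, 3, 3, 1, 2)))  # Satisfactory
```

Encoding the rules this way makes borderline cases explicit, which is useful when ratings are discussed during the interactive assessment.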

The 9 Categories

I assess 9 comprehensive categories covering all aspects of code maturity. For detailed criteria, analysis approaches, and rating thresholds, see ASSESSMENT_CRITERIA.md.

Quick Reference:

1. ARITHMETIC

Overflow protection mechanisms
Precision handling and rounding
Formula specifications
Edge case testing

2. AUDITING

Event definitions and coverage
Monitoring infrastructure
Incident response planning

3. AUTHENTICATION / ACCESS CONTROLS

Privilege management
Role separation
Access control testing
Key compromise scenarios

4. COMPLEXITY MANAGEMENT

Function scope and clarity
Cyclomatic complexity
Inheritance hierarchies
Code duplication

5. DECENTRALIZATION

Centralization risks
Upgrade control mechanisms
User opt-out paths
Timelock/multisig patterns

6. DOCUMENTATION

Specifications and architecture
Inline code documentation
User stories
Domain glossaries

7. TRANSACTION ORDERING RISKS

MEV vulnerabilities
Front-running protections
Slippage controls
Oracle security

8. LOW-LEVEL MANIPULATION

Assembly usage
Unsafe code sections
Low-level calls
Justification and testing

9. TESTING & VERIFICATION

Test coverage
Fuzzing and formal verification
CI/CD integration
Test quality

For complete assessment criteria including what I'll analyze, what I'll ask you, and detailed rating thresholds (WEAK/MODERATE/SATISFACTORY/STRONG), see ASSESSMENT_CRITERIA.md.

Example Output

When the assessment is complete, you'll receive a comprehensive maturity report including:

Executive Summary: Overall score, top 3 strengths, top 3 gaps, priority recommendations
Maturity Scorecard: Table with all 9 categories rated with scores and notes
Detailed Analysis: Category-by-category breakdown with evidence (file:line references)
Improvement Roadmap: Priority-ordered recommendations (CRITICAL/HIGH/MEDIUM) with effort estimates

For a complete example assessment report, see EXAMPLE_REPORT.md.

Assessment Process

When invoked, I will:

Explore codebase
- Find contract/module files
- Identify test files
- Locate documentation
Analyze each category
- Search for relevant code patterns
- Read key implementations
- Assess against criteria
- Collect evidence
Interactive assessment
- Present my findings with file references
- Ask about processes I can't see in code
- Discuss borderline cases
- Determine ratings together
Generate report
- Executive summary
- Maturity scorecard table
- Detailed category analysis with evidence
- Priority-ordered improvement roadmap

Rationalizations (Do Not Skip)

Rationalization	Why It's Wrong	Required Action
"Found some findings, assessment complete"	Assessment requires evaluating ALL 9 categories	Complete assessment of all 9 categories with evidence for each
"I see events, auditing category looks good"	Events alone don't equal auditing maturity	Check logging comprehensiveness, testing, incident response processes
"Code looks simple, complexity is low"	Visual simplicity masks composition complexity	Analyze cyclomatic complexity, dependency depth, state machine transitions
"Not a DeFi protocol, MEV category doesn't apply"	MEV extends beyond DeFi (governance, NFTs, games)	Verify with transaction ordering analysis before declaring N/A
"No assembly found, low-level category is N/A"	Low-level risks include external calls, delegatecall, inline assembly	Search for all low-level patterns before skipping category
"This is taking too long"	Thorough assessment requires time per category	Complete all 9 categories, ask clarifying questions about off-chain processes
"I can rate this without evidence"	Ratings without file:line references = unsubstantiated claims	Collect concrete code evidence for every category assessment
"User will know what to improve"	Vague guidance = no action	Provide priority-ordered roadmap with specific improvements and effort estimates

Report Format

For detailed report structure and templates, see REPORT_FORMAT.md.

Structure:

Executive Summary
- Project name and platform
- Overall maturity (average rating)
- Top 3 strengths
- Top 3 critical gaps
- Priority recommendations
Maturity Scorecard
- Table with all 9 categories
- Ratings and scores
- Key findings notes
Detailed Analysis
- Per-category breakdown
- Evidence with file:line references
- Gaps and improvement actions
Improvement Roadmap
- CRITICAL (immediate)
- HIGH (1-2 months)
- MEDIUM (2-4 months)
- Effort estimates and impact
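
As a rough illustration of how the scorecard and the "average rating" in the executive summary fit together, the Python sketch below builds scorecard rows and computes the overall score. The function names and the table layout are assumptions made for this example; REPORT_FORMAT.md defines the actual template.

```python
RATING_SCORES = {"Missing": 0, "Weak": 1, "Moderate": 2, "Satisfactory": 3, "Strong": 4}

CATEGORIES = [
    "Arithmetic", "Auditing", "Authentication / Access Controls",
    "Complexity Management", "Decentralization", "Documentation",
    "Transaction Ordering Risks", "Low-Level Manipulation", "Testing & Verification",
]


def overall_maturity(ratings: dict[str, str]) -> float:
    """Average the numeric scores; refuse to score an incomplete assessment."""
    missing = [c for c in CATEGORIES if c not in ratings]
    if missing:
        raise ValueError(f"unrated categories: {missing}")
    return sum(RATING_SCORES[ratings[c]] for c in CATEGORIES) / len(CATEGORIES)


def scorecard_rows(ratings: dict[str, str]) -> list[str]:
    """Render one table row per category (layout is illustrative)."""
    rows = ["| Category | Rating | Score |", "| --- | --- | --- |"]
    for category in CATEGORIES:
        rating = ratings[category]
        rows.append(f"| {category} | {rating} | {RATING_SCORES[rating]} |")
    return rows


ratings = {c: "Moderate" for c in CATEGORIES}
ratings["Testing & Verification"] = "Strong"
print("\n".join(scorecard_rows(ratings)))
print(f"Overall: {overall_maturity(ratings):.2f} / 4")
```

Raising an error for unrated categories mirrors the requirement that all 9 categories are assessed before a report is produced.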

Ready to Begin

Estimated Time: 30-40 minutes

I'll need:

Access to full codebase
Your knowledge of processes (monitoring, incident response, team practices)
Context about the project (DeFi, NFT, infrastructure, etc.)

Let's assess this codebase!

/cosmos-vulnerability-scanner

Source: `~/.claude/skills/tob-building-secure-contracts/skills/cosmos-vulnerability-scanner/SKILL.md`

name: cosmos-vulnerability-scanner description: Scans Cosmos SDK blockchains for 9 consensus-critical vulnerabilities including non-determinism, incorrect signers, ABCI panics, and rounding errors. Use when auditing Cosmos chains or CosmWasm contracts.

Cosmos Vulnerability Scanner

1. Purpose

Systematically scan Cosmos SDK blockchain modules and CosmWasm smart contracts for platform-specific security vulnerabilities that can cause chain halts, consensus failures, or fund loss. This skill encodes 9 critical vulnerability patterns unique to Cosmos-based chains.

2. When to Use This Skill

Auditing Cosmos SDK modules (custom x/ modules)
Reviewing CosmWasm smart contracts (Rust)
Pre-launch security assessment of Cosmos chains
Investigating chain halt incidents
Validating consensus-critical code changes
Reviewing ABCI method implementations

3. Platform Detection

File Extensions & Indicators

Go files: .go, .proto
CosmWasm: .rs (Rust with cosmwasm imports)

Language/Framework Markers

// Cosmos SDK indicators
import (
    "github.com/cosmos/cosmos-sdk/types"
    sdk "github.com/cosmos/cosmos-sdk/types"
    "github.com/cosmos/cosmos-sdk/x/..."
)

// Common patterns
keeper.Keeper
sdk.Msg, GetSigners()
BeginBlocker, EndBlocker
CheckTx, DeliverTx
protobuf service definitions

// CosmWasm indicators
use cosmwasm_std::*;
#[entry_point]
pub fn execute(deps: DepsMut, env: Env, info: MessageInfo, msg: ExecuteMsg)

Project Structure

x/modulename/ - Custom modules
keeper/keeper.go - State management
types/msgs.go - Message definitions
abci.go - BeginBlocker/EndBlocker
handler.go - Message handlers (legacy)

Tool Support

CodeQL: Custom rules for non-determinism and panics
go vet, golangci-lint: Basic Go static analysis
Manual review: Critical for consensus issues

4. How This Skill Works

When invoked, I will:

Search your codebase for Cosmos SDK modules
Analyze each module for the 9 vulnerability patterns
Report findings with file references and severity
Provide fixes for each identified issue
Check message handlers for validation issues

5. Example Output

When vulnerabilities are found, you'll get a report like this:

=== COSMOS SDK VULNERABILITY SCAN RESULTS ===

Project: my-cosmos-chain
Files Scanned: 6 (.go)
Vulnerabilities Found: 2

---

[CRITICAL] Incorrect GetSigners()

---

## 5. Vulnerability Patterns (9 Patterns)

I check for 9 critical vulnerability patterns unique to Cosmos-based chains. For detailed detection patterns, code examples, mitigations, and testing strategies, see [VULNERABILITY_PATTERNS.md](resources/VULNERABILITY_PATTERNS.md).

### Pattern Summary:

1. **Non-Determinism** ⚠️ CRITICAL - Map iteration, platform-dependent types, goroutines, time.Now(), or rand in consensus code
2. **ABCI Method Panics** ⚠️ CRITICAL - Panic-prone operations in BeginBlocker/EndBlocker
3. **Slow ABCI Methods** ⚠️ CRITICAL - Unbounded computation in BeginBlocker/EndBlocker
4. **Incorrect GetSigners()** ⚠️ CRITICAL - Signer address does not match the address the handler uses
5. **Missing Error Handling** ⚠️ HIGH - Unchecked error returns (e.g. bankKeeper.SendCoins)
6. **Broken Bookkeeping** ⚠️ HIGH - Custom accounting out of sync with x/bank
7. **Missing Message Priority** ⚠️ HIGH - Oracle/emergency messages not prioritized in CheckTx
8. **Rounding Errors** ⚠️ MEDIUM - Precision loss that leaks protocol value
9. **Unregistered Message Handlers** ⚠️ MEDIUM - Message types that never reach a handler

For complete vulnerability patterns with code examples, see [VULNERABILITY_PATTERNS.md](resources/VULNERABILITY_PATTERNS.md).

## 5. Scanning Workflow

### Step 1: Platform Identification
1. Identify Cosmos SDK version (`go.mod`)
2. Locate custom modules (`x/*/`)
3. Find ABCI methods (`abci.go`, BeginBlocker, EndBlocker)
4. Identify message types (`types/msgs.go`, `.proto`)

### Step 2: Critical Path Analysis
Focus on consensus-critical code:
- BeginBlocker / EndBlocker implementations
- Message handlers (execute, DeliverTx)
- Keeper methods that modify state
- CheckTx priority logic

### Step 3: Non-Determinism Sweep
**This is the highest priority check for Cosmos chains.**

```bash
# Search for non-deterministic patterns
grep -r "range.*map\[" x/
grep -r "\bint\b\|\buint\b" x/ | grep -v "int32\|int64\|uint32\|uint64"
grep -r "float32\|float64" x/
grep -r "go func\|go routine" x/
grep -r "select {" x/
grep -r "time.Now()" x/
grep -r "rand\." x/
```

For each finding:

1. Verify it's in consensus-critical path
2. Confirm it causes non-determinism
3. Assess severity (chain halt vs data inconsistency)
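
When findings need file and line references for triage, the same sweep can be scripted. The Python sketch below is an approximation of the grep patterns above, not an exact port: the `PATTERNS` labels and the `sweep` function are hypothetical, and it scans a string rather than walking `x/`.

```python
import re

# Regexes loosely mirroring the grep sweep; labels are illustrative.
PATTERNS = {
    "map iteration": re.compile(r"range.*map\["),
    "platform-sized int": re.compile(r"\b(?:int|uint)\b"),
    "floating point": re.compile(r"float(?:32|64)"),
    "goroutine": re.compile(r"\bgo\s+func\b"),
    "select statement": re.compile(r"\bselect\s*\{"),
    "wall-clock time": re.compile(r"time\.Now\(\)"),
    "randomness": re.compile(r"\brand\."),
}


def sweep(source: str, filename: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, label) for every pattern hit."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((filename, lineno, label))
    return hits


SAMPLE = "\n".join([
    "for user, amount := range rewards {",
    "    now := time.Now()",
    "    var total int",
    "    var fixed int64",
    "    x := rand.Intn(10)",
    "}",
])

for hit in sweep(SAMPLE, "abci.go"):
    print(hit)
```

Each hit is only a candidate: the three review steps above still decide whether it sits on a consensus-critical path.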

### Step 4: ABCI Method Analysis

Review BeginBlocker and EndBlocker:

- [ ] Computational complexity bounded?
- [ ] No unbounded iterations?
- [ ] No nested loops over large collections?
- [ ] Panic-prone operations validated?
- [ ] Benchmarked with maximum state?

### Step 5: Message Validation

For each message type:

- [ ] GetSigners() address matches handler usage?
- [ ] All error returns checked?
- [ ] Priority set in CheckTx if critical?
- [ ] Handler registered (or using v0.47+ auto-registration)?

### Step 6: Arithmetic & Bookkeeping

- [ ] sdk.Dec operations use multiply-before-divide?
- [ ] Rounding favors protocol over users?
- [ ] Custom bookkeeping synchronized with x/bank?
- [ ] Invariant checks in place?
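
The first two checks are easiest to see with numbers. The Python sketch below uses plain integer arithmetic as a stand-in for fixed-point `sdk.Dec` math (an assumption made for brevity); the function names are hypothetical.

```python
def share_divide_first(amount: int, user_stake: int, total_stake: int) -> int:
    """Divide, then multiply: truncation happens early and is amplified."""
    return (amount // total_stake) * user_stake


def share_multiply_first(amount: int, user_stake: int, total_stake: int) -> int:
    """Multiply, then divide: a single truncation at the end."""
    return (amount * user_stake) // total_stake


def ceil_div(numerator: int, denominator: int) -> int:
    """Round up, for amounts owed to the protocol (e.g. fees)."""
    return -(-numerator // denominator)


# Exact share is 1000 * 3 / 7 = 428.57...
print(share_divide_first(1000, 3, 7))    # 426
print(share_multiply_first(1000, 3, 7))  # 428
print(ceil_div(1000 * 3, 7))             # 429
```

Payouts to users round down (428) and amounts owed to the protocol round up (429), so rounding never leaks value out of the protocol.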

6. Reporting Format

Finding Template

## [CRITICAL] Non-Deterministic Map Iteration in EndBlocker

**Location**: `x/dex/abci.go:45-52`

**Description**:
The EndBlocker iterates over an unordered map to distribute rewards, causing different validators to process users in different orders and produce different state roots. This will halt the chain when validators fail to reach consensus.

**Vulnerable Code**:
```go
// abci.go, line 45
func EndBlocker(ctx sdk.Context, k keeper.Keeper) {
    rewards := k.GetPendingRewards(ctx)  // Returns map[string]sdk.Coins
    for user, amount := range rewards {  // NON-DETERMINISTIC ORDER
        k.bankKeeper.SendCoins(ctx, moduleAcc, user, amount)  // Error return ignored
    }
}
```

**Attack Scenario**:
1. Multiple users have pending rewards
2. Different validators iterate in different orders due to map randomization
3. If any reward distribution fails mid-iteration, state diverges
4. Validators produce different app hashes
5. Chain halts - cannot reach consensus

**Recommendation**: Sort map keys before iteration and check the error returned by each transfer:

```go
func EndBlocker(ctx sdk.Context, k keeper.Keeper) {
    rewards := k.GetPendingRewards(ctx)

    // Collect and sort keys for deterministic iteration
    users := make([]string, 0, len(rewards))
    for user := range rewards {
        users = append(users, user)
    }
    sort.Strings(users)  // Deterministic order (needs the standard "sort" package)

    // Process in sorted order
    for _, user := range users {
        if err := k.bankKeeper.SendCoins(ctx, moduleAcc, user, rewards[user]); err != nil {
            // Handle the failure the same way on every validator, e.g. log and skip
            ctx.Logger().Error("reward distribution failed", "user", user, "err", err)
        }
    }
}
```

**References**:
- building-secure-contracts/not-so-smart-contracts/cosmos/non_determinism
- Cosmos SDK docs: Determinism
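
The attack scenario in this finding can be modelled in a few lines of Python. The sketch assumes a module account that cannot cover every reward, so the processing order decides who is paid; `distribute` and the sample balances are hypothetical.

```python
def distribute(rewards: dict[str, int], module_balance: int, order: list[str]) -> dict[str, int]:
    """Pay rewards in the given order; a transfer that cannot be covered fails and is skipped."""
    paid = {}
    for user in order:
        amount = rewards[user]
        if amount > module_balance:
            continue  # failed transfer, error ignored as in the vulnerable code
        module_balance -= amount
        paid[user] = amount
    return paid


rewards = {"alice": 60, "bob": 50}

# Two validators iterate the same map in different orders.
validator_a = distribute(rewards, 100, ["alice", "bob"])
validator_b = distribute(rewards, 100, ["bob", "alice"])
print(validator_a != validator_b)  # True: resulting state differs

# Sorting the keys first gives every validator the same result.
fixed_a = distribute(rewards, 100, sorted(["alice", "bob"]))
fixed_b = distribute(rewards, 100, sorted(["bob", "alice"]))
print(fixed_a == fixed_b)  # True
```

The divergence only appears when a transfer fails part-way through, which is why the issue can pass ordinary tests and still halt a live chain.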


---

## 7. Priority Guidelines

### Critical - CHAIN HALT Risk
- Non-determinism (any form)
- ABCI method panics
- Slow ABCI methods
- Incorrect GetSigners (allows unauthorized actions)

### High - Fund Loss Risk
- Missing error handling (bankKeeper.SendCoins)
- Broken bookkeeping (accounting mismatch)
- Missing message priority (oracle/emergency messages)

### Medium - Logic/DoS Risk
- Rounding errors (protocol value leakage)
- Unregistered message handlers (functionality broken)

---

## 8. Testing Recommendations

### Non-Determinism Testing
```bash
# Build for different architectures
GOARCH=amd64 go build
GOARCH=arm64 go build

# Run same operations, compare state roots
# Must be identical across architectures

# Fuzz test with concurrent operations
go test -fuzz=FuzzEndBlocker -parallel=10
```

### ABCI Benchmarking

```go
func BenchmarkBeginBlocker(b *testing.B) {
    ctx := setupMaximalState()  // Worst-case state
    b.ResetTimer()

    for i := 0; i < b.N; i++ {
        BeginBlocker(ctx, keeper)
    }

    // Must complete in < 1 second
    require.Less(b, b.Elapsed()/time.Duration(b.N), time.Second)
}
```

### Invariant Testing

```go
// Run invariants in integration tests
func TestInvariants(t *testing.T) {
    app := setupApp()

    // Execute operations
    app.DeliverTx(...)

    // Check invariants
    _, broken := keeper.AllInvariants()(app.Ctx)
    require.False(t, broken, "invariant violation detected")
}
```

9. Additional Resources

Building Secure Contracts: building-secure-contracts/not-so-smart-contracts/cosmos/
Cosmos SDK Docs: https://docs.cosmos.network/
CodeQL for Go: https://codeql.github.com/docs/codeql-language-guides/codeql-for-go/
Cosmos Security Best Practices: https://github.com/cosmos/cosmos-sdk/blob/main/docs/docs/learn/advanced/17-determinism.md

10. Quick Reference Checklist

Before completing Cosmos chain audit:

Non-Determinism (CRITICAL):

No map iteration in consensus code
No platform-dependent types (int, uint, float)
No goroutines in message handlers/ABCI
No select statements with multiple channels
No rand, time.Now(), memory addresses
All serialization is deterministic

ABCI Methods (CRITICAL):

BeginBlocker/EndBlocker computationally bounded
No unbounded iterations
No nested loops over large collections
All panic-prone operations validated
Benchmarked with maximum state

Message Handling (HIGH):

GetSigners() matches handler address usage
All error returns checked
Critical messages prioritized in CheckTx
All message types registered

Arithmetic & Accounting (MEDIUM):

Multiply before divide pattern used
Rounding favors protocol
Custom bookkeeping synced with x/bank
Invariant checks implemented
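
A bookkeeping invariant of the kind listed above can be sketched as follows. This Python model is an illustration only: the ledger, the balances, and `bookkeeping_invariant` are hypothetical, and the `(message, broken)` return shape simply mirrors the `_, broken := ...` usage in the invariant test earlier in this skill.

```python
def bookkeeping_invariant(ledger: dict[str, int], bank_balance: int) -> tuple[str, bool]:
    """Compare the module's own ledger total with the module account's bank balance."""
    tracked = sum(ledger.values())
    broken = tracked != bank_balance
    return f"ledger total {tracked}, bank balance {bank_balance}", broken


ledger = {"alice": 70, "bob": 30}
bank_balance = 100
print(bookkeeping_invariant(ledger, bank_balance))

# Someone sends coins straight to the module account, bypassing the module's ledger.
bank_balance += 25
print(bookkeeping_invariant(ledger, bank_balance))
```

The second call reports a broken invariant because the bank balance changed without a matching ledger update, which is the accounting mismatch the checklist is meant to catch.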

Testing:

Cross-architecture builds tested
ABCI methods benchmarked
Invariants checked in CI
Integration tests cover all messages

/guidelines-advisor

Source: `~/.claude/skills/tob-building-secure-contracts/skills/guidelines-advisor/SKILL.md`

name: guidelines-advisor description: Smart contract development advisor based on Trail of Bits' best practices. Analyzes codebase to generate documentation/specifications, review architecture, check upgradeability patterns, assess implementation quality, identify pitfalls, review dependencies, and evaluate testing. Provides actionable recommendations.

Guidelines Advisor

Purpose

Systematically analyzes the codebase and provides guidance based on Trail of Bits' development guidelines:

Generate documentation and specifications (plain English descriptions, architectural diagrams, code documentation)
Optimize on-chain/off-chain architecture (only if applicable)
Review upgradeability patterns (if your project has upgrades)
Check delegatecall/proxy implementations (if present)
Assess implementation quality (functions, inheritance, events)
Identify common pitfalls
Review dependencies
Evaluate test suite and suggest improvements

Framework: Building Secure Contracts - Development Guidelines

How This Works

Phase 1: Discovery & Context

Explores the codebase to understand:

Project structure and platform
Contract/module files and their purposes
Existing documentation
Architecture patterns (proxies, upgrades, etc.)
Testing setup
Dependencies

Phase 2: Documentation Generation

Helps create:

Plain English system description
Architectural diagrams (using Slither printers for Solidity)
Code documentation recommendations (NatSpec for Solidity)

Phase 3: Architecture Analysis

Analyzes:

On-chain vs off-chain component distribution (if applicable)
Upgradeability approach (if applicable)
Delegatecall proxy patterns (if present)

Phase 4: Implementation Review

Assesses:

Function composition and clarity
Inheritance structure
Event logging practices
Common pitfalls presence
Dependencies quality
Testing coverage and techniques

Phase 5: Recommendations

Provides:

Prioritized improvement suggestions
Best practice guidance
Actionable next steps

Assessment Areas

I analyze 11 comprehensive areas covering all aspects of smart contract development. For detailed criteria, best practices, and specific checks, see ASSESSMENT_AREAS.md.

Quick Reference:

Documentation & Specifications
- Plain English system descriptions
- Architectural diagrams
- NatSpec completeness (Solidity)
- Documentation gaps identification
On-Chain vs Off-Chain Computation
- Complexity analysis
- Gas optimization opportunities
- Verification vs computation patterns
Upgradeability
- Migration vs upgradeability trade-offs
- Data separation patterns
- Upgrade procedure documentation
Delegatecall Proxy Pattern
- Storage layout consistency
- Initialization patterns
- Function shadowing risks
- Slither upgradeability checks
Function Composition
- Function size and clarity
- Logical grouping
- Modularity assessment
Inheritance
- Hierarchy depth/width
- Diamond problem risks
- Inheritance visualization
Events
- Critical operation coverage
- Event naming consistency
- Indexed parameters
Common Pitfalls
- Reentrancy patterns
- Integer overflow/underflow
- Access control issues
- Platform-specific vulnerabilities
Dependencies
- Library quality assessment
- Version management
- Dependency manager usage
- Copied code detection
Testing & Verification
- Coverage analysis
- Fuzzing techniques
- Formal verification
- CI/CD integration
Platform-Specific Guidance
- Solidity version recommendations
- Compiler warning checks
- Inline assembly warnings
- Platform-specific tools

For complete details on each area including what I'll check, analyze, and recommend, see ASSESSMENT_AREAS.md.

Example Output

When the analysis is complete, you'll receive comprehensive guidance covering:

System documentation with plain English descriptions
Architectural diagrams and documentation gaps
Architecture analysis (on-chain/off-chain, upgradeability, proxies)
Implementation review (functions, inheritance, events, pitfalls)
Dependencies and testing evaluation
Prioritized recommendations (CRITICAL, HIGH, MEDIUM, LOW)
Overall assessment and path to production

For a complete example analysis report, see EXAMPLE_REPORT.md.

Deliverables

I provide four comprehensive deliverable categories:

1. System Documentation

Plain English descriptions
Architectural diagrams
Documentation gaps analysis

2. Architecture Analysis

On-chain/off-chain assessment
Upgradeability review
Proxy pattern security review

3. Implementation Review

Function composition analysis
Inheritance assessment
Events coverage
Pitfall identification
Dependencies evaluation
Testing analysis

4. Prioritized Recommendations

CRITICAL (address immediately)
HIGH (address before deployment)
MEDIUM (address for production quality)
LOW (nice to have)

For detailed templates and examples of each deliverable, see DELIVERABLES.md.

Assessment Process

When invoked, I will:

Explore the codebase
- Identify all contract/module files
- Find existing documentation
- Locate test files
- Check for proxies/upgrades
- Identify dependencies
Generate documentation
- Create plain English system description
- Generate architectural diagrams (if tools available)
- Identify documentation gaps
Analyze architecture
- Assess on-chain/off-chain distribution (if applicable)
- Review upgradeability approach (if applicable)
- Audit proxy patterns (if present)
Review implementation
- Analyze functions, inheritance, events
- Check for common pitfalls
- Assess dependencies
- Evaluate testing
Provide recommendations
- Present findings with file references
- Ask clarifying questions about design decisions
- Suggest prioritized improvements
- Offer actionable next steps

Rationalizations (Do Not Skip)

Rationalization	Why It's Wrong	Required Action
"System is simple, description covers everything"	Plain English descriptions miss security-critical details	Complete all 5 phases: documentation, architecture, implementation, dependencies, recommendations
"No upgrades detected, skip upgradeability section"	Upgradeability can be implicit (ownable patterns, delegatecall)	Search for proxy patterns, delegatecall, storage collisions before declaring N/A
"Not applicable" without verification	Premature scope reduction misses vulnerabilities	Verify with explicit codebase search before skipping any guideline section
"Architecture is straightforward, no analysis needed"	Obvious architectures have subtle trust boundaries	Analyze on-chain/off-chain distribution, access control flow, external dependencies
"Common pitfalls don't apply to this codebase"	Every codebase has common pitfalls	Systematically check all guideline pitfalls with grep/code search
"Tests exist, testing guideline is satisfied"	Test existence ≠ test quality	Check coverage, property-based tests, integration tests, failure cases
"I can provide generic best practices"	Generic advice isn't actionable	Provide project-specific findings with file:line references
"User knows what to improve from findings"	Findings without prioritization = no action plan	Generate prioritized improvement roadmap with specific next steps

Notes

I'll only analyze relevant sections (won't hallucinate about upgrades if not present)
I'll adapt to your platform (Solidity, Rust, Cairo, etc.)
I'll use available tools (Slither, etc.) but work without them if unavailable
I'll provide file references and line numbers for all findings
I'll ask questions about design decisions I can't infer from code

Ready to Begin

What I'll need:

Access to your codebase
Context about your project goals
Any existing documentation or specifications
Information about deployment plans

Let's analyze your codebase and improve it using Trail of Bits' best practices!

/secure-workflow-guide

Source: `~/.claude/skills/tob-building-secure-contracts/skills/secure-workflow-guide/SKILL.md`

name: secure-workflow-guide description: Guides through Trail of Bits' 5-step secure development workflow. Runs Slither scans, checks special features (upgradeability/ERC conformance/token integration), generates visual security diagrams, helps document security properties for fuzzing/verification, and reviews manual security areas.

Secure Workflow Guide

Purpose

Guides through Trail of Bits' secure development workflow - a 5-step process to enhance smart contract security throughout development.

Use this: On every check-in, before deployment, or when you want a security review

The 5-Step Workflow

The workflow consists of the following five steps:

Step 1: Check for Known Security Issues

Run Slither with 70+ built-in detectors to find common vulnerabilities:

Parse findings by severity
Explain each issue with file references
Recommend fixes
Help triage false positives

Goal: Clean Slither report or documented triages

Step 2: Check Special Features

Detect and validate applicable features:

Upgradeability: slither-check-upgradeability (17 upgrade risks)
ERC conformance: slither-check-erc (6 common specs)
Token integration: Recommend token-integration-analyzer skill
Security properties: slither-prop for ERC20

Note: Only runs checks that apply to your codebase

Step 3: Visual Security Inspection

Generate 3 security diagrams:

Inheritance graph: Identify shadowing and C3 linearization issues
Function summary: Show visibility and access controls
Variables and authorization: Map who can write to state variables

Review each diagram for security concerns

Step 4: Document Security Properties

Help document critical security properties:

State machine transitions and invariants
Access control requirements
Arithmetic constraints and precision
External interaction safety
Standards conformance

Then set up testing:

Echidna: Property-based fuzzing with invariants
Manticore: Formal verification with symbolic execution
Custom Slither checks: Project-specific business logic

Note: Most important activity for security
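
As an illustration of turning a documented property into an executable invariant, here is a minimal sketch in Rust. It is not Echidna syntax: the `Ledger` type, its methods, and the `fuzz_invariant` driver are hypothetical names chosen for this example, and the pseudo-random generator is a plain xorshift loop rather than a real fuzzer. In an Echidna setup the same supply-conservation property would be written as a Solidity property function.

```rust
use std::collections::HashMap;

/// Toy token ledger (hypothetical, for illustration only).
pub struct Ledger {
    balances: HashMap<u8, u64>,
    total_supply: u64,
}

impl Ledger {
    pub fn new(initial_holder: u8, supply: u64) -> Self {
        let mut balances = HashMap::new();
        balances.insert(initial_holder, supply);
        Ledger { balances, total_supply: supply }
    }

    pub fn balance_of(&self, who: u8) -> u64 {
        self.balances.get(&who).copied().unwrap_or(0)
    }

    /// Moves `amount` from `from` to `to`. All checks run before any write.
    pub fn transfer(&mut self, from: u8, to: u8, amount: u64) -> Result<(), &'static str> {
        let remaining = self
            .balance_of(from)
            .checked_sub(amount)
            .ok_or("insufficient balance")?;
        // For a self-transfer the credit must start from the debited balance.
        let to_balance = if from == to { remaining } else { self.balance_of(to) };
        let credited = to_balance.checked_add(amount).ok_or("balance overflow")?;
        self.balances.insert(from, remaining);
        self.balances.insert(to, credited);
        Ok(())
    }

    /// Invariant: the sum of all balances equals the total supply.
    pub fn supply_is_conserved(&self) -> bool {
        self.balances.values().sum::<u64>() == self.total_supply
    }
}

/// Applies `steps` pseudo-random transfers and checks the invariant after each.
/// Returns the first step at which the invariant failed, if any.
pub fn fuzz_invariant(seed: u64, steps: u32) -> Option<u32> {
    let mut state = seed | 1; // xorshift state must be non-zero
    let mut next = || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        state
    };
    let mut ledger = Ledger::new(0, 1_000_000);
    for step in 0..steps {
        let from = (next() % 4) as u8;
        let to = (next() % 4) as u8;
        let amount = next() % 2_000_000;
        let _ = ledger.transfer(from, to, amount); // rejected transfers are fine
        if !ledger.supply_is_conserved() {
            return Some(step);
        }
    }
    None
}
```

The point of the sketch is the shape: state the invariant once as a boolean function, then let a generator drive arbitrary call sequences against it.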

Step 5: Manual Review Areas

Analyze areas automated tools miss:

Privacy: On-chain secrets, commit-reveal needs
Front-running: Slippage protection, ordering risks, MEV
Cryptography: Weak randomness, signature issues, hash collisions
DeFi interactions: Oracle manipulation, flash loans, protocol assumptions

Search codebase for these patterns and flag risks
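
To make the slippage-protection item concrete, the sketch below models a swap that refuses to execute when the quoted output falls below a caller-supplied minimum. It assumes a fee-free constant-product pool; `quote_out`, `swap_with_min_out`, and `SwapError` are hypothetical names for this example and do not come from any particular protocol.

```rust
#[derive(Debug, PartialEq)]
pub enum SwapError {
    MathOverflow,
    EmptyPool,
    SlippageExceeded { quoted: u128, min_out: u128 },
}

/// Output amount for a constant-product pool (x * y = k), ignoring fees.
pub fn quote_out(reserve_in: u128, reserve_out: u128, amount_in: u128) -> Result<u128, SwapError> {
    let numerator = amount_in
        .checked_mul(reserve_out)
        .ok_or(SwapError::MathOverflow)?;
    let denominator = reserve_in
        .checked_add(amount_in)
        .ok_or(SwapError::MathOverflow)?;
    if denominator == 0 {
        return Err(SwapError::EmptyPool);
    }
    Ok(numerator / denominator)
}

/// Executes the quote only if it meets the caller's minimum acceptable output.
pub fn swap_with_min_out(
    reserve_in: u128,
    reserve_out: u128,
    amount_in: u128,
    min_out: u128,
) -> Result<u128, SwapError> {
    let quoted = quote_out(reserve_in, reserve_out, amount_in)?;
    if quoted < min_out {
        // Reserves moved against the caller (for example, after a front-run).
        return Err(SwapError::SlippageExceeded { quoted, min_out });
    }
    Ok(quoted)
}
```

When reviewing, a swap entry point that has no equivalent of `min_out`, or that hardcodes it to zero, is worth flagging.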

For detailed instructions, commands, and explanations for each step, see WORKFLOW_STEPS.md.

How I Work

When invoked, I will:

Explore your codebase to understand structure
Run Step 1: Slither security scan
Detect and run Step 2: Special feature checks (only what applies)
Generate Step 3: Visual security diagrams
Guide Step 4: Security property documentation
Analyze Step 5: Manual review areas
Provide action plan: Prioritized fixes and next steps

Adapts based on:

What tools you have installed
What's applicable to your project
Where you are in development

Rationalizations (Do Not Skip)

Rationalization	Why It's Wrong	Required Action
"Slither not available, I'll check manually"	Manual checking misses 70+ detector patterns	Install and run Slither, or document why it's blocked
"Can't generate diagrams, I'll describe the architecture"	Descriptions aren't visual - diagrams reveal patterns text misses	Execute slither --print commands, generate actual visual outputs
"No upgrades detected, skip upgradeability checks"	Proxies and upgrades are often implicit or planned	Verify with codebase search before skipping Step 2 checks
"Not a token, skip ERC checks"	Tokens can be integrated without obvious ERC inheritance	Check for token interactions, transfers, balances before skipping
"Can't set up Echidna now, suggesting it for later"	Property-based testing is Step 4, not optional	Document properties now, set up fuzzing infrastructure
"No DeFi interactions, skip oracle/flash loan checks"	DeFi patterns appear in unexpected places (price feeds, external calls)	Complete Step 5 manual review, search codebase for patterns
"This step doesn't apply to my project"	"Not applicable" without verification = missed vulnerabilities	Verify with explicit codebase search before declaring N/A
"I'll provide generic security advice instead of running workflow"	Generic advice isn't actionable, workflow finds specific issues	Execute all 5 steps, generate project-specific findings with file:line references

Example Output

When I complete the workflow, you'll get a comprehensive security report covering:

Step 1: Slither findings with severity, file references, and fix recommendations
Step 2: Special feature validation results (upgradeability, ERC conformance, etc.)
Step 3: Visual diagrams analyzing inheritance, functions, and state variable authorization
Step 4: Documented security properties and testing setup (Echidna/Manticore)
Step 5: Manual review findings (privacy, front-running, cryptography, DeFi risks)
Action plan: Critical/high/medium priority tasks with effort estimates
Workflow checklist: Progress on all 5 steps

For a complete example workflow report, see EXAMPLE_REPORT.md.

What You'll Get

Security Report:

Slither findings with severity and fixes
Special feature validation results
Visual diagrams (PNG/PDF)
Manual review findings

Action Plan:

Critical issues to fix immediately
Security properties to document
Testing to set up (Echidna/Manticore)
Manual areas to review

Workflow Checklist:

Progress on all 5 steps

Getting Help

Trail of Bits Resources:

Office Hours: Every Tuesday (schedule)
Empire Hacking Slack: #crytic and #ethereum channels

Other Security:

Remember: Security is about more than smart contracts
Off-chain security (owner keys, infrastructure) is equally critical

Ready to Start

Let me know when you're ready and I'll run through the workflow with your codebase!

/solana-vulnerability-scanner

Source: `~/.claude/skills/tob-building-secure-contracts/skills/solana-vulnerability-scanner/SKILL.md`

name: solana-vulnerability-scanner description: Scans Solana programs for 6 critical vulnerabilities including arbitrary CPI, improper PDA validation, missing signer/ownership checks, and sysvar spoofing. Use when auditing Solana/Anchor programs.

Solana Vulnerability Scanner

1. Purpose

Systematically scan Solana programs (native and Anchor framework) for platform-specific security vulnerabilities related to cross-program invocations, account validation, and program-derived addresses. This skill encodes 6 critical vulnerability patterns unique to Solana's account model.

2. When to Use This Skill

Auditing Solana programs (native Rust or Anchor)
Reviewing cross-program invocation (CPI) logic
Validating program-derived address (PDA) implementations
Pre-launch security assessment of Solana protocols
Reviewing account validation patterns
Assessing instruction introspection logic

3. Platform Detection

File Extensions & Indicators

Rust files: .rs

Language/Framework Markers

// Native Solana program indicators
use solana_program::{
    account_info::AccountInfo,
    entrypoint,
    entrypoint::ProgramResult,
    pubkey::Pubkey,
    program::invoke,
    program::invoke_signed,
};

entrypoint!(process_instruction);

// Anchor framework indicators
use anchor_lang::prelude::*;

#[program]
pub mod my_program {
    pub fn initialize(ctx: Context<Initialize>) -> Result<()> {
        // Program logic
    }
}

#[derive(Accounts)]
pub struct Initialize<'info> {
    #[account(mut)]
    pub authority: Signer<'info>,
}

// Common patterns
AccountInfo, Pubkey
invoke(), invoke_signed()
Signer<'info>, Account<'info>
#[account(...)] with constraints
seeds, bump

Project Structure

programs/*/src/lib.rs - Program implementation
Anchor.toml - Anchor configuration
Cargo.toml with solana-program or anchor-lang
tests/ - Program tests

Tool Support

Trail of Bits Solana Lints: Rust linters for Solana
Installation: Add to Cargo.toml
anchor test: Built-in testing framework
Solana Test Validator: Local testing environment

4. How This Skill Works

When invoked, I will:

Search your codebase for Solana/Anchor programs
Analyze each program for the 6 vulnerability patterns
Report findings with file references and severity
Provide fixes for each identified issue
Check account validation and CPI security

5. Vulnerability Patterns (6 Patterns)

I check for 6 critical vulnerability patterns unique to Solana. For detailed detection patterns, code examples, mitigations, and testing strategies, see VULNERABILITY_PATTERNS.md.

Pattern Summary:

Arbitrary CPI ⚠️ CRITICAL - User-controlled program IDs in CPI calls
Improper PDA Validation ⚠️ CRITICAL - Using create_program_address without canonical bump
Missing Ownership Check ⚠️ HIGH - Deserializing accounts without owner validation
Missing Signer Check ⚠️ CRITICAL - Authority operations without is_signer check
Sysvar Account Check ⚠️ HIGH - Spoofed sysvar accounts (pre-Solana 1.8.1)
Improper Instruction Introspection ⚠️ MEDIUM - Absolute indexes allowing reuse

For complete vulnerability patterns with code examples, see VULNERABILITY_PATTERNS.md.

5. Scanning Workflow

Step 1: Platform Identification

Verify Solana program (native or Anchor)
Check Solana version (1.8.1+ for sysvar security)
Locate program source (programs/*/src/lib.rs)
Identify framework (native vs Anchor)

Step 2: CPI Security Review

# Find all CPI calls
rg "invoke\(|invoke_signed\(" programs/

# Check for program ID validation before each
# Should see program ID checks immediately before invoke

For each CPI:

Program ID validated before invocation
Cannot pass user-controlled program accounts
Anchor: Uses Program<'info, T> type
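
The first checklist item, validating the program ID before invoking, can be sketched with a std-only model. This does not use the `solana_program` crate: `Key` stands in for `Pubkey`, `EXPECTED_TOKEN_PROGRAM` is a made-up constant, and `checked_invoke` is a hypothetical wrapper that takes the CPI as a closure. It only shows the ordering of check and call.

```rust
/// 32-byte public key, standing in for `Pubkey` (std-only model).
pub type Key = [u8; 32];

/// Hypothetical allowlisted program ID. A real program would compare against
/// the actual ID of the program it intends to call.
pub const EXPECTED_TOKEN_PROGRAM: Key = [7u8; 32];

#[derive(Debug, PartialEq)]
pub enum CpiError {
    IncorrectProgramId,
}

/// Runs `call` (the CPI) only after the target program ID has been validated.
pub fn checked_invoke<F>(program_id: &Key, call: F) -> Result<(), CpiError>
where
    F: FnOnce() -> Result<(), CpiError>,
{
    if program_id != &EXPECTED_TOKEN_PROGRAM {
        // Reject before any attacker-supplied program can run.
        return Err(CpiError::IncorrectProgramId);
    }
    call()
}
```

In a review, each `invoke` or `invoke_signed` hit from the search above should have an equivalent comparison on the path leading to it.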

Step 3: PDA Validation Check

# Find PDA usage
rg "find_program_address|create_program_address" programs/
rg "seeds.*bump" programs/

# Anchor: Check for seeds constraints
rg "#\[account.*seeds" programs/

For each PDA:

Uses find_program_address() or Anchor seeds constraint
Bump seed stored and reused
Not using user-provided bump
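
The canonical-bump rule can be illustrated without the Solana SDK. In the sketch below, the off-curve test that decides whether a bump yields a valid PDA is abstracted into a caller-supplied predicate, and `find_canonical_bump` and `require_canonical_bump` are hypothetical helper names. The search order (starting at 255 and counting down) mirrors how `find_program_address` picks the canonical bump.

```rust
#[derive(Debug, PartialEq)]
pub enum PdaError {
    NoValidBump,
    NonCanonicalBump { expected: u8, got: u8 },
}

/// Returns the canonical bump: the first value, counting down from 255,
/// for which `is_valid_pda(bump)` holds.
pub fn find_canonical_bump(is_valid_pda: impl Fn(u8) -> bool) -> Option<u8> {
    (0..=u8::MAX).rev().find(|&bump| is_valid_pda(bump))
}

/// Accepts a caller-supplied bump only if it equals the canonical one.
/// A bump that derives a valid address but is not canonical is rejected,
/// because it would let several addresses map to the same seeds.
pub fn require_canonical_bump(
    user_bump: u8,
    is_valid_pda: impl Fn(u8) -> bool,
) -> Result<u8, PdaError> {
    let canonical = find_canonical_bump(is_valid_pda).ok_or(PdaError::NoValidBump)?;
    if user_bump != canonical {
        return Err(PdaError::NonCanonicalBump { expected: canonical, got: user_bump });
    }
    Ok(canonical)
}
```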

Step 4: Account Validation Sweep

# Find account deserialization
rg "try_from_slice|try_deserialize" programs/

# Should see owner checks before deserialization
rg "\.owner\s*==|\.owner\s*!=" programs/

For each account used:

Owner validated before deserialization
Signer check for authority accounts
Anchor: Uses Account<'info, T> and Signer<'info>
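
The ordering these checks require (owner first, signer next, deserialization last) looks roughly like the following std-only model. `AccountModel` is a simplified stand-in for `AccountInfo`, and `load_balance` with its 8-byte little-endian layout is an assumption made for this example only.

```rust
/// 32-byte public key, standing in for `Pubkey` (std-only model).
pub type Key = [u8; 32];

/// Simplified stand-in for `AccountInfo`.
pub struct AccountModel {
    pub owner: Key,
    pub is_signer: bool,
    pub data: Vec<u8>,
}

#[derive(Debug, PartialEq)]
pub enum ValidationError {
    IllegalOwner,
    MissingRequiredSignature,
    InvalidAccountData,
}

/// Decodes a little-endian u64 from `state` only after the owner and
/// signer checks have passed.
pub fn load_balance(
    state: &AccountModel,
    authority: &AccountModel,
    program_id: &Key,
) -> Result<u64, ValidationError> {
    // 1. Owner check: data from an account owned by another program is untrusted.
    if &state.owner != program_id {
        return Err(ValidationError::IllegalOwner);
    }
    // 2. Signer check: the authority must have signed the transaction.
    if !authority.is_signer {
        return Err(ValidationError::MissingRequiredSignature);
    }
    // 3. Only now interpret the bytes.
    let prefix = state.data.get(..8).ok_or(ValidationError::InvalidAccountData)?;
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(prefix);
    Ok(u64::from_le_bytes(bytes))
}
```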

Step 5: Instruction Introspection Review

# Find instruction introspection usage
rg "load_instruction_at|load_current_index|get_instruction_relative" programs/

# Check for checked versions
rg "load_instruction_at_checked|load_current_index_checked" programs/

Using checked functions (Solana 1.8.1+)
Using relative indexing
Proper correlation validation
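
Why relative indexing matters can be shown with a small model. The sketch below does not use the real instructions sysvar: `Ix`, `resolve_relative_index`, and `mint_is_paid_for` are hypothetical names, and the rule being enforced (a mint must be directly preceded by a transfer of the same amount) is an invented example of correlation between instructions.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Ix {
    Transfer { amount: u64 },
    Mint { amount: u64 },
}

/// Resolves `current + offset` to an absolute index, rejecting anything
/// outside the transaction's instruction list.
pub fn resolve_relative_index(current: usize, offset: isize, instruction_count: usize) -> Option<usize> {
    let target = if offset >= 0 {
        current.checked_add(offset as usize)?
    } else {
        current.checked_sub(offset.unsigned_abs())?
    };
    if target < instruction_count { Some(target) } else { None }
}

/// A mint at `current` is accepted only if the instruction directly before
/// it is a transfer of the same amount.
pub fn mint_is_paid_for(instructions: &[Ix], current: usize) -> bool {
    let minted = match instructions.get(current) {
        Some(Ix::Mint { amount }) => *amount,
        _ => return false,
    };
    // Relative lookup: each mint is tied to its own neighbor, so one transfer
    // cannot be reused to justify several mints.
    match resolve_relative_index(current, -1, instructions.len()).map(|i| instructions[i]) {
        Some(Ix::Transfer { amount }) => amount == minted,
        _ => false,
    }
}
```

With an absolute index (always checking position 0), the second mint in `[Transfer, Mint, Mint]` would also pass; with the relative lookup it is rejected.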

Step 6: Trail of Bits Solana Lints

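# Add to Cargo.toml
[dependencies]
solana-program = "1.17"  # Use latest version

# Trail of Bits solana-lints are Dylint libraries rather than Clippy lints,
# so they are registered under workspace metadata instead of [lints.clippy]
[workspace.metadata.dylint]
libraries = [
    { git = "https://github.com/trailofbits/solana-lints", pattern = "lints/*" },
]

# Run with: cargo dylint --all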

6. Reporting Format

Finding Template

## [CRITICAL] Arbitrary CPI - Unchecked Program ID

**Location**: `programs/vault/src/lib.rs:145-160` (withdraw function)

**Description**:
The `withdraw` function performs a CPI to transfer SPL tokens without validating that the provided `token_program` account is actually the SPL Token program. An attacker can provide a malicious program that appears to perform a transfer but actually steals tokens or performs unauthorized actions.

**Vulnerable Code**:
```rust
// lib.rs, line 145
pub fn withdraw(ctx: Context<Withdraw>, amount: u64) -> Result<()> {
    let token_program = &ctx.accounts.token_program;

    // WRONG: No validation of token_program.key()!
    invoke(
        &spl_token::instruction::transfer(...),
        &[
            ctx.accounts.vault.to_account_info(),
            ctx.accounts.destination.to_account_info(),
            ctx.accounts.authority.to_account_info(),
            token_program.to_account_info(),  // UNVALIDATED
        ],
    )?;
    Ok(())
}
```

Attack Scenario:

Attacker deploys malicious "token program" that logs transfer instruction but doesn't execute it
Attacker calls withdraw() providing malicious program as token_program
Vault's authority signs the transaction
Malicious program receives CPI with vault's signature
Malicious program can now impersonate vault and drain real tokens

Recommendation: Use Anchor's Program<'info, Token> type:

use anchor_lang::prelude::*;
use anchor_spl::token::{Token, TokenAccount, Transfer};

#[derive(Accounts)]
pub struct Withdraw<'info> {
    #[account(mut)]
    pub vault: Account<'info, TokenAccount>,
    #[account(mut)]
    pub destination: Account<'info, TokenAccount>,
    pub authority: Signer<'info>,
    pub token_program: Program<'info, Token>,  // Validates program ID automatically
}

pub fn withdraw(ctx: Context<Withdraw>, amount: u64) -> Result<()> {
    let cpi_accounts = Transfer {
        from: ctx.accounts.vault.to_account_info(),
        to: ctx.accounts.destination.to_account_info(),
        authority: ctx.accounts.authority.to_account_info(),
    };

    let cpi_ctx = CpiContext::new(
        ctx.accounts.token_program.to_account_info(),
        cpi_accounts,
    );

    anchor_spl::token::transfer(cpi_ctx, amount)?;
    Ok(())
}

References:

building-secure-contracts/not-so-smart-contracts/solana/arbitrary_cpi
Trail of Bits lint: unchecked-cpi-program-id


7. Priority Guidelines

Critical (Immediate Fix Required)

Arbitrary CPI (attacker-controlled program execution)
Improper PDA validation (account spoofing)
Missing signer check (unauthorized access)

High (Fix Before Launch)

Missing ownership check (fake account data)
Sysvar account check (authentication bypass, pre-1.8.1)

Medium (Address in Audit)

Improper instruction introspection (logic bypass)

8. Testing Recommendations

Unit Tests

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    #[should_panic]
    fn test_rejects_wrong_program_id() {
        // Provide wrong program ID, should fail
    }

    #[test]
    #[should_panic]
    fn test_rejects_non_canonical_pda() {
        // Provide non-canonical bump, should fail
    }

    #[test]
    #[should_panic]
    fn test_requires_signer() {
        // Call without signature, should fail
    }
}

Integration Tests (Anchor)

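import * as anchor from "@coral-xyz/anchor";
import { assert } from "chai";

// `program` and `amount` are assumed to be set up elsewhere in the test suite
describe("security tests", () => {
  it("rejects arbitrary CPI", async () => {
    const fakeTokenProgram = anchor.web3.Keypair.generate();
    let rejected = false;

    try {
      await program.methods
        .withdraw(amount)
        .accounts({
          tokenProgram: fakeTokenProgram.publicKey, // Wrong program
        })
        .rpc();
    } catch (err) {
      rejected = true; // Expected to fail
    }

    // Assert outside the try block so the catch cannot swallow the failure
    assert.isTrue(rejected, "Should have rejected fake program");
  });
});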

Solana Test Validator

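# Run local validator for testing (in a separate terminal)
solana-test-validator

# Deploy and test the program against the already-running validator
# (plain `anchor test` starts its own local validator)
anchor test --skip-local-validator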

9. Additional Resources

Building Secure Contracts: building-secure-contracts/not-so-smart-contracts/solana/
Trail of Bits Solana Lints: https://github.com/trailofbits/solana-lints
Anchor Documentation: https://www.anchor-lang.com/
Solana Program Library: https://github.com/solana-labs/solana-program-library
Solana Cookbook: https://solanacookbook.com/

10. Quick Reference Checklist

Before completing Solana program audit:

CPI Security (CRITICAL):

ALL CPI calls validate program ID before invoke()
Cannot use user-provided program accounts
Anchor: Uses Program<'info, T> type

PDA Security (CRITICAL):

PDAs use find_program_address() or Anchor seeds constraint
Bump seed stored and reused (not user-provided)
PDA accounts validated against canonical address

Account Validation (HIGH):

ALL accounts check owner before deserialization
Native: Validates account.owner == expected_program_id
Anchor: Uses Account<'info, T> type

Signer Validation (CRITICAL):

ALL authority accounts check is_signer
Native: Validates account.is_signer == true
Anchor: Uses Signer<'info> type

Sysvar Security (HIGH):

Using Solana 1.8.1+
Using checked functions: load_instruction_at_checked()
Sysvar addresses validated

Instruction Introspection (MEDIUM):

Using relative indexes for correlation
Proper validation between related instructions
Cannot reuse same instruction across multiple calls

Testing:

Unit tests cover all account validation
Integration tests with malicious inputs
Local validator testing completed
Trail of Bits lints enabled and passing

/substrate-vulnerability-scanner

Source: `~/.claude/skills/tob-building-secure-contracts/skills/substrate-vulnerability-scanner/SKILL.md`

name: substrate-vulnerability-scanner description: Scans Substrate/Polkadot pallets for 7 critical vulnerabilities including arithmetic overflow, panic DoS, incorrect weights, and bad origin checks. Use when auditing Substrate runtimes or FRAME pallets.

Substrate Vulnerability Scanner

1. Purpose

Systematically scan Substrate runtime modules (pallets) for platform-specific security vulnerabilities that can cause node crashes, DoS attacks, or unauthorized access. This skill encodes 7 critical vulnerability patterns unique to Substrate/FRAME-based chains.

2. When to Use This Skill

Auditing custom Substrate pallets
Reviewing FRAME runtime code
Pre-launch security assessment of Substrate chains (Polkadot parachains, standalone chains)
Validating dispatchable extrinsic functions
Reviewing weight calculation functions
Assessing unsigned transaction validation logic

3. Platform Detection

File Extensions & Indicators

Rust files: .rs

Language/Framework Markers

// Substrate/FRAME indicators
#[pallet]
pub mod pallet {
    use frame_support::pallet_prelude::*;
    use frame_system::pallet_prelude::*;

    #[pallet::config]
    pub trait Config: frame_system::Config { }

    #[pallet::call]
    impl<T: Config> Pallet<T> {
        #[pallet::weight(10_000)]
        pub fn example_function(origin: OriginFor<T>) -> DispatchResult { }
    }
}

// Common patterns
DispatchResult, DispatchError
ensure!, ensure_signed, ensure_root
StorageValue, StorageMap, StorageDoubleMap
#[pallet::storage]
#[pallet::call]
#[pallet::weight]
#[pallet::validate_unsigned]

Project Structure

pallets/*/lib.rs - Pallet implementations
runtime/lib.rs - Runtime configuration
benchmarking.rs - Weight benchmarks
Cargo.toml with frame-* dependencies

Tool Support

cargo-fuzz: Fuzz testing for Rust
test-fuzz: Property-based testing framework
benchmarking framework: Built-in weight calculation
try-runtime: Runtime migration testing

4. How This Skill Works

When invoked, I will:

Search your codebase for Substrate pallets
Analyze each pallet for the 7 vulnerability patterns
Report findings with file references and severity
Provide fixes for each identified issue
Check weight calculations and origin validation

5. Vulnerability Patterns (7 Critical Patterns)

I check for 7 critical vulnerability patterns unique to Substrate/FRAME. For detailed detection patterns, code examples, mitigations, and testing strategies, see VULNERABILITY_PATTERNS.md.

Pattern Summary:

Arithmetic Overflow ⚠️ CRITICAL
- Direct +, -, *, / operators wrap in release mode
- Must use checked_* or saturating_* methods
- Affects balance/token calculations, reward/fee math
Don't Panic ⚠️ CRITICAL - DoS
- Panics cause node to stop processing blocks
- No unwrap(), expect(), array indexing without bounds check
- All user input must be validated with ensure!
Weights and Fees ⚠️ CRITICAL - DoS
- Incorrect weights allow spam attacks
- Fixed weights for variable-cost operations enable DoS
- Must use benchmarking framework, bound all input parameters
Verify First, Write Last ⚠️ HIGH (Pre-v0.9.25)
- Storage writes before validation persist on error (pre-v0.9.25)
- Pattern: validate → write → emit event
- Upgrade to v0.9.25+ or use manual #[transactional]
Unsigned Transaction Validation ⚠️ HIGH
- Insufficient validation allows spam/replay attacks
- Prefer signed transactions
- If unsigned: validate parameters, replay protection, authenticate source
Bad Randomness ⚠️ MEDIUM
- pallet_randomness_collective_flip vulnerable to collusion
- Must use BABE randomness (pallet_babe::RandomnessFromOneEpochAgo)
- Use random(subject) not random_seed()
Bad Origin ⚠️ CRITICAL
- ensure_signed allows any user for privileged operations
- Must use ensure_root or custom origins (ForceOrigin, AdminOrigin)
- Origin types must be properly configured in runtime

For complete vulnerability patterns with code examples, see VULNERABILITY_PATTERNS.md.
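
The first two patterns (arithmetic overflow and panics) come down to a handful of standard-library calls. The sketch below uses only `std`; the function names `pro_rata`, `balance_after_fee`, and `nth_validator` are hypothetical and exist only to show the checked, saturating, and bounds-checked forms side by side.

```rust
#[derive(Debug, PartialEq)]
pub enum MathError {
    Overflow,
    DivisionByZero,
    IndexOutOfBounds,
}

/// amount * share / total with explicit overflow and zero-divisor handling,
/// instead of `amount * share / total`, which can wrap or panic.
pub fn pro_rata(amount: u128, share: u128, total: u128) -> Result<u128, MathError> {
    amount
        .checked_mul(share)
        .ok_or(MathError::Overflow)?
        .checked_div(total)
        .ok_or(MathError::DivisionByZero)
}

/// Saturating form: clamps at zero rather than wrapping around.
pub fn balance_after_fee(balance: u64, fee: u64) -> u64 {
    balance.saturating_sub(fee)
}

/// Bounds-checked lookup instead of `validators[index]`, which panics
/// when the index is out of range.
pub fn nth_validator(validators: &[u32], index: usize) -> Result<u32, MathError> {
    validators.get(index).copied().ok_or(MathError::IndexOutOfBounds)
}
```

Checked operations suit cases where an overflow should abort the call with an error; saturating operations suit cases where clamping is the intended behavior.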

6. Scanning Workflow

Step 1: Platform Identification

Verify Substrate/FRAME framework usage
Check Substrate version (v0.9.25+ has transactional storage)
Locate pallet implementations (pallets/*/lib.rs)
Identify runtime configuration (runtime/lib.rs)

Step 2: Dispatchable Analysis

For each #[pallet::call] function:

Arithmetic: Uses checked/saturating operations?
Panics: No unwrap/expect/indexing?
Weights: Proportional to cost, bounded inputs?
Origin: Appropriate validation level?
Validation: All checks before storage writes?
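
The last question, whether all checks precede storage writes, is easiest to judge against a reference shape. The following std-only sketch models pallet storage with two `HashMap`s; `Storage`, `register_name`, `RegisterError`, and `MAX_NAME_LEN` are hypothetical names, not FRAME APIs.

```rust
use std::collections::HashMap;

/// Toy stand-in for pallet storage.
#[derive(Default)]
pub struct Storage {
    pub balances: HashMap<u32, u64>,
    pub names: HashMap<String, u32>,
}

#[derive(Debug, PartialEq)]
pub enum RegisterError {
    NameTooLong,
    NameTaken,
    InsufficientBalance,
}

pub const MAX_NAME_LEN: usize = 32;

/// Registers `name` for `who`, charging `fee`. Every check runs before the
/// first write, so an error never leaves storage half-updated.
pub fn register_name(
    storage: &mut Storage,
    who: u32,
    name: &str,
    fee: u64,
) -> Result<(), RegisterError> {
    // 1. Verify: bounded input, uniqueness, and funds.
    if name.len() > MAX_NAME_LEN {
        return Err(RegisterError::NameTooLong);
    }
    if storage.names.contains_key(name) {
        return Err(RegisterError::NameTaken);
    }
    let balance = storage.balances.get(&who).copied().unwrap_or(0);
    let remaining = balance
        .checked_sub(fee)
        .ok_or(RegisterError::InsufficientBalance)?;

    // 2. Write: nothing below this line can fail.
    storage.balances.insert(who, remaining);
    storage.names.insert(name.to_string(), who);
    Ok(())
}
```

A dispatchable that interleaves writes with fallible checks deviates from this shape and deserves a closer look, especially on versions without transactional storage.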

Step 3: Panic Sweep

# Search for panic-prone patterns
rg "unwrap\(\)" pallets/
rg "expect\(" pallets/
rg "\[.*\]" pallets/  # Array indexing
rg " as u\d+" pallets/  # Type casts
rg "\.unwrap_or" pallets/

Step 4: Arithmetic Safety Check

# Find direct arithmetic
rg " \+ |\+=| - |-=| \* |\*=| / |/=" pallets/

# Should find checked/saturating alternatives instead
rg "checked_add|checked_sub|checked_mul|checked_div" pallets/
rg "saturating_add|saturating_sub|saturating_mul" pallets/

Step 5: Weight Analysis

Run benchmarking: cargo test --features runtime-benchmarks
Verify weights match computational cost
Check for bounded input parameters
Review weight calculation functions
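
As a point of comparison when reviewing weight functions, the sketch below shows the two properties to look for: a hard bound on the input and a weight that grows with it. The constants are invented for illustration and are not benchmark results; in a real pallet they would come from the benchmarking framework. `batch_weight` is a hypothetical name.

```rust
/// Illustrative values only; real weights come from benchmarking.
pub const MAX_ITEMS: usize = 100;
pub const BASE_WEIGHT: u64 = 10_000;
pub const WEIGHT_PER_ITEM: u64 = 2_500;

/// Weight of a batch call: a fixed base cost plus a per-item cost.
/// Returns `None` when the batch exceeds the allowed bound, so oversized
/// inputs are rejected rather than processed at a flat price.
pub fn batch_weight(item_count: usize) -> Option<u64> {
    if item_count > MAX_ITEMS {
        return None;
    }
    (item_count as u64)
        .checked_mul(WEIGHT_PER_ITEM)?
        .checked_add(BASE_WEIGHT)
}
```

A fixed weight attached to a call that loops over a caller-supplied `Vec` is the opposite of this shape and is the case the weights pattern warns about.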

Step 6: Origin & Privilege Review

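# Find privileged operations guarded only by ensure_signed
# (-B 5 includes the preceding lines, where the function name usually appears)
rg -B 5 "ensure_signed" pallets/ | grep -E "fn [a-z_]*(pause|emergency|admin|force|sudo)"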

# Should use ensure_root or custom origins
rg "ensure_root|ForceOrigin|AdminOrigin" pallets/

Step 7: Testing Review

Unit tests cover all dispatchables
Fuzz tests for panic conditions
Benchmarks for weight calculation
try-runtime tests for migrations

7. Priority Guidelines

Critical (Immediate Fix Required)

Arithmetic overflow (token creation, balance manipulation)
Panic DoS (node crash risk)
Bad origin (unauthorized privileged operations)

High (Fix Before Launch)

Incorrect weights (DoS via spam)
Verify-first violations (state corruption, pre-v0.9.25)
Unsigned validation issues (spam, replay attacks)

Medium (Address in Audit)

Bad randomness (manipulation possible but limited impact)

8. Testing Recommendations

Fuzz Testing

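// Use test-fuzz to fuzz dispatchables for panics.
// AccountId, Call, and Origin are placeholders for the runtime's concrete types;
// test-fuzz requires argument types to implement Serialize and Deserialize.
#[cfg(test)]
mod tests {
    use super::*;
    use test_fuzz::test_fuzz;

    #[test_fuzz]
    fn fuzz_transfer(from: AccountId, to: AccountId, amount: u128) {
        // Should never panic; returning an error is acceptable
        let _ = Pallet::transfer(from, to, amount);
    }

    #[test_fuzz]
    fn fuzz_no_panics(call: Call, origin: Origin) {
        // No dispatchable should panic
        let _ = call.dispatch(origin);
    }
}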

Benchmarking

# Run benchmarks to generate weights
cargo build --release --features runtime-benchmarks
./target/release/node benchmark pallet \
    --chain dev \
    --pallet pallet_example \
    --extrinsic "*" \
    --steps 50 \
    --repeat 20

try-runtime

# Test runtime upgrades
cargo build --release --features try-runtime
try-runtime --runtime ./target/release/wbuild/runtime.wasm \
    on-runtime-upgrade live --uri wss://rpc.polkadot.io

9. Additional Resources

Building Secure Contracts: building-secure-contracts/not-so-smart-contracts/substrate/
Substrate Documentation: https://docs.substrate.io/
FRAME Documentation: https://paritytech.github.io/substrate/master/frame_support/
test-fuzz: https://github.com/trailofbits/test-fuzz
Substrate StackExchange: https://substrate.stackexchange.com/

10. Quick Reference Checklist

Before completing Substrate pallet audit:

Arithmetic Safety (CRITICAL):

No direct +, -, *, / operators in dispatchables
All arithmetic uses checked_* or saturating_*
Type conversions use try_into() with error handling

Panic Prevention (CRITICAL):

No unwrap() or expect() in dispatchables
No direct array/slice indexing without bounds check
All user inputs validated with ensure!
Division operations check for zero divisor

Weights & DoS (CRITICAL):

Weights proportional to computational cost
Input parameters have maximum bounds
Benchmarking used to determine weights
No free (zero-weight) expensive operations

Access Control (CRITICAL):

Privileged operations use ensure_root or custom origins
ensure_signed only for user-level operations
Origin types properly configured in runtime
Sudo pallet removed before production

Storage Safety (HIGH):

Using Substrate v0.9.25+ OR manual #[transactional]
Validation before storage writes
Events emitted after successful operations

Other (MEDIUM):

Unsigned transactions use signed alternative if possible
If unsigned: proper validation, replay protection, authentication
BABE randomness used (not RandomnessCollectiveFlip)
Randomness uses random(subject) not random_seed()

Testing:

Unit tests for all dispatchables
Fuzz tests to find panics
Benchmarks generated and verified
try-runtime tests for migrations

/token-integration-analyzer

Source: `~/.claude/skills/tob-building-secure-contracts/skills/token-integration-analyzer/SKILL.md`

name: token-integration-analyzer description: Token integration and implementation analyzer based on Trail of Bits' token integration checklist. Analyzes token implementations for ERC20/ERC721 conformity, checks for 20+ weird token patterns, assesses contract composition and owner privileges, performs on-chain scarcity analysis, and evaluates how protocols handle non-standard tokens. Context-aware for both token implementations and token integrations.

Token Integration Analyzer

Purpose

Systematically analyzes the codebase for token-related security concerns using Trail of Bits' token integration checklist:

Token Implementations: Analyze if your token follows ERC20/ERC721 standards or has non-standard behavior
Token Integrations: Analyze how your protocol handles arbitrary tokens, including weird/non-standard tokens
On-chain Analysis: Query deployed contracts for scarcity, distribution, and configuration
Security Assessment: Identify risks from 20+ known weird token patterns

Framework: Building Secure Contracts - Token Integration Checklist + Weird ERC20 Database

How This Works

Phase 1: Context Discovery

Determines analysis context:

Token implementation: Are you building a token contract?
Token integration: Does your protocol interact with external tokens?
Platform: Ethereum, other EVM chains, or different platform?
Token types: ERC20, ERC721, or both?

Phase 2: Slither Analysis (if Solidity)

For Solidity projects, I'll help run:

slither-check-erc - ERC conformity checks
slither --print human-summary - Complexity and upgrade analysis
slither --print contract-summary - Function analysis
slither-prop - Property generation for testing

Phase 3: Code Analysis

Analyzes:

Contract composition and complexity
Owner privileges and centralization risks
ERC20/ERC721 conformity
Known weird token patterns
Integration safety patterns

Phase 4: On-chain Analysis (if deployed)

If you provide a contract address, I'll query:

Token scarcity and distribution
Total supply and holder concentration
Exchange listings
On-chain configuration

Phase 5: Risk Assessment

Provides:

Identified vulnerabilities
Non-standard behaviors
Integration risks
Prioritized recommendations

Assessment Categories

I check 10 comprehensive categories covering all aspects of token security. For detailed criteria, patterns, and checklists, see ASSESSMENT_CATEGORIES.md.

Quick Reference:

General Considerations - Security reviews, team transparency, security contacts
Contract Composition - Complexity analysis, SafeMath usage, function count, entry points
Owner Privileges - Upgradeability, minting, pausability, blacklisting, team accountability
ERC20 Conformity - Return values, metadata, decimals, race conditions, Slither checks
ERC20 Extension Risks - External calls/hooks, transfer fees, rebasing/yield-bearing tokens
Token Scarcity Analysis - Supply distribution, holder concentration, exchange distribution, flash loan/mint risks
Weird ERC20 Patterns (24 patterns including):
- Reentrant calls (ERC777 hooks)
- Missing return values (USDT, BNB, OMG)
- Fee on transfer (STA, PAXG)
- Balance modifications outside transfers (Ampleforth, Compound)
- Upgradable tokens (USDC, USDT)
- Flash mintable (DAI)
- Blocklists (USDC, USDT)
- Pausable tokens (BNB, ZIL)
- Approval race protections (USDT, KNC)
- Revert on approval/transfer to zero address
- Revert on zero value approvals/transfers
- Multiple token addresses
- Low decimals (USDC: 6, Gemini: 2)
- High decimals (YAM-V2: 24)
- transferFrom with src == msg.sender
- Non-string metadata (MKR)
- No revert on failure (ZRX, EURS)
- Revert on large approvals (UNI, COMP)
- Code injection via token name
- Unusual permit function (DAI, RAI, GLM)
- Transfer less than amount (cUSDCv3)
- ERC-20 native currency representation (Celo, Polygon, zkSync)
- And more...
Token Integration Safety - Safe transfer patterns, balance verification, allowlists, wrappers, defensive patterns
ERC721 Conformity - Transfer to 0x0, safeTransferFrom, metadata, ownerOf, approval clearing, token ID immutability
ERC721 Common Risks - onERC721Received reentrancy, safe minting, burning approval clearing
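
For the fee-on-transfer pattern listed above, the usual defensive technique is to credit the measured balance change rather than the requested amount. The sketch below models this in Rust with a toy token; `FeeToken`, `transfer_to_vault`, and `deposit` are hypothetical names, and the basis-point fee model is an assumption for illustration. In Solidity the same idea is a `balanceOf` read before and after the transfer.

```rust
/// Toy fee-on-transfer token (hypothetical): `fee_bps` basis points are
/// deducted from every transfer into the vault.
pub struct FeeToken {
    pub vault_balance: u64,
    pub fee_bps: u64,
}

impl FeeToken {
    /// Moves `amount` toward the vault; the vault receives `amount` minus the fee.
    pub fn transfer_to_vault(&mut self, amount: u64) -> Option<()> {
        if self.fee_bps > 10_000 {
            return None; // a fee above 100% is not meaningful
        }
        let fee = (u128::from(amount) * u128::from(self.fee_bps) / 10_000) as u64;
        let received = amount.checked_sub(fee)?;
        self.vault_balance = self.vault_balance.checked_add(received)?;
        Some(())
    }
}

/// Returns the amount to credit to the depositor: the balance delta the vault
/// actually observed, not the amount that was requested.
pub fn deposit(token: &mut FeeToken, amount: u64) -> Option<u64> {
    let before = token.vault_balance;
    token.transfer_to_vault(amount)?;
    token.vault_balance.checked_sub(before)
}
```

Crediting `amount` directly would over-credit the depositor by the fee on every deposit, which is the accounting mismatch this pattern describes.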

Example Output

When analysis is complete, you'll receive a comprehensive report structured as follows:

=== TOKEN INTEGRATION ANALYSIS REPORT ===

Project: MultiToken DEX
Token Analyzed: Custom Reward Token + Integration Safety
Platform: Solidity 0.8.20
Analysis Date: March 15, 2024

---

## EXECUTIVE SUMMARY

Token Type: ERC20 Implementation + Protocol Integrating External Tokens
Overall Risk Level: MEDIUM
Critical Issues: 2
High Issues: 3
Medium Issues: 4

**Top Concerns:**
⚠ Fee-on-transfer tokens not handled correctly
⚠ No validation for missing return values (USDT compatibility)
⚠ Owner can mint unlimited tokens without cap

**Recommendation:** Address critical/high issues before mainnet launch.

---

## 1. GENERAL CONSIDERATIONS

✓ Contract audited by CertiK (June 2023)
✓ Team contactable via security@project.com
✗ No security mailing list for critical announcements

**Risk:** Users won't be notified of critical issues
**Action:** Set up security@project.com mailing list

---

## 2. CONTRACT COMPOSITION

### Complexity Analysis

**Slither human-summary Results:**
- 456 lines of code
- Cyclomatic complexity: Average 6, Max 14 (transferWithFee())
- 12 functions, 8 state variables
- Inheritance depth: 3 (moderate)

✓ Contract complexity is reasonable
⚠ transferWithFee() complexity high (14) - consider splitting

### SafeMath Usage

✓ Using Solidity 0.8.20 (built-in overflow protection)
✓ No unchecked blocks found
✓ All arithmetic operations protected

### Non-Token Functions

**Functions Beyond ERC20:**
- setFeeCollector() - Admin function ✓
- setTransferFee() - Admin function ✓
- withdrawFees() - Admin function ✓
- pause()/unpause() - Emergency functions ✓

⚠ 4 non-token functions (acceptable but adds complexity)

### Address Entry Points

✓ Single contract address
✓ No proxy with multiple entry points
✓ No token migration creating address confusion

**Status:** PASS

---

## 3. OWNER PRIVILEGES

### Upgradeability

⚠ Contract uses TransparentUpgradeableProxy
**Risk:** Owner can change contract logic at any time

**Current Implementation:**
- ProxyAdmin: 0x1234... (2/3 multisig) ✓
- Timelock: None ✗

**Recommendation:** Add 48-hour timelock to all upgrades

### Minting Capabilities

❌ CRITICAL: Unlimited minting
File: contracts/RewardToken.sol:89
```solidity
function mint(address to, uint256 amount) external onlyOwner {
    _mint(to, amount);  // No cap!
}
```

Risk: Owner can inflate supply arbitrarily
Fix: Add maximum supply cap or rate-limited minting

Pausability

✓ Pausable pattern implemented (OpenZeppelin)
✓ Only owner can pause
⚠ Paused state affects all transfers (including existing holders)

Risk: Owner can trap all user funds
Mitigation: Use multi-sig for pause function (already implemented ✓)

Blacklisting

✗ No blacklist functionality
Assessment: Good - no centralized censorship risk

Team Transparency

✓ Team members public (team.md)
✓ Company registered in Switzerland
✓ Accountable and contactable

Status: ACCEPTABLE

---

## 4. ERC20 CONFORMITY

### Slither-check-erc Results

**Command:** `slither-check-erc . RewardToken --erc erc20`

✓ transfer returns bool
✓ transferFrom returns bool
✓ name, decimals, symbol present
✓ decimals returns uint8 (value: 18)
✓ Race condition mitigated (increaseAllowance/decreaseAllowance)

**Status:** FULLY COMPLIANT

### slither-prop Test Results

**Command:** `slither-prop . --contract RewardToken`

Generated 12 properties, all passed:
✓ Transfer doesn't change total supply
✓ Allowance correctly updates
✓ Balance updates match transfer amounts
✓ No balance manipulation possible
[... 8 more properties ...]

Echidna fuzzing: 50,000 runs, no violations ✓

**Status:** EXCELLENT

---

## 5. WEIRD TOKEN PATTERN ANALYSIS

### Integration Safety Check

**Your Protocol Integrates 5 External Tokens:**
- USDT (0xdac17f9...)
- USDC (0xa0b86991...)
- DAI (0x6b175474...)
- WETH (0xc02aaa39...)
- UNI (0x1f9840a8...)

### Critical Issues Found

❌ **Pattern 7.2: Missing Return Values**
**Found in:** USDT integration
**File:** contracts/Vault.sol:156

    IERC20(usdt).transferFrom(msg.sender, address(this), amount);
    // No return value check! USDT doesn't return bool

**Risk:** Silent failures on USDT transfers
**Exploit:** User appears to deposit, but no tokens moved
**Fix:** Use OpenZeppelin SafeERC20 wrapper

❌ **Pattern 7.3: Fee on Transfer**
**Risk for:** Any token with transfer fees
**File:** contracts/Vault.sol:170

    uint256 balanceBefore = IERC20(token).balanceOf(address(this));
    token.transferFrom(msg.sender, address(this), amount);
    shares = amount * exchangeRate;  // WRONG! Should use actual received amount

**Risk:** Accounting mismatch if token takes fees
**Exploit:** User credited more shares than tokens deposited
**Fix:** Calculate shares from balanceAfter - balanceBefore

### Known Non-Standard Token Handling

✓ USDC: Properly handled (SafeERC20, 6 decimals accounted for)
⚠ DAI: permit() function not used (opportunity for gas savings)
✗ USDT: Missing return value not handled (CRITICAL)
✓ WETH: Standard wrapper, properly handled
⚠ UNI: Large approval handling not checked (reverts >= 2^96)

[... Additional sections for remaining analysis categories ...]


For complete report template and deliverables format, see [REPORT_TEMPLATES.md](resources/REPORT_TEMPLATES.md).

---

## Rationalizations (Do Not Skip)

| Rationalization | Why It's Wrong | Required Action |
|-----------------|----------------|-----------------|
| "Token looks standard, ERC20 checks pass" | 20+ weird token patterns exist beyond ERC20 compliance | Check ALL weird token patterns from database (missing return, revert on zero, hooks, etc.) |
| "Slither shows no issues, integration is safe" | Slither detects some patterns, misses integration logic | Complete manual analysis of all 5 token integration criteria |
| "No fee-on-transfer detected, skip that check" | Fee-on-transfer can be owner-controlled or conditional | Test all transfer scenarios, check for conditional fee logic |
| "Balance checks exist, handling is safe" | Balance checks alone don't protect against all weird tokens | Verify safe transfer wrappers, revert handling, approval patterns |
| "Token is deployed by reputable team, assume standard" | Reputation doesn't guarantee standard behavior | Analyze actual code and on-chain behavior, don't trust assumptions |
| "Integration uses OpenZeppelin, must be safe" | OpenZeppelin libraries don't protect against weird external tokens | Verify defensive patterns around all external token calls |
| "Can't run Slither, skipping automated analysis" | Slither provides critical ERC conformance checks | Manually verify all slither-check-erc criteria or document why blocked |
| "This pattern seems fine" | Intuition misses subtle token integration bugs | Systematically check all 20+ weird token patterns with code evidence |
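
Two of the defensive checks the table refers to, tolerating tokens that return no value and crediting only the amount actually received, can be modeled off-chain. The sketch below is a plain-Python model and not contract code. `transfer_ok`, `received_amount`, and their arguments are hypothetical names standing in for what a Solidity wrapper such as SafeERC20 and a balance-delta check would observe.

```python
def transfer_ok(call_succeeded: bool, return_data: bytes) -> bool:
    """Model of a safe-transfer check: the call must not revert, and any
    returned data must decode to true. Empty return data is accepted."""
    if not call_succeeded:
        return False
    if len(return_data) == 0:
        return True  # token returns nothing; success is implied by no revert
    # An ABI-encoded bool is one 32-byte word; the value 1 means true
    return len(return_data) == 32 and int.from_bytes(return_data, "big") == 1


def received_amount(balance_before: int, balance_after: int) -> int:
    """Credit the balance delta, not the requested amount, so that
    fee-on-transfer tokens cannot be over-credited."""
    if balance_after < balance_before:
        raise ValueError("balance decreased during deposit")
    return balance_after - balance_before
```

For a 100-token deposit into a token that takes a 2% fee, `received_amount(1000, 1098)` gives 98, and that figure, not 100, is what shares should be computed from.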

---

## Deliverables

When analysis is complete, I'll provide:

1. **Compliance Checklist** - Checkboxes for all assessment categories
2. **Weird Token Pattern Analysis** - Presence/absence of all 24 patterns with risk levels and evidence
3. **On-chain Analysis Report** (if applicable) - Holder distribution, exchange listings, configuration
4. **Integration Safety Assessment** (if applicable) - Safe transfer usage, defensive patterns, weird token handling
5. **Prioritized Recommendations** - CRITICAL/HIGH/MEDIUM/LOW issues with specific fixes

Complete deliverable templates available in [REPORT_TEMPLATES.md](resources/REPORT_TEMPLATES.md).

---

## Ready to Begin

**What I'll need**:
- Your codebase
- Context: Token implementation or integration?
- Token type: ERC20, ERC721, or both?
- Contract address (if deployed and want on-chain analysis)
- RPC endpoint (if querying on-chain)

Let's analyze your token implementation or integration for security risks!

/ton-vulnerability-scanner

Source: `~/.claude/skills/tob-building-secure-contracts/skills/ton-vulnerability-scanner/SKILL.md`

name: ton-vulnerability-scanner description: Scans TON (The Open Network) smart contracts for 3 critical vulnerabilities including integer-as-boolean misuse, fake Jetton contracts, and forward TON without gas checks. Use when auditing FunC contracts.

TON Vulnerability Scanner

1. Purpose

Systematically scan TON blockchain smart contracts written in FunC for platform-specific security vulnerabilities related to boolean logic, Jetton token handling, and gas management. This skill encodes 3 critical vulnerability patterns unique to TON's architecture.

2. When to Use This Skill

Auditing TON smart contracts (FunC language)
Reviewing Jetton token implementations
Validating token transfer notification handlers
Pre-launch security assessment of TON dApps
Reviewing gas forwarding logic
Assessing boolean condition handling

3. Platform Detection

File Extensions & Indicators

FunC files: .fc, .func

Language/Framework Markers

;; FunC contract indicators
#include "imports/stdlib.fc";

() recv_internal(int my_balance, int msg_value, cell in_msg_full, slice in_msg_body) impure {
    ;; Contract logic
}

() recv_external(slice in_msg) impure {
    ;; External message handler
}

;; Common patterns
send_raw_message()
load_uint(), load_msg_addr(), load_coins()
begin_cell(), end_cell(), store_*()
transfer_notification operation
op::transfer, op::transfer_notification
.store_uint().store_slice().store_coins()

Project Structure

contracts/*.fc - FunC contract source
wrappers/*.ts - TypeScript wrappers
tests/*.spec.ts - Contract tests
ton.config.ts or wasm.config.ts - TON project config

Tool Support

TON Blueprint: Development framework for TON
toncli: CLI tool for TON contracts
ton-compiler: FunC compiler
Manual review primarily (limited automated tools)

4. How This Skill Works

When invoked, I will:

Search your codebase for FunC/Tact contracts
Analyze each contract for the 3 vulnerability patterns
Report findings with file references and severity
Provide fixes for each identified issue
Check replay protection and sender validation

5. Example Output

When vulnerabilities are found, you'll get a report like this:

=== TON VULNERABILITY SCAN RESULTS ===

Project: my-ton-contract
Files Scanned: 3 (.fc, .tact)
Vulnerabilities Found: 2

---

[CRITICAL] Missing Replay Protection
File: contracts/wallet.fc:45
Pattern: No sequence number or nonce validation


---

## 5. Vulnerability Patterns (3 Patterns)

I check for 3 critical vulnerability patterns unique to TON. For detailed detection patterns, code examples, mitigations, and testing strategies, see [VULNERABILITY_PATTERNS.md](resources/VULNERABILITY_PATTERNS.md).

### Pattern Summary:

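1. **Integer as Boolean** ⚠️ HIGH - Positive integers such as `1` used where FunC expects `-1` for true, which breaks `~`, `&`, and `|` logic
2. **Fake Jetton Contract** ⚠️ CRITICAL - `transfer_notification` handler does not validate the sender against the stored Jetton wallet address
3. **Forward TON Without Gas Check** ⚠️ HIGH - User-provided forward amount not validated against `msg_value`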

For complete vulnerability patterns with code examples, see [VULNERABILITY_PATTERNS.md](resources/VULNERABILITY_PATTERNS.md).

## 5. Scanning Workflow

### Step 1: Platform Identification
1. Verify FunC language (`.fc` or `.func` files)
2. Check for TON Blueprint or toncli project structure
3. Locate contract source files
4. Identify Jetton-related contracts

### Step 2: Boolean Logic Review
```bash
# Find boolean-like variables
rg "int.*is_|int.*has_|int.*flag|int.*enabled" contracts/

# Check for positive integers used as booleans
rg "= 1;|return 1;" contracts/ | grep -E "is_|has_|flag|enabled|valid"

# Look for NOT operations on boolean-like values
rg "~.*\(|~ " contracts/
```

For each boolean:

Uses -1 for true, 0 for false
NOT using 1 or other positive integers
Logic operations work correctly
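
Why `1` is unsafe as a boolean can be shown with plain integer arithmetic. The sketch below is a Python model, assuming only that FunC's `~` is bitwise NOT and that any non-zero int counts as true. The helper names `func_not` and `is_truthy` are hypothetical. Python integers are arbitrary-precision rather than 257-bit, which makes no difference for these small values.

```python
TRUE = -1   # FunC convention: all bits set
FALSE = 0


def func_not(flag: int) -> int:
    """Bitwise NOT, which is what `~` does to an int."""
    return ~flag


def is_truthy(value: int) -> bool:
    """Conditionals treat any non-zero int as true."""
    return value != 0
```

`func_not(TRUE)` is `0`, as intended, but `func_not(1)` is `-2`, which is still truthy, so negating a flag stored as `1` does not turn it off. Likewise `1 & 2` is `0`: two positive "true" values can AND to false.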

Step 3: Jetton Handler Analysis

# Find transfer_notification handlers
rg "transfer_notification|op::transfer_notification" contracts/

For each Jetton handler:

Validates sender address
Sender checked against stored Jetton wallet address
Cannot trust forward_payload without sender validation
Has admin function to set Jetton wallet address
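
The handler checks above can be illustrated with a small model. This is a Python sketch of the control flow only, not FunC. `StakingModel`, `WrongJettonWallet`, and the string addresses are hypothetical stand-ins for contract storage, an error code, and real TON addresses.

```python
from dataclasses import dataclass, field


class WrongJettonWallet(Exception):
    """Raised when a notification does not come from the trusted Jetton wallet."""


@dataclass
class StakingModel:
    jetton_wallet_address: str                   # stored at init or set by an admin
    balances: dict = field(default_factory=dict)

    def on_transfer_notification(self, sender: str, from_user: str, amount: int) -> None:
        # Validate the message sender before trusting any field of the body.
        if sender != self.jetton_wallet_address:
            raise WrongJettonWallet(sender)
        self.balances[from_user] = self.balances.get(from_user, 0) + amount
```

A notification from any address other than the stored wallet raises before any balance is touched, which is the property a fake-Jetton test should assert.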

Step 4: Gas/Forward Amount Review

# Find forward amount usage
rg "forward_ton_amount|forward_amount" contracts/
rg "load_coins\(\)" contracts/

# Find send_raw_message calls
rg "send_raw_message" contracts/

For each outgoing message:

Forward amounts are fixed/bounded
OR user-provided amounts validated against msg_value
Cannot drain contract balance
Appropriate send_raw_message flags used
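
The validation rule for user-provided forward amounts, `msg_value >= tx_fee + forward_amount`, is simple enough to state as a predicate. The sketch below is a Python model with amounts in nanoTON. `forward_allowed` is a hypothetical name, and `tx_fee` is assumed to be whatever fee estimate the contract uses.

```python
NANO = 10**9  # nanoTON per TON


def forward_allowed(msg_value: int, tx_fee: int, forward_amount: int) -> bool:
    """True only when the incoming value covers the fee plus the forwarded TON."""
    if tx_fee < 0 or forward_amount < 0:
        return False
    return msg_value >= tx_fee + forward_amount
```

A message carrying 0.01 TON that asks to forward 1 TON fails the check, so the contract's own balance is never used to make up the difference.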

Step 5: Manual Review

TON contracts require thorough manual review:

Boolean logic with ~, &, | operators
Message parsing and validation
Gas economics and fee calculations
Storage operations and data serialization

6. Reporting Format

Finding Template

## [CRITICAL] Fake Jetton Contract - Missing Sender Validation

**Location**: `contracts/staking.fc:85-95` (recv_internal, transfer_notification handler)

**Description**:
The `transfer_notification` operation handler does not validate that the sender is the expected Jetton wallet contract. Any attacker can send a fake `transfer_notification` message claiming to have transferred tokens, crediting themselves without actually depositing any Jettons.

**Vulnerable Code**:
```func
;; staking.fc, line 85
if (op == op::transfer_notification) {
    int jetton_amount = in_msg_body~load_coins();
    slice from_user = in_msg_body~load_msg_addr();

    ;; WRONG: No validation of sender_address!
    ;; Attacker can claim any jetton_amount

    credit_user(from_user, jetton_amount);
}
```

Attack Scenario:

Attacker deploys malicious contract
Malicious contract sends transfer_notification message to staking contract
Message claims attacker transferred 1,000,000 Jettons
Staking contract credits attacker without checking sender
Attacker can now withdraw from contract or gain benefits without depositing

Proof of Concept:

// Attacker sends fake transfer_notification
const attackerContract = await blockchain.treasury("attacker");

await stakingContract.sendInternalMessage(attackerContract.getSender(), {
  op: OP_CODES.TRANSFER_NOTIFICATION,
  jettonAmount: toNano("1000000"), // Fake amount
  fromUser: attackerContract.address,
});

// Attacker successfully credited without sending real Jettons
const balance = await stakingContract.getUserBalance(attackerContract.address);
expect(balance).toEqual(toNano("1000000")); // Attack succeeded

Recommendation: Store expected Jetton wallet address and validate sender:

global slice jetton_wallet_address;

() recv_internal(...) impure {
    load_data();  ;; Load jetton_wallet_address from storage

    slice cs = in_msg_full.begin_parse();
    int flags = cs~load_uint(4);
    slice sender_address = cs~load_msg_addr();

    int op = in_msg_body~load_uint(32);

    if (op == op::transfer_notification) {
        ;; CRITICAL: Validate sender
        throw_unless(error::wrong_jetton_wallet,
            equal_slices(sender_address, jetton_wallet_address));

        int jetton_amount = in_msg_body~load_coins();
        slice from_user = in_msg_body~load_msg_addr();

        ;; Safe to credit user
        credit_user(from_user, jetton_amount);
    }
}

References:

building-secure-contracts/not-so-smart-contracts/ton/fake_jetton_contract


---

## 7. Priority Guidelines

### Critical (Immediate Fix Required)
- Fake Jetton contract (unauthorized minting/crediting)

### High (Fix Before Launch)
- Integer as boolean (logic errors, broken conditions)
- Forward TON without gas check (balance drainage)

---

## 8. Testing Recommendations

### Unit Tests
```typescript
import { Blockchain, SandboxContract, TreasuryContract } from "@ton/sandbox";
import { toNano } from "@ton/core";
import "@ton/test-utils"; // registers the toHaveTransaction matcher

// `Contract` stands in for your project's contract wrapper; import it from your wrappers directory.

describe("Security tests", () => {
  let blockchain: Blockchain;
  let contract: SandboxContract<Contract>;
  let user: SandboxContract<TreasuryContract>;
  let recipient: SandboxContract<TreasuryContract>;

  beforeEach(async () => {
    blockchain = await Blockchain.create();
    contract = blockchain.openContract(await Contract.fromInit());
    user = await blockchain.treasury("user");
    recipient = await blockchain.treasury("recipient");
  });

  it("should use correct boolean values", async () => {
    // Test that TRUE = -1, FALSE = 0
    const result = await contract.getFlag();
    expect(result).toEqual(-1n); // True
    expect(result).not.toEqual(1n); // Not 1!
  });

  it("should reject fake jetton transfer", async () => {
    const attacker = await blockchain.treasury("attacker");

    const result = await contract.send(
      attacker.getSender(),
      { value: toNano("0.05") },
      {
        $$type: "TransferNotification",
        query_id: 0n,
        amount: toNano("1000"),
        from: attacker.address,
      }
    );

    expect(result.transactions).toHaveTransaction({
      success: false, // Should reject
    });
  });

  it("should validate gas for forward amount", async () => {
    const result = await contract.send(
      user.getSender(),
      { value: toNano("0.01") }, // Insufficient gas
      {
        $$type: "Transfer",
        to: recipient.address,
        forward_ton_amount: toNano("1"), // Trying to forward 1 TON
      }
    );

    expect(result.transactions).toHaveTransaction({
      success: false,
    });
  });
});
```

Integration Tests

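// Test with real Jetton wallet
// `JettonMinter` and `JettonWallet` stand in for your project's Jetton wrappers; the method names are illustrative.
it("should accept transfer from real jetton wallet", async () => {
  // Deploy actual Jetton minter and wallet
  const jettonMinter = blockchain.openContract(JettonMinter.create());

  // transfer_notification is sent by the Jetton wallet owned by the receiving contract,
  // so that wallet (not the user's) is the address the contract must trust
  const contractJettonWallet = await jettonMinter.getWalletAddress(contract.address);
  await contract.setJettonWallet(contractJettonWallet);

  // Open the user's own Jetton wallet to send from
  const userJettonWallet = blockchain.openContract(
    JettonWallet.createFromAddress(await jettonMinter.getWalletAddress(user.address))
  );

  // Real transfer from Jetton wallet
  const result = await userJettonWallet.sendTransfer(
    user.getSender(),
    contract.address,
    toNano("100"),
    {}
  );

  expect(result.transactions).toHaveTransaction({
    to: contract.address,
    success: true,
  });
});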

9. Additional Resources

Building Secure Contracts: building-secure-contracts/not-so-smart-contracts/ton/
TON Documentation: https://docs.ton.org/
FunC Documentation: https://docs.ton.org/develop/func/overview
TON Blueprint: https://github.com/ton-org/blueprint
Jetton Standard: https://github.com/ton-blockchain/TEPs/blob/master/text/0074-jettons-standard.md

10. Quick Reference Checklist

Before completing TON contract audit:

Boolean Logic (HIGH):

All boolean values use -1 (true) and 0 (false)
NO positive integers (1, 2, etc.) used as booleans
Functions returning booleans return -1 for true
Boolean logic with ~, &, | uses correct values
Tests verify boolean operations work correctly

Jetton Security (CRITICAL):

transfer_notification handler validates sender address
Sender checked against stored Jetton wallet address
Jetton wallet address stored during initialization
Admin function to set/update Jetton wallet
Cannot trust forward_payload without sender validation
Tests with fake Jetton contracts verify rejection

Gas & Forward Amounts (HIGH):

Forward TON amounts are fixed/bounded
OR user-provided amounts validated: msg_value >= tx_fee + forward_amount
Contract balance protected from drainage
Appropriate send_raw_message flags used
Tests verify cannot drain contract with excessive forward amounts

Testing:

Unit tests for all three vulnerability types
Integration tests with real Jetton contracts
Gas cost analysis for all operations
Testnet deployment before mainnet

/claude-in-chrome-troubleshooting

Source: `~/.claude/skills/tob-claude-in-chrome-troubleshooting/skills/claude-in-chrome-troubleshooting/SKILL.md`

name: claude-in-chrome-troubleshooting description: Diagnose and fix Claude in Chrome MCP extension connectivity issues. Use when mcp__claude-in-chrome__* tools fail, return "Browser extension is not connected", or behave erratically.

Claude in Chrome MCP Troubleshooting

Use this skill when Claude in Chrome MCP tools fail to connect or work unreliably.

When to Use

mcp__claude-in-chrome__* tools fail with "Browser extension is not connected"
Browser automation works erratically or times out
After updating Claude Code or Claude.app
When switching between Claude Code CLI and Claude.app (Cowork)
Native host process is running but MCP tools still fail

When NOT to Use

Linux or Windows users - This skill covers macOS-specific paths and tools (~/Library/Application Support/, osascript)
General Chrome automation issues unrelated to the Claude extension
Claude.app desktop issues (not browser-related)
Network connectivity problems
Chrome extension installation issues (use Chrome Web Store support)

The Claude.app vs Claude Code Conflict (Primary Issue)

Background: When Claude.app added Cowork support (browser automation from the desktop app), it introduced a competing native messaging host that conflicts with Claude Code CLI.

Two Native Hosts, Two Socket Formats

| Component | Native Host Binary | Socket Location |
|-----------|--------------------|-----------------|
| Claude.app (Cowork) | `/Applications/Claude.app/Contents/Helpers/chrome-native-host` | `/tmp/claude-mcp-browser-bridge-$USER/<PID>.sock` |
| Claude Code CLI | `~/.local/share/claude/versions/<version> --chrome-native-host` | `$TMPDIR/claude-mcp-browser-bridge-$USER` (single file) |

Why They Conflict

Both register native messaging configs in Chrome:
- com.anthropic.claude_browser_extension.json → Claude.app helper
- com.anthropic.claude_code_browser_extension.json → Claude Code wrapper
Chrome extension requests a native host by name
If the wrong config is active, the wrong binary runs
The wrong binary creates sockets in a format/location the MCP client doesn't expect
Result: "Browser extension is not connected" even though everything appears to be running
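
The conflicting state, both native messaging configs active at once, can be detected by listing the config directory. The sketch below is a Python helper built only on the two file names given above. The function names `active_native_hosts` and `has_conflict` are hypothetical, and it assumes a config counts as disabled once it is renamed with a `.disabled` suffix.

```python
from pathlib import Path

# Config file name -> the component that registers it
HOST_CONFIGS = {
    "com.anthropic.claude_browser_extension.json": "Claude.app (Cowork)",
    "com.anthropic.claude_code_browser_extension.json": "Claude Code CLI",
}


def active_native_hosts(config_dir: Path) -> list:
    """Return the owners of the configs that are currently active,
    i.e. present under their original .json name."""
    return [owner for name, owner in HOST_CONFIGS.items()
            if (config_dir / name).is_file()]


def has_conflict(config_dir: Path) -> bool:
    """Both configs active at once is the conflicting state."""
    return len(active_native_hosts(config_dir)) > 1
```

On macOS the directory to pass is `Path.home() / "Library/Application Support/Google/Chrome/NativeMessagingHosts"`.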

The Fix: Disable Claude.app's Native Host

If you use Claude Code CLI for browser automation (not Cowork):

# Disable the Claude.app native messaging config
mv ~/Library/Application\ Support/Google/Chrome/NativeMessagingHosts/com.anthropic.claude_browser_extension.json \
   ~/Library/Application\ Support/Google/Chrome/NativeMessagingHosts/com.anthropic.claude_browser_extension.json.disabled

# Ensure the Claude Code config exists and points to the wrapper
cat ~/Library/Application\ Support/Google/Chrome/NativeMessagingHosts/com.anthropic.claude_code_browser_extension.json

If you use Cowork (Claude.app) for browser automation:

# Disable the Claude Code native messaging config
mv ~/Library/Application\ Support/Google/Chrome/NativeMessagingHosts/com.anthropic.claude_code_browser_extension.json \
   ~/Library/Application\ Support/Google/Chrome/NativeMessagingHosts/com.anthropic.claude_code_browser_extension.json.disabled

You cannot use both simultaneously. Pick one and disable the other.

Toggle Script

Add this to ~/.zshrc or run directly:

chrome-mcp-toggle() {
    local CONFIG_DIR=~/Library/Application\ Support/Google/Chrome/NativeMessagingHosts
    local CLAUDE_APP="$CONFIG_DIR/com.anthropic.claude_browser_extension.json"
    local CLAUDE_CODE="$CONFIG_DIR/com.anthropic.claude_code_browser_extension.json"

    if [[ -f "$CLAUDE_APP" && ! -f "$CLAUDE_APP.disabled" ]]; then
        # Currently using Claude.app, switch to Claude Code
        mv "$CLAUDE_APP" "$CLAUDE_APP.disabled"
        [[ -f "$CLAUDE_CODE.disabled" ]] && mv "$CLAUDE_CODE.disabled" "$CLAUDE_CODE"
        echo "Switched to Claude Code CLI"
        echo "Restart Chrome and Claude Code to apply"
    elif [[ -f "$CLAUDE_CODE" && ! -f "$CLAUDE_CODE.disabled" ]]; then
        # Currently using Claude Code, switch to Claude.app
        mv "$CLAUDE_CODE" "$CLAUDE_CODE.disabled"
        [[ -f "$CLAUDE_APP.disabled" ]] && mv "$CLAUDE_APP.disabled" "$CLAUDE_APP"
        echo "Switched to Claude.app (Cowork)"
        echo "Restart Chrome to apply"
    else
        echo "Current state unclear. Check configs:"
        ls -la "$CONFIG_DIR"/com.anthropic*.json* 2>/dev/null
    fi
}

Usage: chrome-mcp-toggle then restart Chrome (and Claude Code if switching to CLI).

Quick Diagnosis

# 1. Which native host binary is running?
ps aux | grep chrome-native-host | grep -v grep
# Claude.app: /Applications/Claude.app/Contents/Helpers/chrome-native-host
# Claude Code: ~/.local/share/claude/versions/X.X.X --chrome-native-host

# 2. Where is the socket?
# For Claude Code (single file in TMPDIR):
ls -la "$(getconf DARWIN_USER_TEMP_DIR)/claude-mcp-browser-bridge-$USER" 2>&1

# For Claude.app (directory with PID files):
ls -la /tmp/claude-mcp-browser-bridge-$USER/ 2>&1

# 3. What's the native host connected to?
lsof -U 2>&1 | grep claude-mcp-browser-bridge

# 4. Which configs are active?
ls ~/Library/Application\ Support/Google/Chrome/NativeMessagingHosts/com.anthropic*.json

Critical Insight

MCP connects at startup. If the browser bridge wasn't ready when Claude Code started, the connection will fail for the entire session. The fix is usually: ensure Chrome + extension are running with correct config, THEN restart Claude Code.

Full Reset Procedure (Claude Code CLI)

# 1. Ensure correct config is active
mv ~/Library/Application\ Support/Google/Chrome/NativeMessagingHosts/com.anthropic.claude_browser_extension.json \
   ~/Library/Application\ Support/Google/Chrome/NativeMessagingHosts/com.anthropic.claude_browser_extension.json.disabled 2>/dev/null

# 2. Update the wrapper to use latest Claude Code version
cat > ~/.claude/chrome/chrome-native-host << 'EOF'
#!/bin/bash
LATEST=$(ls -t ~/.local/share/claude/versions/ 2>/dev/null | head -1)
exec "$HOME/.local/share/claude/versions/$LATEST" --chrome-native-host
EOF
chmod +x ~/.claude/chrome/chrome-native-host

# 3. Kill existing native host and clean sockets
pkill -f chrome-native-host
rm -rf /tmp/claude-mcp-browser-bridge-$USER/
rm -f "$(getconf DARWIN_USER_TEMP_DIR)/claude-mcp-browser-bridge-$USER"

# 4. Restart Chrome
osascript -e 'quit app "Google Chrome"' && sleep 2 && open -a "Google Chrome"

# 5. Wait for Chrome, click Claude extension icon

# 6. Verify correct native host is running
ps aux | grep chrome-native-host | grep -v grep
# Should show: ~/.local/share/claude/versions/X.X.X --chrome-native-host

# 7. Verify socket exists
ls -la "$(getconf DARWIN_USER_TEMP_DIR)/claude-mcp-browser-bridge-$USER"

# 8. Restart Claude Code

Other Common Causes

Multiple Chrome Profiles

If you have the Claude extension installed in multiple Chrome profiles, each spawns its own native host and socket. This can cause confusion.

Fix: Only enable the Claude extension in ONE Chrome profile.

Multiple Claude Code Sessions

Running multiple Claude Code instances can cause socket conflicts.

Fix: Only run one Claude Code session at a time, or use /mcp to reconnect after closing other sessions.

Hardcoded Version in Wrapper

The wrapper at ~/.claude/chrome/chrome-native-host may have a hardcoded version that becomes stale after updates.

Diagnosis:

cat ~/.claude/chrome/chrome-native-host
# Bad: exec "/Users/.../.local/share/claude/versions/2.0.76" --chrome-native-host
# Good: Uses $(ls -t ...) to find latest

Fix: Use the dynamic version wrapper shown in the Full Reset Procedure above.

TMPDIR Not Set

Claude Code expects TMPDIR to be set to find the socket.

# Check
echo $TMPDIR
# Should show: /var/folders/XX/.../T/

# Fix: Add to ~/.zshrc
export TMPDIR="${TMPDIR:-$(getconf DARWIN_USER_TEMP_DIR)}"

Diagnostic Deep Dive

echo "=== Native Host Binary ==="
ps aux | grep chrome-native-host | grep -v grep

echo -e "\n=== Socket (Claude Code location) ==="
ls -la "$(getconf DARWIN_USER_TEMP_DIR)/claude-mcp-browser-bridge-$USER" 2>&1

echo -e "\n=== Socket (Claude.app location) ==="
ls -la /tmp/claude-mcp-browser-bridge-$USER/ 2>&1

echo -e "\n=== Native Host Open Files ==="
pgrep -f chrome-native-host | xargs -I {} lsof -p {} 2>/dev/null | grep -E "(sock|claude-mcp)"

echo -e "\n=== Active Native Messaging Configs ==="
ls ~/Library/Application\ Support/Google/Chrome/NativeMessagingHosts/com.anthropic*.json 2>/dev/null

echo -e "\n=== Custom Wrapper Contents ==="
cat ~/.claude/chrome/chrome-native-host 2>/dev/null || echo "No custom wrapper"

echo -e "\n=== TMPDIR ==="
echo "TMPDIR=$TMPDIR"
echo "Expected: $(getconf DARWIN_USER_TEMP_DIR)"

File Reference

| File | Purpose |
|------|---------|
| `~/.claude/chrome/chrome-native-host` | Custom wrapper script for Claude Code |
| `/Applications/Claude.app/Contents/Helpers/chrome-native-host` | Claude.app (Cowork) native host |
| `~/.local/share/claude/versions/<version>` | Claude Code binary (run with `--chrome-native-host`) |
| `~/Library/Application Support/Google/Chrome/NativeMessagingHosts/com.anthropic.claude_browser_extension.json` | Config for Claude.app native host |
| `~/Library/Application Support/Google/Chrome/NativeMessagingHosts/com.anthropic.claude_code_browser_extension.json` | Config for Claude Code native host |
| `$TMPDIR/claude-mcp-browser-bridge-$USER` | Socket file (Claude Code) |
| `/tmp/claude-mcp-browser-bridge-$USER/<PID>.sock` | Socket files (Claude.app) |

Summary

Primary issue: Claude.app (Cowork) and Claude Code use different native hosts with incompatible socket formats
Fix: Disable the native messaging config for whichever one you're NOT using
After any fix: Must restart Chrome AND Claude Code (MCP connects at startup)
One profile: Only have Claude extension in one Chrome profile
One session: Only run one Claude Code instance

Original skill by @jeffzwang from @ExaAILabs. Enhanced and updated for current versions of Claude Desktop and Claude Code.

/constant-time-analysis

Source: `~/.claude/skills/tob-constant-time-analysis/skills/constant-time-analysis/SKILL.md`

name: constant-time-analysis description: Detects timing side-channel vulnerabilities in cryptographic code. Use when implementing or reviewing crypto code, encountering division on secrets, secret-dependent branches, or constant-time programming questions in C, C++, Go, Rust, Swift, Java, Kotlin, C#, PHP, JavaScript, TypeScript, Python, or Ruby.

Constant-Time Analysis

Analyze cryptographic code to detect operations that leak secret data through execution timing variations.

When to Use

User writing crypto code? ──yes──> Use this skill
         │
         no
         │
         v
User asking about timing attacks? ──yes──> Use this skill
         │
         no
         │
         v
Code handles secret keys/tokens? ──yes──> Use this skill
         │
         no
         │
         v
Skip this skill

Concrete triggers:

User implements signature, encryption, or key derivation
Code contains / or % operators on secret-derived values
User mentions "constant-time", "timing attack", "side-channel", "KyberSlash"
Reviewing functions named sign, verify, encrypt, decrypt, derive_key

When NOT to Use

Non-cryptographic code (business logic, UI, etc.)
Public data processing where timing leaks don't matter
Code that doesn't handle secrets, keys, or authentication tokens
High-level API usage where timing is handled by the library

Language Selection

Based on the file extension or language context, refer to the appropriate guide:

| Language | File Extensions | Guide |
|----------|-----------------|-------|
| C, C++ | `.c`, `.h`, `.cpp`, `.cc`, `.hpp` | references/compiled.md |
| Go | `.go` | references/compiled.md |
| Rust | `.rs` | references/compiled.md |
| Swift | `.swift` | references/swift.md |
| Java | `.java` | references/vm-compiled.md |
| Kotlin | `.kt`, `.kts` | references/kotlin.md |
| C# | `.cs` | references/vm-compiled.md |
| PHP | `.php` | references/php.md |
| JavaScript | `.js`, `.mjs`, `.cjs` | references/javascript.md |
| TypeScript | `.ts`, `.tsx` | references/javascript.md |
| Python | `.py` | references/python.md |
| Ruby | `.rb` | references/ruby.md |

Quick Start

# Analyze any supported file type
uv run {baseDir}/ct_analyzer/analyzer.py <source_file>

# Include conditional branch warnings
uv run {baseDir}/ct_analyzer/analyzer.py --warnings <source_file>

# Filter to specific functions
uv run {baseDir}/ct_analyzer/analyzer.py --func 'sign|verify' <source_file>

# JSON output for CI
uv run {baseDir}/ct_analyzer/analyzer.py --json <source_file>

Native Compiled Languages Only (C, C++, Go, Rust)

# Cross-architecture testing (RECOMMENDED)
uv run {baseDir}/ct_analyzer/analyzer.py --arch x86_64 crypto.c
uv run {baseDir}/ct_analyzer/analyzer.py --arch arm64 crypto.c

# Multiple optimization levels
uv run {baseDir}/ct_analyzer/analyzer.py --opt-level O0 crypto.c
uv run {baseDir}/ct_analyzer/analyzer.py --opt-level O3 crypto.c

VM-Compiled Languages (Java, Kotlin, C#)

# Analyze Java bytecode
uv run {baseDir}/ct_analyzer/analyzer.py CryptoUtils.java

# Analyze Kotlin bytecode (Android/JVM)
uv run {baseDir}/ct_analyzer/analyzer.py CryptoUtils.kt

# Analyze C# IL
uv run {baseDir}/ct_analyzer/analyzer.py CryptoUtils.cs

Note: Java, Kotlin, and C# compile to bytecode (JVM/CIL) that runs on a virtual machine with JIT compilation. The analyzer examines the bytecode directly, not the JIT-compiled native code. The --arch and --opt-level flags do not apply to these languages.

Swift (iOS/macOS)

# Analyze Swift for native architecture
uv run {baseDir}/ct_analyzer/analyzer.py crypto.swift

# Analyze for specific architecture (iOS devices)
uv run {baseDir}/ct_analyzer/analyzer.py --arch arm64 crypto.swift

# Analyze with different optimization levels
uv run {baseDir}/ct_analyzer/analyzer.py --opt-level O0 crypto.swift

Note: Swift compiles to native code like C/C++/Go/Rust, so it uses assembly-level analysis and supports --arch and --opt-level flags.

Prerequisites

| Language | Requirements |
|----------|--------------|
| C, C++, Go, Rust | Compiler in PATH (`gcc`/`clang`, `go`, `rustc`) |
| Swift | Xcode or Swift toolchain (`swiftc` in PATH) |
| Java | JDK with `javac` and `javap` in PATH |
| Kotlin | Kotlin compiler (`kotlinc`) + JDK (`javap`) in PATH |
| C# | .NET SDK + `ilspycmd` (`dotnet tool install -g ilspycmd`) |
| PHP | PHP with VLD extension or OPcache |
| JavaScript/TypeScript | Node.js in PATH |
| Python | Python 3.x in PATH |
| Ruby | Ruby with `--dump=insns` support |

macOS users: Homebrew installs Java and .NET as "keg-only". You must add them to your PATH:

# For Java (add to ~/.zshrc)
export PATH="/opt/homebrew/opt/openjdk@21/bin:$PATH"

# For .NET tools (add to ~/.zshrc)
export PATH="$HOME/.dotnet/tools:$PATH"

See references/vm-compiled.md for detailed setup instructions and troubleshooting.

Quick Reference

| Problem | Detection | Fix |
|---------|-----------|-----|
| Division on secrets | DIV, IDIV, SDIV, UDIV | Barrett reduction or multiply-by-inverse |
| Branch on secrets | JE, JNE, BEQ, BNE | Constant-time selection (cmov, bit masking) |
| Secret comparison | Early-exit memcmp | Use `crypto/subtle` or constant-time compare |
| Weak RNG | rand(), mt_rand, Math.random | Use crypto-secure RNG |
| Table lookup by secret | Array subscript on secret index | Bit-sliced lookups |
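
The "Secret comparison" and "Weak RNG" rows can be illustrated in Python, one of the languages the analyzer covers. This is a minimal sketch using the standard library's `hmac.compare_digest` and `secrets.token_bytes`. The wrapper names `leaky_equal`, `constant_time_equal`, and `new_token` are hypothetical.

```python
import hmac
import secrets


def leaky_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: returns at the first mismatching byte, so the
    running time reveals how long the matching prefix is."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison via hmac.compare_digest, which is designed not to
    short-circuit on the first difference."""
    return hmac.compare_digest(a, b)


def new_token(n_bytes: int = 32) -> bytes:
    """Token drawn from the OS CSPRNG rather than the `random` module."""
    return secrets.token_bytes(n_bytes)
```

Both comparison functions return the same answers; only their timing behavior differs, which is why this class of bug does not show up in functional tests.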

Interpreting Results

PASSED - No variable-time operations detected.

FAILED - Dangerous instructions found. Example:

[ERROR] SDIV
  Function: decompose_vulnerable
  Reason: SDIV has early termination optimization; execution time depends on operand values

Verifying Results (Avoiding False Positives)

CRITICAL: Not every flagged operation is a vulnerability. The tool has no data flow analysis - it flags ALL potentially dangerous operations regardless of whether they involve secrets.

For each flagged violation, ask: Does this operation's input depend on secret data?

Identify the secret inputs to the function (private keys, plaintext, signatures, tokens)
Trace data flow from the flagged instruction back to inputs

Common false positive patterns:

// FALSE POSITIVE: Division uses public constant, not secret
int num_blocks = data_len / 16;  // data_len is length, not content

// TRUE POSITIVE: Division involves secret-derived value
int32_t q = secret_coef / GAMMA2;  // secret_coef from private key

Document your analysis for each flagged item

Quick Triage Questions

Question	If Yes	If No
Is the operand a compile-time constant?	Likely false positive	Continue
Is the operand a public parameter (length, count)?	Likely false positive	Continue
Is the operand derived from key/plaintext/secret?	TRUE POSITIVE	Likely false positive
Can an attacker influence the operand value?	TRUE POSITIVE	Likely false positive

Limitations

Static Analysis Only: Analyzes assembly/bytecode, not runtime behavior. Cannot detect cache timing or microarchitectural side-channels.
No Data Flow Analysis: Flags all dangerous operations regardless of whether they process secrets. Manual review required.
Compiler/Runtime Variations: Different compilers, optimization levels, and runtime versions may produce different output.

Real-World Impact

KyberSlash (2023): Division instructions in post-quantum ML-KEM implementations allowed key recovery
Lucky Thirteen (2013): Timing differences in CBC padding validation enabled plaintext recovery
RSA Timing Attacks: Early implementations leaked private key bits through division timing

References

Cryptocoding Guidelines - Defensive coding for crypto
KyberSlash - Division timing in post-quantum crypto
BearSSL Constant-Time - Practical constant-time techniques

/interpreting-culture-index

Source: `~/.claude/skills/tob-culture-index/skills/interpreting-culture-index/SKILL.md`

name: interpreting-culture-index description: Use when interpreting Culture Index surveys, CI profiles, behavioral assessments, or personality data. Supports individual interpretation, team composition (gas/brake/glue), burnout detection, profile comparison, hiring profiles, manager coaching, interview transcript analysis for trait prediction, candidate debrief, onboarding planning, and conflict mediation. Handles PDF vision or JSON input.

<essential_principles>

Culture Index measures behavioral traits, not intelligence or skills. There is no "good" or "bad" profile.

**Never compare absolute trait values between people.**

The 0-10 scale is just a ruler. What matters is distance from the red arrow (population mean at 50th percentile). The arrow position varies between surveys based on EU.

Why the arrow moves: Higher EU scores cause the arrow to plot further right; lower EU causes it to plot further left. This does not affect validity—we always measure distance from wherever the arrow lands.

Wrong: "Dan has higher autonomy than Jim because his A is 8 vs 5"
Right: "Dan is +3 centiles from his arrow; Jim is +1 from his arrow"

Always ask: Where is the arrow, and how far is the dot from it?

**Survey = who you ARE. Job = who you're TRYING TO BE.**

"You can't send a duck to Eagle school." Traits are hardwired—you can only modify behaviors temporarily, at the cost of energy.

Top graph (Survey Traits): Hardwired by age 12-16. Does not change. Writing with your dominant hand.
Bottom graph (Job Behaviors): Adaptive behavior at work. Can change. Writing with your non-dominant hand.

Large differences between graphs indicate behavior modification, which drains energy and causes burnout if sustained 3-6+ months.

**Distance from arrow determines trait strength.**

Distance	Label	Percentile	Interpretation
On arrow	Normative	50th	Flexible, situational
±1 centile	Tendency	~67th	Easier to modify
±2 centiles	Pronounced	~84th	Noticeable difference
±4+ centiles	Extreme	~98th	Hardwired, compulsive, predictable

Key insight: Every 2 centiles of distance = 1 standard deviation.

Extreme traits drive extreme results but are harder to modify and less relatable to average people.
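The distance table and the rule that 2 centiles equal 1 standard deviation can be sketched as a small helper. This is an illustration, not part of the skill: the function name is hypothetical, and the table does not say how to label distances between its rows (for example 3, or 0.7), so the sketch assumes each row is a lower bound.

```python
def trait_strength(distance: float) -> tuple[str, float]:
    """Return (label, standard deviations) for a signed distance from the arrow."""
    magnitude = abs(distance)
    # Assumption: each table row is a lower bound, so a distance of 3 is
    # labeled "Pronounced" and anything under 1 is labeled "Normative".
    if magnitude >= 4:
        label = "Extreme"
    elif magnitude >= 2:
        label = "Pronounced"
    elif magnitude >= 1:
        label = "Tendency"
    else:
        label = "Normative"
    # Every 2 centiles of distance equals 1 standard deviation.
    return label, magnitude / 2
```

The sign is dropped on purpose: the label describes strength, while direction (above or below the arrow) is interpreted separately.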

**L (Logic) and I (Ingenuity) use absolute values.**

Unlike A, B, C, D, you CAN compare L and I scores directly between people:

Logic 8 means "High Logic" regardless of arrow position
Ingenuity 2 means "Low Ingenuity" for anyone

Only these two traits break the "no absolute comparison" rule.

</essential_principles>

<input_formats>

JSON (Use if available)

If JSON data is already extracted, use it directly:

import json
with open("person_name.json") as f:
    profile = json.load(f)

JSON format:

{
  "name": "Person Name",
  "archetype": "Architect",
  "survey": {
    "eu": 21,
    "arrow": 2.3,
    "a": [5, 2.7],
    "b": [0, -2.3],
    "c": [1, -1.3],
    "d": [3, 0.7],
    "logic": [5, null],
    "ingenuity": [2, null]
  },
  "job": { "..." : "same structure as survey" },
  "analysis": {
    "energy_utilization": 148,
    "status": "stress"
  }
}

Note: Trait values are [absolute, relative_to_arrow] tuples. Use the relative value for interpretation.
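A short sketch of reading the relative values out of one graph. The helper names are hypothetical. The consistency check assumes that the relative value is the absolute value minus the arrow position, which is inferred from the sample numbers above rather than stated by the format.

```python
import json

PRIMARY_TRAITS = ("a", "b", "c", "d")


def relative_traits(graph: dict) -> dict[str, float]:
    """Return the relative-to-arrow value for each primary trait in one graph."""
    return {trait: graph[trait][1] for trait in PRIMARY_TRAITS}


def check_relative_values(graph: dict, tolerance: float = 0.05) -> bool:
    """Check that each relative value is close to absolute minus arrow.

    Assumption: this relationship is inferred from the sample data only.
    """
    arrow = graph["arrow"]
    return all(
        abs((graph[t][0] - arrow) - graph[t][1]) <= tolerance
        for t in PRIMARY_TRAITS
    )


# The "survey" object from the sample JSON above.
survey = json.loads(
    '{"eu": 21, "arrow": 2.3, "a": [5, 2.7], "b": [0, -2.3],'
    ' "c": [1, -1.3], "d": [3, 0.7], "logic": [5, null], "ingenuity": [2, null]}'
)
```

Logic and Ingenuity are left out because their second element is null; they are read as absolute values.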

Check same directory as PDF for matching .json file, or ask user if they have extracted JSON.

PDF Input (MUST EXTRACT FIRST)

⚠️ NEVER use visual estimation for trait values. Visual estimation has 20-30% error rate.

When given a PDF:

Check if JSON already exists (same directory as PDF, or ask user)

If not, run extraction with verification:

uv run {baseDir}/scripts/extract_pdf.py --verify /path/to/file.pdf [output.json]

Visually confirm the verification summary matches the PDF
Use the extracted JSON for interpretation

If uv is not installed: Stop and instruct user to install it (brew install uv or pip install uv). Do NOT fall back to vision.

PDF Vision (Reference Only)

Vision may be used ONLY to verify extracted values look reasonable, NOT to extract trait scores.

</input_formats>

Step 0: Do you have JSON or PDF?

If JSON provided or found: Use it directly (skip extraction)
- Check same directory as PDF for .json file with matching name
- Check if user provided JSON path

If only PDF: Run extraction script with --verify flag

uv run {baseDir}/scripts/extract_pdf.py --verify /path/to/file.pdf [output.json]

If extraction fails: Report error, do NOT fall back to vision

Step 1: What data do you have?

CI Survey JSON → Proceed to Step 2
CI Survey PDF → Extract first (Step 0), then proceed to Step 2
Interview transcript only → Go to option 8 (predict traits from interview)
No data yet → "Please provide Culture Index profile (PDF or JSON) or interview transcript"

Step 2: What would you like to do?

Profile Analysis:

1. Interpret an individual profile - Understand one person's traits, strengths, and challenges
2. Analyze team composition - Assess gas/brake/glue balance, identify gaps
3. Detect burnout signals - Compare Survey vs Job, flag stress/frustration
4. Compare multiple profiles - Understand compatibility, collaboration dynamics
5. Get motivator recommendations - Learn how to engage and retain someone

Hiring & Candidates:

6. Define hiring profile - Determine ideal CI traits for a role
7. Coach manager on direct report - Adjust management style based on both profiles
8. Predict traits from interview - Analyze interview transcript to estimate CI traits
9. Interview debrief - Assess candidate fit based on predicted traits

Team Development:

10. Plan onboarding - Design first 90 days based on new hire and team profiles
11. Mediate conflict - Understand friction between two people using their profiles

Provide the profile data (JSON or PDF) and select an option, or describe what you need.

Response	Workflow
"extract", "parse pdf", "convert pdf", "get json from pdf"	`workflows/extract-from-pdf.md`
1, "individual", "interpret", "understand", "analyze one", "single profile"	`workflows/interpret-individual.md`
2, "team", "composition", "gaps", "balance", "gas brake glue"	`workflows/analyze-team.md`
3, "burnout", "stress", "frustration", "survey vs job", "energy", "flight risk"	`workflows/detect-burnout.md`
4, "compare", "compatibility", "collaboration", "multiple", "two profiles"	`workflows/compare-profiles.md`
5, "motivate", "engage", "retain", "communicate"	Read `references/motivators.md` directly
6, "hire", "hiring profile", "role profile", "recruit", "what profile for"	`workflows/define-hiring-profile.md`
7, "manage", "coach", "1:1", "direct report", "manager"	`workflows/coach-manager.md`
8, "transcript", "interview", "predict traits", "guess", "estimate", "recording"	`workflows/predict-from-interview.md`
9, "debrief", "should we hire", "candidate fit", "proceed", "offer"	`workflows/interview-debrief.md`
10, "onboard", "new hire", "integrate", "starting", "first 90 days"	`workflows/plan-onboarding.md`
11, "conflict", "friction", "mediate", "not working together", "clash"	`workflows/mediate-conflict.md`
"conversation starters", "how to talk to", "engage with"	Read `references/conversation-starters.md` directly

After reading the workflow, follow it exactly.

<verification_loop>

After every interpretation, verify:

Did you use relative positions? Never stated "A is 8" without context
Did you reference the arrow? All trait interpretations relative to arrow
Did you compare Survey vs Job? Identified any behavior modification
Did you avoid value judgments? No traits called "good" or "bad"
Did you check EU? Energy utilization calculated if both graphs present

Report to user:

"Interpretation complete"
Key findings (2-3 bullet points)
Recommended actions

</verification_loop>

<reference_index>

Domain Knowledge (in references/):

Primary Traits:

primary-traits.md - A (Autonomy), B (Social), C (Pace), D (Conformity)

Secondary Traits:

secondary-traits.md - EU (Energy Units), L (Logic), I (Ingenuity)

Patterns:

patterns-archetypes.md - Behavioral patterns, trait combinations, archetypes

Application:

motivators.md - How to motivate each trait type
team-composition.md - Gas, brake, glue framework
anti-patterns.md - Common interpretation mistakes
conversation-starters.md - How to engage each pattern and trait type
interview-trait-signals.md - Signals for predicting traits from interviews

</reference_index>

<workflows_index>

Workflows (in workflows/):

File	Purpose
`extract-from-pdf.md`	Extract profile data from Culture Index PDF to JSON format
`interpret-individual.md`	Analyze single profile, identify archetype, summarize strengths/challenges
`analyze-team.md`	Assess team balance (gas/brake/glue), identify gaps, recommend hires
`detect-burnout.md`	Compare Survey vs Job, calculate EU utilization, flag risk signals
`compare-profiles.md`	Compare multiple profiles, assess compatibility, collaboration dynamics
`define-hiring-profile.md`	Define ideal CI traits for a role, identify acceptable patterns and red flags
`coach-manager.md`	Help managers adjust their style for specific direct reports
`predict-from-interview.md`	Analyze interview transcripts to predict CI traits before survey
`interview-debrief.md`	Assess candidate fit using predicted traits from transcript analysis
`plan-onboarding.md`	Design first 90 days based on new hire profile and team composition
`mediate-conflict.md`	Understand and address friction between team members using their profiles

</workflows_index>

<quick_reference>

Trait Colors:

Trait	Color	Measures
A	Maroon	Autonomy, initiative, self-confidence
B	Yellow	Social ability, need for interaction
C	Blue	Pace/Patience, urgency level
D	Green	Conformity, attention to detail
L	Purple	Logic, emotional processing
I	Cyan	Ingenuity, inventiveness

Energy Utilization Formula:

Utilization = (Job EU / Survey EU) × 100

70-130% = Healthy
>130% = STRESS (burnout risk)
<70% = FRUSTRATION (flight risk)
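The formula and thresholds can be sketched as follows. The function name is hypothetical; the "stress" status string matches the sample JSON, while "healthy" and "frustration" are assumed spellings of the labels above.

```python
def energy_utilization(survey_eu: int, job_eu: int) -> tuple[int, str]:
    """Return (utilization percent, status) from Survey and Job EU values."""
    if survey_eu <= 0:
        raise ValueError("survey_eu must be positive")
    # Multiply before dividing to avoid float error at the 70 and 130 boundaries.
    utilization = job_eu * 100 / survey_eu
    if utilization > 130:
        status = "stress"  # burnout risk
    elif utilization < 70:
        status = "frustration"  # flight risk
    else:
        status = "healthy"
    return round(utilization), status
```

For example, a Survey EU of 21 and a Job EU of 31 give roughly 148%, which falls in the stress band.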

Gas/Brake/Glue:

Role	Trait	Function
Gas	High A	Growth, risk-taking, driving results
Brake	High D	Quality control, risk aversion, finishing
Glue	High B	Relationships, morale, culture

Score Precision:

Value	Precision	Example
Traits (A,B,C,D,L,I)	Integer 0-10	0, 1, 2, ... 10
Arrow position	Tenths	0.4, 2.2, 3.8
Energy Units (EU)	Integer	11, 31, 45

</quick_reference>

<success_criteria>

A well-interpreted Culture Index profile:

Uses relative positions (distance from arrow), never absolute values alone
Identifies the archetype/pattern correctly
Highlights 2-3 key strengths based on leading traits
Notes 2-3 challenges or development areas
Compares Survey vs Job if both are available
Provides actionable recommendations
Avoids value judgments ("good"/"bad")
Acknowledges Culture Index is one data point, not a complete picture

</success_criteria>

/devcontainer-setup

Source: `~/.claude/skills/tob-devcontainer-setup/skills/devcontainer-setup/SKILL.md`

name: devcontainer-setup description: Creates devcontainers with Claude Code, language-specific tooling (Python/Node/Rust/Go), and persistent volumes. Use when adding devcontainer support to a project, setting up isolated development environments, or configuring sandboxed Claude Code workspaces.

Devcontainer Setup Skill

Creates a pre-configured devcontainer with Claude Code and language-specific tooling.

When to Use

User asks to "set up a devcontainer" or "add devcontainer support"
User wants a sandboxed Claude Code development environment
User needs isolated development environments with persistent configuration

When NOT to Use

User already has a devcontainer configuration and just needs modifications
User is asking about general Docker or container questions
User wants to deploy production containers (this is for development only)

Workflow

flowchart TB
    start([User requests devcontainer])
    recon[1. Project Reconnaissance]
    detect[2. Detect Languages]
    generate[3. Generate Configuration]
    write[4. Write files to .devcontainer/]
    done([Done])

    start --> recon
    recon --> detect
    detect --> generate
    generate --> write
    write --> done

Phase 1: Project Reconnaissance

Infer Project Name

Check in order (use first match):

package.json → name field
pyproject.toml → project.name
Cargo.toml → package.name
go.mod → module path (last segment after /)
Directory name as fallback

Convert to slug: lowercase, replace spaces/underscores with hyphens.
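A minimal sketch of the slug rule. The function name is hypothetical, and trimming surrounding whitespace and collapsing repeated separators are assumptions beyond what the rule states.

```python
import re


def to_slug(name: str) -> str:
    """Lowercase the name and replace runs of spaces/underscores with a hyphen."""
    return re.sub(r"[ _]+", "-", name.strip().lower())
```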

Detect Language Stack

Language	Detection Files
Python	`pyproject.toml`, `*.py`
Node/TypeScript	`package.json`, `tsconfig.json`
Rust	`Cargo.toml`
Go	`go.mod`, `go.sum`
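The detection table can be sketched as a lookup over marker files. This is an illustration under assumptions: the function name is hypothetical, only the top level of the project is checked, and the Python entry also includes `requirements.txt` and `setup.py`, which the Python section of this skill lists as detection files.

```python
from pathlib import Path

# Marker files per language, in the order the skill lists the languages.
DETECTION_FILES = {
    "python": ("pyproject.toml", "requirements.txt", "setup.py", "*.py"),
    "node": ("package.json", "tsconfig.json"),
    "rust": ("Cargo.toml",),
    "go": ("go.mod", "go.sum"),
}


def detect_languages(project_root: str) -> list[str]:
    """Return the languages whose marker files exist in the project root.

    Assumption: subdirectories are not searched.
    """
    root = Path(project_root)
    return [
        language
        for language, patterns in DETECTION_FILES.items()
        if any(any(root.glob(pattern)) for pattern in patterns)
    ]
```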

Multi-Language Projects

If multiple languages are detected, configure all of them in the following priority order:

Python - Primary language, uses Dockerfile for uv + Python installation
Node/TypeScript - Uses devcontainer feature
Rust - Uses devcontainer feature
Go - Uses devcontainer feature

For multi-language postCreateCommand, chain all setup commands:

uv run /opt/post_install.py && uv sync && npm ci

Extensions and settings from all detected languages should be merged into the configuration.

Phase 2: Generate Configuration

Start with base templates from resources/ directory. Substitute:

{{PROJECT_NAME}} → Human-readable name (e.g., "My Project")
{{PROJECT_SLUG}} → Slug for volumes (e.g., "my-project")

Then apply language-specific modifications below.

Base Template Features

The base template includes:

Claude Code with marketplace plugins (anthropics/skills, trailofbits/skills)
Python 3.13 via uv (fast binary download)
Node 22 via fnm (Fast Node Manager)
ast-grep for AST-based code search
Network isolation tools (iptables, ipset) with NET_ADMIN capability
Modern CLI tools: ripgrep, fd, fzf, tmux, git-delta

Language-Specific Sections

Python Projects

Detection: pyproject.toml, requirements.txt, setup.py, or *.py files

Dockerfile additions:

The base Dockerfile already includes Python 3.13 via uv. If a different version is required (detected from pyproject.toml), modify the Python installation:

# Install Python via uv (fast binary download, not source compilation)
RUN uv python install <version> --default

devcontainer.json extensions:

Add to customizations.vscode.extensions:

"ms-python.python",
"ms-python.vscode-pylance",
"charliermarsh.ruff"

Add to customizations.vscode.settings:

"python.defaultInterpreterPath": ".venv/bin/python",
"[python]": {
  "editor.defaultFormatter": "charliermarsh.ruff",
  "editor.codeActionsOnSave": {
    "source.organizeImports": "explicit"
  }
}

postCreateCommand: If pyproject.toml exists, chain commands:

rm -rf .venv && uv sync && uv run /opt/post_install.py

Node/TypeScript Projects

Detection: package.json or tsconfig.json

No Dockerfile additions needed: The base template includes Node 22 via fnm (Fast Node Manager).

devcontainer.json extensions:

Add to customizations.vscode.extensions:

"dbaeumer.vscode-eslint",
"esbenp.prettier-vscode"

Add to customizations.vscode.settings:

"editor.defaultFormatter": "esbenp.prettier-vscode",
"editor.codeActionsOnSave": {
  "source.fixAll.eslint": "explicit"
}

postCreateCommand: Detect package manager from lockfile and chain with base command:

pnpm-lock.yaml → uv run /opt/post_install.py && pnpm install --frozen-lockfile
yarn.lock → uv run /opt/post_install.py && yarn install --frozen-lockfile
package-lock.json → uv run /opt/post_install.py && npm ci
No lockfile → uv run /opt/post_install.py && npm install
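The lockfile rules above can be sketched as an ordered lookup. The function and constant names are hypothetical; the commands are the ones listed above, checked in the same order.

```python
from pathlib import Path

BASE_COMMAND = "uv run /opt/post_install.py"

# Lockfile -> install command, in the order given above.
LOCKFILE_COMMANDS = (
    ("pnpm-lock.yaml", "pnpm install --frozen-lockfile"),
    ("yarn.lock", "yarn install --frozen-lockfile"),
    ("package-lock.json", "npm ci"),
)


def node_post_create_command(project_root: str) -> str:
    """Build the postCreateCommand for a Node project from its lockfile."""
    root = Path(project_root)
    for lockfile, install in LOCKFILE_COMMANDS:
        if (root / lockfile).is_file():
            return f"{BASE_COMMAND} && {install}"
    # No lockfile found: fall back to a plain install.
    return f"{BASE_COMMAND} && npm install"
```

If a project has more than one lockfile, this sketch takes the first match in list order; that tie-break is an assumption.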

Rust Projects

Detection: Cargo.toml

Features to add:

"ghcr.io/devcontainers/features/rust:1": {}

devcontainer.json extensions:

Add to customizations.vscode.extensions:

"rust-lang.rust-analyzer",
"tamasfe.even-better-toml"

Add to customizations.vscode.settings:

"[rust]": {
  "editor.defaultFormatter": "rust-lang.rust-analyzer"
}

postCreateCommand: If Cargo.lock exists, use locked builds:

uv run /opt/post_install.py && cargo build --locked

If no lockfile, use standard build:

uv run /opt/post_install.py && cargo build

Go Projects

Detection: go.mod

Features to add:

"ghcr.io/devcontainers/features/go:1": {
  "version": "latest"
}

devcontainer.json extensions:

Add to customizations.vscode.extensions:

"golang.go"

Add to customizations.vscode.settings:

"[go]": {
  "editor.defaultFormatter": "golang.go"
},
"go.useLanguageServer": true

postCreateCommand:

uv run /opt/post_install.py && go mod download

Reference Material

For additional guidance, see:

references/dockerfile-best-practices.md - Layer optimization, multi-stage builds, architecture support
references/features-vs-dockerfile.md - When to use devcontainer features vs custom Dockerfile

Adding Persistent Volumes

Pattern for new mounts in devcontainer.json:

"mounts": [
  "source={{PROJECT_SLUG}}-<purpose>-${devcontainerId},target=<container-path>,type=volume"
]

Common additions:

source={{PROJECT_SLUG}}-cargo-${devcontainerId},target=/home/vscode/.cargo,type=volume (Rust)
source={{PROJECT_SLUG}}-go-${devcontainerId},target=/home/vscode/go,type=volume (Go)
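As a small illustration, the mount pattern can be produced by a helper like the one below. The function name is hypothetical. `${devcontainerId}` is kept as literal text on the assumption that the devcontainer tooling substitutes it at container creation.

```python
def volume_mount(project_slug: str, purpose: str, container_path: str) -> str:
    """Build one named-volume mount entry for devcontainer.json."""
    # "${{" in an f-string emits a literal "${", so ${devcontainerId} survives.
    return (
        f"source={project_slug}-{purpose}-${{devcontainerId}},"
        f"target={container_path},type=volume"
    )
```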

Output Files

Generate these files in the project's .devcontainer/ directory:

Dockerfile - Container build instructions
devcontainer.json - VS Code/devcontainer configuration
post_install.py - Post-creation setup script
.zshrc - Shell configuration
install.sh - CLI helper for managing the devcontainer (devc command)

Validation Checklist

Before presenting files to the user, verify:

All {{PROJECT_NAME}} placeholders are replaced with the human-readable name
All {{PROJECT_SLUG}} placeholders are replaced with the slugified name
JSON syntax is valid in devcontainer.json (no trailing commas, proper nesting)
Language-specific extensions are added for all detected languages
postCreateCommand includes all required setup commands (chained with &&)
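Two of the checks above (leftover placeholders and JSON validity) are mechanical and can be sketched in a few lines. The function name is hypothetical, and the sketch assumes the generated devcontainer.json is strict JSON without comments, since `json.loads` rejects comments.

```python
import json
import re

PLACEHOLDER = re.compile(r"\{\{PROJECT_(?:NAME|SLUG)\}\}")


def validate_generated_files(files: dict[str, str]) -> list[str]:
    """Return a list of problems found in generated file contents.

    `files` maps a file name to its text.
    """
    problems = []
    for name, text in files.items():
        if PLACEHOLDER.search(text):
            problems.append(f"{name}: unreplaced placeholder")
    config = files.get("devcontainer.json")
    if config is not None:
        try:
            json.loads(config)  # also rejects trailing commas
        except json.JSONDecodeError as exc:
            problems.append(f"devcontainer.json: invalid JSON ({exc.msg})")
    return problems
```

The extension and postCreateCommand checks depend on the detected languages and are left to manual review here.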

User Instructions

After generating, inform the user:

How to start: "Open in VS Code and select 'Reopen in Container'"
Alternative: devcontainer up --workspace-folder .
CLI helper: Run .devcontainer/install.sh self-install to add the devc command to PATH

/differential-review

Source: `~/.claude/skills/tob-differential-review/skills/differential-review/SKILL.md`

name: differential-review description: > Performs security-focused differential review of code changes (PRs, commits, diffs). Adapts analysis depth to codebase size, uses git history for context, calculates blast radius, checks test coverage, and generates comprehensive markdown reports. Automatically detects and prevents security regressions. allowed-tools:

Read
Write
Grep
Glob
Bash

Differential Security Review

Security-focused code review for PRs, commits, and diffs.

Core Principles

Risk-First: Focus on auth, crypto, value transfer, external calls
Evidence-Based: Every finding backed by git history, line numbers, attack scenarios
Adaptive: Scale to codebase size (SMALL/MEDIUM/LARGE)
Honest: Explicitly state coverage limits and confidence level
Output-Driven: Always generate comprehensive markdown report file

Rationalizations (Do Not Skip)

Rationalization	Why It's Wrong	Required Action
"Small PR, quick review"	Heartbleed was 2 lines	Classify by RISK, not size
"I know this codebase"	Familiarity breeds blind spots	Build explicit baseline context
"Git history takes too long"	History reveals regressions	Never skip Phase 1
"Blast radius is obvious"	You'll miss transitive callers	Calculate quantitatively
"No tests = not my problem"	Missing tests = elevated risk rating	Flag in report, elevate severity
"Just a refactor, no security impact"	Refactors break invariants	Analyze as HIGH until proven LOW
"I'll explain verbally"	No artifact = findings lost	Always write report

Quick Reference

Codebase Size Strategy

Codebase Size	Strategy	Approach
SMALL (<20 files)	DEEP	Read all deps, full git blame
MEDIUM (20-200)	FOCUSED	1-hop deps, priority files
LARGE (200+)	SURGICAL	Critical paths only
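The size table maps directly to a threshold check. The function name is hypothetical, and because the table lists both "20-200" and "200+", the sketch assumes a codebase of exactly 200 files counts as LARGE.

```python
def review_strategy(file_count: int) -> tuple[str, str]:
    """Return (codebase size, strategy) for a given number of files."""
    if file_count < 20:
        return "SMALL", "DEEP"
    # Assumption: exactly 200 files is treated as LARGE ("200+").
    if file_count < 200:
        return "MEDIUM", "FOCUSED"
    return "LARGE", "SURGICAL"
```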

Risk Level Triggers

Risk Level	Triggers
HIGH	Auth, crypto, external calls, value transfer, validation removal
MEDIUM	Business logic, state changes, new public APIs
LOW	Comments, tests, UI, logging

Workflow Overview

Pre-Analysis → Phase 0: Triage → Phase 1: Code Analysis → Phase 2: Test Coverage
    ↓              ↓                    ↓                        ↓
Phase 3: Blast Radius → Phase 4: Deep Context → Phase 5: Adversarial → Phase 6: Report

Decision Tree

Starting a review?

├─ Need detailed phase-by-phase methodology?
│  └─ Read: methodology.md
│     (Pre-Analysis + Phases 0-4: triage, code analysis, test coverage, blast radius)
│
├─ Analyzing HIGH RISK change?
│  └─ Read: adversarial.md
│     (Phase 5: Attacker modeling, exploit scenarios, exploitability rating)
│
├─ Writing the final report?
│  └─ Read: reporting.md
│     (Phase 6: Report structure, templates, formatting guidelines)
│
├─ Looking for specific vulnerability patterns?
│  └─ Read: patterns.md
│     (Regressions, reentrancy, access control, overflow, etc.)
│
└─ Quick triage only?
   └─ Use Quick Reference above, skip detailed docs

Quality Checklist

Before delivering:

All changed files analyzed
Git blame on removed security code
Blast radius calculated for HIGH risk
Attack scenarios are concrete (not generic)
Findings reference specific line numbers + commits
Report file generated
User notified with summary

Integration

audit-context-building skill:

Pre-Analysis: Build baseline context
Phase 4: Deep context on HIGH RISK changes

issue-writer skill:

Transform findings into formal audit reports
Command: issue-writer --input DIFFERENTIAL_REVIEW_REPORT.md --format audit-report

Example Usage

Quick Triage (Small PR)

Input: 5 file PR, 2 HIGH RISK files
Strategy: Use Quick Reference
1. Classify risk level per file (2 HIGH, 3 LOW)
2. Focus on 2 HIGH files only
3. Git blame removed code
4. Generate minimal report
Time: ~30 minutes

Standard Review (Medium Codebase)

Input: 80 files, 12 HIGH RISK changes
Strategy: FOCUSED (see methodology.md)
1. Full workflow on HIGH RISK files
2. Surface scan on MEDIUM
3. Skip LOW risk files
4. Complete report with all sections
Time: ~3-4 hours

Deep Audit (Large, Critical Change)

Input: 450 files, auth system rewrite
Strategy: SURGICAL + audit-context-building
1. Baseline context with audit-context-building
2. Deep analysis on auth changes only
3. Blast radius analysis
4. Adversarial modeling
5. Comprehensive report
Time: ~6-8 hours

When NOT to Use This Skill

Greenfield code (no baseline to compare)
Documentation-only changes (no security impact)
Formatting/linting (cosmetic changes)
User explicitly requests quick summary only (they accept risk)

For these cases, use standard code review instead.

Red Flags (Stop and Investigate)

Immediate escalation triggers:

Removed code from "security", "CVE", or "fix" commits
Access control modifiers removed (onlyOwner, internal → external)
Validation removed without replacement
External calls added without checks
High blast radius (50+ callers) + HIGH risk change

These patterns require adversarial analysis even in quick triage.

Tips for Best Results

Do:

Start with git blame for removed code
Calculate blast radius early to prioritize
Generate concrete attack scenarios
Reference specific line numbers and commits
Be honest about coverage limitations
Always generate the output file

Don't:

Skip git history analysis
Make generic findings without evidence
Claim full analysis when time-limited
Forget to check test coverage
Miss high blast radius changes
Output report only to chat (file required)

Supporting Documentation

methodology.md - Detailed phase-by-phase workflow (Phases 0-4)
adversarial.md - Attacker modeling and exploit scenarios (Phase 5)
reporting.md - Report structure and formatting (Phase 6)
patterns.md - Common vulnerability patterns reference

For first-time users: Start with methodology.md to understand the complete workflow.

For experienced users: Use this page's Quick Reference and Decision Tree to navigate directly to needed content.

/dwarf-expert

Source: `~/.claude/skills/tob-dwarf-expert/skills/dwarf-expert/SKILL.md`

name: dwarf-expert description: Provides expertise for analyzing DWARF debug files and understanding the DWARF debug format/standard (v3-v5). Triggers when understanding DWARF information, interacting with DWARF files, answering DWARF-related questions, or working with code that parses DWARF data. allowed-tools:

Read
Bash
Grep
Glob
WebSearch

Overview

This skill provides technical knowledge and expertise about the DWARF standard and how to interact with DWARF files. Tasks include answering questions about the DWARF standard, providing examples of various DWARF features, parsing and/or creating DWARF files, and writing/modifying/analyzing code that interacts with DWARF data.

When to Use This Skill

Understanding or parsing DWARF debug information from compiled binaries
Answering questions about the DWARF standard (v3, v4, v5)
Writing or reviewing code that interacts with DWARF data
Using dwarfdump or readelf to extract debug information
Verifying DWARF data integrity with llvm-dwarfdump --verify
Working with DWARF parsing libraries (libdwarf, pyelftools, gimli, etc.)

When NOT to Use This Skill

DWARF v1/v2 Analysis: Expertise limited to versions 3, 4, and 5.
General ELF Parsing: Use standard ELF tools if DWARF data isn't needed.
Executable Debugging: Use dedicated debugging tools (gdb, lldb, etc) for debugging executable code/runtime behavior.
Binary Reverse Engineering: Use dedicated RE tools (Ghidra, IDA) unless specifically analyzing DWARF sections.
Compiler Debugging: DWARF generation issues are compiler-specific, not covered here.

Authoritative Sources

When specific DWARF standard information is needed, use these authoritative sources:

Official DWARF Standards (dwarfstd.org): Use web search to find specific sections of the official DWARF specification at dwarfstd.org. Search queries like "DWARF5 DW_TAG_subprogram attributes site:dwarfstd.org" are effective.
LLVM DWARF Implementation: The LLVM project's DWARF handling code at llvm/lib/DebugInfo/DWARF/ serves as a reliable reference implementation. Key files include:
- DWARFDie.cpp - DIE handling and attribute access
- DWARFUnit.cpp - Compilation unit parsing
- DWARFDebugLine.cpp - Line number information
- DWARFVerifier.cpp - Validation logic
libdwarf: The reference C implementation at github.com/davea42/libdwarf-code provides detailed handling of DWARF data structures.

Verification Workflows

Use llvm-dwarfdump verification options to validate DWARF data integrity:

Structural Validation

# Verify DWARF structure (compile units, DIE relationships, address ranges)
llvm-dwarfdump --verify <binary>

# Detailed error output with summary
llvm-dwarfdump --verify --error-display=full <binary>

# Machine-readable JSON error summary
llvm-dwarfdump --verify --verify-json=errors.json <binary>

Quality Metrics

# Output debug info quality metrics as JSON
llvm-dwarfdump --statistics <binary>

The --statistics output helps compare debug info quality across compiler versions and optimization levels.
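One way to compare two builds is to diff the numeric fields of their `--statistics` output. The sketch below is an illustration only: the function name is hypothetical, it assumes the output is a flat JSON object, and it does not rely on any particular metric name.

```python
import json


def compare_statistics(baseline_json: str, candidate_json: str) -> dict[str, float]:
    """Return candidate-minus-baseline deltas for numeric fields present in both.

    Each argument is the JSON text printed by `llvm-dwarfdump --statistics`.
    Assumption: the output is a flat JSON object.
    """
    baseline = json.loads(baseline_json)
    candidate = json.loads(candidate_json)
    deltas = {}
    for key, old in baseline.items():
        new = candidate.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(old, bool) or isinstance(new, bool):
            continue
        if isinstance(old, (int, float)) and isinstance(new, (int, float)):
            if new != old:
                deltas[key] = new - old
    return deltas
```

Whether a positive delta is an improvement or a regression depends on the metric, so the result still needs to be read against the metric's meaning.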

Common Verification Patterns

After compilation: Verify binaries have valid DWARF before distribution
Comparing builds: Use --statistics to detect debug info quality regressions
Debugging debuggers: Identify malformed DWARF causing debugger issues
DWARF tool development: Validate parser output against known-good binaries

Parsing DWARF Debug Information

readelf

ELF files can be parsed via the readelf command ({baseDir}/reference/readelf.md). Use this for general ELF information, but prefer dwarfdump for DWARF-specific parsing.

dwarfdump

Parse DWARF files with the dwarfdump command ({baseDir}/reference/dwarfdump.md). It parses and displays complex DWARF information more effectively than readelf, so use it for most DWARF parsing tasks.

Working With Code

This skill supports writing, modifying, and reviewing code that interacts with DWARF data. This may involve code that parses DWARF debug data from scratch or code that leverages libraries to parse and interact with DWARF data ({baseDir}/reference/coding.md).

Choosing Your Approach

┌─ Need to verify DWARF data integrity?
│   └─ Use `llvm-dwarfdump --verify` (see Verification Workflows above)
├─ Need to answer questions about the DWARF standard?
│   └─ Search dwarfstd.org or reference LLVM/libdwarf source
├─ Need simple section dump or general ELF info?
│   └─ Use `readelf` ({baseDir}/reference/readelf.md)
├─ Need to parse, search, and/or dump DWARF DIE nodes?
│   └─ Use `dwarfdump` ({baseDir}/reference/dwarfdump.md)
└─ Need to write, modify, or review code that interacts with DWARF data?
    └─ Refer to the coding reference ({baseDir}/reference/coding.md)

/entry-point-analyzer

Source: `~/.claude/skills/tob-entry-point-analyzer/skills/entry-point-analyzer/SKILL.md`

name: entry-point-analyzer description: Analyzes smart contract codebases to identify state-changing entry points for security auditing. Detects externally callable functions that modify state, categorizes them by access level (public, admin, role-restricted, contract-only), and generates structured audit reports. Excludes view/pure/read-only functions. Use when auditing smart contracts (Solidity, Vyper, Solana/Rust, Move, TON, CosmWasm) or when asked to find entry points, audit flows, external functions, access control patterns, or privileged operations. allowed-tools:

Read
Grep
Glob
Bash

Entry Point Analyzer

Systematically identify all state-changing entry points in a smart contract codebase to guide security audits.

When to Use

Use this skill when:

Starting a smart contract security audit to map the attack surface
Asked to find entry points, external functions, or audit flows
Analyzing access control patterns across a codebase
Identifying privileged operations and role-restricted functions
Building an understanding of which functions can modify contract state

When NOT to Use

Do NOT use this skill for:

Vulnerability detection (use audit-context-building or domain-specific-audits)
Writing exploit POCs (use solidity-poc-builder)
Code quality or gas optimization analysis
Non-smart-contract codebases
Analyzing read-only functions (this skill excludes them)

Scope: State-Changing Functions Only

This skill focuses exclusively on functions that can modify state. Excluded:

Language	Excluded Patterns
Solidity	`view`, `pure` functions
Vyper	`@view`, `@pure` functions
Solana	Functions without `mut` account references
Move	Non-entry `public fun` (module-callable only)
TON	`get` methods (FunC), read-only receivers (Tact)
CosmWasm	`query` entry point and its handlers

Why exclude read-only functions? They cannot directly cause loss of funds or state corruption. While they may leak information, the primary audit focus is on functions that can change state.
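As a rough illustration of the Solidity row in the table above, the sketch below filters function headers with a regular expression. The helper name `state_changing_functions` and the regex are assumptions made for this example; a regex cannot follow inheritance or unusual formatting, so treat it as a first pass and not as a replacement for Slither or a real parser.

```python
import re

# Hypothetical pattern: function name, parameter list, then everything up to
# the opening brace or terminating semicolon (visibility, mutability, modifiers).
FUNC_HEADER = re.compile(r"function\s+(\w+)\s*\(([^)]*)\)([^{;]*)[{;]")

def state_changing_functions(source: str) -> list[str]:
    """Return names of externally callable functions not marked view/pure."""
    names = []
    for match in FUNC_HEADER.finditer(source):
        name, _params, qualifiers = match.groups()
        words = set(qualifiers.split())
        if words & {"view", "pure"}:
            continue  # read-only: out of scope for this skill
        if words & {"internal", "private"}:
            continue  # not externally callable
        names.append(name)
    return names

SAMPLE = """
contract Vault {
    function deposit(uint256 amount) external { }
    function balanceOf(address who) external view returns (uint256) { }
    function _mint(address to) internal { }
    function setFee(uint256 fee) external onlyOwner { }
}
"""

print(state_changing_functions(SAMPLE))
```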

Workflow

Detect Language - Identify contract language(s) from file extensions and syntax
Use Tooling (if available) - For Solidity, check if Slither is available and use it
Locate Contracts - Find all contract/module files (apply directory filter if specified)
Extract Entry Points - Parse each file for externally callable, state-changing functions
Classify Access - Categorize each function by access level
Generate Report - Output structured markdown report

Slither Integration (Solidity)

For Solidity codebases, Slither can automatically extract entry points. Before manual analysis:

1. Check if Slither is Available

which slither

2. If Slither is Detected, Run Entry Points Printer

slither . --print entry-points

This outputs a table of all state-changing entry points with:

Contract name
Function name
Visibility
Modifiers applied

3. Use Slither Output as Foundation

Parse the Slither output table to populate your analysis
Cross-reference with manual inspection for access control classification
Slither may miss some patterns (callbacks, dynamic access control)—supplement with manual review
If Slither fails (compilation errors, unsupported features), fall back to manual analysis

4. When Slither is NOT Available

If which slither returns nothing, proceed with manual analysis using the language-specific reference files.
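The availability check and fallback described in steps 1 to 4 could be scripted along these lines. This is a minimal sketch: the function name `entry_point_command` and the injectable `which` parameter are assumptions for illustration, and the command itself is the one given in step 2.

```python
import shutil
from typing import Callable, Optional

def entry_point_command(
    target: str = ".",
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[list[str]]:
    """Return the Slither command to run, or None when Slither is not on PATH."""
    if which("slither") is None:
        return None  # caller falls back to manual analysis
    return ["slither", target, "--print", "entry-points"]

command = entry_point_command()
if command is None:
    print("Slither not found; use the language-specific reference files")
else:
    print("Run:", " ".join(command))
```

Returning the argument list without executing it keeps the decision separate from the run, so a compilation failure can still trigger the manual fallback.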

Language Detection

Extension	Language	Reference
`.sol`	Solidity	{baseDir}/references/solidity.md
`.vy`	Vyper	{baseDir}/references/vyper.md
`.rs` + `Cargo.toml` with `solana-program`	Solana (Rust)	{baseDir}/references/solana.md
`.move` + `Move.toml` with `edition`	Move (Sui)	{baseDir}/references/move-sui.md
`.move` + `Move.toml` with `Aptos`	Move (Aptos)	{baseDir}/references/move-aptos.md
`.fc`, `.func`, `.tact`	TON (FunC/Tact)	{baseDir}/references/ton.md
`.rs` + `Cargo.toml` with `cosmwasm-std`	CosmWasm	{baseDir}/references/cosmwasm.md

Load the appropriate reference file(s) based on detected language before analysis.
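The detection table can be read as a lookup on the file extension plus, for `.rs` and `.move` files, a check of the project manifest. The sketch below assumes the manifest contents are passed in as a string; `detect_language` is a hypothetical helper, and the "Move (Sui)" and "Move (Aptos)" labels are inferred from the reference file names.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extensions that identify a language on their own.
EXTENSION_LANGUAGES = {
    ".sol": "Solidity",
    ".vy": "Vyper",
    ".fc": "TON (FunC/Tact)",
    ".func": "TON (FunC/Tact)",
    ".tact": "TON (FunC/Tact)",
}

def detect_language(path: str, manifest_text: str = "") -> Optional[str]:
    """Map a source file (and its Cargo.toml/Move.toml text) to a language label."""
    suffix = PurePosixPath(path).suffix
    if suffix in EXTENSION_LANGUAGES:
        return EXTENSION_LANGUAGES[suffix]
    if suffix == ".rs":
        if "solana-program" in manifest_text:
            return "Solana (Rust)"
        if "cosmwasm-std" in manifest_text:
            return "CosmWasm"
    if suffix == ".move":
        # Checked in the same order as the table rows.
        if "edition" in manifest_text:
            return "Move (Sui)"
        if "Aptos" in manifest_text:
            return "Move (Aptos)"
    return None

print(detect_language("src/Vault.sol"))
```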

Access Classification

Classify each state-changing entry point into one of these categories:

1. Public (Unrestricted)

Functions callable by anyone without restrictions.

2. Role-Restricted

Functions limited to specific roles. Common patterns to detect:

Explicit role names: admin, owner, governance, guardian, operator, manager, minter, pauser, keeper, relayer, lender, borrower
Role-checking patterns: onlyRole, hasRole, require(msg.sender == X), assert_owner, #[access_control]
When role is ambiguous, flag as "Restricted (review required)" with the restriction pattern noted

3. Contract-Only (Internal Integration Points)

Functions callable only by other contracts, not by EOAs. Indicators:

Callbacks: onERC721Received, uniswapV3SwapCallback, flashLoanCallback
Interface implementations with contract-caller checks
Functions that revert if tx.origin == msg.sender
Cross-contract hooks
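One possible way to apply the three categories to a Solidity function is a pattern pass over its header and body. The sketch below is illustrative only: `classify` is a hypothetical helper, `onlyOwner` is added to the listed role-checking patterns as an assumption, and any `require(...)` that mentions `msg.sender` without matching a known role pattern is flagged for review.

```python
import re

# Callback names taken from the indicators listed above.
CALLBACK_NAMES = {"onERC721Received", "uniswapV3SwapCallback", "flashLoanCallback"}

# Role-checking patterns from the list above, plus onlyOwner (an assumption).
ROLE_PATTERN = re.compile(
    r"onlyRole|hasRole|onlyOwner|require\(\s*msg\.sender\s*==|assert_owner|#\[access_control\]"
)

# A sender-dependent require that is not a recognized role check.
AMBIGUOUS_PATTERN = re.compile(r"require\([^)]*msg\.sender")

def classify(name: str, snippet: str) -> str:
    """Return the report category for one state-changing entry point."""
    if name in CALLBACK_NAMES:
        return "Contract-Only"
    if ROLE_PATTERN.search(snippet):
        return "Role-Restricted"
    if AMBIGUOUS_PATTERN.search(snippet):
        return "Restricted (Review Required)"
    return "Public (Unrestricted)"

print(classify("execute", "require(trusted[msg.sender]);"))
```

A match only suggests a category; the modifier's implementation still has to be read before the classification is final.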

Output Format

Generate a markdown report with this structure:

# Entry Point Analysis: [Project Name]

**Analyzed**: [timestamp]
**Scope**: [directories analyzed or "full codebase"]
**Languages**: [detected languages]
**Focus**: State-changing functions only (view/pure excluded)

## Summary

| Category | Count |
|----------|-------|
| Public (Unrestricted) | X |
| Role-Restricted | X |
| Restricted (Review Required) | X |
| Contract-Only | X |
| **Total** | **X** |

---

## Public Entry Points (Unrestricted)

State-changing functions callable by anyone—prioritize for attack surface analysis.

| Function | File | Notes |
|----------|------|-------|
| `functionName(params)` | `path/to/file.sol:L42` | Brief note if relevant |

---

## Role-Restricted Entry Points

### Admin / Owner
| Function | File | Restriction |
|----------|------|-------------|
| `setFee(uint256)` | `Config.sol:L15` | `onlyOwner` |

### Governance
| Function | File | Restriction |
|----------|------|-------------|

### Guardian / Pauser
| Function | File | Restriction |
|----------|------|-------------|

### Other Roles
| Function | File | Restriction | Role |
|----------|------|-------------|------|

---

## Restricted (Review Required)

Functions with access control patterns that need manual verification.

| Function | File | Pattern | Why Review |
|----------|------|---------|------------|
| `execute(bytes)` | `Executor.sol:L88` | `require(trusted[msg.sender])` | Dynamic trust list |

---

## Contract-Only (Internal Integration Points)

Functions only callable by other contracts—useful for understanding trust boundaries.

| Function | File | Expected Caller |
|----------|------|-----------------|
| `onFlashLoan(...)` | `Vault.sol:L200` | Flash loan provider |

---

## Files Analyzed

- `path/to/file1.sol` (X state-changing entry points)
- `path/to/file2.sol` (X state-changing entry points)

Filtering

When user specifies a directory filter:

Only analyze files within that path
Note the filter in the report header
Example: "Analyze only src/core/" → scope = src/core/
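A directory filter is easy to get subtly wrong with plain string prefixes, since `src/core` is also a prefix of `src/core_v2`. The sketch below compares path components instead; `in_scope` is a hypothetical helper name.

```python
from pathlib import PurePosixPath
from typing import Optional

def in_scope(path: str, directory_filter: Optional[str] = None) -> bool:
    """True when path lies inside directory_filter (or when no filter is set)."""
    if not directory_filter:
        return True
    scope = PurePosixPath(directory_filter)  # trailing slash is normalized away
    return scope in PurePosixPath(path).parents

files = ["src/core/Vault.sol", "src/core_v2/Vault.sol", "test/Vault.t.sol"]
print([f for f in files if in_scope(f, "src/core/")])
```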

Analysis Guidelines

Be thorough: Don't skip files. Every state-changing externally callable function matters.
Be conservative: When uncertain about access level, flag for review rather than miscategorize.
Skip read-only: Exclude view, pure, and equivalent read-only functions.
Note inheritance: If a function's access control comes from a parent contract, note this.
Track modifiers: List all access-related modifiers/decorators applied to each function.
Identify patterns: Look for common patterns like:
- Initializer functions (often unrestricted on first call)
- Upgrade functions (high-privilege)
- Emergency/pause functions (guardian-level)
- Fee/parameter setters (admin-level)
- Token transfers and approvals (often public)

Common Role Patterns by Protocol Type

Protocol Type	Common Roles
DEX	`owner`, `feeManager`, `pairCreator`
Lending	`admin`, `guardian`, `liquidator`, `oracle`
Governance	`proposer`, `executor`, `canceller`, `timelock`
NFT	`minter`, `admin`, `royaltyReceiver`
Bridge	`relayer`, `guardian`, `validator`, `operator`
Vault/Yield	`strategist`, `keeper`, `harvester`, `manager`

Rationalizations to Reject

When analyzing entry points, reject these shortcuts:

"This function looks standard" → Still classify it; standard functions can have non-standard access control
"The modifier name is clear" → Verify the modifier's actual implementation
"This is obviously admin-only" → Trace the actual restriction; "obvious" assumptions miss subtle bypasses
"I'll skip the callbacks" → Callbacks define trust boundaries; always include them
"It doesn't modify much state" → Any state change can be exploited; include all non-view functions

Error Handling

If a file cannot be parsed:

Note it in the report under "Analysis Warnings"
Continue with remaining files
Suggest manual review for unparsable files

/firebase-apk-scanner

Source: `~/.claude/skills/tob-firebase-apk-scanner/skills/firebase-apk-scanner/SKILL.md`

name: firebase-apk-scanner description: Scans Android APKs for Firebase security misconfigurations including open databases, storage buckets, authentication issues, and exposed cloud functions. Use when analyzing APK files for Firebase vulnerabilities, performing mobile app security audits, or testing Firebase endpoint security. For authorized security research only. argument-hint: [apk-file-or-directory] allowed-tools: Bash({baseDir}/scanner.sh:*), Bash(apktool:*), Bash(curl:*), Read, Grep, Glob disable-model-invocation: true

Firebase APK Security Scanner

You are a Firebase security analyst. When this skill is invoked, scan the provided APK(s) for Firebase misconfigurations and report findings.

When to Use

Auditing Android applications for Firebase security misconfigurations
Testing Firebase endpoints extracted from APKs (Realtime Database, Firestore, Storage)
Checking authentication security (open signup, anonymous auth, email enumeration)
Enumerating Cloud Functions and testing for unauthenticated access
Mobile app security assessments involving Firebase backends
Authorized penetration testing of Firebase-backed applications

When NOT to Use

Scanning apps you do not have explicit authorization to test
Testing production Firebase projects without written permission
When you only need to extract Firebase config without testing (use manual grep/strings instead)
For non-Android targets (iOS, web apps) - this skill is APK-specific
When the target app does not use Firebase

Rationalizations to Reject

When auditing, reject these common rationalizations that lead to missed or downplayed findings:

"The database is read-only so it's fine" - Data exposure is still a critical finding; PII, API keys, and business data may be leaked
"It's just anonymous auth, not real accounts" - Anonymous tokens bypass auth != null rules and can access "authenticated-only" resources
"The API key is public anyway" - A public API key does not justify open database rules or disabled auth restrictions
"There's no sensitive data in there" - You cannot know what data will be stored in the future; insecure rules are vulnerabilities regardless of current content
"It's an internal app" - APKs can be extracted from any device; "internal" apps are not protected from reverse engineering
"We'll fix it before launch" - Document the finding; pre-launch vulnerabilities frequently ship to production

Reference Documentation

For detailed vulnerability patterns and exploitation techniques, consult:

Vulnerability Patterns Reference

How to Use This Skill

The user will provide an APK file or directory: $ARGUMENTS

Workflow

Step 1: Validate Input

First, verify the target exists:

ls -la $ARGUMENTS

If $ARGUMENTS is empty, ask the user to provide an APK path.

Step 2: Run the Scanner

Execute the bundled scanner script on the target:

{baseDir}/scanner.sh $ARGUMENTS

The scanner will:

Decompile the APK using apktool
Extract Firebase configuration from all sources (google-services.json, XML resources, assets, smali code, DEX strings)
Test authentication endpoints (open signup, anonymous auth, email enumeration)
Test Realtime Database (unauthenticated read/write, auth bypass)
Test Firestore (document access, collection enumeration)
Test Storage buckets (listing, write access)
Test Cloud Functions (enumeration, unauthenticated access)
Test Remote Config exposure
Generate reports in text and JSON format

Step 3: Present Results

After the scanner completes, read and summarize the results:

cat firebase_scan_*/scan_report.txt

Present findings in this format:

Scan Summary

Metric	Value
APKs Scanned	X
Vulnerable	X
Total Issues	X

Extracted Configuration

Field	Value
Project ID	`extracted_value`
Database URL	`extracted_value`
Storage Bucket	`extracted_value`
API Key	`extracted_value`
Auth Domain	`extracted_value`

Vulnerabilities Found

Severity	Issue	Evidence
CRITICAL	Description	Brief evidence
HIGH	Description	Brief evidence

Remediation

Provide specific fixes for each vulnerability found. Reference the Vulnerability Patterns for secure code examples.

Manual Testing (If Scanner Fails)

If the scanner script is unavailable or fails, perform manual extraction and testing:

Extract Configuration

Search for Firebase config in decompiled APK:

# Decompile
apktool d -f -o ./decompiled $ARGUMENTS

# Find google-services.json
find ./decompiled -name "google-services.json"

# Search XML resources
grep -r "firebaseio.com\|appspot.com\|AIza" ./decompiled/res/

# Search assets (hybrid apps)
grep -r "firebaseio.com\|AIza" ./decompiled/assets/

Test Endpoints

Once you have the PROJECT_ID and API_KEY:

Authentication:

# Test open signup
curl -s -X POST -H "Content-Type: application/json" \
  -d '{"email":"test@test.com","password":"Test123!","returnSecureToken":true}' \
  "https://identitytoolkit.googleapis.com/v1/accounts:signUp?key=API_KEY"

# Test anonymous auth
curl -s -X POST -H "Content-Type: application/json" \
  -d '{"returnSecureToken":true}' \
  "https://identitytoolkit.googleapis.com/v1/accounts:signUp?key=API_KEY"

Database:

# Realtime Database read
curl -s "https://PROJECT_ID.firebaseio.com/.json"

# Firestore read
curl -s "https://firestore.googleapis.com/v1/projects/PROJECT_ID/databases/(default)/documents"

Storage:

# List bucket
curl -s "https://firebasestorage.googleapis.com/v0/b/PROJECT_ID.appspot.com/o"

Remote Config:

curl -s -H "x-goog-api-key: API_KEY" \
  "https://firebaseremoteconfig.googleapis.com/v1/projects/PROJECT_ID/remoteConfig"

Severity Classification

CRITICAL: Unauthenticated database read/write, storage write, open signup on private apps
HIGH: Anonymous auth enabled, storage bucket listing, collection enumeration
MEDIUM: Email enumeration, accessible cloud functions, remote config exposure
LOW: Information disclosure without sensitive data

Important Guidelines

Authorization required - Only scan APKs you have permission to test
Clean up test data - The scanner automatically removes test entries it creates
Save tokens - If anonymous auth succeeds, use the token for authenticated bypass testing
Test all regions - Cloud Functions may be deployed to us-central1, europe-west1, asia-east1, etc.
Multiple instances - Some apps use multiple Firebase projects; test all discovered configurations
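For the "Test all regions" guideline, candidate Cloud Functions URLs can be generated per region. This sketch assumes the 1st-gen HTTP URL shape `https://REGION-PROJECT_ID.cloudfunctions.net/NAME`; 2nd-gen functions are served from different hostnames and are not covered. The helper name and the default region tuple are illustrative, with the regions taken from the guideline above.

```python
# Regions named in the guideline; extend as needed.
DEFAULT_REGIONS = ("us-central1", "europe-west1", "asia-east1")

def cloud_function_urls(project_id, function_name, regions=DEFAULT_REGIONS):
    """Build one candidate URL per region for a 1st-gen HTTP function."""
    return [
        f"https://{region}-{project_id}.cloudfunctions.net/{function_name}"
        for region in regions
    ]

# "demo-app" and "api" are placeholder values, not real targets.
for url in cloud_function_urls("demo-app", "api"):
    print(url)
```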

/fix-review

Source: `~/.claude/skills/tob-fix-review/skills/fix-review/SKILL.md`

name: fix-review description: > Verifies that git commits address security audit findings without introducing bugs. This skill should be used when the user asks to "verify these commits fix the audit findings", "check if TOB-XXX was addressed", "review the fix branch", "validate remediation commits", "did these changes address the security report", "post-audit remediation review", "compare fix commits to audit report", or when reviewing commits against security audit reports. allowed-tools:

Read
Write
Grep
Glob
Bash
WebFetch

Fix Review

Differential analysis to verify commits address security findings without introducing bugs.

When to Use

Reviewing fix branches against security audit reports
Validating that remediation commits actually address findings
Checking if specific findings (TOB-XXX format) have been fixed
Analyzing commit ranges for bug introduction patterns
Cross-referencing code changes with audit recommendations

When NOT to Use

Initial security audits (use audit-context-building or differential-review)
Code review without a specific baseline or finding set
Greenfield development with no prior audit
Documentation-only changes

Rationalizations (Do Not Skip)

Rationalization	Why It's Wrong	Required Action
"The commit message says it fixes TOB-XXX"	Messages lie; code tells truth	Verify the actual code change addresses the finding
"Small fix, no new bugs possible"	Small changes cause big bugs	Analyze all changes for anti-patterns
"I'll check the important findings"	All findings matter	Systematically check every finding
"The tests pass"	Tests may not cover the fix	Verify fix logic, not just test status
"Same developer, they know the code"	Familiarity breeds blind spots	Fresh analysis of every change

Quick Reference

Input Requirements

Input	Required	Format
Source commit	Yes	Git commit hash or ref (baseline before fixes)
Target commit(s)	Yes	One or more commit hashes to analyze
Security report	No	Local path, URL, or Google Drive link

Finding Status Values

Status	Meaning
FIXED	Code change directly addresses the finding
PARTIALLY_FIXED	Some aspects addressed, others remain
NOT_ADDRESSED	No relevant changes found
CANNOT_DETERMINE	Insufficient context to verify

Workflow

Phase 1: Input Gathering

Collect required inputs from user:

Source commit:  [hash/ref before fixes]
Target commit:  [hash/ref to analyze]
Report:         [optional: path, URL, or "none"]

If user provides multiple target commits, process each separately with the same source.

Phase 2: Report Retrieval

When a security report is provided, retrieve it based on format:

Local file (PDF, MD, JSON, HTML): Read the file directly using the Read tool. Claude processes PDFs natively.

URL: Fetch web content using the WebFetch tool.

Google Drive URL that fails: See references/report-parsing.md for Google Drive fallback logic using gdrive CLI.

Phase 3: Finding Extraction

Parse the report to extract findings:

Trail of Bits format:

Look for "Detailed Findings" section
Extract findings matching pattern: TOB-[A-Z]+-[0-9]+
Capture: ID, title, severity, description, affected files

Other formats:

Numbered findings (Finding 1, Finding 2)
Severity-based sections (Critical, High, Medium, Low)
JSON with findings array

See references/report-parsing.md for detailed parsing strategies.
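For Trail of Bits reports, the ID pattern above translates directly into a regular expression. The sketch below collects unique IDs in order of first appearance; `extract_finding_ids` and the sample text are hypothetical.

```python
import re

# Pattern from the Trail of Bits format described above.
FINDING_ID = re.compile(r"\bTOB-[A-Z]+-[0-9]+\b")

def extract_finding_ids(report_text: str) -> list[str]:
    """Return unique finding IDs, preserving first-seen order."""
    return list(dict.fromkeys(FINDING_ID.findall(report_text)))

SAMPLE_REPORT = """
Detailed Findings
TOB-ACME-1: Reentrancy in withdraw()
TOB-ACME-2: Missing access control (related to TOB-ACME-1)
"""

print(extract_finding_ids(SAMPLE_REPORT))
```

Title, severity, and affected files still need to be read from the text around each ID.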

Phase 4: Commit Analysis

For each target commit, analyze the commit range:

# Get commit list from source to target
git log <source>..<target> --oneline

# Get full diff
git diff <source>..<target>

# Get changed files
git diff <source>..<target> --name-only

For each commit in the range:

Examine the diff for bug introduction patterns
Check for security anti-patterns (see references/bug-detection.md)
Map changes to relevant findings

Phase 5: Finding Verification

For each finding in the report:

Identify relevant commits - Match by:
- File paths mentioned in finding
- Function/variable names in finding description
- Commit messages referencing the finding ID
Verify the fix - Check that:
- The root cause is addressed (not just symptoms)
- The fix follows the report's recommendation
- No new vulnerabilities are introduced
Assign status - Based on evidence:
- FIXED: Clear code change addresses the finding
- PARTIALLY_FIXED: Some aspects fixed, others remain
- NOT_ADDRESSED: No relevant changes
- CANNOT_DETERMINE: Need more context
Document evidence - For each finding:
- Commit hash(es) that address it
- Specific file and line changes
- How the fix addresses the root cause

See references/finding-matching.md for detailed matching strategies.

Phase 6: Output Generation

Generate two outputs:

1. Report file (FIX_REVIEW_REPORT.md):

# Fix Review Report

**Source:** <commit>
**Target:** <commit>
**Report:** <path or "none">
**Date:** <date>

## Executive Summary

[Brief overview: X findings reviewed, Y fixed, Z concerns]

## Finding Status

| ID | Title | Severity | Status | Evidence |
|----|-------|----------|--------|----------|
| TOB-XXX-1 | Finding title | High | FIXED | abc123 |
| TOB-XXX-2 | Another finding | Medium | NOT_ADDRESSED | - |

## Bug Introduction Concerns

[Any potential bugs or regressions detected in the changes]

## Per-Commit Analysis

### Commit abc123: "Fix reentrancy in withdraw()"

**Files changed:** contracts/Vault.sol
**Findings addressed:** TOB-XXX-1
**Concerns:** None

[Detailed analysis]

## Recommendations

[Any follow-up actions needed]

2. Conversation summary:

Provide a concise summary in the conversation:

Total findings: X
Fixed: Y
Not addressed: Z
Concerns: [list any bug introduction risks]
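The counts for the conversation summary can be derived from the per-finding statuses. This is a minimal sketch that assumes statuses are held in a dict keyed by finding ID; it reports all four status values, which goes slightly beyond the template above, and rejects any status outside the defined set.

```python
from collections import Counter

STATUSES = ("FIXED", "PARTIALLY_FIXED", "NOT_ADDRESSED", "CANNOT_DETERMINE")

def summarize(findings: dict[str, str]) -> str:
    """Render total and per-status counts for a {finding_id: status} mapping."""
    unknown = set(findings.values()) - set(STATUSES)
    if unknown:
        raise ValueError(f"Unknown status values: {sorted(unknown)}")
    counts = Counter(findings.values())
    lines = [f"Total findings: {len(findings)}"]
    lines += [f"{status}: {counts[status]}" for status in STATUSES]
    return "\n".join(lines)

print(summarize({"TOB-XXX-1": "FIXED", "TOB-XXX-2": "NOT_ADDRESSED"}))
```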

Bug Detection

Analyze commits for security anti-patterns. Key patterns to watch:

Access control weakening (modifiers removed)
Validation removal (require/assert deleted)
Error handling reduction (try/catch removed)
External call reordering (state updated after the external call)
Integer operation changes (SafeMath removed)
Cryptographic weakening

See references/bug-detection.md for comprehensive detection patterns and examples.

Integration with Other Skills

differential-review: For initial security review of changes (before audit)

issue-writer: To format findings into formal audit reports

audit-context-building: For deep context when analyzing complex fixes

Tips for Effective Reviews

Do:

Verify the actual code change, not just commit messages
Check that fixes address root causes, not symptoms
Look for unintended side effects in adjacent code
Cross-reference multiple findings that may interact
Document evidence for every status assignment

Don't:

Trust commit messages as proof of fix
Skip findings because they seem minor
Assume passing tests mean correct fixes
Ignore changes outside the "fix" scope
Mark FIXED without clear evidence

Reference Files

For detailed guidance, consult:

references/finding-matching.md - Strategies for matching commits to findings
references/bug-detection.md - Comprehensive anti-pattern detection
references/report-parsing.md - Parsing different report formats, Google Drive fallback

/insecure-defaults

Source: `~/.claude/skills/tob-insecure-defaults/skills/insecure-defaults/SKILL.md`

name: insecure-defaults description: "Detects fail-open insecure defaults (hardcoded secrets, weak auth, permissive security) that allow apps to run insecurely in production. Use when auditing security, reviewing config management, or analyzing environment variable handling." allowed-tools:

Read
Grep
Glob
Bash

Insecure Defaults Detection

Finds fail-open vulnerabilities where apps run insecurely with missing configuration. Distinguishes exploitable defaults from fail-secure patterns that crash safely.

Fail-open (CRITICAL): SECRET = env.get('KEY') or 'default' → App runs with weak secret
Fail-secure (SAFE): SECRET = env['KEY'] → App crashes if missing
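The two patterns behave differently only when the variable is missing, which is easy to see by running both against an empty environment. In this sketch the function names are illustrative and the environment is passed in as a mapping so the behavior can be shown without touching the real process environment.

```python
import os

def load_secret_fail_open(env=os.environ):
    # Fail-open: a missing (or empty) KEY silently becomes a guessable value.
    return env.get("KEY") or "default"

def load_secret_fail_secure(env=os.environ):
    # Fail-secure: a missing KEY raises KeyError, so the app cannot start.
    return env["KEY"]

print(load_secret_fail_open({}))  # runs with the weak fallback
try:
    load_secret_fail_secure({})
except KeyError:
    print("missing KEY: refusing to start")
```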

When to Use

Security audits of production applications (auth, crypto, API security)
Configuration review of deployment files, IaC templates, Docker configs
Code review of environment variable handling and secrets management
Pre-deployment checks for hardcoded credentials or weak defaults

When NOT to Use

Do not use this skill for:

Test fixtures explicitly scoped to test environments (files in test/, spec/, __tests__/)
Example/template files (.example, .template, .sample suffixes)
Development-only tools (local Docker Compose for dev, debug scripts)
Documentation examples in README.md or docs/ directories
Build-time configuration that gets replaced during deployment
Crash-on-missing behavior where app won't start without proper config (fail-secure)

When in doubt: trace the code path to determine if the app runs with the default or crashes.

Rationalizations to Reject

"It's just a development default" → If it reaches production code, it's a finding
"The production config overrides it" → Verify prod config exists; code-level vulnerability remains if not
"This would never run without proper config" → Prove it with code trace; many apps fail silently
"It's behind authentication" → Defense in depth; compromised session still exploits weak defaults
"We'll fix it before release" → Document now; "later" rarely comes

Workflow

Follow this workflow for every potential finding:

1. SEARCH: Perform Project Discovery and Find Insecure Defaults

Determine the language, framework, and project conventions. Use that information to locate secret storage, secret usage patterns, credentialed third-party integrations, cryptography, and any other relevant configuration, then analyze those locations for insecure defaults.

Example: search for these patterns in **/config/, **/auth/, **/database/, and env files:

Fallback secrets: getenv.*\) or ['"], process\.env\.[A-Z_]+ \|\| ['"], ENV\.fetch.*default:
Hardcoded credentials: password.*=.*['"][^'"]{8,}['"], api[_-]?key.*=.*['"][^'"]+['"]
Weak defaults: DEBUG.*=.*true, AUTH.*=.*false, CORS.*=.*\*
Crypto algorithms: MD5|SHA1|DES|RC4|ECB in security contexts

Tailor search approach based on discovery results.

Focus on production-reachable code, not test fixtures or example files.
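The "Fallback secrets" patterns listed above can be applied line by line as a first pass. The sketch below is one way to do that; `find_fallback_secrets` and the sample lines are hypothetical, and every hit still needs the verification step that follows.

```python
import re

# The three fallback-secret patterns from the list above.
FALLBACK_SECRET_PATTERNS = [
    re.compile(r"""getenv.*\) or ['"]"""),
    re.compile(r"""process\.env\.[A-Z_]+ \|\| ['"]"""),
    re.compile(r"ENV\.fetch.*default:"),
]

def find_fallback_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each line matching a pattern."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in FALLBACK_SECRET_PATTERNS):
            hits.append((number, line.strip()))
    return hits

SAMPLE = "\n".join([
    "const secret = process.env.JWT_SECRET || 'default';",
    "const port = process.env.PORT;",
    'key = os.getenv("API_KEY") or "changeme"',
])

for number, line in find_fallback_secrets(SAMPLE):
    print(number, line)
```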

2. VERIFY: Actual Behavior

For each match, trace the code path to understand runtime behavior.

Questions to answer:

When is this code executed? (Startup vs. runtime)
What happens if a configuration variable is missing?
Is there validation that enforces secure configuration?

3. CONFIRM: Production Impact

Determine if this issue reaches production:

If production config provides the variable → Lower severity (but still a code-level vulnerability)
If production config is missing or uses the default → CRITICAL

4. REPORT: with Evidence

Example report:

Finding: Hardcoded JWT Secret Fallback
Location: src/auth/jwt.ts:15
Pattern: const secret = process.env.JWT_SECRET || 'default';

Verification: App starts without JWT_SECRET; secret used in jwt.sign() at line 42
Production Impact: Dockerfile missing JWT_SECRET
Exploitation: Attacker forges JWTs using 'default', gains unauthorized access

Quick Verification Checklist

Fallback Secrets: SECRET = env.get(X) or Y → Verify: App starts without env var? Secret used in crypto/auth? → Skip: Test fixtures, example files

Default Credentials: Hardcoded username/password pairs → Verify: Active in deployed config? No runtime override? → Skip: Disabled accounts, documentation examples

Fail-Open Security: AUTH_REQUIRED = env.get(X, 'false') → Verify: Default is insecure (false/disabled/permissive)? → Safe: App crashes or default is secure (true/enabled/restricted)

Weak Crypto: MD5/SHA1/DES/RC4/ECB in security contexts → Verify: Used for passwords, encryption, or tokens? → Skip: Checksums, non-security hashing

Permissive Access: CORS *, permissions 0777, public-by-default → Verify: Default allows unauthorized access? → Skip: Explicitly configured permissiveness with justification

Debug Features: Stack traces, introspection, verbose errors → Verify: Enabled by default? Exposed in responses? → Skip: Logging-only, not user-facing

For detailed examples and counter-examples, see examples.md.

/modern-python

Source: `~/.claude/skills/tob-modern-python/skills/modern-python/SKILL.md`

name: modern-python description: Configures Python projects with modern tooling (uv, ruff, ty). Use when creating projects, writing standalone scripts, or migrating from pip/Poetry/mypy/black.

Modern Python

Guide for modern Python tooling and best practices, based on trailofbits/cookiecutter-python.

When to Use This Skill

Creating a new Python project or package
Setting up pyproject.toml configuration
Configuring development tools (linting, formatting, testing)
Writing Python scripts with external dependencies
Migrating from legacy tools (when user requests it)

When NOT to Use This Skill

User wants to keep legacy tooling: Respect existing workflows if explicitly requested
Python < 3.11 required: These tools target modern Python
Non-Python projects: Mixed codebases where Python isn't primary

Anti-Patterns to Avoid

Avoid	Use Instead
`[tool.ty]` python-version	`[tool.ty.environment]` python-version
`uv pip install`	`uv add` and `uv sync`
Editing pyproject.toml manually to add deps	`uv add <pkg>` / `uv remove <pkg>`
`hatchling` build backend	`uv_build` (simpler, sufficient for most cases)
Poetry	uv (faster, simpler, better ecosystem integration)
requirements.txt	PEP 723 for scripts, pyproject.toml for projects
mypy / pyright	ty (faster, from Astral team)
`[project.optional-dependencies]` for dev tools	`[dependency-groups]` (PEP 735)
Manual virtualenv activation (`source .venv/bin/activate`)	`uv run <cmd>`
pre-commit	prek (faster, no Python runtime needed)

Key principles:

Always use uv add and uv remove to manage dependencies
Never manually activate or manage virtual environments—use uv run for all commands
Use [dependency-groups] for dev/test/docs dependencies, not [project.optional-dependencies]

Decision Tree

What are you doing?
│
├─ Single-file script with dependencies?
│   └─ Use PEP 723 inline metadata (./references/pep723-scripts.md)
│
├─ New multi-file project (not distributed)?
│   └─ Minimal uv setup (see Quick Start below)
│
├─ New reusable package/library?
│   └─ Full project setup (see Full Setup below)
│
└─ Migrating existing project?
    └─ See Migration Guide below
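For the single-file branch of the tree, PEP 723 metadata is a specially delimited comment block at the top of the script. The sketch below embeds a sample script as a string and extracts the TOML text from its `script` block, using a regular expression adapted from the reference implementation in PEP 723. The helper name and the sample dependencies are assumptions for illustration.

```python
from __future__ import annotations

import re

# Regex adapted from the PEP 723 reference implementation.
PEP723_BLOCK = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)

def script_metadata_toml(script: str) -> str | None:
    """Return the TOML text of the first 'script' metadata block, if any."""
    for match in PEP723_BLOCK.finditer(script):
        if match.group("type") == "script":
            return "".join(
                line[2:] if line.startswith("# ") else line[1:]
                for line in match.group("content").splitlines(keepends=True)
            )
    return None

# A sample standalone script with inline metadata (hypothetical dependencies).
EXAMPLE = '''# /// script
# requires-python = ">=3.11"
# dependencies = ["requests", "rich"]
# ///
import requests
'''

print(script_metadata_toml(EXAMPLE))
```

The extracted text is plain TOML, so on Python 3.11 or later it can be passed to `tomllib.loads` if the values are needed as a dict.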

Tool Overview

Tool	Purpose	Replaces
uv	Package/dependency management	pip, virtualenv, pip-tools, pipx, pyenv
ruff	Linting AND formatting	flake8, black, isort, pyupgrade, pydocstyle
ty	Type checking	mypy, pyright (faster alternative)
pytest	Testing with coverage	unittest
prek	Pre-commit hooks (setup)	pre-commit (faster, Rust-native)

Security Tools

Tool	Purpose	When It Runs
shellcheck	Shell script linting	pre-commit
detect-secrets	Secret detection	pre-commit
actionlint	Workflow syntax validation	pre-commit, CI
zizmor	Workflow security audit	pre-commit, CI
pip-audit	Dependency vulnerability scanning	CI, manual
Dependabot	Automated dependency updates	scheduled

See security-setup.md for configuration and usage.

Quick Start: Minimal Project

For simple multi-file projects not intended for distribution:

# Create project with uv
uv init myproject
cd myproject

# Add dependencies
uv add requests rich

# Add dev dependencies
uv add --group dev pytest ruff ty

# Run code
uv run python src/myproject/main.py

# Run tools
uv run pytest
uv run ruff check .

Full Project Setup

If starting from scratch, ask whether the user would prefer the Trail of Bits cookiecutter template, which bootstraps a complete project with preconfigured tooling.

uvx cookiecutter gh:trailofbits/cookiecutter-python

1. Create Project Structure

uv init --package myproject
cd myproject

This creates:

myproject/
├── pyproject.toml
├── README.md
├── src/
│   └── myproject/
│       └── __init__.py
└── .python-version

2. Configure pyproject.toml

See pyproject.md for complete configuration reference.

Key sections:

[project]
name = "myproject"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = []

[dependency-groups]
dev = [{include-group = "lint"}, {include-group = "test"}, {include-group = "audit"}]
lint = ["ruff", "ty"]
test = ["pytest", "pytest-cov"]
audit = ["pip-audit"]

[tool.ruff]
line-length = 100
target-version = "py311"

[tool.ruff.lint]
select = ["ALL"]
ignore = ["D", "COM812", "ISC001"]

[tool.pytest]
addopts = ["--cov=myproject", "--cov-fail-under=80"]

[tool.ty.terminal]
error-on-warning = true

[tool.ty.environment]
python-version = "3.11"

[tool.ty.rules]
# Strict from day 1 for new projects
possibly-unresolved-reference = "error"
unused-ignore-comment = "warn"

3. Install Dependencies

# Install all dependency groups
uv sync --all-groups

# Or install specific groups
uv sync --group dev

4. Add Makefile

.PHONY: dev lint format test build

dev:
	uv sync --all-groups

lint:
	uv run ruff format --check && uv run ruff check && uv run ty check src/

format:
	uv run ruff format .

test:
	uv run pytest

build:
	uv build

Migration Guide

When a user requests migration from legacy tooling:

From requirements.txt + pip

First, determine the nature of the code:

For standalone scripts: Convert to PEP 723 inline metadata (see pep723-scripts.md)

For projects:

# Initialize uv in existing project
uv init --bare

# Add dependencies using uv (not by editing pyproject.toml)
uv add requests rich  # add each package

# Or import from requirements.txt (review each package before adding)
# Note: Complex version specifiers may need manual handling
grep -v '^#' requirements.txt | grep -v '^-' | grep -v '^\s*$' | while read -r pkg; do
    uv add "$pkg" || echo "Failed to add: $pkg"
done

uv sync

Then:

Delete requirements.txt, requirements-dev.txt
Delete virtual environment (venv/, .venv/)
Add uv.lock to version control
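The filtering that the shell loop performs on requirements.txt can also be done in Python before handing each spec to `uv add`. This is a sketch under assumptions: `requirement_specs` is a hypothetical helper, and unlike the `grep` pipeline it also strips trailing inline comments, which would otherwise be passed to `uv add` as part of the package spec.

```python
def requirement_specs(text: str) -> list[str]:
    """Return package specs, skipping comments, pip options, and blank lines."""
    specs = []
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop trailing inline comment
        if not line or line.startswith("#") or line.startswith("-"):
            continue  # comment, option such as "-r dev.txt", or blank
        specs.append(line)
    return specs

REQUIREMENTS = """\
# core
requests>=2.31

-r dev.txt
rich==13.7.0  # pinned
"""

for spec in requirement_specs(REQUIREMENTS):
    print("uv add", repr(spec))
```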

From setup.py / setup.cfg

Run uv init --bare to create pyproject.toml
Use uv add to add each dependency from install_requires
Use uv add --group dev for dev dependencies
Copy non-dependency metadata (name, version, description, etc.) to [project]
Delete setup.py, setup.cfg, MANIFEST.in

From flake8 + black + isort

Remove flake8, black, isort via uv remove
Delete .flake8 and the [tool.black] and [tool.isort] sections of pyproject.toml
Add ruff: uv add --group dev ruff
Add ruff configuration (see ruff-config.md)
Run uv run ruff check --fix . to apply fixes
Run uv run ruff format . to format

From mypy / pyright

Remove mypy/pyright via uv remove
Delete mypy.ini, pyrightconfig.json, or [tool.mypy]/[tool.pyright] sections
Add ty: uv add --group dev ty
Run uv run ty check src/

Quick Reference: uv Commands

Command	Description
`uv init`	Create new project
`uv init --package`	Create distributable package
`uv add <pkg>`	Add dependency
`uv add --group dev <pkg>`	Add to dependency group
`uv remove <pkg>`	Remove dependency
`uv sync`	Install dependencies
`uv sync --all-groups`	Install all dependency groups
`uv run <cmd>`	Run command in venv
`uv run --with <pkg> <cmd>`	Run with temporary dependency
`uv build`	Build package
`uv publish`	Publish to PyPI

Ad-hoc Dependencies with `--with`

Use uv run --with for one-off commands that need packages not in your project:

# Run Python with a temporary package
uv run --with requests python -c "import requests; print(requests.get('https://httpbin.org/ip').json())"

# Run a module with temporary deps
uv run --with rich python -m rich.progress

# Multiple packages
uv run --with requests --with rich python script.py

# Combine with project deps (adds to existing venv)
uv run --with httpx pytest  # project deps + httpx

When to use --with vs uv add:

uv add: Package is a project dependency (goes in pyproject.toml/uv.lock)
--with: One-off usage, testing, or scripts outside a project context

See uv-commands.md for complete reference.

Quick Reference: Dependency Groups

[dependency-groups]
dev = ["ruff", "ty"]
test = ["pytest", "pytest-cov", "hypothesis"]
docs = ["sphinx", "myst-parser"]

Install with: uv sync --group dev --group test

Best Practices Checklist

Use src/ layout for packages
Set requires-python = ">=3.11"
Configure ruff with select = ["ALL"] and explicit ignores
Use ty for type checking
Enforce test coverage minimum (80%+)
Use dependency groups instead of extras for dev tools
Add uv.lock to version control
Use PEP 723 for standalone scripts

/property-based-testing

Source: `~/.claude/skills/tob-property-based-testing/skills/property-based-testing/SKILL.md`

name: property-based-testing
description: Provides guidance for property-based testing across multiple languages and smart contracts. Use when writing tests, reviewing code with serialization/validation/parsing patterns, designing features, or when property-based testing would provide stronger coverage than example-based tests.

Property-Based Testing Guide

Use this skill proactively during development when you encounter patterns where PBT provides stronger coverage than example-based tests.

When to Invoke (Automatic Detection)

Invoke this skill when you detect:

Serialization pairs: encode/decode, serialize/deserialize, toJSON/fromJSON, pack/unpack
Parsers: URL parsing, config parsing, protocol parsing, string-to-structured-data
Normalization: normalize, sanitize, clean, canonicalize, format
Validators: is_valid, validate, check_* (especially with normalizers)
Data structures: Custom collections with add/remove/get operations
Mathematical/algorithmic: Pure functions, sorting, ordering, comparators
Smart contracts: Solidity/Vyper contracts, token operations, state invariants, access control

Priority by pattern:

Pattern	Property	Priority
encode/decode pair	Roundtrip	HIGH
Pure function	Multiple	HIGH
Validator	Valid after normalize	MEDIUM
Sorting/ordering	Idempotence + ordering	MEDIUM
Normalization	Idempotence	MEDIUM
Builder/factory	Output invariants	LOW
Smart contract	State invariants	HIGH

When NOT to Use

Do NOT use this skill for:

Simple CRUD operations without transformation logic
One-off scripts or throwaway code
Code with side effects that cannot be isolated (network calls, database writes)
Tests where specific example cases are sufficient and edge cases are well-understood
Integration or end-to-end testing (PBT is best for unit/component testing)

Property Catalog (Quick Reference)

Property	Formula	When to Use
Roundtrip	`decode(encode(x)) == x`	Serialization, conversion pairs
Idempotence	`f(f(x)) == f(x)`	Normalization, formatting, sorting
Invariant	Property holds before/after	Any transformation
Commutativity	`f(a, b) == f(b, a)`	Binary/set operations
Associativity	`f(f(a,b), c) == f(a, f(b,c))`	Combining operations
Identity	`f(x, identity) == x`	Operations with neutral element
Inverse	`f(g(x)) == x`	encrypt/decrypt, compress/decompress
Oracle	`new_impl(x) == reference(x)`	Optimization, refactoring
Easy to Verify	`is_sorted(sort(x))`	Complex algorithms
No Exception	No crash on valid input	Baseline property

Strength hierarchy (weakest to strongest): No Exception → Type Preservation → Invariant → Idempotence → Roundtrip
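
The roundtrip and idempotence rows of the catalog can be illustrated with a minimal sketch that uses only the Python standard library. It is not a substitute for a real PBT library such as Hypothesis, which adds shrinking and smarter input generation. The helpers `random_text`, `normalize`, and `check_properties` are hypothetical names introduced for this example.

```python
import json
import random
import string


def random_text(rng: random.Random, max_len: int = 20) -> str:
    """Generate a random printable string (hypothetical helper for this sketch)."""
    length = rng.randint(0, max_len)
    return "".join(rng.choice(string.printable) for _ in range(length))


def normalize(text: str) -> str:
    """Example normalizer: collapse whitespace runs and lowercase."""
    return " ".join(text.split()).lower()


def check_properties(trials: int = 500, seed: int = 0) -> int:
    """Check roundtrip and idempotence on random inputs; return the trials run."""
    rng = random.Random(seed)
    for _ in range(trials):
        value = {"name": random_text(rng), "count": rng.randint(-1000, 1000)}
        # Roundtrip: decode(encode(x)) == x
        assert json.loads(json.dumps(value)) == value
        text = random_text(rng)
        # Idempotence: f(f(x)) == f(x)
        assert normalize(normalize(text)) == normalize(text)
    return trials
```

Calling `check_properties()` exercises both properties on 500 seeded random inputs; a fixed seed keeps any failure reproducible.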

Decision Tree

Based on the current task, read the appropriate section:

TASK: Writing new tests
  → Read {baseDir}/references/generating.md (test generation patterns and examples)
  → Then {baseDir}/references/strategies.md if input generation is complex

TASK: Designing a new feature
  → Read {baseDir}/references/design.md (Property-Driven Development approach)

TASK: Code is difficult to test (mixed I/O, missing inverses)
  → Read {baseDir}/references/refactoring.md (refactoring patterns for testability)

TASK: Reviewing existing PBT tests
  → Read {baseDir}/references/reviewing.md (quality checklist and anti-patterns)

TASK: Need library reference
  → Read {baseDir}/references/libraries.md (PBT libraries by language, includes smart contract tools)

How to Suggest PBT

When you detect a high-value pattern while writing tests, offer PBT as an option:

"I notice encode_message/decode_message is a serialization pair. Property-based testing with a roundtrip property would provide stronger coverage than example tests. Want me to use that approach?"

If the codebase already uses a PBT library (Hypothesis, fast-check, proptest, Echidna), be more direct:

"This codebase uses Hypothesis. I'll write property-based tests for this serialization pair using a roundtrip property."

If the user declines, write good example-based tests without further prompting.

When NOT to Use PBT

Simple CRUD without complex validation
UI/presentation logic
Integration tests requiring complex external setup
Prototyping where requirements are fluid
User explicitly requests example-based tests only

Red Flags

Recommending trivial getters/setters
Missing paired operations (encode without decode)
Ignoring type hints (well-typed = easier to test)
Overwhelming user with candidates (limit to top 5-10)
Being pushy after user declines

/second-opinion

Source: `~/.claude/skills/tob-second-opinion/skills/second-opinion/SKILL.md`

name: second-opinion
description: "Runs external LLM code reviews (OpenAI Codex or Google Gemini CLI) on uncommitted changes, branch diffs, or specific commits. Use when the user asks for a second opinion, external review, codex review, gemini review, or mentions /second-opinion."
allowed-tools:

Bash
Read
Glob
Grep
AskUserQuestion

Second Opinion

Shell out to external LLM CLIs for an independent code review powered by a separate model. Supports OpenAI Codex CLI and Google Gemini CLI.

When to Use

Getting a second opinion on code changes from a different model
Reviewing branch diffs before opening a PR
Checking uncommitted work for issues before committing
Running a focused review (security, performance, error handling)
Comparing review output from multiple models

When NOT to Use

Neither Codex CLI nor Gemini CLI is installed
No API key or subscription configured for either tool
Reviewing non-code files (documentation, config)
You want Claude's own review (just ask Claude directly)

Safety Note

Gemini CLI is invoked with --yolo, which auto-approves all tool calls without confirmation. This is required for headless (non-interactive) operation but means Gemini will execute any tool actions its extensions request without prompting.

Quick Reference

# Codex
codex review --uncommitted
codex review --base <branch>
codex review --commit <sha>

# Gemini (code review extension)
gemini -p "/code-review" --yolo -e code-review
# Gemini (headless with diff — see references/ for full heredoc pattern)
git diff HEAD > /tmp/review-diff.txt
# Unquoted delimiter so that $(cat ...) below is expanded by the shell
cat <<PROMPT | gemini -p - --yolo
Review this diff...
$(cat /tmp/review-diff.txt)
PROMPT

Invocation

1. Gather context interactively

Use AskUserQuestion to collect review parameters in one shot. Adapt the questions based on what the user already provided in their invocation (skip questions they already answered).

Combine all applicable questions into a single AskUserQuestion call (max 4 questions).

Question 1 — Tool (skip if user already specified):

header: "Review tool"
question: "Which tool should run the review?"
options:
  - "Both Codex and Gemini (Recommended)" → run both in parallel
  - "Codex only"                          → codex review
  - "Gemini only"                         → gemini CLI

Question 2 — Scope (skip if user already specified):

header: "Review scope"
question: "What should be reviewed?"
options:
  - "Uncommitted changes" → --uncommitted / git diff HEAD
  - "Branch diff vs main" → --base (auto-detect default branch)
  - "Specific commit"     → --commit (follow up for SHA)

Question 3 — Project context (skip if neither CLAUDE.md nor AGENTS.md exists):

Check for CLAUDE.md first, then AGENTS.md in the repo root. Only show this question if at least one exists.

header: "Project context"
question: "Include project conventions file so the review
  checks against your standards?"
options:
  - "Yes, include it"
  - "No, standard review"

Note: Project context only applies to Gemini and to Codex with --uncommitted. For Codex with --base/--commit, the positional prompt is not supported — inform the user that Codex will review without custom instructions in this mode (it still reads AGENTS.md if one exists in the repo).

Question 4 — Review focus (always ask):

header: "Review focus"
question: "Any specific focus areas for the review?"
options:
  - "General review"    → no custom prompt
  - "Security & auth"   → security-focused prompt
  - "Performance"       → performance-focused prompt
  - "Error handling"    → error handling-focused prompt

2. Run the tool directly

Do not pre-check tool availability. Run the selected tool immediately. If the command fails with "command not found" or an extension is missing, report the install command from the Error Handling table below and skip that tool (if "Both" was selected, run only the available one).

Diff Preview

After collecting answers, show the diff stats:

# For uncommitted:
git diff --stat HEAD

# For branch diff:
git diff --stat <branch>...HEAD

# For specific commit:
git diff --stat <sha>~1..<sha>

If the diff is empty, stop and tell the user.

If the diff is very large (>2000 lines changed), warn the user that high-effort reasoning on a large diff will be slow and ask whether to proceed or narrow the scope.
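
One way to apply the empty and large-diff checks is to parse the final summary line that `git diff --stat` prints. The sketch below is a hedged illustration: the function names and the `"empty"`/`"large"`/`"ok"` labels are hypothetical, and the 2000-line threshold is taken from the text above.

```python
import re

LARGE_DIFF_THRESHOLD = 2000  # from the text above: ">2000 lines changed"


def lines_changed(stat_summary: str) -> int:
    """Sum insertions and deletions from the summary line of `git diff --stat`."""
    insertions = re.search(r"(\d+) insertions?\(\+\)", stat_summary)
    deletions = re.search(r"(\d+) deletions?\(-\)", stat_summary)
    return sum(int(m.group(1)) for m in (insertions, deletions) if m)


def classify_diff(stat_summary: str) -> str:
    """Return "empty", "large", or "ok" for the given --stat output."""
    if not stat_summary.strip():
        return "empty"  # git prints nothing when there are no changes
    if lines_changed(stat_summary) > LARGE_DIFF_THRESHOLD:
        return "large"
    return "ok"
```

For the summary ` 45 files changed, 3200 insertions(+), 890 deletions(-)`, `classify_diff` returns `"large"`, which is the case where the user should be asked whether to proceed.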

Auto-detect Default Branch

For branch diff scope, detect the default branch name:

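# The fallback must sit inside the group: a pipeline's exit status is sed's,
# so a trailing "|| echo main" after sed would never run.
{ git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null \
  || echo refs/remotes/origin/main; } \
  | sed 's@^refs/remotes/origin/@@'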

Codex Invocation

See references/codex-invocation.md for full details on command syntax, prompt passing, and model fallback.

Summary:

Model: gpt-5.3-codex, reasoning: xhigh
--uncommitted takes a positional prompt
--base and --commit do NOT accept custom prompts (Codex reads AGENTS.md if present, but the skill will not create one; note this limitation to the user)
Falls back to gpt-5.2-codex on auth errors
Output is verbose — summarize findings, don't dump raw (see references/codex-invocation.md § Parsing Output)
Set timeout: 600000 on the Bash call

Gemini Invocation

See references/gemini-invocation.md for full details on flags, scope mapping, and extension usage.

Summary:

Model: gemini-3-pro-preview, flags: --yolo, -e, -m
For uncommitted general review: gemini -p "/code-review" --yolo -e code-review
For branch/commit diffs: pipe git diff into gemini -p
Security extension name is gemini-cli-security (not security)
/security:analyze is interactive-only — use -p with a security prompt instead
Run /security:scan-deps as bonus when security focus selected
Set timeout: 600000 on the Bash call

Scope mapping for git diff (Gemini has no built-in scope flags):

Scope	Diff command
Uncommitted	`git diff HEAD`
Branch diff	`git diff <branch>...HEAD`
Specific commit	`git diff <sha>~1..<sha>`
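
The scope mapping in the table can be sketched as a small helper that builds the `git diff` argument list. The scope keys (`"uncommitted"`, `"branch"`, `"commit"`) and the function name are assumptions made for this illustration, not names defined by the skill.

```python
from typing import List, Optional


def diff_command(scope: str, ref: Optional[str] = None) -> List[str]:
    """Build the git diff command for a review scope (hypothetical helper)."""
    if scope == "uncommitted":
        return ["git", "diff", "HEAD"]
    if scope not in ("branch", "commit"):
        raise ValueError(f"unknown scope: {scope!r}")
    if not ref:
        raise ValueError(f"scope {scope!r} requires a branch name or commit SHA")
    if scope == "branch":
        # Three-dot form: changes on HEAD since it diverged from the branch
        return ["git", "diff", f"{ref}...HEAD"]
    # Single commit: compare it against its first parent
    return ["git", "diff", f"{ref}~1..{ref}"]
```

Returning an argument list rather than a shell string avoids quoting problems if the result is later passed to `subprocess.run`.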

Running Both

When the user picks "Both" (the default):

Run Codex and Gemini in parallel — issue both Bash tool calls in a single response. Both commands are read-only (they review diffs via external APIs) so there is no shared state or git lock contention.
Collect both results, then present with clear headers:

## Codex Review (gpt-5.3-codex)
<codex output>

## Gemini Review (gemini-3-pro-preview)
<gemini output>

Summarize where the two reviews agree and differ.

Error Handling

Error	Action
`codex: command not found`	Tell user: `npm i -g @openai/codex`
`gemini: command not found`	Tell user: `npm i -g @google/gemini-cli`
Gemini `code-review` extension missing	Tell user: `gemini extensions install https://github.com/gemini-cli-extensions/code-review`
Gemini `gemini-cli-security` extension missing	Tell user: `gemini extensions install https://github.com/gemini-cli-extensions/security`
Model auth error (Codex)	Retry with `gpt-5.2-codex`
Empty diff	Tell user there are no changes to review
Timeout	Inform user and suggest narrowing the diff scope
Tool partially unavailable	Run only the available tool, note the skip

Examples

Both tools (default):

User: /second-opinion
Claude: [asks 4 questions: tool, scope, context, focus]
User: picks "Both", "Branch diff", "Yes include CLAUDE.md", "Security"
Claude: [detects default branch = main]
Claude: [shows diff --stat: 6 files, +103 -15]
Claude: [runs Codex review with security prompt]
Claude: [runs Gemini review with security prompt + dep scan]
Claude: [presents both reviews, highlights agreements/differences]

Codex only with inline args:

User: /second-opinion check uncommitted changes for bugs
Claude: [scope known: uncommitted, focus known: custom]
Claude: [asks 2 questions: tool, project context]
User: picks "Codex only", "No context"
Claude: [shows diff --stat: 3 files, +45 -10]
Claude: [runs codex review --uncommitted with prompt]
Claude: [presents review]

Gemini only:

User: /second-opinion
Claude: [asks 4 questions]
User: picks "Gemini only", "Uncommitted", "No", "General"
Claude: [shows diff --stat: 2 files, +20 -5]
Claude: [runs gemini -p "/code-review" --yolo -e code-review]
Claude: [presents review]

Large diff warning:

User: /second-opinion
Claude: [asks questions] → user picks "Both", "Uncommitted", "General"
Claude: [shows diff --stat: 45 files, +3200 -890]
Claude: "Large diff (3200+ lines). High-effort reasoning will be
  slow. Proceed, or narrow the scope?"
User: "proceed"
Claude: [runs both reviews]

/semgrep-rule-creator

Source: `~/.claude/skills/tob-semgrep-rule-creator/skills/semgrep-rule-creator/SKILL.md`

name: semgrep-rule-creator
description: Creates custom Semgrep rules for detecting security vulnerabilities, bug patterns, and code patterns. Use when writing Semgrep rules or building custom static analysis detections.
allowed-tools:

Bash
Read
Write
Edit
Glob
Grep
WebFetch

Semgrep Rule Creator

Create production-quality Semgrep rules with proper testing and validation.

When to Use

Ideal scenarios:

Writing Semgrep rules for specific bug patterns
Writing rules to detect security vulnerabilities in your codebase
Writing taint mode rules for data flow vulnerabilities
Writing rules to enforce coding standards

When NOT to Use

Do NOT use this skill for:

Running existing Semgrep rulesets
General static analysis without custom rules (use static-analysis skill)

Rationalizations to Reject

When writing Semgrep rules, reject these common shortcuts:

"The pattern looks complete" → Still run semgrep --test --config <rule-id>.yaml <rule-id>.<ext> to verify. Untested rules have hidden false positives/negatives.
"It matches the vulnerable case" → Matching vulnerabilities is half the job. Verify safe cases don't match (false positives break trust).
"Taint mode is overkill for this" → If data flows from user input to a dangerous sink, taint mode gives better precision than pattern matching.
"One test is enough" → Include edge cases: different coding styles, sanitized inputs, safe alternatives, and boundary conditions.
"I'll optimize the patterns first" → Write correct patterns first, optimize after all tests pass. Premature optimization causes regressions.
"The AST dump is too complex" → The AST reveals exactly how Semgrep sees code. Skipping it leads to patterns that miss syntactic variations.

Anti-Patterns

Too broad - matches everything, useless for detection:

# BAD: Matches any function call
pattern: $FUNC(...)

# GOOD: Specific dangerous function
pattern: eval(...)

Missing safe cases in tests - leads to undetected false positives:

# BAD: Only tests vulnerable case
# ruleid: my-rule
dangerous(user_input)

# GOOD: Include safe cases to verify no false positives
# ruleid: my-rule
dangerous(user_input)

# ok: my-rule
dangerous(sanitize(user_input))

# ok: my-rule
dangerous("hardcoded_safe_value")

Overly specific patterns - misses variations:

# BAD: Only matches exact format
pattern: os.system("rm " + $VAR)

# GOOD: Matches all os.system calls with taint tracking
mode: taint
pattern-sinks:
  - pattern: os.system(...)

Strictness Level

This workflow is strict - do not skip steps:

Read documentation first: See Documentation before writing Semgrep rules
Test-first is mandatory: Never write a rule without tests
100% test pass is required: "Most tests pass" is not acceptable
Optimization comes last: Only simplify patterns after all tests pass
Avoid generic patterns: Rules must be specific, not match broad patterns
Prioritize taint mode: For data flow vulnerabilities
One YAML file - one Semgrep rule: Each YAML file must contain only one Semgrep rule; don't combine multiple rules in a single file
No generic rules: When a rule targets a specific language, avoid generic pattern matching (languages: generic)
No todoruleid or todook annotations: Do not use todoruleid: <rule-id> or todook: <rule-id> annotations in test files to mark future rule improvements

Overview

This skill guides creation of Semgrep rules that detect security vulnerabilities and code patterns. Rules are created iteratively: analyze the problem, write tests first, analyze AST structure, write the rule, iterate until all tests pass, optimize the rule.

Approach selection:

Taint mode (prioritize): Data flow issues where untrusted input reaches dangerous sinks
Pattern matching: Simple syntactic patterns without data flow requirements

Why prioritize taint mode? Pattern matching finds syntax but misses context. A pattern eval($X) matches both eval(user_input) (vulnerable) and eval("safe_literal") (safe). Taint mode tracks data flow, so it only alerts when untrusted data actually reaches the sink—dramatically reducing false positives for injection vulnerabilities.

Iterating between approaches: It's okay to experiment. If you start with taint mode and it's not working well (e.g., taint doesn't propagate as expected, too many false positives/negatives), switch to pattern matching. Conversely, if pattern matching produces too many false positives on safe cases, try taint mode instead. The goal is a working rule—not rigid adherence to one approach.

Output structure - exactly 2 files in a directory named after the rule-id:

<rule-id>/
├── <rule-id>.yaml     # Semgrep rule
└── <rule-id>.<ext>    # Test file with ruleid/ok annotations

Quick Start

rules:
  - id: insecure-eval
    languages: [python]
    severity: HIGH
    message: User input passed to eval() allows code execution
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
    pattern-sinks:
      - pattern: eval(...)

Test file (insecure-eval.py):

# ruleid: insecure-eval
eval(request.args.get('code'))

# ok: insecure-eval
eval("print('safe')")

Run tests (from rule directory): semgrep --test --config <rule-id>.yaml <rule-id>.<ext>

Quick Reference

For commands, pattern operators, and taint mode syntax, see quick-reference.md.
For detailed workflow and examples, you MUST see workflow.md

Workflow

Copy this checklist and track progress:

Semgrep Rule Progress:
- [ ] Step 1: Analyze the Problem
- [ ] Step 2: Write Tests First
- [ ] Step 3: Analyze AST structure
- [ ] Step 4: Write the rule
- [ ] Step 5: Iterate until all tests pass (semgrep --test)
- [ ] Step 6: Optimize the rule (remove redundancies, re-test)
- [ ] Step 7: Final Run

Documentation

REQUIRED: Before writing any rule, use WebFetch to read all of these 4 links with Semgrep documentation:

/semgrep-rule-variant-creator

Source: `~/.claude/skills/tob-semgrep-rule-variant-creator/skills/semgrep-rule-variant-creator/SKILL.md`

name: semgrep-rule-variant-creator
description: Creates language variants of existing Semgrep rules. Use when porting a Semgrep rule to specified target languages. Takes an existing rule and target languages as input, produces independent rule+test directories for each language.
allowed-tools:

Bash
Read
Write
Edit
Glob
Grep
WebFetch

Semgrep Rule Variant Creator

Port existing Semgrep rules to new target languages with proper applicability analysis and test-driven validation.

When to Use

Ideal scenarios:

Porting an existing Semgrep rule to one or more target languages
Creating language-specific variants of a universal vulnerability pattern
Expanding rule coverage across a polyglot codebase
Translating rules between languages with equivalent constructs

When NOT to Use

Do NOT use this skill for:

Creating a new Semgrep rule from scratch (use semgrep-rule-creator instead)
Running existing rules against code
Languages where the vulnerability pattern fundamentally doesn't apply
Minor syntax variations within the same language

Input Specification

This skill requires:

Existing Semgrep rule - YAML file path or YAML rule content
Target languages - One or more languages to port to (e.g., "Golang and Java")

Output Specification

For each applicable target language, produces:

<original-rule-id>-<language>/
├── <original-rule-id>-<language>.yaml     # Ported Semgrep rule
└── <original-rule-id>-<language>.<ext>    # Test file with annotations

Example output for porting sql-injection to Go and Java:

sql-injection-golang/
├── sql-injection-golang.yaml
└── sql-injection-golang.go

sql-injection-java/
├── sql-injection-java.yaml
└── sql-injection-java.java

Rationalizations to Reject

When porting Semgrep rules, reject these common shortcuts:

Rationalization	Why It Fails	Correct Approach
"Pattern structure is identical"	Different ASTs across languages	Always dump AST for target language
"Same vulnerability, same detection"	Data flow differs between languages	Analyze target language idioms
"Rule doesn't need tests since original worked"	Language edge cases differ	Write NEW test cases for target
"Skip applicability - it obviously applies"	Some patterns are language-specific	Complete applicability analysis first
"I'll create all variants then test"	Errors compound, hard to debug	Complete full cycle per language
"Library equivalent is close enough"	Surface similarity hides differences	Verify API semantics match
"Just translate the syntax 1:1"	Languages have different idioms	Research target language patterns

Strictness Level

This workflow is strict - do not skip steps:

Applicability analysis is mandatory: Don't assume patterns translate
Each language is independent: Complete full cycle before moving to next
Test-first for each variant: Never write a rule without test cases
100% test pass required: "Most tests pass" is not acceptable

Overview

This skill guides the creation of language-specific variants of existing Semgrep rules. Each target language goes through an independent 4-phase cycle:

FOR EACH target language:
  Phase 1: Applicability Analysis → Verdict
  Phase 2: Test Creation (Test-First)
  Phase 3: Rule Creation
  Phase 4: Validation
  (Complete full cycle before moving to next language)

Foundational Knowledge

The semgrep-rule-creator skill is the authoritative reference for Semgrep rule creation fundamentals. While this skill focuses on porting existing rules to new languages, the core principles of writing quality rules remain the same.

Consult semgrep-rule-creator for guidance on:

When to use taint mode vs pattern matching - Choosing the right approach for the vulnerability type
Test-first methodology - Why tests come before rules and how to write effective test cases
Anti-patterns to avoid - Common mistakes like overly broad or overly specific patterns
Iterating until tests pass - The validation loop and debugging techniques
Rule optimization - Removing redundant patterns after tests pass

When porting a rule, you're applying these same principles in a new language context. If uncertain about rule structure or approach, refer to semgrep-rule-creator first.

Four-Phase Workflow

Phase 1: Applicability Analysis

Before porting, determine if the pattern applies to the target language.

Analysis criteria:

Does the vulnerability class exist in the target language?
Does an equivalent construct exist (function, pattern, library)?
Are the semantics similar enough for meaningful detection?

Verdict options:

APPLICABLE → Proceed with variant creation
APPLICABLE_WITH_ADAPTATION → Proceed but significant changes needed
NOT_APPLICABLE → Skip this language, document why

See applicability-analysis.md for detailed guidance.

Phase 2: Test Creation (Test-First)

Always write tests before the rule.

Create test file with target language idioms:

Minimum 2 vulnerable cases (ruleid:)
Minimum 2 safe cases (ok:)
Include language-specific edge cases

// ruleid: sql-injection-golang
db.Query("SELECT * FROM users WHERE id = " + userInput)

// ok: sql-injection-golang
db.Query("SELECT * FROM users WHERE id = ?", userInput)

Phase 3: Rule Creation

Analyze AST: semgrep --dump-ast -l <lang> test-file
Translate patterns to target language syntax
Update metadata: language key, message, rule ID
Adapt for idioms: Handle language-specific constructs

See language-syntax-guide.md for translation guidance.

Phase 4: Validation

# Validate YAML
semgrep --validate --config rule.yaml

# Run tests
semgrep --test --config rule.yaml test-file

Checkpoint: Output MUST show All tests passed.

For taint rule debugging:

semgrep --dataflow-traces -f rule.yaml test-file

See workflow.md for detailed workflow and troubleshooting.

Quick Reference

Task	Command
Run tests	`semgrep --test --config rule.yaml test-file`
Validate YAML	`semgrep --validate --config rule.yaml`
Dump AST	`semgrep --dump-ast -l <lang> <file>`
Debug taint flow	`semgrep --dataflow-traces -f rule.yaml file`

Key Differences from Rule Creation

Aspect	semgrep-rule-creator	This skill
Input	Bug pattern description	Existing rule + target languages
Output	Single rule+test	Multiple rule+test directories
Workflow	Single creation cycle	Independent cycle per language
Phase 1	Problem analysis	Applicability analysis per language
Library research	Always relevant	Optional (when original uses libraries)

Documentation

REQUIRED: Before porting rules, read relevant Semgrep documentation:

Rule Syntax - YAML structure and operators
Pattern Syntax - Pattern matching and metavariables
Pattern Examples - Per-language pattern references
Testing Rules - Testing annotations
Trail of Bits Testing Handbook - Advanced patterns

Next Steps

For applicability analysis guidance, see applicability-analysis.md
For language translation guidance, see language-syntax-guide.md
For detailed workflow and examples, see workflow.md

/sharp-edges

Source: `~/.claude/skills/tob-sharp-edges/skills/sharp-edges/SKILL.md`

name: sharp-edges
description: "Identifies error-prone APIs, dangerous configurations, and footgun designs that enable security mistakes. Use when reviewing API designs, configuration schemas, cryptographic library ergonomics, or evaluating whether code follows 'secure by default' and 'pit of success' principles. Triggers: footgun, misuse-resistant, secure defaults, API usability, dangerous configuration."
allowed-tools:

Read
Grep
Glob

Sharp Edges Analysis

Evaluates whether APIs, configurations, and interfaces are resistant to developer misuse. Identifies designs where the "easy path" leads to insecurity.

When to Use

Reviewing API or library design decisions
Auditing configuration schemas for dangerous options
Evaluating cryptographic API ergonomics
Assessing authentication/authorization interfaces
Reviewing any code that exposes security-relevant choices to developers

When NOT to Use

Implementation bugs (use standard code review)
Business logic flaws (use domain-specific analysis)
Performance optimization (different concern)

Core Principle

The pit of success: Secure usage should be the path of least resistance. If developers must understand cryptography, read documentation carefully, or remember special rules to avoid vulnerabilities, the API has failed.

Rationalizations to Reject

Rationalization	Why It's Wrong	Required Action
"It's documented"	Developers don't read docs under deadline pressure	Make the secure choice the default or only option
"Advanced users need flexibility"	Flexibility creates footguns; most "advanced" usage is copy-paste	Provide safe high-level APIs; hide primitives
"It's the developer's responsibility"	Blame-shifting; you designed the footgun	Remove the footgun or make it impossible to misuse
"Nobody would actually do that"	Developers do everything imaginable under pressure	Assume maximum developer confusion
"It's just a configuration option"	Config is code; wrong configs ship to production	Validate configs; reject dangerous combinations
"We need backwards compatibility"	Insecure defaults can't be grandfathered in	Deprecate loudly; force migration

Sharp Edge Categories

1. Algorithm/Mode Selection Footguns

APIs that let developers choose algorithms invite choosing wrong ones.

The JWT Pattern (canonical example):

Header specifies algorithm: attacker can set "alg": "none" to bypass signatures
Algorithm confusion: RSA public key used as HMAC secret when switching RS256→HS256
Root cause: Letting untrusted input control security-critical decisions

Detection patterns:

Function parameters like algorithm, mode, cipher, hash_type
Enums/strings selecting cryptographic primitives
Configuration options for security mechanisms

Example - PHP hash() accepting weak algorithms, compared with password_hash():

// Good: PASSWORD_DEFAULT leaves the caller no algorithm choice
password_hash($password, PASSWORD_DEFAULT);
// DANGEROUS: hash() accepts any algorithm name, including crc32, md5, sha1
hash($algorithm, $password);

2. Dangerous Defaults

Defaults that are insecure, or zero/empty values that disable security.

The OTP Lifetime Pattern:

# What happens when lifetime=0?
def verify_otp(code, lifetime=300):  # 300 seconds default
    if lifetime == 0:
        return True  # OOPS: 0 means "accept all"?
        # Or does it mean "expired immediately"?

Detection patterns:

Timeouts/lifetimes that accept 0 (infinite? immediate expiry?)
Empty strings that bypass checks
Null values that skip validation
Boolean defaults that disable security features
Negative values with undefined semantics

Questions to ask:

What happens with timeout=0? max_attempts=0? key=""?
Is the default the most secure option?
Can any default value disable security entirely?
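
As a contrast to the OTP lifetime pattern, here is a hedged sketch of a verifier that gives `lifetime=0` no meaning at all: non-positive values are rejected outright. The parameter names (`submitted`, `expected`, `issued_at`, `now`) are assumptions introduced for this example.

```python
import hmac
import time
from typing import Optional


def verify_otp(submitted: str, expected: str, issued_at: float,
               lifetime: float = 300, now: Optional[float] = None) -> bool:
    """Check an OTP; zero or negative lifetimes are an error, not a mode."""
    if lifetime <= 0:
        raise ValueError("lifetime must be a positive number of seconds")
    current = time.time() if now is None else now
    age = current - issued_at
    # Codes from the future or older than the lifetime are both rejected.
    if age < 0 or age > lifetime:
        return False
    # Constant-time comparison of the code itself.
    return hmac.compare_digest(submitted.encode(), expected.encode())
```

Raising on `lifetime <= 0` turns an ambiguous configuration into a loud failure at the call site, which answers the "what happens with timeout=0?" question by construction.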

3. Primitive vs. Semantic APIs

APIs that expose raw bytes instead of meaningful types invite type confusion.

The Libsodium vs. Halite Pattern:

// Libsodium (primitives): bytes are bytes
sodium_crypto_box($message, $nonce, $keypair);
// Easy to: swap nonce/keypair, reuse nonces, use wrong key type

// Halite (semantic): types enforce correct usage
Crypto::seal($message, new EncryptionPublicKey($key));
// Wrong key type = type error, not silent failure

Detection patterns:

Functions taking bytes, string, []byte for distinct security concepts
Parameters that could be swapped without type errors
Same type used for keys, nonces, ciphertexts, signatures

The comparison footgun:

// Timing-safe comparison looks identical to unsafe
if bytes.Equal(mac, expected) { }  // BAD: timing attack
if hmac.Equal(mac, expected) { }   // Good: constant-time
// Same types, different security properties
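
The same idea can be sketched in Python: wrap raw bytes in small types so that a key cannot be passed where a tag is expected, and make the tag type compare in constant time by construction. `MacKey`, `Tag`, `compute_tag`, and `verify_tag` are hypothetical names for this illustration. Python type hints are not enforced at runtime, so the sketch adds explicit `isinstance` checks.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class MacKey:
    raw: bytes

    def __post_init__(self) -> None:
        if len(self.raw) != 32:
            raise ValueError("MacKey must be exactly 32 bytes")


@dataclass(frozen=True, eq=False)
class Tag:
    raw: bytes

    def __eq__(self, other: object) -> bool:
        # Comparing two Tags is always constant-time; comparing a Tag
        # with plain bytes is never equal.
        if not isinstance(other, Tag):
            return NotImplemented
        return hmac.compare_digest(self.raw, other.raw)


def compute_tag(message: bytes, key: MacKey) -> Tag:
    if not isinstance(key, MacKey):
        raise TypeError("key must be a MacKey, not raw bytes")
    return Tag(hmac.new(key.raw, message, hashlib.sha256).digest())


def verify_tag(message: bytes, tag: Tag, key: MacKey) -> bool:
    if not isinstance(tag, Tag):
        raise TypeError("tag must be a Tag, not raw bytes")
    return compute_tag(message, key) == tag


key = MacKey(os.urandom(32))
tag = compute_tag(b"hello", key)
print(verify_tag(b"hello", tag, key))
```

Swapping the arguments, or passing raw bytes for either concept, now fails loudly instead of producing a silently wrong result.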

4. Configuration Cliffs

One wrong setting creates catastrophic failure, with no warning.

Detection patterns:

Boolean flags that disable security entirely
String configs that aren't validated
Combinations of settings that interact dangerously
Environment variables that override security settings
Constructor parameters with sensible defaults but no validation (callers can override with insecure values)

Examples:

# One typo = disaster
verify_ssl: fasle  # Typo silently accepted as truthy?

# Magic values
session_timeout: -1  # Does this mean "never expire"?

# Dangerous combinations accepted silently
auth_required: true
bypass_auth_for_health_checks: true
health_check_path: "/"  # Oops

// Sensible default doesn't protect against bad callers
public function __construct(
    public string $hashAlgo = 'sha256',  // Good default...
    public int $otpLifetime = 120,       // ...but accepts md5, 0, etc.
) {}

See config-patterns.md for detailed patterns.
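
As a rough sketch of how the cliffs above can be turned into load-time errors, the following validates booleans strictly, rejects magic timeout values, and refuses the dangerous health-check combination. `parse_bool` and `ServerConfig` are hypothetical names, and the default values (1800 seconds, "/healthz") are assumptions chosen for the example.

```python
from dataclasses import dataclass

_TRUE = {"true", "yes", "on", "1"}
_FALSE = {"false", "no", "off", "0"}


def parse_bool(name: str, value: str) -> bool:
    """Parse a boolean strictly; unknown spellings are an error, not a default."""
    normalized = value.strip().lower()
    if normalized in _TRUE:
        return True
    if normalized in _FALSE:
        return False
    raise ValueError(f"{name}: expected a boolean, got {value!r}")


@dataclass(frozen=True)
class ServerConfig:
    verify_ssl: bool = True
    session_timeout: int = 1800            # seconds (assumed default)
    auth_required: bool = True
    bypass_auth_for_health_checks: bool = False
    health_check_path: str = "/healthz"    # assumed default

    def __post_init__(self) -> None:
        # No magic values: -1 or 0 must not mean "never expire".
        if self.session_timeout <= 0:
            raise ValueError("session_timeout must be a positive number of seconds")
        # Reject the combination that would exempt every route from auth.
        if self.bypass_auth_for_health_checks and self.health_check_path in ("", "/"):
            raise ValueError("health_check_path must not be the root when auth is bypassed")
```

Here `parse_bool("verify_ssl", "fasle")` raises, so the typo from the YAML example is caught at startup rather than silently treated as true or false.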

5. Silent Failures

Errors that don't surface, or success that masks failure.

Detection patterns:

Functions returning booleans instead of throwing on security failures
Empty catch blocks around security operations
Default values substituted on parse errors
Verification functions that "succeed" on malformed input

Examples:

# Silent bypass
def verify_signature(sig, data, key):
    if not key:
        return True  # No key = skip verification?!

# Return value ignored
signature.verify(data, sig)  # Throws on failure
crypto.verify(data, sig)     # Returns False on failure
# Developer forgets to check return value
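
A fail-closed alternative to both examples might look like the sketch below: the function returns nothing on success and raises on every failure, so there is no return value to forget, and a missing key is an error rather than a bypass. `SignatureError` is a hypothetical exception name, and HMAC-SHA256 is assumed only to keep the example self-contained.

```python
import hashlib
import hmac


class SignatureError(Exception):
    """Raised whenever verification cannot positively succeed."""


def verify_signature(sig: bytes, data: bytes, key: bytes) -> None:
    # A missing key is a configuration error, never a reason to skip the check.
    if not key:
        raise SignatureError("no verification key configured")
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, sig):
        raise SignatureError("signature mismatch")
```

Callers that ignore the result are still safe, because an invalid signature interrupts control flow instead of producing an unchecked False.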

6. Stringly-Typed Security

Security-critical values as plain strings enable injection and confusion.

Detection patterns:

SQL/commands built from string concatenation
Permissions as comma-separated strings
Roles/scopes as arbitrary strings instead of enums
URLs constructed by joining strings

The permission accumulation footgun:

permissions = "read,write"
permissions += ",admin"  # Too easy to escalate

# vs. type-safe
permissions = {Permission.READ, Permission.WRITE}
permissions.add(Permission.ADMIN)  # At least it's explicit
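
When permissions do arrive as strings (for example from a config file), one option is to parse them once at the boundary into a closed type and reject anything unknown. The sketch below uses `enum.Flag` instead of the set shown above. `Permission` and `parse_permissions` are illustrative names.

```python
from enum import Flag, auto


class Permission(Flag):
    READ = auto()
    WRITE = auto()
    ADMIN = auto()


def parse_permissions(raw: str) -> Permission:
    """Convert a comma-separated string into Permission flags, failing on unknown names."""
    result = Permission(0)
    for name in raw.split(","):
        name = name.strip()
        if not name:
            raise ValueError("empty permission name")
        try:
            result |= Permission[name.upper()]
        except KeyError:
            raise ValueError(f"unknown permission: {name!r}") from None
    return result


perms = parse_permissions("read, write")
print(Permission.ADMIN in perms)
```

After parsing, the rest of the program handles only `Permission` values, so a stray ",admin" appended to a string somewhere downstream has no effect.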

Analysis Workflow

Phase 1: Surface Identification

Map security-relevant APIs: authentication, authorization, cryptography, session management, input validation
Identify developer choice points: Where can developers select algorithms, configure timeouts, choose modes?
Find configuration schemas: Environment variables, config files, constructor parameters

Phase 2: Edge Case Probing

For each choice point, ask:

Zero/empty/null: What happens with 0, "", null, []?
Negative values: What does -1 mean? Infinite? Error?
Type confusion: Can different security concepts be swapped?
Default values: Is the default secure? Is it documented?
Error paths: What happens on invalid input? Silent acceptance?
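
The probing questions above can be mechanized with a small harness that feeds edge values to a single-argument choice point and records what happens. This is a sketch: `probe` and `EDGE_VALUES` are hypothetical helpers, and `session_is_valid` is a deliberately flawed stand-in target, not code from any real library.

```python
from typing import Any, Callable

EDGE_VALUES = (0, -1, "", None, [], False)


def probe(func: Callable[[Any], Any], values=EDGE_VALUES) -> list[tuple[str, str]]:
    """Call func with each edge value and record the return value or exception type."""
    outcomes = []
    for value in values:
        try:
            outcomes.append((repr(value), f"returned {func(value)!r}"))
        except Exception as exc:  # broad on purpose: every failure mode is of interest
            outcomes.append((repr(value), f"raised {type(exc).__name__}"))
    return outcomes


def session_is_valid(timeout):
    # Stand-in target with a footgun: every falsy timeout means "never expires".
    if not timeout:
        return True
    return timeout > 0


for value, outcome in probe(session_is_valid):
    print(f"{value:>6}: {outcome}")
```

Any edge value that yields a permissive result rather than an exception is a candidate sharp edge to carry into Phase 4.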

Phase 3: Threat Modeling

Consider three adversaries:

The Scoundrel: Actively malicious developer or attacker controlling config
- Can they disable security via configuration?
- Can they downgrade algorithms?
- Can they inject malicious values?
The Lazy Developer: Copy-pastes examples, skips documentation
- Will the first example they find be secure?
- Is the path of least resistance secure?
- Do error messages guide toward secure usage?
The Confused Developer: Misunderstands the API
- Can they swap parameters without type errors?
- Can they use the wrong key/algorithm/mode by accident?
- Are failure modes obvious or silent?

Phase 4: Validate Findings

For each identified sharp edge:

Reproduce the misuse: Write minimal code demonstrating the footgun
Verify exploitability: Does the misuse create a real vulnerability?
Check documentation: Is the danger documented? (Documentation doesn't excuse bad design, but affects severity)
Test mitigations: Can the API be used safely with reasonable effort?

If a finding seems questionable, return to Phase 2 and probe more edge cases.

Severity Classification

Severity	Criteria	Examples
Critical	Default or obvious usage is insecure	`verify: false` default; empty password allowed
High	Easy misconfiguration breaks security	Algorithm parameter accepts "none"
Medium	Unusual but possible misconfiguration	Negative timeout has unexpected meaning
Low	Requires deliberate misuse	Obscure parameter combination

References

By category:

Cryptographic APIs: See references/crypto-apis.md
Configuration Patterns: See references/config-patterns.md
Authentication/Session: See references/auth-patterns.md
Real-World Case Studies: See references/case-studies.md (OpenSSL, GMP, etc.)

By language (general footguns, not crypto-specific):

Language	Guide
C/C++	references/lang-c.md
Go	references/lang-go.md
Rust	references/lang-rust.md
Swift	references/lang-swift.md
Java	references/lang-java.md
Kotlin	references/lang-kotlin.md
C#	references/lang-csharp.md
PHP	references/lang-php.md
JavaScript/TypeScript	references/lang-javascript.md
Python	references/lang-python.md
Ruby	references/lang-ruby.md

See also references/language-specific.md for a combined quick reference.

Quality Checklist

Before concluding analysis:

Probed all zero/empty/null edge cases
Verified defaults are secure
Checked for algorithm/mode selection footguns
Tested type confusion between security concepts
Considered all three adversary types
Verified error paths don't bypass security
Checked configuration validation
Constructor params validated (not just defaulted) - see config-patterns.md

/spec-to-code-compliance

Source: `~/.claude/skills/tob-spec-to-code-compliance/skills/spec-to-code-compliance/SKILL.md`

name: spec-to-code-compliance description: Verifies code implements exactly what documentation specifies for blockchain audits. Use when comparing code against whitepapers, finding gaps between specs and implementation, or performing compliance checks for protocol implementations.

When to Use

Use this skill when you need to:

Verify code implements exactly what documentation specifies
Audit smart contracts against whitepapers or design documents
Find gaps between intended behavior and actual implementation
Identify undocumented code behavior or unimplemented spec claims
Perform compliance checks for blockchain protocol implementations

Concrete triggers:

User provides both specification documents AND codebase
Questions like "does this code match the spec?" or "what's missing from the implementation?"
Audit engagements requiring spec-to-code alignment analysis
Protocol implementations being verified against whitepapers

When NOT to Use

Do NOT use this skill for:

Codebases without corresponding specification documents
General code review or vulnerability hunting (use audit-context-building instead)
Writing or improving documentation (this skill only verifies compliance)
Non-blockchain projects without formal specifications

Spec-to-Code Compliance Checker Skill

You are the Spec-to-Code Compliance Checker — a senior-level blockchain auditor whose job is to determine whether a codebase implements exactly what the documentation states, across logic, invariants, flows, assumptions, math, and security guarantees.

Your work must be:

deterministic
grounded in evidence
traceable
non-hallucinatory
exhaustive

GLOBAL RULES

Never infer unspecified behavior.
Always cite exact evidence from:
- the documentation (section/title/quote)
- the code (file + line numbers)
Always provide a confidence score (0–1) for mappings.
Always classify ambiguity instead of guessing.
Maintain strict separation between:
1. extraction
2. alignment
3. classification
4. reporting
Do NOT rely on prior knowledge of known protocols. Only use provided materials.
Be literal, pedantic, and exhaustive.

Rationalizations (Do Not Skip)

Rationalization	Why It's Wrong	Required Action
"Spec is clear enough"	Ambiguity hides in plain sight	Extract to IR, classify ambiguity explicitly
"Code obviously matches"	Obvious matches have subtle divergences	Document match_type with evidence
"I'll note this as partial match"	Partial = potential vulnerability	Investigate until full_match or mismatch
"This undocumented behavior is fine"	Undocumented = untested = risky	Classify as UNDOCUMENTED CODE PATH
"Low confidence is okay here"	Low confidence findings get ignored	Investigate until confidence ≥ 0.8 or classify as AMBIGUOUS
"I'll infer what the spec meant"	Inference = hallucination	Quote exact text or mark UNDOCUMENTED

PHASE 0 — Documentation Discovery

Identify all content representing documentation, even if not named "spec."

Documentation may appear as:

whitepaper.pdf
Protocol.md
design_notes
Flow.pdf
README.md
kickoff transcripts
Notion exports
Anything describing logic, flows, assumptions, incentives, etc.

Use semantic cues:

architecture descriptions
invariants
formulas
variable meanings
trust models
workflow sequencing
tables describing logic
diagrams (convert to text)

Extract ALL relevant documents into a unified spec corpus.

PHASE 1 — Universal Format Normalization

Normalize ANY input format:

PDF
Markdown
DOCX
HTML
TXT
Notion export
Meeting transcripts

Preserve:

heading hierarchy
bullet lists
formulas
tables (converted to plaintext)
code snippets
invariant definitions

Remove:

layout noise
styling artifacts
watermarks

Output: a clean, canonical spec_corpus.

PHASE 2 — Spec Intent IR (Intermediate Representation)

Extract all intended behavior into the Spec-IR.

Each extracted item MUST include:

spec_excerpt
source_section
semantic_type
normalized representation
confidence score

Extract:

protocol purpose
actors, roles, trust boundaries
variable definitions & expected relationships
all preconditions / postconditions
explicit invariants
implicit invariants deduced from context
math formulas (in canonical symbolic form)
expected flows & state-machine transitions
economic assumptions
ordering & timing constraints
error conditions & expected revert logic
security requirements ("must/never/always")
edge-case behavior

This forms Spec-IR.

See IR_EXAMPLES.md for detailed examples.

PHASE 3 — Code Behavior IR

(WITH TRUE LINE-BY-LINE / BLOCK-BY-BLOCK ANALYSIS)

Perform structured, deterministic, line-by-line and block-by-block semantic analysis of the entire codebase.

For EVERY LINE and EVERY BLOCK, extract:

file + exact line numbers
local variable updates
state reads/writes
conditional branches & alternative paths
unreachable branches
revert conditions & custom errors
external calls (call, delegatecall, staticcall, create2)
event emissions
math operations and rounding behavior
implicit assumptions
block-level preconditions & postconditions
locally enforced invariants
state transitions
side effects
dependencies on prior state

For EVERY FUNCTION, extract:

signature & visibility
applied modifiers (and their logic)
purpose (based on actual behavior)
input/output semantics
read/write sets
full control-flow structure
success vs revert paths
internal/external call graph
cross-function interactions

Also capture:

storage layout
initialization logic
authorization graph (roles → permissions)
upgradeability mechanism (if present)
hidden assumptions

Output: Code-IR, a granular semantic map with full traceability.

See IR_EXAMPLES.md for detailed examples.

PHASE 4 — Alignment IR (Spec ↔ Code Comparison)

For each item in Spec-IR: Locate related behaviors in Code-IR and generate an Alignment Record containing:

spec_excerpt
code_excerpt (with file + line numbers)
match_type:
- full_match
- partial_match
- mismatch
- missing_in_code
- code_stronger_than_spec
- code_weaker_than_spec
reasoning trace
confidence score (0–1)
ambiguity rating
evidence links

Explicitly check:

invariants vs enforcement
formulas vs math implementation
flows vs real transitions
actor expectations vs real privilege map
ordering constraints vs actual logic
revert expectations vs actual checks
trust assumptions vs real external call behavior

Also detect:

undocumented code behavior
unimplemented spec claims
contradictions inside the spec
contradictions inside the code
inconsistencies across multiple spec documents

Output: Alignment-IR

See IR_EXAMPLES.md for detailed examples.
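
For orientation, an Alignment Record with the fields listed above could be modeled roughly as follows. This is an illustrative Python sketch only. The authoritative format is whatever IR_EXAMPLES.md and OUTPUT_REQUIREMENTS.md specify. The class names, the `code_location` string format, and the `needs_follow_up` helper are assumptions. The 0.8 threshold is taken from the rationalizations table.

```python
from dataclasses import dataclass, field
from enum import Enum

CONFIDENCE_FLOOR = 0.8  # from the rationalizations table: below this, keep investigating


class MatchType(Enum):
    FULL_MATCH = "full_match"
    PARTIAL_MATCH = "partial_match"
    MISMATCH = "mismatch"
    MISSING_IN_CODE = "missing_in_code"
    CODE_STRONGER_THAN_SPEC = "code_stronger_than_spec"
    CODE_WEAKER_THAN_SPEC = "code_weaker_than_spec"


@dataclass
class AlignmentRecord:
    spec_excerpt: str
    code_excerpt: str
    code_location: str          # assumed format: "file:start-end"
    match_type: MatchType
    reasoning: str
    confidence: float           # 0 to 1
    ambiguity: str
    evidence: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        # Every claim about code must cite file + line numbers.
        if self.match_type is not MatchType.MISSING_IN_CODE and not self.code_location:
            raise ValueError("code evidence requires file + line numbers")

    @property
    def needs_follow_up(self) -> bool:
        """Partial matches and low-confidence mappings are not final states."""
        return (self.match_type is MatchType.PARTIAL_MATCH
                or self.confidence < CONFIDENCE_FLOOR)
```

Encoding the rules as validation means a record without line-number evidence, or with an out-of-range confidence, cannot be constructed in the first place.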

PHASE 5 — Divergence Classification

Classify each misalignment by severity:

CRITICAL

Spec says X, code does Y
Missing invariant enabling exploits
Math divergence involving funds
Trust boundary mismatches

HIGH

Partial/incorrect implementation
Access control misalignment
Dangerous undocumented behavior

MEDIUM

Ambiguity with security implications
Missing revert checks
Incomplete edge-case handling

LOW

Documentation drift
Minor semantics mismatch

Each finding MUST include:

evidence links
severity justification
exploitability reasoning
recommended remediation

See IR_EXAMPLES.md for detailed divergence finding examples with complete exploit scenarios, economic analysis, and remediation plans.

PHASE 6 — Final Audit-Grade Report

Produce a structured compliance report:

Executive Summary
Documentation Sources Identified
Spec Intent Breakdown (Spec-IR)
Code Behavior Summary (Code-IR)
Full Alignment Matrix (Spec → Code → Status)
Divergence Findings (with evidence & severity)
Missing invariants
Incorrect logic
Math inconsistencies
Flow/state machine mismatches
Access control drift
Undocumented behavior
Ambiguity hotspots (spec & code)
Recommended remediations
Documentation update suggestions
Final risk assessment

Output Requirements & Quality Standards

See OUTPUT_REQUIREMENTS.md for:

Required IR production standards for all phases
Quality thresholds (minimum Spec-IR items, confidence scores, etc.)
Format consistency requirements (YAML formatting, line number citations)
Anti-hallucination requirements

Completeness Verification

Before finalizing analysis, review the COMPLETENESS_CHECKLIST.md to verify:

Spec-IR completeness (all invariants, formulas, security requirements extracted)
Code-IR completeness (all functions analyzed, state changes tracked)
Alignment-IR completeness (every spec item has alignment record)
Divergence finding quality (exploit scenarios, economic impact, remediation)
Final report completeness (all 16 sections present)

ANTI-HALLUCINATION REQUIREMENTS

If the spec is silent: classify as UNDOCUMENTED.
If the code adds behavior: classify as UNDOCUMENTED CODE PATH.
If unclear: classify as AMBIGUOUS.
Every claim must quote original text or line numbers.
Zero speculation.
Exhaustive, literal, pedantic reasoning.

Resources

Detailed Examples:

IR_EXAMPLES.md - Complete IR workflow examples with DEX swap patterns

Standards & Requirements:

OUTPUT_REQUIREMENTS.md - IR production standards, quality thresholds, format rules
COMPLETENESS_CHECKLIST.md - Verification checklist for all phases

END OF SKILL

/codeql

Source: `~/.claude/skills/tob-static-analysis/skills/codeql/SKILL.md`

name: codeql description: Runs CodeQL static analysis for security vulnerability detection using interprocedural data flow and taint tracking. Applicable when finding vulnerabilities, running a security scan, performing a security audit, running CodeQL, building a CodeQL database, selecting query rulesets, creating data extension models, or processing CodeQL SARIF output. NOT for writing custom QL queries or CI/CD pipeline setup. allowed-tools:

Bash
Read
Write
Glob
Grep
AskUserQuestion
Task
TaskCreate
TaskList
TaskUpdate

CodeQL Analysis

Supported languages: Python, JavaScript/TypeScript, Go, Java/Kotlin, C/C++, C#, Ruby, Swift.

Skill resources: Reference files and templates are located at {baseDir}/references/ and {baseDir}/workflows/. Use {baseDir} to resolve paths to these files at runtime.

Quick Start

For the common case ("scan this codebase for vulnerabilities"):

# 1. Verify CodeQL is installed
command -v codeql >/dev/null 2>&1 && codeql --version || echo "NOT INSTALLED"

# 2. Check for existing database
ls -dt codeql_*.db 2>/dev/null | head -1

Then execute the full pipeline: build database → create data extensions → run analysis using the workflows below.

When to Use

Scanning a codebase for security vulnerabilities with deep data flow analysis
Building a CodeQL database from source code (with build capability for compiled languages)
Finding complex vulnerabilities that require interprocedural taint tracking or AST/CFG analysis
Performing comprehensive security audits with multiple query packs

When NOT to Use

Writing custom queries - Use a dedicated query development skill
CI/CD integration - Use GitHub Actions documentation directly
Quick pattern searches - Use Semgrep or grep for speed
No build capability for compiled languages - Consider Semgrep instead
Single-file or lightweight analysis - Semgrep is faster for simple pattern matching

Rationalizations to Reject

These shortcuts lead to missed findings. Do not accept them:

"security-extended is enough" - It is the baseline. Always check if Trail of Bits packs and Community Packs are available for the language. They catch categories security-extended misses entirely.
"The database built, so it's good" - A database that builds does not mean it extracted well. Always run Step 4 (quality assessment) and check file counts against expected source files. A cached build produces zero useful extraction.
"Data extensions aren't needed for standard frameworks" - Even Django/Spring apps have custom wrappers around ORM calls, request parsing, or shell execution that CodeQL does not model. Skipping the extensions workflow means missing vulnerabilities in project-specific code.
"build-mode=none is fine for compiled languages" - It produces severely incomplete analysis. No interprocedural data flow through compiled code is traced. Only use as an absolute last resort and clearly flag the limitation.
"No findings means the code is secure" - Zero findings can indicate poor database quality, missing models, or wrong query packs. Investigate before reporting clean results.
"I'll just run the default suite" - The default suite varies by how CodeQL is invoked. Always explicitly specify the suite (e.g., security-extended) so results are reproducible.

Workflow Selection

This skill has three workflows:

Workflow	Purpose
build-database	Create CodeQL database using 3 build methods in sequence
create-data-extensions	Detect or generate data extension models for project APIs
run-analysis	Select rulesets, execute queries, process results

Auto-Detection Logic

If user explicitly specifies what to do (e.g., "build a database", "run analysis"), execute that workflow.

Default pipeline for "test", "scan", "analyze", or similar: Execute all three workflows sequentially: build → extensions → analysis. The create-data-extensions step is critical for finding vulnerabilities in projects with custom frameworks or annotations that CodeQL doesn't model by default.

# Check if database exists
DB=$(ls -dt codeql_*.db 2>/dev/null | head -1)
if [ -n "$DB" ] && codeql resolve database -- "$DB" >/dev/null 2>&1; then
  echo "DATABASE EXISTS ($DB) - can run analysis"
else
  echo "NO DATABASE - need to build first"
fi

Condition	Action
No database exists	Execute build → extensions → analysis (full pipeline)
Database exists, no extensions	Execute extensions → analysis
Database exists, extensions exist	Ask user: run analysis on existing DB, or rebuild?
User says "just run analysis" or "skip extensions"	Run analysis only
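
The decision table above can be read as a small function. The sketch below is one possible encoding, with the assumption that an explicit user instruction takes precedence over the detected state. `select_workflows` is a hypothetical helper, and returning `None` stands for "ask the user".

```python
from typing import Optional


def select_workflows(db_exists: bool, extensions_exist: bool,
                     user_request: str = "") -> Optional[list[str]]:
    """Return the workflows to run in order, or None when the user must decide."""
    request = user_request.lower()
    # Explicit instructions win over auto-detection (assumed precedence).
    if "just run analysis" in request or "skip extensions" in request:
        return ["run-analysis"]
    if not db_exists:
        return ["build-database", "create-data-extensions", "run-analysis"]
    if not extensions_exist:
        return ["create-data-extensions", "run-analysis"]
    # Database and extensions both exist: analyze as-is or rebuild? Ask.
    return None


print(select_workflows(db_exists=False, extensions_exist=False))
```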

Decision Prompt

If unclear, ask user:

I can help with CodeQL analysis. What would you like to do?

1. **Full scan (Recommended)** - Build database, create extensions, then run analysis
2. **Build database** - Create a new CodeQL database from this codebase
3. **Create data extensions** - Generate custom source/sink models for project APIs
4. **Run analysis** - Run security queries on existing database

[If database exists: "I found an existing database at <DB_NAME>"]

/sarif-parsing

Source: `~/.claude/skills/tob-static-analysis/skills/sarif-parsing/SKILL.md`

name: sarif-parsing description: Parse, analyze, and process SARIF (Static Analysis Results Interchange Format) files. Use when reading security scan results, aggregating findings from multiple tools, deduplicating alerts, extracting specific vulnerabilities, or integrating SARIF data into CI/CD pipelines. allowed-tools:

Bash
Read
Glob
Grep

SARIF Parsing Best Practices

You are a SARIF parsing expert. Your role is to help users effectively read, analyze, and process SARIF files from static analysis tools.

When to Use

Use this skill when:

Reading or interpreting static analysis scan results in SARIF format
Aggregating findings from multiple security tools
Deduplicating or filtering security alerts
Extracting specific vulnerabilities from SARIF files
Integrating SARIF data into CI/CD pipelines
Converting SARIF output to other formats

When NOT to Use

Do NOT use this skill for:

Running static analysis scans (use CodeQL or Semgrep skills instead)
Writing CodeQL or Semgrep rules (use their respective skills)
Analyzing source code directly (SARIF is for processing existing scan results)
Triaging findings without SARIF input (use variant-analysis or audit skills)

SARIF Structure Overview

SARIF 2.1.0 is the current OASIS standard. Every SARIF file has this hierarchical structure:

sarifLog
├── version: "2.1.0"
├── $schema: (optional, enables IDE validation)
└── runs[] (array of analysis runs)
    ├── tool
    │   ├── driver
    │   │   ├── name (required)
    │   │   ├── version
    │   │   └── rules[] (rule definitions)
    │   └── extensions[] (plugins)
    ├── results[] (findings)
    │   ├── ruleId
    │   ├── level (error/warning/note)
    │   ├── message.text
    │   ├── locations[]
    │   │   └── physicalLocation
    │   │       ├── artifactLocation.uri
    │   │       └── region (startLine, startColumn, etc.)
    │   ├── fingerprints{}
    │   └── partialFingerprints{}
    └── artifacts[] (scanned files metadata)
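
To make the hierarchy concrete, here is a minimal SARIF log written as a Python dictionary, plus a helper that walks runs, results, and locations in the order shown in the tree. The tool name, rule ID, file path, and the `summarize` helper are invented for illustration.

```python
import json

minimal_log = {
    "version": "2.1.0",
    "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
    "runs": [{
        "tool": {"driver": {"name": "example-scanner", "rules": [{"id": "EX001"}]}},
        "results": [{
            "ruleId": "EX001",
            "level": "warning",
            "message": {"text": "Example finding"},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": "src/app.py"},
                    "region": {"startLine": 12},
                }
            }],
        }],
    }],
}


def summarize(log: dict) -> list[str]:
    """Return one line per result: tool, level, rule, and first location."""
    lines = []
    for run in log.get("runs", []):
        tool = run["tool"]["driver"]["name"]  # name is required by the format
        for result in run.get("results", []):
            # locations may be missing or empty, so fall back to an empty dict
            loc = (result.get("locations") or [{}])[0].get("physicalLocation", {})
            uri = loc.get("artifactLocation", {}).get("uri", "?")
            line = loc.get("region", {}).get("startLine", "?")
            level = result.get("level", "warning")
            lines.append(f"{tool}: [{level}] {result.get('ruleId')} at {uri}:{line}")
    return lines


print(json.dumps(summarize(minimal_log), indent=2))
```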

Why Fingerprinting Matters

Without stable fingerprints, you can't track findings across runs:

Baseline comparison: "Is this a new finding or did we see it before?"
Regression detection: "Did this PR introduce new vulnerabilities?"
Suppression: "Ignore this known false positive in future runs"

Tools report different paths (/path/to/project/ vs /github/workspace/), so path-based matching fails. Fingerprints hash the content (code snippet, rule ID, relative location) to create stable identifiers regardless of environment.

Tool Selection Guide

Use Case	Tool	Installation
Quick CLI queries	jq	`brew install jq` / `apt install jq`
Python scripting (simple)	pysarif	`pip install pysarif`
Python scripting (advanced)	sarif-tools	`pip install sarif-tools`
.NET applications	SARIF SDK	NuGet package
JavaScript/Node.js	sarif-js	npm package
Go applications	garif	`go get github.com/chavacava/garif`
Validation	SARIF Validator	sarifweb.azurewebsites.net

Strategy 1: Quick Analysis with jq

For rapid exploration and one-off queries:

# Pretty print the file
jq '.' results.sarif

# Count total findings
jq '[.runs[].results[]] | length' results.sarif

# List all rule IDs triggered
jq '[.runs[].results[].ruleId] | unique' results.sarif

# Extract errors only
jq '.runs[].results[] | select(.level == "error")' results.sarif

# Get findings with file locations
jq '.runs[].results[] | {
  rule: .ruleId,
  message: .message.text,
  file: .locations[0].physicalLocation.artifactLocation.uri,
  line: .locations[0].physicalLocation.region.startLine
}' results.sarif

# Filter by severity and get count per rule
jq '[.runs[].results[] | select(.level == "error")] | group_by(.ruleId) | map({rule: .[0].ruleId, count: length})' results.sarif

# Extract findings for a specific file
jq --arg file "src/auth.py" '.runs[].results[] | select(any(.locations[]?; (.physicalLocation.artifactLocation.uri // "") | contains($file)))' results.sarif

Strategy 2: Python with pysarif

For programmatic access with full object model:

from pysarif import load_from_file, save_to_file

# Load SARIF file
sarif = load_from_file("results.sarif")

# Iterate through runs and results
for run in sarif.runs:
    tool_name = run.tool.driver.name
    print(f"Tool: {tool_name}")

    for result in run.results:
        print(f"  [{result.level}] {result.rule_id}: {result.message.text}")

        if result.locations:
            loc = result.locations[0].physical_location
            if loc and loc.artifact_location:
                print(f"    File: {loc.artifact_location.uri}")
                if loc.region:
                    print(f"    Line: {loc.region.start_line}")

# Save modified SARIF
save_to_file(sarif, "modified.sarif")

Strategy 3: Python with sarif-tools

For aggregation, reporting, and CI/CD integration:

from sarif import loader

# Load single file
sarif_data = loader.load_sarif_file("results.sarif")

# Or load multiple files
sarif_set = loader.load_sarif_files(["tool1.sarif", "tool2.sarif"])

# Get summary report
report = sarif_data.get_report()

# Get histogram by severity
errors = report.get_issue_type_histogram_for_severity("error")
warnings = report.get_issue_type_histogram_for_severity("warning")

# Filter results
high_severity = [r for r in sarif_data.get_results()
                 if r.get("level") == "error"]

sarif-tools CLI commands:

# Summary of findings
sarif summary results.sarif

# List the SARIF files found in a directory
sarif ls .

# Exit non-zero if any error-level results are present
sarif --check error summary results.sarif

# Diff two SARIF files (find new/fixed issues)
sarif diff baseline.sarif current.sarif

# Convert to other formats
sarif csv -o results.csv results.sarif
sarif html -o report.html results.sarif

Strategy 4: Aggregating Multiple SARIF Files

When combining results from multiple tools:

import json
from pathlib import Path

def aggregate_sarif_files(sarif_paths: list[str]) -> dict:
    """Combine multiple SARIF files into one."""
    aggregated = {
        "version": "2.1.0",
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "runs": []
    }

    for path in sarif_paths:
        with open(path) as f:
            sarif = json.load(f)
            aggregated["runs"].extend(sarif.get("runs", []))

    return aggregated

def deduplicate_results(sarif: dict) -> dict:
    """Remove duplicate findings based on fingerprints."""
    seen_fingerprints = set()

    for run in sarif["runs"]:
        unique_results = []
        for result in run.get("results", []):
            # Use partialFingerprints or create key from location
            fp = None
            if result.get("partialFingerprints"):
                fp = tuple(sorted(result["partialFingerprints"].items()))
            elif result.get("fingerprints"):
                fp = tuple(sorted(result["fingerprints"].items()))
            else:
                # Fallback: create fingerprint from rule + location
                loc = (result.get("locations") or [{}])[0]  # tolerate an empty list
                phys = loc.get("physicalLocation", {})
                fp = (
                    result.get("ruleId"),
                    phys.get("artifactLocation", {}).get("uri"),
                    phys.get("region", {}).get("startLine")
                )

            if fp not in seen_fingerprints:
                seen_fingerprints.add(fp)
                unique_results.append(result)

        run["results"] = unique_results

    return sarif

Strategy 5: Extracting Actionable Data

import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class Finding:
    rule_id: str
    level: str
    message: str
    file_path: Optional[str]
    start_line: Optional[int]
    end_line: Optional[int]
    fingerprint: Optional[str]

def extract_findings(sarif_path: str) -> list[Finding]:
    """Extract structured findings from SARIF file."""
    with open(sarif_path) as f:
        sarif = json.load(f)

    findings = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            loc = (result.get("locations") or [{}])[0]  # tolerate an empty list
            phys = loc.get("physicalLocation", {})
            region = phys.get("region", {})

            findings.append(Finding(
                rule_id=result.get("ruleId", "unknown"),
                level=result.get("level", "warning"),
                message=result.get("message", {}).get("text", ""),
                file_path=phys.get("artifactLocation", {}).get("uri"),
                start_line=region.get("startLine"),
                end_line=region.get("endLine"),
                fingerprint=next(iter(result.get("partialFingerprints", {}).values()), None)
            ))

    return findings

# Filter and prioritize
def prioritize_findings(findings: list[Finding]) -> list[Finding]:
    """Sort findings by severity."""
    severity_order = {"error": 0, "warning": 1, "note": 2, "none": 3}
    return sorted(findings, key=lambda f: severity_order.get(f.level, 99))

Common Pitfalls and Solutions

1. Path Normalization Issues

Different tools report paths differently (absolute, relative, URI-encoded):

from urllib.parse import unquote
from pathlib import Path

def normalize_path(uri: str, base_path: str = "") -> str:
    """Normalize SARIF artifact URI to consistent path."""
    # Remove file:// prefix if present
    if uri.startswith("file://"):
        uri = uri[7:]

    # URL decode
    uri = unquote(uri)

    # Handle relative paths
    if not Path(uri).is_absolute() and base_path:
        uri = str(Path(base_path) / uri)

    # Normalize separators
    return str(Path(uri))

2. Fingerprint Mismatch Across Runs

Fingerprints may not match if:

File paths differ between environments
Tool versions changed fingerprinting algorithm
Code was reformatted (changing line numbers)

Solution: Use multiple fingerprint strategies:

import hashlib
from typing import Optional

def compute_stable_fingerprint(result: dict, file_content: Optional[str] = None) -> str:
    """Compute environment-independent fingerprint."""
    components = [
        result.get("ruleId") or "",
        (result.get("message", {}).get("text") or "")[:100],  # First 100 chars
    ]

    # Add the flagged source line if available
    if file_content and result.get("locations"):
        region = result["locations"][0].get("physicalLocation", {}).get("region", {})
        if region.get("startLine"):
            lines = file_content.split("\n")
            line_idx = region["startLine"] - 1
            if 0 <= line_idx < len(lines):
                # Normalize whitespace
                components.append(lines[line_idx].strip())

    # Join with a separator so ("ab", "c") and ("a", "bc") hash differently
    return hashlib.sha256("\x00".join(components).encode()).hexdigest()[:16]

3. Missing or Incomplete Data

SARIF allows many optional fields. Always use defensive access:

def safe_get_location(result: dict) -> tuple[str, int]:
    """Safely extract file and line from result."""
    try:
        loc = result.get("locations", [{}])[0]
        phys = loc.get("physicalLocation", {})
        file_path = phys.get("artifactLocation", {}).get("uri", "unknown")
        line = phys.get("region", {}).get("startLine", 0)
        return file_path, line
    except (IndexError, KeyError, TypeError):
        return "unknown", 0

4. Large File Performance

For very large SARIF files (100MB+):

import ijson  # pip install ijson

def stream_results(sarif_path: str):
    """Stream results without loading entire file."""
    with open(sarif_path, "rb") as f:
        # Stream through results arrays
        for result in ijson.items(f, "runs.item.results.item"):
            yield result

5. Schema Validation

Validate before processing to catch malformed files:

# Using ajv-cli
npm install -g ajv-cli
ajv validate -s sarif-schema-2.1.0.json -d results.sarif

# Using Python jsonschema
pip install jsonschema

from jsonschema import validate, ValidationError
import json

def validate_sarif(sarif_path: str, schema_path: str) -> bool:
    """Validate SARIF file against schema."""
    with open(sarif_path) as f:
        sarif = json.load(f)
    with open(schema_path) as f:
        schema = json.load(f)

    try:
        validate(sarif, schema)
        return True
    except ValidationError as e:
        print(f"Validation error: {e.message}")
        return False

CI/CD Integration Patterns

GitHub Actions

- name: Upload SARIF
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif

- name: Check for high severity
  run: |
    HIGH_COUNT=$(jq '[.runs[].results[] | select(.level == "error")] | length' results.sarif)
    if [ "$HIGH_COUNT" -gt 0 ]; then
      echo "Found $HIGH_COUNT high severity issues"
      exit 1
    fi
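
The jq filter in the step above can also be written in Python when the CI job already runs a Python script. This is a minimal sketch using only the standard library; `count_by_level` is a hypothetical helper, not part of any SARIF library. Unlike the jq filter, it counts a result with no `level` as `warning`, on the assumption that the tool relies on the SARIF 2.1.0 default and no rule configuration overrides it.

```python
import json
from collections import Counter


def count_by_level(sarif: dict) -> Counter:
    """Tally results per SARIF level across every run in the log."""
    counts: Counter = Counter()
    for run in sarif.get("runs", []):
        # "results" may be absent or null, so fall back to an empty list.
        for result in run.get("results") or []:
            # Assumption: a missing level is counted as "warning".
            counts[result.get("level", "warning")] += 1
    return counts


# Illustrative SARIF fragment, not output from a real tool.
sample = json.loads("""
{"runs": [
  {"results": [{"ruleId": "a", "level": "error"}, {"ruleId": "b"}]},
  {"results": [{"ruleId": "c", "level": "error"}, {"ruleId": "d", "level": "note"}]}
]}
""")

levels = count_by_level(sample)
high_count = levels["error"]
print(f"Found {high_count} high severity issues")
```

A CI script could exit non-zero when `high_count` is greater than zero, mirroring the shell check.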

Fail on New Issues

from sarif import loader

def check_for_regressions(baseline: str, current: str) -> int:
    """Return count of new issues not in baseline."""
    baseline_data = loader.load_sarif_file(baseline)
    current_data = loader.load_sarif_file(current)

    baseline_fps = {get_fingerprint(r) for r in baseline_data.get_results()}
    new_issues = [r for r in current_data.get_results()
                  if get_fingerprint(r) not in baseline_fps]

    return len(new_issues)
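
The same baseline comparison can be sketched without a SARIF library. The helpers below are hypothetical and work on already-parsed SARIF dictionaries; `simple_fingerprint` is a deliberately small stand-in for whatever fingerprint function the project uses (here: rule ID plus the first 100 characters of the message).

```python
import hashlib


def simple_fingerprint(result: dict) -> str:
    """Hypothetical fingerprint: rule ID plus the first 100 chars of the message."""
    parts = [
        result.get("ruleId", ""),
        result.get("message", {}).get("text", "")[:100],
    ]
    return hashlib.sha256("\x00".join(parts).encode()).hexdigest()[:16]


def iter_results(sarif: dict):
    """Yield every result from every run, tolerating missing keys."""
    for run in sarif.get("runs", []):
        yield from run.get("results") or []


def find_new_results(baseline: dict, current: dict) -> list[dict]:
    """Return results in `current` whose fingerprint is absent from `baseline`."""
    baseline_fps = {simple_fingerprint(r) for r in iter_results(baseline)}
    return [r for r in iter_results(current)
            if simple_fingerprint(r) not in baseline_fps]


# Illustrative data only.
baseline = {"runs": [{"results": [
    {"ruleId": "sql-injection", "message": {"text": "Tainted query"}},
]}]}
current = {"runs": [{"results": [
    {"ruleId": "sql-injection", "message": {"text": "Tainted query"}},
    {"ruleId": "xss", "message": {"text": "Unescaped output"}},
]}]}

new_issues = find_new_results(baseline, current)
print(len(new_issues), [r["ruleId"] for r in new_issues])
```

Returning the results themselves rather than only a count lets the caller print what regressed before failing the build.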

Key Principles

Validate first: Check SARIF structure before processing
Handle optionals: Many fields are optional; use defensive access
Normalize paths: Tools report paths differently; normalize early
Fingerprint wisely: Combine multiple strategies for stable deduplication
Stream large files: Use ijson or similar for 100MB+ files
Aggregate thoughtfully: Preserve tool metadata when combining files

Skill Resources

For ready-to-use query templates, see {baseDir}/resources/jq-queries.md:

40+ jq queries for common SARIF operations
Severity filtering, rule extraction, aggregation patterns

For Python utilities, see {baseDir}/resources/sarif_helpers.py:

normalize_path() - Handle tool-specific path formats
compute_fingerprint() - Stable fingerprinting ignoring paths
deduplicate_results() - Remove duplicates across runs

/semgrep

Source: `~/.claude/skills/tob-static-analysis/skills/semgrep/SKILL.md`

name: semgrep description: Run Semgrep static analysis scan on a codebase using parallel subagents. Automatically detects and uses Semgrep Pro for cross-file analysis when available. Use when asked to scan code for vulnerabilities, run a security audit with Semgrep, find bugs, or perform static analysis. Spawns parallel workers for multi-language codebases and triage. allowed-tools:

Bash
Read
Glob
Grep
Write
Task
AskUserQuestion
TaskCreate
TaskList
TaskUpdate
WebFetch

Semgrep Security Scan

Run a complete Semgrep scan with automatic language detection, parallel execution via Task subagents, and parallel triage. Automatically uses Semgrep Pro for cross-file taint analysis when available.

Prerequisites

Required: Semgrep CLI

semgrep --version

If not installed, see Semgrep installation docs.

Optional: Semgrep Pro (for cross-file analysis and Pro languages)

# Check if Semgrep Pro engine is installed
semgrep --pro --validate --config p/default 2>/dev/null && echo "Pro available" || echo "OSS only"

# If logged in, install/update Pro Engine
semgrep install-semgrep-pro

Pro enables: cross-file taint tracking, inter-procedural analysis, and additional languages (Apex, C#, Elixir).

When to Use

Security audit of a codebase
Finding vulnerabilities before code review
Scanning for known bug patterns
First-pass static analysis

When NOT to Use

Binary analysis → Use binary analysis tools
Already have Semgrep CI configured → Use existing pipeline
Need cross-file analysis but no Pro license → Consider CodeQL as alternative
Creating custom Semgrep rules → Use semgrep-rule-creator skill
Porting existing rules to other languages → Use semgrep-rule-variant-creator skill

Orchestration Architecture

This skill uses parallel Task subagents for maximum efficiency:

┌─────────────────────────────────────────────────────────────────┐
│ MAIN AGENT                                                      │
│ 1. Detect languages + check Pro availability                    │
│ 2. Select rulesets based on detection (ref: rulesets.md)        │
│ 3. Present plan + rulesets, get approval [⛔ HARD GATE]         │
│ 4. Spawn parallel scan Tasks (with approved rulesets)           │
│ 5. Spawn parallel triage Tasks                                  │
│ 6. Collect and report results                                   │
└─────────────────────────────────────────────────────────────────┘
          │ Step 4                           │ Step 5
          ▼                                  ▼
┌─────────────────┐              ┌─────────────────┐
│ Scan Tasks      │              │ Triage Tasks    │
│ (parallel)      │              │ (parallel)      │
├─────────────────┤              ├─────────────────┤
│ Python scanner  │              │ Python triager  │
│ JS/TS scanner   │              │ JS/TS triager   │
│ Go scanner      │              │ Go triager      │
│ Docker scanner  │              │ Docker triager  │
└─────────────────┘              └─────────────────┘

Workflow Enforcement via Task System

This skill uses the Task system to enforce workflow compliance. On invocation, create these tasks:

TaskCreate: "Detect languages and Pro availability" (Step 1)
TaskCreate: "Select rulesets based on detection" (Step 2) - blockedBy: Step 1
TaskCreate: "Present plan with rulesets, get approval" (Step 3) - blockedBy: Step 2
TaskCreate: "Execute scans with approved rulesets" (Step 4) - blockedBy: Step 3
TaskCreate: "Triage findings" (Step 5) - blockedBy: Step 4
TaskCreate: "Report results" (Step 6) - blockedBy: Step 5

Mandatory Gates

Task	Gate Type	Cannot Proceed Until
Step 3: Get approval	HARD GATE	User explicitly approves rulesets + plan
Step 5: Triage	SOFT GATE	All scan JSON files exist

Step 3 is a HARD GATE: Mark as completed ONLY after user says "yes", "proceed", "approved", or equivalent.

Task Flow Example

1. Create all 6 tasks with dependencies
2. TaskUpdate Step 1 → in_progress, execute detection
3. TaskUpdate Step 1 → completed
4. TaskUpdate Step 2 → in_progress, select rulesets
5. TaskUpdate Step 2 → completed
6. TaskUpdate Step 3 → in_progress, present plan with rulesets
7. STOP: Wait for user response (may modify rulesets)
8. User approves → TaskUpdate Step 3 → completed
9. TaskUpdate Step 4 → in_progress (now unblocked)
... continue workflow
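
The blocking behaviour in this flow can be modelled in a few lines. This is only an illustrative model of the `blockedBy` chain, not the Task tool's actual API; the `Task` class and the `can_start` and `complete` helpers are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    blocked_by: list[str] = field(default_factory=list)
    status: str = "pending"  # pending -> in_progress -> completed


def can_start(task: Task, tasks: dict[str, Task]) -> bool:
    """A task is unblocked only when every blocker is completed."""
    return all(tasks[name].status == "completed" for name in task.blocked_by)


def complete(task: Task, tasks: dict[str, Task]) -> None:
    """Refuse to complete a task whose blockers are still open."""
    if not can_start(task, tasks):
        raise RuntimeError(f"{task.name!r} is still blocked")
    task.status = "completed"


steps = [
    "Detect languages and Pro availability",
    "Select rulesets based on detection",
    "Present plan with rulesets, get approval",
    "Execute scans with approved rulesets",
    "Triage findings",
    "Report results",
]

# Each step is blocked by the one before it, as in the TaskCreate list.
tasks: dict[str, Task] = {}
previous = None
for title in steps:
    tasks[title] = Task(title, blocked_by=[previous] if previous else [])
    previous = title

scan = tasks["Execute scans with approved rulesets"]
blocked_before_approval = not can_start(scan, tasks)

for title in steps[:3]:  # Step 3 is completed only after the user approves
    complete(tasks[title], tasks)

unblocked_after_approval = can_start(scan, tasks)
print(blocked_before_approval, unblocked_after_approval)
```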

Workflow

Step 1: Detect Languages and Pro Availability (Main Agent)

# Check if Semgrep Pro is available (non-destructive check)
SEMGREP_PRO=false
if semgrep --pro --validate --config p/default 2>/dev/null; then
  SEMGREP_PRO=true
  echo "Semgrep Pro: AVAILABLE (cross-file analysis enabled)"
else
  echo "Semgrep Pro: NOT AVAILABLE (OSS mode, single-file analysis)"
fi

# Find languages by file extension
fd -t f -e py -e js -e ts -e jsx -e tsx -e go -e rb -e java -e php -e c -e cpp -e rs | \
  sed 's/.*\.//' | sort | uniq -c | sort -rn

# Check for frameworks/technologies
ls -la package.json pyproject.toml Gemfile go.mod Cargo.toml pom.xml 2>/dev/null
fd -t f '(Dockerfile|docker-compose|\.tf$|\.ya?ml$)' | head -20

Map findings to categories:

Detection	Category
`.py`, `pyproject.toml`	Python
`.js`, `.ts`, `package.json`	JavaScript/TypeScript
`.go`, `go.mod`	Go
`.rb`, `Gemfile`	Ruby
`.java`, `pom.xml`	Java
`.php`	PHP
`.c`, `.cpp`	C/C++
`.rs`, `Cargo.toml`	Rust
`Dockerfile`	Docker
`.tf`	Terraform
k8s manifests	Kubernetes
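
The table can be applied mechanically once the file list is known. The sketch below is one possible encoding; `categorize` and the two lookup dictionaries are hypothetical names. It includes `.jsx` and `.tsx` because the `fd` command above searches for them, and it leaves out Kubernetes detection, which would require inspecting YAML contents rather than file names.

```python
from collections import Counter
from pathlib import PurePosixPath

EXTENSION_CATEGORIES = {
    ".py": "Python",
    ".js": "JavaScript/TypeScript",
    ".jsx": "JavaScript/TypeScript",
    ".ts": "JavaScript/TypeScript",
    ".tsx": "JavaScript/TypeScript",
    ".go": "Go",
    ".rb": "Ruby",
    ".java": "Java",
    ".php": "PHP",
    ".c": "C/C++",
    ".cpp": "C/C++",
    ".rs": "Rust",
    ".tf": "Terraform",
}

FILENAME_CATEGORIES = {
    "pyproject.toml": "Python",
    "package.json": "JavaScript/TypeScript",
    "go.mod": "Go",
    "Gemfile": "Ruby",
    "pom.xml": "Java",
    "Cargo.toml": "Rust",
    "Dockerfile": "Docker",
}


def categorize(paths: list[str]) -> Counter:
    """Count files per category; files matching no row are ignored."""
    counts: Counter = Counter()
    for raw in paths:
        path = PurePosixPath(raw)
        # Exact file names (manifests, Dockerfile) take priority over extensions.
        category = FILENAME_CATEGORIES.get(path.name) or EXTENSION_CATEGORIES.get(path.suffix)
        if category:
            counts[category] += 1
    return counts


detected = categorize([
    "app/main.py", "pyproject.toml", "web/index.tsx", "Dockerfile", "README.md",
])
print(detected.most_common())
```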

Step 2: Select Rulesets Based on Detection

Using the detected languages and frameworks from Step 1, select rulesets by following the Ruleset Selection Algorithm in rulesets.md.

The algorithm covers:

Security baseline (always included)
Language-specific rulesets
Framework rulesets (if detected)
Infrastructure rulesets
Required third-party rulesets (Trail of Bits, 0xdea, Decurity - NOT optional)
Registry verification

Output: Structured JSON passed to Step 3 for user review:

{
  "baseline": ["p/security-audit", "p/secrets"],
  "python": ["p/python", "p/django"],
  "javascript": ["p/javascript", "p/react", "p/nodejs"],
  "docker": ["p/dockerfile"],
  "third_party": ["https://github.com/trailofbits/semgrep-rules"]
}
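
Before presenting the plan, it can help to derive the ruleset total from this JSON rather than counting by hand. A minimal sketch, where `unique_rulesets` is a hypothetical helper that deduplicates across categories while keeping the original order:

```python
import json

selection = json.loads("""
{
  "baseline": ["p/security-audit", "p/secrets"],
  "python": ["p/python", "p/django"],
  "javascript": ["p/javascript", "p/react", "p/nodejs"],
  "docker": ["p/dockerfile"],
  "third_party": ["https://github.com/trailofbits/semgrep-rules"]
}
""")


def unique_rulesets(selection: dict[str, list[str]]) -> list[str]:
    """Flatten all categories, dropping duplicates but preserving order."""
    seen: list[str] = []
    for rulesets in selection.values():
        for ruleset in rulesets:
            if ruleset not in seen:
                seen.append(ruleset)
    return seen


total = len(unique_rulesets(selection))
print(f"Total rulesets: {total}")  # prints "Total rulesets: 9"
```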

Step 3: CRITICAL GATE - Present Plan and Get Approval

⛔ MANDATORY CHECKPOINT - DO NOT SKIP

This step requires explicit user approval before proceeding. User may modify rulesets before approving.

Present plan to user with explicit ruleset listing:

## Semgrep Scan Plan

**Target:** /path/to/codebase
**Output directory:** ./semgrep-results-001/
**Engine:** Semgrep Pro (cross-file analysis) | Semgrep OSS (single-file)

### Detected Languages/Technologies:
- Python (1,234 files) - Django framework detected
- JavaScript (567 files) - React detected
- Dockerfile (3 files)

### Rulesets to Run:

**Security Baseline (always included):**
- [x] `p/security-audit` - Comprehensive security rules
- [x] `p/secrets` - Hardcoded credentials, API keys

**Python (1,234 files):**
- [x] `p/python` - Python security patterns
- [x] `p/django` - Django-specific vulnerabilities

**JavaScript (567 files):**
- [x] `p/javascript` - JavaScript security patterns
- [x] `p/react` - React-specific issues
- [x] `p/nodejs` - Node.js server-side patterns

**Docker (3 files):**
- [x] `p/dockerfile` - Dockerfile best practices

**Third-party (auto-included for detected languages):**
- [x] Trail of Bits rules - https://github.com/trailofbits/semgrep-rules

**Available but not selected:**
- [ ] `p/owasp-top-ten` - OWASP Top 10 (overlaps with security-audit)

### Execution Strategy:
- Spawn 3 parallel scan Tasks (Python, JavaScript, Docker)
- Total rulesets: 9
- [If Pro] Cross-file taint tracking enabled

**Want to modify rulesets?** Tell me which to add or remove.
**Ready to scan?** Say "proceed" or "yes".

⛔ STOP: Await explicit user approval

After presenting the plan:

If user wants to modify rulesets:
- Add requested rulesets to the appropriate category
- Remove requested rulesets
- Re-present the updated plan
- Return to waiting for approval

Use AskUserQuestion if user hasn't responded:

"I've prepared the scan plan with 9 rulesets (including Trail of Bits). Proceed with scanning?"
Options: ["Yes, run scan", "Modify rulesets first"]

Valid approval responses:
- "yes", "proceed", "approved", "go ahead", "looks good", "run it"
Mark task completed only after approval with final rulesets confirmed
Do NOT treat as approval:
- User's original request ("scan this codebase")
- Silence / no response
- Questions about the plan

Pre-Scan Checklist

Before marking Step 3 complete, verify:

Target directory shown to user
Engine type (Pro/OSS) displayed
Languages detected and listed
All rulesets explicitly listed with checkboxes
User given opportunity to modify rulesets
User explicitly approved (quote their confirmation)
Final ruleset list captured for Step 4

Step 4: Spawn Parallel Scan Tasks

Create output directory with run number to avoid collisions, then spawn Tasks with approved rulesets from Step 3:

# Find next available run number
LAST=$(ls -d semgrep-results-[0-9][0-9][0-9] 2>/dev/null | sort | tail -1 | grep -o '[0-9]*$' || true)
# 10# forces base 10 so zero-padded values such as 008 or 009 are not rejected as invalid octal
NEXT_NUM=$(printf "%03d" $(( 10#${LAST:-0} + 1 )))
OUTPUT_DIR="semgrep-results-${NEXT_NUM}"
mkdir -p "$OUTPUT_DIR"
echo "Output directory: $OUTPUT_DIR"
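
If the orchestration is driven from Python rather than a shell, the same numbering scheme can be sketched as follows. `next_output_dir` is a hypothetical helper. It only computes the next name, so the caller still has to create the directory. `int()` reads zero-padded numbers such as `009` as decimal.

```python
import re
import tempfile
from pathlib import Path

RUN_DIR = re.compile(r"semgrep-results-(\d{3})")


def next_output_dir(parent: Path) -> Path:
    """Return the next unused semgrep-results-NNN path under `parent`."""
    numbers = []
    for entry in parent.iterdir():
        match = RUN_DIR.fullmatch(entry.name)
        if entry.is_dir() and match:
            numbers.append(int(match.group(1)))
    return parent / f"semgrep-results-{max(numbers, default=0) + 1:03d}"


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = next_output_dir(root).name
    (root / "semgrep-results-001").mkdir()
    (root / "semgrep-results-009").mkdir()
    following = next_output_dir(root).name

print(first, following)
```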

Spawn N Tasks in a SINGLE message (one per language category) using subagent_type: Bash.

Use the scanner task prompt template from scanner-task-prompt.md.

Example - 3 Language Scan (with approved rulesets):

Spawn these 3 Tasks in a SINGLE message:

Task: Python Scanner
- Approved rulesets: p/python, p/django, p/security-audit, p/secrets, https://github.com/trailofbits/semgrep-rules
- Output: semgrep-results-001/python-*.json
Task: JavaScript Scanner
- Approved rulesets: p/javascript, p/react, p/nodejs, p/security-audit, p/secrets, https://github.com/trailofbits/semgrep-rules
- Output: semgrep-results-001/js-*.json
Task: Docker Scanner
- Approved rulesets: p/dockerfile
- Output: semgrep-results-001/docker-*.json
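
To make the per-language fan-out concrete, the sketch below builds one Semgrep command line per approved ruleset without running anything. It is not the scanner task prompt template: `build_scan_commands` and the `<category>-<ruleset>.json` naming are assumptions chosen to match the `python-*.json` pattern above. Only registry rulesets are shown, because how third-party repositories are fetched is left to that template.

```python
def build_scan_commands(
    category: str,
    rulesets: list[str],
    target: str,
    output_dir: str,
    include: "str | None" = None,
    pro: bool = False,
) -> list[list[str]]:
    """Return one argv list per ruleset; nothing is executed here."""
    commands = []
    for ruleset in rulesets:
        slug = ruleset.replace("/", "-")  # p/python -> p-python (assumed naming)
        cmd = [
            "semgrep",
            "--metrics=off",
            "--config", ruleset,
            "--json",
            "--output", f"{output_dir}/{category}-{slug}.json",
        ]
        if pro:
            cmd.append("--pro")
        if include:
            cmd += ["--include", include]
        cmd.append(target)
        commands.append(cmd)
    return commands


python_cmds = build_scan_commands(
    "python",
    ["p/python", "p/django", "p/security-audit", "p/secrets"],
    target="/path/to/codebase",
    output_dir="semgrep-results-001",
    include="*.py",
)
for cmd in python_cmds:
    print(" ".join(cmd))
```

Keeping the commands as argv lists avoids shell quoting problems if they are later passed to `subprocess.run`.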

Step 5: Spawn Parallel Triage Tasks

After scan Tasks complete, spawn triage Tasks using subagent_type: general-purpose (triage requires reading code context, not just running commands).

Use the triage task prompt template from triage-task-prompt.md.

Step 6: Collect Results (Main Agent)

After all Tasks complete, generate merged SARIF and report:

Generate merged SARIF with only triaged true positives:

uv run {baseDir}/scripts/merge_triaged_sarif.py [OUTPUT_DIR]

This script:

Attempts to use SARIF Multitool for merging (if npx is available)
Falls back to pure Python merge if Multitool unavailable
Reads all *-triage.json files to extract true positive findings
Filters merged SARIF to include only triaged true positives
Writes output to [OUTPUT_DIR]/findings-triaged.sarif

Optional: Install SARIF Multitool for better merge quality:

npm install -g @microsoft/sarif-multitool

Report to user:

## Semgrep Scan Complete

**Scanned:** 1,804 files
**Rulesets used:** 9 (including Trail of Bits)
**Total raw findings:** 156
**After triage:** 32 true positives

### By Severity:
- ERROR: 5
- WARNING: 18
- INFO: 9

### By Category:
- SQL Injection: 3
- XSS: 7
- Hardcoded secrets: 2
- Insecure configuration: 12
- Code quality: 8

Results written to:
- semgrep-results-001/findings-triaged.sarif (SARIF, true positives only)
- semgrep-results-001/*-triage.json (triage details per language)
- semgrep-results-001/*.json (raw scan results)
- semgrep-results-001/*.sarif (raw SARIF per ruleset)
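
The "By Severity" figures in a report like this can be tallied from Semgrep's JSON output. A minimal sketch, assuming the usual `results[].extra.severity` layout of `semgrep --json`; `severity_counts` is a hypothetical helper, and the sample data is illustrative rather than real scan output.

```python
import json
from collections import Counter


def severity_counts(semgrep_json: dict) -> Counter:
    """Count findings per severity; findings without one are grouped as UNKNOWN."""
    return Counter(
        finding.get("extra", {}).get("severity", "UNKNOWN")
        for finding in semgrep_json.get("results", [])
    )


# Illustrative data only.
sample = json.loads("""
{"results": [
  {"check_id": "rule.a", "path": "app.py", "extra": {"severity": "ERROR"}},
  {"check_id": "rule.b", "path": "app.py", "extra": {"severity": "WARNING"}},
  {"check_id": "rule.c", "path": "web.js", "extra": {"severity": "WARNING"}},
  {"check_id": "rule.d", "path": "web.js", "extra": {}}
]}
""")

by_severity = severity_counts(sample)
for severity, count in by_severity.most_common():
    print(f"- {severity}: {count}")
```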

Common Mistakes

Mistake	Correct Approach
Running without `--metrics=off`	Always use `--metrics=off` to prevent telemetry
Running rulesets sequentially	Run in parallel with `&` and `wait`
Not scoping rulesets to languages	Use `--include="*.py"` for language-specific rules
Reporting raw findings without triage	Always triage to remove false positives
Single-threaded for multi-lang	Spawn parallel Tasks per language
Sequential Tasks	Spawn all Tasks in SINGLE message for parallelism
Using OSS when Pro is available	Check login status; use `--pro` for deeper analysis
Assuming Pro is unavailable	Always check with login detection before scanning

Limitations

OSS mode: Cannot track data flow across files (log in with semgrep login and run semgrep install-semgrep-pro to enable Pro)
Pro mode: Cross-file analysis uses -j 1 (single job) which is slower per ruleset, but parallel rulesets compensate
Triage requires reading code context - parallelized via Tasks
Some false positive patterns require human judgment

Rationalizations to Reject

Shortcut	Why It's Wrong
"User asked for scan, that's approval"	Original request ≠ plan approval; user must confirm specific parameters. Present plan, use AskUserQuestion, await explicit "yes"
"Step 3 task is blocking, just mark complete"	Lying about task status defeats enforcement. Only mark complete after real approval
"I already know what they want"	Assumptions cause scanning wrong directories/rulesets. Present plan with all parameters for verification
"Just use default rulesets"	User must see and approve exact rulesets before scan
"Add extra rulesets without asking"	Modifying approved list without consent breaks trust
"Skip showing ruleset list"	User can't make informed decision without seeing what will run
"Third-party rulesets are optional"	Trail of Bits, 0xdea, Decurity rules catch vulnerabilities not in official registry - they are REQUIRED when language matches
"Skip triage, report everything"	Floods user with noise; true issues get lost
"Run one ruleset at a time"	Wastes time; parallel execution is faster
"Use --config auto"	Sends metrics; less control over rulesets
"Triage later"	Findings without context are harder to evaluate
"One Task at a time"	Defeats parallelism; spawn all Tasks together
"Pro is too slow, skip --pro"	Cross-file analysis catches 250% more true positives; worth the time
"Don't bother checking for Pro"	Missing Pro = missing critical cross-file vulnerabilities
"OSS is good enough"	OSS misses inter-file taint flows; always prefer Pro when available

/address-sanitizer

Source: `~/.claude/skills/tob-testing-handbook-skills/skills/address-sanitizer/SKILL.md`

name: address-sanitizer type: technique description: > AddressSanitizer detects memory errors during fuzzing. Use when fuzzing C/C++ code to find buffer overflows and use-after-free bugs.

AddressSanitizer (ASan)

AddressSanitizer (ASan) is a widely adopted memory error detection tool used extensively during software testing, particularly fuzzing. It helps detect memory corruption bugs that might otherwise go unnoticed, such as buffer overflows, use-after-free errors, and other memory safety violations.

Overview

ASan is a standard practice in fuzzing due to its effectiveness in identifying memory vulnerabilities. It instruments code at compile time to track memory allocations and accesses, detecting illegal operations at runtime.

Key Concepts

Concept	Description
Instrumentation	ASan adds runtime checks to memory operations during compilation
Shadow Memory	Maps 20TB of virtual memory to track allocation state
Performance Cost	Approximately 2-4x slowdown compared to non-instrumented code
Detection Scope	Finds buffer overflows, use-after-free, double-free, and memory leaks

When to Apply

Apply this technique when:

Fuzzing C/C++ code for memory safety vulnerabilities
Testing Rust code with unsafe blocks
Debugging crashes related to memory corruption
Running unit tests where memory errors are suspected

Skip this technique when:

Running production code (ASan can reduce security)
Platform is Windows or macOS (limited ASan support)
Performance overhead is unacceptable for your use case
Fuzzing pure safe languages without FFI (e.g., pure Go, pure Java)

Quick Reference

Task	Command/Pattern
Enable ASan (Clang/GCC)	`-fsanitize=address`
Enable verbosity	`ASAN_OPTIONS=verbosity=1`
Disable leak detection	`ASAN_OPTIONS=detect_leaks=0`
Force abort on error	`ASAN_OPTIONS=abort_on_error=1`
Multiple options	`ASAN_OPTIONS=verbosity=1:abort_on_error=1`

Step-by-Step

Step 1: Compile with ASan

Compile and link your code with the -fsanitize=address flag:

clang -fsanitize=address -g -o my_program my_program.c

The -g flag is recommended to get better stack traces when ASan detects errors.

Step 2: Configure ASan Options

Set the ASAN_OPTIONS environment variable to configure ASan behavior:

export ASAN_OPTIONS=verbosity=1:abort_on_error=1:detect_leaks=0
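
When a test runner launches the instrumented binary from Python, the colon-separated option string can be assembled from a dictionary. A small sketch; `format_asan_options` is a hypothetical helper, and it only formats the string without checking that the option names are valid.

```python
import os


def format_asan_options(options: dict) -> str:
    """Render options as key=value pairs joined by ':'; booleans become 1 or 0."""
    def render(value) -> str:
        if isinstance(value, bool):
            return "1" if value else "0"
        return str(value)

    return ":".join(f"{key}={render(value)}" for key, value in options.items())


asan_options = format_asan_options(
    {"verbosity": 1, "abort_on_error": True, "detect_leaks": False}
)
print(asan_options)  # prints "verbosity=1:abort_on_error=1:detect_leaks=0"

# Environment for a child process, e.g. subprocess.run(["./my_program"], env=env)
env = {**os.environ, "ASAN_OPTIONS": asan_options}
```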

Step 3: Run Your Program

Execute the ASan-instrumented binary. When memory errors are detected, ASan will print detailed reports:

./my_program

Step 4: Adjust Fuzzer Memory Limits

ASan requires approximately 20TB of virtual memory. Disable fuzzer memory restrictions:

libFuzzer: -rss_limit_mb=0
AFL++: -m none

Common Patterns

Pattern: Basic ASan Integration

Use Case: Standard fuzzing setup with ASan

Before:

clang -o fuzz_target fuzz_target.c
./fuzz_target

After:

clang -fsanitize=address -g -o fuzz_target fuzz_target.c
ASAN_OPTIONS=verbosity=1:abort_on_error=1 ./fuzz_target

Pattern: ASan with Unit Tests

Use Case: Enable ASan for unit test suite

Before:

gcc -o test_suite test_suite.c -lcheck
./test_suite

After:

gcc -fsanitize=address -g -o test_suite test_suite.c -lcheck
ASAN_OPTIONS=detect_leaks=1 ./test_suite

Advanced Usage

Tips and Tricks

Tip	Why It Helps
Use `-g` flag	Provides detailed stack traces for debugging
Set `verbosity=1`	Confirms ASan is enabled before program starts
Disable leaks during fuzzing	Leak detection doesn't cause immediate crashes, clutters output
Enable `abort_on_error=1`	Some fuzzers require `abort()` instead of `_exit()`

Understanding ASan Reports

When ASan detects a memory error, it prints a detailed report including:

Error type: Buffer overflow, use-after-free, etc.
Stack trace: Where the error occurred
Allocation/deallocation traces: Where memory was allocated/freed
Memory map: Shadow memory state around the error

Example ASan report:

==12345==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60300000eff4 at pc 0x00000048e6a3
READ of size 4 at 0x60300000eff4 thread T0
    #0 0x48e6a2 in main /path/to/file.c:42
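
For bucketing crashes in a script, the first two lines of a report like this one can be parsed with regular expressions. This is a sketch that covers only the header shape shown above. Other ASan error kinds (for example double-free or SEGV reports) use different wording and would need their own patterns. `parse_asan_header` is a hypothetical helper.

```python
import re

HEADER = re.compile(
    r"==(?P<pid>\d+)==ERROR: AddressSanitizer: (?P<kind>[\w-]+)"
    r" on address (?P<address>0x[0-9a-f]+)"
)
ACCESS = re.compile(r"^(?P<op>READ|WRITE) of size (?P<size>\d+)", re.MULTILINE)


def parse_asan_header(report: str):
    """Return error kind, address, and access info, or None if no header matches."""
    header = HEADER.search(report)
    if header is None:
        return None
    summary = header.groupdict()
    access = ACCESS.search(report)
    if access:
        summary["op"] = access.group("op")
        summary["size"] = int(access.group("size"))
    return summary


report = """\
==12345==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60300000eff4 at pc 0x00000048e6a3
READ of size 4 at 0x60300000eff4 thread T0
    #0 0x48e6a2 in main /path/to/file.c:42
"""

summary = parse_asan_header(report)
print(summary["kind"], summary["op"], summary["size"])
```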

Combining Sanitizers

ASan can be combined with other sanitizers for comprehensive detection:

clang -fsanitize=address,undefined -g -o fuzz_target fuzz_target.c

Platform-Specific Considerations

Linux: Full ASan support with best performance
macOS: Limited support, some features may not work
Windows: Experimental support, not recommended for production fuzzing

Anti-Patterns

Anti-Pattern	Problem	Correct Approach
Using ASan in production	Can make applications less secure	Use ASan only for testing
Not disabling memory limits	Fuzzer may kill process due to 20TB virtual memory	Set `-rss_limit_mb=0` or `-m none`
Ignoring leak reports	Memory leaks indicate resource management issues	Review leak reports at end of fuzzing campaign

Tool-Specific Guidance

libFuzzer

Compile with both fuzzer and address sanitizer:

clang++ -fsanitize=fuzzer,address -g harness.cc -o fuzz

Run with unlimited RSS:

./fuzz -rss_limit_mb=0

Integration tips:

Always combine -fsanitize=fuzzer with -fsanitize=address
Use -g for detailed stack traces in crash reports
Consider ASAN_OPTIONS=abort_on_error=1 for better crash handling

See: libFuzzer: AddressSanitizer

AFL++

Use the AFL_USE_ASAN environment variable:

AFL_USE_ASAN=1 afl-clang-fast++ -g harness.cc -o fuzz

Run with unlimited memory:

afl-fuzz -m none -i input_dir -o output_dir ./fuzz

Integration tips:

AFL_USE_ASAN=1 automatically adds proper compilation flags
Use -m none to disable AFL++'s memory limit
Consider AFL_MAP_SIZE for programs with large coverage maps

See: AFL++: AddressSanitizer

cargo-fuzz (Rust)

Use the --sanitizer=address flag:

cargo fuzz run fuzz_target --sanitizer=address

To keep debug symbols for readable ASan stack traces, set the release profile in fuzz/Cargo.toml:

[profile.release]
opt-level = 3
debug = true

Integration tips:

ASan is useful for fuzzing unsafe Rust code or FFI boundaries
Safe Rust code may not benefit as much (compiler already prevents many errors)
Focus on unsafe blocks, raw pointers, and C library bindings

See: cargo-fuzz: AddressSanitizer

honggfuzz

Compile the target with ASan using the honggfuzz compiler wrapper:

hfuzz-clang -fsanitize=address -g target.c -o fuzz_target_asan

Then run honggfuzz against the instrumented binary:

honggfuzz -i input_dir -o output_dir -- ./fuzz_target_asan

Integration tips:

honggfuzz works well with ASan out of the box
Use feedback-driven mode for better coverage with sanitizers
Monitor memory usage, as ASan increases memory footprint

Troubleshooting

Issue	Cause	Solution
Fuzzer kills process immediately	Memory limit too low for ASan's 20TB virtual memory	Use `-rss_limit_mb=0` (libFuzzer) or `-m none` (AFL++)
"ASan runtime not initialized"	Wrong linking order or missing runtime	Ensure `-fsanitize=address` used in both compile and link
Leak reports clutter output	LeakSanitizer enabled by default	Set `ASAN_OPTIONS=detect_leaks=0`
Poor performance (>4x slowdown)	Debug mode or unoptimized build	Compile with `-O2` or `-O3` alongside `-fsanitize=address`
ASan not detecting obvious bugs	Binary not instrumented	Check with `ASAN_OPTIONS=verbosity=1` that ASan prints startup info
False positives	Interceptor conflicts	Check ASan FAQ for known issues with specific libraries

Tools That Use This Technique

Skill	How It Applies
libfuzzer	Compile with `-fsanitize=fuzzer,address` for integrated fuzzing with memory error detection
aflpp	Use `AFL_USE_ASAN=1` environment variable during compilation
cargo-fuzz	Use `--sanitizer=address` flag to enable ASan for Rust fuzz targets
honggfuzz	Compile target with `-fsanitize=address` for ASan-instrumented fuzzing

Skill	Relationship
undefined-behavior-sanitizer	Often used together with ASan for comprehensive bug detection (undefined behavior + memory errors)
fuzz-harness-writing	Harnesses must be designed to handle ASan-detected crashes and avoid false positives
coverage-analysis	Coverage-guided fuzzing helps trigger code paths where ASan can detect memory errors

Resources

Key External Resources

AddressSanitizer on Google Sanitizers Wiki

The official ASan documentation covers:

Algorithm and implementation details
Complete list of detected error types
Performance characteristics and overhead
Platform-specific behavior
Known limitations and incompatibilities

SanitizerCommonFlags

Common configuration flags shared across all sanitizers:

verbosity: Control diagnostic output level
log_path: Redirect sanitizer output to files
symbolize: Enable/disable symbol resolution in reports
external_symbolizer_path: Use custom symbolizer

AddressSanitizerFlags

ASan-specific configuration options:

detect_leaks: Control memory leak detection
abort_on_error: Call abort() vs _exit() on error
detect_stack_use_after_return: Detect stack use-after-return bugs
check_initialization_order: Find initialization order bugs

AddressSanitizer FAQ

Common pitfalls and solutions:

Linking order issues
Conflicts with other tools
Platform-specific problems
Performance tuning tips

Clang AddressSanitizer Documentation

Clang-specific guidance:

Compilation flags and options
Interaction with other Clang features
Supported platforms and architectures

GCC Instrumentation Options

GCC-specific ASan documentation:

GCC-specific flags and behavior
Differences from Clang implementation
Platform support in GCC

AddressSanitizer: A Fast Address Sanity Checker (USENIX Paper)

Original research paper with technical details:

Shadow memory algorithm
Virtual memory requirements (historically 16TB, now ~20TB)
Performance benchmarks
Design decisions and tradeoffs

/aflpp

Source: `~/.claude/skills/tob-testing-handbook-skills/skills/aflpp/SKILL.md`

name: aflpp type: fuzzer description: > AFL++ is a fork of AFL with better fuzzing performance and advanced features. Use for multi-core fuzzing of C/C++ projects.

AFL++

AFL++ is a fork of the original AFL fuzzer that offers better fuzzing performance and more advanced features while maintaining stability. A major benefit over libFuzzer is that AFL++ has stable support for running fuzzing campaigns on multiple cores, making it ideal for large-scale fuzzing efforts.

When to Use

Fuzzer	Best For	Complexity
AFL++	Multi-core fuzzing, diverse mutations, mature projects	Medium
libFuzzer	Quick setup, single-threaded, simple harnesses	Low
LibAFL	Custom fuzzers, research, advanced use cases	High

Choose AFL++ when:

You need multi-core fuzzing to maximize throughput
Your project can be compiled with Clang or GCC
You want diverse mutation strategies and mature tooling
libFuzzer has plateaued and you need more coverage
You're fuzzing production codebases that benefit from parallel execution

Quick Start

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Call your code with fuzzer-provided data
    check_buf((char*)data, size);
    return 0;
}

Compile and run:

# Setup AFL++ wrapper script first (see Installation)
./afl++ docker afl-clang-fast++ -DNO_MAIN=1 -O2 -fsanitize=fuzzer harness.cc main.cc -o fuzz
mkdir seeds && echo "aaaa" > seeds/minimal_seed
./afl++ docker afl-fuzz -i seeds -o out -- ./fuzz

Installation

AFL++ has many dependencies including LLVM, Python, and Rust. We recommend using a current Debian or Ubuntu distribution for fuzzing with AFL++.

Method	When to Use	Supported Compilers
Ubuntu/Debian repos	Recent Ubuntu, basic features only	Ubuntu 23.10: Clang 14 & GCC 13; Debian 12: Clang 14 & GCC 12
Docker (from Docker Hub)	Specific AFL++ version, Apple Silicon support	As of 4.35c: Clang 19 & GCC 11
Docker (from source)	Test unreleased features, apply patches	Configurable in Dockerfile
From source	Avoid Docker, need specific patches	Adjustable via `LLVM_CONFIG` env var

Ubuntu/Debian

Prior to installing afl++, check the clang version dependency of the package with apt-cache show afl++, and install the matching lld version (e.g., lld-17).

apt install afl++ lld-17

Docker (from Docker Hub)

docker pull aflplusplus/aflplusplus:stable

Docker (from source)

git clone --depth 1 --branch stable https://github.com/AFLplusplus/AFLplusplus
cd AFLplusplus
docker build -t aflplusplus .

From source

Refer to the Dockerfile for Ubuntu version requirements and dependencies. Set LLVM_CONFIG to specify Clang version (e.g., llvm-config-18).

Wrapper Script Setup

Create a wrapper script to run AFL++ on host or Docker:

cat <<'EOF' > ./afl++
#!/bin/sh
AFL_VERSION="${AFL_VERSION:-"stable"}"
case "$1" in
    host)
        shift
        bash -c "$*"
        ;;
    docker)
        shift
        /usr/bin/env docker run -ti \
            --privileged \
            -v ./:/src \
            --rm \
            --name afl_fuzzing \
            "aflplusplus/aflplusplus:$AFL_VERSION" \
            bash -c "cd /src && bash -c \"$*\""
        ;;
    *)
        echo "Usage: $0 {host|docker}"
        exit 1
        ;;
esac
EOF
chmod +x ./afl++

Security Warning: The afl-system-config and afl-persistent-config scripts require root privileges and disable OS security features. Do not fuzz on production systems or your development environment. Use a dedicated VM instead.

System Configuration

Run after each reboot for up to 15% more executions per second:

./afl++ <host/docker> afl-system-config

For maximum performance, disable kernel security mitigations (requires grub bootloader, not supported in Docker):

./afl++ host afl-persistent-config
update-grub
reboot
./afl++ <host/docker> afl-system-config

Verify with cat /proc/cmdline - output should include mitigations=off.

Writing a Harness

Harness Structure

AFL++ supports libFuzzer-style harnesses:

#include <stdint.h>
#include <stddef.h>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // 1. Validate input size if needed
    if (size < MIN_SIZE || size > MAX_SIZE) return 0;

    // 2. Call target function with fuzz data
    target_function(data, size);

    // 3. Return 0 (non-zero reserved for future use)
    return 0;
}

Harness Rules

Do	Don't
Reset global state between runs	Rely on state from previous runs
Handle edge cases gracefully	Exit on invalid input
Keep harness deterministic	Use random number generators
Free allocated memory	Create memory leaks
Validate input sizes	Process unbounded input

See Also: For detailed harness writing techniques, patterns for handling complex inputs, and advanced strategies, see the fuzz-harness-writing technique skill.

Compilation

AFL++ offers multiple compilation modes with different trade-offs.

Compilation Mode Decision Tree

Choose your compilation mode:

LTO mode (afl-clang-lto): Best performance and instrumentation. Try this first.
LLVM mode (afl-clang-fast): Fall back if LTO fails to compile.
GCC plugin (afl-gcc-fast): For projects requiring GCC.

Basic Compilation (LLVM mode)

./afl++ <host/docker> afl-clang-fast++ -DNO_MAIN=1 -O2 -fsanitize=fuzzer harness.cc main.cc -o fuzz

GCC Compilation

./afl++ <host/docker> afl-g++-fast -DNO_MAIN=1 -O2 -fsanitize=fuzzer harness.cc main.cc -o fuzz

Important: GCC version must match the version used to compile the AFL++ GCC plugin.

With Sanitizers

./afl++ <host/docker> AFL_USE_ASAN=1 afl-clang-fast++ -DNO_MAIN=1 -O2 -fsanitize=fuzzer harness.cc main.cc -o fuzz

See Also: For detailed sanitizer configuration, common issues, and advanced flags, see the address-sanitizer and undefined-behavior-sanitizer technique skills.

Build Flags

Note that -g is not necessary; the AFL++ compilers add it by default.

Flag	Purpose
`-DNO_MAIN=1`	Skip main function when using libFuzzer harness
`-O2`	Production optimization level (recommended for fuzzing)
`-fsanitize=fuzzer`	Enables libFuzzer compatibility mode and adds the fuzzer runtime when linking the executable
`-fsanitize=fuzzer-no-link`	Instrument without linking fuzzer runtime (for static libraries and object files)

Corpus Management

Creating Initial Corpus

AFL++ requires at least one non-empty seed file:

mkdir seeds
echo "aaaa" > seeds/minimal_seed

For real projects, gather representative inputs:

Download example files for the format you're fuzzing
Extract test cases from the project's test suite
Use minimal valid inputs for your file format

Corpus Minimization

After a campaign, minimize the corpus to keep only unique coverage:

./afl++ <host/docker> afl-cmin -i out/default/queue -o minimized_corpus -- ./fuzz

See Also: For corpus creation strategies, dictionaries, and seed selection, see the fuzzing-corpus technique skill.

Running Campaigns

Basic Run

./afl++ <host/docker> afl-fuzz -i seeds -o out -- ./fuzz

Setting Environment Variables

./afl++ <host/docker> AFL_FAST_CAL=1 afl-fuzz -i seeds -o out -- ./fuzz

Interpreting Output

The AFL++ UI shows real-time fuzzing statistics:

Output	Meaning
execs/sec	Execution speed - higher is better
cycles done	Number of queue passes completed
corpus count	Number of unique test cases in queue
saved crashes	Number of unique crashes found
stability	% of stable edges (should be near 100%)

Output Directory Structure

out/default/
├── cmdline          # How was the SUT invoked?
├── crashes/         # Inputs that crash the SUT
│   └── id:000000,sig:06,src:000002,time:286,execs:13105,op:havoc,rep:4
├── hangs/           # Inputs that hang the SUT
├── queue/           # Test cases reproducing final fuzzer state
│   ├── id:000000,time:0,execs:0,orig:minimal_seed
│   └── id:000001,src:000000,time:0,execs:8,op:havoc,rep:6,+cov
├── fuzzer_stats     # Campaign statistics
└── plot_data        # Data for plotting
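
The fuzzer_stats file in this directory can also be read programmatically. The following is a minimal sketch, assuming the file holds one `key : value` pair per line and contains a `stability` field formatted like `97.50%`; the helper names (`parse_fuzzer_stats`, `stability_percent`, `load_stats`) are hypothetical and not part of AFL++.

```python
from pathlib import Path


def parse_fuzzer_stats(text: str) -> dict[str, str]:
    """Parse 'key : value' lines (assumed fuzzer_stats layout) into a dict."""
    stats = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:  # ignore lines without a separator
            stats[key.strip()] = value.strip()
    return stats


def stability_percent(stats: dict[str, str]) -> float:
    """Return the 'stability' field as a float, e.g. '97.50%' -> 97.5."""
    return float(stats["stability"].rstrip("%"))


def load_stats(out_dir: str = "out/default") -> dict[str, str]:
    """Read and parse fuzzer_stats from a campaign output directory."""
    return parse_fuzzer_stats(Path(out_dir, "fuzzer_stats").read_text())
```

For example, `stability_percent(load_stats("out/default")) < 85` could serve as a simple alert for an unstable harness, using the 85% threshold mentioned in this skill.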

Analyzing Results

View live campaign statistics:

./afl++ <host/docker> afl-whatsup out

Create coverage plots:

apt install gnuplot
./afl++ <host/docker> afl-plot out/default out_graph/

Re-executing Test Cases

./afl++ <host/docker> ./fuzz out/default/crashes/<test_case>
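
Before re-executing crashes one by one, it can help to group them by signal. The sketch below parses the comma-separated `key:value` fields of the file names shown in the directory listing above; it assumes the field values themselves contain no commas, and the function names are hypothetical.

```python
from collections import Counter


def parse_case_name(name: str) -> dict[str, str]:
    """Split an AFL++ test case file name into its key:value fields."""
    fields = {}
    flags = []
    for part in name.split(","):
        key, sep, value = part.partition(":")
        if sep:
            fields[key] = value
        else:
            flags.append(part)  # bare markers such as '+cov'
    if flags:
        fields["flags"] = ",".join(flags)
    return fields


def count_by_signal(names) -> Counter:
    """Count crash files per 'sig' field; files without one count as 'unknown'."""
    return Counter(parse_case_name(n).get("sig", "unknown") for n in names)
```

Passing the file names from out/default/crashes/ to `count_by_signal` gives a quick overview; in the listing above, `sig:06` corresponds to SIGABRT.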

Fuzzer Options

Option	Purpose
`-G 4000`	Maximum test input length (default: 1048576 bytes)
`-t 1000`	Timeout in milliseconds for each test case (default: 1000ms)
`-m 1000`	Memory limit in megabytes (default: 0 = unlimited)
`-x ./dict.dict`	Use dictionary file to guide mutations
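
A dictionary file for `-x` can be generated from known tokens. The sketch below assumes the common AFL-style dictionary syntax of one `name="value"` entry per line, with non-printable bytes, double quotes, and backslashes written as `\xNN` escapes; the helper names are hypothetical.

```python
def dict_entry(name: str, token: bytes) -> str:
    """Format one dictionary line, hex-escaping anything that is not plain ASCII."""
    escaped = "".join(
        chr(b) if 0x20 <= b < 0x7F and b not in (0x22, 0x5C) else f"\\x{b:02x}"
        for b in token
    )
    return f'{name}="{escaped}"'


def render_dictionary(tokens: dict[str, bytes]) -> str:
    """Render all tokens as dictionary file content, one entry per line."""
    return "\n".join(dict_entry(n, t) for n, t in tokens.items()) + "\n"
```

Writing `render_dictionary({"png_magic": b"\x89PNG", "chunk_ihdr": b"IHDR"})` to dict.dict would produce a file suitable for the `-x ./dict.dict` option above.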

Multi-Core Fuzzing

AFL++ excels at multi-core fuzzing with two major advantages:

More executions per second (scales linearly with physical cores)
Asymmetrical fuzzing (e.g., one ASan job, rest without sanitizers)

Starting a Campaign

Start the primary fuzzer (in background):

./afl++ <host/docker> afl-fuzz -M primary -i seeds -o state -- ./fuzz 1>primary.log 2>primary.error &

Start secondary fuzzers (as many as you have cores):

./afl++ <host/docker> afl-fuzz -S secondary01 -i seeds -o state -- ./fuzz 1>secondary01.log 2>secondary01.error &
./afl++ <host/docker> afl-fuzz -S secondary02 -i seeds -o state -- ./fuzz 1>secondary02.log 2>secondary02.error &
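
Typing one command per core gets tedious on larger machines. As a sketch, the command lines above can be generated for any number of cores; the function name and its defaults are hypothetical, and the `./afl++ <host/docker>` wrapper is left out so it can be prepended as needed.

```python
import shlex


def campaign_commands(cores: int, seeds: str = "seeds", state: str = "state",
                      target: str = "./fuzz") -> list[str]:
    """Build one primary and (cores - 1) secondary afl-fuzz command lines."""
    if cores < 1:
        raise ValueError("need at least one core")
    commands = []
    for i in range(cores):
        # The first instance is the primary (-M); all others are secondaries (-S).
        name = "primary" if i == 0 else f"secondary{i:02d}"
        flag = "-M" if i == 0 else "-S"
        argv = ["afl-fuzz", flag, name, "-i", seeds, "-o", state, "--", target]
        commands.append(f"{shlex.join(argv)} 1>{name}.log 2>{name}.error &")
    return commands
```

Note that `os.cpu_count()` reports logical cores, so it may overstate the number of physical cores on machines with simultaneous multithreading.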

Monitoring Multi-Core Campaigns

List all running jobs:

jobs

View live statistics (updates every second):

./afl++ <host/docker> watch -n1 --color afl-whatsup state/

Stopping All Fuzzers

kill $(jobs -p)

Coverage Analysis

AFL++ automatically tracks coverage through edge instrumentation. Coverage information is stored in fuzzer_stats and plot_data.

Measuring Coverage

Use afl-plot to visualize coverage over time:

./afl++ <host/docker> afl-plot out/default out_graph/

Improving Coverage

Use dictionaries for format-aware fuzzing
Run longer campaigns (a growing cycles_wo_finds value in fuzzer_stats indicates a plateau)
Try different mutation strategies with multi-core fuzzing
Analyze coverage gaps and add targeted seed inputs

See Also: For detailed coverage analysis techniques, identifying coverage gaps, and systematic coverage improvement, see the coverage-analysis technique skill.

CMPLOG

CMPLOG/RedQueen is among the strongest path-constraint-solving mechanisms available in any fuzzer: it records the operands of comparisons so the fuzzer can satisfy checks against magic values. To enable it, the fuzz target must be instrumented for it. Before building the fuzzing target, set the environment variable:

./afl++ <host/docker> AFL_LLVM_CMPLOG=1 make

No special action is needed for compiling and linking the harness.

To run a fuzzer instance with a CMPLOG-instrumented fuzzing target, add -c0 to the command-line arguments:

./afl++ <host/docker> afl-fuzz -c0 -S cmplog -i seeds -o state -- ./fuzz 1>cmplog.log 2>cmplog.error &

Sanitizer Integration

Sanitizers are essential for finding memory corruption bugs that don't cause immediate crashes.

AddressSanitizer (ASan)

./afl++ <host/docker> AFL_USE_ASAN=1 afl-clang-fast++ -DNO_MAIN=1 -O2 -fsanitize=fuzzer harness.cc main.cc -o fuzz

Note: Memory limit (-m) is not supported with ASan due to 20TB virtual memory reservation.

UndefinedBehaviorSanitizer (UBSan)

./afl++ <host/docker> AFL_USE_UBSAN=1 afl-clang-fast++ -DNO_MAIN=1 -O2 -fsanitize=fuzzer,undefined harness.cc main.cc -o fuzz

Common Sanitizer Issues

Issue	Solution
ASan slows fuzzing	Use only 1 ASan job in multi-core setup
Stack exhaustion	Increase stack with `ASAN_OPTIONS=stack_size=...`
GCC version mismatch	Ensure system GCC matches AFL++ plugin version

See Also: For comprehensive sanitizer configuration and troubleshooting, see the address-sanitizer technique skill.

Advanced Usage

Tips and Tricks

Tip	Why It Helps
Use LLVMFuzzerTestOneInput harnesses where possible	This is the most efficient fuzzing style as long as the campaign reaches at least 85% stability; below that, try standard input or file input fuzzing
Use dictionaries	Helps fuzzer discover format-specific keywords and magic bytes
Set realistic timeouts	Prevents false positives from system load
Limit input size	Larger inputs don't necessarily explore more space
Monitor stability	Low stability indicates non-deterministic behavior

Standard Input Fuzzing

AFL++ can fuzz programs reading from stdin without a libFuzzer harness:

./afl++ <host/docker> afl-clang-fast++ -O2 main_stdin.c -o fuzz_stdin
./afl++ <host/docker> afl-fuzz -i seeds -o out -- ./fuzz_stdin

This is slower than persistent mode but requires no harness code.

File Input Fuzzing

For programs that read files, use the @@ placeholder, which afl-fuzz replaces with the path of the current test case:

./afl++ <host/docker> afl-clang-fast++ -O2 main_file.c -o fuzz_file
./afl++ <host/docker> afl-fuzz -i seeds -o out -- ./fuzz_file @@

For better performance, use fmemopen to create a FILE stream backed by an in-memory buffer instead of reading from disk.

Argument Fuzzing

Fuzz command-line arguments using argv-fuzz-inl.h:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifdef __AFL_COMPILER
#include "argv-fuzz-inl.h"
#endif

void check_buf(char *buf, size_t buf_len) {
    if(buf_len > 0 && buf[0] == 'a') {
        if(buf_len > 1 && buf[1] == 'b') {
            if(buf_len > 2 && buf[2] == 'c') {
                abort();
            }
        }
    }
}

int main(int argc, char *argv[]) {
#ifdef __AFL_COMPILER
    AFL_INIT_ARGV();
#endif

    if (argc < 2) {
        fprintf(stderr, "Usage: %s <input_string>\n", argv[0]);
        return 1;
    }

    char *input_buf = argv[1];
    size_t len = strlen(input_buf);
    check_buf(input_buf, len);
    return 0;
}

Download the header:

curl -O https://raw.githubusercontent.com/AFLplusplus/AFLplusplus/stable/utils/argv_fuzzing/argv-fuzz-inl.h

Compile and run:

./afl++ <host/docker> afl-clang-fast++ -O2 main_arg.c -o fuzz_arg
./afl++ <host/docker> afl-fuzz -i seeds -o out -- ./fuzz_arg

Performance Tuning

Setting	Impact
CPU core count	Linear scaling with physical cores
Persistent mode	10-20x faster than fork server
`-G` input size limit	Smaller = faster, but may miss bugs
ASan ratio	1 ASan job per 4-8 non-ASan jobs

Real-World Examples

Example: libpng

The libpng example shows how to fuzz a C project that is built as a static library:

# Get source
curl -L -O https://downloads.sourceforge.net/project/libpng/libpng16/1.6.37/libpng-1.6.37.tar.xz
tar xf libpng-1.6.37.tar.xz
cd libpng-1.6.37/

# Install dependencies
apt install zlib1g-dev

# Configure and build static library
export CC=afl-clang-fast CFLAGS=-fsanitize=fuzzer-no-link
export CXX=afl-clang-fast++ CXXFLAGS="$CFLAGS"
./configure --enable-shared=no
export AFL_LLVM_CMPLOG=1
export AFL_USE_ASAN=1
make

# Download harness
curl -O https://raw.githubusercontent.com/glennrp/libpng/f8e5fa92b0e37ab597616f554bee254157998227/contrib/oss-fuzz/libpng_read_fuzzer.cc

# Link fuzzer
export AFL_USE_ASAN=1
$CXX -fsanitize=fuzzer libpng_read_fuzzer.cc .libs/libpng16.a -lz -o fuzz

# Prepare seeds and dictionary
mkdir seeds/
curl -o seeds/input.png https://raw.githubusercontent.com/glennrp/libpng/acfd50ae0ba3198ad734e5d4dec2b05341e50924/contrib/pngsuite/iftp1n3p08.png
curl -O https://raw.githubusercontent.com/glennrp/libpng/2fff013a6935967960a5ae626fc21432807933dd/contrib/oss-fuzz/png.dict

# Start fuzzing
./afl++ <host/docker> afl-fuzz -i seeds -o out -x png.dict -- ./fuzz

Example: CMake-based Project

cmake_minimum_required(VERSION 3.0)
project(BuggyProgram)

add_executable(buggy_program main.cc)

add_executable(fuzz main.cc harness.cc)
target_compile_definitions(fuzz PRIVATE NO_MAIN=1)
target_compile_options(fuzz PRIVATE -O2 -fsanitize=fuzzer-no-link)
target_link_libraries(fuzz -fsanitize=fuzzer)

Build and fuzz:

# Build non-instrumented binary
./afl++ <host/docker> cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ .
./afl++ <host/docker> cmake --build . --target buggy_program

# Build fuzzer
./afl++ <host/docker> cmake -DCMAKE_C_COMPILER=afl-clang-fast -DCMAKE_CXX_COMPILER=afl-clang-fast++ .
./afl++ <host/docker> cmake --build . --target fuzz

# Fuzz
./afl++ <host/docker> afl-fuzz -i seeds -o out -- ./fuzz

Troubleshooting

Problem	Cause	Solution
Low exec/sec (<1k)	Not using persistent mode	Create an LLVMFuzzerTestOneInput-style harness
Low stability (<85%)	Non-deterministic code	Fuzz the program via stdin or file input, or write a harness that does so
GCC plugin error	GCC version mismatch	Ensure system GCC matches AFL++ build and install gcc-$GCC_VERSION-plugin-dev
No crashes found	Need sanitizers	Recompile with `AFL_USE_ASAN=1`
Memory limit exceeded	ASan uses 20TB virtual	Remove `-m` flag when using ASan
Docker performance loss	Virtualization overhead	Use bare metal or VM for production fuzzing

Technique Skills

Skill	Use Case
fuzz-harness-writing	Detailed guidance on writing effective harnesses
address-sanitizer	Memory error detection during fuzzing
undefined-behavior-sanitizer	Detect undefined behavior bugs
fuzzing-corpus	Building and managing seed corpora
fuzzing-dictionaries	Creating dictionaries for format-aware fuzzing

Skill	When to Consider
libfuzzer	Quick prototyping, single-threaded fuzzing is sufficient
libafl	Need custom mutators or research-grade features

Resources

Key External Resources

AFL++ GitHub Repository Official repository with comprehensive documentation, examples, and issue tracker.

Fuzzing in Depth Advanced documentation by the AFL++ team covering instrumentation modes, optimization techniques, and advanced use cases.

AFL++ Under The Hood Technical deep-dive into AFL++ internals, mutation strategies, and coverage tracking mechanisms.

AFL++: Combining Incremental Steps of Fuzzing Research Research paper describing AFL++ architecture and performance improvements over original AFL.

Video Resources

Fuzzing cURL - Trail of Bits blog post on using AFL++ argument fuzzing for cURL
Sudo Vulnerability Walkthrough - LiveOverflow series on rediscovering CVE-2021-3156
Rediscovery of libwebp bug - LiveOverflow video on finding CVE-2023-4863

/atheris

Source: `~/.claude/skills/tob-testing-handbook-skills/skills/atheris/SKILL.md`

name: atheris type: fuzzer description: > Atheris is a coverage-guided Python fuzzer based on libFuzzer. Use for fuzzing pure Python code and Python C extensions.

Atheris

Atheris is a coverage-guided Python fuzzer built on libFuzzer. It enables fuzzing of both pure Python code and Python C extensions with integrated AddressSanitizer support for detecting memory corruption issues.

When to Use

Fuzzer	Best For	Complexity
Atheris	Python code and C extensions	Low-Medium
Hypothesis	Property-based testing	Low
python-afl	AFL-style fuzzing	Medium

Choose Atheris when:

Fuzzing pure Python code with coverage guidance
Testing Python C extensions for memory corruption
Integration with libFuzzer ecosystem is desired
AddressSanitizer support is needed

Quick Start

import sys
import atheris

@atheris.instrument_func
def test_one_input(data: bytes):
    if len(data) == 4:
        if data[0] == 0x46:  # "F"
            if data[1] == 0x55:  # "U"
                if data[2] == 0x5A:  # "Z"
                    if data[3] == 0x5A:  # "Z"
                        raise RuntimeError("You caught me")

def main():
    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()

if __name__ == "__main__":
    main()

Run:

python fuzz.py

Installation

Atheris supports 32-bit and 64-bit Linux, and macOS. We recommend fuzzing on Linux because it's simpler to manage and often faster.

Prerequisites

Python 3.7 or later
Recent version of clang (preferably latest release)
For Docker users: Docker Desktop

Linux/macOS

uv pip install atheris

Docker Environment (Recommended)

For a fully operational Linux environment with all dependencies configured:

# https://hub.docker.com/_/python
ARG PYTHON_VERSION=3.11

FROM python:$PYTHON_VERSION-slim-bookworm

RUN python --version

RUN apt update && apt install -y \
    ca-certificates \
    wget \
    && rm -rf /var/lib/apt/lists/*

# LLVM builds version 15-19 for Debian 12 (Bookworm)
# https://apt.llvm.org/bookworm/dists/
ARG LLVM_VERSION=19

RUN echo "deb http://apt.llvm.org/bookworm/ llvm-toolchain-bookworm-$LLVM_VERSION main" > /etc/apt/sources.list.d/llvm.list
RUN echo "deb-src http://apt.llvm.org/bookworm/ llvm-toolchain-bookworm-$LLVM_VERSION main" >> /etc/apt/sources.list.d/llvm.list
RUN wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key > /etc/apt/trusted.gpg.d/apt.llvm.org.asc

RUN apt update && apt install -y \
    build-essential \
    clang-$LLVM_VERSION \
    && rm -rf /var/lib/apt/lists/*

ENV APP_DIR="/app"
RUN mkdir $APP_DIR
WORKDIR $APP_DIR

ENV VIRTUAL_ENV="/opt/venv"
RUN python -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

# https://github.com/google/atheris/blob/master/native_extension_fuzzing.md#step-1-compiling-your-extension
ENV CC="clang-$LLVM_VERSION"
ENV CFLAGS="-fsanitize=address,fuzzer-no-link"
ENV CXX="clang++-$LLVM_VERSION"
ENV CXXFLAGS="-fsanitize=address,fuzzer-no-link"
ENV LDSHARED="clang-$LLVM_VERSION -shared"
ENV LDSHAREDXX="clang++-$LLVM_VERSION -shared"
ENV ASAN_SYMBOLIZER_PATH="/usr/bin/llvm-symbolizer-$LLVM_VERSION"

# Allow Atheris to find fuzzer sanitizer shared libs
# https://github.com/google/atheris#building-from-source
RUN LIBFUZZER_LIB=$($CC -print-file-name=libclang_rt.fuzzer_no_main-$(uname -m).a) \
    python -m pip install --no-binary atheris atheris

# https://github.com/google/atheris/blob/master/native_extension_fuzzing.md#option-a-sanitizerlibfuzzer-preloads
# NOTE: the python3.11 path component must match PYTHON_VERSION above
ENV LD_PRELOAD="$VIRTUAL_ENV/lib/python3.11/site-packages/asan_with_fuzzer.so"

# 1. Skip memory allocation failures for now, they are common, and low impact (DoS)
# 2. https://github.com/google/atheris/blob/master/native_extension_fuzzing.md#leak-detection
ENV ASAN_OPTIONS="allocator_may_return_null=1,detect_leaks=0"

CMD ["/bin/bash"]

Build and run:

docker build -t atheris .
docker run -it atheris

Verification

python -c "import atheris; print(atheris.__version__)"

Writing a Harness

Harness Structure for Pure Python

import sys
import atheris

@atheris.instrument_func
def test_one_input(data: bytes):
    """
    Fuzzing entry point. Called with random byte sequences.

    Args:
        data: Random bytes generated by the fuzzer
    """
    # Add input validation if needed
    if len(data) < 1:
        return

    # Call your target function
    try:
        your_target_function(data)
    except ValueError:
        # Expected exceptions should be caught
        pass
    # Let unexpected exceptions crash (that's what we're looking for!)

def main():
    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()

if __name__ == "__main__":
    main()

Harness Rules

Do	Don't
Use `@atheris.instrument_func` for coverage	Forget to instrument target code
Catch expected exceptions	Catch all exceptions indiscriminately
Use `atheris.instrument_imports()` for libraries	Import modules after `atheris.Setup()`
Keep harness deterministic	Use randomness or time-based behavior

See Also: For detailed harness writing techniques, patterns for handling complex inputs, and advanced strategies, see the fuzz-harness-writing technique skill.

Fuzzing Pure Python Code

For fuzzing broader parts of an application or library, use instrumentation functions:

import sys
import atheris
with atheris.instrument_imports():
    import your_module
    from another_module import target_function

def test_one_input(data: bytes):
    target_function(data)

atheris.Setup(sys.argv, test_one_input)
atheris.Fuzz()

Instrumentation Options:

atheris.instrument_func - Decorator for single function instrumentation
atheris.instrument_imports() - Context manager for instrumenting all imported modules
atheris.instrument_all() - Instrument all Python code system-wide

Fuzzing Python C Extensions

Python C extensions require compilation with specific flags for instrumentation and sanitizer support.

Environment Configuration

If using the provided Dockerfile, these are already configured. For local setup:

export CC="clang"
export CFLAGS="-fsanitize=address,fuzzer-no-link"
export CXX="clang++"
export CXXFLAGS="-fsanitize=address,fuzzer-no-link"
export LDSHARED="clang -shared"

Example: Fuzzing cbor2

Install the extension from source:

CBOR2_BUILD_C_EXTENSION=1 python -m pip install --no-binary cbor2 cbor2==5.6.4

The --no-binary flag ensures the C extension is compiled locally with instrumentation.

Create cbor2-fuzz.py:

import sys
import atheris

# _cbor2 ensures the C library is imported
from _cbor2 import loads

def test_one_input(data: bytes):
    try:
        loads(data)
    except Exception:
        # We're searching for memory corruption, not Python exceptions
        pass

def main():
    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()

if __name__ == "__main__":
    main()

Run:

python cbor2-fuzz.py

Important: When running locally (not in Docker), you must set LD_PRELOAD manually.

Corpus Management

Creating Initial Corpus

mkdir corpus
# Add seed inputs
echo "test data" > corpus/seed1
echo '{"key": "value"}' > corpus/seed2

Run with corpus:

python fuzz.py corpus/

Corpus Minimization

Atheris inherits corpus minimization from libFuzzer:

python fuzz.py -merge=1 new_corpus/ old_corpus/

See Also: For corpus creation strategies, dictionaries, and seed selection, see the fuzzing-corpus technique skill.

Running Campaigns

Basic Run

python fuzz.py

With Corpus Directory

python fuzz.py corpus/

Common Options

# Run for 10 minutes
python fuzz.py -max_total_time=600

# Limit input size
python fuzz.py -max_len=1024

# Run with multiple workers
python fuzz.py -workers=4 -jobs=4

Interpreting Output

Output	Meaning
`NEW cov: X`	Found new coverage, corpus expanded
`pulse cov: X`	Periodic status update
`exec/s: X`	Executions per second (throughput)
`corp: X/Yb`	Corpus size: X inputs, Y bytes total
`ERROR: libFuzzer`	Crash detected
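
For longer campaigns it can be useful to extract these fields from a saved log. The sketch below assumes status lines of the usual libFuzzer shape (`#<execs> <EVENT> cov: N ... corp: X/Yb ... exec/s: N`); the regular expression and function name are hypothetical and may need adjusting for other libFuzzer versions.

```python
import re

# Matches lines such as: "#3811 NEW    cov: 4 ft: 3 corp: 2/2b lim: 4 exec/s: 0 rss: 28Mb"
_STATUS = re.compile(
    r"^#(?P<execs>\d+)\s+(?P<event>[A-Za-z]+)\s+cov: (?P<cov>\d+)"
    r".*?corp: (?P<corp_inputs>\d+)/(?P<corp_size>\d+\w*)"
    r".*?exec/s: (?P<exec_s>\d+)"
)


def parse_status_line(line: str):
    """Return the parsed fields of a status line, or None for any other line."""
    match = _STATUS.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "execs": int(fields["execs"]),
        "event": fields["event"],
        "cov": int(fields["cov"]),
        "corp_inputs": int(fields["corp_inputs"]),
        "corp_size": fields["corp_size"],  # kept as text, e.g. '2b' or '3Kb'
        "exec_s": int(fields["exec_s"]),
    }
```

Lines that are not status updates, such as the `ERROR: libFuzzer` crash banner, return `None` and can be handled separately.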

Sanitizer Integration

AddressSanitizer (ASan)

AddressSanitizer is automatically integrated when using the provided Docker environment or when compiling with appropriate flags.

For local setup:

export CFLAGS="-fsanitize=address,fuzzer-no-link"
export CXXFLAGS="-fsanitize=address,fuzzer-no-link"

Configure ASan behavior:

export ASAN_OPTIONS="allocator_may_return_null=1,detect_leaks=0"

LD_PRELOAD Configuration

For native extension fuzzing:

export LD_PRELOAD="$(python -c 'import atheris; print(atheris.path())')/asan_with_fuzzer.so"

See Also: For detailed sanitizer configuration, common issues, and advanced flags, see the address-sanitizer and undefined-behavior-sanitizer technique skills.

Common Sanitizer Issues

Issue	Solution
`LD_PRELOAD` not set	Export `LD_PRELOAD` to point to `asan_with_fuzzer.so`
Memory allocation failures	Set `ASAN_OPTIONS=allocator_may_return_null=1`
Leak detection noise	Set `ASAN_OPTIONS=detect_leaks=0`
Missing symbolizer	Set `ASAN_SYMBOLIZER_PATH` to `llvm-symbolizer`

Advanced Usage

Tips and Tricks

Tip	Why It Helps
Use `atheris.instrument_imports()` early	Ensures all imports are instrumented for coverage
Start with small `max_len`	Faster initial fuzzing, gradually increase
Use dictionaries for structured formats	Helps fuzzer understand format tokens
Run multiple parallel instances	Better coverage exploration

Custom Instrumentation

Fine-tune what gets instrumented:

import atheris

# Instrument only specific modules
with atheris.instrument_imports():
    import target_module
# Don't instrument test harness code

def test_one_input(data: bytes):
    target_module.parse(data)

Performance Tuning

Setting	Impact
`-max_len=N`	Smaller values = faster execution
`-workers=N -jobs=N`	Parallel fuzzing for faster coverage
`ASAN_OPTIONS=fast_unwind_on_malloc=0`	Better stack traces, slower execution

UndefinedBehaviorSanitizer (UBSan)

Add UBSan to catch additional bugs:

export CFLAGS="-fsanitize=address,undefined,fuzzer-no-link"
export CXXFLAGS="-fsanitize=address,undefined,fuzzer-no-link"

Note: Modify flags in Dockerfile if using containerized setup.

Real-World Examples

Example: Pure Python Parser

import sys
import atheris
import json

@atheris.instrument_func
def test_one_input(data: bytes):
    try:
        # Fuzz Python's JSON parser
        json.loads(data.decode('utf-8', errors='ignore'))
    except (ValueError, UnicodeDecodeError):
        pass

def main():
    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()

if __name__ == "__main__":
    main()

Example: HTTP Request Parsing

import sys
import atheris

with atheris.instrument_imports():
    from urllib3 import HTTPResponse
    from io import BytesIO

def test_one_input(data: bytes):
    try:
        # Fuzz HTTP response parsing
        fake_response = HTTPResponse(
            body=BytesIO(data),
            headers={},
            preload_content=False
        )
        fake_response.read()
    except Exception:
        pass

def main():
    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()

if __name__ == "__main__":
    main()

Troubleshooting

Problem	Cause	Solution
No coverage increase	Poor seed corpus or target not instrumented	Add better seeds, verify `instrument_imports()`
Slow execution	ASan overhead or large inputs	Reduce `max_len`, use `ASAN_OPTIONS=fast_unwind_on_malloc=1`
Import errors	Modules imported before instrumentation	Move imports inside `instrument_imports()` context
Segfault without ASan output	Missing `LD_PRELOAD`	Set `LD_PRELOAD` to `asan_with_fuzzer.so` path
Build failures	Wrong compiler or missing flags	Verify `CC`, `CFLAGS`, and clang version

Technique Skills

Skill	Use Case
fuzz-harness-writing	Detailed guidance on writing effective harnesses
address-sanitizer	Memory error detection during fuzzing
undefined-behavior-sanitizer	Catching undefined behavior in C extensions
coverage-analysis	Measuring and improving code coverage
fuzzing-corpus	Building and managing seed corpora

Skill	When to Consider
hypothesis	Property-based testing with type-aware generation
python-afl	AFL-style fuzzing for Python when Atheris isn't available

Resources

Key External Resources

Atheris GitHub Repository Official repository with installation instructions, examples, and documentation for fuzzing both pure Python and native extensions.

Native Extension Fuzzing Guide Comprehensive guide covering compilation flags, LD_PRELOAD setup, sanitizer configuration, and troubleshooting for Python C extensions.

Continuously Fuzzing Python C Extensions Trail of Bits blog post covering CI/CD integration, ClusterFuzzLite setup, and real-world examples of fuzzing Python C extensions in continuous integration pipelines.

ClusterFuzzLite Python Integration Guide for integrating Atheris fuzzing into CI/CD pipelines using ClusterFuzzLite for automated continuous fuzzing.

Video Resources

Videos and tutorials are available in the main Atheris documentation and libFuzzer resources.

/cargo-fuzz

Source: `~/.claude/skills/tob-testing-handbook-skills/skills/cargo-fuzz/SKILL.md`

name: cargo-fuzz type: fuzzer description: > cargo-fuzz is the de facto fuzzing tool for Rust projects using Cargo. Use for fuzzing Rust code with libFuzzer backend.

cargo-fuzz

cargo-fuzz is the de facto choice for fuzzing Rust projects when using Cargo. It uses libFuzzer as the backend and provides a convenient Cargo subcommand that automatically enables relevant compilation flags for your Rust project, including support for sanitizers like AddressSanitizer.

When to Use

cargo-fuzz is currently the primary and most mature fuzzing solution for Rust projects using Cargo.

Fuzzer	Best For	Complexity
cargo-fuzz	Cargo-based Rust projects, quick setup	Low
AFL++	Multi-core fuzzing, non-Cargo projects	Medium
LibAFL	Custom fuzzers, research, advanced use cases	High

Choose cargo-fuzz when:

Your project uses Cargo (required)
You want simple, quick setup with minimal configuration
You need integrated sanitizer support
You're fuzzing Rust code with or without unsafe blocks

Quick Start

#![no_main]

use libfuzzer_sys::fuzz_target;

fn harness(data: &[u8]) {
    your_project::check_buf(data);
}

fuzz_target!(|data: &[u8]| {
    harness(data);
});

Initialize and run:

cargo fuzz init
# Edit fuzz/fuzz_targets/fuzz_target_1.rs with your harness
cargo +nightly fuzz run fuzz_target_1

Installation

cargo-fuzz requires the nightly Rust toolchain because it uses features only available in nightly.

Prerequisites

Rust and Cargo installed via rustup
Nightly toolchain

Linux/macOS

# Install nightly toolchain
rustup install nightly

# Install cargo-fuzz
cargo install cargo-fuzz

Verification

cargo +nightly --version
cargo fuzz --version

Writing a Harness

Project Structure

cargo-fuzz works best when your code is structured as a library crate. If you have a binary project, split your main.rs into:

src/main.rs  # Entry point (main function)
src/lib.rs   # Code to fuzz (public functions)
Cargo.toml

Initialize fuzzing:

cargo fuzz init

This creates:

fuzz/
├── Cargo.toml
└── fuzz_targets/
    └── fuzz_target_1.rs

Harness Structure

#![no_main]

use libfuzzer_sys::fuzz_target;

fn harness(data: &[u8]) {
    // 1. Validate input size if needed
    if data.is_empty() {
        return;
    }

    // 2. Call target function with fuzz data
    your_project::target_function(data);
}

fuzz_target!(|data: &[u8]| {
    harness(data);
});

Harness Rules

Do	Don't
Structure code as library crate	Keep everything in main.rs
Use `fuzz_target!` macro	Write custom main function
Handle `Result::Err` gracefully	Panic on expected errors
Keep harness deterministic	Use random number generators

See Also: For detailed harness writing techniques and structure-aware fuzzing with the arbitrary crate, see the fuzz-harness-writing technique skill.

Structure-Aware Fuzzing

cargo-fuzz integrates with the arbitrary crate for structure-aware fuzzing:

// In your library crate
use arbitrary::Arbitrary;

#[derive(Debug, Arbitrary)]
pub struct Name {
    data: String
}

// In your fuzz target
#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: your_project::Name| {
    data.check_buf();
});

Add to your library's Cargo.toml:

[dependencies]
arbitrary = { version = "1", features = ["derive"] }

Running Campaigns

Basic Run

cargo +nightly fuzz run fuzz_target_1

Without Sanitizers (Safe Rust)

If your project doesn't use unsafe Rust, disable sanitizers for a roughly 2x performance boost:

cargo +nightly fuzz run --sanitizer none fuzz_target_1

Check if your project uses unsafe code:

cargo install cargo-geiger
cargo geiger

Re-executing Test Cases

# Run a specific test case (e.g., a crash)
cargo +nightly fuzz run fuzz_target_1 fuzz/artifacts/fuzz_target_1/crash-<hash>

# Run all corpus entries without fuzzing
cargo +nightly fuzz run fuzz_target_1 fuzz/corpus/fuzz_target_1 -- -runs=0

Using Dictionaries

cargo +nightly fuzz run fuzz_target_1 -- -dict=./dict.dict

Interpreting Output

Output	Meaning
`NEW`	New coverage-increasing input discovered
`pulse`	Periodic status update
`INITED`	Fuzzer initialized successfully
Crash with stack trace	Bug found, saved to `fuzz/artifacts/`

Corpus location: fuzz/corpus/fuzz_target_1/
Crashes location: fuzz/artifacts/fuzz_target_1/
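
To get an overview of what has accumulated in the artifacts directory, the files can be grouped by kind. This sketch assumes libFuzzer's usual artifact name prefixes (`crash-`, `leak-`, `timeout-`, `oom-`, `slow-unit-`); the function names are hypothetical and Python is used here only as a helper scripting language.

```python
from collections import defaultdict
from pathlib import Path

# Assumed libFuzzer artifact prefixes; anything else is reported as 'other'.
ARTIFACT_KINDS = ("crash", "leak", "timeout", "oom", "slow-unit")


def artifact_kind(file_name: str) -> str:
    """Classify an artifact by its file name prefix."""
    for kind in ARTIFACT_KINDS:
        if file_name.startswith(kind + "-"):
            return kind
    return "other"


def group_artifacts(names) -> dict[str, list[str]]:
    """Group file names by artifact kind, sorted within each group."""
    groups = defaultdict(list)
    for name in sorted(names):
        groups[artifact_kind(name)].append(name)
    return dict(groups)


def list_artifacts(target: str = "fuzz_target_1", root: str = "fuzz/artifacts"):
    """Group the files found in fuzz/artifacts/<target>/."""
    directory = Path(root, target)
    return group_artifacts(p.name for p in directory.iterdir() if p.is_file())
```

Each file in the `crash` group can then be replayed by passing its path to `cargo +nightly fuzz run fuzz_target_1`.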

Sanitizer Integration

AddressSanitizer (ASan)

ASan is enabled by default and detects memory errors:

cargo +nightly fuzz run fuzz_target_1

Disabling Sanitizers

For pure safe Rust (no unsafe blocks in your code or dependencies):

cargo +nightly fuzz run --sanitizer none fuzz_target_1

Performance impact: ASan adds ~2x overhead. Disable for safe Rust to improve fuzzing speed.

Checking for Unsafe Code

cargo install cargo-geiger
cargo geiger

See Also: For detailed sanitizer configuration, flags, and troubleshooting, see the address-sanitizer technique skill.

Coverage Analysis

cargo-fuzz integrates with Rust's coverage tools to analyze fuzzing effectiveness.

Prerequisites

rustup toolchain install nightly --component llvm-tools-preview
cargo install cargo-binutils
cargo install rustfilt

Generating Coverage Reports

# Generate coverage data from corpus
cargo +nightly fuzz coverage fuzz_target_1

Create coverage generation script:

cat <<'EOF' > ./generate_html
#!/bin/sh
if [ $# -lt 1 ]; then
    echo "Error: Name of fuzz target is required."
    echo "Usage: $0 fuzz_target [sources...]"
    exit 1
fi
FUZZ_TARGET="$1"
shift
SRC_FILTER="$@"
TARGET=$(rustc -vV | sed -n 's|host: ||p')
cargo +nightly cov -- show -Xdemangler=rustfilt \
  "target/$TARGET/coverage/$TARGET/release/$FUZZ_TARGET" \
  -instr-profile="fuzz/coverage/$FUZZ_TARGET/coverage.profdata"  \
  -show-line-counts-or-regions -show-instantiations  \
  -format=html -o fuzz_html/ $SRC_FILTER
EOF
chmod +x ./generate_html

Generate HTML report:

./generate_html fuzz_target_1 src/lib.rs

HTML report saved to: fuzz_html/

See Also: For detailed coverage analysis techniques and systematic coverage improvement, see the coverage-analysis technique skill.

Advanced Usage

Tips and Tricks

Tip	Why It Helps
Start with a seed corpus	Dramatically speeds up initial coverage discovery
Use `--sanitizer none` for safe Rust	2x performance improvement
Check coverage regularly	Identifies gaps in harness or seed corpus
Use dictionaries for parsers	Helps overcome magic value checks
Structure code as library	Required for cargo-fuzz integration

libFuzzer Options

Pass options to libFuzzer after --:

# See all options
cargo +nightly fuzz run fuzz_target_1 -- -help=1

# Set timeout per run
cargo +nightly fuzz run fuzz_target_1 -- -timeout=10

# Use dictionary
cargo +nightly fuzz run fuzz_target_1 -- -dict=dict.dict

# Limit maximum input size
cargo +nightly fuzz run fuzz_target_1 -- -max_len=1024

Multi-Core Fuzzing

# Experimental forking support (not recommended)
cargo +nightly fuzz run --jobs 1 fuzz_target_1

Note: The multi-core fuzzing feature is experimental and not recommended. For parallel fuzzing, consider running multiple instances manually or using AFL++.

Real-World Examples

Example: ogg Crate

The ogg crate parses Ogg media container files. Parsers are excellent fuzzing targets because they handle untrusted data.

# Clone and initialize
git clone https://github.com/RustAudio/ogg.git
cd ogg/
cargo fuzz init

Harness at fuzz/fuzz_targets/fuzz_target_1.rs:

#![no_main]

use ogg::{PacketReader, PacketWriter};
use ogg::writing::PacketWriteEndInfo;
use std::io::Cursor;
use libfuzzer_sys::fuzz_target;

fn harness(data: &[u8]) {
    let mut pck_rdr = PacketReader::new(Cursor::new(data.to_vec()));
    pck_rdr.delete_unread_packets();

    let output = Vec::new();
    let mut pck_wtr = PacketWriter::new(Cursor::new(output));

    if let Ok(_) = pck_rdr.read_packet() {
        if let Ok(r) = pck_rdr.read_packet() {
            match r {
                Some(pck) => {
                    let inf = if pck.last_in_stream() {
                        PacketWriteEndInfo::EndStream
                    } else if pck.last_in_page() {
                        PacketWriteEndInfo::EndPage
                    } else {
                        PacketWriteEndInfo::NormalPacket
                    };
                    let stream_serial = pck.stream_serial();
                    let absgp_page = pck.absgp_page();
                    let _ = pck_wtr.write_packet(
                        pck.data, stream_serial, inf, absgp_page
                    );
                }
                None => return,
            }
        }
    }
}

fuzz_target!(|data: &[u8]| {
    harness(data);
});

Seed the corpus:

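mkdir -p fuzz/corpus/fuzz_target_1/
# The File: URL is the HTML description page; Special:FilePath redirects to the media file
curl -L -o fuzz/corpus/fuzz_target_1/320x240.ogg \
  https://commons.wikimedia.org/wiki/Special:FilePath/320x240.ogg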

Run:

cargo +nightly fuzz run fuzz_target_1

Analyze coverage:

cargo +nightly fuzz coverage fuzz_target_1
./generate_html fuzz_target_1 src/lib.rs

Troubleshooting

Problem	Cause	Solution
"requires nightly" error	Using stable toolchain	Use `cargo +nightly fuzz`
Slow fuzzing performance	ASan enabled for safe Rust	Add `--sanitizer none` flag
"cannot find binary"	No library crate	Move code from `main.rs` to `lib.rs`
Sanitizer compilation issues	Wrong nightly version	Try different nightly: `rustup install nightly-2024-01-01`
Low coverage	Missing seed corpus	Add sample inputs to `fuzz/corpus/fuzz_target_1/`
Magic value not found	No dictionary	Create dictionary file with magic values

Technique Skills

Skill	Use Case
fuzz-harness-writing	Structure-aware fuzzing with `arbitrary` crate
address-sanitizer	Understanding ASan output and configuration
coverage-analysis	Measuring and improving fuzzing effectiveness
fuzzing-corpus	Building and managing seed corpora
fuzzing-dictionaries	Creating dictionaries for format-aware fuzzing

Skill	When to Consider
libfuzzer	Fuzzing C/C++ code with similar workflow
aflpp	Multi-core fuzzing or non-Cargo Rust projects
libafl	Advanced fuzzing research or custom fuzzer development

Resources

Rust Fuzz Book - cargo-fuzz: Official documentation for cargo-fuzz covering installation, usage, and advanced features.

arbitrary crate documentation: Guide to structure-aware fuzzing with automatic derivation for Rust types.

cargo-fuzz GitHub Repository: Source code, issue tracker, and examples for cargo-fuzz.

/constant-time-testing

Source: `~/.claude/skills/tob-testing-handbook-skills/skills/constant-time-testing/SKILL.md`

name: constant-time-testing type: domain description: > Constant-time testing detects timing side channels in cryptographic code. Use when auditing crypto implementations for timing vulnerabilities.

Constant-Time Testing

Timing attacks exploit variations in execution time to extract secret information from cryptographic implementations. Unlike cryptanalysis that targets theoretical weaknesses, timing attacks leverage implementation flaws - and they can affect any cryptographic code.

Background

Timing attacks were introduced by Kocher in 1996. Since then, researchers have demonstrated practical attacks on RSA (Schindler), OpenSSL (Brumley and Boneh), AES implementations, and even post-quantum algorithms like Kyber.

Key Concepts

Concept	Description
Constant-time	Code path and memory accesses independent of secret data
Timing leakage	Observable execution time differences correlated with secrets
Side channel	Information extracted from implementation rather than algorithm
Microarchitecture	CPU-level timing differences (cache, division, shifts)

Why This Matters

Timing vulnerabilities can:

Expose private keys - Extract secret exponents in RSA/ECDH
Enable remote attacks - Network-observable timing differences
Bypass cryptographic security - Undermine theoretical guarantees
Persist silently - Often undetected without specialized analysis

Two prerequisites enable exploitation:

Access to oracle - Sufficient queries to the vulnerable implementation
Timing dependency - Correlation between execution time and secret data

Common Constant-Time Violation Patterns

Four patterns account for most timing vulnerabilities:

// 1. Conditional jumps - most severe timing differences
if(secret == 1) { ... }
while(secret > 0) { ... }

// 2. Array access - cache-timing attacks
lookup_table[secret];

// 3. Integer division (processor dependent)
data = secret / m;

// 4. Shift operation (processor dependent)
data = a << secret;

Conditional jumps cause different code paths, leading to vast timing differences.

Array access dependent on secrets enables cache-timing attacks, as shown in AES cache-timing research.

Integer division and shift operations leak secrets on certain CPU architectures and compiler configurations.

When patterns cannot be avoided, employ masking techniques to remove correlation between timing and secrets.
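The masking idea can be sketched in C as follows. This is a minimal illustration, not code from any particular library: the function names `ct_select_u32`, `ct_eq_u32`, and `ct_lookup_u32` are hypothetical, and a compiler may still reintroduce branches, so the generated assembly should be checked.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns a if bit == 1, b if bit == 0, without branching on bit.
 * bit must be exactly 0 or 1. */
uint32_t ct_select_u32(uint32_t bit, uint32_t a, uint32_t b) {
    uint32_t mask = (uint32_t)0 - bit;   /* 0x00000000 or 0xFFFFFFFF */
    return (a & mask) | (b & ~mask);
}

/* Returns 1 if x == y, else 0, without a data-dependent branch. */
uint32_t ct_eq_u32(uint32_t x, uint32_t y) {
    uint32_t diff = x ^ y;               /* zero only when x == y */
    /* (diff | -diff) has its top bit set exactly when diff != 0 */
    return 1u ^ ((diff | ((uint32_t)0 - diff)) >> 31);
}

/* Reads table[secret_index] by touching every entry, so the memory
 * access pattern does not depend on the secret index. */
uint32_t ct_lookup_u32(const uint32_t *table, size_t len, uint32_t secret_index) {
    uint32_t result = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t hit = ct_eq_u32((uint32_t)i, secret_index);
        result = ct_select_u32(hit, table[i], result);
    }
    return result;
}
```

The lookup trades speed for a uniform access pattern: every call costs `len` reads regardless of the index, which is usually acceptable only for small tables.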

Example: Modular Exponentiation Timing Attacks

Modular exponentiation (used in RSA and Diffie-Hellman) is susceptible to timing attacks. RSA decryption computes:

$$ct^{d} \mod{N}$$

where $d$ is the secret exponent. The exponentiation by squaring optimization reduces multiplications to $\log{d}$:

$$
\begin{align*}
& \textbf{Input: } \text{base } y,\ \text{exponent } d = \{d_n, \cdots, d_0\}_2,\ \text{modulus } N \\
& r = 1 \\
& \textbf{for } i = 0 \text{ to } n: \\
& \quad \textbf{if } d_i == 1: \\
& \quad\quad r = r * y \mod{N} \\
& \quad y = y * y \mod{N} \\
& \textbf{return } r
\end{align*}
$$

The code branches on exponent bit $d_i$, violating constant-time principles. When $d_i = 1$, an additional multiplication occurs, increasing execution time and leaking bit information.

Montgomery multiplication (commonly used for modular arithmetic) also leaks timing: when intermediate values exceed modulus $N$, an additional reduction step is required. An attacker constructs inputs $y$ and $y'$ such that:

$$
\begin{align*}
y^2 < y^3 < N \\
y'^2 < N \leq y'^3
\end{align*}
$$

For $y$, both multiplications take time $t_1+t_1$. For $y'$, the second multiplication requires reduction, taking time $t_1+t_2$. This timing difference reveals whether $d_i$ is 0 or 1.
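To make the branch on $d_i$ concrete, here is a hedged C sketch of square-and-multiply on 32-bit values, once with the leaky branch and once with the multiplication always performed and the result chosen by a mask. The names `modexp_leaky` and `modexp_masked` are hypothetical, `mod` is assumed to be nonzero, and the `%` operator may itself compile to a variable-time division on some CPUs, so this only removes the branch rather than guaranteeing constant time.

```c
#include <stdint.h>

/* Leaky variant: multiplies only when the current exponent bit is 1. */
uint32_t modexp_leaky(uint32_t base, uint32_t exp, uint32_t mod) {
    uint64_t r = 1 % mod;
    uint64_t y = base % mod;
    for (int i = 0; i < 32; i++) {
        if ((exp >> i) & 1u) {              /* secret-dependent branch */
            r = (r * y) % mod;
        }
        y = (y * y) % mod;
    }
    return (uint32_t)r;
}

/* Branch-free variant: always multiplies, then selects with a mask. */
uint32_t modexp_masked(uint32_t base, uint32_t exp, uint32_t mod) {
    uint64_t r = 1 % mod;
    uint64_t y = base % mod;
    for (int i = 0; i < 32; i++) {
        uint64_t bit = (exp >> i) & 1u;
        uint64_t mask = (uint64_t)0 - bit;  /* all ones when bit == 1 */
        uint64_t product = (r * y) % mod;   /* computed on every iteration */
        r = (product & mask) | (r & ~mask);
        y = (y * y) % mod;
    }
    return (uint32_t)r;
}
```

Both functions always run 32 iterations, so the loop count does not reveal the exponent's bit length; only the first one varies its work per bit.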

When to Use

Apply constant-time analysis when:

Auditing cryptographic implementations (primitives, protocols)
Code handles secret keys, passwords, or sensitive cryptographic material
Implementing crypto algorithms from scratch
Reviewing PRs that touch crypto code
Investigating potential timing vulnerabilities

Consider alternatives when:

Code does not process secret data
Public algorithms with no secret inputs
Non-cryptographic timing requirements (performance optimization)

Quick Reference

Scenario	Recommended Approach	Tools
Prove absence of leaks	Formal verification	SideTrail, ct-verif, FaCT
Detect statistical timing differences	Statistical testing	dudect
Track secret data flow at runtime	Dynamic analysis	timecop
Find cache-timing vulnerabilities	Symbolic execution	Binsec, pitchfork

Constant-Time Tooling Categories

The cryptographic community has developed four categories of timing analysis tools:

Category	Approach	Pros	Cons
Formal	Mathematical proof on model	Guarantees absence of leaks	Complexity, modeling assumptions
Symbolic	Symbolic execution paths	Concrete counterexamples	Time-intensive path exploration
Dynamic	Runtime tracing with marked secrets	Granular, flexible	Limited coverage to executed paths
Statistical	Measure real execution timing	Practical, simple setup	No root cause, noise sensitivity

1. Formal Tools

Formal verification mathematically proves timing properties on an abstraction (model) of code. Tools create a model from source/binary and verify it satisfies specified properties (e.g., variables annotated as secret).

Popular tools:

SideTrail
ct-verif
FaCT

Strengths: Proof of absence, language-agnostic (LLVM bytecode)
Weaknesses: Requires expertise, modeling assumptions may miss real-world issues

2. Symbolic Tools

Symbolic execution analyzes how paths and memory accesses depend on symbolic variables (secrets). Provides concrete counterexamples. Focus on cache-timing attacks.

Popular tools:

Binsec
pitchfork

Strengths: Concrete counterexamples aid debugging
Weaknesses: Path explosion leads to long execution times

3. Dynamic Tools

Dynamic analysis marks sensitive memory regions and traces execution to detect timing-dependent operations.

Popular tools:

Memsan: Tutorial
Timecop (see below)

Strengths: Granular control, targeted analysis
Weaknesses: Coverage limited to executed paths

Detailed Guidance: See the timecop skill for setup and usage.

4. Statistical Tools

Execute code with various inputs, measure elapsed time, and detect inconsistencies. Tests actual implementation including compiler optimizations and architecture.

Popular tools:

dudect (see below)
tlsfuzzer

Strengths: Simple setup, practical real-world results
Weaknesses: No root cause info, noise obscures weak signals

Detailed Guidance: See the dudect skill for setup and usage.

Testing Workflow

Phase 1: Static Analysis        Phase 2: Statistical Testing
┌─────────────────┐            ┌─────────────────┐
│ Identify secret │      →     │ Detect timing   │
│ data flow       │            │ differences     │
│ Tool: ct-verif  │            │ Tool: dudect    │
└─────────────────┘            └─────────────────┘
         ↓                              ↓
Phase 4: Root Cause             Phase 3: Dynamic Tracing
┌─────────────────┐            ┌─────────────────┐
│ Pinpoint leak   │      ←     │ Track secret    │
│ location        │            │ propagation     │
│ Tool: Timecop   │            │ Tool: Timecop   │
└─────────────────┘            └─────────────────┘

Recommended approach:

Start with dudect - Quick statistical check for timing differences
If leaks found - Use Timecop to pinpoint root cause
For high-assurance - Apply formal verification (ct-verif, SideTrail)
Continuous monitoring - Integrate dudect into CI pipeline

Tools and Approaches

Dudect - Statistical Analysis

Dudect measures execution time for two input classes (fixed vs random) and uses Welch's t-test to detect statistically significant differences.
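The statistic behind that comparison can be sketched in a few lines of C. This is an illustrative, offline version under stated assumptions: the helper names and the caller-supplied `threshold` are hypothetical, and dudect's own implementation (its online statistics and its thresholds) is not reproduced here.

```c
#include <stddef.h>

/* Sample mean of n timing measurements. */
static double mean(const double *x, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += x[i];
    return sum / (double)n;
}

/* Unbiased sample variance (divides by n - 1). */
static double variance(const double *x, size_t n, double m) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) acc += (x[i] - m) * (x[i] - m);
    return acc / (double)(n - 1);
}

/* Returns 1 when |t| > threshold for Welch's t statistic
 *   t = (mean0 - mean1) / sqrt(var0/n0 + var1/n1).
 * Both sides are squared so no sqrt call is needed.
 * Requires n0 >= 2 and n1 >= 2. */
int welch_exceeds(const double *class0, size_t n0,
                  const double *class1, size_t n1, double threshold) {
    double m0 = mean(class0, n0), m1 = mean(class1, n1);
    double se2 = variance(class0, n0, m0) / (double)n0
               + variance(class1, n1, m1) / (double)n1;
    double diff = m0 - m1;
    return diff * diff > threshold * threshold * se2;
}
```

A large |t| means the two classes have distinguishable mean execution times; a small |t| only means no difference was detected with the measurements collected so far.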

Detailed Guidance: See the dudect skill for complete setup, usage patterns, and CI integration.

Quick Start for Constant-Time Analysis

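#define DUDECT_IMPLEMENTATION
#include "dudect.h"

uint8_t do_one_computation(uint8_t *data) {
    // Code to measure goes here; data points to one chunk_size-byte input
    (void)data;
    return 0; // placeholder so the skeleton compiles without warnings
}

void prepare_inputs(dudect_config_t *c, uint8_t *input_data, uint8_t *classes) {
    for (size_t i = 0; i < c->number_measurements; i++) {
        classes[i] = randombit();
        // Start of the i-th input chunk; fill it according to its class
        uint8_t *input = input_data + (size_t)i * c->chunk_size;
        (void)input;
        if (classes[i] == 0) {
            // Fixed input class: write the same bytes to input every time
        } else {
            // Random input class: write fresh random bytes to input
        }
    }
}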

Key advantages:

Simple C header-only integration
Statistical rigor via Welch's t-test
Works with compiled binaries (real-world conditions)

Key limitations:

No root cause information when leak detected
Sensitive to measurement noise
Cannot guarantee absence of leaks (statistical confidence only)

Timecop - Dynamic Tracing

Timecop wraps Valgrind to detect runtime operations dependent on secret memory regions.

Detailed Guidance: See the timecop skill for installation, examples, and debugging.

Quick Start for Constant-Time Analysis

#include <valgrind/memcheck.h>

#define poison(addr, len) VALGRIND_MAKE_MEM_UNDEFINED(addr, len)
#define unpoison(addr, len) VALGRIND_MAKE_MEM_DEFINED(addr, len)

// Placeholder for the function under test
void crypto_operation(unsigned long long key);

int main(void) {
    unsigned long long secret_key = 0x12345678;

    // Mark secret as poisoned
    poison(&secret_key, sizeof(secret_key));

    // Any branching or memory access dependent on secret_key
    // will be reported by Valgrind
    crypto_operation(secret_key);

    unpoison(&secret_key, sizeof(secret_key));
    return 0;
}

Run with Valgrind:

valgrind --leak-check=full --track-origins=yes ./binary

Key advantages:

Pinpoints exact line of timing leak
No compiler instrumentation required (only poison/unpoison annotations around secrets)
Tracks secret propagation through execution

Key limitations:

Cannot detect microarchitecture timing differences
Coverage limited to executed paths
Performance overhead (runs on synthetic CPU)

Implementation Guide

Phase 1: Initial Assessment

Identify cryptographic code handling secrets:

Private keys, exponents, nonces
Password hashes, authentication tokens
Encryption/decryption operations

Quick statistical check:

Write dudect harness for the crypto function
Run for 5-10 minutes with timeout 600 ./ct_test
Monitor t-value: high absolute values indicate leakage

Tools: dudect
Expected time: 1-2 hours (harness writing + initial run)

Phase 2: Detailed Analysis

If dudect detects leakage:

Root cause investigation:

Mark secret variables with Timecop poison()
Run under Valgrind to identify exact line
Review the four common violation patterns
Check assembly output for conditional branches

Tools: Timecop, compiler output (objdump -d)

Phase 3: Remediation

Fix the timing leak:

Replace conditional branches with constant-time selection (bitwise operations)
Use constant-time comparison functions
Replace array lookups with constant-time alternatives or masking
Verify compiler doesn't optimize away constant-time code
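As one example of these fixes, a constant-time comparison can be sketched as below. The function name `ct_memeq` is hypothetical; where a vetted primitive is available in the crypto library already in use, prefer that over a hand-rolled version.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the two len-byte buffers are equal, 0 otherwise.
 * Every byte is always examined, so the loop count does not depend
 * on where the first mismatch occurs (unlike an early-exit compare). */
int ct_memeq(const uint8_t *a, const uint8_t *b, size_t len) {
    uint8_t acc = 0;
    for (size_t i = 0; i < len; i++) {
        acc |= (uint8_t)(a[i] ^ b[i]);     /* accumulate any differing bits */
    }
    /* Map acc == 0 to 1 and any nonzero acc to 0 without branching. */
    return (int)(1u & (((uint32_t)acc - 1u) >> 8));
}
```

Typical use is comparing a computed MAC or tag against a received one, where an early-exit comparison would reveal how many leading bytes matched.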

Re-verify:

Run dudect again for extended period (30+ minutes)
Test across different compilers and optimization levels
Test on different CPU architectures

Phase 4: Continuous Monitoring

Integrate into CI:

Add dudect tests to test suite
Run for fixed duration (5-10 minutes in CI)
Fail build if leakage detected

See the dudect skill for CI integration examples.

Common Vulnerabilities

Vulnerability	Description	Detection	Severity
Secret-dependent branch	`if (secret_bit) { ... }`	dudect, Timecop	CRITICAL
Secret-dependent array access	`table[secret_index]`	Timecop, Binsec	HIGH
Variable-time division	`result = x / secret`	dudect (on affected CPUs)	MEDIUM
Variable-time shift	`result = x << secret`	dudect (on affected CPUs)	MEDIUM
Montgomery reduction leak	Extra reduction when intermediate > N	dudect	HIGH

Secret-Dependent Branch: Deep Dive

The vulnerability: Execution time differs based on whether branch is taken. Common in optimized modular exponentiation (square-and-multiply).

How to detect with dudect:

uint8_t do_one_computation(uint8_t *data) {
    uint64_t base = ((uint64_t*)data)[0];
    uint64_t exponent = ((uint64_t*)data)[1]; // Secret!
    return mod_exp(base, exponent, MODULUS);
}

void prepare_inputs(dudect_config_t *c, uint8_t *input_data, uint8_t *classes) {
    for (size_t i = 0; i < c->number_measurements; i++) {
        classes[i] = randombit();
        uint64_t *input = (uint64_t*)(input_data + i * c->chunk_size);
        input[0] = rand(); // Random base
        input[1] = (classes[i] == 0) ? FIXED_EXPONENT : rand(); // Fixed vs random
    }
}

How to detect with Timecop:

poison(&exponent, sizeof(exponent));
result = mod_exp(base, exponent, modulus);
unpoison(&exponent, sizeof(exponent));

Valgrind will report:

Conditional jump or move depends on uninitialised value(s)
  at 0x40115D: mod_exp (example.c:14)

Case Studies

Case Study: OpenSSL RSA Timing Attack

Brumley and Boneh (2005) extracted RSA private keys from OpenSSL over a network. The vulnerability exploited Montgomery multiplication's variable-time reduction step.

Attack vector: Timing differences in modular exponentiation
Detection approach: Statistical analysis (precursor to dudect)
Impact: Remote key extraction

Tools used: Custom timing measurement
Techniques applied: Statistical analysis, chosen-ciphertext queries

Case Study: KyberSlash

Post-quantum algorithm Kyber's reference implementation contained timing vulnerabilities in polynomial operations. Division operations leaked secret coefficients.

Attack vector: Secret-dependent division timing
Detection approach: Dynamic analysis and statistical testing
Impact: Secret key recovery in post-quantum cryptography

Tools used: Timing measurement tools
Techniques applied: Differential timing analysis
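When the divisor is a public constant, one common remediation for this class of leak is to replace the division with a multiplication by a precomputed reciprocal and a shift. The sketch below is illustrative only and is not the actual Kyber patch: it assumes 16-bit inputs and uses Kyber's modulus 3329 as the divisor, and the names `Q`, `RECIP`, and `div_q_u16` are hypothetical.

```c
#include <stdint.h>

#define Q 3329u            /* Kyber modulus, used here as the constant divisor */
#define RECIP 80636u       /* ceil(2^28 / 3329) */

/* Computes x / Q for any 16-bit x using one multiplication and one shift,
 * so no hardware division instruction is needed. */
uint32_t div_q_u16(uint16_t x) {
    return (uint32_t)(((uint64_t)x * RECIP) >> 28);
}
```

The reciprocal is exact only for the stated input range, so any change to the input width or divisor needs the constant and shift recomputed and the result rechecked against plain division.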

Advanced Usage

Tips and Tricks

Tip	Why It Helps
Pin dudect to isolated CPU core (`taskset -c 2`)	Reduces OS noise, improves signal detection
Test multiple compilers (gcc, clang, MSVC)	Optimizations may introduce or remove leaks
Run dudect for extended periods (hours)	Increases statistical confidence
Minimize non-crypto code in harness	Reduces noise that masks weak signals
Check assembly output (`objdump -d`)	Verify compiler didn't introduce branches
Use `-O3 -march=native` in testing	Matches production optimization levels

Common Mistakes

Mistake	Why It's Wrong	Correct Approach
Only testing one input distribution	May miss leaks visible with other patterns	Test fixed-vs-random, fixed-vs-fixed-different, etc.
Short dudect runs (< 1 minute)	Insufficient measurements for weak signals	Run 5-10+ minutes, longer for high assurance
Ignoring compiler optimization levels	`-O0` may hide leaks present in `-O3`	Test at production optimization level
Not testing on target architecture	x86 vs ARM have different timing characteristics	Test on deployment platform
Marking too much as secret in Timecop	False positives, unclear results	Mark only true secrets (keys, not public data)

Tool Skills

Skill	Primary Use in Constant-Time Analysis
dudect	Statistical detection of timing differences via Welch's t-test
timecop	Dynamic tracing to pinpoint exact location of timing leaks

Technique Skills

Skill	When to Apply
coverage-analysis	Ensure test inputs exercise all code paths in crypto function
ci-integration	Automate constant-time testing in continuous integration pipeline

Skill	Relationship
crypto-testing	Constant-time analysis is essential component of cryptographic testing
fuzzing	Fuzzing crypto code may trigger timing-dependent paths

Skill Dependency Map

                    ┌─────────────────────────┐
                    │  constant-time-testing  │
                    │     (this skill)        │
                    └───────────┬─────────────┘
                                │
                ┌───────────────┴───────────────┐
                │                               │
                ▼                               ▼
    ┌───────────────────┐           ┌───────────────────┐
    │      dudect       │           │     timecop       │
    │  (statistical)    │           │    (dynamic)      │
    └────────┬──────────┘           └────────┬──────────┘
             │                               │
             └───────────────┬───────────────┘
                             │
                             ▼
              ┌──────────────────────────────┐
              │   Supporting Techniques      │
              │ coverage, CI integration     │
              └──────────────────────────────┘

Resources

Key External Resources

"These results must be false": A usability evaluation of constant-time analysis tools - Comprehensive usability study of constant-time analysis tools. Key findings: developers struggle with false positives, need better error messages, and benefit from tool integration. Evaluates FaCT, ct-verif, dudect, and Memsan across multiple cryptographic implementations. Recommends improved tooling UX and better documentation.

List of constant-time tools (CROCS) - Curated catalog of constant-time analysis tools with tutorials. Covers formal tools (ct-verif, FaCT), dynamic tools (Memsan, Timecop), symbolic tools (Binsec), and statistical tools (dudect). Includes practical tutorials for setup and usage.

Paul Kocher: Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems - Original 1996 paper introducing timing attacks. Demonstrates attacks on modular exponentiation in RSA and Diffie-Hellman. Essential historical context for understanding timing vulnerabilities.

Remote Timing Attacks are Practical (Brumley & Boneh) - Demonstrates practical remote timing attacks against OpenSSL. Shows network-level timing differences are sufficient to extract RSA keys. Proves timing attacks work in realistic network conditions.

Cache-timing attacks on AES - Shows AES implementations using lookup tables are vulnerable to cache-timing attacks. Demonstrates practical attacks extracting AES keys via cache timing side channels.

KyberSlash: Division Timings Leak Secrets - Recent discovery of timing vulnerabilities in Kyber (NIST post-quantum standard). Shows division operations leak secret coefficients. Highlights that constant-time issues persist even in modern post-quantum cryptography.

Video Resources

Trail of Bits: Constant-Time Programming - Overview of constant-time programming principles and tools

/coverage-analysis

Source: `~/.claude/skills/tob-testing-handbook-skills/skills/coverage-analysis/SKILL.md`

name: coverage-analysis type: technique description: > Coverage analysis measures code exercised during fuzzing. Use when assessing harness effectiveness or identifying fuzzing blockers.

Coverage Analysis

Coverage analysis is essential for understanding which parts of your code are exercised during fuzzing. It helps identify fuzzing blockers like magic value checks and tracks the effectiveness of harness improvements over time.

Overview

Code coverage during fuzzing serves two critical purposes:

Assessing harness effectiveness: Understand which parts of your application are actually executed by your fuzzing harnesses
Tracking fuzzing progress: Monitor how coverage changes when updating harnesses, fuzzers, or the system under test (SUT)

Coverage is a proxy for fuzzer capability and performance. While coverage is not ideal for measuring fuzzer performance in absolute terms, it reliably indicates whether your harness works effectively in a given setup.

Key Concepts

Concept	Description
Coverage instrumentation	Compiler flags that track which code paths are executed
Corpus coverage	Coverage achieved by running all test cases in a fuzzing corpus
Magic value checks	Hard-to-discover conditional checks that block fuzzer progress
Coverage-guided fuzzing	Fuzzing strategy that prioritizes inputs that discover new code paths
Coverage report	Visual or textual representation of executed vs. unexecuted code

When to Apply

Apply this technique when:

Starting a new fuzzing campaign to establish a baseline
Fuzzer appears to plateau without finding new paths
After harness modifications to verify improvements
When migrating between different fuzzers
Identifying areas requiring dictionary entries or seed inputs
Debugging why certain code paths aren't reached

Skip this technique when:

Fuzzing campaign is actively finding crashes
Coverage infrastructure isn't set up yet
Working with extremely large codebases where full coverage reports are impractical
Fuzzer's internal coverage metrics are sufficient for your needs

Quick Reference

Task	Command/Pattern
LLVM coverage instrumentation (C/C++)	`-fprofile-instr-generate -fcoverage-mapping`
GCC coverage instrumentation	`-ftest-coverage -fprofile-arcs`
cargo-fuzz coverage (Rust)	`cargo +nightly fuzz coverage <target>`
Generate LLVM profile data	`llvm-profdata merge -sparse file.profraw -o file.profdata`
LLVM coverage report	`llvm-cov report ./binary -instr-profile=file.profdata`
LLVM HTML report	`llvm-cov show ./binary -instr-profile=file.profdata -format=html -output-dir html/`
gcovr HTML report	`gcovr --html-details -o coverage.html`

Ideal Coverage Workflow

The following workflow represents best practices for integrating coverage analysis into your fuzzing campaigns:

[Fuzzing Campaign]
       |
       v
[Generate Corpus]
       |
       v
[Coverage Analysis]
       |
       +---> Coverage Increased? --> Continue fuzzing with larger corpus
       |
       +---> Coverage Decreased? --> Fix harness or investigate SUT changes
       |
       +---> Coverage Plateaued? --> Add dictionary entries or seed inputs

Key principle: Use the corpus generated after each fuzzing campaign to calculate coverage, rather than real-time fuzzer statistics. This approach provides reproducible, comparable measurements across different fuzzing tools.

Step-by-Step

Step 1: Build with Coverage Instrumentation

Choose your instrumentation method based on toolchain:

LLVM/Clang (C/C++):

clang++ -fprofile-instr-generate -fcoverage-mapping \
  -O2 -DNO_MAIN \
  main.cc harness.cc execute-rt.cc -o fuzz_exec

GCC (C/C++):

g++ -ftest-coverage -fprofile-arcs \
  -O2 -DNO_MAIN \
  main.cc harness.cc execute-rt.cc -o fuzz_exec_gcov

Rust:

rustup toolchain install nightly --component llvm-tools-preview
cargo +nightly fuzz coverage fuzz_target_1

Step 2: Create Execution Runtime (C/C++ only)

For C/C++ projects, create a runtime that executes your corpus:

// execute-rt.cc
#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <stdint.h>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);

void load_file_and_test(const char *filename) {
    FILE *file = fopen(filename, "rb");
    if (file == NULL) {
        printf("Failed to open file: %s\n", filename);
        return;
    }

    fseek(file, 0, SEEK_END);
    long filesize = ftell(file);
    rewind(file);

    uint8_t *buffer = (uint8_t*) malloc(filesize);
    if (buffer == NULL) {
        printf("Failed to allocate memory for file: %s\n", filename);
        fclose(file);
        return;
    }

    long read_size = (long) fread(buffer, 1, filesize, file);
    if (read_size != filesize) {
        printf("Failed to read file: %s\n", filename);
        free(buffer);
        fclose(file);
        return;
    }

    LLVMFuzzerTestOneInput(buffer, filesize);

    free(buffer);
    fclose(file);
}

int main(int argc, char **argv) {
    if (argc != 2) {
        printf("Usage: %s <directory>\n", argv[0]);
        return 1;
    }

    DIR *dir = opendir(argv[1]);
    if (dir == NULL) {
        printf("Failed to open directory: %s\n", argv[1]);
        return 1;
    }

    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_type == DT_REG) {
            char filepath[1024];
            snprintf(filepath, sizeof(filepath), "%s/%s", argv[1], entry->d_name);
            load_file_and_test(filepath);
        }
    }

    closedir(dir);
    return 0;
}

Step 3: Execute on Corpus

LLVM (C/C++):

LLVM_PROFILE_FILE=fuzz.profraw ./fuzz_exec corpus/

GCC (C/C++):

./fuzz_exec_gcov corpus/

Rust: Coverage data is automatically generated when running cargo fuzz coverage.

Step 4: Process Coverage Data

LLVM:

# Merge raw profile data
llvm-profdata merge -sparse fuzz.profraw -o fuzz.profdata

# Generate text report
llvm-cov report ./fuzz_exec \
  -instr-profile=fuzz.profdata \
  -ignore-filename-regex='harness.cc|execute-rt.cc'

# Generate HTML report
llvm-cov show ./fuzz_exec \
  -instr-profile=fuzz.profdata \
  -ignore-filename-regex='harness.cc|execute-rt.cc' \
  -format=html -output-dir fuzz_html/

GCC with gcovr:

# Install gcovr (via pip for latest version)
python3 -m venv venv
source venv/bin/activate
pip3 install gcovr

# Generate report for the g++ build from Step 1
# (if the binary was built with Clang using the same flags,
#  add: --gcov-executable "llvm-cov gcov")
gcovr --exclude harness.cc --exclude execute-rt.cc \
  --root . --html-details -o coverage.html

Rust:

# Install required tools
cargo install cargo-binutils rustfilt

# Create HTML generation script
cat <<'EOF' > ./generate_html
#!/bin/sh
if [ $# -lt 1 ]; then
    echo "Error: Name of fuzz target is required."
    echo "Usage: $0 fuzz_target [sources...]"
    exit 1
fi
FUZZ_TARGET="$1"
shift
SRC_FILTER="$@"
TARGET=$(rustc -vV | sed -n 's|host: ||p')
cargo +nightly cov -- show -Xdemangler=rustfilt \
  "target/$TARGET/coverage/$TARGET/release/$FUZZ_TARGET" \
  -instr-profile="fuzz/coverage/$FUZZ_TARGET/coverage.profdata" \
  -show-line-counts-or-regions -show-instantiations \
  -format=html -o fuzz_html/ $SRC_FILTER
EOF
chmod +x ./generate_html

# Generate HTML report
./generate_html fuzz_target_1 src/lib.rs

Step 5: Analyze Results

Review the coverage report to identify:

Uncovered code blocks: Areas that may need better seed inputs or dictionary entries
Magic value checks: Conditional statements with hardcoded values that block progress
Dead code: Functions that may not be reachable through your harness
Coverage changes: Compare against baseline to track improvements or regressions

Common Patterns

Pattern: Identifying Magic Values

Problem: Fuzzer cannot discover paths guarded by magic value checks.

Coverage reveals:

// Coverage shows this block is never executed
if (buf == 0x7F454C46) {  // ELF magic number
    // start parsing buf
}
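The comparison above treats `buf` as a single integer. In a harness that receives raw bytes, the same kind of check is often written byte-wise, which also makes clear why the fuzzer must produce all four bytes together. A minimal sketch, with the hypothetical helper name `has_elf_magic`:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 when the input starts with the 4-byte ELF magic, else 0.
 * A fuzzer has to produce all four bytes at once to get past this check. */
int has_elf_magic(const uint8_t *data, size_t size) {
    static const uint8_t magic[4] = {0x7F, 0x45, 0x4C, 0x46};  /* "\x7FELF" */
    if (size < sizeof(magic)) {
        return 0;
    }
    return memcmp(data, magic, sizeof(magic)) == 0;
}
```

If the coverage report shows the code after such a check as never executed, the corpus contains no input with that prefix.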

Solution: Add magic values to dictionary file:

# magic.dict
"\x7F\x45\x4C\x46"

Pattern: Handling Crashing Inputs

Problem: Coverage generation fails when corpus contains crashing inputs.

Before:

./fuzz_exec corpus/  # Crashes on bad input, no coverage generated

After:

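// Fork before executing to isolate crashes
// Additionally requires: #include <unistd.h> and #include <sys/wait.h>
// With LLVM instrumentation, set LLVM_PROFILE_FILE=fuzz-%p.profraw so each
// process writes its own profile, then merge them all with llvm-profdata.
int main(int argc, char **argv) {
    // ... directory opening code ...

    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_type == DT_REG) {
            pid_t pid = fork();
            if (pid < 0) {
                perror("fork");
                break;
            } else if (pid == 0) {
                // Child process - crash won't affect parent
                char filepath[1024];
                snprintf(filepath, sizeof(filepath), "%s/%s", argv[1], entry->d_name);
                load_file_and_test(filepath);
                exit(0);  // Normal exit lets the child write its coverage data
            } else {
                // Parent waits for child
                waitpid(pid, NULL, 0);
            }
        }
    }

    closedir(dir);
    return 0;
}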

Pattern: CMake Integration

Use Case: Adding coverage builds to CMake projects.

cmake_minimum_required(VERSION 3.0)
project(FuzzingProject)

# Main binary
add_executable(program main.cc)

# Fuzzing binary
add_executable(fuzz main.cc harness.cc)
target_compile_definitions(fuzz PRIVATE NO_MAIN=1)
target_compile_options(fuzz PRIVATE -g -O2 -fsanitize=fuzzer)
target_link_libraries(fuzz -fsanitize=fuzzer)

# Coverage execution binary
add_executable(fuzz_exec main.cc harness.cc execute-rt.cc)
target_compile_definitions(fuzz_exec PRIVATE NO_MAIN)
target_compile_options(fuzz_exec PRIVATE -O2 -fprofile-instr-generate -fcoverage-mapping)
target_link_libraries(fuzz_exec -fprofile-instr-generate)

Build:

cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ .
cmake --build . --target fuzz_exec

Advanced Usage

Tips and Tricks

Tip	Why It Helps
Use LLVM 18+ with `-show-directory-coverage`	Organizes large reports by directory structure instead of flat file list
Export to lcov format for better HTML	`llvm-cov export -format=lcov` + `genhtml` provides cleaner per-file reports
Compare coverage across campaigns	Store `.profdata` files with timestamps to track progress over time
Filter harness code from reports	Use `-ignore-filename-regex` to focus on SUT coverage only
Automate coverage in CI/CD	Generate coverage reports automatically after scheduled fuzzing runs
Use gcovr 5.1+ for Clang 14+	Older gcovr versions have compatibility issues with recent LLVM

Incremental Coverage Updates

GCC's gcov instrumentation incrementally updates .gcda files across multiple runs. This is useful for tracking coverage as you add test cases:

# First run
./fuzz_exec_gcov corpus_batch_1/
gcovr --html coverage_v1.html

# Second run (adds to existing coverage)
./fuzz_exec_gcov corpus_batch_2/
gcovr --html coverage_v2.html

# Start fresh
gcovr --delete  # Remove .gcda files
./fuzz_exec_gcov corpus/

Handling Large Codebases

For projects with hundreds of source files:

Filter by prefix: Only generate reports for relevant directories

llvm-cov show ./fuzz_exec -instr-profile=fuzz.profdata /path/to/src/

Use directory coverage: Group by directory to reduce clutter (LLVM 18+)

llvm-cov show ./fuzz_exec -instr-profile=fuzz.profdata \
  -show-directory-coverage -format=html -output-dir html/

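Generate JSON for programmatic analysis (`-format=text` is llvm-cov's JSON export; `-format=lcov` produces an lcov tracefile instead):

llvm-cov export ./fuzz_exec -instr-profile=fuzz.profdata -format=text > coverage.json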

Differential Coverage

Compare coverage between two fuzzing campaigns:

# Campaign 1
LLVM_PROFILE_FILE=campaign1.profraw ./fuzz_exec corpus1/
llvm-profdata merge -sparse campaign1.profraw -o campaign1.profdata

# Campaign 2
LLVM_PROFILE_FILE=campaign2.profraw ./fuzz_exec corpus2/
llvm-profdata merge -sparse campaign2.profraw -o campaign2.profdata

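# Compare: llvm-cov takes one profile per invocation,
# so generate a report per campaign and diff the results
llvm-cov report ./fuzz_exec -instr-profile=campaign1.profdata > campaign1.txt
llvm-cov report ./fuzz_exec -instr-profile=campaign2.profdata > campaign2.txt
diff -u campaign1.txt campaign2.txt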

Anti-Patterns

Anti-Pattern	Problem	Correct Approach
Using fuzzer-reported coverage for comparisons	Different fuzzers calculate coverage differently, making cross-tool comparison meaningless	Use dedicated coverage tools (llvm-cov, gcovr) for reproducible measurements
Generating coverage with optimizations	`-O3` optimizations can eliminate code, making coverage misleading	Use `-O2` or `-O0` for coverage builds
Not filtering harness code	Harness coverage inflates numbers and obscures SUT coverage	Use `-ignore-filename-regex` or `--exclude` to filter harness files
Mixing LLVM and GCC instrumentation	Incompatible formats cause parsing failures	Stick to one toolchain for coverage builds
Ignoring crashing inputs	Crashes prevent coverage generation, hiding real coverage data	Fix crashes first, or use process forking to isolate them
Not tracking coverage over time	One-time coverage checks miss regressions and improvements	Store coverage data with timestamps and track trends
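The forking approach mentioned for crashing inputs can be sketched as follows. This is a minimal, POSIX-only illustration that assumes a libFuzzer-style harness function; `RunIsolated` and `HarnessFn` are hypothetical names, not part of any fuzzer's API.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Signature shared by libFuzzer-style harnesses.
using HarnessFn = int (*)(const uint8_t *data, size_t size);

// Runs one input in a forked child so that a crash cannot kill the replay loop.
// Returns true if the child exited normally with status 0.
bool RunIsolated(HarnessFn harness, const uint8_t *data, size_t size) {
    assert(harness != nullptr);
    const pid_t child = fork();
    if (child < 0) return false;  // fork failed
    if (child == 0) {
        harness(data, size);
        // exit() rather than _exit(): atexit handlers must run, because
        // coverage runtimes typically flush their counters from one.
        std::exit(0);
    }
    int status = 0;
    if (waitpid(child, &status, 0) != child) return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

With LLVM instrumentation, a `%p` (process ID) pattern in `LLVM_PROFILE_FILE` gives each child its own `.profraw` file to merge afterwards. A child that crashes never reaches `exit`, so its own counters are lost, but the remaining inputs still get measured. Execution counts in the merged profile may be inflated because each child inherits the parent's counters; which lines were hit is unaffected.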

Tool-Specific Guidance

libFuzzer

libFuzzer uses LLVM's SanitizerCoverage by default for guiding fuzzing, but you need separate instrumentation for generating reports.

Build for coverage:

clang++ -fprofile-instr-generate -fcoverage-mapping \
  -O2 -DNO_MAIN \
  main.cc harness.cc execute-rt.cc -o fuzz_exec

Execute corpus and generate report:

LLVM_PROFILE_FILE=fuzz.profraw ./fuzz_exec corpus/
llvm-profdata merge -sparse fuzz.profraw -o fuzz.profdata
llvm-cov show ./fuzz_exec -instr-profile=fuzz.profdata -format=html -output-dir html/

Integration tips:

Don't use -fsanitize=fuzzer for coverage builds (it conflicts with profile instrumentation)
Reuse the same harness function (LLVMFuzzerTestOneInput) with a different main function
Use the -ignore-filename-regex flag to exclude harness code from coverage reports
Consider using llvm-cov's -show-instantiations flag for template-heavy C++ code
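One way to reuse the harness with a different main function is a small replay driver that takes the place of libFuzzer's own main in the coverage build. The sketch below is illustrative only and is not the execute-rt.cc referenced in the build command: `RunCorpus`, `HarnessFn`, and the `COVERAGE_DRIVER_MAIN` macro are names invented here, and the directory walk assumes a POSIX system.

```cpp
#include <dirent.h>
#include <sys/stat.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Signature shared by libFuzzer-style harnesses.
using HarnessFn = int (*)(const uint8_t *data, size_t size);

// Runs `harness` once on each regular file in `dir`.
// Returns the number of inputs executed, or -1 if `dir` cannot be opened.
int RunCorpus(const std::string &dir, HarnessFn harness) {
    assert(harness != nullptr);
    DIR *handle = opendir(dir.c_str());
    if (handle == nullptr) return -1;

    int executed = 0;
    while (struct dirent *entry = readdir(handle)) {
        const std::string path = dir + "/" + entry->d_name;
        struct stat info;
        // Skip ".", "..", subdirectories, and anything that is not a file.
        if (stat(path.c_str(), &info) != 0 || !S_ISREG(info.st_mode)) continue;

        std::ifstream in(path, std::ios::binary);
        if (!in) continue;
        const std::vector<uint8_t> bytes((std::istreambuf_iterator<char>(in)),
                                         std::istreambuf_iterator<char>());
        harness(bytes.data(), bytes.size());
        ++executed;
    }
    closedir(handle);
    return executed;
}

#ifdef COVERAGE_DRIVER_MAIN  // hypothetical macro: define it only in coverage builds
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);

int main(int argc, char **argv) {
    if (argc != 2) return 2;  // usage: ./fuzz_exec <corpus-dir>
    return RunCorpus(argv[1], LLVMFuzzerTestOneInput) < 0 ? 1 : 0;
}
#endif
```

Compiled with the coverage flags above plus `-DCOVERAGE_DRIVER_MAIN`, the binary takes a corpus directory as its only argument, matching the `./fuzz_exec corpus/` invocation.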

AFL++

AFL++ provides its own coverage feedback mechanism, but for detailed reports use standard LLVM/GCC tools.

Build for coverage with LLVM:

clang++ -fprofile-instr-generate -fcoverage-mapping \
  -O2 main.cc harness.cc execute-rt.cc -o fuzz_exec

Build for coverage with GCC:

g++ -ftest-coverage -fprofile-arcs \
  main.cc harness.cc execute-rt.cc -o fuzz_exec_gcov

Execute and generate report:

# LLVM approach
LLVM_PROFILE_FILE=fuzz.profraw ./fuzz_exec afl_output/queue/
llvm-profdata merge -sparse fuzz.profraw -o fuzz.profdata
llvm-cov report ./fuzz_exec -instr-profile=fuzz.profdata

# GCC approach
./fuzz_exec_gcov afl_output/queue/
gcovr --html-details -o coverage.html

Integration tips:

Don't use AFL++'s instrumentation (afl-clang-fast) for coverage builds
Use standard compilers with coverage flags instead
AFL++'s queue/ directory contains your corpus
AFL++'s built-in coverage statistics are useful for real-time monitoring but not for detailed analysis

cargo-fuzz (Rust)

cargo-fuzz provides built-in coverage generation using LLVM tools.

Install prerequisites:

rustup toolchain install nightly --component llvm-tools-preview
cargo install cargo-binutils rustfilt

Generate coverage data:

cargo +nightly fuzz coverage fuzz_target_1

Create HTML report script:

cat <<'EOF' > ./generate_html
#!/bin/sh
FUZZ_TARGET="$1"
shift
SRC_FILTER="$@"
TARGET=$(rustc -vV | sed -n 's|host: ||p')
cargo +nightly cov -- show -Xdemangler=rustfilt \
  "target/$TARGET/coverage/$TARGET/release/$FUZZ_TARGET" \
  -instr-profile="fuzz/coverage/$FUZZ_TARGET/coverage.profdata" \
  -show-line-counts-or-regions -show-instantiations \
  -format=html -o fuzz_html/ $SRC_FILTER
EOF
chmod +x ./generate_html

Generate report:

./generate_html fuzz_target_1 src/lib.rs

Integration tips:

Always use the nightly toolchain for coverage
The -Xdemangler=rustfilt flag makes function names readable
Filter by source files (e.g., src/lib.rs) to focus on crate code
Use -show-line-counts-or-regions and -show-instantiations for better Rust-specific output
Corpus is located in fuzz/corpus/<target>/

honggfuzz

honggfuzz works with standard LLVM/GCC coverage instrumentation.

Build for coverage:

# Use standard compiler, not honggfuzz compiler
clang -fprofile-instr-generate -fcoverage-mapping \
  -O2 harness.c execute-rt.c -o fuzz_exec

Execute corpus:

LLVM_PROFILE_FILE=fuzz.profraw ./fuzz_exec honggfuzz_workspace/

Integration tips:

Don't use hfuzz-clang for coverage builds
honggfuzz corpus is typically in a workspace directory
Use the same LLVM workflow as libFuzzer

Troubleshooting

Issue	Cause	Solution
`error: no profile data available`	Profile wasn't generated or wrong path	Verify `LLVM_PROFILE_FILE` was set and `.profraw` file exists
`Failed to load coverage`	Mismatch between binary and profile data	Rebuild binary with same flags used during execution
Coverage reports show 0%	Wrong binary used for report generation	Use the instrumented binary, not the fuzzing binary
`no_working_dir_found` error (gcovr)	`.gcda` files in unexpected location	Add `--gcov-ignore-errors=no_working_dir_found` flag
Crashes prevent coverage generation	Corpus contains crashing inputs	Filter crashes or use forking approach to isolate failures
Coverage decreases after harness change	Harness now skips certain code paths	Review harness logic; may need to support more input formats
HTML report is flat file list	Using older LLVM version	Upgrade to LLVM 18+ and use `-show-directory-coverage`
`incompatible instrumentation`	Mixing LLVM and GCC coverage	Rebuild everything with same toolchain

Tools That Use This Technique

Skill	How It Applies
libfuzzer	Uses SanitizerCoverage for feedback; coverage analysis evaluates harness effectiveness
aflpp	Uses edge coverage for feedback; detailed analysis requires separate instrumentation
cargo-fuzz	Built-in `cargo fuzz coverage` command for Rust projects
honggfuzz	Uses edge coverage; analyze with standard LLVM/GCC tools

Skill	Relationship
fuzz-harness-writing	Coverage reveals which code paths harness reaches; guides harness improvements
fuzzing-dictionaries	Coverage identifies magic value checks that need dictionary entries
corpus-management	Coverage analysis helps curate corpora by identifying redundant test cases
sanitizers	Coverage helps verify sanitizer-instrumented code is actually executed

Resources

Key External Resources

LLVM Source-Based Code Coverage: Comprehensive guide to LLVM's profile instrumentation, including advanced features like branch coverage, region coverage, and integration with existing build systems. Covers compiler flags, runtime behavior, and profile data formats.

llvm-cov Command Guide: Detailed CLI reference for llvm-cov commands including show, report, and export. Documents all filtering options, output formats, and integration with llvm-profdata.

gcovr Documentation: Complete guide to the gcovr tool for generating coverage reports from gcov data. Covers HTML themes, filtering options, multi-directory projects, and CI/CD integration patterns.

SanitizerCoverage Documentation: Low-level documentation for LLVM's SanitizerCoverage instrumentation. Explains inline 8-bit counters, PC tables, and how fuzzers use coverage feedback for guidance.

On the Evaluation of Fuzzer Performance: Research paper examining limitations of coverage as a fuzzing performance metric. Argues for more nuanced evaluation methods beyond simple code coverage percentages.

Video Resources

Not applicable - coverage analysis is primarily a tooling and workflow topic best learned through documentation and hands-on practice.

/fuzzing-dictionary

Source: `~/.claude/skills/tob-testing-handbook-skills/skills/fuzzing-dictionary/SKILL.md`

name: fuzzing-dictionary type: technique description: > Fuzzing dictionaries guide fuzzers with domain-specific tokens. Use when fuzzing parsers, protocols, or format-specific code.

Fuzzing Dictionary

A fuzzing dictionary provides domain-specific tokens to guide the fuzzer toward interesting inputs. Instead of purely random mutations, the fuzzer incorporates known keywords, magic numbers, protocol commands, and format-specific strings that are more likely to reach deeper code paths in parsers, protocol handlers, and file format processors.

Overview

Dictionaries are text files containing quoted strings that represent meaningful tokens for your target. They help fuzzers bypass early validation checks and explore code paths that would be difficult to reach through blind mutation alone.

Key Concepts

Concept	Description
Dictionary Entry	A quoted string (e.g., `"keyword"`) or key-value pair (e.g., `kw="value"`)
Hex Escapes	Byte sequences like `"\xF7\xF8"` for non-printable characters
Token Injection	Fuzzer inserts dictionary entries into generated inputs
Cross-Fuzzer Format	Dictionary files work with libFuzzer, AFL++, and cargo-fuzz

When to Apply

Apply this technique when:

Fuzzing parsers (JSON, XML, config files)
Fuzzing protocol implementations (HTTP, DNS, custom protocols)
Fuzzing file format handlers (PNG, PDF, media codecs)
Coverage plateaus early without reaching deeper logic
Target code checks for specific keywords or magic values

Skip this technique when:

Fuzzing pure algorithms without format expectations
Target has no keyword-based parsing
Corpus already achieves high coverage

Quick Reference

Task	Command/Pattern
Use with libFuzzer	`./fuzz -dict=./dictionary.dict ...`
Use with AFL++	`afl-fuzz -x ./dictionary.dict ...`
Use with cargo-fuzz	`cargo fuzz run fuzz_target -- -dict=./dictionary.dict`
Extract from header	`grep -o '"[^"]*"' header.h > header.dict`
Generate from binary	`strings ./binary | sed 's/^/"&/; s/$/&"/' > strings.dict`

Step-by-Step

Step 1: Create Dictionary File

Create a text file with quoted strings on each line. Use comments (#) for documentation.

Example dictionary format:

# Lines starting with '#' and empty lines are ignored.

# Adds "blah" (w/o quotes) to the dictionary.
kw1="blah"
# Use \\ for backslash and \" for quotes.
kw2="\"ac\\dc\""
# Use \xAB for hex values
kw3="\xF7\xF8"
# the name of the keyword followed by '=' may be omitted:
"foo\x0Abar"

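To make the escaping rules in the example dictionary concrete, here is a small decoder for a single dictionary line. It is a simplified sketch, not the parser of any particular fuzzer: `ParseDictLine` is a hypothetical name, and it accepts only the three escapes shown in the example (backslash, double quote, and two-digit hex).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

static bool IsHex(char c) {
    return std::isxdigit(static_cast<unsigned char>(c)) != 0;
}

// Converts one hex digit to its value; the caller guarantees IsHex(c).
static int HexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    return std::tolower(static_cast<unsigned char>(c)) - 'a' + 10;
}

// Decodes one dictionary line into raw token bytes.
// Returns false for comments, blank lines, and malformed or empty entries.
bool ParseDictLine(const std::string &line, std::string *token) {
    assert(token != nullptr);
    token->clear();
    const size_t first = line.find_first_not_of(" \t");
    if (first == std::string::npos || line[first] == '#') return false;

    // The token sits between the first and the last double quote; an
    // optional name= prefix before the opening quote is ignored.
    const size_t open = line.find('"', first);
    const size_t close = line.rfind('"');
    if (open == std::string::npos || close <= open) return false;

    for (size_t i = open + 1; i < close; ++i) {
        const char c = line[i];
        if (c != '\\') {
            token->push_back(c);
        } else if (i + 1 < close && (line[i + 1] == '\\' || line[i + 1] == '"')) {
            token->push_back(line[i + 1]);  // escaped backslash or quote
            i += 1;
        } else if (i + 3 < close && line[i + 1] == 'x' && IsHex(line[i + 2]) &&
                   IsHex(line[i + 3])) {
            const int value = HexValue(line[i + 2]) * 16 + HexValue(line[i + 3]);
            token->push_back(static_cast<char>(value));  // \xAB hex escape
            i += 3;
        } else {
            return false;  // any other escape, such as \n, is rejected
        }
    }
    return !token->empty();
}
```

Running a new dictionary through a check like this before a long campaign catches entries that a fuzzer would otherwise reject at startup.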
Step 2: Generate Dictionary Content

Choose a generation method based on what's available:

From LLM: Prompt ChatGPT or Claude with:

A dictionary can be used to guide the fuzzer. Write me a dictionary file for fuzzing a <PNG parser>. Each line should be a quoted string or key-value pair like kw="value". Include magic bytes, chunk types, and common header values. Use hex escapes like "\xF7\xF8" for binary values.

From header files:

grep -o '"[^"]*"' header.h > header.dict

From man pages (for CLI tools):

man curl | grep -oP '^\s*(--|-)\K\S+' | sed 's/[,.]$//' | sed 's/^/"&/; s/$/&"/' | sort -u > man.dict

From binary strings:

strings ./binary | sed 's/^/"&/; s/$/&"/' > strings.dict
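The sed wrapper above adds surrounding quotes but leaves embedded quotes and backslashes unescaped, which can yield malformed entries. A hedged sketch of an escaping helper follows; `ToDictEntry` is a hypothetical name, and hex-escaping every non-printable byte is a choice made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Renders raw token bytes as one quoted dictionary entry.
std::string ToDictEntry(const std::string &raw) {
    assert(!raw.empty());  // an empty token is of no use to a fuzzer
    std::string out = "\"";
    for (size_t i = 0; i < raw.size(); ++i) {
        const unsigned char byte = static_cast<unsigned char>(raw[i]);
        if (byte == '\\' || byte == '"') {
            out.push_back('\\');  // escape the two special characters
            out.push_back(static_cast<char>(byte));
        } else if (byte >= 0x20 && byte < 0x7F) {
            out.push_back(static_cast<char>(byte));  // printable ASCII as-is
        } else {
            char hex[5];  // backslash, 'x', two digits, terminator
            std::snprintf(hex, sizeof hex, "\\x%02X", static_cast<unsigned>(byte));
            out += hex;
        }
    }
    out.push_back('"');
    return out;
}
```

For the PNG signature bytes this produces `"\x89PNG\x0D\x0A\x1A\x0A"`, using hex escapes for the carriage return and line feed.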

Step 3: Pass Dictionary to Fuzzer

Use the appropriate flag for your fuzzer (see Quick Reference above).

Common Patterns

Pattern: Protocol Keywords

Use Case: Fuzzing HTTP or custom protocol handlers

Dictionary content:

# HTTP methods
"GET"
"POST"
"PUT"
"DELETE"
"HEAD"

# Headers
"Content-Type"
"Authorization"
"Host"

# Protocol markers
"HTTP/1.1"
"HTTP/2.0"

Pattern: Magic Bytes and File Format Headers

Use Case: Fuzzing image parsers, media decoders, archive handlers

Dictionary content:

# PNG magic bytes and chunks
png_magic="\x89PNG\x0D\x0A\x1A\x0A"
ihdr="IHDR"
plte="PLTE"
idat="IDAT"
iend="IEND"

# JPEG markers
jpeg_soi="\xFF\xD8"
jpeg_eoi="\xFF\xD9"

Pattern: Configuration File Keywords

Use Case: Fuzzing config file parsers (YAML, TOML, INI)

Dictionary content:

# Common config keywords
"true"
"false"
"null"
"version"
"enabled"
"disabled"

# Section headers
"[general]"
"[network]"
"[security]"

Advanced Usage

Tips and Tricks

Tip	Why It Helps
Combine multiple generation methods	LLM-generated keywords + strings from binary covers broad surface
Include boundary values	`"0"`, `"-1"`, `"2147483647"` trigger edge cases
Add format delimiters	`:`, `=`, `{`, `}` help fuzzer construct valid structures
Keep dictionaries focused	50-200 entries perform better than thousands
Test dictionary effectiveness	Run with and without dict, compare coverage

Auto-Generated Dictionaries (AFL++)

When you compile with afl-clang-lto, AFL++ automatically extracts dictionary entries from string comparisons in the target. This happens at compile time via the AUTODICTIONARY feature.

Enable auto-dictionary:

export AFL_LLVM_DICT2FILE=auto.dict
afl-clang-lto++ target.cc -o target
# Dictionary saved to auto.dict
afl-fuzz -x auto.dict -i in -o out -- ./target

Combining Multiple Dictionaries

Some fuzzers support multiple dictionary files:

# AFL++ with multiple dictionaries
afl-fuzz -x keywords.dict -x formats.dict -i in -o out -- ./target

Anti-Patterns

Anti-Pattern	Problem	Correct Approach
Including full sentences	Fuzzer needs atomic tokens, not prose	Break into individual keywords
Duplicating entries	Wastes mutation budget	Use `sort -u` to deduplicate
Over-sized dictionaries	Slows fuzzer, dilutes useful tokens	Keep focused: 50-200 most relevant entries
Missing hex escapes	Non-printable bytes become mangled	Use `\xXX` for binary values
No comments	Hard to maintain and audit	Document sections with `#` comments

Tool-Specific Guidance

libFuzzer

clang++ -fsanitize=fuzzer,address harness.cc -o fuzz
./fuzz -dict=./dictionary.dict corpus/

Integration tips:

Dictionary tokens are inserted/replaced during mutations
Combine with -max_len to control input size
Use -print_final_stats=1 to see dictionary effectiveness metrics
Dictionary entries longer than -max_len are ignored

AFL++

afl-fuzz -x ./dictionary.dict -i input/ -o output/ -- ./target @@

Integration tips:

AFL++ supports multiple -x flags for multiple dictionaries
Use AFL_LLVM_DICT2FILE with afl-clang-lto for auto-generated dictionaries
Dictionary effectiveness shown in fuzzer stats UI
Tokens are used during deterministic and havoc stages

cargo-fuzz (Rust)

cargo fuzz run fuzz_target -- -dict=./dictionary.dict

Integration tips:

cargo-fuzz uses libFuzzer backend, so all libFuzzer dict flags work
Place dictionary file in fuzz/ directory alongside harness
Reference from harness directory: cargo fuzz run target -- -dict=../dictionary.dict

go-fuzz (Go)

go-fuzz does not have built-in dictionary support, but you can manually seed the corpus with dictionary entries:

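# Convert dictionary entries to raw corpus files (hex escapes such as \xAB are not decoded)
mkdir -p corpus
grep -o '"[^"]*"' dict.txt | sed 's/^"//; s/"$//' | while IFS= read -r token; do
    printf '%s' "$token" > "corpus/$(printf '%s' "$token" | md5sum | cut -d' ' -f1)"
done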

go-fuzz -bin=./target-fuzz.zip -workdir=.

Troubleshooting

Issue	Cause	Solution
Dictionary file not loaded	Wrong path or format error	Check fuzzer output for dict parsing errors; verify file format
No coverage improvement	Dictionary tokens not relevant	Analyze target code for actual keywords; try different generation method
Syntax errors in dict file	Unescaped quotes or invalid escapes	Use `\\` for backslash, `\"` for quotes; validate with test run
Fuzzer ignores long entries	Entries exceed `-max_len`	Keep entries under max input length, or increase `-max_len`
Too many entries slow fuzzer	Dictionary too large	Prune to 50-200 most relevant entries

Tools That Use This Technique

Skill	How It Applies
libfuzzer	Native dictionary support via `-dict=` flag
aflpp	Native dictionary support via `-x` flag; auto-generation with the AUTODICTIONARY feature
cargo-fuzz	Uses libFuzzer backend, inherits `-dict=` support

Skill	Relationship
fuzzing-corpus	Dictionaries complement corpus: corpus provides structure, dictionary provides keywords
coverage-analysis	Use coverage data to validate dictionary effectiveness
harness-writing	Harness structure determines which dictionary tokens are useful

Resources

Key External Resources

AFL++ Dictionaries: Pre-built dictionaries for common formats (HTML, XML, JSON, SQL, etc.). Good starting point for format-specific fuzzing.

libFuzzer Dictionary Documentation: Official libFuzzer documentation on dictionary format and usage. Explains token insertion strategy and performance implications.

Additional Examples

OSS-Fuzz Dictionaries: Real-world dictionaries from Google's continuous fuzzing service. Search project directories for *.dict files to see production examples.

/fuzzing-obstacles

Source: `~/.claude/skills/tob-testing-handbook-skills/skills/fuzzing-obstacles/SKILL.md`

name: fuzzing-obstacles type: technique description: > Techniques for patching code to overcome fuzzing obstacles. Use when checksums, global state, or other barriers block fuzzer progress.

Overcoming Fuzzing Obstacles

Codebases often contain anti-fuzzing patterns that prevent effective coverage. Checksums, global state (like time-seeded PRNGs), and validation checks can block the fuzzer from exploring deeper code paths. This technique shows how to patch your System Under Test (SUT) to bypass these obstacles during fuzzing while preserving production behavior.

Overview

Many real-world programs were not designed with fuzzing in mind. They may:

Verify checksums or cryptographic hashes before processing input
Rely on global state (e.g., system time, environment variables)
Use non-deterministic random number generators
Perform complex validation that makes it difficult for the fuzzer to generate valid inputs

These patterns make fuzzing difficult because:

Checksums: The fuzzer must guess correct hash values (astronomically unlikely)
Global state: Same input produces different behavior across runs (breaks determinism)
Complex validation: The fuzzer spends effort hitting validation failures instead of exploring deeper code

The solution is conditional compilation: modify code behavior during fuzzing builds while keeping production code unchanged.

Key Concepts

Concept	Description
SUT Patching	Modifying System Under Test to be fuzzing-friendly
Conditional Compilation	Code that behaves differently based on compile-time flags
Fuzzing Build Mode	Special build configuration that enables fuzzing-specific patches
False Positives	Crashes found during fuzzing that cannot occur in production
Determinism	Same input always produces same behavior (critical for fuzzing)

When to Apply

Apply this technique when:

The fuzzer gets stuck at checksum or hash verification
Coverage reports show large blocks of unreachable code behind validation
Code uses time-based seeds or other non-deterministic global state
Complex validation makes it nearly impossible to generate valid inputs
You see the fuzzer repeatedly hitting the same validation failures

Skip this technique when:

The obstacle can be overcome with a good seed corpus or dictionary
The validation is simple enough for the fuzzer to learn (e.g., magic bytes)
You're doing grammar-based or structure-aware fuzzing that handles validation
Skipping the check would introduce too many false positives
The code is already fuzzing-friendly

Quick Reference

Task	C/C++	Rust
Check if fuzzing build	`#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION`	`cfg!(fuzzing)`
Skip check during fuzzing	`#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION return -1; #endif`	`if !cfg!(fuzzing) { return Err(...) }`
Common obstacles	Checksums, PRNGs, time-based logic	Checksums, PRNGs, time-based logic
Supported fuzzers	libFuzzer, AFL++, LibAFL, honggfuzz	cargo-fuzz, libFuzzer

Step-by-Step

Step 1: Identify the Obstacle

Run the fuzzer and analyze coverage to find code that's unreachable. Common patterns:

Look for checksum/hash verification before deeper processing
Check for calls to rand(), time(), or srand() with system seeds
Find validation functions that reject most inputs
Identify global state initialization that differs across runs

Tools to help:

Coverage reports (see coverage-analysis technique)
Profiling with -fprofile-instr-generate
Manual code inspection of entry points

Step 2: Add Conditional Compilation

Modify the obstacle to bypass it during fuzzing builds.

C/C++ Example:

// Before: Hard obstacle
if (checksum != expected_hash) {
    return -1;  // Fuzzer never gets past here
}

// After: Conditional bypass
if (checksum != expected_hash) {
#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
    return -1;  // Only enforced in production
#endif
}
// Fuzzer can now explore code beyond this check

Rust Example:

// Before: Hard obstacle
if checksum != expected_hash {
    return Err(MyError::Hash);  // Fuzzer never gets past here
}

// After: Conditional bypass
if checksum != expected_hash {
    if !cfg!(fuzzing) {
        return Err(MyError::Hash);  // Only enforced in production
    }
}
// Fuzzer can now explore code beyond this check

Step 3: Verify Coverage Improvement

After patching:

Rebuild with fuzzing instrumentation
Run the fuzzer for a short time
Compare coverage to the unpatched version
Confirm new code paths are being explored

Step 4: Assess False Positive Risk

Consider whether skipping the check introduces impossible program states:

Does code after the check assume validated properties?
Could skipping validation cause crashes that cannot occur in production?
Is there implicit state dependency?

If false positives are likely, consider a more targeted patch (see Common Patterns below).

Common Patterns

Pattern: Bypass Checksum Validation

Use Case: Hash/checksum blocks all fuzzer progress

Before:

uint32_t computed = hash_function(data, size);
if (computed != expected_checksum) {
    return ERROR_INVALID_HASH;
}
process_data(data, size);

After:

uint32_t computed = hash_function(data, size);
if (computed != expected_checksum) {
#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
    return ERROR_INVALID_HASH;
#endif
}
process_data(data, size);

False positive risk: LOW - If data processing doesn't depend on checksum correctness

Pattern: Deterministic PRNG Seeding

Use Case: Non-deterministic random state prevents reproducibility

Before:

void initialize() {
    srand(time(NULL));  // Different seed each run
}

After:

void initialize() {
#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
    srand(12345);  // Fixed seed for fuzzing
#else
    srand(time(NULL));
#endif
}

False positive risk: LOW - Fuzzer can explore all code paths with fixed seed

Pattern: Careful Validation Skip

Use Case: Validation must be skipped but downstream code has assumptions

Before (Dangerous):

#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
if (!validate_config(&config)) {
    return -1;  // Ensures config.x != 0
}
#endif

int32_t result = 100 / config.x;  // CRASH: Division by zero in fuzzing!

After (Safe):

#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
if (!validate_config(&config)) {
    return -1;
}
#else
// During fuzzing, use safe defaults for failed validation
if (!validate_config(&config)) {
    config.x = 1;  // Prevent division by zero
    config.y = 1;
}
#endif

int32_t result = 100 / config.x;  // Safe in both builds

False positive risk: MITIGATED - Provides safe defaults instead of skipping

Pattern: Bypass Complex Format Validation

Use Case: Multi-step validation makes valid input generation nearly impossible

Rust Example:

// Before: Multiple validation stages
pub fn parse_message(data: &[u8]) -> Result<Message, Error> {
    validate_magic_bytes(data)?;
    validate_structure(data)?;
    validate_checksums(data)?;
    validate_crypto_signature(data)?;

    deserialize_message(data)
}

// After: Skip expensive validation during fuzzing
pub fn parse_message(data: &[u8]) -> Result<Message, Error> {
    validate_magic_bytes(data)?;  // Keep cheap checks

    if !cfg!(fuzzing) {
        validate_structure(data)?;
        validate_checksums(data)?;
        validate_crypto_signature(data)?;
    }

    deserialize_message(data)
}

False positive risk: MEDIUM - Deserialization must handle malformed data gracefully

Advanced Usage

Tips and Tricks

Tip	Why It Helps
Keep cheap validation	Magic bytes and size checks guide fuzzer without much cost
Use fixed seeds for PRNGs	Makes behavior deterministic while exploring all code paths
Patch incrementally	Skip one obstacle at a time and measure coverage impact
Add defensive defaults	When skipping validation, provide safe fallback values
Document all patches	Future maintainers need to understand fuzzing vs. production differences

Real-World Examples

OpenSSL: Uses FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION to modify cryptographic algorithm behavior. For example, in crypto/cmp/cmp_vfy.c, certain signature checks are relaxed during fuzzing to allow deeper exploration of certificate validation logic.

ogg crate (Rust): Uses cfg!(fuzzing) to skip checksum verification during fuzzing. This allows the fuzzer to explore audio processing code without spending effort guessing correct checksums.

Measuring Patch Effectiveness

After applying patches, quantify the improvement:

Line coverage: Use llvm-cov (or cargo cov from cargo-binutils for Rust) to see newly reachable lines
Basic block coverage: More fine-grained than line coverage
Function coverage: How many more functions are now reachable?
Corpus size: Does the fuzzer generate more diverse inputs?

Effective patches typically increase coverage by 10-50% or more.

Combining with Other Techniques

Obstacle patching works well with:

Corpus seeding: Provide valid inputs that get past initial parsing
Dictionaries: Help fuzzer learn magic bytes and common values
Structure-aware fuzzing: Use protobuf or grammar definitions for complex formats
Harness improvements: Better harness can sometimes avoid obstacles entirely
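As an illustration of the last point, a harness can sometimes repair the input instead of patching the SUT. The sketch assumes a made-up packet format (payload followed by a one-byte additive checksum); `Checksum`, `ParsePacket`, and `g_accepted` are hypothetical stand-ins for real target code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// --- Hypothetical SUT -------------------------------------------------------
// Assumed layout: payload bytes, then one byte holding their sum modulo 256.
static uint8_t Checksum(const uint8_t *data, size_t size) {
    uint8_t sum = 0;
    for (size_t i = 0; i < size; ++i) sum = static_cast<uint8_t>(sum + data[i]);
    return sum;
}

static int g_accepted = 0;  // counts packets that passed validation

int ParsePacket(const uint8_t *data, size_t size) {
    if (size < 2) return -1;
    if (Checksum(data, size - 1) != data[size - 1]) return -1;  // the obstacle
    ++g_accepted;  // stands in for the deeper parsing logic
    return 0;
}

// --- Harness ----------------------------------------------------------------
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size < 2) return 0;
    // Copy the input (the fuzzer's buffer is const), then overwrite the
    // trailing byte with the checksum the SUT expects.
    std::vector<uint8_t> packet(data, data + size);
    packet[size - 1] = Checksum(packet.data(), size - 1);
    const int rc = ParsePacket(packet.data(), packet.size());
    assert(rc == 0);  // the fix-up must always satisfy the check
    (void)rc;         // silences unused-variable warnings under NDEBUG
    return 0;
}
```

This only works for checksums the harness can compute itself; keyed MACs and signatures still call for conditional compilation or a test key. Because the SUT is unmodified, a crash found this way is not a false positive introduced by a patch.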

Anti-Patterns

Anti-Pattern	Problem	Correct Approach
Skip all validation wholesale	Creates false positives and unstable fuzzing	Skip only specific obstacles that block coverage
No risk assessment	False positives waste time and hide real bugs	Analyze downstream code for assumptions
Forget to document patches	Future maintainers don't understand the differences	Add comments explaining why patch is safe
Patch without measuring	Don't know if it helped	Compare coverage before and after
Over-patching	Makes fuzzing build diverge too much from production	Minimize differences between builds

Tool-Specific Guidance

libFuzzer

libFuzzer automatically defines FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION during compilation.

# C++ compilation
clang++ -g -fsanitize=fuzzer,address -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION \
    harness.cc target.cc -o fuzzer

# The macro is usually defined automatically by -fsanitize=fuzzer
clang++ -g -fsanitize=fuzzer,address harness.cc target.cc -o fuzzer

Integration tips:

The macro is defined automatically; manual definition is usually unnecessary
Use #ifdef to check for the macro
Combine with sanitizers to detect bugs in newly reachable code

AFL++

AFL++ also defines FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION when using its compiler wrappers.

# Compilation with AFL++ wrappers
afl-clang-fast++ -g -fsanitize=address target.cc harness.cc -o fuzzer

# The macro is defined automatically by afl-clang-fast

Integration tips:

Use afl-clang-fast or afl-clang-lto for automatic macro definition
Persistent mode harnesses benefit most from obstacle patching
Consider using AFL_LLVM_LAF_ALL to split multi-byte comparisons into smaller steps the fuzzer can solve incrementally

honggfuzz

honggfuzz also supports the macro when building targets.

# Compilation
hfuzz-clang++ -g -fsanitize=address target.cc harness.cc -o fuzzer

Integration tips:

Use hfuzz-clang or hfuzz-clang++ wrappers
The macro is available for conditional compilation
Combine with honggfuzz's feedback-driven fuzzing

cargo-fuzz (Rust)

cargo-fuzz automatically sets the fuzzing cfg option during builds.

# Build fuzz target (cfg!(fuzzing) is automatically set)
cargo fuzz build fuzz_target_name

# Run fuzz target
cargo fuzz run fuzz_target_name

Integration tips:

Use cfg!(fuzzing) inside ordinary if expressions; it expands to a compile-time boolean constant, so both branches must still compile
Use #[cfg(fuzzing)] to include or exclude whole items and statements at compile time
The fuzzing cfg is only set during cargo fuzz builds, not regular cargo build
Can be manually enabled with RUSTFLAGS="--cfg fuzzing" for testing

LibAFL

LibAFL supports the C/C++ macro for targets written in C/C++.

# Compilation
clang++ -g -fsanitize=address -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION \
    target.cc -c -o target.o

Integration tips:

Define the macro manually or use compiler flags
Works the same as with libFuzzer
Useful when building custom LibAFL-based fuzzers

Troubleshooting

Issue	Cause	Solution
Coverage doesn't improve after patching	Wrong obstacle identified	Profile execution to find actual bottleneck
Many false positive crashes	Downstream code has assumptions	Add defensive defaults or partial validation
Code compiles differently	Macro not defined in all build configs	Verify macro in all source files and dependencies
Fuzzer finds bugs in patched code	Patch introduced invalid states	Review patch for state invariants; consider safer approach
Can't reproduce production bugs	Build differences too large	Minimize patches; keep validation for state-critical checks

Tools That Use This Technique

Skill	How It Applies
libfuzzer	Defines `FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION` automatically
aflpp	Supports the macro via compiler wrappers
honggfuzz	Uses the macro for conditional compilation
cargo-fuzz	Sets the `fuzzing` cfg, which `cfg!(fuzzing)` and `#[cfg(fuzzing)]` test for Rust conditional compilation

Skill	Relationship
fuzz-harness-writing	Better harnesses may avoid obstacles; patching enables deeper exploration
coverage-analysis	Use coverage to identify obstacles and measure patch effectiveness
corpus-seeding	Seed corpus can help overcome obstacles without patching
dictionary-generation	Dictionaries help with magic bytes but not checksums or complex validation

Resources

Key External Resources

OpenSSL Fuzzing Documentation: OpenSSL's fuzzing infrastructure demonstrates large-scale use of FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION. The project uses this macro to modify cryptographic validation, certificate parsing, and other security-critical code paths to enable deeper fuzzing while maintaining production correctness.

LibFuzzer Documentation on Flags: Official LLVM documentation for libFuzzer, including how the fuzzer defines compiler macros and how to use them effectively. Covers integration with sanitizers and coverage instrumentation.

Rust cfg Attribute Reference: Complete reference for Rust conditional compilation, including cfg!(fuzzing) and cfg!(test). Explains compile-time vs. runtime conditional compilation and best practices.

/harness-writing

Source: `~/.claude/skills/tob-testing-handbook-skills/skills/harness-writing/SKILL.md`

name: harness-writing type: technique description: > Techniques for writing effective fuzzing harnesses across languages. Use when creating new fuzz targets or improving existing harness code.

Writing Fuzzing Harnesses

A fuzzing harness is the entrypoint function that receives random data from the fuzzer and routes it to your system under test (SUT). The quality of your harness directly determines which code paths get exercised and whether critical bugs are found. A poorly written harness can miss entire subsystems or produce non-reproducible crashes.

Overview

The harness is the bridge between the fuzzer's random byte generation and your application's API. It must parse raw bytes into meaningful inputs, call target functions, and handle edge cases gracefully. The most important part of any fuzzing setup is the harness—if written poorly, critical parts of your application may not be covered.

Key Concepts

Concept	Description
Harness	Function that receives fuzzer input and calls target code under test
SUT	System Under Test—the code being fuzzed
Entry point	Function signature required by the fuzzer (e.g., `LLVMFuzzerTestOneInput`)
FuzzedDataProvider	Helper class for structured extraction of typed data from raw bytes
Determinism	Property that ensures same input always produces same behavior
Interleaved fuzzing	Single harness that exercises multiple operations based on input
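Interleaved fuzzing can be sketched as a harness that decodes the input into a sequence of operations. The `ByteStack` class, the `ApplyOps` helper, and the opcode encoding below are invented for illustration; a real harness would dispatch to the target's own API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

static const size_t kCapacity = 16;

// Hypothetical SUT: a bounded stack of bytes.
class ByteStack {
public:
    bool Push(uint8_t value) {
        if (items_.size() >= kCapacity) return false;
        items_.push_back(value);
        return true;
    }
    bool Pop(uint8_t *out) {
        if (items_.empty()) return false;
        *out = items_.back();
        items_.pop_back();
        return true;
    }
    void Clear() { items_.clear(); }
    size_t Size() const { return items_.size(); }

private:
    std::vector<uint8_t> items_;
};

// Interprets the input as a stream of operations and returns the final size.
// Encoding (an assumption of this sketch): opcode = byte % 3, where
// 0 = push (consumes one more byte as the value), 1 = pop, 2 = clear.
size_t ApplyOps(const uint8_t *data, size_t size) {
    ByteStack stack;  // fresh state per input keeps runs deterministic
    size_t i = 0;
    while (i < size) {
        switch (data[i++] % 3) {
            case 0:
                if (i < size) stack.Push(data[i++]);
                break;
            case 1: {
                uint8_t ignored = 0;
                stack.Pop(&ignored);
                break;
            }
            default:
                stack.Clear();
                break;
        }
        assert(stack.Size() <= kCapacity);  // invariant checked after every op
    }
    return stack.Size();
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    ApplyOps(data, size);
    return 0;
}
```

Letting the fuzzer choose the order of operations exercises state transitions that a harness calling one function per run would not reach.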

When to Apply

Apply this technique when:

Creating a new fuzz target for the first time
Fuzz campaign has low code coverage or isn't finding bugs
Crashes found during fuzzing are not reproducible
Target API requires complex or structured inputs
Multiple related functions should be tested together

Skip this technique when:

Using existing well-tested harnesses from your project
Tool provides automatic harness generation that meets your needs
Target already has comprehensive fuzzing infrastructure

Quick Reference

Task	Pattern
Minimal C++ harness	`extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size)`
Minimal Rust harness	`fuzz_target!(|data: &[u8]| { ... });`
Size validation	`if (size < MIN_SIZE) return 0;`
Cast to integers	`uint32_t val = *(uint32_t*)(data);`
Use FuzzedDataProvider	`FuzzedDataProvider fuzzed_data(data, size);`
Extract typed data (C++)	`auto val = fuzzed_data.ConsumeIntegral<uint32_t>();`
Extract string (C++)	`auto str = fuzzed_data.ConsumeBytesWithTerminator<char>(32, 0xFF);`

Step-by-Step

Step 1: Identify Entry Points

Find functions in your codebase that:

Accept external input (parsers, validators, protocol handlers)
Parse complex data formats (JSON, XML, binary protocols)
Perform security-critical operations (authentication, cryptography)
Have high cyclomatic complexity or many branches

Good targets are typically:

Protocol parsers
File format parsers
Serialization/deserialization functions
Input validation routines

Step 2: Write Minimal Harness

Start with the simplest possible harness that calls your target function:

C/C++:

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    target_function(data, size);
    return 0;
}

Rust:

#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    target_function(data);
});

Step 3: Add Input Validation

Reject inputs that are too small or too large to be meaningful:

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Ensure minimum size for meaningful input
    if (size < MIN_INPUT_SIZE || size > MAX_INPUT_SIZE) {
        return 0;
    }
    target_function(data, size);
    return 0;
}

Rationale: The fuzzer generates random inputs of all sizes. Your harness must handle empty, tiny, huge, or malformed inputs without causing unexpected issues in the harness itself (crashes in the SUT are fine—that's what we're looking for).

Step 4: Structure the Input

For APIs that require typed data (integers, strings, etc.), use casting or helpers like FuzzedDataProvider:

Simple casting:

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size != 2 * sizeof(uint32_t)) {
        return 0;
    }

    uint32_t numerator = *(uint32_t*)(data);
    uint32_t denominator = *(uint32_t*)(data + sizeof(uint32_t));

    divide(numerator, denominator);
    return 0;
}

Using FuzzedDataProvider:

#include "FuzzedDataProvider.h"

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    FuzzedDataProvider fuzzed_data(data, size);

    size_t allocation_size = fuzzed_data.ConsumeIntegral<size_t>();
    std::vector<char> str1 = fuzzed_data.ConsumeBytesWithTerminator<char>(32, 0xFF);
    std::vector<char> str2 = fuzzed_data.ConsumeBytesWithTerminator<char>(32, 0xFF);

    concat(&str1[0], str1.size(), &str2[0], str2.size(), allocation_size);
    return 0;
}

Step 5: Test and Iterate

Run the fuzzer and monitor:

Code coverage (are all interesting paths reached?)
Executions per second (is it fast enough?)
Crash reproducibility (can you reproduce crashes with saved inputs?)

Iterate on the harness to improve these metrics.

Common Patterns

Pattern: Beyond Byte Arrays—Casting to Integers

Use Case: When target expects primitive types like integers or floats

Implementation:

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Ensure exactly 2 4-byte numbers
    if (size != 2 * sizeof(uint32_t)) {
        return 0;
    }

    // Split input into two integers
    uint32_t numerator = *(uint32_t*)(data);
    uint32_t denominator = *(uint32_t*)(data + sizeof(uint32_t));

    divide(numerator, denominator);
    return 0;
}

Rust equivalent:

fuzz_target!(|data: &[u8]| {
    if data.len() != 2 * std::mem::size_of::<i32>() {
        return;
    }

    let numerator = i32::from_ne_bytes([data[0], data[1], data[2], data[3]]);
    let denominator = i32::from_ne_bytes([data[4], data[5], data[6], data[7]]);

    divide(numerator, denominator);
});

Why it works: Any 8-byte input is valid. The fuzzer learns that inputs must be exactly 8 bytes, and every bit flip produces a new, potentially interesting input.
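One caveat with the C/C++ cast above: `data` is not guaranteed to be 4-byte aligned, so reading it through a `uint32_t*` can be undefined behavior on some platforms and may be flagged by UBSan. The following is a minimal sketch of the same split done with `memcpy`, which the interleaved-fuzzing pattern below also uses. `read_unaligned` and the stand-in `divide` are hypothetical names for illustration, not part of libFuzzer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical SUT stand-in; a real target would be your own function
static uint32_t divide(uint32_t numerator, uint32_t denominator) {
    return denominator == 0 ? 0 : numerator / denominator;
}

// Copies sizeof(T) bytes starting at `offset` into a T.
// The caller must ensure offset + sizeof(T) <= size of the buffer.
template <typename T>
static T read_unaligned(const uint8_t *data, size_t offset) {
    T value;
    std::memcpy(&value, data + offset, sizeof(T));
    return value;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Ensure exactly two 4-byte numbers
    if (size != 2 * sizeof(uint32_t)) {
        return 0;
    }

    uint32_t numerator = read_unaligned<uint32_t>(data, 0);
    uint32_t denominator = read_unaligned<uint32_t>(data, sizeof(uint32_t));

    (void)divide(numerator, denominator);
    return 0;
}
```

Compilers typically turn a fixed-size `memcpy` like this into a single load, so the safer form should not cost throughput.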

Pattern: FuzzedDataProvider for Complex Inputs

Use Case: When target requires multiple strings, integers, or variable-length data

Implementation:

#include "FuzzedDataProvider.h"

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    FuzzedDataProvider fuzzed_data(data, size);

    // Extract different types of data
    size_t allocation_size = fuzzed_data.ConsumeIntegral<size_t>();

    // Consume variable-length strings with terminator
    std::vector<char> str1 = fuzzed_data.ConsumeBytesWithTerminator<char>(32, 0xFF);
    std::vector<char> str2 = fuzzed_data.ConsumeBytesWithTerminator<char>(32, 0xFF);

    char* result = concat(&str1[0], str1.size(), &str2[0], str2.size(), allocation_size);
    if (result != NULL) {
        free(result);
    }

    return 0;
}

Why it helps: FuzzedDataProvider handles the complexity of extracting structured data from a byte stream. It's particularly useful for APIs that need multiple parameters of different types.
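To make the idea concrete, here is a deliberately simplified reader that hands out typed values from a byte buffer and degrades gracefully when the input runs out. It is a sketch for illustration only: `ByteReader` is a hypothetical class, not the real FuzzedDataProvider, and the real class differs in details such as which end of the buffer integers are taken from.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified illustration only; NOT the real FuzzedDataProvider
class ByteReader {
public:
    ByteReader(const uint8_t *data, size_t size)
        : data_(data), remaining_(size) {}

    // Builds a T from the next sizeof(T) bytes; missing bytes are left as zero
    template <typename T>
    T consume_integral() {
        T value = 0;
        size_t n = std::min(sizeof(T), remaining_);
        if (n > 0) {
            std::memcpy(&value, data_, n);
        }
        advance(n);
        return value;
    }

    // Returns up to max_len bytes; shorter (possibly empty) if input runs out
    std::vector<uint8_t> consume_bytes(size_t max_len) {
        size_t n = std::min(max_len, remaining_);
        std::vector<uint8_t> out(data_, data_ + n);
        advance(n);
        return out;
    }

    size_t remaining() const { return remaining_; }

private:
    void advance(size_t n) {
        data_ += n;
        remaining_ -= n;
    }

    const uint8_t *data_;
    size_t remaining_;
};
```

The property worth copying is that every call succeeds for every input length, so the harness itself never reads out of bounds.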

Pattern: Interleaved Fuzzing

Use Case: When multiple related operations should be tested in a single harness

Implementation:

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size < 1 + 2 * sizeof(int32_t)) {
        return 0;
    }

    // First byte selects operation
    uint8_t mode = data[0];

    // Next bytes are operands
    int32_t numbers[2];
    memcpy(numbers, data + 1, 2 * sizeof(int32_t));

    int32_t result = 0;
    switch (mode % 4) {
        case 0:
            result = add(numbers[0], numbers[1]);
            break;
        case 1:
            result = subtract(numbers[0], numbers[1]);
            break;
        case 2:
            result = multiply(numbers[0], numbers[1]);
            break;
        case 3:
            result = divide(numbers[0], numbers[1]);
            break;
    }

    // Prevent compiler from optimizing away the calls
    printf("%d", result);
    return 0;
}

Advantages:

Faster to write one harness than multiple individual harnesses
Single shared corpus means interesting inputs for one operation may be interesting for others
Can discover bugs in interactions between operations
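The harness above runs one operation per input. To exercise interactions between operations, one possible variant treats the input as a sequence of (opcode, operand) pairs applied to a single stateful object. The sketch below assumes a hypothetical `Accumulator` type as the SUT; `run_ops` is likewise an illustrative name.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stateful SUT, for illustration only
struct Accumulator {
    uint32_t value = 0;
    void add(uint8_t x) { value += x; }
    void sub(uint8_t x) { value -= x; }  // unsigned wraparound is well defined
    void reset() { value = 0; }
};

// Applies every complete (opcode, operand) pair in the input and returns the
// final value so the behavior can be checked outside the fuzzer.
static uint32_t run_ops(const uint8_t *data, size_t size) {
    Accumulator acc;  // fresh state per input keeps runs deterministic
    for (size_t i = 0; i + 1 < size; i += 2) {
        uint8_t op = data[i];
        uint8_t operand = data[i + 1];
        switch (op % 3) {
            case 0: acc.add(operand); break;
            case 1: acc.sub(operand); break;
            case 2: acc.reset(); break;
        }
    }
    return acc.value;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    run_ops(data, size);
    return 0;
}
```

Creating the object inside the harness, rather than reusing one across calls, keeps each input reproducible on its own.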

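When to use: when the operations are logically related and accept the same input format, so a single corpus is meaningful for all of them.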

Pattern: Structure-Aware Fuzzing with Arbitrary (Rust)

Use Case: When fuzzing Rust code that uses custom structs

Implementation:

use arbitrary::Arbitrary;
use std::process;

#[derive(Debug, Arbitrary)]
pub struct Name {
    data: String
}

impl Name {
    pub fn check_buf(&self) {
        let data = self.data.as_bytes();
        if data.len() > 0 && data[0] == b'a' {
            if data.len() > 1 && data[1] == b'b' {
                if data.len() > 2 && data[2] == b'c' {
                    process::abort();
                }
            }
        }
    }
}

Harness with arbitrary:

#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: your_project::Name| {
    data.check_buf();
});

Add to Cargo.toml:

[dependencies]
arbitrary = { version = "1", features = ["derive"] }

Why it helps: The arbitrary crate automatically handles deserialization of raw bytes into your Rust structs, reducing boilerplate and ensuring valid struct construction.

Limitation: The arbitrary crate doesn't offer reverse serialization, so you can't manually construct byte arrays that map to specific structs. This works best when starting from an empty corpus (fine for libFuzzer, problematic for AFL++).

Advanced Usage

Tips and Tricks

Tip	Why It Helps
Start with parsers	High bug density, clear entry points, easy to harness
Mock I/O operations	Prevents hangs from blocking I/O, enables determinism
Use FuzzedDataProvider	Simplifies extraction of structured data from raw bytes
Reset global state	Ensures each iteration is independent and reproducible
Free resources in harness	Prevents memory exhaustion during long campaigns
Avoid logging in harness	Logging is slow—fuzzing needs 100s-1000s exec/sec
Test harness manually first	Run harness with known inputs before starting campaign
Check coverage early	Ensure harness reaches expected code paths
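For the "test harness manually first" tip, a libFuzzer binary can replay a single file passed on its command line. When the harness is built without the fuzzer runtime (for example inside a unit test or under a debugger), a small helper can do the same job. The sketch below is one way to write it; `load_input` and `replay_file` are hypothetical names, and the placeholder harness exists only so the example builds on its own.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Placeholder harness so the sketch builds on its own; use your real one instead
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    (void)data;
    (void)size;
    return 0;
}

// Reads a saved input (seed or crash file) into memory; false if unreadable
static bool load_input(const std::string &path, std::vector<uint8_t> &bytes) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return false;
    }
    bytes.assign(std::istreambuf_iterator<char>(in),
                 std::istreambuf_iterator<char>());
    return true;
}

// Feeds one saved input to the harness, as a single fuzzer iteration would
static bool replay_file(const std::string &path) {
    std::vector<uint8_t> bytes;
    if (!load_input(path, bytes)) {
        return false;
    }
    LLVMFuzzerTestOneInput(bytes.data(), bytes.size());
    return true;
}
```

Calling `replay_file` on each file in a seed directory before starting a campaign is a cheap way to catch harness bugs early.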

Structure-Aware Fuzzing with Protocol Buffers

For highly structured input formats, consider using Protocol Buffers as an intermediate format with custom mutators:

// Define your input format in .proto file
// Use libprotobuf-mutator to generate valid mutations
// This ensures fuzzer mutates message contents, not the protobuf encoding itself

This approach is more setup but prevents the fuzzer from wasting time on unparseable inputs. See structure-aware fuzzing documentation for details.

Handling Non-Determinism

Problem: Random values or timing dependencies cause non-reproducible crashes.

Solutions:

Replace rand() with deterministic PRNG seeded from fuzzer input:

uint32_t seed = fuzzed_data.ConsumeIntegral<uint32_t>();
srand(seed);

Mock system calls that return time, PIDs, or random data
Avoid reading from /dev/random or /dev/urandom
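Because `srand` sets process-wide state, any other code that calls `rand()` can still perturb the sequence. Where the SUT can accept a generator as a parameter, a local engine seeded from the input avoids that. The following is a sketch under that assumption; `pick_bucket` and `run_once` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>

// Hypothetical SUT that needs randomness and takes the generator as a parameter
static uint32_t pick_bucket(std::mt19937 &rng, uint32_t bucket_count) {
    return static_cast<uint32_t>(rng() % bucket_count);
}

// Seeds a local generator from the first four input bytes and runs the SUT
static uint32_t run_once(const uint8_t *data, size_t size) {
    if (size < sizeof(uint32_t)) {
        return 0;
    }
    uint32_t seed;
    std::memcpy(&seed, data, sizeof(seed));
    std::mt19937 rng(seed);  // local engine: no hidden global PRNG state
    return pick_bucket(rng, 16);
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    run_once(data, size);
    return 0;
}
```

The same input now always produces the same sequence of "random" values, so a crashing input replays identically.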

Resetting Global State

If your SUT uses global state (singletons, static variables), reset it between iterations:

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Reset global state before each iteration
    global_reset();

    target_function(data, size);

    // Clean up resources
    global_cleanup();
    return 0;
}

Rationale: Global state can cause crashes after N iterations rather than on a specific input, making bugs non-reproducible.
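The effect can be seen in a toy example. The sketch below assumes a hypothetical SUT (`process`) that fails on its third call since the last reset, whatever the input is; all names here are illustrative.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical SUT with hidden global state: it misbehaves on its third call
// since the last reset, regardless of what the input contains.
static int g_calls_since_reset = 0;

static bool process(const uint8_t *data, size_t size) {
    (void)data;
    (void)size;
    ++g_calls_since_reset;
    return g_calls_since_reset < 3;  // false stands in for a crash
}

static void global_reset() { g_calls_since_reset = 0; }

// Without a reset, the outcome depends on how many inputs ran before this one
static bool run_without_reset(const uint8_t *data, size_t size) {
    return process(data, size);
}

// With a reset, the outcome depends only on the current input
static bool run_with_reset(const uint8_t *data, size_t size) {
    global_reset();
    return process(data, size);
}
```

In the first variant the input saved by the fuzzer at the time of the failure passes when replayed alone, which is exactly the non-reproducible crash described above.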

Practical Harness Rules

Follow these rules to ensure effective fuzzing harnesses:

Rule	Rationale
Handle all input sizes	Fuzzer generates empty, tiny, huge inputs—harness must handle gracefully
Never call `exit()`	Calling `exit()` stops the fuzzer process. Use `abort()` in SUT if needed
Join all threads	Each iteration must run to completion before next iteration starts
Be fast	Aim for 100s-1000s executions/sec. Avoid logging, high complexity, excess memory
Maintain determinism	Same input must always produce same behavior for reproducibility
Avoid global state	Global state reduces reproducibility—reset between iterations if unavoidable
Use narrow targets	Don't fuzz PNG and TCP in same harness—different formats need separate targets
Free resources	Prevent memory leaks that cause resource exhaustion during long campaigns

Note: These guidelines apply not just to harness code, but to the entire SUT. If the SUT violates these rules, consider patching it (see the fuzzing obstacles technique).

Anti-Patterns

Anti-Pattern	Problem	Correct Approach
Global state without reset	Non-deterministic crashes	Reset all globals at start of harness
Blocking I/O or network calls	Hangs fuzzer, wastes time	Mock I/O, use in-memory buffers
Memory leaks in harness	Resource exhaustion kills campaign	Free all allocations before returning
Calling `exit()` in SUT	Stops entire fuzzing process	Use `abort()` or return error codes
Heavy logging in harness	Reduces exec/sec by orders of magnitude	Disable logging during fuzzing
Too many operations per iteration	Slows down fuzzer	Keep iterations fast and focused
Mixing unrelated input formats	Corpus entries not useful across formats	Separate harnesses for different formats
Not validating input size	Harness crashes on edge cases	Check `size` before accessing `data`
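As an illustration of the "mock I/O, use in-memory buffers" fix, the sketch below assumes the SUT can be written against `std::istream` instead of opening a file or socket itself. `count_lines` is a hypothetical target; the harness wraps the fuzzer bytes in a string stream so nothing blocks.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical SUT: written against std::istream rather than a file or socket
static size_t count_lines(std::istream &in) {
    size_t lines = 0;
    std::string line;
    while (std::getline(in, line)) {
        ++lines;
    }
    return lines;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Copy the fuzzer bytes into an in-memory stream; no disk or network access
    std::string bytes(data, data + size);
    std::istringstream in(bytes);
    count_lines(in);
    return 0;
}
```

If the SUT only accepts a file path or descriptor, the fuzzing-obstacles technique covers patching it to take a buffer instead.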

Tool-Specific Guidance

libFuzzer

Harness signature:

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Your code here
    return 0;  // Values other than 0 and -1 are reserved; -1 keeps the input out of the corpus
}

Compilation:

clang++ -fsanitize=fuzzer,address -g harness.cc -o fuzz_target

Integration tips:

Use FuzzedDataProvider.h for structured input extraction
Compile with -fsanitize=fuzzer to link the fuzzing runtime
Add sanitizers (-fsanitize=address,undefined) to detect more bugs
Use -g for better stack traces when crashes occur
libFuzzer can start with empty corpus—no seed inputs required

Running:

./fuzz_target corpus_dir/

AFL++

AFL++ supports multiple harness styles. For best performance, use persistent mode:

Persistent mode harness:

#include <unistd.h>

int main(int argc, char **argv) {
    #ifdef __AFL_HAVE_MANUAL_CONTROL
        __AFL_INIT();
    #endif

    unsigned char buf[MAX_SIZE];

    while (__AFL_LOOP(10000)) {
        // Read input from stdin
        ssize_t len = read(0, buf, sizeof(buf));
        if (len <= 0) break;

        // Call target function
        target_function(buf, len);
    }

    return 0;
}

Compilation:

afl-clang-fast++ -g harness.cc -o fuzz_target

Integration tips:

Use persistent mode (__AFL_LOOP) for 10-100x speedup
Consider deferred initialization (__AFL_INIT()) to skip setup overhead
AFL++ requires at least one seed input in the corpus directory
Use AFL_USE_ASAN=1 or AFL_USE_UBSAN=1 for sanitizer builds

Running:

afl-fuzz -i seeds/ -o findings/ -- ./fuzz_target

cargo-fuzz (Rust)

Harness signature:

#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    // Your code here
});

With structured input (arbitrary crate):

#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: YourStruct| {
    data.check();
});

Creating harness:

cargo fuzz init
cargo fuzz add my_target

Integration tips:

Use arbitrary crate for automatic struct deserialization
cargo-fuzz wraps libFuzzer, so all libFuzzer features work
Compile with sanitizers automatically via cargo-fuzz
Harnesses go in fuzz/fuzz_targets/ directory

Running:

cargo +nightly fuzz run my_target

go-fuzz

Harness signature:

// +build gofuzz

package mypackage

func Fuzz(data []byte) int {
    // Call target function
    target(data)

    // Return codes:
    // -1 if the input must not be added to the corpus, even if it gives new coverage
    //  1 if the fuzzer should prioritize this input (e.g., it parsed successfully)
    //  0 otherwise
    return 0
}

Building:

go-fuzz-build

Integration tips:

Return 1 to raise the priority of an input in later mutations (optional; for example, when the input parsed successfully)
Return -1 to keep an input out of the corpus even if it adds coverage
go-fuzz handles persistence automatically

Running:

go-fuzz -bin=./mypackage-fuzz.zip -workdir=fuzz

Troubleshooting

Issue	Cause	Solution
Low executions/sec	Harness is too slow (logging, I/O, complexity)	Profile harness, remove bottlenecks, mock I/O
No crashes found	Coverage not reaching buggy code	Check coverage, improve harness to reach more paths
Non-reproducible crashes	Non-determinism or global state	Remove randomness, reset globals between iterations
Fuzzer exits immediately	Harness calls `exit()`	Replace `exit()` with `abort()` or return error
Out of memory errors	Memory leaks in harness or SUT	Free allocations, use leak sanitizer to find leaks
Crashes on empty input	Harness doesn't validate size	Add `if (size < MIN_SIZE) return 0;`
Corpus not growing	Inputs too constrained or format too strict	Use FuzzedDataProvider or structure-aware fuzzing

Tools That Use This Technique

Skill	How It Applies
libfuzzer	Uses `LLVMFuzzerTestOneInput` harness signature with FuzzedDataProvider
aflpp	Supports persistent mode harnesses with `__AFL_LOOP` for performance
cargo-fuzz	Uses Rust-specific `fuzz_target!` macro with arbitrary crate integration
atheris	Python harness takes bytes, calls Python functions
ossfuzz	Requires harnesses in specific directory structure for cloud fuzzing

Skill	Relationship
coverage-analysis	Measure harness effectiveness—are you reaching target code?
address-sanitizer	Detects bugs found by harness (buffer overflows, use-after-free)
fuzzing-dictionary	Provide tokens to help fuzzer pass format checks in harness
fuzzing-obstacles	Patch SUT when it violates harness rules (exit, non-determinism)

Resources

Key External Resources

Split Inputs in libFuzzer - Google Fuzzing Docs: Explains techniques for handling multiple input parameters in a single fuzzing harness, including use of magic separators and FuzzedDataProvider.

Structure-Aware Fuzzing with Protocol Buffers: Advanced technique using protobuf as intermediate format with custom mutators to ensure fuzzer mutates message contents rather than format encoding.

libFuzzer Documentation: Official LLVM documentation covering harness requirements, best practices, and advanced features.

cargo-fuzz Book: Comprehensive guide to writing Rust fuzzing harnesses with cargo-fuzz and the arbitrary crate.

Video Resources

Effective File Format Fuzzing - Conference talk on writing harnesses for file format parsers
Modern Fuzzing of C/C++ Projects - Tutorial covering harness design patterns

/libafl

Source: `~/.claude/skills/tob-testing-handbook-skills/skills/libafl/SKILL.md`

name: libafl type: fuzzer description: > LibAFL is a modular fuzzing library for building custom fuzzers. Use for advanced fuzzing needs, custom mutators, or non-standard fuzzing targets.

LibAFL

LibAFL is a modular fuzzing library that implements features from AFL-based fuzzers like AFL++. Unlike traditional fuzzers, LibAFL provides all functionality in a modular and customizable way as a Rust library. It can be used as a drop-in replacement for libFuzzer or as a library to build custom fuzzers from scratch.

When to Use

Fuzzer	Best For	Complexity
libFuzzer	Quick setup, single-threaded	Low
AFL++	Multi-core, general purpose	Medium
LibAFL	Custom fuzzers, advanced features, research	High

Choose LibAFL when:

You need custom mutation strategies or feedback mechanisms
Standard fuzzers don't support your target architecture
You want to implement novel fuzzing techniques
You need fine-grained control over fuzzing components
You're conducting fuzzing research

Quick Start

LibAFL can be used as a drop-in replacement for libFuzzer with minimal setup:

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Call your code with fuzzer-provided data
    my_function(data, size);
    return 0;
}

Build LibAFL's libFuzzer compatibility layer:

git clone https://github.com/AFLplusplus/LibAFL
cd LibAFL/libafl_libfuzzer_runtime
./build.sh

Compile and run:

clang++ -DNO_MAIN -g -O2 -fsanitize=fuzzer-no-link libFuzzer.a harness.cc main.cc -o fuzz
./fuzz corpus/

Installation

Prerequisites

Clang/LLVM 15-18
Rust (via rustup)
Additional system dependencies

Linux/macOS

Install Clang:

apt install clang

Or install a specific version via apt.llvm.org:

wget https://apt.llvm.org/llvm.sh
chmod +x llvm.sh
sudo ./llvm.sh 15

Configure environment for Rust:

export RUSTFLAGS="-C linker=/usr/bin/clang-15"
export CC="clang-15"
export CXX="clang++-15"

Install Rust:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Install additional dependencies:

apt install libssl-dev pkg-config

For libFuzzer compatibility mode, install nightly Rust:

rustup toolchain install nightly --component llvm-tools

Verification

Build LibAFL to verify installation:

cd LibAFL/libafl_libfuzzer_runtime
./build.sh
# Should produce libFuzzer.a

Writing a Harness

LibAFL harnesses follow the same pattern as libFuzzer when using drop-in replacement mode:

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Your fuzzing target code here
    return 0;
}

When building custom fuzzers with LibAFL as a Rust library, harness logic is integrated directly into the fuzzer. See the "Writing a Custom Fuzzer" section below for the full pattern.

See Also: For detailed harness writing techniques, see the harness-writing technique skill.

Usage Modes

LibAFL supports two primary usage modes:

1. libFuzzer Drop-in Replacement

Use LibAFL as a replacement for libFuzzer with existing harnesses.

Compilation:

clang++ -DNO_MAIN -g -O2 -fsanitize=fuzzer-no-link libFuzzer.a harness.cc main.cc -o fuzz

Running:

./fuzz corpus/

Recommended for long campaigns:

./fuzz -fork=1 -ignore_crashes=1 corpus/

2. Custom Fuzzer as Rust Library

Build a fully customized fuzzer using LibAFL components.

Create project:

cargo init --lib my_fuzzer
cd my_fuzzer
cargo add libafl@0.13 libafl_targets@0.13 libafl_bolts@0.13 libafl_cc@0.13 \
  --features "libafl_targets@0.13/libfuzzer,libafl_targets@0.13/sancov_pcguard_hitcounts"

Configure Cargo.toml:

[lib]
crate-type = ["staticlib"]

Writing a Custom Fuzzer

See Also: For detailed harness writing techniques, patterns for handling complex inputs, and advanced strategies, see the harness-writing technique skill.

Fuzzer Components

A LibAFL fuzzer consists of modular components:

Observers - Collect execution feedback (coverage, timing)
Feedback - Determine if inputs are interesting
Objective - Define fuzzing goals (crashes, timeouts)
State - Maintain corpus and metadata
Mutators - Generate new inputs
Scheduler - Select which inputs to mutate
Executor - Run the target with inputs

Basic Fuzzer Structure

use libafl::prelude::*;
use libafl_bolts::prelude::*;
use libafl_targets::{libfuzzer_test_one_input, std_edges_map_observer};

#[no_mangle]
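pub extern "C" fn libafl_main() {
    // Note: `output_dir`, `input_dir`, `timeout`, and `cores` are not defined in
    // this excerpt; they are assumed to come from parsed command-line options.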
    let mut run_client = |state: Option<_>, mut restarting_mgr, _core_id| {
        // 1. Setup observers
        let edges_observer = HitcountsMapObserver::new(
            unsafe { std_edges_map_observer("edges") }
        ).track_indices();
        let time_observer = TimeObserver::new("time");

        // 2. Define feedback
        let mut feedback = feedback_or!(
            MaxMapFeedback::new(&edges_observer),
            TimeFeedback::new(&time_observer)
        );

        // 3. Define objective
        let mut objective = feedback_or_fast!(
            CrashFeedback::new(),
            TimeoutFeedback::new()
        );

        // 4. Create or restore state
        let mut state = state.unwrap_or_else(|| {
            StdState::new(
                StdRand::new(),
                InMemoryCorpus::new(),
                OnDiskCorpus::new(&output_dir).unwrap(),
                &mut feedback,
                &mut objective,
            ).unwrap()
        });

        // 5. Setup mutator
        let mutator = StdScheduledMutator::new(havoc_mutations());
        let mut stages = tuple_list!(StdMutationalStage::new(mutator));

        // 6. Setup scheduler
        let scheduler = IndexesLenTimeMinimizerScheduler::new(
            &edges_observer,
            QueueScheduler::new()
        );

        // 7. Create fuzzer
        let mut fuzzer = StdFuzzer::new(scheduler, feedback, objective);

        // 8. Define harness
        let mut harness = |input: &BytesInput| {
            let buf = input.target_bytes().as_slice();
            libfuzzer_test_one_input(buf);
            ExitKind::Ok
        };

        // 9. Setup executor
        let mut executor = InProcessExecutor::with_timeout(
            &mut harness,
            tuple_list!(edges_observer, time_observer),
            &mut fuzzer,
            &mut state,
            &mut restarting_mgr,
            timeout,
        )?;

        // 10. Load initial inputs
        if state.must_load_initial_inputs() {
            state.load_initial_inputs(
                &mut fuzzer,
                &mut executor,
                &mut restarting_mgr,
                &input_dir
            )?;
        }

        // 11. Start fuzzing
        fuzzer.fuzz_loop(&mut stages, &mut executor, &mut state, &mut restarting_mgr)?;
        Ok(())
    };

    // Launch fuzzer
    Launcher::builder()
        .run_client(&mut run_client)
        .cores(&cores)
        .build()
        .launch()
        .unwrap();
}

Compilation

Verbose Mode

Manually specify all instrumentation flags:

clang++-15 -DNO_MAIN -g -O2 \
  -fsanitize-coverage=trace-pc-guard \
  -fsanitize=address \
  -Wl,--whole-archive target/release/libmy_fuzzer.a -Wl,--no-whole-archive \
  main.cc harness.cc -o fuzz

Compiler Wrapper (Recommended)

Create a LibAFL compiler wrapper to handle instrumentation automatically.

Create src/bin/libafl_cc.rs:

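use std::env;

use libafl_cc::{ClangWrapper, CompilerWrapper, Configuration, ToolWrapper};

pub fn main() {
    let args: Vec<String> = env::args().collect();
    // `is_cpp` and `dir` are not defined in this excerpt: `is_cpp` should be true
    // when the wrapper is invoked as the C++ compiler (libafl_cxx), and `dir`
    // should be the directory that contains the built static library.
    let mut cc = ClangWrapper::new();
    cc.cpp(is_cpp)
      .parse_args(&args)
      .link_staticlib(&dir, "my_fuzzer")
      .add_args(&Configuration::GenerateCoverageMap.to_flags().unwrap())
      .add_args(&Configuration::AddressSanitizer.to_flags().unwrap())
      .run()
      .unwrap();
}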

Compile and use:

cargo build --release
target/release/libafl_cxx -DNO_MAIN -g -O2 main.cc harness.cc -o fuzz

See Also: For detailed sanitizer configuration, common issues, and advanced flags, see the address-sanitizer and undefined-behavior-sanitizer technique skills.

Running Campaigns

Basic Run

./fuzz --cores 0 --input corpus/

Multi-Core Fuzzing

./fuzz --cores 0,8-15 --input corpus/

This runs 9 clients: one on core 0, and 8 on cores 8-15.
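The count follows from expanding the spec: the single core 0 plus the inclusive range 8 through 15. The sketch below reproduces that arithmetic. It is not LibAFL's own parser; `expand_cores` is a hypothetical helper with no error handling for malformed specs.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Expands a spec such as "0,8-15" into the individual core ids.
// Illustrative only: assumes well-formed input (digits, commas, dashes).
static std::vector<int> expand_cores(const std::string &spec) {
    std::vector<int> cores;
    std::stringstream parts(spec);
    std::string part;
    while (std::getline(parts, part, ',')) {
        if (part.empty()) {
            continue;
        }
        size_t dash = part.find('-');
        if (dash == std::string::npos) {
            cores.push_back(std::stoi(part));
        } else {
            int first = std::stoi(part.substr(0, dash));
            int last = std::stoi(part.substr(dash + 1));
            for (int core = first; core <= last; ++core) {
                cores.push_back(core);  // ranges are inclusive on both ends
            }
        }
    }
    return cores;
}
```

With this helper, `expand_cores("0,8-15").size()` evaluates to 9, matching the number of clients stated above.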

With Options

./fuzz --cores 0-7 --input corpus/ --output crashes/ --timeout 1000

Text User Interface (TUI)

Enable graphical statistics view:

./fuzz -tui=1 corpus/

Interpreting Output

Output	Meaning
`corpus: N`	Number of interesting test cases found
`objectives: N`	Number of crashes/timeouts found
`executions: N`	Total number of target invocations
`exec/sec: N`	Current execution throughput
`edges: X%`	Code coverage percentage
`clients: N`	Number of parallel fuzzing processes

The fuzzer emits two main event types:

UserStats - Regular heartbeat with current statistics
Testcase - New interesting input discovered

Advanced Usage

Tips and Tricks

Tip	Why It Helps
Use `-fork=1 -ignore_crashes=1`	Continue fuzzing after first crash
Use `InMemoryOnDiskCorpus`	Persist corpus across restarts
Enable TUI with `-tui=1`	Better visualization of progress
Use specific LLVM version	Avoid compatibility issues
Set `RUSTFLAGS` correctly	Prevent linking errors

Crash Deduplication

Avoid storing duplicate crashes from the same bug:

Add backtrace observer:

let backtrace_observer = BacktraceObserver::owned(
    "BacktraceObserver",
    libafl::observers::HarnessType::InProcess
);

Update executor:

let mut executor = InProcessExecutor::with_timeout(
    &mut harness,
    tuple_list!(edges_observer, time_observer, backtrace_observer),
    &mut fuzzer,
    &mut state,
    &mut restarting_mgr,
    timeout,
)?;

Update objective with hash feedback:

let mut objective = feedback_and!(
    feedback_or_fast!(CrashFeedback::new(), TimeoutFeedback::new()),
    NewHashFeedback::new(&backtrace_observer)
);

This ensures only crashes with unique backtraces are saved.
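Conceptually, this kind of deduplication hashes each crash's backtrace and keeps a crash only when the hash has not been seen before. The sketch below shows that idea in isolation. It is not LibAFL's implementation; `CrashDeduplicator` is a hypothetical class, and FNV-1a is chosen here only because it is short.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Conceptual sketch of backtrace-based deduplication; not LibAFL code
class CrashDeduplicator {
public:
    // Returns true the first time a given backtrace is seen, false afterwards
    bool is_new(const std::vector<uintptr_t> &frames) {
        return seen_.insert(hash_frames(frames)).second;
    }

private:
    // FNV-1a over the bytes of each frame address; any stable hash would do
    static uint64_t hash_frames(const std::vector<uintptr_t> &frames) {
        uint64_t h = 14695981039346656037ULL;
        for (uintptr_t frame : frames) {
            for (size_t i = 0; i < sizeof(frame); ++i) {
                h ^= (frame >> (8 * i)) & 0xFF;
                h *= 1099511628211ULL;
            }
        }
        return h;
    }

    std::unordered_set<uint64_t> seen_;
};
```

Two inputs that crash at the same call stack produce the same hash and count once, while a crash reached through a different stack is kept as a separate finding.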

Dictionary Fuzzing

Use dictionaries to guide fuzzing toward specific tokens:

Add tokens from file:

let mut tokens = Tokens::new();
if let Some(tokenfile) = &tokenfile {
    tokens.add_from_file(tokenfile)?;
}
state.add_metadata(tokens);

Update mutator:

let mutator = StdScheduledMutator::new(
    havoc_mutations().merge(tokens_mutations())
);

Hard-coded tokens example (PNG):

state.add_metadata(Tokens::from([
    vec![137, 80, 78, 71, 13, 10, 26, 10], // PNG header
    "IHDR".as_bytes().to_vec(),
    "IDAT".as_bytes().to_vec(),
    "PLTE".as_bytes().to_vec(),
    "IEND".as_bytes().to_vec(),
]));

See Also: For detailed dictionary creation strategies and format-specific dictionaries, see the fuzzing-dictionary technique skill.

Auto Tokens

Automatically extract magic values and checksums from the program:

Enable in compiler wrapper:

cc.add_pass(LLVMPasses::AutoTokens)

Load auto tokens in fuzzer:

tokens += libafl_targets::autotokens()?;

Verify tokens section:

echo "p (uint8_t *)__token_start" | gdb fuzz

Performance Tuning

Setting	Impact
Multi-core fuzzing	Near-linear speedup with additional cores (workload dependent)
`InMemoryCorpus`	Faster but non-persistent
`InMemoryOnDiskCorpus`	Balanced speed and persistence
Sanitizers	2-5x slowdown, essential for bugs
Optimization level `-O2`	Balance between speed and coverage

Debugging Fuzzer

Run fuzzer in single-process mode for easier debugging:

// Replace launcher with direct call
run_client(None, SimpleEventManager::new(monitor), 0).unwrap();

// Comment out:
// Launcher::builder()
//     .run_client(&mut run_client)
//     ...
//     .launch()

Then debug with GDB:

gdb --args ./fuzz --cores 0 --input corpus/

Real-World Examples

Example: libpng

Fuzzing libpng using LibAFL:

1. Get source code:

curl -L -O https://downloads.sourceforge.net/project/libpng/libpng16/1.6.37/libpng-1.6.37.tar.xz
tar xf libpng-1.6.37.tar.xz
cd libpng-1.6.37/
apt install zlib1g-dev

2. Set compiler wrapper:

export FUZZER_CARGO_DIR="/path/to/libafl/project"
export CC=$FUZZER_CARGO_DIR/target/release/libafl_cc
export CXX=$FUZZER_CARGO_DIR/target/release/libafl_cxx

3. Build static library:

./configure --enable-shared=no
make

4. Get harness:

curl -O https://raw.githubusercontent.com/glennrp/libpng/f8e5fa92b0e37ab597616f554bee254157998227/contrib/oss-fuzz/libpng_read_fuzzer.cc

5. Link fuzzer:

$CXX libpng_read_fuzzer.cc .libs/libpng16.a -lz -o fuzz

6. Prepare seeds:

mkdir seeds/
curl -o seeds/input.png https://raw.githubusercontent.com/glennrp/libpng/acfd50ae0ba3198ad734e5d4dec2b05341e50924/contrib/pngsuite/iftp1n3p08.png

7. Get dictionary (optional):

curl -O https://raw.githubusercontent.com/glennrp/libpng/2fff013a6935967960a5ae626fc21432807933dd/contrib/oss-fuzz/png.dict

8. Start fuzzing:

./fuzz --input seeds/ --cores 0 -x png.dict

Example: CMake Project

Integrate LibAFL with CMake build system:

CMakeLists.txt:

cmake_minimum_required(VERSION 3.0)
project(BuggyProgram)

add_executable(buggy_program main.cc)

add_executable(fuzz main.cc harness.cc)
target_compile_definitions(fuzz PRIVATE NO_MAIN=1)
target_compile_options(fuzz PRIVATE -g -O2)

Build non-instrumented binary:

cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ .
cmake --build . --target buggy_program

Build fuzzer:

export FUZZER_CARGO_DIR="/path/to/libafl/project"
cmake -DCMAKE_C_COMPILER=$FUZZER_CARGO_DIR/target/release/libafl_cc \
      -DCMAKE_CXX_COMPILER=$FUZZER_CARGO_DIR/target/release/libafl_cxx .
cmake --build . --target fuzz

Run fuzzing:

./fuzz --input seeds/ --cores 0

Troubleshooting

Problem	Cause	Solution
No coverage increases	Instrumentation failed	Verify compiler wrapper used, check for `-fsanitize-coverage`
Fuzzer won't start	Empty corpus with no interesting inputs	Provide seed inputs that trigger code paths
Linker errors with `libafl_main`	Runtime not linked	Use `-Wl,--whole-archive` or `-u libafl_main`
LLVM version mismatch	LibAFL requires LLVM 15-18	Install compatible LLVM version, set environment variables
Rust compilation fails	Outdated Rust or Cargo	Update Rust with `rustup update`
Slow fuzzing	Sanitizers enabled	Expected 2-5x slowdown, necessary for finding bugs
Environment variable interference	`CC`, `CXX`, `RUSTFLAGS` set	Unset after building LibAFL project
Cannot attach debugger	Multi-process fuzzing	Run in single-process mode (see Debugging section)

Technique Skills

Skill	Use Case
harness-writing	Detailed guidance on writing effective harnesses
address-sanitizer	Memory error detection during fuzzing
undefined-behavior-sanitizer	Undefined behavior detection
coverage-analysis	Measuring and improving code coverage
fuzzing-corpus	Building and managing seed corpora
fuzzing-dictionary	Creating dictionaries for format-aware fuzzing

Skill	When to Consider
libfuzzer	Simpler setup, don't need LibAFL's advanced features
aflpp	Multi-core fuzzing without custom fuzzer development
cargo-fuzz	Fuzzing Rust projects with less setup

Resources

Official Documentation

LibAFL Book - Official handbook with comprehensive documentation
LibAFL GitHub - Source code and examples
LibAFL API Documentation - Rust API reference

Examples and Tutorials

LibAFL Examples - Collection of example fuzzers
cargo-fuzz with LibAFL - Using LibAFL as cargo-fuzz backend
Testing Handbook LibAFL Examples - Complete working examples from this handbook

/libfuzzer

Source: `~/.claude/skills/tob-testing-handbook-skills/skills/libfuzzer/SKILL.md`

name: libfuzzer type: fuzzer description: > Coverage-guided fuzzer built into LLVM for C/C++ projects. Use for fuzzing C/C++ code that can be compiled with Clang.

libFuzzer

libFuzzer is an in-process, coverage-guided fuzzer that is part of the LLVM project. It's the recommended starting point for fuzzing C/C++ projects due to its simplicity and integration with the LLVM toolchain. While libFuzzer has been in maintenance-only mode since late 2022, it is easier to install and use than its alternatives, has wide support, and will be maintained for the foreseeable future.

When to Use

Fuzzer	Best For	Complexity
libFuzzer	Quick setup, single-project fuzzing	Low
AFL++	Multi-core fuzzing, diverse mutations	Medium
LibAFL	Custom fuzzers, research projects	High
Honggfuzz	Hardware-based coverage	Medium

Choose libFuzzer when:

You need a simple, quick setup for C/C++ code
Project uses Clang for compilation
Single-core fuzzing is sufficient initially
Transitioning to AFL++ later is an option (harnesses are compatible)

Note: Fuzzing harnesses written for libFuzzer are compatible with AFL++, making it easy to transition if you need more advanced features like better multi-core support.

Quick Start

#include <stdint.h>
#include <stddef.h>

// Declared here for illustration; defined in target.cc
void my_target_function(const uint8_t *data, size_t size);

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Validate input if needed
    if (size < 1) return 0;

    // Call your target function with fuzzer-provided data
    my_target_function(data, size);

    return 0;
}

Compile and run:

clang++ -fsanitize=fuzzer,address -g -O2 harness.cc target.cc -o fuzz
mkdir corpus/
./fuzz corpus/

Installation

Prerequisites

LLVM/Clang compiler (includes libFuzzer)
LLVM tools for coverage analysis (optional)

Linux (Ubuntu/Debian)

apt install clang llvm

For the latest LLVM version:

# Add LLVM repository from apt.llvm.org
# Then install specific version, e.g.:
apt install clang-18 llvm-18

macOS

# Using Homebrew
brew install llvm

# Or using Nix
nix-env -i clang

Windows

Install Clang through Visual Studio. Refer to Microsoft's documentation for setup instructions.

Recommendation: If possible, fuzz on a local x86_64 VM or rent one on DigitalOcean, AWS, or Hetzner. Linux provides the best support for libFuzzer.

Verification

clang++ --version
# Should show LLVM version information

Writing a Harness

Harness Structure

The harness is the entry point for the fuzzer. libFuzzer calls the LLVMFuzzerTestOneInput function repeatedly with different inputs.

#include <stdint.h>
#include <stddef.h>
#include <string.h>  // memcpy

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // 1. Optional: Validate input size (MIN_REQUIRED_SIZE is a placeholder you define)
    if (size < MIN_REQUIRED_SIZE) {
        return 0;  // Skip inputs that are too small
    }

    // 2. Optional: Convert raw bytes to structured data
    // Example: Parse two integers from byte array.
    // memcpy avoids unaligned reads through a cast pointer (undefined behavior).
    if (size >= 2 * sizeof(uint32_t)) {
        uint32_t a, b;
        memcpy(&a, data, sizeof(a));
        memcpy(&b, data + sizeof(a), sizeof(b));
        my_function(a, b);
    }

    // 3. Call target function
    target_function(data, size);

    // 4. Return 0 (values other than 0 and -1 are reserved for future use;
    //    -1 tells recent libFuzzer versions not to add the input to the corpus)
    return 0;
}

Harness Rules

Do	Don't
Handle all input types (empty, huge, malformed)	Call `exit()` - stops fuzzing process
Join all threads before returning	Leave threads running
Keep harness fast and simple	Add excessive logging or complexity
Maintain determinism	Use random number generators or read `/dev/random`
Reset global state between runs	Rely on state from previous executions
Use narrow, focused targets	Mix unrelated data formats (PNG + TCP) in one harness

Rationale:

Speed matters: Aim for 100s-1000s executions per second per core
Reproducibility: Crashes must be reproducible after fuzzing completes
Isolation: Each execution should be independent

Using FuzzedDataProvider for Complex Inputs

For complex inputs (strings, multiple parameters), use the FuzzedDataProvider helper:

#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>  // free
#include <vector>
#include "FuzzedDataProvider.h"  // From LLVM project

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    FuzzedDataProvider fuzzed_data(data, size);

    // Extract structured data
    size_t allocation_size = fuzzed_data.ConsumeIntegral<size_t>();
    std::vector<char> str1 = fuzzed_data.ConsumeBytesWithTerminator<char>(32, 0xFF);
    std::vector<char> str2 = fuzzed_data.ConsumeBytesWithTerminator<char>(32, 0xFF);

    // Call target with extracted data (concat is the function under test)
    char* result = concat(str1.data(), str1.size(), str2.data(), str2.size(), allocation_size);
    if (result != NULL) {
        free(result);
    }

    return 0;
}

Download FuzzedDataProvider.h from the LLVM repository.

Interleaved Fuzzing

Use a single harness to test multiple related functions:

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size < 1 + 2 * sizeof(int32_t)) {
        return 0;
    }

    uint8_t mode = data[0];
    int32_t numbers[2];
    memcpy(numbers, data + 1, 2 * sizeof(int32_t));

    // Select function based on first byte
    switch (mode % 4) {
        case 0: add(numbers[0], numbers[1]); break;
        case 1: subtract(numbers[0], numbers[1]); break;
        case 2: multiply(numbers[0], numbers[1]); break;
        case 3: divide(numbers[0], numbers[1]); break;
    }

    return 0;
}
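Seed files for this harness need the same byte layout the code reads: one mode byte followed by two 32-bit integers. The following Python sketch builds one seed per mode. The helper name `make_interleaved_seed`, the operand values, and the little-endian byte order (matching a typical x86_64 target) are assumptions for illustration and are not part of the original harness.

```python
import struct

# Mode values mirror the switch statement in the harness (mode % 4).
MODES = {0: "add", 1: "subtract", 2: "multiply", 3: "divide"}


def make_interleaved_seed(mode: int, a: int, b: int) -> bytes:
    """Pack one mode byte followed by two signed 32-bit integers.

    Assumption: the fuzz target is little-endian, so the harness's memcpy
    into int32_t numbers[2] sees the values written here.
    """
    return struct.pack("<Bii", mode, a, b)


# One 9-byte seed per operation, using arbitrary example operands.
seeds = {name: make_interleaved_seed(mode, 7, 3) for mode, name in MODES.items()}
```

Writing each value of `seeds` to its own file in `corpus/` gives the fuzzer a starting input for every branch of the switch.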

See Also: For detailed harness writing techniques, patterns for handling complex inputs, structure-aware fuzzing, and protobuf-based fuzzing, see the fuzz-harness-writing technique skill.

Compilation

Basic Compilation

The key flag is -fsanitize=fuzzer, which:

Links the libFuzzer runtime (provides main function)
Enables SanitizerCoverage instrumentation for coverage tracking
Disables built-in functions like memcmp

clang++ -fsanitize=fuzzer -g -O2 harness.cc target.cc -o fuzz

Flags explained:

-fsanitize=fuzzer: Enable libFuzzer
-g: Add debug symbols (helpful for crash analysis)
-O2: Production-level optimizations (recommended for fuzzing)
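-DNO_MAIN: Defines a NO_MAIN macro so the target can compile out its own main function (libFuzzer provides main); this only helps if your code guards main with a check such as #ifndef NO_MAIN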

With Sanitizers

AddressSanitizer (recommended):

clang++ -fsanitize=fuzzer,address -g -O2 -U_FORTIFY_SOURCE harness.cc target.cc -o fuzz

Multiple sanitizers:

clang++ -fsanitize=fuzzer,address,undefined -g -O2 harness.cc target.cc -o fuzz

See Also: For detailed sanitizer configuration, common issues, ASAN_OPTIONS flags, and advanced sanitizer usage, see the address-sanitizer and undefined-behavior-sanitizer technique skills.

Build Flags

Flag	Purpose
`-fsanitize=fuzzer`	Enable libFuzzer runtime and instrumentation
`-fsanitize=address`	Enable AddressSanitizer (memory error detection)
`-fsanitize=undefined`	Enable UndefinedBehaviorSanitizer
`-fsanitize=fuzzer-no-link`	Instrument without linking fuzzer (for libraries)
`-g`	Include debug symbols
`-O2`	Production optimization level
`-U_FORTIFY_SOURCE`	Disable fortification (can interfere with ASan)

Building Static Libraries

For projects that produce static libraries:

Build the library with fuzzing instrumentation:

export CC=clang CFLAGS="-fsanitize=fuzzer-no-link -fsanitize=address"
export CXX=clang++ CXXFLAGS="$CFLAGS"
./configure --enable-shared=no
make

Link the static library with your harness:

clang++ -fsanitize=fuzzer -fsanitize=address harness.cc libmylib.a -o fuzz

CMake Integration

cmake_minimum_required(VERSION 3.0)
project(FuzzTarget)

add_executable(fuzz main.cc harness.cc)
target_compile_definitions(fuzz PRIVATE NO_MAIN=1)
target_compile_options(fuzz PRIVATE -g -O2 -fsanitize=fuzzer -fsanitize=address)
target_link_libraries(fuzz -fsanitize=fuzzer -fsanitize=address)

Build with:

cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ .
cmake --build .

Corpus Management

Creating Initial Corpus

Create a directory for the corpus (can start empty):

mkdir corpus/

Optional but recommended: Provide seed inputs (valid example files):

# For a PNG parser:
cp examples/*.png corpus/

# For a protocol parser:
cp test_packets/*.bin corpus/

Benefits of seed inputs:

Fuzzer doesn't start from scratch
Reaches valid code paths faster
Significantly improves effectiveness

Corpus Structure

The corpus directory contains:

Input files that trigger unique code paths
Minimized versions (libFuzzer automatically minimizes)
Named by the SHA-1 hash of their content (e.g., a9993e364706816aba3e25717850c26c9cd0d89d, the hash of the three bytes abc)
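Naming files by content hash means identical inputs collapse into one file. A minimal Python sketch of adding seeds under that scheme follows, assuming the name is the SHA-1 hex digest of the file content (the example name above is the SHA-1 digest of `abc`). The helper names `corpus_file_name` and `add_seed` are hypothetical.

```python
import hashlib
from pathlib import Path


def corpus_file_name(data: bytes) -> str:
    """Return the SHA-1 hex digest of the input, used here as the file name."""
    return hashlib.sha1(data).hexdigest()


def add_seed(corpus_dir: Path, data: bytes) -> Path:
    """Write data under its hash name; identical seeds map to the same file."""
    corpus_dir.mkdir(parents=True, exist_ok=True)
    path = corpus_dir / corpus_file_name(data)
    path.write_bytes(data)
    return path
```

Seeds copied in by hand keep their original names, which is harmless. The fuzzer reads any file in the directory regardless of its name.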

Corpus Minimization

libFuzzer automatically minimizes corpus entries during fuzzing. To explicitly minimize:

mkdir minimized_corpus/
./fuzz -merge=1 minimized_corpus/ corpus/

This creates a deduplicated, minimized corpus in minimized_corpus/.

See Also: For corpus creation strategies, seed selection, format-specific corpus building, and corpus maintenance, see the fuzzing-corpus technique skill.

Running Campaigns

Basic Run

./fuzz corpus/

This runs until a crash is found or you stop it (Ctrl+C).

Recommended: Continue After Crashes

./fuzz -fork=1 -ignore_crashes=1 corpus/

The -fork and -ignore_crashes flags (experimental but widely used) allow fuzzing to continue after finding crashes.

Common Options

Control input size:

./fuzz -max_len=4000 corpus/

Rule of thumb: 2x the size of minimal realistic input.
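One way to apply this rule of thumb is to derive the value from existing seeds. The sketch below treats the smallest non-empty seed file as a stand-in for the "minimal realistic input". That is an assumption, as is the helper name `suggest_max_len`. Adjust it if your smallest seed is not representative.

```python
from pathlib import Path


def suggest_max_len(seed_dir: str, factor: int = 2) -> int:
    """Return factor times the size of the smallest non-empty seed file."""
    sizes = [
        p.stat().st_size
        for p in Path(seed_dir).iterdir()
        if p.is_file() and p.stat().st_size > 0
    ]
    if not sizes:
        raise ValueError(f"no non-empty seed files found in {seed_dir}")
    return factor * min(sizes)
```

The result can be passed to the fuzzer as `-max_len=<value>`.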

Set timeout:

./fuzz -timeout=2 corpus/

Abort test cases that run longer than 2 seconds.

Use a dictionary:

./fuzz -dict=./format.dict corpus/

Close stdout/stderr (speed up fuzzing):

./fuzz -close_fd_mask=3 corpus/

See all options:

./fuzz -help=1

Multi-Core Fuzzing

Option 1: Jobs and workers (recommended):

./fuzz -jobs=4 -workers=4 -fork=1 -ignore_crashes=1 corpus/

-jobs=4: Run 4 sequential campaigns
-workers=4: Process jobs in parallel with 4 processes
Test cases are shared between jobs

Option 2: Fork mode:

./fuzz -fork=4 -ignore_crashes=1 corpus/

Note: For serious multi-core fuzzing, consider switching to AFL++, Honggfuzz, or LibAFL.

Re-executing Test Cases

Re-run a single crash:

./fuzz ./crash-a9993e364706816aba3e25717850c26c9cd0d89d

Test all inputs in a directory without fuzzing:

./fuzz -runs=0 corpus/

Interpreting Output

When fuzzing runs, you'll see statistics like:

INFO: Seed: 3517090860
INFO: Loaded 1 modules (9 inline 8-bit counters)
#2      INITED cov: 3 ft: 4 corp: 1/1b exec/s: 0 rss: 26Mb
#57     NEW    cov: 4 ft: 5 corp: 2/4b lim: 4 exec/s: 0 rss: 26Mb

Output	Meaning
`INITED`	Fuzzing initialized
`NEW`	New coverage found, added to corpus
`REDUCE`	Input minimized while keeping coverage
`cov: N`	Number of coverage edges hit
`corp: X/Yb`	Corpus size: X entries, Y total bytes
`exec/s: N`	Executions per second
`rss: NMb`	Resident memory usage
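For long campaigns it can help to turn these status lines into structured records, for example to plot `cov` over time. The following Python sketch parses the line format shown above. The function name and regular expressions are assumptions based on the sample output, not an official format definition. The sample also contains `ft:` (features, a finer-grained coverage signal) and `lim:` (the current input length limit), which the parser returns unmodified as strings.

```python
import re

# "#57     NEW    cov: 4 ft: 5 ..." -> execution count, event name, remaining fields
STATUS_RE = re.compile(r"^#(?P<execs>\d+)\s+(?P<event>[A-Za-z]+)\s+(?P<fields>.*)$")
# "key: value" pairs such as "cov: 4", "corp: 2/4b", "exec/s: 0"
FIELD_RE = re.compile(r"([\w/]+): (\S+)")


def parse_status_line(line: str):
    """Return a dict for a libFuzzer status line, or None for other lines."""
    match = STATUS_RE.match(line.strip())
    if match is None:
        return None
    record = {"execs": int(match.group("execs")), "event": match.group("event")}
    record.update(dict(FIELD_RE.findall(match.group("fields"))))
    return record
```

Lines that do not start with `#<number>`, such as the `INFO:` lines, yield `None` and can be skipped.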

On crash:

==11672== ERROR: libFuzzer: deadly signal
artifact_prefix='./'; Test unit written to ./crash-a9993e364706816aba3e25717850c26c9cd0d89d
0x61,0x62,0x63,
abc
Base64: YWJj

The crash is saved to ./crash-<hash> with the input shown in hex, UTF-8, and Base64.
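The encodings printed in the crash report describe the same bytes, so they can be cross-checked against the artifact name. The sketch below is a minimal Python example that assumes the artifact name is `crash-` followed by the SHA-1 digest of the input. This is consistent with the sample above, where the digest matches the bytes `abc`. The function names are hypothetical.

```python
import base64
import hashlib


def parse_hex_dump(line: str) -> bytes:
    """Turn a line such as '0x61,0x62,0x63,' into the bytes it encodes."""
    return bytes(int(token, 16) for token in line.split(",") if token.strip())


def check_crash_report(artifact_name: str, hex_line: str, base64_line: str) -> bytes:
    """Return the crashing input if the hex, Base64, and file name all agree."""
    data = parse_hex_dump(hex_line)
    encoded = base64_line.split(":", 1)[1].strip()  # drop the "Base64:" prefix
    if base64.b64decode(encoded) != data:
        raise ValueError("hex dump and Base64 line disagree")
    expected = "crash-" + hashlib.sha1(data).hexdigest()
    if artifact_name != expected:
        raise ValueError(f"artifact name does not match content: {expected}")
    return data
```

For the report above, `check_crash_report("crash-a9993e364706816aba3e25717850c26c9cd0d89d", "0x61,0x62,0x63,", "Base64: YWJj")` returns `b"abc"`.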

Reproducibility: Use -seed=<value> to reproduce a fuzzing campaign (single-core only).

Fuzzing Dictionary

Dictionaries help the fuzzer discover interesting inputs faster by providing hints about the input format.

Dictionary Format

Create a text file with quoted strings (one per line):

# Lines starting with '#' are comments

# Magic bytes
magic="\x89PNG"
magic2="IEND"

# Keywords
"GET"
"POST"
"Content-Type"

# Hex sequences
delimiter="\xFF\xD8\xFF"
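When dictionary entries are generated by a script, quotes, backslashes, and non-printable bytes need escaping so that each line stays a valid quoted string. The following Python sketch formats entries in the style shown above. The helper name `dict_entry` is hypothetical. The escaping rules (`\"`, `\\`, and `\xNN` for other bytes) are the ones the example file uses, applied here under the assumption that every non-printable byte should be hex-escaped.

```python
def dict_entry(value: bytes, name: str = "") -> str:
    """Format one dictionary line, optionally prefixed with name=."""
    parts = []
    for byte in value:
        char = chr(byte)
        if char in ('"', "\\"):
            parts.append("\\" + char)  # escape quote and backslash
        elif 0x20 <= byte < 0x7F:
            parts.append(char)  # printable ASCII is kept as-is
        else:
            parts.append(f"\\x{byte:02X}")  # everything else as \xNN
    quoted = '"' + "".join(parts) + '"'
    return f"{name}={quoted}" if name else quoted


# Example entries written one per line, as the format expects.
lines = [dict_entry(b"\x89PNG", "magic"), dict_entry(b"GET")]
dictionary_text = "\n".join(lines) + "\n"
```

Writing `dictionary_text` to `format.dict` produces a file that can be passed with `-dict=./format.dict`.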

Using a Dictionary

./fuzz -dict=./format.dict corpus/

Generating a Dictionary

From header files:

grep -o '".*"' header.h > header.dict

From man pages:

man curl | grep -oP '^\s*(--|-)\K\S+' | sed 's/[,.]$//' | sed 's/^/"&/; s/$/&"/' | sort -u > man.dict

From binary strings:

strings ./binary | sed 's/^/"&/; s/$/&"/' > strings.dict

Using LLMs: Ask ChatGPT or similar to generate a dictionary for your format (e.g., "Generate a libFuzzer dictionary for a JSON parser").

See Also: For advanced dictionary generation, format-specific dictionaries, and dictionary optimization strategies, see the fuzzing-dictionaries technique skill.

Coverage Analysis

While libFuzzer shows basic coverage stats (cov: N), detailed coverage analysis requires additional tools.

Source-Based Coverage

1. Recompile with coverage instrumentation:

clang++ -fsanitize=fuzzer -fprofile-instr-generate -fcoverage-mapping harness.cc target.cc -o fuzz

2. Run fuzzer to collect coverage:

LLVM_PROFILE_FILE="coverage-%p.profraw" ./fuzz -runs=10000 corpus/

3. Merge coverage data:

llvm-profdata merge -sparse coverage-*.profraw -o coverage.profdata

4. Generate coverage report:

llvm-cov show ./fuzz -instr-profile=coverage.profdata

5. Generate HTML report:

llvm-cov show ./fuzz -instr-profile=coverage.profdata -format=html > coverage.html

Improving Coverage

Tips:

Provide better seed inputs in corpus
Use dictionaries for format-aware fuzzing
Check if harness properly exercises target
Consider structure-aware fuzzing for complex formats
Run longer campaigns (days/weeks)

See Also: For detailed coverage analysis techniques, identifying coverage gaps, systematic coverage improvement, and comparing coverage across fuzzers, see the coverage-analysis technique skill.

Sanitizer Integration

AddressSanitizer (ASan)

ASan detects memory errors like buffer overflows and use-after-free bugs. Highly recommended for fuzzing.

Enable ASan:

clang++ -fsanitize=fuzzer,address -g -O2 -U_FORTIFY_SOURCE harness.cc target.cc -o fuzz

Example ASan output:

==1276163==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000c4ab1
WRITE of size 1 at 0x6020000c4ab1 thread T0
    #0 0x55555568631a in check_buf(char*, unsigned long) main.cc:13:25
    #1 0x5555556860bf in LLVMFuzzerTestOneInput harness.cc:7:3

Configure ASan with environment variables:

ASAN_OPTIONS=verbosity=1:abort_on_error=1 ./fuzz corpus/

Important flags:

verbosity=1: Show ASan is active
detect_leaks=0: Disable leak detection (leaks reported at end)
abort_on_error=1: Call abort() instead of _exit() on errors

Drawbacks:

2-4x slowdown
Requires ~20TB virtual memory (disable memory limits: -rss_limit_mb=0)
Best supported on Linux

See Also: For comprehensive ASan configuration, common pitfalls, symbolization, and combining with other sanitizers, see the address-sanitizer technique skill.

UndefinedBehaviorSanitizer (UBSan)

UBSan detects undefined behavior like integer overflow, null pointer dereference, etc.

Enable UBSan:

clang++ -fsanitize=fuzzer,undefined -g -O2 harness.cc target.cc -o fuzz

Combine with ASan:

clang++ -fsanitize=fuzzer,address,undefined -g -O2 harness.cc target.cc -o fuzz

MemorySanitizer (MSan)

MSan detects uninitialized memory reads. More complex to use (requires rebuilding all dependencies).

clang++ -fsanitize=fuzzer,memory -g -O2 harness.cc target.cc -o fuzz

Common Sanitizer Issues

Issue	Solution
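ASan slows fuzzing too much	Fuzz a build without ASan for throughput and replay the corpus against the ASan build (`-fsanitize-recover=address` only makes errors non-fatal; it does not reduce overhead)
Out of memory	Pass `-rss_limit_mb=0` to the fuzzer to disable libFuzzer's RSS limit (this is a libFuzzer flag, not an `ASAN_OPTIONS` setting)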
Stack exhaustion	Increase stack size: `ASAN_OPTIONS=stack_size=8388608`
False positives with `_FORTIFY_SOURCE`	Use `-U_FORTIFY_SOURCE` flag
MSan reports in dependencies	Rebuild all dependencies with `-fsanitize=memory`

Real-World Examples

Example 1: Fuzzing libpng

libpng is a widely-used library for reading/writing PNG images. Bugs can lead to security issues.

1. Get source code:

curl -L -O https://downloads.sourceforge.net/project/libpng/libpng16/1.6.37/libpng-1.6.37.tar.xz
tar xf libpng-1.6.37.tar.xz
cd libpng-1.6.37/

2. Install dependencies:

apt install zlib1g-dev

3. Compile with fuzzing instrumentation:

export CC=clang CFLAGS="-fsanitize=fuzzer-no-link -fsanitize=address"
export CXX=clang++ CXXFLAGS="$CFLAGS"
./configure --enable-shared=no
make

4. Get a harness (or write your own):

curl -O https://raw.githubusercontent.com/glennrp/libpng/f8e5fa92b0e37ab597616f554bee254157998227/contrib/oss-fuzz/libpng_read_fuzzer.cc

5. Prepare corpus and dictionary:

mkdir corpus/
curl -o corpus/input.png https://raw.githubusercontent.com/glennrp/libpng/acfd50ae0ba3198ad734e5d4dec2b05341e50924/contrib/pngsuite/iftp1n3p08.png
curl -O https://raw.githubusercontent.com/glennrp/libpng/2fff013a6935967960a5ae626fc21432807933dd/contrib/oss-fuzz/png.dict

6. Link and compile fuzzer:

clang++ -fsanitize=fuzzer -fsanitize=address libpng_read_fuzzer.cc .libs/libpng16.a -lz -o fuzz

7. Run fuzzing campaign:

./fuzz -close_fd_mask=3 -dict=./png.dict corpus/

Example 2: Simple Division Bug

Harness that finds a division-by-zero bug:

#include <stdint.h>
#include <stddef.h>
#include <string.h>  // memcpy

double divide(uint32_t numerator, uint32_t denominator) {
    // Bug: No check if denominator is zero
    return numerator / denominator;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size != 2 * sizeof(uint32_t)) {
        return 0;
    }

    // Copy the bytes out instead of casting the pointer (avoids unaligned reads)
    uint32_t numerator, denominator;
    memcpy(&numerator, data, sizeof(numerator));
    memcpy(&denominator, data + sizeof(numerator), sizeof(denominator));

    divide(numerator, denominator);

    return 0;
}

Compile and fuzz:

clang++ -fsanitize=fuzzer harness.cc -o fuzz
./fuzz

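The fuzzer will quickly find an input with a zero denominator. On x86-64 the integer division raises SIGFPE, which libFuzzer reports as a deadly signal. On architectures where integer division by zero does not trap, add -fsanitize=undefined so that UBSan reports it.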

Advanced Usage

Tips and Tricks

Tip	Why It Helps
Start with single-core, switch to AFL++ for multi-core	libFuzzer harnesses work with AFL++
Use dictionaries for structured formats	10-100x faster bug discovery
Close file descriptors with `-close_fd_mask=3`	Speed boost if SUT writes output
Set reasonable `-max_len`	Prevents wasted time on huge inputs
Run for days/weeks, not minutes	Coverage plateaus take time to break
Use seed corpus from test suites	Starts fuzzing from valid inputs

Structure-Aware Fuzzing

For highly structured inputs (e.g., complex protocols, file formats), use libprotobuf-mutator:

Define input structure using Protocol Buffers
libFuzzer mutates protobuf messages (structure-preserving mutations)
Harness converts protobuf to native format

See structure-aware fuzzing documentation for details.

Custom Mutators

libFuzzer allows custom mutators for specialized fuzzing:

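#include <stdint.h>
#include <stddef.h>
#include <string.h>  // memcpy

// Provided by libFuzzer: applies the built-in mutations to Data
extern "C" size_t LLVMFuzzerMutate(uint8_t *Data, size_t Size, size_t MaxSize);

extern "C" size_t LLVMFuzzerCustomMutator(uint8_t *Data, size_t Size,
                                          size_t MaxSize, unsigned int Seed) {
    // Custom mutation logic goes here. This placeholder falls back to the
    // built-in mutations and returns the new input size.
    return LLVMFuzzerMutate(Data, Size, MaxSize);
}

extern "C" size_t LLVMFuzzerCustomCrossOver(const uint8_t *Data1, size_t Size1,
                                            const uint8_t *Data2, size_t Size2,
                                            uint8_t *Out, size_t MaxOutSize,
                                            unsigned int Seed) {
    // Custom crossover logic goes here. This placeholder copies as much of
    // the first input as fits and returns the number of bytes written.
    size_t new_size = Size1 < MaxOutSize ? Size1 : MaxOutSize;
    memcpy(Out, Data1, new_size);
    return new_size;
}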

Performance Tuning

Setting	Impact
`-close_fd_mask=3`	Closes stdout/stderr, speeds up fuzzing
`-max_len=<reasonable_size>`	Avoids wasting time on huge inputs
`-timeout=<seconds>`	Detects hangs, prevents stuck executions
Disable ASan for baseline	2-4x speed boost (but misses memory bugs)
Use `-jobs` and `-workers`	Limited multi-core support
Run on Linux	Best platform support and performance

Troubleshooting

Problem	Cause	Solution
No crashes found after hours	Poor corpus, low coverage	Add seed inputs, use dictionary, check harness
Very slow executions/sec (<100)	Target too complex, excessive logging	Optimize target, use `-close_fd_mask=3`, reduce logging
Out of memory	ASan's 20TB virtual memory	Set `-rss_limit_mb=0` to disable RSS limit
Fuzzer stops after first crash	Default behavior	Use `-fork=1 -ignore_crashes=1` to continue
Can't reproduce crash	Non-determinism in harness/target	Remove random number generation, global state
Linking errors with `-fsanitize=fuzzer`	Missing libFuzzer runtime	Ensure using Clang, check LLVM installation
GCC project won't compile with Clang	GCC-specific code	Switch to AFL++ with `gcc_plugin` instead
Coverage not improving	Corpus plateau	Run longer, add dictionary, improve seeds, check coverage report
Crash without an ASan report	Binary built without ASan	Recompile with `-fsanitize=address`

Technique Skills

Skill	Use Case
fuzz-harness-writing	Detailed guidance on writing effective harnesses, structure-aware fuzzing, and FuzzedDataProvider usage
address-sanitizer	Memory error detection configuration, ASAN_OPTIONS, and troubleshooting
undefined-behavior-sanitizer	Detecting undefined behavior during fuzzing
coverage-analysis	Measuring fuzzing effectiveness and identifying untested code paths
fuzzing-corpus	Building and managing seed corpora, corpus minimization strategies
fuzzing-dictionaries	Creating format-specific dictionaries for faster bug discovery

Skill	When to Consider
aflpp	When you need serious multi-core fuzzing, or when libFuzzer coverage plateaus
honggfuzz	When you want hardware-based coverage feedback on Linux
libafl	When building custom fuzzers or conducting fuzzing research

Resources

Official Documentation

LLVM libFuzzer Documentation - Official reference
libFuzzer Tutorial by Google - Step-by-step guide
SanitizerCoverage - Coverage instrumentation details

Example Projects

OSS-Fuzz - Continuous fuzzing for open-source projects (many libFuzzer examples)
AFL++ Dictionary Collection - Reusable dictionaries

/ossfuzz

Source: `~/.claude/skills/tob-testing-handbook-skills/skills/ossfuzz/SKILL.md`

name: ossfuzz type: technique description: > OSS-Fuzz provides free continuous fuzzing for open source projects. Use when setting up continuous fuzzing infrastructure or enrolling projects.

OSS-Fuzz

OSS-Fuzz is an open-source project developed by Google that provides free distributed infrastructure for continuous fuzz testing. It streamlines the fuzzing process and facilitates simpler modifications. While only select projects are accepted into OSS-Fuzz, the project's core is open-source, allowing anyone to host their own instance for private projects.

Overview

OSS-Fuzz provides a simple CLI framework for building and starting harnesses or calculating their coverage. Additionally, OSS-Fuzz can be used as a service that hosts static web pages generated from fuzzing outputs such as coverage information.

Key Concepts

Concept	Description
helper.py	CLI script for building images, building fuzzers, and running harnesses locally
Base Images	Hierarchical Docker images providing build dependencies and compilers
project.yaml	Configuration file defining project metadata for OSS-Fuzz enrollment
Dockerfile	Project-specific image with build dependencies
build.sh	Script that builds fuzzing harnesses for your project
Criticality Score	Metric used by OSS-Fuzz team to evaluate project acceptance

When to Apply

Apply this technique when:

Setting up continuous fuzzing for an open-source project
Need distributed fuzzing infrastructure without managing servers
Want coverage reports and bug tracking integrated with fuzzing
Testing existing OSS-Fuzz harnesses locally
Reproducing crashes from OSS-Fuzz bug reports

Skip this technique when:

Project is closed-source (unless hosting your own OSS-Fuzz instance)
Project doesn't meet OSS-Fuzz's criticality score threshold
Need proprietary or specialized fuzzing infrastructure
Fuzzing simple scripts that don't warrant infrastructure

Quick Reference

Task	Command
Clone OSS-Fuzz	`git clone https://github.com/google/oss-fuzz`
Build project image	`python3 infra/helper.py build_image --pull <project>`
Build fuzzers with ASan	`python3 infra/helper.py build_fuzzers --sanitizer=address <project>`
Run specific harness	`python3 infra/helper.py run_fuzzer <project> <harness>`
Generate coverage report	`python3 infra/helper.py coverage <project>`
Check helper.py options	`python3 infra/helper.py --help`

OSS-Fuzz Project Components

OSS-Fuzz provides several publicly available tools and web interfaces:

Bug Tracker

The bug tracker allows you to:

Check bugs from specific projects (initially visible only to maintainers, later made public)
Create new issues and comment on existing ones
Search for similar bugs across all projects to understand issues

Build Status System

The build status system helps track:

Build statuses of all included projects
Date of last successful build
Build failures and their duration

Fuzz Introspector

Fuzz Introspector displays:

Coverage data for projects enrolled in OSS-Fuzz
Hit frequency for covered code
Performance analysis and blocker identification

Read this case study for examples and explanations.

Step-by-Step: Running a Single Harness

You don't need to host the whole OSS-Fuzz platform to use it. The helper script makes it easy to run individual harnesses locally.

Step 1: Clone OSS-Fuzz

git clone https://github.com/google/oss-fuzz
cd oss-fuzz
python3 infra/helper.py --help

Step 2: Build Project Image

python3 infra/helper.py build_image --pull <project-name>

This pulls the latest base images and builds the project's Docker image.

Step 3: Build Fuzzers with Sanitizers

python3 infra/helper.py build_fuzzers --sanitizer=address <project-name>

Sanitizer options:

--sanitizer=address for AddressSanitizer with LeakSanitizer
Other sanitizers available (language support varies)

Note: Fuzzers are built to build/out/<project-name>/ inside the oss-fuzz checkout. The directory contains the harness executables, dictionaries, corpus, and crash files.

Step 4: Run the Fuzzer

python3 infra/helper.py run_fuzzer <project-name> <harness-name> [<fuzzer-args>]

The helper script automatically runs any missed steps if you skip them.

Step 5: Coverage Analysis (Optional)

First, install gsutil (skip gcloud initialization).

python3 infra/helper.py build_fuzzers --sanitizer=coverage <project-name>
python3 infra/helper.py coverage <project-name>

Use --no-corpus-download to use only local corpus. The command generates and hosts a coverage report locally.

See official OSS-Fuzz documentation for details.
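The steps above can be scripted when the same project is rebuilt often. The following Python sketch only assembles the `helper.py` command lines from the steps in this section and does not run them. The function names are hypothetical, and the commands are assumed to be executed from the root of the oss-fuzz checkout.

```python
import shlex

HELPER = ["python3", "infra/helper.py"]


def local_run_commands(project, harness, sanitizer="address", fuzzer_args=()):
    """Return the build_image, build_fuzzers, and run_fuzzer commands in order."""
    return [
        HELPER + ["build_image", "--pull", project],
        HELPER + ["build_fuzzers", f"--sanitizer={sanitizer}", project],
        HELPER + ["run_fuzzer", project, harness, *fuzzer_args],
    ]


def coverage_commands(project, local_corpus_only=False):
    """Return the coverage build and report commands."""
    report = HELPER + ["coverage"]
    if local_corpus_only:
        report.append("--no-corpus-download")
    report.append(project)
    return [HELPER + ["build_fuzzers", "--sanitizer=coverage", project], report]


if __name__ == "__main__":
    for command in local_run_commands("irssi", "irssi-fuzz"):
        print(shlex.join(command))
```

Each returned list can be passed to `subprocess.run(command, check=True)` to execute the step and stop on the first failure.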

Common Patterns

Pattern: Running irssi Example

Use Case: Testing OSS-Fuzz setup with a simple enrolled project

# Clone and navigate to OSS-Fuzz
git clone https://github.com/google/oss-fuzz
cd oss-fuzz

# Build and run irssi fuzzer
python3 infra/helper.py build_image --pull irssi
python3 infra/helper.py build_fuzzers --sanitizer=address irssi
python3 infra/helper.py run_fuzzer irssi irssi-fuzz

Expected Output:

INFO:__main__:Running: docker run --rm --privileged --shm-size=2g --platform linux/amd64 -i -e FUZZING_ENGINE=libfuzzer -e SANITIZER=address -e RUN_FUZZER_MODE=interactive -e HELPER=True -v /private/tmp/oss-fuzz/build/out/irssi:/out -t gcr.io/oss-fuzz-base/base-runner run_fuzzer irssi-fuzz.
Using seed corpus: irssi-fuzz_seed_corpus.zip
/out/irssi-fuzz -rss_limit_mb=2560 -timeout=25 /tmp/irssi-fuzz_corpus -max_len=2048 < /dev/null
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 1531341664
INFO: Loaded 1 modules   (95687 inline 8-bit counters): 95687 [0x1096c80, 0x10ae247),
INFO: Loaded 1 PC tables (95687 PCs): 95687 [0x10ae248,0x1223eb8),
INFO:      719 files found in /tmp/irssi-fuzz_corpus
INFO: seed corpus: files: 719 min: 1b max: 170106b total: 367969b rss: 48Mb
#720        INITED cov: 409 ft: 1738 corp: 640/163Kb exec/s: 0 rss: 62Mb
#762        REDUCE cov: 409 ft: 1738 corp: 640/163Kb lim: 2048 exec/s: 0 rss: 63Mb L: 236/2048 MS: 2 ShuffleBytes-EraseBytes-

Pattern: Enrolling a New Project

Use Case: Adding your project to OSS-Fuzz (or private instance)

Create three files in projects/<your-project>/:

1. project.yaml - Project metadata:

homepage: "https://github.com/yourorg/yourproject"
language: c++
primary_contact: "your-email@example.com"
main_repo: "https://github.com/yourorg/yourproject"
fuzzing_engines:
  - libfuzzer
sanitizers:
  - address
  - undefined

2. Dockerfile - Build dependencies:

FROM gcr.io/oss-fuzz-base/base-builder
RUN apt-get update && apt-get install -y \
    autoconf \
    automake \
    libtool \
    pkg-config
RUN git clone --depth 1 https://github.com/yourorg/yourproject
WORKDIR yourproject
COPY build.sh $SRC/

3. build.sh - Build harnesses:

#!/bin/bash -eu
./autogen.sh
./configure --disable-shared
make -j$(nproc)

# Build harnesses
$CXX $CXXFLAGS -std=c++11 -I. \
    $SRC/yourproject/fuzz/harness.cc -o $OUT/harness \
    $LIB_FUZZING_ENGINE ./libyourproject.a

# Copy corpus and dictionary if available
cp $SRC/yourproject/fuzz/corpus.zip $OUT/harness_seed_corpus.zip
cp $SRC/yourproject/fuzz/dictionary.dict $OUT/harness.dict
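The build script above copies a prepared `corpus.zip`. If the repository only holds loose seed files, the archive can be generated first. The following Python sketch follows the `<harness>_seed_corpus.zip` naming used in `build.sh` above. The function name and the flat archive layout are assumptions for illustration.

```python
import zipfile
from pathlib import Path


def build_seed_corpus_zip(seed_dir, out_dir, harness_name) -> Path:
    """Zip every file in seed_dir into out_dir/<harness_name>_seed_corpus.zip."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    zip_path = out_path / f"{harness_name}_seed_corpus.zip"
    # Collect the seeds before creating the archive so it never includes itself.
    seeds = sorted(p for p in Path(seed_dir).iterdir() if p.is_file())
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for seed in seeds:
            archive.write(seed, arcname=seed.name)  # store without directories
    return zip_path
```

Keeping `seed_dir` and `out_dir` separate avoids mixing the generated archive with the seed files.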

Docker Images in OSS-Fuzz

Harnesses are built and executed in Docker containers. All projects share a runner image, but each project has its own build image.

Image Hierarchy

Images build on each other in this sequence:

base-image - Specific Ubuntu version
base-clang - Clang compiler; based on base-image
base-builder - Build dependencies; based on base-clang
- Language-specific variants: base-builder-go, etc.
- See infra/base-images/ in the oss-fuzz repository for the full list
Your project Docker image - Project-specific dependencies; based on base-builder or a language variant

Runner Images (Used Separately)

base-runner - Executes harnesses; based on base-clang
base-runner-debug - With debug tools; based on base-runner

Advanced Usage

Tips and Tricks

Tip	Why It Helps
Don't manually copy source code	Project Dockerfile likely already pulls latest version
Check existing projects	Browse oss-fuzz/projects for examples
Keep harnesses in separate repo	Like curl-fuzzer - cleaner organization
Use specific compiler versions	Base images provide consistent build environment
Install dependencies in Dockerfile	May require approval for OSS-Fuzz enrollment

Criticality Score

OSS-Fuzz uses a criticality score to evaluate project acceptance. See this example for how scoring works.

Projects with lower scores may still be added to private OSS-Fuzz instances.

Hosting Your Own Instance

Since OSS-Fuzz is open-source, you can host your own instance for:

Private projects not eligible for public OSS-Fuzz
Projects with lower criticality scores
Custom fuzzing infrastructure needs

Anti-Patterns

Anti-Pattern	Problem	Correct Approach
Manually pulling source in build.sh	Doesn't use latest version	Let Dockerfile handle git clone
Copying code to OSS-Fuzz repo	Hard to maintain, violates separation	Reference external harness repo
Ignoring base image versions	Build inconsistencies	Use provided base images and compilers
Skipping local testing	Wastes CI resources	Use helper.py locally before PR
Not checking build status	Unnoticed build failures	Monitor build status page regularly

Tool-Specific Guidance

libFuzzer

OSS-Fuzz primarily uses libFuzzer as the fuzzing engine for C/C++ projects.

Harness signature:

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Your fuzzing logic
    return 0;
}

Build in build.sh:

$CXX $CXXFLAGS -std=c++11 -I. \
    harness.cc -o $OUT/harness \
    $LIB_FUZZING_ENGINE ./libproject.a

Integration tips:

Use $LIB_FUZZING_ENGINE variable provided by OSS-Fuzz
Including -fsanitize=fuzzer is handled automatically; do not add it yourself
Link against static libraries when possible

AFL++

OSS-Fuzz supports AFL++ as an alternative fuzzing engine.

Enable in project.yaml:

fuzzing_engines:
  - afl
  - libfuzzer

Integration tips:

AFL++ harnesses work alongside libFuzzer harnesses
Use persistent mode for better performance
OSS-Fuzz handles engine-specific compilation flags

Atheris (Python)

For Python projects with C extensions.

Example from cbor2 integration:

Harness:

import atheris
import sys
import cbor2

@atheris.instrument_func
def TestOneInput(data):
    # Pass the raw fuzzer bytes straight to the decoder
    try:
        cbor2.loads(data)
    except (cbor2.CBORDecodeError, ValueError):
        # Expected failures on malformed input are not bugs
        pass

def main():
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()

if __name__ == "__main__":
    main()

Build in build.sh:

pip3 install .
for fuzzer in $(find $SRC -name 'fuzz_*.py'); do
  compile_python_fuzzer $fuzzer
done

Integration tips:

Use compile_python_fuzzer helper provided by OSS-Fuzz
See Continuously Fuzzing Python C Extensions blog post

Rust Projects

Enable in project.yaml:

language: rust
fuzzing_engines:
  - libfuzzer
sanitizers:
  - address  # Only AddressSanitizer supported for Rust

Build in build.sh:

cargo fuzz build -O --debug-assertions
cp fuzz/target/x86_64-unknown-linux-gnu/release/fuzz_target_1 $OUT/

Integration tips:

Rust supports only AddressSanitizer with libfuzzer
Use cargo-fuzz for local development
OSS-Fuzz handles Rust-specific compilation

Troubleshooting

Issue	Cause	Solution
Build fails with missing dependencies	Dependencies not in Dockerfile	Add `apt-get install` or equivalent in Dockerfile
Harness crashes immediately	Missing input validation	Add size checks in harness
Coverage is 0%	Harness not reaching target code	Verify harness actually calls target functions
Build timeout	Complex build process	Optimize build.sh, consider parallel builds
Sanitizer errors in build	Incompatible flags	Use flags provided by OSS-Fuzz environment variables
Cannot find source code	Wrong working directory in Dockerfile	Set WORKDIR or use absolute paths

Tools That Use This Technique

Skill	How It Applies
libfuzzer	Primary fuzzing engine used by OSS-Fuzz
aflpp	Alternative fuzzing engine supported by OSS-Fuzz
atheris	Used for fuzzing Python projects in OSS-Fuzz
cargo-fuzz	Used for Rust projects in OSS-Fuzz

Skill	Relationship
coverage-analysis	OSS-Fuzz generates coverage reports via helper.py
address-sanitizer	Default sanitizer for OSS-Fuzz projects
fuzz-harness-writing	Essential for enrolling projects in OSS-Fuzz
corpus-management	OSS-Fuzz maintains corpus for enrolled projects

Resources

Key External Resources

OSS-Fuzz Official Documentation Comprehensive documentation covering enrollment, harness writing, and troubleshooting for the OSS-Fuzz platform.

Getting Started Guide Step-by-step process for enrolling new projects into OSS-Fuzz, including requirements and approval process.

cbor2 OSS-Fuzz Integration PR Real-world example of enrolling a Python project with C extensions into OSS-Fuzz. Shows:

Initial proposal and project introduction
Criticality score evaluation
Complete implementation (project.yaml, Dockerfile, build.sh, harnesses)

Fuzz Introspector Case Studies Examples and explanations of using Fuzz Introspector to analyze coverage and identify fuzzing blockers.

Video Resources

Check OSS-Fuzz documentation for workshop recordings and tutorials on enrollment and harness development.

/ruzzy

Source: `~/.claude/skills/tob-testing-handbook-skills/skills/ruzzy/SKILL.md`

name: ruzzy type: fuzzer description: > Ruzzy is a coverage-guided Ruby fuzzer by Trail of Bits. Use for fuzzing pure Ruby code and Ruby C extensions.

Ruzzy

Ruzzy is a coverage-guided fuzzer for Ruby built on libFuzzer. It enables fuzzing both pure Ruby code and Ruby C extensions with sanitizer support for detecting memory corruption and undefined behavior.

When to Use

Ruzzy is currently the only production-ready coverage-guided fuzzer for Ruby.

Choose Ruzzy when:

Fuzzing Ruby applications or libraries
Testing Ruby C extensions for memory safety issues
You need coverage-guided fuzzing for Ruby code
Working with Ruby gems that have native extensions

Quick Start

Set up environment:

export ASAN_OPTIONS="allocator_may_return_null=1:detect_leaks=0:use_sigaltstack=0"

Test with the included toy example:

LD_PRELOAD=$(ruby -e 'require "ruzzy"; print Ruzzy::ASAN_PATH') \
    ruby -e 'require "ruzzy"; Ruzzy.dummy'

This should quickly find a crash demonstrating that Ruzzy is working correctly.

Installation

Platform Support

Ruzzy supports Linux x86-64 and AArch64/ARM64. For macOS or Windows, use the Dockerfile or development environment.

Prerequisites

Linux x86-64 or AArch64/ARM64
Recent version of clang (tested back to 14.0.0, latest release recommended)
Ruby with gem installed

Installation Command

Install Ruzzy with clang compiler flags:

MAKE="make --environment-overrides V=1" \
CC="/path/to/clang" \
CXX="/path/to/clang++" \
LDSHARED="/path/to/clang -shared" \
LDSHAREDXX="/path/to/clang++ -shared" \
    gem install ruzzy

Environment variables explained:

MAKE: Overrides make to respect subsequent environment variables
CC, CXX, LDSHARED, LDSHAREDXX: Ensure proper clang binaries are used for latest features

Troubleshooting Installation

If installation fails, enable debug output:

RUZZY_DEBUG=1 gem install --verbose ruzzy

Verification

Verify installation by running the toy example (see Quick Start section).

Writing a Harness

Fuzzing Pure Ruby Code

Pure Ruby fuzzing requires two scripts due to Ruby interpreter implementation details.

Tracer script (test_tracer.rb):

# frozen_string_literal: true

require 'ruzzy'

Ruzzy.trace('test_harness.rb')

Harness script (test_harness.rb):

# frozen_string_literal: true

require 'ruzzy'

def fuzzing_target(input)
  # Your code to fuzz here
  if input.length == 4
    if input[0] == 'F'
      if input[1] == 'U'
        if input[2] == 'Z'
          if input[3] == 'Z'
            raise
          end
        end
      end
    end
  end
end

test_one_input = lambda do |data|
  fuzzing_target(data)
  return 0
end

Ruzzy.fuzz(test_one_input)

Run with:

LD_PRELOAD=$(ruby -e 'require "ruzzy"; print Ruzzy::ASAN_PATH') \
    ruby test_tracer.rb

Fuzzing Ruby C Extensions

C extensions can be fuzzed with a single harness file; no tracer script is needed.

Example harness for msgpack (fuzz_msgpack.rb):

# frozen_string_literal: true

require 'msgpack'
require 'ruzzy'

test_one_input = lambda do |data|
  begin
    MessagePack.unpack(data)
  rescue Exception
    # We're looking for memory corruption, not Ruby exceptions
  end
  return 0
end

Ruzzy.fuzz(test_one_input)

Run with:

LD_PRELOAD=$(ruby -e 'require "ruzzy"; print Ruzzy::ASAN_PATH') \
    ruby fuzz_msgpack.rb

Harness Rules

Do	Don't
Catch Ruby exceptions if testing C extensions	Let Ruby exceptions crash the fuzzer
Return 0 from test_one_input lambda	Return other values
Keep harness deterministic	Use randomness or time-based logic
Use tracer script for pure Ruby	Skip tracer for pure Ruby code

See Also: For detailed harness writing techniques, patterns for handling complex inputs, and advanced strategies, see the fuzz-harness-writing technique skill.

Compilation

Installing Gems with Sanitizers

When installing Ruby gems with C extensions for fuzzing, compile with sanitizer flags:

MAKE="make --environment-overrides V=1" \
CC="/path/to/clang" \
CXX="/path/to/clang++" \
LDSHARED="/path/to/clang -shared" \
LDSHAREDXX="/path/to/clang++ -shared" \
CFLAGS="-fsanitize=address,fuzzer-no-link -fno-omit-frame-pointer -fno-common -fPIC -g" \
CXXFLAGS="-fsanitize=address,fuzzer-no-link -fno-omit-frame-pointer -fno-common -fPIC -g" \
    gem install <gem-name>

Build Flags

Flag	Purpose
`-fsanitize=address,fuzzer-no-link`	Enable AddressSanitizer and fuzzer instrumentation
`-fno-omit-frame-pointer`	Improve stack trace quality
`-fno-common`	Better compatibility with sanitizers
`-fPIC`	Position-independent code for shared libraries
`-g`	Include debug symbols

Running Campaigns

Environment Setup

Before running any fuzzing campaign, set ASAN_OPTIONS:

export ASAN_OPTIONS="allocator_may_return_null=1:detect_leaks=0:use_sigaltstack=0"

Options explained:

allocator_may_return_null=1: Skip common low-impact allocation failures (DoS)
detect_leaks=0: The Ruby interpreter itself leaks memory, so leak reports are ignored for now
use_sigaltstack=0: Ruby recommends disabling sigaltstack with ASan

Basic Run

LD_PRELOAD=$(ruby -e 'require "ruzzy"; print Ruzzy::ASAN_PATH') \
    ruby harness.rb

Note: LD_PRELOAD is required for sanitizer injection. Unlike ASAN_OPTIONS, do not export it as it may interfere with other programs.
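
When a campaign is launched from a wrapper script rather than a shell, the same "set inline, do not export" rule can be kept by building a per-process environment. The following is a minimal sketch under that assumption; `build_fuzz_env` is a hypothetical helper, and the library path passed to it is expected to be the value printed by Ruzzy::ASAN_PATH.

```python
import os

# Recommended ASan settings from the Environment Setup section.
ASAN_OPTIONS = {
    "allocator_may_return_null": "1",
    "detect_leaks": "0",
    "use_sigaltstack": "0",
}


def build_fuzz_env(asan_path, base_env=None):
    """Return a copy of the environment with ASAN_OPTIONS and LD_PRELOAD set.

    The caller's own environment is not modified, which mirrors setting
    LD_PRELOAD inline instead of exporting it.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ASAN_OPTIONS"] = ":".join(f"{k}={v}" for k, v in ASAN_OPTIONS.items())
    env["LD_PRELOAD"] = asan_path
    return env
```

The returned mapping can be passed as the `env` argument of `subprocess.run(["ruby", "harness.rb"], env=...)`, so only the fuzzing process sees LD_PRELOAD.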

With Corpus

LD_PRELOAD=$(ruby -e 'require "ruzzy"; print Ruzzy::ASAN_PATH') \
    ruby harness.rb /path/to/corpus

Passing libFuzzer Options

All libFuzzer options can be passed as arguments:

LD_PRELOAD=$(ruby -e 'require "ruzzy"; print Ruzzy::ASAN_PATH') \
    ruby harness.rb /path/to/corpus -max_len=1024 -timeout=10

See libFuzzer options for full reference.

Reproducing Crashes

Re-run a crash case by passing the crash file:

LD_PRELOAD=$(ruby -e 'require "ruzzy"; print Ruzzy::ASAN_PATH') \
    ruby harness.rb ./crash-253420c1158bc6382093d409ce2e9cff5806e980

Interpreting Output

Output	Meaning
`INFO: Running with entropic power schedule`	Fuzzing campaign started
`ERROR: AddressSanitizer: heap-use-after-free`	Memory corruption detected
`SUMMARY: libFuzzer: fuzz target exited`	Ruby exception occurred
`artifact_prefix='./'; Test unit written to ./crash-*`	Crash input saved
`Base64: ...`	Base64 encoding of crash input
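
For long campaigns it can help to triage saved logs automatically. The sketch below matches the marker strings from the table against log lines; the label names and the `summarize` helper are illustrative assumptions, not part of Ruzzy or libFuzzer.

```python
# Marker substrings taken from the table above, mapped to a short label.
MARKERS = [
    ("ERROR: AddressSanitizer:", "memory-corruption"),
    ("SUMMARY: libFuzzer: fuzz target exited", "ruby-exception"),
    ("Test unit written to", "crash-saved"),
    ("INFO: Running with entropic power schedule", "campaign-started"),
]


def classify_line(line):
    """Return the label of the first marker found in the line, or None."""
    for marker, label in MARKERS:
        if marker in line:
            return label
    return None


def summarize(log_text):
    """Collect the distinct labels seen in a fuzzer log, in order of appearance."""
    seen = []
    for line in log_text.splitlines():
        label = classify_line(line)
        if label is not None and label not in seen:
            seen.append(label)
    return seen
```

A log that yields "ruby-exception" without "memory-corruption" usually points at an uncaught Ruby exception in the harness rather than a bug in a C extension.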

Sanitizer Integration

AddressSanitizer (ASan)

Ruzzy includes a pre-compiled AddressSanitizer library:

LD_PRELOAD=$(ruby -e 'require "ruzzy"; print Ruzzy::ASAN_PATH') \
    ruby harness.rb

Use ASan for detecting:

Heap buffer overflows
Stack buffer overflows
Use-after-free
Double-free
Memory leaks (turned off by the recommended detect_leaks=0 setting)

UndefinedBehaviorSanitizer (UBSan)

Ruzzy also includes UBSan:

LD_PRELOAD=$(ruby -e 'require "ruzzy"; print Ruzzy::UBSAN_PATH') \
    ruby harness.rb

Use UBSan for detecting:

Signed integer overflow
Null pointer dereferences
Misaligned memory access
Division by zero

Common Sanitizer Issues

Issue	Solution
Ruby interpreter leak warnings	Use `ASAN_OPTIONS=detect_leaks=0`
Sigaltstack conflicts	Use `ASAN_OPTIONS=use_sigaltstack=0`
Allocation failure spam	Use `ASAN_OPTIONS=allocator_may_return_null=1`
LD_PRELOAD interferes with tools	Don't export it; set inline with ruby command

See Also: For detailed sanitizer configuration, common issues, and advanced flags, see the address-sanitizer and undefined-behavior-sanitizer technique skills.

Real-World Examples

Example: msgpack-ruby

Fuzzing the msgpack MessagePack parser for memory corruption.

Install with sanitizers:

MAKE="make --environment-overrides V=1" \
CC="/path/to/clang" \
CXX="/path/to/clang++" \
LDSHARED="/path/to/clang -shared" \
LDSHAREDXX="/path/to/clang++ -shared" \
CFLAGS="-fsanitize=address,fuzzer-no-link -fno-omit-frame-pointer -fno-common -fPIC -g" \
CXXFLAGS="-fsanitize=address,fuzzer-no-link -fno-omit-frame-pointer -fno-common -fPIC -g" \
    gem install msgpack

Harness (fuzz_msgpack.rb):

# frozen_string_literal: true

require 'msgpack'
require 'ruzzy'

test_one_input = lambda do |data|
  begin
    MessagePack.unpack(data)
  rescue Exception
    # We're looking for memory corruption, not Ruby exceptions
  end
  return 0
end

Ruzzy.fuzz(test_one_input)

Run:

export ASAN_OPTIONS="allocator_may_return_null=1:detect_leaks=0:use_sigaltstack=0"
LD_PRELOAD=$(ruby -e 'require "ruzzy"; print Ruzzy::ASAN_PATH') \
    ruby fuzz_msgpack.rb

Example: Pure Ruby Target

Fuzzing pure Ruby code with a custom parser.

Tracer (test_tracer.rb):

# frozen_string_literal: true

require 'ruzzy'

Ruzzy.trace('test_harness.rb')

Harness (test_harness.rb):

# frozen_string_literal: true

require 'ruzzy'
require_relative 'my_parser'

test_one_input = lambda do |data|
  begin
    MyParser.parse(data)
  rescue StandardError
    # Expected exceptions from malformed input
  end
  return 0
end

Ruzzy.fuzz(test_one_input)

Run:

export ASAN_OPTIONS="allocator_may_return_null=1:detect_leaks=0:use_sigaltstack=0"
LD_PRELOAD=$(ruby -e 'require "ruzzy"; print Ruzzy::ASAN_PATH') \
    ruby test_tracer.rb

Troubleshooting

Problem	Cause	Solution
Installation fails	Wrong clang version or path	Verify clang path, use clang 14.0.0+
`cannot open shared object file`	LD_PRELOAD not set	Set LD_PRELOAD inline with ruby command
Fuzzer immediately exits	Missing corpus directory	Create corpus directory or pass as argument
No coverage progress	Pure Ruby needs tracer	Use tracer script for pure Ruby code
Leak detection spam	Ruby interpreter leaks	Set `ASAN_OPTIONS=detect_leaks=0`
Installation debug needed	Compilation errors	Use `RUZZY_DEBUG=1 gem install --verbose ruzzy`

Technique Skills

Skill	Use Case
fuzz-harness-writing	Detailed guidance on writing effective harnesses
address-sanitizer	Memory error detection during fuzzing
undefined-behavior-sanitizer	Detecting undefined behavior in C extensions
libfuzzer	Understanding libFuzzer options (Ruzzy is built on libFuzzer)

Skill	When to Consider
libfuzzer	When fuzzing Ruby C extension code directly in C/C++
aflpp	Alternative approach for fuzzing Ruby by instrumenting Ruby interpreter

Resources

Key External Resources

Introducing Ruzzy, a coverage-guided Ruby fuzzer Official Trail of Bits blog post announcing Ruzzy, covering motivation, architecture, and initial results.

Ruzzy GitHub Repository Source code, additional examples, and development instructions.

libFuzzer Documentation Since Ruzzy is built on libFuzzer, understanding libFuzzer options and behavior is valuable.

Fuzzing Ruby C extensions Detailed guide on fuzzing C extensions with compilation flags and examples.

Fuzzing pure Ruby code Detailed guide on the tracer pattern required for pure Ruby fuzzing.

/testing-handbook-generator

Source: `~/.claude/skills/tob-testing-handbook-skills/skills/testing-handbook-generator/SKILL.md`

name: testing-handbook-generator description: > Meta-skill that analyzes the Trail of Bits Testing Handbook (appsec.guide) and generates Claude Code skills for security testing tools and techniques. Use when creating new skills based on handbook content.

Testing Handbook Skill Generator

Generate and maintain Claude Code skills from the Trail of Bits Testing Handbook.

When to Use

Invoke this skill when:

Creating new security testing skills from handbook content
User mentions "testing handbook", "appsec.guide", or asks about generating skills
Bulk skill generation or refresh is needed

Do NOT use for:

General security testing questions (use the generated skills)
Non-handbook skill creation

Handbook Location

The skill needs the Testing Handbook repository. See discovery.md for full details.

Quick reference: Check ./testing-handbook, ../testing-handbook, ~/testing-handbook → ask user → clone as last resort.
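
The lookup order in the quick reference could be sketched as follows. `find_handbook` is a hypothetical helper; it treats a directory as the handbook only if it contains content/docs, which is the layout this skill scans later. A None result corresponds to the "ask user, then clone" fallback.

```python
from pathlib import Path

# Candidate locations from the quick reference, in the order they are checked.
CANDIDATES = ["./testing-handbook", "../testing-handbook", "~/testing-handbook"]


def find_handbook(candidates=CANDIDATES, marker="content/docs"):
    """Return the first candidate directory that contains `marker`, else None."""
    for candidate in candidates:
        path = Path(candidate).expanduser()
        if (path / marker).is_dir():
            return path.resolve()
    return None
```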

Repository: https://github.com/trailofbits/testing-handbook

Workflow Overview

Phase 0: Setup              Phase 1: Discovery
┌─────────────────┐        ┌─────────────────┐
│ Locate handbook │   →    │ Analyze handbook│
│ - Find or clone │        │ - Scan sections │
│ - Confirm path  │        │ - Classify types│
└─────────────────┘        └─────────────────┘
         ↓                          ↓
Phase 3: Generation        Phase 2: Planning
┌─────────────────┐        ┌─────────────────┐
│ TWO-PASS GEN    │   ←    │ Generate plan   │
│ Pass 1: Content │        │ - New skills    │
│ Pass 2: X-refs  │        │ - Updates       │
│ - Write to gen/ │        │ - Present user  │
└─────────────────┘        └─────────────────┘
         ↓
Phase 4: Testing           Phase 5: Finalize
┌─────────────────┐        ┌─────────────────┐
│ Validate skills │   →    │ Post-generation │
│ - Run validator │        │ - Update README │
│ - Test activation│       │ - Update X-refs │
│ - Fix issues    │        │ - Self-improve  │
└─────────────────┘        └─────────────────┘

Scope Restrictions

ONLY modify these locations:

plugins/testing-handbook-skills/skills/[skill-name]/* - Generated skills (as siblings to testing-handbook-generator)
plugins/testing-handbook-skills/skills/testing-handbook-generator/* - Self-improvement
Repository root README.md - Add generated skills to table

NEVER modify or analyze:

Other plugins (plugins/property-based-testing/, plugins/static-analysis/, etc.)
Other skills outside this plugin

Do not scan or pull into context any skills outside of testing-handbook-skills/. Generate skills based solely on handbook content and resources referenced from it.

Quick Reference

Section → Skill Type Mapping

Handbook Section	Skill Type	Template
`/static-analysis/[tool]/`	Tool Skill	tool-skill.md
`/fuzzing/[lang]/[fuzzer]/`	Fuzzer Skill	fuzzer-skill.md
`/fuzzing/techniques/`	Technique Skill	technique-skill.md
`/crypto/[tool]/`	Domain Skill	domain-skill.md
`/web/[tool]/`	Tool Skill	tool-skill.md
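
As an illustration, the mapping table can be expressed as an ordered list of patterns. This is a sketch, not the generator's actual logic: the regular expressions are an assumed reading of the bracketed placeholders, and `classify_section` is a hypothetical name.

```python
import re

# Patterns transcribed from the mapping table. The first match wins, so the
# specific /fuzzing/techniques/ rule is listed before the generic fuzzer rule.
SECTION_RULES = [
    (r"^/static-analysis/[^/]+/", "tool", "tool-skill.md"),
    (r"^/fuzzing/techniques/", "technique", "technique-skill.md"),
    (r"^/fuzzing/[^/]+/[^/]+/", "fuzzer", "fuzzer-skill.md"),
    (r"^/crypto/[^/]+/", "domain", "domain-skill.md"),
    (r"^/web/[^/]+/", "tool", "tool-skill.md"),
]


def classify_section(section_path):
    """Return (skill_type, template) for a handbook section path, or None."""
    for pattern, skill_type, template in SECTION_RULES:
        if re.match(pattern, section_path):
            return skill_type, template
    return None
```

Ordering matters here: a path under /fuzzing/techniques/ with a further subdirectory would otherwise also match the generic fuzzer pattern.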

Skill Candidate Signals

Signal	Indicates
`_index.md` with `bookCollapseSection: true`	Major tool/topic
Numbered files (00-, 10-, 20-)	Structured content
`techniques/` subsection	Methodology content
`99-resources.md` or `91-resources.md`	Has external links

Exclusion Signals

Signal	Action
`draft: true` in frontmatter	Skip section
Empty directory	Skip section
Template/placeholder file	Skip section
GUI-only tool (e.g., `web/burp/`)	Skip section (Claude cannot operate GUI tools)
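
A rough sketch of how some of these exclusion signals might be checked is shown below. It rests on several assumptions: frontmatter is a leading YAML block delimited by `---`, "empty" means a directory with no Markdown files, and the GUI-only set contains just the example from the table. Template or placeholder detection is not covered, and the function names are hypothetical.

```python
from pathlib import Path

# GUI-only sections named in the table; extend as needed.
GUI_ONLY = {"web/burp"}


def has_draft_flag(markdown_text):
    """Check for `draft: true` inside a leading `---` frontmatter block."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if line.replace(" ", "").lower() == "draft:true":
            return True
    return False


def should_skip(section_dir, docs_root):
    """Return a reason string if the section should be skipped, else None."""
    section_dir, docs_root = Path(section_dir), Path(docs_root)
    if section_dir.relative_to(docs_root).as_posix() in GUI_ONLY:
        return "gui-only"
    if not list(section_dir.glob("*.md")):
        return "empty"
    index = section_dir / "_index.md"
    if index.is_file() and has_draft_flag(index.read_text(encoding="utf-8")):
        return "draft"
    return None
```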

Decision Tree

Starting skill generation?

├─ Need to analyze handbook and build plan?
│  └─ Read: discovery.md
│     (Handbook analysis methodology, plan format)
│
├─ Spawning skill generation agents?
│  └─ Read: agent-prompt.md
│     (Full prompt template, variable reference, validation checklist)
│
├─ Generating a specific skill type?
│  └─ Read appropriate template:
│     ├─ Tool (Semgrep, CodeQL) → templates/tool-skill.md
│     ├─ Fuzzer (libFuzzer, AFL++) → templates/fuzzer-skill.md
│     ├─ Technique (harness, coverage) → templates/technique-skill.md
│     └─ Domain (crypto, web) → templates/domain-skill.md
│
├─ Validating generated skills?
│  └─ Run: scripts/validate-skills.py
│     Then read: testing.md for activation testing
│
├─ Finalizing after generation?
│  └─ See: Post-Generation Tasks below
│     (Update main README, update Skills Cross-Reference, self-improvement)
│
└─ Quick generation from specific section?
   └─ Use Quick Reference above, apply template directly

Two-Pass Generation (Phase 3)

Generation uses a two-pass approach to solve forward reference problems (skills referencing other skills that don't exist yet).

Pass 1: Content Generation (Parallel)

Generate all skills in parallel without the Related Skills section:

Pass 1 - Generating 5 skills in parallel:
├─ Agent 1: libfuzzer (fuzzer) → skills/libfuzzer/SKILL.md
├─ Agent 2: aflpp (fuzzer) → skills/aflpp/SKILL.md
├─ Agent 3: semgrep (tool) → skills/semgrep/SKILL.md
├─ Agent 4: harness-writing (technique) → skills/harness-writing/SKILL.md
└─ Agent 5: wycheproof (domain) → skills/wycheproof/SKILL.md

Each agent uses: pass=1 (content only, Related Skills left empty)

Pass 1 agents:

Generate all sections EXCEPT Related Skills
Leave a placeholder: ## Related Skills\n\n
Output report includes references: DEFERRED

Pass 2: Cross-Reference Population (Sequential)

After all Pass 1 agents complete, run Pass 2 to populate Related Skills:

Pass 2 - Populating cross-references:
├─ Read all generated skill names from skills/*/SKILL.md
├─ For each skill, determine related skills based on:
│   ├─ related_sections from discovery (handbook structure)
│   ├─ Skill type relationships (fuzzers → techniques)
│   └─ Explicit mentions in content
└─ Update each SKILL.md's Related Skills section

Pass 2 process:

Collect all generated skill names: ls -d skills/*/SKILL.md
For each skill, identify related skills using the mapping from discovery
Edit each SKILL.md to replace the placeholder with actual links
Validate cross-references exist (no broken links)
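
The collection and validation steps of Pass 2 could look roughly like this. It is only a sketch and not the bundled validator: it assumes related skills are written as bold names (`**name**`) under a `## Related Skills` heading, so the pattern would need adjusting to the real link format.

```python
import re
from pathlib import Path


def collect_skill_names(skills_root):
    """Return the set of directory names that contain a SKILL.md."""
    return {p.parent.name for p in Path(skills_root).glob("*/SKILL.md")}


def broken_references(skills_root):
    """Map each skill to the Related Skills names that do not exist on disk."""
    names = collect_skill_names(skills_root)
    broken = {}
    for name in sorted(names):
        text = (Path(skills_root) / name / "SKILL.md").read_text(encoding="utf-8")
        _, _, related = text.partition("## Related Skills")
        # Only look at text up to the next level-2 heading, if any.
        related = re.split(r"^## ", related, flags=re.MULTILINE)[0]
        missing = sorted(set(re.findall(r"\*\*([a-z0-9-]+)\*\*", related)) - names)
        if missing:
            broken[name] = missing
    return broken
```

An empty result means every referenced skill has a matching directory; any entry names the skill whose placeholder was filled with a dangling reference.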

Agent Prompt Template

See agent-prompt.md for the full prompt template with:

Variable substitution reference (including pass variable)
Pre-write validation checklist
Hugo shortcode conversion rules
Line count splitting rules
Error handling guidance
Output report format

Collecting Results

After Pass 1: Aggregate output reports, verify all skills generated. After Pass 2: Run validator to check cross-references.

Handling Agent Failures

If an agent fails or produces invalid output:

Failure Type	Detection	Recovery Action
Agent crashed	No output report	Re-run single agent with same inputs
Validation failed	Output report shows errors	Check gaps/warnings, manually patch or re-run
Wrong skill type	Content doesn't match template	Re-run with corrected `type` parameter
Missing content	Output report lists gaps	Accept if minor, or provide additional `related_sections`
Pass 2 broken ref	Validator shows missing skill	Check if skill was skipped, update reference

Important: Do NOT re-run the entire parallel batch for a single agent failure. Fix individual failures independently.

Single-Skill Regeneration

To regenerate a single skill without re-running the entire batch:

# Regenerate single skill (Pass 1 - content only)
"Use testing-handbook-generator to regenerate the {skill-name} skill from section {section_path}"

# Example:
"Use testing-handbook-generator to regenerate the libfuzzer skill from section fuzzing/c-cpp/10-libfuzzer"

Regeneration workflow:

Re-read the handbook section for fresh content
Apply the appropriate template
Write to skills/{skill-name}/SKILL.md (overwrites existing)
Re-run Pass 2 for that skill only to update cross-references
Run validator on the single skill: uv run scripts/validate-skills.py --skill {skill-name}

Output Location

Generated skills are written to:

skills/[skill-name]/SKILL.md

Each skill gets its own directory for potential supporting files (as siblings to testing-handbook-generator).

Quality Checklist

Before delivering generated skills:

Post-Generation Tasks

1. Update Main README

After generating skills, update the repository's main README.md to list them.

Format: Add generated skills to the same "Available Plugins" table, directly after testing-handbook-skills. Use plain text testing-handbook-generator as the author (no link).

Example:

| Plugin | Description | Author |
|--------|-------------|--------|
| ... other plugins ... |
| [testing-handbook-skills](plugins/testing-handbook-skills/) | Meta-skill that generates skills from the Testing Handbook | Paweł Płatek |
| [libfuzzer](plugins/testing-handbook-skills/skills/libfuzzer/) | Coverage-guided fuzzing with libFuzzer for C/C++ | testing-handbook-generator |
| [aflpp](plugins/testing-handbook-skills/skills/aflpp/) | Multi-core fuzzing with AFL++ | testing-handbook-generator |
| [semgrep](plugins/testing-handbook-skills/skills/semgrep/) | Fast static analysis for finding bugs | testing-handbook-generator |

2. Update Skills Cross-Reference

After generating skills, update the README.md's Skills Cross-Reference section with the mermaid graph showing skill relationships.

Process:

Read each generated skill's SKILL.md and extract its ## Related Skills section
Build the mermaid graph with nodes grouped by skill type (Fuzzers, Techniques, Tools, Domain)
Add edges based on the Related Skills relationships:
- Solid arrows (-->) for primary technique dependencies
- Dashed arrows (-.->) for alternative tool suggestions
Replace the existing mermaid code block in README.md

Edge classification:

Relationship	Arrow Style	Example
Fuzzer → Technique	`-->`	`libfuzzer --> harness-writing`
Tool → Tool (alternative)	`-.->`	`semgrep -.-> codeql`
Fuzzer → Fuzzer (alternative)	`-.->`	`libfuzzer -.-> aflpp`
Technique → Technique	`-->`	`harness-writing --> coverage-analysis`

Validation: After updating, run validate-skills.py to verify all referenced skills exist.
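
To make the edge classification concrete, here is a small sketch that turns skill types and relationships into mermaid edge lines. The `mermaid_edges` helper and its input shapes are assumptions for illustration; type pairs that are not in the table are simply skipped.

```python
# Arrow style per (source type, target type), following the table above.
ARROWS = {
    ("fuzzer", "technique"): "-->",
    ("technique", "technique"): "-->",
    ("tool", "tool"): "-.->",
    ("fuzzer", "fuzzer"): "-.->",
}


def mermaid_edges(skill_types, related):
    """Render one mermaid edge line per relationship with a known arrow style.

    skill_types maps skill name -> type; related maps skill name -> list of
    related skill names.
    """
    lines = []
    for source in sorted(related):
        for target in related[source]:
            arrow = ARROWS.get((skill_types.get(source), skill_types.get(target)))
            if arrow:
                lines.append(f"    {source} {arrow} {target}")
    return lines
```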

3. Self-Improvement

After each generation run, reflect on what could improve future runs.

Capture improvements to:

Templates (missing sections, better structure)
Discovery logic (missed patterns, false positives)
Content extraction (shortcodes not handled, formatting issues)

Update process:

Note issues encountered during generation
Identify patterns that caused problems
Update relevant files:
- SKILL.md - Workflow, decision tree, quick reference updates
- templates/*.md - Template improvements
- discovery.md - Detection logic updates
- testing.md - New validation checks
Document the improvement in commit message

Example self-improvement:

Issue: libFuzzer skill missing sanitizer flags table
Fix: Updated templates/fuzzer-skill.md to include ## Compiler Flags section

Example Usage

Full Discovery and Generation

User: "Generate skills from the testing handbook"

1. Locate handbook (check common locations, ask user, or clone)
2. Read discovery.md for methodology
3. Scan handbook at {handbook_path}/content/docs/
4. Build candidate list with types
5. Present plan to user
6. On approval, generate each skill using appropriate template
7. Validate generated skills
8. Update main README.md with generated skills table
9. Update README.md Skills Cross-Reference graph from Related Skills sections
10. Self-improve: note any template/discovery issues for future runs
11. Report results

Single Section Generation

User: "Create a skill for the libFuzzer section"

1. Read /testing-handbook/content/docs/fuzzing/c-cpp/10-libfuzzer/
2. Identify type: Fuzzer Skill
3. Read templates/fuzzer-skill.md
4. Extract content, apply template
5. Write to skills/libfuzzer/SKILL.md
6. Validate and report

Tips

Do:

Always present plan before generating
Use appropriate template for skill type
Preserve code blocks exactly
Validate after generation

Don't:

Generate without user approval
Skip fetching non-video external resources (use WebFetch)
Fetch video URLs (YouTube, Vimeo - titles only)
Include handbook images directly
Skip validation step
Exceed 500 lines per SKILL.md

For first-time use: Start with discovery.md to understand the handbook analysis process.

For template reference: See templates/ directory for skill type templates.

For validation: See testing.md for quality assurance methodology.

/wycheproof

Source: `~/.claude/skills/tob-testing-handbook-skills/skills/wycheproof/SKILL.md`

name: wycheproof type: domain description: > Wycheproof provides test vectors for validating cryptographic implementations. Use when testing crypto code for known attacks and edge cases.

Wycheproof

Wycheproof is an extensive collection of test vectors designed to verify the correctness of cryptographic implementations and test against known attacks. Originally developed by Google, it is now a community-managed project where contributors can add test vectors for specific cryptographic constructions.

Background

Key Concepts

Concept	Description
Test vector	Input/output pair for validating crypto implementation correctness
Test group	Collection of test vectors sharing attributes (key size, IV size, curve)
Result flag	Indicates whether a test should pass (valid), should fail (invalid), or may pass despite non-ideal attributes (acceptable)
Edge case testing	Testing for known vulnerabilities and attack patterns

Why This Matters

Cryptographic implementations are notoriously difficult to get right. Even small bugs can:

Expose private keys
Allow signature forgery
Enable message decryption
Create consensus problems when different implementations accept/reject the same inputs

Wycheproof has found vulnerabilities in major libraries including OpenJDK's SHA1withDSA, Bouncy Castle's ECDHC, and the elliptic npm package.

When to Use

Apply Wycheproof when:

Testing cryptographic implementations (AES-GCM, ECDSA, ECDH, RSA, etc.)
Validating that crypto code handles edge cases correctly
Verifying implementations against known attack vectors
Setting up CI/CD for cryptographic libraries
Auditing third-party crypto code for correctness

Consider alternatives when:

Testing for timing side-channels (use constant-time testing tools instead)
Finding new unknown bugs (use fuzzing instead)
Testing custom/experimental cryptographic algorithms (Wycheproof only covers established algorithms)

Quick Reference

Scenario	Recommended Approach	Notes
AES-GCM implementation	Use `aes_gcm_test.json`	316 test vectors across 44 test groups
ECDSA verification	Use `ecdsa_*_test.json` for specific curves	Tests signature malleability, DER encoding
ECDH key exchange	Use `ecdh_*_test.json`	Tests invalid curve attacks
RSA signatures	Use `rsa_*_test.json`	Tests padding oracle attacks
ChaCha20-Poly1305	Use `chacha20_poly1305_test.json`	Tests AEAD implementation

Testing Workflow

Phase 1: Setup                 Phase 2: Parse Test Vectors
┌─────────────────┐          ┌─────────────────┐
│ Add Wycheproof  │    →     │ Load JSON file  │
│ as submodule    │          │ Filter by params│
└─────────────────┘          └─────────────────┘
         ↓                            ↓
Phase 4: CI Integration        Phase 3: Write Harness
┌─────────────────┐          ┌─────────────────┐
│ Auto-update     │    ←     │ Test valid &    │
│ test vectors    │          │ invalid cases   │
└─────────────────┘          └─────────────────┘

Repository Structure

The Wycheproof repository is organized as follows:

┣ 📜 README.md       : Project overview
┣ 📂 doc             : Documentation
┣ 📂 java            : Java JCE interface testing harness
┣ 📂 javascript      : JavaScript testing harness
┣ 📂 schemas         : Test vector schemas
┣ 📂 testvectors     : Test vectors
┗ 📂 testvectors_v1  : Updated test vectors (more detailed)

The essential folders are testvectors and testvectors_v1. While both contain similar files, testvectors_v1 includes more detailed information and is recommended for new integrations.

Supported Algorithms

Wycheproof provides test vectors for a wide range of cryptographic algorithms:

Category	Algorithms
Symmetric Encryption	AES-GCM, AES-EAX, ChaCha20-Poly1305
Signatures	ECDSA, EdDSA, RSA-PSS, RSA-PKCS1
Key Exchange	ECDH, X25519, X448
MAC / Key Derivation	HMAC, HKDF
Curves	secp256k1, secp256r1, secp384r1, secp521r1, ed25519, ed448

Test File Structure

Each JSON test file tests a specific cryptographic construction. All test files share common attributes:

"algorithm"         : The name of the algorithm tested
"schema"            : The JSON schema (found in schemas folder)
"generatorVersion"  : The version number
"numberOfTests"     : The total number of test vectors in this file
"header"            : Detailed description of test vectors
"notes"             : In-depth explanation of flags in test vectors
"testGroups"        : Array of one or multiple test groups

Test Groups

Test groups collect tests that share attributes such as:

Key sizes
IV sizes
Public keys
Curves

This classification allows extracting tests that meet specific criteria relevant to the construction being tested.
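
Before writing a harness, it can be worth sanity-checking that a downloaded file has the expected shape. The sketch below uses only the file-level attributes and test groups described above; `check_test_file` is a hypothetical helper, not part of Wycheproof.

```python
def check_test_file(data):
    """Return a list of problems found in a parsed Wycheproof JSON document."""
    problems = []
    for key in ("algorithm", "numberOfTests", "testGroups"):
        if key not in data:
            problems.append(f"missing attribute: {key}")
    if problems:
        return problems
    # numberOfTests should equal the number of vectors across all groups.
    actual = sum(len(group.get("tests", [])) for group in data["testGroups"])
    if actual != data["numberOfTests"]:
        problems.append(
            f"numberOfTests is {data['numberOfTests']} but {actual} vectors were found"
        )
    return problems
```

The input is the dictionary produced by `json.load` on a test vector file; an empty list means the basic structure is consistent. A mismatch often indicates a truncated download.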

Test Vector Attributes

Shared Attributes

All test vectors contain four common fields:

tcId: Unique identifier for the test vector within a file
comment: Additional information about the test case
flags: Descriptions of specific test case types and potential dangers (referenced in notes field)
result: Expected outcome of the test

The result field can take three values:

Result	Meaning
valid	Test case should succeed
acceptable	Test case is allowed to succeed but contains non-ideal attributes
invalid	Test case should fail
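
One way to turn the three result values into a pass/fail decision is sketched below. It treats acceptable vectors as passing either way, which is one possible policy rather than the only one; stricter projects may choose to reject them. `vector_passes` is a hypothetical name.

```python
def vector_passes(expected_result, accepted):
    """Decide whether an implementation's behaviour matches a test vector.

    expected_result is the vector's `result` field. accepted is True when the
    implementation processed the input successfully (for example, a signature
    verified or a ciphertext decrypted).
    """
    if expected_result == "valid":
        return accepted
    if expected_result == "invalid":
        return not accepted
    if expected_result == "acceptable":
        # Either outcome is tolerated; callers may still want to log it.
        return True
    raise ValueError(f"unknown result value: {expected_result!r}")
```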

Unique Attributes

Unique attributes are specific to the algorithm being tested:

Algorithm	Unique Attributes
AES-GCM	`key`, `iv`, `aad`, `msg`, `ct`, `tag`
ECDH secp256k1	`public`, `private`, `shared`
ECDSA	`msg`, `sig`
EdDSA	`msg`, `sig`, `pk`

Implementation Guide

Phase 1: Add Wycheproof to Your Project

Option 1: Git Submodule (Recommended)

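Adding Wycheproof as a git submodule pins the test vectors to a known commit and makes updates easy. A submodule does not update itself, so run git submodule update --remote (manually or in CI) to pull new vectors: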

git submodule add https://github.com/C2SP/wycheproof.git

Option 2: Fetch Specific Test Vectors

If submodules aren't possible, fetch specific JSON files:

#!/bin/bash
set -euo pipefail

TMP_WYCHEPROOF_FOLDER=".wycheproof/"
TEST_VECTORS=('aes_gcm_test.json' 'aes_eax_test.json')
BASE_URL="https://raw.githubusercontent.com/C2SP/wycheproof/master/testvectors_v1/"

# Create wycheproof folder
mkdir -p "$TMP_WYCHEPROOF_FOLDER"

# Download each test vector file that is not already present
for i in "${TEST_VECTORS[@]}"; do
  if [ ! -f "${TMP_WYCHEPROOF_FOLDER}${i}" ]; then
    # --fail makes curl return non-zero on HTTP errors such as 404
    if ! curl --fail --silent --show-error -o "${TMP_WYCHEPROOF_FOLDER}${i}" "${BASE_URL}${i}"; then
      echo "Failed to download ${i}" >&2
      exit 1
    fi
  fi
done

Phase 2: Parse Test Vectors

Identify the test file for your algorithm and parse the JSON:

Python Example:

import json

def load_wycheproof_test_vectors(path: str):
    testVectors = []
    try:
        with open(path, "r", encoding="utf-8") as f:
            wycheproof_json = json.load(f)
    except FileNotFoundError:
        print(f"No Wycheproof file found at: {path}")
        return testVectors

    # Attributes that need hex-to-bytes conversion
    convert_attr = {"key", "aad", "iv", "msg", "ct", "tag"}

    for testGroup in wycheproof_json["testGroups"]:
        # Filter test groups based on implementation constraints
        if testGroup["ivSize"] < 64 or testGroup["ivSize"] > 1024:
            continue

        for tv in testGroup["tests"]:
            # Convert hex strings to bytes
            for attr in convert_attr:
                if attr in tv:
                    tv[attr] = bytes.fromhex(tv[attr])
            testVectors.append(tv)

    return testVectors

JavaScript Example:

const fs = require('fs').promises;

async function loadWycheproofTestVectors(path) {
  const tests = [];

  try {
    const fileContent = await fs.readFile(path);
    const data = JSON.parse(fileContent.toString());

    data.testGroups.forEach(testGroup => {
      testGroup.tests.forEach(test => {
        // Add shared test group properties to each test
        test['pk'] = testGroup.publicKey.pk;
        tests.push(test);
      });
    });
  } catch (err) {
    console.error('Error reading or parsing file:', err);
    throw err;
  }

  return tests;
}

Phase 3: Write Testing Harness

Create test functions that handle both valid and invalid test cases.

Python/pytest Example:

import pytest
from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

tvs = load_wycheproof_test_vectors("wycheproof/testvectors_v1/aes_gcm_test.json")

@pytest.mark.parametrize("tv", tvs, ids=[str(tv['tcId']) for tv in tvs])
def test_encryption(tv):
    try:
        aesgcm = AESGCM(tv['key'])
        ct = aesgcm.encrypt(tv['iv'], tv['msg'], tv['aad'])
    except ValueError as e:
        # Implementation raised error - verify test was expected to fail
        assert tv['result'] != 'valid', tv['comment']
        return

    if tv['result'] == 'valid':
        assert ct[:-16] == tv['ct'], f"Ciphertext mismatch: {tv['comment']}"
        assert ct[-16:] == tv['tag'], f"Tag mismatch: {tv['comment']}"
    elif tv['result'] == 'invalid' or tv['result'] == 'acceptable':
        assert ct[:-16] != tv['ct'] or ct[-16:] != tv['tag']

@pytest.mark.parametrize("tv", tvs, ids=[str(tv['tcId']) for tv in tvs])
def test_decryption(tv):
    try:
        aesgcm = AESGCM(tv['key'])
        decrypted_msg = aesgcm.decrypt(tv['iv'], tv['ct'] + tv['tag'], tv['aad'])
    except ValueError:
        assert tv['result'] != 'valid', tv['comment']
        return
    except InvalidTag:
        assert tv['result'] != 'valid', tv['comment']
        assert 'ModifiedTag' in tv['flags'], f"Expected 'ModifiedTag' flag: {tv['comment']}"
        return

    assert tv['result'] == 'valid', f"No invalid test case should pass: {tv['comment']}"
    assert decrypted_msg == tv['msg'], f"Decryption mismatch: {tv['comment']}"

JavaScript/Mocha Example:

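const assert = require('assert');
const { eddsa, utils } = require('elliptic');
const { toArray } = utils;

// `tests` is assumed to hold the array resolved by loadWycheproofTestVectors() (Phase 2)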

function testFactory(tcId, tests) {
  it(`[${tcId + 1}] ${tests[tcId].comment}`, function () {
    const test = tests[tcId];
    const ed25519 = new eddsa('ed25519');
    const key = ed25519.keyFromPublic(toArray(test.pk, 'hex'));

    let sig;
    if (test.result === 'valid') {
      sig = key.verify(test.msg, test.sig);
      assert.equal(sig, true, `[${test.tcId}] ${test.comment}`);
    } else if (test.result === 'invalid') {
      try {
        sig = key.verify(test.msg, test.sig);
      } catch (err) {
        // Point could not be decoded
        sig = false;
      }
      assert.equal(sig, false, `[${test.tcId}] ${test.comment}`);
    }
  });
}

// Generate tests for all test vectors
for (var tcId = 0; tcId < tests.length; tcId++) {
  testFactory(tcId, tests);
}

Phase 4: CI Integration

Ensure test vectors stay up to date by:

Using git submodules: Update submodule in CI before running tests
Fetching latest vectors: Run fetch script before test execution
Scheduled updates: Set up weekly/monthly updates to catch new test vectors

Common Vulnerabilities Detected

Wycheproof test vectors are designed to catch specific vulnerability patterns:

Vulnerability	Description	Affected Algorithms	Example CVE
Signature malleability	Multiple valid signatures for same message	ECDSA, EdDSA	CVE-2024-42459
Invalid DER encoding	Accepting non-canonical DER signatures	ECDSA	CVE-2024-42460, CVE-2024-42461
Invalid curve attacks	ECDH with invalid curve points	ECDH	Common in many libraries
Padding oracle	Timing leaks in padding validation	RSA-PKCS1	Historical OpenSSL issues
Tag forgery	Accepting modified authentication tags	AES-GCM, ChaCha20-Poly1305	Various implementations

Signature Malleability: Deep Dive

Problem: Implementations that don't validate signature encoding can accept multiple valid signatures for the same message.

Example (EdDSA): Appending or removing zeros from signature:

Valid signature:   ...6a5c51eb6f946b30d
Invalid signature: ...6a5c51eb6f946b30d0000  (should be rejected)

How to detect:

# Add signature length check
if len(sig) != 128:  # EdDSA signatures must be exactly 64 bytes (128 hex chars)
    return False

Impact: Can lead to consensus problems when different implementations accept/reject the same signatures.

Related Wycheproof test cases:

EdDSA: tcId 37 - "removing 0 byte from signature"
ECDSA: tcId 06 - "Legacy: ASN encoding of r misses leading 0"
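
The length check above can be extended into a slightly stricter encoding check. The sketch below is an assumption-laden illustration, not the fix applied in any particular library: besides the 64-byte length it also rejects a scalar S that is not below the group order L, which RFC 8032 requires. The function name is hypothetical, and this only checks encoding; it does not verify the signature.

```python
# Order of the Ed25519 base-point subgroup, as given in RFC 8032.
ED25519_L = 2**252 + 27742317777372353535851937790883648493


def is_canonical_ed25519_signature(sig_hex: str) -> bool:
    """Return True if the hex signature has a canonical Ed25519 encoding."""
    try:
        sig = bytes.fromhex(sig_hex)
    except ValueError:
        return False  # not valid hex (e.g. odd number of characters)
    if len(sig) != 64:
        return False  # appended or removed bytes
    # The second half of the signature is the scalar S, little-endian.
    s = int.from_bytes(sig[32:], "little")
    return s < ED25519_L
```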

Case Study: Elliptic npm Package

This case study demonstrates how Wycheproof found three CVEs in the popular elliptic npm package (3000+ dependents, millions of weekly downloads).

Overview

The elliptic library is an elliptic-curve cryptography library written in JavaScript, supporting ECDH, ECDSA, and EdDSA. Using Wycheproof test vectors on version 6.5.6 revealed multiple vulnerabilities:

CVE-2024-42459: EdDSA signature malleability (appending/removing zeros)
CVE-2024-42460: ECDSA DER encoding - invalid bit placement
CVE-2024-42461: ECDSA DER encoding - leading zero in length field

Methodology

Identify supported curves: ed25519 for EdDSA
Find test vectors: testvectors_v1/ed25519_test.json
Parse test vectors: Load JSON and extract tests
Write test harness: Create parameterized tests
Run tests: Identify failures
Analyze root causes: Examine implementation code
Propose fixes: Add validation checks

Key Findings

EdDSA Issue (CVE-2024-42459):

Missing signature length validation
Allowed trailing zeros in signatures
Fix: Add if(sig.length !== 128) return false;

ECDSA Issue 1 (CVE-2024-42460):

Missing check for first bit being zero in DER-encoded r and s values
Fix: Add if ((data[p.place] & 128) !== 0) return false;

ECDSA Issue 2 (CVE-2024-42461):

DER length field accepted leading zeros
Fix: Add if(buf[p.place] === 0x00) return false;
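
The two ECDSA findings both come down to accepting non-canonical DER. As a rough sketch (written for illustration in Python, not taken from the elliptic patches), a strict parser for an ECDSA signature rejects negative integers, unnecessary leading zeros, and non-minimal length fields. All function names here are hypothetical.

```python
def _read_len(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a DER length at pos and return (length, next_pos)."""
    if pos >= len(buf):
        raise ValueError("truncated length")
    first = buf[pos]
    if first < 0x80:  # short form
        return first, pos + 1
    n = first & 0x7F
    if n == 0 or n > 2:  # indefinite form, or longer than any signature needs
        raise ValueError("unsupported length form")
    octets = buf[pos + 1:pos + 1 + n]
    if len(octets) != n:
        raise ValueError("truncated length")
    if octets[0] == 0:
        raise ValueError("leading zero in length field")
    value = int.from_bytes(octets, "big")
    if value < 0x80:
        raise ValueError("long form used for a short length")
    return value, pos + 1 + n


def _read_int(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a DER INTEGER at pos and return (value, next_pos)."""
    if pos >= len(buf) or buf[pos] != 0x02:
        raise ValueError("expected INTEGER tag")
    length, pos = _read_len(buf, pos + 1)
    body = buf[pos:pos + length]
    if length == 0 or len(body) != length:
        raise ValueError("truncated integer")
    if body[0] & 0x80:
        raise ValueError("negative integer (first bit set)")
    if length > 1 and body[0] == 0 and not body[1] & 0x80:
        raise ValueError("unnecessary leading zero")
    return int.from_bytes(body, "big"), pos + length


def parse_der_signature(sig: bytes) -> tuple[int, int]:
    """Parse SEQUENCE { INTEGER r, INTEGER s } strictly; raise ValueError otherwise."""
    if len(sig) < 2 or sig[0] != 0x30:
        raise ValueError("expected SEQUENCE tag")
    total, pos = _read_len(sig, 1)
    if pos + total != len(sig):
        raise ValueError("length does not match signature size")
    r, pos = _read_int(sig, pos)
    s, pos = _read_int(sig, pos)
    if pos != len(sig):
        raise ValueError("trailing bytes inside SEQUENCE")
    return r, s
```

Because every rejected encoding raises `ValueError`, a harness can treat the exception as "signature invalid" and compare that against the `result` field of each test vector.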

Impact

All three vulnerabilities allowed multiple valid signatures for a single message, leading to consensus problems across implementations.

Lessons learned:

Wycheproof catches subtle encoding bugs
Reusable test harnesses pay dividends
Test vector comments and flags help diagnose issues
Even popular libraries benefit from systematic test vector validation

Advanced Usage

Tips and Tricks

Tip	Why It Helps
Filter test groups by parameters	Focus on test vectors relevant to your implementation constraints
Use test vector flags	Understand specific vulnerability patterns being tested
Check the `notes` field	Get detailed explanations of flag meanings
Test both encrypt/decrypt and sign/verify	Ensure bidirectional correctness
Run tests in CI	Catch regressions and benefit from new test vectors
Use parameterized tests	Get clear failure messages with tcId and comment

Common Mistakes

Mistake	Why It's Wrong	Correct Approach
Only testing valid cases	Misses vulnerabilities where invalid inputs are accepted	Test all result types: valid, invalid, acceptable
Ignoring "acceptable" result	Implementation might have subtle bugs	Treat acceptable as warnings worth investigating
Not filtering test groups	Wastes time on unsupported parameters	Filter by keySize, ivSize, etc. based on your implementation
Not updating test vectors	Miss new vulnerability patterns	Use submodules or scheduled fetches
Testing only one direction	Encrypt/sign might work but decrypt/verify fails	Test both operations
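
One way to avoid the first two mistakes is to map every result type to an explicit outcome. The sketch below is a hypothetical helper (the name `classify` and the "warn" outcome are assumptions, not part of Wycheproof) that reports accepted `acceptable` vectors as warnings instead of silently passing them.

```python
def classify(result: str, accepted: bool) -> str:
    """Map a vector's expected result and the implementation's behavior to an outcome.

    result   -- the Wycheproof "result" field: valid, invalid, or acceptable
    accepted -- True if the implementation accepted the input
    """
    if result == "valid":
        return "pass" if accepted else "fail"
    if result == "invalid":
        return "fail" if accepted else "pass"
    if result == "acceptable":
        # Allowed to succeed, but worth a human look when it does.
        return "warn" if accepted else "pass"
    raise ValueError(f"unknown result type: {result!r}")
```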

Tool Skills

Skill	Primary Use in Wycheproof Testing
pytest	Python testing framework for parameterized tests
mocha	JavaScript testing framework for test generation
constant-time-testing	Complement Wycheproof with timing side-channel testing
cryptofuzz	Fuzz-based crypto testing to find additional bugs

Technique Skills

Skill	When to Apply
coverage-analysis	Ensure test vectors cover all code paths in crypto implementation
property-based-testing	Test mathematical properties (e.g., encrypt/decrypt round-trip)
fuzz-harness-writing	Create harnesses for crypto parsers (complements Wycheproof)

Skill	Relationship
crypto-testing	Wycheproof is a key tool in comprehensive crypto testing methodology
fuzzing	Use fuzzing to find bugs Wycheproof doesn't cover (new edge cases)

Skill Dependency Map

                    ┌─────────────────────┐
                    │    wycheproof       │
                    │   (this skill)      │
                    └──────────┬──────────┘
                               │
           ┌───────────────────┼───────────────────┐
           │                   │                   │
           ▼                   ▼                   ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│  pytest/mocha   │ │ constant-time   │ │   cryptofuzz    │
│ (test framework)│ │   testing       │ │   (fuzzing)     │
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
         │                   │                   │
         └───────────────────┼───────────────────┘
                             │
                             ▼
              ┌──────────────────────────┐
              │   Technique Skills       │
              │ coverage, harness, PBT   │
              └──────────────────────────┘

Resources

Official Repository

Wycheproof GitHub Repository

The official repository contains:

All test vectors in testvectors/ and testvectors_v1/
JSON schemas in schemas/
Reference implementations in Java and JavaScript
Documentation in doc/

Real-World Examples

pycryptodome

The pycryptodome library integrates Wycheproof test vectors in their test suite, demonstrating best practices for Python crypto implementations.

Community Resources

C2SP Community - Cryptographic specifications and standards community maintaining Wycheproof
Wycheproof issues tracker - Report bugs in test vectors or suggest new constructions

Summary

Wycheproof is an essential tool for validating cryptographic implementations against known attack vectors and edge cases. By integrating Wycheproof test vectors into your testing workflow, you can:

Catch subtle encoding and validation bugs
Prevent signature malleability issues
Ensure consistent behavior across implementations
Benefit from community-contributed test vectors
Protect against known cryptographic vulnerabilities

The investment in writing a reusable testing harness pays dividends through continuous validation as new test vectors are added to the Wycheproof repository.

/variant-analysis

Source: `~/.claude/skills/tob-variant-analysis/skills/variant-analysis/SKILL.md`

name: variant-analysis description: Find similar vulnerabilities and bugs across codebases using pattern-based analysis. Use when hunting bug variants, building CodeQL/Semgrep queries, analyzing security vulnerabilities, or performing systematic code audits after finding an initial issue.

Variant Analysis

You are a variant analysis expert. Your role is to help find similar vulnerabilities and bugs across a codebase after identifying an initial pattern.

When to Use

Use this skill when:

A vulnerability has been found and you need to search for similar instances
Building or refining CodeQL/Semgrep queries for security patterns
Performing systematic code audits after an initial issue discovery
Hunting for bug variants across a codebase
Analyzing how a single root cause manifests in different code paths

When NOT to Use

Do NOT use this skill for:

Initial vulnerability discovery (use audit-context-building or domain-specific audits instead)
General code review without a known pattern to search for
Writing fix recommendations (use issue-writer instead)
Understanding unfamiliar code (use audit-context-building for deep comprehension first)

The Five-Step Process

Step 1: Understand the Original Issue

Before searching, deeply understand the known bug:

What is the root cause? Not the symptom, but WHY it's vulnerable
What conditions are required? Control flow, data flow, state
What makes it exploitable? User control, missing validation, etc.

Step 2: Create an Exact Match

Start with a pattern that matches ONLY the known instance:

rg -n "exact_vulnerable_code_here"

Verify: Does it match exactly ONE location (the original)?
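
If ripgrep is not available, the same verification can be sketched in a few lines of Python. This is an illustrative stand-in, not a replacement for the tools listed later; `find_literal` and its `suffixes` parameter are hypothetical names.

```python
from pathlib import Path


def find_literal(root: str, needle: str, suffixes=(".py",)) -> list[tuple[str, int]]:
    """Return (path, line number) for every line under root containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:  # literal substring match, no regex
                hits.append((str(path), lineno))
    return hits
```

An exact-match pattern is ready to generalize when `len(find_literal(root, needle)) == 1` and the single hit is the original bug.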

Step 3: Identify Abstraction Points

Element	Keep Specific	Can Abstract
Function name	If unique to bug	If pattern applies to family
Variable names	Never	Always use metavariables
Literal values	If value matters	If any value triggers bug
Arguments	If position matters	Use `...` wildcards

Step 4: Iteratively Generalize

Change ONE element at a time:

Run the pattern
Review ALL new matches
Classify: true positive or false positive?
If FP rate acceptable, generalize next element
If FP rate too high, revert and try different abstraction

Stop when false positive rate exceeds ~50%
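
The stopping rule can be written down as a small calculation. The sketch below assumes you have already classified the new matches by hand; the function names and the 0.5 default are illustrative choices that mirror the ~50% guideline.

```python
def false_positive_rate(true_positives: int, false_positives: int) -> float:
    """Fraction of reviewed matches that turned out to be false positives."""
    total = true_positives + false_positives
    return false_positives / total if total else 0.0


def next_action(true_positives: int, false_positives: int, threshold: float = 0.5) -> str:
    """Decide whether to keep generalizing or revert the last abstraction."""
    if false_positive_rate(true_positives, false_positives) > threshold:
        return "revert"
    return "generalize"
```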

Step 5: Analyze and Triage Results

For each match, document:

Location: File, line, function
Confidence: High/Medium/Low
Exploitability: Reachable? Controllable inputs?
Priority: Based on impact and exploitability
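
A lightweight record type keeps these fields consistent across matches. The sketch below is one possible shape, with hypothetical field names and a simple High/Medium/Low ordering; adapt it to whatever your report template expects.

```python
from dataclasses import dataclass

# Lower number sorts first.
LEVELS = {"High": 0, "Medium": 1, "Low": 2}


@dataclass
class VariantMatch:
    file: str
    line: int
    function: str
    confidence: str            # High / Medium / Low
    reachable: bool            # is the code reachable from an entry point?
    attacker_controlled: bool  # can an attacker influence the inputs?
    priority: str              # High / Medium / Low


def triage_order(matches: list[VariantMatch]) -> list[VariantMatch]:
    """Sort matches by priority, then confidence, then location."""
    return sorted(
        matches,
        key=lambda m: (LEVELS[m.priority], LEVELS[m.confidence], m.file, m.line),
    )
```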

For deeper strategic guidance, see METHODOLOGY.md.

Tool Selection

Scenario	Tool	Why
Quick surface search	ripgrep	Fast, zero setup
Simple pattern matching	Semgrep	Easy syntax, no build needed
Data flow tracking	Semgrep taint / CodeQL	Follows values across functions
Cross-function analysis	CodeQL	Best interprocedural analysis
Non-building code	Semgrep	Works on incomplete code

Key Principles

Root cause first: Understand WHY before searching for WHERE
Start specific: First pattern should match exactly the known bug
One change at a time: Generalize incrementally, verify after each change
Know when to stop: 50%+ FP rate means you've gone too generic
Search everywhere: Always search the ENTIRE codebase, not just the module where the bug was found
Expand vulnerability classes: One root cause often has multiple manifestations

Critical Pitfalls to Avoid

These common mistakes cause analysts to miss real vulnerabilities:

1. Narrow Search Scope

Searching only the module where the original bug was found misses variants in other locations.

Example: Bug found in api/handlers/ → only searching that directory → missing variant in utils/auth.py

Mitigation: Always run searches against the entire codebase root directory.

2. Pattern Too Specific

Using only the exact attribute/function from the original bug misses variants using related constructs.

Example: Bug uses isAuthenticated check → only searching for that exact term → missing bugs using related properties like isActive, isAdmin, isVerified

Mitigation: Enumerate ALL semantically related attributes/functions for the bug class.

3. Single Vulnerability Class

Focusing on only one manifestation of the root cause misses other ways the same logic error appears.

Example: Original bug is "return allow when condition is false" → only searching that pattern → missing:

Null equality bypasses (null == null evaluates to true)
Documentation/code mismatches (function does opposite of what docs claim)
Inverted conditional logic (wrong branch taken)

Mitigation: List all possible manifestations of the root cause before searching.

4. Missing Edge Cases

Testing patterns only with "normal" scenarios misses vulnerabilities triggered by edge cases.

Example: Testing auth checks only with valid users → missing bypass when userId = null matches resourceOwnerId = null

Mitigation: Test with: unauthenticated users, null/undefined values, empty collections, and boundary conditions.
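
The null-matching bypass is easy to reproduce. In this minimal Python sketch (function names are hypothetical, and `None` plays the role of null), the buggy check treats two missing identifiers as equal, while the fixed version rejects missing values before comparing.

```python
from typing import Optional


def is_owner_buggy(user_id: Optional[int], resource_owner_id: Optional[int]) -> bool:
    # None == None is True, so an unauthenticated user "owns" an unowned resource.
    return user_id == resource_owner_id


def is_owner_fixed(user_id: Optional[int], resource_owner_id: Optional[int]) -> bool:
    # Refuse to compare when either side is missing.
    if user_id is None or resource_owner_id is None:
        return False
    return user_id == resource_owner_id
```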

Resources

Ready-to-use templates in resources/:

CodeQL (resources/codeql/):

python.ql, javascript.ql, java.ql, go.ql, cpp.ql

Semgrep (resources/semgrep/):

python.yaml, javascript.yaml, java.yaml, go.yaml, cpp.yaml

Report: resources/variant-report-template.md

/yara-rule-authoring

Source: `~/.claude/skills/tob-yara-authoring/skills/yara-rule-authoring/SKILL.md`

name: yara-rule-authoring description: > Guides authoring of high-quality YARA-X detection rules for malware identification. Use when writing, reviewing, or optimizing YARA rules. Covers naming conventions, string selection, performance optimization, migration from legacy YARA, and false positive reduction. Triggers on: YARA, YARA-X, malware detection, threat hunting, IOC, signature, crx module, dex module.

YARA-X Rule Authoring

Write detection rules that catch malware without drowning in false positives.

This skill targets YARA-X, the Rust-based successor to legacy YARA. YARA-X powers VirusTotal's production systems and is the recommended implementation. See Migrating from Legacy YARA if you have existing rules.

Core Principles

Strings must generate good atoms — YARA extracts 4-byte subsequences for fast matching. Strings with repeated bytes, common sequences, or under 4 bytes force slow bytecode verification on too many files.
Target specific families, not categories — "Detects ransomware" catches everything and nothing. "Detects LockBit 3.0 configuration extraction routine" catches what you want.
Test against goodware before deployment — A rule that fires on Windows system files is useless. Validate against VirusTotal's goodware corpus or your own clean file set.
Short-circuit with cheap checks first — Put filesize < 10MB and uint16(0) == 0x5A4D before expensive string searches or module calls.
Metadata is documentation — Future you (and your team) need to know what this catches, why, and where the sample came from.

When to Use

Writing new YARA-X rules for malware detection
Reviewing existing rules for quality or performance issues
Optimizing slow-running rulesets
Converting IOCs or threat intel into detection signatures
Debugging false positive issues
Preparing rules for production deployment
Migrating legacy YARA rules to YARA-X
Analyzing Chrome extensions (crx module)
Analyzing Android apps (dex module)

When NOT to Use

Static analysis requiring disassembly → use Ghidra/IDA skills
Dynamic malware analysis → use sandbox analysis skills
Network-based detection → use Suricata/Snort skills
Memory forensics with Volatility → use memory forensics skills
Simple hash-based detection → just use hash lists

YARA-X Overview

YARA-X is the Rust-based successor to legacy YARA: 5-10x faster regex, better errors, built-in formatter, stricter validation, new modules (crx, dex), 99% rule compatibility.

Install: brew install yara-x (macOS) or cargo install yara-x-cli (the yr binary is built from the yara-x-cli crate)

Essential commands: yr scan, yr check, yr fmt, yr dump

Platform Considerations

YARA works on any file type. Adapt patterns to your target:

Platform	Magic Bytes	Bad Strings	Good Strings
Windows PE	`uint16(0) == 0x5A4D`	API names, Windows paths	Mutex names, PDB paths
macOS Mach-O	`uint32(0) == 0xFEEDFACE` (32-bit), `0xFEEDFACF` (64-bit), `uint32be(0) == 0xCAFEBABE` (universal)	Common Obj-C methods	Keylogger strings, persistence paths
JavaScript/Node	(none needed)	`require`, `fetch`, `axios`	Obfuscator signatures, eval+decode chains
npm/pip packages	(none needed)	`postinstall`, `dependencies`	Suspicious package names, exfil URLs
Office docs (OOXML)	`uint32be(0) == 0x504B0304` (ZIP header `PK\x03\x04`)	VBA keywords	Macro auto-exec, encoded payloads
VS Code extensions	(none needed)	`vscode.workspace`	Uncommon activationEvents, hidden file access
Chrome extensions	Use `crx` module	Common Chrome APIs	Permission abuse, manifest anomalies
Android apps	Use `dex` module	Standard DEX structure	Obfuscated classes, suspicious permissions

macOS Malware Detection

YARA-X includes a macho module for structural checks, but magic byte checks plus string patterns are often enough for a first rule:

Magic bytes:

// Mach-O 32-bit
uint32(0) == 0xFEEDFACE
// Mach-O 64-bit
uint32(0) == 0xFEEDFACF
// Universal binary (fat binary)
uint32(0) == 0xCAFEBABE or uint32(0) == 0xBEBAFECA
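
The two universal-binary constants exist because `uint32()` reads little-endian while the fat header is stored big-endian. The Python sketch below mimics that behavior with `struct`; the helper names are hypothetical and only model how the integers are read, not YARA itself.

```python
import struct


def yara_uint32(data: bytes, offset: int = 0) -> int:
    """Mimic YARA's uint32(): 4 bytes at offset, read as little-endian."""
    return struct.unpack_from("<I", data, offset)[0]


def yara_uint32be(data: bytes, offset: int = 0) -> int:
    """Mimic YARA's uint32be(): 4 bytes at offset, read as big-endian."""
    return struct.unpack_from(">I", data, offset)[0]


# Magic values as they appear on disk.
FAT_HEADER = bytes.fromhex("cafebabe")  # universal (fat) binary
MACHO64_LE = bytes.fromhex("cffaedfe")  # 64-bit little-endian Mach-O
```

On disk, a universal binary starts with CA FE BA BE, which `uint32(0)` sees as `0xBEBAFECA`. Java class files begin with the same four bytes, so this check alone does not prove a file is Mach-O.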

Good indicators for macOS malware:

Keylogger artifacts: CGEventTapCreate, kCGEventKeyDown
SSH tunnel strings: ssh -D, tunnel, socks
Persistence paths: ~/Library/LaunchAgents, /Library/LaunchDaemons
Credential theft: security find-generic-password, keychain

Example pattern from Airbnb BinaryAlert:

rule SUSP_Mac_ProtonRAT
{
    strings:
        // Library indicators
        $lib1 = "SRWebSocket" ascii
        $lib2 = "SocketRocket" ascii

        // Behavioral indicators
        $behav1 = "SSH tunnel not launched" ascii
        $behav2 = "Keylogger" ascii

    condition:
        (uint32(0) == 0xFEEDFACF or uint32(0) == 0xCAFEBABE or uint32(0) == 0xBEBAFECA) and
        any of ($lib*) and any of ($behav*)
}

JavaScript Detection Decision Tree

Writing a JavaScript rule?
├─ npm package?
│  ├─ Check package.json patterns
│  ├─ Look for postinstall/preinstall hooks
│  └─ Target exfil patterns: fetch + env access + credential paths
├─ Browser extension?
│  ├─ Chrome: Use crx module
│  └─ Others: Target manifest patterns, background script behaviors
├─ Standalone JS file?
│  ├─ Look for obfuscation markers: eval+atob, fromCharCode chains
│  ├─ Target unique function/variable names (often survive minification)
│  └─ Check for packed/encoded payloads
└─ Minified/webpack bundle?
   ├─ Target unique strings that survive bundling (URLs, magic values)
   └─ Avoid function names (will be mangled)

JavaScript-specific good strings:

Ethereum function selectors: { a9 05 9c bb } (transfer), { 70 a0 82 31 } (balanceOf)
Zero-width characters (steganography): { E2 80 8B E2 80 8C }
Obfuscator signatures: _0x, var _0x
Specific C2 patterns: domain names, webhook URLs

JavaScript-specific bad strings:

require, fetch, axios — too common
Buffer, crypto — legitimate uses everywhere
process.env alone — need specific env var names

Essential Toolkit

Tool	Purpose
yarGen	Extract candidate strings: `yarGen.py -m samples/ --excludegood` → validate with `yr check`
FLOSS	Extract obfuscated/stack strings: `floss sample.exe` (when yarGen fails)
yr CLI	Validate: `yr check`, scan: `yr scan -s`, inspect: `yr dump -m pe`
signature-base	Study quality examples
YARA-CI	Goodware corpus testing before deployment

Master these five. Don't get distracted by tool catalogs.

Rationalizations to Reject

When you catch yourself thinking these, stop and reconsider.

Rationalization	Expert Response
"This generic string is unique enough"	Test against goodware first. Your intuition is wrong.
"yarGen gave me these strings"	yarGen suggests, you validate. Check each one manually.
"It works on my 10 samples"	10 samples ≠ production. Use VirusTotal goodware corpus.
"One rule to catch all variants"	Causes FP floods. Target specific families.
"I'll make it more specific if we get FPs"	Write tight rules upfront. FPs burn trust.
"This hex pattern is unique"	Unique in one sample ≠ unique across malware ecosystem.
"Performance doesn't matter"	One slow rule slows entire ruleset. Optimize atoms.
"PEiD rules still work"	Obsolete. 32-bit packers aren't relevant.
"I'll add more conditions later"	Weak rules deployed = damage done.
"This is just for hunting"	Hunting rules become detection rules. Same quality bar.
"The API name makes it malicious"	Legitimate software uses same APIs. Need behavioral context.
"any of them is fine for these common strings"	Common strings + any = FP flood. Use `any of` only for individually unique strings.
"This regex is specific enough"	`/fetch.*token/` matches all auth code. Add exfil destination requirement.
"The JavaScript looks clean"	Attackers poison legitimate code with injects. Check for eval+decode chains.
"I'll use .* for flexibility"	Unbounded regex = performance disaster + memory explosion. Use `.{0,30}`.
"I'll use --relaxed-re-syntax everywhere"	Masks real bugs. Fix the regex instead of hiding problems.

Decision Trees

Is This String Good Enough?

Is this string good enough?
├─ Less than 4 bytes?
│  └─ NO — find longer string
├─ Contains repeated bytes (0000, 9090)?
│  └─ NO — add surrounding context
├─ Is an API name (VirtualAlloc, CreateRemoteThread)?
│  └─ NO — use hex pattern of call site instead
├─ Appears in Windows system files?
│  └─ NO — too generic, find something unique
├─ Is it a common path (C:\Windows\, cmd.exe)?
│  └─ NO — find malware-specific paths
├─ Unique to this malware family?
│  └─ YES — use it
└─ Appears in other malware too?
   └─ MAYBE — combine with family-specific marker
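
The first few branches of this tree can be automated as a pre-screen before manual review. The sketch below is a rough heuristic under stated assumptions: the API and path lists contain only the examples named in the tree, the "repeated bytes" test is a simplification (at most two distinct byte values), and `screen_candidate` is a hypothetical name. It does not model YARA's atom extraction.

```python
# Examples taken from the decision tree; extend with your own lists.
COMMON_API_NAMES = {"VirtualAlloc", "CreateRemoteThread"}
COMMON_PATH_MARKERS = ("c:\\windows\\", "cmd.exe")


def screen_candidate(candidate: bytes) -> str:
    """Return a short verdict for a candidate string."""
    if len(candidate) < 4:
        return "reject: shorter than 4 bytes"
    if len(set(candidate)) <= 2:
        return "needs context: repeated bytes"
    text = candidate.decode("latin-1")
    if text in COMMON_API_NAMES:
        return "reject: API name"
    lowered = text.lower()
    if any(marker in lowered for marker in COMMON_PATH_MARKERS):
        return "reject: common path"
    return "candidate: check against goodware"
```

A "candidate" verdict only means the string survived the cheap checks; the goodware and family-uniqueness questions still need real data.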

When to Use "all of" vs "any of"

Should I require all strings or allow any?
├─ Strings are individually unique to malware?
│  └─ any of them (each alone is suspicious)
├─ Strings are common but combination is suspicious?
│  └─ all of them (require the full pattern)
├─ Strings have different confidence levels?
│  └─ Group: all of ($core_*) and any of ($variant_*)
└─ Seeing many false positives?
   └─ Tighten: switch any → all, add more required strings

Lesson from production: Rules using any of ($network_*) where strings included "fetch", "axios", "http" matched virtually all web applications. Switching to require credential path AND network call AND exfil destination eliminated FPs.
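
The effect is easy to simulate outside YARA. In the sketch below, plain substring checks stand in for string matches; the sample inputs, the `evil.example` destination, and the function names are all made up for illustration.

```python
NETWORK_STRINGS = (b"fetch", b"axios", b"http")


def any_of(data: bytes, patterns) -> bool:
    return any(p in data for p in patterns)


def loose_rule(data: bytes) -> bool:
    """Equivalent of: any of ($network_*)"""
    return any_of(data, NETWORK_STRINGS)


def tight_rule(data: bytes) -> bool:
    """Credential path AND network call AND exfil destination."""
    return (
        b".aws/credentials" in data
        and any_of(data, NETWORK_STRINGS)
        and b"evil.example" in data
    )


# Hypothetical inputs: an ordinary web app line and a credential stealer line.
BENIGN = b'const items = await fetch("/api/items");'
STEALER = b'fetch("https://evil.example/u", {body: read("~/.aws/credentials")})'
```

Here the loose rule fires on both inputs, while the tight rule fires only on the stealer sample.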

When to Abandon a Rule Approach

Stop and pivot when:

yarGen returns only API names and paths → See When Strings Fail, Pivot to Structure
Can't find 3 unique strings → Probably packed. Target the unpacked version or detect the packer.
Rule matches goodware files → Strings aren't unique enough. 1-2 matches = investigate and tighten; 3-5 matches = find different indicators; 6+ matches = start over.
Performance is terrible even after optimization → Architecture problem. Split into multiple focused rules or add strict pre-filters.
Description is hard to write → The rule is too vague. If you can't explain what it catches, it catches too much.

Debugging False Positives

FP Investigation Flow:
│
├─ 1. Which string matched?
│     Run: yr scan -s rule.yar false_positive.exe
│
├─ 2. Is it in a legitimate library?
│     └─ Add: not $fp_vendor_string exclusion
│
├─ 3. Is it a common development pattern?
│     └─ Find more specific indicator, replace the string
│
├─ 4. Are multiple generic strings matching together?
│     └─ Tighten to require all + add unique marker
│
└─ 5. Is the malware using common techniques?
      └─ Target malware-specific implementation details, not the technique

Hex vs Text vs Regex

What string type should I use?
│
├─ Exact ASCII/Unicode text?
│  └─ TEXT: $s = "MutexName" ascii wide
│
├─ Specific byte sequence?
│  └─ HEX: $h = { 4D 5A 90 00 }
│
├─ Byte sequence with variation?
│  └─ HEX with wildcards: { 4D 5A ?? ?? 50 45 }
│
├─ Pattern with structure (URLs, paths)?
│  └─ BOUNDED REGEX: /https:\/\/[a-z]{5,20}\.onion/
│
└─ Unknown encoding (XOR, base64)?
   └─ TEXT with modifier: $s = "config" xor(0x00-0xFF)

Is the Sample Packed? (Check First)

Before writing any string-based rule:

Is the sample packed?
├─ Entropy > 7.0?
│  └─ Likely packed — find unpacked layer first
├─ Few/no readable strings?
│  └─ Likely packed — use entropy, PE structure, or packer signatures
├─ UPX/MPRESS/custom packer detected?
│  └─ Target the unpacked payload OR detect the packer itself
└─ Readable strings available?
   └─ Proceed with string-based detection
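
The entropy check in the first branch can be sketched in a few lines. This computes Shannon entropy in bits per byte over a whole buffer; the 7.0 threshold comes from the tree above, while the function names are hypothetical. Per-section entropy is usually more informative than whole-file entropy.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input, at most 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def looks_packed(data: bytes, threshold: float = 7.0) -> bool:
    """Heuristic only: high entropy suggests compression or encryption."""
    return shannon_entropy(data) > threshold
```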

Expert guidance: Don't write rules against packed layers. The packing changes; the payload doesn't.

When Strings Fail, Pivot to Structure

If yarGen returns only API names and generic paths:

String extraction failed — what now?
├─ High entropy sections?
│  └─ Use math.entropy() on specific sections
├─ Unusual imports pattern?
│  └─ Use pe.imphash() for import hash clustering
├─ Consistent PE structure anomalies?
│  └─ Target section names, sizes, characteristics
├─ Metadata present?
│  └─ Target version info, timestamps, resources
└─ Nothing unique?
   └─ This sample may not be detectable with YARA alone

Expert guidance: "One can try to use other file properties, such as metadata, entropy, import hashes or other data which stays constant." — Kaspersky Applied YARA Training

Expert Heuristics

String selection: Mutex names are gold; C2 paths silver; error messages bronze. Stack strings are almost always unique. If you need >6 strings, you're over-fitting.

Condition design: Start with filesize <, then magic bytes, then strings, then modules. If >5 lines, split into multiple rules.

Quality signals: yarGen output needs 80% filtering. Rules matching <50% of variants are too narrow; rules matching goodware are too broad.

Modifier discipline:

Never use nocase or wide speculatively — only when you have confirmed evidence the case/encoding varies in samples
nocase doubles atom generation; wide doubles string matching — both have real costs
"If you don't have a clear reason for using those modifiers, don't do it" — Kaspersky Applied YARA

Regex anchoring:

Regex without a 4+ byte literal substring evaluates at every file offset — catastrophic performance
Always anchor regex to a distinctive literal: /mshta\.exe http:\/\/.../ not /http:\/\/.../
If you can't anchor, consider hex pattern with wildcards instead

Loop discipline:

Always bound loops with filesize: filesize < 100KB and for all i in (1..#a) : ...
Unbounded #a can be thousands in large files — exponential slowdown

YARA-X tips: $_unused to suppress warnings; private $s to hide from output; yr check + yr fmt before every commit.

When to Use Modules vs. Byte Checks

Should I use a module or raw bytes?
├─ Need imphash/rich header/authenticode?
│  └─ Use PE module — too complex to replicate
├─ Just checking magic bytes or simple offsets?
│  └─ Use uint16/uint32 — faster, no module overhead
├─ Checking section names/sizes?
│  └─ PE module is cleaner, but add magic bytes filter FIRST
├─ Checking Chrome extension permissions?
│  └─ Use crx module — string parsing is fragile
└─ Checking LNK target paths?
   └─ Use lnk module — LNK format is complex

Expert guidance: "Avoid the magic module — use explicit hex checks instead" — Neo23x0. Apply this principle: if you can do it with uint32(), don't load a module.

YARA-X New Features

Key additions from recent releases:

Private patterns (v1.3.0+): private $helper = "pattern" — matches but hidden from output
Warning suppression (v1.4.0+): // suppress: slow_pattern inline comments
Numeric underscores (v1.5.0+): filesize < 10_000_000 for readability
Built-in formatter: yr fmt rules/ to standardize formatting
NDJSON output: yr scan --output-format ndjson for tooling

YARA-X Tooling Workflow

YARA-X provides diagnostic tools legacy YARA lacks:

Rule development cycle:

# 1. Write initial rule
# 2. Check syntax with detailed errors
yr check rule.yar

# 3. Format consistently
yr fmt -w rule.yar

# 4. Dump module output to inspect file structure (no dummy rule needed)
yr dump -m pe sample.exe --output-format yaml

# 5. Scan with timing info
time yr scan -s rule.yar corpus/

When to use yr dump:

Investigating what PE/ELF/Mach-O fields are available
Debugging why module conditions aren't matching
Exploring new modules (crx, lnk, dotnet) before writing rules

YARA-X diagnostic advantage: Error messages include precise source locations. If yr check points to line 15, the issue is actually on line 15 (unlike legacy YARA).

Chrome Extension Analysis (crx module)

The crx module enables detection of malicious Chrome extensions. Requires YARA-X v1.5.0+ (basic), v1.11.0+ for permhash().

Key APIs: crx.is_crx, crx.permissions, crx.permhash()

Red flags: nativeMessaging + downloads, debugger permission, content scripts on <all_urls>

import "crx"

rule SUSP_CRX_HighRiskPerms {
    condition:
        crx.is_crx and
        for any perm in crx.permissions : (perm == "debugger")
}

See crx-module.md for complete API reference, permission risk assessment, and example rules.

Android DEX Analysis (dex module)

The dex module enables detection of Android malware. Requires YARA-X v1.11.0+. Not compatible with legacy YARA's dex module — API is completely different.

Key APIs: dex.is_dex, dex.contains_class(), dex.contains_method(), dex.contains_string()

Red flags: Single-letter class names (obfuscation), DexClassLoader reflection, encrypted assets

import "dex"

rule SUSP_DEX_DynamicLoading {
    condition:
        dex.is_dex and
        dex.contains_class("Ldalvik/system/DexClassLoader;")
}

See dex-module.md for complete API reference, obfuscation detection, and example rules.

Migrating from Legacy YARA

YARA-X is 99% compatible with legacy YARA rules, but it enforces stricter validation, so some existing rules need small fixes before they compile.

Quick migration:

yr check --relaxed-re-syntax rules/  # Identify issues
# Fix each issue, then:
yr check rules/  # Verify without relaxed mode

Common fixes:

Issue	Legacy	YARA-X Fix
Literal `{` in regex	`/{/`	`/\{/`
Invalid escapes	`\R` silently literal	`\\R` or `R`
Base64 strings	Any length	3+ chars required
Negative indexing	`@a[-1]`	`@a[#a - 1]`
Duplicate modifiers	Allowed	Remove duplicates

Note: Use --relaxed-re-syntax only as a diagnostic tool. Fix issues rather than relying on relaxed mode.
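
To pre-screen a large rule set for the "literal `{` in regex" issue from the table above, a small heuristic can help. The sketch below is illustrative only: the function name `unescaped_literal_braces` is hypothetical, it does not understand character classes such as `[{]`, and `yr check` remains the authoritative validator.

```python
import re

# A brace that starts a valid repetition quantifier: {n}, {n,}, {n,m} or {,m}.
QUANTIFIER = re.compile(r"\{(\d+(,\d*)?|,\d+)\}")


def unescaped_literal_braces(pattern: str) -> list[int]:
    """Return offsets of `{` that are neither escaped nor a quantifier."""
    offsets = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \{ or \\
            continue
        if ch == "{":
            m = QUANTIFIER.match(pattern, i)
            if m:
                i = m.end()  # valid quantifier, jump past it
                continue
            offsets.append(i)
        i += 1
    return offsets


print(unescaped_literal_braces(r"config{key}"))                       # [6]
print(unescaped_literal_braces(r"config\{key\}"))                     # []
print(unescaped_literal_braces(r"https?:\/\/[a-z0-9]{8,12}\.onion"))  # []
```

Any offset it reports is a candidate for escaping as `\{`; confirm each one with `yr check` before changing the rule.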

Quick Reference

Naming Convention

{CATEGORY}_{PLATFORM}_{FAMILY}_{VARIANT}_{DATE}

Common prefixes: MAL_ (malware), HKTL_ (hacking tool), WEBSHELL_, EXPL_, SUSP_ (suspicious), GEN_ (generic)

Platforms: Win_, Lnx_, Mac_, Android_, CRX_

Example: MAL_Win_Emotet_Loader_Jan25

See style-guide.md for full conventions, metadata requirements, and naming examples.
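
As a rough illustration of the convention, the following Python sketch checks a rule name against the prefixes and platforms listed above. It is not the project's linter: `check_rule_name` is a hypothetical helper, and the `MonYY` date shape is an assumption inferred from the `Jan25` example.

```python
import re

# Prefixes and platforms copied from the lists above; extend as needed.
CATEGORIES = {"MAL", "HKTL", "WEBSHELL", "EXPL", "SUSP", "GEN"}
PLATFORMS = {"Win", "Lnx", "Mac", "Android", "CRX"}
# Assumed date shape: three-letter month plus two-digit year, as in "Jan25".
DATE_RE = re.compile(r"^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\d{2}$")


def check_rule_name(name: str) -> list[str]:
    """Return a list of problems found in a rule name (empty means OK)."""
    parts = name.split("_")
    if len(parts) < 3:
        return [f"expected at least CATEGORY_PLATFORM_FAMILY, got {len(parts)} part(s)"]
    problems = []
    if parts[0] not in CATEGORIES:
        problems.append(f"unknown category prefix: {parts[0]}")
    if parts[1] not in PLATFORMS:
        problems.append(f"unknown platform: {parts[1]}")
    if not DATE_RE.match(parts[-1]):
        problems.append(f"last component is not a MonYY date: {parts[-1]}")
    return problems


print(check_rule_name("MAL_Win_Emotet_Loader_Jan25"))  # []
print(check_rule_name("Emotet_Loader"))
```

The date check is deliberately strict, so undated names are reported; relax it if your repository allows them.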

Required Metadata

Every rule needs: description (starts with "Detects"), author, reference, date.

meta:
    description = "Detects Example malware via unique mutex and C2 path"
    author = "Your Name <email@example.com>"
    reference = "https://example.com/analysis"
    date = "2025-01-29"
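
A minimal sketch of how these requirements could be checked mechanically. It assumes simple `key = "value"` lines without escaped quotes, and `check_meta` is a hypothetical helper, not the bundled lint script.

```python
import re

REQUIRED_KEYS = ("description", "author", "reference", "date")
# Matches simple `key = "value"` lines; it does not handle escaped quotes.
META_LINE = re.compile(r'^\s*(\w+)\s*=\s*"([^"]*)"\s*$')


def check_meta(meta_block: str) -> list[str]:
    """Return problems found in the text of a rule's meta: section."""
    meta = {}
    for line in meta_block.splitlines():
        m = META_LINE.match(line)
        if m:
            meta[m.group(1)] = m.group(2)
    problems = [f"missing meta key: {k}" for k in REQUIRED_KEYS if k not in meta]
    desc = meta.get("description")
    if desc is not None and not desc.startswith("Detects"):
        problems.append('description should start with "Detects"')
    return problems


example = '''
meta:
    description = "Detects Example malware via unique mutex and C2 path"
    author = "Your Name <email@example.com>"
    reference = "https://example.com/analysis"
    date = "2025-01-29"
'''
print(check_meta(example))  # []
print(check_meta('meta:\n    description = "Example loader"\n'))
```

The second call reports the three missing keys plus the description that does not start with "Detects".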

String Selection

Good: Mutex names, PDB paths, C2 paths, stack strings, configuration markers

Bad: API names, common executables, format specifiers, generic paths

See strings.md for the full decision tree and examples.

Condition Patterns

Order conditions for short-circuit:

filesize < 10MB (instant)
uint16(0) == 0x5A4D (nearly instant)
String matches (cheap)
Module checks (expensive)

See performance.md for detailed optimization patterns.
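
The ordering above relies on short-circuit evaluation: once a cheap check fails, the expensive ones are never reached. The Python sketch below illustrates the idea only and says nothing about how the YARA engine is implemented; `matches` and `fake_module_check` are hypothetical names.

```python
import struct

MAX_SIZE = 10 * 1024 * 1024  # mirrors `filesize < 10MB`
MZ_MAGIC = 0x5A4D            # mirrors `uint16(0) == 0x5A4D`


def matches(data: bytes, needle: bytes, expensive_check) -> bool:
    """Evaluate checks cheapest-first; `and` stops at the first False."""
    return (
        len(data) < MAX_SIZE
        and len(data) >= 2
        and struct.unpack_from("<H", data, 0)[0] == MZ_MAGIC
        and needle in data
        and expensive_check(data)
    )


calls = []


def fake_module_check(data: bytes) -> bool:
    # Hypothetical stand-in for a costly module lookup such as pe.imports(...).
    calls.append(len(data))
    return True


print(matches(b"MZ\x90\x00 ExampleMutex", b"ExampleMutex", fake_module_check))  # True
print(matches(b"\x7fELF ExampleMutex", b"ExampleMutex", fake_module_check))     # False
print(len(calls))  # 1: the non-PE input never reached the expensive check
```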

Workflow

Gather samples — Multiple samples; single-sample rules are brittle
Extract candidates — yarGen -m samples/ --excludegood
Validate quality — Use decision tree; yarGen needs 80% filtering
Write initial rule — Follow template with proper metadata
Lint and test — yr check, yr fmt, linter script
Goodware validation — VirusTotal corpus or local clean files
Deploy — Add to repo with full metadata, monitor for FPs

See testing.md for detailed validation workflow and FP investigation.

For a comprehensive step-by-step guide covering all phases from sample collection to deployment, see rule-development.md.

Common Mistakes

Mistake	Bad	Good
API names as indicators	`"VirtualAlloc"`	Hex pattern of call site + unique mutex
Unbounded regex	`/https?:\/\/.*/`	`/https?:\/\/[a-z0-9]{8,12}\.onion/`
Missing file type filter	`pe.imports(...)` first	`uint16(0) == 0x5A4D and filesize < 10MB` first
Short strings	`"abc"` (3 bytes)	`"abcdef"` (6 bytes; aim for 4+)
Unescaped braces (YARA-X)	`/config{key}/`	`/config\{key\}/`

Performance Optimization

Quick wins: Put filesize first, avoid nocase, bound regex quantifiers (e.g. {1,100}), prefer hex over regex.

Red flags: Strings <4 bytes, unbounded regex (.*), modules without file-type filter.

See performance.md for atom theory and optimization details.
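
Two of these red flags are easy to spot mechanically. The sketch below is a rough heuristic, not the bundled atom analyzer: `red_flags` is a hypothetical helper that takes string definitions as plain text, counts characters rather than decoded bytes, and treats any unescaped `.*` or `.+` as unbounded.

```python
import re

# An unescaped dot followed by * or +, e.g. ".*" but not "\.+".
UNBOUNDED = re.compile(r"(?<!\\)\.[*+]")


def red_flags(definitions: dict[str, str]) -> list[str]:
    """Flag short text strings and unbounded regexes in string definitions."""
    flags = []
    for name, value in definitions.items():
        if value.startswith('"'):
            text = value[1:value.rindex('"')]  # content between the quotes
            if len(text) < 4:
                flags.append(f"{name}: string shorter than 4 bytes")
        elif value.startswith("/"):
            body = value[1:value.rindex("/")]  # content between the slashes
            if UNBOUNDED.search(body):
                flags.append(f"{name}: unbounded regex")
    return flags


defs = {
    "$s1": '"abc"',
    "$s2": '"ExampleMutexName" ascii',
    "$re1": r"/https?:\/\/.*/",
    "$re2": r"/https?:\/\/[a-z0-9]{8,12}\.onion/",
}
print(red_flags(defs))
# ['$s1: string shorter than 4 bytes', '$re1: unbounded regex']
```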

Reference Documents

Topic	Document
Naming and metadata conventions	style-guide.md
Performance and atom optimization	performance.md
String types and judgment	strings.md
Testing and validation	testing.md
Chrome extension module (crx)	crx-module.md
Android DEX module (dex)	dex-module.md

Workflows

Topic	Document
Complete rule development process	rule-development.md

Example Rules

The examples/ directory contains real, attributed rules demonstrating best practices:

Example	Demonstrates	Source
MAL_Win_Remcos_Jan25.yar	PE malware: graduated string counts, multiple rules per family	Elastic Security
MAL_Mac_ProtonRAT_Jan25.yar	macOS: Mach-O magic bytes, multi-category grouping	Airbnb BinaryAlert
MAL_NPM_SupplyChain_Jan25.yar	npm supply chain: real attack patterns, ERC-20 selectors	Stairwell Research
SUSP_JS_Obfuscation_Jan25.yar	JavaScript: obfuscator detection, density-based matching	imp0rtp3, Nils Kuhnert
SUSP_CRX_SuspiciousPermissions.yar	Chrome extensions: crx module, permissions	Educational

Scripts

uv run {baseDir}/scripts/yara_lint.py rule.yar      # Validate style/metadata
uv run {baseDir}/scripts/atom_analyzer.py rule.yar  # Check string quality

See README.md for detailed script documentation.

Quality Checklist

Before deploying any rule:

Resources

Quality YARA Rule Repositories

Learn from production rules. These repositories contain well-tested, properly attributed rules:

Repository	Focus	Maintainer
Neo23x0/signature-base	17,000+ production rules, multi-platform	Florian Roth
Elastic/protections-artifacts	1,000+ endpoint-tested rules	Elastic Security
reversinglabs/reversinglabs-yara-rules	Threat research rules	ReversingLabs
imp0rtp3/js-yara-rules	JavaScript/browser malware	imp0rtp3
InQuest/awesome-yara	Curated index of resources	InQuest

Style & Performance Guides

Guide	Purpose
YARA Style Guide	Naming conventions, metadata, string prefixes
YARA Performance Guidelines	Atom optimization, regex bounds
Kaspersky Applied YARA Training	Expert techniques from production use

Tools

Tool	Purpose
yarGen	Extract candidate strings from samples
FLOSS	Extract obfuscated and stack strings
YARA-CI	Automated goodware testing
YaraDbg	Web-based rule debugger

macOS-Specific Resources

Resource	Purpose
Apple XProtect	Production macOS rules at `/System/Library/CoreServices/XProtect.bundle/`
objective-see	macOS malware research and samples
macOS Security Tools	Reference list

Multi-Indicator Clustering Pattern

Production rules often group indicators by type:

strings:
    // Category A: Library indicators
    $a1 = "SRWebSocket" ascii
    $a2 = "SocketRocket" ascii

    // Category B: Behavioral indicators
    $b1 = "SSH tunnel" ascii
    $b2 = "keylogger" ascii nocase

    // Category C: C2 patterns
    $c1 = /https:\/\/[a-z0-9]{8,16}\.onion/

condition:
    filesize < 10MB and
    (
        $c1 or                           // a C2 match alone can be definitive
        (any of ($a*) and any of ($b*))  // otherwise require BOTH categories
    )

Why this works: Different indicator types have different confidence levels. A single C2 domain might be definitive, while you need multiple library imports to be confident. Grouping by $a*, $b*, $c* lets you express graduated requirements.
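
To make the graduated logic concrete, here is a Python model of one possible condition: a C2 match alone is enough, otherwise one library indicator and one behavioral indicator are both required. This is an assumption chosen for illustration, not a rule from the examples directory; the names `category_hits` and `is_match` are hypothetical, and modifiers such as `nocase` are ignored.

```python
import re

CATEGORY_A = (b"SRWebSocket", b"SocketRocket")  # library indicators ($a*)
CATEGORY_B = (b"SSH tunnel", b"keylogger")      # behavioral indicators ($b*)
C2_PATTERN = re.compile(rb"https://[a-z0-9]{8,16}\.onion")  # C2 pattern ($c1)
MAX_SIZE = 10 * 1024 * 1024                     # mirrors `filesize < 10MB`


def category_hits(data: bytes) -> dict[str, int]:
    """Count how many indicators from each category appear in the data."""
    return {
        "a": sum(n in data for n in CATEGORY_A),
        "b": sum(n in data for n in CATEGORY_B),
        "c": len(C2_PATTERN.findall(data)),
    }


def is_match(data: bytes) -> bool:
    """C2 alone is sufficient; otherwise require both A and B evidence."""
    if len(data) >= MAX_SIZE:
        return False
    hits = category_hits(data)
    return hits["c"] > 0 or (hits["a"] > 0 and hits["b"] > 0)


print(is_match(b"links SocketRocket; opens SSH tunnel"))     # True
print(is_match(b"links SocketRocket and SRWebSocket"))       # False
print(is_match(b"beacon https://abcdefgh12345678.onion/x"))  # True
```

Keeping per-category counts separate makes it easy to tighten one category (for example, requiring two library indicators) without touching the others.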

