Document Skills

/pdf
/docx
/pptx
/xlsx
/doc-coauthoring

/pdf

Source: `~/.claude/skills/pdf/SKILL.md`

name: pdf
description: Use this skill whenever the user wants to do anything with PDF files. This includes reading or extracting text/tables from PDFs, combining or merging multiple PDFs into one, splitting PDFs apart, rotating pages, adding watermarks, creating new PDFs, filling PDF forms, encrypting/decrypting PDFs, extracting images, and OCR on scanned PDFs to make them searchable. If the user mentions a .pdf file or asks to produce one, use this skill.
license: Proprietary. LICENSE.txt has complete terms

PDF Processing Guide

Overview

This guide covers essential PDF processing operations using Python libraries and command-line tools. For advanced features, JavaScript libraries, and detailed examples, see REFERENCE.md. If you need to fill out a PDF form, read FORMS.md and follow its instructions.

Quick Start

from pypdf import PdfReader, PdfWriter

# Read a PDF
reader = PdfReader("document.pdf")
print(f"Pages: {len(reader.pages)}")

# Extract text
text = ""
for page in reader.pages:
    text += page.extract_text()

Python Libraries

pypdf - Basic Operations

Merge PDFs

from pypdf import PdfWriter, PdfReader

writer = PdfWriter()
for pdf_file in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]:
    reader = PdfReader(pdf_file)
    for page in reader.pages:
        writer.add_page(page)

with open("merged.pdf", "wb") as output:
    writer.write(output)

Split PDF

from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
for i, page in enumerate(reader.pages):
    writer = PdfWriter()
    writer.add_page(page)
    with open(f"page_{i+1}.pdf", "wb") as output:
        writer.write(output)

Extract Metadata

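from pypdf import PdfReader

reader = PdfReader("document.pdf")
meta = reader.metadata  # None when the PDF has no document information dictionary
if meta is not None:
    print(f"Title: {meta.title}")
    print(f"Author: {meta.author}")
    print(f"Subject: {meta.subject}")
    print(f"Creator: {meta.creator}")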

Rotate Pages

from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()

page = reader.pages[0]
page.rotate(90)  # Rotate 90 degrees clockwise
writer.add_page(page)

with open("rotated.pdf", "wb") as output:
    writer.write(output)

pdfplumber - Text and Table Extraction

Extract Text with Layout

import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        text = page.extract_text()
        print(text)

Extract Tables

import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    for i, page in enumerate(pdf.pages):
        tables = page.extract_tables()
        for j, table in enumerate(tables):
            print(f"Table {j+1} on page {i+1}:")
            for row in table:
                print(row)

Advanced Table Extraction

import pandas as pd
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    all_tables = []
    for page in pdf.pages:
        tables = page.extract_tables()
        for table in tables:
            if table:  # Check if table is not empty
                df = pd.DataFrame(table[1:], columns=table[0])
                all_tables.append(df)

# Combine all tables
if all_tables:
    combined_df = pd.concat(all_tables, ignore_index=True)
    combined_df.to_excel("extracted_tables.xlsx", index=False)

reportlab - Create PDFs

Basic PDF Creation

from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

c = canvas.Canvas("hello.pdf", pagesize=letter)
width, height = letter

# Add text
c.drawString(100, height - 100, "Hello World!")
c.drawString(100, height - 120, "This is a PDF created with reportlab")

# Add a line
c.line(100, height - 140, 400, height - 140)

# Save
c.save()

Create PDF with Multiple Pages

from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, PageBreak
from reportlab.lib.styles import getSampleStyleSheet

doc = SimpleDocTemplate("report.pdf", pagesize=letter)
styles = getSampleStyleSheet()
story = []

# Add content
title = Paragraph("Report Title", styles['Title'])
story.append(title)
story.append(Spacer(1, 12))

body = Paragraph("This is the body of the report. " * 20, styles['Normal'])
story.append(body)
story.append(PageBreak())

# Page 2
story.append(Paragraph("Page 2", styles['Heading1']))
story.append(Paragraph("Content for page 2", styles['Normal']))

# Build PDF
doc.build(story)

Subscripts and Superscripts

IMPORTANT: Never use Unicode subscript/superscript characters (₀₁₂₃₄₅₆₇₈₉, ⁰¹²³⁴⁵⁶⁷⁸⁹) in ReportLab PDFs. The built-in fonts do not include these glyphs, causing them to render as solid black boxes.

Instead, use ReportLab's XML markup tags in Paragraph objects:

from reportlab.platypus import Paragraph
from reportlab.lib.styles import getSampleStyleSheet

styles = getSampleStyleSheet()

# Subscripts: use <sub> tag
chemical = Paragraph("H<sub>2</sub>O", styles['Normal'])

# Superscripts: use <super> tag
squared = Paragraph("x<super>2</super> + y<super>2</super>", styles['Normal'])

For canvas-drawn text (not Paragraph objects), manually adjust the font size and position rather than using Unicode subscripts/superscripts.

Command-Line Tools

pdftotext (poppler-utils)

# Extract text
pdftotext input.pdf output.txt

# Extract text preserving layout
pdftotext -layout input.pdf output.txt

# Extract specific pages
pdftotext -f 1 -l 5 input.pdf output.txt  # Pages 1-5

qpdf

# Merge PDFs
qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf

# Split pages
qpdf input.pdf --pages . 1-5 -- pages1-5.pdf
qpdf input.pdf --pages . 6-10 -- pages6-10.pdf

# Rotate pages
qpdf input.pdf output.pdf --rotate=+90:1  # Rotate page 1 by 90 degrees

# Remove password
qpdf --password=mypassword --decrypt encrypted.pdf decrypted.pdf
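To split a long PDF into fixed-size chunks with the qpdf syntax above, the page ranges can be generated rather than typed by hand. A minimal sketch, assuming you already know the page count (for example from `qpdf --show-npages input.pdf`); the `qpdf_split_commands` helper and the `pages<range>.pdf` naming are illustrative, not part of qpdf:

```python
def qpdf_split_commands(input_pdf, total_pages, chunk_size, prefix="pages"):
    """Build one qpdf argument list per chunk of consecutive pages.

    Follows the `qpdf input.pdf --pages . 1-5 -- pages1-5.pdf` form, where "."
    refers to the primary input file and the range selects pages from it.
    """
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be positive")
    commands = []
    for start in range(1, total_pages + 1, chunk_size):
        end = min(start + chunk_size - 1, total_pages)
        # A single trailing page is written as "11" rather than "11-11"
        page_range = str(start) if start == end else f"{start}-{end}"
        output = f"{prefix}{page_range}.pdf"
        commands.append(["qpdf", input_pdf, "--pages", ".", page_range, "--", output])
    return commands


# Print the commands; pass each list to subprocess.run(cmd, check=True) to execute it
for cmd in qpdf_split_commands("input.pdf", total_pages=12, chunk_size=5):
    print(" ".join(cmd))
```

Building argument lists (instead of shell strings) avoids quoting problems when file names contain spaces.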

pdftk (if available)

# Merge
pdftk file1.pdf file2.pdf cat output merged.pdf

# Split
pdftk input.pdf burst

# Rotate
pdftk input.pdf rotate 1east output rotated.pdf

Common Tasks

Extract Text from Scanned PDFs

# Requires: pip install pytesseract pdf2image, plus the Tesseract OCR engine and Poppler installed on the system
import pytesseract
from pdf2image import convert_from_path

# Convert PDF to images
images = convert_from_path('scanned.pdf')

# OCR each page
text = ""
for i, image in enumerate(images):
    text += f"Page {i+1}:\n"
    text += pytesseract.image_to_string(image)
    text += "\n\n"

print(text)

Add Watermark

from pypdf import PdfReader, PdfWriter

# Load the watermark page (a one-page PDF created beforehand, e.g. with reportlab)
watermark = PdfReader("watermark.pdf").pages[0]

# Apply to all pages
reader = PdfReader("document.pdf")
writer = PdfWriter()

for page in reader.pages:
    page.merge_page(watermark)
    writer.add_page(page)

with open("watermarked.pdf", "wb") as output:
    writer.write(output)

Extract Images

# Using pdfimages (poppler-utils)
pdfimages -j input.pdf output_prefix

# -j writes JPEG-encoded images as output_prefix-000.jpg, output_prefix-001.jpg, etc.;
# other images are written as PPM/PBM files (use -png, or -all to keep native formats)

Password Protection

from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()

for page in reader.pages:
    writer.add_page(page)

# Add password
writer.encrypt("userpassword", "ownerpassword")

with open("encrypted.pdf", "wb") as output:
    writer.write(output)

Quick Reference

Task	Best Tool	Command/Code
Merge PDFs	pypdf	`writer.add_page(page)`
Split PDFs	pypdf	One page per file
Extract text	pdfplumber	`page.extract_text()`
Extract tables	pdfplumber	`page.extract_tables()`
Create PDFs	reportlab	Canvas or Platypus
Command line merge	qpdf	`qpdf --empty --pages ...`
OCR scanned PDFs	pytesseract	Convert to image first
Fill PDF forms	pdf-lib or pypdf (see FORMS.md)	See FORMS.md

Next Steps

For advanced pypdfium2 usage, see REFERENCE.md
For JavaScript libraries (pdf-lib), see REFERENCE.md
If you need to fill out a PDF form, follow the instructions in FORMS.md
For troubleshooting guides, see REFERENCE.md

/docx

Source: `~/.claude/skills/docx/SKILL.md`

name: docx
description: "Use this skill whenever the user wants to create, read, edit, or manipulate Word documents (.docx files). Triggers include: any mention of "Word doc", "word document", ".docx", or requests to produce professional documents with formatting like tables of contents, headings, page numbers, or letterheads. Also use when extracting or reorganizing content from .docx files, inserting or replacing images in documents, performing find-and-replace in Word files, working with tracked changes or comments, or converting content into a polished Word document. If the user asks for a "report", "memo", "letter", "template", or similar deliverable as a Word or .docx file, use this skill. Do NOT use for PDFs, spreadsheets, Google Docs, or general coding tasks unrelated to document generation."
license: Proprietary. LICENSE.txt has complete terms

DOCX creation, editing, and analysis

Overview

A .docx file is a ZIP archive containing XML files.
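Because the container is an ordinary ZIP file, Python's standard `zipfile` module is enough to look inside without unpacking to disk. A minimal read-only sketch; the helper names are illustrative, and actual edits should still go through unpack.py and pack.py:

```python
import zipfile


def list_docx_parts(path):
    """Return the names of all parts (files) stored in a .docx archive."""
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()


def read_part(path, part="word/document.xml"):
    """Return one XML part as text; word/document.xml holds the main body."""
    with zipfile.ZipFile(path) as zf:
        return zf.read(part).decode("utf-8")


# Usage (needs a real file):
#   for name in list_docx_parts("document.docx"):
#       print(name)  # typically [Content_Types].xml, _rels/.rels, word/document.xml, word/styles.xml, ...
```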

Quick Reference

Task	Approach
Read/analyze content	`pandoc` or unpack for raw XML
Create new document	Use `docx-js` - see Creating New Documents below
Edit existing document	Unpack → edit XML → repack - see Editing Existing Documents below

Converting .doc to .docx

Legacy .doc files must be converted before editing:

python scripts/office/soffice.py --headless --convert-to docx document.doc

Reading Content

# Text extraction with tracked changes
pandoc --track-changes=all document.docx -o output.md

# Raw XML access
python scripts/office/unpack.py document.docx unpacked/

Converting to Images

python scripts/office/soffice.py --headless --convert-to pdf document.docx
pdftoppm -jpeg -r 150 document.pdf page

Accepting Tracked Changes

To produce a clean document with all tracked changes accepted (requires LibreOffice):

python scripts/accept_changes.py input.docx output.docx

Creating New Documents

Generate .docx files with JavaScript, then validate. Install: npm install -g docx

Setup

const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun,
        Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink,
        TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType,
        VerticalAlign, PageNumber, PageBreak } = require('docx');
const fs = require('fs');

const doc = new Document({ sections: [{ children: [/* content */] }] });
Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer));

Validation

After creating the file, validate it. If validation fails, unpack, fix the XML, and repack.

python scripts/office/validate.py doc.docx

Page Size

// CRITICAL: docx-js defaults to A4, not US Letter
// Always set page size explicitly for consistent results
sections: [{
  properties: {
    page: {
      size: {
        width: 12240,   // 8.5 inches in DXA
        height: 15840   // 11 inches in DXA
      },
      margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } // 1 inch margins
    }
  },
  children: [/* content */]
}]

Common page sizes (DXA units, 1440 DXA = 1 inch):

Paper	Width	Height	Content Width (1" margins)
US Letter	12,240	15,840	9,360
A4 (default)	11,906	16,838	9,026

Landscape orientation: docx-js swaps width/height internally, so pass portrait dimensions and let it handle the swap:

size: {
  width: 12240,   // Pass SHORT edge as width
  height: 15840,  // Pass LONG edge as height
  orientation: PageOrientation.LANDSCAPE  // docx-js swaps them in the XML
},
// Content width = 15840 - left margin - right margin (uses the long edge)
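The numbers in the page-size table follow from 1440 DXA per inch and 25.4 mm per inch. A small sketch of that arithmetic, with illustrative helper names; `pt_to_half_points` mirrors the half-point font sizes used in the styles example (`size: 24` for 12pt):

```python
DXA_PER_INCH = 1440  # 1 inch = 72 pt, and 1 DXA = 1/20 pt
MM_PER_INCH = 25.4


def inches_to_dxa(inches):
    return round(inches * DXA_PER_INCH)


def mm_to_dxa(mm):
    return round(mm / MM_PER_INCH * DXA_PER_INCH)


def content_width(page_width, page_height, left=1440, right=1440, landscape=False):
    """Usable text width in DXA; in landscape the long edge becomes the width."""
    edge = max(page_width, page_height) if landscape else min(page_width, page_height)
    return edge - left - right


def pt_to_half_points(pt):
    """docx-js run sizes are half-points, so 12pt becomes 24."""
    return round(pt * 2)


letter_w, letter_h = inches_to_dxa(8.5), inches_to_dxa(11)  # 12240, 15840
a4_w, a4_h = mm_to_dxa(210), mm_to_dxa(297)                 # 11906, 16838
print(content_width(letter_w, letter_h))                    # portrait US Letter
print(content_width(letter_w, letter_h, landscape=True))    # landscape US Letter
```

Passing portrait dimensions plus `landscape=True` matches the docx-js convention above: the width you pass stays the short edge, but the usable width comes from the long edge.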

Styles (Override Built-in Headings)

Use Arial as the default font (universally supported). Keep titles black for readability.

const doc = new Document({
  styles: {
    default: { document: { run: { font: "Arial", size: 24 } } }, // 12pt default
    paragraphStyles: [
      // IMPORTANT: Use exact IDs to override built-in styles
      { id: "Heading1", name: "Heading 1", basedOn: "Normal", next: "Normal", quickFormat: true,
        run: { size: 32, bold: true, font: "Arial" },
        paragraph: { spacing: { before: 240, after: 240 }, outlineLevel: 0 } }, // outlineLevel required for TOC
      { id: "Heading2", name: "Heading 2", basedOn: "Normal", next: "Normal", quickFormat: true,
        run: { size: 28, bold: true, font: "Arial" },
        paragraph: { spacing: { before: 180, after: 180 }, outlineLevel: 1 } },
    ]
  },
  sections: [{
    children: [
      new Paragraph({ heading: HeadingLevel.HEADING_1, children: [new TextRun("Title")] }),
    ]
  }]
});

Lists (NEVER use unicode bullets)

// ❌ WRONG - never manually insert bullet characters
new Paragraph({ children: [new TextRun("• Item")] })  // BAD
new Paragraph({ children: [new TextRun("\u2022 Item")] })  // BAD

// ✅ CORRECT - use numbering config with LevelFormat.BULLET
const doc = new Document({
  numbering: {
    config: [
      { reference: "bullets",
        levels: [{ level: 0, format: LevelFormat.BULLET, text: "•", alignment: AlignmentType.LEFT,
          style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
      { reference: "numbers",
        levels: [{ level: 0, format: LevelFormat.DECIMAL, text: "%1.", alignment: AlignmentType.LEFT,
          style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
    ]
  },
  sections: [{
    children: [
      new Paragraph({ numbering: { reference: "bullets", level: 0 },
        children: [new TextRun("Bullet item")] }),
      new Paragraph({ numbering: { reference: "numbers", level: 0 },
        children: [new TextRun("Numbered item")] }),
    ]
  }]
});

// ⚠️ Each reference creates INDEPENDENT numbering
// Same reference = continues (1,2,3 then 4,5,6)
// Different reference = restarts (1,2,3 then 1,2,3)

Tables

CRITICAL: Tables need dual widths - set both columnWidths on the table AND width on each cell. Without both, tables render incorrectly on some platforms.

// CRITICAL: Always set table width for consistent rendering
// CRITICAL: Use ShadingType.CLEAR (not SOLID) to prevent black backgrounds
const border = { style: BorderStyle.SINGLE, size: 1, color: "CCCCCC" };
const borders = { top: border, bottom: border, left: border, right: border };

new Table({
  width: { size: 9360, type: WidthType.DXA }, // Always use DXA (percentages break in Google Docs)
  columnWidths: [4680, 4680], // Must sum to table width (DXA: 1440 = 1 inch)
  rows: [
    new TableRow({
      children: [
        new TableCell({
          borders,
          width: { size: 4680, type: WidthType.DXA }, // Also set on each cell
          shading: { fill: "D5E8F0", type: ShadingType.CLEAR }, // CLEAR not SOLID
          margins: { top: 80, bottom: 80, left: 120, right: 120 }, // Cell padding (internal, not added to width)
          children: [new Paragraph({ children: [new TextRun("Cell")] })]
        })
      ]
    })
  ]
})

Table width calculation:

Always use WidthType.DXA — WidthType.PERCENTAGE breaks in Google Docs.

// Table width = sum of columnWidths = content width
// US Letter with 1" margins: 12240 - 2880 = 9360 DXA
width: { size: 9360, type: WidthType.DXA },
columnWidths: [7000, 2360]  // Must sum to table width

Width rules:

Always use WidthType.DXA — never WidthType.PERCENTAGE (incompatible with Google Docs)
Table width must equal the sum of columnWidths
Cell width must match corresponding columnWidth
Cell margins are internal padding - they reduce content area, not add to cell width
For full-width tables: use content width (page width minus left and right margins)
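When column widths come from proportions rather than fixed numbers, rounding can leave the total a few DXA short of the table width. One way to guarantee the "sum of columnWidths" rule, sketched with a hypothetical `column_widths` helper that uses the largest-remainder method:

```python
def column_widths(table_width, weights):
    """Split table_width (DXA) into integer columns proportional to integer weights.

    Uses the largest-remainder method so the result always sums to table_width.
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be positive integers")
    total = sum(weights)
    widths = [table_width * w // total for w in weights]     # floor of each share
    remainders = [table_width * w % total for w in weights]  # what the floor dropped
    leftover = table_width - sum(widths)                     # always < len(weights)
    # Hand out the leftover DXA units, one each, to the columns that lost the most
    order = sorted(range(len(weights)), key=lambda j: remainders[j], reverse=True)
    for i in order[:leftover]:
        widths[i] += 1
    return widths


widths = column_widths(9360, [3, 1])  # US Letter content width, 75% / 25% split
print(widths, sum(widths))
```

Pass the same list to the table's `columnWidths` and to each cell's `width.size` so the table and cell widths match.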

Images

// CRITICAL: type parameter is REQUIRED
new Paragraph({
  children: [new ImageRun({
    type: "png", // Required: png, jpg, jpeg, gif, bmp, svg
    data: fs.readFileSync("image.png"),
    transformation: { width: 200, height: 150 },
    altText: { title: "Title", description: "Desc", name: "Name" } // All three required
  })]
})

Page Breaks

// CRITICAL: PageBreak must be inside a Paragraph
new Paragraph({ children: [new PageBreak()] })

// Or use pageBreakBefore
new Paragraph({ pageBreakBefore: true, children: [new TextRun("New page")] })

Table of Contents

// CRITICAL: Headings must use HeadingLevel ONLY - no custom styles
new TableOfContents("Table of Contents", { hyperlink: true, headingStyleRange: "1-3" })

Headers/Footers

sections: [{
  properties: {
    page: { margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } } // 1440 = 1 inch
  },
  headers: {
    default: new Header({ children: [new Paragraph({ children: [new TextRun("Header")] })] })
  },
  footers: {
    default: new Footer({ children: [new Paragraph({
      children: [new TextRun("Page "), new TextRun({ children: [PageNumber.CURRENT] })]
    })] })
  },
  children: [/* content */]
}]

Critical Rules for docx-js

Set page size explicitly - docx-js defaults to A4; use US Letter (12240 x 15840 DXA) for US documents
Landscape: pass portrait dimensions - docx-js swaps width/height internally; pass short edge as width, long edge as height, and set orientation: PageOrientation.LANDSCAPE
Never use \n - use separate Paragraph elements
Never use unicode bullets - use LevelFormat.BULLET with numbering config
PageBreak must be in Paragraph - standalone creates invalid XML
ImageRun requires type - always specify png/jpg/etc
Always set table width with DXA - never use WidthType.PERCENTAGE (breaks in Google Docs)
Tables need dual widths - columnWidths array AND cell width, both must match
Table width = sum of columnWidths - for DXA, ensure they add up exactly
Always add cell margins - use margins: { top: 80, bottom: 80, left: 120, right: 120 } for readable padding
Use ShadingType.CLEAR - never SOLID for table shading
TOC requires HeadingLevel only - no custom styles on heading paragraphs
Override built-in styles - use exact IDs: "Heading1", "Heading2", etc.
Include outlineLevel - required for TOC (0 for H1, 1 for H2, etc.)

Editing Existing Documents

Follow all 3 steps in order.

Step 1: Unpack

python scripts/office/unpack.py document.docx unpacked/

Extracts XML, pretty-prints, merges adjacent runs, and converts smart quotes to XML entities (&#x201C; etc.) so they survive editing. Use --merge-runs false to skip run merging.

Step 2: Edit XML

Edit files in unpacked/word/. See XML Reference below for patterns.

Use "Claude" as the author for tracked changes and comments, unless the user explicitly requests use of a different name.

Use the Edit tool directly for string replacement. Do not write Python scripts. Scripts introduce unnecessary complexity. The Edit tool shows exactly what is being replaced.

CRITICAL: Use smart quotes for new content. When adding text with apostrophes or quotes, use XML entities to produce smart quotes:

<!-- Use these entities for professional typography -->
<w:t>Here&#x2019;s a quote: &#x201C;Hello&#x201D;</w:t>

Entity	Character
`&#x2018;`	‘ (left single)
`&#x2019;`	’ (right single / apostrophe)
`&#x201C;`	“ (left double)
`&#x201D;`	” (right double)

Adding comments: Use comment.py to handle boilerplate across multiple XML files (text must be pre-escaped XML):

python scripts/comment.py unpacked/ 0 "Comment text with &amp; and &#x2019;"
python scripts/comment.py unpacked/ 1 "Reply text" --parent 0  # reply to comment 0
python scripts/comment.py unpacked/ 0 "Text" --author "Custom Author"  # custom author name

Then add markers to document.xml (see Comments in XML Reference).

Step 3: Pack

python scripts/office/pack.py unpacked/ output.docx --original document.docx

Validates with auto-repair, condenses XML, and creates DOCX. Use --validate false to skip.

Auto-repair will fix:

durableId >= 0x7FFFFFFF (regenerates valid ID)
Missing xml:space="preserve" on <w:t> with whitespace

Auto-repair won't fix:

Malformed XML, invalid element nesting, missing relationships, schema violations

Common Pitfalls

Replace entire <w:r> elements: When adding tracked changes, replace the whole <w:r>...</w:r> block with <w:del>...<w:ins>... as siblings. Don't inject tracked change tags inside a run.
Preserve <w:rPr> formatting: Copy the original run's <w:rPr> block into your tracked change runs to maintain bold, font size, etc.

XML Reference

Schema Compliance

Element order in <w:pPr>: <w:pStyle>, <w:numPr>, <w:spacing>, <w:ind>, <w:jc>, <w:rPr> last
Whitespace: Add xml:space="preserve" to <w:t> with leading/trailing spaces
RSIDs: Must be 8-digit hex (e.g., 00AB1234)
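If you need to pick or double-check these hex identifiers while editing by hand, a format check is easy to sketch. The 8-digit rule comes from the list above and the durableId limit from the auto-repair notes in Step 3; the helper names are hypothetical, and the check covers format only, not uniqueness within the document:

```python
import re
import secrets

HEX8 = re.compile(r"[0-9A-Fa-f]{8}")
DURABLE_ID_LIMIT = 0x7FFFFFFF  # pack.py's auto-repair regenerates durableIds at or above this


def is_valid_rsid(value):
    """True if value is exactly 8 hex digits, e.g. '00AB1234'."""
    return HEX8.fullmatch(value) is not None


def new_rsid():
    """Random 32-bit value formatted as 8 uppercase hex digits."""
    return f"{secrets.randbelow(1 << 32):08X}"


def is_valid_durable_id(value):
    """8 hex digits whose numeric value is below 0x7FFFFFFF."""
    return is_valid_rsid(value) and int(value, 16) < DURABLE_ID_LIMIT


print(new_rsid(), is_valid_rsid("00AB1234"), is_valid_durable_id("7FFFFFFF"))
```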

Tracked Changes

Insertion:

<w:ins w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z">
  <w:r><w:t>inserted text</w:t></w:r>
</w:ins>

Deletion:

<w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
  <w:r><w:delText>deleted text</w:delText></w:r>
</w:del>

Inside <w:del>: Use <w:delText> instead of <w:t>, and <w:delInstrText> instead of <w:instrText>.

Minimal edits - only mark what changes:

<!-- Change "30 days" to "60 days" -->
<w:r><w:t>The term is </w:t></w:r>
<w:del w:id="1" w:author="Claude" w:date="...">
  <w:r><w:delText>30</w:delText></w:r>
</w:del>
<w:ins w:id="2" w:author="Claude" w:date="...">
  <w:r><w:t>60</w:t></w:r>
</w:ins>
<w:r><w:t> days.</w:t></w:r>

Deleting entire paragraphs/list items - when removing ALL content from a paragraph, also mark the paragraph mark as deleted so it merges with the next paragraph. Add <w:del/> inside <w:pPr><w:rPr>:

<w:p>
  <w:pPr>
    <w:numPr>...</w:numPr>  <!-- list numbering if present -->
    <w:rPr>
      <w:del w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z"/>
    </w:rPr>
  </w:pPr>
  <w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
    <w:r><w:delText>Entire paragraph content being deleted...</w:delText></w:r>
  </w:del>
</w:p>

Without the <w:del/> in <w:pPr><w:rPr>, accepting changes leaves an empty paragraph/list item.

Rejecting another author's insertion - nest deletion inside their insertion:

<w:ins w:author="Jane" w:id="5">
  <w:del w:author="Claude" w:id="10">
    <w:r><w:delText>their inserted text</w:delText></w:r>
  </w:del>
</w:ins>

Restoring another author's deletion - add insertion after (don't modify their deletion):

<w:del w:author="Jane" w:id="5">
  <w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
<w:ins w:author="Claude" w:id="10">
  <w:r><w:t>deleted text</w:t></w:r>
</w:ins>

Comments

After running comment.py (see Step 2), add markers to document.xml. For replies, use --parent flag and nest markers inside the parent's.

CRITICAL: <w:commentRangeStart> and <w:commentRangeEnd> are siblings of <w:r>, never inside <w:r>.

<!-- Comment markers are direct children of w:p, never inside w:r -->
<w:commentRangeStart w:id="0"/>
<w:del w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z">
  <w:r><w:delText>deleted</w:delText></w:r>
</w:del>
<w:r><w:t> more text</w:t></w:r>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>

<!-- Comment 0 with reply 1 nested inside -->
<w:commentRangeStart w:id="0"/>
  <w:commentRangeStart w:id="1"/>
  <w:r><w:t>text</w:t></w:r>
  <w:commentRangeEnd w:id="1"/>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="1"/></w:r>

Images

Add image file to word/media/
Add relationship to word/_rels/document.xml.rels:

<Relationship Id="rId5" Type=".../image" Target="media/image1.png"/>

Add content type to [Content_Types].xml:

<Default Extension="png" ContentType="image/png"/>

Reference in document.xml:

<w:drawing>
  <wp:inline>
    <wp:extent cx="914400" cy="914400"/>  <!-- EMUs: 914400 = 1 inch -->
    <a:graphic>
      <a:graphicData uri=".../picture">
        <pic:pic>
          <pic:blipFill><a:blip r:embed="rId5"/></pic:blipFill>
        </pic:pic>
      </a:graphicData>
    </a:graphic>
  </wp:inline>
</w:drawing>
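`wp:extent` sizes are in EMUs: 914,400 per inch, which works out to 9,525 per pixel at 96 DPI. A sketch for deriving `cx`/`cy` from an image's pixel size while keeping its aspect ratio; the function names are illustrative:

```python
EMU_PER_INCH = 914400  # one pixel at 96 DPI is 914400 / 96 = 9525 EMUs


def inches_to_emu(inches):
    return round(inches * EMU_PER_INCH)


def pixels_to_emu(pixels, dpi=96):
    return round(pixels * EMU_PER_INCH / dpi)


def fit_extent(px_width, px_height, target_width_in):
    """(cx, cy) in EMUs for an image scaled to target_width_in inches wide, aspect ratio kept."""
    cx = inches_to_emu(target_width_in)
    cy = round(cx * px_height / px_width)
    return cx, cy


# Values for <wp:extent cx="..." cy="..."/> of a 1200x800 px image shown 3 inches wide
cx, cy = fit_extent(1200, 800, 3)
print(f'<wp:extent cx="{cx}" cy="{cy}"/>')
```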

Dependencies

pandoc: Text extraction
docx: npm install -g docx (new documents)
LibreOffice: PDF conversion (auto-configured for sandboxed environments via scripts/office/soffice.py)
Poppler: pdftoppm for images

/pptx

Source: `~/.claude/skills/pptx/SKILL.md`

name: pptx
description: "Use this skill any time a .pptx file is involved in any way — as input, output, or both. This includes: creating slide decks, pitch decks, or presentations; reading, parsing, or extracting text from any .pptx file (even if the extracted content will be used elsewhere, like in an email or summary); editing, modifying, or updating existing presentations; combining or splitting slide files; working with templates, layouts, speaker notes, or comments. Trigger whenever the user mentions "deck," "slides," "presentation," or references a .pptx filename, regardless of what they plan to do with the content afterward. If a .pptx file needs to be opened, created, or touched, use this skill."
license: Proprietary. LICENSE.txt has complete terms

PPTX Skill

Quick Reference

Task	Guide
Read/analyze content	`python -m markitdown presentation.pptx`
Edit or create from template	Read editing.md
Create from scratch	Read pptxgenjs.md

Reading Content

# Text extraction
python -m markitdown presentation.pptx

# Visual overview
python scripts/thumbnail.py presentation.pptx

# Raw XML
python scripts/office/unpack.py presentation.pptx unpacked/

Editing Workflow

Read editing.md for full details.

Analyze template with thumbnail.py
Unpack → manipulate slides → edit content → clean → pack

Creating from Scratch

Read pptxgenjs.md for full details.

Use when no template or reference presentation is available.

Design Ideas

Don't create boring slides. Plain bullets on a white background won't impress anyone. Consider ideas from this list for each slide.

Before Starting

Pick a bold, content-informed color palette: The palette should feel designed for THIS topic. If swapping your colors into a completely different presentation would still "work," you haven't made specific enough choices.
Dominance over equality: One color should dominate (60-70% visual weight), with 1-2 supporting tones and one sharp accent. Never give all colors equal weight.
Dark/light contrast: Dark backgrounds for title + conclusion slides, light for content ("sandwich" structure). Or commit to dark throughout for a premium feel.
Commit to a visual motif: Pick ONE distinctive element and repeat it — rounded image frames, icons in colored circles, thick single-side borders. Carry it across every slide.

Color Palettes

Choose colors that match your topic — don't default to generic blue. Use these palettes as inspiration:

Theme	Primary	Secondary	Accent
Midnight Executive	`1E2761` (navy)	`CADCFC` (ice blue)	`FFFFFF` (white)
Forest & Moss	`2C5F2D` (forest)	`97BC62` (moss)	`F5F5F5` (cream)
Coral Energy	`F96167` (coral)	`F9E795` (gold)	`2F3C7E` (navy)
Warm Terracotta	`B85042` (terracotta)	`E7E8D1` (sand)	`A7BEAE` (sage)
Ocean Gradient	`065A82` (deep blue)	`1C7293` (teal)	`21295C` (midnight)
Charcoal Minimal	`36454F` (charcoal)	`F2F2F2` (off-white)	`212121` (black)
Teal Trust	`028090` (teal)	`00A896` (seafoam)	`02C39A` (mint)
Berry & Cream	`6D2E46` (berry)	`A26769` (dusty rose)	`ECE2D0` (cream)
Sage Calm	`84B59F` (sage)	`69A297` (eucalyptus)	`50808E` (slate)
Cherry Bold	`990011` (cherry)	`FCF6F5` (off-white)	`2F3C7E` (navy)

For Each Slide

Every slide needs a visual element — image, chart, icon, or shape. Text-only slides are forgettable.

Layout options:

Two-column (text left, illustration on right)
Icon + text rows (icon in colored circle, bold header, description below)
2x2 or 2x3 grid (image on one side, grid of content blocks on other)
Half-bleed image (full left or right side) with content overlay

Data display:

Large stat callouts (big numbers 60-72pt with small labels below)
Comparison columns (before/after, pros/cons, side-by-side options)
Timeline or process flow (numbered steps, arrows)

Visual polish:

Icons in small colored circles next to section headers
Italic accent text for key stats or taglines

Typography

Choose an interesting font pairing — don't default to Arial. Pick a header font with personality and pair it with a clean body font.

Header Font	Body Font
Georgia	Calibri
Arial Black	Arial
Calibri	Calibri Light
Cambria	Calibri
Trebuchet MS	Calibri
Impact	Arial
Palatino	Garamond
Consolas	Calibri

Element	Size
Slide title	36-44pt bold
Section header	20-24pt bold
Body text	14-16pt
Captions	10-12pt muted

Spacing

0.5" minimum margins
0.3-0.5" between content blocks
Leave breathing room—don't fill every inch
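The margin and gap rules translate directly into coordinates. A sketch that lays out equal-width columns; the 10-inch default width is an assumption (we believe it matches pptxgenjs's LAYOUT_16x9, 10 x 5.625 in), and `column_boxes` is a hypothetical helper:

```python
def column_boxes(n, slide_width=10.0, margin=0.5, gap=0.3):
    """Return (x, w) in inches for n equal-width columns between the side margins."""
    if n < 1:
        raise ValueError("need at least one column")
    usable = slide_width - 2 * margin - (n - 1) * gap
    if usable <= 0:
        raise ValueError("columns do not fit with these margins and gaps")
    w = usable / n
    return [(margin + i * (w + gap), w) for i in range(n)]


# Three columns on a 10-inch-wide slide with 0.5" margins and 0.3" gaps
for x, w in column_boxes(3):
    print(f"x={x:.2f}in w={w:.2f}in")
```

The (x, w) pairs map onto the `x` and `w` options that pptxgenjs text boxes and shapes take; keeping `gap` fixed across slides enforces the "choose one gap and use it consistently" rule.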

Avoid (Common Mistakes)

Don't repeat the same layout — vary columns, cards, and callouts across slides
Don't center body text — left-align paragraphs and lists; center only titles
Don't skimp on size contrast — titles need 36pt+ to stand out from 14-16pt body
Don't default to blue — pick colors that reflect the specific topic
Don't mix spacing randomly — choose 0.3" or 0.5" gaps and use consistently
Don't style one slide and leave the rest plain — commit fully or keep it simple throughout
Don't create text-only slides — add images, icons, charts, or visual elements; avoid plain title + bullets
Don't forget text box padding — when aligning lines or shapes with text edges, set margin: 0 on the text box or offset the shape to account for padding
Don't use low-contrast elements — icons AND text need strong contrast against the background; avoid light text on light backgrounds or dark text on dark backgrounds
NEVER use accent lines under titles — these are a hallmark of AI-generated slides; use whitespace or background color instead
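"Strong contrast" can be checked numerically with the WCAG 2.x contrast ratio, which runs from 1:1 to 21:1; WCAG asks for at least 4.5:1 for body text and 3:1 for large text and icons. A sketch using the standard relative-luminance formula; applying WCAG thresholds to slides is our assumption, not a rule from this guide:

```python
def _linearize(channel):
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of an 'RRGGBB' color (a leading '#' is allowed)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio between two colors, from 1.0 (identical) to 21.0 (black on white)."""
    lighter, darker = sorted((relative_luminance(color_a), relative_luminance(color_b)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


# Check text/background pairings from the Color Palettes table before using them
for fg, bg in [("1E2761", "CADCFC"), ("FFFFFF", "F96167")]:
    ratio = contrast_ratio(fg, bg)
    print(f"{fg} on {bg}: {ratio:.2f}:1 {'ok' if ratio >= 4.5 else 'too low for body text'}")
```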

QA (Required)

Assume there are problems. Your job is to find them.

Your first render is almost never correct. Approach QA as a bug hunt, not a confirmation step. If you found zero issues on first inspection, you weren't looking hard enough.

Content QA

python -m markitdown output.pptx

Check for missing content, typos, wrong order.

When using templates, check for leftover placeholder text:

python -m markitdown output.pptx | grep -iE "xxxx|lorem|ipsum|this.*(page|slide).*layout"

If grep returns results, fix them before declaring success.

Visual QA

⚠️ USE SUBAGENTS — even for 2-3 slides. You've been staring at the code and will see what you expect, not what's there. Subagents have fresh eyes.

Convert slides to images (see Converting to Images), then use this prompt:

Visually inspect these slides. Assume there are issues — find them.

Look for:
- Overlapping elements (text through shapes, lines through words, stacked elements)
- Text overflow or cut off at edges/box boundaries
- Decorative lines positioned for single-line text but title wrapped to two lines
- Source citations or footers colliding with content above
- Elements too close (< 0.3" gaps) or cards/sections nearly touching
- Uneven gaps (large empty area in one place, cramped in another)
- Insufficient margin from slide edges (< 0.5")
- Columns or similar elements not aligned consistently
- Low-contrast text (e.g., light gray text on cream-colored background)
- Low-contrast icons (e.g., dark icons on dark backgrounds without a contrasting circle)
- Text boxes too narrow causing excessive wrapping
- Leftover placeholder content

For each slide, list issues or areas of concern, even if minor.

Read and analyze these images:
1. /path/to/slide-01.jpg (Expected: [brief description])
2. /path/to/slide-02.jpg (Expected: [brief description])

Report ALL issues found, including minor ones.

Verification Loop

Generate slides → Convert to images → Inspect
List issues found (if none found, look again more critically)
Fix issues
Re-verify affected slides — one fix often creates another problem
Repeat until a full pass reveals no new issues

Do not declare success until you've completed at least one fix-and-verify cycle.

Converting to Images

Convert presentations to individual slide images for visual inspection:

python scripts/office/soffice.py --headless --convert-to pdf output.pptx
pdftoppm -jpeg -r 150 output.pdf slide

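This creates one JPEG per slide, such as slide-1.jpg, slide-2.jpg, etc. pdftoppm zero-pads the number to the width of the total page count, so a deck of 10 to 99 slides yields slide-01.jpg, slide-02.jpg, and so on.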

To re-render specific slides after fixes:

pdftoppm -jpeg -r 150 -f N -l N output.pdf slide-fixed
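Because the zero-padding depends on the page count, rendered file names are not guaranteed to sort correctly as plain strings (slide-10.jpg sorts before slide-2.jpg). A sketch for collecting the rendered slides in page order, for example to fill in the image list of the QA prompt; `slide_images` is a hypothetical helper:

```python
import re
from pathlib import Path


def slide_images(directory, prefix="slide", ext="jpg"):
    """Return pdftoppm outputs like slide-1.jpg or slide-01.jpg, ordered by page number."""
    pattern = re.compile(rf"{re.escape(prefix)}-(\d+)\.{re.escape(ext)}")
    pages = []
    for path in Path(directory).iterdir():
        match = pattern.fullmatch(path.name)
        if match:  # skips other prefixes such as slide-fixed-3.jpg
            pages.append((int(match.group(1)), path))
    return [path for _, path in sorted(pages)]


# Usage: number the images for the QA prompt
#   for n, path in enumerate(slide_images("."), start=1):
#       print(f"{n}. {path.resolve()} (Expected: ...)")
```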

Dependencies

pip install "markitdown[pptx]" - text extraction
pip install Pillow - thumbnail grids
npm install -g pptxgenjs - creating from scratch
LibreOffice (soffice) - PDF conversion (auto-configured for sandboxed environments via scripts/office/soffice.py)
Poppler (pdftoppm) - PDF to images

/xlsx

Source: `~/.claude/skills/xlsx/SKILL.md`

name: xlsx description: "Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like "the xlsx in my downloads") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved." license: Proprietary. LICENSE.txt has complete terms

Requirements for Outputs

All Excel files

Professional Font

Use a consistent, professional font (e.g., Arial, Times New Roman) for all deliverables unless otherwise instructed by the user

Zero Formula Errors

Every Excel model MUST be delivered with ZERO formula errors (#REF!, #DIV/0!, #VALUE!, #N/A, #NAME?)

Preserve Existing Templates (when updating templates)

Study and EXACTLY match existing format, style, and conventions when modifying files
Never impose standardized formatting on files with established patterns
Existing template conventions ALWAYS override these guidelines

Financial models

Color Coding Standards

Unless otherwise stated by the user or existing template

Industry-Standard Color Conventions

Blue text (RGB: 0,0,255): Hardcoded inputs, and numbers users will change for scenarios
Black text (RGB: 0,0,0): ALL formulas and calculations
Green text (RGB: 0,128,0): Links pulling from other worksheets within same workbook
Red text (RGB: 255,0,0): External links to other files
Yellow background (RGB: 255,255,0): Key assumptions needing attention or cells that need to be updated
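In openpyxl these conventions map onto `Font(color=...)` and `PatternFill` using the hex form of each RGB value. A minimal sketch; the sheet names, cell addresses, and values are hypothetical.

```python
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill

# Hex equivalents of the RGB values listed above
INPUT = Font(color="0000FF")          # blue: hardcoded inputs / scenario drivers
FORMULA = Font(color="000000")        # black: formulas and calculations
SHEET_LINK = Font(color="008000")     # green: links to other sheets in this workbook
EXTERNAL_LINK = Font(color="FF0000")  # red: links to other files (unused here)
ATTENTION = PatternFill("solid", start_color="FFFF00")  # yellow: key assumption to review

wb = Workbook()
model = wb.active
model.title = "Model"

model["B5"] = 1000                    # hypothetical base-year revenue (input)
model["B5"].font = INPUT
model["B6"] = 0.05                    # hypothetical growth assumption (input, flagged)
model["B6"].font = INPUT
model["B6"].fill = ATTENTION
model["C5"] = "=B5*(1+$B$6)"          # calculation
model["C5"].font = FORMULA

summary = wb.create_sheet("Summary")
summary["A1"] = "=Model!C5"           # cross-sheet link
summary["A1"].font = SHEET_LINK

wb.save("colored_model.xlsx")
```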

Number Formatting Standards

Required Format Rules

Years: Format as text strings (e.g., "2024" not "2,024")
Currency: Use $#,##0 format; ALWAYS specify units in headers ("Revenue ($mm)")
Zeros: Use number formatting to make all zeros "-", including percentages (e.g., "$#,##0;($#,##0);-")
Percentages: Default to 0.0% format (one decimal)
Multiples: Format as 0.0x for valuation multiples (EV/EBITDA, P/E)
Negative numbers: Use parentheses (123) not minus -123
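These rules translate into openpyxl `number_format` strings. A minimal sketch: the currency pattern is the one given above, while the percentage and multiple patterns are assumptions that apply the same negative/zero conventions. Labels and values are hypothetical.

```python
from openpyxl import Workbook

CURRENCY = "$#,##0;($#,##0);-"  # positive; (negative); zero shown as "-"
PERCENT = "0.0%;(0.0%);-"       # one decimal, same negative/zero handling
MULTIPLE = '0.0"x"'             # 11.4 displays as 11.4x; the literal x is quoted

wb = Workbook()
ws = wb.active
ws.append([None, "2024", "2025"])         # years stored as text, so never "2,024"
ws.append(["Revenue ($mm)", 1250, -40])   # units stated in the label
ws.append(["Growth (%)", 0.052, 0])
ws.append(["EV/EBITDA", 11.4, 9.8])

for row, fmt in ((2, CURRENCY), (3, PERCENT), (4, MULTIPLE)):
    for col in (2, 3):
        ws.cell(row=row, column=col).number_format = fmt

wb.save("formatted.xlsx")
```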

Formula Construction Rules

Assumptions Placement

Place ALL assumptions (growth rates, margins, multiples, etc.) in separate assumption cells
Use cell references instead of hardcoded values in formulas
Example: Use =B5*(1+$B$6) instead of =B5*1.05

Formula Error Prevention

Verify all cell references are correct
Check for off-by-one errors in ranges
Ensure consistent formulas across all projection periods
Test with edge cases (zero values, negative numbers)
Verify no unintended circular references

Documentation Requirements for Hardcodes

Document each hardcoded value in a cell comment, or in the cell beside it if the value sits at the end of a table. Format: "Source: [System/Document], [Date], [Specific Reference], [URL if applicable]"
Examples:
- "Source: Company 10-K, FY2024, Page 45, Revenue Note, [SEC EDGAR URL]"
- "Source: Company 10-Q, Q2 2025, Exhibit 99.1, [SEC EDGAR URL]"
- "Source: Bloomberg Terminal, 8/15/2025, AAPL US Equity"
- "Source: FactSet, 8/20/2025, Consensus Estimates Screen"
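With openpyxl, the source note can go either in a cell comment (`openpyxl.comments.Comment`) or in the cell beside the value. A minimal sketch using one of the formats above; the value and the author name are hypothetical.

```python
from openpyxl import Workbook
from openpyxl.comments import Comment

SOURCE = "Source: Company 10-K, FY2024, Page 45, Revenue Note"

wb = Workbook()
ws = wb.active
ws["A2"] = "Revenue ($mm)"
ws["B2"] = 1250  # hypothetical hardcoded input

# Option 1: attach the source as a cell comment (Comment requires text and author)
ws["B2"].comment = Comment(SOURCE, "Model Builder")

# Option 2: when the value is the last column of a table, write the note beside it
ws["C2"] = SOURCE

wb.save("documented.xlsx")
```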

XLSX creation, editing, and analysis

Overview

A user may ask you to create, edit, or analyze the contents of an .xlsx file. You have different tools and workflows available for different tasks.

Important Requirements

LibreOffice Required for Formula Recalculation: You can assume LibreOffice is installed for recalculating formula values using the scripts/recalc.py script. The script automatically configures LibreOffice on first run, including in sandboxed environments where Unix sockets are restricted (handled by scripts/office/soffice.py)

Reading and analyzing data

Data analysis with pandas

For data analysis, visualization, and basic operations, use pandas which provides powerful data manipulation capabilities:

```python
import pandas as pd

# Read Excel
df = pd.read_excel('file.xlsx')  # Default: first sheet
all_sheets = pd.read_excel('file.xlsx', sheet_name=None)  # All sheets as dict

# Analyze
df.head()      # Preview data
df.info()      # Column info
df.describe()  # Statistics

# Write Excel
df.to_excel('output.xlsx', index=False)
```

Excel File Workflows

CRITICAL: Use Formulas, Not Hardcoded Values

Always use Excel formulas instead of calculating values in Python and hardcoding them. This ensures the spreadsheet remains dynamic and updateable.

❌ WRONG - Hardcoding Calculated Values

```python
# Bad: Calculating in Python and hardcoding result
total = df['Sales'].sum()
sheet['B10'] = total  # Hardcodes 5000

# Bad: Computing growth rate in Python
growth = (df.iloc[-1]['Revenue'] - df.iloc[0]['Revenue']) / df.iloc[0]['Revenue']
sheet['C5'] = growth  # Hardcodes 0.15

# Bad: Python calculation for average
avg = sum(values) / len(values)
sheet['D20'] = avg  # Hardcodes 42.5
```

✅ CORRECT - Using Excel Formulas

```python
# Good: Let Excel calculate the sum
sheet['B10'] = '=SUM(B2:B9)'

# Good: Growth rate as Excel formula
sheet['C5'] = '=(C4-C2)/C2'

# Good: Average using Excel function
sheet['D20'] = '=AVERAGE(D2:D19)'
```

This applies to ALL calculations - totals, percentages, ratios, differences, etc. The spreadsheet should be able to recalculate when source data changes.
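When the same calculation repeats across projection periods, the formula strings can be generated in a loop with `openpyxl.utils.get_column_letter`, so each period gets its own live formula instead of a Python-computed number. A minimal sketch; the line items and figures are hypothetical.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter

wb = Workbook()
ws = wb.active
ws.append(["Line item", "FY2023", "FY2024", "FY2025"])
ws.append(["Product", 100, 120, 140])   # hypothetical inputs
ws.append(["Services", 40, 50, 65])

last_col = ws.max_column                # 4 (column D)
total_row = ws.max_row + 1              # row 4
growth_row = total_row + 1              # row 5

ws.cell(row=total_row, column=1, value="Total")
ws.cell(row=growth_row, column=1, value="Growth")
for col in range(2, last_col + 1):
    cur = get_column_letter(col)
    # Total: Excel sums the inputs, so it updates when they change
    ws.cell(row=total_row, column=col, value=f"=SUM({cur}2:{cur}{total_row - 1})")
    if col > 2:
        prev = get_column_letter(col - 1)
        # Growth vs. the prior period, referencing the total cells instead of a hardcoded rate
        ws.cell(row=growth_row, column=col,
                value=f"=({cur}{total_row}-{prev}{total_row})/{prev}{total_row}")

wb.save("projections.xlsx")
```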

Common Workflow

Choose tool: pandas for data, openpyxl for formulas/formatting
Create/Load: Create new workbook or load existing file
Modify: Add/edit data, formulas, and formatting
Save: Write to file
Recalculate formulas (MANDATORY IF USING FORMULAS): Use the scripts/recalc.py script
```
python scripts/recalc.py output.xlsx
```
Verify and fix any errors:
- The script returns JSON with error details
- If status is errors_found, check error_summary for specific error types and locations
- Fix the identified errors and recalculate again
- Common errors to fix:
  - #REF!: Invalid cell references
  - #DIV/0!: Division by zero
  - #VALUE!: Wrong data type in formula
  - #NAME?: Unrecognized formula name

Creating new Excel files

```python
# Using openpyxl for formulas and formatting
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment

wb = Workbook()
sheet = wb.active

# Add data
sheet['A1'] = 'Hello'
sheet['B1'] = 'World'
sheet.append([10, 20, 30])  # Appends to the next empty row (row 2)

# Add formula (in D2, so it doesn't overwrite the appended row)
sheet['D2'] = '=SUM(A2:C2)'  # 60 once recalculated

# Formatting
sheet['A1'].font = Font(bold=True, color='FF0000')
sheet['A1'].fill = PatternFill('solid', start_color='FFFF00')
sheet['A1'].alignment = Alignment(horizontal='center')

# Column width
sheet.column_dimensions['A'].width = 20

wb.save('output.xlsx')
```

Editing existing Excel files

```python
# Using openpyxl to preserve formulas and formatting
from openpyxl import load_workbook

# Load existing file
wb = load_workbook('existing.xlsx')
sheet = wb.active  # or wb['SheetName'] for specific sheet

# Working with multiple sheets (separate variable, so `sheet` still refers to the active sheet)
for sheet_name in wb.sheetnames:
    ws = wb[sheet_name]
    print(f"Sheet: {sheet_name} ({ws.max_row} rows)")

# Modify cells
sheet['A1'] = 'New Value'
sheet.insert_rows(2)  # Insert a row before row 2
sheet.delete_cols(3)  # Delete column 3 (C)
# Note: openpyxl does not update formulas, tables, or charts that refer to shifted cells

# Add new sheet
new_sheet = wb.create_sheet('NewSheet')
new_sheet['A1'] = 'Data'

wb.save('modified.xlsx')
```

Recalculating formulas

Excel files created or modified by openpyxl contain formulas as strings but not calculated values. Use the provided scripts/recalc.py script to recalculate formulas:

```bash
python scripts/recalc.py <excel_file> [timeout_seconds]
```

Example:

```bash
python scripts/recalc.py output.xlsx 30
```

The script:

Automatically sets up the LibreOffice macro on first run
Recalculates all formulas in all sheets
Scans ALL cells for Excel errors (#REF!, #DIV/0!, etc.)
Returns JSON with detailed error locations and counts
Works on both Linux and macOS

Formula Verification Checklist

Quick checks to ensure formulas work correctly:

Essential Verification

Test 2-3 sample references: Verify they pull correct values before building full model
Column mapping: Confirm Excel columns match (e.g., column 64 = BL, not BK)
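Row offset: Excel rows are 1-indexed, and with pandas' default header=0 the header occupies Excel row 1, so DataFrame index i is Excel row i + 2 (df.iloc[5] is Excel row 7); with header=None it is row i + 1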
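A small helper makes these offsets explicit. It is a sketch assuming the sheet's data starts at A1 and was read with pandas' defaults (`header=0`, no `index_col`), so one header row sits above the data and DataFrame columns line up with Excel columns. `df_to_excel_cell` is a hypothetical name; `get_column_letter` and `column_index_from_string` are openpyxl utilities.

```python
from openpyxl.utils import column_index_from_string, get_column_letter


def df_to_excel_cell(row_pos: int, col_pos: int, header_rows: int = 1) -> str:
    """Map 0-based positions, as in df.iloc[row_pos, col_pos], to an A1-style cell.

    Excel rows and columns are 1-based, and `header_rows` rows sit above the data
    (1 for pandas' default header=0, 0 for header=None).
    """
    return f"{get_column_letter(col_pos + 1)}{row_pos + 1 + header_rows}"


print(get_column_letter(64))                  # BL
print(column_index_from_string("BL"))         # 64
print(df_to_excel_cell(5, 0))                 # A7
print(df_to_excel_cell(5, 0, header_rows=0))  # A6
```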

Common Pitfalls

NaN handling: Check for null values with pd.notna()
Far-right columns: FY data often in columns 50+
Multiple matches: Search all occurrences, not just first
Division by zero: Check denominators before using / in formulas (#DIV/0!)
Wrong references: Verify all cell references point to intended cells (#REF!)
Cross-sheet references: Use correct format (Sheet1!A1) for linking sheets

Formula Testing Strategy

Start small: Test formulas on 2-3 cells before applying broadly
Verify dependencies: Check all cells referenced in formulas exist
Test edge cases: Include zero, negative, and very large values

Interpreting scripts/recalc.py Output

The script returns JSON with error details:

```
{
  "status": "success",           // or "errors_found"
  "total_errors": 0,              // Total error count
  "total_formulas": 42,           // Number of formulas in file
  "error_summary": {              // Only present if errors found
    "#REF!": {
      "count": 2,
      "locations": ["Sheet1!B5", "Sheet1!C10"]
    }
  }
}
```
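A minimal sketch of acting on this report from Python. It assumes `scripts/recalc.py` prints the JSON to stdout and does not rely on its exit code (neither detail is specified here); `run_recalc` and `error_locations` are hypothetical helper names.

```python
import json
import subprocess
import sys


def run_recalc(path: str, timeout_seconds: int = 30) -> dict:
    """Run the recalc script and parse its JSON report (assumed to be printed to stdout)."""
    result = subprocess.run(
        [sys.executable, "scripts/recalc.py", path, str(timeout_seconds)],
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def error_locations(report: dict) -> dict[str, list[str]]:
    """Flatten error_summary into {error_type: [cells]}; empty when status is not "errors_found"."""
    if report.get("status") != "errors_found":
        return {}
    return {err: info["locations"] for err, info in report.get("error_summary", {}).items()}
```

Usage: call `run_recalc("output.xlsx")`, fix every cell that `error_locations` lists, and rerun until it returns an empty dict.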

Best Practices

Library Selection

pandas: Best for data analysis, bulk operations, and simple data export
openpyxl: Best for complex formatting, formulas, and Excel-specific features

Working with openpyxl

Cell indices are 1-based (row=1, column=1 refers to cell A1)
Use data_only=True to read calculated values: load_workbook('file.xlsx', data_only=True)
Warning: If opened with data_only=True and saved, formulas are replaced with values and permanently lost
For large files: Use read_only=True for reading or write_only=True for writing
Formulas are preserved but not evaluated - use scripts/recalc.py to update values
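The difference between the two loading modes shows up on any freshly written file. A minimal sketch (the file name is arbitrary): openpyxl stores the formula but never computes it, so the `data_only` view has no cached value until the file is recalculated, for example with scripts/recalc.py.

```python
from openpyxl import Workbook, load_workbook

wb = Workbook()
ws = wb.active
ws["A1"] = 2
ws["A2"] = 3
ws["A3"] = "=A1*A2"
wb.save("demo.xlsx")

formulas = load_workbook("demo.xlsx")                # default: formulas as strings
values = load_workbook("demo.xlsx", data_only=True)  # cached results only

print(formulas.active["A3"].value)  # =A1*A2
print(values.active["A3"].value)    # None: nothing has calculated A3 yet

# Edit and save through `formulas`; saving `values` would replace formulas with values
```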

Working with pandas

Specify data types to avoid inference issues: pd.read_excel('file.xlsx', dtype={'id': str})
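For large files, read specific columns: pd.read_excel('file.xlsx', usecols='A,C,E') (a string of Excel column letters; a list of strings such as ['id', 'date'] selects columns by header name)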
Handle dates properly: pd.read_excel('file.xlsx', parse_dates=['date_column'])
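A small round trip shows the column-selection and dtype options together. A sketch with hypothetical data: `usecols` receives Excel column letters as a single string, and `dtype` keeps the IDs as strings.

```python
import pandas as pd

# Hypothetical sheet: IDs with leading zeros stored as text
pd.DataFrame(
    {"id": ["001", "002"], "name": ["North", "South"], "amount": [1200, 950]}
).to_excel("sample.xlsx", index=False)

# Excel columns A and C only, keeping IDs as strings
df = pd.read_excel("sample.xlsx", usecols="A,C", dtype={"id": str})
print(df.columns.tolist())  # ['id', 'amount']
print(df["id"].tolist())    # ['001', '002']
```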

Code Style Guidelines

IMPORTANT: When generating Python code for Excel operations:

Write minimal, concise Python code without unnecessary comments
Avoid verbose variable names and redundant operations
Avoid unnecessary print statements

For Excel files themselves:

Add comments to cells with complex formulas or important assumptions
Document data sources for hardcoded values
Include notes for key calculations and model sections

/doc-coauthoring

Source: `~/.claude/skills/doc-coauthoring/SKILL.md`

name: doc-coauthoring description: Guide users through a structured workflow for co-authoring documentation. Use when user wants to write documentation, proposals, technical specs, decision docs, or similar structured content. This workflow helps users efficiently transfer context, refine content through iteration, and verify the doc works for readers. Trigger when user mentions writing docs, creating proposals, drafting specs, or similar documentation tasks.

Doc Co-Authoring Workflow

This skill provides a structured workflow for guiding users through collaborative document creation. Act as an active guide, walking users through three stages: Context Gathering, Refinement & Structure, and Reader Testing.

When to Offer This Workflow

Trigger conditions:

User mentions writing documentation: "write a doc", "draft a proposal", "create a spec", "write up"
User mentions specific doc types: "PRD", "design doc", "decision doc", "RFC"
User seems to be starting a substantial writing task

Initial offer: Offer the user a structured workflow for co-authoring the document. Explain the three stages:

Context Gathering: User provides all relevant context while Claude asks clarifying questions
Refinement & Structure: Iteratively build each section through brainstorming and editing
Reader Testing: Test the doc with a fresh Claude (no context) to catch blind spots before others read it

Explain that this approach helps ensure the doc works well when others read it (including when they paste it into Claude). Ask if they want to try this workflow or prefer to work freeform.

If user declines, work freeform. If user accepts, proceed to Stage 1.

Stage 1: Context Gathering

Goal: Close the gap between what the user knows and what Claude knows, enabling smart guidance later.

Initial Questions

Start by asking the user for meta-context about the document:

What type of document is this? (e.g., technical spec, decision doc, proposal)
Who's the primary audience?
What's the desired impact when someone reads this?
Is there a template or specific format to follow?
Any other constraints or context to know?

Inform them they can answer in shorthand or dump information however works best for them.

If user provides a template or mentions a doc type:

Ask if they have a template document to share
If they provide a link to a shared document, use the appropriate integration to fetch it
If they provide a file, read it

If user mentions editing an existing shared document:

Use the appropriate integration to read the current state
Check for images without alt-text
If images exist without alt-text, explain that when others use Claude to understand the doc, Claude won't be able to see them. Ask if they want alt-text generated. If so, request they paste each image into chat for descriptive alt-text generation.

Info Dumping

Once initial questions are answered, encourage the user to dump all the context they have. Request information such as:

Background on the project/problem
Related team discussions or shared documents
Why alternative solutions aren't being used
Organizational context (team dynamics, past incidents, politics)
Timeline pressures or constraints
Technical architecture or dependencies
Stakeholder concerns

Advise them not to worry about organizing it - just get it all out. Offer multiple ways to provide context:

Info dump stream-of-consciousness
Point to team channels or threads to read
Link to shared documents

If integrations are available (e.g., Slack, Teams, Google Drive, SharePoint, or other MCP servers), mention that these can be used to pull in context directly.

If no integrations are detected and in Claude.ai or Claude app: Suggest they can enable connectors in their Claude settings to allow pulling context from messaging apps and document storage directly.

Inform them clarifying questions will be asked once they've done their initial dump.

During context gathering:

If user mentions team channels or shared documents:
- If integrations available: Inform them the content will be read now, then use the appropriate integration
- If integrations not available: Explain lack of access. Suggest they enable connectors in Claude settings, or paste the relevant content directly.
If user mentions entities/projects that are unknown:
- Ask if connected tools should be searched to learn more
- Wait for user confirmation before searching
As user provides context, track what's being learned and what's still unclear

Asking clarifying questions:

When user signals they've done their initial dump (or after substantial context provided), ask clarifying questions to ensure understanding:

Generate 5-10 numbered questions based on gaps in the context.

Inform them they can use shorthand to answer (e.g., "1: yes, 2: see #channel, 3: no because backwards compat"), link to more docs, point to channels to read, or just keep info-dumping. Whatever's most efficient for them.

Exit condition: Sufficient context has been gathered when questions show understanding - when edge cases and trade-offs can be asked about without needing basics explained.

Transition: Ask if there's any more context they want to provide at this stage, or if it's time to move on to drafting the document.

If user wants to add more, let them. When ready, proceed to Stage 2.

Stage 2: Refinement & Structure

Goal: Build the document section by section through brainstorming, curation, and iterative refinement.

Instructions to user: Explain that the document will be built section by section. For each section:

Clarifying questions will be asked about what to include
5-20 options will be brainstormed
User will indicate what to keep/remove/combine
The section will be drafted
It will be refined through surgical edits

Start with whichever section has the most unknowns (usually the core decision/proposal), then work through the rest.

Section ordering:

If the document structure is clear: Ask which section they'd like to start with.

Suggest starting with whichever section has the most unknowns. For decision docs, that's usually the core proposal. For specs, it's typically the technical approach. Summary sections are best left for last.

If user doesn't know what sections they need: Based on the type of document and template, suggest 3-5 sections appropriate for the doc type.

Ask if this structure works, or if they want to adjust it.

Once structure is agreed:

Create the initial document structure with placeholder text for all sections.

If access to artifacts is available: Use create_file to create an artifact. This gives both Claude and the user a scaffold to work from.

Inform them that the initial structure with placeholders for all sections will be created.

Create artifact with all section headers and brief placeholder text like "[To be written]" or "[Content here]".

Provide the scaffold link and indicate it's time to fill in each section.

If no access to artifacts: Create a markdown file in the working directory. Name it appropriately (e.g., decision-doc.md, technical-spec.md).

Inform them that the initial structure with placeholders for all sections will be created.

Create file with all section headers and placeholder text.

Confirm the file has been created, giving its filename, and indicate it's time to fill in each section.

For each section:

Step 1: Clarifying Questions

Announce work will begin on the [SECTION NAME] section. Ask 5-10 clarifying questions about what should be included:

Generate 5-10 specific questions based on context and section purpose.

Inform them they can answer in shorthand or just indicate what's important to cover.

Step 2: Brainstorming

For the [SECTION NAME] section, brainstorm [5-20] things that might be included, depending on the section's complexity. Look for:

Context shared that might have been forgotten
Angles or considerations not yet mentioned

Generate 5-20 numbered options based on section complexity. At the end, offer to brainstorm more if they want additional options.

Step 3: Curation

Ask which points should be kept, removed, or combined. Request brief justifications to help learn priorities for the next sections.

Provide examples:

"Keep 1,4,7,9"
"Remove 3 (duplicates 1)"
"Remove 6 (audience already knows this)"
"Combine 11 and 12"

If user gives freeform feedback (e.g., "looks good" or "I like most of it but...") instead of numbered selections, extract their preferences and proceed. Parse what they want kept/removed/changed and apply it.

Step 4: Gap Check

Based on what they've selected, ask if there's anything important missing for the [SECTION NAME] section.

Step 5: Drafting

Use str_replace to replace the placeholder text for this section with the actual drafted content.

Announce the [SECTION NAME] section will be drafted now based on what they've selected.

If using artifacts: After drafting, provide a link to the artifact.

Ask them to read through it and indicate what to change. Note that specific feedback helps Claude learn their preferences for the next sections.

If using a file (no artifacts): After drafting, confirm completion.

Inform them the [SECTION NAME] section has been drafted in [filename]. Ask them to read through it and indicate what to change. Note that specific feedback helps Claude learn their preferences for the next sections.

Key instruction for user (include when drafting the first section): Add a note asking them to indicate what to change rather than editing the doc directly, because this helps Claude learn their style for future sections. For example: "Remove the X bullet - already covered by Y" or "Make the third paragraph more concise".

Step 6: Iterative Refinement

As user provides feedback:

Use str_replace to make edits (never reprint the whole doc)
If using artifacts: Provide link to artifact after each edit
If using files: Just confirm edits are complete
If user edits doc directly and asks to read it: mentally note the changes they made and keep them in mind for future sections (this shows their preferences)

Continue iterating until user is satisfied with the section.

Quality Checking

After 3 consecutive iterations with no substantial changes, ask if anything can be removed without losing important information.

When section is done, confirm [SECTION NAME] is complete. Ask if ready to move to the next section.

Repeat for all sections.

Near Completion

When approaching completion (80%+ of sections done), announce the intention to re-read the entire document and check for:

Flow and consistency across sections
Redundancy or contradictions
Anything that feels like "slop" or generic filler
Whether every sentence carries weight

Read entire document and provide feedback.

When all sections are drafted and refined: Announce all sections are drafted. Indicate intention to review the complete document one more time.

Review for overall coherence, flow, completeness.

Provide any final suggestions.

Ask if ready to move to Reader Testing, or if they want to refine anything else.

Stage 3: Reader Testing

Goal: Test the document with a fresh Claude (no context bleed) to verify it works for readers.

Instructions to user: Explain that testing will now occur to see if the document actually works for readers. This catches blind spots - things that make sense to the authors but might confuse others.

Testing Approach

If access to sub-agents is available (e.g., in Claude Code):

Perform the testing directly without user involvement.

Step 1: Predict Reader Questions

Announce intention to predict what questions readers might ask when trying to discover this document.

Generate 5-10 questions that readers would realistically ask.

Step 2: Test with Sub-Agent

Announce that these questions will be tested with a fresh Claude instance (no context from this conversation).

For each question, invoke a sub-agent with just the document content and the question.

Summarize what Reader Claude got right/wrong for each question.

Step 3: Run Additional Checks

Announce additional checks will be performed.

Invoke sub-agent to check for ambiguity, false assumptions, contradictions.

Summarize any issues found.

Step 4: Report and Fix

If issues found: Report that Reader Claude struggled with specific issues.

List the specific issues.

Indicate intention to fix these gaps.

Loop back to refinement for problematic sections.

If no access to sub-agents (e.g., claude.ai web interface):

The user will need to do the testing manually.

Step 1: Predict Reader Questions

Ask what questions people might ask when trying to discover this document. What would they type into Claude.ai?

Generate 5-10 questions that readers would realistically ask.

Step 2: Setup Testing

Provide testing instructions:

Open a fresh Claude conversation: https://claude.ai
Paste or share the document content (if using a shared doc platform with connectors enabled, provide the link)
Ask Reader Claude the generated questions

For each question, instruct Reader Claude to provide:

The answer
Whether anything was ambiguous or unclear
What knowledge/context the doc assumes is already known

Check if Reader Claude gives correct answers or misinterprets anything.

Step 3: Additional Checks

Also ask Reader Claude:

"What in this doc might be ambiguous or unclear to readers?"
"What knowledge or context does this doc assume readers already have?"
"Are there any internal contradictions or inconsistencies?"

Step 4: Iterate Based on Results

Ask what Reader Claude got wrong or struggled with. Indicate intention to fix those gaps.

Loop back to refinement for any problematic sections.

Exit Condition (Both Approaches)

When Reader Claude consistently answers questions correctly and doesn't surface new gaps or ambiguities, the doc is ready.

Final Review

When Reader Testing passes: Announce the doc has passed Reader Claude testing. Before completion:

Ask if they want one more review, or if the work is done.

If user wants final review, provide it. Otherwise: Announce document completion. Provide a few final tips:

Consider linking this conversation in an appendix so readers can see how the doc was developed
Use appendices to provide depth without bloating the main doc
Update the doc as feedback is received from real readers

Tips for Effective Guidance

Tone:

Be direct and procedural
Explain rationale briefly when it affects user behavior
Don't try to "sell" the approach - just execute it

Handling Deviations:

If user wants to skip a stage: Ask if they want to skip this and write freeform
If user seems frustrated: Acknowledge this is taking longer than expected. Suggest ways to move faster
Always give user agency to adjust the process

Context Management:

Throughout, if context is missing on something mentioned, proactively ask
Don't let gaps accumulate - address them as they come up

Artifact Management:

Use create_file for drafting full sections
Use str_replace for all edits
Provide artifact link after every change
Never use artifacts for brainstorming lists - that's just conversation

Quality over Speed:

Don't rush through stages
Each iteration should make meaningful improvements
The goal is a document that actually works for readers
