# Atomic-write pattern for shared state files (POSIX os.replace)

## 1. Why This Matters

In a multi-session environment where hooks, tools, and daemons write to shared state files (JSON configs, task markers, session identifiers), a naive `open() + write() + close()` pattern creates a **torn-write hazard**:

- **Concurrent sessions** racing to write the same file can truncate and overwrite each other's output. A reader may see an empty file or a mix of both writes, and even the best case is last-writer-wins with no atomicity guarantee.
- **Crash mid-write** (SIGKILL, disk-full, context compaction, kernel panic) leaves the file in a partial or zero-byte state
- **Silent corruption** of session isolation guarantees — hooks reading an empty or malformed file may silently fall back to legacy global state or fail-open, defeating ZAKON enforcement

**Impact:** ZAKON #27 (active-thread enforcement) and ZAKON #28 (max-depth gate) rely on per-session state files that must NEVER contain partial writes. A torn write to `/tmp/mc-active-task-$PID` causes the hook to fall back to the global `/tmp/mc-active-task`, silently defeating session isolation.

## 2. The Pattern — POSIX Atomic Rename

### 2.1 Python Pattern

The correct pattern uses **tempfile + fsync + os.replace()** to guarantee atomicity:

```
import os
import tempfile

def write_active_task(task_id, claude_pid=None):
    """Write active task for this session (atomic POSIX rename pattern).

    Writes to a tempfile in the same directory as the target, then uses
    os.replace() for an atomic swap. A crash or SIGKILL during the write
    leaves the target either absent (first write) or containing the previous
    complete value — never a partial write.
    """
    task_file = get_session_task_file(claude_pid)  # resolves the per-session path (helper not shown)
    dir_ = os.path.dirname(task_file) or "."
    fd, tmp = tempfile.mkstemp(prefix=".active-task-", dir=dir_)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(str(task_id))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, task_file)
    except Exception:
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise

```

**Why this works:**

1. `tempfile.mkstemp()` creates a unique temp file in the SAME directory (same filesystem) as the target
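2. Write content to the temp file, flush Python's buffer, and call `fsync()` so the contents reach stable storage before the rename
3. `os.replace(tmp, target)` calls `rename(2)`, which POSIX specifies as atomic: at every instant the name `target` refers to either the old file or the new one, never to a missing or half-written file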
4. Readers see either the old complete file OR the new complete file — never a partial write
5. If the process crashes before `os.replace()`, the temp file is abandoned but the target is untouched (or absent if first write)
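Points 1-5 describe what concurrent readers observe. Surviving power loss is a separate question: POSIX does not promise that the rename itself reaches stable storage until the parent directory is fsynced. A minimal sketch of that extra step, assuming a POSIX system where a directory can be opened read-only and passed to `os.fsync()`; `atomic_write_durable` is a hypothetical name, not an existing helper in `session_id.py`:

```python
import os
import tempfile


def atomic_write_durable(path, data):
    """Atomically replace `path` with `data`, then fsync the parent directory.

    Hypothetical helper: same tempfile + fsync + os.replace() sequence as
    write_active_task(), plus a directory fsync so the rename survives power loss.
    """
    dir_ = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(prefix=".atomic-", dir=dir_)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # file contents reach stable storage
        os.replace(tmp, path)     # atomic swap via rename(2)
    except BaseException:
        # Also covers KeyboardInterrupt; SIGKILL still bypasses this handler.
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise
    # Persist the directory entry changed by the rename (POSIX only:
    # opening a directory with O_RDONLY is not supported on Windows).
    dir_fd = os.open(dir_, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The directory fsync costs one extra syscall per write; for `/tmp` state files that are expected to vanish on reboot anyway, it may not be worth it.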

### 2.2 Bash Pattern

For bash hooks writing to state files, use the **mktemp + mv** pattern:

```
# Atomic write in bash using mktemp + mv
TARGET="/tmp/some-state-file.json"
CONTENT='{"count":0,"ts":"2026-05-03T10:00:00Z"}'

# Create temp file in same directory as target (same filesystem requirement)
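TMP=$(mktemp "${TARGET}.XXXXXX") || exit 1
if printf '%s\n' "$CONTENT" > "$TMP"; then
    mv -f "$TMP" "$TARGET"  # rename(2): atomic on the same filesystem
else
    rm -f "$TMP"            # write failed: discard temp file, target untouched
    exit 1
fi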

```

**Why `mv` is atomic:** On POSIX, `mv` within the same filesystem calls `rename(2)`, which is atomic. Same guarantee as Python's `os.replace()`.

**Constraints:**

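- `mktemp` template **must use the same directory** as `$TARGET` (guarantees the same filesystem, which an atomic `mv` requires)
- Write the content to `$TMP`, NOT to `$TARGET`; prefer `printf '%s\n'` over `echo`, whose handling of backslashes and a leading `-n`/`-e` varies between shells
- `mv -f` replaces `$TARGET` without prompting; the replacement is atomic because it is a same-filesystem `rename(2)`
- No portable `fsync` in bash: `mv` makes the swap atomic for readers, but durability across power loss requires an explicit `fsync()` from Python (`os.fsync()`) or Node.js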

## 3. What It Replaces — The Anti-Pattern

### 3.1 Python Anti-Pattern

**DO NOT USE:**

```
# WRONG — non-atomic, torn-write hazard
def write_active_task_WRONG(task_id, task_file):
    with open(task_file, "w") as f:
        f.write(str(task_id))

```

**Why this is broken:**

- The `open("w")` call truncates the file immediately (size=0 bytes)
- The `write()` is buffered inside the Python process until `flush()` or `close()`, and even then the data sits in the OS page cache until an `fsync()`
- A SIGKILL or crash between truncate and flush leaves a zero-byte file
- A concurrent reader during the write window sees partial content or empty file
- No reader/writer can distinguish "empty because not written yet" from "empty because crashed mid-write"
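The truncation window is easy to observe. This small sketch pauses the anti-pattern between `open("w")` and `write()` and reads the file through a second handle, standing in for a concurrent hook (or for what a crash at that instant would leave on disk); `observe_truncation_window` is a hypothetical name used only for illustration:

```python
import os
import tempfile


def observe_truncation_window(path):
    """Return what a reader sees between open("w") and write() in the anti-pattern."""
    with open(path, "w") as f:      # O_TRUNC: file is 0 bytes from this point
        with open(path) as reader:  # stand-in for a concurrent reader
            seen = reader.read()
        f.write("NEW-TASK-22222")
    return seen


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "task.txt")
    with open(target, "w") as f:
        f.write("OLD-TASK-11111")
    print(repr(observe_truncation_window(target)))  # ''
```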

### 3.2 Bash Anti-Pattern

**DO NOT USE:**

```
# WRONG — torn-write hazard in bash
echo "$TASK_ID" > /tmp/mc-active-task-$$

```

The `>` operator truncates the file immediately, then writes. A crash between truncate and write completion leaves a zero-byte or partial file — identical hazard to the Python anti-pattern.

## 4. Same-Filesystem Requirement

The `dir=` kwarg in `tempfile.mkstemp(prefix=".active-task-", dir=dir_)` is **critical**:

- `os.replace()` (like `rename(2)`) only works, and is only atomic, when the source and target are on the **same filesystem**
- A cross-device `os.replace()` (e.g., `/tmp` → `/home` on different partitions) fails with `OSError` (`EXDEV`); `mv` instead falls back to copy-then-delete, which is NOT atomic
- By creating the temp file in the same directory as the target (`os.path.dirname(task_file)`), we guarantee the same device
- If `dirname` is empty (target in cwd), fall back to `"."`

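**Verification:** compare `df -h /tmp` with `df -h ~/.claude/hooks`. Different mount points mean a temp file created in the default temp directory could not be renamed onto a target under `~/.claude`; passing `dir=` with the target's parent directory avoids the problem in every case.

**For bash:** use the `mktemp "${TARGET}.XXXXXX"` template. Because the template starts with the full `$TARGET` path, the temp file is created in the target's directory; `mktemp` replaces only the trailing `XXXXXX` with random characters.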
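The same-filesystem precondition can also be checked from code by comparing device IDs. A minimal sketch, assuming POSIX `st_dev` semantics; `same_filesystem` is a hypothetical helper, and passing `dir=` with the target's parent remains the simpler way to keep the guarantee:

```python
import os


def same_filesystem(tmp_dir, target_path):
    """True if tmp_dir and the target's parent directory share a device ID.

    os.replace() across devices raises OSError (errno EXDEV) rather than
    copying, so this works as a cheap pre-flight check.
    """
    target_dir = os.path.dirname(os.path.abspath(target_path))
    return os.stat(tmp_dir).st_dev == os.stat(target_dir).st_dev


print(same_filesystem(".", "state.json"))  # True: both resolve to the cwd
```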

## 5. Crash Recovery Semantics

<table id="bkmrk-scenario-before-os.r"><thead><tr><th>Scenario</th><th>Before <code>os.replace()</code></th><th>After <code>os.replace()</code></th></tr></thead><tbody><tr><td>First write, no prior file</td><td>Target absent, temp exists</td><td>Target exists with new content</td></tr><tr><td>Overwrite existing file</td><td>Target has old content, temp exists</td><td>Target has new content</td></tr><tr><td>Crash during <code>write()</code></td><td>Target unchanged (or absent), temp partial/incomplete</td><td>N/A: <code>replace()</code> never called</td></tr><tr><td>Crash during <code>fsync()</code></td><td>Target unchanged, temp may have partial data on disk</td><td>N/A</td></tr><tr><td>Process crash after <code>os.replace()</code></td><td>N/A</td><td>Target has new complete content (atomic swap already done)</td></tr></tbody></table>

**Key guarantee:** The target file NEVER contains partial writes. A reader always sees either:

1. File absent (no write has completed yet), OR
2. File with the last successfully-completed write's full content
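On POSIX the guarantee holds even for a reader that opened the file before the swap: `rename(2)` repoints the directory entry, and an already-open handle keeps reading the old inode. A small sketch, POSIX-only (on Windows, `os.replace()` onto a file that is still open may raise `PermissionError`); `read_across_swap` is a hypothetical name:

```python
import os
import tempfile


def read_across_swap(dir_):
    """Return (content seen by a handle opened before os.replace(), content after)."""
    target = os.path.join(dir_, "task.txt")
    with open(target, "w") as f:
        f.write("OLD-TASK-11111")

    with open(target) as early_reader:  # opened before the swap
        fd, tmp = tempfile.mkstemp(dir=dir_)
        with os.fdopen(fd, "w") as f:
            f.write("NEW-TASK-22222")
        os.replace(tmp, target)         # repoints the directory entry
        old_view = early_reader.read()  # still bound to the old inode

    with open(target) as late_reader:   # resolves the name after the swap
        new_view = late_reader.read()
    return old_view, new_view


with tempfile.TemporaryDirectory() as d:
    print(read_across_swap(d))  # ('OLD-TASK-11111', 'NEW-TASK-22222')
```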

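The `except Exception:` handler in the pattern code removes the temp file (`os.unlink(tmp)`) before re-raising, so ordinary Python exceptions do not leave temp files behind. SIGKILL and power loss bypass the handler: they can leave an orphaned `.active-task-*` temp file next to the target, although the target itself stays intact.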
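Because SIGKILL and power loss bypass that handler, orphaned temp files can still build up over time. One way to bound them is a periodic sweep. A sketch under stated assumptions: the `.active-task-` prefix matches the pattern code, while `sweep_stale_temps` and its 300-second default threshold are hypothetical choices:

```python
import os
import tempfile
import time


def sweep_stale_temps(dir_, prefix=".active-task-", max_age_s=300.0, now=None):
    """Delete temp files starting with `prefix` in `dir_` older than `max_age_s`.

    The threshold must exceed the longest plausible write; otherwise the sweep
    could delete a temp file that a live writer is about to os.replace().
    Returns the sorted names that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for name in os.listdir(dir_):
        if not name.startswith(prefix):
            continue
        path = os.path.join(dir_, name)
        try:
            if now - os.stat(path).st_mtime > max_age_s:
                os.unlink(path)
                removed.append(name)
        except FileNotFoundError:
            pass  # the writer's own cleanup or another sweeper got there first
    return sorted(removed)


with tempfile.TemporaryDirectory() as d:
    stale = os.path.join(d, ".active-task-abc123")
    open(stale, "w").close()
    os.utime(stale, (0, 0))      # pretend it was last modified in 1970
    print(sweep_stale_temps(d))  # ['.active-task-abc123']
```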

## 6. Testing Pattern

Unit test crash recovery by forcing a failure after the temp file has been written and fsynced but before the swap, then checking both the target and the directory:

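```
import os
import tempfile
import unittest
from unittest.mock import patch


def write_active_task_atomic(task_id, task_file):
    """Test-local variant of write_active_task() that takes an explicit path.

    Same tempfile + fsync + os.replace() sequence as section 2.1; the real
    helper resolves its path via get_session_task_file() instead.
    """
    dir_ = os.path.dirname(task_file) or "."
    fd, tmp = tempfile.mkstemp(prefix=".active-task-", dir=dir_)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(str(task_id))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, task_file)
    except Exception:
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise


class TestAtomicWrite(unittest.TestCase):

    def test_crash_during_overwrite_preserves_old_content(self):
        """If the write fails after fsync but before os.replace(), old content survives."""
        with tempfile.TemporaryDirectory() as tmpdir:
            target = os.path.join(tmpdir, "test-task.txt")

            # Write initial content
            with open(target, "w") as f:
                f.write("OLD-TASK-11111")

            # Simulate a crash just before the swap (the AC5 scenario). Patching
            # builtins.open would not reach the writer, which uses os.fdopen().
            with patch("os.replace", side_effect=OSError("Simulated crash")):
                with self.assertRaises(OSError):
                    write_active_task_atomic("NEW-TASK-22222", target)

            # Old content must survive
            with open(target, "r") as f:
                content = f.read()
            self.assertEqual(content, "OLD-TASK-11111")

            # No temp files leaked
            leaked_temps = [f for f in os.listdir(tmpdir) if f.startswith(".active-task-")]
            self.assertEqual(len(leaked_temps), 0)


if __name__ == "__main__":
    unittest.main()
```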

**What this validates:**

- Exception during write → old content survives intact
- No temp files leaked to disk (cleanup path works)
- File state is never partial or corrupt
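The unit test above covers the crash path. A complementary check for the concurrency claim is a short stress test: one thread keeps swapping in complete payloads while another keeps reading, and every read must equal one of the complete values. A sketch assuming Linux/macOS rename semantics (on Windows, `os.replace()` onto a file another thread has open may raise `PermissionError`); `atomic_write` and `run_stress` are local, hypothetical names:

```python
import os
import tempfile
import threading


def atomic_write(path, data):
    # Local stand-in for the tempfile + os.replace() pattern. fsync is omitted:
    # this test is about visibility to readers, not power-loss durability.
    fd, tmp = tempfile.mkstemp(prefix=".active-task-", dir=os.path.dirname(path))
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def run_stress(dir_, rounds=500):
    """Return (set of contents a concurrent reader observed, set of valid payloads)."""
    path = os.path.join(dir_, "task.txt")
    payloads = ["A" * 4096, "B" * 4096]
    atomic_write(path, payloads[0])
    seen = set()
    done = threading.Event()

    def reader():
        while True:  # do-while: guarantees at least one read
            try:
                with open(path) as f:
                    seen.add(f.read())
            except OSError as exc:  # would flag a moment where the name vanished
                seen.add(f"<{type(exc).__name__}>")
            if done.is_set():
                break

    t = threading.Thread(target=reader)
    t.start()
    try:
        for i in range(rounds):
            atomic_write(path, payloads[i % 2])
    finally:
        done.set()
        t.join()
    return seen, set(payloads)


with tempfile.TemporaryDirectory() as d:
    seen, valid = run_stress(d)
    print(seen <= valid)  # True: no empty, partial, or missing-file reads
```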

## 7. When to Apply

Use this pattern for **any hook/lib writing JSON or state files where a torn write would corrupt state**:

- `/tmp/mc-active-task-$SESSION_ID` — ZAKON #28 depth gate relies on this
- `/tmp/active-thread-$SESSION_ID.txt` — ZAKON #27 active-thread enforcement shadow file
- `~/.claude/session-state.md` shadow files (if per-session scoping is added)
- Counter files (`/tmp/john-mc-turn-counter.json`, `/tmp/ceo-approved-token-uses-*.count`)
- Mehanik clearance markers (`/tmp/mehanik-cleared-<MC>` with session\_id field)
- Any file where a concurrent reader must NEVER see partial data

**Do NOT use for:**

- Log files (append-only, partial writes acceptable)
- Human-edited markdown files (git-tracked, editor handles temp files)
- SQLite databases (SQLite has its own journaling and transaction layer)
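The counter files listed above carry a second hazard that atomic replacement alone does not address: two sessions that both read `count=3` and both write `count=4` lose an increment, even though neither write is torn. If lost increments matter, one option is to serialize the read-modify-write with an advisory lock and keep the atomic replace for the write itself. A minimal sketch, assuming a POSIX system with `fcntl.flock`; the `.lock` sidecar naming and `increment_counter` are hypothetical, and the lock only helps if every writer takes it:

```python
import fcntl
import json
import os
import tempfile


def increment_counter(path, key="count"):
    """Increment a JSON counter under an exclusive lock; returns the new value.

    The lock lives on a sidecar file because os.replace() swaps the target's
    inode: a lock held on the old target would not cover the new one.
    """
    with open(path + ".lock", "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # released when lock_file closes
        try:
            with open(path) as f:
                state = json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            state = {}  # assumption: treat a missing or corrupt counter as zero
        state[key] = int(state.get(key, 0)) + 1

        fd, tmp = tempfile.mkstemp(prefix=".counter-", dir=os.path.dirname(path) or ".")
        try:
            with os.fdopen(fd, "w") as f:
                json.dump(state, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, path)
        except BaseException:
            os.unlink(tmp)
            raise
        return state[key]


with tempfile.TemporaryDirectory() as d:
    counter = os.path.join(d, "turn-counter.json")
    increment_counter(counter)
    print(increment_counter(counter))  # 2
```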

## 8. Sites Covered

This pattern has been applied to the following high-risk state file writes:

### 8.1 Python Sites (Phase 2A — MC #99076)

- `~/.claude/hooks/archive/lib-legacy/session_id.py:138-161` — `write_active_task()` function (S8 surface: `/tmp/mc-active-task-$SESSION_ID`)

### 8.2 Bash Hook Sites (Phase 2B-2 — MC #99080)

8 atomic-write patches applied across 4 hooks covering surfaces S3, S8, S9, S10:

<table id="bkmrk-bash-sites-table"><thead><tr><th>File</th><th>Line</th><th>Pattern</th><th>Surface</th><th>Description</th></tr></thead><tbody><tr><td><code>mc-turn-reset.sh</code></td><td>12</td><td>Python <code>tempfile.mkstemp + os.replace</code></td><td>S8</td><td>Reset MC turn counter</td></tr><tr><td><code>mc-turn-reset.sh</code></td><td>20</td><td>Bash <code>mktemp + mv</code></td><td>S3</td><td>Reset CEO_APPROVED token counter</td></tr><tr><td><code>mc-turn-reset.sh</code></td><td>23</td><td>Bash <code>mktemp + mv</code></td><td>S9</td><td>Reset dispatch turn counter</td></tr><tr><td><code>ceo-intent-classifier.sh</code></td><td>38</td><td>Python <code>tempfile.mkstemp + os.replace</code></td><td>S10</td><td>Write CEO intent classification</td></tr><tr><td><code>one-ceo-turn-dispatch-cap.sh</code></td><td>33</td><td>Python <code>tempfile.mkstemp + os.replace</code></td><td>S9</td><td>Increment dispatch counter</td></tr><tr><td><code>one-ceo-turn-dispatch-cap.sh</code></td><td>50</td><td>Python <code>tempfile.mkstemp + os.replace</code></td><td>S9</td><td>Rollback dispatch counter on failure</td></tr><tr><td><code>one-ceo-turn-mc-cap.sh</code></td><td>40</td><td>Python <code>tempfile.mkstemp + os.replace</code></td><td>S8</td><td>Increment MC add counter</td></tr><tr><td><code>one-ceo-turn-mc-cap.sh</code></td><td>59</td><td>Python <code>tempfile.mkstemp + os.replace</code></td><td>S8</td><td>Rollback MC counter on failure</td></tr></tbody></table>

**Validation:** All 8 sites passed Proveo crash-safety testing (AC5: runtime exception AFTER write+fsync but BEFORE os.replace/mv — old content preserved, no temp file leak). See `/tmp/proveo-99080-2026-05-03.json`.

### 8.3 Shadow-File Pattern for Human-Editable Shared State (Phase 2D — MC #99084)

For **human-readable source files** that must remain unmodified by automation (e.g., `~/.claude/session-state.md`) but where enforcement hooks need **per-session isolation**, Phase 2D introduced the **shadow-file pattern**:

#### When to Use Shadow Files

- The source file is **human-editable markdown or config** that the CEO directly modifies
- Enforcement hooks need to read session-specific values **without blocking concurrent sessions**
- Direct atomic write to the human-readable source would defeat its purpose (CEO must see/edit the canonical value)
- Session isolation requires **structural sharding** (separate files per session), not locking

#### The Shadow-File Pattern

Write a **per-session machine-readable shadow file** at `/tmp/<key>-${SESSION_ID}.txt` (atomically via mktemp+mv) at the same point the human-readable source is updated. Enforcement hooks read **shadow-first with fallback** to the human-readable source.

```
# Shadow write (in user-message-logger.sh at UserPromptSubmit)
# SESSION_ID resolution: stdin JSON → env CLAUDE_SESSION_ID → pid-$$ → REJECT (never "default")
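_SHADOW_SESSION_ID="$SESSION_ID"   # from the hook's stdin JSON (parsed earlier)
if [[ -z "$_SHADOW_SESSION_ID" ]]; then
    _SHADOW_SESSION_ID="${CLAUDE_SESSION_ID:-}"
fi
if [[ -z "$_SHADOW_SESSION_ID" ]]; then
    # Caveat: $$ is this hook process's own PID. Other hook processes have
    # different PIDs, so they will not find this shadow and use the fallback read.
    _SHADOW_SESSION_ID="pid-$$"
fi

_SHADOW_TARGET="/tmp/active-thread-${_SHADOW_SESSION_ID}.txt"
_SESSION_STATE_FILE="$HOME/.claude/session-state.md"

# Extract ACTIVE_THREAD IDs from session-state.md. The path is passed as argv[1]
# instead of being interpolated into the Python source, so quotes in it are harmless.
_ACTIVE_THREAD_VALUE=$(python3 -c "
import re, sys
with open(sys.argv[1], 'r') as f:
    content = f.read()
match = re.search(r'## ACTIVE_THREAD:.*?(?=\n---|\n## [A-Z]|\Z)', content, re.DOTALL)
if not match:
    sys.exit(1)
block = match.group(0)
ids = re.findall(r'#(\d{4,6})', block)
print('\n'.join(sorted(set(ids))))
" "$_SESSION_STATE_FILE" 2>/dev/null)

# Note: when no IDs are found, an existing shadow is left in place and may be stale.
if [[ -n "$_ACTIVE_THREAD_VALUE" ]]; then
    # Atomic write: mktemp + mv; discard the temp file if the write fails
    if _SHADOW_TMP=$(mktemp "${_SHADOW_TARGET}.XXXXXX"); then
        if printf '%s\n' "$_ACTIVE_THREAD_VALUE" > "$_SHADOW_TMP"; then
            mv -f "$_SHADOW_TMP" "$_SHADOW_TARGET"
        else
            rm -f "$_SHADOW_TMP"
        fi
    fi
fi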

```

```
# Shadow-first read (in active-thread-lock.sh)
# SESSION_ID and SESSION_STATE are resolved earlier in the hook
_SHADOW_PATH="/tmp/active-thread-${SESSION_ID}.txt"
APPROVED_IDS=""

if [[ -f "$_SHADOW_PATH" ]]; then
    # Shadow present: read per-session ACTIVE_THREAD. Never partial (atomic write),
    # but it reflects session-state.md as of the last UserPromptSubmit shadow write.
    APPROVED_IDS=$(cat "$_SHADOW_PATH" 2>/dev/null || echo "")
else
    # Fallback: read session-state.md (global, backward-compatible)
    if [[ ! -f "$SESSION_STATE" ]]; then
        echo "[active-thread-lock] session-state.md not found and no shadow file — fail-open." >&2
        exit 0
    fi

    APPROVED_IDS=$(python3 -c "
import re, sys
with open(sys.argv[1], 'r') as f:
    content = f.read()
match = re.search(r'## ACTIVE_THREAD:.*?(?=\n---|\n## [A-Z]|\Z)', content, re.DOTALL)
if match:
    block = match.group(0)
    ids = re.findall(r'#(\d{4,6})', block)
    print('\n'.join(sorted(set(ids))))
" "$SESSION_STATE" 2>/dev/null)
fi

```
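Both hooks embed the same extraction logic as an inline `python3 -c` script. Pulled out into a standalone function (`extract_active_thread_ids` is a hypothetical name; the two regular expressions are copied from the hooks), its behavior is easier to see and to unit test. As in the hooks, IDs are de-duplicated and sorted as strings:

```python
import re

# Same patterns as the inline python3 -c snippets in both hooks.
BLOCK_RE = re.compile(r"## ACTIVE_THREAD:.*?(?=\n---|\n## [A-Z]|\Z)", re.DOTALL)
ID_RE = re.compile(r"#(\d{4,6})")


def extract_active_thread_ids(content):
    """Return sorted, de-duplicated IDs from the ACTIVE_THREAD block, or None if absent."""
    match = BLOCK_RE.search(content)
    if not match:
        return None
    return sorted(set(ID_RE.findall(match.group(0))))


sample = (
    "## ACTIVE_THREAD: #99084 shadow files\n"
    "- follows #99080, see #99084\n"
    "---\n"
    "## BACKLOG\n"
    "- #12345 is outside the block\n"
)
print(extract_active_thread_ids(sample))  # ['99080', '99084']
```

The block ends at the first `---` separator or the next `## ` heading that starts with a capital letter, which is why `#12345` under `## BACKLOG` is not picked up.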

#### Properties

- **Structural isolation:** Sessions read from sharded storage (`/tmp/active-thread-${SESSION_ID}.txt`), no lock contention
- **CEO-facing source unchanged:** `~/.claude/session-state.md` remains canonical human-editable markdown
- **SESSION\_ID resolution chain:** stdin JSON → env `CLAUDE_SESSION_ID` → pid-$$ → REJECT (NEVER literal "default")
- **Fail-open fallback:** If shadow absent, enforcement reads `session-state.md` (backward-compatible with pre-Phase-2D behavior)
- **Atomic shadow write:** per-session paths keep concurrent sessions from writing the same shadow file, and mktemp+mv ensures a reader never sees a partially written one

#### Shadow-File Sites

- `~/.claude/hooks/user-message-logger.sh` lines 49-84 — Shadow write for `/tmp/active-thread-${SESSION_ID}.txt` (ACTIVE\_THREAD extraction from session-state.md)
- `~/.claude/hooks/active-thread-lock.sh` lines 23-46 (SESSION\_ID resolution) + lines 84-114 (shadow-first read with session-state.md fallback)

**Validation:** Proveo PASS (6/6 ACs) — concurrent sessions with distinct `session_id` values read their own shadow files with no cross-session leak. Sessions without shadow files fall back to `session-state.md` with identical enforcement behavior. No "default" terminal value. See `/tmp/proveo-99084-2026-05-03.json`.

## 9. Reference

- **MC #99076** — Phase 2A atomic-write patch on `session_id.py` (Python pattern)
- **MC #99080** — Phase 2B-2 atomic-write patches on 4 bash hooks (8 line-level sites)
- **MC #99084** — Phase 2D shadow-file pattern for human-editable shared state (session-state.md ACTIVE\_THREAD field)
- **MC #99078** — Phase 2B-1 bash atomicity audit (identified 8 UNSAFE sites)
- **MC #99069** — Session Isolation Audit (parent task, genesis of the finding)
- **Spec:** `~/system/specs/session-isolation-audit-2026-05-03.md` §3 W1 (Weakness 1) + Appendix A
- **Spec:** `~/system/specs/bash-atomicity-audit-2026-05-03.md` — Phase 2B-1 full inventory + fix templates
- **Source:** `~/.claude/hooks/archive/lib-legacy/session_id.py` lines 138-161 (Python pattern reference)
- **Source:** `~/.claude/hooks/mc-turn-reset.sh`, `ceo-intent-classifier.sh`, `one-ceo-turn-dispatch-cap.sh`, `one-ceo-turn-mc-cap.sh` (bash pattern implementations)
- **Source:** `~/.claude/hooks/user-message-logger.sh` (shadow write implementation), `~/.claude/hooks/active-thread-lock.sh` (shadow-first read)
- **Tests:** `~/.claude/hooks/archive/lib-legacy/test_session_id_atomic.py` (5 unit tests covering crash-recovery)
- **Proveo Reports:**
    - `/tmp/postflight-99076/proveo-report.md` (Phase 2A Python validation)
    - `/tmp/postflight-99080/proveo-report.md` (Phase 2B-2 bash validation)
    - `/tmp/postflight-99084/proveo-report.md` (Phase 2D shadow-file validation)

## 10. Further Reading

- **Martin Kleppmann panelist review** (`/tmp/forged-99069-martin-kleppmann.md` §2 Weakness 1): "write\_active\_task() is not atomic. Lines 138-142 use a bare open(task\_file, 'w') write with no mktemp + os.replace() pattern. If the hook is interrupted mid-write (SIGKILL, context compaction crash, disk-full), the file is left in a partial or zero-byte state."
- **rename(2) man page (Linux man-pages):** "If newpath already exists, it will be atomically replaced, so that there is no point at which another process attempting to access newpath will find it missing."
- **Best-in-class reference:** `one-ceo-turn-mc-cap.sh:108-113` (already used `mktemp + mv` for counter increment before Phase 2B audit — correct pattern)

---

*Generated by Skillforge for MC #99076 — Phase 2A Session Isolation Fix*  
*Updated: 2026-05-03 (MC #99080 — Phase 2B-2 bash hook atomicity expansion)*  
*Updated: 2026-05-03 (MC #99084 — Phase 2D shadow-file pattern for human-editable shared state)*  
*Last verified: 2026-05-03 — [Proveo Phase 2D report (PASS 6/6)](/tmp/postflight-99084/proveo-report.md)*