Atomic-write pattern for shared state files (POSIX os.replace)

1. Why This Matters

In a multi-session environment where hooks, tools, and daemons write to shared state files (JSON configs, task markers, session identifiers), a naive open() + write() + close() pattern creates a torn-write hazard:

Concurrent sessions racing to write the same file can corrupt each other's writes (last-writer-wins with no atomicity guarantee)
Crash mid-write (SIGKILL, disk-full, context compaction, kernel panic) leaves the file in a partial or zero-byte state
Silent corruption of session isolation guarantees — hooks reading an empty or malformed file may silently fall back to legacy global state or fail-open, defeating ZAKON enforcement

Impact: ZAKON #27 (active-thread enforcement) and ZAKON #28 (max-depth gate) rely on per-session state files that must NEVER contain partial writes. A torn write to /tmp/mc-active-task-$PID causes the hook to fall back to the global /tmp/mc-active-task, silently defeating session isolation.

2. The Pattern — POSIX Atomic Rename

2.1 Python Pattern

The correct pattern uses tempfile + fsync + os.replace() to guarantee atomicity:

import os
import tempfile

def write_active_task(task_id, claude_pid=None):
    """Write active task for this session (atomic POSIX rename pattern).

    Writes to a tempfile in the same directory as the target, then uses
    os.replace() for an atomic swap. A crash or SIGKILL during the write
    leaves the target either absent (first write) or containing the previous
    complete value — never a partial write.
    """
    task_file = get_session_task_file(claude_pid)
    dir_ = os.path.dirname(task_file) or "."
    fd, tmp = tempfile.mkstemp(prefix=".active-task-", dir=dir_)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(str(task_id))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, task_file)
    except Exception:
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise

Why this works:

tempfile.mkstemp() creates a unique temp file in the SAME directory (same filesystem) as the target
Write content to the temp file, flush buffers, call fsync() to ensure data is on disk
os.replace(tmp, target) calls rename(2), which POSIX requires to be atomic: at every instant the target name refers to either the old file or the new file
Readers see either the old complete file OR the new complete file — never a partial write
If the process crashes before os.replace(), the temp file is abandoned but the target is untouched (or absent if first write)
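The steps above generalize to JSON state files. The following is a minimal sketch, not the project's implementation: the name atomic_write_json and the fsync_dir flag are hypothetical. It adds one step the list does not cover, an fsync of the parent directory after the rename, which on many filesystems is what makes the rename itself survive power loss.

```python
import json
import os
import tempfile


def atomic_write_json(path, obj, fsync_dir=True):
    """Write obj as JSON to path via temp file + os.replace().

    Hypothetical helper for illustration; not taken from the codebase.
    """
    dir_ = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(prefix=".tmp-", suffix=".json", dir=dir_)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(obj, f)
            f.flush()                 # userspace buffer -> kernel
            os.fsync(f.fileno())      # kernel page cache -> storage
        os.replace(tmp, path)         # atomic swap on the same filesystem
    except BaseException:
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise
    if fsync_dir:
        # Persist the directory entry so the rename itself is durable.
        dfd = os.open(dir_, os.O_RDONLY)
        try:
            os.fsync(dfd)
        except OSError:
            pass  # some filesystems do not support fsync on directories
        finally:
            os.close(dfd)
```

One side effect to be aware of: mkstemp() creates the temp file with mode 0600, and the rename carries that mode over to the target. If other users must read the file, call os.chmod() on the temp file before os.replace().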

2.2 Bash Pattern

For bash hooks writing to state files, use the mktemp + mv pattern:

# Atomic write in bash using mktemp + mv
TARGET="/tmp/some-state-file.json"
CONTENT='{"count":0,"ts":"2026-05-03T10:00:00Z"}'

# Create temp file in same directory as target (same filesystem requirement)
TMP=$(mktemp "${TARGET}.XXXXXX") || exit 1
# printf avoids echo's option and backslash handling; drop the temp file if the write fails
printf '%s\n' "$CONTENT" > "$TMP" || { rm -f "$TMP"; exit 1; }
mv -f "$TMP" "$TARGET"  # rename(2): atomic on the same filesystem

Why mv is atomic: On POSIX, mv within the same filesystem calls rename(2), which is atomic. Same guarantee as Python's os.replace().

Constraints:

mktemp template must use same directory as $TARGET (guarantees same filesystem, required for atomic mv)
Use printf or echo to write to $TMP, NOT to $TARGET
mv -f atomically replaces $TARGET (POSIX guarantees this on same filesystem)
No portable fsync in bash: durability across power loss requires a language with an explicit fsync call, such as Python (os.fsync()) or Node.js (fs.fsyncSync())

3. What It Replaces — The Anti-Pattern

3.1 Python Anti-Pattern

DO NOT USE:

# WRONG — non-atomic, torn-write hazard
def write_active_task_WRONG(task_id, task_file):
    with open(task_file, "w") as f:
        f.write(str(task_id))

Why this is broken:

The open("w") call truncates the file immediately (size=0 bytes)
The write() is buffered in userspace and does not reach the kernel until close() or an explicit flush(); it reaches the disk only after fsync() or kernel writeback
A SIGKILL or crash between truncate and flush leaves a zero-byte file
A concurrent reader during the write window sees partial content or empty file
No reader/writer can distinguish "empty because not written yet" from "empty because crashed mid-write"
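The truncation window described above can be observed directly. This is a small demonstration, not part of the hooks; the file name and contents are arbitrary.

```python
import os
import tempfile

# Observe the file size at each step of the naive open("w") pattern.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "task.txt")
    with open(path, "w") as f:
        f.write("OLD-TASK-11111")

    f = open(path, "w")                        # truncates immediately
    size_after_open = os.path.getsize(path)
    f.write("NEW-TASK-22222")                  # still in the userspace buffer
    size_before_flush = os.path.getsize(path)
    f.flush()                                  # now handed to the kernel
    size_after_flush = os.path.getsize(path)
    f.close()

print(size_after_open, size_before_flush, size_after_flush)  # prints: 0 0 14
```

A reader or a SIGKILL arriving at either of the first two measurements finds a zero-byte file, even though the old value was complete a moment earlier.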

3.2 Bash Anti-Pattern

DO NOT USE:

# WRONG — torn-write hazard in bash
echo "$TASK_ID" > /tmp/mc-active-task-$$

The > operator truncates the file immediately, then writes. A crash between truncate and write completion leaves a zero-byte or partial file — identical hazard to the Python anti-pattern.

4. Same-Filesystem Requirement

The dir= kwarg in tempfile.mkstemp(prefix=".active-task-", dir=dir_) is critical:

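os.replace() is atomic ONLY when the source and target are on the same filesystem
A cross-device os.replace() (e.g., /tmp → /home on different partitions) does not degrade gracefully: it raises OSError (errno EXDEV). Tools that hide this, such as mv and shutil.move(), fall back to copy-then-delete, which is NOT atomic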
By creating the temp file in the same directory as the target (os.path.dirname(task_file)), we guarantee same-device
If dirname is empty (target in cwd), fallback to "."

Verification: df -h /tmp vs df -h ~/.claude/hooks — if different mount points, you MUST use dir= kwarg with target's parent directory.
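The df check can also be done programmatically. The sketch below is illustrative; same_filesystem and replace_same_fs are hypothetical names. Note that matching st_dev values are a necessary condition only: on Linux, rename(2) can still fail with EXDEV across two mount points of the same filesystem, so handling the error remains the reliable check.

```python
import errno
import os


def same_filesystem(path_a, path_b):
    """True if both existing paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def replace_same_fs(tmp, target):
    """os.replace() with a clearer error for the cross-filesystem case."""
    try:
        os.replace(tmp, target)
    except OSError as exc:
        if exc.errno == errno.EXDEV:
            raise RuntimeError(
                f"{tmp} and {target} are on different filesystems; "
                "create the temp file in the target's directory"
            ) from exc
        raise
```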

For bash: Use mktemp "${TARGET}.XXXXXX" template — the suffix pattern ensures temp file is created in the same directory as $TARGET.

5. Crash Recovery Semantics

Scenario	Before `os.replace()`	After `os.replace()`
First write, no prior file	Target absent, temp exists	Target exists with new content
Overwrite existing file	Target has old content, temp exists	Target has new content
Crash during `write()`	Target unchanged (or absent), temp partial/incomplete	N/A — `replace()` never called
Crash during `fsync()`	Target unchanged, temp may have partial data on disk	N/A
Crash after `os.replace()`	N/A	Target has new complete content (atomic swap already done)

Key guarantee: The target file NEVER contains partial writes. A reader always sees either:

File absent (no write has completed yet), OR
File with the last successfully-completed write's full content
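On the reading side, the two states map onto two code paths. A minimal sketch, assuming a reader named read_active_task (hypothetical) that treats an empty file as an error instead of a valid state, since the atomic writer should never produce one:

```python
def read_active_task(path):
    """Return the stored task ID, or None if no write has completed yet."""
    try:
        with open(path) as f:
            content = f.read().strip()
    except FileNotFoundError:
        return None  # file absent: no write has completed
    if not content:
        # Should be unreachable with atomic writers; flag it loudly.
        raise ValueError(f"{path} is empty; a non-atomic writer may be in use")
    return content
```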

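The exception handler (except Exception: os.unlink(tmp)) removes the temp file when the write fails with a Python exception, preventing temp-file accumulation on ordinary errors. It cannot run on SIGKILL or power loss, so a hard crash leaves an orphaned .active-task-* temp file next to the target.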
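Orphaned temp files from hard crashes can be swept periodically. A minimal sketch, assuming the .active-task- prefix from Section 2.1; the function name sweep_stale_temps and the one-hour age threshold are hypothetical choices, not values from the codebase.

```python
import os
import time


def sweep_stale_temps(directory, prefix=".active-task-", max_age_s=3600.0):
    """Delete files starting with prefix that are older than max_age_s.

    Returns the list of removed file names.
    """
    removed = []
    now = time.time()
    for name in os.listdir(directory):
        if not name.startswith(prefix):
            continue
        path = os.path.join(directory, name)
        try:
            if now - os.stat(path).st_mtime > max_age_s:
                os.unlink(path)
                removed.append(name)
        except OSError:
            pass  # already renamed or removed by its owner; skip it
    return removed
```

The age threshold matters: a temp file that is only a few seconds old may belong to a write still in progress, and deleting it would make that writer's os.replace() fail.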

6. Testing Pattern

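Unit test crash-recovery by patching a step of the write (here os.fsync) to raise an exception:

import os
import tempfile
import unittest
from unittest.mock import patch


def write_active_task_atomic(task_id, task_file):
    """Section 2.1 pattern with an explicit target path (test-local variant)."""
    dir_ = os.path.dirname(task_file) or "."
    fd, tmp = tempfile.mkstemp(prefix=".active-task-", dir=dir_)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(str(task_id))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, task_file)
    except Exception:
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise


class TestAtomicWrite(unittest.TestCase):

    def test_crash_during_overwrite_preserves_old_content(self):
        """If write crashes after target exists, old content is preserved."""
        with tempfile.TemporaryDirectory() as tmpdir:
            target = os.path.join(tmpdir, "test-task.txt")

            # Write initial content
            with open(target, "w") as f:
                f.write("OLD-TASK-11111")

            # Simulate a failure before os.replace(). Patch os.fsync, which the
            # writer actually calls; patching builtins.open would not
            # intercept os.fdopen().
            with patch("os.fsync", side_effect=OSError("Simulated crash")):
                with self.assertRaises(OSError):
                    write_active_task_atomic("NEW-TASK-22222", target)

            # Old content must survive
            with open(target, "r") as f:
                content = f.read()
            self.assertEqual(content, "OLD-TASK-11111")

            # No temp files leaked
            leaked_temps = [f for f in os.listdir(tmpdir) if f.startswith(".active-task-")]
            self.assertEqual(len(leaked_temps), 0)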

What this validates:

Exception during write → old content survives intact
No temp files leaked to disk (cleanup path works)
File state is never partial or corrupt
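The exception test covers the failure path only. A complementary check is a race test in which a reader polls the target while a writer keeps replacing it. The sketch below is an assumption-laden illustration: atomic_write_text and run_race are hypothetical names, and the payload size and round count are arbitrary.

```python
import os
import tempfile
import threading


def atomic_write_text(path, text):
    """Temp file + fsync + os.replace(), as in Section 2.1."""
    dir_ = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(prefix=".active-task-", dir=dir_)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    except Exception:
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise


def run_race(rounds=50):
    """Return every observed content that was neither complete payload."""
    valid = {"A" * 4096, "B" * 4096}
    bad = []
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "task.txt")
        atomic_write_text(target, "A" * 4096)
        stop = threading.Event()

        def reader():
            # Poll the target; every read must be one full payload.
            while not stop.is_set():
                with open(target) as f:
                    content = f.read()
                if content not in valid:
                    bad.append(content)

        t = threading.Thread(target=reader)
        t.start()
        try:
            for i in range(rounds):
                atomic_write_text(target, ("A" if i % 2 else "B") * 4096)
        finally:
            stop.set()
            t.join()
    return bad
```

An empty result from run_race() is evidence, not proof: a race test can miss a narrow window, so it complements the deterministic exception test instead of replacing it.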

7. When to Apply

Use this pattern for any hook/lib writing JSON or state files where torn writes = corruption:

/tmp/mc-active-task-$SESSION_ID — ZAKON #28 depth gate relies on this
/tmp/active-thread-$SESSION_ID.txt — ZAKON #27 active-thread enforcement shadow file
~/.claude/session-state.md shadow files (if per-session scoping is added)
Counter files (/tmp/john-mc-turn-counter.json, /tmp/ceo-approved-token-uses-*.count)
Mehanik clearance markers (/tmp/mehanik-cleared-<MC> with session_id field)
Any file where a concurrent reader must NEVER see partial data

Do NOT use for:

Log files (append-only, partial writes acceptable)
Human-edited markdown files (git-tracked, editor handles temp files)
SQLite databases (has internal transaction layer)

8. Sites Covered

This pattern has been applied to the following high-risk state file writes:

8.1 Python Sites (Phase 2A — MC #99076)

~/.claude/hooks/archive/lib-legacy/session_id.py:138-161 — write_active_task() function (S8 surface: /tmp/mc-active-task-$SESSION_ID)

8.2 Bash Hook Sites (Phase 2B-2 — MC #99080)

8 atomic-write patches applied across 4 hooks covering surfaces S3, S8, S9, S10:

File	Line	Pattern	Surface	Description
`mc-turn-reset.sh`	12	Python `tempfile.mkstemp + os.replace`	S8	Reset MC turn counter
`mc-turn-reset.sh`	20	Bash `mktemp + mv`	S3	Reset CEO_APPROVED token counter
`mc-turn-reset.sh`	23	Bash `mktemp + mv`	S9	Reset dispatch turn counter
`ceo-intent-classifier.sh`	38	Python `tempfile.mkstemp + os.replace`	S10	Write CEO intent classification
`one-ceo-turn-dispatch-cap.sh`	33	Python `tempfile.mkstemp + os.replace`	S9	Increment dispatch counter
`one-ceo-turn-dispatch-cap.sh`	50	Python `tempfile.mkstemp + os.replace`	S9	Rollback dispatch counter on failure
`one-ceo-turn-mc-cap.sh`	40	Python `tempfile.mkstemp + os.replace`	S8	Increment MC add counter
`one-ceo-turn-mc-cap.sh`	59	Python `tempfile.mkstemp + os.replace`	S8	Rollback MC counter on failure

Validation: All 8 sites passed Proveo crash-safety testing (AC5: runtime exception AFTER write+fsync but BEFORE os.replace/mv — old content preserved, no temp file leak). See /tmp/proveo-99080-2026-05-03.json.

8.3 Shadow-File Pattern for Human-Editable Shared State (Phase 2D — MC #99084)

For human-readable source files that must remain unmodified by automation (e.g., ~/.claude/session-state.md) but where enforcement hooks need per-session isolation, Phase 2D introduced the shadow-file pattern:

When to Use Shadow Files

The source file is human-editable markdown or config that the CEO directly modifies
Enforcement hooks need to read session-specific values without blocking concurrent sessions
Direct atomic write to the human-readable source would defeat its purpose (CEO must see/edit the canonical value)
Session isolation requires structural sharding (separate files per session), not locking

The Shadow-File Pattern

Write a per-session machine-readable shadow file at /tmp/<key>-${SESSION_ID}.txt (atomically via mktemp+mv) at the same point the human-readable source is updated. Enforcement hooks read shadow-first with fallback to the human-readable source.

# Shadow write (in user-message-logger.sh at UserPromptSubmit)
# SESSION_ID resolution: stdin JSON → env CLAUDE_SESSION_ID → pid-$$ → REJECT (never "default")
_SHADOW_SESSION_ID="$SESSION_ID"
if [[ -z "$_SHADOW_SESSION_ID" ]]; then
    _SHADOW_SESSION_ID="${CLAUDE_SESSION_ID:-}"
fi
if [[ -z "$_SHADOW_SESSION_ID" ]]; then
    _SHADOW_SESSION_ID="pid-$$"
fi

_SHADOW_TARGET="/tmp/active-thread-${_SHADOW_SESSION_ID}.txt"
_SESSION_STATE_FILE="$HOME/.claude/session-state.md"

# Extract ACTIVE_THREAD IDs from session-state.md
_ACTIVE_THREAD_VALUE=$(python3 -c "
import re, sys
with open('$_SESSION_STATE_FILE', 'r') as f:
    content = f.read()
match = re.search(r'## ACTIVE_THREAD:.*?(?=\n---|\n## [A-Z]|\Z)', content, re.DOTALL)
if not match:
    sys.exit(1)
block = match.group(0)
ids = re.findall(r'#(\d{4,6})', block)
print('\n'.join(sorted(set(ids))))
" 2>/dev/null)

if [[ -n "$_ACTIVE_THREAD_VALUE" ]]; then
    # Atomic write: mktemp + mv
    _SHADOW_TMP=$(mktemp "${_SHADOW_TARGET}.XXXXXX")
    printf '%s\n' "$_ACTIVE_THREAD_VALUE" > "$_SHADOW_TMP"
    mv -f "$_SHADOW_TMP" "$_SHADOW_TARGET"
fi

# Shadow-first read (in active-thread-lock.sh)
_SHADOW_PATH="/tmp/active-thread-${SESSION_ID}.txt"
APPROVED_IDS=""

if [[ -f "$_SHADOW_PATH" ]]; then
    # Shadow file present: read per-session ACTIVE_THREAD (atomic, no stale-read risk)
    APPROVED_IDS=$(cat "$_SHADOW_PATH" 2>/dev/null || echo "")
else
    # Fallback: read session-state.md (global, backward-compatible)
    if [[ ! -f "$SESSION_STATE" ]]; then
        echo "[active-thread-lock] session-state.md not found and no shadow file — fail-open." >&2
        exit 0
    fi

    APPROVED_IDS=$(python3 -c "
import re, sys
with open('$SESSION_STATE', 'r') as f:
    content = f.read()
match = re.search(r'## ACTIVE_THREAD:.*?(?=\n---|\n## [A-Z]|\Z)', content, re.DOTALL)
if match:
    block = match.group(0)
    ids = re.findall(r'#(\d{4,6})', block)
    print('\n'.join(sorted(set(ids))))
" 2>/dev/null)
fi

Properties

Structural isolation: Sessions read from sharded storage (/tmp/active-thread-${SESSION_ID}.txt), no lock contention
CEO-facing source unchanged: ~/.claude/session-state.md remains canonical human-editable markdown
SESSION_ID resolution chain: stdin JSON → env CLAUDE_SESSION_ID → pid-$$ → REJECT (NEVER literal "default")
Fail-open fallback: If shadow absent, enforcement reads session-state.md (backward-compatible with pre-Phase-2D behavior)
Atomic shadow write: mktemp+mv ensures concurrent sessions cannot corrupt each other's shadow files
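The resolution chain and the shadow-first read can be expressed compactly in Python. This is a sketch for illustration only, not the hooks' implementation: the function names are hypothetical, the regular expressions are copied from the bash snippets above, and returning None when neither file exists is an assumption that leaves the fail-open decision to the caller.

```python
import os
import re

_BLOCK_RE = re.compile(r"## ACTIVE_THREAD:.*?(?=\n---|\n## [A-Z]|\Z)", re.DOTALL)
_ID_RE = re.compile(r"#(\d{4,6})")


def extract_active_thread_ids(markdown):
    """Return sorted unique task IDs from the ACTIVE_THREAD block, or []."""
    match = _BLOCK_RE.search(markdown)
    if not match:
        return []
    return sorted(set(_ID_RE.findall(match.group(0))))


def resolve_session_id(stdin_session_id=None, environ=None, pid=None):
    """stdin JSON value, then CLAUDE_SESSION_ID, then pid-<pid>; never 'default'."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    return stdin_session_id or environ.get("CLAUDE_SESSION_ID") or f"pid-{pid}"


def read_approved_ids(session_id, state_file, shadow_dir="/tmp"):
    """Shadow-first read with fallback to the human-readable source."""
    shadow = os.path.join(shadow_dir, f"active-thread-{session_id}.txt")
    try:
        with open(shadow) as f:
            return f.read().split()
    except FileNotFoundError:
        pass
    try:
        with open(state_file) as f:
            return extract_active_thread_ids(f.read())
    except FileNotFoundError:
        return None  # neither file exists; the caller decides what to do
```

Opening the shadow file and catching FileNotFoundError, instead of testing for existence first, avoids a gap between the check and the read in which the file could disappear.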

Shadow-File Sites

~/.claude/hooks/user-message-logger.sh lines 49-84 — Shadow write for /tmp/active-thread-${SESSION_ID}.txt (ACTIVE_THREAD extraction from session-state.md)
~/.claude/hooks/active-thread-lock.sh lines 23-46 (SESSION_ID resolution) + lines 84-114 (shadow-first read with session-state.md fallback)

Validation: Proveo PASS (6/6 ACs) — concurrent sessions with distinct session_id values read their own shadow files with no cross-session leak. Sessions without shadow files fall back to session-state.md with identical enforcement behavior. No "default" terminal value. See /tmp/proveo-99084-2026-05-03.json.

9. Reference

MC #99076 — Phase 2A atomic-write patch on session_id.py (Python pattern)
MC #99080 — Phase 2B-2 atomic-write patches on 4 bash hooks (8 line-level sites)
MC #99084 — Phase 2D shadow-file pattern for human-editable shared state (session-state.md ACTIVE_THREAD field)
MC #99078 — Phase 2B-1 bash atomicity audit (identified 8 UNSAFE sites)
MC #99069 — Session Isolation Audit (parent task, genesis of the finding)
Spec: ~/system/specs/session-isolation-audit-2026-05-03.md §3 W1 (Weakness 1) + Appendix A
Spec: ~/system/specs/bash-atomicity-audit-2026-05-03.md — Phase 2B-1 full inventory + fix templates
Source: ~/.claude/hooks/archive/lib-legacy/session_id.py lines 138-161 (Python pattern reference)
Source: ~/.claude/hooks/mc-turn-reset.sh, ceo-intent-classifier.sh, one-ceo-turn-dispatch-cap.sh, one-ceo-turn-mc-cap.sh (bash pattern implementations)
Source: ~/.claude/hooks/user-message-logger.sh (shadow write implementation), ~/.claude/hooks/active-thread-lock.sh (shadow-first read)
Tests: ~/.claude/hooks/archive/lib-legacy/test_session_id_atomic.py (5 unit tests covering crash-recovery)
Proveo Reports:
- /tmp/postflight-99076/proveo-report.md (Phase 2A Python validation)
- /tmp/postflight-99080/proveo-report.md (Phase 2B-2 bash validation)
- /tmp/postflight-99084/proveo-report.md (Phase 2D shadow-file validation)

10. Further Reading

Martin Kleppmann panelist review (/tmp/forged-99069-martin-kleppmann.md §2 Weakness 1): "write_active_task() is not atomic. Lines 138-142 use a bare open(task_file, 'w') write with no mktemp + os.replace() pattern. If the hook is interrupted mid-write (SIGKILL, context compaction crash, disk-full), the file is left in a partial or zero-byte state."
Linux rename(2) man page: "If newpath already exists, it will be atomically replaced, so that there is no point at which another process attempting to access newpath will find it missing."
Best-in-class reference: one-ceo-turn-mc-cap.sh:108-113 (already used mktemp + mv for counter increment before Phase 2B audit — correct pattern)

Generated by Skillforge for MC #99076 — Phase 2A Session Isolation Fix
Updated: 2026-05-03 (MC #99080 — Phase 2B-2 bash hook atomicity expansion)
Updated: 2026-05-03 (MC #99084 — Phase 2D shadow-file pattern for human-editable shared state)
Last verified: 2026-05-03 — Proveo Phase 2D report (PASS 6/6)

Revision #5
Created 2026-05-03 20:45:12 UTC by John
Updated 2026-06-07 20:01:06 UTC by John