LightRAG Backup (local + Azure offsite)

Owner: FlowForge (infra) Implemented: 2026-04-18 Schedule: Weekly Sunday 04:00 CEST Script: ~/system/tools/lightrag-backup.sh Plist: ~/Library/LaunchAgents/com.alai.lightrag-backup.plist Azure creds: ~/system/config/azure-lightrag-backup.env (mode 0600)

What is backed up

4 Docker volumes (+ checksums + README):

Volume	Content	Typical size
`lightrag-data`	LightRAG KV store + inputs	~300 MB
`lightrag-kg`	Knowledge graph files	small
`lightrag-cache`	LLM response cache	small
`lightrag-neo4j-data`	Neo4j graph entities + relations	~170 MB

Typical total: 500 MB – 1 GB compressed.

How it runs

docker compose stop lightrag neo4j — graceful shutdown (~30s downtime)
docker run alpine tar czf dumps each volume
docker compose start neo4j lightrag — resume
shasum -a 256 *.tar.gz > MANIFEST.sha256
Write README.md with restore procedure
Local rotation — keep last 4 snapshots in ~/system/backups/lightrag/
Azure offsite upload — Cool tier blob plockfrontstaging/lightrag-backup/<TS>/
Azure rotation — keep last 8 snapshots (longer offsite retention)

Downtime: ~60–90s every Sunday 04:00.

Why NOT `docker compose pause`

pause freezes LightRAG's async event loop. On unpause, uvicorn stays "running" but HTTP handler doesn't service new requests (container reports unhealthy). Requires full container restart to recover. The backup on 2026-04-18 hit this — backup itself was fine (volumes at rest during pause), but container needed restart afterwards. Switched to stop/start for future runs.

Azure storage details

Account: plockfrontstaging (swedencentral, Hot storage account)
Container: lightrag-backup
Resource group: plock-staging-rg
Tier per blob: Cool (cheaper — ~$0.01/GB/month for archived reads)
Retention: last 8 snapshots (~8 weeks)
Estimated cost: ~$0.05–0.10/month for ~4 GB retained

Restore procedure

cd ~/system/docker/lightrag
docker compose down

# Pick a snapshot (local or download from Azure first)
SNAPSHOT=~/system/backups/lightrag/20260418-085317
cd "$SNAPSHOT"
shasum -a 256 -c MANIFEST.sha256 || { echo "checksum mismatch, abort"; exit 1; }

for vol in lightrag-data lightrag-kg lightrag-cache lightrag-neo4j-data; do
  docker volume rm $vol || true
  docker volume create $vol
  docker run --rm -v $vol:/dst -v "$SNAPSHOT":/src alpine tar xzf /src/${vol}.tar.gz -C /dst
done

cd ~/system/docker/lightrag
docker compose up -d

# Verify
curl http://localhost:9621/health

Azure restore

source ~/system/config/azure-lightrag-backup.env
TS=20260418-085317
RESTORE_DIR=/tmp/lightrag-restore-$TS
mkdir -p "$RESTORE_DIR"
az storage blob download-batch \
  --account-name $AZURE_STORAGE_ACCOUNT \
  --account-key "$AZURE_STORAGE_KEY" \
  --source $AZURE_STORAGE_CONTAINER \
  --destination "$RESTORE_DIR" \
  --pattern "$TS/*"
# Then follow local restore procedure pointing at $RESTORE_DIR/$TS

Monitoring

Log: ~/system/logs/lightrag-backup.log
Latest snapshot size: du -sh ~/system/backups/lightrag/
Azure blob list: az storage blob list ...
Post-run LightRAG health: logged as last line of each run

Manual run

bash ~/system/tools/lightrag-backup.sh

Same 60–90s downtime applies. Log goes to same file.

LightRAG Backup (local + Azure offsite)

LightRAG Backup (local + Azure offsite)

What is backed up

How it runs

Why NOT docker compose pause

Azure storage details

Restore procedure

Azure restore

Monitoring

Manual run

Why NOT `docker compose pause`