Skip to main content

LightRAG Backup (local + Azure offsite)

LightRAG Backup (local + Azure offsite)

Owner: FlowForge (infra) Implemented: 2026-04-18 Schedule: Weekly Sunday 04:00 CEST Script: ~/system/tools/lightrag-backup.sh Plist: ~/Library/LaunchAgents/com.alai.lightrag-backup.plist Azure creds: ~/system/config/azure-lightrag-backup.env (mode 0600)

What is backed up

4 Docker volumes (+ checksums + README):

Volume Content Typical size
lightrag-data LightRAG KV store + inputs ~300 MB
lightrag-kg Knowledge graph files small
lightrag-cache LLM response cache small
lightrag-neo4j-data Neo4j graph entities + relations ~170 MB

Typical total: 500 MB – 1 GB compressed.

How it runs

  1. docker compose stop lightrag neo4j — graceful shutdown (~30s downtime)
  2. docker run alpine tar czf dumps each volume
  3. docker compose start neo4j lightrag — resume
  4. shasum -a 256 *.tar.gz > MANIFEST.sha256
  5. Write README.md with restore procedure
  6. Local rotation — keep last 4 snapshots in ~/system/backups/lightrag/
  7. Azure offsite upload — Cool tier blob plockfrontstaging/lightrag-backup/<TS>/
  8. Azure rotation — keep last 8 snapshots (longer offsite retention)

Downtime: ~60–90s every Sunday 04:00.

Why NOT docker compose pause

pause freezes LightRAG's async event loop. On unpause, uvicorn stays "running" but HTTP handler doesn't service new requests (container reports unhealthy). Requires full container restart to recover. The backup on 2026-04-18 hit this — backup itself was fine (volumes at rest during pause), but container needed restart afterwards. Switched to stop/start for future runs.

Azure storage details

  • Account: plockfrontstaging (swedencentral, Hot storage account)
  • Container: lightrag-backup
  • Resource group: plock-staging-rg
  • Tier per blob: Cool (cheaper — ~$0.01/GB/month for archived reads)
  • Retention: last 8 snapshots (~8 weeks)
  • Estimated cost: ~$0.05–0.10/month for ~4 GB retained

Restore procedure

cd ~/system/docker/lightrag
docker compose down

# Pick a snapshot (local or download from Azure first)
SNAPSHOT=~/system/backups/lightrag/20260418-085317
cd "$SNAPSHOT"
shasum -a 256 -c MANIFEST.sha256 || { echo "checksum mismatch, abort"; exit 1; }

for vol in lightrag-data lightrag-kg lightrag-cache lightrag-neo4j-data; do
  docker volume rm $vol || true
  docker volume create $vol
  docker run --rm -v $vol:/dst -v "$SNAPSHOT":/src alpine tar xzf /src/${vol}.tar.gz -C /dst
done

cd ~/system/docker/lightrag
docker compose up -d

# Verify
curl http://localhost:9621/health

Azure restore

source ~/system/config/azure-lightrag-backup.env
TS=20260418-085317
RESTORE_DIR=/tmp/lightrag-restore-$TS
mkdir -p "$RESTORE_DIR"
az storage blob download-batch \
  --account-name $AZURE_STORAGE_ACCOUNT \
  --account-key "$AZURE_STORAGE_KEY" \
  --source $AZURE_STORAGE_CONTAINER \
  --destination "$RESTORE_DIR" \
  --pattern "$TS/*"
# Then follow local restore procedure pointing at $RESTORE_DIR/$TS

Monitoring

  • Log: ~/system/logs/lightrag-backup.log
  • Latest snapshot size: du -sh ~/system/backups/lightrag/
  • Azure blob list: az storage blob list ...
  • Post-run LightRAG health: logged as last line of each run

Manual run

bash ~/system/tools/lightrag-backup.sh

Same 60–90s downtime applies. Log goes to same file.