LightRAG Backup (Azure-native + local safety net)

LightRAG Backup (Azure-native + local safety net)

Domain note (2026-05-17): References to lightrag.basicconsulting.no in this doc are the legacy hostname. Current live endpoint: lightrag.alai.no.

Owner: FlowForge (infra) Implemented: 2026-04-18 (updated for Azure migration 2026-04-18) Source of Truth: Azure VM vm-alai-lightrag (20.240.61.67) Schedule: Weekly Sunday 04:00 CEST Script: ~/system/tools/lightrag-backup.sh (SSH-based) Plist: ~/Library/LaunchAgents/com.alai.lightrag-backup.plist Azure creds: ~/system/config/azure-lightrag-backup.env (mode 0600)

What is backed up

4 Docker volumes (+ checksums + README):

Volume Content Typical size
lightrag-data LightRAG KV store + inputs ~300 MB
lightrag-kg Knowledge graph files small
lightrag-cache LLM response cache small
lightrag-neo4j-data Neo4j graph entities + relations ~170 MB

Typical total: 500 MB – 1 GB compressed.

How it runs (POST-MIGRATION)

Source: Azure VM vm-alai-lightrag (20.240.61.67)

  1. SSH to Azure VM: ssh -i ~/.ssh/azure_alai alai-admin@20.240.61.67
  2. docker compose stop lightrag neo4j — graceful shutdown (~30s downtime)
  3. docker run alpine tar czf dumps each volume on VM
  4. docker compose start neo4j lightrag — resume
  5. shasum -a 256 *.tar.gz > MANIFEST.sha256 on VM
  6. Write README.md with restore procedure on VM
  7. SCP from VM to Mac Studio — download snapshot to ~/system/backups/lightrag/ (safety net)
  8. Azure offsite upload — Cool tier blob plockfrontstaging/lightrag-backup/<TS>/
  9. Azure rotation — keep last 8 snapshots (longer offsite retention)
  10. Local rotation — keep last 4 snapshots in ~/system/backups/lightrag/ (7-day safety, then deletable)

Downtime: ~60–90s every Sunday 04:00 (cloud LightRAG unavailable during backup).

Key change: Local Docker volumes are NO LONGER the source of truth. Azure VM volumes are primary. Local backups are now safety net only.

Why NOT docker compose pause

pause freezes LightRAG's async event loop. On unpause, uvicorn stays "running" but HTTP handler doesn't service new requests (container reports unhealthy). Requires full container restart to recover. The backup on 2026-04-18 hit this — backup itself was fine (volumes at rest during pause), but container needed restart afterwards. Switched to stop/start for future runs.

Azure storage details

Restore procedure

Restore to Azure VM (primary, production)

# On Mac Studio: pick snapshot
SNAPSHOT=~/system/backups/lightrag/20260418-085317
cd "$SNAPSHOT"
shasum -a 256 -c MANIFEST.sha256 || { echo "checksum mismatch, abort"; exit 1; }

# SCP to Azure VM
scp -i ~/.ssh/azure_alai -r "$SNAPSHOT" alai-admin@20.240.61.67:/tmp/restore/

# SSH to Azure VM
ssh -i ~/.ssh/azure_alai alai-admin@20.240.61.67

# On Azure VM:
cd /tmp/restore/$(basename "$SNAPSHOT")
shasum -a 256 -c MANIFEST.sha256 || { echo "checksum mismatch, abort"; exit 1; }

cd ~/lightrag
docker compose down

for vol in lightrag-data lightrag-kg lightrag-cache lightrag-neo4j-data; do
  docker volume rm $vol || true
  docker volume create $vol
  docker run --rm -v $vol:/dst -v /tmp/restore/$(basename "$SNAPSHOT"):/src alpine tar xzf /src/${vol}.tar.gz -C /dst
done

docker compose up -d

# Verify
curl http://localhost:9621/health
# From Mac Studio:
curl https://lightrag.basicconsulting.no/health

Restore to Mac Studio (rollback/emergency only)

Use case: Azure VM failure, need to restore local LightRAG as emergency fallback.

cd ~/system/docker/lightrag
docker compose down

# Pick a snapshot (local or download from Azure first)
SNAPSHOT=~/system/backups/lightrag/20260418-085317
cd "$SNAPSHOT"
shasum -a 256 -c MANIFEST.sha256 || { echo "checksum mismatch, abort"; exit 1; }

for vol in lightrag-data lightrag-kg lightrag-cache lightrag-neo4j-data; do
  docker volume rm $vol || true
  docker volume create $vol
  docker run --rm -v $vol:/dst -v "$SNAPSHOT":/src alpine tar xzf /src/${vol}.tar.gz -C /dst
done

cd ~/system/docker/lightrag
docker compose up -d

# Verify
curl http://localhost:9621/health

# IMPORTANT: Update consumer files to use localhost:9621 instead of cloud endpoint
# (see azure-lightrag-migration.md rollback procedure)

Azure Blob restore (download offsite backup)

Use case: Local backups lost, need to restore from Azure Blob offsite storage.

source ~/system/config/azure-lightrag-backup.env
TS=20260418-085317
RESTORE_DIR=~/system/backups/lightrag/azure-restore-$TS
mkdir -p "$RESTORE_DIR"
az storage blob download-batch \
  --account-name $AZURE_STORAGE_ACCOUNT \
  --account-key "$AZURE_STORAGE_KEY" \
  --source $AZURE_STORAGE_CONTAINER \
  --destination "$RESTORE_DIR" \
  --pattern "$TS/*"

# Verify checksums
cd "$RESTORE_DIR/$TS"
shasum -a 256 -c MANIFEST.sha256

# Then follow "Restore to Azure VM" or "Restore to Mac Studio" procedure above

Monitoring

Manual run

bash ~/system/tools/lightrag-backup.sh

Same 60–90s downtime applies. Log goes to same file.

Note: Post-migration (2026-04-18), script must be updated to SSH to Azure VM instead of using local Docker. See script comments for SSH-based backup procedure.



Document Owner: Skillforge
Last Updated: 2026-04-18 (post-Azure migration)
Validated By: Kelsey Hightower (FlowForge), Martin Kleppmann (CodeCraft — data consistency)


Revision #5
Created 2026-04-18 13:44:24 UTC by John
Updated 2026-06-21 20:03:08 UTC by John