LightRAG Backup (Azure-native + local safety net)
LightRAG Backup (Azure-native + local safety net)
Owner: FlowForge (infra)
Implemented: 2026-04-18 (updated for Azure migration 2026-04-18)
Source of Truth: Azure VM vm-alai-lightrag (20.240.61.67)
Schedule: Weekly Sunday 04:00 CEST
Script: ~/system/tools/lightrag-backup.sh (SSH-based)
Plist: ~/Library/LaunchAgents/com.alai.lightrag-backup.plist
Azure creds: ~/system/config/azure-lightrag-backup.env (mode 0600)
What is backed up
4 Docker volumes (+ checksums + README):
| Volume | Content | Typical size |
|---|---|---|
lightrag-data |
LightRAG KV store + inputs | ~300 MB |
lightrag-kg |
Knowledge graph files | small |
lightrag-cache |
LLM response cache | small |
lightrag-neo4j-data |
Neo4j graph entities + relations | ~170 MB |
Typical total: 500 MB – 1 GB compressed.
How it runs (POST-MIGRATION)
Source: Azure VM vm-alai-lightrag (20.240.61.67)
- SSH to Azure VM:
ssh -i ~/.ssh/azure_alai [email protected] docker compose stop lightrag neo4j— graceful shutdown (~30s downtime)docker run alpine tar czfdumps each volume on VMdocker compose start neo4j lightrag— resumeshasum -a 256 *.tar.gz > MANIFEST.sha256on VM- Write
README.mdwith restore procedure on VM - SCP from VM to Mac Studio — download snapshot to
~/system/backups/lightrag/(safety net) - Azure offsite upload — Cool tier blob
plockfrontstaging/lightrag-backup/<TS>/ - Azure rotation — keep last 8 snapshots (longer offsite retention)
- Local rotation — keep last 4 snapshots in
~/system/backups/lightrag/(7-day safety, then deletable)
Downtime: ~60–90s every Sunday 04:00 (cloud LightRAG unavailable during backup).
Key change: Local Docker volumes are NO LONGER the source of truth. Azure VM volumes are primary. Local backups are now safety net only.
Why NOT docker compose pause
pause freezes LightRAG's async event loop. On unpause, uvicorn stays "running" but HTTP handler doesn't service new requests (container reports unhealthy). Requires full container restart to recover. The backup on 2026-04-18 hit this — backup itself was fine (volumes at rest during pause), but container needed restart afterwards. Switched to stop/start for future runs.
Azure storage details
- Account:
plockfrontstaging(swedencentral, Hot storage account) - Container:
lightrag-backup - Resource group:
plock-staging-rg - Tier per blob: Cool (cheaper — ~$0.01/GB/month for archived reads)
- Retention: last 8 snapshots (~8 weeks)
- Estimated cost: ~$0.05–0.10/month for ~4 GB retained
Restore procedure
Restore to Azure VM (primary, production)
# On Mac Studio: pick snapshot
SNAPSHOT=~/system/backups/lightrag/20260418-085317
cd "$SNAPSHOT"
shasum -a 256 -c MANIFEST.sha256 || { echo "checksum mismatch, abort"; exit 1; }
# SCP to Azure VM
scp -i ~/.ssh/azure_alai -r "$SNAPSHOT" [email protected]:/tmp/restore/
# SSH to Azure VM
ssh -i ~/.ssh/azure_alai [email protected]
# On Azure VM:
cd /tmp/restore/$(basename "$SNAPSHOT")
shasum -a 256 -c MANIFEST.sha256 || { echo "checksum mismatch, abort"; exit 1; }
cd ~/lightrag
docker compose down
for vol in lightrag-data lightrag-kg lightrag-cache lightrag-neo4j-data; do
docker volume rm $vol || true
docker volume create $vol
docker run --rm -v $vol:/dst -v /tmp/restore/$(basename "$SNAPSHOT"):/src alpine tar xzf /src/${vol}.tar.gz -C /dst
done
docker compose up -d
# Verify
curl http://localhost:9621/health
# From Mac Studio:
curl https://lightrag.basicconsulting.no/health
Restore to Mac Studio (rollback/emergency only)
Use case: Azure VM failure, need to restore local LightRAG as emergency fallback.
cd ~/system/docker/lightrag
docker compose down
# Pick a snapshot (local or download from Azure first)
SNAPSHOT=~/system/backups/lightrag/20260418-085317
cd "$SNAPSHOT"
shasum -a 256 -c MANIFEST.sha256 || { echo "checksum mismatch, abort"; exit 1; }
for vol in lightrag-data lightrag-kg lightrag-cache lightrag-neo4j-data; do
docker volume rm $vol || true
docker volume create $vol
docker run --rm -v $vol:/dst -v "$SNAPSHOT":/src alpine tar xzf /src/${vol}.tar.gz -C /dst
done
cd ~/system/docker/lightrag
docker compose up -d
# Verify
curl http://localhost:9621/health
# IMPORTANT: Update consumer files to use localhost:9621 instead of cloud endpoint
# (see azure-lightrag-migration.md rollback procedure)
Azure Blob restore (download offsite backup)
Use case: Local backups lost, need to restore from Azure Blob offsite storage.
source ~/system/config/azure-lightrag-backup.env
TS=20260418-085317
RESTORE_DIR=~/system/backups/lightrag/azure-restore-$TS
mkdir -p "$RESTORE_DIR"
az storage blob download-batch \
--account-name $AZURE_STORAGE_ACCOUNT \
--account-key "$AZURE_STORAGE_KEY" \
--source $AZURE_STORAGE_CONTAINER \
--destination "$RESTORE_DIR" \
--pattern "$TS/*"
# Verify checksums
cd "$RESTORE_DIR/$TS"
shasum -a 256 -c MANIFEST.sha256
# Then follow "Restore to Azure VM" or "Restore to Mac Studio" procedure above
Monitoring
- Log:
~/system/logs/lightrag-backup.log(on Mac Studio, backup orchestrator) - Latest snapshot size (local):
du -sh ~/system/backups/lightrag/ - Latest snapshot size (Azure VM):
ssh -i ~/.ssh/azure_alai [email protected] 'du -sh ~/lightrag-backups/' - Azure blob list:
source ~/system/config/azure-lightrag-backup.env az storage blob list \ --account-name $AZURE_STORAGE_ACCOUNT \ --account-key "$AZURE_STORAGE_KEY" \ --container-name $AZURE_STORAGE_CONTAINER \ --prefix lightrag-backup/ \ -o table - Post-run LightRAG health: logged as last line of each run (should show
{"status":"healthy"}fromhttps://lightrag.basicconsulting.no/health)
Manual run
bash ~/system/tools/lightrag-backup.sh
Same 60–90s downtime applies. Log goes to same file.
Note: Post-migration (2026-04-18), script must be updated to SSH to Azure VM instead of using local Docker. See script comments for SSH-based backup procedure.
Related Runbooks
- Azure LightRAG Migration: azure-lightrag-migration.md — full migration context, rollback procedure
- Ollama Cloudflare Tunnel: ollama-cloudflare-tunnel.md — tunnel that LightRAG uses for inference
Document Owner: Skillforge
Last Updated: 2026-04-18 (post-Azure migration)
Validated By: Kelsey Hightower (FlowForge), Martin Kleppmann (CodeCraft — data consistency)