LightRAG Backup (local + Azure offsite)
LightRAG Backup (local + Azure offsite)
Owner: FlowForge (infra)
Implemented: 2026-04-18
Schedule: Weekly Sunday 04:00 CEST
Script: ~/system/tools/lightrag-backup.sh
Plist: ~/Library/LaunchAgents/com.alai.lightrag-backup.plist
Azure creds: ~/system/config/azure-lightrag-backup.env (mode 0600)
What is backed up
4 Docker volumes (+ checksums + README):
| Volume | Content | Typical size |
|---|---|---|
lightrag-data |
LightRAG KV store + inputs | ~300 MB |
lightrag-kg |
Knowledge graph files | small |
lightrag-cache |
LLM response cache | small |
lightrag-neo4j-data |
Neo4j graph entities + relations | ~170 MB |
Typical total: 500 MB – 1 GB compressed.
How it runs
docker compose stop lightrag neo4j— graceful shutdown (~30s downtime)docker run alpine tar czfdumps each volumedocker compose start neo4j lightrag— resumeshasum -a 256 *.tar.gz > MANIFEST.sha256- Write
README.mdwith restore procedure - Local rotation — keep last 4 snapshots in
~/system/backups/lightrag/ - Azure offsite upload — Cool tier blob
plockfrontstaging/lightrag-backup/<TS>/ - Azure rotation — keep last 8 snapshots (longer offsite retention)
Downtime: ~60–90s every Sunday 04:00.
Why NOT docker compose pause
pause freezes LightRAG's async event loop. On unpause, uvicorn stays "running" but HTTP handler doesn't service new requests (container reports unhealthy). Requires full container restart to recover. The backup on 2026-04-18 hit this — backup itself was fine (volumes at rest during pause), but container needed restart afterwards. Switched to stop/start for future runs.
Azure storage details
- Account:
plockfrontstaging(swedencentral, Hot storage account) - Container:
lightrag-backup - Resource group:
plock-staging-rg - Tier per blob: Cool (cheaper — ~$0.01/GB/month for archived reads)
- Retention: last 8 snapshots (~8 weeks)
- Estimated cost: ~$0.05–0.10/month for ~4 GB retained
Restore procedure
cd ~/system/docker/lightrag
docker compose down
# Pick a snapshot (local or download from Azure first)
SNAPSHOT=~/system/backups/lightrag/20260418-085317
cd "$SNAPSHOT"
shasum -a 256 -c MANIFEST.sha256 || { echo "checksum mismatch, abort"; exit 1; }
for vol in lightrag-data lightrag-kg lightrag-cache lightrag-neo4j-data; do
docker volume rm $vol || true
docker volume create $vol
docker run --rm -v $vol:/dst -v "$SNAPSHOT":/src alpine tar xzf /src/${vol}.tar.gz -C /dst
done
cd ~/system/docker/lightrag
docker compose up -d
# Verify
curl http://localhost:9621/health
Azure restore
source ~/system/config/azure-lightrag-backup.env
TS=20260418-085317
RESTORE_DIR=/tmp/lightrag-restore-$TS
mkdir -p "$RESTORE_DIR"
az storage blob download-batch \
--account-name $AZURE_STORAGE_ACCOUNT \
--account-key "$AZURE_STORAGE_KEY" \
--source $AZURE_STORAGE_CONTAINER \
--destination "$RESTORE_DIR" \
--pattern "$TS/*"
# Then follow local restore procedure pointing at $RESTORE_DIR/$TS
Monitoring
- Log:
~/system/logs/lightrag-backup.log - Latest snapshot size:
du -sh ~/system/backups/lightrag/ - Azure blob list:
az storage blob list ... - Post-run LightRAG health: logged as last line of each run
Manual run
bash ~/system/tools/lightrag-backup.sh
Same 60–90s downtime applies. Log goes to same file.