archive.alai.no — Paperless-ngx Setup & Operations
URL: https://archive.alai.no
Backend: Paperless-ngx (image ghcr.io/paperless-ngx/paperless-ngx:latest)
Host: Azure VM 4.223.110.181 (alai-admin)
Container: alai-paperless-1 (with redis, gotenberg, tika sidecars)
MC reference: #9546, #9982 (DR backup TODO)
Document management system for all ALAI-related legal, contractual, partnership, research, and financial documents. It provides OCR, full-text search, and a taxonomy.
Access requirements
Both layers of the Cloudflare (CF) stack require 92.221.168.61/32 (ALAI LAN egress) in their bypass lists. See CF IP Access Rules — ALAI LAN Bypass.
From the Mac Studio with an active VPN: binding to interface 192.168.68.65 (Deco LAN) bypasses VPN routing:
curl --interface 192.168.68.65 https://archive.alai.no/...
Mac Air and other machines without a VPN: direct access works.
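For scripts that cannot shell out to curl, the same source-address binding can be approximated in Python. This is a minimal sketch, assuming the standard-library http.client module; the helper name bound_connection is hypothetical, and the address passed in would be the Deco LAN address noted above.

```python
import http.client


def bound_connection(host: str, source_ip: str, timeout: float = 10.0) -> http.client.HTTPSConnection:
    """Return an HTTPS connection whose traffic will originate from source_ip.

    Port 0 lets the OS pick any free local port. No socket is opened until
    a request is actually sent on the returned connection.
    """
    return http.client.HTTPSConnection(
        host,
        timeout=timeout,
        source_address=(source_ip, 0),
    )
```

Calling bound_connection("archive.alai.no", "192.168.68.65") and then conn.request("GET", "/api/") should behave much like the curl --interface call, although routing details can differ between operating systems.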
API authentication
Paperless uses Django REST Framework (DRF) token authentication.
The token for the admin user (root@localhost) is stored locally on the Mac Studio:
~/.config/alai/paperless-token.env (mode 600)
PAPERLESS_TOKEN=c9ec30192db3c95802349335edea4bca864a937a
PAPERLESS_BASE=https://archive.alai.no
PAPERLESS_BIND_INTERFACE=192.168.68.65
All API requests send this header:
Authorization: Token c9ec30192db3c95802349335edea4bca864a937a
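To avoid pasting the token into scripts, the env file can be parsed at runtime. The following is a minimal sketch, not an existing ALAI helper: parse_env and auth_headers are hypothetical names, and the parser assumes plain KEY=VALUE lines as shown above.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate optional surrounding quotes around the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def auth_headers(env: dict) -> dict:
    """Build the DRF token header; the scheme word is 'Token', not 'Bearer'."""
    return {"Authorization": f"Token {env['PAPERLESS_TOKEN']}"}
```

A caller would read ~/.config/alai/paperless-token.env (for example with pathlib.Path(...).expanduser().read_text()) and pass the text to parse_env.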
Regenerate the token (if compromised) using the Django shell via docker exec:
ssh -b 192.168.68.65 -i ~/.ssh/azure_alai <ssh-user>@4.223.110.181 \
'docker exec alai-paperless-1 python manage.py shell -c "
from rest_framework.authtoken.models import Token
from django.contrib.auth import get_user_model
u = get_user_model().objects.get(username=\"admin\")
Token.objects.filter(user=u).delete()
print(Token.objects.create(user=u).key)
"'
Schema (taxonomy)
Set up on 2026-04-28 via /tmp/paperless-setup.sh. IDs can vary between instances, so look objects up by name with the name__iexact filter instead of hard-coding IDs.
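A name-based lookup can be sketched as follows. This is an illustrative sketch only: lookup_url and first_id are hypothetical helper names, and the code assumes the list endpoints (/api/tags/, /api/document_types/, /api/storage_paths/, /api/correspondents/) return the paginated shape with a results array that the cheat sheet on this page relies on.

```python
from typing import Optional
from urllib.parse import urlencode


def lookup_url(base: str, resource: str, name: str) -> str:
    """Build a case-insensitive exact-name lookup URL for one resource type."""
    query = urlencode({"name__iexact": name})
    return f"{base.rstrip('/')}/api/{resource}/?{query}"


def first_id(payload: dict) -> Optional[int]:
    """Return the id of the first match in a paginated response, or None."""
    results = payload.get("results", [])
    return results[0]["id"] if results else None
```

The resolved ID can then be passed as storage_path or tags in an upload request, which keeps scripts portable across instances.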
Document Types (14)
Contract, LOI, NDA, Registration, Insurance Policy, Research Paper, Invoice, Receipt, Email Archive, Identity Document, Tax Document, Financial Statement, Meeting Notes, Pitch Deck
Tags (23, color-coded)
Cross-cutting (target): legal, research, kuran-19, partnership, regulator, contract, nda, loi, invoice, registration, urgent, signed, pending-signature
Storage Paths (21)
Folder hierarchy by company and function:
/ALAI/legal/{created_year}/{title}
/ALAI/research/kuran-19/{title}
/ALAI/research/general/{created_year}/{title}
/ALAI/partnerships/sintef/{title}
/ALAI/partnerships/intesa/{title}
/ALAI/partnerships/pbz/{title}
/ALAI/regulators/finanstilsynet/{created_year}/{title}
/ALAI/regulators/skatteetaten/{created_year}/{title}
/ALAI/regulators/bronnoysund/{created_year}/{title}
/ALAI/contacts/{title}
/Drop/legal/{created_year}/{title}
/Drop/contracts/{title}
/Bilko/legal/{created_year}/{title}
/Bilko/contracts/{title}
/Tok/legal/{created_year}/{title}
/Lobby/legal/{created_year}/{title}
/LumisCare/legal/{created_year}/{title}
/Plock/legal/{created_year}/{title}
/ALAI-Tech-DOO/legal/{created_year}/{title}
/BasicConsulting/{created_year}/{title}
/clients/Entur/{created_year}/{title}
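To show how the placeholders in these templates expand, here is a small illustration using Python string formatting. It is only a sketch of the substitution idea: render_storage_path is a hypothetical name, and Paperless performs its own rendering and filename sanitization on the server, which this does not reproduce.

```python
def render_storage_path(template: str, title: str, created_year: int) -> str:
    """Substitute {title} and {created_year}; unused placeholders are ignored."""
    return template.format(title=title, created_year=created_year)
```

For example, the template /ALAI/legal/{created_year}/{title} with a title of Shareholder Agreement created in 2026 expands to /ALAI/legal/2026/Shareholder Agreement.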
Initial Correspondents (25, expanded as docs ingest)
SINTEF, Finanstilsynet, Skatteetaten, Brønnøysundregistrene, PBZ Zagreb, Intesa Sanpaolo, Anthropic, Cloudflare, Tryg, Fiken AS, Entur AS — auto-create on classify match.
Upload workflow
Manual single file
source ~/.config/alai/paperless-token.env
curl -s --interface "$PAPERLESS_BIND_INTERFACE" \
-H "Authorization: Token $PAPERLESS_TOKEN" \
-F "title=My Document" \
-F "storage_path=1" \
-F "tags=30" -F "tags=17" \
-F "document=@/path/to/file.pdf" \
-X POST "$PAPERLESS_BASE/api/documents/post_document/"
The call returns a task UUID. Verify that consumption succeeded via:
curl ... "$PAPERLESS_BASE/api/tasks/?task_id=<UUID>"
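Because consumption is asynchronous, a script usually has to poll the task until it finishes. The sketch below assumes that /api/tasks/ returns a list of task objects with task_id and status fields, and that the final states are named SUCCESS and FAILURE; the fetch callable is a hypothetical stand-in for whatever performs the authenticated GET.

```python
import time
from typing import Callable, Optional

# Assumed Celery-style names for states that will not change any more.
FINAL_STATES = {"SUCCESS", "FAILURE"}


def find_task(tasks: list, task_id: str) -> Optional[dict]:
    """Pick the entry for task_id out of the list returned by the tasks API."""
    return next((t for t in tasks if t.get("task_id") == task_id), None)


def wait_for_task(
    fetch: Callable[[str], list],
    task_id: str,
    attempts: int = 30,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll until the task reaches a final state; return it, or None on timeout."""
    for _ in range(attempts):
        task = find_task(fetch(task_id), task_id)
        if task is not None and task.get("status") in FINAL_STATES:
            return task
        sleep(delay)
    return None
```

Injecting fetch and sleep keeps the polling logic testable without network access; a FAILURE result should be inspected, since it often indicates a content duplicate.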
Batch upload with classification
Script: /tmp/paperless-classify-v2.py (commit to the repo TBD)
python3 /tmp/paperless-classify-v2.py --dry --all # dry-run all ~/ALAI/*
python3 /tmp/paperless-classify-v2.py --all # actual upload
python3 /tmp/paperless-classify-v2.py FILE [FILE...] # specific files
The classifier maps path → (storage_path, correspondent, document_type, tags) according to a rules engine. Before upload it deduplicates by normalized title; Paperless also has its own content-hash dedup, which rejects a file whose content is already present.
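The mapping and the title-based dedup can be sketched roughly as follows. This is not the code of paperless-classify-v2.py: the rule table, the substring matching, and the normalization are all hypothetical stand-ins chosen to illustrate the idea.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class Classification(NamedTuple):
    storage_path: str
    correspondent: Optional[str]
    document_type: str
    tags: tuple


# Hypothetical rules: first substring found in the lowercased path wins.
RULES = [
    ("partnerships/sintef",
     Classification("/ALAI/partnerships/sintef/{title}", "SINTEF", "Contract", ("partnership",))),
    ("nda",
     Classification("/ALAI/legal/{created_year}/{title}", None, "NDA", ("legal", "nda"))),
]


def classify(path: str) -> Optional[Classification]:
    """Return the first matching rule's classification, or None."""
    lowered = path.lower()
    for needle, result in RULES:
        if needle in lowered:
            return result
    return None


def normalize_title(path: str) -> str:
    """Lowercase the file stem and collapse punctuation runs into single spaces."""
    stem = PurePosixPath(path).stem
    return re.sub(r"[\W_]+", " ", stem.lower()).strip()


def dedupe(paths: list) -> list:
    """Keep only the first path seen for each normalized title."""
    seen, unique = set(), []
    for path in paths:
        key = normalize_title(path)
        if key not in seen:
            seen.add(key)
            unique.append(path)
    return unique
```

Title-based dedup catches renamed copies of the same document name before any upload happens, while the server-side content hash catches identical files with different names, so the two checks complement each other.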
Operations cheat sheet
# "curl ..." abbreviates the --interface and Authorization flags from the upload example
# Document count
curl ... "$PAPERLESS_BASE/api/documents/?page_size=1" | jq '.count'
# Latest 10 docs
curl ... "$PAPERLESS_BASE/api/documents/?ordering=-created&page_size=10" | jq '.results[]|{id,title,created}'
# Search by tag
curl ... "$PAPERLESS_BASE/api/documents/?tags__id=17" | jq '.results[].title'
# Search by storage path
curl ... "$PAPERLESS_BASE/api/documents/?storage_path__id=1"
# Full-text search (OCR'd content)
curl ... "$PAPERLESS_BASE/api/documents/?query=finanstilsynet"
# Task queue status
curl ... "$PAPERLESS_BASE/api/tasks/?page_size=200" | jq 'group_by(.status)|map({status:.[0].status,count:length})'
# Failed tasks (often content duplicates)
curl ... "$PAPERLESS_BASE/api/tasks/" | jq '[.[]|select(.status=="FAILURE")|{file:.task_file_name,reason:.result}]'
Architecture
[ALAI LAN egress 92.221.168.61]
│
▼
[Cloudflare]
├─ IP Access Rule: bypass WAF (Layer 1)
└─ CF Access policy: bypass Zero Trust (Layer 2)
│
▼
[Caddy on Azure VM 4.223.110.181]
archive.alai.no → paperless-ngx:8000
│
▼
[alai-paperless-1 container]
├─ alai-paperless-redis-1 (queue)
├─ alai-paperless-gotenberg-1 (PDF preview)
└─ alai-paperless-tika-1 (text extraction)
│
▼
[Postgres + media volume on Azure VM]
Outstanding (TODO)
- MC #9982 — DR backup automation: pg_dump cron + media volume snapshot + B2/R2 upload + 30-day retention
- Bitwarden token storage — bw create item is blocked by a Node 25 incompatibility (Invalid version error). Add the token manually via the Vaultwarden web UI if the problem persists
- Token rotation policy — currently no expiry; consider 90-day rotation for the admin token
- Per-user tokens — create user-specific tokens for an audit trail (a shared admin token means no per-user audit)
Related
- CF IP Access Rules — ALAI LAN Bypass — both layers documented
- DEPLOY-MAP — System Infrastructure — CF Access policies + Paperless API entry
- ZAKON NETWORK EGRESS — VPN exit vs ISP egress
- Incident origin: 2026-04-28 ALAI legal docs upload task — discovered Paperless instance had 58 pre-existing docs; after dedup-aware bulk upload, 99 docs total