Ollama Cloudflare Tunnel — Exposing Local Inference to Cloud

Ollama Cloudflare Tunnel — Exposing Local Inference to Cloud

Domain note (2026-05-17): This doc refers to ollama.basicconsulting.no — the legacy hostname. Current live endpoint: ollama.alai.no. Historical examples below retain original hostname for accuracy.

Owner: FlowForge (infra)
Implemented: 2026-04-18
Purpose: Expose Mac Studio Ollama (FORGE 10.0.0.2:11434) to Azure VM LightRAG via Cloudflare tunnel with Zero Trust IP whitelist


Why This Tunnel

Problem: LightRAG migrated to Azure VM to eliminate Docker Desktop single point of failure. But Ollama inference stays on Mac Studio (FORGE hardware, 40 local models including q8_0 quantizations).

Solution: Cloudflare tunnel from Mac Studio exposes ollama.basicconsulting.no → FORGE Ollama. Azure VM LightRAG calls this endpoint for LLM/embedding inference.

Trade-off:


Tunnel Configuration

Location: ~/.cloudflared/config.yml (Mac Studio)

tunnel: 3315a609-7934-45c5-ad0c-56d86d16374d
credentials-file: /Users/makinja/.cloudflared/3315a609-7934-45c5-ad0c-56d86d16374d.json

ingress:
  # ... other services ...
  
  - hostname: ollama.basicconsulting.no
    service: http://10.0.0.2:11434
  
  - service: http_status:404

Key points:

Restart tunnel after config changes:

launchctl kickstart -k gui/$(id -u)/com.john.cloudflared

Verify tunnel is up:

ps aux | grep cloudflared
curl https://ollama.basicconsulting.no/api/tags
# Should return JSON list of models

Zero Trust Policy — IP Whitelist

Policy Name: "Ollama Azure VM Only"
Application: ollama.basicconsulting.no
Type: Bypass (wildcard) + IP restrictions

Why bypass instead of strict Zero Trust auth:

Whitelisted IPs:

  1. Azure VM egress: Check current VM egress IP: ssh alai-admin@20.240.61.67 'curl -s https://ifconfig.co'
  2. Mac Studio (backup/testing): 46.46.251.40 (residential ISP, may rotate — see maintenance)

How policy works:

  1. Azure VM LightRAG makes HTTPS request to ollama.basicconsulting.no
  2. Cloudflare edge checks source IP against whitelist
  3. If match: forward to Mac Studio tunnel → FORGE Ollama
  4. If no match: return 403 Forbidden

Test from Azure VM:

ssh alai-admin@20.240.61.67
curl -s https://ollama.basicconsulting.no/api/tags | jq '.models | length'
# Should return model count (e.g., 12)

Test from random IP (should fail):

# From any non-whitelisted location
curl https://ollama.basicconsulting.no/api/tags
# Expected: 403 Forbidden or similar

IP Whitelist Maintenance

CRITICAL: Mac Studio ISP IP (46.46.251.40) is residential and WILL rotate periodically. When it does, both SSH to Azure VM and direct testing from Mac Studio will fail for Ollama tunnel testing.

Check Current Mac Studio IP

curl https://ifconfig.co
# Use ifconfig.co or icanhazip.com
# DO NOT use ifconfig.me (returns CDN IP on some networks)

Update NSG Rule (for Azure VM to access Mac Studio)

NEW_IP=$(curl -s https://ifconfig.co)

# Update SSH rule
az network nsg rule update \
  -g rg-alai-lightrag \
  --nsg-name vm-alai-lightragNSG \
  -n default-allow-ssh \
  --source-address-prefixes "${NEW_IP}/32"

# Update LightRAG access rule (if used for direct access)
az network nsg rule update \
  -g rg-alai-lightrag \
  --nsg-name vm-alai-lightragNSG \
  -n allow-lightrag-macstudio \
  --source-address-prefixes "${NEW_IP}/32"

Update Cloudflare Zero Trust Policy

Option 1: Cloudflare Dashboard

  1. Log in to Cloudflare Dashboard → Zero Trust
  2. Navigate to Access → Applications → "Ollama Azure VM Only"
  3. Edit policy → Update IP whitelist with new Mac Studio IP
  4. Save (takes effect within 30s)

Option 2: Cloudflare API (for automation)

# Get account ID and policy ID first (see Cloudflare API docs)
# Then use PATCH to update policy rules
# (Exact curl command omitted — requires API token with Zero Trust write access)

Verification after update:

curl https://ollama.basicconsulting.no/api/tags
# Should work from new Mac Studio IP

Failure Modes + Detection

Failure 1: Tunnel Process Down

Symptom: curl https://ollama.basicconsulting.no/api/tags returns connection timeout or 502 Bad Gateway.

Diagnosis:

ps aux | grep cloudflared
# If no process, tunnel is down

tail -f ~/Library/Logs/cloudflared/cloudflared.log
# Check for errors

Fix:

launchctl kickstart -k gui/$(id -u)/com.john.cloudflared
# Wait 10-15s
curl https://ollama.basicconsulting.no/api/tags

Persistent failure: Check launchd plist:

launchctl list | grep cloudflared
# Should show com.john.cloudflared

# If missing, reload plist
launchctl load ~/Library/LaunchAgents/com.john.cloudflared.plist

Failure 2: Model Not on Target Backend

Symptom: LightRAG logs show "model qwen2.5-coder:32b-instruct-q8_0 not found" or similar.

Diagnosis:

curl -s https://ollama.basicconsulting.no/api/tags | jq '.models[].name'
# Check which models are exposed

Cause: Tunnel points to wrong Ollama backend (ANVIL vs FORGE).

Fix:

# Check config
cat ~/.cloudflared/config.yml | grep -A2 "ollama.basicconsulting.no"

# Should be:
# service: http://10.0.0.2:11434  (FORGE)

# If wrong (e.g., http://localhost:11434 = ANVIL):
# Edit config, fix service URL
# Restart tunnel
launchctl kickstart -k gui/$(id -u)/com.john.cloudflared

Historical incident: 2026-04-18 mid-migration — tunnel initially pointed to ANVIL (localhost:11434). LightRAG couldn't find q8_0 models (only on FORGE). Changed to 10.0.0.2:11434, resolved immediately.


Failure 3: IP Whitelist Mismatch (403 Forbidden)

Symptom: Azure VM LightRAG logs show "403 Forbidden" or "Access denied" when calling Ollama endpoint.

Diagnosis:

# From Azure VM
ssh alai-admin@20.240.61.67
curl -v https://ollama.basicconsulting.no/api/tags 2>&1 | grep -E "HTTP|403"
# If 403, IP not whitelisted

# Check VM egress IP
curl -s https://ifconfig.co

Fix: Update Zero Trust policy (see IP Whitelist Maintenance section above).


Failure 4: Latency Spike (>500ms for /api/tags)

Symptom: Slow LightRAG responses; Ollama calls taking >1s for simple requests.

Diagnosis:

# From Azure VM
time curl -s https://ollama.basicconsulting.no/api/tags > /dev/null
# Should be 30-80ms typically

# From Mac Studio (local baseline)
time curl -s http://10.0.0.2:11434/api/tags > /dev/null
# Should be <10ms

Possible causes:

  1. Mac Studio network issue: Check Wi-Fi/Ethernet, router, ISP
  2. Cloudflare edge routing: Rare but possible; check Cloudflare status page
  3. FORGE overloaded: Other processes using Ollama heavily

Fix 3 (FORGE overload):

curl http://10.0.0.2:11434/api/ps
# Check running models and concurrent requests
# Identify and throttle/stop competing workloads

Performance Characteristics

Expected latency (Azure swedencentral ↔ Mac Studio Oslo):

Bandwidth: Not a bottleneck. Ollama API uses JSON over HTTPS; typical request/response <100KB except for large context prompts.

Throughput: Tunnel supports multiple concurrent requests. Bottleneck is FORGE hardware, not tunnel.

Cloudflare Tunnel SLA: 99.99% uptime (per Cloudflare SLA for paid plans). ALAI on Free plan but historically stable.


Security Considerations

Current model: IP whitelist via Cloudflare Zero Trust bypass policy.

Threat model:

Future hardening options:

  1. Service tokens: Replace IP whitelist with Cloudflare service token in request headers
  2. Mutual TLS: Require client cert from Azure VM
  3. VPN: Azure VNet peering to Mac Studio (complex, likely overkill)

Current assessment: IP whitelist sufficient for internal infrastructure. Service tokens recommended if IP rotation becomes operationally painful.


Monitoring

Health check (from Mac Studio):

curl https://ollama.basicconsulting.no/api/tags
# Should return model list

Health check (from Azure VM):

ssh alai-admin@20.240.61.67 'curl -s https://ollama.basicconsulting.no/api/tags | jq ".models | length"'
# Should return model count

Tunnel logs:

tail -f ~/Library/Logs/cloudflared/cloudflared.log

Cloudflare Analytics:



Rollback / Emergency Cutover

If tunnel becomes persistently unstable:

Option 1: Move LightRAG back to Mac Studio (see azure-lightrag-migration.md rollback procedure).

Option 2: Deploy Ollama to Azure (longer-term, requires GPU VM or accept slower inference on CPU):

  1. Provision Azure VM with GPU (e.g., Standard_NC4as_T4_v3, ~$500/month)
  2. Install Ollama on Azure VM
  3. Pull required models (qwen2.5-coder:32b-instruct-q8_0, bge-m3:latest)
  4. Update LightRAG .env: LLM_BINDING_HOST=http://localhost:11434
  5. Test inference latency (will be slower than FORGE M2 Ultra)

Option 3: Use Ollama Cloud / OpenAI API (cost implications, loses on-prem privacy):

Recommendation: Keep current tunnel setup unless persistent failures. FORGE uptime historically excellent.


Document Owner: Skillforge
Last Updated: 2026-04-18
Validated By: Kelsey Hightower (FlowForge), Parisa Tabriz (Securion — security review)


Revision #5
Created 2026-04-18 23:16:21 UTC by John
Updated 2026-06-21 20:03:09 UTC by John