Agentic OS — Design Specs
Pillar-level design specifications for the ALAI Agentic OS v1 hardening program (MC #99063). Includes Pillar #3 L3 Memory and Pillar #9 VM Runtime specs.
- Pillar #3 — L3 Memory Framework Comparison Spec (2026-05-04)
- ADR — P2P Agent Communication Pattern Evaluation (MC #101959)
Pillar #3 — L3 Memory Framework Comparison Spec (2026-05-04)
Agentic OS — Pillar #3 L3 Memory Framework Comparison Spec
MC: #99124 Status: DESIGN (read-only; no infra changes) Date: 2026-05-04 Parent: MC #99063 (Agentic OS v1 hardening) Output: /Users/makinja/system/specs/agentic-os-pillar3-l3memory-2026-05-04.md Evidence artifact: /tmp/forged-99124-evidence.jsonl (42 records, ≥40 required) Forged prompt: /Users/makinja/system/prompts/forged/99124.md Mehanik clearance: [CEO_APPROVED] (dispatch token from orchestrator)
§0 — Frontmatter
Scope: Comparative evaluation of five L3 memory framework candidates for ALAI John orchestrator. Declares singular winner + named runner-up. Produces 5-step migration plan and 20-query multilingual validation harness.
Frameworks evaluated (5, closed set): 1. Mem0 self-hosted (incumbent, deployed) 2. claude-mem (installed, not primary) 3. mem-search (researched) 4. Memipalace (researched) 5. LightRAG-resurrect (existing VM)
CEO-locked constraints (source: project_99063_pillar9_pillar7_scope_2026-05-04.md): - Budget: $30/month soft combined Pillar #9 + Pillar #3 (“može više” with CEO gate) - Auth model: OAuth Claude Max subscription only (no API tokens) - Multi-client scope: SVE — all ALAI products + all active clients - EU residency: no SaaS memory backends
Incumbent pre-commitment:
stop-hook-l3-memory-spec.md (MC #99071) pre-selected Mem0.
This spec defends or overrides that commitment with evidence.
§1 — Executive Summary
Mem0 self-hosted is confirmed as the L3 memory winner. The case for migration to any alternative collapses on three grounds: (1) Mem0 is already deployed with 865 facts, a running LaunchAgent, discover.js wiring, and a Phase 1 recall@10 baseline of 80%; (2) the two remaining viable alternatives (claude-mem and LightRAG-resurrect) each fail a hard gate before reaching the merit comparison; and (3) mem-search and Memipalace do not exist as installable software packages — both are generic category labels from the source YouTube video.
claude-mem (runner-up) provides complementary BM25 session-observation indexing and costs $0, but lacks semantic recall, vector storage, and multi-user isolation required for the multi-client SVE scope. It belongs in the L3 fallback chain as L3a (already wired in discover.js) but cannot replace Mem0 as the primary semantic memory backend.
LightRAG-resurrect fails two hard gates: MC #99093 (file_path metadata fix) is open and unresolved, meaning 121,003 of 127,543 documents are in pending status and not queryable; and the asyncio event-loop starvation root cause (documented in lightrag-freeze-decision-chip.md) requires a non-trivial Semaphore(2) patch before any production write load is safe.
The existing stop-hook-l3-memory-spec.md Mem0 pre-commitment is DEFENDED. The Phase 2 activation checklist (session-extract.js + Stop hook) remains the correct next step.
Combined L3 incremental cost: $0/month (all backends local: Qdrant port 6333, Ollama port 11434, Mem0 server port 9000). This leaves the full $30/month envelope for Pillar #9 VM.
§2 — Current State (Machine-Verified 2026-05-04)
All probes executed 2026-05-04T21:07-21:14Z. No session-context citations.
§2.1 — LightRAG VM Probe
Probe:
curl -s --max-time 10 http://20.240.61.67:9621/health
Result (2026-05-04T21:07Z):
status: healthy
pipeline_busy: false
core_version: 1.3.4
api_version: 0154
llm_binding: ollama
llm_binding_host: https://ollama.basicconsulting.no
llm_model: qwen3:8b-q8_0
embedding_binding: ollama
embedding_binding_host: https://ollama.basicconsulting.no
embedding_model: bge-m3:latest
graph_storage: Neo4JStorage
vector_storage: NanoVectorDBStorage
kv_storage: JsonKVStorage
enable_llm_cache: true
auth_mode: disabled
Document corpus probe (az vm run-command, 2026-05-04T21:14Z):
total_docs: 127,543
status_pending: 121,003
status_processed: 5,596
status_failed: 944
unknown_source_count: 40,330
unknown_ratio_pct: 31.6%
Interpretation: Effective recall corpus = 5,596
processed docs only. The 121,003 pending docs are not yet extractable
via graph/entity search. 31.6% of all submitted docs carry
file_path=unknown_source — below the 70% threshold that
would require the spec warning per D5, but AC6 of MC #99079 remains
PARTIAL because the 30% bookstack_url target is unreachable. EVIDENCE:
az vm run-command python3 count 2026-05-04T21:14Z →
unknown_source_count:40330
§2.2 — discover.js Memory Query State
Probe:
grep -n "DISCOVER_USE_FALLBACK_CHAIN" /Users/makinja/system/tools/discover.js
Result:
line 58: // Feature-flagged: DISCOVER_USE_FALLBACK_CHAIN=1 to enable (default OFF)
line 60: const USE_FALLBACK_CHAIN = process.env.DISCOVER_USE_FALLBACK_CHAIN === '1';
line 793: // Activated when DISCOVER_USE_FALLBACK_CHAIN=1
line 1228: // L3 Fallback Chain (MC #99071, DISCOVER_USE_FALLBACK_CHAIN=1)
Status: L3 fallback chain (claude-mem → Mem0 → LightRAG) is implemented in discover.js but NOT activated in production. Default is OFF. Session-start mode (lines 1190-1200) calls searchMem0 directly for boot injection. EVIDENCE: /Users/makinja/system/tools/discover.js lines 58-60 (file confirmed on disk)
§2.3 — MEMORY.md Auto-Write Status
Probe:
ls -la /Users/makinja/.claude/projects/-Users-makinja/memory/MEMORY.md
Result (2026-05-04T21:08Z):
-rw-r--r-- 1 makinja staff 19150 4 mai 21:02 MEMORY.md
Status: MEMORY.md is manually maintained (19,150 bytes, last written 21:02 same day). Auto-write gap is NOT closed — session-extract.js (stop hook) is not yet activated. stop-hook-l3-memory-spec.md §Implementation Details: “NOT yet added to settings.json Stop hooks array. Phase 2 activation.” EVIDENCE: file size + mtime confirmed; stop-hook-l3-memory-spec.md line 35
§2.4 — Existing Mem0 Footprint
Probes executed:
Qdrant collection:
curl http://localhost:6333/collections/mem0_john
points_count: 865
indexed_vectors_count: 0 (below HNSW threshold 10,000 — full scan active)
vector_size: 1024 (Cosine)
status: green
EVIDENCE: curl http://localhost:6333/collections/mem0_john 2026-05-04T21:07Z
Mem0 server health:
curl http://localhost:9000/health
{status: healthy, backend: qdrant, llm: qwen3:8b-q8_0@ollama, embedder: bge-m3@ollama,
collections: [mem0migrations, sessions, hivemind, mem0_john, knowledge]}
EVIDENCE: curl http://localhost:9000/health 2026-05-04T21:07Z
LaunchAgent status:
com.alai.mem0-server: mode:keepalive, state:running, pid:65706, last_exit:15
EVIDENCE: /Users/makinja/system/state/daemon-fleet-status.json grep com.alai.mem0-server → state:running pid:65706 last_exit:15
last_exit=15 investigation: Exit code 15 = SIGTERM (Unix signal). Server is KeepAlive=true, so launchd sends SIGTERM before restarting on crash/update. Server log confirms BrokenPipeError at 00:53:08 on 2026-05-04 during LLM extraction (Ollama server disconnected). This is a transient Ollama overload event, not a persistent server defect. Server resumed and is currently healthy (PID 65706, /health returns 200). No action required for MC #99124. EVIDENCE: /Users/makinja/system/mem0/server.log tail-30 → 2026-05-04 00:53:08 LLM extraction failed BrokenPipeError
Package version: mem0ai-2.0.1.dist-info confirmed in /Users/makinja/system/mem0/.venv/lib/python3.12/site-packages/ EVIDENCE: ls /Users/makinja/system/mem0/.venv/lib/python3.12/site-packages/ | grep mem0 → mem0ai-2.0.1.dist-info
LaunchAgent plist confirmed at ~/Library/LaunchAgents/com.alai.mem0-server.plist (1118 bytes, KeepAlive=true, RunAtLoad=true). EVIDENCE: cat ~/Library/LaunchAgents/com.alai.mem0-server.plist → MEM0_API_KEY=’’ (blank, enforcing local-only)
§2.5 — Memory File Inventory
Probe:
ls /Users/makinja/.claude/projects/-Users-makinja/memory/*.md | wc -l
Result: 96 .md files, 816K total directory size
Most-queried categories (by file count and content): - feedback_.md: 23 files (error patterns, CEO feedback) - project_.md: 15 files (project postflights and outcomes) - reference_*.md: 3 files (hook system, architecture) - MEMORY.md: master index (19,150 bytes) - MEMORY-products.md, MEMORY-ops.md: product and ops context
stop-hook-l3-memory-spec.md line count: 146 lines EVIDENCE: wc -l /Users/makinja/system/specs/stop-hook-l3-memory-spec.md → 146
§3 — Feature Matrix (D1 / AC#1)
Key: S=small (<8h), M=medium (<80h), L=large (>80h); EVIDENCE lines follow each cell.
| Framework | storage_backend | embedding_model | extraction_method | recall_at_10 | latency_p50_ms | multi_user_isolation | oauth_compatible | self_hosted_capable | license | last_release_date | maintainer_health | notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mem0 self-hosted | Qdrant (local port 6333) | bge-m3:latest 1024-dim (Ollama) | LLM fact extraction (qwen3:8b-q8_0) then vector store | 80% (Phase 1 baseline) | ~200ms (full scan at 865 pts, no HNSW index) | Partial — user_id=‘john’ hardcoded; Qdrant payload_schema supports user_id keyword | YES — no API key; all local Ollama | YES (deployed) | Apache-2.0 (mem0ai PyPI) | 2026-05-04 (v2.0.1) | Active (mem0ai org, VC-backed OSS) | integration_effort=S (already deployed, 865 facts, discover.js wired) |
| claude-mem | Filesystem SQLite (observations) | None (BM25 only) | Session observation indexing; keyword search | Unmeasured — BM25 does not provide semantic recall | <50ms (local file index) | None — single project namespace; no user_id | YES — no LLM client; local Node.js daemon | YES (v12.5.0 installed) | AGPL-3.0 | 2026-05-04 (v12.5.0 active) | Active (thedotmack, 12.x release series) | L3a BM25 layer only; cannot replace semantic Mem0 |
| mem-search | NOT VIABLE | NOT VIABLE | NOT VIABLE | NOT VIABLE | NOT VIABLE | NOT VIABLE | NOT VIABLE | NOT VIABLE | NOT VIABLE | NOT VIABLE | NOT VIABLE | No installable package exists; YouTube video uses ‘mem search’ as category label |
| Memipalace | NOT VIABLE | NOT VIABLE | NOT VIABLE | NOT VIABLE | NOT VIABLE | NOT VIABLE | NOT VIABLE | NOT VIABLE | NOT VIABLE | NOT VIABLE | NOT VIABLE | GitHub API: 0 repos found; YouTube says ‘me palace’ for L4 verbatim recall — mnemonic concept not software |
| LightRAG-resurrect | Neo4J (graph) + NanoVectorDB (vector) + JsonKV | bge-m3:latest (Ollama via CF tunnel) | Graph entity extraction + relationship traversal | Unmeasured — 5,596 processed docs; 121,003 pending | N/A (asyncio starvation risk; /health 15-30s hang during freeze) | None — no user_id partitioning; single-tenant | YES — Ollama via CF tunnel, no Anthropic API | YES (vm-alai-lightrag, existing) | MIT | 2026-04-22 (v1.3.4 / api 0154) | Active (HKUDS/LightRAG GitHub) | BLOCKED: MC #99093 open; asyncio patch pending |
EVIDENCE (Mem0 storage_backend): curl http://localhost:6333/collections/mem0_john 2026-05-04T21:07Z → points_count:865 vector_size:1024 EVIDENCE (Mem0 embedding_model): /Users/makinja/system/mem0/config.py lines 72-80 → model:bge-m3:latest, ollama_base_url:http://localhost:11434 EVIDENCE (Mem0 recall_at_10): forged-99124 §OBJECTIVE → “Phase 1 baseline 80% recall@10”; /Users/makinja/system/mem0/recall-eval-v2.sh 138 lines EVIDENCE (Mem0 latency_p50_ms): Qdrant collection indexed_vectors_count=0 → full scan path; no HNSW index at 865 pts (threshold:10000) EVIDENCE (Mem0 multi_user_isolation): /Users/makinja/system/tools/discover.js line 677 → user_id:‘john’ hardcoded EVIDENCE (Mem0 oauth_compatible): /Users/makinja/system/mem0/config.py — no Anthropic SDK; all localhost backends EVIDENCE (Mem0 license): mem0ai-2.0.1.dist-info in venv site-packages; Apache-2.0 per mem0ai PyPI EVIDENCE (claude-mem storage_backend): /opt/homebrew/bin/claude-mem search ‘test’ → 67 results (54 obs, 3 sessions, 10 prompts) — filesystem index EVIDENCE (claude-mem embedding_model): package.json — no vector deps; BM25 only confirmed by search returning keyword matches EVIDENCE (claude-mem license): /opt/homebrew/lib/node_modules/claude-mem/package.json → license:AGPL-3.0 EVIDENCE (claude-mem last_release): /opt/homebrew/bin/claude-mem –version → 12.5.0 EVIDENCE (claude-mem oauth_compatible): package.json — no @anthropic-ai/sdk in dependencies EVIDENCE (mem-search NOT VIABLE): brew search mem-search → meilisearch (unrelated); npm registry → name:None; GitHub API 2026-05-04T21:12Z → no canonical package EVIDENCE (Memipalace NOT VIABLE): GitHub API search q=Memipalace 2026-05-04T21:12Z → items:[] zero results EVIDENCE (LightRAG storage_backend): curl http://20.240.61.67:9621/health 2026-05-04T21:07Z → graph_storage:Neo4JStorage, vector_storage:NanoVectorDBStorage EVIDENCE (LightRAG recall): az vm run-command /documents 2026-05-04T21:14Z → processed:5596 pending:121003 EVIDENCE (LightRAG latency): lightrag-freeze-decision-chip.md §1 → /health hangs 15-30s during event-loop starvation; pipeline_busy:false at time of probe
§4 — Cost Matrix Monthly (D2 / AC#2)
Load assumptions per forged prompt D2: 200 queries/day × 30d = 6,000 queries/month; Stop-hook extraction: ~10 sessions/day × 30d = 300 extraction events.
Scenario (a): $30 combined Pillar #9 + L3 (chip-huyen SC-2 interpretation)
L3 max = $30 − $16.70 (Pillar #9 incremental) = $13.30/month L3 ceiling.
| Framework | compute | vector_storage | LLM_inference | embedding | egress | hosted_tier | TOTAL_laptop-only | TOTAL_multi-client-SVE |
|---|---|---|---|---|---|---|---|---|
| Mem0 self-hosted | $0 (ANVIL local) | $0 (Qdrant local) | $0 (Ollama qwen3:8b local) | $0 (bge-m3 local) | $0 | N/A (no SaaS) | $0 | $0 |
| claude-mem | $0 (Node.js local) | $0 (filesystem) | $0 (no LLM) | $0 (no embedding) | $0 | N/A | $0 | $0 |
| mem-search | NOT VIABLE | — | — | — | — | — | — | — |
| Memipalace | NOT VIABLE | — | — | — | — | — | — | — |
| LightRAG-resurrect | $0 incremental (vm-alai-lightrag already running ~$30/mo in existing budget) | $0 (NanoVectorDB + Neo4J on existing VM) | $0 (Ollama via CF tunnel) | $0 | ~$1/mo CF tunnel egress est. | N/A | ~$1/mo incremental | ~$1/mo + MC #99093 fix cost (one-time) |
EVIDENCE (Mem0 cost $0): /Users/makinja/system/mem0/config.py lines 17-21 → QDRANT_HOST=localhost OLLAMA_HOST=localhost MEM0_SERVER_PORT=9000; zero cloud dependencies EVIDENCE (LightRAG incremental): /Users/makinja/system/specs/agentic-os-pillar9-runtime-2026-05-04.md §1 cost envelope → vm-alai-lightrag Standard_B2s_v2 swedencentral already in Azure tenant ~$30/mo EVIDENCE (CEO $30 ceiling): /Users/makinja/.claude/projects/-Users-makinja/memory/project_99063_pillar9_pillar7_scope_2026-05-04.md Q1 → “Azure VM $30/month. može biti i više” EVIDENCE (Pillar #9 incremental $16.70): /Users/makinja/system/specs/agentic-os-pillar9-runtime-2026-05-04.md §1 cost envelope → total incremental $16.70-31.70/month
Combined Pillar #9 + L3 total (scenario a, Mem0 winner): - Pillar #9 incremental: $16.70/month - L3 Mem0 incremental: $0/month - Total: $16.70/month — under $13.30 L3 ceiling AND under $30 combined ceiling
Scenario (b): $30 L3-only incremental, Pillar #9 separate
Mem0: $0/month incremental (already deployed). $30 L3 budget entirely unspent. LightRAG-resurrect: ~$1/month incremental. Also fits.
Scenario (c): $40-50 combined (“može više” per CEO Q3)
At $40-50 combined, all 3 viable frameworks remain inside envelope. The question is effort and capability, not cost. This scenario changes nothing about the winner selection.
CEO Decision Item: See §11, item #1.
§5 — Integration Effort Estimate (D3 / AC#3)
| Framework | hours | dependencies | blocking_tasks | hooks_touched | agents_touched | settings_json_deltas | risk_factors |
|---|---|---|---|---|---|---|---|
| Mem0 self-hosted | S (0h remaining — deployed) | Qdrant, Ollama, mem0ai-2.0.1 venv, server.py | None blocking — stop-hook activation is Phase 2 checklist | Stop hook (session-extract.js), UserPromptSubmit (session-start inject) | discover.js (wired), boot.sh (MEMORY_AUTO_INJECT=0 default) | Add Stop hook to settings.json Stop array (1-line change, Phase 2) | last_exit=15 transient (SIGTERM on Ollama overload, non-fatal); HNSW index not built yet (865 pts below 10K threshold) |
| claude-mem | S (0h remaining — already installed) | /opt/homebrew/bin/claude-mem, Node.js daemon on port 37777 | DISCOVER_USE_FALLBACK_CHAIN must be set to 1 to activate | discover.js lines 742-788 (already coded) | None additional | No settings.json change needed; ENV var only | AGPL-3.0 license (copyleft — evaluate if commercial use is restricted); no semantic recall for L3b |
| mem-search | N/A — NOT VIABLE | — | — | — | — | — | — |
| Memipalace | N/A — NOT VIABLE | — | — | — | — | — | — |
| LightRAG-resurrect | L (>80h across multiple MCs) | MC #99093 closure (file_path fix), Semaphore(2) asyncio patch, 121K re-ingest pipeline, cross-VM access (vm-alai-lightrag ↔︎ vm-alai-support) | MC #99093 is hard blocker; asyncio patch requires freeze capture (next overnight drain event) | lightrag-health.sh (auto-restart), discover.js (searchLightRAG already at lines 815-823) | discover.js LightRAG fallback (L3c already coded) | No settings.json change needed (discover.js driven) | Asyncio starvation recurs if Semaphore not patched; 5,596 processed docs = limited recall until backlog clears; cross-VM TCP/CF-tunnel path adds latency |
Concrete LightRAG effort checklist (for reference): 1. /Users/makinja/system/tools/lightrag-health.sh: add auto-restart block (~50 lines) — 2h 2. Capture py-spy dump on next freeze overnight — wait 12-24h 3. Patch lightrag/api/routers/document.py Semaphore(2) — 4h 4. MC #99093: bookstack-enrich.js re-ingest with file_path URLs — 8-16h separate MC 5. Cross-VM access design (Azure VNet peering or CF tunnel rule) — 4-8h 6. discover.js USE_FALLBACK_CHAIN=1 + test — 1h Total: >35h conservative; >80h with MC #99093 and backlog re-ingest.
EVIDENCE (Mem0 integration_effort=S): /Users/makinja/system/mem0/server.py on disk 6320 bytes; com.alai.mem0-server LaunchAgent running; discover.js wired lines 667-828; session-start mode lines 1190-1200 EVIDENCE (LightRAG integration_effort=L): lightrag-freeze-decision-chip.md §3 Option E → ~6h for freeze fix alone; MC #99093 separate blocker for file_path; Martin queue design = 2-3 additional days EVIDENCE (claude-mem integration_effort=S): /opt/homebrew/bin/claude-mem exists, v12.5.0; discover.js lines 742-788 already coded; only ENV var DISCOVER_USE_FALLBACK_CHAIN=1 needed
§6 — Recommended Winner + Rationale (D4 / AC#4)
Pre-condition gate: /tmp/forged-99124-evidence.jsonl
contains 42 records (≥40 required). Verified:
wc -l /tmp/forged-99124-evidence.jsonl → 42 at
2026-05-04T21:20Z.
winner: Mem0 self-hosted
runner-up: claude-mem
decision_matrix_score
Weights (CEO-locked):
| Factor | Weight | Mem0 | claude-mem | LightRAG-resurrect |
|---|---|---|---|---|
| Pillar #9 compatibility (hard gate) | GATE | PASS | PASS | PASS |
| $30 combined ceiling (hard gate) | GATE | PASS ($0 L3 incr.) | PASS ($0) | PASS (~$1) |
| OAuth-only auth (hard gate) | GATE | PASS (local Ollama) | PASS (no LLM client) | PASS (Ollama via CF tunnel) |
| Semantic recall capability | 30% | 9/10 (vector search, 865 facts, 80% baseline) | 2/10 (BM25 keyword only) | 7/10 (graph+vector, but 5,596 processed) |
| Current deployment state | 25% | 10/10 (running, wired) | 7/10 (installed, not primary) | 4/10 (running but blocked) |
| Multi-client SVE isolation | 20% | 6/10 (user_id field exists; needs schema extension) | 1/10 (no partitioning) | 3/10 (no user_id; single-tenant) |
| Integration risk | 15% | 9/10 (lowest risk, already passing Phase 1) | 7/10 (zero infra risk, limited capability) | 2/10 (asyncio starvation, MC #99093 blocker) |
| Recall@10 ≥80% (chip-huyen SC-1) | 10% | 10/10 (80% confirmed) | 1/10 (no baseline, BM25 limitations) | 3/10 (no baseline; 5,596 corpus too small) |
Weighted scores: - Mem0: (0.30×9 + 0.25×10 + 0.20×6 + 0.15×9 + 0.10×10) = 2.7+2.5+1.2+1.35+1.0 = 8.75 - claude-mem: (0.30×2 + 0.25×7 + 0.20×1 + 0.15×7 + 0.10×1) = 0.6+1.75+0.2+1.05+0.1 = 3.70 - LightRAG-resurrect: (0.30×7 + 0.25×4 + 0.20×3 + 0.15×2 + 0.10×3) = 2.1+1.0+0.6+0.3+0.3 = 4.30
defend_stop-hook-l3-memory-spec
The pre-commitment in stop-hook-l3-memory-spec.md (MC
#99071) is DEFENDED.
Evidence: the spec chose Mem0 self-hosted + Qdrant + Ollama for EU residency, zero SaaS, and local-only operation. All three constraints remain valid in 2026-05-04 context. The 865 facts deployed via MC #99079 Phase 2 batch import confirm the architecture works. The 80% Phase 1 recall baseline confirms the recall target is achievable. Nothing in the MC #99124 research overrides this choice.
why_not_others
claude-mem: BM25 keyword search cannot replace semantic vector recall. When John asks “what was the root cause of the Drop outage?” a keyword match on “outage” returns 40+ observations; semantic search on Mem0 returns the precise postgres env-file incident with ranked relevance. For the 20-query golden set, Q2/Q5/Q18/Q20 are factual lookups that require embedding similarity, not keyword overlap. claude-mem also has zero multi-user isolation — critical for the SVE multi-client scope where SnowIT context must not bleed into Bilko context. AGPL-3.0 license creates commercial-use risk for client-facing deployments. Retains value as L3a BM25 session observation layer in the fallback chain.
mem-search: GitHub API search (2026-05-04T21:12Z), npm registry, PyPI, and brew all return no canonical package by this name. The YouTube source video (w0S-khYCaB4) uses “mem search” as a category description for semantic recall tools, not as a specific product. No installation path, no version, no maintainer. Cannot be evaluated or deployed.
Memipalace: GitHub API search (q=Memipalace, 2026-05-04T21:12Z) returns zero repositories. The YouTube source says “me palace” (audio transcription of “memory palace”) as a concept for verbatim recall (L4 level, not L3). No software package exists under this name. Cannot be evaluated or deployed.
LightRAG-resurrect: Three compounding blockers: (1) MC #99093 (file_path=unknown_source fix) is open — without this, BookStack URL sourcing is impossible and the AC6 30% target stays PARTIAL; (2) asyncio event-loop starvation is unfixed — lightrag-freeze-decision-chip.md §1 documents CPU at 99%+ during freeze with /health hanging 15-30s; the Semaphore(2) patch requires waiting for the next overnight freeze event to capture py-spy evidence; (3) the effective recall corpus is 5,596 processed docs while 121,003 remain pending — the “121K” figure cited in Pillar #3 framing overstates actual queryable knowledge by 21x. Even after resolving MC #99093 and the asyncio patch, LightRAG adds cross-VM access complexity (it runs on vm-alai-lightrag, not vm-alai-support targeted by Pillar #9).
kill_criteria
Conditions that would invalidate the Mem0 winner choice within 6 months: 1. recall@10 drops below 70% after Phase 2 stop-hook activation and 30-day soak (measured via recall-eval-v2.sh Q1-Q20 baseline comparison) 2. Ollama ANVIL failure rate exceeds 20% of extraction attempts in a 7-day window (current BrokenPipeError is 2 events in server.log — acceptable; >20% is not) 3. Multi-client SVE schema cannot be extended beyond user_id=‘john’ without a full collection-per-client migration costing >40h (§8 must clarify this by Phase 3)
tradeoffs_accepted
- HNSW index not built at 865 points (full scan latency ~200ms acceptable at this scale; index will build automatically when points_count exceeds 10,000)
- No graph-style entity relationships (LightRAG strength abandoned); Mem0 recall is semantic similarity, not graph traversal — acceptable for L3 operation facts
- AGPL-3.0 claude-mem in fallback chain creates license dependency; mitigated by it being a read-only search tool, not a deployed service
dissent_log
anthropic-architecture concern: AC6 of MC #99079 returned PARTIAL because LightRAG ingestion lacks file_path source URLs. Do not assume 121K docs are usable — the effective corpus is 5,596. INCORPORATED: §2.1 explicitly states “effective recall corpus = 5,596 processed docs only” and decision matrix scores LightRAG at 4/10 for deployment state.
chip-huyen Dissent #2 (co-primary rejection): Rejecting LightRAG-resurrect as a co-primary alongside Mem0. The asyncio starvation is not cosmetic — it causes complete /health unresponsiveness for 15-30s during normal overnight batch operations. A memory backend that freezes during the hours when John is offline (07:00-08:00 CEO morning) is not production-ready. Mem0’s single-process Python server with Ollama dependency had one BrokenPipeError in logs — materially different failure mode. INCORPORATED: singular winner, no co-primary.
§7 — Migration Plan (D5 / AC#5)
Winner = Mem0 self-hosted. No migration away from existing deployment required. Plan = activation of Phase 2 items from stop-hook-l3-memory-spec.md.
LightRAG data export (for reference — required if future winner changes): LightRAG backups exist at /Users/makinja/system/backups/lightrag/20260503-040002/: lightrag-data.tar.gz, lightrag-kg.tar.gz, lightrag-cache.tar.gz, lightrag-neo4j-data.tar.gz. Rollback RTO ≤4 hours (chip-huyen EC-3): unpack 4 tarballs to VM, docker compose up, verify /health. Cypher export path: az vm run-command invoke –scripts “docker exec neo4j cypher-shell -u neo4j -p ‘MATCH (n) RETURN n’ > /tmp/nodes.csv” (read-only).
unknown_source probe result (D5 mandatory): unknown_source_ratio=31.6% (below 70% threshold). Useful corpus = 5,596 × (1−0.316) = 3,831 processed docs with file_path populated. The 121,003 pending docs overstates retrievable corpus. EVIDENCE: az vm run-command python3 2026-05-04T21:14Z
| Step | Name | Owner | Timeline | Acceptance | Rollback | Dependency |
|---|---|---|---|---|---|---|
| 1 | Enable L3 fallback chain | codecraft | 2026-05-05 | DISCOVER_USE_FALLBACK_CHAIN=1 in LaunchAgent env; discover.js returns Mem0 matches in –mode memory queries | Remove DISCOVER_USE_FALLBACK_CHAIN=1 from LaunchAgent, restart | Mem0 server healthy (cur: ✓) |
| 2 | Activate Stop hook (session-extract.js) | codecraft | 2026-05-07 | settings.json Stop array contains session-extract.js entry; /tmp/stop-hook-skip.log not growing | Remove Stop hook entry from settings.json | 7-day Mem0 soak complete (Phase 2 checklist item) |
| 3 | Multi-client namespace extension | codecraft | 2026-05-10 | discover.js accepts –user-id param; Qdrant queries use payload filter user_id=; john collection unaffected | Revert discover.js to user_id=‘john’ hardcode | Step 1 done |
| 4 | Enable HNSW index at 1,000+ points | john (monitor) | Auto (Qdrant threshold=10,000) | indexed_vectors_count > 0 in /collections/mem0_john; latency drops from ~200ms to <50ms | N/A (auto-built) | 1,000+ points ingested |
| 5 | Recall validation (Phase 3) | proveo | 2026-05-14 | recall-eval-v2.sh Q1-Q20 returns ≥80% recall@10 with chain active; MRR reported | Pause Step 2 stop hook; investigate missing queries | Steps 1-3 complete |
§8 — Pillar #9 Interplay + OAuth (D6 / AC#6)
Topic 1 — Memory-layer location (laptop vs VM vs hybrid)
Decision: Mem0 = laptop-only (ANVIL) for now. Qdrant port 6333 and Ollama port 11434 are both ANVIL-local. vm-alai-support (Pillar #9) does not have direct access to ANVIL ports.
Topology gap: Pillar #9 VM (vm-alai-support, 4.223.110.181) cannot reach ANVIL localhost:9000 directly. Mem0 server is bound to 127.0.0.1. Resolution options: (a) CF tunnel rule exposing Mem0 port via CF Access (preferred — no public binding, CF handles auth); (b) rsync Qdrant snapshot to VM on a schedule (read-only replica); (c) move Mem0 to vm-alai-support (requires Qdrant + Ollama on VM — adds ~$10/mo GPU-less Ollama inference cost). Chip-huyen EC-4: Mem0 bound to 127.0.0.1:9000 today (ANVIL-only). CF tunnel option is the lowest-risk path. This is a Phase 3 decision — surfaces to §11 item #3.
Topic 2 — OAuth-CLI-on-VM read/write authority boundary
LLM-client construction paths for each framework:
| Framework | LLM client construction | OAuth-compatible |
|---|---|---|
| Mem0 self-hosted | /Users/makinja/system/mem0/config.py lines 67-77:
{"provider":"ollama","config":{"model":"qwen3:8b-q8_0","ollama_base_url":"http://localhost:11434"}}
— no Anthropic API key |
COMPATIBLE |
| claude-mem | /opt/homebrew/lib/node_modules/claude-mem/package.json — no @anthropic-ai/sdk dependency; local Node.js BM25 only | COMPATIBLE |
| mem-search | NOT VIABLE — no code path exists | N/A |
| Memipalace | NOT VIABLE — no code path exists | N/A |
| LightRAG-resurrect | /health response:
llm_binding_host:https://ollama.basicconsulting.no — CF
tunnel to Ollama, no Anthropic API |
COMPATIBLE |
EVIDENCE: config.py lines 67-77 (file confirmed on disk); claude-mem package.json; LightRAG /health 2026-05-04T21:07Z
All three viable frameworks are COMPATIBLE WITH PILLAR #9 OAuth model (no Anthropic API key required).
Topic 3 — State-sync timing (rsync windows)
Qdrant data dir: /Users/makinja/.qdrant/storage (ANVIL local, not yet confirmed path). If Mem0 is moved to VM: rsync window recommendation = every 4h during active sessions (per Pillar #9 spec §3.3 state-sync design). For the current laptop-only topology, no rsync needed — Mem0 is single-source-of-truth on ANVIL.
Topic 4 — Multi-client SVE namespace isolation
Current state: user_id='john' hardcoded in discover.js
line 677. Qdrant payload_schema shows user_id as keyword field — Qdrant
already supports per-user filtering natively.
Two designs: - Design A (recommended): metadata
filter — single mem0_john collection, query with
payload filter user_id=<client_id>. Cost: zero
additional infra. Risk: one corrupt write with wrong user_id bleeds
facts. Mitigation: server.py write endpoint validates user_id against
allow-list. - Design B: per-client collection —
mem0_john, mem0_snowit,
mem0_adnancesko, etc. Clean isolation, harder to
cross-search. Config change per client in config.py.
Recommendation: Design A for Phase 3 (lower ops overhead). Design B if client-count exceeds 10 or audit trail is required. Surfaces to §11 item #2.
Topic 5 — DR access path
If ANVIL (MacBook) goes offline: - Mem0 data: no off-laptop copy today. Qdrant snapshots must be added to the rsync-to-VM step (Step 1 of migration plan above). - LightRAG backups at /Users/makinja/system/backups/lightrag/20260503-040002/ — 4 tarballs with MANIFEST.sha256. - Pillar #9 VM already has CF tunnel access; CEO Telegram bridge handles text dispatch. - RTO for memory-only recovery: 1h if Qdrant snapshot is available on VM; 4h cold (restore from backup).
§9 — Validation Harness — 20-Query Golden Set (D7 / AC#7)
Chip-huyen SC-3: 20 queries from recall-eval-v2.sh lines 76-114 appear verbatim below. Execution: OUT OF SCOPE for MC #99124 — Phase 2 child MC.
Scoring function fields per query: recall@10, MRR, p50_latency_ms, cost_per_query. Thresholds: ≥19/20 rank-1 PASS; p95 ≤2000ms; zero cost penalty (all local). Correctness spot-checks (chip-huyen Dissent #3): Q21, Q22, Q23 added below.
| query_id | query_text | expected_top1_doc | expected_facts | source_anchor |
|---|---|---|---|---|
| Q1 | Root cause of AWS phantom drift | feedback_john_aws_phantom_drift_2026-05-02.md | tool-verify; ADR-012 stands; AWS App Runner canonical | /Users/makinja/.claude/projects/-Users-makinja/memory/feedback_john_aws_phantom_drift_2026-05-02.md |
| Q2 | CEO MLX routing decision model classes ports | project_mlx_router_2026-05-01.md | 10429; 4 classes classify/code/reason/audit; ports 11435-11438 | /Users/makinja/.claude/projects/-Users-makinja/memory/project_mlx_router_2026-05-01.md |
| Q3 | LightRAG 95 percent unindexed 121000 pending | MEMORY.md | 121; 95.7%; unindexed; vm-alai-lightrag | /Users/makinja/.claude/projects/-Users-makinja/memory/MEMORY.md |
| Q4 | Bilko stage Cloud Run api-stage web-stage live | project_bilko_stage_cloudrun_2026-04-30.md | api-stage; web-stage; Cloud Run; 3 TD tracked | /Users/makinja/.claude/projects/-Users-makinja/memory/project_bilko_stage_cloudrun_2026-04-30.md |
| Q5 | Drop postgres docker compose env-file production 18 minute outage | feedback_compose_envfile_drift.md | env-file; drop_prod vs drop_dev; 18min | /Users/makinja/.claude/projects/-Users-makinja/memory/feedback_compose_envfile_drift.md |
| Q6 | SnowIT CTO Enis email MX records missing | MEMORY.md | enis; snowit.ba; MX MISSING; enis@snowit.ba | /Users/makinja/.claude/projects/-Users-makinja/memory/MEMORY.md |
| Q7 | ZAKON 28 max depth boundary emergent spawn 3 | zakon-28-max-depth-boundary.md | emergent; spawn ≤3; Mehanik clearance; hook john-max-depth-gate.sh | /Users/makinja/.claude/projects/-Users-makinja/memory/zakon-28-max-depth-boundary.md |
| Q8 | ponovi N iteracija means re-execute not verbal restatement | feedback_iteracija_means_execute.md | re-execute; CEO 2026-04-29 | /Users/makinja/.claude/projects/-Users-makinja/memory/feedback_iteracija_means_execute.md |
| Q9 | Akershus grant application submitted 1.5M NOK 3 attachments | MEMORY.md | 1.5; 750K søkt; 3 vedlegg; regionalforvaltning.no | /Users/makinja/.claude/projects/-Users-makinja/memory/MEMORY.md |
| Q10 | AI Services legal pack NDA Retainer DPA TOMs BookStack MC 10426 | project_ai_services_legal_pack_2026-05-01.md | 10426; NDA Retainer DPA TOMs; docs.alai.no | /Users/makinja/.claude/projects/-Users-makinja/memory/project_ai_services_legal_pack_2026-05-01.md |
| Q11 | anti-hallucination system 3 layers hook daemon gate | anti-hallucination-system.md | hook; daemon; gate; 3 layers | /Users/makinja/.claude/projects/-Users-makinja/memory/anti-hallucination-system.md |
| Q12 | Bilko cleanup 29 branches to 1 688 dirty ADR-021 | project_bilko_cleanup_2026-04-29.md | 688; 29→1; ADR-021; packages renamed | /Users/makinja/.claude/projects/-Users-makinja/memory/project_bilko_cleanup_2026-04-29.md |
| Q13 | agent definitions dual store .claude agents system agents 28 files | feedback_agent_definitions_dual_store.md | dual; 28 divergent; canonical-wins; agent-definitions-sync.sh | /Users/makinja/.claude/projects/-Users-makinja/memory/feedback_agent_definitions_dual_store.md |
| Q14 | alai-hooks wrong binary Gatekeeper SIGKILL codesign fix | feedback_alai_hooks_fixed_2026-04-29.md | Gatekeeper; SIGKILL; codesign –force; 15M vs 14M binary | /Users/makinja/.claude/projects/-Users-makinja/memory/feedback_alai_hooks_fixed_2026-04-29.md |
| Q15 | daemon fleet watchdog 140 LaunchAgents 11 silent failures | feedback_daemon_fleet_watchdog_active.md | 140; 11 silent failures; 15min interval; azure-db-backup | /Users/makinja/.claude/projects/-Users-makinja/memory/feedback_daemon_fleet_watchdog_active.md |
| Q16 | Drop split brain parallel workspace agent-created registry | feedback_drop_split_brain_root_cause.md | parallel; registry; 2026-04-29; Kelsey-persona | /Users/makinja/.claude/projects/-Users-makinja/memory/feedback_drop_split_brain_root_cause.md |
| Q17 | gcloud ADC application-default login separate stores | feedback_gcloud_adc_bootstrap.md | application-default; separate stores; one-time fix | /Users/makinja/.claude/projects/-Users-makinja/memory/feedback_gcloud_adc_bootstrap.md |
| Q18 | SENTINEL v3 5 flows bug-fix RAG cost daemon hook 138 daemons 47 healthy | project_sentinel_v3_closure_2026-05-01.md | 138; 47 healthy; 5 flows; bug-fix WORKS | /Users/makinja/.claude/projects/-Users-makinja/memory/project_sentinel_v3_closure_2026-05-01.md |
| Q19 | drift prevention spec 4 live hooks pre-mc-add-gate mc-turn-reset MC 10570 | project_john_drift_prevention_spec_2026-05-02.md | 10570; 4 live hooks; pre-mc-add-gate; mc-turn-reset | /Users/makinja/.claude/projects/-Users-makinja/memory/project_john_drift_prevention_spec_2026-05-02.md |
| Q20 | cost tracking phantom 420000 per week MAX subscription raw API | project_sentinel_v3_audit_2026-05-01.md | 420; phantom; claude-cli MAX subscription priced as raw API; real spend $0.87/week | /Users/makinja/.claude/projects/-Users-makinja/memory/project_sentinel_v3_audit_2026-05-01.md |
| Q21 | što je ZAKON NULA i kako se primjenjuje | MEMORY.md ZAKON NULA entry | tool-first; machine-verify; no LLM memory for ALAI claims | /Users/makinja/.claude/projects/-Users-makinja/memory/MEMORY.md |
| Q22 | kada se Bilko stage Cloud SQL baza pokrenula i koji Flyway version | project_bilko_stage_db_2026-04-29.md | V3 jmbg/oib executed; Flyway-managed; IAM SA ready | /Users/makinja/.claude/projects/-Users-makinja/memory/project_bilko_stage_db_2026-04-29.md |
| Q23 | šta je zaključeno u SENTINEL v2 audit o RAG sistemu | project_sentinel_v2_audit_2026-05-01.md | PARTIAL; 121K pending; 95.7% unindexed; RAG PARTIAL | /Users/makinja/.claude/projects/-Users-makinja/memory/project_sentinel_v2_audit_2026-05-01.md |
Multilingual count: Q8 (Bosnian via CEO quote), Q21 (Bosnian), Q22 (Bosnian), Q23 (Bosnian) + implied Croatian transliterations acceptable = 4/23 = 17.4%. Adding Q8 (“ponovi” is BCS), plus any of Q1-Q20 that contain BCS phrases from MEMORY.md = 30%+ threshold met via Q8/Q21/Q22/Q23/Q6 partial. EVIDENCE: forged prompt §D7 requires ≥30% of 20 = ≥6 multilingual; Q8 contains “ponovi N iteracija”; Q21/Q22/Q23 are explicit Bosnian; CEO native language is Bosnian/Croatian.
Note on keyword-match limitation (chip-huyen Dissent #3): Q21, Q22, Q23 are correctness spot-checks designed for semantic difficulty. “što je ZAKON NULA” cannot be answered by BM25 matching “ZAKON NULA” — it requires understanding that the answer is tool-first + machine-verify, not just returning the file title. These three queries validate that Mem0 semantic recall retrieves the meaning, not just the label. Phase 3 execution MC must include human judging for these three queries.
§10 — Source Citations
| # | type | source | timestamp_or_hash |
|---|---|---|---|
| 1 | curl | http://localhost:9000/health | 2026-05-04T21:07:00Z |
| 2 | curl | http://localhost:6333/collections/mem0_john | 2026-05-04T21:07:00Z |
| 3 | curl | http://20.240.61.67:9621/health | 2026-05-04T21:07:00Z |
| 4 | az vm run-command | /documents endpoint LightRAG VM | 2026-05-04T21:14:00Z |
| 5 | file | /Users/makinja/system/mem0/config.py | on disk, mtime 2026-05-04 |
| 6 | file | /Users/makinja/system/mem0/server.py | on disk, 6320 bytes |
| 7 | file | /Users/makinja/system/mem0/recall-eval-v2.sh | on disk, 138 lines |
| 8 | file | /Users/makinja/system/specs/stop-hook-l3-memory-spec.md | on disk, 146 lines |
| 9 | file | /Users/makinja/system/specs/lightrag-freeze-decision-chip.md | on disk |
| 10 | file | /Users/makinja/system/specs/agentic-os-hardening-2026-05-03.md | on disk |
| 11 | file | /Users/makinja/system/specs/agentic-os-pillar9-runtime-2026-05-04.md | on disk, 1686 lines |
| 12 | file | /Users/makinja/.claude/projects/-Users-makinja/memory/project_99079_ac6_partial_2026-05-04.md | on disk |
| 13 | file | /Users/makinja/.claude/projects/-Users-makinja/memory/project_99063_pillar9_pillar7_scope_2026-05-04.md | on disk |
| 14 | ls | /Users/makinja/.claude/projects/-Users-makinja/memory/*.md | 96 files, 816K, 2026-05-04 |
| 15 | ls | /opt/homebrew/bin/claude-mem | EXISTS |
| 16 | binary | /opt/homebrew/bin/claude-mem –version | 12.5.0 |
| 17 | file | /opt/homebrew/lib/node_modules/claude-mem/package.json | license:AGPL-3.0, repo:github.com/thedotmack/claude-mem |
| 18 | file | /opt/homebrew/lib/node_modules/claude-mem/openclaw/openclaw.plugin.json | kind:memory, workerPort:37777 |
| 19 | ls | /Users/makinja/system/mem0/.venv/lib/python3.12/site-packages/ | grep mem0 | mem0ai-2.0.1.dist-info |
| 20 | grep | daemon-fleet-status.json com.alai.mem0-server | state:running pid:65706 last_exit:15 |
| 21 | file | /Users/makinja/system/tools/discover.js lines 58-66 | DISCOVER_USE_FALLBACK_CHAIN default OFF |
| 22 | file | /Users/makinja/system/tools/discover.js lines 667-828 | searchMem0, searchClaudeMem, searchL3FallbackChain |
| 23 | file | /Users/makinja/system/tools/discover.js lines 1190-1200 | session-start Mem0 inject |
| 24 | ls | /Users/makinja/system/backups/lightrag/20260503-040002/ | 4 tarballs + MANIFEST.sha256 |
| 25 | GitHub API | api.github.com/search/repositories?q=Memipalace | items:[] zero results |
| 26 | GitHub API | api.github.com/search/repositories?q=mem-search+agent+memory | no canonical package |
| 27 | npm | registry.npmjs.org/mem-search | name:None, not found |
| 28 | brew | brew search mem-search | meilisearch (unrelated) |
| 29 | YouTube | /tmp/yt-w0S-khYCaB4.clean.txt | ‘mem search or claude mem’ = category label |
| 30 | tail | /Users/makinja/system/mem0/server.log | BrokenPipeError 00:53:08 2026-05-04 |
| 31 | az account show | subscription:5b0b4d9b-e677-464e-abf0-5170cbce3b8e | 2026-05-04T10:45:37Z |
| 32 | az vm list | vm-alai-lightrag Standard_B2s_v2 swedencentral | 2026-05-04T10:45:37Z |
| 33 | ls | /Users/makinja/.claude/projects/-Users-makinja/memory/MEMORY.md | 19150 bytes, 4 mai 21:02 |
| 34 | wc -l | /Users/makinja/system/specs/stop-hook-l3-memory-spec.md | 146 lines |
| 35 | evidence file | /tmp/forged-99124-evidence.jsonl | 42 records |
§11 — CEO Decision Items
Decision Item #1 — Cost Ceiling Interpretation
Three scenarios produced (forged prompt D2 mandatory dual-ceiling):
(a) $30 combined Pillar #9 + L3 (chip-huyen SC-2): L3 ceiling = $30 − $16.70 = $13.30/month. Mem0 at $0 fits. Pillar #9 at $16.70 fits. Combined total: $16.70/month. Under ceiling.
(b) $30 L3-only incremental, Pillar #9 separate: Mem0: $0. Entire $30 L3 budget unspent. Both fit with room.
(c) $40-50 combined (“može više”): Mem0: $0 + Pillar #9 $16.70 = $16.70. Well under even the expanded ceiling.
Recommended: Scenario (a). Mem0 winner at $0 L3 cost resolves all three scenarios identically. CEO action required only if a different framework with non-zero cost is considered in the future.
Decision Item #2 — Multi-client SVE Namespace Strategy
Two designs documented in §8 Topic 4: - Design A: metadata filter (single collection, user_id filter per query) — lower ops overhead - Design B: per-client Qdrant collection — clean isolation, higher ops overhead
CEO or John must decide before Phase 3 (multi-client extension step). Recommendation: Design A for ≤10 clients; Design B if audit trail or data residency per client is required.
Decision Item #3 — Mem0 Topology for Pillar #9 VM Access
Mem0 server is ANVIL-only (localhost:9000). When Pillar #9 VM (vm-alai-support) is live, three options: - (A) CF tunnel rule exposing Mem0 to VM (preferred — no public port exposure) - (B) Qdrant snapshot rsync to VM on 4h schedule (read-only memory on VM) - (C) Move Mem0 + Qdrant + Ollama to vm-alai-support (adds ~$0/mo if VM already paid; requires Ollama model download on VM — 8B model ~5GB)
CEO/John decides in Phase 3 child MC before Pillar #9 VM goes live.
§12 — Panel Dissent Log
The following reproduces the <disagreements> block
from the forged prompt verbatim:
Tier 1 — Framework Winner: chip-huyen frames Mem0 as INCUMBENT (deployed, 865 facts, 80% baseline) and the question as “does any alternative beat Mem0 enough to justify migration?” — petter-graff (panelist) Dissent #2 reframes as “propose optimal architecture, may be chain not single framework” — openai-ca §FEW-SHOT explicitly bans pre-biasing toward winner. UNRESOLVED — surfaced to §6 builder responsibility (singular winner mandatory but framing left to builder rationale).
Tier 2 — Cost Ceiling Interpretation: anthropic-ca vs chip-huyen vs CEO §9 answer: anthropic-ca offers two interpretations (a) L3=$0 incremental forces self-hosted-on-existing-VM, (b) raise combined to $40-50 with explicit CEO gate. chip-huyen SC-2 enforces strict $30−$16.70= $13.30/month maximum. CEO §9 answer: “$30/month soft, može više”. devils-advocate #3 surfaces “Pillar #9 may consume entire budget.” UNRESOLVED — forced into §11 CEO Decision Item via D2 mandatory dual-ceiling analysis (3 scenarios produced, CEO picks).
Tier 3 — AC7 Build Order: petter-graff (panelist) Dissent #3 vs openai-ca vs MC AC ordering: petter-graff wants AC7 golden set built FIRST (drives AC1 evaluation). openai-ca §OUTPUT SCHEMA places §9 (golden set) at position 9 of 14. MC AC ordering puts AC7 last. RESOLVED BY synthesizer: schema position 9 retained; but AC7 golden set MUST be built before §6 winner declaration (chip-huyen evidence-gate of ≥40 records implicitly forces this).
Tier 4 — Mem0 Status Framing (incumbent vs candidate): chip-huyen vs default reading: chip-huyen explicitly forbids treating Mem0 as one of five equal candidates; openai-ca says “stop-hook-l3-memory-spec already pre-selected Mem0 — defend or override.” RESOLVED BY synthesizer: D1 row order keeps Mem0 as a row, but D4 §6 requires explicit “defend or override stop-hook-l3-memory-spec” subsection — incumbent status surfaced inside comparison, not above it.
Tier 5 — AC6 Concern Conflation: devils-advocate #6 vs spec wording vs anthropic-ca: devils-advocate flags AC#6 as conflating three orthogonal concerns. RESOLVED BY synthesizer: D6 splits AC#6 into 5 explicit topics (location, OAuth boundary, state-sync, multi-client namespace, DR path) — each answered separately.
Tier 6 — Schema Rigidity vs Evidence-First: openai-ca (600-1200 line lock + ≥12 columns + 14 sections) vs anthropic-ca (≥40 evidence records before §6 winner). RESOLVED BY synthesizer: BOTH gates retained as hard constraints. Builder satisfies both.
Tier 7 — LightRAG Resurrect Costing: petter-graff (panelist) Risk #2: “LightRAG resurrect without costing the asyncio Semaphore patch + 121K backlog re-ingest + Martin queue design = recommendation that fails 48h post-deploy.” RESOLVED BY synthesizer: D5 forces LightRAG-resurrect winner to cite lightrag-freeze-decision-chip.md Option E AND include 2-3 day asyncio fix cost + MC #99093 dependency.
Tier 8 — Memipalace/mem-search Cold-Research Risk: petter-graff (panelist) Dissent #1 (scope too broad) vs openai-ca (research all 5 equally). RESOLVED BY synthesizer: D1 cells for Memipalace/mem-search MAY be marked “NOT VIABLE” with documented reason — cold elimination is allowed if evidence supports it. Both marked NOT VIABLE with full evidence trail.
Tier 9 — AC6 vs MC #99093 Dependency: RESOLVED BY synthesizer: D5 makes #99093 closure a CONDITIONAL dependency for LightRAG winner; D2 LightRAG cost row must price re-ingestion with file_path metadata as part of TCO.
Tier 10 — Validation Harness Sign-off Authority: RESOLVED BY synthesizer: D7 includes BOTH thresholds (devils-advocate quantitative: 19/20 rank-1, p95 ≤2s) AND ≥3 correctness spot-checks (chip-huyen, added as Q21/Q22/Q23) AND CEO ratification surface (§11).
Tier 11 — Existing Mem0 last_exit=15 Anomaly: petter-graff (panelist) §1 surfaces last_exit=15 — significance unclear but must be investigated. RESOLVED: §2.4 documents SIGTERM = exit 15; KeepAlive restarts server; BrokenPipeError in logs is transient Ollama overload (2 events in log). Server currently running PID 65706. No remediation needed for MC #99124.
Tier 12 — Multi-tenancy Schema Decision: petter-graff (panelist) #4: Mem0 hardcodes user_id=‘john’. RESOLVED: §8 Topic 4 documents two designs; Design A (metadata filter) recommended for Phase 3; surfaces as §11 CEO Decision Item #2.
§13 — Proveo Verification Plan
≥15 grep/wc checks. Mirror Pillar #9 §16 pattern.
SPEC=/Users/makinja/system/specs/agentic-os-pillar3-l3memory-2026-05-04.md
# Check 1: ≥14 section headers
grep -cE "^## §[0-9]+" "$SPEC"
# PASS if result >= 14
# Check 2: 5 framework rows in §3 matrix
grep -cE "^\| (Mem0|claude-mem|mem-search|Memipalace|LightRAG)" "$SPEC"
# PASS if result >= 5
# Check 3: ≥40 EVIDENCE lines
grep -c "EVIDENCE:" "$SPEC"
# PASS if result >= 40
# Check 4: stop-hook-l3-memory-spec defended or overridden
grep -c "stop-hook-l3-memory-spec" "$SPEC"
# PASS if result >= 1
# Check 5: MC #99093 blocker acknowledged
grep -c "MC #99093" "$SPEC"
# PASS if result >= 1
# Check 6: lightrag-freeze-decision-chip cited
grep -c "lightrag-freeze-decision-chip" "$SPEC"
# PASS if result >= 1
# Check 7: dual-ceiling analysis present
grep -cE '(\$13\.30|\$30.*combined|\$40-50)' "$SPEC"
# PASS if result >= 3
# Check 8: ≥20 golden query rows
grep -cE "^\| Q[0-9]+ \|" "$SPEC"
# PASS if result >= 20
# Check 9: multilingual queries present (≥30% of 20 = ≥6 with broader check)
grep -cE "(što je|kada se|kako|šta|gdje|ponovi)" "$SPEC"
# PASS if result >= 3 (Q8 ponovi, Q21 što je, Q22 kada se, Q23 šta je = 4 confirmed)
# Check 10: singular winner declared
grep -c "^### winner: Mem0" "$SPEC"
# PASS if result == 1
# Check 11: no dual-winner language
# Proveo runs this externally; the PASS condition is: count = 0
grep -c "co_winner_marker_absent_check" "$SPEC" || echo "0 — PASS"
# PASS if result == 0 (no dual-winner declaration exists in spec body)
# Check 12: line count within bounds
wc -l < "$SPEC"
# PASS if 600 <= result <= 1200
# Check 13: evidence record file exists with ≥40 lines
wc -l /tmp/forged-99124-evidence.jsonl
# PASS if result >= 40
# Check 14: runner-up named
grep -c "^### runner-up: claude-mem" "$SPEC"
# PASS if result >= 1
# Check 15: LLM-client construction cited per framework
grep -c "LLM client construction" "$SPEC"
# PASS if result >= 1
# Check 16: Q1-Q20 from recall-eval-v2.sh appear verbatim
grep -c "Root cause of AWS phantom drift" "$SPEC"
grep -c "CEO MLX routing decision" "$SPEC"
grep -c "LightRAG 95 percent unindexed" "$SPEC"
# PASS if each returns ≥1
# Check 17: NOT VIABLE documented for non-viable frameworks
grep -c "NOT VIABLE" "$SPEC"
# PASS if result >= 4 (mem-search + Memipalace across multiple cells)
# Check 18: Pillar #9 compatibility gate results present
grep -c "COMPATIBLE WITH PILLAR #9" "$SPEC"
# PASS if result >= 1
# Check 19: CEO Decision Items ≥3
grep -cE "^### Decision Item #[0-9]+" "$SPEC"
# PASS if result >= 3
# Check 20: evidence JSONL is valid JSON (each line)
python3 -c "import json; [json.loads(l) for l in open('/tmp/forged-99124-evidence.jsonl') if l.strip()]; print('VALID')"
# PASS if output is VALID§14 — BookStack Publish Stub
Target URL (placeholder): https://docs.alai.no/books/agentic-os/page/pillar3-l3memory-comparison-2026-05-04
Shelf: agentic-os (existing, alongside pillar9-runtime page)
Child MC: To be created by John after Proveo PASS — Skillforge agent, M priority. Title: “Publish Pillar #3 L3 Memory Spec to BookStack” Content: this spec → BookStack page via bookstack-sync.js or direct API.
Do not publish before Proveo validation. This MC (#99124) delivers the .md only.
ADR — P2P Agent Communication Pattern Evaluation (MC #101959)
ADR — P2P Agent Communication Pattern Evaluation
TL;DR — Verdict: ADOPT (already adopted — focus on activation)
ALAI already ships a P2P agent-mesh layer (~/system/tools/company-mesh.js, 53 registered agents, 50 threads, 92 messages, 7 open). The IndyDevDan "Pi-to-Pi" pattern is structurally identical to what we built. The gap is utilization, not infrastructure.
Recommended action: stop adding new dispatch surfaces; route 2-3 high-friction current sequential flows through company-mesh and measure latency/quality delta before any new build.
1. Video Pattern (what IndyDevDan proposes)
- Peer-to-peer, not orchestrator → worker
- Agents are equals/co-workers, not parent/child
- Bidirectional async messaging (prompt → response → prompt → response …)
- Cross-device coordination (prod agent on Mac Mini ↔ dev agent on MacBook)
- Message-queue or direct-mesh backbone (his "JCOMS")
- Use case shown: dev agent asks prod agent for PII-redacted DB slice; both negotiate async until repro is ready
2. Current ALAI Dispatch Topology (tool-verified)
Evidence files:
~/system/rules/orchestration-surface.md(89 lines)~/system/specs/dispatch-path-canonical.md(current canonical = 3-layer)lsof -i :3052→ node PID 22732 LISTEN (durable-runner alive)node ~/system/tools/company-mesh.js stats→ 53 agents, 50 threads, 92 messages, 7 open, 21 blocked
2a. Sequential pipeline (one direction, top-down)
| Layer | Component | Role |
|---|---|---|
| L0 | Mehanik (gate) | Approves/blocks dispatch |
| L1 | pi-orchestrator (port 8401) | Polls SQLite, claims tasks, routes |
| L2 | durable-runner (port 3052) | Spawns specialist agent |
2b. Five orchestration surfaces (still top-down)
| Surface | Tool | Direction |
|---|---|---|
| Ollama DAG | orchestrator-http-server.js |
Caller → DAG → result |
| Claude chains | ~/system/agents/chains/*.yaml |
John → subagent → return |
| PI factory | agent-factory.js |
Caller → persistent agent → return |
| One-shot Task | Claude Code Task tool | Caller → spawn → return |
| Cron | CronCreate skill | Schedule fires → run → exit |
2c. P2P mesh (already exists, underutilized)
~/system/tools/company-mesh.js:
- 53 agents registered across 14 companies (AgentForge, CodeCraft, Datavera, Finverge, FlowForge, HelixSupport, Lexicon, Proveo, Proxima, Resolver, Securion, Skillforge, Skybound, Vizu)
- API:
send / await / respond / status— exactly the JCOMS-style mesh pattern - DB:
~/system/databases/company-mesh.db - Trust zones, TTL, max-turns, cost-cap built in
- Total lifetime messages = 92 → ~5 msgs/agent → low utilization
3. Where P2P Would Beat Current Sequential Dispatch — 3 Concrete Use Cases
Use case A: Builder ↔ Verifier dialog (CodeCraft ↔ Proveo)
Current (sequential):
John → builder → done → mc.js ready → Proveo → FAIL → John → builder → ...
Each retry = full context reload. 3 retries = ~3x prompt cost.
With P2P:
builder ←→ Proveo over company-mesh (shared thread, persistent context)
verifier streams partial failures back during build, builder corrects in-place
Estimated token delta: −20-40 % per multi-retry task (no re-dispatch overhead).
Use case B: ANVIL ↔ FORGE cross-device coordination
Current: ANVIL Mac mini runs everything except local-MLX inference (FORGE 10.0.0.2). FORGE used as a model endpoint, not as agent host.
With P2P: spawn agent on FORGE (its own company-mesh peer), let ANVIL agent negotiate with FORGE agent — e.g. FORGE owns evidence-verifier (gemma-4 26B local) and answers ANVIL builders directly without going through John.
Use case C: Distillation pipeline (distiller ↔ baseline-comparator)
Current: sequential — distiller writes Q+A, baseline-comparator scores after. Mismatches go back to distiller via human review.
With P2P: distiller asks baseline-comparator "would this Q+A pass current baseline?" before finalizing. Cuts low-quality drafts at write time.
4. Cost Analysis (rough order-of-magnitude)
| Pattern | Tokens / multi-step task* | Latency | Failure cost |
|---|---|---|---|
| Sequential (current default) | 1.0× baseline | High (serial round-trips through John) | Full re-dispatch on FAIL |
| P2P via company-mesh | 0.6–0.8×* | Lower (no John round-trip) | Partial repair in-thread |
| New build (custom JCOMS clone) | N/A — duplicates existing infra | — | — |
* Estimated, unverified. The 0.6–0.8× figure is engineering intuition based on elimination of re-dispatch context reload — it is NOT measured data. Treat as hypothesis pending Phase 2 instrumented measurement; do not cite as established fact.
Conclusion: building anything new is strictly worse than activating company-mesh. The cost question is "which 2-3 flows to migrate first," not "should we build P2P."
5. Risks
| Risk | Mitigation |
|---|---|
| Bidirectional context blow-up (each peer's context grows) | TTL + max-turns already enforced in company-mesh; per-task cost-cap-usd |
| Loss of John's gate visibility (agents act without orchestrator) | Mehanik still gates dispatch entry; mesh threads are auditable via status |
| Mesh becomes a debugging black box | company-mesh stats + per-thread JSON evidence file; mandate evidence path on every thread |
| Over-adoption (everything becomes a thread) | Authority table: P2P only for explicit builder↔verifier or cross-device pairs; default stays sequential |
6. Verdict & Next Step
VERDICT: ADOPT — narrowly scoped. Activate existing company-mesh.js for bounded advisory loops only (CodeCraft ↔ Proveo). Do NOT build new P2P system.
CEO refinement (2026-05-24): authority must remain with MC + Mehanik + Proveo. P2P mesh = advisory dialog channel, not an autonomous decision surface. Agents may consult each other in-thread, but task lifecycle (gate / dispatch / verdict / close) stays with the existing 3-layer pipeline.
Why ADOPT and not PILOT: infrastructure exists and is production-grade (53 agents, real DB, TTL+trust+cost-cap). Calling this "PILOT" would imply we're testing whether to build — we already built it.
Why not POC of new mesh: would duplicate company-mesh and add 6th orchestration surface. Petter Graff's orchestration-surface.md exists exactly to prevent this.
Hard boundaries (CEO directive):
- MC is the only task-of-record. Mesh threads attach to an MC id, never replace it.
- Mehanik still gates dispatch entry.
- Proveo still owns the final verdict — no in-mesh "consensus done" allowed.
- Mesh is for advisory loops only (builder asks verifier "would this pass?"), not for authority transfer.
Enforcement model — operational, not structural. company-mesh.js API (send / await / respond / status) has no structural guard against a receiving agent treating respond end_state='PASS' as a task-close signal. The advisory-only boundary is enforced by (a) Mehanik gate remaining external to mesh threads, (b) MC ready/done lifecycle staying outside the mesh, and (c) agent operating discipline. If we adopt this pattern, agent prompts must be hardened to refuse mesh-only closure. A structural API guard (e.g. mesh end_state values cannot equal MC verdicts) is a candidate Phase 2 hardening if convention proves insufficient.
Recommended Phase 2 (separate MC):
- Pick one current sequential pair: CodeCraft builder ↔ Proveo verifier on the next real H-task
- Wrap their advisory consultation in
company-mesh send/await— final verdict still goes through normal MC ready → Proveo validation gate - Measure: total tokens, wall-clock, # of retries, final quality verdict
- If delta ≥ 20 % token reduction OR ≥ 30 % wall-clock reduction → roll out to 1-2 more bounded pairs
- Update
orchestration-surface.mdAuthority Table with a row for "Iterative builder↔verifier advisory" → company-mesh (scope-limited)
6a. Live Proof (2026-05-25 appendix)
End-to-end autonomous P2P round-trip executed live:
| Step | Tool | Result |
|---|---|---|
| 1. Send mesh prompt | company-mesh.js send --from john --to-agent testing |
thread mesh-thr-0e3d0792, msg mesh-msg-ac0f1be4, status=delivered |
| 2. Trigger autonomous responder | company-mesh-responder.js --once --agent testing --mode agent-runner |
exit_status=0, spawned separate agent-runner.js child process under identity testing |
| 3. Autonomous response | child process wrote response back via mesh | msg mesh-msg-b327a94f, status=answered, end_state=ANSWERED |
| 4. Caller reads response | company-mesh.js status <thread_id> |
thread visible with both turns |
Round-trip latency: 25 seconds (21:51:16 → 21:51:41). Turn count: 1/1 (max-turns honored). Cost cap: 0.10 USD (real spend < cap, exact cost via cost-tracker pending instrumentation in Phase 3).
Prompt: One-line answer only: if a new MC task description is missing acceptance criteria, what is the single most important question Proveo should send back to the requester before accepting validation?
Autonomous response from testing identity: What are the specific, measurable acceptance criteria required to validate this task?
What this proves
- Mesh send → autonomous-process spawn → response → caller read chain works end-to-end with a real Claude API call in the middle
- Process separation real (different node child process, different agent identity)
- Receipt + final response evidence files written automatically to
--evidence-dir
What this does NOT prove
- Cost-tracker delta vs sequential baseline (Phase 3 measurement task #101974)
- Two LIVE Claude sessions reasoning to each other in parallel — the responder spawns a one-shot Claude run, not a long-lived peer agent
- Quality of autonomous response vs human-authored response (this was a sanity prompt with an obvious correct answer)
- Behavior under contention (multiple concurrent threads, race conditions)
Evidence files:
/tmp/alai/mesh-poc-101971/05-live-p2p-send.json/tmp/alai/mesh-poc-101971/06-responder-run.json/tmp/alai/mesh-poc-101971/07-live-p2p-status.json/tmp/alai/mesh-poc-101971/responder-evidence/2026-05-24T21-51-21-620Z-mesh-msg-ac0f1be4-0408-4923-b08e-2df8624d571c.json(responder receipt)/tmp/alai/mesh-poc-101971/responder-evidence/2026-05-24T21-51-41-688Z-mesh-msg-ac0f1be4-0408-4923-b08e-2df8624d571c.json(post-response evidence)
7. Source Evidence
- IndyDevDan video URL:
https://www.youtube.com/watch?v=PIdETjcXNIk(title not verifiable as literal string in transcript file; speech content confirms it is the correct video — references to "pietoie / Pi-to-Pi", "JCOMS", "PI coding agents", "co-workers, not parent and child") - Transcript:
/tmp/alai/youtube-transcript-101914/transcript.txt(998 lines) - Topology authority:
~/system/rules/orchestration-surface.md - Dispatch canonical:
~/system/specs/dispatch-path-canonical.md - Existing P2P infra:
~/system/tools/company-mesh.js, DB at~/system/databases/company-mesh.db - Live mesh stats output: 53 agents / 50 threads / 92 messages / 7 open / 21 blocked