AI Model & RAG Architecture
Overview of all AI models and RAG (Retrieval-Augmented Generation) components in the ALAI system. Date: 2026-02-23. Source: inventory verified against the filesystem and running services. Last update: RAG System Upgrade (MC #1804), covering unified embedding, HiveMind vector search, retrieval orchestrator, and session archiver.
One-page overview
+-----------------------------------------------------------------+
| CLAUDE CODE (Opus/Sonnet/Haiku) |
| Primary orchestrator - John |
| (Anthropic API, cloud, context up to 200K) |
+-----------------------------------------------------------------+
|
+------------------+------------------+
v v v
+-------------+ +-------------+ +-----------------+
| RAG Router | | Tier Router | | MCP Servers |
| (rag-mcp) | | (6 tiers) | | email, figma, |
| | | | | playwright, yt |
+------+---+--+ +------+------+ +-----------------+
| | |
+-----+ +----+ v
v v +---------------+
+--------+ +--------+| OLLAMA |
| Cache | | KB || localhost:11434|
|flywheel| |knowledge|+------+--------+
| .db | | .db | |
+--------+ +--------+ v
+---------------+
| 7 local |
| models |
+---------------+
+------------------------------------------------------------------+
| RETRIEVAL ORCHESTRATOR |
| retrieval-orchestrator.js |
| Parallel query -> HiveMind + KB + RAG + Sessions -> RRF merge |
+------------------------------------------------------------------+
| | | |
v v v v
+--------+ +--------+ +--------+ +----------+
|HiveMind| |Knowledge| | RAG | | Sessions |
|semantic| | DB | | Cache | | (grep) |
|13,473 | |24,636 | | 2,201 | | 761 |
+--------+ +--------+ +--------+ +----------+
+-----------------------------------------------------------------+
| BOOKSTACK (Wiki) |
| http://localhost:6875 - documentation |
| NOT part of the RAG pipeline (read by humans) |
+-----------------------------------------------------------------+
1. Local AI models (Ollama)
Server: http://localhost:11434
Hardware: Mac Studio M3 Ultra, 96 GB RAM
LaunchAgent: homebrew.mxcl.ollama
Config: ~/system/config/ollama.json
Installed models (ollama list, 2026-02-21)
| Model | Size | Purpose | Status |
|---|---|---|---|
| llama3.1:8b | 4.9 GB | Fast classify/extract/filter (Tier 1) | ACTIVE |
| qwen2.5-coder:32b | 19 GB | Code review, debug, refactor (Tier 2c) | ACTIVE |
| nomic-embed-text | 274 MB | Embeddings: 768-dim vectors for RAG | ACTIVE |
| alaiml-task-v1 | 986 MB | Fine-tuned for MC task handling (Tier 2t) | ACTIVE |
| alaiml-tender-v1 | 986 MB | Fine-tuned for tender analysis | ACTIVE |
| alaiml-email-v1 | 986 MB | Fine-tuned for email classification | ACTIVE |
| llama-guard3:8b | 4.9 GB | Content safety / guardrails | ACTIVE |
Configured but NOT installed
| Model | Reason | Note |
|---|---|---|
| llama3.1:70b | 42 GB, does not always fit in RAM | In the config as Tier 3 (complex reasoning) |
| qwen2.5:72b | 47 GB, does not always fit in RAM | In the config as Tier 2 (general) |
Wrapper tools:
- ~/system/tools/ollama-engine.js - HTTP wrapper for generate/classify
- ~/system/tools/ollama-tool-agent.js - Multi-turn agent with READ-ONLY tools
- ~/system/tools/agent-runner.js - Agent lifecycle (identity -> state -> HiveMind -> Ollama -> save)
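To make the role of an HTTP wrapper concrete, here is a minimal sketch of a non-streaming call to Ollama's POST /api/generate endpoint. The interface of ollama-engine.js is not shown on this page, so the function names buildGenerateRequest and generate, and the injectable fetchImpl parameter, are assumptions made for illustration only.

```javascript
// Hypothetical sketch; not the actual ollama-engine.js interface.
const OLLAMA_URL = 'http://localhost:11434';

// Build the JSON body for a non-streaming generate request.
function buildGenerateRequest(model, prompt) {
  return { model, prompt, stream: false };
}

// Send a prompt to Ollama and return the generated text.
// fetchImpl is injectable so the function can be exercised without a server.
async function generate(model, prompt, fetchImpl = fetch) {
  const res = await fetchImpl(`${OLLAMA_URL}/api/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildGenerateRequest(model, prompt)),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = await res.json();
  return data.response; // non-streaming replies carry the text in `response`
}
```

With a running server, a call such as generate('llama3.1:8b', 'Classify this email: ...') would resolve to the model's text output.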
2. Tier Routing (Task -> Model dispatch)
File: ~/system/tools/tier-router.js
Config: ~/system/config/tier-routing.json
Every AI request goes through routing, which decides which model processes it:
| Tier | Engine | Model | Purpose |
|---|---|---|---|
| 1 | Ollama | llama3.1:8b | Trivial: classify, filter, extract |
| 2 | Ollama | qwen2.5:72b* | Medium: summarize, draft, analyze |
| 2c | Ollama | qwen2.5-coder:32b | Code: review, debug, simple fix |
| 2t | Ollama | alaiml-task-v1 | Task-specific: MC task handling |
| 3 | Ollama | llama3.1:70b* | Complex reasoning (NO code execution) |
| 4 | Human Queue | - | Critical: multi-file, architecture, decisions |
* The Tier 2 and Tier 3 models are not currently installed. Requests for them fall back to Tier 2c.
Routing logic
- Caller-based: every daemon/agent has a fixed tier:
  - email-agent, pipeline-watcher -> Tier 1
  - morning-routine, explore -> Tier 2
  - autowork-standard, validator -> Tier 2c
  - builder, interactive -> Tier 4 (human/Claude)
- Keyword fallback: scans the task text for a keyword match
- Default: Tier 2
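The three routing steps can be sketched as a single lookup function. This is an illustration under stated assumptions: the caller names and tiers come from the list above, but the keyword lists, the routeTask name, and the data layout are hypothetical, and the real tier-router.js and tier-routing.json may be organized differently.

```javascript
// Hypothetical sketch of the caller -> keyword -> default dispatch.
const CALLER_TIERS = new Map([
  ['email-agent', '1'], ['pipeline-watcher', '1'],
  ['morning-routine', '2'], ['explore', '2'],
  ['autowork-standard', '2c'], ['validator', '2c'],
  ['builder', '4'], ['interactive', '4'],
]);

// Illustrative keyword lists (assumed, not taken from the real config).
const KEYWORD_TIERS = [
  { tier: '1', words: ['classify', 'filter', 'extract'] },
  { tier: '2c', words: ['review', 'debug', 'refactor'] },
];

// Tiers whose models are configured but not installed fall back to 2c.
const NOT_INSTALLED = new Set(['2', '3']);

function routeTask(caller, taskText) {
  let tier = CALLER_TIERS.get(caller);            // 1. caller-based
  if (!tier) {
    const text = taskText.toLowerCase();
    const hit = KEYWORD_TIERS.find(k => k.words.some(w => text.includes(w)));
    tier = hit ? hit.tier : '2';                  // 2. keyword match, 3. default
  }
  return NOT_INSTALLED.has(tier) ? '2c' : tier;
}
```

Under these assumptions, a request from explore (nominally Tier 2) resolves to 2c, because the Tier 2 model is not installed.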
3. RAG System (Retrieval-Augmented Generation)
3.1 Architecture (v2, 2026-02-23)
Query arrives
|
v
+------------------------+
| Retrieval Orchestrator | (retrieval-orchestrator.js)
| Multi-store parallel |
+-----+-----+-----+------+
| | | |
+------------+ | | +------------+
v v v v
+-----------+ +-------+ +--------+ +-----------+
| HiveMind | | KB | | RAG | | Sessions |
| semantic | | docs | | cache | | grep |
| 13,473 | |24,636 | | 2,201 | | 761 |
+-----------+ +-------+ +--------+ +-----------+
| | |
+-------+-------+----------+
v
+---------------+
| RRF Merge | Reciprocal Rank Fusion (k=60)
| Deduplicate |
+-------+-------+
|
v
Top N results
Retrieval flow:
- Embed the query once (nomic-embed-text, 768-dim)
- Query all 4 stores in parallel (HiveMind semantic, Knowledge DB, RAG Cache, Sessions grep)
- RRF Merge: Reciprocal Rank Fusion combines the rankings from all sources
- Return the top N results with RRF score + source attribution
Inspired by: Spring AI Modular RAG (RetrievalAugmentationAdvisor + MultiQueryExpander + ConcatenationDocumentJoiner)
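Reciprocal Rank Fusion scores each result as the sum of 1 / (k + rank) over every store that returned it, with rank counted from 1 and k = 60 as in the diagram. The sketch below is a generic implementation of that formula, not the code in retrieval-orchestrator.js; the rrfMerge name, the input shape, and the assumption that result ids are comparable across stores are illustrative.

```javascript
// Reciprocal Rank Fusion: score(d) = sum over stores of 1 / (k + rank),
// where rank is the 1-based position of d in that store's result list.
function rrfMerge(rankedLists, k = 60, limit = 10) {
  const merged = new Map();
  for (const [source, ids] of Object.entries(rankedLists)) {
    ids.forEach((id, index) => {
      const entry = merged.get(id) || { id, score: 0, sources: [] };
      entry.score += 1 / (k + index + 1);
      entry.sources.push(source);
      merged.set(id, entry); // one entry per id, so duplicates collapse
    });
  }
  return [...merged.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

const top = rrfMerge({
  hivemind: ['doc-a', 'doc-b'],
  knowledge: ['doc-b', 'doc-c'],
});
```

In this example doc-b ranks first because it appears in both lists (1/62 + 1/61), ahead of doc-a (1/61) and doc-c (1/62).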
3.2 Retrieval Orchestrator (NEW, 2026-02-23)
File: ~/system/tools/retrieval-orchestrator.js
MC Task: #1804
Central entry point for all retrieval in the system. Instead of the manual "BookStack FIRST -> HiveMind -> etc." sequence, the orchestrator automatically searches all stores in parallel and returns ranked results.
CLI:
node retrieval-orchestrator.js query "topic" [--limit N] [--verbose] [--stores s1,s2]
node retrieval-orchestrator.js stats
node retrieval-orchestrator.js stores
Module:
const { RetrievalOrchestrator } = require('./retrieval-orchestrator');
// await needs an async context in a CommonJS module
(async () => {
  const ro = new RetrievalOrchestrator();
  const { results, meta } = await ro.query('topic', { limit: 5 });
})();
Stores:
| Store | Search type | Entries | Source |
|---|---|---|---|
| hivemind | Cosine similarity + LIKE fallback | 13,473 | hivemind.db |
| knowledge | Cosine similarity (vector-db.js) | 24,636 | knowledge.db |
| rag | Cosine similarity on the RAG cache | 2,201 | flywheel.db |
| sessions | Grep text search | 761 files | ~/system/memory/sessions/ |
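One way to query several stores in parallel is Promise.allSettled, which lets a single failing store be reported without discarding results from the others. This page does not say whether the real orchestrator tolerates store failures, so treat the behavior below as an assumption; the queryAllStores name and the store objects with an async search method are hypothetical.

```javascript
// Hypothetical sketch: query every store at once and keep whatever succeeds.
async function queryAllStores(stores, query, limit = 5) {
  const names = Object.keys(stores);
  const settled = await Promise.allSettled(
    // The async arrow turns synchronous throws into rejections as well.
    names.map(async name => stores[name].search(query, limit))
  );
  const results = {};
  const failed = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === 'fulfilled') results[names[i]] = outcome.value;
    else failed.push(names[i]);
  });
  return { results, failed };
}
```

The per-store result lists returned here are the kind of input a rank-fusion step would then merge.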
3.3 Vector Database
File: ~/system/tools/vector-db.js
Type: SQLite + Float32Array BLOB columns (custom implementation)
Embedding model: nomic-embed-text (768-dim, local, via Ollama)
Not used: ChromaDB, FAISS, Pinecone, Weaviate, pgvector. Everything is custom SQLite.
UNIFIED EMBEDDING (2026-02-23): All tools use the SAME model (nomic-embed-text via Ollama):
- vector-db.js: JS module (original)
- memory-indexer.py: Python indexer (rewritten, previously based on sentence-transformers)
- hivemind.js: HiveMind embeddings (new)
- session-archiver.js: Session embeddings (new)
- rag-router.js: RAG cache embeddings (original)
Previously, memory-indexer.py used all-MiniLM-L6-v2 (384-dim). The two models produce different vector spaces, so cosine similarity between their vectors is meaningless. Fixed in MC #1804.
Capabilities:
- Semantic search (cosine similarity)
- Hybrid search (SQL WHERE + vector ranking)
- Collections with metadata columns
- Bulk insert with batching (32 docs/batch)
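Storing embeddings as Float32Array BLOBs and ranking by cosine similarity can be sketched in a few lines. This is a generic illustration, not the code in vector-db.js; the function names are hypothetical. The dimension check reflects the mismatch described above: comparing a 384-dim vector against a 768-dim one should fail loudly rather than return a number.

```javascript
// Pack an embedding into a Buffer suitable for a SQLite BLOB column.
function toBlob(vector) {
  const f32 = Float32Array.from(vector);
  return Buffer.from(f32.buffer, f32.byteOffset, f32.byteLength);
}

// Read a BLOB back into a Float32Array (the copy guarantees 4-byte alignment).
function fromBlob(blob) {
  const copy = Uint8Array.from(blob);
  return new Float32Array(copy.buffer);
}

// Cosine similarity; throws when the vectors come from different models.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

A 768-dim vector occupies 768 x 4 = 3,072 bytes in this encoding.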
3.4 Knowledge Base (Document Store)
File: ~/system/tools/knowledge-base.js
DB: ~/system/databases/knowledge.db
Size (2026-02-23): 24,636 entries (13,558 documents + 11,075 memory-file chunks + 3 session chunks)
Schema:
- kb_docs: metadata (source, title, tag, hash, chunk count)
- documents: vector-indexed chunks (content, embedding BLOB, tag)
Tags:
| Tag category | Example tags | Entries |
|---|---|---|
| memory-file | All ~/system/ MD files | 11,075 |
| Projects | lumiscare, drop, drop-architecture | ~8,000 |
| Patterns | pattern-security, pattern-architecture | ~500 |
| System | agents, system, rules, organization | ~900 |
| Sessions | session | 3+ (growing) |
Two indexers:
- knowledge-base.js: URL/file ingestion with auto-chunking, tagging, dedup
- memory-indexer.py: ~/system/ MD file scanner, batch embedding, tag='memory-file'
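Auto-chunking and hash-based dedup could work roughly as follows. The chunk size, the paragraph-based splitting, and the use of SHA-256 are all assumptions made for illustration; this page only states that kb_docs stores a hash and a chunk count, not how they are computed.

```javascript
const crypto = require('crypto');

// Split text into chunks of roughly maxChars, breaking on blank lines.
// A single paragraph longer than maxChars is kept whole.
function chunkText(text, maxChars = 1000) {
  const chunks = [];
  let current = '';
  for (const para of text.split(/\n\s*\n/)) {
    const p = para.trim();
    if (!p) continue;
    if (current && current.length + p.length + 2 > maxChars) {
      chunks.push(current);
      current = '';
    }
    current = current ? `${current}\n\n${p}` : p;
  }
  if (current) chunks.push(current);
  return chunks;
}

// Hex digest used as the dedup key.
function contentHash(text) {
  return crypto.createHash('sha256').update(text).digest('hex');
}

// Keep only documents whose hash has not been seen before.
function dedupe(docs, seenHashes = new Set()) {
  return docs.filter(doc => {
    const hash = contentHash(doc);
    if (seenHashes.has(hash)) return false;
    seenHashes.add(hash);
    return true;
  });
}
```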
3.5 RAG Flywheel (Cache + Learning)
File: ~/system/tools/rag-router.js
DB: ~/system/databases/flywheel.db
MCP Server: ~/system/tools/rag-mcp.js -> registered in ~/.claude/mcp.json
Flywheel metrics (live, 2026-02-23):
| Metric | Value |
|---|---|
| Total queries | 886 |
| Cache hit rate | 61.1% |
| Local model rate | 4.4% |
| External rate | 34.5% |
| Cache size | 2,201 entries |
| Cost saved queries | 580 |
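The figures in the table are consistent with one reading: "cost saved" counts every query answered from the cache or by a local model. The per-route counts below (541, 39, 306) are back-computed from the published percentages and are an assumption; the table itself only gives the rates and the two totals.

```javascript
// Assumed per-route counts, reverse-engineered from the published rates.
const total = 886;
const routes = { cache: 541, local: 39, external: 306 };

// Percentage of all queries, rounded to one decimal place.
const pct = n => (100 * n / total).toFixed(1);

const rates = {
  cache: pct(routes.cache),
  local: pct(routes.local),
  external: pct(routes.external),
};

// Queries that never needed the external API.
const costSaved = routes.cache + routes.local;
```

With these counts the three routes add up to 886 queries, and cache plus local gives the 580 cost-saved queries reported above.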
MCP Tools (available from a Claude Code session):
- mcp__rag__rag_query(query, task_type): route a query through cache -> local -> external
- mcp__rag__rag_learn(question, answer): add a Q&A pair to the cache
- mcp__rag__rag_stats(): flywheel metrics
RAG Router flow (Progressive Enrichment):
- Cache search: cosine similarity on rag_cache (threshold 0.75)
- Local RAW: Ollama without KB context, confidence gate (0.75+)
- Local ENRICHED: Ollama WITH knowledge.db context
- External: flag for Claude Code
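The four stages can be sketched as a chain of early returns. The 0.75 thresholds come from the list above; everything else is hypothetical, including the routeQuery name, the deps object and its method shapes, and the assumption that the enriched answer passes through the same confidence gate as the raw one (the page does not state a gate for that stage).

```javascript
const THRESHOLD = 0.75; // value quoted for both the cache and the confidence gate

// Hypothetical sketch: deps supplies the stages so each one can be stubbed.
async function routeQuery(query, deps) {
  const cached = await deps.searchCache(query);    // { response, similarity } or null
  if (cached && cached.similarity >= THRESHOLD) {
    return { route: 'cache', response: cached.response };
  }
  const raw = await deps.askLocal(query, null);    // { response, confidence }
  if (raw.confidence >= THRESHOLD) {
    return { route: 'local-raw', response: raw.response };
  }
  const context = await deps.fetchContext(query);  // chunks from knowledge.db
  const enriched = await deps.askLocal(query, context);
  if (enriched.confidence >= THRESHOLD) {          // assumed gate, see lead-in
    return { route: 'local-enriched', response: enriched.response };
  }
  return { route: 'external', response: null };    // flag for Claude Code
}
```

Each stage is more expensive than the one before it, so the function returns as soon as a stage produces a sufficiently confident answer.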
DB Schema (flywheel.db):
- interactions: every query logged (model, routing, cost, latency)
- rag_cache: Q&A pairs with embedding (query_embedding BLOB, response, hit_count, project_tag)
- shadow_log: routing decisions + top 3 similarity scores
3.6 Session Archiver (NEW, 2026-02-23)
File: ~/system/tools/session-archiver.js
LaunchAgent: com.john.session-archiver (daily 03:00)
Manages the lifecycle of session files: summaries are kept, raw transcripts are cleaned up.
Commands:
node session-archiver.js stats # Statistics
node session-archiver.js archive [--dry-run] # Strip raw transcripts older than 14 days
node session-archiver.js index [--limit N] # Embed summaries into knowledge.db
node session-archiver.js cleanup [--dry-run] # Archive + index (cron)
Stats (2026-02-23):
- 761 session files, 688 with a raw transcript
- 21.5 MB total, of which 20 MB (93%) is raw transcript bulk
- ~20 MB estimated savings from archiving
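The 14-day rule used by the archive command can be expressed as a simple filter. The session record shape (name, modifiedAt, hasRawTranscript) and the selectForArchive name are hypothetical; how session-archiver.js actually determines a file's age is not described on this page.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Pick session files whose raw transcript is old enough to strip.
// Each item is assumed to look like { name, modifiedAt: Date, hasRawTranscript }.
function selectForArchive(sessions, now = new Date(), maxAgeDays = 14) {
  const cutoff = now.getTime() - maxAgeDays * DAY_MS;
  return sessions.filter(
    s => s.hasRawTranscript && s.modifiedAt.getTime() < cutoff
  );
}
```

Passing now explicitly makes the selection reproducible, which is convenient for a --dry-run style preview.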
4. HiveMind (Shared Memory Bus + Semantic Search)
File: ~/system/agents/hivemind/hivemind.js
DB: ~/system/agents/hivemind/hivemind.db
Type: SQLite with keyword search + semantic vector search (since 2026-02-23)
Live stats (2026-02-23):
| Metric | Value |
|---|---|
| Total intel entries | 13,473 |
| With embeddings | ~13,473 (backfill in progress) |
| Memos | 70+ |
| Retention | 90 days |
Upgrade (MC #1804): HiveMind gained vector search:
- embedding BLOB column added to the intel table
- Every new post automatically embeds the message (best-effort, skipped if Ollama is down)
- Three new commands (the original query is listed for comparison):
| Command | Type | Description |
|---|---|---|
| query "X" | LIKE | Keyword match (original, backward compatible) |
| semantic-query "X" | Cosine | Embedding similarity search (top 5000 recent) |
| hybrid-query "X" | LIKE + Cosine RRF | Reciprocal Rank Fusion merge |
| backfill-embeddings | Batch | Embeds entries that have no vector (32/batch) |
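Best-effort embedding on post means that a failure of the embedding service must never block the write. A minimal sketch of that pattern follows; the embedBestEffort name, the injectable embedFn, and the timeout are assumptions for illustration, not details taken from hivemind.js.

```javascript
// Hypothetical sketch: try to embed a message and return null on any failure,
// so the post itself still succeeds without a vector.
async function embedBestEffort(text, embedFn, timeoutMs = 3000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('embedding timed out')), timeoutMs);
  });
  try {
    return await Promise.race([embedFn(text), timeout]);
  } catch (err) {
    return null; // service down or slow: store the row with a NULL embedding
  } finally {
    clearTimeout(timer); // never leave the timer pending
  }
}
```

Rows stored with a NULL embedding are exactly what a later backfill-embeddings run would pick up.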
Schema:
- intel: agent messages (agent, type, message, data, priority, embedding BLOB)
- agents: registered agents (name, role, status)
- subscriptions: agent topic subscriptions
- memos: key-value memory (key, value, access_count)
Intel types: discovery, alert, opportunity, update, request, response, learning, error
Retention: 90 days for intel, 7 days for event files
5. Claude API (Anthropic)
Primary AI: Claude Code (Opus for sessions, Sonnet/Haiku for agents)
Direct API integration:
- ~/system/tools/comms-agent/claude-handler.ts - Anthropic SDK wrapper for automated replies
- ~/system/tools/comms-responder.js - Communication agent
There are no OpenAI API integrations in the system.
6. MCP Servers
| Server | File | Purpose |
|---|---|---|
| rag | ~/system/tools/rag-mcp.js | RAG query/learn/stats |
| email | ~/system/tools/email-mcp-bridge.js | Email operations (2 accounts) |
| youtube-transcript | @fabriqa.ai/youtube-transcript-mcp | YouTube transcripts |
| playwright | @playwright/mcp | Browser automation |
| figma | @anthropic-ai/figma-mcp | Figma design access |
7. Fine-tuned models (ALAI ML)
Three custom models trained on internal data:
| Model | Base | Purpose | Size |
|---|---|---|---|
| alaiml-task-v1 | llama3.1:8b (Modelfile) | MC task classification and handling | 986 MB |
| alaiml-tender-v1 | llama3.1:8b (Modelfile) | Tender analysis and filtering | 986 MB |
| alaiml-email-v1 | llama3.1:8b (Modelfile) | Email classification and triage | 986 MB |
Retrain daemon: com.john.alaiml-retrain (LaunchAgent)
8. AutoCoder (Python Agent Framework)
Path: ~/system/services/autocoder/
Components:
- agent.py - Main agent logic
- agent_classifier.py - Task classification
- parallel_orchestrator.py - Multi-agent orchestration (53 KB)
- mcp_server/ - MCP server
UI: LaunchAgent com.john.autocoder-ui (port 8888)
Status: Installed, used optionally through build mode.
9. Databases (all SQLite)
| Database | Size | Purpose | Has vectors? | Entries |
|---|---|---|---|---|
| knowledge.db | ~50 MB | Document store (KB + memory-file + sessions) | YES (BLOB 768-dim) | 24,636 |
| flywheel.db | ~10 MB | RAG cache + interaction log + routing | YES (BLOB 768-dim) | 2,201 cache + 886 interactions |
| hivemind.db | ~30 MB | Agent memory bus + memos + semantic search | YES (BLOB 768-dim) | 13,473 |
| mission-control.db | ~3 MB | Task management | NO | 1,804+ tasks |
| events.db | ~3 MB | Event bus | NO | - |
| contacts.db | ~50 KB | Contacts | NO | - |
| invoices.db | ~40 KB | Invoices | NO | - |
Unified embedding model (since 2026-02-23): all 3 vector databases use the SAME model (nomic-embed-text, 768-dim, via Ollama). There is no mismatch.
There are no external vector databases (ChromaDB, FAISS, Pinecone, Weaviate, Qdrant, pgvector).
10. What EXISTS vs What does NOT exist
Exists (verified 2026-02-23)
- 7 local Ollama models (including 3 fine-tuned)
- Unified embedding model (nomic-embed-text, 768-dim, local), the SAME for all stores
- Custom vector DB (SQLite + BLOB, cosine similarity)
- Retrieval Orchestrator: 4-store parallel search with RRF merge (NEW)
- RAG 3-tier routing with flywheel cache (61.1% hit rate, 886 queries)
- Knowledge base: 24,636 entries (documents + memory files + sessions)
- HiveMind semantic search: cosine + hybrid + backfill (NEW)
- Session archiver: cleanup + embedding + daily cron (NEW)
- Tier router for task->model dispatch (6 tiers)
- 5 MCP servers (RAG, email, YouTube, Playwright, Figma)
- 3 ALAI fine-tuned models
- Usage tracking for all AI calls
- Claude API integration (comms-agent)
Does NOT exist
- No cloud vector databases (ChromaDB, Pinecone, Weaviate...)
- No OpenAI API
- No LangChain / LlamaIndex / LanceDB (custom implementation, zero external deps)
- No cloud embeddings (everything is local)
- No GraphRAG (too much effort for our scale)
- No cross-encoder reranking (the Ollama default is sufficient)
- llama3.1:70b and qwen2.5:72b are configured but NOT installed
- BookStack is NOT part of the RAG pipeline (human-readable wiki only)
11. Architectural principle
Cost-optimized hybrid: cache first -> local models second -> cloud API last.
- All embeddings are local (Ollama nomic-embed-text, 768-dim)
- All vector storage is in SQLite BLOB columns (Float32Array)
- One embedding model for the whole system, so there is no mismatch
- No cloud dependencies for RAG
- The Claude API is used only for what the local models cannot handle
- Fine-tuned models cover repetitive domain tasks (email, tender, MC tasks)
- The retrieval orchestrator unifies all stores into a single call with RRF merge
Tool-First Protocol (retrieval order)
BookStack (human wiki) -> RAG MCP (mcp__rag__rag_query) -> Manifest
-> HiveMind (semantic-query) -> Internet -> Update the database
For programmatic retrieval: node retrieval-orchestrator.js query "topic" searches all stores in parallel automatically.
12. Changelog
| Date | Change | MC Task |
|---|---|---|
| 2026-02-23 | RAG System Upgrade: unified embedding, HiveMind vector search, retrieval orchestrator, session archiver | #1804 |
| 2026-02-21 | Initial document created — full system inventory | — |