AI Model & RAG Architecture
Overview of all AI models and RAG (Retrieval-Augmented Generation) components in the ALAI system. Date: 2026-02-21. Source: verified inventory from the filesystem and running services.
One-page overview
+-----------------------------------------------------------------+
|                 CLAUDE CODE (Opus/Sonnet/Haiku)                 |
|                   Primary orchestrator - John                   |
|           (Anthropic API, cloud, context up to 200K)            |
+-----------------------------------------------------------------+
                                 |
           +---------------------+----------------------+
           v                     v                      v
    +-------------+       +-------------+      +-----------------+
    | RAG Router  |       | Tier Router |      |   MCP Servers   |
    |  (rag-mcp)  |       |  (6 tiers)  |      |  email, figma,  |
    |             |       |             |      | playwright, yt  |
    +------+------+       +------+------+      +-----------------+
           |                     |
      +----+--------+            |
      v             v            v
+-----------+ +-----------+  +-----------------+
|   Cache   | |    KB     |  |     OLLAMA      |
| flywheel  | | knowledge |  | localhost:11434 |
|    .db    | |    .db    |  +--------+--------+
+-----------+ +-----------+           |
                                      v
                             +-----------------+
                             |  7 local models |
                             +-----------------+
+-----------------------------------------------------------------+
|                        HIVEMIND (SQLite)                        |
|               Shared memory bus - 11,796 entries                |
|              All agents read/write, keyword search              |
+-----------------------------------------------------------------+
+-----------------------------------------------------------------+
|                        BOOKSTACK (Wiki)                         |
|              http://localhost:6875 - documentation              |
|          NOT part of the RAG pipeline (read by humans)          |
+-----------------------------------------------------------------+
1. Local AI models (Ollama)
Server: http://localhost:11434
Hardware: Mac Studio M3 Ultra, 96 GB RAM
LaunchAgent: homebrew.mxcl.ollama
Config: ~/system/config/ollama.json
Installed models (ollama list, 2026-02-21)
| Model | Size | Purpose | Status |
|---|---|---|---|
| llama3.1:8b | 4.9 GB | Fast classify/extract/filter (Tier 1) | ACTIVE |
| qwen2.5-coder:32b | 19 GB | Code review, debugging, refactoring (Tier 2c) | ACTIVE |
| nomic-embed-text | 274 MB | Embeddings - 768-dim vectors for RAG | ACTIVE |
| alaiml-task-v1 | 986 MB | Fine-tuned for MC task handling (Tier 2t) | ACTIVE |
| alaiml-tender-v1 | 986 MB | Fine-tuned for tender analysis | ACTIVE |
| alaiml-email-v1 | 986 MB | Fine-tuned for email classification | ACTIVE |
| llama-guard3:8b | 4.9 GB | Content safety / guardrails | ACTIVE |
Configured but NOT installed
| Model | Reason | Note |
|---|---|---|
| llama3.1:70b | 42 GB - does not always fit in RAM | In the config as Tier 3 (complex reasoning) |
| qwen2.5:72b | 47 GB - does not always fit in RAM | In the config as Tier 2 (general) |
Wrapper tools:
- ~/system/tools/ollama-engine.js - HTTP wrapper for generate/classify
- ~/system/tools/ollama-tool-agent.js - Multi-turn agent with READ-ONLY tools
- ~/system/tools/agent-runner.js - Agent lifecycle (identity -> state -> HiveMind -> Ollama -> save)
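For orientation, here is a minimal TypeScript sketch of what a generate/classify wrapper such as ollama-engine.js might do. It assumes Ollama's standard REST endpoint POST /api/generate with stream set to false, which returns the completion in a response field. The function names, prompt wording and label-matching rule are hypothetical and are not taken from ollama-engine.js.

```typescript
// Hypothetical generate/classify wrapper around Ollama's REST API.
// Only the /api/generate endpoint and its model/prompt/stream fields come from Ollama;
// function names, prompt wording and label parsing are illustrative assumptions.

const OLLAMA_URL = "http://localhost:11434";

interface GenerateResponse {
  response: string; // completion text when stream is false
}

// Send one non-streaming generate request and return the completion text.
async function generate(model: string, prompt: string): Promise<string> {
  const res = await fetch(`${OLLAMA_URL}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  const data = (await res.json()) as GenerateResponse;
  return data.response;
}

// Build a constrained classification prompt (hypothetical wording).
function buildClassifyPrompt(text: string, labels: string[]): string {
  return [
    `Classify the text into exactly one of: ${labels.join(", ")}.`,
    "Answer with the label only.",
    "",
    `Text: ${text}`,
  ].join("\n");
}

// Return the first allowed label that the model output starts with, or null.
function parseLabel(output: string, labels: string[]): string | null {
  const cleaned = output.trim().toLowerCase();
  const match = labels.find((label) => cleaned.startsWith(label.toLowerCase()));
  return match ?? null;
}

// Tier 1 style classification with the small local model.
async function classify(text: string, labels: string[]): Promise<string | null> {
  const output = await generate("llama3.1:8b", buildClassifyPrompt(text, labels));
  return parseLabel(output, labels);
}
```

Calling classify with a list such as ["invoice", "spam", "other"] requires a running Ollama instance with llama3.1:8b pulled. Because parseLabel returns null when the model answers outside the allowed labels, the caller can decide whether to retry or escalate to a higher tier.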
2. Tier Routing (Task -> Model dispatch)
File: ~/system/tools/tier-router.js
Config: ~/system/config/tier-routing.json
Every AI request passes through routing, which decides which model processes it:
| Tier | Engine | Model | Purpose |
|---|---|---|---|
| 1 | Ollama | llama3.1:8b | Trivial: classify, filter, extract |
| 2 | Ollama | qwen2.5:72b* | Medium: summarize, draft, analyze |
| 2c | Ollama | qwen2.5-coder:32b | Code: review, debug, simple fix |
| 2t | Ollama | alaiml-task-v1 | Task-specific: MC task handling |
| 3 | Ollama | llama3.1:70b* | Complex reasoning (NO code execution) |
| 4 | Human Queue | - | Critical: multi-file, architecture, decisions |
(*) The Tier 2 and Tier 3 models are not currently installed; these requests fall back to Tier 2c.
Routing logic
- Caller-based - each daemon/agent has a fixed tier:
  - email-agent, pipeline-watcher -> Tier 1
  - morning-routine, explore -> Tier 2
  - autowork-standard, validator -> Tier 2c
  - builder, interactive -> Tier 4 (human/Claude)
- Keyword fallback - scans the task text for keyword matches
- Default - Tier 2
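The routing order above (caller first, then keywords, then the default) can be sketched as follows. The caller-to-tier map and the tier models mirror the tables in this section; the keyword lists are hypothetical placeholders rather than the contents of tier-routing.json, and the fallback from uninstalled models to Tier 2c follows the note under the tier table.

```typescript
// Sketch of caller -> keyword -> default tier routing. The caller map and tier models
// come from the tables above; KEYWORD_TIERS is a HYPOTHETICAL placeholder, not the
// contents of tier-routing.json.

type Tier = "1" | "2" | "2c" | "2t" | "3" | "4";

// Tier -> Ollama model (null = Tier 4 human queue, no local model).
const TIER_MODELS: Record<Tier, string | null> = {
  "1": "llama3.1:8b",
  "2": "qwen2.5:72b",
  "2c": "qwen2.5-coder:32b",
  "2t": "alaiml-task-v1",
  "3": "llama3.1:70b",
  "4": null,
};

// Fixed tier per daemon/agent, as listed under "Routing logic".
const CALLER_TIERS = new Map<string, Tier>([
  ["email-agent", "1"],
  ["pipeline-watcher", "1"],
  ["morning-routine", "2"],
  ["explore", "2"],
  ["autowork-standard", "2c"],
  ["validator", "2c"],
  ["builder", "4"],
  ["interactive", "4"],
]);

// HYPOTHETICAL keyword lists, checked in order; the first matching list wins.
const KEYWORD_TIERS: Array<[Tier, string[]]> = [
  ["4", ["architecture", "multi-file"]],
  ["2c", ["review", "debug", "refactor"]],
  ["1", ["classify", "extract", "filter"]],
];

const DEFAULT_TIER: Tier = "2";

function routeTier(caller: string, task: string): Tier {
  // 1. Caller-based: a known caller always gets its fixed tier.
  const fixed = CALLER_TIERS.get(caller);
  if (fixed !== undefined) return fixed;
  // 2. Keyword fallback: scan the task text.
  const text = task.toLowerCase();
  for (const [tier, words] of KEYWORD_TIERS) {
    if (words.some((word) => text.includes(word))) return tier;
  }
  // 3. Default tier.
  return DEFAULT_TIER;
}

// Map a tier to an installed model; if the configured model is missing
// (currently Tier 2 and Tier 3), fall back to the Tier 2c model.
function resolveModel(tier: Tier, installed: Set<string>): string | null {
  const model = TIER_MODELS[tier];
  if (model === null) return null; // Tier 4: goes to the human queue
  return installed.has(model) ? model : TIER_MODELS["2c"];
}
```

With the models currently installed, a morning-routine request routes to Tier 2, and resolveModel then returns qwen2.5-coder:32b because qwen2.5:72b is missing.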
3. RAG System (Retrieval-Augmented Generation)
3.1 Architecture
     Incoming query
            |
            v
    +----------------+
    |   RAG Router   |  (rag-router.js / rag-mcp.js)
    | 3-tier routing |
    +-------+--------+
            |
   +--------+--------+
   v        v        v
 Cache    Local   External
(hit?)  (Ollama) (Claude flag)
The three RAG routing tiers:
- Cache hit - embedding similarity >= 0.75 -> instant answer from flywheel.db
- Local model - Ollama (via tier-router) -> local inference
- External - flags that a more expensive API call (Claude) is needed
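The three-way decision can be sketched as follows: find the most similar cached question by cosine similarity, answer from the cache at similarity >= 0.75, and otherwise choose between the local model and the external flag. The source does not say how local vs. external is decided, so canHandleLocally is a hypothetical predicate supplied by the caller. Embeddings are assumed to be number arrays already produced by nomic-embed-text.

```typescript
// Sketch of 3-tier RAG routing. CacheEntry mirrors the rag_cache idea
// (question, response, embedding); canHandleLocally is a HYPOTHETICAL criterion.

interface CacheEntry {
  question: string;
  response: string;
  embedding: number[];
}

type Route =
  | { kind: "cache"; response: string; similarity: number }
  | { kind: "local" }
  | { kind: "external" };

const CACHE_THRESHOLD = 0.75;

// Cosine similarity; returns 0 for zero vectors to avoid division by zero.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function routeQuery(
  queryEmbedding: number[],
  cache: CacheEntry[],
  canHandleLocally: () => boolean, // HYPOTHETICAL local-vs-external decision
): Route {
  // Tier 1: best cache match by cosine similarity.
  let best: CacheEntry | null = null;
  let bestSim = -Infinity;
  for (const entry of cache) {
    const sim = cosineSimilarity(queryEmbedding, entry.embedding);
    if (sim > bestSim) {
      bestSim = sim;
      best = entry;
    }
  }
  if (best !== null && bestSim >= CACHE_THRESHOLD) {
    return { kind: "cache", response: best.response, similarity: bestSim };
  }
  // Tiers 2 and 3: local inference, or flag for an external (Claude) call.
  return canHandleLocally() ? { kind: "local" } : { kind: "external" };
}
```

A linear scan is enough at the current cache size (about 1,840 entries of 768 floats); the threshold constant is the single knob that trades hit rate against answer accuracy.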
3.2 Vector Database
File: ~/system/tools/vector-db.js
Type: SQLite + Float32Array BLOB columns (custom implementation)
Embedding model: nomic-embed-text (768-dim, local, via Ollama)
Not used: ChromaDB, FAISS, Pinecone, Weaviate, pgvector - everything is custom SQLite
Capabilities:
- Semantic search (cosine similarity)
- Hybrid search (SQL WHERE + vector ranking)
- Collections with metadata columns
- Bulk insert with batching (32 docs/batch)
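Storing vectors in BLOB columns comes down to converting between a Float32Array and raw bytes. The sketch below shows that round trip plus a hybrid-search step (a metadata filter standing in for SQL WHERE, then cosine ranking). The row shape and function names are hypothetical rather than vector-db.js's actual API; it assumes the SQLite driver returns BLOBs as a Uint8Array (Node's Buffer is a Uint8Array subclass).

```typescript
// Sketch of Float32Array <-> BLOB conversion plus hybrid search.
// Row shape and function names are HYPOTHETICAL, not vector-db.js's API.

// Encode a vector as raw bytes for a BLOB column (platform byte order,
// little-endian on Apple Silicon). 768 dims -> 3072 bytes.
function toBlob(vec: number[]): Uint8Array {
  return new Uint8Array(Float32Array.from(vec).buffer);
}

// Decode BLOB bytes back into floats. Copying first gives offset 0 and
// 4-byte alignment, which a Float32Array view requires (driver Buffers
// can sit at arbitrary offsets inside a shared memory pool).
function fromBlob(bytes: Uint8Array): Float32Array {
  if (bytes.byteLength % 4 !== 0) {
    throw new Error("BLOB length is not a multiple of 4 bytes");
  }
  const copy = new Uint8Array(bytes);
  return new Float32Array(copy.buffer, 0, copy.byteLength / 4);
}

function cosineF32(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface VectorRow {
  id: number;
  collection: string; // metadata column used for filtering
  embedding: Uint8Array; // BLOB as returned by the SQLite driver
}

// Hybrid search: metadata filter first (what a SQL WHERE clause would do),
// then rank the remaining rows by cosine similarity and keep the top k.
function hybridSearch(
  rows: VectorRow[],
  collection: string,
  query: number[],
  k: number,
): Array<{ id: number; score: number }> {
  const q = Float32Array.from(query);
  return rows
    .filter((row) => row.collection === collection)
    .map((row) => ({ id: row.id, score: cosineF32(q, fromBlob(row.embedding)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```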
3.3 Knowledge Base (Document Store)
File: ~/system/tools/knowledge-base.js
DB: ~/system/databases/knowledge.db (240 KB)
Features:
- URL/file ingestion (HTML -> text conversion)
- Auto-chunking (1000 chars, markdown-aware)
- Tagging system
- Deduplication (SHA-256 content hash)
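Markdown-aware chunking at a 1000-character limit could look roughly like this: split the text into blocks at blank lines and headings, then pack blocks greedily into chunks, hard-splitting any single block that exceeds the limit on its own. This is an illustrative sketch under those assumptions, not the actual algorithm in knowledge-base.js.

```typescript
// HYPOTHETICAL markdown-aware chunker: blocks break at blank lines and before
// headings, then are packed greedily into chunks of at most maxChars characters.
function chunkMarkdown(text: string, maxChars = 1000): string[] {
  // Pass 1: split into blocks.
  const blocks: string[] = [];
  let current: string[] = [];
  for (const line of text.split("\n")) {
    const isHeading = /^#{1,6}\s/.test(line);
    if ((isHeading || line.trim() === "") && current.length > 0) {
      blocks.push(current.join("\n"));
      current = [];
    }
    if (line.trim() !== "") current.push(line);
  }
  if (current.length > 0) blocks.push(current.join("\n"));

  // Pass 2: pack blocks into chunks, separated by a blank line.
  const chunks: string[] = [];
  let buf = "";
  for (const block of blocks) {
    // Hard-split blocks that alone exceed the limit.
    for (let i = 0; i < block.length; i += maxChars) {
      const piece = block.slice(i, i + maxChars);
      const candidate = buf === "" ? piece : `${buf}\n\n${piece}`;
      if (candidate.length <= maxChars) {
        buf = candidate;
      } else {
        chunks.push(buf);
        buf = piece;
      }
    }
  }
  if (buf !== "") chunks.push(buf);
  return chunks;
}
```

Breaking before headings keeps a heading in the same chunk as the text that follows it whenever both fit, which tends to make retrieved chunks self-describing.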
Schema:
- kb_docs - metadata (source, title, tag, hash, chunk count)
- documents - vector-indexed chunks
3.4 RAG Flywheel (Cache + Learning)
File: ~/system/tools/rag-router.js
DB: ~/system/databases/flywheel.db (9.1 MB)
MCP Server: ~/system/tools/rag-mcp.js -> registered in ~/.claude/mcp.json
Flywheel metrics (live, 2026-02-21):
| Metric | Value |
|---|---|
| Total queries | 64 |
| Cache hit rate | 54.7% |
| Local model rate | 0% |
| External rate | 45.3% |
| Cache size | 1,840 entries |
| Training queue | 36 pending |
MCP tools (available from a Claude Code session):
- rag_query(query, task_type) - Route a query through cache -> local -> external
- rag_learn(question, answer) - Add a Q&A pair to the cache
- rag_stats() - Flywheel metrics
DB Schema (flywheel.db):
- interactions - every query logged (model, routing, cost, latency)
- rag_cache - Q&A pairs with embeddings (query_embedding BLOB, response, hit_count)
- shadow_log - routing decisions + top 3 similarity scores
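The flywheel rates are simple proportions over the interactions log. As a consistency check, the live figures correspond to 35 cache hits and 29 external routings out of 64 queries (35/64 ≈ 54.7%, 29/64 ≈ 45.3%); these counts are back-calculated from the published percentages, not read from flywheel.db. The sketch below assumes the routing column holds the values "cache", "local" or "external", which is an assumption about its encoding.

```typescript
// Sketch of how rag_stats-style rates could be derived from the interactions log.
// The routing values "cache" | "local" | "external" are an ASSUMPTION about how the
// routing column is encoded.

type Routing = "cache" | "local" | "external";

interface Interaction {
  routing: Routing;
}

interface FlywheelStats {
  total: number;
  cacheHitRate: number; // percent, rounded to one decimal
  localRate: number;
  externalRate: number;
}

// Percentage with one decimal place; 0 when there are no queries yet.
function pct(part: number, total: number): number {
  return total === 0 ? 0 : Math.round((part / total) * 1000) / 10;
}

function flywheelStats(log: Interaction[]): FlywheelStats {
  const counts: Record<Routing, number> = { cache: 0, local: 0, external: 0 };
  for (const row of log) {
    counts[row.routing] += 1;
  }
  return {
    total: log.length,
    cacheHitRate: pct(counts.cache, log.length),
    localRate: pct(counts.local, log.length),
    externalRate: pct(counts.external, log.length),
  };
}
```

For example, a log of 35 cache rows and 29 external rows gives total 64, cacheHitRate 54.7, localRate 0 and externalRate 45.3.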
4. HiveMind (Shared Memory Bus)
File: ~/system/agents/hivemind/hivemind.js
DB: ~/system/databases/hivemind.db (3.7 MB)
Type: SQLite - keyword search, NOT vector search
Not RAG. HiveMind is a message bus + key-value store for inter-agent communication.
Live stats (2026-02-21):
| Metric | Value |
|---|---|
| Total intel entries | 11,796 |
| Entries today | 667 |
| Active agents (1h) | 6 |
| Alerts today | 24 |
Schema:
- intel - agent messages (agent, type, message, data, priority)
- agents - registered agents (name, role, status)
- subscriptions - agent topic subscriptions
- memos - key-value memory (key, value, access_count)
Intel types: discovery, alert, opportunity, update, request, response, learning, error
Retention: 90 days for intel, 7 days for event files
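A sketch of the retention rule, assuming each entry carries a creation timestamp: compute a cutoff of now minus N days and treat anything older as expired. The field names and the keep/expired helper are hypothetical; in SQLite the same rule would be a DELETE conditioned on the timestamp column being older than the cutoff.

```typescript
// Sketch of the retention rule: anything created before (now - N days) is expired.
// Field names are HYPOTHETICAL. Working in UTC milliseconds makes a "day" exactly
// 24 hours, so local DST shifts do not affect the cutoff.

const DAY_MS = 24 * 60 * 60 * 1000;

// Retention windows from the HiveMind section.
const RETENTION_DAYS = { intel: 90, eventFiles: 7 } as const;

interface IntelEntry {
  id: number;
  createdAt: Date; // hypothetical creation timestamp
}

// Cutoff instant: entries strictly older than this are past retention.
function retentionCutoff(now: Date, days: number): Date {
  return new Date(now.getTime() - days * DAY_MS);
}

// Partition entries into those kept and those that should be pruned.
function applyRetention(
  entries: IntelEntry[],
  now: Date,
  days: number,
): { keep: IntelEntry[]; expired: IntelEntry[] } {
  const cutoff = retentionCutoff(now, days).getTime();
  const keep: IntelEntry[] = [];
  const expired: IntelEntry[] = [];
  for (const entry of entries) {
    if (entry.createdAt.getTime() < cutoff) {
      expired.push(entry);
    } else {
      keep.push(entry);
    }
  }
  return { keep, expired };
}
```

Relative to 2026-02-21T00:00Z, the 90-day intel cutoff falls on 2025-11-23 and the 7-day event-file cutoff on 2026-02-14.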
5. Claude API (Anthropic)
Primary AI: Claude Code (Opus for sessions, Sonnet/Haiku for agents)
Direct API integration:
- ~/system/tools/comms-agent/claude-handler.ts - Anthropic SDK wrapper for automated replies
- ~/system/tools/comms-responder.js - Communications agent
There is no OpenAI API integration in the system.
6. MCP Servers
| Server | File | Purpose |
|---|---|---|
| rag | ~/system/tools/rag-mcp.js | RAG query/learn/stats |
| email | ~/system/tools/email-mcp-bridge.js | Email operations (2 accounts) |
| youtube-transcript | @fabriqa.ai/youtube-transcript-mcp | YouTube transcripts |
| playwright | @playwright/mcp | Browser automation |
| figma | @anthropic-ai/figma-mcp | Figma design access |
7. Fine-tuned models (ALAI ML)
Three custom models trained on internal data:
| Model | Base | Purpose | Size |
|---|---|---|---|
| alaiml-task-v1 | llama3.1:8b (Modelfile) | MC task classification and handling | 986 MB |
| alaiml-tender-v1 | llama3.1:8b (Modelfile) | Tender analysis and filtering | 986 MB |
| alaiml-email-v1 | llama3.1:8b (Modelfile) | Email classification and triage | 986 MB |
Retrain daemon: com.john.alaiml-retrain (LaunchAgent)
8. AutoCoder (Python Agent Framework)
Path: ~/system/services/autocoder/
Components:
- agent.py - Main agent logic
- agent_classifier.py - Task classification
- parallel_orchestrator.py - Multi-agent orchestration (53 KB)
- mcp_server/ - MCP server
UI: LaunchAgent com.john.autocoder-ui (port 8888)
Status: Installed; used optionally via build mode.
9. Databases (all SQLite)
| Database | Size | Purpose | Has vectors? |
|---|---|---|---|
| flywheel.db | 9.1 MB | RAG cache + interaction log + routing | YES (BLOB) |
| hivemind.db | 3.7 MB | Agent memory bus + memos | NO |
| knowledge.db | 240 KB | Document store with chunks | YES (BLOB) |
| mission-control.db | 2.2 MB | Task management | NO |
| events.db | 3.2 MB | Event bus | NO |
| contacts.db | 49 KB | Contacts | NO |
| invoices.db | 37 KB | Invoices | NO |
No external vector databases (ChromaDB, FAISS, Pinecone, Weaviate, Qdrant, pgvector).
10. What EXISTS vs. what DOES NOT EXIST
Exists (verified)
- 7 local Ollama models (including 3 fine-tuned)
- Embedding model (nomic-embed-text, 768-dim, local)
- Custom vector DB (SQLite + BLOB, cosine similarity)
- RAG 3-tier routing with flywheel cache (54.7% hit rate)
- Knowledge base with document ingestion and chunking
- Tier router for task->model dispatch (6 tiers)
- HiveMind shared memory (11,796+ entries)
- 5 MCP servers (RAG, email, YouTube, Playwright, Figma)
- 3 ALAI fine-tuned models
- Usage tracking for all AI calls
- Claude API integration (comms-agent)
Does NOT exist
- No cloud vector databases (ChromaDB, Pinecone, Weaviate...)
- No OpenAI API
- No LangChain / LlamaIndex (custom implementation)
- No cloud embeddings (everything is local)
- llama3.1:70b and qwen2.5:72b are configured but NOT installed
- BookStack is NOT part of the RAG pipeline (human-readable wiki only)
11. Architectural principle
Cost-optimized hybrid: cache first -> local models second -> cloud API last.
- All embeddings are local (Ollama nomic-embed-text)
- All vector storage is in SQLite BLOB columns
- No cloud dependencies for RAG
- The Claude API is used only for what local models cannot handle
- Fine-tuned models cover repetitive domain tasks (email, tender, MC tasks)