AI Model & RAG Architecture

Overview of all AI models and RAG (Retrieval-Augmented Generation) components in the ALAI system. Date: 2026-02-21. Source: verified inventory from the filesystem and running services.


One-page overview

+-------------------------------------------------------------------+
|                  CLAUDE CODE (Opus/Sonnet/Haiku)                  |
|                    Primary orchestrator - John                    |
|            (Anthropic API, cloud, context up to 200K)             |
+-------------------------------------------------------------------+
                                     |
             +-----------------------+---------------------+
             v                       v                     v
 +-----------------------+  +-----------------+   +-----------------+
 |      RAG Router       |  |   Tier Router   |   |   MCP Servers   |
 |       (rag-mcp)       |  |    (6 tiers)    |   |  email, figma,  |
 |                       |  |                 |   | playwright, yt  |
 +----+-------------+----+  +--------+--------+   +-----------------+
      |             |                |
      v             v                v
+-----------+ +-----------+ +-----------------+
|   Cache   | |    KB     | |     OLLAMA      |
| flywheel  | | knowledge | | localhost:11434 |
|    .db    | |    .db    | +--------+--------+
+-----------+ +-----------+          |
                                     v
                            +-----------------+
                            | 7 local models  |
                            +-----------------+

+-------------------------------------------------------------------+
|                         HIVEMIND (SQLite)                         |
|                Shared memory bus - 11,796 entries                 |
|               All agents read/write, keyword search               |
+-------------------------------------------------------------------+

+-------------------------------------------------------------------+
|                         BOOKSTACK (Wiki)                          |
|               http://localhost:6875 - documentation               |
|           NOT part of the RAG pipeline (read by humans)           |
+-------------------------------------------------------------------+

1. Local AI models (Ollama)

Server: http://localhost:11434
Hardware: Mac Studio M3 Ultra, 96 GB RAM
LaunchAgent: homebrew.mxcl.ollama
Config: ~/system/config/ollama.json

Installed models (ollama list, 2026-02-21)

Model               Size     Purpose                                      Status
llama3.1:8b         4.9 GB   Fast classify/extract/filter (Tier 1)        ACTIVE
qwen2.5-coder:32b   19 GB    Code review, debug, refactoring (Tier 2c)    ACTIVE
nomic-embed-text    274 MB   Embeddings: 768-dim vectors for RAG          ACTIVE
alaiml-task-v1      986 MB   Fine-tuned for MC task handling (Tier 2t)    ACTIVE
alaiml-tender-v1    986 MB   Fine-tuned for tender analysis               ACTIVE
alaiml-email-v1     986 MB   Fine-tuned for email classification          ACTIVE
llama-guard3:8b     4.9 GB   Content safety / guardrails                  ACTIVE

Configured but NOT installed

Model          Reason                              Note
llama3.1:70b   42 GB, does not always fit in RAM   Configured as Tier 3 (complex reasoning)
qwen2.5:72b    47 GB, does not always fit in RAM   Configured as Tier 2 (general)
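To keep this inventory in sync with the running server, you can compare the configured names against the model list that Ollama returns from `GET /api/tags` (the same data `ollama list` prints). The following is a minimal TypeScript sketch. It assumes the response has already been fetched and parsed. `missingModels` and `withTag` are hypothetical helpers, not part of the existing tooling.

```typescript
// Sketch only: which configured models are absent from Ollama's model list?
// Assumes the JSON body of GET http://localhost:11434/api/tags is already parsed.
interface OllamaTag {
  name: string; // e.g. "llama3.1:8b" or "nomic-embed-text:latest"
  size: number; // bytes
}

interface OllamaTagsResponse {
  models: OllamaTag[];
}

// Ollama reports untagged models as "<name>:latest", so normalise before comparing.
function withTag(model: string): string {
  return model.indexOf(":") === -1 ? model + ":latest" : model;
}

function missingModels(configured: string[], tags: OllamaTagsResponse): string[] {
  const installed = tags.models.map((m) => withTag(m.name));
  return configured.filter((model) => installed.indexOf(withTag(model)) === -1);
}

// Trimmed example payload; real entries carry more fields (digest, details, ...).
const tagsExample: OllamaTagsResponse = {
  models: [
    { name: "llama3.1:8b", size: 4.9e9 },
    { name: "qwen2.5-coder:32b", size: 19e9 },
    { name: "nomic-embed-text:latest", size: 274e6 },
  ],
};

// missing holds ["qwen2.5:72b", "llama3.1:70b"]
const missing = missingModels(
  ["llama3.1:8b", "nomic-embed-text", "qwen2.5:72b", "llama3.1:70b"],
  tagsExample
);
```

The comparison uses only `name`. The `:latest` normalisation matters because Ollama reports untagged models such as nomic-embed-text with an explicit `:latest` tag.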

Wrapper tools:

  • ~/system/tools/ollama-engine.js - HTTP wrapper for generate/classify
  • ~/system/tools/ollama-tool-agent.js - Multi-turn agent with READ-ONLY tools
  • ~/system/tools/agent-runner.js - Agent lifecycle (identity -> state -> HiveMind -> Ollama -> save)
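The page describes ollama-engine.js as an HTTP wrapper for generate/classify. Below is one plausible shape for the classify half, sketched in TypeScript on the assumption that it sits on Ollama's `POST /api/generate` endpoint.

- The request fields `model`, `prompt`, `stream` and `options.temperature`, and the reply field `response`, are Ollama's.
- The prompt wording and the helper names `buildClassifyRequest` and `parseLabel` are hypothetical.

```typescript
// Hypothetical classify helper on top of Ollama's POST /api/generate.
// Only the request/response field names come from the Ollama API.
interface GenerateRequest {
  model: string;
  prompt: string;
  stream: boolean;
  options: { temperature: number };
}

interface GenerateResponse {
  response: string;
}

function buildClassifyRequest(model: string, text: string, labels: string[]): GenerateRequest {
  const prompt =
    "Classify the text into exactly one of these labels: " + labels.join(", ") +
    ".\nAnswer with the label only.\n\nText:\n" + text;
  // stream: false -> Ollama returns one JSON object instead of partial chunks;
  // temperature 0 -> as deterministic a label choice as the model allows.
  return { model: model, prompt: prompt, stream: false, options: { temperature: 0 } };
}

function parseLabel(res: GenerateResponse, labels: string[]): string | null {
  const answer = res.response.trim().toLowerCase();
  // Tolerates answers like "Invoice." or "Label: invoice". The first label found
  // as a substring wins, so list overlapping labels longest first ("not spam" before "spam").
  for (const label of labels) {
    if (answer.indexOf(label.toLowerCase()) !== -1) {
      return label;
    }
  }
  return null; // unparseable -> caller may retry or escalate to a higher tier
}

const labels = ["invoice", "spam", "other"];
const req = buildClassifyRequest("llama3.1:8b", "Invoice #42 is overdue", labels);
// JSON.stringify(req) is the body to POST to http://localhost:11434/api/generate.
const label = parseLabel({ response: " Invoice.\n" }, labels);
```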

2. Tier Routing (Task -> Model dispatch)

File: ~/system/tools/tier-router.js
Config: ~/system/config/tier-routing.json

Every AI request passes through a router that decides which model processes it:

Tier  Engine       Model               Purpose
1     Ollama       llama3.1:8b         Trivial: classify, filter, extract
2     Ollama       qwen2.5:72b*        Medium: summarize, draft, analyze
2c    Ollama       qwen2.5-coder:32b   Code: review, debug, simple fix
2t    Ollama       alaiml-task-v1      Task-specific: MC task handling
3     Ollama       llama3.1:70b*       Complex reasoning (NO code execution)
4     Human Queue  -                   Critical: multi-file, architecture, decisions

* The Tier 2 and Tier 3 models are not currently installed; requests for those tiers fall back to Tier 2c.
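The fallback rule amounts to a lookup plus an installed-model check. The following TypeScript sketch illustrates it; it is not tier-router.js itself. `resolveModel`, `TIER_MODELS` and the `Resolution` shape are hypothetical. The model names come from the tier table.

```typescript
// Sketch (hypothetical names): map a tier to its model and fall back to Tier 2c
// when the configured model is not installed.
type Tier = "1" | "2" | "2c" | "2t" | "3" | "4";

const TIER_MODELS: Record<Tier, string | null> = {
  "1": "llama3.1:8b",
  "2": "qwen2.5:72b",
  "2c": "qwen2.5-coder:32b",
  "2t": "alaiml-task-v1",
  "3": "llama3.1:70b",
  "4": null, // human queue: no local model
};

interface Resolution {
  tier: Tier;           // tier that actually handles the request
  model: string | null; // null = hand off to the human queue
  fellBack: boolean;
}

function resolveModel(tier: Tier, installed: string[]): Resolution {
  const wanted = TIER_MODELS[tier];
  if (wanted === null) {
    return { tier: tier, model: null, fellBack: false };
  }
  if (installed.indexOf(wanted) !== -1) {
    return { tier: tier, model: wanted, fellBack: false };
  }
  // Configured model missing (today: Tier 2 and Tier 3) -> use the Tier 2c model.
  return { tier: "2c", model: TIER_MODELS["2c"], fellBack: true };
}

const installedNow = [
  "llama3.1:8b", "qwen2.5-coder:32b", "nomic-embed-text", "alaiml-task-v1",
  "alaiml-tender-v1", "alaiml-email-v1", "llama-guard3:8b",
];
const r2 = resolveModel("2", installedNow); // falls back to qwen2.5-coder:32b
```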

Routing logic

  1. Caller-based - each daemon/agent has a fixed tier:
    • email-agent, pipeline-watcher -> Tier 1
    • morning-routine, explore -> Tier 2
    • autowork-standard, validator -> Tier 2c
    • builder, interactive -> Tier 4 (human/Claude)
  2. Keyword fallback - scans the task text for keyword matches
  3. Default - Tier 2
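The three steps read naturally as an ordered chain of checks. In the TypeScript sketch below, the caller assignments are the ones listed above. `KEYWORD_RULES` is an invented placeholder: the actual keywords live in ~/system/config/tier-routing.json and are not reproduced on this page.

```typescript
// Sketch of the routing order above: caller -> keyword -> default.
interface CallerRule { caller: string; tier: string; }
interface KeywordRule { tier: string; keywords: string[]; }

const CALLER_RULES: CallerRule[] = [
  { caller: "email-agent", tier: "1" },
  { caller: "pipeline-watcher", tier: "1" },
  { caller: "morning-routine", tier: "2" },
  { caller: "explore", tier: "2" },
  { caller: "autowork-standard", tier: "2c" },
  { caller: "validator", tier: "2c" },
  { caller: "builder", tier: "4" },
  { caller: "interactive", tier: "4" },
];

// Hypothetical keyword rules, checked in order; the first match wins.
const KEYWORD_RULES: KeywordRule[] = [
  { tier: "2c", keywords: ["debug", "refactor", "code review"] },
  { tier: "1", keywords: ["classify", "extract", "filter"] },
  { tier: "2", keywords: ["summarize", "draft", "analyze"] },
];

const DEFAULT_TIER = "2";

function selectTier(caller: string | undefined, taskText: string): string {
  // 1. Caller-based: known daemons/agents always get their fixed tier.
  for (const rule of CALLER_RULES) {
    if (rule.caller === caller) return rule.tier;
  }
  // 2. Keyword fallback: case-insensitive substring scan of the task text.
  const text = taskText.toLowerCase();
  for (const rule of KEYWORD_RULES) {
    for (const kw of rule.keywords) {
      if (text.indexOf(kw) !== -1) return rule.tier;
    }
  }
  // 3. Default.
  return DEFAULT_TIER;
}
```

The caller check runs first, so a `validator` task that mentions "summarize" still lands on Tier 2c. The default is Tier 2, whose model is not installed, so unmatched tasks currently end up on the Tier 2c fallback.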

3. RAG System (Retrieval-Augmented Generation)

3.1 Architecture

Incoming query
         |
         v
+-----------------+
|   RAG Router    |  (rag-router.js / rag-mcp.js)
| 3-tier routing  |
+--------+--------+
         |
    +----+--------+--------------+
    v             v              v
  Cache      Local model     External
 (hit?)       (Ollama)     (Claude flag)

The three RAG routing tiers:

  1. Cache hit - embedding similarity >= 0.75 -> instant answer from flywheel.db
  2. Local model - Ollama (via tier-router) -> local inference
  3. External - flags that a more expensive API call (Claude) is needed
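The routing decision can be sketched as a nearest-neighbour lookup over the cache, followed by two fallbacks. In the TypeScript sketch below, the 0.75 threshold is the documented one. `CacheEntry` and `routeQuery` are assumptions. So is the `localCanAnswer` flag: the page does not describe how rag-router.js decides between local and external.

```typescript
// Minimal sketch of the cache -> local -> external decision (hypothetical names).
interface CacheEntry {
  question: string;
  embedding: number[];
  response: string;
}

type Route =
  | { kind: "cache"; response: string; score: number }
  | { kind: "local" }
  | { kind: "external" };

const CACHE_THRESHOLD = 0.75;

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function routeQuery(queryEmbedding: number[], cache: CacheEntry[], localCanAnswer: boolean): Route {
  // Tier 1: the best cache match at or above the threshold answers instantly.
  let best: CacheEntry | null = null;
  let bestScore = -1;
  for (const entry of cache) {
    const score = cosine(queryEmbedding, entry.embedding);
    if (score > bestScore) {
      bestScore = score;
      best = entry;
    }
  }
  if (best !== null && bestScore >= CACHE_THRESHOLD) {
    return { kind: "cache", response: best.response, score: bestScore };
  }
  // Tier 2: local Ollama inference via tier-router (decision rule assumed).
  if (localCanAnswer) return { kind: "local" };
  // Tier 3: flag that a Claude API call is needed.
  return { kind: "external" };
}
```

The live metrics in section 3.4 (0% local) suggest that queries missing the cache currently go straight to the external tier.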

3.2 Vector Database

File: ~/system/tools/vector-db.js
Type: SQLite + Float32Array BLOB columns (custom implementation)
Embedding model: nomic-embed-text (768-dim, local, via Ollama)
Not used: ChromaDB, FAISS, Pinecone, Weaviate, pgvector (everything is custom SQLite)

Capabilities:

  • Semantic search (cosine similarity)
  • Hybrid search (SQL WHERE + vector ranking)
  • Collections with metadata columns
  • Bulk insert with batching (32 docs/batch)
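The TypeScript sketch below covers the storage and query side. It assumes each BLOB holds the raw bytes of a Float32Array in native byte order (little-endian on Apple Silicon). `Row`, `hybridSearch`, `toBatches` and the `collection` column are hypothetical. In vector-db.js the WHERE part would run inside SQLite, so only the surviving rows would need to be decoded and scored.

```typescript
// Sketch: embeddings as Float32Array BLOBs, hybrid search, and insert batching.
function toBlob(vec: number[]): Uint8Array {
  return new Uint8Array(new Float32Array(vec).buffer);
}

function fromBlob(blob: Uint8Array): Float32Array {
  // Copy into a fresh buffer so the Float32Array view is 4-byte aligned.
  const copy = new Uint8Array(blob.length);
  copy.set(blob);
  return new Float32Array(copy.buffer, 0, blob.length / 4);
}

function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface Row {
  id: number;
  collection: string; // example metadata column (assumed)
  embedding: Uint8Array;
}

interface Hit {
  id: number;
  score: number;
}

// Hybrid search: metadata filter (the SQL WHERE part), then cosine ranking, top k.
function hybridSearch(rows: Row[], query: number[], where: (r: Row) => boolean, k: number): Hit[] {
  const q = new Float32Array(query);
  const hits: Hit[] = [];
  for (const r of rows) {
    if (!where(r)) continue;
    hits.push({ id: r.id, score: cosine(q, fromBlob(r.embedding)) });
  }
  hits.sort((x, y) => y.score - x.score);
  return hits.slice(0, k);
}

// Bulk insert helper: split documents into batches (32 per batch on this page).
function toBatches<T>(items: T[], size: number = 32): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}
```

At 4 bytes per float, each 768-dim embedding occupies 768 × 4 = 3,072 bytes of BLOB storage.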

3.3 Knowledge Base (Document Store)

File: ~/system/tools/knowledge-base.js
DB: ~/system/databases/knowledge.db (240 KB)

Features:

  • URL/file ingestion (HTML -> text conversion)
  • Auto-chunking (1000 chars, markdown-aware)
  • Tagging system
  • Deduplication (SHA-256 content hash)
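Below is one way to implement markdown-aware chunking with a 1000-character budget, sketched in TypeScript. The page does not specify what "markdown-aware" means in knowledge-base.js. This version assumes four rules:

- Split on blank lines.
- Start a new chunk at every heading.
- Pack paragraphs greedily up to the budget.
- Hard-split only paragraphs that exceed the budget on their own.

`chunkMarkdown` is a hypothetical name.

```typescript
// Sketch of markdown-aware chunking with a character budget (assumed rules, see above).
function chunkMarkdown(text: string, maxChars: number = 1000): string[] {
  const blocks = text.split(/\n\s*\n/); // blank-line separated blocks
  const chunks: string[] = [];
  let current = "";

  const flush = () => {
    if (current.trim().length > 0) chunks.push(current.trim());
    current = "";
  };

  for (const raw of blocks) {
    const block = raw.trim();
    if (block.length === 0) continue;
    if (/^#{1,6}\s/.test(block)) flush(); // a heading always opens a new chunk
    if (block.length > maxChars) {
      // Oversized block: emit it in maxChars-sized slices.
      flush();
      for (let i = 0; i < block.length; i += maxChars) {
        chunks.push(block.slice(i, i + maxChars));
      }
      continue;
    }
    const candidate = current.length === 0 ? block : current + "\n\n" + block;
    if (candidate.length > maxChars) {
      flush(); // adding this block would exceed the budget
      current = block;
    } else {
      current = candidate;
    }
  }
  flush();
  return chunks;
}
```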

Schema:

  • kb_docs - metadata (source, title, tag, hash, chunk count)
  • documents - vector-indexed chunks

3.4 RAG Flywheel (Cache + Learning)

File: ~/system/tools/rag-router.js
DB: ~/system/databases/flywheel.db (9.1 MB)
MCP server: ~/system/tools/rag-mcp.js (registered in ~/.claude/mcp.json)

Flywheel metrics (live, 2026-02-21):

Metric              Value
Total queries       64
Cache hit rate      54.7%
Local model rate    0%
External rate       45.3%
Cache size          1,840 entries
Training queue      36 pending

MCP tools (available from a Claude Code session):

  • rag_query(query, task_type) - Routes a query through cache -> local -> external
  • rag_learn(question, answer) - Adds a Q&A pair to the cache
  • rag_stats() - Flywheel metrics
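Calling one of these tools from Claude Code sends an MCP `tools/call` request under the hood. The TypeScript sketch below builds that message for rag_query and extracts the text from a result. The JSON-RPC envelope and the `content` array follow the Model Context Protocol specification. The helper names and the example `task_type` value are illustrative; this page does not list the accepted task types.

```typescript
// Sketch: the JSON-RPC 2.0 message an MCP client sends to invoke rag_query,
// and extraction of text parts from the tool result.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: { [key: string]: unknown } };
}

// Tool results carry a `content` array; text parts have type "text".
interface ToolCallResult {
  content: { type: string; text?: string }[];
  isError?: boolean;
}

function ragQueryCall(id: number, query: string, taskType: string): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: id,
    method: "tools/call",
    params: { name: "rag_query", arguments: { query: query, task_type: taskType } },
  };
}

function resultText(result: ToolCallResult): string {
  const parts: string[] = [];
  for (const item of result.content) {
    if (item.type === "text" && item.text !== undefined) parts.push(item.text);
  }
  return parts.join("\n");
}
```

A stdio server such as rag-mcp.js receives each such message as a single line of JSON.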

DB Schema (flywheel.db):

  • interactions - every query is logged (model, routing, cost, latency)
  • rag_cache - Q&A pairs with embeddings (query_embedding BLOB, response, hit_count)
  • shadow_log - routing decisions + top 3 similarity scores
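The percentages in the flywheel metrics table are plain ratios over `interactions`. Of 64 queries, 35 were cache hits and 29 were external calls, giving 35/64 ≈ 54.7% and 29/64 ≈ 45.3%. The TypeScript sketch below performs that aggregation. It assumes a `routing` column with the values "cache", "local" and "external"; the actual values stored in flywheel.db may differ.

```typescript
// Sketch: deriving the flywheel rates from interaction rows. Only the columns
// needed here are modelled; the routing values are an assumption.
type Routing = "cache" | "local" | "external";

interface Interaction {
  model: string;
  routing: Routing;
}

interface FlywheelStats {
  total: number;
  cacheRate: string;
  localRate: string;
  externalRate: string;
}

// Percentage with one decimal, e.g. pct(35, 64) -> "54.7%".
function pct(part: number, total: number): string {
  return total === 0 ? "0.0%" : ((100 * part) / total).toFixed(1) + "%";
}

function computeStats(rows: Interaction[]): FlywheelStats {
  let cache = 0;
  let local = 0;
  let external = 0;
  for (const r of rows) {
    if (r.routing === "cache") cache++;
    else if (r.routing === "local") local++;
    else external++;
  }
  const total = rows.length;
  return {
    total: total,
    cacheRate: pct(cache, total),
    localRate: pct(local, total),
    externalRate: pct(external, total),
  };
}
```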

4. HiveMind (Shared Memory Bus)

File: ~/system/agents/hivemind/hivemind.js
DB: ~/system/databases/hivemind.db (3.7 MB)
Type: SQLite with keyword search (NOT vector search)

Not RAG: HiveMind is a message bus plus key-value store for inter-agent communication.

Live stats (2026-02-21):

Metric                 Value
Total intel entries    11,796
Entries today          667
Active agents (1h)     6
Alerts today           24

Schema:

  • intel - agent messages (agent, type, message, data, priority)
  • agents - registered agents (name, role, status)
  • subscriptions - agent topic subscriptions
  • memos - key-value memory (key, value, access_count)

Intel types: discovery, alert, opportunity, update, request, response, learning, error

Retention: 90 days for intel, 7 days for event files
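The TypeScript sketch below shows how an `intel` row, keyword search and the 90-day retention rule fit together. Field names follow the schema above. Several details are assumptions, not documented behaviour of hivemind.js:

- the `createdAt` millisecond timestamp;
- the numeric `priority`;
- the all-words-must-match search rule.

```typescript
// Sketch of HiveMind intel records, keyword search, and 90-day retention.
type IntelType =
  | "discovery" | "alert" | "opportunity" | "update"
  | "request" | "response" | "learning" | "error";

interface Intel {
  agent: string;
  type: IntelType;
  message: string;
  data?: string;     // free-form payload (assumed to be text/JSON)
  priority: number;  // assumed numeric
  createdAt: number; // epoch milliseconds (assumed)
}

const DAY_MS = 24 * 60 * 60 * 1000;
const INTEL_RETENTION_DAYS = 90;

// Keyword search: every query word must appear in the message (case-insensitive),
// similar in spirit to SQL `message LIKE '%word%' AND ...`.
function keywordSearch(entries: Intel[], query: string): Intel[] {
  const words = query.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  return entries.filter((e) => {
    const msg = e.message.toLowerCase();
    return words.every((w) => msg.indexOf(w) !== -1);
  });
}

// Entries older than the retention window are eligible for deletion.
function expired(entries: Intel[], now: number): Intel[] {
  const cutoff = now - INTEL_RETENTION_DAYS * DAY_MS;
  return entries.filter((e) => e.createdAt < cutoff);
}
```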


5. Claude API (Anthropic)

Primary AI: Claude Code (Opus for sessions, Sonnet/Haiku for agents)

Direct API integration:

  • ~/system/tools/comms-agent/claude-handler.ts - Anthropic SDK wrapper for automatic replies
  • ~/system/tools/comms-responder.js - Communications agent

There is no OpenAI API integration in the system.
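For reference, the TypeScript sketch below shows roughly what an SDK wrapper such as claude-handler.ts sends on the wire. The endpoint, headers and body fields follow Anthropic's public Messages API. The function name, the system prompt and the 1024-token limit are illustrative assumptions. The model id is a parameter because this page does not say which ids the handler uses.

```typescript
// Sketch of a raw Anthropic Messages API request (no network call made here).
interface MessagesRequest {
  url: string;
  headers: { [name: string]: string };
  body: {
    model: string;
    max_tokens: number;
    system?: string;
    messages: { role: "user" | "assistant"; content: string }[];
  };
}

function buildReplyRequest(apiKey: string, model: string, incoming: string): MessagesRequest {
  return {
    url: "https://api.anthropic.com/v1/messages",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: {
      model: model,
      max_tokens: 1024, // illustrative limit
      system: "Draft a short, polite reply to the message.", // illustrative prompt
      messages: [{ role: "user", content: incoming }],
    },
  };
}
```

With the official TypeScript SDK, the same `body` is what gets passed to `client.messages.create(...)`. The SDK adds the URL and headers itself.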


6. MCP Servers

Server               File / package                        Purpose
rag                  ~/system/tools/rag-mcp.js             RAG query/learn/stats
email                ~/system/tools/email-mcp-bridge.js    Email operations (2 accounts)
youtube-transcript   @fabriqa.ai/youtube-transcript-mcp    YouTube transcripts
playwright           @playwright/mcp                       Browser automation
figma                @anthropic-ai/figma-mcp               Figma design access

7. Fine-tuned models (ALAI ML)

Three custom models trained on internal data:

Model              Base                      Purpose                               Size
alaiml-task-v1     llama3.1:8b (Modelfile)   MC task classification and handling   986 MB
alaiml-tender-v1   llama3.1:8b (Modelfile)   Tender analysis and filtering         986 MB
alaiml-email-v1    llama3.1:8b (Modelfile)   Email classification and triage       986 MB

Retrain daemon: com.john.alaiml-retrain (LaunchAgent)


8. AutoCoder (Python Agent Framework)

Path: ~/system/services/autocoder/
Components:

  • agent.py - Main agent logic
  • agent_classifier.py - Task classification
  • parallel_orchestrator.py - Multi-agent orchestration (53 KB)
  • mcp_server/ - MCP server

UI: LaunchAgent com.john.autocoder-ui (port 8888)
Status: Installed; used optionally in build mode.


9. Databases (all SQLite)

Database             Size     Purpose                                  Has vectors?
flywheel.db          9.1 MB   RAG cache + interaction log + routing    YES (BLOB)
hivemind.db          3.7 MB   Agent memory bus + memos                 NO
knowledge.db         240 KB   Document store with chunks               YES (BLOB)
mission-control.db   2.2 MB   Task management                          NO
events.db            3.2 MB   Event bus                                NO
contacts.db          49 KB    Contacts                                 NO
invoices.db          37 KB    Invoices                                 NO

No external vector databases (ChromaDB, FAISS, Pinecone, Weaviate, Qdrant, pgvector).


10. What EXISTS vs what does NOT EXIST

Exists (verified)

  • 7 local Ollama models (including 3 fine-tuned)
  • Embedding model (nomic-embed-text, 768-dim, local)
  • Custom vector DB (SQLite + BLOB, cosine similarity)
  • 3-tier RAG routing with flywheel cache (54.7% hit rate)
  • Knowledge base with document ingestion and chunking
  • Tier router for task->model dispatch (6 tiers)
  • HiveMind shared memory (11,796+ entries)
  • 5 MCP servers (RAG, email, YouTube, Playwright, Figma)
  • 3 ALAI fine-tuned models
  • Usage tracking for all AI calls
  • Claude API integration (comms-agent)

Does NOT exist

  • No external/cloud vector databases (ChromaDB, Pinecone, Weaviate, ...)
  • No OpenAI API
  • No LangChain / LlamaIndex (custom implementation)
  • No cloud embeddings (everything is local)
  • llama3.1:70b and qwen2.5:72b are configured but NOT installed
  • BookStack is NOT part of the RAG pipeline (human-readable wiki only)

11. Architectural principle

Cost-optimized hybrid: cache first -> local models second -> cloud API last.

  • All embeddings are local (Ollama nomic-embed-text)
  • All vector storage lives in SQLite BLOB columns
  • No cloud dependencies for RAG
  • The Claude API is used only for what local models cannot handle
  • Fine-tuned models cover repetitive domain tasks (email, tender, MC tasks)