
AI Model & RAG Architecture

Overview of all AI models and RAG (Retrieval-Augmented Generation) components in the ALAI system. Date: 2026-02-21 (updated 2026-02-23). Source: verified inventory from the filesystem and running services. Last update: RAG System Upgrade (MC #1804): unified embedding, HiveMind vector search, retrieval orchestrator, session archiver.


One-page overview

+---------------------------------------------------------------+
|                CLAUDE CODE (Opus/Sonnet/Haiku)                |
|                  Primary orchestrator - John                  |
|          (Anthropic API, cloud, context up to 200K)           |
+---------------------------------------------------------------+
                             |
          +------------------+------------------+
          v                  v                  v
   +--------------+   +--------------+   +-----------------+
   |  RAG Router  |   |  Tier Router |   |  MCP Servers    |
   |  (rag-mcp)   |   |  (6 tiers)   |   |  email, figma,  |
   |              |   |              |   |  playwright, yt |
   +----+----+----+   +------+-------+   +-----------------+
        |    |               |
    +---+    +---+           v
    v            v      +-----------------+
+--------+  +---------+ |     OLLAMA      |
| Cache  |  |   KB    | | localhost:11434 |
|flywheel|  |knowledge| +--------+--------+
|  .db   |  |   .db   |          |
+--------+  +---------+          v
                        +-----------------+
                        | 7 local models  |
                        +-----------------+

+---------------------------------------------------------------+
|                    RETRIEVAL ORCHESTRATOR                     |
|                   retrieval-orchestrator.js                   |
| Parallel query -> HiveMind + KB + RAG + Sessions -> RRF merge |
+---------------------------------------------------------------+
    |             |              |              |
    v             v              v              v
+--------+  +---------+   +--------+   +----------+
|HiveMind|  |Knowledge|   |  RAG   |   | Sessions |
|semantic|  |   DB    |   | Cache  |   |  (grep)  |
| 13,473 |  | 24,636  |   | 2,201  |   |   761    |
+--------+  +---------+   +--------+   +----------+

+---------------------------------------------------------------+
|                       BOOKSTACK (Wiki)                        |
|             http://localhost:6875 - documentation             |
|         NOT part of the RAG pipeline (read by humans)         |
+---------------------------------------------------------------+

1. Local AI models (Ollama)

Server: http://localhost:11434
Hardware: Mac Studio M3 Ultra, 96 GB RAM
LaunchAgent: homebrew.mxcl.ollama
Config: ~/system/config/ollama.json

Installed models (ollama list, 2026-02-21)

Model | Size | Purpose | Status
llama3.1:8b | 4.9 GB | Fast classify/extract/filter (Tier 1) | ACTIVE
qwen2.5-coder:32b | 19 GB | Code review, debugging, refactoring (Tier 2c) | ACTIVE
nomic-embed-text | 274 MB | Embeddings: 768-dim vectors for RAG | ACTIVE
alaiml-task-v1 | 986 MB | Fine-tuned for MC task handling (Tier 2t) | ACTIVE
alaiml-tender-v1 | 986 MB | Fine-tuned for tender analysis | ACTIVE
alaiml-email-v1 | 986 MB | Fine-tuned for email classification | ACTIVE
llama-guard3:8b | 4.9 GB | Content safety / guardrails | ACTIVE

Configured but NOT installed

Model | Reason | Note
llama3.1:70b | 42 GB: does not always fit in RAM | In config as Tier 3 (complex reasoning)
qwen2.5:72b | 47 GB: does not always fit in RAM | In config as Tier 2 (general)

Wrapper tools:

  • ~/system/tools/ollama-engine.js: HTTP wrapper for generate/classify
  • ~/system/tools/ollama-tool-agent.js: multi-turn agent with READ-ONLY tools
  • ~/system/tools/agent-runner.js: agent lifecycle (identity -> state -> HiveMind -> Ollama -> save)
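As a rough illustration of what an HTTP wrapper such as ollama-engine.js does, the sketch below calls Ollama's /api/generate endpoint with streaming disabled. The helper name ollamaGenerate and the injectable fetchImpl parameter are hypothetical and not taken from ollama-engine.js; fetchImpl exists only so the call can be exercised without a running server.

```javascript
// Minimal sketch of a non-streaming call to Ollama's /api/generate endpoint.
// Hypothetical helper: the real ollama-engine.js may be structured differently.
const OLLAMA_URL = 'http://localhost:11434';

async function ollamaGenerate(prompt, { model = 'llama3.1:8b', fetchImpl = fetch } = {}) {
  const res = await fetchImpl(`${OLLAMA_URL}/api/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // stream: false makes Ollama return a single JSON object instead of NDJSON chunks
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  const data = await res.json();
  return data.response; // the generated text
}
```

With a live server (Node 18+ provides a global fetch), a Tier 1 call would look like `await ollamaGenerate('Classify this email: ...')`.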

2. Tier Routing (Task -> Model dispatch)

File: ~/system/tools/tier-router.js
Config: ~/system/config/tier-routing.json

Every AI request goes through routing that decides which model processes it:

Tier | Engine | Model | Purpose
1 | Ollama | llama3.1:8b | Trivial: classify, filter, extract
2 | Ollama | qwen2.5:72b* | Medium: summarize, draft, analyze
2c | Ollama | qwen2.5-coder:32b | Code: review, debug, simple fix
2t | Ollama | alaiml-task-v1 | Task-specific: MC task handling
3 | Ollama | llama3.1:70b* | Complex reasoning (NO code execution)
4 | Human Queue | - | Critical: multi-file, architecture, decisions

* The Tier 2 and Tier 3 models are not currently installed; requests for these tiers fall back to Tier 2c.

Routing logic

  1. Caller-based: each daemon/agent has a fixed tier:
    • email-agent, pipeline-watcher -> Tier 1
    • morning-routine, explore -> Tier 2
    • autowork-standard, validator -> Tier 2c
    • builder, interactive -> Tier 4 (human/Claude)
  2. Keyword fallback: scans the task text for a keyword match
  3. Default: Tier 2
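The three routing rules can be sketched as a single function. The caller-to-tier map copies the examples listed above; the keyword lists are hypothetical placeholders (the real ones live in tier-routing.json and are not reproduced in this document), and the fallback of the uninstalled Tier 2 and Tier 3 to Tier 2c follows the note under the tier table. The function name routeTier is likewise illustrative.

```javascript
// Sketch of tier selection: caller-based -> keyword fallback -> default.
// CALLER_TIERS mirrors the examples in this document; KEYWORD_TIERS is hypothetical.
const CALLER_TIERS = new Map([
  ['email-agent', '1'],
  ['pipeline-watcher', '1'],
  ['morning-routine', '2'],
  ['explore', '2'],
  ['autowork-standard', '2c'],
  ['validator', '2c'],
  ['builder', '4'],
  ['interactive', '4'],
]);

// Hypothetical keyword -> tier rules, checked in order.
const KEYWORD_TIERS = [
  { tier: '2c', keywords: ['review', 'debug', 'refactor'] },
  { tier: '1', keywords: ['classify', 'filter', 'extract'] },
];

// Tiers whose models are configured but not installed fall back to Tier 2c.
const UNINSTALLED_FALLBACK = { '2': '2c', '3': '2c' };

function routeTier(caller, taskText = '') {
  let requested = CALLER_TIERS.get(caller); // 1. caller-based
  if (!requested) {
    const text = taskText.toLowerCase();
    const rule = KEYWORD_TIERS.find((r) => r.keywords.some((k) => text.includes(k))); // 2. keywords
    requested = rule ? rule.tier : '2'; // 3. default: Tier 2
  }
  return { requested, effective: UNINSTALLED_FALLBACK[requested] || requested };
}
```

Returning both the requested and the effective tier keeps the routing decision visible in logs even while Tier 2 and Tier 3 are being served by Tier 2c.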

3. RAG System (Retrieval-Augmented Generation)

3.1 Architecture (v2, 2026-02-23)

                       Query arrives
                             |
                             v
               +----------------------------+
               |   Retrieval Orchestrator   |  (retrieval-orchestrator.js)
               |   multi-store, parallel    |
               +----+-----+------+------+---+
                    |     |      |      |
          +---------+     |      |      +-------+
          v               v      v              v
    +-----------+  +-------+  +--------+  +-----------+
    |  HiveMind |  |  KB   |  |  RAG   |  |  Sessions |
    |  semantic |  | docs  |  | cache  |  |   grep    |
    |  13,473   |  |24,636 |  | 2,201  |  |    761    |
    +-----------+  +-------+  +--------+  +-----------+
          |            |          |             |
          +------------+-----+----+-------------+
                             v
                     +---------------+
                     |   RRF Merge   |  Reciprocal Rank Fusion (k=60)
                     |  Deduplicate  |
                     +-------+-------+
                             |
                             v
                       Top N results

Retrieval flow:

  1. Embed the query once (nomic-embed-text, 768-dim)
  2. Query all 4 stores in parallel (HiveMind semantic, Knowledge DB, RAG Cache, Sessions grep)
  3. RRF merge: Reciprocal Rank Fusion combines the rankings from all sources
  4. Return the top N results with their RRF score and source attribution
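Reciprocal Rank Fusion scores each document as the sum of 1/(k + rank) over every store that returned it, using k = 60 as stated for the merge step. The sketch below is a generic RRF implementation, not the orchestrator's code: the input shape (a best-first list of { id, text } per store) and the use of id for deduplication are assumptions.

```javascript
// Sketch of Reciprocal Rank Fusion (RRF) across several ranked result lists.
// Each input list is ordered best-first; items are deduplicated by `id` (assumed field).
function rrfMerge(resultsByStore, { k = 60, limit = 5 } = {}) {
  const fused = new Map(); // id -> { id, text, score, sources }
  for (const [store, results] of Object.entries(resultsByStore)) {
    results.forEach((item, index) => {
      const rank = index + 1; // RRF uses 1-based ranks
      const entry = fused.get(item.id) || { id: item.id, text: item.text, score: 0, sources: [] };
      entry.score += 1 / (k + rank);
      entry.sources.push(store); // source attribution
      fused.set(item.id, entry);
    });
  }
  return [...fused.values()].sort((a, b) => b.score - a.score).slice(0, limit);
}
```

Because RRF only uses ranks, it can combine cosine scores, LIKE matches, and grep hits without normalizing their incompatible raw scores; a document found by several stores naturally rises to the top.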

Inspired by: Spring AI Modular RAG (RetrievalAugmentationAdvisor + MultiQueryExpander + ConcatenationDocumentJoiner)

3.2 Retrieval Orchestrator (NEW, 2026-02-23)

File: ~/system/tools/retrieval-orchestrator.js
MC Task: #1804

Central entry point for all retrieval in the system. Instead of a manual lookup chain ("BookStack FIRST -> HiveMind -> etc."), the orchestrator automatically searches all stores in parallel and returns ranked results.

CLI:

    node retrieval-orchestrator.js query "topic" [--limit N] [--verbose] [--stores s1,s2]
    node retrieval-orchestrator.js stats
    node retrieval-orchestrator.js stores
    

Module:

    const { RetrievalOrchestrator } = require('./retrieval-orchestrator');
    (async () => {
      const ro = new RetrievalOrchestrator();
      const { results, meta } = await ro.query('topic', { limit: 5 });
      console.log(meta, results);
    })();
    

Stores:

Store | Search type | Entries | Source
hivemind | Cosine similarity + LIKE fallback | 13,473 | hivemind.db
knowledge | Cosine similarity (vector-db.js) | 24,636 | knowledge.db
rag | Cosine similarity on the RAG cache | 2,201 | flywheel.db
sessions | Grep text search | 761 files | ~/system/memory/sessions/
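A parallel fan-out over these stores might look like the sketch below: every store is queried concurrently with Promise.allSettled, so one failing store (for example, Ollama being down) does not fail the whole request. The store interface ({ name, search(query, limit) } returning a best-first list of { id, text }) and the function name queryAllStores are assumptions for illustration; the per-store lists it returns would then go through the RRF merge.

```javascript
// Sketch of a parallel fan-out across retrieval stores.
// Assumed store interface: { name, search(query, limit) -> Promise<Array<{ id, text }>> }.
async function queryAllStores(stores, query, { limit = 5 } = {}) {
  const started = Date.now();
  // Promise.resolve().then(...) turns synchronous throws into rejections as well.
  const settled = await Promise.allSettled(
    stores.map((s) => Promise.resolve().then(() => s.search(query, limit))),
  );
  const byStore = {};
  const failed = [];
  settled.forEach((outcome, i) => {
    const name = stores[i].name;
    if (outcome.status === 'fulfilled') {
      byStore[name] = outcome.value;
    } else {
      failed.push({ store: name, error: String(outcome.reason) });
    }
  });
  return { byStore, meta: { failed, tookMs: Date.now() - started } };
}
```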

3.3 Vector Database

File: ~/system/tools/vector-db.js
Type: SQLite + Float32Array BLOB columns (custom implementation)
Embedding model: nomic-embed-text (768-dim, local, via Ollama)
Not used: ChromaDB, FAISS, Pinecone, Weaviate, pgvector; everything is custom SQLite

UNIFIED EMBEDDING (2026-02-23): all tools use the SAME model (nomic-embed-text via Ollama):

  • vector-db.js: JS module (original)
  • memory-indexer.py: Python indexer (rewritten; previously used sentence-transformers)
  • hivemind.js: HiveMind embeddings (new)
  • session-archiver.js: session embeddings (new)
  • rag-router.js: RAG cache embeddings (original)

Previously, memory-indexer.py used all-MiniLM-L6-v2 (384-dim). Its vectors lived in a different vector space, with a different dimension, from the 768-dim nomic-embed-text vectors, so cosine similarity between the two was meaningless. Fixed in MC #1804.
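Because cosine similarity is only meaningful between vectors from the same model, a cheap safeguard is to check the dimension of every embedding before it is stored or compared. The sketch uses Ollama's /api/embed endpoint (request { model, input }, response { embeddings: [[...]] }); this document does not say whether the tools call /api/embed or the older /api/embeddings endpoint, and the helper name embed is hypothetical.

```javascript
// Sketch: embed text with nomic-embed-text via Ollama and reject unexpected dimensions.
const EMBED_MODEL = 'nomic-embed-text';
const EXPECTED_DIM = 768;

async function embed(text, { fetchImpl = fetch, baseUrl = 'http://localhost:11434' } = {}) {
  const res = await fetchImpl(`${baseUrl}/api/embed`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: EMBED_MODEL, input: text }),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const { embeddings } = await res.json();
  const vector = embeddings[0];
  // Guard against mixing vector spaces, e.g. a 384-dim all-MiniLM-L6-v2 vector.
  if (vector.length !== EXPECTED_DIM) {
    throw new Error(`expected ${EXPECTED_DIM}-dim vector, got ${vector.length}`);
  }
  return Float32Array.from(vector);
}
```

A dimension check would have caught the old 384-dim vs. 768-dim mismatch, but it cannot tell apart two different models that happen to share a dimension; storing the model name alongside each vector is the more complete guard.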

Capabilities:

  • Semantic search (cosine similarity)
  • Hybrid search (SQL WHERE + vector ranking)
  • Collections with metadata columns
  • Bulk insert with batching (32 docs/batch)
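Storing vectors as Float32Array BLOBs means converting between a typed array and a Node Buffer on every write and read. The sketch shows that round trip, a plain cosine similarity, and the 32-per-batch grouping used for bulk inserts. The helper names (toBlob, fromBlob, cosineSimilarity, batches) are hypothetical and not taken from vector-db.js.

```javascript
// Float32Array -> Buffer, suitable for binding to a SQLite BLOB column.
function toBlob(vector) {
  const f32 = Float32Array.from(vector);
  return Buffer.from(f32.buffer, f32.byteOffset, f32.byteLength);
}

// Buffer -> Float32Array. Copying first avoids alignment errors, because a
// Buffer's byteOffset inside its ArrayBuffer is not guaranteed to be a multiple of 4.
function fromBlob(blob) {
  const copy = new Uint8Array(blob);
  return new Float32Array(copy.buffer);
}

function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: treat as no similarity
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Yield consecutive slices of at most `size` items (32 per batch by default).
function* batches(items, size = 32) {
  for (let i = 0; i < items.length; i += size) yield items.slice(i, i + size);
}
```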

3.4 Knowledge Base (Document Store)

File: ~/system/tools/knowledge-base.js
DB: ~/system/databases/knowledge.db (~50 MB)

Size (2026-02-23): 24,636 entries (13,558 documents + 11,075 memory-file chunks + 3 session chunks)

Features:

  • URL/file ingestion (HTML -> text conversion)
  • Auto-chunking (1000 chars, markdown-aware)
  • Tagging system
  • Deduplication (SHA-256 content hash)
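The chunking and deduplication steps can be sketched as follows. The markdown-aware rule used here (start a new chunk at each heading, pack paragraphs up to 1000 characters, hard-split anything longer) is an assumption about what "markdown-aware" means in knowledge-base.js; the function names are hypothetical. The SHA-256 hash uses Node's built-in crypto module.

```javascript
const { createHash } = require('node:crypto');

const MAX_CHUNK = 1000; // characters, matching the documented chunk size

// Hypothetical markdown-aware chunker: splits on blank lines, starts a new chunk
// at every heading, and packs paragraphs together until MAX_CHUNK would be exceeded.
function chunkMarkdown(text, max = MAX_CHUNK) {
  const chunks = [];
  let current = '';
  const flush = () => {
    if (current.trim()) chunks.push(current.trim());
    current = '';
  };
  for (const para of text.split(/\n\s*\n/)) {
    if (/^#{1,6}\s/.test(para)) flush(); // a heading opens a new chunk
    if (para.length > max) {
      // Oversized paragraph: emit it as fixed-size slices.
      flush();
      for (let i = 0; i < para.length; i += max) chunks.push(para.slice(i, i + max));
      continue;
    }
    if (current && current.length + 2 + para.length > max) flush(); // +2 for "\n\n"
    current = current ? `${current}\n\n${para}` : para;
  }
  flush();
  return chunks;
}

// SHA-256 over the raw content; identical documents produce identical hashes.
function contentHash(text) {
  return createHash('sha256').update(text, 'utf8').digest('hex');
}

// Keep only the first occurrence of each distinct content hash.
function dedupe(docs) {
  const seen = new Set();
  return docs.filter((doc) => {
    const hash = contentHash(doc);
    if (seen.has(hash)) return false;
    seen.add(hash);
    return true;
  });
}
```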

Schema:

  • kb_docs: metadata (source, title, tag, hash, chunk count)
  • documents: vector-indexed chunks (content, embedding BLOB, tag)

Tags:

Tag category | Example tags | Entries
memory-file | All ~/system/ MD files | 11,075
Projects | lumiscare, drop, drop-architecture | ~8,000
Patterns | pattern-security, pattern-architecture | ~500
System | agents, system, rules, organization | ~900
Sessions | session | 3+ (growing)

Two indexers:

  • knowledge-base.js: URL/file ingestion with auto-chunking, tagging, dedup
  • memory-indexer.py: ~/system/ MD file scanner, batch embedding, tag='memory-file'

3.5 RAG Flywheel (Cache + Learning)

File: ~/system/tools/rag-router.js
DB: ~/system/databases/flywheel.db (~10 MB)
MCP server: ~/system/tools/rag-mcp.js, registered in ~/.claude/mcp.json

Flywheel metrics (live, 2026-02-23):

Metric | Value
Total queries | 886
Cache hit rate | 61.1%
Local model rate | 4.4%
External rate | 34.5%
Cache size | 2,201 entries
Cost-saved queries | 580
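The rates above appear to be shares of the total query count, and the 580 cost-saved queries are consistent with counting cache hits plus local answers: 886 × (0.611 + 0.044) ≈ 580. The sketch computes the metrics from per-route counts under that assumption. The route names and the helper flywheelMetrics are hypothetical, and the counts in the example (541 cache, 39 local, 306 external) are back-calculated from the published percentages, not read from flywheel.db.

```javascript
// Sketch: derive flywheel metrics from per-route query counts.
// Assumption: "cost saved" = queries answered without an external (Claude) call.
function flywheelMetrics(counts) {
  const { cache = 0, local = 0, external = 0 } = counts;
  const total = cache + local + external;
  // Percentage rounded to one decimal place; 0 when there are no queries yet.
  const pct = (n) => (total === 0 ? 0 : Math.round((n / total) * 1000) / 10);
  return {
    totalQueries: total,
    cacheHitRate: pct(cache),
    localModelRate: pct(local),
    externalRate: pct(external),
    costSavedQueries: cache + local,
  };
}

// Back-calculated example counts (approximate, see above).
const m = flywheelMetrics({ cache: 541, local: 39, external: 306 });
// returns { totalQueries: 886, cacheHitRate: 61.1, localModelRate: 4.4, externalRate: 34.5, costSavedQueries: 580 }
```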

MCP tools (available from a Claude Code session):

  • mcp__rag__rag_query(query, task_type): route a query through cache -> local -> external
  • mcp__rag__rag_learn(question, answer): add a Q&A pair to the cache
  • mcp__rag__rag_stats(): flywheel metrics

RAG Router flow (Progressive Enrichment):

  1. Cache search: cosine similarity against rag_cache (threshold 0.75)
  2. Local RAW: Ollama without KB context, confidence gate (0.75+)
  3. Local ENRICHED: Ollama WITH knowledge.db context
  4. External: flag for Claude Code
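The four steps form a cascade in which each stage runs only when the previous one was not good enough. In the sketch, the two 0.75 thresholds come from this section; the stage functions (searchCache, askLocal, loadKbContext) are hypothetical dependencies passed in by the caller, and how rag-router.js actually computes a local model's confidence is not described here. The document also does not say whether the enriched answer must pass the same confidence gate; the sketch assumes it does.

```javascript
// Sketch of progressive enrichment: cache -> local raw -> local enriched -> external flag.
const CACHE_THRESHOLD = 0.75;
const CONFIDENCE_GATE = 0.75;

async function routeQuery(query, deps) {
  // 1. Cache: best cosine match in rag_cache ({ similarity, response } or null)
  const hit = await deps.searchCache(query);
  if (hit && hit.similarity >= CACHE_THRESHOLD) {
    return { route: 'cache', answer: hit.response };
  }
  // 2. Local RAW: Ollama without knowledge-base context ({ answer, confidence })
  const raw = await deps.askLocal(query, null);
  if (raw.confidence >= CONFIDENCE_GATE) {
    return { route: 'local_raw', answer: raw.answer };
  }
  // 3. Local ENRICHED: same call, now with knowledge.db context
  const context = await deps.loadKbContext(query);
  const enriched = await deps.askLocal(query, context);
  if (enriched.confidence >= CONFIDENCE_GATE) {
    return { route: 'local_enriched', answer: enriched.answer };
  }
  // 4. External: no local answer; flag the query for Claude Code
  return { route: 'external', answer: null };
}
```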

DB schema (flywheel.db):

  • interactions: every query is logged (model, routing, cost, latency)
  • rag_cache: Q&A pairs with embeddings (query_embedding BLOB, response, hit_count, project_tag)
  • shadow_log: routing decisions + top 3 similarity scores

3.6 Session Archiver (NEW, 2026-02-23)

File: ~/system/tools/session-archiver.js
LaunchAgent: com.john.session-archiver (daily at 03:00)

Manages the lifecycle of session files: keeps the summaries and strips the raw transcripts.

Commands:

    node session-archiver.js stats                    # Statistics
    node session-archiver.js archive [--dry-run]      # Strip raw transcripts older than 14 days
    node session-archiver.js index [--limit N]        # Embed summaries into knowledge.db
    node session-archiver.js cleanup [--dry-run]      # Archive + index (cron)

Stats (2026-02-23):

  • 761 session files, 688 of them with a raw transcript
  • 21.5 MB total, of which 20 MB (93%) is raw transcript
  • ~20 MB estimated savings from archiving
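The archive step boils down to picking session files older than 14 days that still contain a raw transcript, and totaling how many bytes stripping them would free. The sketch does only that planning over in-memory records; the record shape ({ file, mtime, rawBytes }) and the function name planArchive are assumptions, and the real tool also rewrites the files. A --dry-run mode would presumably report such a plan without touching anything.

```javascript
// Sketch: plan which session files should have their raw transcript stripped.
const DAY_MS = 24 * 60 * 60 * 1000;
const ARCHIVE_AFTER_DAYS = 14;

// sessions: [{ file, mtime (ms since epoch), rawBytes (0 if already stripped) }]
function planArchive(sessions, now = Date.now()) {
  const cutoff = now - ARCHIVE_AFTER_DAYS * DAY_MS;
  const toArchive = sessions.filter((s) => s.rawBytes > 0 && s.mtime < cutoff);
  const savedBytes = toArchive.reduce((sum, s) => sum + s.rawBytes, 0);
  return { files: toArchive.map((s) => s.file), savedBytes };
}
```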

4. HiveMind (Shared Memory Bus + Semantic Search)

File: ~/system/agents/hivemind/hivemind.js
DB: ~/system/databases/agents/hivemind/hivemind.db (~30 MB)
Type: SQLite; keyword search plus vector search (since MC #1804)

HiveMind is not a RAG system in itself: it is a message bus and key-value store for inter-agent communication. Since 2026-02-23 it also offers semantic vector search.

Live stats (2026-02-23):

Metric | Value
Total intel entries | 13,473
With embeddings | ~13,473 (backfill in progress)
Memos | 670+
Retention | 90 days

Upgrade (MC #1804): HiveMind gained vector search:

  • An embedding BLOB column was added to the intel table
  • Every new post embeds its message automatically (best-effort; skipped if Ollama is down)
  • Three new commands alongside the original keyword query:

Command | Type | Description
query "X" | LIKE | Keyword match (original, backward compatible)
semantic-query "X" | Cosine | Embedding similarity search over the 5000 most recent entries
hybrid-query "X" | LIKE + Cosine, RRF | Reciprocal Rank Fusion merge of both result lists
backfill-embeddings | Batch | Embeds entries that have no vector yet (32/batch)
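Two of these behaviors can be sketched together: embedding a new post on a best-effort basis (storing a null embedding when Ollama is unreachable, so the post itself is never lost) and later backfilling rows without a vector, 32 at a time. The row shape, the embedFn dependency, and both function names are assumptions; persistence to hivemind.db is left out.

```javascript
// Sketch: best-effort embedding on post, plus batched backfill of missing vectors.
async function postIntel(rows, message, embedFn) {
  let embedding = null;
  try {
    embedding = await embedFn(message);
  } catch {
    // Ollama down or failing: keep the post and leave the embedding for the backfill.
  }
  const row = { id: rows.length + 1, message, embedding };
  rows.push(row);
  return row;
}

async function backfillEmbeddings(rows, embedFn, batchSize = 32) {
  const missing = rows.filter((r) => r.embedding === null);
  for (let i = 0; i < missing.length; i += batchSize) {
    const batch = missing.slice(i, i + batchSize);
    // Embed one batch concurrently, then write the vectors back onto the rows.
    const vectors = await Promise.all(batch.map((r) => embedFn(r.message)));
    batch.forEach((r, j) => {
      r.embedding = vectors[j];
    });
  }
  return missing.length; // number of rows that were backfilled
}
```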

Schema:

  • intel: agent messages (agent, type, message, data, priority, embedding BLOB)
  • agents: registered agents (name, role, status)
  • subscriptions: agent topic subscriptions
  • memos: key-value memory (key, value, access_count)

Intel types: discovery, alert, opportunity, update, request, response, learning, error

Retention: 90 days for intel, 7 days for event files


5. Claude API (Anthropic)

Primary AI: Claude Code (Opus for sessions, Sonnet/Haiku for agents)

Direct API integration:

  • ~/system/tools/comms-agent/claude-handler.ts: Anthropic SDK wrapper for automated replies
  • ~/system/tools/comms-responder.js: communications agent

There is no OpenAI API integration in the system.


6. MCP servers

Server | File | Purpose
rag | ~/system/tools/rag-mcp.js | RAG query/learn/stats
email | ~/system/tools/email-mcp-bridge.js | Email operations (2 accounts)
youtube-transcript | @fabriqa.ai/youtube-transcript-mcp | YouTube transcripts
playwright | @playwright/mcp | Browser automation
figma | @anthropic-ai/figma-mcp | Figma design access

7. Fine-tuned models (ALAI ML)

Three custom models trained on internal data:

Model | Base | Purpose | Size
alaiml-task-v1 | llama3.1:8b (Modelfile) | MC task classification and handling | 986 MB
alaiml-tender-v1 | llama3.1:8b (Modelfile) | Tender analysis and filtering | 986 MB
alaiml-email-v1 | llama3.1:8b (Modelfile) | Email classification and triage | 986 MB

Retrain daemon: com.john.alaiml-retrain (LaunchAgent)


8. AutoCoder (Python Agent Framework)

Path: ~/system/services/autocoder/

Components:

  • agent.py: main agent logic
  • agent_classifier.py: task classification
  • parallel_orchestrator.py: multi-agent orchestration (53 KB)
  • mcp_server/: MCP server

UI: LaunchAgent com.john.autocoder-ui (port 8888)
Status: installed; used optionally through build mode.


9. Databases (all SQLite)

Database | Size | Purpose | Has vectors? | Entries
knowledge.db | ~50 MB | Document store (KB + memory-file + sessions) | YES (BLOB, 768-dim) | 24,636
flywheel.db | ~10 MB | RAG cache + interaction log + routing | YES (BLOB, 768-dim) | 2,201 cache + 886 interactions
hivemind.db | ~30 MB | Agent memory bus + memos + semantic search | YES (BLOB, 768-dim) | 13,473
mission-control.db | ~3 MB | Task management | NO | 1,804+ tasks
events.db | ~3 MB | Event bus | NO | -
contacts.db | ~50 KB | Contacts | NO | -
invoices.db | ~40 KB | Invoices | NO | -

Unified embedding model (since 2026-02-23): all 3 vector databases use the SAME model (nomic-embed-text, 768-dim, via Ollama), so there is no embedding mismatch.

There are no external vector databases (ChromaDB, FAISS, Pinecone, Weaviate, Qdrant, pgvector).


10. What EXISTS vs. what does NOT exist

Exists (verified 2026-02-23)

  • 7 local Ollama models (including 3 fine-tuned)
  • Unified embedding model (nomic-embed-text, 768-dim, local), the SAME for all stores
  • Custom vector DB (SQLite + BLOB, cosine similarity)
  • Retrieval Orchestrator: 4-store parallel search with RRF merge (NEW)
  • RAG 3-tier routing with the flywheel cache (61.1% hit rate, 886 queries)
  • Knowledge base: 24,636 entries (documents + memory files + sessions)
  • HiveMind semantic search: cosine + hybrid + backfill (NEW)
  • Session archiver: cleanup + embedding + daily cron (NEW)
  • Tier router for task -> model dispatch (6 tiers)
  • HiveMind shared memory (13,473 intel entries)
  • 5 MCP servers (RAG, email, YouTube, Playwright, Figma)
  • 3 ALAI fine-tuned models
  • Usage tracking for all AI calls
  • Claude API integration (comms-agent)

Does NOT exist

  • No cloud vector databases (ChromaDB, Pinecone, Weaviate...)
  • No OpenAI API
  • No LangChain / LlamaIndex / LanceDB (custom implementation, zero external deps)
  • No cloud embeddings (everything is local)
  • No GraphRAG (too much effort for our scale)
  • No cross-encoder reranking (the Ollama default is considered sufficient)
  • llama3.1:70b and qwen2.5:72b are configured but NOT installed
  • BookStack is NOT part of the RAG pipeline (human-readable wiki only)

11. Architectural principle

Cost-optimized hybrid: cache first -> local models second -> cloud API last.

  • All embeddings are local (Ollama nomic-embed-text, 768-dim)
  • All vector storage lives in SQLite BLOB columns (Float32Array)
  • One embedding model for the whole system, so there is no mismatch
  • No cloud dependencies for RAG
  • The Claude API is used only for what the local models cannot handle
  • Fine-tuned models cover repetitive domain tasks (email, tenders, MC tasks)
  • The retrieval orchestrator unifies all stores behind a single call with RRF merge

Tool-First Protocol (retrieval order)

    BookStack (human wiki) -> RAG MCP (mcp__rag__rag_query) -> Manifest
    -> HiveMind (semantic-query) -> Internet -> Update the database

For programmatic retrieval, node retrieval-orchestrator.js query "topic" automatically searches all stores in parallel.


12. Changelog

Date | Change | MC Task
2026-02-23 | RAG System Upgrade: unified embedding, HiveMind vector search, retrieval orchestrator, session archiver | #1804
2026-02-21 | Initial document created: full system inventory | -