RAG System Architecture

Four-tier Retrieval-Augmented Generation system powering ArgoBox's AI search across all knowledge stores

February 27, 2026

ArgoBox uses a four-tier RAG system that provides AI-powered document search across all knowledge stores. Each tier serves a different audience and security level.

Tiers at a Glance

Tier       Database               Docs    Chunks   Size     Embedding Model                           Dims  Audience
Public     embeddings-index.json  775     —        16.1 MB  text-embedding-3-small (OpenRouter)       1536  Anyone (CF Pages CDN)
Knowledge  rag-store-blog.db      3,778   33,101   290 MB   qwen3-embedding:0.6b (Ollama, local GPU)  1024  Safe for external AI
Vaults     rag-store-vaults.db    8,297   132,151  ~1.1 GB  nomic-embed-text (Ollama, local GPU)      768   Local only
Private    rag-store.db           10,440  166,183  1.5 GB   nomic-embed-text (Ollama, local GPU)      768   Local only

Tier Details

Public Tier

Deployed as a static JSON file to Cloudflare Pages CDN. Contains sanitized blog posts, journal entries, documentation, project pages, and learning content. All content passes through the Galactic Identity System before embedding.

Uses OpenAI's text-embedding-3-small via OpenRouter (1536 dimensions). Rebuilt automatically during npm run build only when the content hash changes — costs ~$0.01 per rebuild.

Collections: posts (263 chunks), docs (347), journal (136), projects (11), learn (5)
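The hash-gated rebuild can be sketched as follows. This is an illustrative sketch only — the function names and caching scheme are assumptions, not the actual contents of scripts/auto-embeddings.js:

```typescript
import { createHash } from "node:crypto";

// Fingerprint all source documents; the real script persists the
// previous fingerprint between builds (names here are assumptions).
function contentHash(docs: string[]): string {
  const h = createHash("sha256");
  for (const doc of docs) h.update(doc);
  return h.digest("hex");
}

// Rebuild embeddings only when the fingerprint differs from the one
// recorded by the previous build (null on the first run).
function needsRebuild(docs: string[], previousHash: string | null): boolean {
  return contentHash(docs) !== previousHash;
}
```

Because the check is a single hash comparison, an unchanged site skips the OpenRouter call entirely, which is what keeps Cloudflare build times fast.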

Knowledge / Safe Tier

SQLite database with all vault knowledge, sanitized via identity_map.json. Safe to use with external AI providers — no real hostnames, IPs, or personal info exposed.

Uses Ollama qwen3-embedding:0.6b on the local RTX 4070 Ti (1024 dimensions, free). Upgraded from nomic-embed-text (768-dim) for better retrieval quality — MTEB retrieval score 61.82 vs 49.01.

14 collections covering: technical vault, sanitized knowledge base, Argo OS docs, dev vault, blog content, AI context, build swarm docs, career docs, and more.

Vaults Tier

Everything in the Knowledge tier plus unsanitized personal vaults and old knowledge base. Excludes legal paperwork. Uses the improved paragraph-aware chunker and text normalization.

19 collections — adds: personal vault (2,243 docs), old knowledge base (2,253 docs), test conversations, ArgoBox configs, and learning content.

Private / Full Tier

Everything. All vaults plus all legal paperwork (PDF, DOCX, RTF, EML, MSG, XLSX). No sanitization. Only accessible locally via Ollama.

19 collections — adds legal-paperwork (2,144 docs, 61,585 chunks) from /mnt/workspace/Documents/Important Documents/Legal/Legal Paperwork/, excluding Duplicate/ and Criminal Case/ subdirectories.

Ingestion Pipeline

Source Files
    ↓
parseFile()          — Multi-format parser (MD, PDF, DOCX, RTF, EML, MSG, XLSX)
    ↓
normalizeForRAG()    — Fix PDF artifacts, hyphenation, page numbers, quotes
    ↓
cleanText()          — Strip HTML, markdown images, code blocks
    ↓
chunkText()          — Paragraph-aware splitting (400 words, 3000 char cap)
    ↓
SHA-256 dedup        — Skip documents with unchanged content hash
    ↓
SQLite storage       — Documents + chunks stored
    ↓
Context prefix       — "[Title | collection]" prepended for embedding only
    ↓
Ollama embedding     — qwen3-embedding:0.6b on RTX 4070 Ti GPU
    ↓
Vector storage       — 1024-dim embeddings saved to SQLite
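The SHA-256 dedup step in the pipeline above can be sketched like this. It is a minimal illustration — a Map stands in for the SQLite documents table, and all names are assumptions:

```typescript
import { createHash } from "node:crypto";

// Per-document dedup: skip re-chunking and re-embedding when a
// document's content hash is unchanged since the last ingestion run.
const storedHashes = new Map<string, string>(); // path -> sha256 hex

function sha256(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

function shouldIngest(path: string, content: string): boolean {
  const hash = sha256(content);
  if (storedHashes.get(path) === hash) return false; // unchanged — skip
  storedHashes.set(path, hash); // new or changed — (re)ingest
  return true;
}
```

Hashing content rather than comparing mtimes means a touched-but-unchanged file never triggers an expensive re-embedding pass.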

Multi-Format Parser

Format           Parser Method
.md, .txt, .csv  Direct file read
.pdf             pdftotext -layout (Poppler CLI)
.docx            python-docx library
.rtf             striprtf library
.eml             Python email stdlib
.msg             extract-msg library
.xlsx            openpyxl library
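The table above amounts to a dispatch-by-extension. A sketch of that dispatch (the label strings and function names are illustrative; the real parseFile() shells out to the listed tools):

```typescript
import { extname } from "node:path";

// Extension → parser mapping mirroring the format table above.
const parsers: Record<string, string> = {
  ".md": "direct read",
  ".txt": "direct read",
  ".csv": "direct read",
  ".pdf": "pdftotext -layout (Poppler CLI)",
  ".docx": "python-docx",
  ".rtf": "striprtf",
  ".eml": "Python email stdlib",
  ".msg": "extract-msg",
  ".xlsx": "openpyxl",
};

function parserFor(path: string): string {
  const parser = parsers[extname(path).toLowerCase()];
  if (!parser) throw new Error(`Unsupported format: ${path}`);
  return parser;
}
```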

Text Normalization

Applied to all parsed content before ingestion:

  • Fixes PDF line-break hyphenation (disagree-\nment → disagreement)
  • Removes standalone page numbers and Page X of Y patterns
  • Removes form separator lines
  • Normalizes smart quotes and em-dashes
  • Strips control characters
  • Collapses excessive whitespace
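The fixes listed above can be sketched as a chain of regex passes. This is a minimal approximation of normalizeForRAG(), not its actual implementation:

```typescript
// Minimal normalization sketch covering the listed fixes; the real
// normalizeForRAG() likely handles more edge cases.
function normalizeForRAG(text: string): string {
  return text
    // rejoin words hyphenated across line breaks: "disagree-\nment" -> "disagreement"
    .replace(/(\w)-\n(\w)/g, "$1$2")
    // drop standalone page numbers and "Page X of Y" lines
    .replace(/^\s*(\d+|Page \d+ of \d+)\s*$/gim, "")
    // normalize smart quotes and em-dashes
    .replace(/[\u2018\u2019]/g, "'")
    .replace(/[\u201C\u201D]/g, '"')
    .replace(/\u2014/g, " - ")
    // strip control characters (keeping \t, \n, \r)
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g, "")
    // collapse trailing spaces and runs of blank lines
    .replace(/[ \t]+\n/g, "\n")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}
```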

Paragraph-Aware Chunking

The chunker groups whole paragraphs together up to the 400-word limit rather than splitting at arbitrary word boundaries. This keeps semantic units intact for better search relevance.

  • Word limit: 400 words per chunk
  • Overlap: 80 words between consecutive chunks
  • Character cap: 3,000 chars max (prevents monster chunks from CSV/URL data)
  • Minimum: 100 chars (skips tiny useless chunks)
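The core accumulation loop can be sketched as below. For brevity this omits the 80-word overlap and the 3,000-char cap, and the names are assumptions about chunker.ts rather than its actual code:

```typescript
const MAX_WORDS = 400; // per-chunk word limit
const MIN_CHARS = 100; // skip tiny, useless chunks

function wordCount(s: string): number {
  return s.split(/\s+/).filter(Boolean).length;
}

// Accumulate whole paragraphs until adding the next one would exceed
// the word limit, then flush — paragraphs are never split mid-way.
function chunkText(text: string): string[] {
  const paragraphs = text.split(/\n{2,}/).map(p => p.trim()).filter(Boolean);
  const chunks: string[] = [];
  let current: string[] = [];
  let words = 0;
  for (const para of paragraphs) {
    const w = wordCount(para);
    if (words + w > MAX_WORDS && current.length > 0) {
      chunks.push(current.join("\n\n"));
      current = [];
      words = 0;
    }
    current.push(para);
    words += w;
  }
  if (current.length) chunks.push(current.join("\n\n"));
  return chunks.filter(c => c.length >= MIN_CHARS);
}
```

The key design choice is flushing *before* adding a paragraph that would overflow, so every chunk boundary falls on a paragraph boundary.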

Context-Prefixed Embeddings

Each chunk's embedding is computed with a context prefix:

[Motion to Dismiss | legal-paperwork] The court hereby orders that...

The prefix is only used for the embedding computation — stored text stays clean for FTS5 full-text search. This gives the embedding model context about what document each chunk belongs to.
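A sketch of the split between embedded text and stored text (embedWithOllama is a hypothetical wrapper, not an actual function in embedder.ts):

```typescript
// The prefix exists only in the string sent to the embedding model;
// the chunk text stored in SQLite (and indexed by FTS5) stays clean.
function withContextPrefix(title: string, collection: string, chunk: string): string {
  return `[${title} | ${collection}] ${chunk}`;
}

// Hypothetical usage at embedding time:
// async function embedChunk(c: { title: string; collection: string; text: string }) {
//   const input = withContextPrefix(c.title, c.collection, c.text);
//   return embedWithOllama("qwen3-embedding:0.6b", input); // 1024-dim vector
// }
```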

Search

Three search methods available:

  1. Vector similarity — cosine similarity against stored embeddings (1024-dim for qwen3, 768-dim for nomic)
  2. FTS5 full-text search — SQLite FTS5 index on raw chunk text
  3. Hybrid — combines both scores with configurable weights
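The scoring behind methods 1 and 3 can be sketched as follows. The weighting scheme and defaults here are assumptions; the real engine lives in hybrid.ts:

```typescript
// Cosine similarity between a query embedding and a stored embedding.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hybrid: weighted blend of the vector score and a normalized FTS5
// score (both assumed to be in [0, 1]); the weight is configurable.
function hybridScore(vecScore: number, ftsScore: number, vecWeight = 0.7): number {
  return vecWeight * vecScore + (1 - vecWeight) * ftsScore;
}
```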

Build Commands

cd ~/Development/argobox

# Build specific tier
npx tsx packages/argonaut/scripts/build-blog-rag.ts --tier knowledge
npx tsx packages/argonaut/scripts/build-blog-rag.ts --tier vaults
npx tsx packages/argonaut/scripts/build-blog-rag.ts --tier private

# Scan only (no ingestion)
npx tsx packages/argonaut/scripts/build-blog-rag.ts --tier private --dry-run

# Resume embedding (skip file scanning)
npx tsx packages/argonaut/scripts/build-blog-rag.ts --tier private --embed-only

# Public tier (runs automatically during npm run build)
node scripts/build-embeddings.js

Storage & Deployment

Local only (gitignored): All SQLite databases (packages/argonaut/data/*.db)

Deployed to Cloudflare Pages:

  • public/embeddings-index.json (15.8 MB) — public tier embeddings
  • public/embeddings-meta.json (367 bytes) — metadata
  • public/rag-stats.json (7 KB) — admin panel stats snapshot

The auto-embeddings.js prebuild script uses content-hash caching to skip rebuilds when content hasn't changed, keeping Cloudflare build times fast.

Vault Sources

Configured in packages/argonaut/src/rag/vault-config.ts. Supports custom additions via data/vault-config.json.

Knowledge tier (16 sources): All ~/Vaults/* directories plus ArgoBox content directories (src/content/docs, posts, journal, projects, configurations, learn).

Private extras (6 sources): ~/Vaults/main (personal), ~/Vaults/conversation-archive, ~/Vaults/test (conversations), ~/Vaults/RAG (RAG meta-docs), ~/Vaults/Instructions, and Legal Paperwork.

Vaults tier = Knowledge + Private extras minus legal-paperwork.

Key Files

File                                          Purpose
packages/argonaut/src/rag/parsers.ts          Multi-format file parser + normalizeForRAG()
packages/argonaut/src/rag/chunker.ts          Paragraph-aware text chunker
packages/argonaut/src/rag/embedder.ts         Ollama/OpenRouter embedding client
packages/argonaut/src/rag/sqlite-store.ts     DurableRAGStore — append-only SQLite backend
packages/argonaut/src/rag/vault-config.ts     Vault source configuration
packages/argonaut/src/rag/sanitizer.ts        Identity map sanitization
packages/argonaut/src/rag/hybrid.ts           Hybrid search engine
packages/argonaut/scripts/build-blog-rag.ts   CLI build script
packages/argonaut/scripts/re-embed-db.ts      Re-embed a DB with a different model
packages/argonaut/scripts/test-rag-search.ts  Benchmark and comparison tool
scripts/auto-embeddings.js                    Smart public embeddings rebuild
src/pages/api/admin/rag.ts                    Admin API for vault management

Embedding Models

Active Models

Model                   Dimensions  Size    Speed         MTEB Retrieval  Used By
qwen3-embedding:0.6b    1024        639 MB  ~8 chunks/s   61.82           Knowledge tier
nomic-embed-text        768         274 MB  ~25 chunks/s  49.01           Vaults, Private tiers
text-embedding-3-small  1536        Cloud   Instant       —               Public tier (OpenRouter)

Nomic vs Qwen3 Comparison

Benchmark results from 10-query test across infrastructure, networking, AI, and deployment topics:

Store      Model       Avg Top-1 Score  Avg Search Time
Knowledge  qwen3-0.6b  0.809            3,031 ms
Private    nomic       0.814            48,142 ms
Vaults     nomic       0.770            27,177 ms

qwen3 achieves comparable relevance scores at roughly 15x faster search time. The speedup comes mostly from the smaller store: the Knowledge database holds 33K chunks versus 132K–166K in the nomic-backed tiers, so far fewer vectors are scored per query.

Re-Embedding

Use the re-embed-db.ts script to create model-variant copies of any tier:

# Create a nomic copy of the Knowledge tier
npx tsx packages/argonaut/scripts/re-embed-db.ts \
  --source rag-store-blog.db \
  --output rag-store-blog-nomic.db \
  --model nomic-embed-text

# Create a qwen3 copy of any tier
npx tsx packages/argonaut/scripts/re-embed-db.ts \
  --source rag-store.db \
  --output rag-store-qwen3.db \
  --model qwen3-embedding:0.6b

GPU & Cost

  • Hardware: NVIDIA RTX 4070 Ti
  • Primary model: qwen3-embedding:0.6b (1024 dimensions)
  • Speed: ~8 chunks/second (qwen3), ~25 chunks/second (nomic)
  • Knowledge tier rebuild: ~70 minutes for 33K chunks (qwen3)
  • Full private rebuild: ~110 minutes for 166K chunks (nomic)
  • Cost: $0 for local tiers, ~$0.01 for public tier rebuilds

Backups

All RAG databases are archived on /mnt/AllShare/rag/databases/ with model-tagged filenames. Nomic-era backups are preserved for A/B comparison testing. Never delete backups unless explicitly instructed.

Tags: rag, ai, embeddings, ollama, search