Argo Knowledge RAG

Four-tier local-first RAG system with hybrid BM25 + vector search across 166K+ chunks

January 15, 2026 (updated March 19, 2026)

A four-tier, local-first RAG (Retrieval-Augmented Generation) system that powers ArgoBox's AI assistant. It runs hybrid BM25 + vector search across 166K+ chunks, with zero per-query API cost for the local tiers.

Repositories

| Location | Type | URL |
|----------|------|-----|
| GitHub | Public | lazarusoftheshadows/argo-knowledge-rag |
| Gitea | Primary | git.argobox.com/InovinLabs/argo-knowledge-rag |

Architecture

Four-Tier System

| Tier | Chunks | Embedding Model | Access | Purpose |
|------|--------|-----------------|--------|---------|
| Public | 874 | OpenRouter text-embedding-3-small (1536d) | CDN | Public AI assistant on every page |
| Knowledge | 33K | qwen3-embedding:0.6b (1024d) | Sanitized | External AI with PII stripped |
| Vaults | 132K | nomic-embed-text (768d) | Local only | Full Obsidian vault content |
| Private | 166K | nomic-embed-text (768d) | Local only | Everything including legal docs |

The tiers nest: public content is a subset of knowledge, which is a subset of vaults, and private covers everything. Each broader tier adds scope and tightens access. The Galactic Identity System (148 regex patterns) strips PII from content as it moves into a more widely shared tier.
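The stripping step between tiers can be pictured as an ordered list of regex rules applied to each chunk. The following is a minimal sketch under stated assumptions: the two rules shown (email and IPv4) are illustrative only, and `SanitizeRule` and `sanitize` are hypothetical names. The actual 148 patterns of the Galactic Identity System are not described on this page.

```typescript
// Hypothetical rule shape; the real Galactic Identity System patterns are not shown here.
interface SanitizeRule {
  name: string;
  pattern: RegExp; // needs the global flag so every match is replaced
  replacement: string;
}

// Two illustrative rules only (assumptions, not the project's actual patterns).
const exampleRules: SanitizeRule[] = [
  {
    name: "email",
    pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
    replacement: "[EMAIL]",
  },
  {
    name: "ipv4",
    pattern: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g,
    replacement: "[IP]",
  },
];

// Apply every rule in order and return the sanitized text.
function sanitize(text: string, rules: SanitizeRule[]): string {
  return rules.reduce(
    (acc, rule) => acc.replace(rule.pattern, rule.replacement),
    text,
  );
}

const cleaned = sanitize("Contact admin@example.com at 192.168.1.10", exampleRules);
console.log(cleaned); // Contact [EMAIL] at [IP]
```

Keeping the rules as data rather than hard-coded calls makes it straightforward to grow the list and to report which rule fired on which chunk.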

Pipeline

  1. Ingest: Reads MD, PDF, DOCX, RTF, EML, and XLSX files, extracting text and frontmatter. SHA-256 content hashing prevents duplicate ingestion.
  2. Chunk: Paragraph-aware splitting (400 words per chunk, 80-word overlap) preserves semantic units.
  3. Embed: Local tiers are embedded on the GPU via Ollama. Models differ by tier to trade quality against speed.
  4. Search: Hybrid of FTS5 BM25 (keyword precision) and cosine similarity (semantic understanding). Default weighting: 0.3 BM25 / 0.7 vector.
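The chunking step above can be sketched as follows. This is one possible reading of "paragraph-aware splitting with overlap", not the project's actual code: `chunkText` is a hypothetical name, paragraphs are assumed to be separated by blank lines, and joining words with single spaces (which drops the original paragraph breaks) is a simplification.

```typescript
// Sketch of paragraph-aware chunking with word overlap (assumed behavior).
function chunkText(text: string, maxWords = 400, overlapWords = 80): string[] {
  if (overlapWords >= maxWords) {
    throw new Error("overlapWords must be smaller than maxWords");
  }
  // Blank lines separate paragraphs.
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current: string[] = []; // words of the chunk being built
  let hasNewWords = false; // false while `current` holds only carried-over overlap

  const flush = (): void => {
    if (!hasNewWords) return;
    chunks.push(current.join(" "));
    // Carry the tail of this chunk into the next one as overlap.
    current = overlapWords > 0 ? current.slice(-overlapWords) : [];
    hasNewWords = false;
  };

  for (const paragraph of paragraphs) {
    const words = paragraph.split(/\s+/);
    // Prefer breaking at a paragraph boundary when the whole paragraph will not fit.
    if (hasNewWords && current.length + words.length > maxWords) flush();
    for (const word of words) {
      // A paragraph longer than the budget is split mid-paragraph as a fallback.
      if (current.length >= maxWords) flush();
      current.push(word);
      hasNewWords = true;
    }
  }
  flush();
  return chunks;
}

// Tiny demo with a 10-word budget and 2-word overlap so the result is easy to read.
const demoChunks = chunkText(
  "a1 a2 a3 a4 a5\n\nb1 b2 b3 b4 b5\n\nc1 c2 c3 c4 c5",
  10,
  2,
);
console.log(demoChunks);
// [ 'a1 a2 a3 a4 a5 b1 b2 b3 b4 b5', 'b4 b5 c1 c2 c3 c4 c5' ]
```

The overlap means the last words of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still retrievable from at least one chunk.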

Tech Stack

  • Runtime: TypeScript / Node.js
  • Storage: SQLite + FTS5 (via better-sqlite3)
  • Embeddings: Ollama (local GPU) — qwen3-embedding, nomic-embed-text
  • Public tier: OpenRouter text-embedding-3-small (API)
  • Search: Hybrid BM25 + brute-force cosine similarity (no ANN index)
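To illustrate how a 0.3 / 0.7 hybrid score might be combined over a brute-force scan, here is a minimal sketch. Several things are assumptions: the `Candidate` shape and `hybridRank` name are hypothetical, min-max normalization of the keyword score is one common choice and not necessarily what the project does, and BM25 scores are taken as "higher is better" (SQLite FTS5's `bm25()` returns lower-is-better values, so those would need to be negated first).

```typescript
// Hypothetical candidate shape for illustration.
interface Candidate {
  id: string;
  bm25: number; // keyword score, higher = better (negate FTS5 bm25() output first)
  embedding: number[];
}

// Plain cosine similarity; returns 0 for zero-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Brute-force scan: score every candidate, then sort by the weighted sum.
function hybridRank(
  query: number[],
  candidates: Candidate[],
  wBm25 = 0.3,
  wVector = 0.7,
): { id: string; score: number }[] {
  const bm25Scores = candidates.map((c) => c.bm25);
  const min = Math.min(...bm25Scores);
  const range = Math.max(...bm25Scores) - min;
  return candidates
    .map((c) => {
      // Min-max normalize BM25 into [0, 1] so it is comparable to cosine similarity.
      const keyword = range === 0 ? 0 : (c.bm25 - min) / range;
      const semantic = cosineSimilarity(query, c.embedding);
      return { id: c.id, score: wBm25 * keyword + wVector * semantic };
    })
    .sort((x, y) => y.score - x.score);
}

const ranked = hybridRank(
  [1, 0],
  [
    { id: "a", bm25: 2.0, embedding: [1, 0] },
    { id: "b", bm25: 1.0, embedding: [0, 1] },
    { id: "c", bm25: 1.5, embedding: [1, 1] },
  ],
);
console.log(ranked.map((r) => r.id)); // [ 'a', 'c', 'b' ]
```

Because every stored vector is compared against the query, cost grows linearly with chunk count, which is the trade-off of skipping an ANN index.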

Performance

| Operation | Time | Notes |
|-----------|------|-------|
| Public search (874 chunks) | <100ms | CDN-served embeddings |
| Knowledge search (33K) | ~3 sec | Hybrid BM25 + vector |
| Vaults search (132K) | ~23 sec | qwen3-embedding:8b (4096d) |
| Private search (166K) | ~48 sec | Full brute-force scan |
| Knowledge build | ~70 min | qwen3-embedding:0.6b, 8 chunks/sec |
| Private build | ~110 min | nomic-embed-text, 25 chunks/sec |
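The build times follow directly from chunk count divided by embedding throughput. A quick check using the rounded figures from the tables (33K and 166K are approximations, so the results are approximate too; `buildMinutes` is just an illustrative helper):

```typescript
// Estimated build time in minutes = chunks / (chunks per second) / 60.
function buildMinutes(chunks: number, chunksPerSecond: number): number {
  return chunks / chunksPerSecond / 60;
}

const knowledgeMinutes = buildMinutes(33000, 8); // 4125 s
const privateMinutes = buildMinutes(166000, 25); // 6640 s

console.log(knowledgeMinutes.toFixed(1)); // 68.8
console.log(privateMinutes.toFixed(1)); // 110.7
```

Both values line up with the ~70 min and ~110 min figures reported above.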

What It Powers

  • Public AI Assistant (/ask) — RAG-augmented chat using public tier
  • Argonaut Admin Chat (/admin/argonaut/chat) — Multi-scope search with tier selector
  • Claude Code AI Context — Semantic search across project history and documentation
  • Knowledge RAG Monitor (/admin/mm-devforge/knowledge-rag-monitor) — Training status, search testing, content source tracking

Content Sources (Public Tier)

| Source | Count | Content |
|--------|-------|---------|
| Blog posts | 74 | Homelab, Docker, Linux, networking |
| Technical docs | 189 | Architecture, modules, infrastructure |
| Public docs | 68 | AI systems, playground guides |
| Journal entries | 87 | Engineering diary, debugging sessions |
| Project descriptions | 10 | TerraTracer, Tendril, Build Swarm, etc. |
| Website pages | 90 | Tour, about, resources, playground |

Quick Start

git clone https://github.com/lazarusoftheshadows/argo-knowledge-rag.git
cd argo-knowledge-rag && npm install
npm run build

# Ingest markdown files
node dist/cli/index.js ingest ~/my-vault/ --db rag.db --collection my-notes

# Generate embeddings (requires Ollama running)
node dist/cli/index.js embed --db rag.db --model qwen3-embedding:0.6b

# Search
node dist/cli/index.js search "Docker bridge networking" --db rag.db

Key Features

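  • Local-First: Local tiers are embedded on a local GPU, so they carry zero per-query cost. Only the public tier uses an API-hosted embedding model (OpenRouter text-embedding-3-small).
  • Hybrid Search: BM25 + vector with configurable weighting (default 0.3 / 0.7)
  • Multi-Format: MD, PDF, DOCX, RTF, EML, XLSX parsing
  • Content-Hash Dedup: SHA-256 content hashing prevents duplicate ingestion
  • Identity Sanitization: A 148-pattern regex set strips PII for safe sharing
  • Streaming Scan: Searches stream through stored data, so memory use stays constant regardless of corpus size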
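As an illustration of content-hash dedup, the sketch below hashes each document's text with SHA-256 via Node's built-in `crypto` module and skips anything already seen. The in-memory `Set` and the `DedupIndex` and `contentHash` names are assumptions for the example; the project presumably persists hashes in its SQLite database, which is not shown on this page.

```typescript
import { createHash } from "node:crypto";

// SHA-256 of the text, hex-encoded (64 characters).
function contentHash(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Hypothetical in-memory index; a real pipeline would likely store hashes in the database.
class DedupIndex {
  private seen = new Set<string>();

  // Returns true if the content is new and was recorded, false if it is a duplicate.
  add(text: string): boolean {
    const hash = contentHash(text);
    if (this.seen.has(hash)) return false;
    this.seen.add(hash);
    return true;
  }
}

const index = new DedupIndex();
const firstSeen = index.add("# Docker bridge networking\nNotes...");
const secondSeen = index.add("# Docker bridge networking\nNotes...");
console.log(firstSeen, secondSeen); // true false
```

Hashing content rather than file paths means a renamed or copied file is still recognized as a duplicate, while any edit to the text produces a new hash and triggers re-ingestion.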

Interactive Demo

The project showcase page includes an interactive search demo with pre-loaded results for 5 curated queries (Docker networking, Traefik, Build Swarm, MikroTik VLAN, Cloudflare tunnels). Admin users get live search results.

Related

rag, ai, search, embeddings, sqlite, ollama, typescript