Argo Knowledge RAG
Four-tier local-first RAG system with hybrid BM25 + vector search across 166K+ chunks
A four-tier local-first RAG (Retrieval-Augmented Generation) system that powers ArgoBox's AI assistant. Hybrid BM25 + vector search across 166K+ chunks with zero per-query API cost for local tiers.
Repositories
| Location | Type | URL |
|---|---|---|
| GitHub | Public | lazarusoftheshadows/argo-knowledge-rag |
| Gitea | Primary | git.argobox.com/InovinLabs/argo-knowledge-rag |
Architecture
Four-Tier System
| Tier | Chunks | Embedding Model | Access | Purpose |
|---|---|---|---|---|
| Public | 874 | OpenRouter text-embedding-3-small (1536d) | CDN | Public AI assistant on every page |
| Knowledge | 33K | qwen3-embedding:0.6b (1024d) | Sanitized | External AI with PII stripped |
| Vaults | 132K | nomic-embed-text (768d) | Local only | Full Obsidian vault content |
| Private | 166K | nomic-embed-text (768d) | Local only | Everything including legal docs |
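The tiers are nested: public content is a subset of knowledge, which is a subset of vaults, and the private tier holds everything. Each step toward the private tier widens scope and tightens access. The Galactic Identity System (148 regex patterns) strips PII from content before it reaches a less restricted tier.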
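A minimal sketch of what regex-based sanitization between tiers could look like. The two patterns, the `PiiPattern` type, and the `sanitize` helper are illustrative assumptions, not the actual 148 patterns or API of the Galactic Identity System.

```typescript
// Hypothetical pattern list; the real system is described as using 148 patterns.
interface PiiPattern {
  label: string;
  regex: RegExp;
}

const PATTERNS: PiiPattern[] = [
  { label: "EMAIL", regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: "IPV4", regex: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g },
];

// Replace every match with a bracketed placeholder so the chunk stays readable.
function sanitize(text: string, patterns: PiiPattern[] = PATTERNS): string {
  return patterns.reduce(
    (acc, p) => acc.replace(p.regex, `[${p.label}]`),
    text,
  );
}

const cleaned = sanitize("Contact admin@example.com from 192.168.1.10");
console.log(cleaned); // prints "Contact [EMAIL] from [IPV4]"
```

Placeholders rather than plain deletion keep sentence structure intact, which tends to matter for embedding quality; whether the real system does this is not stated here.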
Pipeline
- Ingest — Reads MD, PDF, DOCX, RTF, EML, XLSX. Extracts text and frontmatter. SHA-256 content hashing prevents duplicate ingestion.
- Chunk — Paragraph-aware splitting (400 words, 80 word overlap) preserves semantic units.
- Embed — GPU-accelerated via Ollama for the local tiers; the public tier calls the OpenRouter API. The model varies by tier to trade quality against speed.
- Search — Hybrid FTS5 BM25 (keyword precision) + cosine similarity (semantic understanding). Default weighting: 0.3 BM25 / 0.7 vector.
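The ingest and chunk steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: `contentHash`, `chunkParagraphs`, `ingestOnce`, and the in-memory `seenHashes` set are hypothetical names, and a real build would presumably persist hashes in the SQLite database.

```typescript
import { createHash } from "node:crypto";

// SHA-256 of the document text, used as a dedup key at ingest time.
function contentHash(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Accumulate whole paragraphs until the word budget would be exceeded, then
// start the next chunk with the last `overlap` words of the previous one.
// Simplification: a single paragraph longer than `maxWords` is kept whole.
function chunkParagraphs(text: string, maxWords = 400, overlap = 80): string[] {
  const paragraphs = text.split(/\n\s*\n/).map((p) => p.trim()).filter(Boolean);
  const chunks: string[] = [];
  let current: string[] = [];

  for (const para of paragraphs) {
    const words = para.split(/\s+/);
    if (current.length > 0 && current.length + words.length > maxWords) {
      chunks.push(current.join(" "));
      current = overlap > 0 ? current.slice(-overlap) : []; // carry overlap forward
    }
    current.push(...words);
  }
  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}

// Hypothetical in-memory dedup index.
const seenHashes = new Set<string>();

// Returns chunks for new content and an empty list for content already seen.
function ingestOnce(text: string): string[] {
  const hash = contentHash(text);
  if (seenHashes.has(hash)) return [];
  seenHashes.add(hash);
  return chunkParagraphs(text);
}
```

Splitting on paragraph boundaries first means a chunk never cuts a paragraph in half, at the cost of chunk sizes that only approximate the 400-word target.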
Tech Stack
- Runtime: TypeScript / Node.js
- Storage: SQLite + FTS5 (via better-sqlite3)
- Embeddings: Ollama (local GPU) — qwen3-embedding, nomic-embed-text
- Public tier: OpenRouter text-embedding-3-small (API)
- Search: Hybrid BM25 + brute-force cosine similarity (no ANN index)
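A rough sketch of how the hybrid score could be combined using the default 0.3 / 0.7 weighting. The `Candidate` shape, the `hybridRank` helper, and the min-max normalization of BM25 scores are assumptions made for illustration. Note that SQLite's FTS5 `bm25()` function returns lower values for better matches, so this sketch assumes scores were negated beforehand so that higher is better.

```typescript
interface Candidate {
  id: string;
  bm25: number;        // keyword score, assumed "higher is better"
  embedding: number[]; // chunk vector, same dimension as the query vector
}

interface Scored {
  id: string;
  score: number;
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Brute-force scan: score every candidate, then sort best-first.
function hybridRank(
  query: number[],
  candidates: Candidate[],
  wBm25 = 0.3,
  wVec = 0.7,
): Scored[] {
  const bm = candidates.map((c) => c.bm25);
  const lo = Math.min(...bm);
  const span = Math.max(...bm) - lo;
  return candidates
    .map((c) => ({
      id: c.id,
      // Scale BM25 to [0, 1] so both signals share a comparable range.
      score:
        wBm25 * (span === 0 ? 0 : (c.bm25 - lo) / span) +
        wVec * cosine(query, c.embedding),
    }))
    .sort((x, y) => y.score - x.score);
}

const ranked = hybridRank([0, 1], [
  { id: "keyword-match", bm25: 10, embedding: [1, 0] },
  { id: "semantic-match", bm25: 0, embedding: [0, 1] },
]);
console.log(ranked.map((r) => r.id)); // semantic-match first, then keyword-match
```

With these weights a perfect semantic match (0.7) outranks a perfect keyword-only match (0.3), which matches the stated preference for vector similarity.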
Performance
| Operation | Time | Notes |
|---|---|---|
| Public search (874 chunks) | <100ms | CDN-served embeddings |
| Knowledge search (33K) | ~3 sec | Hybrid BM25 + vector |
| Vaults search (132K) | ~23 sec | qwen3-embedding:8b (4096d) |
| Private search (166K) | ~48 sec | Full brute-force scan |
| Knowledge build | ~70 min | qwen3-embedding:0.6b, 8 chunks/sec |
| Private build | ~110 min | nomic-embed-text, 25 chunks/sec |
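The build times in the table line up with the stated throughput. A quick check, assuming the rounded chunk counts of 33,000 and 166,000 (the `buildMinutes` helper is just for this calculation):

```typescript
// Estimated embedding time in minutes for a given chunk count and throughput.
function buildMinutes(chunks: number, chunksPerSec: number): number {
  return chunks / chunksPerSec / 60;
}

console.log(buildMinutes(33000, 8).toFixed(1));   // "68.8"  (table: ~70 min)
console.log(buildMinutes(166000, 25).toFixed(1)); // "110.7" (table: ~110 min)
```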
What It Powers
- Public AI Assistant (`/ask`) — RAG-augmented chat using public tier
- Argonaut Admin Chat (`/admin/argonaut/chat`) — Multi-scope search with tier selector
- Claude Code AI Context — Semantic search across project history and documentation
- Knowledge RAG Monitor (`/admin/mm-devforge/knowledge-rag-monitor`) — Training status, search testing, content source tracking
Content Sources (Public Tier)
| Source | Count | Content |
|---|---|---|
| Blog posts | 74 | Homelab, Docker, Linux, networking |
| Technical docs | 189 | Architecture, modules, infrastructure |
| Public docs | 68 | AI systems, playground guides |
| Journal entries | 87 | Engineering diary, debugging sessions |
| Project descriptions | 10 | TerraTracer, Tendril, Build Swarm, etc. |
| Website pages | 90 | Tour, about, resources, playground |
Quick Start
```bash
git clone https://github.com/lazarusoftheshadows/argo-knowledge-rag.git
cd argo-knowledge-rag && npm install
npm run build

# Ingest markdown files
node dist/cli/index.js ingest ~/my-vault/ --db rag.db --collection my-notes

# Generate embeddings (requires Ollama running)
node dist/cli/index.js embed --db rag.db --model qwen3-embedding:0.6b

# Search
node dist/cli/index.js search "Docker bridge networking" --db rag.db
```
Key Features
- Local-First — The local tiers embed on a local GPU via Ollama with zero per-query API cost; only the public tier calls an external embedding API (OpenRouter).
- Hybrid Search — BM25 + vector with configurable weighting
- Multi-Format — MD, PDF, DOCX, RTF, EML, XLSX parsing
- Content-Hash Dedup — SHA-256 prevents duplicate ingestion
- Identity Sanitization — 148-pattern regex strips PII for safe sharing
- Streaming Scan — Constant memory at any scale
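One way a constant-memory streaming scan could work is to read rows lazily and keep only the current top `k` hits. This is a sketch under assumptions: `Row`, `Hit`, `topK`, and `demoRows` are hypothetical names, and how the project actually stores and reads embeddings is not described here.

```typescript
interface Row {
  id: string;
  embedding: number[];
}

interface Hit {
  id: string;
  score: number;
}

// Cosine similarity between two equal-length vectors.
function cosineSim(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Consume rows one at a time; `best` never holds more than k entries,
// so memory use does not grow with the number of rows scanned.
function topK(query: number[], rows: Iterable<Row>, k = 5): Hit[] {
  if (k <= 0) return [];
  const best: Hit[] = [];
  for (const row of rows) {
    const score = cosineSim(query, row.embedding);
    if (best.length === k && score <= best[k - 1].score) continue;
    if (best.length === k) best.pop(); // drop the current weakest hit
    best.push({ id: row.id, score });
    best.sort((x, y) => y.score - x.score);
  }
  return best;
}

// Generator standing in for a lazy database cursor.
function* demoRows(): Generator<Row> {
  yield { id: "a", embedding: [1, 0] };
  yield { id: "b", embedding: [0, 1] };
  yield { id: "c", embedding: [1, 1] };
}

const hits = topK([1, 0], demoRows(), 2);
console.log(hits.map((h) => h.id)); // "a" first, then "c"
```

With better-sqlite3, `Statement.iterate()` yields rows lazily and could serve as the iterable in place of the demo generator.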
Interactive Demo
The project showcase page includes an interactive search demo with pre-loaded results for 5 curated queries (Docker networking, Traefik, Build Swarm, MikroTik VLAN, Cloudflare tunnels). Admin users get live search results.
Related
- TerraTracer — Geospatial tool
- Tendril — Knowledge graph visualization
- RAG Manager — Admin RAG operations