AI & Automation

Content Pipeline (Planned)

Automated conversation-to-content pipeline: scans AI conversations, generates blog posts and journal entries in Daniel's voice profile, and queues them for review

March 15, 2026

Content Pipeline

Status: Designed, not yet implemented. See handover: Vaults/handover/2026-03-15-content-studio-pipeline.md

The Content Pipeline automates the flow from AI conversations to published blog posts and journal entries. Daniel reviews and approves — he doesn't write from scratch.

Pipeline Flow

SCAN → DISCOVER → CONTEXT → GENERATE → QUEUE → REVIEW → PUBLISH

1. SCAN — Find blog-worthy conversations

Source: 2,992 markdown conversations in Vaults/conversation-archive/conversations/

Filters:

  • blog_candidate: true in frontmatter (pre-flagged during extraction)
  • Technical category with 5+ messages and high word count
  • Recent handovers and session notes

Categories most likely to produce content:

  • technical (889 conversations) — homelab, Linux, networking, containers
  • knowledge-work (155) — PKM, blogging, AI/LLM
  • creative (209) — some technical writing

Blog candidate index: Vaults/conversation-archive/00-INDEX/BLOG-CANDIDATES.md
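The scan filters above can be sketched as a single predicate over a conversation's parsed frontmatter. This is a hypothetical sketch, not the shipped endpoint: field names come from the archive format documented below, but the "high word count" cutoff (2,000 words here) is an assumed threshold, since the document doesn't specify one.

```python
# Hedged sketch of the SCAN filter. Assumes frontmatter is already parsed
# into a dict; the 2000-word threshold is an assumption, not a spec value.
def is_blog_candidate(fm: dict) -> bool:
    """Return True if a conversation's frontmatter passes the scan filters."""
    if fm.get("blog_candidate") is True:              # pre-flagged during extraction
        return True
    if (fm.get("category") == "technical"
            and fm.get("message_count", 0) >= 5
            and fm.get("word_count", 0) >= 2000):     # "high word count" (assumed cutoff)
        return True
    return False
```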

2. DISCOVER — Avoid duplication

Compare candidate topics against the existing 104 blog posts and 97 journal entries. Group related conversations into topic clusters, and skip topics that already have published content.
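One cheap way to implement the duplication check is token overlap between a candidate topic and published titles. This is only an illustrative sketch with an assumed 60% overlap threshold; the real DISCOVER step might use embeddings from the RAG database instead.

```python
# Hypothetical duplicate detector for the DISCOVER step; the 0.6 overlap
# threshold and word-length filter are assumptions for illustration.
def normalize(topic: str) -> set[str]:
    """Lowercase a title and split it into significant words."""
    return {w for w in topic.lower().replace("-", " ").split() if len(w) > 2}

def is_duplicate(candidate_topic: str, published_titles: list[str],
                 threshold: float = 0.6) -> bool:
    """True if most of the candidate's words appear in some published title."""
    cand = normalize(candidate_topic)
    for title in published_titles:
        if cand and len(cand & normalize(title)) / len(cand) >= threshold:
            return True
    return False
```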

3. CONTEXT — Auto-search RAG

For each topic, automatically search:

  • Knowledge (33,101 chunks) — existing blog content, docs
  • Vaults (294K chunks) — conversation archive, personal knowledge
  • Private (166,183 chunks) — sensitive context, legal docs

Pull related existing posts to reference and link. Load verified facts from identity-ground-truth.json.
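The CONTEXT step amounts to fanning one query out across the three collections and merging by relevance. In this sketch, `rag_search` is a stand-in for whatever client the real RAG databases expose; its signature and the collection names are assumptions.

```python
# Hedged sketch of auto-RAG search. `rag_search` is a hypothetical callable
# (collection, query, limit) -> list of {"score": float, ...} dicts.
def gather_context(topic: str, rag_search, per_collection: int = 5) -> list[dict]:
    """Search each RAG collection for a topic and merge results by score."""
    collections = ["knowledge", "vaults", "private"]   # names assumed from the doc
    hits = []
    for name in collections:
        for hit in rag_search(collection=name, query=topic, limit=per_collection):
            hit["collection"] = name                   # tag each hit with its source
            hits.append(hit)
    return sorted(hits, key=lambda h: h.get("score", 0.0), reverse=True)
```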

4. GENERATE — AI writes with voice

Use /api/admin/content-gen with:

  • Input: Conversation text + RAG context
  • inputMode: transcript for single conversations, topic for clusters
  • Voice profile + facts + denied claims injected automatically
  • Stream output with proper frontmatter (title, description, tags, mood, pubDate)
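Assembling the request for /api/admin/content-gen might look like the sketch below. The payload field names (`input`, `inputMode`, `ragContext`) are assumptions mirroring the bullets above, not the endpoint's confirmed schema; voice profile, facts, and denied claims are injected server-side, so they don't appear here.

```python
import json

# Hypothetical GENERATE request builder; the payload schema is an assumption.
def build_generation_request(conversations: list[str], rag_context: str,
                             input_mode: str = "transcript") -> str:
    """Serialize a content-gen payload for one conversation or a topic cluster."""
    if input_mode not in ("transcript", "topic"):
        raise ValueError("inputMode must be 'transcript' or 'topic'")
    payload = {
        "input": "\n\n---\n\n".join(conversations),  # transcripts joined with separators
        "inputMode": input_mode,
        "ragContext": rag_context,                   # from the CONTEXT step
    }
    return json.dumps(payload)
```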

5. QUEUE — Store drafts for review

Store generated drafts in a review queue:

  • Generated date, source conversations, voice score, status
  • Storage option: src/content/drafts/ collection (fits Astro pattern)
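A queue entry could be a markdown file in src/content/drafts/ whose frontmatter carries the review metadata listed above. This sketch assumes flat YAML fields; the exact field names are illustrative, not a fixed schema.

```python
from datetime import date

# Hedged sketch of a QUEUE draft file; frontmatter field names are assumptions.
def make_draft(title: str, body: str, sources: list[str], voice_score: float) -> str:
    """Render a pending draft with review metadata in its frontmatter."""
    lines = [
        "---",
        f'title: "{title}"',
        f"generated: {date.today().isoformat()}",     # generated date
        f"sources: [{', '.join(sources)}]",           # source conversations
        f"voice_score: {voice_score}",
        "status: pending",                            # pending | approved | rejected
        "---",
        "",
        body,
    ]
    return "\n".join(lines)
```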

6. REVIEW — Daniel's only step

Browse queue in Content Studio's Pipeline tab:

  • One-click approve, edit, or reject
  • Voice score shown alongside each draft
  • Edit in-place before approving

7. PUBLISH — Write to disk

  • Blog posts → src/content/posts/
  • Journal entries → src/content/journal/
  • Generate proper slug, frontmatter, hero image reference
  • Mark source conversations as blog_status: published
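The slug and target-path logic for the PUBLISH step can be sketched as below. The directory names come from the bullets above; the slug rules (lowercase, hyphen-separated) are an assumption based on common Astro conventions.

```python
import re
from pathlib import Path

# Hypothetical PUBLISH helpers; slug rules are assumed, directories are from the doc.
def slugify(title: str) -> str:
    """Lowercase a title and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def target_path(title: str, kind: str) -> Path:
    """Map an approved draft to its on-disk destination by content type."""
    base = {"post": "src/content/posts", "journal": "src/content/journal"}[kind]
    return Path(base) / f"{slugify(title)}.md"
```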

API Endpoints Needed

  • /api/admin/pipeline/scan (POST): scan the conversation archive for candidates
  • /api/admin/pipeline/generate (POST): generate a draft from one or more conversations
  • /api/admin/pipeline/queue (GET): list pending drafts
  • /api/admin/pipeline/approve (POST): approve a draft and publish it to disk

Conversation Archive Format

Each of the 2,992 conversations is a markdown file with YAML frontmatter:

title: "Debugging Tailscale DNS Resolution"
platform: chatgpt
date: 2024-10-07
category: technical
primary_topic: networking
tags: [tailscale, dns, networking]
blog_candidate: true
blog_status: idea
message_count: 18
word_count: 6833

Processing pipeline: Vaults/conversation-archive/_scripts/extract_all.py handles extraction, parsing, sanitization, enrichment, and markdown generation.
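For the scan step, the flat key: value frontmatter shown above can be read without a YAML dependency. This is a minimal sketch that assumes standard --- delimiters and handles only the flat fields in the example (not nested structures or multi-line values).

```python
# Minimal frontmatter reader for the archive format above (a sketch that
# avoids a YAML library; it assumes --- delimiters and flat key: value lines).
def read_frontmatter(text: str) -> dict:
    """Parse flat key: value pairs between the first two --- delimiters."""
    parts = text.split("---", 2)
    if len(parts) < 3:          # no frontmatter block present
        return {}
    fm = {}
    for line in parts[1].strip().splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        fm[key.strip()] = value.strip().strip('"')   # values stay as strings
    return fm
```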

What Already Exists

Working:

  • Content-gen API (streaming)
  • Voice profile builder
  • Facts loader + denied claims
  • RAG search + format
  • Voice scoring
  • Content Studio UI
  • Conversation extraction pipeline
  • RAG database (701K chunks, populated)

Not yet built:

  • Pipeline scan endpoint
  • Pipeline queue management
  • Pipeline UI tab
  • Auto-RAG search
  • Batch generation
  • Draft → publish flow

Related

Tags: pipeline, automation, content-generation, voice, rag, conversations