AI & Automation

Content Pipeline (Planned)

Automated conversation-to-content pipeline: scans AI conversations, generates blog posts and journal entries in Daniel's voice profile, and queues them for review

March 15, 2026

Content Pipeline

Status: Designed, not yet implemented. See handover: Vaults/handover/2026-03-15-content-studio-pipeline.md

The Content Pipeline automates the flow from AI conversations to published blog posts and journal entries. Daniel reviews and approves — he doesn't write from scratch.

Pipeline Flow

SCAN → DISCOVER → CONTEXT → GENERATE → QUEUE → REVIEW → PUBLISH

1. SCAN — Find blog-worthy conversations

Source: 2,992 markdown conversations in Vaults/conversation-archive/conversations/

Filters:

  • blog_candidate: true in frontmatter (pre-flagged during extraction)
  • Technical category with 5+ messages and high word count
  • Recent handovers and session notes

Categories most likely to produce content:

  • technical (889 conversations) — homelab, Linux, networking, containers
  • knowledge-work (155) — PKM, blogging, AI/LLM
  • creative (209) — some technical writing

Blog candidate index: Vaults/conversation-archive/00-INDEX/BLOG-CANDIDATES.md
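The scan filters above can be sketched as a single predicate over a conversation's parsed frontmatter. This is a hypothetical sketch, not the shipped endpoint: field names come from the archive format documented below, but the "high word count" cutoff (2,000 words here) is an assumed threshold, since the document doesn't specify one.

```python
# Hedged sketch of the SCAN filter. Assumes frontmatter is already parsed
# into a dict; the 2000-word threshold is an assumption, not a spec value.
def is_blog_candidate(fm: dict) -> bool:
    """Return True if a conversation's frontmatter passes the scan filters."""
    if fm.get("blog_candidate") is True:              # pre-flagged during extraction
        return True
    if (fm.get("category") == "technical"
            and fm.get("message_count", 0) >= 5
            and fm.get("word_count", 0) >= 2000):     # "high word count" (assumed cutoff)
        return True
    return False
```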

2. DISCOVER — Avoid duplication

Compare candidate topics against the existing 104 blog posts and 97 journal entries. Group related conversations into topic clusters, and skip topics that already have published content.
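One cheap way to implement the duplication check is token overlap between a candidate topic and published titles. This is only an illustrative sketch with an assumed 60% overlap threshold; the real DISCOVER step might use embeddings from the RAG database instead.

```python
# Hypothetical duplicate detector for the DISCOVER step; the 0.6 overlap
# threshold and word-length filter are assumptions for illustration.
def normalize(topic: str) -> set[str]:
    """Lowercase a title and split it into significant words."""
    return {w for w in topic.lower().replace("-", " ").split() if len(w) > 2}

def is_duplicate(candidate_topic: str, published_titles: list[str],
                 threshold: float = 0.6) -> bool:
    """True if most of the candidate's words appear in some published title."""
    cand = normalize(candidate_topic)
    for title in published_titles:
        if cand and len(cand & normalize(title)) / len(cand) >= threshold:
            return True
    return False
```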

3. CONTEXT — Auto-search RAG

For each topic, automatically search:

  • Knowledge (33,101 chunks) — existing blog content, docs
  • Vaults (294K chunks) — conversation archive, personal knowledge
  • Private (166,183 chunks) — sensitive context, legal docs

Pull related existing posts to reference and link. Load verified facts from identity-ground-truth.json.
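The CONTEXT step amounts to fanning one query out across the three collections and merging by relevance. In this sketch, `rag_search` is a stand-in for whatever client the real RAG databases expose; its signature and the collection names are assumptions.

```python
# Hedged sketch of auto-RAG search. `rag_search` is a hypothetical callable
# (collection, query, limit) -> list of {"score": float, ...} dicts.
def gather_context(topic: str, rag_search, per_collection: int = 5) -> list[dict]:
    """Search each RAG collection for a topic and merge results by score."""
    collections = ["knowledge", "vaults", "private"]   # names assumed from the doc
    hits = []
    for name in collections:
        for hit in rag_search(collection=name, query=topic, limit=per_collection):
            hit["collection"] = name                   # tag each hit with its source
            hits.append(hit)
    return sorted(hits, key=lambda h: h.get("score", 0.0), reverse=True)
```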

4. GENERATE — AI writes with voice

Use /api/admin/content-gen with:

  • Input: Conversation text + RAG context
  • inputMode: transcript for single conversations, topic for clusters
  • Voice profile + facts + denied claims injected automatically
  • Stream output with proper frontmatter (title, description, tags, mood, pubDate)
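Assembling the request for /api/admin/content-gen might look like the sketch below. The payload field names (`input`, `inputMode`, `ragContext`) are assumptions mirroring the bullets above, not the endpoint's confirmed schema; voice profile, facts, and denied claims are injected server-side, so they don't appear here.

```python
import json

# Hypothetical GENERATE request builder; the payload schema is an assumption.
def build_generation_request(conversations: list[str], rag_context: str,
                             input_mode: str = "transcript") -> str:
    """Serialize a content-gen payload for one conversation or a topic cluster."""
    if input_mode not in ("transcript", "topic"):
        raise ValueError("inputMode must be 'transcript' or 'topic'")
    payload = {
        "input": "\n\n---\n\n".join(conversations),  # transcripts joined with separators
        "inputMode": input_mode,
        "ragContext": rag_context,                   # from the CONTEXT step
    }
    return json.dumps(payload)
```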

5. QUEUE — Store drafts for review

Store generated drafts in a review queue:

  • Generated date, source conversations, voice score, status
  • Storage option: src/content/drafts/ collection (fits Astro pattern)
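A queue entry could be a markdown file in src/content/drafts/ whose frontmatter carries the review metadata listed above. This sketch assumes flat YAML fields; the exact field names are illustrative, not a fixed schema.

```python
from datetime import date

# Hedged sketch of a QUEUE draft file; frontmatter field names are assumptions.
def make_draft(title: str, body: str, sources: list[str], voice_score: float) -> str:
    """Render a pending draft with review metadata in its frontmatter."""
    lines = [
        "---",
        f'title: "{title}"',
        f"generated: {date.today().isoformat()}",     # generated date
        f"sources: [{', '.join(sources)}]",           # source conversations
        f"voice_score: {voice_score}",
        "status: pending",                            # pending | approved | rejected
        "---",
        "",
        body,
    ]
    return "\n".join(lines)
```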

6. REVIEW — Daniel's only step

Browse queue in Content Studio's Pipeline tab:

  • One-click approve, edit, or reject
  • Voice score shown alongside each draft
  • Edit in-place before approving

7. PUBLISH — Write to disk

  • Blog posts → src/content/posts/
  • Journal entries → src/content/journal/
  • Generate proper slug, frontmatter, hero image reference
  • Mark source conversations as blog_status: published
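The slug and target-path logic for the PUBLISH step can be sketched as below. The directory names come from the bullets above; the slug rules (lowercase, hyphen-separated) are an assumption based on common Astro conventions.

```python
import re
from pathlib import Path

# Hypothetical PUBLISH helpers; slug rules are assumed, directories are from the doc.
def slugify(title: str) -> str:
    """Lowercase a title and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def target_path(title: str, kind: str) -> Path:
    """Map an approved draft to its on-disk destination by content type."""
    base = {"post": "src/content/posts", "journal": "src/content/journal"}[kind]
    return Path(base) / f"{slugify(title)}.md"
```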

API Endpoints Needed

  • /api/admin/pipeline/scan (POST): scan the conversation archive for candidates
  • /api/admin/pipeline/generate (POST): generate a draft from one or more conversations
  • /api/admin/pipeline/queue (GET): list pending drafts
  • /api/admin/pipeline/approve (POST): approve a draft and publish it to disk

Conversation Archive Format

Each of the 2,992 conversations is a markdown file with YAML frontmatter:

title: "Debugging Tailscale DNS Resolution"
platform: chatgpt
date: 2024-10-07
category: technical
primary_topic: networking
tags: [tailscale, dns, networking]
blog_candidate: true
blog_status: idea
message_count: 18
word_count: 6833

Processing pipeline: Vaults/conversation-archive/_scripts/extract_all.py handles extraction, parsing, sanitization, enrichment, and markdown generation.
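For the scan step, the flat key: value frontmatter shown above can be read without a YAML dependency. This is a minimal sketch that assumes standard --- delimiters and handles only the flat fields in the example (not nested structures or multi-line values).

```python
# Minimal frontmatter reader for the archive format above (a sketch that
# avoids a YAML library; it assumes --- delimiters and flat key: value lines).
def read_frontmatter(text: str) -> dict:
    """Parse flat key: value pairs between the first two --- delimiters."""
    parts = text.split("---", 2)
    if len(parts) < 3:          # no frontmatter block present
        return {}
    fm = {}
    for line in parts[1].strip().splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        fm[key.strip()] = value.strip().strip('"')   # values stay as strings
    return fm
```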

What Already Exists

Working:

  • Content-gen API (streaming)
  • Voice profile builder
  • Facts loader + denied claims
  • RAG search + format
  • Voice scoring
  • Content Studio UI
  • Conversation extraction pipeline
  • RAG database (701K chunks, populated)

Not yet built:

  • Pipeline scan endpoint
  • Pipeline queue management
  • Pipeline UI tab
  • Auto-RAG search
  • Batch generation
  • Draft → publish flow

Related

Tags: pipeline, automation, content-generation, voice, rag, conversations