Neural Augmentation: The RAG Integration Protocol

Anonymous included in Clinic Records

2026-01-31 About 1100 words 5 minutes

Contents

Important

SYSTEM ALERT: Raw LLM output detected. High risk of hallucination and outdated information. Context retrieval protocols not initialized. Augment with verified data sources.

We have all been there. You ask an AI model a specific question about a recent software release. It responds confidently. The answer sounds reasonable.

But then, you check the documentation. The features it listed don’t exist. The version numbers are wrong. The model hallucinated.

Here is the hard truth: LLMs are not databases. Their knowledge is frozen at a training cutoff date. They cannot access real-time information. They make things up when uncertain.

At the Yellow Capsule, we treat AI responses like any other data source: validated against ground truth. Enter Retrieval Augmented Generation (RAG).

🧬 System Scan: The Problem with Raw Inference

`10_chat_without_rag.yaml`: The Unaugmented Response

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


tasks:
  - id: chat_without_rag
    type: io.kestra.plugin.ai.completion.ChatCompletion
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-2.5-flash
      apiKey: "{{ kv('GEMINI_API_KEY') }}"
    messages:
      - type: USER
        content: |
          Which features were released in Kestra 1.1? 
          Please list at least 5 major features with brief descriptions.

This task directly queries the Gemini LLM. No external context. No documentation. Just the model’s internal knowledge.

The Diagnosis

 1
 2
 3
 4
 5
 6
 7
 8
 9
10


- id: log_results
  type: io.kestra.plugin.core.log.Log
  message: |
    ❌ Response WITHOUT RAG (no retrieved context):
    {{ outputs.chat_without_rag.textOutput }}
    
    🤔 Did you notice that this response seems to be:
    - Incorrect
    - Vague/generic
    - Listing features that haven't been added in exactly this version

Expected Symptoms:

Hallucinated feature names.
Vague, non-specific descriptions.
Information from the model’s training data, not the actual release notes.
Confident but wrong.

System Note: This is not a bug in the model. It’s a fundamental limitation. LLMs are probabilistic text generators, not knowledge retrieval systems.

🛠️ The Cure: Retrieval Augmented Generation

The RAG Architecture

RAG solves the hallucination problem by:

Ingesting relevant documents into a vector database (embeddings).
Retrieving the most semantically similar chunks based on the user’s query.
Augmenting the LLM prompt with this retrieved context.
Generating a response grounded in the provided information.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│ User Query  │────▶│ Embedding    │────▶│ Vector DB   │
└─────────────┘     │ Model        │     │ (Search)    │
                    └──────────────┘     └──────┬──────┘
                                                │
                                        Retrieved Context
                                                │
                                                ▼
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│ Final Answer│◀────│ LLM          │◀────│ Augmented   │
│ (Grounded)  │     │ (Gemini)     │     │ Prompt      │
└─────────────┘     └──────────────┘     └─────────────┘

🔬 Procedure: `11_chat_with_rag.yaml`

Step 1: Document Ingestion (The Knowledge Transplant)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


- id: ingest_release_notes
  type: io.kestra.plugin.ai.rag.IngestDocument
  description: Ingest Kestra 1.1 release notes to create embeddings
  provider:
    type: io.kestra.plugin.ai.provider.GoogleGemini
    modelName: gemini-embedding-001
    apiKey: "{{ kv('GEMINI_API_KEY') }}"
  embeddings:
    type: io.kestra.plugin.ai.embeddings.KestraKVStore
  drop: true
  fromExternalURLs:
    - https://raw.githubusercontent.com/kestra-io/docs/.../release-1-1/index.md

Dissection:

IngestDocument: Downloads the release notes markdown file.
gemini-embedding-001: Converts text chunks into numerical vectors (embeddings).
KestraKVStore: Stores these embeddings in Kestra’s internal vector store.
drop: true: Clears existing embeddings before ingesting. Ensures a fresh knowledge base.

System Note: This step only needs to run once per document, or whenever the source document is updated.

Step 2: RAG-Powered Chat Completion

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19


- id: chat_with_rag
  type: io.kestra.plugin.ai.rag.ChatCompletion
  chatProvider:
    type: io.kestra.plugin.ai.provider.GoogleGemini
    modelName: gemini-2.5-flash
    apiKey: "{{ kv('GEMINI_API_KEY') }}"
  embeddingProvider:
    type: io.kestra.plugin.ai.provider.GoogleGemini
    modelName: gemini-embedding-001
    apiKey: "{{ kv('GEMINI_API_KEY') }}"
  embeddings:
    type: io.kestra.plugin.ai.embeddings.KestraKVStore
  systemMessage: |
    You are a helpful assistant that answers questions about Kestra.
    Use the provided documentation to give accurate, specific answers.
    If you don't find the information in the context, say so.
  prompt: |
    Which features were released in Kestra 1.1? 
    Please list at least 5 major features with brief descriptions.

Dissection:

io.kestra.plugin.ai.rag.ChatCompletion: The RAG-enabled chat task. Different from the basic ChatCompletion.
chatProvider: The LLM that generates the final answer.
embeddingProvider: The model that converts the user’s prompt into a vector for searching.
embeddings: Points to the KV Store where we stored our ingested documents.
systemMessage: Instructs the model on how to behave. Crucially, it tells the model to rely on the provided context.
prompt: The same question we asked in the non-RAG version.

The Result

1
2
3
4
5
6
7


- id: log_results
  type: io.kestra.plugin.core.log.Log
  message: |
    ✅ RAG Response (with retrieved context):
    {{ outputs.chat_with_rag.textOutput }}
    
    Note that this response is detailed, accurate, and grounded in the actual release documentation.

Expected Outcome:

Specific, accurate feature names from the Kestra 1.1 release notes.
Descriptions that match the official documentation.
No hallucinations. The model only reports what it found in the context.

🧪 Post-Op Analysis

Side-by-Side Comparison

Aspect	Without RAG	With RAG
Knowledge Source	Model’s frozen training data	Live, ingested documents
Accuracy	Unreliable, prone to hallucination	Grounded in provided context
Recency	Outdated	As current as the ingested docs
Auditability	Black box	Traceable to source documents

Key Architectural Insights

Separation of Concerns: The embedding model (for search) and the chat model (for generation) can be different. Use specialized tools for each job.
Kestra as the Orchestrator: Kestra manages the flow: ingest first, then query. This decoupling is critical for production systems where ingestion might be scheduled separately from querying.
The systemMessage is Crucial: Prompt engineering matters. Tell the model to use the context and admit when it doesn’t know.
Idempotent Ingestion: The drop: true flag ensures repeatability. Re-running the workflow refreshes the knowledge base.

When to Use RAG

Answering questions about your internal documentation.
Querying recent information the LLM couldn’t have seen during training.
Building chatbots that need to be factually grounded.
Any scenario where “I don’t know” is better than a confident wrong answer.

🚀 The Full Learning Arc

Across these four modules, we’ve traced a complete journey:

Part	Focus	Core Skill
Part 1: Fundamentals	YAML syntax, tasks, inputs, outputs	Understanding the Kestra data model.
Part 2: Database Orchestration	Postgres ETL, staging, merge, scheduling	Building production-grade data pipelines.
Part 3: Cloud Integration	GCS, BigQuery, KV Store, plugin defaults	Migrating workflows to scalable infrastructure.
Part 4: GenAI & RAG	LLM orchestration, embeddings, vector search	Augmenting AI with grounded knowledge.

You’ve gone from “Hello World” to “Retrieval Augmented Generation.”

This is not just learning a tool. It’s building a mental model for modern data and AI engineering.

Close the tutorial. Open a blank YAML file. Build something real.

Happy building, initiate.