Neural Augmentation: The RAG Integration Protocol

Important

SYSTEM ALERT: Raw LLM output detected. High risk of hallucination and outdated information. Context retrieval protocols not initialized. Augment with verified data sources.

We have all been there. You ask an AI model a specific question about a recent software release. It responds confidently. The answer sounds reasonable.

But then, you check the documentation. The features it listed don’t exist. The version numbers are wrong. The model hallucinated.

Here is the hard truth: LLMs are not databases. Their knowledge is frozen at a training cutoff date. They cannot access real-time information. They make things up when uncertain.

At the Yellow Capsule, we treat AI responses like any other data source: validated against ground truth. Enter Retrieval Augmented Generation (RAG).


🧬 System Scan: The Problem with Raw Inference

10_chat_without_rag.yaml: The Unaugmented Response

tasks:
  - id: chat_without_rag
    type: io.kestra.plugin.ai.completion.ChatCompletion
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-2.5-flash
      apiKey: "{{ kv('GEMINI_API_KEY') }}"
    messages:
      - type: USER
        content: |
          Which features were released in Kestra 1.1? 
          Please list at least 5 major features with brief descriptions.

This task directly queries the Gemini LLM. No external context. No documentation. Just the model’s internal knowledge.

The Diagnosis

- id: log_results
  type: io.kestra.plugin.core.log.Log
  message: |
    โŒ Response WITHOUT RAG (no retrieved context):
    {{ outputs.chat_without_rag.textOutput }}
    
    ๐Ÿค” Did you notice that this response seems to be:
    - Incorrect
    - Vague/generic
    - Listing features that haven't been added in exactly this version

Expected Symptoms:

  • Hallucinated feature names.
  • Vague, non-specific descriptions.
  • Information from the model’s training data, not the actual release notes.
  • Confident but wrong.

System Note: This is not a bug in the model. It’s a fundamental limitation. LLMs are probabilistic text generators, not knowledge retrieval systems.


🛠️ The Cure: Retrieval Augmented Generation

The RAG Architecture

RAG solves the hallucination problem by:

  1. Ingesting relevant documents into a vector database (embeddings).
  2. Retrieving the most semantically similar chunks based on the user’s query.
  3. Augmenting the LLM prompt with this retrieved context.
  4. Generating a response grounded in the provided information.
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│ User Query  │────▶│ Embedding    │────▶│ Vector DB   │
└─────────────┘     │ Model        │     │ (Search)    │
                    └──────────────┘     └──────┬──────┘
                                                │
                                        Retrieved Context
                                                │
                                                ▼
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│ Final Answer│◀────│ LLM          │◀────│ Augmented   │
│ (Grounded)  │     │ (Gemini)     │     │ Prompt      │
└─────────────┘     └──────────────┘     └─────────────┘
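
In Kestra terms, that whole loop collapses into just two tasks, which the rest of this post builds out step by step. As a rough preview (the task types are the ones dissected below; the flow id and namespace are placeholders):

id: rag_demo          # placeholder flow id
namespace: demo.ai    # placeholder namespace

tasks:
  # Step 1: ingest documents into the vector store (embeddings)
  - id: ingest_release_notes
    type: io.kestra.plugin.ai.rag.IngestDocument
    # ... configured as shown in Step 1 below

  # Steps 2-4: retrieve context, augment the prompt, generate a grounded answer
  - id: chat_with_rag
    type: io.kestra.plugin.ai.rag.ChatCompletion
    # ... configured as shown in Step 2 below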

🔬 Procedure: 11_chat_with_rag.yaml

Step 1: Document Ingestion (The Knowledge Transplant)

- id: ingest_release_notes
  type: io.kestra.plugin.ai.rag.IngestDocument
  description: Ingest Kestra 1.1 release notes to create embeddings
  provider:
    type: io.kestra.plugin.ai.provider.GoogleGemini
    modelName: gemini-embedding-001
    apiKey: "{{ kv('GEMINI_API_KEY') }}"
  embeddings:
    type: io.kestra.plugin.ai.embeddings.KestraKVStore
  drop: true
  fromExternalURLs:
    - https://raw.githubusercontent.com/kestra-io/docs/.../release-1-1/index.md

Dissection:

  • IngestDocument: Downloads the release notes markdown file.
  • gemini-embedding-001: Converts text chunks into numerical vectors (embeddings).
  • KestraKVStore: Stores these embeddings in Kestra’s internal vector store.
  • drop: true: Clears existing embeddings before ingesting. Ensures a fresh knowledge base.

System Note: This step only needs to run once per document, or whenever the source document is updated.
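
If the source document changes regularly, one option is to split ingestion into its own flow and put it on a schedule, so the knowledge base refreshes itself. A minimal sketch, assuming a weekly refresh; the flow id, namespace, and cron expression are illustrative, and the ingestion task is the same one dissected above:

id: refresh_release_notes_embeddings   # illustrative flow id
namespace: demo.ai                     # illustrative namespace

tasks:
  - id: ingest_release_notes
    type: io.kestra.plugin.ai.rag.IngestDocument
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-embedding-001
      apiKey: "{{ kv('GEMINI_API_KEY') }}"
    embeddings:
      type: io.kestra.plugin.ai.embeddings.KestraKVStore
    drop: true        # rebuild the knowledge base on every run
    fromExternalURLs:
      - https://raw.githubusercontent.com/kestra-io/docs/.../release-1-1/index.md

triggers:
  - id: weekly_refresh
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "0 6 * * 1"   # every Monday at 06:00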

Step 2: RAG-Powered Chat Completion

- id: chat_with_rag
  type: io.kestra.plugin.ai.rag.ChatCompletion
  chatProvider:
    type: io.kestra.plugin.ai.provider.GoogleGemini
    modelName: gemini-2.5-flash
    apiKey: "{{ kv('GEMINI_API_KEY') }}"
  embeddingProvider:
    type: io.kestra.plugin.ai.provider.GoogleGemini
    modelName: gemini-embedding-001
    apiKey: "{{ kv('GEMINI_API_KEY') }}"
  embeddings:
    type: io.kestra.plugin.ai.embeddings.KestraKVStore
  systemMessage: |
    You are a helpful assistant that answers questions about Kestra.
    Use the provided documentation to give accurate, specific answers.
    If you don't find the information in the context, say so.
  prompt: |
    Which features were released in Kestra 1.1? 
    Please list at least 5 major features with brief descriptions.

Dissection:

  • io.kestra.plugin.ai.rag.ChatCompletion: The RAG-enabled chat task. Different from the basic ChatCompletion.
  • chatProvider: The LLM that generates the final answer.
  • embeddingProvider: The model that converts the user’s prompt into a vector for searching.
  • embeddings: Points to the KV Store where we stored our ingested documents.
  • systemMessage: Instructs the model on how to behave. Crucially, it tells the model to rely on the provided context.
  • prompt: The same question we asked in the non-RAG version.
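
Because Kestra properties accept Pebble expressions, you can also turn this into a reusable Q&A flow by passing the question in as an input instead of hardcoding it. A minimal sketch; the input name and default value are illustrative, and the omitted properties are exactly the ones shown above:

inputs:
  - id: question          # illustrative input name
    type: STRING
    defaults: Which features were released in Kestra 1.1?

tasks:
  - id: chat_with_rag
    type: io.kestra.plugin.ai.rag.ChatCompletion
    # ... chatProvider, embeddingProvider, embeddings, and systemMessage as above
    prompt: "{{ inputs.question }}"

The same flow can now answer any question against the ingested documentation without editing the YAML.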

The Result

- id: log_results
  type: io.kestra.plugin.core.log.Log
  message: |
    ✅ RAG Response (with retrieved context):
    {{ outputs.chat_with_rag.textOutput }}

    Note that this response is detailed, accurate, and grounded in the actual release documentation.

Expected Outcome:

  • Specific, accurate feature names from the Kestra 1.1 release notes.
  • Descriptions that match the official documentation.
  • No hallucinations. The model only reports what it found in the context.

🧪 Post-Op Analysis

Side-by-Side Comparison

Aspect           | Without RAG                        | With RAG
-----------------|------------------------------------|---------------------------------
Knowledge Source | Model’s frozen training data       | Live, ingested documents
Accuracy         | Unreliable, prone to hallucination | Grounded in provided context
Recency          | Outdated                           | As current as the ingested docs
Auditability     | Black box                          | Traceable to source documents

Key Architectural Insights

  1. Separation of Concerns: The embedding model (for search) and the chat model (for generation) can be different. Use specialized tools for each job.
  2. Kestra as the Orchestrator: Kestra manages the flow: ingest first, then query. This decoupling is critical for production systems where ingestion might be scheduled separately from querying.
  3. The systemMessage is Crucial: Prompt engineering matters. Tell the model to use the context and admit when it doesn’t know (see the sketch after this list).
  4. Idempotent Ingestion: The drop: true flag ensures repeatability. Re-running the workflow refreshes the knowledge base.
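
To make point 3 concrete, here is one way to tighten the systemMessage so the model stays inside the retrieved context. The wording is only an illustration, not the one true prompt:

systemMessage: |
  You are a documentation assistant for Kestra.
  Answer ONLY with information found in the provided context.
  Quote feature names exactly as they appear in the documentation.
  If the context does not contain the answer, say:
  "I could not find this in the provided documentation."

Small changes like this shift the model’s failure mode from “confident but wrong” to “explicitly unsure”, which is exactly what you want in a grounded assistant.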

When to Use RAG

  • Answering questions about your internal documentation.
  • Querying recent information the LLM couldn’t have seen during training.
  • Building chatbots that need to be factually grounded.
  • Any scenario where “I don’t know” is better than a confident wrong answer.

🚀 The Full Learning Arc

Across these four modules, we’ve traced a complete journey:

Part                           | Focus                                        | Core Skill
-------------------------------|----------------------------------------------|------------------------------------------------
Part 1: Fundamentals           | YAML syntax, tasks, inputs, outputs          | Understanding the Kestra data model
Part 2: Database Orchestration | Postgres ETL, staging, merge, scheduling     | Building production-grade data pipelines
Part 3: Cloud Integration      | GCS, BigQuery, KV Store, plugin defaults     | Migrating workflows to scalable infrastructure
Part 4: GenAI & RAG            | LLM orchestration, embeddings, vector search | Augmenting AI with grounded knowledge

You’ve gone from “Hello World” to “Retrieval Augmented Generation.”

This is not just learning a tool. It’s building a mental model for modern data and AI engineering.

Close the tutorial. Open a blank YAML file. Build something real.

Happy building, initiate.