Retrieval Augmented Generation (RAG)

This content is for version 0.6.3. Switch to the latest version for up-to-date documentation.

Retrieval Augmented Generation (RAG) combines the generative capabilities of large language models with the precision of information retrieval systems. This pattern gives LLMs access to current, domain-specific, or private information through efficient document indexing and semantic search.

Use this pattern when building:

  • Knowledge bases and document search systems
  • Customer support chatbots with company-specific information
  • Research assistants requiring factual accuracy
  • Question-answering systems over private datasets
  • Content recommendation systems with contextual relevance
  • Educational systems with curriculum-specific content

Architecture Diagram

```mermaid
flowchart TB
    User[User Query]
    ServiceEntry[Service Entry Point]
    SmartBucket[SmartBucket]
    User --> ServiceEntry
    ServiceEntry --> SmartBucket

    subgraph Ingestion ["Data Ingestion"]
        Documents[Documents/Data]
        APIs[External APIs]
        Files[Files/PDFs/Text]
        Documents --> SmartBucket
        APIs --> SmartBucket
        Files --> SmartBucket
    end

    subgraph Processing ["SmartBucket Processing"]
        Indexing[Vector Indexing]
        TextSearch[Text Search]
        Metadata[Metadata Filtering]
        SmartBucket --> Indexing
        SmartBucket --> TextSearch
        SmartBucket --> Metadata
    end

    subgraph Generation ["Response Generation"]
        Context[Retrieved Context]
        AI[AI Model]
        GeneratedResponse[Generated Response]
        SmartBucket --> Context
        Context --> AI
        AI --> GeneratedResponse
    end

    GeneratedResponse --> ServiceEntry
    ServiceEntry --> User
```

Components

  • SmartBucket - Core component handling data storage, indexing, and retrieval with automatic vector embeddings
  • Service (Optional) - Entry point for user interactions, handling routing, authentication, and response formatting
  • AI (Optional) - Language model for response generation using retrieved context

Logical Flow

  1. Data Ingestion - Documents and structured data uploaded to SmartBucket through various input methods

  2. Automatic Processing - SmartBucket extracts text, generates embeddings, creates search indices, and stores metadata

  3. Query Processing - SmartBucket analyzes queries to determine optimal search strategy (semantic, keyword, or hybrid)

  4. Context Retrieval - SmartBucket searches indexed content and returns most relevant chunks ranked by relevance scores

  5. Response Generation - Retrieved context optionally combined with query and sent to AI model for generation

  6. Result Delivery - Final response enriched with retrieved information returned with optional citations and references
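Steps 3 through 5 can be sketched in TypeScript. This is a minimal illustration of the retrieve-then-generate flow, not SmartBucket's actual API: `embed` is a toy bag-of-words stand-in for a real embedding model, and the AI model call itself is omitted.

```typescript
type Chunk = { id: string; text: string; vector: number[] };

// Toy embedding: word counts over a tiny fixed vocabulary. A real
// system uses a learned embedding model with far higher dimensions.
const VOCAB = ["refund", "policy", "shipping", "days", "support"];
function embed(text: string): number[] {
  const words = text.toLowerCase().split(/\W+/);
  return VOCAB.map((v) => words.filter((w) => w === v).length);
}

// Cosine similarity between two vectors; 0 if either is all zeros.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const na = Math.sqrt(a.reduce((s, x) => s + x * x, 0));
  const nb = Math.sqrt(b.reduce((s, x) => s + x * x, 0));
  return na && nb ? dot / (na * nb) : 0;
}

// Steps 3-4: rank indexed chunks by similarity to the query and
// keep the top k as retrieved context.
function retrieve(query: string, index: Chunk[], k = 2): Chunk[] {
  const qv = embed(query);
  return [...index]
    .sort((a, b) => cosine(qv, b.vector) - cosine(qv, a.vector))
    .slice(0, k);
}

// Step 5: combine the retrieved context with the query into a
// prompt for the AI model.
function buildPrompt(query: string, context: Chunk[]): string {
  const ctx = context.map((c) => `[${c.id}] ${c.text}`).join("\n");
  return `Context:\n${ctx}\n\nQuestion: ${query}`;
}
```

In a real deployment, retrieval and embedding happen inside SmartBucket; the sketch only shows the ranking-and-prompting logic the pattern relies on.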

Implementation

  1. Create SmartBucket - Deploy SmartBucket component configured for your data types and search requirements

  2. Configure Indexing - Set up chunking strategies and embedding models based on content type and use case

  3. Load Initial Data - Populate SmartBucket with initial dataset through batch upload or API ingestion

  4. Implement Query Interface - Create query endpoints accepting user questions and returning contextual responses

  5. Production Setup - Add authentication, rate limiting, monitoring, and external data source integration for updates
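The query interface in step 4 might take roughly this shape. This is a hypothetical sketch: `search` and `generate` are stand-ins for the SmartBucket retrieval call and the AI model call, whose real signatures will differ.

```typescript
type SearchHit = { text: string; score: number; source: string };
type QueryResponse = { answer: string; citations: string[] };

// Hypothetical query handler: retrieve context, build a prompt,
// generate an answer, and return it with deduplicated citations.
async function handleQuery(
  question: string,
  search: (q: string) => Promise<SearchHit[]>,
  generate: (prompt: string) => Promise<string>,
): Promise<QueryResponse> {
  const hits = await search(question);
  const context = hits.map((h) => h.text).join("\n");
  const answer = await generate(
    `Context:\n${context}\n\nQuestion: ${question}`,
  );
  // Cite each source document once, even if several chunks matched.
  return { answer, citations: [...new Set(hits.map((h) => h.source))] };
}
```

Injecting `search` and `generate` as parameters keeps the handler testable and independent of any particular retrieval or model backend.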

raindrop.manifest

```
application "rag_system" {
  smartBucket "knowledge_base" {}
}
```

Best Practices

  • Structure your content - Organize documents with clear metadata for better filtering and categorization
  • Optimize chunk size - Balance context preservation and search precision (typically 500-1500 tokens)
  • Clean your data - Remove noise, ensure consistent formatting, and validate content quality before indexing
  • Use hybrid search - Combine semantic and keyword search for best results across different query types
  • Implement reranking - Use confidence thresholds to filter low-quality matches from search results
  • Test with real queries - Validate search quality with actual user questions from your domain
  • Monitor indexing speed - Large datasets may require batch processing strategies for optimal performance
  • Cache frequent queries - Implement result caching for commonly asked questions to improve response times
  • Scale incrementally - Start with smaller datasets and scale based on usage patterns and performance metrics
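The chunk-size guidance above can be made concrete with a fixed-size chunker. This is an illustrative sketch only: "tokens" are approximated by whitespace-split words rather than a real tokenizer, and SmartBucket's built-in chunking will behave differently.

```typescript
// Split text into overlapping fixed-size chunks. Overlap preserves
// context across chunk boundaries so a sentence straddling two
// chunks is still fully retrievable from at least one of them.
function chunkText(text: string, size = 1000, overlap = 100): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += size - overlap) {
    chunks.push(words.slice(start, start + size).join(" "));
    // Stop once a chunk has reached the end of the text.
    if (start + size >= words.length) break;
  }
  return chunks;
}
```

Tuning `size` within the 500-1500 token range and keeping a modest `overlap` is a reasonable starting point; validate against real queries from your domain before settling on values.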