Retrieval Augmented Generation (RAG)
This content is for the 0.6.3 version.
Retrieval Augmented Generation (RAG) combines the generative capabilities of large language models with the precision of information retrieval systems. The pattern gives LLMs access to current, domain-specific, or private information through efficient document indexing and semantic search.
Use this pattern when building:
- Knowledge bases and document search systems
- Customer support chatbots with company-specific information
- Research assistants requiring factual accuracy
- Question-answering systems over private datasets
- Content recommendation systems with contextual relevance
- Educational systems with curriculum-specific content
Architecture Diagram
```mermaid
flowchart TB
    User[User Query]
    ServiceEntry[Service Entry Point]
    SmartBucket[SmartBucket]

    User --> ServiceEntry
    ServiceEntry --> SmartBucket

    subgraph Ingestion ["Data Ingestion"]
        Documents[Documents/Data]
        APIs[External APIs]
        Files[Files/PDFs/Text]

        Documents --> SmartBucket
        APIs --> SmartBucket
        Files --> SmartBucket
    end

    subgraph Processing ["SmartBucket Processing"]
        Indexing[Vector Indexing]
        TextSearch[Text Search]
        Metadata[Metadata Filtering]

        SmartBucket --> Indexing
        SmartBucket --> TextSearch
        SmartBucket --> Metadata
    end

    subgraph Generation ["Response Generation"]
        Context[Retrieved Context]
        AI[AI Model]
        GeneratedResponse[Generated Response]

        SmartBucket --> Context
        Context --> AI
        AI --> GeneratedResponse
    end

    GeneratedResponse --> ServiceEntry
    ServiceEntry --> User
```
Components
- SmartBucket - Core component handling data storage, indexing, and retrieval with automatic vector embeddings
- Service (Optional) - Entry point for user interactions, handling routing, authentication, and response formatting
- AI (Optional) - Language model for response generation using retrieved context
Logical Flow
1. Data Ingestion - Documents and structured data are uploaded to SmartBucket through various input methods
2. Automatic Processing - SmartBucket extracts text, generates embeddings, creates search indices, and stores metadata
3. Query Processing - SmartBucket analyzes each query to determine the optimal search strategy (semantic, keyword, or hybrid)
4. Context Retrieval - SmartBucket searches indexed content and returns the most relevant chunks, ranked by relevance score
5. Response Generation - Retrieved context is optionally combined with the query and sent to the AI model for generation
6. Result Delivery - The final response, enriched with retrieved information, is returned with optional citations and references
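The flow above can be sketched end to end. Everything here is illustrative: the toy bag-of-words "embedding", the in-memory index, and the `generate` stand-in are hypothetical stand-ins for the embedding model, SmartBucket retrieval, and AI model that a real deployment would use.

```python
# Minimal end-to-end RAG flow: ingest -> index -> retrieve -> generate.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a term-frequency vector (real systems use learned embeddings)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyIndex:
    """Steps 1-2: ingestion and indexing, compressed into an in-memory list."""
    def __init__(self) -> None:
        self.docs: list[tuple[str, Counter]] = []

    def ingest(self, text: str) -> None:
        self.docs.append((text, embed(text)))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Steps 3-4: score every chunk against the query, return the top k.
        qv = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(qv, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

def generate(query: str, context: list[str]) -> str:
    # Step 5: build the augmented prompt; a real system sends this to the AI model.
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"

index = ToyIndex()
index.ingest("Raindrop manifests declare application components.")
index.ingest("SmartBucket stores documents and supports semantic search.")
answer = generate("where are documents stored", index.retrieve("where are documents stored", k=1))
```

The augmented prompt (`answer`) now contains only the chunk relevant to the question, which is the core of the pattern: the model answers from retrieved context rather than from its training data alone.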
Implementation
1. Create SmartBucket - Deploy a SmartBucket component configured for your data types and search requirements
2. Configure Indexing - Set up chunking strategies and embedding models based on content type and use case
3. Load Initial Data - Populate the SmartBucket with an initial dataset through batch upload or API ingestion
4. Implement Query Interface - Create query endpoints that accept user questions and return contextual responses
5. Production Setup - Add authentication, rate limiting, monitoring, and external data source integration for ongoing updates
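As a sketch of the chunking decision in step 2, here is a fixed-size chunker with overlap. It is an illustrative helper, not part of the SmartBucket API; token counts are approximated by whitespace-separated words, whereas a real indexer would count model tokens.

```python
# Illustrative fixed-size chunker with overlap. Overlap preserves context
# across chunk boundaries so a sentence split in two is still retrievable.
def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + size]))
        start += size - overlap  # stride forward, keeping `overlap` words
    return chunks

# 1200 words with stride 450 -> chunks starting at word 0, 450, and 900.
pieces = chunk("word " * 1200, size=500, overlap=50)
```

The `size` default echoes the 500-1500 token guidance under Best Practices; tune both parameters against real queries from your domain.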
raindrop.manifest

```
application "rag_system" {
  smartBucket "knowledge_base" {}
}

application "rag_application" {
  service "api" {}
  smartBucket "knowledge_base" {}
  ai "generator" {}
}
```
Best Practices
- Structure your content - Organize documents with clear metadata for better filtering and categorization
- Optimize chunk size - Balance context preservation and search precision (typically 500-1500 tokens)
- Clean your data - Remove noise, ensure consistent formatting, and validate content quality before indexing
- Use hybrid search - Combine semantic and keyword search for best results across different query types
- Implement reranking - Use confidence thresholds to filter low-quality matches from search results
- Test with real queries - Validate search quality with actual user questions from your domain
- Monitor indexing speed - Large datasets may require batch processing strategies for optimal performance
- Cache frequent queries - Implement result caching for commonly asked questions to improve response times
- Scale incrementally - Start with smaller datasets and scale based on usage patterns and performance metrics
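The hybrid-search and reranking practices above can be sketched together: blend a semantic similarity score with a keyword-overlap score, then drop anything below a confidence threshold. The scores, weight, and threshold here are illustrative stand-ins for what a vector index and text search would return, not SmartBucket internals.

```python
# Illustrative hybrid ranking: alpha weights semantic vs keyword evidence,
# and `threshold` filters low-confidence matches (the reranking practice).
def keyword_score(query: str, doc: str) -> float:
    """Fraction of query terms that appear verbatim in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_rank(query, docs, semantic_scores, alpha=0.6, threshold=0.2):
    scored = [
        (alpha * sem + (1 - alpha) * keyword_score(query, doc), doc)
        for doc, sem in zip(docs, semantic_scores)
    ]
    return [doc for score, doc in sorted(scored, reverse=True) if score >= threshold]

docs = ["refund policy for orders", "shipping times by region"]
# Semantic scores would come from the vector index; hardcoded here for the sketch.
ranked = hybrid_rank("how do refunds work", docs, semantic_scores=[0.9, 0.1])
```

Here the semantic score rescues the refund document even though no query term matches it verbatim, while the threshold filters out the weakly related shipping document. Tune `alpha` and `threshold` with real user queries, per the testing practice above.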