Understanding SmartBuckets
What are SmartBuckets?
SmartBuckets are more than storage: they are S3-compatible buckets that automatically process your files with AI as you upload them. Your data is then ready for AI applications, agents, RAG pipelines, and human consumption without any extra work on your part.
Think of a SmartBucket as a librarian who not only stores your books but reads and understands every page, image, and table, making all of that information instantly searchable and usable. This automatic processing and enhancement enable both humans and AI agents to interact with your data in sophisticated ways through specialized AI endpoints.
Understanding AI Decomposition
Uploading a file to a SmartBucket triggers a process we call AI decomposition. This process is fundamental to how SmartBuckets transform raw files into AI-enhanced resources. Taking a PDF upload as an example, decomposition works in several stages:
- The system identifies and extracts the different types of content in your file: text, images, tables, metadata, and more.
- Each component is processed by a specialized AI model designed for that type of content.
- The enhanced data is stored in optimized datastores that preserve the relationships between components.
- All of the processed information becomes immediately available for AI queries.
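The stages above can be pictured with a small sketch. This is an illustration only, not the SmartBuckets implementation: `Component`, `PROCESSORS`, `extract_components`, and `decompose` are hypothetical names, and the "models" are trivial placeholder functions standing in for the proprietary AI models.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One piece of content extracted from an uploaded file (hypothetical type)."""
    kind: str       # e.g. "text", "image", "table"
    payload: str    # extracted content, simplified to a string for this sketch
    enrichment: dict = field(default_factory=dict)


def extract_components(file_parts):
    # Stage 1: split the file into typed components.
    # Here the caller supplies (kind, payload) pairs instead of a real parser.
    return [Component(kind=kind, payload=payload) for kind, payload in file_parts]


# Stage 2: one placeholder processor per content type (stand-ins for AI models).
PROCESSORS = {
    "text": lambda c: {"word_count": len(c.payload.split())},
    "image": lambda c: {"description": "placeholder description of " + c.payload},
    "table": lambda c: {"row_count": len(c.payload.splitlines())},
}


def decompose(file_name, file_parts):
    components = extract_components(file_parts)
    for component in components:
        processor = PROCESSORS.get(component.kind)
        if processor is not None:
            component.enrichment = processor(component)
    # Stage 3: keep all components under their source file so the
    # relationships between them are preserved for later queries.
    return {"source": file_name, "components": components}
```

Components with no matching processor (for example, a kind such as "metadata" in this sketch) are kept as extracted and left without enrichment.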
AI Models and AI Data Stores
When you upload data to a SmartBucket, our AI pipeline analyzes it and stores the results in multiple specialized systems including vector stores, graph databases, and relationship stores.
The processing pipeline includes several analysis models that:
- Detect PII (Personally Identifiable Information)
- Screen for harmful content
- Extract relationships between data
- Identify topics and themes
- Generate metadata for improved searchability
- Transcribe audio
- Describe images
While the specific implementation details of our AI models are proprietary, these automated processes ensure your data is thoroughly analyzed and indexed for advanced querying capabilities.
Supported Data Types and Processing
While you can store any file type, SmartBuckets provide AI enhancement for specific formats:
Images
- image/png
- image/jpg
Audio
- audio/webm
- audio/mpeg
- audio/wav
- audio/mp4
Documents
- text/plain
- application/pdf
This focused support allows us to provide deep, meaningful analysis of your content. For regular file storage needs, we recommend using standard Raindrop buckets, as SmartBuckets’ pricing is optimized for AI-enhanced storage rather than basic file storage.
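A client can check a file's content type against the list above before deciding which kind of bucket to use. The sketch below is a minimal illustration and not part of the SmartBuckets API: `AI_ENHANCED_TYPES` and `is_ai_enhanced` are hypothetical names, and the set simply copies the MIME types listed above.

```python
# MIME types copied verbatim from the supported-formats list above.
AI_ENHANCED_TYPES = {
    "image/png", "image/jpg",
    "audio/webm", "audio/mpeg", "audio/wav", "audio/mp4",
    "text/plain", "application/pdf",
}


def is_ai_enhanced(content_type: str) -> bool:
    """Return True if the content type appears in the supported list."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    base = content_type.split(";", 1)[0].strip().lower()
    return base in AI_ENHANCED_TYPES
```

For example, `is_ai_enhanced("text/plain; charset=utf-8")` returns `True`, while `is_ai_enhanced("video/mp4")` returns `False`. The helper matches the listed strings exactly, so a type that is not spelled as it is in the list (such as `image/jpeg`) is reported as unsupported here; how the service itself treats such variants is not stated in this section.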
Understanding SmartBucket Search Capabilities
SmartBuckets offer two distinct search mechanisms, natural language search and chunk search, each designed for different use cases. Let’s look at how each one works and when to use it.
Natural Language Search
Natural language search lets you query your data in plain English instead of complex search syntax. Describe what you’re looking for as you would to another person, and the system will interpret the request and find relevant results.
For example, instead of constructing a complex boolean query, you might ask: “Find all documents about climate change that include graphs of temperature data.” The system understands the semantic meaning of your request and searches across both textual content and visual elements to find relevant matches.
You can learn more about natural language search in the search documentation.
Chunk Search for AI Integration
Chunk search is designed for AI applications that need specific text passages rather than entire documents. It finds and ranks the chunks of text most relevant to your query, which makes it a natural fit for RAG pipelines, including ones you already run.
Think of chunk search as a highly skilled research assistant who not only finds relevant books but marks the exact pages and paragraphs that answer your question. The system returns these text chunks ranked by relevance, making them immediately usable for AI applications.
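To show how ranked chunks might feed a RAG pipeline, here is a minimal sketch of the prompt-assembly step. It does not call SmartBuckets: the `Chunk` type and its `text`, `score`, and `source` fields, as well as `build_prompt`, are assumptions made for illustration, and the actual response shape is described in the search documentation.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A ranked text passage (hypothetical shape, not the real response type)."""
    text: str
    score: float   # relevance score, higher means more relevant
    source: str    # file the passage came from


def build_prompt(question: str, chunks: list, top_k: int = 3) -> str:
    # Keep only the top_k most relevant chunks, best first.
    ranked = sorted(chunks, key=lambda c: c.score, reverse=True)[:top_k]
    # Number each passage and cite its source so the model can refer back to it.
    context = "\n\n".join(
        f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(ranked, start=1)
    )
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Because only the highest-ranked passages are included, the prompt stays short even when the underlying documents are long, which is the main reason to retrieve chunks rather than whole files.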
You can learn more about chunk search in the search documentation.