Querying a SmartBucket

Overview

SmartBuckets let you query your data in natural language, run semantic chunk search for AI applications, and use content analysis features such as PII detection. This guide covers the different query types and how to use them effectively.

The general search capability transforms how you interact with your stored data. Instead of being limited to simple keyword matches, you can use natural language queries that span multiple types of content. For example, imagine a law firm that needs to find specific case files. They could search:

“Find documents with photos of property damage and text about insurance fraud”

This single query searches images and text at the same time, interprets a request written in plain English, and finds related content across your documents.

The system also supports complex nested queries, for example:

  • “Find documents containing financial reports and images of signatures”
  • “Find documents with employee photos where the person is wearing a blue uniform and the text mentions ‘safety violations’”
  • “Find contracts containing both company logos and signatures where the document text mentions ‘confidentiality’”
  • “Find documents about company policies that contain PII, specifically emails and names”
  • “Find audio files in which people discuss data policies”

Natural language search is available through the API, CLI and SDK.
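
As a rough illustration of the API route, the sketch below builds an HTTP request that carries a natural language query, without sending it. The endpoint URL, the JSON field names ("bucket", "input"), and the bearer-token header are assumptions made for this example, not the documented schema; check the API reference for the real values.

```python
import json
from urllib.request import Request

# Hypothetical endpoint; replace with the URL from the SmartBucket API reference.
SEARCH_URL = "https://api.example.com/v1/search"


def build_search_request(bucket: str, query: str, api_key: str) -> Request:
    """Build (but do not send) a POST request carrying a natural language query."""
    if not query.strip():
        raise ValueError("query must not be empty")
    # The field names "bucket" and "input" are assumptions for illustration.
    body = json.dumps({"bucket": bucket, "input": query}).encode("utf-8")
    return Request(
        SEARCH_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_search_request(
    "case-files",  # hypothetical bucket name
    "Find documents with photos of property damage and text about insurance fraud",
    api_key="YOUR_API_KEY",
)
```

The query is passed through as plain text; any interpretation of images, text, or audio happens on the SmartBucket side, so the client only has to deliver the sentence.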

Chunk Search for AI Integration

Chunk search is designed for AI applications and retrieval-augmented generation (RAG) pipelines. This specialized search function:

  • Analyzes input queries for semantic understanding
  • Returns the 20 most relevant text chunks from your data
  • Ranks results based on relevance to the query
  • Optimizes output format for LLM consumption

Chunk search is available through the API, CLI and SDK.
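
To show how chunk search results might feed a RAG pipeline, here is a minimal sketch that turns a list of returned chunks into a prompt for an LLM. The chunk shape (a "text" field and a "score" field) is an assumption for this example; the actual response format may differ.

```python
from typing import TypedDict


class Chunk(TypedDict):
    text: str
    score: float  # hypothetical relevance score; higher means more relevant


MAX_CHUNKS = 20  # chunk search returns the 20 most relevant chunks


def build_context(chunks: list[Chunk], limit: int = MAX_CHUNKS) -> str:
    """Order chunks by descending score and number them for citation."""
    ranked = sorted(chunks, key=lambda c: c["score"], reverse=True)[:limit]
    return "\n\n".join(
        f"[{i}] {c['text'].strip()}" for i, c in enumerate(ranked, start=1)
    )


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Combine the numbered context and the user question into one prompt."""
    return (
        "Answer the question using only the numbered context below.\n\n"
        f"Context:\n{build_context(chunks)}\n\n"
        f"Question: {question}"
    )
```

Numbering the chunks lets the model cite its sources as [1], [2], and so on, which makes answers easier to verify against the stored documents.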

Advanced Search Capabilities

SmartBuckets provide several specialized search features, enabled by expert AI models that run on your content during upload:

Content Analysis:

  • Document content search with semantic understanding
  • Image content analysis and recognition
  • Audio transcription search
  • Cross-modal query support

Security Features:

  • PII (Personally Identifiable Information) detection
  • Access logging and tracking (coming soon)
  • Sensitive data identification (coming soon)

Personally Identifiable Information (PII)

SmartBuckets can detect the following types of personally identifiable information (PII):

  • Account numbers
  • Building numbers
  • City names
  • Credit card numbers
  • Dates of birth
  • Driver’s license numbers
  • Email addresses
  • Given names (first names)
  • ID card numbers
  • Passwords
  • Social security numbers
  • Street addresses
  • Surnames (last names)
  • Tax identification numbers
  • Telephone numbers
  • Usernames
  • ZIP codes

PII detection can be incorporated into your queries using the search endpoint.

Example search queries for PII:

  • “Find documents containing personal information”
  • “Find all documents that do not contain PII”
  • “Find PDF documents that contain emails and social security numbers”
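
Because PII queries are ordinary natural language, no special syntax is required. If you generate such queries programmatically, a small helper can keep the wording consistent and catch PII types that are not on the supported list. The helper below is a sketch; the function name and the exact phrasing it produces are illustrative choices, not part of SmartBuckets.

```python
# PII types taken from the supported list above.
SUPPORTED_PII = frozenset({
    "account numbers", "building numbers", "city names", "credit card numbers",
    "dates of birth", "driver's license numbers", "email addresses",
    "given names", "ID card numbers", "passwords", "social security numbers",
    "street addresses", "surnames", "tax identification numbers",
    "telephone numbers", "usernames", "ZIP codes",
})


def pii_query(pii_types: list[str], doc_kind: str = "documents") -> str:
    """Compose a natural language PII query (hypothetical helper)."""
    unknown = [t for t in pii_types if t not in SUPPORTED_PII]
    if unknown:
        raise ValueError(f"unsupported PII types: {unknown}")
    if not pii_types:
        # No specific types requested: ask for any personal information.
        return f"Find {doc_kind} containing personal information"
    return f"Find {doc_kind} that contain {' and '.join(pii_types)}"


query = pii_query(["email addresses", "social security numbers"], "PDF documents")
```

The resulting string can be sent to the search endpoint like any other natural language query.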