Vector

Overview

Vector databases let you store, index, and search high-dimensional numerical vectors that represent data such as text embeddings, image embeddings, or other feature vectors.

Setup Flow

1. Configure in Manifest

Define a vector index in your Raindrop application manifest with the required dimensions and distance metric:

application "my-app" {
  vector_index "embeddings" {
    dimensions = 1024
    metric = "cosine"  // or "euclidean", "dot-product"
  }
}

Distance Metrics:

cosine - Measures the angle between vectors and ignores magnitude. Best for text embeddings and semantic similarity.
euclidean - Measures straight-line distance in vector space. A good fit when magnitude matters (coordinates, measurements).
dot-product - Combines angle and magnitude. Useful when vector length represents confidence or importance.
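To make the three metrics concrete, here is a minimal sketch of the underlying formulas. The helper names are illustrative only; the index computes its scores internally and this is not its implementation.

```typescript
// Illustrative helpers only; not part of the vector index API.
type Vec = ArrayLike<number>;

function assertSameLength(a: Vec, b: Vec): void {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
}

// Sum of element-wise products: sensitive to both angle and magnitude.
function dotProduct(a: Vec, b: Vec): number {
  assertSameLength(a, b);
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Straight-line distance between the two points.
function euclideanDistance(a: Vec, b: Vec): number {
  assertSameLength(a, b);
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Dot product divided by both lengths: depends on the angle only.
function cosineSimilarity(a: Vec, b: Vec): number {
  const denom = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
  if (denom === 0) throw new Error("Cosine similarity is undefined for a zero vector");
  return dotProduct(a, b) / denom;
}

const a = [1, 0];
const b = [2, 0]; // same direction as a, twice the length

console.log(cosineSimilarity(a, b));  // 1 (identical direction)
console.log(euclideanDistance(a, b)); // 1
console.log(dotProduct(a, b));        // 2
```

Note how cosine treats the two vectors as identical while the other two metrics register the difference in length.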

2. Access in Code

The vector index becomes available through your environment bindings, allowing you to perform all vector operations:

export default class extends Service<Env> {
  async fetch(request: Request): Promise<Response> {
    // Get index information
    const indexInfo = await this.env.EMBEDDINGS.describe();

    // Store vectors (minimal example)
    await this.env.EMBEDDINGS.insert([{
      id: "doc-1",
      values: new Float32Array(1024).fill(0.1) // Must match dimensions
    }]);

    // Store vectors with optional metadata and namespace
    await this.env.EMBEDDINGS.insert([{
      id: "doc-2",
      values: new Float32Array(1024).fill(0.2),
      namespace: "documents", // Optional: organize vectors
      metadata: {             // Optional: filterable data
        title: "Getting Started Guide",
        category: "documentation"
      }
    }]);

    // Search for similar content
    const results = await this.env.EMBEDDINGS.query(
      new Float32Array(1024).fill(0.15),
      { topK: 5 }
    );

    return new Response(JSON.stringify(results));
  }
}

Vector Operations

Index Information

Get details about your vector index configuration and current state.

describe()

Returns information about the vector index, including its configuration and current statistics.

const indexInfo = await this.env.EMBEDDINGS.describe();

console.log(`Dimensions: ${indexInfo.dimensions}`);
console.log(`Vector count: ${indexInfo.vectorCount}`);
console.log(`Last processed: ${indexInfo.processedUpToDatetime}`);
console.log(`Mutation ID: ${indexInfo.processedUpToMutation}`);

Returns: VectorIndexIndexInfo containing:

vectorCount: Total number of vectors in the index
dimensions: Number of dimensions each vector must have
processedUpToDatetime: Timestamp string of last processed mutation
processedUpToMutation: String ID of the last processed mutation
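One use for the dimensions field is to check vectors before sending them to the index. The sketch below assumes a hypothetical helper, findDimensionMismatches, and uses a hard-coded value in place of the result of describe():

```typescript
// Hypothetical helper: not part of the vector index API.
interface VectorLike {
  id: string;
  values: ArrayLike<number>;
}

// Returns the IDs of vectors whose length differs from the index dimensions.
function findDimensionMismatches(vectors: VectorLike[], dimensions: number): string[] {
  return vectors.filter((v) => v.values.length !== dimensions).map((v) => v.id);
}

// Stand-in for `(await this.env.EMBEDDINGS.describe()).dimensions`
const dimensions = 4;

const bad = findDimensionMismatches(
  [
    { id: "ok", values: new Float32Array(4) },
    { id: "too-short", values: [0.1, 0.2] },
  ],
  dimensions
);

console.log(bad); // ["too-short"]
```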

Storing Vectors

Add vectors to your index with optional metadata and namespace organization. All vectors in an index must have the same number of dimensions as configured in your manifest.

insert(vectors)

Adds new vectors to the index. If any vector ID already exists, the operation fails. Use this when you don’t want to overwrite existing vectors.

Vector Object Structure:

Required:
- id - Unique string identifier for the vector
- values - Vector data as Float32Array or number[] (must match index dimensions)
Optional:
- namespace - String to organize vectors into logical groups
- metadata - Object with filterable key-value pairs (strings, numbers, booleans, string arrays)

Returns: VectorIndexAsyncMutation with:

mutationId: Unique identifier for tracking this operation

const vectors = [
  {
    id: "article-123",
    values: new Float32Array([0.1, 0.2, 0.3, 0.4]), // Shortened for readability; length must match index dimensions
    namespace: "articles", // Optional organization
    metadata: {
      title: "Vector Search Basics",
      author: "Jane Smith",
      publishedAt: "2024-01-15",
      tags: ["search", "ai", "vectors"]
    }
  },
  {
    id: "article-124",
    values: [0.2, 0.1, 0.4, 0.3], // Can also use regular number arrays
    namespace: "articles",
    metadata: {
      title: "Advanced Embedding Techniques",
      author: "John Doe",
      publishedAt: "2024-01-20",
      category: "advanced"
    }
  }
];

const result = await this.env.EMBEDDINGS.insert(vectors);
console.log(`Mutation ID: ${result.mutationId}`);

upsert(vectors)

Inserts new vectors or overwrites existing ones that share an ID. Unlike insert, it does not fail on duplicate IDs, which makes it a good fit for refreshing document embeddings when content changes.

Parameters:

vectors: Array of vector objects to upsert

Returns: VectorIndexAsyncMutation with mutation tracking ID

const updatedVectors = [
  {
    id: "article-123", // Will update if exists, insert if new
    values: new Float32Array([0.15, 0.25, 0.35, 0.45]), // Updated embedding
    metadata: {
      title: "Vector Search Basics - Updated",
      author: "Jane Smith",
      lastModified: "2024-02-01",
      version: 2
    }
  }
];

const result = await this.env.EMBEDDINGS.upsert(updatedVectors);

Batch Operations

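Send vectors in batches rather than one at a time for better performance and less per-call overhead. The vector index handles bulk operations efficiently.

// Efficient: Batch insert
// chunkArray is not built in; supply your own helper that splits an array into fixed-size chunks
const batchSize = 100;
const vectorBatches = chunkArray(allVectors, batchSize);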

for (const batch of vectorBatches) {
  await this.env.EMBEDDINGS.insert(batch);
}

// Less efficient: Individual inserts
for (const vector of allVectors) {
  await this.env.EMBEDDINGS.insert([vector]); // Avoid this pattern
}
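The batch example relies on a chunkArray helper that the snippet does not define. One possible implementation, offered as a sketch rather than a framework-provided function:

```typescript
// Splits an array into consecutive chunks of at most `size` items.
// The last chunk may be shorter than `size`.
function chunkArray<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

console.log(chunkArray([1, 2, 3, 4, 5], 2)); // [[1, 2], [3, 4], [5]]
```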

Retrieving Vectors

Fetch specific vectors by ID without running a similarity search. This is a fast way to load known vectors or to process vectors in batches.

getByIds(ids)

Retrieves vectors by their exact IDs. No similarity calculation is involved, so lookups are fast.

Parameters:

ids: Array of vector ID strings to retrieve

Returns: Array of VectorIndexVector objects (may be fewer than requested if some IDs don’t exist)

const vectorIds = ["article-123", "article-124", "article-125"];
const vectors = await this.env.EMBEDDINGS.getByIds(vectorIds);

for (const vector of vectors) {
  console.log(`Vector ${vector.id}: ${vector.metadata?.title}`);
  console.log(`Dimensions: ${vector.values.length}`);
}
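Because getByIds may return fewer vectors than requested, it can help to work out which IDs were not found. A small sketch; findMissingIds is a hypothetical helper, and the returned array below stands in for a real getByIds result:

```typescript
// Hypothetical helper: lists requested IDs that are absent from the result.
function findMissingIds(
  requested: readonly string[],
  returned: readonly { id: string }[]
): string[] {
  const found = new Set(returned.map((v) => v.id));
  return requested.filter((id) => !found.has(id));
}

const requestedIds = ["article-123", "article-124", "article-125"];

// Stand-in for `await this.env.EMBEDDINGS.getByIds(requestedIds)`
const returnedVectors = [{ id: "article-123" }, { id: "article-124" }];

const missingIds = findMissingIds(requestedIds, returnedVectors);
console.log(missingIds); // ["article-125"]
```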

Similarity Search

Find the vectors most similar to a query vector. This is the core operation behind recommendation systems, semantic search, and content discovery.

query(vector, options?)

Finds vectors most similar to the provided query vector using the configured distance metric. Control the number of results, filter by metadata, and choose what data to return.

Parameters:

vector: Query vector as Float32Array or number[] (must match index dimensions)
options: Optional query configuration object

Returns: VectorIndexMatches containing:

matches: Array of similar vectors with scores
count: Total number of matches found

// Create a query vector (same dimensions as index)
const queryVector = new Float32Array([0.1, 0.3, 0.2, 0.4]);

// Basic similarity search
const results = await this.env.EMBEDDINGS.query(queryVector, {
  topK: 5,
  returnMetadata: true,
  returnValues: false // Don't return the vector values to save bandwidth
});

// Process results
for (const match of results.matches) {
  console.log(`${match.id}: ${match.score} - ${match.metadata?.title}`);
}

Advanced Query with Filtering:

// Search with metadata filtering
const filteredResults = await this.env.EMBEDDINGS.query(queryVector, {
  topK: 10,
  namespace: "articles", // Only search within this namespace
  returnMetadata: "indexed", // Only return indexed metadata fields
  filter: {
    category: "documentation",
    publishedAt: { $ne: null }, // Must have publication date
    tags: "tutorial" // Must include "tutorial" tag
  }
});

Query Options:

topK: Maximum number of results to return (maximum: 100)
namespace: Limit search to specific namespace
returnValues: Whether to include vector values in results
returnMetadata: Metadata return level (true, "all", "indexed", "none")
filter: Metadata filtering conditions
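Because topK is capped at 100, a value taken from user input is worth clamping before it reaches query. A minimal sketch; clampTopK and its fallback of 10 are hypothetical choices, not part of the vector index API:

```typescript
const MAX_TOP_K = 100; // limit stated in the query options

// Hypothetical helper: coerces a requested result count into the range 1..100.
// Non-finite input (for example NaN from a malformed query string) uses the fallback.
function clampTopK(requested: number, fallback = 10): number {
  if (!Number.isFinite(requested)) return fallback;
  return Math.min(MAX_TOP_K, Math.max(1, Math.floor(requested)));
}

console.log(clampTopK(250));           // 100
console.log(clampTopK(0));             // 1
console.log(clampTopK(7.9));           // 7
console.log(clampTopK(Number("abc"))); // 10
```

The result can then be passed as the topK option, for example `{ topK: clampTopK(limit) }`.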

queryById(vectorId, options?)

Performs a similarity search using an existing vector in the index as the query. This is useful for finding content similar to a specific document without recomputing its embedding.

Parameters:

vectorId: ID string of the vector to use as query
options: Optional query configuration object (same as query method)

Returns: VectorIndexMatches with similar vectors

// Find articles similar to a specific one
const similarArticles = await this.env.EMBEDDINGS.queryById("article-123", {
  topK: 5,
  returnMetadata: true,
  filter: {
    category: "documentation"
  }
});

console.log(`Found ${similarArticles.count} similar articles`);

Removing Vectors

Delete vectors from the index when you no longer need them. Doing so helps manage storage costs and keeps your index focused on current content.

deleteByIds(ids)

Removes vectors with the specified IDs from the index. The operation is asynchronous and returns a mutation ID for tracking. Deleting non-existent vectors won’t cause errors.

Parameters:

ids: Array of vector ID strings to delete

Returns: VectorIndexAsyncMutation with mutation tracking ID

// Remove outdated articles
const idsToDelete = ["article-old-1", "article-old-2", "draft-123"];
const result = await this.env.EMBEDDINGS.deleteByIds(idsToDelete);

console.log(`Deletion mutation ID: ${result.mutationId}`);

Metadata Filtering

Metadata filters can be combined with similarity search to restrict which vectors are returned. Filters use MongoDB-style operators.

Filter Operators

The vector index supports equality and inequality filtering on metadata fields:

const searchResults = await this.env.EMBEDDINGS.query(queryVector, {
  topK: 10,
  filter: {
    // Exact match
    category: "tutorial",

    // Not equal
    status: { $ne: "draft" },

    // Must not be null
    publishedAt: { $ne: null },

    // Numeric equality
    rating: { $eq: 5 },

    // String array contains
    tags: "beginner"
  }
});
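The filter semantics can be illustrated with a small in-memory matcher. This is only a sketch of how the conditions shown here could be read, not the index's implementation; in particular, treating a missing field like null is an assumption.

```typescript
type MetadataValue = string | number | boolean | string[];
type MetadataRecord = Record<string, MetadataValue>;
type Scalar = string | number | boolean;
type FilterCondition = Scalar | { $eq: Scalar } | { $ne: Scalar | null };
type MetadataFilter = Record<string, FilterCondition>;

function valueEquals(actual: MetadataValue | undefined, expected: Scalar | null): boolean {
  // Assumption: a missing field behaves like null.
  if (actual === undefined) return expected === null;
  // A string array matches when it contains the expected string.
  if (Array.isArray(actual)) return typeof expected === "string" && actual.includes(expected);
  return actual === expected;
}

// True when every field condition in the filter holds for the metadata.
function matchesFilter(metadata: MetadataRecord, filter: MetadataFilter): boolean {
  return Object.entries(filter).every(([field, condition]) => {
    const actual: MetadataValue | undefined = metadata[field];
    if (typeof condition === "object") {
      if ("$ne" in condition) return !valueEquals(actual, condition.$ne);
      return valueEquals(actual, condition.$eq);
    }
    return valueEquals(actual, condition);
  });
}

const exampleFilter: MetadataFilter = {
  category: "tutorial",
  status: { $ne: "draft" },
  publishedAt: { $ne: null },
  rating: { $eq: 5 },
  tags: "beginner",
};

const published: MetadataRecord = {
  category: "tutorial",
  status: "published",
  publishedAt: "2024-01-15",
  rating: 5,
  tags: ["beginner", "search"],
};

console.log(matchesFilter(published, exampleFilter));                        // true
console.log(matchesFilter({ ...published, status: "draft" }, exampleFilter)); // false
```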

Supported Metadata Types

Metadata values can be strings, numbers, booleans, or string arrays:

const vector = {
  id: "comprehensive-example",
  values: embeddingVector,
  metadata: {
    // String values
    title: "Complete Guide to Vectors",
    category: "documentation",

    // Numeric values
    rating: 4.8,
    viewCount: 1250,

    // Boolean values
    featured: true,
    published: true,

    // String arrays
    tags: ["vectors", "search", "ai", "tutorial"],
    authors: ["Alice Johnson", "Bob Smith"]
  }
};
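Values outside these four types, such as Date objects, nested objects, or mixed arrays, can be caught before insertion. A sketch with hypothetical helper names; rejecting NaN and Infinity is an added assumption, not a documented rule:

```typescript
// Hypothetical helper: true for string, finite number, boolean, or string[].
function isSupportedMetadataValue(value: unknown): boolean {
  if (typeof value === "string" || typeof value === "boolean") return true;
  // Assumption: NaN and Infinity are rejected because they do not survive JSON encoding.
  if (typeof value === "number") return Number.isFinite(value);
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Returns the metadata keys whose values are not of a supported type.
function findUnsupportedFields(metadata: Record<string, unknown>): string[] {
  return Object.keys(metadata).filter((key) => !isSupportedMetadataValue(metadata[key]));
}

const unsupported = findUnsupportedFields({
  title: "Complete Guide to Vectors",
  rating: 4.8,
  featured: true,
  tags: ["vectors", "search"],
  publishedAt: new Date(0),   // Date object: not supported, store a string instead
  stats: { views: 1250 },     // nested object: not supported
  mixed: ["a", 1],            // array with a non-string item: not supported
});

console.log(unsupported); // ["publishedAt", "stats", "mixed"]
```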

Namespace Organization

Namespaces provide logical separation within a single vector index. Use them to organize vectors by tenant, content type, or any other category, and pass namespace in the query options to restrict a search to one group.

// Store vectors in different namespaces
await this.env.EMBEDDINGS.insert([
  {
    id: "user-doc-1",
    values: userDocEmbedding,
    namespace: "user-documents",
    metadata: { type: "user-generated" }
  },
  {
    id: "help-doc-1",
    values: helpDocEmbedding,
    namespace: "help-documentation",
    metadata: { type: "official" }
  }
]);

// Search within specific namespace
const userResults = await this.env.EMBEDDINGS.query(queryVector, {
  namespace: "user-documents",
  topK: 5
});
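For multi-tenant setups, one way to keep namespaces consistent is to stamp every vector with the tenant's namespace before inserting, then pass the same string in the query options. The helper name withNamespace and the "tenant-" naming scheme are illustrative assumptions:

```typescript
interface VectorInput {
  id: string;
  values: Float32Array | number[];
  namespace?: string;
}

// Hypothetical helper: returns copies of the vectors with `namespace` set.
// The input array is left unchanged.
function withNamespace(vectors: readonly VectorInput[], namespace: string): VectorInput[] {
  return vectors.map((v) => ({ ...v, namespace }));
}

const tenantNamespace = "tenant-acme"; // illustrative naming scheme

const tagged = withNamespace(
  [
    { id: "doc-1", values: [0.1, 0.2, 0.3, 0.4] },
    { id: "doc-2", values: [0.4, 0.3, 0.2, 0.1] },
  ],
  tenantNamespace
);

console.log(tagged.map((v) => `${v.namespace}/${v.id}`));
// ["tenant-acme/doc-1", "tenant-acme/doc-2"]
```

The tagged array can then be passed to insert, and the same tenantNamespace value supplied as the namespace option to query.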

Distance Metrics

Choose the distance metric based on your embedding model and use case. The metric affects how similarity is calculated and can impact search quality.

Cosine Similarity

Best for normalized embeddings where vector magnitude doesn’t matter. It is the most common choice for text embeddings from models such as OpenAI’s text-embedding-ada-002.

vector_index "text-embeddings" {
  dimensions = 1536
  metric = "cosine"
}

When to use:

Text embeddings from modern language models
When embeddings are normalized or magnitude doesn’t matter
General-purpose semantic search applications
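When vectors are scaled to unit length, cosine similarity and dot product produce the same value. If your embedding model does not normalize its output, a helper like the following sketch can do it; the name normalize is illustrative and not part of the vector index API:

```typescript
// Scales a vector to unit length (L2 norm of 1).
function normalize(values: ArrayLike<number>): Float32Array {
  let sumSquares = 0;
  for (let i = 0; i < values.length; i++) sumSquares += values[i] * values[i];
  const norm = Math.sqrt(sumSquares);
  if (norm === 0) throw new Error("Cannot normalize a zero vector");

  const out = new Float32Array(values.length);
  for (let i = 0; i < values.length; i++) out[i] = values[i] / norm;
  return out;
}

const unit = normalize([3, 4]);
console.log(Array.from(unit)); // approximately [0.6, 0.8] at Float32 precision
```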

Euclidean Distance

Measures straight-line distance in vector space. A good fit when vector magnitude matters and absolute differences between values are meaningful.

vector_index "feature-vectors" {
  dimensions = 512
  metric = "euclidean"
}

When to use:

Image embeddings where spatial relationships matter
Feature vectors representing measurable quantities
When you need to consider vector magnitude

Dot Product

Combines direction and magnitude in a single score. A good fit for machine learning applications where both matter.

vector_index "ml-features" {
  dimensions = 256
  metric = "dot-product"
}

When to use:

Specialized ML applications
When magnitude represents confidence or importance
Recommendation systems with weighted features
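The choice of metric can change which vector ranks first. The sketch below scores two hand-picked candidates against the same query under each metric; the scoring functions are plain reference formulas written for this example, not the index's internals.

```typescript
type Candidate = { id: string; values: number[] };
type Score = (a: number[], b: number[]) => number;

const dot: Score = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0);
const norm = (a: number[]): number => Math.sqrt(dot(a, a));
const cosine: Score = (a, b) => dot(a, b) / (norm(a) * norm(b));
const euclidean: Score = (a, b) => norm(a.map((x, i) => x - b[i]));

// Returns the ID of the top-ranked candidate.
// higherIsBetter is true for similarities and false for distances.
function best(
  query: number[],
  candidates: Candidate[],
  score: Score,
  higherIsBetter: boolean
): string {
  if (candidates.length === 0) throw new Error("No candidates");
  const ranked = [...candidates].sort((x, y) => {
    const diff = score(query, x.values) - score(query, y.values);
    return higherIsBetter ? -diff : diff;
  });
  return ranked[0].id;
}

const query = [1, 0];
const candidates: Candidate[] = [
  { id: "aligned-short", values: [0.9, 0.1] }, // nearly the same direction, small magnitude
  { id: "angled-long", values: [3, 3] },       // 45 degrees off, large magnitude
];

console.log(best(query, candidates, cosine, true));     // "aligned-short"
console.log(best(query, candidates, euclidean, false)); // "aligned-short"
console.log(best(query, candidates, dot, true));        // "angled-long"
```

Only the dot product rewards the longer vector here, which is why it suits cases where magnitude encodes confidence or importance.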