Vector

Overview

Vector databases enable you to store, index, and search high-dimensional numerical vectors that represent complex data like text embeddings, images, or any feature vectors.

Setup Flow

1. Configure in Manifest

Define a vector index in your Raindrop application manifest with the required dimensions and distance metric:

application "my-app" {
  vector_index "embeddings" {
    dimensions = 1024
    metric = "cosine" // or "euclidean", "dot-product"
  }
}

Distance Metrics:

  • cosine - Measures angle between vectors (ignores magnitude). Best for text embeddings and semantic similarity.
  • euclidean - Straight-line distance in vector space. Good when magnitude matters (coordinates, measurements).
  • dot-product - Combines angle and magnitude. Useful when vector length represents confidence or importance.
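To make the difference between the three metrics concrete, the sketch below computes each one by hand for two vectors that point in the same direction but differ in length. These are plain reference formulas for intuition only. The function names are illustrative, and the index computes scores itself, so nothing here is part of the Raindrop API.

```typescript
// Reference implementations for intuition only; not the index's internals.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function euclideanDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

function cosineSimilarity(a: number[], b: number[]): number {
  // Dot product divided by the product of the two vector lengths.
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

const short = [1, 0];
const long = [3, 0]; // same direction as `short`, three times the length

console.log(cosineSimilarity(short, long));  // 1: direction is identical
console.log(euclideanDistance(short, long)); // 2: the endpoints are 2 apart
console.log(dot(short, long));               // 3: grows with magnitude
```

Cosine treats the two vectors as identical, while euclidean and dot-product both react to the difference in length.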

2. Access in Code

The vector index becomes available through your environment bindings, allowing you to perform all vector operations:

export default class extends Service<Env> {
  async fetch(request: Request): Promise<Response> {
    // Get index information
    const indexInfo = await this.env.EMBEDDINGS.describe();
    // Store vectors (minimal example)
    await this.env.EMBEDDINGS.insert([{
      id: "doc-1",
      values: new Float32Array(1024).fill(0.1) // Must match dimensions
    }]);
    // Store vectors with optional metadata and namespace
    await this.env.EMBEDDINGS.insert([{
      id: "doc-2",
      values: new Float32Array(1024).fill(0.2),
      namespace: "documents", // Optional: organize vectors
      metadata: { // Optional: filterable data
        title: "Getting Started Guide",
        category: "documentation"
      }
    }]);
    // Search for similar content
    const results = await this.env.EMBEDDINGS.query(
      new Float32Array(1024).fill(0.15),
      { topK: 5 }
    );
    return new Response(JSON.stringify(results));
  }
}

Vector Operations

Index Information

Get details about your vector index configuration and current state.

describe()

Returns information about the vector index, including its configuration and current statistics.

const indexInfo = await this.env.EMBEDDINGS.describe();
console.log(`Dimensions: ${indexInfo.dimensions}`);
console.log(`Vector count: ${indexInfo.vectorCount}`);
console.log(`Last processed: ${indexInfo.processedUpToDatetime}`);
console.log(`Mutation ID: ${indexInfo.processedUpToMutation}`);

Returns: VectorIndexIndexInfo containing:

  • vectorCount: Total number of vectors in the index
  • dimensions: Number of dimensions each vector must have
  • processedUpToDatetime: Timestamp string of last processed mutation
  • processedUpToMutation: String ID of the last processed mutation
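One practical use of the dimensions field is to check an embedding's length before sending it to the index. The helper below is a minimal sketch. assertDimensions is a hypothetical name, not part of the Raindrop API, and how the index itself reports a mismatch is not covered here.

```typescript
// Hypothetical helper: fails early when a vector's length does not match
// the dimension count reported by describe().
function assertDimensions(
  values: Float32Array | number[],
  expectedDimensions: number
): void {
  if (values.length !== expectedDimensions) {
    throw new Error(
      `Expected ${expectedDimensions} dimensions, got ${values.length}`
    );
  }
}

// Possible usage inside a handler (EMBEDDINGS is the binding shown earlier):
//   const info = await this.env.EMBEDDINGS.describe();
//   assertDimensions(embedding, info.dimensions);
```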

Storing Vectors

Add vectors to your index with optional metadata and namespace organization. All vectors in an index must have the same number of dimensions as configured in your manifest.

insert(vectors)

Adds new vectors to the index. If any vector ID already exists, the operation fails. Use this when you don’t want to overwrite existing vectors.

Vector Object Structure:

  • Required:
    • id - Unique string identifier for the vector
    • values - Vector data as Float32Array or number[] (must match index dimensions)
  • Optional:
    • namespace - String to organize vectors into logical groups
    • metadata - Object with filterable key-value pairs (strings, numbers, booleans, string arrays)

Returns: VectorIndexAsyncMutation with:

  • mutationId: Unique identifier for tracking this operation

const vectors = [
  {
    id: "article-123",
    values: new Float32Array([0.1, 0.2, 0.3, 0.4]), // Shortened for readability; length must match index dimensions
    namespace: "articles", // Optional organization
    metadata: {
      title: "Vector Search Basics",
      author: "Jane Smith",
      publishedAt: "2024-01-15",
      tags: ["search", "ai", "vectors"]
    }
  },
  {
    id: "article-124",
    values: [0.2, 0.1, 0.4, 0.3], // Can also use regular number arrays
    namespace: "articles",
    metadata: {
      title: "Advanced Embedding Techniques",
      author: "John Doe",
      publishedAt: "2024-01-20",
      category: "advanced"
    }
  }
];
const result = await this.env.EMBEDDINGS.insert(vectors);
console.log(`Mutation ID: ${result.mutationId}`);

upsert(vectors)

Inserts new vectors or updates existing ones. More flexible than insert since it overwrites vectors with the same ID. Good for updating document embeddings when content changes.

Parameters:

  • vectors: Array of vector objects to upsert

Returns: VectorIndexAsyncMutation with mutation tracking ID

const updatedVectors = [
  {
    id: "article-123", // Will update if exists, insert if new
    values: new Float32Array([0.15, 0.25, 0.35, 0.45]), // Updated embedding
    metadata: {
      title: "Vector Search Basics - Updated",
      author: "Jane Smith",
      lastModified: "2024-02-01",
      version: 2
    }
  }
];
const result = await this.env.EMBEDDINGS.upsert(updatedVectors);

Batch Operations

Process vectors in batches for better performance and less overhead. The vector index handles bulk operations efficiently.

// Efficient: Batch insert
const batchSize = 100;
const vectorBatches = chunkArray(allVectors, batchSize);
for (const batch of vectorBatches) {
  await this.env.EMBEDDINGS.insert(batch);
}
// Less efficient: Individual inserts
for (const vector of allVectors) {
  await this.env.EMBEDDINGS.insert([vector]); // Avoid this pattern
}
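The batch example calls chunkArray without defining it. It is not described here as part of the framework, so the version below is one possible implementation that you would supply yourself.

```typescript
// Splits an array into consecutive chunks of at most `size` items.
// The final chunk may be shorter than `size`.
function chunkArray<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

console.log(chunkArray([1, 2, 3, 4, 5], 2)); // [[1, 2], [3, 4], [5]]
```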

Retrieving Vectors

Get specific vectors by their IDs without performing a similarity search. This is a fast way to fetch known vectors or to process them in batches.

getByIds(ids)

Retrieves vectors by their exact IDs. Fast and doesn’t involve similarity calculations.

Parameters:

  • ids: Array of vector ID strings to retrieve

Returns: Array of VectorIndexVector objects (may be fewer than requested if some IDs don’t exist)

const vectorIds = ["article-123", "article-124", "article-125"];
const vectors = await this.env.EMBEDDINGS.getByIds(vectorIds);
for (const vector of vectors) {
  console.log(`Vector ${vector.id}: ${vector.metadata?.title}`);
  console.log(`Dimensions: ${vector.values.length}`);
}
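Because getByIds may return fewer vectors than requested, it can help to work out which IDs were absent. The helper below is a small sketch. findMissingIds is a hypothetical name, and it assumes only that each returned vector has a string id field.

```typescript
// Returns the requested IDs that do not appear in the returned vectors.
function findMissingIds(
  requestedIds: string[],
  returned: { id: string }[]
): string[] {
  const found = new Set(returned.map((vector) => vector.id));
  return requestedIds.filter((id) => !found.has(id));
}

const missing = findMissingIds(
  ["article-123", "article-124", "article-125"],
  [{ id: "article-123" }, { id: "article-124" }]
);
console.log(missing); // ["article-125"]
```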

Similarity Search

Find vectors most similar to a query vector. The core functionality for recommendation systems, semantic search, and content discovery.

query(vector, options?)

Finds vectors most similar to the provided query vector using the configured distance metric. Control the number of results, filter by metadata, and choose what data to return.

Parameters:

  • vector: Query vector as Float32Array or number[] (must match index dimensions)
  • options: Optional query configuration object

Returns: VectorIndexMatches containing:

  • matches: Array of similar vectors with scores
  • count: Total number of matches found

// Create a query vector (same dimensions as index)
const queryVector = new Float32Array([0.1, 0.3, 0.2, 0.4]);
// Basic similarity search
const results = await this.env.EMBEDDINGS.query(queryVector, {
  topK: 5,
  returnMetadata: true,
  returnValues: false // Don't return the vector values to save bandwidth
});
// Process results
for (const match of results.matches) {
  console.log(`${match.id}: ${match.score} - ${match.metadata?.title}`);
}

Advanced Query with Filtering:

// Search with metadata filtering
const filteredResults = await this.env.EMBEDDINGS.query(queryVector, {
  topK: 10,
  namespace: "articles", // Only search within this namespace
  returnMetadata: "indexed", // Only return indexed metadata fields
  filter: {
    category: "documentation",
    publishedAt: { $ne: null }, // Must have publication date
    tags: "tutorial" // Must include "tutorial" tag
  }
});

Query Options:

  • topK: Maximum number of results to return (maximum: 100)
  • namespace: Limit search to specific namespace
  • returnValues: Whether to include vector values in results
  • returnMetadata: Metadata return level (true, "all", "indexed", "none")
  • filter: Metadata filtering conditions
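Since topK has a documented maximum of 100, callers that accept a user-supplied limit may want to clamp it before querying. The sketch below mirrors the options listed above in a local type. QueryOptionsSketch and clampTopK are hypothetical names, and the type the framework actually exports may differ.

```typescript
// Hypothetical shape mirroring the documented options; the framework's real
// type may differ.
interface QueryOptionsSketch {
  topK?: number;
  namespace?: string;
  returnValues?: boolean;
  returnMetadata?: true | "all" | "indexed" | "none";
  filter?: Record<string, unknown>;
}

const MAX_TOP_K = 100; // documented maximum for topK

// Returns a copy of the options with topK forced into the range 1..100.
// Options without topK are returned unchanged.
function clampTopK(options: QueryOptionsSketch): QueryOptionsSketch {
  if (options.topK === undefined) return options;
  if (!Number.isFinite(options.topK)) {
    throw new Error("topK must be a finite number");
  }
  const topK = Math.min(MAX_TOP_K, Math.max(1, Math.floor(options.topK)));
  return { ...options, topK };
}

console.log(clampTopK({ topK: 500, namespace: "articles" }).topK); // 100
```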

queryById(vectorId, options?)

Performs similarity search using an existing vector in the index as the query. Good for finding content similar to a specific document without reconstructing the embedding vector.

Parameters:

  • vectorId: ID string of the vector to use as query
  • options: Optional query configuration object (same as query method)

Returns: VectorIndexMatches with similar vectors

// Find articles similar to a specific one
const similarArticles = await this.env.EMBEDDINGS.queryById("article-123", {
  topK: 5,
  returnMetadata: true,
  filter: {
    category: "documentation"
  }
});
console.log(`Found ${similarArticles.count} similar articles`);

Removing Vectors

Delete vectors from the index when you no longer need them. Helps manage storage costs and keeps your index focused on current content.

deleteByIds(ids)

Removes vectors with the specified IDs from the index. The operation is asynchronous and returns a mutation ID for tracking. Deleting non-existent vectors won’t cause errors.

Parameters:

  • ids: Array of vector ID strings to delete

Returns: VectorIndexAsyncMutation with mutation tracking ID

// Remove outdated articles
const idsToDelete = ["article-old-1", "article-old-2", "draft-123"];
const result = await this.env.EMBEDDINGS.deleteByIds(idsToDelete);
console.log(`Deletion mutation ID: ${result.mutationId}`);

Metadata Filtering

Vector search supports metadata filtering that combines with similarity search. Filters use MongoDB-style operators.

Filter Operators

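The vector index supports equality and inequality filtering on metadata fields. A bare value or $eq matches equal values, $ne excludes them, and a bare string against a string-array field matches when the array contains that string: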

const searchResults = await this.env.EMBEDDINGS.query(queryVector, {
  topK: 10,
  filter: {
    // Exact match
    category: "tutorial",
    // Not equal
    status: { $ne: "draft" },
    // Must not be null
    publishedAt: { $ne: null },
    // Numeric equality
    rating: { $eq: 5 },
    // String array contains
    tags: "beginner"
  }
});
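For unit tests it can be handy to evaluate the same kind of filter against plain objects without calling the index. The matcher below is a local sketch of the semantics implied by the example. It makes two assumptions: a missing field is treated as null, and $ne on a string array means the array does not contain the value. matchesFilter and the type names are hypothetical, and the index's real behavior may differ in edge cases.

```typescript
type Scalar = string | number | boolean;
type MetadataValue = Scalar | string[];
type Metadata = Record<string, MetadataValue>;
type Condition = Scalar | { $eq: Scalar | null } | { $ne: Scalar | null };
type Filter = Record<string, Condition>;

// Assumption: a missing field counts as null; arrays match by containment.
function valueEquals(
  actual: MetadataValue | undefined,
  expected: Scalar | null
): boolean {
  if (expected === null) return actual === undefined;
  if (Array.isArray(actual)) {
    return typeof expected === "string" && actual.includes(expected);
  }
  return actual === expected;
}

// True only when every field condition in the filter holds.
function matchesFilter(metadata: Metadata, filter: Filter): boolean {
  return Object.entries(filter).every(([field, condition]) => {
    const actual = metadata[field];
    if (typeof condition === "object") {
      if ("$ne" in condition) return !valueEquals(actual, condition.$ne);
      return valueEquals(actual, condition.$eq);
    }
    return valueEquals(actual, condition);
  });
}

const sample: Metadata = {
  category: "tutorial",
  status: "published",
  rating: 5,
  tags: ["beginner", "ai"]
};

console.log(matchesFilter(sample, { status: { $ne: "draft" }, tags: "beginner" })); // true
console.log(matchesFilter(sample, { publishedAt: { $ne: null } })); // false: field is missing
```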

Supported Metadata Types

Metadata values can be strings, numbers, booleans, or string arrays:

const vector = {
  id: "comprehensive-example",
  values: embeddingVector,
  metadata: {
    // String values
    title: "Complete Guide to Vectors",
    category: "documentation",
    // Numeric values
    rating: 4.8,
    viewCount: 1250,
    // Boolean values
    featured: true,
    published: true,
    // String arrays
    tags: ["vectors", "search", "ai", "tutorial"],
    authors: ["Alice Johnson", "Bob Smith"]
  }
};
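When metadata comes from untyped sources, a quick check for unsupported value types before inserting can catch problems early. The sketch below is one way to do that. Both function names are hypothetical, and rejecting NaN and Infinity is an added assumption rather than a documented rule.

```typescript
// True for the documented metadata types: string, number, boolean, string[].
// Assumption: non-finite numbers (NaN, Infinity) are treated as unsupported.
function isSupportedMetadataValue(value: unknown): boolean {
  if (typeof value === "string" || typeof value === "boolean") return true;
  if (typeof value === "number") return Number.isFinite(value);
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Returns the keys whose values are not one of the supported types.
function findUnsupportedMetadataKeys(
  metadata: Record<string, unknown>
): string[] {
  return Object.keys(metadata).filter(
    (key) => !isSupportedMetadataValue(metadata[key])
  );
}

console.log(
  findUnsupportedMetadataKeys({
    title: "Complete Guide to Vectors",
    rating: 4.8,
    tags: ["vectors", "search"],
    nested: { depth: 1 }, // objects are not in the supported list
    scores: [1, 2, 3]     // only string arrays are supported
  })
); // ["nested", "scores"]
```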

Namespace Organization

Namespaces provide logical separation within a single vector index. Organize vectors by tenant, content type, or any other categorization while maintaining unified search capabilities.

// Store vectors in different namespaces
await this.env.EMBEDDINGS.insert([
  {
    id: "user-doc-1",
    values: userDocEmbedding,
    namespace: "user-documents",
    metadata: { type: "user-generated" }
  },
  {
    id: "help-doc-1",
    values: helpDocEmbedding,
    namespace: "help-documentation",
    metadata: { type: "official" }
  }
]);
// Search within specific namespace
const userResults = await this.env.EMBEDDINGS.query(queryVector, {
  namespace: "user-documents",
  topK: 5
});

Distance Metrics

Choose the distance metric based on your embedding model and use case. The metric affects how similarity is calculated and can impact search quality.

Cosine Similarity

Best for normalized embeddings where vector magnitude doesn’t matter. Most common choice for text embeddings from models like OpenAI’s text-embedding-ada-002.

vector_index "text-embeddings" {
  dimensions = 1536
  metric = "cosine"
}

When to use:

  • Text embeddings from modern language models
  • When embeddings are normalized or magnitude doesn’t matter
  • General-purpose semantic search applications
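If an embedding model does not return unit-length vectors, you can normalize them yourself before storing. The sketch below shows L2 normalization. l2Normalize is a hypothetical helper, not part of the Raindrop API. After normalization every vector has length 1, so the dot product of two vectors equals their cosine similarity.

```typescript
// Scales a vector to unit length (L2 norm of 1).
function l2Normalize(values: number[]): Float32Array {
  const norm = Math.sqrt(values.reduce((sum, v) => sum + v * v, 0));
  if (norm === 0) throw new Error("cannot normalize a zero vector");
  return Float32Array.from(values, (v) => v / norm);
}

const unit = l2Normalize([3, 4]);
// Approximately [0.6, 0.8]; Float32 storage introduces small rounding.
console.log(Array.from(unit));
```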

Euclidean Distance

Measures straight-line distance in vector space. Good when vector magnitude matters and you want to consider absolute differences.

vector_index "feature-vectors" {
  dimensions = 512
  metric = "euclidean"
}

When to use:

  • Image embeddings where spatial relationships matter
  • Feature vectors representing measurable quantities
  • When you need to consider vector magnitude

Dot Product

Combines direction with magnitude. Good for machine learning applications where both the direction and the length of a vector carry meaning.

vector_index "ml-features" {
  dimensions = 256
  metric = "dot-product"
}

When to use:

  • Specialized ML applications
  • When magnitude represents confidence or importance
  • Recommendation systems with weighted features