AI Models

Overview

Raindrop provides access to a comprehensive suite of AI models through a unified interface that abstracts the complexity of working with different AI providers while maintaining type safety and performance. The AI system supports text generation, image processing, speech recognition, language translation, embeddings, and specialized capabilities like code generation and mathematical reasoning.

The framework handles model routing between integrated AI infrastructure and external providers automatically, providing consistent interfaces across all model types. Each model has specific input and output types that ensure compile-time safety while supporting both simple one-shot operations and complex streaming workflows.

Key benefits:

  • Unified Interface: Single env.AI.run() method for all model types
  • Type Safety: Compile-time validation of model inputs and outputs
  • Automatic Routing: Seamless integration across multiple AI providers
  • Streaming Support: Real-time response streaming for conversational applications
  • Advanced Options: Request queuing, caching, and gateway configuration

Prerequisites

  • Active Raindrop project with AI binding configured
  • Understanding of TypeScript generics and async/await patterns
  • Familiarity with AI model concepts (LLMs, embeddings, vision models)
  • Basic knowledge of your target AI use cases and model requirements

Getting Started

AI capabilities are automatically available to all Raindrop applications through the env.AI interface - no manifest configuration required.

application "ai-app" {
  service "api" {
    domain = "api.example.com"
    # AI interface available as this.env.AI
  }
}

Generate the service implementation:

raindrop build generate

The AI interface is available in your generated service class:

export default class extends Service<Env> {
  async fetch(request: Request): Promise<Response> {
    // AI interface available as this.env.AI
    const result = await this.env.AI.run(
      'llama-3.1-8b-instruct',
      { prompt: "Hello, AI!" }
    );
    return new Response(result.response);
  }
}

Basic Usage

Access AI models through the env.AI.run() method with model-specific parameters:

// Basic text generation
const response = await this.env.AI.run(
  'llama-3.3-70b',
  {
    messages: [
      { role: "user", content: "Explain quantum computing" }
    ],
    max_tokens: 150
  }
);

// Generate embeddings
const embeddings = await this.env.AI.run(
  'bge-large-en',
  { input: "Text to embed" }
);

// Process images with vision models
const analysis = await this.env.AI.run(
  'llama-3.2-11b-vision',
  {
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Describe this image" },
          { type: "image_url", image_url: { url: imageUrl } }
        ]
      }
    ]
  }
);

The interface automatically handles type checking and validates inputs based on the model selected.

Core Concepts

Model Routing System

Raindrop uses a sophisticated routing system that maps user-friendly model names to provider-specific endpoints. Models are distributed across two primary providers:

External Router Models: High-performance models accessed through external APIs, including the latest releases like DeepSeek R1 and Llama 4 Scout. These models typically offer superior capabilities and longer context lengths.

Platform Models: Models running directly on integrated platform infrastructure, providing fast response times and seamless integration with edge computing workflows.
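Conceptually, routing is a name-to-provider lookup. The sketch below is purely illustrative — the real routing table and provider identifiers are internal to the framework — but it shows the shape of the mapping using model names from the catalog later in this document:

```typescript
type Provider = "external-router" | "platform";

// Illustrative subset of the catalog below; not the framework's real table.
const modelProviders: Record<string, Provider> = {
  "deepseek-r1": "external-router",
  "llama-4-scout-17b": "external-router",
  "llama-3.1-8b-instruct": "platform",
  "bge-large-en": "platform",
};

// Resolve the provider for a model name, failing loudly on unknown names.
function providerFor(model: string): Provider {
  const provider = modelProviders[model];
  if (!provider) throw new Error(`unknown model: ${model}`);
  return provider;
}

providerFor("deepseek-r1"); // "external-router"
providerFor("bge-large-en"); // "platform"
```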

Type-Safe Interfaces

Each model has specific input and output type signatures that provide compile-time validation:

// TypeScript infers correct types automatically
const llmResponse = await env.AI.run('llama-3.3-70b', {
  messages: [{ role: "user", content: "Hello" }], // ← Typed input
  temperature: 0.7
}); // ← Returns typed LLM output

const embedResponse = await env.AI.run('bge-large-en', {
  input: ["text1", "text2"] // ← Typed embedding input
}); // ← Returns typed embedding output

Capability-Based Selection

Models are organized by capabilities rather than just size or provider:

  • Chat/Completion: Conversational AI and text generation
  • Vision: Image understanding and multimodal processing
  • Embeddings: Text representation and semantic search
  • Audio: Speech-to-text transcription
  • Specialized: Code generation, mathematical reasoning, PII detection
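A small helper can make capability-based selection concrete. The capability-to-model pairings below are drawn from the catalog in this document but are illustrative defaults, not an official mapping:

```typescript
type Capability = "chat" | "vision" | "embeddings" | "audio" | "code";

// Illustrative defaults taken from the model catalog in this document.
const defaultModel: Record<Capability, string> = {
  chat: "llama-3.3-70b",
  vision: "llama-3.2-11b-vision",
  embeddings: "bge-large-en",
  audio: "whisper-large-v3-turbo",
  code: "qwen-coder-32b",
};

// Pick a default model name for a given capability.
function pickModel(capability: Capability): string {
  return defaultModel[capability];
}

pickModel("embeddings"); // "bge-large-en"
```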

Text Generation Models

Text generation models handle conversational AI, content creation, and language understanding tasks.

Large Language Models

Premium External Models:

  • llama-3.3-70b - Meta Llama 3.3 70B with 128K context for complex reasoning
  • deepseek-r1 - DeepSeek R1 671B with advanced chain-of-thought reasoning
  • deepseek-v3 - DeepSeek V3 high-performance model with long context
  • kimi-k2 - Moonshot AI Kimi K2 1T parameter MoE with tool integration
  • qwen-3-32b - Qwen 3 32B with advanced multilingual capabilities
  • llama-4-maverick-17b - Llama 4 Maverick 17B advanced multimodal model

Fast External Models:

  • llama-3.1-8b-external - Meta Llama 3.1 8B for efficient processing
  • llama-3.1-70b-external - Meta Llama 3.1 70B high-quality large language model
  • llama-3.1-8b-instant - Meta Llama 3.1 8B Instant for ultra-fast responses
  • gemma-9b-it - Google Gemma 9B instruction-tuned model
  • llama-3.3-swallow-70b - Llama 3.3 Swallow 70B Japanese-optimized model

Platform Models:

  • gpt-oss-120b - GPT OSS 120B advanced reasoning model with chain-of-thought capabilities
  • gpt-oss-20b - GPT OSS 20B efficient reasoning model with chain-of-thought capabilities
  • llama-3.1-70b-instruct - Meta Llama 3.1 70B large language model with long context
  • llama-3.1-8b-instruct - Meta Llama 3.1 8B fast and efficient model
  • llama-3-8b-instruct - Meta Llama 3 8B reliable model for general tasks
  • llama-3.2-3b-instruct - Meta Llama 3.2 3B compact and efficient model
  • mistral-7b-instruct - Mistral 7B high-quality 7B parameter model
  • gemma-2b - Google Gemma 2B lightweight model for basic tasks

Specialized Text Models

Reasoning Models:

  • deepseek-r1-distill-70b - Fast reasoning model with long context support
  • deepseek-r1-distill-qwen-32b - Chain-of-thought reasoning with JSON mode

Code Generation:

  • qwen-coder-32b - Qwen 2.5 Coder 32B specialized for code generation

Mathematical:

  • deepseek-math-7b - DeepSeek Math 7B specialized for mathematical reasoning

Text Generation Usage

Basic Chat Completion:

const response = await env.AI.run('llama-3.3-70b', {
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Write a haiku about coding" }
  ],
  max_tokens: 100,
  temperature: 0.7
});

console.log(response.choices[0].message.content);

Streaming Responses:

const stream = await env.AI.run('llama-3.1-8b-instruct', {
  messages: [{ role: "user", content: "Tell me a story" }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
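The per-chunk delta fields join back into the full reply. A minimal helper over the chunk shape used above — note the chunk type here is an assumption inferred from the example, not the framework's actual type definition:

```typescript
// Illustrative chunk shape, inferred from the streaming example above.
interface StreamChunk {
  choices: Array<{ delta?: { content?: string } }>;
}

// Join the delta text of a sequence of streamed chunks into one string.
// (Real chunks arrive incrementally; an array stands in for the stream here.)
function joinDeltas(chunks: StreamChunk[]): string {
  return chunks.map(c => c.choices[0]?.delta?.content ?? "").join("");
}

joinDeltas([
  { choices: [{ delta: { content: "Once upon " } }] },
  { choices: [{ delta: { content: "a time" } }] },
  { choices: [{}] }, // a final chunk may carry no delta
]); // "Once upon a time"
```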

JSON Mode:

const analysis = await env.AI.run('deepseek-r1-distill-qwen-32b', {
  messages: [
    { role: "user", content: "Analyze this data and return structured JSON" }
  ],
  response_format: { type: "json_object" }
});

Vision Models

Vision models process and understand images alongside text for multimodal applications.

Available Vision Models

Multimodal Conversational:

  • llama-4-scout-17b - Meta Llama 4 Scout 17B multimodal model with vision capabilities
  • llama-3.2-11b-vision - Meta Llama 3.2 11B with vision capabilities
  • mistral-small-3.1 - Mistral Small 3.1 vision-enabled model with 128K context

Specialized Vision:

  • llava-1.5-7b - LLaVA 1.5 7B vision-language model

New Vision Models:

  • gemma-3-2b - Google Gemma 3 2B multimodal model with vision and 128K context
  • gemma-3-9b - Google Gemma 3 9B multimodal model with vision and 128K context

Vision Model Usage

Image Description:

const description = await env.AI.run('llama-3.2-11b-vision', {
  messages: [{
    role: "user",
    content: [
      { type: "text", text: "What's in this image?" },
      {
        type: "image_url",
        image_url: {
          url: "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQ...",
          detail: "high"
        }
      }
    ]
  }],
  max_tokens: 300
});

Image Analysis with Context:

const analysis = await env.AI.run('llama-4-scout-17b', {
  messages: [
    {
      role: "system",
      content: "You are an expert art critic. Analyze images in detail."
    },
    {
      role: "user",
      content: [
        { type: "text", text: "Analyze the artistic style and techniques" },
        { type: "image_url", image_url: { url: imageUrl } }
      ]
    }
  ],
  temperature: 0.3
});

Embedding Models

Embedding models convert text into numerical vectors for semantic search, similarity comparison, and retrieval-augmented generation.

Available Embedding Models

External Router:

  • embeddings - Default embeddings model - BAAI BGE Large English v1.5 via external router
  • bge-large-en-external - BAAI BGE Large English v1.5 high-quality embeddings via external router

Platform Models:

  • bge-large-en - BGE Large English high-dimensional embeddings for English text
  • bge-base-en - BGE Base English high-quality embeddings for English text
  • bge-small-en - BGE Small English compact embeddings for English text
  • bge-m3 - BGE M3 multi-lingual embeddings supporting 100+ languages

Specialized:

  • bge-reranker-base - BGE Reranker Base model for relevance-based document reranking
  • pii-detection - PII Detection service for identifying personally identifiable information

Embedding Usage

Single Text Embedding:

const embedding = await env.AI.run('bge-large-en', {
  input: "Natural language processing with embeddings"
});

const vector = embedding.data[0].embedding; // 1024-dimensional array
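Comparing the returned vectors is plain math, independent of the API. A minimal cosine-similarity helper for the similarity-comparison use case mentioned above:

```typescript
// Cosine similarity between two equal-length embedding vectors.
// Returns 1 for identical directions, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have equal length");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

cosineSimilarity([1, 0], [1, 0]); // 1
cosineSimilarity([1, 0], [0, 1]); // 0
```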

Batch Text Embedding:

const embeddings = await env.AI.run('embeddings', {
  input: [
    "First document text",
    "Second document text",
    "Third document text"
  ]
});

embeddings.data.forEach((item, index) => {
  console.log(`Document ${index}: ${item.embedding.length} dimensions`);
});

Multilingual Embedding:

const multilingualEmbedding = await env.AI.run('bge-m3', {
  text: ["Hello world", "Hola mundo", "Bonjour le monde"]
});

Document Reranking:

const rankings = await env.AI.run('bge-reranker-base', {
  query: "machine learning algorithms",
  contexts: [
    "Neural networks and deep learning",
    "Traditional statistical methods",
    "Computer vision applications"
  ]
});

Audio Models

Audio models provide speech-to-text transcription with support for multiple languages and output formats.

Available Audio Models

External Router:

  • whisper-large-v3 - OpenAI Whisper Large v3 advanced speech-to-text transcription

Platform Models:

  • whisper - OpenAI Whisper speech-to-text transcription
  • whisper-large-v3-turbo - OpenAI Whisper Large v3 Turbo faster, more accurate speech-to-text

Audio Usage

Basic Transcription:

// Audio file from request
const audioFile = await request.blob();

const transcription = await env.AI.run('whisper', {
  file: audioFile,
  response_format: 'text'
});

console.log(transcription.text);

Advanced Transcription with Timestamps:

const detailedTranscription = await env.AI.run('whisper-large-v3-turbo', {
  file: audioBlob,
  language: 'en',
  response_format: 'verbose_json',
  temperature: 0.2
});

detailedTranscription.segments.forEach(segment => {
  console.log(`${segment.start}s - ${segment.end}s: ${segment.text}`);
});

Advanced Options

Configure AI requests with advanced options for caching, queuing, and gateway settings.

Gateway Options

Control request handling, caching, and metadata collection:

const response = await env.AI.run(
  'llama-3.3-70b',
  { messages: [{ role: "user", content: "Hello" }] },
  {
    gateway: {
      id: 'unique-request-id',
      cacheKey: 'user-greeting-cache',
      cacheTtl: 3600, // 1 hour cache
      skipCache: false,
      collectLog: true,
      metadata: {
        userId: 'user123',
        sessionId: 'session456'
      },
      requestTimeoutMs: 30000
    }
  }
);

Request Queuing

Queue requests for asynchronous processing:

const asyncResponse = await env.AI.run(
  'deepseek-v3',
  {
    messages: [{ role: "user", content: "Generate a detailed analysis" }],
    max_tokens: 4000
  },
  { queueRequest: true }
);

// Get results later using the request ID
const requestId = asyncResponse.request_id;
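When polling for a queued result, an exponential backoff between attempts avoids hammering the gateway. The retrieval call itself is not documented in this section, so only the (pure) backoff schedule is sketched here:

```typescript
// Exponential backoff with a cap, for spacing out polls of a queued request.
// The call that actually fetches a result by request_id is not shown, since
// its name and signature are not documented in this section.
function backoffMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

backoffMs(0);  // 500
backoffMs(3);  // 4000
backoffMs(10); // capped at 30000
```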

Streaming Configuration

Configure streaming responses with custom options:

const stream = await env.AI.run(
  'llama-3.1-8b-instruct',
  {
    messages: [{ role: "user", content: "Tell me a story" }],
    stream: true,
    max_tokens: 500
  },
  {
    includeTimingData: true,
    extraHeaders: {
      'X-Custom-Header': 'value'
    }
  }
);

// Process streaming response
const reader = stream.getReader();
try {
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Process chunk
    const chunk = new TextDecoder().decode(value);
    console.log(chunk);
  }
} finally {
  reader.releaseLock();
}

Raw Response Mode

Access full HTTP Response objects for custom processing:

const rawResponse = await env.AI.run(
  'bge-large-en',
  { input: "Sample text" },
  { returnRawResponse: true }
);

const headers = rawResponse.headers;
const status = rawResponse.status;
const data = await rawResponse.json();

console.log(`Status: ${status}, Content-Type: ${headers.get('content-type')}`);

Smart Bucket Authentication

Authenticate AI requests using Smart Bucket credentials:

const authenticatedResponse = await env.AI.run(
  'llama-3.3-70b',
  { messages: [{ role: "user", content: "Private query" }] },
  {
    smartBucketAuth: {
      bucketId: 'secure-bucket-id',
      secret: 'bucket-secret-key'
    }
  }
);

Interface Reference

AI Interface

The main interface for all AI operations:

interface Ai {
  run<T extends AiModel>(
    model: T,
    inputs: AiModelInputMap[T],
    options?: AiOptions
  ): Promise<AiModelOutputMap[T]>;
}

Type Parameters:

  • T - The AI model identifier ensuring type safety

Parameters:

  • model: T - Model identifier from available model catalog
  • inputs: AiModelInputMap[T] - Model-specific input parameters
  • options?: AiOptions - Optional configuration for request handling

Returns: Model-specific output with full type safety
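A reduced sketch shows how a mapped-type interface like this constrains inputs and outputs per model. The type names and shapes below are simplified stand-ins, not Raindrop's real definitions:

```typescript
// Simplified stand-ins for the real AiModelInputMap / AiModelOutputMap types.
type DemoInputMap = {
  "demo-llm": { messages: Array<{ role: string; content: string }> };
  "demo-embed": { input: string | string[] };
};
type DemoOutputMap = {
  "demo-llm": { response: string };
  "demo-embed": { data: Array<{ embedding: number[] }> };
};

// A stub run() with the same generic shape as Ai.run(): the model name
// selects both the accepted input type and the returned output type.
function run<T extends keyof DemoInputMap>(
  model: T,
  inputs: DemoInputMap[T]
): DemoOutputMap[T] {
  // Mock outputs just to demonstrate the typing; a real client calls the API.
  const outputs: DemoOutputMap = {
    "demo-llm": { response: "ok" },
    "demo-embed": { data: [{ embedding: [0, 1] }] },
  };
  return outputs[model];
}

const llm = run("demo-llm", { messages: [{ role: "user", content: "hi" }] });
// llm.response is typed as string
const emb = run("demo-embed", { input: "text" });
// emb.data[0].embedding is typed as number[]
```

Passing embedding-style inputs to "demo-llm" (or vice versa) fails at compile time, which is the guarantee the real interface provides across the full model catalog.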

Configuration Types

AiOptions Interface:

interface AiOptions {
  queueRequest?: boolean;                 // Process as async batch request
  returnRawResponse?: boolean;            // Return raw HTTP Response
  gateway?: GatewayOptions;               // Gateway configuration
  prefix?: string;                        // URL prefix for API endpoints
  extraHeaders?: Record<string, string>;  // Additional headers
}

ExtendedAiOptions Interface:

interface ExtendedAiOptions extends AiOptions {
  includeTimingData?: boolean;  // Include performance metrics
  stream?: boolean;             // Enable streaming responses
  smartBucketAuth?: {           // Smart Bucket authentication
    bucketId: string;
    secret: string;
  };
}

GatewayOptions Interface:

interface GatewayOptions {
  id: string;                      // Unique request identifier
  cacheKey?: string;               // Cache key for request
  cacheTtl?: number;               // Cache TTL in seconds
  skipCache?: boolean;             // Bypass caching
  metadata?: Record<string, any>;  // Request metadata
  collectLog?: boolean;            // Enable request logging
  eventId?: string;                // Event tracking ID
  requestTimeoutMs?: number;       // Request timeout
}
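For per-user cached prompts, a deterministic cacheKey builder keeps entries from colliding. The key format below is only an illustration — any stable string works as a gateway cacheKey:

```typescript
// Build a stable gateway cache key from a user id and a prompt.
// Normalizing whitespace and case lets trivially different phrasings
// of the same prompt share one cache entry.
function buildCacheKey(userId: string, prompt: string): string {
  const normalized = prompt.trim().toLowerCase().replace(/\s+/g, " ");
  return `${userId}:${normalized}`;
}

buildCacheKey("user123", "  Hello   WORLD "); // "user123:hello world"
```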

Code Examples

Complete implementations demonstrating AI integration patterns and common use cases.

Multi-Model Content Pipeline

export default class extends Service<Env> {
  async processContent(request: Request): Promise<Response> {
    const { text, imageUrl } = await request.json() as {
      text: string;
      imageUrl?: string;
    };

    // Generate embeddings for semantic search
    const embeddings = await this.env.AI.run('bge-large-en', {
      input: text
    });

    // Analyze image if provided
    let imageAnalysis = null;
    if (imageUrl) {
      imageAnalysis = await this.env.AI.run('llama-3.2-11b-vision', {
        messages: [{
          role: "user",
          content: [
            { type: "text", text: "Describe this image briefly" },
            { type: "image_url", image_url: { url: imageUrl } }
          ]
        }],
        max_tokens: 100
      });
    }

    // Generate enhanced content
    const enhancement = await this.env.AI.run('llama-3.3-70b', {
      messages: [{
        role: "user",
        content: `Enhance this content: "${text}"${
          imageAnalysis ? ` Image shows: ${imageAnalysis.choices[0].message.content}` : ''
        }`
      }],
      max_tokens: 200
    });

    return Response.json({
      original: text,
      embeddings: embeddings.data[0].embedding,
      imageAnalysis: imageAnalysis?.choices[0]?.message?.content,
      enhanced: enhancement.choices[0].message.content
    });
  }
}

Streaming Chat Interface

export default class extends Service<Env> {
  async handleChat(request: Request): Promise<Response> {
    const { messages } = await request.json() as { messages: any[] };

    // Capture env here: inside the stream source, `this` is not the service
    const env = this.env;

    // Create readable stream for real-time responses
    const encoder = new TextEncoder();
    const stream = new ReadableStream({
      async start(controller) {
        try {
          const aiStream = await env.AI.run('llama-3.1-8b-instruct', {
            messages,
            stream: true,
            max_tokens: 500
          });
          const reader = aiStream.getReader();
          while (true) {
            const { done, value } = await reader.read();
            if (done) break;
            // Format as Server-Sent Events
            const chunk = `data: ${JSON.stringify({
              content: new TextDecoder().decode(value)
            })}\n\n`;
            controller.enqueue(encoder.encode(chunk));
          }
          controller.enqueue(encoder.encode('data: [DONE]\n\n'));
          controller.close();
        } catch (error) {
          // Don't close() after error(): the controller is already finished
          controller.error(error);
        }
      }
    });

    return new Response(stream, {
      headers: {
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
        'Connection': 'keep-alive'
      }
    });
  }
}

Intelligent Document Processing

export default class extends Service<Env> {
  async analyzeDocument(request: Request): Promise<Response> {
    const formData = await request.formData();
    const audioFile = formData.get('audio') as File | null;
    const documentText = formData.get('text') as string;

    // Transcribe audio if provided
    let transcript = '';
    if (audioFile) {
      const audioResult = await this.env.AI.run('whisper-large-v3-turbo', {
        file: audioFile,
        response_format: 'text'
      });
      transcript = audioResult.text;
    }

    // Combine text sources
    const fullText = [documentText, transcript].filter(Boolean).join('\n\n');

    // Detect PII
    const piiDetection = await this.env.AI.run('pii-detection', {
      prompt: fullText
    });

    // Generate summary
    const summary = await this.env.AI.run('deepseek-v3', {
      messages: [{
        role: "system",
        content: "Summarize the key points from the provided document."
      }, {
        role: "user",
        content: fullText
      }],
      max_tokens: 300
    });

    // Extract embeddings for search
    const embeddings = await this.env.AI.run('bge-m3', {
      text: fullText
    });

    return Response.json({
      summary: summary.choices[0].message.content,
      transcript,
      piiDetected: (piiDetection.pii_detection?.length ?? 0) > 0,
      piiEntities: piiDetection.pii_detection || [],
      searchEmbeddings: embeddings.data,
      wordCount: fullText.split(/\s+/).length
    });
  }
}

Code Generation Service

export default class extends Service<Env> {
  async generateCode(request: Request): Promise<Response> {
    const { prompt, language, includeTests } = await request.json() as {
      prompt: string;
      language: string;
      includeTests?: boolean;
    };

    // Generate initial code
    const codeGeneration = await this.env.AI.run('qwen-coder-32b', {
      messages: [{
        role: "system",
        content: `You are an expert ${language} developer. Generate clean, efficient code.`
      }, {
        role: "user",
        content: prompt
      }],
      max_tokens: 1000,
      temperature: 0.2
    });
    const generatedCode = codeGeneration.choices[0].message.content;

    // Generate tests if requested, using the code-specialized model
    let tests = null;
    if (includeTests) {
      const testGeneration = await this.env.AI.run('qwen-coder-32b', {
        messages: [{
          role: "user",
          content: `Generate comprehensive unit tests for this ${language} code:\n\n${generatedCode}`
        }],
        max_tokens: 800
      });
      tests = testGeneration.choices[0].message.content;
    }

    // Review code quality
    const review = await this.env.AI.run('llama-3.3-70b', {
      messages: [{
        role: "system",
        content: "You are a code reviewer. Provide constructive feedback on code quality, security, and best practices."
      }, {
        role: "user",
        content: `Review this ${language} code:\n\n${generatedCode}`
      }],
      max_tokens: 400
    });

    return Response.json({
      code: generatedCode,
      tests,
      review: review.choices[0].message.content,
      language,
      timestamp: new Date().toISOString()
    });
  }
}

raindrop.manifest

AI models are automatically available to all Raindrop applications:

application "ai-powered-app" {
  service "api" {
    domain = "api.example.com"
    # AI interface automatically available as this.env.AI
  }

  actor "ai-processor" {
    # AI interface automatically available as this.env.AI
  }

  smartmemory "conversation_history" {
    # Store AI conversation context and embeddings
  }
}

The framework automatically provides the AI interface across all components, enabling seamless integration of intelligence capabilities throughout your distributed application architecture.