AI Models
Overview
Raindrop provides access to a comprehensive suite of AI models through a unified interface that abstracts the complexity of working with different AI providers while maintaining type safety and performance. The AI system supports text generation, image processing, speech recognition, language translation, embeddings, and specialized capabilities like code generation and mathematical reasoning.
The framework handles model routing between integrated AI infrastructure and external providers automatically, providing consistent interfaces across all model types. Each model has specific input and output types that ensure compile-time safety while supporting both simple one-shot operations and complex streaming workflows.
Key benefits:
- Unified Interface: Single `env.AI.run()` method for all model types
- Type Safety: Compile-time validation of model inputs and outputs
- Automatic Routing: Seamless integration across multiple AI providers
- Streaming Support: Real-time response streaming for conversational applications
- Advanced Options: Request queuing, caching, and gateway configuration
Prerequisites
- Active Raindrop project with AI binding configured
- Understanding of TypeScript generics and async/await patterns
- Familiarity with AI model concepts (LLMs, embeddings, vision models)
- Basic knowledge of your target AI use cases and model requirements
Creating/Getting Started
AI capabilities are automatically available to all Raindrop applications through the `env.AI` interface - no manifest configuration required.
```hcl
application "ai-app" {
  service "api" {
    domain = "api.example.com"
    # AI interface available as this.env.AI
  }
}
```
Generate the service implementation:
```
raindrop build generate
```
The AI interface is available in your generated service class:
```typescript
export default class extends Service<Env> {
  async fetch(request: Request): Promise<Response> {
    // AI interface available as this.env.AI
    const result = await this.env.AI.run(
      'llama-3.1-8b-instruct',
      { prompt: "Hello, AI!" }
    );

    return new Response(result.response);
  }
}
```
Accessing/Basic Usage
Access AI models through the `env.AI.run()` method with model-specific parameters:
```typescript
// Basic text generation
const response = await this.env.AI.run('llama-3.3-70b', {
  messages: [
    { role: "user", content: "Explain quantum computing" }
  ],
  max_tokens: 150
});

// Generate embeddings
const embeddings = await this.env.AI.run('bge-large-en', {
  input: "Text to embed"
});

// Process images with vision models
const analysis = await this.env.AI.run('llama-3.2-11b-vision', {
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Describe this image" },
        { type: "image_url", image_url: { url: imageUrl } }
      ]
    }
  ]
});
```
The interface automatically handles type checking and validates inputs based on the model selected.
Core Concepts
Model Routing System
Raindrop uses a sophisticated routing system that maps user-friendly model names to provider-specific endpoints. Models are distributed across two primary providers:
External Router Models: High-performance models accessed through external APIs, including the latest releases like DeepSeek R1 and Llama 4 Scout. These models typically offer superior capabilities and longer context lengths.
Platform Models: Models running directly on integrated platform infrastructure, providing fast response times and seamless integration with edge computing workflows.
Type-Safe Interfaces
Each model has specific input and output type signatures that provide compile-time validation:
```typescript
// TypeScript infers correct types automatically
const llmResponse = await env.AI.run('llama-3.3-70b', {
  messages: [{ role: "user", content: "Hello" }], // ← Typed input
  temperature: 0.7
}); // ← Returns typed LLM output

const embedResponse = await env.AI.run('bge-large-en', {
  input: ["text1", "text2"] // ← Typed embedding input
}); // ← Returns typed embedding output
```
Capability-Based Selection
Models are organized by capabilities rather than just size or provider:
- Chat/Completion: Conversational AI and text generation
- Vision: Image understanding and multimodal processing
- Embeddings: Text representation and semantic search
- Audio: Speech-to-text transcription
- Specialized: Code generation, mathematical reasoning, PII detection
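As one illustrative pattern, a service can route requests by capability with a plain lookup table. The `Capability` type and `pickModel` helper below are not part of the Raindrop API; the model names come from the catalog in the sections that follow:

```typescript
// Illustrative capability-to-model lookup; not part of the Raindrop API.
// Model names come from the catalog documented below.
type Capability = "chat" | "vision" | "embeddings" | "audio" | "code";

const defaultModelFor: Record<Capability, string> = {
  chat: "llama-3.3-70b",
  vision: "llama-3.2-11b-vision",
  embeddings: "bge-large-en",
  audio: "whisper-large-v3-turbo",
  code: "qwen-coder-32b",
};

// Resolve a model name for a given capability
function pickModel(capability: Capability): string {
  return defaultModelFor[capability];
}
```

Centralizing the mapping this way keeps model upgrades to a one-line change per capability.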
Text Generation Models
Text generation models handle conversational AI, content creation, and language understanding tasks.
Large Language Models
Premium External Models:
- `llama-3.3-70b` - Meta Llama 3.3 70B with 128K context for complex reasoning
- `deepseek-r1` - DeepSeek R1 671B with advanced chain-of-thought reasoning
- `deepseek-v3` - DeepSeek V3 high-performance model with long context
- `kimi-k2` - Moonshot AI Kimi K2 1T-parameter MoE with tool integration
- `qwen-3-32b` - Qwen 3 32B with advanced multilingual capabilities
- `llama-4-maverick-17b` - Llama 4 Maverick 17B advanced multimodal model
Fast External Models:
- `llama-3.1-8b-external` - Meta Llama 3.1 8B for efficient processing
- `llama-3.1-70b-external` - Meta Llama 3.1 70B high-quality large language model
- `llama-3.1-8b-instant` - Meta Llama 3.1 8B Instant for ultra-fast responses
- `gemma-9b-it` - Google Gemma 9B instruction-tuned model
- `llama-3.3-swallow-70b` - Llama 3.3 Swallow 70B Japanese-optimized model
Platform Models:
- `gpt-oss-120b` - GPT OSS 120B advanced reasoning model with chain-of-thought capabilities
- `gpt-oss-20b` - GPT OSS 20B efficient reasoning model with chain-of-thought capabilities
- `llama-3.1-70b-instruct` - Meta Llama 3.1 70B large language model with long context
- `llama-3.1-8b-instruct` - Meta Llama 3.1 8B fast and efficient model
- `llama-3-8b-instruct` - Meta Llama 3 8B reliable model for general tasks
- `llama-3.2-3b-instruct` - Meta Llama 3.2 3B compact and efficient model
- `mistral-7b-instruct` - Mistral 7B high-quality 7B-parameter model
- `gemma-2b` - Google Gemma 2B lightweight model for basic tasks
Specialized Text Models
Reasoning Models:
- `deepseek-r1-distill-70b` - Fast reasoning model with long context support
- `deepseek-r1-distill-qwen-32b` - Chain-of-thought reasoning with JSON mode
Code Generation:
- `qwen-coder-32b` - Qwen 2.5 Coder 32B specialized for code generation
Mathematical:
- `deepseek-math-7b` - DeepSeek Math 7B specialized for mathematical reasoning
Text Generation Usage
Basic Chat Completion:
```typescript
const response = await env.AI.run('llama-3.3-70b', {
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Write a haiku about coding" }
  ],
  max_tokens: 100,
  temperature: 0.7
});

console.log(response.choices[0].message.content);
```
Streaming Responses:
```typescript
const stream = await env.AI.run('llama-3.1-8b-instruct', {
  messages: [{ role: "user", content: "Tell me a story" }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
```
JSON Mode:
```typescript
const analysis = await env.AI.run('deepseek-r1-distill-qwen-32b', {
  messages: [
    { role: "user", content: "Analyze this data and return structured JSON" }
  ],
  response_format: { type: "json_object" }
});
```
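Even with JSON mode enabled, it is prudent to parse model output defensively. A small guard like the one below (plain TypeScript, not part of the AI interface) avoids an unhandled exception if a response is ever malformed:

```typescript
// Defensive JSON parsing for model output: returns null instead of
// throwing when the text is not valid JSON.
function tryParseJson<T>(text: string): T | null {
  try {
    return JSON.parse(text) as T;
  } catch {
    return null;
  }
}
```

A caller can then branch on `null` and retry or fall back, rather than crashing the request handler.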
Vision Models
Vision models process and understand images alongside text for multimodal applications.
Available Vision Models
Multimodal Conversational:
- `llama-4-scout-17b` - Meta Llama 4 Scout 17B multimodal model with vision capabilities
- `llama-3.2-11b-vision` - Meta Llama 3.2 11B with vision capabilities
- `mistral-small-3.1` - Mistral Small 3.1 vision-enabled model with 128K context
Specialized Vision:
- `llava-1.5-7b` - LLaVA 1.5 7B vision-language model
New Vision Models:
- `gemma-3-2b` - Google Gemma 3 2B multimodal model with vision and 128K context
- `gemma-3-9b` - Google Gemma 3 9B multimodal model with vision and 128K context
Vision Model Usage
Image Description:
```typescript
const description = await env.AI.run('llama-3.2-11b-vision', {
  messages: [{
    role: "user",
    content: [
      { type: "text", text: "What's in this image?" },
      {
        type: "image_url",
        image_url: {
          url: "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQ...",
          detail: "high"
        }
      }
    ]
  }],
  max_tokens: 300
});
```
Image Analysis with Context:
const analysis = await env.AI.run('llama-4-scout-17b', { messages: [ { role: "system", content: "You are an expert art critic. Analyze images in detail." }, { role: "user", content: [ { type: "text", text: "Analyze the artistic style and techniques" }, { type: "image_url", image_url: { url: imageUrl } } ] } ], temperature: 0.3});
Embedding Models
Embedding models convert text into numerical vectors for semantic search, similarity comparison, and retrieval-augmented generation.
Available Embedding Models
External Router:
- `embeddings` - Default embeddings model (BAAI BGE Large English v1.5) via the external router
- `bge-large-en-external` - BAAI BGE Large English v1.5 high-quality embeddings via the external router
Platform Models:
- `bge-large-en` - BGE Large English high-dimensional embeddings for English text
- `bge-base-en` - BGE Base English high-quality embeddings for English text
- `bge-small-en` - BGE Small English compact embeddings for English text
- `bge-m3` - BGE M3 multilingual embeddings supporting 100+ languages
Specialized:
- `bge-reranker-base` - BGE Reranker Base reranker model for document ranking
- `pii-detection` - PII detection service for identifying personally identifiable information
Embedding Usage
Single Text Embedding:
```typescript
const embedding = await env.AI.run('bge-large-en', {
  input: "Natural language processing with embeddings"
});

const vector = embedding.data[0].embedding; // 1024-dimensional array
```
Batch Text Embedding:
```typescript
const embeddings = await env.AI.run('embeddings', {
  input: [
    "First document text",
    "Second document text",
    "Third document text"
  ]
});

embeddings.data.forEach((item, index) => {
  console.log(`Document ${index}: ${item.embedding.length} dimensions`);
});
```
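For semantic search, stored embedding vectors are typically compared to a query vector by cosine similarity. A minimal sketch, written in plain TypeScript and independent of the AI interface:

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored document vectors against a query vector, highest score first.
function rankBySimilarity(
  query: number[],
  docs: { id: string; embedding: number[] }[]
): { id: string; score: number }[] {
  return docs
    .map(d => ({ id: d.id, score: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.score - x.score);
}
```

In practice the document vectors would come from a prior embedding call and be persisted alongside the documents; the ranking loop here is a linear scan, which is fine for small corpora.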
Multilingual Embedding:
```typescript
const multilingualEmbedding = await env.AI.run('bge-m3', {
  text: ["Hello world", "Hola mundo", "Bonjour le monde"]
});
```
Document Reranking:
```typescript
const rankings = await env.AI.run('bge-reranker-base', {
  query: "machine learning algorithms",
  contexts: [
    "Neural networks and deep learning",
    "Traditional statistical methods",
    "Computer vision applications"
  ]
});
```
Audio Models
Audio models provide speech-to-text transcription with support for multiple languages and output formats.
Available Audio Models
External Router:
- `whisper-large-v3` - OpenAI Whisper Large v3 advanced speech-to-text transcription
Platform Models:
- `whisper` - OpenAI Whisper speech-to-text transcription
- `whisper-large-v3-turbo` - OpenAI Whisper Large v3 Turbo faster, more accurate speech-to-text
Audio Usage
Basic Transcription:
```typescript
// Audio file from request
const audioFile = await request.blob();

const transcription = await env.AI.run('whisper', {
  file: audioFile,
  response_format: 'text'
});

console.log(transcription.text);
```
Advanced Transcription with Timestamps:
```typescript
const detailedTranscription = await env.AI.run('whisper-large-v3-turbo', {
  file: audioBlob,
  language: 'en',
  response_format: 'verbose_json',
  temperature: 0.2
});

detailedTranscription.segments.forEach(segment => {
  console.log(`${segment.start}s - ${segment.end}s: ${segment.text}`);
});
```
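Segment `start`/`end` values are in seconds, so rendering them as subtitle timestamps is a common follow-up step. A small helper, assuming the SRT `"HH:MM:SS,mmm"` convention (the formatting choice is ours, not part of the AI interface):

```typescript
// Convert a timestamp in seconds to an SRT-style "HH:MM:SS,mmm" string,
// e.g. for turning verbose_json segments into subtitle files.
function toSrtTimestamp(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  const rem = ms % 1000;
  const pad = (n: number, w: number) => String(n).padStart(w, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(rem, 3)}`;
}
```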
Advanced Options
Configure AI requests with advanced options for caching, queuing, and gateway settings.
Gateway Options
Control request handling, caching, and metadata collection:
```typescript
const response = await env.AI.run(
  'llama-3.3-70b',
  {
    messages: [{ role: "user", content: "Hello" }]
  },
  {
    gateway: {
      id: 'unique-request-id',
      cacheKey: 'user-greeting-cache',
      cacheTtl: 3600, // 1 hour cache
      skipCache: false,
      collectLog: true,
      metadata: {
        userId: 'user123',
        sessionId: 'session456'
      },
      requestTimeoutMs: 30000
    }
  }
);
```
Request Queuing
Queue requests for asynchronous processing:
```typescript
const asyncResponse = await env.AI.run(
  'deepseek-v3',
  {
    messages: [{ role: "user", content: "Generate a detailed analysis" }],
    max_tokens: 4000
  },
  { queueRequest: true }
);

// Get results later using the request ID
const requestId = asyncResponse.request_id;
```
Streaming Configuration
Configure streaming responses with custom options:
```typescript
const stream = await env.AI.run(
  'llama-3.1-8b-instruct',
  {
    messages: [{ role: "user", content: "Tell me a story" }],
    stream: true,
    max_tokens: 500
  },
  {
    includeTimingData: true,
    extraHeaders: { 'X-Custom-Header': 'value' }
  }
);

// Process streaming response
const reader = stream.getReader();
try {
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Process chunk
    const chunk = new TextDecoder().decode(value);
    console.log(chunk);
  }
} finally {
  reader.releaseLock();
}
```
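When the stream carries Server-Sent-Events framing, each decoded chunk needs to be unwrapped before use. A minimal parser, assuming `data: <payload>` lines and a `[DONE]` sentinel (one common SSE convention; the function itself is not part of the Raindrop API):

```typescript
// Extract data payloads from a raw SSE text chunk.
// Assumes "data: <payload>" framing with "[DONE]" as the end-of-stream marker.
function parseSseChunk(chunk: string): { payloads: string[]; done: boolean } {
  const payloads: string[] = [];
  let done = false;
  for (const line of chunk.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const data = line.slice("data: ".length).trim();
    if (data === "[DONE]") {
      done = true;
      continue;
    }
    payloads.push(data);
  }
  return { payloads, done };
}
```

Note that a network chunk can end mid-event; a production parser would buffer partial lines across reads rather than parsing each chunk in isolation.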
Raw Response Mode
Access full HTTP Response objects for custom processing:
```typescript
const rawResponse = await env.AI.run(
  'bge-large-en',
  { input: "Sample text" },
  { returnRawResponse: true }
);

const headers = rawResponse.headers;
const status = rawResponse.status;
const data = await rawResponse.json();

console.log(`Status: ${status}, Content-Type: ${headers.get('content-type')}`);
```
Smart Bucket Authentication
Authenticate AI requests using Smart Bucket credentials:
```typescript
const authenticatedResponse = await env.AI.run(
  'llama-3.3-70b',
  {
    messages: [{ role: "user", content: "Private query" }]
  },
  {
    smartBucketAuth: {
      bucketId: 'secure-bucket-id',
      secret: 'bucket-secret-key'
    }
  }
);
```
Interface Reference
AI Interface
The main interface for all AI operations:
```typescript
interface Ai {
  run<T extends AiModel>(
    model: T,
    inputs: AiModelInputMap[T],
    options?: AiOptions
  ): Promise<AiModelOutputMap[T]>;
}
```
Type Parameters:
- `T` - The AI model identifier, ensuring type safety
Parameters:
- `model: T` - Model identifier from the available model catalog
- `inputs: AiModelInputMap[T]` - Model-specific input parameters
- `options?: AiOptions` - Optional configuration for request handling
Returns: Model-specific output with full type safety
Configuration Types
AiOptions Interface:
```typescript
interface AiOptions {
  queueRequest?: boolean;                // Process as async batch request
  returnRawResponse?: boolean;           // Return raw HTTP Response
  gateway?: GatewayOptions;              // Gateway configuration
  prefix?: string;                       // URL prefix for API endpoints
  extraHeaders?: Record<string, string>; // Additional headers
}
```
ExtendedAiOptions Interface:
```typescript
interface ExtendedAiOptions extends AiOptions {
  includeTimingData?: boolean; // Include performance metrics
  stream?: boolean;            // Enable streaming responses
  smartBucketAuth?: {          // Smart Bucket authentication
    bucketId: string;
    secret: string;
  };
}
```
GatewayOptions Interface:
```typescript
interface GatewayOptions {
  id: string;                     // Unique request identifier
  cacheKey?: string;              // Cache key for request
  cacheTtl?: number;              // Cache TTL in seconds
  skipCache?: boolean;            // Bypass caching
  metadata?: Record<string, any>; // Request metadata
  collectLog?: boolean;           // Enable request logging
  eventId?: string;               // Event tracking ID
  requestTimeoutMs?: number;      // Request timeout
}
```
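Gateway caching is only as effective as the `cacheKey` supplied, so deriving the key deterministically from the model and prompt is one reasonable scheme. The FNV-1a hash and key format below are illustrative choices of ours, not part of the Raindrop API; any stable derivation works:

```typescript
// Illustrative helper: derive a stable gateway cacheKey from model + prompt.
// FNV-1a is used only because it is tiny and deterministic.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep 32 bits
  }
  return hash.toString(16).padStart(8, "0");
}

function buildCacheKey(model: string, prompt: string): string {
  return `${model}:${fnv1a(prompt)}`;
}
```

Identical requests then share a cache entry, while any change to the model or prompt produces a different key.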
Code Examples
Complete implementations demonstrating AI integration patterns and common use cases.
Multi-Model Content Pipeline
```typescript
export default class extends Service<Env> {
  async processContent(request: Request): Promise<Response> {
    const { text, imageUrl } = await request.json();

    // Generate embeddings for semantic search
    const embeddings = await this.env.AI.run('bge-large-en', {
      input: text
    });

    // Analyze image if provided
    let imageAnalysis = null;
    if (imageUrl) {
      imageAnalysis = await this.env.AI.run('llama-3.2-11b-vision', {
        messages: [{
          role: "user",
          content: [
            { type: "text", text: "Describe this image briefly" },
            { type: "image_url", image_url: { url: imageUrl } }
          ]
        }],
        max_tokens: 100
      });
    }

    // Generate enhanced content
    const enhancement = await this.env.AI.run('llama-3.3-70b', {
      messages: [{
        role: "user",
        content: `Enhance this content: "${text}"${
          imageAnalysis
            ? ` Image shows: ${imageAnalysis.choices[0].message.content}`
            : ''
        }`
      }],
      max_tokens: 200
    });

    return Response.json({
      original: text,
      embeddings: embeddings.data[0].embedding,
      imageAnalysis: imageAnalysis?.choices[0]?.message?.content,
      enhanced: enhancement.choices[0].message.content
    });
  }
}
```
Streaming Chat Interface
```typescript
export default class extends Service<Env> {
  async handleChat(request: Request): Promise<Response> {
    const { messages } = await request.json();
    const env = this.env; // Capture env: `this` is rebound inside the stream callback

    // Create readable stream for real-time responses
    const encoder = new TextEncoder();
    const stream = new ReadableStream({
      async start(controller) {
        try {
          const aiStream = await env.AI.run('llama-3.1-8b-instruct', {
            messages,
            stream: true,
            max_tokens: 500
          });

          const reader = aiStream.getReader();

          while (true) {
            const { done, value } = await reader.read();
            if (done) break;

            // Format as Server-Sent Events
            const chunk = `data: ${JSON.stringify({
              content: new TextDecoder().decode(value)
            })}\n\n`;

            controller.enqueue(encoder.encode(chunk));
          }

          controller.enqueue(encoder.encode('data: [DONE]\n\n'));
          controller.close();
        } catch (error) {
          controller.error(error);
        }
      }
    });

    return new Response(stream, {
      headers: {
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
        'Connection': 'keep-alive'
      }
    });
  }
}
```
Intelligent Document Processing
```typescript
export default class extends Service<Env> {
  async analyzeDocument(request: Request): Promise<Response> {
    const formData = await request.formData();
    const audioFile = formData.get('audio') as File;
    const documentText = formData.get('text') as string;

    // Transcribe audio if provided
    let transcript = '';
    if (audioFile) {
      const audioResult = await this.env.AI.run('whisper-large-v3-turbo', {
        file: audioFile,
        response_format: 'text'
      });
      transcript = audioResult.text;
    }

    // Combine text sources
    const fullText = [documentText, transcript].filter(Boolean).join('\n\n');

    // Detect PII
    const piiDetection = await this.env.AI.run('pii-detection', {
      prompt: fullText
    });

    // Generate summary
    const summary = await this.env.AI.run('deepseek-v3', {
      messages: [
        {
          role: "system",
          content: "Summarize the key points from the provided document."
        },
        { role: "user", content: fullText }
      ],
      max_tokens: 300
    });

    // Extract embeddings for search
    const embeddings = await this.env.AI.run('bge-m3', {
      text: fullText
    });

    return Response.json({
      summary: summary.choices[0].message.content,
      transcript,
      piiDetected: piiDetection.pii_detection?.length > 0,
      piiEntities: piiDetection.pii_detection || [],
      searchEmbeddings: embeddings.data,
      wordCount: fullText.split(/\s+/).length
    });
  }
}
```
Code Generation Service
```typescript
export default class extends Service<Env> {
  async generateCode(request: Request): Promise<Response> {
    const { prompt, language, includeTests } = await request.json();

    // Generate initial code
    const codeGeneration = await this.env.AI.run('qwen-coder-32b', {
      messages: [
        {
          role: "system",
          content: `You are an expert ${language} developer. Generate clean, efficient code.`
        },
        { role: "user", content: prompt }
      ],
      max_tokens: 1000,
      temperature: 0.2
    });

    const generatedCode = codeGeneration.choices[0].message.content;

    // Generate tests if requested
    let tests = null;
    if (includeTests) {
      const testGeneration = await this.env.AI.run('deepseek-math-7b', {
        messages: [{
          role: "user",
          content: `Generate comprehensive unit tests for this ${language} code:\n\n${generatedCode}`
        }],
        max_tokens: 800
      });
      tests = testGeneration.choices[0].message.content;
    }

    // Review code quality
    const review = await this.env.AI.run('llama-3.3-70b', {
      messages: [
        {
          role: "system",
          content: "You are a code reviewer. Provide constructive feedback on code quality, security, and best practices."
        },
        { role: "user", content: `Review this ${language} code:\n\n${generatedCode}` }
      ],
      max_tokens: 400
    });

    return Response.json({
      code: generatedCode,
      tests,
      review: review.choices[0].message.content,
      language,
      timestamp: new Date().toISOString()
    });
  }
}
```
raindrop.manifest
AI models are automatically available to all Raindrop applications:
```hcl
application "ai-powered-app" {
  service "api" {
    domain = "api.example.com"
    # AI interface automatically available as this.env.AI
  }

  actor "ai-processor" {
    # AI interface automatically available as this.env.AI
  }

  smartmemory "conversation_history" {
    # Store AI conversation context and embeddings
  }
}
```
The framework automatically provides the AI interface across all components, enabling seamless integration of intelligence capabilities throughout your distributed application architecture.