
AI Models


Raindrop provides access to a comprehensive suite of AI models through a unified interface that abstracts the complexity of working with different AI providers while maintaining type safety and performance. The AI system supports text generation, image processing, speech recognition, language translation, embeddings, and specialized capabilities like code generation and mathematical reasoning.

The framework handles model routing automatically, providing consistent interfaces across all model types. Each model has specific input and output types that ensure compile-time safety while supporting both simple one-shot operations and complex streaming workflows. Advanced models support tool calling for function execution and integration with external systems.

Key benefits:

  • Unified Interface: Single env.AI.run() method for all model types
  • Type Safety: Compile-time validation of model inputs and outputs
  • Tool Calling: Function execution support for compatible models
  • Automatic Routing: Seamless integration across multiple AI providers
  • Streaming Support: Real-time response streaming for conversational applications
  • Advanced Options: Request queuing, caching, and gateway configuration

Prerequisites

  • Active Raindrop project with AI binding configured
  • Understanding of TypeScript generics and async/await patterns
  • Familiarity with AI model concepts (LLMs, embeddings, vision models)
  • Basic knowledge of your target AI use cases and model requirements

Configuration

AI capabilities are automatically available to all Raindrop applications through the env.AI interface - no manifest configuration required.

application "ai-app" {
  service "api" {
    domain = "api.example.com"
    # AI interface available as this.env.AI
  }
}

Generate the service implementation:

raindrop build generate

The AI interface is available in your generated service class:

export default class extends Service<Env> {
  async fetch(request: Request): Promise<Response> {
    // AI interface available as this.env.AI
    const result = await this.env.AI.run(
      'llama-3.1-8b-instruct',
      { prompt: "Hello, AI!" }
    );
    return new Response(result.response);
  }
}

Access

Access AI models through the env.AI.run() method with model-specific parameters:

// Basic text generation
const response = await this.env.AI.run(
  'llama-3.3-70b',
  {
    messages: [
      { role: "user", content: "Explain quantum computing" }
    ],
    max_tokens: 150
  }
);

// Generate embeddings
const embeddings = await this.env.AI.run(
  'embeddings',
  { input: "Text to embed" }
);

// Process images with vision models
const analysis = await this.env.AI.run(
  'llama-3.2-11b-vision',
  {
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Describe this image" },
          { type: "image_url", image_url: { url: imageUrl } }
        ]
      }
    ]
  }
);

The interface automatically handles type checking and validates inputs based on the model selected.

Core Concepts

Model Routing System

Raindrop uses a sophisticated routing system that maps user-friendly model names to provider-specific endpoints. The framework automatically handles model discovery, routing, and response formatting to provide a consistent interface across all available models.

Type-Safe Interfaces

Each model has specific input and output type signatures that provide compile-time validation:

// TypeScript infers correct types automatically
const llmResponse = await env.AI.run('llama-3.3-70b', {
  messages: [{ role: "user", content: "Hello" }], // ← Typed input
  temperature: 0.7
}); // ← Returns typed LLM output

const embedResponse = await env.AI.run('bge-large-en', {
  input: ["text1", "text2"] // ← Typed embedding input
}); // ← Returns typed embedding output

Capability-Based Selection

Models are organized by capabilities to help you choose the right model for your use case:

  • Chat/Completion: Conversational AI and text generation
  • Vision: Image understanding and multimodal processing
  • Embeddings: Text representation and semantic search
  • Audio: Speech-to-text transcription
  • Specialized: Code generation, mathematical reasoning, PII detection

Function Calling

Many chat models support function calling (tool calling), enabling AI models to execute specific functions and integrate with external systems. This allows models to access real-time data, perform calculations, and interact with APIs during conversations. Complete examples appear under Function Calling in the Text Generation Usage section below.

Standardized Model Interfaces

Raindrop provides standardized TypeScript interfaces for AI model capabilities, ensuring type safety and consistency across different model types.

Audio Processing

Used by: whisper-large-v3, whisper, whisper-large-v3-turbo, whisper-tiny, nova-3, smart-turn-v2

interface AudioInput {
  audio: number[] | ReadableStream;
  contentType: string;
  language?: string;
  response_format?: 'json' | 'text' | 'srt' | 'vtt';
}

Vision Analysis

Used by: llava-1.5-7b, uform-gen2-qwen-500m

interface VisionInput {
  messages: Array<{
    role: 'system' | 'user' | 'assistant';
    content: Array<{
      type: 'text' | 'image_url';
      text?: string;
      image_url?: {
        url: string;
        detail?: 'low' | 'high' | 'auto';
      };
    }>;
  }>;
  model: string;
  max_tokens?: number;
  temperature?: number;
}

Text-to-Speech

Used by: aura-1, melotts

interface TTSInput {
  text: string;
  voice?: string;
  speed?: number;
  response_format?: 'mp3' | 'wav' | 'ogg';
}

Image Generation

Used by: flux-1-schnell, stable-diffusion-xl-base-1.0, phoenix-1.0, etc.

interface ImageGenerationInput {
  prompt: string;
  negative_prompt?: string;
  width?: number;
  height?: number;
  steps?: number;
  guidance_scale?: number;
}

Document Reranking

Used by: bge-reranker-base

interface RerankerInput {
  query: string;
  documents: string[];
  top_k?: number;
}

Text Classification

Used by: distilbert-sst-2-int8

interface TextClassificationInput {
  text: string;
  labels?: string[];
}

Translation

Used by: m2m100-1.2b

interface TranslationInput {
  text: string;
  source_language?: string;
  target_language: string;
}

Summarization

Used by: bart-large-cnn

interface SummarizationInput {
  text: string;
  max_length?: number;
  min_length?: number;
}

Image Classification

Used by: resnet-50

interface ImageClassificationInput {
  image: string | File | Blob;
  prompt?: string;
}

Chat Completion

Used by: All chat models

interface OpenAIChatInput {
  messages: Array<{
    role: 'system' | 'user' | 'assistant';
    content: string;
  }>;
  model?: string;
  temperature?: number;
  max_tokens?: number;
  tools?: Array<{
    type: 'function';
    function: {
      name: string;
      description?: string;
      parameters?: Record<string, unknown>;
    };
  }>;
  tool_choice?: 'auto' | 'required' | 'none';
}

Embeddings

Used by: embeddings, bge-m3, bge-large-en, bge-base-en, bge-small-en, embeddinggemma-300m

type EmbeddingInput = string | { prompt: string } | { input: string }

Interface Benefits

  • Type Safety: Compile-time validation prevents runtime errors
  • Consistency: Predictable patterns across model types and providers
  • Flexibility: Automatic format conversion by the model router
  • Future-Proofing: New models inherit appropriate interfaces automatically

Text Generation Models

Text generation models handle conversational AI, content creation, and language understanding tasks.

High-Performance Models

Advanced models for complex reasoning and long-context applications:

  • llama-3.3-70b - Meta Llama 3.3 70B with advanced reasoning capabilities
  • deepseek-r1 - DeepSeek R1 with advanced chain-of-thought reasoning
  • deepseek-v3-0324 - DeepSeek V3 high-performance model with long context
  • kimi-k2 - Kimi K2 with tool integration capabilities
  • qwen-3-32b - Qwen 3 32B with advanced multilingual capabilities
  • llama-4-maverick-17b - Llama 4 Maverick 17B advanced model
  • gpt-oss-120b - GPT OSS 120B with chain-of-thought capabilities
  • llama-3.1-70b-instruct - Meta Llama 3.1 70B large language model
  • qwen-coder-32b - Qwen 2.5 Coder 32B specialized for code generation
  • deepseek-r1-distill-qwen-32b - DeepSeek R1 distilled model with JSON mode
  • qwen-qwq-32b - QwQ 32B reasoning model capable of thinking and reasoning
  • mistral-small-3.1 - Mistral Small 3.1 with vision understanding and 128K context
  • gemma-3-12b - Gemma 3 12B multimodal model with 128K context
  • llama-3.2-11b-vision - Llama 3.2 11B with visual recognition capabilities
  • llama-4-scout-17b - Meta Llama 4 Scout 17B multimodal model with MoE architecture
  • qwen-1.5-14b - Qwen 1.5 14B with AWQ quantization
  • llama-3.3-70b-instruct-fp8 - Llama 3.3 70B quantized to FP8 for speed

Efficient Models

Optimized for speed and resource efficiency:

  • llama-3.1-8b-external - Meta Llama 3.1 8B for efficient processing
  • llama-3.1-8b-instant - Ultra-fast responses with Llama 3.1 8B
  • gemma-9b-it - Google Gemma 9B instruction-tuned model
  • gpt-oss-20b - GPT OSS 20B efficient reasoning model
  • llama-3.1-8b-instruct - Fast and efficient general-purpose model
  • llama-3-8b-instruct - Reliable model for general tasks
  • llama-3.2-3b-instruct - Compact and efficient model
  • gemma-2b - Lightweight model for basic tasks
  • llama-3.1-8b-instruct-fast - Llama 3.1 8B optimized for speed
  • llama-3.1-8b-instruct-fp8 - Llama 3.1 8B quantized to FP8
  • llama-3.1-8b-instruct-awq - Llama 3.1 8B with AWQ quantization
  • llama-3-8b-instruct-awq - Llama 3 8B with AWQ quantization
  • llama-3.2-1b-instruct - Ultra-compact 1B parameter model
  • gemma-7b - Gemma 7B with LoRA adapter support
  • gemma-7b-it - Gemma 7B instruction-tuned model
  • qwen-1.5-7b - Qwen 1.5 7B with AWQ quantization
  • qwen-1.5-1.8b - Qwen 1.5 1.8B lightweight model
  • qwen-1.5-0.5b - Qwen 1.5 0.5B ultra-lightweight model
  • tinyllama-1.1b-chat-v1.0 - TinyLlama 1.1B chat model

Reasoning Models

Models specifically optimized for reasoning and problem-solving:

  • deepseek-r1-distill-llama-70b - Fast reasoning with long context support
  • qwen-qwq-32b - Reasoning model capable of thinking and reasoning

Code Generation Models

Models specialized for programming and code generation:

  • qwen-coder-32b - Qwen 2.5 Coder 32B specialized for code generation
  • deepseek-coder-6.7b - DeepSeek Coder 6.7B instruction-tuned for code
  • deepseek-coder-6.7b-base - DeepSeek Coder 6.7B base model
  • sqlcoder-7b - SQL code generation model for database queries

Mathematical Models

Models specialized for mathematical reasoning:

  • deepseek-math-7b - DeepSeek Math 7B specialized for mathematical reasoning

Multilingual Models

Models optimized for specific languages or multilingual tasks:

  • llama-3.3-swallow-70b - Japanese-optimized model
  • discolm-german-7b-v1-awq - German language specialized model

Safety & Moderation Models

Models for content safety and moderation:

  • llama-guard-3-8b - Content safety classification
  • llamaguard-7b-awq - LLM prompt and response safety classification

Specialized Chat Models

Other specialized conversational models:

  • starling-lm-7b-beta - Reinforcement learning trained model
  • neural-chat-7b-v3-1-awq - Intel Gaudi optimized model
  • mistral-7b-instruct - Mistral 7B instruction-tuned model
  • mistral-7b-instruct-v0.2 - Mistral 7B v0.2 with 32K context
  • mistral-7b-instruct-v0.2-lora - Mistral 7B v0.2 with LoRA
  • mistral-7b-instruct-awq - Mistral 7B with AWQ quantization
  • una-cybertron-7b-v2-bf16 - Cybertron 7B v2 unified alignment model
  • falcon-7b-instruct - Falcon 7B instruction-tuned model
  • hermes-2-pro-mistral-7b - Hermes 2 Pro with function calling support
  • openhermes-2.5-mistral-7b-awq - OpenHermes 2.5 with code training
  • zephyr-7b-beta-awq - Zephyr 7B with AWQ quantization
  • openchat-3.5 - OpenChat 3.5 with C-RLFT training
  • phi-2 - Microsoft Phi-2 transformer model
  • llama-4-scout-17b - Meta Llama 4 Scout 17B multimodal model

Legacy Models

Older models maintained for compatibility:

  • llama-2-7b-chat-fp16 - Llama 2 7B full precision
  • llama-2-7b-chat-int8 - Llama 2 7B quantized
  • llama-2-7b-chat-hf-lora - Llama 2 7B with LoRA support
  • llama-2-13b-chat-awq - Llama 2 13B with AWQ quantization

Text Generation Usage

Basic Chat Completion:

const response = await env.AI.run('llama-3.3-70b', {
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Write a haiku about coding" }
  ],
  max_tokens: 100,
  temperature: 0.7
});
console.log(response.choices[0].message.content);

Streaming Responses:

const stream = await env.AI.run('llama-3.1-8b-instruct', {
  messages: [{ role: "user", content: "Tell me a story" }],
  stream: true
});

let story = "";
for await (const chunk of stream) {
  // Each chunk carries an incremental delta of the response text
  story += chunk.choices[0]?.delta?.content || '';
}
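
Inside a service handler, the stream can also be forwarded to the client as it arrives rather than accumulated. A minimal sketch using standard web streams, continuing from the block above:

// Forward the model stream to the HTTP client chunk by chunk
const encoder = new TextEncoder();
const body = new ReadableStream({
  async start(controller) {
    for await (const chunk of stream) {
      const text = chunk.choices[0]?.delta?.content;
      if (text) controller.enqueue(encoder.encode(text));
    }
    controller.close();
  }
});

return new Response(body, {
  headers: { "Content-Type": "text/plain; charset=utf-8" }
});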

JSON Mode:

const analysis = await env.AI.run('deepseek-r1-distill-llama-70b', {
  messages: [
    { role: "user", content: "Analyze this data and return structured JSON" }
  ],
  response_format: { type: "json_object" }
});

Function Calling:

const response = await env.AI.run('llama-3.3-70b', {
  messages: [
    { role: "user", content: "What's the weather like in San Francisco?" }
  ],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get the current weather for a location",
        parameters: {
          type: "object",
          properties: {
            location: {
              type: "string",
              description: "The city and state, e.g. San Francisco, CA"
            },
            unit: {
              type: "string",
              enum: ["celsius", "fahrenheit"],
              description: "Temperature unit"
            }
          },
          required: ["location"]
        }
      }
    }
  ],
  tool_choice: "auto"
});

// Handle tool calls in response
if (response.choices[0].message.tool_calls) {
  const toolCall = response.choices[0].message.tool_calls[0];
  if (toolCall.function.name === "get_weather") {
    const args = JSON.parse(toolCall.function.arguments);
    // Execute function and provide result back to model
  }
}
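
To complete the exchange, the tool result is normally appended to the conversation and the model is invoked again. A sketch assuming OpenAI-style tool-result messages (the exact message shape may vary by model); getWeather is a hypothetical local function:

// Run the function locally, then hand the result back to the model
const weather = await getWeather(args.location, args.unit); // hypothetical implementation

const followUp = await env.AI.run('llama-3.3-70b', {
  messages: [
    { role: "user", content: "What's the weather like in San Francisco?" },
    response.choices[0].message, // assistant turn containing the tool call
    {
      role: "tool", // assumes OpenAI-style tool result messages
      tool_call_id: toolCall.id,
      content: JSON.stringify(weather)
    }
  ]
});
console.log(followUp.choices[0].message.content);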

Required Function Calling:

const response = await env.AI.run('kimi-k2', {
messages: [
{ role: "user", content: "Calculate the compound interest on $1000 for 5 years at 3% annual rate" }
],
tools: [
{
type: "function",
function: {
name: "calculate_compound_interest",
description: "Calculate compound interest",
parameters: {
type: "object",
properties: {
principal: { type: "number", description: "Initial investment amount" },
rate: { type: "number", description: "Annual interest rate (as decimal)" },
time: { type: "number", description: "Time period in years" },
frequency: { type: "number", description: "Compounding frequency per year", default: 1 }
},
required: ["principal", "rate", "time"]
}
}
}
],
tool_choice: "required"
});

Vision Models

Vision models process and understand images alongside text for multimodal applications.

Available Vision Models

Vision-Language Models:

  • llava-1.5-7b - LLaVA 1.5 7B vision-language model for image analysis
  • uform-gen2-qwen-500m - Compact vision-language model for image captioning and VQA

Vision Model Usage

Image Description:

const description = await env.AI.run('llava-1.5-7b', {
  messages: [{
    role: "user",
    content: [
      { type: "text", text: "What's in this image?" },
      {
        type: "image_url",
        image_url: {
          url: "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQ...",
          detail: "high"
        }
      }
    ]
  }],
  max_tokens: 300
});
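
If the image arrives as an upload instead of a URL, it can be converted to a base64 data URL first. A minimal sketch using standard web APIs:

// Convert an uploaded image Blob into a data URL the vision model accepts
async function blobToDataUrl(blob: Blob): Promise<string> {
  const bytes = new Uint8Array(await blob.arrayBuffer());
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return `data:${blob.type};base64,${btoa(binary)}`;
}

const uploaded = await request.blob();
const dataUrl = await blobToDataUrl(uploaded);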

Image Analysis with Context:

const analysis = await env.AI.run('llama-3.2-11b-vision', {
  messages: [
    {
      role: "system",
      content: "You are an expert art critic. Analyze images in detail."
    },
    {
      role: "user",
      content: [
        { type: "text", text: "Analyze the artistic style and techniques" },
        { type: "image_url", image_url: { url: imageUrl } }
      ]
    }
  ],
  temperature: 0.3
});

Embedding Models

Embedding models convert text into numerical vectors for semantic search, similarity comparison, and retrieval-augmented generation.

Available Embedding Models

High-Quality Embeddings:

  • embeddings - Default embeddings model (BGE Large English v1.5)
  • bge-large-en - BGE Large English 1024-dimensional embeddings
  • bge-base-en - BGE Base English 768-dimensional embeddings
  • bge-small-en - BGE Small English 384-dimensional embeddings

Multilingual Embeddings:

  • bge-m3 - Multi-lingual embeddings supporting 100+ languages
  • embeddinggemma-300m - Gemma 3-based 300M parameter embedding model

Specialized:

  • pii-detection - PII Detection service for identifying personally identifiable information

Embedding Usage

Single Text Embedding:

const embedding = await env.AI.run('embeddings', {
  input: "Natural language processing with embeddings"
});
const vector = embedding.data[0].embedding;

Batch Text Embedding:

const embeddings = await env.AI.run('embeddings', {
  input: [
    "First document text",
    "Second document text",
    "Third document text"
  ]
});

embeddings.data.forEach((item, index) => {
  console.log(`Document ${index}: ${item.embedding.length} dimensions`);
});
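
With vectors in hand, semantic similarity is typically measured with cosine similarity. A minimal sketch over the batch output above:

// Cosine similarity between two embedding vectors
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Compare the first two documents from the batch above
const similarity = cosineSimilarity(
  embeddings.data[0].embedding,
  embeddings.data[1].embedding
);
console.log(`Similarity: ${similarity.toFixed(3)}`);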

Multilingual Embedding:

const multilingualEmbedding = await env.AI.run('bge-m3', {
  text: ["Hello world", "Hola mundo", "Bonjour le monde"]
});

Audio Models

Audio models provide speech-to-text transcription with support for multiple languages and output formats.

Available Audio Models

Speech Recognition:

  • whisper-large-v3 - Advanced speech-to-text transcription
  • whisper - General-purpose speech recognition model
  • whisper-large-v3-turbo - Faster, more accurate speech-to-text
  • whisper-tiny - Lightweight English-only speech recognition
  • nova-3 - Deepgram’s speech-to-text model
  • smart-turn-v2 - Audio turn detection model

Audio Usage

Basic Transcription:

// Audio file from request
const audioFile = await request.blob();
const transcription = await env.AI.run('whisper', {
  file: audioFile,
  response_format: 'text'
});
console.log(transcription.text);
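
For finer control, the AudioInput interface documented earlier also accepts a language hint and alternative output formats. A sketch assuming that interface shape:

// Transcribe with a language hint and SRT subtitle output
// (field names follow the AudioInput interface shown earlier)
const subtitles = await env.AI.run('whisper-large-v3', {
  audio: request.body as ReadableStream,
  contentType: "audio/mpeg",
  language: "en",
  response_format: "srt"
});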

Text-to-Speech Models

Text-to-speech models convert written text into natural-sounding speech audio.

Available TTS Models

High-Quality TTS:

  • aura-1 - Context-aware TTS with natural pacing and expressiveness
  • melotts - High-quality multi-lingual text-to-speech

TTS Usage

Basic Text-to-Speech:

const speech = await env.AI.run('aura-1', {
  text: "Hello, this is a text-to-speech example.",
  voice: "default",
  response_format: "mp3"
});
const audioBuffer = speech.audio;
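
In a service handler, the generated audio can be returned straight to the client. A minimal sketch, assuming speech.audio holds the encoded bytes as above:

// Serve the synthesized MP3 directly as the HTTP response
return new Response(speech.audio, {
  headers: { "Content-Type": "audio/mpeg" }
});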

Multi-language TTS:

const speech = await env.AI.run('melotts', {
  text: "Bonjour le monde",
  voice: "french",
  speed: 1.0
});

Image Generation Models

Image generation models create images from text descriptions using diffusion techniques.

Available Image Generation Models

Advanced Generation:

  • flux-1-schnell - FLUX.1 12B parameter model for high-quality image generation
  • stable-diffusion-xl-base-1.0 - SDXL base model for high-resolution images
  • stable-diffusion-xl-lightning - SDXL Lightning for fast generation
  • phoenix-1.0 - Leonardo.AI model with exceptional prompt adherence
  • lucid-origin - Leonardo.AI’s most adaptable and prompt-responsive model

Specialized Generation:

  • stable-diffusion-v1-5-inpainting - Stable Diffusion with inpainting capability
  • stable-diffusion-v1-5-img2img - Generate images from input images
  • dreamshaper-8-lcm - Fine-tuned for photorealism

Image Generation Usage

Basic Image Generation:

const image = await env.AI.run('flux-1-schnell', {
  prompt: "A serene mountain landscape at sunset",
  width: 1024,
  height: 1024
});
const imageData = image.data[0].url || image.data[0].b64_json;
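
When the model returns base64 data instead of a URL, it can be decoded and served directly. A sketch using standard web APIs:

// Decode base64 image data and return it as an HTTP response
if (image.data[0].b64_json) {
  const binary = atob(image.data[0].b64_json);
  const bytes = Uint8Array.from(binary, (c) => c.charCodeAt(0));
  return new Response(bytes, {
    headers: { "Content-Type": "image/png" }
  });
}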

Advanced Image Generation:

const image = await env.AI.run('phoenix-1.0', {
  prompt: "Professional headshot of a businesswoman, studio lighting",
  negative_prompt: "blurry, low quality, distorted",
  guidance_scale: 7.5,
  steps: 20
});

Image Inpainting:

const inpaintedImage = await env.AI.run('stable-diffusion-v1-5-inpainting', {
  prompt: "A red car in the driveway",
  image: originalImageBlob,
  mask: maskImageBlob
});

Text Classification Models

Text classification models categorize and analyze text content.

Available Text Classification Models

Sentiment Analysis:

  • distilbert-sst-2-int8 - Sentiment classification (positive/negative)

Document Ranking:

  • bge-reranker-base - Document relevance ranking and scoring

Text Classification Usage

Sentiment Analysis:

const sentiment = await env.AI.run('distilbert-sst-2-int8', {
  text: "This product is amazing and works perfectly!"
});
console.log(sentiment.label); // "POSITIVE"
console.log(sentiment.score); // 0.98

Document Reranking:

const rankings = await env.AI.run('bge-reranker-base', {
  query: "machine learning algorithms",
  documents: [
    "Neural networks and deep learning techniques",
    "Traditional statistical methods",
    "Computer vision applications"
  ],
  top_k: 3
});

rankings.ranked_documents.forEach(doc => {
  console.log(`Score: ${doc.relevance_score} - ${doc.document}`);
});

Image Classification Models

Image classification models identify and categorize objects within images.

Available Image Classification Models

Object Recognition:

  • resnet-50 - 50-layer CNN trained on ImageNet for object classification

Image Classification Usage

Basic Image Classification:

const classification = await env.AI.run('resnet-50', {
  image: imageBlob
});
console.log(classification.label); // "golden_retriever"
console.log(classification.score); // 0.95

Translation Models

Translation models convert text between different languages.

Available Translation Models

Multilingual Translation:

  • m2m100-1.2b - Many-to-many multilingual translation model

Translation Usage

Basic Translation:

const translation = await env.AI.run('m2m100-1.2b', {
  text: "Hello, how are you today?",
  source_language: "en",
  target_language: "es"
});
console.log(translation.translation);

Summarization Models

Summarization models create concise summaries of longer text content.

Available Summarization Models

Abstractive Summarization:

  • bart-large-cnn - BART model fine-tuned for text summarization

Summarization Usage

Text Summarization:

const summary = await env.AI.run('bart-large-cnn', {
  text: "Very long article text here...",
  max_length: 150,
  min_length: 50
});
console.log(summary.summary);

PII Detection Models

Personally Identifiable Information (PII) detection models identify sensitive data in text.

Available PII Detection Models

PII Identification:

  • pii-detection - Identifies personally identifiable information in text

PII Detection Usage

Basic PII Detection:

const piiResult = await env.AI.run('pii-detection', {
  text: "My name is John Doe and my email is john@example.com. My SSN is 123-45-6789."
});

piiResult.pii_detection.forEach(entity => {
  console.log(`${entity.entity_type}: ${entity.text} (confidence: ${entity.confidence})`);
});
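
A common follow-up is redacting the detected entities before the text is stored or logged. A minimal sketch, assuming each entity exposes the matched text as in the example above (the placeholder labels are illustrative):

// Replace each detected entity with a placeholder derived from its type
let redacted = "My name is John Doe and my email is john@example.com. My SSN is 123-45-6789.";
for (const entity of piiResult.pii_detection) {
  redacted = redacted.split(entity.text).join(`[${entity.entity_type}]`);
}
console.log(redacted); // e.g. "My name is [NAME] and my email is [EMAIL]. ..."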