
AI Models


Raindrop provides access to a comprehensive suite of AI models through a unified interface that abstracts the complexity of working with different AI providers while maintaining type safety and performance. The AI system supports text generation, image processing, speech recognition, language translation, embeddings, and specialized capabilities like code generation and mathematical reasoning.

The framework handles model routing automatically, providing consistent interfaces across all model types. Each model has specific input and output types that ensure compile-time safety while supporting both simple one-shot operations and complex streaming workflows. Advanced models support tool calling for function execution and integration with external systems.

Key benefits:

  • Unified Interface: Single env.AI.run() method for all model types
  • Type Safety: Compile-time validation of model inputs and outputs
  • Tool Calling: Function execution support for compatible models
  • Automatic Routing: Seamless integration across multiple AI providers
  • Streaming Support: Real-time response streaming for conversational applications
  • Advanced Options: Request queuing, caching, and gateway configuration

Prerequisites

  • Active Raindrop project with AI binding configured
  • Understanding of TypeScript generics and async/await patterns
  • Familiarity with AI model concepts (LLMs, embeddings, vision models)
  • Basic knowledge of your target AI use cases and model requirements

Configuration

AI capabilities are automatically available to all Raindrop applications through the env.AI interface - no manifest configuration required.

application "ai-app" {
  service "api" {
    domain = "api.example.com"
    # AI interface available as this.env.AI
  }
}

Generate the service implementation:

raindrop build generate

The AI interface is available in your generated service class:

export default class extends Service<Env> {
  async fetch(request: Request): Promise<Response> {
    // AI interface available as this.env.AI
    const result = await this.env.AI.run(
      'llama-3.1-8b-instruct',
      { prompt: "Hello, AI!" }
    );
    return new Response(result.response);
  }
}

Access

Access AI models through the env.AI.run() method with model-specific parameters:

// Basic text generation
const response = await this.env.AI.run(
  'llama-3.3-70b',
  {
    messages: [
      { role: "user", content: "Explain quantum computing" }
    ],
    max_tokens: 150
  }
);

// Generate embeddings
const embeddings = await this.env.AI.run(
  'embeddings',
  { input: "Text to embed" }
);

// Process images with vision models
const analysis = await this.env.AI.run(
  'llama-3.2-11b-vision',
  {
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Describe this image" },
          { type: "image_url", image_url: { url: imageUrl } }
        ]
      }
    ]
  }
);

The interface automatically handles type checking and validates inputs based on the model selected.

Core Concepts

Model Routing System

Raindrop uses a sophisticated routing system that maps user-friendly model names to provider-specific endpoints. The framework automatically handles model discovery, routing, and response formatting to provide a consistent interface across all available models.

Type-Safe Interfaces

Each model has specific input and output type signatures that provide compile-time validation:

// TypeScript infers correct types automatically
const llmResponse = await env.AI.run('llama-3.3-70b', {
  messages: [{ role: "user", content: "Hello" }], // ← Typed input
  temperature: 0.7
}); // ← Returns typed LLM output

const embedResponse = await env.AI.run('bge-large-en', {
  input: ["text1", "text2"] // ← Typed embedding input
}); // ← Returns typed embedding output

Capability-Based Selection

Models are organized by capabilities to help you choose the right model for your use case:

  • Chat/Completion: Conversational AI and text generation
  • Vision: Image understanding and multimodal processing
  • Embeddings: Text representation and semantic search
  • Audio: Speech-to-text transcription
  • Specialized: Code generation, mathematical reasoning, PII detection

Function Calling

Many chat models support function calling (tool calling), enabling AI models to execute specific functions and integrate with external systems. This allows models to access real-time data, perform calculations, and interact with APIs during conversations. Complete examples appear under Function Calling in the Text Generation Usage section below.

Standardized Model Interfaces

Raindrop provides standardized TypeScript interfaces for AI model capabilities, ensuring type safety and consistency across different model types.

Audio Processing

Used by: whisper-large-v3, whisper, whisper-large-v3-turbo, whisper-tiny, nova-3, smart-turn-v2

interface AudioInput {
  audio: number[] | ReadableStream;
  contentType: string;
  language?: string;
  response_format?: 'json' | 'text' | 'srt' | 'vtt';
}

Vision Analysis

Used by: llava-1.5-7b, uform-gen2-qwen-500m

interface VisionInput {
  messages: Array<{
    role: 'system' | 'user' | 'assistant';
    content: Array<{
      type: 'text' | 'image_url';
      text?: string;
      image_url?: {
        url: string;
        detail?: 'low' | 'high' | 'auto';
      };
    }>;
  }>;
  model: string;
  max_tokens?: number;
  temperature?: number;
}

Text-to-Speech

Used by: aura-1, melotts

interface TTSInput {
  text: string;
  voice?: string;
  speed?: number;
  response_format?: 'mp3' | 'wav' | 'ogg';
}

Image Generation

Used by: flux-1-schnell, stable-diffusion-xl-base-1.0, phoenix-1.0, etc.

interface ImageGenerationInput {
  prompt: string;
  negative_prompt?: string;
  width?: number;
  height?: number;
  steps?: number;
  guidance_scale?: number;
}

Document Reranking

Used by: bge-reranker-base

interface RerankerInput {
  query: string;
  documents: string[];
  top_k?: number;
}

Text Classification

Used by: distilbert-sst-2-int8

interface TextClassificationInput {
  text: string;
  labels?: string[];
}

Translation

Used by: m2m100-1.2b

interface TranslationInput {
  text: string;
  source_language?: string;
  target_language: string;
}

Summarization

Used by: bart-large-cnn

interface SummarizationInput {
  text: string;
  max_length?: number;
  min_length?: number;
}

Image Classification

Used by: resnet-50

interface ImageClassificationInput {
  image: string | File | Blob;
  prompt?: string;
}

Chat Completion

Used by: All chat models

interface OpenAIChatInput {
  messages: Array<{
    role: 'system' | 'user' | 'assistant';
    content: string;
  }>;
  model?: string;
  temperature?: number;
  max_tokens?: number;
  tools?: Array<{
    type: 'function';
    function: {
      name: string;
      description?: string;
      parameters?: Record<string, unknown>;
    };
  }>;
  tool_choice?: 'auto' | 'required' | 'none';
}

Embeddings

Used by: embeddings, bge-m3, bge-large-en, bge-base-en, bge-small-en, embeddinggemma-300m

type EmbeddingInput = string | { prompt: string } | { input: string }

Interface Benefits

  • Type Safety: Compile-time validation prevents runtime errors
  • Consistency: Predictable patterns across model types and providers
  • Flexibility: Automatic format conversion by the model router
  • Future-Proofing: New models inherit appropriate interfaces automatically

Text Generation Models

Text generation models handle conversational AI, content creation, and language understanding tasks.

High-Performance Models

Advanced models for complex reasoning and long-context applications:

  • llama-3.3-70b - Meta Llama 3.3 70B with advanced reasoning capabilities
  • deepseek-r1 - DeepSeek R1 with advanced chain-of-thought reasoning
  • deepseek-v3-0324 - DeepSeek V3 high-performance model with long context
  • kimi-k2 - Kimi K2 with tool integration capabilities
  • qwen-3-32b - Qwen 3 32B with advanced multilingual capabilities
  • llama-4-maverick-17b - Llama 4 Maverick 17B advanced model
  • gpt-oss-120b - GPT OSS 120B with chain-of-thought capabilities
  • llama-3.1-70b-instruct - Meta Llama 3.1 70B large language model
  • qwen-coder-32b - Qwen 2.5 Coder 32B specialized for code generation
  • deepseek-r1-distill-qwen-32b - DeepSeek R1 distilled model with JSON mode
  • qwen-qwq-32b - QwQ 32B reasoning model capable of thinking and reasoning
  • mistral-small-3.1 - Mistral Small 3.1 with vision understanding and 128K context
  • gemma-3-12b - Gemma 3 12B multimodal model with 128K context
  • llama-3.2-11b-vision - Llama 3.2 11B with visual recognition capabilities
  • llama-4-scout-17b - Meta Llama 4 Scout 17B multimodal model with MoE architecture
  • qwen-1.5-14b - Qwen 1.5 14B with AWQ quantization
  • llama-3.3-70b-instruct-fp8 - Llama 3.3 70B quantized to FP8 for speed

Efficient Models

Optimized for speed and resource efficiency:

  • llama-3.1-8b-external - Meta Llama 3.1 8B for efficient processing
  • llama-3.1-8b-instant - Ultra-fast responses with Llama 3.1 8B
  • gemma-9b-it - Google Gemma 9B instruction-tuned model
  • gpt-oss-20b - GPT OSS 20B efficient reasoning model
  • llama-3.1-8b-instruct - Fast and efficient general-purpose model
  • llama-3-8b-instruct - Reliable model for general tasks
  • llama-3.2-3b-instruct - Compact and efficient model
  • gemma-2b - Lightweight model for basic tasks
  • llama-3.1-8b-instruct-fast - Llama 3.1 8B optimized for speed
  • llama-3.1-8b-instruct-fp8 - Llama 3.1 8B quantized to FP8
  • llama-3.1-8b-instruct-awq - Llama 3.1 8B with AWQ quantization
  • llama-3-8b-instruct-awq - Llama 3 8B with AWQ quantization
  • llama-3.2-1b-instruct - Ultra-compact 1B parameter model
  • gemma-7b - Gemma 7B with LoRA adapter support
  • gemma-7b-it - Gemma 7B instruction-tuned model
  • qwen-1.5-7b - Qwen 1.5 7B with AWQ quantization
  • qwen-1.5-1.8b - Qwen 1.5 1.8B lightweight model
  • qwen-1.5-0.5b - Qwen 1.5 0.5B ultra-lightweight model
  • tinyllama-1.1b-chat-v1.0 - TinyLlama 1.1B chat model

Reasoning Models

Models specifically optimized for reasoning and problem-solving:

  • deepseek-r1-distill-llama-70b - Fast reasoning with long context support
  • qwen-qwq-32b - Reasoning model capable of thinking and reasoning

Code Generation Models

Models specialized for programming and code generation:

  • qwen-coder-32b - Qwen 2.5 Coder 32B specialized for code generation
  • deepseek-coder-6.7b - DeepSeek Coder 6.7B instruction-tuned for code
  • deepseek-coder-6.7b-base - DeepSeek Coder 6.7B base model
  • sqlcoder-7b - SQL code generation model for database queries

Mathematical Models

Models specialized for mathematical reasoning:

  • deepseek-math-7b - DeepSeek Math 7B specialized for mathematical reasoning

Multilingual Models

Models optimized for specific languages or multilingual tasks:

  • llama-3.3-swallow-70b - Japanese-optimized model
  • discolm-german-7b-v1-awq - German language specialized model

Safety & Moderation Models

Models for content safety and moderation:

  • llama-guard-3-8b - Content safety classification
  • llamaguard-7b-awq - LLM prompt and response safety classification

Specialized Chat Models

Other specialized conversational models:

  • starling-lm-7b-beta - Reinforcement learning trained model
  • neural-chat-7b-v3-1-awq - Intel Gaudi optimized model
  • mistral-7b-instruct - Mistral 7B instruction-tuned model
  • mistral-7b-instruct-v0.2 - Mistral 7B v0.2 with 32K context
  • mistral-7b-instruct-v0.2-lora - Mistral 7B v0.2 with LoRA
  • mistral-7b-instruct-awq - Mistral 7B with AWQ quantization
  • una-cybertron-7b-v2-bf16 - Cybertron 7B v2 unified alignment model
  • falcon-7b-instruct - Falcon 7B instruction-tuned model
  • hermes-2-pro-mistral-7b - Hermes 2 Pro with function calling support
  • openhermes-2.5-mistral-7b-awq - OpenHermes 2.5 with code training
  • zephyr-7b-beta-awq - Zephyr 7B with AWQ quantization
  • openchat-3.5 - OpenChat 3.5 with C-RLFT training
  • phi-2 - Microsoft Phi-2 transformer model
  • llama-4-scout-17b - Meta Llama 4 Scout 17B multimodal model

Legacy Models

Older models maintained for compatibility:

  • llama-2-7b-chat-fp16 - Llama 2 7B full precision
  • llama-2-7b-chat-int8 - Llama 2 7B quantized
  • llama-2-7b-chat-hf-lora - Llama 2 7B with LoRA support
  • llama-2-13b-chat-awq - Llama 2 13B with AWQ quantization

Text Generation Usage

Basic Chat Completion:

const response = await env.AI.run('llama-3.3-70b', {
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Write a haiku about coding" }
  ],
  max_tokens: 100,
  temperature: 0.7
});
console.log(response.choices[0].message.content);

Streaming Responses:

const stream = await env.AI.run('llama-3.1-8b-instruct', {
  messages: [{ role: "user", content: "Tell me a story" }],
  stream: true
});

let story = "";
for await (const chunk of stream) {
  // Each chunk carries an incremental delta of the response text
  story += chunk.choices[0]?.delta?.content || '';
}
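
Inside a service handler, the stream can also be forwarded to the client as it arrives rather than accumulated. A minimal sketch using standard web streams, continuing from the block above:

// Forward the model stream to the HTTP client chunk by chunk
const encoder = new TextEncoder();
const body = new ReadableStream({
  async start(controller) {
    for await (const chunk of stream) {
      const text = chunk.choices[0]?.delta?.content;
      if (text) controller.enqueue(encoder.encode(text));
    }
    controller.close();
  }
});

return new Response(body, {
  headers: { "Content-Type": "text/plain; charset=utf-8" }
});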

JSON Mode:

const analysis = await env.AI.run('deepseek-r1-distill-llama-70b', {
  messages: [
    { role: "user", content: "Analyze this data and return structured JSON" }
  ],
  response_format: { type: "json_object" }
});

Function Calling:

const response = await env.AI.run('llama-3.3-70b', {
  messages: [
    { role: "user", content: "What's the weather like in San Francisco?" }
  ],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get the current weather for a location",
        parameters: {
          type: "object",
          properties: {
            location: {
              type: "string",
              description: "The city and state, e.g. San Francisco, CA"
            },
            unit: {
              type: "string",
              enum: ["celsius", "fahrenheit"],
              description: "Temperature unit"
            }
          },
          required: ["location"]
        }
      }
    }
  ],
  tool_choice: "auto"
});

// Handle tool calls in response
if (response.choices[0].message.tool_calls) {
  const toolCall = response.choices[0].message.tool_calls[0];
  if (toolCall.function.name === "get_weather") {
    const args = JSON.parse(toolCall.function.arguments);
    // Execute function and provide result back to model
  }
}
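
To complete the exchange, the tool result is normally appended to the conversation and the model is invoked again. A sketch assuming OpenAI-style tool-result messages (the exact message shape may vary by model); getWeather is a hypothetical local function:

// Run the function locally, then hand the result back to the model
const weather = await getWeather(args.location, args.unit); // hypothetical implementation

const followUp = await env.AI.run('llama-3.3-70b', {
  messages: [
    { role: "user", content: "What's the weather like in San Francisco?" },
    response.choices[0].message, // assistant turn containing the tool call
    {
      role: "tool", // assumes OpenAI-style tool result messages
      tool_call_id: toolCall.id,
      content: JSON.stringify(weather)
    }
  ]
});
console.log(followUp.choices[0].message.content);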

Required Function Calling:

const response = await env.AI.run('kimi-k2', {
messages: [
{ role: "user", content: "Calculate the compound interest on $1000 for 5 years at 3% annual rate" }
],
tools: [
{
type: "function",
function: {
name: "calculate_compound_interest",
description: "Calculate compound interest",
parameters: {
type: "object",
properties: {
principal: { type: "number", description: "Initial investment amount" },
rate: { type: "number", description: "Annual interest rate (as decimal)" },
time: { type: "number", description: "Time period in years" },
frequency: { type: "number", description: "Compounding frequency per year", default: 1 }
},
required: ["principal", "rate", "time"]
}
}
}
],
tool_choice: "required"
});

Vision Models

Vision models process and understand images alongside text for multimodal applications.

Available Vision Models

Vision-Language Models:

  • llava-1.5-7b - LLaVA 1.5 7B vision-language model for image analysis
  • uform-gen2-qwen-500m - Compact vision-language model for image captioning and VQA

Vision Model Usage

Image Description:

const description = await env.AI.run('llava-1.5-7b', {
  messages: [{
    role: "user",
    content: [
      { type: "text", text: "What's in this image?" },
      {
        type: "image_url",
        image_url: {
          url: "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQ...",
          detail: "high"
        }
      }
    ]
  }],
  max_tokens: 300
});
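
If the image arrives as an upload instead of a URL, it can be converted to a base64 data URL first. A minimal sketch using standard web APIs:

// Convert an uploaded image Blob into a data URL the vision model accepts
async function blobToDataUrl(blob: Blob): Promise<string> {
  const bytes = new Uint8Array(await blob.arrayBuffer());
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return `data:${blob.type};base64,${btoa(binary)}`;
}

const uploaded = await request.blob();
const dataUrl = await blobToDataUrl(uploaded);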

Image Analysis with Context:

const analysis = await env.AI.run('llama-3.2-11b-vision', {
  messages: [
    {
      role: "system",
      content: "You are an expert art critic. Analyze images in detail."
    },
    {
      role: "user",
      content: [
        { type: "text", text: "Analyze the artistic style and techniques" },
        { type: "image_url", image_url: { url: imageUrl } }
      ]
    }
  ],
  temperature: 0.3
});

Embedding Models

Embedding models convert text into numerical vectors for semantic search, similarity comparison, and retrieval-augmented generation.

Available Embedding Models

High-Quality Embeddings:

  • embeddings - Default embeddings model (BGE Large English v1.5)
  • bge-large-en - BGE Large English 1024-dimensional embeddings
  • bge-base-en - BGE Base English 768-dimensional embeddings
  • bge-small-en - BGE Small English 384-dimensional embeddings

Multilingual Embeddings:

  • bge-m3 - Multi-lingual embeddings supporting 100+ languages
  • embeddinggemma-300m - Gemma 3-based 300M parameter embedding model

Specialized:

  • pii-detection - PII Detection service for identifying personally identifiable information

Embedding Usage

Single Text Embedding:

const embedding = await env.AI.run('embeddings', {
  input: "Natural language processing with embeddings"
});
const vector = embedding.data[0].embedding;

Batch Text Embedding:

const embeddings = await env.AI.run('embeddings', {
  input: [
    "First document text",
    "Second document text",
    "Third document text"
  ]
});

embeddings.data.forEach((item, index) => {
  console.log(`Document ${index}: ${item.embedding.length} dimensions`);
});
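
With vectors in hand, semantic similarity is typically measured with cosine similarity. A minimal sketch over the batch output above:

// Cosine similarity between two embedding vectors
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Compare the first two documents from the batch above
const similarity = cosineSimilarity(
  embeddings.data[0].embedding,
  embeddings.data[1].embedding
);
console.log(`Similarity: ${similarity.toFixed(3)}`);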

Multilingual Embedding:

const multilingualEmbedding = await env.AI.run('bge-m3', {
  text: ["Hello world", "Hola mundo", "Bonjour le monde"]
});

Audio Models

Audio models provide speech-to-text transcription with support for multiple languages and output formats.

Available Audio Models

Speech Recognition:

  • whisper-large-v3 - Advanced speech-to-text transcription
  • whisper - General-purpose speech recognition model
  • whisper-large-v3-turbo - Faster, more accurate speech-to-text
  • whisper-tiny - Lightweight English-only speech recognition
  • nova-3 - Deepgram’s speech-to-text model
  • smart-turn-v2 - Audio turn detection model

Audio Usage

Basic Transcription:

// Audio file from request
const audioFile = await request.blob();
const transcription = await env.AI.run('whisper', {
  file: audioFile,
  response_format: 'text'
});
console.log(transcription.text);
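
For finer control, the AudioInput interface documented earlier also accepts a language hint and alternative output formats. A sketch assuming that interface shape:

// Transcribe with a language hint and SRT subtitle output
// (field names follow the AudioInput interface shown earlier)
const subtitles = await env.AI.run('whisper-large-v3', {
  audio: request.body as ReadableStream,
  contentType: "audio/mpeg",
  language: "en",
  response_format: "srt"
});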

Text-to-Speech Models

Text-to-speech models convert written text into natural-sounding speech audio.

Available TTS Models

High-Quality TTS:

  • aura-1 - Context-aware TTS with natural pacing and expressiveness
  • melotts - High-quality multi-lingual text-to-speech

TTS Usage

Basic Text-to-Speech:

const speech = await env.AI.run('aura-1', {
  text: "Hello, this is a text-to-speech example.",
  voice: "default",
  response_format: "mp3"
});
const audioBuffer = speech.audio;
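
In a service handler, the generated audio can be returned straight to the client. A minimal sketch, assuming speech.audio holds the encoded bytes as above:

// Serve the synthesized MP3 directly as the HTTP response
return new Response(speech.audio, {
  headers: { "Content-Type": "audio/mpeg" }
});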

Multi-language TTS:

const speech = await env.AI.run('melotts', {
  text: "Bonjour le monde",
  voice: "french",
  speed: 1.0
});

Image Generation Models

Image generation models create images from text descriptions using diffusion techniques.

Available Image Generation Models

Advanced Generation:

  • flux-1-schnell - FLUX.1 12B parameter model for high-quality image generation
  • stable-diffusion-xl-base-1.0 - SDXL base model for high-resolution images
  • stable-diffusion-xl-lightning - SDXL Lightning for fast generation
  • phoenix-1.0 - Leonardo.AI model with exceptional prompt adherence
  • lucid-origin - Leonardo.AI’s most adaptable and prompt-responsive model

Specialized Generation:

  • stable-diffusion-v1-5-inpainting - Stable Diffusion with inpainting capability
  • stable-diffusion-v1-5-img2img - Generate images from input images
  • dreamshaper-8-lcm - Fine-tuned for photorealism

Image Generation Usage

Basic Image Generation:

const image = await env.AI.run('flux-1-schnell', {
  prompt: "A serene mountain landscape at sunset",
  width: 1024,
  height: 1024
});
const imageData = image.data[0].url || image.data[0].b64_json;
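
When the model returns base64 data instead of a URL, it can be decoded and served directly. A sketch using standard web APIs:

// Decode base64 image data and return it as an HTTP response
if (image.data[0].b64_json) {
  const binary = atob(image.data[0].b64_json);
  const bytes = Uint8Array.from(binary, (c) => c.charCodeAt(0));
  return new Response(bytes, {
    headers: { "Content-Type": "image/png" }
  });
}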

Advanced Image Generation:

const image = await env.AI.run('phoenix-1.0', {
  prompt: "Professional headshot of a businesswoman, studio lighting",
  negative_prompt: "blurry, low quality, distorted",
  guidance_scale: 7.5,
  steps: 20
});

Image Inpainting:

const inpaintedImage = await env.AI.run('stable-diffusion-v1-5-inpainting', {
  prompt: "A red car in the driveway",
  image: originalImageBlob,
  mask: maskImageBlob
});

Text Classification Models

Text classification models categorize and analyze text content.

Available Text Classification Models

Sentiment Analysis:

  • distilbert-sst-2-int8 - Sentiment classification (positive/negative)

Document Ranking:

  • bge-reranker-base - Document relevance ranking and scoring

Text Classification Usage

Sentiment Analysis:

const sentiment = await env.AI.run('distilbert-sst-2-int8', {
  text: "This product is amazing and works perfectly!"
});
console.log(sentiment.label); // "POSITIVE"
console.log(sentiment.score); // 0.98

Document Reranking:

const rankings = await env.AI.run('bge-reranker-base', {
  query: "machine learning algorithms",
  documents: [
    "Neural networks and deep learning techniques",
    "Traditional statistical methods",
    "Computer vision applications"
  ],
  top_k: 3
});

rankings.ranked_documents.forEach(doc => {
  console.log(`Score: ${doc.relevance_score} - ${doc.document}`);
});

Image Classification Models

Image classification models identify and categorize objects within images.

Available Image Classification Models

Object Recognition:

  • resnet-50 - 50-layer CNN trained on ImageNet for object classification

Image Classification Usage

Basic Image Classification:

const classification = await env.AI.run('resnet-50', {
  image: imageBlob
});
console.log(classification.label); // "golden_retriever"
console.log(classification.score); // 0.95

Translation Models

Translation models convert text between different languages.

Available Translation Models

Multilingual Translation:

  • m2m100-1.2b - Many-to-many multilingual translation model

Translation Usage

Basic Translation:

const translation = await env.AI.run('m2m100-1.2b', {
  text: "Hello, how are you today?",
  source_language: "en",
  target_language: "es"
});
console.log(translation.translation);

Summarization Models

Summarization models create concise summaries of longer text content.

Available Summarization Models

Abstractive Summarization:

  • bart-large-cnn - BART model fine-tuned for text summarization

Summarization Usage

Text Summarization:

const summary = await env.AI.run('bart-large-cnn', {
  text: "Very long article text here...",
  max_length: 150,
  min_length: 50
});
console.log(summary.summary);

PII Detection Models

Personally Identifiable Information (PII) detection models identify sensitive data in text.

Available PII Detection Models

PII Identification:

  • pii-detection - Identifies personally identifiable information in text

PII Detection Usage

Basic PII Detection:

const piiResult = await env.AI.run('pii-detection', {
  text: "My name is John Doe and my email is john@example.com. My SSN is 123-45-6789."
});

piiResult.pii_detection.forEach(entity => {
  console.log(`${entity.entity_type}: ${entity.text} (confidence: ${entity.confidence})`);
});
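
A common follow-up is redacting the detected entities before the text is stored or logged. A minimal sketch, assuming each entity exposes the matched text as in the example above (the placeholder labels are illustrative):

// Replace each detected entity with a placeholder derived from its type
let redacted = "My name is John Doe and my email is john@example.com. My SSN is 123-45-6789.";
for (const entity of piiResult.pii_detection) {
  redacted = redacted.split(entity.text).join(`[${entity.entity_type}]`);
}
console.log(redacted); // e.g. "My name is [NAME] and my email is [EMAIL]. ..."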