AI Models
The Raindrop framework provides access to a comprehensive suite of AI models through a unified interface. These models enable sophisticated AI capabilities including text generation, image processing, speech recognition, language translation, and content analysis directly within your applications.
Raindrop’s AI system abstracts the complexity of working with different AI models while providing a consistent, type-safe interface for all operations. Whether you’re building chatbots, content analysis tools, image processing workflows, or multilingual applications, the AI interface provides the building blocks needed for intelligent applications.
The AI system supports both simple one-shot operations and complex workflows involving multiple model types. All models are accessed through the same env.AI.run() interface, with model-specific input and output types ensuring type safety and clear documentation of capabilities.
Prerequisites
- Basic understanding of AI model concepts and use cases
- Raindrop framework installed in your project
- Familiarity with TypeScript and async/await patterns
- Understanding of different AI model types and their applications
Accessing AI Models
AI models are available through the env.AI interface in your services and actors. The interface provides a unified run() method that accepts different model identifiers and their corresponding input types.
export default class extends Service<Env> {
  async fetch(request: Request): Promise<Response> {
    // Generate text using LLaMA model
    const textResult = await this.env.AI.run(
      '@cf/meta/llama-3.1-8b-instruct',
      {
        prompt: "Explain quantum computing in simple terms",
        max_tokens: 150
      }
    );

    return new Response(JSON.stringify(textResult));
  }
}
Text Generation Models
Text generation models create human-like text based on prompts or conversational context.
Available Models
- @cf/meta/llama-3.1-8b-instruct - Latest LLaMA 3.1 with 8B parameters
- @cf/meta/llama-3-8b-instruct - LLaMA 3 with improved reasoning
- @cf/meta/llama-2-7b-chat-fp16 - LLaMA 2 optimized for chat
- @cf/mistral/mistral-7b-instruct-v0.1 - Efficient 7B parameter model
- @hf/mistral/mistral-7b-instruct-v0.2 - Enhanced instruction following
- @cf/deepseek-ai/deepseek-math-7b-instruct - Mathematical reasoning
- @cf/defog/sqlcoder-7b-2 - SQL query generation
- @cf/google/gemma-7b-it-lora - Google’s Gemma model
Example
const result = await env.AI.run(
  '@cf/meta/llama-3.1-8b-instruct',
  {
    prompt: "Explain quantum computing in simple terms",
    max_tokens: 150,
    temperature: 0.7
  }
);
{ "response": "Quantum computing is a new way of processing information that's different from the computers we use today...", "usage": { "completion_tokens": 100, "prompt_tokens": 54, "total_tokens": 154 }}
Available Input Parameters
- prompt (string): Text prompt for generation
- messages (array): Chat conversation history with roles (system, user, assistant, tool)
- tools (array): Function definitions for tool calling
- max_tokens (number): Maximum tokens to generate (1-4096)
- temperature (number): Creativity level (0.0-1.0)
- top_p (number): Nucleus sampling parameter (0.0-1.0)
- top_k (number): Limits vocabulary to top K tokens
- repetition_penalty (number): Reduces repetitive text (1.0-2.0)
- frequency_penalty (number): Penalty for frequency of token usage
- presence_penalty (number): Penalty for presence of tokens
- seed (number): Random seed for reproducibility
- raw (boolean): Return raw model output
- stream (boolean): Stream the response
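The sketch below shows how several of these parameters combine in a single call. The values are illustrative starting points, not tuned recommendations.

const tuned = await env.AI.run(
  '@cf/meta/llama-3.1-8b-instruct',
  {
    prompt: "List three practical uses of text embeddings",
    max_tokens: 200,
    temperature: 0.2,        // low randomness for more deterministic output
    top_p: 0.9,              // nucleus sampling cutoff
    repetition_penalty: 1.1, // gently discourage repeated phrases
    seed: 42                 // fixed seed for reproducible results
  }
);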
Chat Messages
Build conversational AI with multi-turn dialogue using role-based message contexts.
const result = await env.AI.run(
  '@cf/meta/llama-3.1-8b-instruct',
  {
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Explain quantum computing' },
      { role: 'assistant', content: 'Quantum computing uses quantum mechanics...' },
      { role: 'user', content: 'Give me a simple analogy' }
    ]
  }
);
{ "response": "Think of it like a coin that can be heads, tails, or spinning in the air all at once..."}
Function Calling
Enable AI models to call external tools and APIs by defining function schemas. The model decides when and how to use these tools.
const result = await env.AI.run(
  '@cf/meta/llama-3.1-8b-instruct',
  {
    messages: [{ role: 'user', content: 'What\'s the weather in Paris?' }],
    tools: [{
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get weather for a city',
        parameters: {
          type: 'object',
          properties: {
            city: { type: 'string', description: 'City name' }
          },
          required: ['city']
        }
      }
    }]
  }
);
{ "tool_calls": [ { "name": "get_weather", "arguments": { "city": "Paris" } } ]}
Streaming
Stream responses in real-time as they’re generated, improving perceived performance for long outputs.
const stream = await env.AI.run(
  '@cf/meta/llama-3.1-8b-instruct',
  {
    prompt: "Write a story about AI",
    stream: true
  }
);
ReadableStream<Uint8Array>
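One way to consume the stream is the standard Web Streams reader pattern shown below. This sketch assumes the chunks decode as plain text; depending on the model, chunks may be framed (for example as server-sent events), so adjust the parsing accordingly.

// Sketch: read the streamed response chunk by chunk.
const reader = stream.getReader();
const decoder = new TextDecoder();
let output = '';
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  output += decoder.decode(value, { stream: true }); // append each decoded chunk
}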
Text Classification Models
Text classification models analyze and categorize text content for sentiment analysis, content moderation, and automated categorization.
Available Models
- @cf/huggingface/distilbert-sst-2-int8 - Sentiment analysis model trained on movie reviews
Example
const result = await env.AI.run(
  '@cf/huggingface/distilbert-sst-2-int8',
  {
    text: "I absolutely love this new feature! It makes everything so much easier."
  }
);
[ { "label": "NEGATIVE", "score": 0.00012130672257626429 }, { "label": "POSITIVE", "score": 0.9998786449432373 }]
Available Input Parameters
- text (string): Text content to classify
Text Embeddings Models
Text embeddings convert text into numerical vectors that capture semantic meaning for similarity search and clustering.
Available Models
- @cf/baai/bge-small-en-v1.5 - Efficient 384-dimensional embeddings
- @cf/baai/bge-base-en-v1.5 - Balanced performance, 768-dimensional embeddings
- @cf/baai/bge-large-en-v1.5 - Highest quality, 1024-dimensional embeddings
Example
const embedding = await env.AI.run(
  '@cf/baai/bge-small-en-v1.5',
  {
    text: "Machine learning is transforming how we process data."
  }
);
{ "shape": [1, 384], "data": [ [0.123, -0.456, 0.789, ...] ]}
Available Input Parameters
- text (string | string[]): Single text or array of texts to embed
- pooling (string): Pooling method - ‘cls’ or ‘mean’
Batch Processing
Process multiple texts in a single request for improved efficiency and reduced API calls.
const result = await env.AI.run(
  '@cf/baai/bge-base-en-v1.5',
  {
    text: [
      "Machine learning transforms data processing",
      "AI enables intelligent automation",
      "Deep learning uses neural networks"
    ]
  }
);
{ "shape": [3, 768], "data": [ [0.123, -0.456, 0.789, ...], [0.234, -0.567, 0.891, ...], [0.345, -0.678, 0.912, ...] ]}
Image Classification Models
Identify and categorize objects, scenes, or concepts in images.
Available Models
- @cf/microsoft/resnet-50 - General-purpose image classification with 1000+ categories
Example
const result = await env.AI.run(
  '@cf/microsoft/resnet-50',
  { image: imageBytes }
);
[ { "label": "golden retriever", "score": 0.8234 }, { "label": "dog", "score": 0.7891 }, { "label": "pet", "score": 0.6543 }]
Available Input Parameters
- image (number[]): Image data as byte array (JPEG, PNG, WebP)
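If the image arrives as an upload or a fetched resource, it can be converted to the expected byte array with standard Web APIs. This sketch assumes the request body contains raw JPEG, PNG, or WebP bytes:

// Sketch: convert an uploaded image into the number[] the model expects.
const buffer = await request.arrayBuffer();
const imageBytes = Array.from(new Uint8Array(buffer));

const result = await env.AI.run('@cf/microsoft/resnet-50', { image: imageBytes });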
Object Detection Models
Locate and identify multiple objects within images.
Available Models
- @cf/facebook/detr-resnet-50 - DETR (Detection Transformer) for object detection
Example
const result = await env.AI.run(
  '@cf/facebook/detr-resnet-50',
  { image: imageBytes }
);
[ { "label": "car", "score": 0.9234 }, { "label": "person", "score": 0.8567 }, { "label": "traffic light", "score": 0.7891 }]
Available Input Parameters
- image (number[]): Image data as byte array (JPEG, PNG, WebP)
Image-to-Text Models
Convert images into descriptive text using multimodal models.
Available Models
- @cf/llava-hf/llava-1.5-7b-hf - LLaVA multimodal model for image understanding
- @cf/unum/uform-gen2-qwen-500m - Efficient image captioning model
Example
const result = await env.AI.run(
  '@cf/llava-hf/llava-1.5-7b-hf',
  {
    image: imageBytes,
    prompt: "Describe this image in detail.",
    max_tokens: 100
  }
);
{ "description": "A serene mountain lake surrounded by pine trees..."}
Available Input Parameters
- image (number[]): Image data as byte array (JPEG, PNG, WebP)
- prompt (string): Optional prompt to guide description generation
- max_tokens (number): Maximum tokens to generate
- temperature (number): Creativity level (0.0-1.0)
- top_p (number): Nucleus sampling parameter (0.0-1.0)
- top_k (number): Limits vocabulary to top K tokens
- seed (number): Random seed for reproducibility
- repetition_penalty (number): Reduces repetitive text (1.0-2.0)
- frequency_penalty (number): Penalty for frequency of token usage
- presence_penalty (number): Penalty for presence of tokens
- raw (boolean): Return raw model output
- messages (array): Optional chat context for conversational image understanding
Conversational Understanding
Have multi-turn conversations about images, building context and asking follow-up questions about visual content.
const result = await env.AI.run(
  '@cf/llava-hf/llava-1.5-7b-hf',
  {
    image: imageBytes,
    messages: [
      { role: 'user', content: 'What do you see in this image?' },
      { role: 'assistant', content: 'I see a mountain landscape with trees.' },
      { role: 'user', content: 'What season does it appear to be?' }
    ]
  }
);
{ "description": "Based on the golden foliage visible on the trees, this appears to be autumn or fall season."}
Text-to-Image Models
Create images from text descriptions using diffusion models.
Available Models
- @cf/stabilityai/stable-diffusion-xl-base-1.0 - High-quality image generation
- @cf/runwayml/stable-diffusion-v1-5-img2img - Image-to-image transformation
- @cf/runwayml/stable-diffusion-v1-5-inpainting - Image editing and inpainting
- @cf/bytedance/stable-diffusion-xl-lightning - Fast generation variant
Example
const result = await env.AI.run(
  '@cf/stabilityai/stable-diffusion-xl-base-1.0',
  {
    prompt: "A serene Japanese garden with cherry blossoms",
    num_steps: 20,
    guidance: 7.5
  }
);
ReadableStream<Uint8Array>
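Because the model returns a byte stream, one option in a service's fetch handler is to pass it straight through as the HTTP response. The image/png content type below is an assumption about the output encoding; adjust it if your deployment returns a different format.

// Sketch: return the generated image directly to the client.
return new Response(result, {
  headers: { 'Content-Type': 'image/png' }
});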
Available Input Parameters
- prompt (string): Text description of desired image
- image (number[]): Optional input image for img2img or inpainting
- mask (number[]): Optional mask for inpainting
- num_steps (number): Number of denoising steps
- strength (number): Conditioning strength for img2img (0.0-1.0)
- guidance (number): Classifier-free guidance scale
Image-to-Image & Inpainting
Transform existing images with text prompts or edit specific regions using masks for precise control.
// Image-to-image transformation
const result = await env.AI.run(
  '@cf/runwayml/stable-diffusion-v1-5-img2img',
  {
    prompt: "Turn this into a watercolor painting",
    image: originalImageBytes,
    strength: 0.7
  }
);
// Inpainting with mask
const inpainted = await env.AI.run(
  '@cf/runwayml/stable-diffusion-v1-5-inpainting',
  {
    prompt: "A red sports car",
    image: imageBytes,
    mask: maskBytes
  }
);
ReadableStream<Uint8Array>
Speech Recognition Models
Convert spoken audio into text using automatic speech recognition (ASR) models.
Available Models
- @cf/openai/whisper - Multilingual speech recognition
- @cf/openai/whisper-tiny-en - Fast English-only model
- @cf/openai/whisper-sherpa - Optimized Whisper variant
Example
const result = await env.AI.run(
  '@cf/openai/whisper',
  { audio: audioBytes }
);
{ "text": "Hello, this is a test recording.", "word_count": 6, "words": { "word": "Hello", "start": 0.0, "end": 0.5 }, "vtt": "WEBVTT\n\n00:00:00.000 --> 00:00:02.000\nHello, this is a test..."}
Available Input Parameters
- audio (number[]): Audio data as byte array (WAV, MP3, FLAC, etc.)
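A typical pattern is to accept the audio in the request body and convert it to a byte array before transcription. This sketch assumes the body holds a supported audio format such as WAV or MP3:

// Sketch: transcribe audio uploaded in the request body.
const audioBuffer = await request.arrayBuffer();
const audioBytes = Array.from(new Uint8Array(audioBuffer));

const transcript = await env.AI.run('@cf/openai/whisper', { audio: audioBytes });
return new Response(transcript.text);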
Translation Models
Translate text between different languages using neural machine translation models.
Available Models
- @cf/meta/m2m100-1.2b - Multilingual translation supporting 100+ languages
Example
const result = await env.AI.run(
  '@cf/meta/m2m100-1.2b',
  {
    text: "Hello, how are you today?",
    source_lang: "en",
    target_lang: "es"
  }
);
{ "translated_text": "Hola, ¿cómo estás hoy?", "usage": { "prompt_tokens": 9, "completion_tokens": 11, "total_tokens": 20 }}
Available Input Parameters
- text (string): Text to translate
- source_lang (string): Source language code (e.g., “en”, “es”, “fr”)
- target_lang (string): Target language code
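For localizing several strings, the calls can simply be issued in a loop. The translateAll() helper below is a hypothetical sketch built on the parameters above, not a framework API:

// Sketch: translate a batch of English strings into one target language.
// translateAll() is a hypothetical helper for illustration only.
async function translateAll(env: Env, texts: string[], targetLang: string): Promise<string[]> {
  const results: string[] = [];
  for (const text of texts) {
    const { translated_text } = await env.AI.run(
      '@cf/meta/m2m100-1.2b',
      { text, source_lang: 'en', target_lang: targetLang }
    );
    results.push(translated_text);
  }
  return results;
}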
Text Summarization Models
Generate concise summaries of long text content while preserving key information.
Available Models
- @cf/facebook/bart-large-cnn - BART model fine-tuned on CNN news articles
Example
const result = await env.AI.run(
  '@cf/facebook/bart-large-cnn',
  {
    input_text: "Artificial Intelligence (AI) has transformed numerous industries over the past decade, fundamentally changing how businesses operate...",
    max_length: 50
  }
);
{ "summary": "AI has transformed industries through automation and insights. Machine learning processes data to identify patterns, enabling breakthroughs in analytics and vision."}
Available Input Parameters
- input_text (string): Text content to summarize
- max_length (number): Maximum length of summary in tokens
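In a service handler, summarization often pairs with a rough guard on input length. The sketch below truncates the request body before summarizing; the 4,000-character cutoff is purely illustrative, not a documented limit.

// Sketch: summarize user-submitted text with a rough input-length guard.
const body = await request.text();
const input_text = body.slice(0, 4000); // illustrative cutoff, not a documented limit

const { summary } = await env.AI.run(
  '@cf/facebook/bart-large-cnn',
  { input_text, max_length: 60 }
);
return new Response(summary);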