
Claude Haiku 4.5 API

Claude Haiku 4.5 is the smallest and fastest Claude — sub-second time-to-first-token, very low cost, and still strong on routine tasks. Use it when latency or volume matters more than nuance.

  • Input: $1.00 / 1M tokens
  • Output: $5.00 / 1M tokens
  • Context: 200K tokens
  • Vision: Yes

Top use cases

  • High-throughput classification and tagging
  • Real-time chat with strict latency budgets
  • Lightweight agents and routing models
  • Pre/post-processing in larger pipelines
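For high-throughput classification, the usual trick is to pin the model to a fixed label set and cap output tokens so each call is as cheap and fast as possible. A minimal sketch (the `classify_messages` helper is illustrative, not a ModelServer API; the client is the one from the quick-start below):

```python
def classify_messages(text: str, labels: list[str]) -> list[dict]:
    """Build messages that constrain the model to reply with exactly one
    label and nothing else, keeping output tokens (and cost) minimal."""
    system = (
        "You are a classifier. Reply with exactly one of these labels "
        "and nothing else: " + ", ".join(labels)
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": text},
    ]

# With the quick-start client:
# label = client.chat.completions.create(
#     model="claude-haiku-4-5",
#     messages=classify_messages("My package never arrived",
#                                ["billing", "shipping", "other"]),
#     max_tokens=5,
# ).choices[0].message.content.strip()
```

Capping `max_tokens` at a few tokens is enough for a single label and keeps the output-side bill near zero.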

Use Claude Haiku 4.5 in 30 seconds

ModelServer is OpenAI-compatible. Point your existing OpenAI SDK at modelserver.dev/v1 and set the model name to claude-haiku-4-5.

claude-haiku-4-5.py
from openai import OpenAI

client = OpenAI(
    api_key="sk-modelserver-...",
    base_url="https://modelserver.dev/v1",
)

response = client.chat.completions.create(
    model="claude-haiku-4-5",
    messages=[
        {"role": "user", "content": "Hello, Claude Haiku 4.5!"}
    ],
)

print(response.choices[0].message.content)
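For real-time chat, you will usually want to stream the reply rather than wait for the full completion, since streaming is what makes sub-second time-to-first-token visible to users. A sketch using the standard OpenAI SDK streaming interface, assuming the endpoint supports `stream=True` as part of its OpenAI compatibility (`stream_reply` is an illustrative helper, not a ModelServer API):

```python
def stream_reply(client, prompt: str, model: str = "claude-haiku-4-5"):
    """Yield text deltas as they arrive instead of blocking on the
    full response, so the UI can start rendering immediately."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks (e.g. role headers) carry no text
            yield delta

# With the client from the snippet above:
# for piece in stream_reply(client, "Hello, Claude Haiku 4.5!"):
#     print(piece, end="", flush=True)
```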

Frequently asked questions

How much does Claude Haiku 4.5 cost?
Claude Haiku 4.5 is priced at $1.00 per 1M input tokens and $5.00 per 1M output tokens via ModelServer. ModelServer adds a flat 5.5% platform fee on top — no markups on individual tokens, no monthly minimum.
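The total for a given workload follows directly from those three numbers. A quick back-of-the-envelope calculator (`estimate_cost` is an illustrative helper, not a ModelServer API):

```python
INPUT_PER_MTOK = 1.00    # $ per 1M input tokens
OUTPUT_PER_MTOK = 5.00   # $ per 1M output tokens
PLATFORM_FEE = 0.055     # ModelServer's flat 5.5% platform fee

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the total dollar cost for one workload, fee included."""
    base = (input_tokens / 1e6) * INPUT_PER_MTOK \
         + (output_tokens / 1e6) * OUTPUT_PER_MTOK
    return base * (1 + PLATFORM_FEE)

# 1M input tokens with no output: $1.00 * 1.055 = $1.055
print(f"${estimate_cost(1_000_000, 0):.3f}")  # → $1.055
```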
What is the Claude Haiku 4.5 context window?
Claude Haiku 4.5 supports a 200K token context window. You can put roughly 150,000 words in a single prompt.
Is Claude Haiku 4.5 OpenAI-compatible via ModelServer?
Yes. Point your OpenAI SDK base_url to https://modelserver.dev/v1 and set model="claude-haiku-4-5". Existing OpenAI-SDK code works without modification.
Who is Claude Haiku 4.5 best for?
Teams with high-volume, latency-sensitive workloads: bulk classification and tagging, real-time chat, and lightweight agent or routing steps, where cost per call and time-to-first-token matter more than peak capability.
Does Claude Haiku 4.5 support vision input?
Yes. Claude Haiku 4.5 accepts image inputs alongside text. Pass images as base64 or URL in the OpenAI-compatible message format.
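In the OpenAI-compatible message format, an inline image is a `data:` URL inside an `image_url` content part. A sketch of building such a message from raw image bytes (`image_message` is an illustrative helper, not a ModelServer API):

```python
import base64

def image_message(image_bytes: bytes, prompt: str,
                  mime: str = "image/png") -> dict:
    """Build an OpenAI-compatible user message that pairs a text prompt
    with an inline base64-encoded image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# With the quick-start client:
# response = client.chat.completions.create(
#     model="claude-haiku-4-5",
#     messages=[image_message(open("photo.png", "rb").read(),
#                             "What is in this image?")],
# )
```

For images already hosted somewhere public, you can put the plain `https://` URL in the `image_url` field instead of a `data:` URL.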
