Chapter 04 of 14 · Part 1: Foundations

Chapter 4: The Three Claude Models — Haiku, Sonnet, Opus

By the end of this chapter, you will know which Claude model to choose for any given agent task — and understand the real pricing differences so you can design cost-effectively from day one.


The Big Idea

Not every task needs the most powerful model. Using Opus for a task Haiku handles well is like sending a neurosurgeon to stitch a paper cut — technically capable, but expensive and unnecessary.

Claude Managed Agents supports three current model families, each with a different point on the capability/cost curve. Choosing the right one matters: the same task can cost five times more with Opus than with Haiku. For workflows that run many sessions or handle high volume, that multiplier compounds fast.

According to the models overview documentation, the current supported models for Managed Agents (all Claude 4.5 and later) are:

Model               Claude API ID               Context Window   Max Output    Input / Output Pricing
Claude Opus 4.7     claude-opus-4-7             1M tokens        128k tokens   $5 / $25 per MTok
Claude Sonnet 4.6   claude-sonnet-4-6           1M tokens        64k tokens    $3 / $15 per MTok
Claude Haiku 4.5    claude-haiku-4-5-20251001   200k tokens      64k tokens    $1 / $5 per MTok

All three support adaptive thinking. Sonnet 4.6 and Haiku 4.5 also support extended thinking. (Models overview)

The session runtime cost — separate from token costs — is $0.08 per session-hour for active runtime. (Anthropic pricing)
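To make the multiplier concrete, here is a back-of-the-envelope comparison in Python. It is a sketch only; the rates come from the table above, and the task size (50k tokens in, 10k out, a 15-minute session) is an illustrative assumption.

```python
# Rough per-session cost comparison across the three models.
# Pricing per million tokens (MTok), taken from the table above.
PRICING = {
    "claude-haiku-4-5-20251001": {"input": 1.00, "output": 5.00},
    "claude-sonnet-4-6":         {"input": 3.00, "output": 15.00},
    "claude-opus-4-7":           {"input": 5.00, "output": 25.00},
}

RUNTIME_PER_HOUR = 0.08  # session runtime charge, separate from token costs

def session_cost(model, input_tokens, output_tokens, hours=0.0):
    """Token cost plus runtime cost, in dollars, for one session."""
    p = PRICING[model]
    tokens = (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
    return tokens + hours * RUNTIME_PER_HOUR

# Same hypothetical task on each model: 50k tokens in, 10k out, 15 minutes.
for model in PRICING:
    print(f"{model}: ${session_cost(model, 50_000, 10_000, hours=0.25):.2f}")
# → roughly $0.12 (Haiku) vs $0.32 (Sonnet) vs $0.52 (Opus) for the same task
```

Multiplied across hundreds of sessions a month, that per-task gap is exactly the compounding the paragraph above warns about.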

Diagram: Three model cards arranged side by side. Each card has: model name at top, Claude API ID in monospace, key stats (context window, max output), pricing badge, and a "Best for" summary at the bottom. Color coding: Haiku = green (fast/cheap), Sonnet = yellow (balanced), Opus = blue (powerful/deep). A "cost multiplier" bar at the bottom showing 1x / 3x / 5x relative input pricing.

The Analogy

Think of hiring for a research project.

You could hire a research intern (Haiku) — fast, eager, handles structured tasks reliably, costs less. Perfect for data entry, summarizing documents, running defined workflows.

You could hire a senior analyst (Sonnet) — strong reasoning, handles ambiguity well, balances depth and speed. Good for most professional-grade work where you need judgment, not just execution.

You could hire a domain expert consultant (Opus) — the deepest expertise, best on complex multi-step reasoning and open-ended problems with no clear path. Worth the premium when the problem genuinely requires it; overkill when it doesn't.

The mistake most people make is defaulting to the consultant for everything. Prototype with Haiku. Upgrade to Sonnet when you need more judgment. Reserve Opus for tasks where complexity demands it and you've confirmed the simpler models aren't adequate.

Diagram: Three-tier pyramid. Bottom (largest): Haiku — "Fast. Structured. High volume." Middle: Sonnet — "Balanced. Professional. Most tasks." Top (smallest): Opus — "Complex reasoning. Agentic coding. Step-change quality." Price labels on the right: $1/$5, $3/$15, $5/$25 per MTok. Small callout: "Start at the bottom. Move up only when you need to."

How It Actually Works

Claude Opus 4.7 — The Deep Reasoner

API ID: claude-opus-4-7 · Context window: 1M tokens · Max output: 128k tokens · Pricing: $5 input / $25 output per MTok

Opus 4.7 is described in the models overview as "Our most capable generally available model for complex reasoning and agentic coding, with a step-change jump over Claude Opus 4.6."

The key phrase: "step-change jump." This isn't marginal improvement — it's a different level of capability on hard problems. When your task involves:

  • Multi-step code that requires architectural decisions
  • Complex analytical work with many interdependent variables
  • Open-ended research where the path forward isn't clear
  • Tasks where errors have high downstream consequences

...Opus 4.7 is the right choice. The 128k token output limit also matters — it can produce longer, more complete outputs in a single turn than the other models.

Note on fast mode: The speed: fast option is available for Claude Opus 4.6 (not 4.7) with dedicated rate limits separate from standard Opus rate limits. If you need Opus-class reasoning with faster response time and can work with Opus 4.6 capabilities, this is an option. Pass the model as an object: {"id": "claude-opus-4-6", "speed": "fast"}. (Agent setup)

Claude Sonnet 4.6 — The Balanced Workhorse

API ID: claude-sonnet-4-6 · Context window: 1M tokens · Max output: 64k tokens · Pricing: $3 input / $15 output per MTok

Sonnet 4.6 hits the sweet spot for the majority of production agent workflows. It handles:

  • Content creation and editing
  • Code generation for typical engineering tasks
  • Data analysis and reporting
  • Structured research with clear requirements
  • Customer-facing agents where quality matters but deep reasoning isn't required

At roughly 60% of Opus's input cost and 60% of its output cost, Sonnet is the default recommendation for most production workloads. Unless you've tested with Sonnet and found specific capability gaps, start here.

Sonnet 4.6 supports both adaptive thinking and extended thinking, giving it access to reasoning chains for harder problems while still being faster and cheaper than Opus.

Claude Haiku 4.5 — The Fast, Economical Option

API ID: claude-haiku-4-5-20251001 · Context window: 200k tokens · Max output: 64k tokens · Pricing: $1 input / $5 output per MTok

Haiku 4.5 costs one-fifth of what Opus does, on both input and output tokens. For the right tasks, the capability gap doesn't matter: Haiku does the job just as well, at 20% of the price.

Those tasks are:

  • High-volume, structured pipelines where the task is well-defined
  • Extraction and transformation tasks (parse this JSON, reformat this data)
  • Classification and tagging
  • Simple Q&A over structured documents
  • Prototype development and testing (before you commit to a production model)

The context window is 200k tokens, compared to 1M for Opus and Sonnet. For most tasks, this is more than enough. If you're working with very large codebases or extremely long documents, Sonnet or Opus may be necessary.

Cache Pricing: The Hidden Saving

All three models support prompt caching, which the Managed Agents harness uses automatically. The cache pricing is:

Model               Cache Write   Cache Read
Claude Opus 4.7     $6.25/MTok    $0.50/MTok
Claude Sonnet 4.6   $3.75/MTok    $0.30/MTok
Claude Haiku 4.5    $1.25/MTok    $0.10/MTok

(Anthropic pricing)

Cache read costs are dramatically lower than input costs. The harness uses a 5-minute TTL for cache entries. For workflows with repeated context (a stable system prompt, a large codebase mounted at session start), the effective input cost can drop substantially. The session usage object tracks cache_creation_input_tokens and cache_read_input_tokens separately, so you can measure the actual savings.
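Here is a sketch of how much caching can move the needle. The rates are Sonnet 4.6's figures from the table above; the workflow shape (a 100k-token codebase cached once, then re-read over ten follow-up turns) is an illustrative assumption.

```python
# Blended input-side cost when most input tokens are cache reads.
# Sonnet 4.6 rates: $3.00/MTok fresh input, $3.75 cache write, $0.30 cache read.
FRESH, CACHE_WRITE, CACHE_READ = 3.00, 3.75, 0.30

def blended_input_cost(fresh_tok, write_tok, read_tok):
    """Total input-side cost in dollars for one turn."""
    return (fresh_tok * FRESH + write_tok * CACHE_WRITE + read_tok * CACHE_READ) / 1_000_000

# Hypothetical: 100k-token codebase cached on turn 1, read back on 10 follow-ups,
# with ~2k tokens of fresh input (new messages) per turn.
first_turn = blended_input_cost(fresh_tok=2_000, write_tok=100_000, read_tok=0)
follow_ups = 10 * blended_input_cost(fresh_tok=2_000, write_tok=0, read_tok=100_000)
print(f"${first_turn + follow_ups:.2f}")  # ~$0.74, vs ~$3.37 if nothing were cached
```

The cache write costs slightly more than fresh input, but every subsequent read is a tenth of the fresh rate, which is why long sessions over stable context come out far ahead.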

Session Runtime Cost

Beyond token costs, every session incurs a runtime charge: $0.08 per session-hour for active runtime. (Anthropic pricing)

A session running for 30 minutes costs $0.04 in runtime. Running one agent session continuously for a month costs about $58 in runtime alone — before token costs.

This means:

  • Short, focused sessions are more cost-efficient than long, wandering ones
  • Sessions should be closed when work is complete rather than left running
  • For high-frequency tasks, batch your requests into single sessions where possible

Choosing Your Model: A Decision Framework

Is the task well-defined and high-volume?
  → Haiku 4.5

Does the task require professional-grade judgment or balanced capability?
  → Sonnet 4.6 (default for most production agents)

Does the task involve complex multi-step reasoning, hard agentic coding, 
or problems where you need maximum capability and cost isn't the constraint?
  → Opus 4.7

Are you prototyping or testing?
  → Start with Haiku, upgrade when Haiku fails the task
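The framework above can be encoded as a small helper. This is a sketch only: the flag names are mine, and real model selection should be informed by testing, not just a lookup.

```python
def pick_model(well_defined_high_volume=False,
               needs_max_capability=False,
               prototyping=False):
    """Encode this chapter's decision framework. Returns a Claude API model ID."""
    if prototyping or well_defined_high_volume:
        return "claude-haiku-4-5-20251001"   # fast, cheap, structured tasks
    if needs_max_capability:
        return "claude-opus-4-7"             # complex reasoning, hard agentic coding
    return "claude-sonnet-4-6"               # default for most production agents

print(pick_model(prototyping=True))           # claude-haiku-4-5-20251001
print(pick_model(needs_max_capability=True))  # claude-opus-4-7
print(pick_model())                           # claude-sonnet-4-6
```

Note the ordering: prototyping and high-volume checks come first, so you only reach for Opus when you have explicitly asked for maximum capability.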

Adaptive Thinking vs. Extended Thinking

Both are thinking features available on the current model generation:

  • Adaptive thinking is available on Opus 4.7, Sonnet 4.6, and Haiku 4.5 — the model dynamically adjusts how much reasoning it applies.
  • Extended thinking is available on Sonnet 4.6 and Haiku 4.5 — it enables longer, deeper reasoning chains for harder problems.

Note: the models overview lists extended thinking as not available on Opus 4.7. Opus 4.7 supports adaptive thinking only; Sonnet 4.6 and Haiku 4.5 support both.

This is a nuance worth knowing: for tasks where you want both maximum capability and extended thinking chains, Sonnet 4.6 with extended thinking may outperform Opus 4.7 for certain problem types.

Diagram: Decision tree flowchart. Starting node: "What does my task need?" Three branches: (1) "Well-defined, repeatable, high volume" → Haiku 4.5 (green). (2) "Professional judgment, general capability, production quality" → Sonnet 4.6 (yellow). (3) "Deep reasoning, complex coding, open-ended problems" → Opus 4.7 (blue). Second-level node on the Sonnet branch: "Need deep reasoning chains?" → "Yes" → Sonnet 4.6 with Extended Thinking.

Try It Yourself

  1. Find the models overview page. Go to docs.anthropic.com/en/docs/about-claude/models/overview and look up the current models table. Confirm the API IDs match what's in this chapter.

  2. Calculate the cost of your planned task. Estimate the input and output tokens your agent will use per session. (A good starting estimate: a typical system prompt is 500–2,000 tokens; a full codebase for a medium project might be 20,000–100,000 tokens; agent output per session might be 5,000–30,000 tokens.) Calculate the cost at Haiku, Sonnet, and Opus rates. Feel the difference.

  3. Create an agent pinned to Haiku for prototyping:

    ant beta:agents create \
      --name "Prototype Agent" \
      --model '{"id": "claude-haiku-4-5-20251001"}' \
      --system "You are a helpful assistant." \
      --tool '{"type": "agent_toolset_20260401"}'
    

    (Haiku ID verbatim from the models overview. Use this during development to keep costs low.)

  4. Create a second agent using Sonnet (for comparison):

    ant beta:agents create \
      --name "Production Agent" \
      --model '{"id": "claude-sonnet-4-6"}' \
      --system "You are a helpful assistant." \
      --tool '{"type": "agent_toolset_20260401"}'
    

    (You'll run the same task on both in Chapter 5 and compare outputs.)

  5. Add a session runtime line to your cost calculation. For your planned workflow: how long will a typical session run? Multiply by $0.08/hour. Is this a meaningful cost for your use case?
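The calculations in steps 2 and 5 can be sketched as one small script. The rates are this chapter's pricing; the usage numbers are placeholders you should replace with your own estimates.

```python
# Monthly cost estimate per model: token cost + session runtime.
PRICING = {  # $ per MTok (input, output), from this chapter's table
    "Haiku 4.5":  (1.00, 5.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.7":   (5.00, 25.00),
}

# Placeholder estimates — replace with your own workload figures.
sessions_per_month = 300
input_tok, output_tok = 60_000, 15_000   # per session
session_hours = 0.5                       # per session

for name, (in_rate, out_rate) in PRICING.items():
    tokens = sessions_per_month * (input_tok * in_rate + output_tok * out_rate) / 1_000_000
    runtime = sessions_per_month * session_hours * 0.08  # $0.08/session-hour
    print(f"{name}: tokens ${tokens:.2f} + runtime ${runtime:.2f} = ${tokens + runtime:.2f}")
```

Note that runtime cost is identical across models, so its relative weight shrinks as you move up the tiers; for Haiku workloads it can be a meaningful share of the bill.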

Diagram: Cost calculator mockup. Input fields: "Number of sessions per month," "Average input tokens per session," "Average output tokens per session." Three output rows (Haiku, Sonnet, Opus) showing computed monthly token cost + runtime cost + total. Design as a simple table layout that could be recreated as a spreadsheet.

Common Pitfalls

  • Defaulting to Opus for everything. Unless you've tried a simpler model and found it lacking, start with Haiku for prototyping and Sonnet for production. The 5x cost multiplier between Haiku and Opus adds up quickly.

  • Forgetting session runtime costs. Token pricing gets all the attention, but $0.08/session-hour is a real cost for long-running agents. A session that runs for 10 hours costs $0.80 in runtime — before any token costs. Design sessions to be task-focused and close them when work is complete.

  • Using the wrong API ID format. Model IDs must be exact. claude-haiku-4-5-20251001 has a date suffix that the other two don't. Double-check the models overview before coding.

  • Not tracking per-session usage. The session object includes a usage field with cumulative token statistics. Fetch the session after it goes idle to read the actual cost. Without this, you're flying blind on real-world costs.

  • Assuming the newest model is always best for your use case. Opus 4.7 is the most capable model generally. But for a well-defined, high-volume task, Haiku might have a 98% accuracy rate at 20% of the cost. Run both on a sample and measure before committing to a model.
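To turn per-session usage into a dollar figure, the counters can be priced out once a session goes idle. A sketch, assuming Sonnet 4.6 rates: the two cache counter names come from this chapter, but the `input_tokens` and `output_tokens` field names and the example payload are illustrative assumptions.

```python
# Estimate token cost from a session's cumulative usage counters.
# Sonnet 4.6 rates; cache counter names are from the chapter,
# the other two field names are assumed for illustration.
RATES = {"input": 3.00, "output": 15.00, "cache_write": 3.75, "cache_read": 0.30}

def session_token_cost(usage: dict) -> float:
    """Dollar cost of a session's tokens, given a usage payload."""
    return (
        usage.get("input_tokens", 0) * RATES["input"]
        + usage.get("output_tokens", 0) * RATES["output"]
        + usage.get("cache_creation_input_tokens", 0) * RATES["cache_write"]
        + usage.get("cache_read_input_tokens", 0) * RATES["cache_read"]
    ) / 1_000_000

usage = {  # hypothetical payload fetched after the session goes idle
    "input_tokens": 8_000,
    "output_tokens": 12_000,
    "cache_creation_input_tokens": 40_000,
    "cache_read_input_tokens": 200_000,
}
print(f"${session_token_cost(usage):.3f}")
```

Logging this per session, alongside runtime hours, is the simplest way to stop flying blind on real-world costs.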


Toolkit

  • Model Selection Cheat Sheet — Quick-reference card with all three model IDs, context windows, max output tokens, and pricing in a single table, plus the decision framework from this chapter.

  • Cost Calculator Template — A spreadsheet template to calculate monthly token and runtime costs across all three models based on your estimated usage.


Chapter Recap

  • The three supported models are Claude Opus 4.7 (claude-opus-4-7), Claude Sonnet 4.6 (claude-sonnet-4-6), and Claude Haiku 4.5 (claude-haiku-4-5-20251001). All are Claude 4.5 or later, the minimum generation Managed Agents supports.
  • Pricing ranges from $1/$5 per MTok (Haiku) to $5/$25 per MTok (Opus) for input/output tokens. Cache read costs are dramatically lower. Session runtime costs $0.08 per active hour.
  • The default recommendation: prototype with Haiku, run production on Sonnet, upgrade to Opus only when you've verified a simpler model isn't sufficient for the task. Cost discipline early compounds over time.