Anthropic Drops Claude Opus 4.6: Beats GPT-5.2, Gets 1M Token Context, and Ships Agent Teams

Feb 5 2026, Anthropic just launched Claude Opus 4.6 — and the benchmarks are brutal for competitors.

The new model outperforms OpenAI’s GPT-5.2 by 144 Elo points on real-world work tasks, scores highest in the industry on agentic coding, and becomes the first Opus-class model with a 1 million token context window.

It’s live now on claude.ai and the API.

The Numbers That Matter

144 Elo points ahead of GPT-5.2 on GDPval-AA (knowledge work tasks in finance, legal, and other domains)
190 Elo points ahead of its own predecessor, Claude Opus 4.5
#1 score on Terminal-Bench 2.0 (agentic coding evaluation)
#1 score on Humanity’s Last Exam (complex reasoning)
#1 score on BrowseComp (finding hard-to-find information online)
76% vs 18.5% — Opus 4.6 vs Sonnet 4.5 on long-context retrieval (MRCR v2)
90.2% on BigLaw Bench (legal reasoning) — highest of any Claude model
38 out of 40 blind cybersecurity investigations won against Claude 4.5

What’s Actually New

1M Token Context Window (Beta)

Opus 4.6 is the first Opus-class model that can handle a million tokens. Premium pricing kicks in above 200k tokens ($10/$37.50 per million input/output).

Agent Teams in Claude Code

You can now spin up multiple AI agents that work in parallel, coordinate autonomously, and tackle tasks together. Best for codebase reviews and read-heavy work that splits into independent chunks.

Adaptive Thinking

Previously, extended thinking was either on or off. Now the model decides when deeper reasoning would actually help — and developers can tune how selective it is.

Effort Controls

Four levels: low, medium, high (default), max. Dial down if the model is overthinking simple tasks. Dial up for complex problems.

Context Compaction

Long-running tasks used to hit the context window and stall. Now Claude can automatically summarize older context and keep going.

128K Output Tokens

Opus 4.6 can generate outputs up to 128,000 tokens in a single response — no need to break large tasks into multiple requests.

Claude in PowerPoint (Research Preview)

New. Claude can now build presentations from scratch, read your layouts and fonts, and stay on brand. Pairs with the upgraded Claude in Excel for full document workflows.

What Partners Are Saying

The early access feedback is unusually strong:

“Claude Opus 4.6 is the biggest leap I’ve seen in months. I’m more comfortable giving it a sequence of tasks across the stack and letting it run.” — Austin Ray, Staff Software Engineer, Ramp

“Opus 4.6 handled a multi-million-line codebase migration like a senior engineer. It planned up front, adapted its strategy as it learned, and finished in half the time.” — Gregor Stewart, Chief AI Officer, SentinelOne

“Claude Opus 4.6 autonomously closed 13 issues and assigned 12 issues to the right team members in a single day, managing a ~50-person organization across 6 repositories.” — Yusuke Kaji, General Manager AI, Rakuten

“Both hands-on testing and evals show Claude Opus 4.6 is a meaningful improvement for design systems and large codebases. It also one-shotted a fully functional physics engine.” — Eric Simons, CEO, Bolt.new

The Safety Angle

Anthropic says Opus 4.6 shows “an overall safety profile as good as, or better than, any other frontier model in the industry.”

Key points:

Lowest rate of over-refusals (refusing harmless requests) of any recent Claude model
Low rates of deception, sycophancy, and misuse cooperation in automated audits
Six new cybersecurity probes to detect misuse of the model’s enhanced hacking capabilities
Full system card published with comprehensive testing details

Pricing

Unchanged: $5 per million input tokens / $25 per million output tokens

Premium pricing for 1M context (above 200k tokens): $10/$37.50 per million.

US-only inference available at 1.1× token pricing for compliance-sensitive workloads.

The Bottom Line

Claude Opus 4.6 isn’t a minor version bump. It’s a significant capability jump that:

Dominates benchmarks across coding, reasoning, search, and professional work tasks
Finally gives Opus users the million-token context window Sonnet already had
Ships genuinely new features (agent teams, adaptive thinking, context compaction)
Maintains or improves safety alignment despite the intelligence gains

Available now on claude.ai, the API, Amazon Bedrock, and Google Cloud Vertex AI.

Model string for developers: claude-opus-4-6