This week has been a doozy: I went to Microsoft Build and interviewed the company's CTO Kevin Scott, we announced our fundraise in the New York Times, Google held its I/O event (more on that from Alex Duffy tomorrow), OpenAI acqui-hired Apple designer Jony Ive, and today I'm at Anthropic's Code With Claude event. Let me state for the record: I am tired of all of this progress. My fingers feel like they are about to fall off, and my brain is functioning at a comparable intelligence to GPT-2.

But there's a new Claude model launching today, for which I had to uphold my promise of writing day-of, hands-on vibe checks. So here it is for the long-awaited Claude 4 Opus (which Anthropic had code-named Linen), the follow-up model to Claude 3.7 Sonnet. (Besides, who needs fingers when voice-to-text AI is this good?)

I tried Opus on a variety of tasks, from coding to writing to researching. My verdict: Anthropic cooked with this one. In fact, it does some things that no model I've ever tried has been able to do, including OpenAI's o3 and Google's Gemini 2.5 Pro.

Let's get into benchmarks. We'll start, as always, with the Reach Test.

The Reach Test: Do we reach for Opus over other models?

For day-to-day tasks...

Become a paid subscriber to Every to unlock the rest of this piece and learn about:

- The results of more than a half-dozen benchmarks that Every ran on Claude 4 Opus, assessing daily tasks, writing and editing, coding, game creation, knowledge discovery, imagination, and more
- What really matters now for the model race