Vibe Check: Claude 3.7 Sonnet and Claude Code

Vivian Meng / Context Window


Last week, Anthropic released Claude 3.7 Sonnet, the first “hybrid reasoning” model on the market, and Claude Code, its first agentic coding tool that lives in your terminal. Hybrid reasoning, a term the company has coined, refers to the model’s dual modes of thinking (more on that in a second).

We’ve been playing around with them at Every since they first came out. Based on our experience with ChatGPT-4.5, on which we were initially lukewarm before it grew on us, we wanted a bit more time to develop our point of view. We’ve now done a vibe check with the Studio team about their first impressions of each, and gathered reactions from across the tech landscape.

The all-around consensus: These tools are incredibly powerful, especially for coding. But in order to use them well, you need to know how they’re meant to be used, and what they still struggle with.

Cora general manager Kieran Klaassen summed them both up well: “3.7 is great, but not yet ready to get work done: it’s too wild. But for new projects, Claude Code is very impressive.”

Here’s what’s new, what the team thinks, and what everyone else thinks.

What is 3.7 Sonnet?

3.7 Sonnet is a hybrid AI model that combines two distinct thinking approaches in one system: quick standard responses and in-depth extended thinking.

Standard mode: An upgraded version of 3.5 Sonnet that delivers rapid responses to straightforward queries with improved accuracy and performance.

Extended thinking mode: A new feature that enables Claude to demonstrate chain of thought reasoning, breaking down complex problems step-by-step and showing its deliberative process on a visible “scratchpad” before providing a final answer.

Unlike other models that specialize only in reasoning, or require users to opt in for reasoning over quick responses, 3.7 Sonnet can automatically detect which thinking style is required for the prompt.
As a result, you can fluidly move between simple queries and complex reasoning tasks in one conversation, just as you would with a person.

Real-world applications

3.7 Sonnet’s reasoning capabilities are optimized for real-world tasks designed around how businesses use LLMs, rather than for embodying the persona of, say, a Math Olympiad winner. The result: It’s more practical for the everyday workplace user.

Here are a few of its capabilities:

It’s a coding whiz. 3.7 Sonnet, the model that serves as the foundation for Claude Code, is specifically trained for real-world coding. According to Every’s Alex Duffy, “Sonnet 3.7 is clearly optimized for code over anything else, probably because they saw over one-third of all requests [made were] related to math and/or code.”

It plays Pokémon. 3.7 Sonnet has improved “action scaling,” which lets it stay focused on open-ended tasks, whether that means real-world work or milestones in Pokémon Red. It’s able to iteratively call functions, respond to environmental changes, and keep working until the task is complete.

A developer’s dream. GitHub is integrated into the interface, allowing developers to connect their code repositories and easily give Claude all the information it needs about their code, alongside their prompts. That saves you the time of copying and pasting your files into an insanely long prompt, explaining to Claude how they work and interact, and then asking your question.

Let’s talk about Claude Code

Anthropic simultaneously announced the release of Claude Code, the company’s first agentic coding tool, which works directly with your codebase in your terminal, instead of in an integrated development environment (IDE), such as VS Code.

It’s Anthropic’s answer to the growing popularity of agentic coding assistants like Cursor and Devin. It enables engineers to delegate tasks to Claude in their terminal.
No more copy-pasting your code back and forth between your IDE and browser: just prompt straight in your command line, and get to “Clauding” away.

Claude Code and Cursor both use 3.7 Sonnet, but Claude Code is better

A new version of coding agent Cursor was released around the same time, but according to Every’s Klaassen, it’s not as good. “Cursor’s new version succumbs to 3.7 Sonnet’s power and goes off in wild directions, not following instructions nor reading the right files,” he says. “Claude Code is tamed better, but they use the same model under the hood. So it’s really about how they use the model and the tool calling structure on top.”

How does it work?

Let’s say you’ve downloaded a starter template for a side project you’re building from somewhere like GitHub or v0. Claude can:

Explain your codebase: Claude can analyze the code structure and give you a clear explanation of how everything works: “Here’s how the app stores tasks, this is how the UI components are organized, and this is how data flows between components.”
Implement and update functionality: Claude can find the right files to update and write new lines of code.
Design and run tests: Claude can test your code to make sure the changes it implemented are working correctly.
Compile your code: Claude can build, debug, and run your code until it works.
Commit and push your changes to GitHub: Claude can help you manage version control.

At each step along the way, Claude asks you to accept or reject the changes it’s making, so you’re still in the loop.

How much does it all cost?

3.7 Sonnet is available to all users, though extended thinking mode is not part of the free tier, and is only available to subscribers of Claude’s paid plans.

For API users, in both extended and standard thinking mode, Claude costs the same as its predecessors: $3 per million input tokens and $15 per million output tokens, including thinking tokens.

Claude Code does not have a separate pricing structure, and runs on the same token costs outlined above. Still, it’s expensive: “It’s just tough when you spend 25 percent of your monthly Cursor subscription on a single problem with Claude Code,” Duffy says.

What everyone at Every is thinking

Sonnet 3.7 is too eager to help
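To put the API prices quoted under “How much does it all cost?” in concrete terms, here is a minimal back-of-the-envelope sketch in Python. It uses only the two rates from the article ($3 per million input tokens, $15 per million output tokens, with thinking tokens billed as output). The function name, its parameters, and the token counts in the example are hypothetical, chosen purely for illustration; real usage will vary a lot from task to task.

```python
# Rates quoted in the article, in US dollars per million tokens.
INPUT_USD_PER_MILLION = 3.00
OUTPUT_USD_PER_MILLION = 15.00  # thinking tokens are billed at this rate too


def estimate_cost_usd(input_tokens: int, output_tokens: int, thinking_tokens: int = 0) -> float:
    """Estimate the cost of one request at the quoted rates.

    Splitting thinking tokens out from other output tokens is an
    illustrative assumption; both are charged at the output rate.
    """
    if min(input_tokens, output_tokens, thinking_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    billed_output = output_tokens + thinking_tokens
    total = input_tokens * INPUT_USD_PER_MILLION + billed_output * OUTPUT_USD_PER_MILLION
    return total / 1_000_000


# Hypothetical request: 200,000 input tokens, 20,000 answer tokens,
# and 30,000 thinking tokens.
print(f"${estimate_cost_usd(200_000, 20_000, 30_000):.2f}")  # prints "$1.35"
```

An agentic session sends many such requests, often re-reading large parts of a codebase each time, which is one plausible reason a single hard problem can add up quickly.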
