Vibe Check: Genie 3, Claude 4.1, GPT-oss, and GPT-5

Katie Parrott / Vibe Check

Length15m

About this audiobook

Was this newsletter forwarded to you?Sign upto get it in your inbox.It was a crowded week for AI model releases—so crowded, it’s hard not to suspect that the big labs are stepping on each other's toes on purpose in a bid to hog the spotlight. AsSpiralgeneral managerDanny Azizput it in Every’s Discord, “Begun, the petty AI wars have.”In a 24-hour span on Tuesday, the big labs releasedGenie 3, a text-to-world simulator from Google DeepMind that you can actually walk around in (once it’s out of research preview, anyway);Claude Opus 4.1, Anthropic’s incremental (but important) coding upgrade; andgpt-oss-120b and 20b, OpenAI’s firstopen-weight modelssince GPT-2back in 2019.Then came OpenAI again on Thursday, crashing through a wall like the Kool-Aid man toannouncethe arrival of its newest flagship model,GPT-5.We've spent the last few days testing what we can, reading every benchmark table, and parsing the takes. Let's talk about what's new, what each model does best, and what the industry thinks about this AI rush hour.Genie 3: Google's text-to-world machineGenie 3 feels like the most sci-fi release of the week. Unlike earlier text-to-video models likeSora, which generate fixed, non-interactive clips, , Genie 3 creates interactive 3D worlds that respond to your actions within the virtual environment in real time. Think of it as the difference between watching a movie and playing a video game, except the game is generated on the fly from whatever you type.What it's great at:Real-time world building with memory:Genie 3 generates 720-pixel worlds at 24 frames per second and remembers them—maintaining visual consistency for minutes and recalling details from up to a minute ago. You can walk away (virtually) and return, and everything is right where you left it.Promptable world events:Mid-exploration, you can type new commands to alter the simulation in real time. Exploring a desert? Add a thunderstorm. Walking through a forest? Spawn a herd of deer.Physics without programming:Genie 3 develops an intuitive understanding of physics—water flows, objects fall, lighting behaves naturally—without relying on hard-coded engines. The model teaches itself how the world works by remembering what it has generated and reasoning over long time horizons.Claude Opus 4.1: The precision upgradeIf Genie 3 is the flashy sci-fi reveal, Claude Opus 4.1 is the practical upgrade that developers will use every day. It’s the kind of measured improvement that Anthropic has become known for, targeting the specific pain points that matter most in real-world programming workflows.What it's great at:Precision coding:Opus 4.1 achieves 74.5 percent onSWE-benchVerified, a version of the software engineering coding benchmark that’s been checked by humans. That's up from 72.5 percent for the originalOpus 4, and it’s near the top in real-world coding tasks where precision matters more than speed.Making changes across many files without breaking the code:The model excels at making complex changes across multiple files without introducing bugs—a notoriously difficult task that separates good coding assistants from great ones.Enhanced research and detail tracking:Beyond coding, Opus 4.1 improves at in-depth research and data analysis tasks. It remembers small but important facts across long documents (aka detail tracking), and uses agentic search to find and connect relevant information on its own.gpt-oss 120b and 20b: OpenAI's open-source returnFive years after GPT-2, OpenAI is back in the open-source game withtwo modelsthat pack serious reasoning capabilities into surprisingly efficient packages.What they're great at:Efficient reasoning on your own hardware:The larger gpt-oss 120b runs only part of its brain at a time, making it powerful but still able to run on a single high-end graphics processing unit (GPU). The smaller 20b model is light enough to run on a standard laptop with 16 gigabytes of memory, instead of needing expensive cloud servers.Tool use and agentic workflows:Both models excel atchain-of-thought reasoningand can act as intelligent intermediaries. They can decide whether to handle a task themselves, or hand it off to another tool or system..Customizable and permissive:Unlike models you access via API and pay-per-use, gpt-oss models are released under the Apache 2.0 license, so you can download, modify, and use them commercially without paying royalties—even offline on your own hardware.GPT-5: The everything model?OpenAI is betting big on its new flagship model, GPT-5. So big, they’re sunsetting past models you may know and love, like4oand4.5. GPT-5 collapses the choice paralysis of modern AI into a single system that figures out what you need and how hard to think about it.In ChatGPT, it's the end of the model picker. In the API, it's priced to make competitors question their life choices. But for the cutting edge of AI development, it feels more like a refinement of the old paradigm than a glimpse of the new one.What it’s great at:Speed-adaptive intelligence:Routes questions between fast chat mode and deeper reasoning, adjusting how long it “thinks” based on complexity.Aggressive pricing:GPT-5 Standard is $1.25 per million input tokens—1/12th the cost of Claude Opus 4.1—with GPT-5-mini even cheaper, on par with Google’sGemini 2.5 Flash.Beginner-friendly coding:In ChatGPT’s Canvas, can generate working front-end apps from a description—good enough to wow first-time users, if not yet a pro’s first choice.Read our full Vibe Check onGPT-5.What everyone at Every is thinking…… about GPT-5When do we stop switching models?Become apaid subscriber to Everyto unlock this piece and learn what the Every team and developers, writers, and industry leaders think of all of the new models.Upgrade to paidClick hereto read the full postWant the full text of all articles in RSS?Become a subscriber, orlearn more.

Artificial Intelligence

Futuristic

Political Intrigue

Virtual Reality

Memory

Rebellion