Why o3 Is the Best Model Yet for Real-world Learning

Rhea Purohit / Learning Curve

Length: 10m

About this audiobook

When OpenAI’s new reasoning model o3 came out, Every’s CEO Dan Shipper and OpenAI’s Sam Altman agreed that AI is changing the future of learning: If you aren’t using it to learn every day, they said, you’re “not going to make it.”

OK, I thought, I’ve got a challenge for o3: Make me physically stronger. Ten times stronger, in fact.

It’s been a life goal of mine to improve my chin-ups. I started 2024 unable to do even one, and months of working out alone got me nowhere. It wasn’t until I started working with a calisthenics trainer, Silvia, that I finally, after half a dozen focused sessions, got my first shaky repetition.

Now I want to do ten.

What better way to test AI’s capacity for teaching people in the real world than to ask it to help me achieve a goal I’ve never even come close to?

The more I thought about it, the more I liked this plan. I’d pit GPT-4o against o3 and see which model gave me a better chance of progressing from one to 10 unassisted chin-ups. I wanted to know which one would be a better teacher: 4o, the fast and reliable model I’ve been using as my daily driver, or o3, the more advanced reasoning model. Would either be up to the task? Would one emerge victorious? Let’s find out.

What I’m going to judge GPT-4o and o3 on

I would use OpenAI’s older standard model GPT-4o and o3 separately to generate a training plan.
I created a set of criteria against which to evaluate the models, based on what I think matters when you’re trying to learn something in the real world: quick feedback so you don’t make the same mistake over and over again, advice that’s tailored to your specific situation, incremental progress, and the motivation to keep going.

Responsiveness: How quickly do I get feedback?
Personalization: Is the advice tailored to me?
Progress: Does it help me get closer to my goal?
Motivation: How excited am I to keep showing up and putting in the work?

To judge the LLMs’ training plans, I also needed to define what “good” looks like. I trust my trainer, and she’s already delivered real results, so her guidance and the techniques she uses with me will serve as my baseline, the standard against which I’ll measure everything else.

Artificial Intelligence

Futuristic

Psychological

Journey

Exploration

Healing