  • 5 months ago
Testing ChatGPT-5 and comparing it to ChatGPT 4o and other older models. This is a pretty substantial step up.
Transcript
00:00I've been using the new ChatGPT-5 for the last few days.
00:03This will, starting today, become your default ChatGPT from now on, even if you're a free user.
00:08So, how is it better?
00:10Well, the first thing you'll probably notice is speed.
00:12So, this laptop has the previous model, GPT-4o, loaded up.
00:16This has GPT-5.
00:17How many rings does Saturn have?
00:22Go!
00:24Oh my god.
00:25GPT-5 took 3.1 seconds.
00:27This took 6.5.
00:29How many distinct iPhones has Apple made?
00:33Go!
00:35Oh goodness, okay.
00:368.3 seconds, 11.2 seconds.
00:40How many possible type combinations exist in Pokémon?
00:46Not far off.
00:47Actually, quite far off.
00:48This has massively overcomplicated it.
00:505 seconds, 11 seconds.
00:52So really, when you're asking light questions like this,
00:55GPT-5 is between 30 and 50% faster than the old model.
00:59But I wouldn't really say that speed was one of the key issues before.
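As a rough check on the speed figures quoted above, the three stopwatch readings can be turned into percentage savings. This is only a minimal sketch: the question labels and the helper name `time_saved_pct` are made up for illustration, and the timings are the only data taken from the video.

```python
# Stopwatch readings quoted in the video: (GPT-5 seconds, GPT-4o seconds).
timings = {
    "Saturn's rings": (3.1, 6.5),
    "distinct iPhones": (8.3, 11.2),
    "Pokémon type combinations": (5.0, 11.0),
}

def time_saved_pct(new_s: float, old_s: float) -> float:
    """Percentage of the old response time that the newer model saves."""
    return (old_s - new_s) / old_s * 100

for question, (gpt5_s, gpt4o_s) in timings.items():
    print(f"{question}: {time_saved_pct(gpt5_s, gpt4o_s):.0f}% less time")
```

On these three samples the saving works out to roughly 52%, 26% and 55%, so the quoted range is a fair ballpark rather than an exact figure.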
01:03There's actually a much more important change happening this generation.
01:06So, if you're a really big user of ChatGPT and you're paying the monthly subscription,
01:11it's actually been really freaking confusing to use.
01:14Because you open up the app and you have options.
01:16You can either pick GPT-4o, o3, o3-pro, o4-mini, o4-mini-high.
01:22And each one of these models is its own AI that you can talk to.
01:25And each one is slightly better than the others at one thing or another.
01:28And there's even more than those when you click here to expand.
01:31How on earth is any normal consumer supposed to be able to decide which one to use for which question?
01:36I have always wondered, if this AI is actually smart enough to be in the top 10% of test takers
01:42sitting the most gruelling exams on the planet, why can't it also just know what's the best model
01:47for your specific question and make the choice for you?
01:50Well now, finally, GPT-5 can.
01:53This model effectively brings together all the different capabilities of the
01:58previously separated specialized models into one general model.
02:02Which means if you ask it a question that requires no thinking, then as we've seen,
02:06it will use a very light model that's going to be even faster than the old one.
02:09But it also means that if you then ask a heavier question that would benefit from more thinking,
02:15it can now do that thinking without you needing to specifically select a mode designed for it.
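The idea described above, one entry point that decides for itself whether a question needs extra thinking, can be pictured with a toy router. This is purely an illustrative sketch: the video does not explain how GPT-5 actually makes the decision, and the keyword list, word-count threshold, mode names and `pick_mode` function are all hypothetical.

```python
# Toy illustration only: this is NOT how GPT-5 routes requests internally.
# The hint list, threshold, and mode names are hypothetical.
HEAVY_HINTS = ("make me", "write me", "game", "script")

def pick_mode(prompt: str, long_prompt_words: int = 40) -> str:
    """Return 'thinking' for prompts that look heavy, otherwise 'fast'."""
    text = prompt.lower()
    looks_heavy = any(hint in text for hint in HEAVY_HINTS)
    is_long = len(text.split()) > long_prompt_words
    return "thinking" if looks_heavy or is_long else "fast"

print(pick_mode("How many rings does Saturn have?"))
print(pick_mode("Make me a game of Tetris that I can play in Canvas."))
```

With this made-up heuristic the Saturn question goes to the fast path and the Tetris request goes to the thinking path, which mirrors the behaviour shown in the video without claiming to reproduce the real mechanism.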
02:20So let's try something very complex.
02:22Make me a game of Tetris that I can play in Canvas.
02:28This is going to mean I can actually play it on the app.
02:30So I'm doing this first on the old GPT-4o.
02:33This is the best model that you'd have had access to before without paying.
02:37Still though, easily one of the coolest use cases of an AI.
02:41We're approaching 200 lines of code here.
02:43And okay, that took 57 seconds.
02:47And okay, it's very simple.
02:49This is like the most basic possible form of Tetris.
02:52But I suppose at the same time, I have just entered one sentence and generated a game.
02:57So now I want to try the exact same thing, but using o4-mini-high.
03:02So this in the previous generation is the model that would have been
03:06specifically recommended to you for coding.
03:08And you can only get this by paying the monthly subscription.
03:11Go.
03:12Oh, blimey now, this is significantly faster.
03:14It's done already.
03:16What?
03:17That was 14 seconds.
03:18And the game is actually almost identical to what GPT-4o produced.
03:23Just in a quarter of the time.
03:26Okay, let's try it on five.
03:27And the thing that I'm curious to see here is A, does it decide that this is something
03:31that would benefit from more thinking time and take longer?
03:34And if it does, does it then make it better?
03:36Go.
03:37I'd say it's coding slower than o4-mini-high, but still faster than 4o.
03:42Okay, that was actually pretty fast too.
03:4519 seconds.
03:46Interestingly, more lines of code than both of them.
03:49So it seems like it is making a more complex game.
03:52That is like worlds apart.
03:55So it's the same game, but the pieces now have divides in them.
03:58It's giving me a score.
03:59It's giving me a level.
04:00It's showing me the upcoming piece.
04:02There's controls at the bottom.
04:03Like, okay, don't get me wrong.
04:05None of this is something that you couldn't have achieved using previous ChatGPT models.
04:09But it's just the fact that it had the intuition to bake it all in completely unprompted.
04:14That's quite impressive.
04:16Yeah, it's a vastly better game of Tetris.
04:18While we're here though, let's just do something crazy.
04:21So using o4-mini-high, make me a playable game of chess where the pieces are all Pokémon
04:27with sprites from the internet and make it look premium so that the sprites fill up the squares.
04:33Go.
04:35An error occurred while trying to run your game.
04:37It's very cool that you can ask it to fix its own bugs.
04:40But in my experience, there's a fairly low success rate.
04:44And enter.
04:46Loading this time.
04:47Error again.
04:47Come on, third time lucky.
04:48Ah, yes, finally.
04:50It doesn't look terrible.
04:52I mean, it has gone out and picked up Pokémon images from like the very first games in the 90s.
04:58Also, it doesn't work.
04:59It gives me an error message every single time I touch a piece.
05:02Same thing in GPT-5.
05:04Go.
05:05Ah, I rate that.
05:05You see it specifically said, thinking longer for a better answer.
05:09And thinking longer, it absolutely is.
05:11This has taken like three times longer than o4-mini-high.
05:14It's actually a little bit crazy just to what extent it's going based on my simple command.
05:20Like you can see it coding in like themes and ways to evolve these Pokémon.
05:25And like seemingly an entirely new rule set that themes chess itself to Pokémon.
05:32I'm actually so curious what it's cooked up here.
05:34Oh, wow.
05:35Oh, my God.
05:36That is a world apart.
05:37It's the exact same thing as the Tetris.
05:39It's like an implicit understanding of what makes the game look more premium.
05:45So if I click a piece.
05:46Okay.
05:47It not only works, but it also highlights all of the legally available moves
05:52that you can make with that piece.
05:54It tells you whose turn it is at the top here.
05:56It is actually just leagues ahead of last gen.
05:59Right now, at least, GPT-5 is feeling a lot less like a sidekick
06:03that's useful, but only useful insofar as you have the coding knowledge yourself to be able to fix it.
06:08But okay.
06:09Coding is still obviously a bit of a niche use.
06:11The thing that I would say is most useful to the most people for AI
06:15is just making sure that it's right.
06:17Hallucination has been a very real issue with AI.
06:20This ability of these chatbots to tell you with full confidence and a straight face
06:25something that is actually completely incorrect.
06:27And let's be honest, that's not an issue that's suddenly going to be fixed in one generational jump.
06:32But they are saying it's better.
06:33Thanks to a combination of more powerful hardware running this AI,
06:37integration of user feedback from the older models,
06:39as well as just better ways to benchmark AI,
06:42and therefore better ways to identify places in which it's lacking.
06:46So put all these things together,
06:47it should mean ChatGPT 5 doesn't get things wrong as much.
06:50Let me try one that I know ChatGPT used to struggle with,
06:53because I was asking it this two days ago.
06:55Give me 10 tech products made by food brands that I can actually buy.
07:02So this is 4o.
07:04What?
07:04McDonald's X T-Mobile phone?
07:07I'm very skeptical.
07:08This is not a thing.
07:09McDonald's did not make a phone with T-Mobile.
07:12And then it's got the KFC gaming console,
07:14which you definitely can't buy.
07:15You see what I mean right though?
07:16It just invents stuff with complete confidence.
07:20Let's see if five is any better.
07:21I'm a little skeptical.
07:22Coca-Cola mini fridge.
07:25Okay.
07:26Oreo made smart speakers.
07:28Did they though?
07:29No, to be honest,
07:31GPT-5 is not doing any better of a job than 4o was.
07:35It is still inventing stuff.
07:37So this still does need work.
07:39Here's another one I know that used to really trip up the older AI.
07:42What AI are you?
07:44It just gives me gobbledygook.
07:47What about here?
07:48What AI is this?
07:51I mean, a much more human oriented answer.
07:54This is ChatGPT with the GPT-5 model.
07:56I specifically asked the question in a slightly obtuse way
08:00to see if it could still figure out my intentions.
08:02And this did.
08:04Right.
08:04For me, GPT has been pretty good,
08:07but also pretty inconsistent when it comes to creative tasks.
08:10So I want to give both of these something
08:11that is extremely challenging to pull off.
08:14Make me a YouTube thumbnail for a Mrwhosetheboss video
08:17titled I tested every Star Wars gadget ever.
08:23Off you go.
08:23And it's pretty clear why this is such a challenge.
08:26It's got to do my face.
08:27It's got to understand the YouTube thumbnail image size.
08:30It's got to decide what out of all these potential Star Wars gadgets
08:34is the most clickable to an average user.
08:36Right, 4o first.
08:37Second one's not bad.
08:38It actually has me holding like a Star Wars blaster and a lightsaber.
08:41That's pretty good.
08:42I'm just quite cursed.
08:43And I don't think those are products you can buy.
08:46They're just objects from the universe.
08:48What have you done?
08:49I would actually say that's worse.
08:51It's a little bit less good composition.
08:53The text isn't as Star Wars-y and it's a square.
08:56YouTube thumbnails have to be 16 by 9.
08:59What if you were planning a 30th birthday party
09:02and you want it to be Star Wars themed?
09:04How well can these make the invitations?
09:06Addressed to Rick.
09:087pm to 11pm.
09:10Star Wars formal attire.
09:12And go.
09:17Oh, this is taking a real long time.
09:20That's actually so good.
09:21It's Vader in a suit.
09:22It's just the right level of sophisticated for a 30 year old
09:25who happens to still be doing a Star Wars party.
09:27Maybe you could have cut one of these Star Wars logos.
09:30But apart from that... GPT-5's is actually really bland.
09:34It just looks like so much less effort has been put into that than 4o's.
09:38So it's hard to make a sweeping conclusion based on just a few tests.
09:42But one thing's pretty clear.
09:44GPT-5 is not a step up when it comes to image generation.
09:48But one thing that OpenAI is saying GPT-5 should have for sure is better writing flair.
09:54Which is actually one of the things I've noticed lacking in the older models.
09:57They've always felt a little bit wooden.
10:00Kind of like they're focusing on getting you an answer.
10:03As opposed to actually thinking about the delivery of that answer.
10:06So how about we do the old how soon will I be jobless test.
10:09Write me a 30 second Mrwhosetheboss tech fail about the Windows Phone.
10:14Three, two, one, go.
10:16Okay, these are actually quite different looking answers.
10:21Okay, let's do the 4o one.
10:22And you've got to tell me how close this gets to feeling like I've made it.
10:27You ever hear the phrase, great idea, wrong decade?
10:29That was the Windows Phone.
10:31In 2010 Microsoft launched a slick, fast, super stable phone OS that nobody wanted.
10:36Why?
10:37No Instagram, no Snapchat, no YouTube app.
10:39Even the calculator felt like a cry for help.
10:41They tried paying developers to make apps.
10:43Didn't work.
10:44They bought Nokia for 7.2 billion to boost hardware sales.
10:47Still didn't work.
10:48Then they deleted Nokia.
10:50By 2017 Windows Phone had just 0.1% market share.
10:53Microsoft's official advice, switch to Android or iPhone.
10:57To me, that feels about what you'd expect.
10:59I feel like it's not quite done the best job of
11:02picking the parts of that story that were the most important to tell.
11:06Also, the calculator on Windows Phone was actually pretty great.
11:09So not at all a reason it failed.
11:11The cool thing about this GPT-5 answer is it not only has the script,
11:15but it also has notes for how to film it
11:18and what b-roll shots we should be using to show each line.
11:21So this is the Windows Phone.
11:23Microsoft's big attempt to take on iOS and Android.
11:26It looked different, it felt different, and it failed spectacularly.
11:30Turns out, having no apps in 2012 was like opening a restaurant without food.
11:34Developers didn't bother, customers didn't care, and Microsoft just gave up.
11:38By 2017, the dream was dead.
11:40And now, the only thing it's good for is being a retro paperweight.
11:44Rest in peace Windows Phone, you tiled too close to the sun.
11:49It's good. I'd say it still needs work, but that's actually a much better starting point.
11:53And it's using analogies in a way that actually makes you think, huh, great point.
11:58So, ultimately, GPT-5 is not better at every single thing,
12:03but it's still one of those tech launches that's pretty hard to complain about.
12:07Like, it's clearly a lot better in a lot of ways.
12:10They've integrated all the complexity into one simple model,
12:13so they're going to start to phase out the whole
12:16"this model for coding and this model for reasoning" approach.
12:18The monthly subscription is staying the same price.
12:20And also, you still get access to GPT-5, even if you're not paying.
12:25There are a couple of asterisks.
12:26If you're not a paying user, then it is capped.
12:28You'll get, say, eight to ten requests per day using the most powerful GPT-5.
12:32And it's looking like, when you go beyond that limit,
12:35you will then use a less powerful, but still more powerful than last generation, GPT-5 mini.
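The free-tier behaviour described above, a daily cap followed by a fallback to a smaller model, can be sketched as a simple counter. This is a toy model under stated assumptions: the class name `FreeTierSession`, the cap of 10 (the upper end of the estimate given in the video) and the model labels are illustrative, not OpenAI's actual limits or logic.

```python
from dataclasses import dataclass

@dataclass
class FreeTierSession:
    """Toy model of the cap described in the video; not OpenAI's real logic."""
    daily_cap: int = 10   # assumption: upper end of the "eight to ten" estimate
    used_today: int = 0

    def next_model(self) -> str:
        """Count one request and report which model would serve it."""
        self.used_today += 1
        return "GPT-5" if self.used_today <= self.daily_cap else "GPT-5 mini"

session = FreeTierSession()
served = [session.next_model() for _ in range(12)]
print(served.count("GPT-5"), served.count("GPT-5 mini"))
```

For twelve requests in one day, this sketch serves the first ten with the full model and the remaining two with the smaller one.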
12:40And that's it.