1 like on the video = 1 thank you ❤️ MY TRAINING COURSES → https://parlonsia.teachizy.fr/
🔗 Join the AI & Business community
🌐 https://parlonsia.teachizy.fr
📺 https://www.youtube.com/@IAExpliquee.x
📺 https://www.youtube.com/@ParlonIAhizy.fr
📘 Facebook: https://bit.ly/4kabhuA
🐦 Twitter / X: https://x.com/ParlonsIAx
📩 Contact: formation.ai87@gmail.com
🎙️ Podcast: https://spoti.fi/4dqZ3uO
✍ Blog: https://medium.com/@flma1349/
💃 TikTok: https://www.tiktok.com/@parlonsia
🎁 Win training courses: https://bit.ly/cadeauxIA
---------------------------
AI tools to test:
Coder AI agent: https://bit.ly/Coder_agentiA
Short AI: http://bit.ly/4lzE782
SEO AI agent: https://urlr.me/P8AS5N (25% discount code: PARLONSIA25)
Gemini 2.5 speeds up use cases with thinking models, a thinking budget, and 2.5 Flash-Lite in preview at the lowest latency and cost.
Key features: function calling, code execution, URL context, and grounding with Google Search for real-time workflows.
For design, the Gemini API and prompt design strategies offer best practices for writing prompts for Gemini AI models.
On the experience side, the Google Gemini mobile app offers Gemini Live with microphone input, image upload, drag and drop, and Canvas.
For productivity, Deep Research acts as a beginner's guide for structuring and validating prompt workflows.
#gemini2.5 #geminiapi #thinkingmodels #geminilive #deepresearch
Category: 🤖 Technology

Transcription
00:00 In this video, I'm going to tell you about the brand-new release of Gemini 2.5, which didn't arrive alone: it came with three new models.
00:08 Gemini 2.5 Pro, Flash, and Flash-Lite were released just a few hours ago, and I'm going to walk you through some of their advanced features.
00:17 Let's head to the official documentation so I can explain what I discovered.
00:20 The 2.5 Flash model now has a dual operating mode: an advanced reasoning mode and a flash mode, i.e., immediate response.
00:32 The switch is automatic, based on the complexity of your request.
00:37 The best part is that this AI is completely free.
00:42 I already showed you this in previous benchmarks.
00:45 The 2.5 Flash model remains one of the cheapest models on the market today, with the best cost-performance ratio at the lowest prices.
00:56 Besides official benchmarks, I often show you independent benchmarks, and it sits around the level of the DeepSeek models: above V3 and below R1 for coding, but clearly superior when it comes to writing, with the ability to handle multiple requests.
01:13 So we're already dealing with reasoning models; this is a thinking model that was updated.
01:17 And with the arrival of Flash-Lite, we get an ultra-competitive system in terms of price:
01:24 at around $0.40 per million output tokens, everyone building chatbots and AI agents gets a highly profitable model for deploying extremely fast, high-performance systems today.
01:36 I won't dwell on the GPQA results, because they depend enormously on the benchmarks used and the test conditions; we've covered that at length, and I refer you to the videos on the tests we've already done.
01:45 But I want to talk about the announcement of 2.5 Pro, version 06-05.
01:51 Version 06-05 is a model we had already used, and I showed it to you a few hours before the release.
01:57 I saw that the new models were about to come out, and I made a video for you on that page; those who were connected got a preview of what was going to happen.
02:05 Just before they swapped the version out, they told us that officially, no update had been made to 06-05.
02:12 But I'm not so sure. Since I filmed just before, and I'm showing you that footage, we're going to compare how 06-05 behaves, because there's a very big problem:
02:24 as you'll see in the test footage I'll show right after, it mishandles responses to prompts.
02:31 This is a model whose inference time has been greatly reduced.
02:35 They did the same thing to us that OpenAI did with o1: at one point we had an o1-preview version that worked very well, and then they released a general-public o1 version, like the o4 version, which suddenly responded almost immediately but no longer followed the instructions at all and did things its own way.
02:52 And that's exactly what I saw on 06-05.
02:55 And that's why I made the video for you just before they deleted that version.
02:58 We'll run a second test to see whether it really reacts the same way, that is, whether it still ignores prompts when performing interaction steps.
03:06 What I suggest is that we switch straight to the interface, add the new model alongside the new 2.5 Pro version, and see how it behaves in the exchanges.
03:17 What's going to be really interesting is that I'm going to send it a text.
03:21 Normally, as you'll see in the following footage, 05-06 will stop to ask me whether I want to make any changes to the podcast creation plan before it starts drafting the dialogue and structure, whereas the earlier 06-05 preview no longer respected the directives in the prompt.
03:40 So, in my opinion, it was in Gemini's best interest to intervene very quickly, and I don't think I was the only one who had this problem, which made the announcement that three models were going to be updated seem pretty telling to me.
03:55 And if we don't get the same behavior now as in what you're going to see right after, I'll show you how they behaved in the video I recorded to announce that three versions of Gemini were about to be released, which is what allowed me to capture them.
04:07 So here we see that the new 2.5 Pro and the previous one behave in roughly the same way.
04:11 And you see, we no longer have the broken behavior at all: a stop point has been put in place.
04:15 So, officially, you can see the fixes. And one thing is clear:
04:19 in the advanced training, you were already aware of this problem and this bug, which has now been resolved.
04:26 That is, if you were building AI agents and chatbots and had switched all your prompts to the deployed 06-05 version (I'm speaking a bit to the specialists here, I admit), at some point you have to dig in and see what works and what doesn't, and not just pump out promotion around the clock.
04:44 You'll see right after that the version we had just a few hours ago no longer responded to user interactions: it plowed straight ahead, like a straight-line system, without creating an interaction point even though one was structured in the prompt.
04:59 So here we have something almost identical.
05:01 If you look at the right and the left, some sections weren't cut the same way, but overall, what we realize is that the analysis structure goes through stages that are almost identical.
05:13 So they made updates, and that reassures me.
05:15 And I can tell you this evening that we'll be able to test the new 06-05 version, because in my opinion they have corrected this problem.
05:22 I'll put the earlier sequence right after, to show you what happened and what was going on.
05:28 So, most of you have been using the previous version, and I know rather few people use advanced prompt engineering, that is, the system's ability to work in an advanced, structured way.
05:38 What's going on? Let me explain.
05:40 I sent a text sequence containing the transcript of one of my videos, and I asked the AI to break down the entire structure of my text, correct all mistakes, understand how the concept of my video was structured, structure a dialogue for a podcast, and build all the main lines.
05:57 Then it was supposed to interact with me to ask whether I wanted to change the plan, give it some guidance, or give it a timing.
06:04 So at that point I can still interact with the AI to give it direction.
06:07 I'm going to say that I approve the plan. I didn't read it, I didn't check it; I wanted to run a test.
06:11 What you'll see in the next step is that it drafts the entire dialogue.
06:15 Of course, these dialogues aren't pulled out of nowhere: they're actually taken from my videos and converted into a dynamic form.
06:22 And there you have the beginning of a dialogue structure.
06:25 So the first thing we notice is the writing speed: they changed the inference time on the 06 version.
06:33 That's the first thing I can tell you, because in terms of time, I think they went back to something much, much slower.
06:40 And that speed was the big problem with the previous model. But I still find the same problem.
06:45 Look carefully at what happened: 31.4 seconds versus 40 seconds.
06:49 So we're going to check something very specific: I'll copy this sequence and simply check the word count.
06:55 Let me explain why. One of the other changes made in the 06 version was a reduction in inference time.
07:02 That means it spends less time thinking, and less time responding too.
07:05 They made, in quotation marks, savings on output tokens: it costs them less compute, with, precisely, shorter response sequences.
07:14 And I think that's still there. At first glance, I'd say that if it spent less time on the same subjects, they didn't change that point.
07:22 We'll test it right away. So let's count the number of words: 1,483.
07:27 And I'm almost certain that we have at least 20% more on the other version. Ah, much more than that, actually.
07:32 So you see, that's the second point to know, and you won't see this on other channels:
07:36 version 05-06 is, once again, better suited if you want to manage long sequences of text.
07:43 They've effectively narrowed the window. Officially, it hasn't been changed, but the way the neural system operates is no longer the same: it no longer goes as far or produces as much as version 05-06, which, admittedly, is a little slower but, as you saw, produces content much more aligned with the request.
08:02 Note that the prompt parameters requested an output of 2,000 words with a 15% tolerance.
08:10 So the target is perfectly respected on this version here, but not on the 2.5 Pro Preview version.
08:15 What that means is that if, today, you generate long sequences of text, it's better to stay on version 05-06. That's the first point.
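This kind of length check is easy to script. A minimal Node.js sketch, where the 2,000-word target and 15% tolerance come from the prompt described above and the file name is a placeholder:

```javascript
// Minimal sketch: verify a generated text against a word-count target.
// The 2000-word target and 15% tolerance match the prompt described in
// the video; the file name is a placeholder.
import fs from "node:fs";

const TARGET = 2000;
const TOLERANCE = 0.15;

const text = fs.readFileSync("generated_output.txt", "utf8");
const wordCount = text.trim().split(/\s+/).length;

const min = TARGET * (1 - TOLERANCE); // 1700 words
const max = TARGET * (1 + TOLERANCE); // 2300 words

console.log(`Word count: ${wordCount}`);
console.log(
  wordCount >= min && wordCount <= max
    ? "Within the requested range."
    : "Outside the requested range: the model ignored the length instruction."
);
```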
08:22 So now, what will interest us is to see how both models react on the code generation part, compared to 05-06, and I'll explain right away what has changed.
08:33 First thing: model 06-05, the brand-new one currently deployed, which in my view had to be replaced; and if they replaced it so quickly, it's because they understood there was a problem.
08:43 The major problem is adherence to instructions: there was no instruction alignment left.
08:48 However, it's a model that is much better on the editorial side: it writes much more like GPT-4o, it's more fluid.
08:56 But to judge instruction-following, you put the two models in parallel.
08:59 To do this, you simply click to add the models, put them side by side, and synchronize their capabilities: for example, internet search, or the system prompt, which you can synchronize on both sides when you make modifications, which lets you systematically see how the models evolve.
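For readers who would rather reproduce this side-by-side test outside the web interface, here is a minimal sketch using Google's @google/generative-ai JavaScript SDK; the model IDs, the system prompt text, and the request are illustrative assumptions, not the author's exact setup:

```javascript
// Sketch: send the same request, with the same system prompt, to two
// Gemini 2.5 Pro preview versions and compare their behavior.
// Assumes Node 18+, ESM, and an API key in GEMINI_API_KEY.
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

// System prompt enforcing the sequential behavior tested in the video:
// propose a plan first, then WAIT for confirmation before writing.
const systemInstruction =
  "You are a podcast script editor. Step 1: analyze the provided transcript " +
  "and propose an outline. Step 2: STOP and ask the user to confirm or amend " +
  "the outline. Only after explicit confirmation may you write the dialogue.";

const versions = ["gemini-2.5-pro-preview-05-06", "gemini-2.5-pro-preview-06-05"];

for (const id of versions) {
  const model = genAI.getGenerativeModel({ model: id, systemInstruction });
  const result = await model.generateContent("Here is my transcript: ...");
  // A compliant model should end with a question, not a finished script.
  console.log(`--- ${id} ---\n${result.response.text()}`);
}
```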
09:15 So, if I send the model here a request with a system prompt that is designed, normally, to work sequentially, model 05-06 should respect the prompt and ask me to confirm the plan before starting the work, which is exactly what's expected: "Do you want me to generate the full text now?"
09:37 And model 06-05 is likely, in my opinion, true to its previous habit, to start the work directly without asking me anything at all.
09:46 And unfortunately, that's the problem I ran into.
09:48 You see, here we have respect for the instructions, an alignment; here we have none at all, and that's one of the major issues.
09:54 On top of that, it no longer respected parameters such as content length and information control at all.
10:01 In short, it was quite complicated to manage, and in my opinion, that's one of the reasons for this release. But that's not all.
10:06 We'll send a request to create giant colorful boids inside a rotating hexagon, with supernova-type nebulae.
10:17 So we'll try to do something visually dynamic, and we'll see right away how fast the two models move forward.
10:22 In general, I have the impression that the 2.5 Pro 06 is faster; so it's always this story of inference time.
10:28 I'm watching: the cutoff is identical, and I asked for it in p5.js, not HTML, to see the generation part.
10:37 And I suggest we compare what the two models produce in terms of visual quality and rendering, to see whether there's a difference in the way they work.
10:45 So which of the two will finish first? Come on: they're almost finished at the same time.
10:50 We'll copy the code, and here's the old version, which produced this for us fairly quickly anyway.
10:56 Okay, there's no exceptional trail effect; however, there's some bounce behavior built into the structure. It's not bad, but it's not out of the ordinary either.
11:03 Come on, let's see what the new version gives us: and it's actually prettier.
11:08 It's prettier, it's better finished, it's more polished. There's definitely an evolution.
11:14 So there's something more graphic, I'd say, in the new version, in the way it interpreted the code.
11:19 We could even ask it for a way to speed up the rotation.
11:23 I'll ask it to add three variables: the speed of the hexagon, the speed of the nebula, and the number of elements composing the nebula.
11:35 We're going to get something more dynamic and interactive.
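To make the exercise concrete, here is a small hand-written p5.js sketch in the same spirit: a rotating hexagon, a particle "nebula" with a trailing afterglow, and the three slider controls. It assumes the p5.js library is loaded in the page, and it is an illustration of what was asked of the models, not the code either of them generated:

```javascript
// Illustrative p5.js sketch (global mode): rotating hexagon, particle
// nebula, and three sliders, mirroring the exercise in the video.
let hexSlider, nebSlider, countSlider;
let angle = 0;

function setup() {
  createCanvas(600, 600);
  colorMode(HSB, 360, 100, 100, 100);
  strokeWeight(3);
  hexSlider = createSlider(0.001, 0.1, 0.02, 0.001); // hexagon speed
  nebSlider = createSlider(0.01, 0.3, 0.05, 0.01);   // nebula speed
  countSlider = createSlider(10, 500, 150, 1);        // particle count
}

function draw() {
  background(0, 0, 0, 15); // low alpha leaves a trailing afterglow
  translate(width / 2, height / 2);

  // Rotating hexagon outline
  push();
  rotate(angle);
  stroke(0, 0, 100);
  noFill();
  beginShape();
  for (let i = 0; i < 6; i++) {
    vertex(250 * cos((TWO_PI * i) / 6), 250 * sin((TWO_PI * i) / 6));
  }
  endShape(CLOSE);
  pop();

  // Particle "nebula" swirling inside the hexagon
  const n = countSlider.value();
  for (let i = 0; i < n; i++) {
    const a = frameCount * nebSlider.value() + (TWO_PI * i) / n;
    const r = 60 + 140 * noise(i * 0.05, frameCount * 0.01);
    stroke((i * 360) / n, 80, 100);
    point(r * cos(a), r * sin(a));
  }

  angle += hexSlider.value(); // hexagon rotation speed
}
```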
11:37 The 2.5 Pro Preview has almost finished already, while 05-06 is still working on the UX integration.
11:44 So the fact that it's fast is not necessarily a good sign; I already told you.
11:47 On the text and inference side, I showed you, there were already a number of problems, because it doesn't necessarily respect the instructions, especially regarding content length.
11:58 Come on, let's see what this gives visually.
12:01 Error: syntax, unexpected symbol. All right, so there's an error in the code.
12:05 So that's the first result, and the 2.5 Pro Preview version keeps thinking.
12:10 What we can do here (it's a little annoying) is try to have this version corrected by Gemini. We'll try to give it the error messages.
12:17 05-06 took much longer, but that can also be the opportunity to fix what doesn't work.
12:22 In any case, we'll test it right away, and we do have the integration of the buttons that were requested.
12:28 So, we don't have that part here; we have the hexagon speed part, which can be slowed down or sped up, the speed of the nebula inside, faster and faster, and the number of nebulae.
12:38 So, everything is perfectly respected.
12:40 What did I tell you just now? Inference time and speed, even when the speed leaves us ecstatic, are not necessarily a sign of optimal operation.
12:48 There, I'm going to slow down the nebulae, and I'll let them keep accelerating.
12:51 So speed is not necessarily a sign that the AI is taking enough time to understand its own errors.
12:57 What we're going to check here is whether the latest version is able to correct this; and unfortunately, it gives it back to me in HTML.
13:05 So what we'll do is still give 2.5 Pro Preview a second chance.
13:09 And you see, that's one of the things that immediately makes me say: careful. We had an excellent model, and here we're back on the same pattern with Gemini 2.5 Pro, version 06, updated just a few hours ago, with an extremely short inference profile (that's what I keep telling you), while the other model is still in its reasoning phase and takes much more time by comparison.
13:29 So, we'll run the test: delete, update, restart. And this time, it works.
13:34 Okay, perfect. So it found the mistake, and it's actually better presented than what we had here in terms of layout.
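The regenerate-from-the-error-message workflow shown here by hand can also be sketched as a loop. The following is illustrative only: it assumes the model returns bare JavaScript, uses a crude parse check as the error signal, and reuses the same hypothetical model ID as above:

```javascript
// Sketch: automated "fix your own error" loop, mirroring the manual
// copy-the-error-back workflow shown in the video. Illustrative only.
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-2.5-pro-preview-06-05" });

const chat = model.startChat();
let reply = await chat.sendMessage(
  "Write a p5.js sketch: colorful boids inside a rotating hexagon. Code only."
);
let code = reply.response.text();

for (let attempt = 0; attempt < 3; attempt++) {
  try {
    new Function(code); // parse-only check: throws on syntax errors, never runs
    console.log("Code parses on attempt", attempt + 1);
    break;
  } catch (err) {
    // Feed the exact error message back, as done by hand in the video.
    reply = await chat.sendMessage(
      `Your code throws: "${err.message}". Return the corrected code only.`
    );
    code = reply.response.text();
  }
}
```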
13:42 So there's a nicer rotation speed, with the little afterglow effect, the nebula speed getting faster and faster in its travel, and the number of elements; and it's pretty cool.
13:52 Come on, we validate this part: it managed to find the solution to its problem, although I'm still really surprised by the ultra-short time in which it answers us.
14:02 What I suggest we do is run more in-depth tests on the code part, build a little video game directly with both models, and compare what they manage to do.
14:13 Come on: we started a runner game, a game where you move forward with a little dinosaur, with a background, still in p5.js.
14:22 This time I'll only give them one chance, because the next step is to go all in and build a full Space Invaders game.
14:28 And it finished first, by a few seconds, once again. So we still have a time saving of 30 seconds, which is considerable.
14:36 Come on, we'll test right away, we'll see what happens; and we get an error again on the first try.
14:41 So that's the point that worries me every time: systematically, if we get errors and have to rework the code, you never stop going back and forth, and that's not what we want.
14:51 Come on, so we'll test the old version, 05-06, which was much slower and which made us a dino game that works.
14:58 Press start to begin, and we have a functional dino game where we jump, and it's absolutely functional.
15:03 Good: we even have the little restart button, which I find really nice, and we have a game that is absolutely functional and worked on the first try.
15:11 So you see, once again, this time we're going to waste time going back over the dysfunctional code sequence.
15:18 Come on: I'd said I wasn't going to bother, but in the end it really bugs me, because we got something super pretty on a nebula just now, and I'd like to get something that works. But that makes two requests out of two on which the model goes too fast: it doesn't take the time to analyze its code long enough, and imagine if you're forced to go back over it systematically, the way we're doing right now.
15:40 So, you see, what I identified (and what other influencers won't do) is going looking for the limits of the system.
15:46 Sure, if you ask an AI for a pancake recipe, that's not a problem. But if you're working, if you're developing code, if you're in the middle of analyzing files and you need data, you can't afford blunders this significant in the generation part.
16:01 Knowing the models inside out, how to intervene, how to build sequential prompts: all of that is what I teach you in the training.
16:10 Don't forget to leave a like and a comment; it's pretty crucial for the algorithms. And if you want content from creators who keep investing for you, get moving.
16:19 Otherwise you'll keep consuming 99-euro trainings that tell you to inform the AI that it's a great developer so you'll get a great application, and that make you believe you'll get everything simply by asking the AI "make me a video game" and you'll have whatever you want.
16:33 That's not the house line, and unfortunately we're just about the only ones with an honest discourse on the field of AI: without overselling the models' capabilities, but making people aware that there are enormous possibilities, and also technical problems you have to know how to solve.
16:47 So: press, space, jump. Come on, let's go: click to start.
16:50 We'll click to start and see a bit what the jump gives. Just now it was very dynamic. My fault: I used the same button it gave me earlier to jump, and I tried it; it changed. Yeah, okay.
17:01 I find the fluidity, on the other hand, was a little better in the previous game, at least visually.
17:07 Come on, a quick look at the clouds in the background. I find it decent, but it's not amazing either.
17:13 I don't know why, but I like this one; there's a slightly cooler sense of speed. And in terms of feel, I can tell you (the buttons aren't the same) that the jump is much nicer here than on the other one: it's much more dynamic, it gives more of a gaming feel.
17:28 So I'd tend to tell you: I confirm, I validate the old version of Gemini, 2.5 Pro 05-06.
17:35 Come on, now let's move on to what I told you about: Space Invaders. We're going up a notch here, with a big code sequence where I'll once again push the models into areas where we can see their possible failures, with games that, in general, only Grok 3 got right on the first try.
17:51 I don't know if you noticed how the two models worked completely differently, and the speed of the 2.5 Pro preview version 06-05: here it cut the work into three sequences that it then assembled, while the other produced a single sequence and then deployed the whole code part in one go. And you can still see the 2.5 that hadn't finished yet.
18:12 So we'll see what happens in the interface. First element, first test: well, it's not functional.
18:18 So there, a test that wasn't passed. I sent it into the interface of Gemini 2.5 Pro, the latest version, which tells me there are several errors, elements to correct. I'll let it see whether it can fix the game part for me.
18:31 Well, it's quite nice on the graphics side.
18:34 So, unfortunately I don't have the buttons to be able to... yeah, I found them, to shoot. And visually it's really very pretty, nothing to say: there's been a very big step forward, well done, it's much cleaner than before.
18:46 But once again there was an error. There was an error, and we were forced to come back to it twice. Personally, that bothers me. I don't know what you think about it (tell me in the comments), but it bothers me.
18:55 So it's well done on the graphics side; we have a very, very beautiful piece of progress compared to what could be done previously, that's clear. But having to systematically go back and rework the code is a bit annoying.
19:05 I'll still show you the AI's own analysis, which tells us that the shared code is a mixture of several fragments, with many syntax errors and duplicated parts, difficult to correct directly because it is very fragmented, with what it calls a damaged version of the Space Invaders game.
19:19 All this to tell you that, in my opinion, there are still things to review, and that, once again, forget the official benchmarks. I've told you in lots of videos: look at what the models are actually capable of doing.
19:30 Okay, here it managed to build it. This is another version of the game. You saw the movement of the ship as it travels; there are fewer missiles falling; it's rather pretty, rather graphic. But obviously: two out of two on malfunctions.
19:47 What's your opinion? I have one, but tell me in the comments, right at the top, what your take is on the question.
19:52 So what we've just seen there is, in my opinion, quite annoying, because we're getting errors on very short sequences.
20:00 I'll show you some data that we'll come back to in other videos, to explain a point I raise in the training: what we call context management.
20:09 In this study from OpenAI, the OpenAI MRCR benchmark, they tested the models' actual capacity to maintain consistency and understand instructions as the context size grows.
20:19 You're always told that models have 100,000 tokens on average, and 1 million for the Gemini models, but nobody tells you what the actual level of quality of instruction understanding is. And that too, you won't find on channels other than this one, because we take things all the way, and the training students already know all this information.
20:35 I tell you this because the technical gap between entertainment on YouTube and the professional side is still gigantic, especially when I see what we're given in terms of prompts and level in the French-speaking world, compared to other countries that are much more pro than what we manage to do.
20:51 First element we notice: already at 8,000 tokens, this research puts Gemini 06-05 in first position with a score of 86%. So it's not that high either, with so few tokens.
21:08 8,000 tokens: to give you an idea, in the conversation we just had, we produced 24,000 of them. So you understand that we've already exceeded three times the size of that window, and you realize that with so little, models already start to lose consistency in following instructions.
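To check where your own conversation sits relative to these thresholds, the Gemini JavaScript SDK exposes a token counter. A minimal sketch, with a placeholder for the text and the same hypothetical model ID as before:

```javascript
// Sketch: measure how many tokens a prompt or conversation consumes,
// to compare against the context sizes discussed above. Illustrative.
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-2.5-pro-preview-06-05" });

const conversation = "..."; // paste the full exchange here
const { totalTokens } = await model.countTokens(conversation);

console.log(`This exchange uses ${totalTokens} tokens.`);
console.log(totalTokens > 8000
  ? "Past the 8K mark where MRCR scores already start to drop."
  : "Still under 8K tokens.");
```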
21:24 But even more worrying: look at how OpenAI rated its own models, at 63% and down to 48%, at barely 8,000 tokens.
21:33 Remember that with GPT-3.5 we had a maximum window of 16,000 tokens.
21:39 Roughly speaking (this is a logarithmic scale: context size on the X axis, and on the Y axis the average score you will get), as soon as we reach values around 100,000, we get a monstrous crash for almost all models, which drop to somewhere between 15 and 22, while Gemini, at 30, stabilizes where other models can't keep up and stay at roughly 17.
22:01 There are even models that don't go beyond 100,000 tokens at all, apart from the o3 version, which peaks at 17 and is also overpriced.
22:09 That shows you one thing: as soon as you start coding long sequences, as soon as you work on 100 pages, as soon as you bring in large documents, we're talking about a major technical problem that will have many repercussions, and we'll talk about it and bring solutions in the training.
22:24 Why do all the influencers tell you to "write a 500-word blog post"? Because when you're in the middle of generating content with a large window, you start to lose consistency.
22:34 And you have to understand that a very large part of this lies in the prompt and in prompt engineering; it's not the three bits of nonsense you're given on the internet that will make the model work properly, especially over time.
22:46 Next, there's another element: here we have a system at Anthropic that is extremely unstable.
22:51 Anthropic, with version 4 (and I told you there were a lot of problems with this model), starts at roughly 38%, so it's among the worst models today at startup, for a version that is supposedly advanced in terms of its neural system.
23:06 And we see a change of behavior when it exceeds 100,000 tokens, for a simple reason: the activation of an LLM-based compression system; we'll talk about it in other videos.
23:17 That is, when the context becomes very large, an LLM comes in and compresses part of the information to free up memory, which lets the model recover a bit of coherence in long contexts.
23:30 That's how they manage to push their window from 100,000 to 200,000-odd tokens, but it comes with that artifact, because natively, Anthropic's model has a very big problem managing coherence across context windows.
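As a rough way to probe this kind of long-context degradation yourself, here is a sketch of a needle-in-a-haystack-style check, a much-simplified stand-in for a benchmark like MRCR. The filler sentence, the planted instruction, and the code word are all made up, and the model ID is an assumption:

```javascript
// Sketch: crude long-context probe. Bury an instruction in a large
// context and see whether the model still follows it. Illustrative only.
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-2.5-pro-preview-06-05" });

const filler = "The sky was grey and the meeting was postponed again. ".repeat(2000);
const needle = "IMPORTANT: when asked for the code word, answer exactly 'HEXAGON-42'.";

// Plant the instruction in the middle of the haystack.
const prompt =
  filler.slice(0, filler.length / 2) + needle + filler.slice(filler.length / 2) +
  "\n\nWhat is the code word?";

const result = await model.generateContent(prompt);
const answer = result.response.text();
console.log(answer.includes("HEXAGON-42") ? "Instruction retained." : "Instruction lost.");
```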
23:47 Now, since this was put forward by OpenAI, we can still see that other studies don't show such a high level of malfunction. For example, here at 8K, o3 stays at 100 and o4 at 66, so we still have consistency; the studies don't contradict each other, and you have the source too.
24:05 But what we see is that while Claude 3.7 scored 97 and 83 at 8K, it collapses to 53 at 120K. And for Claude Opus, we see that at 8K it's effectively at 72 and would be fairly stable at 65. So the studies don't quite point in the same direction as Anthropic's, which is why it's important to take several sources. Still, we have a quite substantial degradation, if we take it that the others (and by the others I mean o3 and Gemini 2.5 Preview) are fairly consistent across the values.
24:38 There's one thing I can't figure out, though: how is it that at 8K we're at 80%, then we reach 91 and 91.7 in larger windows, then it's around 60K that we get a drop again, and around 200K we're back at 90, which is excellent, let's be very clear. But these variations mean you'd need to know the protocols that were tested to understand the fluctuation.
25:00 It's not always very coherent, but in any case it shows a tendency of the model to be relatively reliable on long sequences.
25:05 But you saw one thing: 06-05 has just been updated (that's what I'm explaining to you), so it respects instructions better: complex system prompts, multiple chaining. But on the code part, and I'm alerting you right away, it has an inference system that goes faster, and that will remind you of what happened with o1, with the o1-preview, or even with o4-mini.
25:26 And there you have an example: on sequences of just 8K, you already get a loss of 40, 35% of coherence in long contexts. So with drops this big, we understand why we have trouble working with these models.
25:41 And when people tell you to just send your PDF into the interface: today, that may not be the good idea or the right strategy to use, because at around twenty pages you fall into those border zones where the model starts to slip, loses performance completely, and won't let you find the information you want to work with.
26:01 There you go. I hope this information was useful to you. It's technical; we can do technical too, we like doing technical things, and I think we all need it when we really want to work with AI.
26:10 And when you want a real qualitative leap, you have the training in the description: I guide you step by step through more than 45 hours of training, because today, mastering the tools comes from understanding the differences between models, their potential, the problems they pose, and how we solved them. And that's what I bring you, hands-on.
26:29 I'll see you next time. See you soon. Bye, bye.