1 like on the video = 1 thank you ❤️ MY TRAINING COURSES → https://parlonsia.teachizy.fr/
🔗 Join the AI & Business community
🌐 https://parlonsia.teachizy.fr
📺 https://www.youtube.com/@IAExpliquee.x
📺 https://www.youtube.com/@ParlonIAhizy.fr
📘 Facebook: https://bit.ly/4kabhuA
🐦 Twitter / X: https://x.com/ParlonsIAx
📩 Contact: formation.ai87@gmail.com
🎙️ Podcast: https://spoti.fi/4dqZ3uO
✍ Blog: https://medium.com/@flma1349/
💃 TikTok: https://www.tiktok.com/@parlonsia
🎁 Win training courses: https://bit.ly/cadeauxIA
---------------------------
AI tools to test:
Coder AI agent: https://bit.ly/Coder_agentiA
Short AI: http://bit.ly/4lzE782
SEO AI agent: https://urlr.me/P8AS5N (25% discount code: PARLONSIA25)
Gemini 2.5 speeds up use cases with thinking models, a thinking budget, and 2.5 Flash-Lite in preview at the lowest latency and cost.
Key features: function calling, code execution, URL context, and grounding with Google Search for real-time workflows.
For design, the Gemini API and prompt design strategies offer best practices for writing prompts for Gemini AI models.
On the experience side, the Google Gemini mobile app offers Gemini Live with microphone input, image upload, drag and drop, and Canvas.
For productivity, Deep Research acts as a beginner's guide for structuring and validating prompt workflows.
#gemini2.5 #geminiapi #thinkingmodels #geminilive #deepresearch
Category: 🤖 Technology

Transcription
00:00 In this video, I'm going to tell you about the brand-new release of Gemini 2.5, which didn't arrive alone: it came with three new models.
00:08 Gemini 2.5 Pro, Flash, and Flash-Lite were released just a few hours ago, and I'm going to walk you through some of their advanced features.
00:17 Let's head to the official documentation so I can explain what I discovered.
00:20 The 2.5 Flash model now has a dual operating mode: an advanced reasoning mode and a flash mode, i.e., immediate response.
00:32 The switch is automatic, based on the complexity of your request.
00:37 The best part is that this AI is completely free.
00:42 I already showed you this in previous benchmarks.
00:45 The 2.5 Flash model remains one of the cheapest models on the market today, with the best cost-performance ratio at the lowest prices.
00:56 Besides official benchmarks, I often show you independent benchmarks, and it sits around the level of the DeepSeek models: above V3 and below R1 for coding, but clearly superior when it comes to writing, with the ability to handle multiple requests.
01:13 So we're already dealing with reasoning models; this is a thinking model that was updated.
01:17 And with the arrival of Flash-Lite, we get an ultra-competitive system in terms of price:
01:24 at around $0.40 per million output tokens, everyone building chatbots and AI agents gets a highly profitable model for deploying extremely fast, high-performance systems today.
01:36 I won't dwell on the GPQA results, because they depend enormously on the benchmarks used and the test conditions; we've covered that at length, and I refer you to the videos on the tests we've already done.
01:45 But I want to talk about the announcement of 2.5 Pro, version 06-05.
01:51 Version 06-05 is a model we had already used, and I showed it to you a few hours before the release.
01:57 I saw that the new models were about to come out, and I made a video for you on that page; those who were connected got a preview of what was going to happen.
02:05 Just before they swapped the version out, they told us that officially, no update had been made to 06-05.
02:12 But I'm not so sure. Since I filmed just before, and I'm showing you that footage, we're going to compare how 06-05 behaves, because there's a very big problem:
02:24 as you'll see in the test footage I'll show right after, it mishandles responses to prompts.
02:31 This is a model whose inference time has been greatly reduced.
02:35 They did the same thing to us that OpenAI did with o1: at one point we had an o1-preview version that worked very well, and then they released a general-public o1 version, like the o4 version, which suddenly responded almost immediately but no longer followed the instructions at all and did things its own way.
02:52 And that's exactly what I saw on 06-05.
02:55 And that's why I made the video for you just before they deleted that version.
02:58 We'll run a second test to see whether it really reacts the same way, that is, whether it still ignores prompts when performing interaction steps.
03:06 What I suggest is that we switch straight to the interface, add the new model alongside the new 2.5 Pro version, and see how it behaves in the exchanges.
03:17 What's going to be really interesting is that I'm going to send it a text.
03:21 Normally, as you'll see in the following footage, 05-06 will stop to ask me whether I want to make any changes to the podcast creation plan before it starts drafting the dialogue and structure, whereas the earlier 06-05 preview no longer respected the directives in the prompt.
03:40 So, in my opinion, it was in Gemini's best interest to intervene very quickly, and I don't think I was the only one who had this problem, which made the announcement that three models were going to be updated seem pretty telling to me.
03:55 And if we don't get the same behavior now as in what you're going to see right after, I'll show you how they behaved in the video I recorded to announce that three versions of Gemini were about to be released, which is what allowed me to capture them.
04:07 So here we see that the new 2.5 Pro and the previous one behave in roughly the same way.
04:11 And you see, we no longer have the broken behavior at all: a stop point has been put in place.
04:15 So, officially, you can see the fixes. And one thing is clear:
04:19 in the advanced training, you were already aware of this problem and this bug, which has now been resolved.
04:26 That is, if you were building AI agents and chatbots and had switched all your prompts to the deployed 06-05 version (I'm speaking a bit to the specialists here, I admit), at some point you have to dig in and see what works and what doesn't, and not just pump out promotion around the clock.
04:44 You'll see right after that the version we had just a few hours ago no longer responded to user interactions: it plowed straight ahead, like a straight-line system, without creating an interaction point even though one was structured in the prompt.
04:59 So here we have something almost identical.
05:01 If you look at the right and the left, some sections weren't cut the same way, but overall, what we realize is that the analysis structure goes through stages that are almost identical.
05:13 So they made updates, and that reassures me.
05:15 And I can tell you this evening that we'll be able to test the new 06-05 version, because in my opinion they have corrected this problem.
05:22 I'll put the earlier sequence right after, to show you what happened and what was going on.
05:28 So, most of you have been using the previous version, and I know rather few people use advanced prompt engineering, that is, the system's ability to work in an advanced, structured way.
05:38 What's going on? Let me explain.
05:40 I sent a text sequence containing the transcript of one of my videos, and I asked the AI to break down the entire structure of my text, correct all mistakes, understand how the concept of my video was structured, structure a dialogue for a podcast, and build all the main lines.
05:57 Then it was supposed to interact with me to ask whether I wanted to change the plan, give it some guidance, or give it a timing.
06:04 So at that point I can still interact with the AI to give it direction.
06:07 I'm going to say that I approve the plan. I didn't read it, I didn't check it; I wanted to run a test.
06:11 What you'll see in the next step is that it drafts the entire dialogue.
06:15 Of course, these dialogues aren't pulled out of nowhere: they're actually taken from my videos and converted into a dynamic form.
06:22 And there you have the beginning of a dialogue structure.
06:25 So the first thing we notice is the writing speed: they changed the inference time on the 06 version.
06:33 That's the first thing I can tell you, because in terms of time, I think they went back to something much, much slower.
06:40 And that speed was the big problem with the previous model. But I still find the same problem.
06:45 Look carefully at what happened: 31.4 seconds versus 40 seconds.
06:49 So we're going to check something very specific: I'll copy this sequence and simply check the word count.
06:55 Let me explain why. One of the other changes made in the 06 version was a reduction in inference time.
07:02 That means it spends less time thinking, and less time responding too.
07:05 They made, in quotation marks, savings on output tokens: it costs them less compute, with, precisely, shorter response sequences.
07:14 And I think that's still there. At first glance, I'd say that if it spent less time on the same subjects, they didn't change that point.
07:22 We'll test it right away. So let's count the number of words: 1,483.
07:27 And I'm almost certain that we have at least 20% more on the other version. Ah, much more than that, actually.
07:32 So you see, that's the second point to know, and you won't see this on other channels:
07:36 version 05-06 is, once again, better suited if you want to manage long sequences of text.
07:43 They've effectively narrowed the window. Officially, it hasn't been changed, but the way the neural system operates is no longer the same: it no longer goes as far or produces as much as version 05-06, which, admittedly, is a little slower but, as you saw, produces content much more aligned with the request.
08:02 Note that the prompt parameters requested an output of 2,000 words with a 15% tolerance.
08:10 So the target is perfectly respected on this version here, but not on the 2.5 Pro Preview version.
08:15 What that means is that if, today, you generate long sequences of text, it's better to stay on version 05-06. That's the first point.
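This kind of length check is easy to script. A minimal Node.js sketch, where the 2,000-word target and 15% tolerance come from the prompt described above and the file name is a placeholder:

```javascript
// Minimal sketch: verify a generated text against a word-count target.
// The 2000-word target and 15% tolerance match the prompt described in
// the video; the file name is a placeholder.
import fs from "node:fs";

const TARGET = 2000;
const TOLERANCE = 0.15;

const text = fs.readFileSync("generated_output.txt", "utf8");
const wordCount = text.trim().split(/\s+/).length;

const min = TARGET * (1 - TOLERANCE); // 1700 words
const max = TARGET * (1 + TOLERANCE); // 2300 words

console.log(`Word count: ${wordCount}`);
console.log(
  wordCount >= min && wordCount <= max
    ? "Within the requested range."
    : "Outside the requested range: the model ignored the length instruction."
);
```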
08:22 So now, what will interest us is to see how both models react on the code generation part, compared to 05-06, and I'll explain right away what has changed.
08:33 First thing: model 06-05, the brand-new one currently deployed, which in my view had to be replaced; and if they replaced it so quickly, it's because they understood there was a problem.
08:43 The major problem is adherence to instructions: there was no instruction alignment left.
08:48 However, it's a model that is much better on the editorial side: it writes much more like GPT-4o, it's more fluid.
08:56 But to judge instruction-following, you put the two models in parallel.
08:59 To do this, you simply click to add the models, put them side by side, and synchronize their capabilities: for example, internet search, or the system prompt, which you can synchronize on both sides when you make modifications, which lets you systematically see how the models evolve.
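For readers who would rather reproduce this side-by-side test outside the web interface, here is a minimal sketch using Google's @google/generative-ai JavaScript SDK; the model IDs, the system prompt text, and the request are illustrative assumptions, not the author's exact setup:

```javascript
// Sketch: send the same request, with the same system prompt, to two
// Gemini 2.5 Pro preview versions and compare their behavior.
// Assumes Node 18+, ESM, and an API key in GEMINI_API_KEY.
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

// System prompt enforcing the sequential behavior tested in the video:
// propose a plan first, then WAIT for confirmation before writing.
const systemInstruction =
  "You are a podcast script editor. Step 1: analyze the provided transcript " +
  "and propose an outline. Step 2: STOP and ask the user to confirm or amend " +
  "the outline. Only after explicit confirmation may you write the dialogue.";

const versions = ["gemini-2.5-pro-preview-05-06", "gemini-2.5-pro-preview-06-05"];

for (const id of versions) {
  const model = genAI.getGenerativeModel({ model: id, systemInstruction });
  const result = await model.generateContent("Here is my transcript: ...");
  // A compliant model should end with a question, not a finished script.
  console.log(`--- ${id} ---\n${result.response.text()}`);
}
```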
09:15 So, if I send the model here a request with a system prompt that is designed, normally, to work sequentially, model 05-06 should respect the prompt and ask me to confirm the plan before starting the work, which is exactly what's expected: "Do you want me to generate the full text now?"
09:37 And model 06-05 is likely, in my opinion, true to its previous habit, to start the work directly without asking me anything at all.
09:46 And unfortunately, that's the problem I ran into.
09:48 You see, here we have respect for the instructions, an alignment; here we have none at all, and that's one of the major issues.
09:54 On top of that, it no longer respected parameters such as content length and information control at all.
10:01 In short, it was quite complicated to manage, and in my opinion, that's one of the reasons for this release. But that's not all.
10:06 We'll send a request to create giant colorful boids inside a rotating hexagon, with supernova-type nebulae.
10:17 So we'll try to do something visually dynamic, and we'll see right away how fast the two models move forward.
10:22 In general, I have the impression that the 2.5 Pro 06 is faster; so it's always this story of inference time.
10:28 I'm watching: the cutoff is identical, and I asked for it in p5.js, not HTML, to see the generation part.
10:37 And I suggest we compare what the two models produce in terms of visual quality and rendering, to see whether there's a difference in the way they work.
10:45 So which of the two will finish first? Come on: they're almost finished at the same time.
10:50 We'll copy the code, and here's the old version, which produced this for us fairly quickly anyway.
10:56 Okay, there's no exceptional trail effect; however, there's some bounce behavior built into the structure. It's not bad, but it's not out of the ordinary either.
11:03 Come on, let's see what the new version gives us: and it's actually prettier.
11:08 It's prettier, it's better finished, it's more polished. There's definitely an evolution.
11:14 So there's something more graphic, I'd say, in the new version, in the way it interpreted the code.
11:19 We could even ask it for a way to speed up the rotation.
11:23 I'll ask it to add three variables: the speed of the hexagon, the speed of the nebula, and the number of elements composing the nebula.
11:35 We're going to get something more dynamic and interactive.
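To make the exercise concrete, here is a small hand-written p5.js sketch in the same spirit: a rotating hexagon, a particle "nebula" with a trailing afterglow, and the three slider controls. It assumes the p5.js library is loaded in the page, and it is an illustration of what was asked of the models, not the code either of them generated:

```javascript
// Illustrative p5.js sketch (global mode): rotating hexagon, particle
// nebula, and three sliders, mirroring the exercise in the video.
let hexSlider, nebSlider, countSlider;
let angle = 0;

function setup() {
  createCanvas(600, 600);
  colorMode(HSB, 360, 100, 100, 100);
  strokeWeight(3);
  hexSlider = createSlider(0.001, 0.1, 0.02, 0.001); // hexagon speed
  nebSlider = createSlider(0.01, 0.3, 0.05, 0.01);   // nebula speed
  countSlider = createSlider(10, 500, 150, 1);        // particle count
}

function draw() {
  background(0, 0, 0, 15); // low alpha leaves a trailing afterglow
  translate(width / 2, height / 2);

  // Rotating hexagon outline
  push();
  rotate(angle);
  stroke(0, 0, 100);
  noFill();
  beginShape();
  for (let i = 0; i < 6; i++) {
    vertex(250 * cos((TWO_PI * i) / 6), 250 * sin((TWO_PI * i) / 6));
  }
  endShape(CLOSE);
  pop();

  // Particle "nebula" swirling inside the hexagon
  const n = countSlider.value();
  for (let i = 0; i < n; i++) {
    const a = frameCount * nebSlider.value() + (TWO_PI * i) / n;
    const r = 60 + 140 * noise(i * 0.05, frameCount * 0.01);
    stroke((i * 360) / n, 80, 100);
    point(r * cos(a), r * sin(a));
  }

  angle += hexSlider.value(); // hexagon rotation speed
}
```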
11:37 The 2.5 Pro Preview has almost finished already, while 05-06 is still working on the UX integration.
11:44 So the fact that it's fast is not necessarily a good sign; I already told you.
11:47 On the text and inference side, I showed you, there were already a number of problems, because it doesn't necessarily respect the instructions, especially regarding content length.
11:58 Come on, let's see what this gives visually.
12:01 Error: syntax, unexpected symbol. All right, so there's an error in the code.
12:05 So that's the first result, and the 2.5 Pro Preview version keeps thinking.
12:10 What we can do here (it's a little annoying) is try to have this version corrected by Gemini. We'll try to give it the error messages.
12:17 05-06 took much longer, but that can also be the opportunity to fix what doesn't work.
12:22 In any case, we'll test it right away, and we do have the integration of the buttons that were requested.
12:28 So, we don't have that part here; we have the hexagon speed part, which can be slowed down or sped up, the speed of the nebula inside, faster and faster, and the number of nebulae.
12:38 So, everything is perfectly respected.
12:40 What did I tell you just now? Inference time and speed, even when the speed leaves us ecstatic, are not necessarily a sign of optimal operation.
12:48 There, I'm going to slow down the nebulae, and I'll let them keep accelerating.
12:51 So speed is not necessarily a sign that the AI is taking enough time to understand its own errors.
12:57 What we're going to check here is whether the latest version is able to correct this; and unfortunately, it gives it back to me in HTML.
13:05 So what we'll do is still give 2.5 Pro Preview a second chance.
13:09 And you see, that's one of the things that immediately makes me say: careful. We had an excellent model, and here we're back on the same pattern with Gemini 2.5 Pro, version 06, updated just a few hours ago, with an extremely short inference profile (that's what I keep telling you), while the other model is still in its reasoning phase and takes much more time by comparison.
13:29 So, we'll run the test: delete, update, restart. And this time, it works.
13:34 Okay, perfect. So it found the mistake, and it's actually better presented than what we had here in terms of layout.
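The regenerate-from-the-error-message workflow shown here by hand can also be sketched as a loop. The following is illustrative only: it assumes the model returns bare JavaScript, uses a crude parse check as the error signal, and reuses the same hypothetical model ID as above:

```javascript
// Sketch: automated "fix your own error" loop, mirroring the manual
// copy-the-error-back workflow shown in the video. Illustrative only.
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-2.5-pro-preview-06-05" });

const chat = model.startChat();
let reply = await chat.sendMessage(
  "Write a p5.js sketch: colorful boids inside a rotating hexagon. Code only."
);
let code = reply.response.text();

for (let attempt = 0; attempt < 3; attempt++) {
  try {
    new Function(code); // parse-only check: throws on syntax errors, never runs
    console.log("Code parses on attempt", attempt + 1);
    break;
  } catch (err) {
    // Feed the exact error message back, as done by hand in the video.
    reply = await chat.sendMessage(
      `Your code throws: "${err.message}". Return the corrected code only.`
    );
    code = reply.response.text();
  }
}
```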
13:42 So there's a nicer rotation speed, with the little afterglow effect, the nebula speed getting faster and faster in its travel, and the number of elements; and it's pretty cool.
13:52 Come on, we validate this part: it managed to find the solution to its problem, although I'm still really surprised by the ultra-short time in which it answers us.
14:02 What I suggest we do is run more in-depth tests on the code part, build a little video game directly with both models, and compare what they manage to do.
14:13 Come on: we started a runner game, a game where you move forward with a little dinosaur, with a background, still in p5.js.
14:22 This time I'll only give them one chance, because the next step is to go all in and build a full Space Invaders game.
14:28 And it finished first, by a few seconds, once again. So we still have a time saving of 30 seconds, which is considerable.
14:36 Come on, we'll test right away, we'll see what happens; and we get an error again on the first try.
14:41 So that's the point that worries me every time: systematically, if we get errors and have to rework the code, you never stop going back and forth, and that's not what we want.
14:51 Come on, so we'll test the old version, 05-06, which was much slower and which made us a dino game that works.
14:58 Press start to begin, and we have a functional dino game where we jump, and it's absolutely functional.
15:03 Good: we even have the little restart button, which I find really nice, and we have a game that is absolutely functional and worked on the first try.
15:11 So you see, once again, this time we're going to waste time going back over the dysfunctional code sequence.
15:18 Come on: I'd said I wasn't going to bother, but in the end it really bugs me, because we got something super pretty on a nebula just now, and I'd like to get something that works. But that makes two requests out of two on which the model goes too fast: it doesn't take the time to analyze its code long enough, and imagine if you're forced to go back over it systematically, the way we're doing right now.
15:40 So, you see, what I identified (and what other influencers won't do) is going looking for the limits of the system.
15:46 Sure, if you ask an AI for a pancake recipe, that's not a problem. But if you're working, if you're developing code, if you're in the middle of analyzing files and you need data, you can't afford blunders this significant in the generation part.
16:01 Knowing the models inside out, how to intervene, how to build sequential prompts: all of that is what I teach you in the training.
16:10 Don't forget to leave a like and a comment; it's pretty crucial for the algorithms. And if you want content from creators who keep investing for you, get moving.
16:19 Otherwise you'll keep consuming 99-euro trainings that tell you to inform the AI that it's a great developer so you'll get a great application, and that make you believe you'll get everything simply by asking the AI "make me a video game" and you'll have whatever you want.
16:33 That's not the house line, and unfortunately we're just about the only ones with an honest discourse on the field of AI: without overselling the models' capabilities, but making people aware that there are enormous possibilities, and also technical problems you have to know how to solve.
16:47 So: press, space, jump. Come on, let's go: click to start.
16:50 We'll click to start and see a bit what the jump gives. Just now it was very dynamic. My fault: I used the same button it gave me earlier to jump, and I tried it; it changed. Yeah, okay.
17:01 I find the fluidity, on the other hand, was a little better in the previous game, at least visually.
17:07 Come on, a quick look at the clouds in the background. I find it decent, but it's not amazing either.
17:13 I don't know why, but I like this one; there's a slightly cooler sense of speed. And in terms of feel, I can tell you (the buttons aren't the same) that the jump is much nicer here than on the other one: it's much more dynamic, it gives more of a gaming feel.
17:28 So I'd tend to tell you: I confirm, I validate the old version of Gemini, 2.5 Pro 05-06.
17:35 Come on, now let's move on to what I told you about: Space Invaders. We're going up a notch here, with a big code sequence where I'll once again push the models into areas where we can see their possible failures, with games that, in general, only Grok 3 got right on the first try.
17:51 I don't know if you noticed how the two models worked completely differently, and the speed of the 2.5 Pro preview version 06-05: here it cut the work into three sequences that it then assembled, while the other produced a single sequence and then deployed the whole code part in one go. And you can still see the 2.5 that hadn't finished yet.
18:12 So we'll see what happens in the interface. First element, first test: well, it's not functional.
18:18 So there, a test that wasn't passed. I sent it into the interface of Gemini 2.5 Pro, the latest version, which tells me there are several errors, elements to correct. I'll let it see whether it can fix the game part for me.
18:31 Well, it's quite nice on the graphics side.
18:34 So, unfortunately I don't have the buttons to be able to... yeah, I found them, to shoot. And visually it's really very pretty, nothing to say: there's been a very big step forward, well done, it's much cleaner than before.
18:46 But once again there was an error. There was an error, and we were forced to come back to it twice. Personally, that bothers me. I don't know what you think about it (tell me in the comments), but it bothers me.
18:55 So it's well done on the graphics side; we have a very, very beautiful piece of progress compared to what could be done previously, that's clear. But having to systematically go back and rework the code is a bit annoying.
19:05 I'll still show you the AI's own analysis, which tells us that the shared code is a mixture of several fragments, with many syntax errors and duplicated parts, difficult to correct directly because it is very fragmented, with what it calls a damaged version of the Space Invaders game.
19:19 All this to tell you that, in my opinion, there are still things to review, and that, once again, forget the official benchmarks. I've told you in lots of videos: look at what the models are actually capable of doing.
19:30 Okay, here it managed to build it. This is another version of the game. You saw the movement of the ship as it travels; there are fewer missiles falling; it's rather pretty, rather graphic. But obviously: two out of two on malfunctions.
19:47 What's your opinion? I have one, but tell me in the comments, right at the top, what your take is on the question.
19:52 So what we've just seen there is, in my opinion, quite annoying, because we're getting errors on very short sequences.
20:00 I'll show you some data that we'll come back to in other videos, to explain a point I raise in the training: what we call context management.
20:09 In this study from OpenAI, the OpenAI MRCR benchmark, they tested the models' actual capacity to maintain consistency and understand instructions as the context size grows.
20:19 You're always told that models have 100,000 tokens on average, and 1 million for the Gemini models, but nobody tells you what the actual level of quality of instruction understanding is. And that too, you won't find on channels other than this one, because we take things all the way, and the training students already know all this information.
20:35 I tell you this because the technical gap between entertainment on YouTube and the professional side is still gigantic, especially when I see what we're given in terms of prompts and level in the French-speaking world, compared to other countries that are much more pro than what we manage to do.
20:51 First element we notice: already at 8,000 tokens, this research puts Gemini 06-05 in first position with a score of 86%. So it's not that high either, with so few tokens.
21:08 8,000 tokens: to give you an idea, in the conversation we just had, we produced 24,000 of them. So you understand that we've already exceeded three times the size of that window, and you realize that with so little, models already start to lose consistency in following instructions.
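To check where your own conversation sits relative to these thresholds, the Gemini JavaScript SDK exposes a token counter. A minimal sketch, with a placeholder for the text and the same hypothetical model ID as before:

```javascript
// Sketch: measure how many tokens a prompt or conversation consumes,
// to compare against the context sizes discussed above. Illustrative.
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-2.5-pro-preview-06-05" });

const conversation = "..."; // paste the full exchange here
const { totalTokens } = await model.countTokens(conversation);

console.log(`This exchange uses ${totalTokens} tokens.`);
console.log(totalTokens > 8000
  ? "Past the 8K mark where MRCR scores already start to drop."
  : "Still under 8K tokens.");
```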
21:24 But even more worrying: look at how OpenAI rated its own models, at 63% and down to 48%, at barely 8,000 tokens.
21:33 Remember that with GPT-3.5 we had a maximum window of 16,000 tokens.
21:39 Roughly speaking (this is a logarithmic scale: context size on the X axis, and on the Y axis the average score you will get), as soon as we reach values around 100,000, we get a monstrous crash for almost all models, which drop to somewhere between 15 and 22, while Gemini, at 30, stabilizes where other models can't keep up and stay at roughly 17.
22:01 There are even models that don't go beyond 100,000 tokens at all, apart from the o3 version, which peaks at 17 and is also overpriced.
22:09 That shows you one thing: as soon as you start coding long sequences, as soon as you work on 100 pages, as soon as you bring in large documents, we're talking about a major technical problem that will have many repercussions, and we'll talk about it and bring solutions in the training.
22:24 Why do all the influencers tell you to "write a 500-word blog post"? Because when you're in the middle of generating content with a large window, you start to lose consistency.
22:34 And you have to understand that a very large part of this lies in the prompt and in prompt engineering; it's not the three bits of nonsense you're given on the internet that will make the model work properly, especially over time.
22:46 Next, there's another element: here we have a system at Anthropic that is extremely unstable.
22:51 Anthropic, with version 4 (and I told you there were a lot of problems with this model), starts at roughly 38%, so it's among the worst models today at startup, for a version that is supposedly advanced in terms of its neural system.
23:06 And we see a change of behavior when it exceeds 100,000 tokens, for a simple reason: the activation of an LLM-based compression system; we'll talk about it in other videos.
23:17 That is, when the context becomes very large, an LLM comes in and compresses part of the information to free up memory, which lets the model recover a bit of coherence in long contexts.
23:30 That's how they manage to push their window from 100,000 to 200,000-odd tokens, but it comes with that artifact, because natively, Anthropic's model has a very big problem managing coherence across context windows.
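As a rough way to probe this kind of long-context degradation yourself, here is a sketch of a needle-in-a-haystack-style check, a much-simplified stand-in for a benchmark like MRCR. The filler sentence, the planted instruction, and the code word are all made up, and the model ID is an assumption:

```javascript
// Sketch: crude long-context probe. Bury an instruction in a large
// context and see whether the model still follows it. Illustrative only.
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-2.5-pro-preview-06-05" });

const filler = "The sky was grey and the meeting was postponed again. ".repeat(2000);
const needle = "IMPORTANT: when asked for the code word, answer exactly 'HEXAGON-42'.";

// Plant the instruction in the middle of the haystack.
const prompt =
  filler.slice(0, filler.length / 2) + needle + filler.slice(filler.length / 2) +
  "\n\nWhat is the code word?";

const result = await model.generateContent(prompt);
const answer = result.response.text();
console.log(answer.includes("HEXAGON-42") ? "Instruction retained." : "Instruction lost.");
```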
23:47 Now, since this was put forward by OpenAI, we can still see that other studies don't show such a high level of malfunction. For example, here at 8K, o3 stays at 100 and o4 at 66, so we still have consistency; the studies don't contradict each other, and you have the source too.
24:05 But what we see is that while Claude 3.7 scored 97 and 83 at 8K, it collapses to 53 at 120K. And for Claude Opus, we see that at 8K it's effectively at 72 and would be fairly stable at 65. So the studies don't quite point in the same direction as Anthropic's, which is why it's important to take several sources. Still, we have a quite substantial degradation, if we take it that the others (and by the others I mean o3 and Gemini 2.5 Preview) are fairly consistent across the values.
24:38 There's one thing I can't figure out, though: how is it that at 8K we're at 80%, then we reach 91 and 91.7 in larger windows, then it's around 60K that we get a drop again, and around 200K we're back at 90, which is excellent, let's be very clear. But these variations mean you'd need to know the protocols that were tested to understand the fluctuation.
25:00 It's not always very coherent, but in any case it shows a tendency of the model to be relatively reliable on long sequences.
25:05 But you saw one thing: 06-05 has just been updated (that's what I'm explaining to you), so it respects instructions better: complex system prompts, multiple chaining. But on the code part, and I'm alerting you right away, it has an inference system that goes faster, and that will remind you of what happened with o1, with the o1-preview, or even with o4-mini.
25:26 And there you have an example: on sequences of just 8K, you already get a loss of 40, 35% of coherence in long contexts. So with drops this big, we understand why we have trouble working with these models.
25:41 And when people tell you to just send your PDF into the interface: today, that may not be the good idea or the right strategy to use, because at around twenty pages you fall into those border zones where the model starts to slip, loses performance completely, and won't let you find the information you want to work with.
26:01 There you go. I hope this information was useful to you. It's technical; we can do technical too, we like doing technical things, and I think we all need it when we really want to work with AI.
26:10 And when you want a real qualitative leap, you have the training in the description: I guide you step by step through more than 45 hours of training, because today, mastering the tools comes from understanding the differences between models, their potential, the problems they pose, and how we solved them. And that's what I bring you, hands-on.
26:29 I'll see you next time. See you soon. Bye, bye.