Building with OpenAI: What's Ahead

Transcript
00:00I'm really happy to be here in Paris today.
00:04It's great to see the AI energy here in Paris, and it's really great to be back.
00:13I started my career here, and it's great to see how the French ecosystem has developed over the last 15 years.
00:19I'm going to see whether the slides come up, but I can keep going in the meantime.
00:26What I do now at OpenAI is developer experience: it's really focused on making you, developers, builders, and founders, successful building on our platform, and on making it truly frictionless for you to bring AI into your apps.
00:44And before OpenAI, I was a founder myself, like many of you, so I've experienced firsthand what it's like to build with frontier models, and now I'm excited to share some of that magic with you here at VivaTech.
01:00Could I have my notes on the screen?
01:03Thank you.
01:04So, today, I'd like to talk about three things.
01:07First, the AI outlook and how we got to where we are today.
01:11Next, I'd like to touch on GPT-4o, our new flagship model that we launched last week, as you've heard.
01:20And last, I'd like to talk about what's ahead at OpenAI.
01:25And really, if you've been sitting in these panels hearing about AI, I'd like to give you a few demos of interesting things and show how you can use these technologies concretely in your applications today.
01:40First, touching on OpenAI and our mission.
01:44OpenAI's mission is really to create AGI, or Artificial General Intelligence, in a way that benefits humanity.
01:52But we are, first and foremost, an AI research company.
01:56And when we look at what's happening today on the platform and on adoption, we now have 3 million developers building on OpenAI.
02:05And we have the privilege of working with [...].
02:13And in some ways, we're quite surprised to be where we are, because, you know, many companies take years to get to this point, [...] and there is no field that evolves more rapidly than AI.
02:32In fact, ChatGPT is a good example of that.
02:36When we launched ChatGPT in November 2022, it was meant to be a low-key launch [...], but more than 100 million people ended up using ChatGPT, and it has kept growing.
03:07So, it was really fast adoption.
03:09But when people think about OpenAI, they think about ChatGPT.
03:13What's more interesting to remember is that it wasn't the first product that OpenAI built.
03:19The first product was the developer platform and the API.
03:23And GPT-3 gave us an early glimpse of what LLMs could do.
03:29It was the first model that we exposed, in 2020, for developers and builders to integrate into their apps.
03:37For context, and to put things in perspective, AI Dungeon was one of the biggest use cases for OpenAI at the time.
03:45It was a role-play game where you could generate unique stories on the go.
03:50And that's what was happening on the platform back then.
03:55GPT-4, in 2023, opened a completely new era.
04:00It completely changed the set of use cases that people could tackle.
04:06It had access to tools. It had much stronger reasoning.
04:11And, of course, so many founders and builders adopted it in their products.
04:18Sorry, that was a bit of an aside on AI Dungeon; [...].
04:27So, with GPT-4, one of my favorite examples is
04:37a simple way
04:37to have
04:38a unique
04:41playlist
04:42for your mornings.
04:43But let's remember
04:45when we talk
04:46about all of this,
04:46we are still
04:47in the early innings
04:48of this transition,
04:49which represents
04:50a really fundamental shift
04:51in how we build,
04:53implement,
04:54and how we consume software.
04:56So,
04:56our goal
04:57at OpenAI
04:58is to be the best
04:59at serving
05:01platform solutions
05:02so that you,
05:03in turn,
05:03can build
05:04in this transition.
05:08And on the right side,
05:10you can see, like,
05:10GPT-4o
05:11that we just launched
05:12last week,
05:13our new flagship model.
05:15And it has,
05:16now, the ability
05:17to reason
05:17across audio,
05:19text,
05:20and vision
05:20in real-time.
05:22And I expect
05:23that we'll see
05:24a new set
05:24of assistive experiences
05:26emerging
05:26with GPT-4o.
05:28With this native audio
05:29and image capabilities,
05:30you can now have,
05:32really, like,
05:33human-to-computer
05:34interactions
05:34in a way
05:35that we've never
05:35seen before.
05:38And for each
05:39of those waves,
05:40we truly believe
05:41that the startups
05:42are the ones
05:42that drive innovation.
05:44You know, empowering startups
05:45is critical for us.
05:46We have the deep conviction
05:48that the startups
05:49are the ones
05:49that are always going
05:50to be pushing the envelope
05:51and inventing
05:52the AI-native
05:54form factors
05:54for products.
05:55And the key categories
05:56that you see
05:57on the screen,
05:58you know, customer service,
05:59knowledge assistants,
06:00agents, and so many more
06:01are going to be
06:02where we see
06:03the most reinvention.
06:06And in fact,
06:07you can also see
06:08many logos here
06:09with a ton of innovation
06:11happening in Europe.
06:12And we expect
06:13that several products
06:14actually will most likely
06:16come out of here
06:17in France
06:18and ultimately
06:19become category leaders
06:21as well.
06:21And this is really
06:22one of the most
06:23exciting times
06:24for builders
06:25to be building
06:26an AI-native company.
06:29So next,
06:30let's dive into
06:30our new model
06:32that we launched
06:32last week,
06:33GPT-4o.
06:37GPT-4o is our new
06:38flagship model,
06:39as I mentioned.
06:40It allows you to build
06:41the most natural
06:43human-to-computer interfaces.
06:44There are three
06:45key step functions
06:46that I'd like to highlight
06:47today.
06:47The first one
06:49is that
06:51it's really like
06:52multi-modal
06:53at ultra-fast latency.
06:55and I think that's something
06:56truly unique
06:57that we've never been
06:57able to achieve before.
06:59For those of you
07:00who have used
07:00voice mode in ChatGPT
07:02before,
07:02or maybe some developers
07:03in this room
07:04who have built
07:05voice experiences
07:06in the past,
07:07you might remember
07:08that you had to stitch
07:09together multiple models
07:10to do transcription,
07:11to do reasoning,
07:12and finally to do text-to-speech.
07:14And that, of course,
07:15in turn,
07:16added some latency
07:17and a little bit
07:17of a loss of context.
07:19But now,
07:20with GPT-4o,
07:21a true omni-model
07:22as we call it,
07:23it's the same model
07:24that handles
07:25all of these steps.
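To make that contrast concrete, here is a minimal sketch of the older stitched pipeline using the OpenAI Python SDK; the file names and prompt are placeholders, and this illustrates the three-step approach described above rather than OpenAI's internal implementation.

```python
# Sketch of the older "stitched" voice pipeline: transcribe, reason, then speak.
# Assumes OPENAI_API_KEY is set and a local sample file "speech.mp3" exists.
from openai import OpenAI

client = OpenAI()

# 1) Speech-to-text: transcribe the user's audio.
with open("speech.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)

# 2) Reasoning: answer the transcribed request with a chat model.
reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": transcript.text}],
)
answer = reply.choices[0].message.content

# 3) Text-to-speech: turn the answer back into audio.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=answer)
with open("reply.mp3", "wb") as f:
    f.write(speech.read())
```

Each hop adds latency and drops context such as tone, which is exactly the loss the omni-model avoids.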
07:28And when you look
07:29at this chart,
07:30GPT-4o is really
07:31our new state-of-the-art
07:33reasoning model, too,
07:34with performance improvements
07:35in chat and coding.
07:37This leaderboard
07:38is managed
07:38by LMSYS
07:39where users actually
07:40try in the blind,
07:42essentially,
07:42two models side-by-side
07:43to see what they prefer
07:44and without knowing
07:46what model
07:47is underlying them.
07:49And you can see
07:49GPT-4o on the far left
07:51is really in a class
07:52of its own
07:53when it comes
07:54to those reasoning capabilities.
07:58Last thing I want
07:59to call out,
07:59we also are shipping
08:00massive efficiency improvements
08:02in GPT-4o as a model.
08:04And what that means
08:05is that we are also able
08:06to pass those efficiencies
08:08to all of you building
08:09on our platform.
08:10So for developers
08:11using the API,
08:12that means that GPT-4o
08:13is twice as fast
08:15as GPT-4 Turbo was.
08:17It's half the price
08:18of GPT-4 Turbo.
08:20And finally,
08:21it also has 5x
08:23higher rate limits.
08:24So we're getting
08:24closer and closer
08:26to a place
08:28where developers
08:28can really scale
08:29massively
08:30on the platform
08:31compared to GPT-4
08:33Turbo previously.
08:34But really,
08:35the best way
08:35to show this to you
08:36is to do some live demos.
08:38So I'm going to switch
08:39here to my laptop.
08:44Initially,
08:45I wanted to actually
08:46demo a few things
08:47from the iPhone.
08:48But we had a little bit
08:49of an issue here
08:50to get the iPhone
08:51connected to the venue.
08:53So instead,
08:53I'm going to try
08:53to actually demo this
08:55from the Mac app
08:56that you see here
08:56on the screen.
08:57And we'll pull up
08:59voice mode.
09:00So let's try this out.
09:07Hey, ChatGPT.
09:08How are you?
09:09Hey there.
09:10I'm good.
09:10Thanks for asking.
09:11How about you?
09:12I'm great.
09:13I'm on stage here
09:14at Viva Tech.
09:15Can you say a few words
09:16to welcome the attendees?
09:19Absolutely.
09:20Hello, Viva Tech attendees.
09:22It's fantastic
09:23to connect
09:23with so many brilliant minds
09:25and innovators.
09:25I really like how you started,
09:27but could you make it
09:28even more alive
09:29and a bit more excited
09:30as if you were talking
09:31to a large, large audience
09:33of amazing founders?
09:36Absolutely.
09:37Hello, Viva Tech attendees.
09:39Are we ready
09:40to ignite the spark
09:41of innovation
09:41and set the tech...
09:43Okay, ChatGPT.
09:43Thank you.
09:44I really enjoyed that.
09:46That's awesome to hear.
09:47If there's anything else
09:48I can help with
09:49or add some excitement to,
09:50just let me know.
09:51So the first thing
09:52to notice here
09:53in this experience,
09:54like first,
09:55extremely low latency.
09:56It's amazing to talk
09:57to ChatGPT in real time.
09:59The second thing to note
10:01is that I can now
10:02interrupt the model
10:03in real time.
10:03I don't really have to wait
10:05for the entire audio
10:05to finish.
10:06I can just re-engage
10:07in the conversation
10:08whenever I feel like.
10:10And the third thing
10:10that you might have noticed
10:11as well is that
10:12it now understands tones
10:13and can generate tones
10:15as well.
10:15So when I was trying
10:16to get ChatGPT
10:17to be more engaged,
10:18more excited,
10:18that in turn changed
10:20the audio output.
10:22And of course,
10:23you have all of the amazing
10:25reasoning capabilities
10:26behind the scenes,
10:27which means I can use this
10:28for an immense amount
10:30of use cases.
10:31Like for instance,
10:31say real-time translation.
10:33Let's try this.
10:34Hey, ChatGPT.
10:35I'm here at VivaTech,
10:36but I'd like you to act
10:38as a translator.
10:39So can you translate
10:41everything I say in English
10:42back to French
10:43and vice versa?
10:46Salut.
10:46Bien sûr.
10:47Je peux faire ça.
10:48Dis-moi simplement
10:49ce que tu veux dire
10:50et je le traduirai pour toi.
10:52What do you think
10:54is going to be your favorite sport
10:55at the Olympics?
10:57Qu'est-ce que tu prévois?
10:59What will it be your favorite sport
11:01at the Olympics?
11:02A ton avis,
11:03quel sera ton sport
11:04préféré aux Jeux Olympiques?
11:06So see,
11:07you can really use it
11:08in real-time.
11:08And we're still, of course,
11:09making progress
11:10on the French accent here.
11:12But it's really interesting
11:14to see how you can use
11:15these experiences
11:16for education
11:17or any kind of use cases
11:19like travelling
11:20and real-time use case.
11:22But the second thing
11:23that I also wanted to show you
11:25beyond the voice capabilities
11:26was also the vision capabilities.
11:28So I've never done this before
11:30on the Mac app
11:30because I was supposed
11:31to show this to you
11:32on iPhone,
11:35but I'm just going to try it anyway.
11:37So let's go ahead.
11:40Hey, ChatGPT,
11:41I'd love your help
11:41to navigate around the city.
11:43So I'm going to draw something
11:44and I'll show it to you.
11:46Does that sound good?
11:47Sure thing.
11:48Go ahead and draw what you need
11:50and I'll do my best
11:51to help you navigate
11:52around the city.
11:53All right.
11:54Give me a moment.
11:56No rush.
11:57Take your time.
11:58He's patient.
12:01I'm right here
12:02when you're ready.
12:05All right.
12:06So I'm going to turn
12:06on the camera here.
12:09Got it.
12:10Ready when you are.
12:11So what do you see?
12:13Looks like you've drawn
12:14the Eiffel Tower
12:15and the Arc de Triomphe.
12:17Two iconic Parisian landmarks.
12:19How can I help you with them?
12:20That's pretty awesome.
12:21Thank you for understanding
12:23my bad drawings.
12:24Yeah.
12:24So I'd like your help
12:26to navigate there
12:27and I'll show you a map here.
12:31It looks like we are at Porte de Versailles
12:35right here.
12:37got it.
12:38You're starting at Porte de Versailles
12:41to get to the Eiffel Tower.
12:43You can take the Metro line 12 M12
12:45from Porte de Versailles
12:47to Concorde,
12:48then switch to line 8 M8
12:51towards Balard
12:52and get off at Ecole Militaire.
12:54From there,
12:55it's a short walk to the Eiffel Tower.
12:57Awesome. Thank you.
12:58I think I got it sorted.
13:00Thank you so much.
13:00You're welcome.
13:01Have a great time.
13:09So something that's interesting here to note
13:11is this idea of vision capabilities
13:14are in real time.
13:15So I can actually draw something,
13:17show a video
13:18as opposed to just taking a photo like before.
13:20So with just this bad drawing,
13:23ChatGPT was able to figure out
13:24which were the landmarks
13:25and use the subway map
13:26to actually navigate.
13:28But one of my favorite use cases...
13:30Lauren, Paris.
13:31Sounds like you're on stage again.
13:32What's next?
13:33I am.
13:35I'm just gonna...
13:36Yeah, it's muted.
13:36The last thing I wanted to show you
13:38with the Mac app here
13:39is an example
13:40where I love using it personally
13:41as a coding assistant.
13:42So let's try that.
13:44So I'm gonna pull up ChatGPT
13:46on the right side here
13:48and I'm gonna bring up some code
13:51and what I'm gonna do
13:52is I'm gonna unmute ChatGPT
13:54and I'm gonna send ChatGPT some code
13:56to give us a bit of context on this code
13:58and we'll try to fix a bug in real time.
14:01Hey ChatGPT,
14:02so I'd love your help for a coding question.
14:05Sure, I can help with that.
14:07Yeah.
14:07What's your coding question?
14:08So I'm just gonna copy some code over to you.
14:12All right.
14:13Go ahead and share the code when you're ready.
14:15Can you describe, just in one line,
14:18like, what the code looks like?
14:21The code is a React component named Discover
14:24that renders a grid of Discover card components
14:26with data for each card being pulled
14:28from a trips.json file.
14:30Awesome. Yeah, that's right.
14:32Now what I'd like you to do
14:33is to take a look at my screen
14:34that I'm gonna share with you here.
14:37And now if you look at my screen,
14:39what I'd like to change
14:41is that it looks like it's not really responsive
14:43on mobile screens.
14:45See like the content is really squashed.
14:46Can you help me make this page responsive?
14:50Sure, let's make it more responsive.
14:52You can adjust the number of columns in your grid
14:54based on the screen size
14:55using Tailwind CSS's responsive design features.
14:59Here's a quick way to modify your grid layout.
15:02div className grid grid-cols-1,
15:05[...] columns.
15:07That makes up most...
15:08I see.
15:09So I'm actually using Tailwind.
15:10So can you tell me like what should the attributes be?
15:13Absolutely.
15:14For Tailwind,
15:15you can adjust your grid-cols classes
15:17to make the grid responsive.
15:19Here's how you can set it up.
15:21grid-cols will control how many columns you have
15:24based on the current screen size.
15:26So you don't need too many to fill the grid.
15:29Which is a good rule of thumb
15:30for most responsive designs like this.
15:32This sets one column for most screen sizes,
15:35two columns for small...
15:37I see.
15:38So I should just do like small,
15:40I should put one.
15:41Is that right?
15:42And then more as we go?
15:44Exactly.
15:46Okay.
15:47You can adjust the number of columns
15:49based on different screen sizes.
15:51For example,
15:52grid-cols-1 for one column by default
15:55on small screens,
15:56sm:grid-cols-2 for two columns
15:59on small screens and up,
16:00md:grid-cols-3 for three columns
16:02on medium screens and up, and so on.
16:04Tailwind's response...
16:05Okay, I think I got this.
16:06Can you take a look and see if it looks good?
16:10Sure.
16:11Let me take a look.
16:12It looks like the changes you've made
16:14should make your grid more responsive
16:16on different screen sizes.
16:18Perfect.
16:19Awesome.
16:19Thank you so much.
16:19I think we have the app perfectly responsive now.
16:23Thank you.
16:25You're welcome.
16:26I'm glad I could help.
16:27Enjoy your responsive app.
16:31And one thing I wanted to call out here
16:33is not only can we engage,
16:35practically dive into a problem,
16:36like a coding problem,
16:37and you can copy code,
16:38you can actually look in real time at the screen,
16:40but what's also interesting to call out
16:42is the speed of GPT-4o,
16:44even for text, right?
16:45Like, when we have this new playground
16:47where you can compare side-by-side GPT-4o
16:48to say GPT-4 Turbo,
16:50if I hit Run to create this song about the Olympics,
16:53you can see how much faster it is
16:55to create content versus GPT-4 Turbo.
16:58So it's a very high reasoning ability
17:01at, like, more than twice the speed.
17:04All right.
17:04So let's go back to the slides now.
17:07I'd like to give you a little bit of a glimpse
17:09as to what we're thinking about
17:10for what's next at OpenAI.
17:15There are four investment areas I'd like to cover.
17:18The first key priority that we have
17:21is textual intelligence.
17:22And our core belief is that
17:24if we increase textual intelligence,
17:25that will unlock transformational value in AI.
17:29And you can see on the screen here,
17:32these are the two major models that we offer today.
17:34GPT-4o, the best model with native multimodality
17:37that we just showed,
17:38and GPT-3.5 Turbo, 10x cheaper,
17:41which is convenient for simple tasks
17:42where what you need is really things like classification
17:45or very simple entity extraction.
17:49And we really expect that the potential
17:51to increase the LLM intelligence remains huge.
17:55And today we think models are pretty great.
17:57You know, they're kind of like first or second graders.
18:01They respond appropriately,
18:02but they still make some mistakes every now and then.
18:05But the cool thing that we should remind ourselves
18:07is that those models are the dumbest they'll ever be.
18:10You know, they may become master students
18:12in the blink of an eye.
18:13They will excel at medical research or scientific reasoning.
18:18And we think that within maybe a year or two from now,
18:21like the models will be unrecognizable from what they are today.
18:24And so this year we also plan to push that boundary even more.
18:27And we expect our next frontier model to come
18:30and provide like a step function in reasoning improvements as well.
18:36The second investment area for us is to make sure the models are cheaper
18:40and faster all the time.
18:41And we know that not every use case requires the highest level of intelligence.
18:45And so that's why we want to make sure that we invest.
18:49And you can see here on the screen the GPT-4 pricing
18:52and how much it's decreased by like 80% in just a year.
18:56It's quite unique, by the way, for a new technology to decrease in price so quickly.
19:02But we think it's like really critical in order for all of you to build
19:06and reach scale with what you're trying to accomplish
19:09and innovate with your AI native products.
19:13The second thing to call out is like we also want to make sure
19:16that we release models of different sizes to address like all of your needs
19:21when you want to scale.
19:22You know, we know that some agentic workflows, for instance,
19:25may always push the reasoning needs at the frontier,
19:28but they may also need for sub-processes some smaller models
19:33that are more latency sensitive, for instance.
19:35So we don't have any timelines to share on this today,
19:38but just wanted to share this is something that we are very excited about investing in.
19:42And finally, we also want to help you run async workloads.
19:45Just a few weeks ago, we introduced the Batch API.
19:47And the Batch API is a very convenient way
19:50to send all of your requests to OpenAI in bulk.
19:53And what that means is that you will be able to have another 50% discount
19:58for the tasks that don't have the need for an immediate response.
20:01So it's perfect, for instance, if you want to send like a lot of pictures and images
20:05to analyze with vision or like, you know, lots of documents, for instance,
20:10at scale, really convenient to use this Batch API.
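As a rough sketch of what that looks like in the Python SDK (the file name is a placeholder, and the discount applies to the endpoints the Batch API supports):

```python
# Sketch: submit a bulk job with the Batch API.
# "requests.jsonl" is a placeholder file where each line is one request, e.g.:
# {"custom_id": "doc-1", "method": "POST", "url": "/v1/chat/completions",
#  "body": {"model": "gpt-4o", "messages": [{"role": "user", "content": "Summarize: ..."}]}}
from openai import OpenAI

client = OpenAI()

batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # results come back within this window at the reduced rate
)
print(batch.id, batch.status)  # poll client.batches.retrieve(batch.id) until completed
```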
20:15Third, we are very much investing in model customization.
20:19And in the future, we think that every organization will have a different approach to that,
20:24but they'll want to have a model that really knows their business in depth.
20:28And so for that, we're going to have a range of fine-tuning offerings.
20:32And the first one is simply the fine-tuning API that any of you can start using today.
20:38It's completely self-serve.
20:40And this is a way to kind of bring, you know, data on top of the models
20:45to fine-tune some parameters and make it, you know, adapted behavior for your particular use case.
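A minimal sketch of that self-serve flow with the Python SDK, assuming you have prepared a JSONL file of chat-formatted training examples (the file name and base model below are illustrative):

```python
# Sketch: start a self-serve fine-tuning job.
from openai import OpenAI

client = OpenAI()

# Upload chat-formatted training examples (placeholder file name).
training_file = client.files.create(
    file=open("training_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Kick off the job on a fine-tunable base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)  # the fine-tuned model id appears once the job succeeds
```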
20:53The second one is assisted fine-tuning, when our teams can work with yours
20:58in order to kind of think about how we can get even higher performances for very specific tasks.
21:05And our team of experts will assist you in doing that.
21:09And last but not least, there's also the ability to train a custom model entirely with OpenAI.
21:14And these models are trained completely, you know, and safely using proprietary data
21:20just for a company that we work with.
21:22And one example I'd like to highlight for this, two examples actually to highlight for this.
21:28The first one is from SK Telecom.
21:30They were able to do, like, the assisted fine-tuning technique,
21:33and that led into, like, an increase in model performance across the board
21:38for very much, like, you know, Korean operations and telco businesses.
21:44And the second one that I'm pretty excited about also is a startup called Harvey in the US.
21:49And they are essentially building a product for lawyers.
21:52And what they've done is, like, a completely custom model to be an expert at US case law.
21:58And they've achieved, like, an 83% increase in factual responses.
22:03And 97% of their users immediately preferred their custom model for that particular use case.
22:09Last but not least, we'll continue to invest in enabling agents as well.
22:13And we want to make sure that you can actually build assistive experiences and agents that use all of the
22:19modalities we've shown.
22:24We really believe that, like, in the future,
22:27agents may be the biggest change that will happen to software and how we interact with computers.
22:32And depending on the task, they'll be able to leverage text.
22:34They'll be able to leverage access to some context and tools.
22:37So, and again, all of these modalities that we mentioned will bring also, like, a fully natural and knowable way
22:46to interact with the software.
22:52One example of this that I personally love is Devin by the team at Cognition.
22:56They built, like, essentially an AI software engineer.
22:59And it's pretty fascinating because it's able to kind of, like, take a complex task.
23:04And it's able to not just write code, but it's able to also understand the task, create, like, tickets, browse
23:10the internet for documentation when it needs to fetch, you know, to fetch new information.
23:17It's able to deploy solutions to create pull requests and so on.
23:21So it's kind of, like, one of those agentic use cases that I really love.
23:24And in fact, like, this tweet from Paul Graham earlier this year kind of caught my eye because he mentioned
23:30or realized that, like, the 22-year-old programmers these days are often as good as the 28-year-old
23:36programmers.
23:37And I think when you reason about, like, how the 22-year-olds are already adopting AI and tools like
23:42Devin, it's no surprise that they're getting more and more productive thanks to AI.
23:48Another agent experience that I think this time is more towards consumer is Presto.
23:52And Presto is letting customers place orders with their voice, so using a voice agent.
24:00And, of course, there's not many drive-thrus here in Europe.
24:03But what I found compelling about this example is that it's really helping a market where there's been a labor
24:10shortage.
24:11And so, in turn, that helps offer not only a great experience, but also let the staff actually focus on
24:18food and serving the customers.
24:21But with that, I'd like to dive into a couple more live demos to illustrate a little bit how you
24:27can build assistive experiences and agents practically today.
24:30Okay. So, our first incarnation of agents for developers is what we call the Assistants API.
24:38And the Assistants API is a complete toolkit that all of you can use in order to bring assistants into
24:44your products.
24:45So, in this case here, I'm building this, like, travel app called Wanderlust.
24:48And you can see, there's, like, a map on the right side, but there's also an assistive experience on the
24:53left side.
24:53And so, this is completely powered by the Assistants API.
24:56So, let's take a quick look.
24:58If I say, top five venues for the Olympics in Paris.
25:04First of all, first thing to note, I don't have to manage any of those.
25:08Oh, let's refresh the app a little bit.
25:10Sounds like we maybe lost network.
25:12Top five venues for the Paris Olympics.
25:19The first thing to note is, like, I don't have to manage that conversation history.
25:23That conversation history is automatically managed by the Assistants API from OpenAI.
25:27And so, I don't have to kind of manage my prompt and so on.
25:31Not sure what's happening here.
25:34Let's take a quick look.
25:35Might have lost some Wi-Fi.
25:38No connection.
25:39Nope.
25:41Let's try it one last time.
25:43Let's go to Rome.
25:47Ah, there we go.
25:49Sounds like the Olympics was bad luck, but sounds like we're back.
25:51So, yeah.
25:52I don't have to actually manage any of those messages.
25:54The conversation history is automatically managed by OpenAI.
25:58The second thing that's really cool to call out here is that, as you could see,
26:02when I started to interact with these messages, the map zoomed automatically.
26:06And that's one of my favorite features when I build agents.
26:09It's called function calling.
26:10And function calling is the ability for all of you to bring knowledge about your unique features in your app
26:17and your unique functions over to the model, in this case, GPT-4o.
26:20So, if I say top five things to see in Rome, let's see what happens here.
26:27In theory, what should pop up here is, once again, an interaction between the text and the map.
26:32Here we go.
26:33So, now, as you can see, as we are talking to the model, it's able to actually pinpoint the map
26:39because it knows that this feature exists.
26:41So, it's really, really cool.
26:43And that's already available as part of the toolkit of the Assistants API.
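A minimal sketch of how a setup like the Wanderlust demo might declare such a function with the Assistants API in the Python SDK; the annotate_map function and its parameters are hypothetical stand-ins for the app's real map feature.

```python
# Sketch: an assistant with a hypothetical annotate_map function tool.
from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    name="Wanderlust Assistant",
    model="gpt-4o",
    instructions="You are a travel assistant. Call annotate_map to pin places you mention.",
    tools=[{
        "type": "function",
        "function": {
            "name": "annotate_map",  # hypothetical app-specific function
            "description": "Pin a location on the map shown in the app.",
            "parameters": {
                "type": "object",
                "properties": {
                    "latitude": {"type": "number"},
                    "longitude": {"type": "number"},
                    "label": {"type": "string"},
                },
                "required": ["latitude", "longitude", "label"],
            },
        },
    }],
)

# Conversation history lives in a thread that OpenAI manages for you.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="Top five things to see in Rome"
)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
# When the run status is "requires_action", read the tool calls from
# run.required_action.submit_tool_outputs.tool_calls, execute annotate_map in
# your app (e.g. zoom the map), then submit the outputs so the run can continue.
```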
26:47Now, another tool I wanted to call out here is knowledge retrieval.
26:51And we know so many of you want to bring, like, factual data into the conversations with models like
26:57GPT-4o.
26:58And usually, you have to build, like, a retrieval stack to do so.
27:01And we've learned from so many developers how complex that can be.
27:04And so, we've made a ton of improvements in our retrieval stack.
27:07And so, I'm going to try to see if I can actually demo this in real time.
27:11So, I actually bought this, like, book to prepare a trip to Italy from Lonely Planet.
27:16It's a pretty comprehensive book.
27:18It has, like, 250 pages.
27:19It's, like, 95 megabytes.
27:21So, I hope the upload is going to work.
27:22I'm taking a bit of a risk here.
27:24But what's happening in real time is, like, as soon as the file will be uploaded,
27:28it will be automatically embedded by the Assistants API so that I don't have to think about any of these
27:34things to do.
27:34I will be able to just start interacting in the conversation and say, based on this book, what's the best
27:42photo spot in Lazio?
27:45So, before I press Enter, I'll show you a quick look at page 126, I believe.
27:52Let's go to page 126.
27:55So, the page 126 talks about Lazio, right?
27:58And so, I'm going to, like, ask the question here, what's the best photo spot in Lazio?
28:04And, as I'm browsing the book, we're noticing here that, like, the photo opportunity was mentioned on page 128,
28:11and it's supposed to be Pitigliano.
28:13And, boom, in real time, we were able to find in this book that this is exactly the place for
28:21a photo spot.
28:22And, again, I had to do no engineering work.
28:24I just had to upload the file in the conversation that was all taken care of for me.
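For developers who want to try the same thing, here is a rough sketch assuming the current Assistants API file_search tool; the file name and question are placeholders, and the exact tool names may differ across API versions.

```python
# Sketch: attach a document to a message so the assistant can search it.
from openai import OpenAI

client = OpenAI()

guide = client.files.create(file=open("italy_guide.pdf", "rb"), purpose="assistants")

assistant = client.beta.assistants.create(
    name="Trip Planner",
    model="gpt-4o",
    tools=[{"type": "file_search"}],  # built-in retrieval over uploaded files
)

thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Based on this book, what's the best photo spot in Lazio?",
    attachments=[{"file_id": guide.id, "tools": [{"type": "file_search"}]}],
)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
# Poll the run until it completes, then read the assistant's reply from the thread.
```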
28:29Last but not least, there's also another tool that I wanted to highlight called Code Interpreter.
28:33And Code Interpreter is this ability to write Python code in the background to answer some very precise questions,
28:39usually around, like, numbers and math and financial data.
28:43So, here, for instance, if I were to say in this conversation, we are sharing an Airbnb for four.
28:54It's 1,200 euros.
28:58What's my share plus my flight cost of, let's say, 260?
29:06Now, by asking this question, this is not a typical thing that LLMs do great at by default, right?
29:11But what's happening behind the scenes is that we're actually computing all of this, including currency conversion and so on,
29:18by writing code in the sandbox.
29:20And, once again, as a developer, I have nothing to do.
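Enabling that behavior is a one-line tool declaration on the assistant; here is a minimal sketch, with a question mirroring the one from the demo.

```python
# Sketch: an assistant that can write and run Python to answer numeric questions.
from openai import OpenAI

client = OpenAI()

budget_assistant = client.beta.assistants.create(
    name="Trip Budget Helper",
    model="gpt-4o",
    tools=[{"type": "code_interpreter"}],  # sandboxed Python execution
)

thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="We are sharing an Airbnb for four at 1,200 euros. "
            "What's my share plus my flight cost of 260?",
)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=budget_assistant.id)
# The generated Python and its output are visible on the run steps and in the dashboard.
```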
29:22But because OpenAI is managing this does not mean it's a black box.
29:26In fact, if I go here and if we refresh the threads,
29:30we should see here that this is the exact threads that we've been, you know, feeding.
29:34And you can see we're going to Rome.
29:37Like, all of the messages, we see the function calls that I highlighted to annotate the map.
29:41And here, this is the Python code that was written behind the scenes to actually answer the question, you know,
29:47compute the currency conversion, divide by the number of people, and so on.
29:50So, really, like, the Assistants API is a complete toolkit with conversation history, with access to retrieval and files.
29:57You can upload, now, up to 10,000 files in retrieval, and even code interpreter and function calling.
30:03All of this, you can build on from day one.
30:06Now, last but not least, I wanted to kind of show you another kind of demo that goes beyond just
30:11conversation.
30:12So, using the Mac app from ChatGPT, we've seen audio conversations.
30:17And this is also going to be coming in the next few weeks in the API.
30:21We've also seen the Assistants API, available today for text-based conversation, that has vision support as well.
30:27But I wanted to do something a little different with our multi-modality.
30:31So, I'm sure most of you have seen the preview of Sora.
30:35And Sora is our diffusion model that's able to generate videos from a simple prompt.
30:40And at this stage, it's still a preview, but I wanted to show you what the model can create with
30:46a prompt like this one.
30:49And this prompt talks about Paris during the Expo Universelle in a colorized vintage film.
30:55So, this is the only part of the demo that's not live because, in the interest of time, I previously
30:59sent this prompt to Sora.
31:01This is what came out of it.
31:07So, this is all directly coming out of the model directly.
31:15No edits.
31:16The only edit we made was actually just to add some sound on top of it.
31:20But this is really magical to experience, these frames.
31:25But now, what if we want to take this video, but do something quite interesting with it?
31:29What if we want to create a script to narrate what's happening on those visuals?
31:34So, the first step here is that I'm going to slice a few frames out of this video for a
31:38minute.
31:39So, here we are. We have six frames.
31:40And what I'm going to do now is I'm going to send these six frames to GPT-4o.
31:45And in real time, we'll get a story about what is happening in those frames.
31:52So, to give you a sneak peek behind the scenes, this is the code I'm going to be running.
31:56So, I'm going to be saying, hey, you are a history professor.
31:58You'll be given a series of images, and I would like you to narrate those.
32:02And I'm simply taking all of the images that we are showing on screen,
32:08sending that to GPT-4o and streaming the responses back.
32:11So, again, this is happening in real time.
32:13So, let's see what GPT-4o will come up with for that story.
32:18And here we go.
32:20So, that's a story completely generated on the fly using our vision capabilities in GPT-4o.
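A sketch of what that script might look like with the Python SDK; the frame file names and the professor prompt are stand-ins for the ones used on stage.

```python
# Sketch: send a handful of video frames to GPT-4o and stream back a narration.
import base64
from openai import OpenAI

client = OpenAI()

def to_data_url(path: str) -> str:
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

frames = [to_data_url(f"frame_{i}.jpg") for i in range(6)]  # placeholder file names

stream = client.chat.completions.create(
    model="gpt-4o",
    stream=True,
    messages=[
        {"role": "system",
         "content": "You are a history professor. Narrate the series of images you are given."},
        {"role": "user",
         "content": [{"type": "image_url", "image_url": {"url": url}} for url in frames]},
    ],
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")  # narration streams in as generated
```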
32:27And now, what if I want to go one step ahead?
32:30And what if I would like to share this video, but this time bring it to life with some audio
32:35to share it more broadly?
32:36Well, you know, you can use our text-to-speech voices that we offer in the API.
32:42It's like a handful of voices in the API.
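For those built-in voices, generating narration audio (and a translated variant) might look roughly like this in the Python SDK; note this uses the standard text-to-speech endpoint, not the previewed voice engine, and the script text is a placeholder.

```python
# Sketch: narrate a script with a built-in API voice, then a translated version.
from openai import OpenAI

client = OpenAI()

script = "In the heart of Paris, during the 1889 Exposition Universelle..."  # placeholder

# English narration with one of the standard voices.
speech = client.audio.speech.create(model="tts-1-hd", voice="nova", input=script)
with open("narration_en.mp3", "wb") as f:
    f.write(speech.read())

# Translate the script, then narrate the translation (the voice speaks the input language).
translated = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": f"Translate to French: {script}"}],
).choices[0].message.content

speech_fr = client.audio.speech.create(model="tts-1-hd", voice="nova", input=translated)
with open("narration_fr.mp3", "wb") as f:
    f.write(speech_fr.read())
```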
32:44But a few weeks back, we also previewed our voice engine model.
32:48And the reason why we preview these models as we are doing research is to really engage with all of
32:55the stakeholders
32:56and kind of show what the technology is good at and engage with trusted partners to see and gather feedback
33:02from them along the way.
33:04So, here I wanted to show you a quick preview of what that could look like here for the voice
33:09engine.
33:10So, I'm just going to record a little bit of a sample here of my voice and see what comes
33:14out for the narration.
33:16So, let's take a look.
33:17Hey, so I'm very excited to be on stage here at VivaTech.
33:20I've been meeting some amazing founders and developers already.
33:24I'm very excited as well to show them some live demos and how they can really apply, like, the OpenAI
33:30technology and models in their own products and businesses.
33:34All right, so I think that should be good enough.
33:36Hey, so I'm very excited to be on stage here at VivaTech.
33:38And now, the last step is that I'm going to share this audio sample with the script that we created
33:44over to text-to-speech.
33:47And we'll bring everything together for our modalities to experience this history lesson.
34:00In the heart of Paris, during the 1889 Exposition Universelle, the Eiffel Tower stands proudly as a symbol of...
34:07So, it's now narrating the video that I can share.
34:10And, of course, I don't speak many languages.
34:12But now, if I want to share it, say, not just in French but other languages, I can click through
34:19to be able to share that content more broadly.
34:23So, here in Paris, during the Exposition Universelle de 1889,
34:27the Eiffel Tower stands proudly as a symbol of innovation and vision for the future.
34:33And let's try one last, for instance, Japanese.
34:42It's going to take a second.
34:50This is me now speaking Japanese to share this with an audience in Japan.
34:55And last but not least, I can also add, you know, transcription to add subtitles on top of it.
35:01So, once again, this is very much like a preview.
35:03I wanted to give you a sneak peek.
35:04We take safety extremely seriously with these kind of models and capabilities.
35:09So, that's why we're only giving this to trusted partners at this time.
35:12But I hope, just in general, this inspires you in terms of, like, what all of these modalities will be
35:18able to accomplish
35:19and how you can start thinking about the future when it comes to building your own apps and products.
35:26So, let's go back to our presentation to wrap up.
35:34So, to wrap up, we're focused on four major investment areas, as I mentioned.
35:40We're trying to drive up, you know, textual intelligence.
35:43We're making our models faster and more affordable for all of you to scale.
35:47We're helping you customize the models as well for your needs.
35:51And finally, we're empowering you to build multi-modal agents, as you've seen with the assistance API.
35:59So, if there's one thing I'd like you to take away from this presentation, I'd like to leave you with
36:04this.
36:04Our goal at OpenAI is not for you to spend more with OpenAI.
36:09But, in fact, our goal is for you to build more with OpenAI.
36:13And that's really, like, the takeaway for us.
36:16And we really want you to kind of take this thought with you, you know.
36:21Like, I think that's why we're trying to push the textual intelligence.
36:24That's why we're trying to always drive the cost down for the model.
36:28So, you all can build amazing things for your own users and customers.
36:33We're really dedicated to supporting startups and developers.
36:36We look forward to partnering with as many of you as possible.
36:39One of the top questions that we often receive from developers is, like, how could you prepare?
36:44How could I prepare for future models? Like, what should I expect?
36:47And, you know, we really think there's two strategies to think about.
36:50One is, like, you can bet the models are going to stay roughly the same.
36:54And you can work around GPT-4 and what it does today.
36:57Or second, you can think about how the models are going to be much more capable in the future.
37:03And you can bet on the future of AI.
37:05And we think the latter is much more likely.
37:07So, you know, our recommendation is really to build for that future of AI.
37:13And we know that this is how the most innovative developers can out-execute the slower incumbents.
37:20So, with that, thank you so much.
37:23And we can't wait to see what you build.
37:25Thank you.
37:29Well, man, that was awesome.
37:32Thank you so much to you.
37:34Thank you to ChatGPT.
37:35Thank you to OpenAI for that demo.
37:37We're going to have a quick stage change here.
37:39Now, do not go away because we're going to move straight on with our next session,
37:44which is on personalizing the customer experience with AI.
37:47For those of you who have just seen that, you're going to want to stay for this.
37:50We're going to have our moderator, who is the CTO of Generative AI at Capgemini coming on stage.
37:57It's Andy Vickers with his panel in just a few seconds.
38:01Stay right here for that.
38:02Thank you.
38:05(Applause)