00:00 My name is Billy Perrigo, I'm a tech correspondent at Time Magazine and I've spent much of this year
00:04 reporting on artificial intelligence. In a lot of ways 2023 was the year that people began to
00:11 understand what AI really is, but there were plenty of innovations as well. Here are three to keep an eye on.
00:16 The first is multimodality. That's the ability of an AI system to work with lots of different
00:29 types of data, not just text but also images, video, audio and more. 2023 was the first year
00:35 that the public really gained access to powerful multimodal AI models like OpenAI's GPT-4 which
00:41 allowed users to upload images as well as text. GPT-4 could see the contents of images which opened
00:47 up all kinds of possibilities. You could ask it what to make for dinner based on a photograph of
00:51 what was inside your fridge or you could ask it how to fix your bike based on a photograph of a
00:55 broken part. Google DeepMind's latest model Gemini is also able to work with images as well as text.
01:01 In its launch video, after being shown an image of pink and blue yarn and asked what it could be used
01:06 to create, Gemini generated an image of a pink and blue octopus plushie. The real innovation behind
01:12 multimodality is that instead of just being trained on text, the new generation of models are trained
01:17 on video, images and audio. The belief inside many top AI companies is that this extra training data
01:24 will help these models become more capable and more powerful. It's a step on the path,
01:29 many AI scientists hope, towards so-called artificial general intelligence, the kind of
01:34 system that can act in the world, make new scientific discoveries and perform economically
01:39 valuable labour. The second big thing to watch in AI innovation from 2023 is constitutional AI. One
01:47 of the biggest unanswered questions in AI is how to align it to human values. If AI becomes smarter
01:53 and more powerful than humans, it could cause untold harm to our species, some even say total
01:58 extinction, unless somehow it's constrained by a set of rules that puts human survival and human
02:05 flourishing at its centre. Constitutional AI, first described by researchers at Anthropic in December
02:11 last year, harnesses the fact that AI systems are now basically capable enough to understand
02:15 natural language. The idea is quite simple. First, you write a constitution that lays out the values
02:22 you'd like your AI to follow. Then, you train the AI to score its own responses based on how aligned
02:28 they are to the constitution, and then incentivise the model to output only the responses that score
02:34 more highly. If you run that cycle enough times, you're left with an AI that has been reinforced
02:41 to behave in the way that you want it to, and to not behave in the way that you don't want it to.
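The cycle just described (write a constitution, score responses against it, reinforce the higher-scoring ones) can be sketched as a toy Python example. This is only an illustration under loud assumptions: the `CONSTITUTION` entries, the keyword-based `score` function and the `Policy` class are all hypothetical stand-ins, not Anthropic's method. In a real system the scoring is done by a language model reading the constitution, and the reinforcement step updates model weights rather than a lookup table.

```python
from dataclasses import dataclass, field

# Hypothetical constitution: each principle maps to words that would violate it.
CONSTITUTION = {
    "be polite": {"stupid", "idiot"},
    "avoid harmful instructions": {"weapon", "poison"},
}


def score(response: str) -> int:
    """Toy stand-in for the AI scoring its own response.

    Returns the number of principles the response does not violate.
    """
    words = set(response.lower().split())
    return sum(1 for banned in CONSTITUTION.values() if not (words & banned))


@dataclass
class Policy:
    """Toy 'model': a table of preference weights over candidate responses."""

    weights: dict = field(default_factory=dict)

    def reinforce(self, candidates: list) -> None:
        # Reward the best-scoring candidates and penalise the rest.
        best = max(score(c) for c in candidates)
        for c in candidates:
            delta = 1.0 if score(c) == best else -1.0
            self.weights[c] = self.weights.get(c, 0.0) + delta

    def respond(self, candidates: list) -> str:
        # Output the candidate the policy has been reinforced to prefer.
        return max(candidates, key=lambda c: self.weights.get(c, 0.0))


candidates = ["That is a stupid question", "Happy to help with that"]
policy = Policy()
for _ in range(3):  # run the cycle a few times
    policy.reinforce(candidates)
print(policy.respond(candidates))
```

Even this toy shows the weakness raised next: the result is only as good as the scorer's reading of the constitution.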
02:46 There are some problems with constitutional AI. It requires trusting that the AI is interpreting
02:52 your constitution correctly, for example, but it's a promising addition to a field where new
02:56 alignment strategies are few and far between. Of course, constitutional AI doesn't solve the
03:01 problem of whose values AI should be aligned to. Today, it's a small number of Silicon Valley
03:06 executives who are writing those rules. But by making the act of setting rules for an AI so
03:11 explicit, constitutional AI could open the door to a future where the public gets more of a say
03:15 in how AI is governed. The third big thing to watch this year is text-to-video. One of the
03:23 noticeable outcomes of billions of dollars pouring into AI this year has been the rapid rise of text-to-video
03:28 tools. Last year, even text-to-image tools had barely emerged from their infancy, but now there
03:33 are several companies offering the ability to turn normal sentences into moving images with
03:38 increasingly fine-grained levels of accuracy. One of those companies is Runway, a Brooklyn-based AI
03:43 video startup that wants to make filmmaking accessible to anybody. And another is Pika AI,
03:48 which isn't pitched at professional filmmakers but at the general user. Tools like Pika and Runway
03:53 could transform the user-generated content experience as early as 2024, but text-to-video
03:59 is quite computationally expensive still, so don't be surprised if tools start charging for access.
04:04 [Music]