00:00What if all of the world's biggest problems, from climate change to curing diseases to
00:05disposal of plastic waste, what if they all had the same solution? A solution so tiny it would
00:12be invisible? I'm inclined to believe this is possible thanks to a recent breakthrough that
00:17solved one of the biggest problems of the last century. How to determine the structure of a
00:22protein. It's been described to me as equivalent to Fermat's last theorem, but for biology.
00:28Over six decades, tens of thousands of biologists painstakingly worked out the structure of 150,000
00:34proteins. Then, in just a few years, a team of around 15 determined the structure of 200 million.
00:42That's basically every protein known to exist in nature. So how did they do it? And why does this
00:49have the potential to solve problems way outside the realm of biology?
00:53A protein starts simply as a string of amino acids. Each amino acid has a carbon atom at the center,
01:02then on one side is an amine group, and on the other side is a carboxyl group. And the last thing
01:07it's bonded to could be one of 20 different side chains, and which one determines which of the 20
01:13different amino acids this molecule is. The amine group from one amino acid can react with the
01:20carboxyl group of another to form a peptide bond. So a series of amino acids can bond to form a string,
01:27and pushing and pulling between countless molecules, electrostatic forces, hydrogen bonds,
01:33solvent interactions, can cause this string to coil up and fold onto itself. This ultimately determines the
01:393D structure of the protein. And this shape is the thing that really matters about the protein.
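To make the "string of amino acids" picture concrete, here is a minimal Python sketch (illustrative only: the one-letter residue codes are the standard ones, but the example fragment and helper names are made up) that treats a protein's primary structure as a string over a 20-letter alphabet:

```python
# A protein's primary structure is just a sequence over a 20-letter alphabet.
# The one-letter codes below are the standard ones; the example fragment is invented.

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids

def is_valid_sequence(seq: str) -> bool:
    """Check that every residue is one of the 20 standard amino acids."""
    return all(residue in AMINO_ACIDS for residue in seq.upper())

def peptide_bonds(seq: str) -> int:
    """Each pair of neighbouring residues is joined by one peptide bond."""
    return max(len(seq) - 1, 0)

example = "MVLSPADKTNVKAAW"   # hypothetical 15-residue fragment
print(is_valid_sequence(example), peptide_bonds(example))  # True 14
```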
01:45It's built for a specific purpose, like how hemoglobin has the perfect binding site to carry
01:50around oxygen in your blood. These are machines. They need to be in their correct orientation
01:57in order to work together to move, for example, the proteins in your muscles. They change their shape
02:02a little bit in order to pull and contract. But it would take people a long time to get the structure
02:07of just one protein. Absolutely. So the question of what proteins should look like only really started
02:12to be answered with experimental techniques. The first way protein structure was determined was by creating
02:18a crystal out of that protein. This was then exposed to x-rays to get a diffraction pattern,
02:24and then scientists would work backwards to try to figure out what shape of molecules would create
02:29such a pattern. It took British biochemist John Kendrew 12 years to get the first protein structure.
02:36His target was an oxygen storing protein called myoglobin, an important protein in our hearts.
02:43He first tried a horse heart, but this produced rather small crystals because it didn't have
02:47enough myoglobin. He knew diving mammals would have lots of myoglobin in their muscles,
02:53since they're the best at conserving oxygen. So he obtained a huge chunk of whale meat from Peru.
02:59This finally gave Kendrew large enough crystals to create an x-ray diffraction image.
03:04And when it came out, it looked really weird. People expected something kind of logical,
03:10mathematical, understandable, and it almost looked, I wouldn't say ugly, but intricate and complex,
03:15and kind of like if you see a rocket motor, right, and all the parts hanging off.
03:20This structure, which has been called turd of the century, won Kendrew the 1962 Nobel Prize in Chemistry.
03:26Over the next two decades, only around 100 more structures were resolved. Even today,
03:33protein crystallization remains a big challenge.
03:36Frankly, you know, it is not uncommon that just a couple protein structures can be someone's entire
03:42PhD, sometimes just one, sometimes even just progress toward one.
03:45And it's expensive. X-ray crystallography can cost tens of thousands of dollars per protein.
03:52So scientists sought another way to work out protein structure. It only costs around $100 to
03:57determine a protein's amino acid sequence. So if you could use this to figure out how the protein would
04:02fold, that would save a lot of time, effort, and money.
04:06I kind of know how carbon behaves, and I know how carbon sticks to a sulfur, and how that might,
04:11you know, stick next to a nitrogen. And if these ones are here, then I can imagine this one folding,
04:14making that bond there. So it seems like if you have some sense of basic molecular dynamics,
04:19you might be able to figure out how this protein is going to fold.
04:22One of the few true predictions in biology was actually Linus Pauling looking at just the geometry of
04:28the building blocks of proteins and saying, actually, they should make helices and sheets.
04:32That's what we call secondary structure, the very local kind of twists and turns of the protein.
04:37But beyond helices and sheets, biochemists could not figure out any reliable patterns
04:42that would lead to the final structure of all proteins.
04:46One reason for this is that evolution didn't design proteins from the ground up.
04:50It's kind of like a programmer that doesn't know what they're doing,
04:53and whenever it looked good, they just kept adding that kind of thing. And that's
04:57how you end up with these objects that are both amazing and incredibly complex and hard to
05:01describe. They don't have purpose underneath them in the same way as like a human designed
05:07machine would. To illustrate just how complicated this process can get,
05:12MIT biologist Cyrus Levinthal did a back-of-the-envelope calculation. And he showed that
05:17even a short protein chain with 35 amino acids can fold in an astronomical number of ways.
05:24So even if a computer checked the energy and stability of 30,000 configurations every nanosecond,
05:30it would take 200 times the age of the universe to find the correct structure.
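The quoted figures can be reproduced with a quick back-of-the-envelope calculation of our own. The assumptions below (two backbone torsion angles per residue, three stable conformations per angle) are mine, chosen to be consistent with the numbers in the transcript rather than Levinthal's exact ones:

```python
# Levinthal-style estimate reproducing the figures quoted above.
# Assumptions (mine): 35 residues, 2 backbone torsion angles per residue,
# 3 stable conformations per angle, 30,000 configurations checked per nanosecond.

residues = 35
angles_per_residue = 2
conformations_per_angle = 3
configs = conformations_per_angle ** (residues * angles_per_residue)  # ~2.5e33

checks_per_second = 30_000 * 1e9            # 30,000 per nanosecond
seconds_needed = configs / checks_per_second

age_of_universe_s = 13.8e9 * 365.25 * 24 * 3600
print(f"{configs:.1e} configurations")
print(f"{seconds_needed / age_of_universe_s:.0f} ages of the universe")  # roughly 200
```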
05:38Refusing to give up, University of Maryland professor John Moult started a competition called
05:43CASP in 1994. The challenge was simple, to design a computer model that could take an amino acid
05:50sequence and output its structure. The modelers would not know the correct structure beforehand,
05:56but the output from each model would be compared to the experimentally determined structure.
06:02A perfect match would get a score of 100, but anything over 90 was considered close enough
06:07that the structure was solved. CASP competitors gathered at an old wooden chapel-turned conference
06:13center in Monterey, California. And at any point where a prediction didn't make sense,
06:17they were encouraged to tap their feet as friendly banter. There was a lot of foot tapping.
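For reference, the 0-to-100 score described above is essentially CASP's GDT_TS metric: the average percentage of residues whose predicted positions fall within 1, 2, 4 and 8 angstroms of the experimental ones. A toy sketch (it skips the structural superposition that real CASP scoring performs first, and the coordinates are fake):

```python
import numpy as np

def gdt_ts(predicted: np.ndarray, experimental: np.ndarray) -> float:
    """Toy GDT_TS-style score: average fraction of residues within
    1, 2, 4 and 8 angstroms of their experimental positions, times 100.
    Real CASP scoring first superimposes the two structures; skipped here."""
    distances = np.linalg.norm(predicted - experimental, axis=1)
    fractions = [(distances <= cutoff).mean() for cutoff in (1.0, 2.0, 4.0, 8.0)]
    return 100.0 * float(np.mean(fractions))

# Tiny fake example: 5 residues, small random errors on the prediction.
rng = np.random.default_rng(0)
true_coords = rng.uniform(0, 50, size=(5, 3))
pred_coords = true_coords + rng.normal(0, 1.5, size=(5, 3))
print(round(gdt_ts(pred_coords, true_coords), 1))
```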
06:25In the first year, teams could not achieve scores higher than 40. The early front-runner was an
06:31algorithm called Rosetta, created by University of Washington biologist David Baker. One of his
06:37innovations was to boost computation by pooling together processing power from idle computers in homes,
06:43schools, and libraries that volunteered to install his software called Rosetta at Home.
06:49As part of it, there was a screensaver that showed basically the course of the protein folding
06:54calculation. And then we started getting people writing in saying that they were watching the
06:58screensaver and they thought they could do better than the computer. So Baker had an idea. He created a video game.
07:06The game, called Foldit, set up a protein chain capable of twisting and turning into different
07:13arrangements. But now, instead of the computer making the moves, the game players, the humans,
07:19could make the moves. Within three weeks, more than 50,000 gamers pooled their efforts to decipher
07:24an enzyme that plays a key role in HIV. X-ray crystallography showed their result was correct.
07:30The gamers even got credited as co-authors on the research paper.
07:36Now, one man who played Foldit was a former child chess prodigy named Demis Hassabis. Hassabis had
07:42recently started an AI company called DeepMind. Their AI algorithm AlphaGo made headlines for beating
07:48world champion Lee Sedol at the game of Go. One of AlphaGo's moves, Move 37, shook Sedol to his core. But
07:56Hassabis never forgot about his time as a Foldit gamer.
08:00So, of course, I was fascinated this just from games design perspective. You know,
08:04wouldn't it be amazing if we could mimic the intuition of these gamers who were only,
08:08by the way, of course, amateur biologists?
08:11After returning from Korea, DeepMind researchers had a week-long hackathon where they tried to train
08:16AI to play Foldit. This was the beginning of Hassabis' long-standing goal of using AI to advance science.
08:23He initiated a new project called AlphaFold to solve the protein folding problem.
08:30Meanwhile, at CASP, the quality of prediction from the best performers,
08:33including Rosetta, had plateaued. In fact, the performance went downhill after CASP 8.
08:40The predictions weren't good enough, even with faster computers and a growing number of
08:44structures in the protein databank to train on. DeepMind hoped to change this with AlphaFold.
08:50Its first iteration, AlphaFold 1, was a standard off-the-shelf deep neural network like
08:57the ones used for computer vision at that time. The researchers trained it on lots and lots of
09:02protein structures from the protein databank. As input, AlphaFold took the protein's amino acid
09:08sequence and an important set of clues given by evolution. Evolution is driven by mutations,
09:15changes in the genetic code, which in turn change the amino acids within a given protein sequence.
09:21But as species evolve, proteins need to retain the shape that allows them to perform their specific
09:26function. For instance, hemoglobin looks the same in humans, cats, horses, and basically any mammal.
09:33Evolution says, if it ain't broke, don't fix it. So we can compare sequences of the same protein across
09:39different species in this evolutionary table. Where sequences are similar, it's likely they are
09:45important in the protein's structure and function. But even where the sequences are different, it's
09:50helpful to look at where mutations happen in pairs, because they can identify which amino acids are close
09:57to each other in the final structure. Say two amino acids, a positively charged lysine and a negatively
10:03charged glutamic acid attract and hold each other in the folded protein. Now, if a mutation changes
10:10lysine to a negatively charged amino acid, it would repel glutamic acid and destabilize the whole protein.
10:17Therefore, another mutation must replace glutamic acid with a positively charged amino acid. This is
10:23known as co-evolution. These evolutionary tables were an important input for AlphaFold.
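One simple way to see the co-evolution signal described above is to look for columns of the evolutionary table that mutate together. The toy alignment below is invented, and real pipelines use far larger alignments and more careful statistics, but it captures the idea:

```python
from itertools import combinations

# Toy multiple sequence alignment: the same (invented) short protein in five species.
# Columns 1 and 4 co-vary: whenever the K (lysine) at position 1 becomes E,
# the E (glutamic acid) at position 4 becomes K, as in the example above.
msa = [
    "AKGWE",
    "AKGWE",
    "AEGWK",
    "AEGWK",
    "AKGWE",
]

def co_mutation(msa, i, j):
    """Fraction of sequence pairs in which columns i and j have BOTH mutated."""
    pairs = list(combinations(msa, 2))
    both = sum(1 for a, b in pairs if a[i] != b[i] and a[j] != b[j])
    return both / len(pairs)

n = len(msa[0])
scores = {(i, j): co_mutation(msa, i, j) for i in range(n) for j in range(i + 1, n)}
print(max(scores, key=scores.get))   # (1, 4): the co-evolving pair of positions
```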
10:29As output, instead of directly producing a 3D structure, AlphaFold predicted a simpler 2D
10:37pair representation of that structure. The amino acid sequence is laid out horizontally and vertically.
10:43Whenever two amino acids are close to each other in the final structure,
10:47their corresponding row-column intersection is bright. Distant amino acid pairs are dim.
10:54In addition to distances, the pair representation can also hold information on how amino acid
11:01molecules are twisted within the structure. AlphaFold1 fed the protein sequence and its evolutionary
11:08table into its deep neural network, which it had trained to predict the pair representation.
11:13Once it had this, a separate algorithm folded the amino acid string based on the distance and
11:18torsion constraints. And this was the final protein structure prediction.
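The pair representation is easiest to picture as an L-by-L matrix indexed by residue pairs. Here is a minimal sketch with invented coordinates, showing how closeness in 3D becomes a "bright" entry in the 2D map (AlphaFold predicts this map; the code just computes one from fake coordinates to illustrate it):

```python
import numpy as np

# Minimal sketch of a pair representation: an L-by-L map that is "bright"
# where two residues end up close in space. Coordinates are invented.
coords = np.array([        # fake 3D positions of 6 residues (angstroms)
    [0, 0, 0], [3.8, 0, 0], [7.6, 0, 0],
    [7.6, 3.8, 0], [3.8, 3.8, 0], [0.5, 3.8, 0],
], dtype=float)

# Pairwise distance matrix, shape (6, 6)
diff = coords[:, None, :] - coords[None, :, :]
distances = np.linalg.norm(diff, axis=-1)

contact_map = distances < 5.0      # "bright" entries: residues closer than 5 A
print(contact_map.astype(int))
# Residues 0 and 5 are far apart along the chain but close in space,
# so entry [0, 5] lights up - exactly the kind of clue a folding algorithm can use.
```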
11:24With this framework, AlphaFold entered CASP 13, and it immediately turned heads.
11:31It was the clear winner that year. But it wasn't perfect. Its score of 70 was not
11:38enough to clear the CASP threshold of 90. DeepMind needed to get back to the drawing board to get better results,
11:46so Hassabis recruited John Jumper to lead AlphaFold.
11:50AlphaFold2 was really a system about designing our deep learning, the individual blocks, to be good at
11:56learning about proteins, to have the types of geometric, physical, evolutionary concepts that were needed,
12:02and put them into the middle of the network instead of a process around it. And that was a tremendous
12:05accuracy boost. There were three key steps to get better results with AI. First, maximum compute power.
12:13Here, DeepMind was already better positioned than anybody in the world. It had access to the
12:19enormous computing power of Google, including their tensor processing units. Second, they needed a large and
12:25diverse data set. Is data the biggest roadblock? And why? I think it's too easy to say data's the roadblock,
12:33and we should be careful about it. AlphaFold2 was trained on the exact same data as AlphaFold1,
12:37with much, much better machine learning. So everyone overestimates the data blockage because
12:44it gets less severe with better machine learning. And that was the third key element, better AI
12:51algorithms. Now, AI is not just good at protein folding. It can do all kinds of tasks that no one
12:58likes, from writing emails to answering phone calls. Something I hate is building and maintaining a
13:04website. It's so much work, from optimizing the website for different platforms, finding a good
13:09design so it looks professional, to constantly updating it with new information about the business
13:15as it grows. That's why we partnered with Hostinger, the sponsor of today's video. Hostinger makes it super
13:21easy to build a website for yourself or your business, and with their advanced AI tools, you can
13:26simply describe what you want your website to look like. And in just a few seconds, your personalized
13:32website is up and running. Hostinger is designed to be as easy as possible for beginners and
13:37professionals, so any tweaks you need to make after that are super easy too. Just drag and drop any
13:43pictures or videos you want, where you want them, or just type what you want to say, or have the AI
13:48help you here too if writing isn't your thing. And if you still want that human touch, Hostinger is
13:54always available with 24-7 support if you ever run into any issues. But when you're done building,
13:58in just a few clicks, your website is live. It's all incredibly affordable too, with a domain and
14:04business email included for free. So to take your big idea online today, visit hostinger.com
14:10slash ve, or scan this QR code right here. And when you sign up, remember to use code ve at checkout to
14:17get 10% off your plan. I want to thank Hostinger for sponsoring this part of the video, and now back to
14:22protein folding. As the AlphaFold2 team searched for better algorithms, they turned to the transformer,
14:29that's the T in ChatGPT, and it relies on a concept called attention. In the sentence, the animal didn't
14:36cross the street because it was too tired. Attention recognizes that it refers to animal and not street,
14:43based on the word tired. Attention adds context to any kind of sequential information by breaking it down
14:50into chunks, converting these into numerical representations or embeddings, and making
14:55connections between them. In this case, the words "it" and "animal". 3Blue1Brown has a great series of videos
15:02specifically about transformers and attention. Large language models use attention to predict the
15:08most appropriate word to add to a sentence, but AlphaFold also has sequential information. Not sentences,
15:14but amino acid sequences. And to analyze them, the AlphaFold team built their own version of the
15:20transformer, called an Evoformer. The Evoformer contained two towers, evolutionary information in
15:28the biology tower, and pair representations in the geometry tower. Gone was AlphaFold1's deep neural
15:35network that started with one tower and predicted the other. Instead, AlphaFold2's Evoformer builds each
15:41tower separately. It starts with some initial guesses, evolutionary tables taken from known
15:46data sets as before, and the pair representations based on similar known proteins. And this time,
15:52there's a bridge connecting the two towers that conveys newly found biological and geometry clues
15:58back and forth. In the biology tower, attention applied on a column identifies amino acid sequences
16:04that have been conserved, while along a row it finds amino acid mutations that have occurred together.
16:10Whenever the Evoformer finds two closely linked amino acids in the evolutionary table,
16:15it means they are important to structure, and it sends this information to the geometry tower.
16:20Here, attention is applied to help calculate distances between amino acids.
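Under the hood, the attention used in both towers is a variant of the same scaled dot-product operation found in language models. A generic, minimal sketch, not AlphaFold's actual implementation:

```python
import numpy as np

def attention(queries, keys, values):
    """Generic scaled dot-product attention: each position gathers a weighted
    mix of the other positions' values, with weights from query-key similarity.
    The Evoformer applies variants of this along rows, columns and triangles."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)                      # similarity matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)     # softmax over positions
    return weights @ values

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 8))          # 5 positions (e.g. residues), 8 features each
out = attention(x, x, x)             # self-attention over the 5 positions
print(out.shape)                     # (5, 8)
```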
16:25There's also this thing called triangular attention that got introduced, which is essentially about
16:31letting triplets attend to each other. For each triplet of amino acids, AlphaFold applies the triangle
16:36inequality: the sum of two sides must be greater than the third. This constrains
16:42how far apart these three amino acids can be. This information is used to update the pair representation.
16:49And that helps the model produce a self-consistent picture of the structure.
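The geometric constraint behind triangular attention is just the triangle inequality. The sketch below flags inconsistent triplets in a made-up distance matrix as a hard check; the real network enforces consistency softly, through learned attention updates, rather than like this:

```python
import numpy as np
from itertools import combinations

def inconsistent_triplets(dist):
    """Return triplets (i, j, k) whose pairwise distances violate the triangle
    inequality, i.e. one side is longer than the other two combined."""
    bad = []
    n = dist.shape[0]
    for i, j, k in combinations(range(n), 3):
        a, b, c = dist[i, j], dist[j, k], dist[i, k]
        if a > b + c or b > a + c or c > a + b:
            bad.append((i, j, k))
    return bad

# Fake 4-residue distance matrix with one geometrically impossible triplet:
d = np.array([
    [0.0, 3.0, 10.0, 6.0],
    [3.0, 0.0, 4.0, 5.0],
    [10.0, 4.0, 0.0, 7.0],
    [6.0, 5.0, 7.0, 0.0],
])
print(inconsistent_triplets(d))   # [(0, 1, 2)]: 10 > 3 + 4, so it gets flagged
```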
16:53If the geometry tower finds it's impossible for two amino acids to be close to each other,
16:58then it tells the first tower to ignore their relationship in the evolutionary table.
17:02This exchange of information within the Evoformer is repeated 48 times, until the information within
17:09both towers is refined. The geometrical features learnt by this network are passed on to AlphaFold2's
17:15second main innovation, the structure module.
17:18For each amino acid, we pick three special atoms in the amino acid and say that those define a frame. And
17:24what the network does is it imagines that all the amino acids start out at the origin, and it has to
17:29predict the appropriate translation and rotation to move these frames to where they sit in the real
17:34structure. So that's essentially what the structure module does.
17:36But the thing that sets the structure module apart is what it doesn't do.
17:41Previously, people might have imagined that you would like to encode the fact that this is a chain,
17:46you know, and that, you know, certain residues should sit next to each other.
17:50We don't really explicitly tell AlphaFold that. It's more like we give it a bag of amino acids,
17:56and it's allowed to position each of them separately. And some people have thought that that helps it
18:02to not get stuck in terms of where things should be placed. It doesn't have to always be thinking
18:06about the constraint of these things forming a chain. That's something that emerges naturally later.
18:11That's why live AlphaFold folding videos can show it doing some weirdly non-physical stuff.
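The "frames" idea can be pictured as one rigid rotation-plus-translation per residue. The toy below uses made-up numbers and a plain rotation matrix; the real structure module parameterizes and refines these transforms inside the network:

```python
import numpy as np

# Toy version of the "frames" idea: every residue starts as a small rigid
# group of atoms at the origin, and the prediction task is a rotation and a
# translation that place that frame in the final structure.
# The rotation and translation values below are made up for illustration.

local_frame = np.array([       # three backbone-like atoms near the origin
    [0.0, 0.0, 0.0],           # N
    [1.5, 0.0, 0.0],           # C-alpha
    [2.4, 1.2, 0.0],           # C
])

def place_frame(atoms, rotation, translation):
    """Apply a rigid-body transform: rotate the local atoms, then translate."""
    return atoms @ rotation.T + translation

theta = np.pi / 4                                  # example 45-degree rotation about z
rotation = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])
translation = np.array([10.0, 5.0, -2.0])

print(place_frame(local_frame, rotation, translation))
# Predicting one such (rotation, translation) per residue is, in spirit,
# what the structure module outputs.
```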
18:20The structure module outputs a 3D protein, but it still isn't ready. It's recycled at least three
18:26more times through the Evoformer to gain a deeper understanding of the protein. Only then is the final
18:32prediction made. In December 2020, DeepMind returned to a virtual CASP with AlphaFold 2.
18:41And this time, they did it. I'm going to read an email from John Moult.
18:46Your group has performed amazingly well in CASP 14, both relative to other groups and in absolute model
18:53accuracy. Congratulations on this work. For many proteins, AlphaFold 2 predictions were virtually
19:00indistinguishable from the actual structures. And they finally beat the gold standard score of 90.
19:10For me, having worked on this problem so long, after many, many stops and starts,
19:16suddenly this is a solution. We solved the problem. This gives you such excitement about the way science
19:22works. Over six decades, all of the scientists working around the world on proteins painstakingly
19:28found about 150,000 protein structures. Then, in one fell swoop, AlphaFold came in and unveiled over
19:37200 million of them, nearly all proteins known to exist in nature. In just a few months, AlphaFold
19:45advanced the work of research labs worldwide by several decades. It has directly helped us develop
19:53a vaccine for malaria. It's made possible the breaking down of antibiotic-resistance enzymes, which
19:59makes many life-saving drugs effective again. It's even helped us understand how protein mutations lead to
20:04various diseases, from schizophrenia to cancer. And biologists studying little-known and endangered
20:10species suddenly had access to their proteins and life mechanisms. The AlphaFold 2 paper has been cited over
20:1730,000 times. It has truly made a step-function leap in our understanding of life. John Jumper and Demis
20:25Hassabis were awarded one half of the 2024 Nobel Prize in Chemistry for this breakthrough. The other half
20:31went to David Baker, but not for predicting structures using Rosetta. Instead, it was for designing completely new
20:38proteins from scratch. It was really hard to make brand new proteins that would do things. So that's kind of the
20:43problem that we solved. To do so, he uses the same kind of generative AI that makes art in programs like
20:50DALL-E. You can say draw a picture of a kangaroo riding on a rabbit or something, and it will do that. And so
20:56it's exactly what we did with proteins. His technique, called RFdiffusion, is trained by adding random
21:02noise to a known protein structure. And then the AI has to remove this noise. Once trained in this way, the AI can
21:09be asked to produce proteins for various functions. It's given a random noise input, and the AI figures
21:16out a brand new protein that does what you asked it to do. This work has huge implications. I mean, imagine
21:23you got bitten by a venomous snake. If you're lucky, you'll have access to antivenom prepared by milking
21:29venom from the exact kind of snake, which is then injected into live animals. And the antibodies from that animal
21:36are extracted and refined and then given to you as an antivenom. The trouble is, often people have
21:43allergic reactions to these antibodies from other organisms. But your odds of survival can be a lot
21:48better with the latest synthetic proteins designed in Baker's lab. They've created human compatible
21:53antibodies that can neutralize lethal snake venom. This antivenom could be manufactured in large
21:59quantities and easily transported to the places where it's needed. With these tiny molecular machines,
22:05the possibilities are endless. What are the applications you're most excited about?
22:10I think vaccines are going to be really powerful. We have a number of proteins that are in human
22:14clinical trials for cancer, and we're working on autoimmune disease now. We're really excited about
22:19problems like capturing greenhouse gases. So we're designing enzymes that can fix methane,
22:25break down plastic. What makes this approach so effective is how fast they can create and iterate the
22:31proteins. It's really quite miraculous for anyone who's a conventional old school biochemist or protein
22:37scientist. We can now have designs on the computer, get the amino acid sequence of the designed proteins,
22:43and then in just a couple days, we can get the protein out. Yeah, we've given a name to this,
22:49which is cowboy biochemistry, because we just got to kind of go for it as fast as you can. And it
22:55turns out to work pretty well. What AI has done for proteins is just a hint of what it can do in
23:01other fields and on larger scales. In materials science, for example, DeepMind's GNoME program
23:08has found 2.2 million new crystals, including over 400,000 stable materials that could power future
23:15technologies from superconductors to batteries. AI is creating transformative leaps in science by
23:22helping to solve some of the fundamental problems that have blocked human progress. If you think of
23:26the whole tree of knowledge, you know, there are certain problems where, you know, if they're root
23:30node problems, if you unlock them, if you discover a solution to them, it would unlock a whole new
23:35branch or avenue of discovery. And with this, AI is pushing forward the boundaries of human knowledge
23:42at a rate never seen before. You know, speedups of 2x are nice. They're great. We love them. Speedups of
23:50100,000 times change what you do. You do fundamentally different stuff. And you start to rebuild
23:57your science around the things that got easy. And that's what I'm excited about. These discoveries
24:04represent real step function changes in science. Even if AI doesn't advance beyond where it is today,
24:10we will be reaping the benefits of these breakthroughs for decades. And assuming AI does continue to develop,
24:17well, it will open up opportunities that were previously thought impossible. Whether that's
24:22curing all diseases, creating novel materials, or restoring the environment to a pristine state.
24:29This sounds like an amazing future, as long as the AI doesn't take over and destroy us all first.