In this Boston Dynamics Tech Talk, Zack Jackowski and Alberto Rodriguez discuss the vision behind humanoid robots like Atlas. They explain how research and development pave the way to practical applications in manufacturing.
😇 Your subscription helps us: https://tublo.eu/abonnieren
✅ Source: Boston Dynamics
➡️ More info: https://www.tuningblog.eu/tipps_tuev-dekra-u-co/atlas-die-zukunft-der-robotik-tech-talk-762787/
What does it take for humanoid robots like Atlas to one day work productively in industry? In this Tech Talk, Zack Jackowski (VP Atlas) and Alberto Rodriguez (Director Robot Behavior) give detailed insights into Boston Dynamics' strategy and technology.
The talk covers the development of movement intelligence, control, mobility, and AI needed to take humanoid robots from research project to deployable product. Atlas serves as a platform for exploring the full potential of human forms of movement, with the goal of making robots practical for everyday use and economically viable.
#BostonDynamics #TechTalk #Atlas
#HumanoidRobot #Robotik #AI #Fertigung #Innovation
#tuningblog - the magazine for car tuning and mobility!
Transcript
00:00 Welcome. I'm Zack Jackowski, and this is Alberto Rodriguez, and we're here in the Boston Dynamics
00:08 studio to talk about the humanoid mission in manufacturing for Atlas, and how you build a
00:13 humanoid brain. I'm Zack, the lead for humanoids, or Atlas, at Boston Dynamics.
00:20 I've been at the company for about 15 years. Prior to leading Atlas, I led Spot. You want
00:26 to say a little bit about yourself? Yeah. Hi, Zack. My name is Alberto Rodriguez,
00:31 and I joined Boston Dynamics about four years ago. Before that, I was faculty at MIT, right around the time
00:39 when I got very excited about the prospects of the humanoid journey. So I decided to step down
00:45 and join the mission at Boston Dynamics. I started out focusing on manipulation,
00:51 and now I direct robot behavior on Atlas, where jointly with my colleague, Scott Kuindersma,
01:02 we lead the AI strategy and oversee the implementation of that strategy for Atlas.
01:07 Cool. I've got to ask: in the four years you've been at Boston Dynamics, what has surprised you most
01:14 about the transition from being a professor at MIT to going into startup life?
01:21 Yeah. Maybe the thing that surprised me the most, and I don't know if it should have,
01:27 is that I actually have more day-to-day technical conversations now at BD than I used to have at
01:33 MIT. Not because there are no technical people at MIT, there are many, but I just have more time for it.
01:40 I used to be very busy with teaching and with service and just doing random stuff. And now this is my
01:47 mission, right? So I really enjoy that. That pretty much mirrors my experience coming out of grad school,
01:55 also at MIT, and coming over to Boston Dynamics 11 years ago. When I was in academia,
02:05 well, I was in academia because I loved building robots. And everything that I had to do that
02:13 wasn't building robots, I kind of dreaded: going to class, reading papers,
02:20 writing papers. I did it all so I could just be wrenching on robots and writing code for robots.
02:26 And then going over into industry, you're like, wait, I get to do this all day, every day.
02:33 It's pretty cool. Yep. So our vision for humanoids, essentially, is to build the world's first
02:43 commercially successful humanoid robot, and essentially solve manual labor in the process. Kind of a big
02:50 goal. We have a structured way of getting there. Strategy-wise, we really believe in
02:56 general purpose hardware. So we're building a general purpose robot, a humanoid, with general purpose
03:02 hands. And that robot is going to be able to do just about any task in the world, but we're going
03:08 to start easy. We're going to start in manufacturing, which is a pretty special environment, both because
03:15 it's a great starting place and because it's also crazy interesting. We're also going to build a robot brain.
03:22 You can't really build a humanoid robot hardware generalist without having a humanoid brain
03:27 that goes along with it. And that's going to be the primary topic of conversation today: how
03:32 exactly we're going to build that brain.
03:34 You mentioned a couple of times "general" or "generalist."
03:38 Yep.
03:39 Why general?
03:40 So I've got this video of Hyundai Motor Group Metaplant America (HMGMA) up. How about you tell us a
03:48 little bit about what you see in that plant, and then we'll get into why we need a generalist for it.
03:53 That's one of the many car manufacturing plants we've visited, and HMGMA is one of the newest ones, so
03:59 it has the highest ratio of automation. But despite that, once you visit it, you realize that there are
04:05 still many tasks that are, for the most part, done manually. And the first thing that comes to
04:10 mind when you visit that plant is that those tasks are, on one hand, very hard, very complex. There's
04:16 tons of variability that comes with adapting to the large variety of objects that go into assembling a
04:22 car, and tons of very dexterous tasks, like using tools, doing assemblies, handling small parts. But if you talk
04:30 to manufacturing engineers, they will tell you that the reason they're not automated is not
04:35 how complex they are. They have great solutions, lots of technologies to automate them.
04:39 It's just that it takes too much space, too much engineering, too much cost to automate each one
04:46 of those. There are just too many; there are thousands. So a special purpose solution for each one of them
04:51 is just not economically viable.
04:53 So there's the conventional manufacturing automation world, where if you want to, say,
04:59 bolt a wheel onto a car, you would commission a machine that bolts a wheel onto that car. And that
05:05 machine probably has a general purpose robot arm in it. But it's still a thing where you're going
05:11 to write a specification, "I want my wheel bolt runner machine," and a bunch of manufacturing and
05:17 automation engineers are going to design that machine for you, make a special end effector for
05:22 that arm, and install it on the line. There's going to be a vibratory bolt feeder and all
05:29 the stuff.
05:30 You're going to need tons of specialized programming to make the machines work.
05:36 Yep. And so when you look at a car plant, and we're looking at a bunch of tasks from
05:43 real Hyundai plants here, there are probably tens of thousands of different tasks that need to be
05:49 done in that plant. And the thing that you really rapidly get to is that if you're going to design automation
05:57 for each of those things, you'd be spending way more than it's worth to automate those tasks.
06:05 And you'd probably be well into the next century. An average estimate of how long
06:12 it takes for any one of those integration efforts is on the order of about a year
06:18 and north of a million dollars for any step that you want to automate. That's sort of the state of
06:24 the art today. Let's talk about how the tasks that we see in a normal car plant break down.
06:30 Really simply, I'd say you have material handling tasks, and then you
06:38 have assembly tasks, right? For those material handling tasks, in plant terms
06:45 they would call those things sequencing, kitting, racking: moving material from
06:50 the warehouse to lineside. And then assembly is what we were talking about: attach the wheel to
06:56 the car and put the bolts in, and all that stuff. What are the high points of those tasks to you?
07:02 I think that, while those are clearly different, the level of dexterity that is necessary
07:07 to execute them is different. One is mostly about picking and placing and inserting parts in containers,
07:13 while the other one has to do more with very dexterous part handling. The biggest source of
07:19 complexity is the large variability, which is shared by both, right? Like you were saying, a car has
07:26 tens of thousands of parts, all of them different. And on the same assembly line, you
07:32 might assemble between five and ten different types of cars, each one with its own trim and each
07:39 one with like 10 to 20 different colors. And you turn over the model year every year, so you change
07:45 a bunch of trim parts, interior technology and stuff. And then you do a major model line refresh,
07:50 say, every five years.
07:51 So it's that variability that really puts a ceiling on traditional automation,
07:59 right? The core question is: how do we make something that is more flexible,
08:04 that provides a degree of generality, so that you can automate those tasks?
08:10 Yeah. So how do we make something more general?
08:13 That's the trillion dollar question. Let me maybe backtrack one second before answering how we make
08:22 something more general, and describe how we make something that is more specialized. It's not
08:29 that long ago, right? If you backtrack like four to five years, this is how robots, even humanoids,
08:34 were programmed. You basically have three to four core workflows that are necessary. You
08:42 model your assets: you model the parts that you have to manipulate, you model the environment that
08:48 you're going to operate in. You build a perception system that is capable of recognizing and perceiving
08:55 those parts and those environments. Then you build skills that are capable of handling those parts,
09:01 manipulation skills; in a logistics application, say, that would be picking those parts, or
09:07 extracting or inserting them from the containers they come in. And then you build, on top
09:12 of that, an agent that is capable of stepping sequentially through the task.
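To make that classical pipeline concrete, here is a minimal sketch of those workflows in Python: modeled assets, a perception step, a hand-engineered skill, and a scripted agent. Every name and the stubbed logic are illustrative assumptions, not a Boston Dynamics API:

```python
# Illustrative sketch of the classical, special-purpose pipeline described
# above. All class and function names are hypothetical, not a real robot API.
from dataclasses import dataclass

@dataclass
class Pose:
    x: float  # position only, for brevity; a real system would carry orientation
    y: float
    z: float

def perceive_part(image, part_model) -> Pose:
    # Specialized perception, hand-tuned for this one part and container.
    # (Stubbed: a real system would run model-based registration here.)
    return Pose(0.42, -0.10, 0.35)

def pick_engine_cover(robot, pose: Pose) -> None:
    # Hand-engineered skill: an engineer's intuition encoded as waypoints.
    robot.move_to(Pose(pose.x, pose.y, pose.z + 0.10))  # approach from above
    robot.move_to(pose)
    robot.close_gripper()
    robot.move_to(Pose(pose.x, pose.y, pose.z + 0.20))  # retract with part

def run_task(robot, camera, part_model, place_pose: Pose) -> None:
    # The "agent": a fixed script stepping sequentially through the task.
    pose = perceive_part(camera.capture(), part_model)
    pick_engine_cover(robot, pose)
    robot.move_to(place_pose)
    robot.open_gripper()
```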
09:18 So, yeah. And when you say "build the behavior," you made it sound pretty easy, but
09:24 what is the actual work in doing something like, you know, getting a robot to
09:32 pick? We showed a video of moving engine covers around. What is involved in getting
09:37 a robot to reach in there and grab that engine cover?
09:40 Yeah. In practice, what that means is you have engineers with the right intuition
09:47 to program how the robot should approach those parts, and what's the right way to, let's say,
09:55 position your body with respect to the environment, to be able to reach far enough
10:00 to extract it, to hold the weight. So there's a cycle that has to do
10:06 with using human intuition in the form of expert robot programming, testing, finding failures,
10:15 iterating, and getting that system to high performance. Cool.
10:21 Now, there's obviously an issue with that model, which is scalability, right?
10:28 That's a system that, if you have to deploy it three times, you can do it. If you have to deploy it thousands
10:33 of times, it doesn't scale fast enough. And it's a system that also doesn't improve over time,
10:39 doesn't accumulate experience or knowledge to make the next time easier.
10:44 So that obviously isn't going to work. So: we've built a robot hardware generalist,
10:52 and the world you described there, with a large team of highly credentialed robot engineers
11:00 programming it to do each task, is just, you know, not going to work on its face. So we've decided
11:06 we want to build this robot generalist, or we'll call it a robot brain, because that's essentially
11:14 what it is. How do you build a robot brain? I think the core of it is still
11:19 pretty similar to that cycle of refinement. The core difference is that you don't want to program
11:25 the robot explicitly. Instead, you want to teach the robot how to do a certain task. And as the
11:32 robot is doing it, you want to tell it when it has failed, how to correct for mistakes,
11:37 how to improve its performance over time. So there's a core workflow there, something that
11:43 we refer to as post-training, where basically you demonstrate to the robot how to do something
11:49 a certain number of times. It could take on the order of a day,
11:54 for example, of demonstrations from an expert demonstrator to tell the robot how to do
12:00 that task and how to correct for certain errors that might happen during execution,
12:05 and to accumulate all that knowledge. Now, not in the form of a program or an algorithm, but rather in the
12:11 form of knowledge ingrained in a neural network.
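As a rough illustration of what "knowledge ingrained in a neural network" means mechanically, here is a minimal behavior-cloning sketch in PyTorch: demonstrations go in, and the resulting "program" is just the trained weights. The architecture and hyperparameters are illustrative assumptions, not Atlas's actual recipe:

```python
# Minimal behavior-cloning sketch of the post-training idea described above.
import torch
import torch.nn as nn

class Policy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def post_train(policy: Policy, demos, epochs: int = 10, lr: float = 1e-4):
    """demos: a list of (observation, expert_action) tensor pairs,
    collected over roughly a day of expert demonstrations."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        for obs, expert_action in demos:
            loss = nn.functional.mse_loss(policy(obs), expert_action)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy  # the task knowledge now lives in the weights
```

After training, the task is encoded in the weights rather than in hand-written logic, which is exactly the shift away from expert robot programming the speakers describe.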
12:17 Okay. So that post-training step you just described: you said some number of demonstrations,
12:24 let's say it generously takes a day, to learn how to grab that engine cover and move it to the
12:31 other slot. That's still an awful lot of days to do these tens of thousands of
12:40 different tasks. So what are we going to do about that? The most interesting
12:47 part of this new strategy is what we call pre-training, which basically amounts to
12:55 a system that makes a good initial guess as to what that policy should be, so that once you go to the
13:04 factory and do the first deployment, the behavior is already somewhat good, and it takes a small
13:10 amount of time to improve it. Now, the big question is how you build that system, one that is
13:15 capable of producing a good initial guess, that has accumulated enough common sense and general
13:21 understanding of what it means to move and manipulate objects and do tasks like manufacturing
13:27 tasks.
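A hedged sketch of how that composes with the post-training step sketched earlier: a pre-trained generalist checkpoint supplies the good initial guess, and a short site-specific fine-tune adapts it. The checkpoint file, dimensions, and epoch count are hypothetical:

```python
# Sketch of pre-training + post-training composition; reuses the hypothetical
# Policy class and post_train() from the behavior-cloning sketch above.
import torch

def deploy_to_factory(demos_from_site):
    policy = Policy(obs_dim=128, act_dim=32)        # dimensions are illustrative
    ckpt = torch.load("pretrained_generalist.pt")   # hypothetical checkpoint file
    policy.load_state_dict(ckpt)                    # broad "common sense" prior
    # Behavior is "already somewhat good" out of the box, so a small dataset
    # and a few epochs suffice, instead of a full from-scratch training run.
    return post_train(policy, demos_from_site, epochs=3)
```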
13:34 And what you're describing there isn't some unsupported fantasy. That's essentially how modern LLMs work. This is how something like ChatGPT
13:42 is built. Yeah, exactly the same way. There are core differences because of the complexities around
13:48 actually generating and finding the right data, data capable of providing understanding of visual
13:54 and spatial relationships, dexterous manipulation, and agile whole-body coordination and motion, which is not
14:01 something you find in text or in audio, or even in video that is specialized for robots. But the
14:07 techniques, the technology, are essentially the same. So something like ChatGPT is built on kind of the
14:14 sum of all human knowledge on the internet. We don't have the sum of all human physical
14:22 behavior available somewhere. So how do you get that? There are two parts
14:29 to it. The first one is that we have what I call data stream lanes that generate good data,
14:38 data that is useful for capturing that general understanding. Teleoperation is one of them: scaling up
14:44 teleoperation so that you don't solve one task, but you solve hundreds of them, thousands of them,
14:50 not necessarily during deployment in factories, which is more costly and more difficult, but in a
14:56 dedicated space where you do that. That's one bet, and to some degree, that's what most companies
15:00 are doing today. But then there are other very interesting directions, which I would qualify as
15:06 closer to research efforts today, that provide ways to scale up data generation much faster.
15:14 One of them is reinforcement learning in simulation, where you don't learn necessarily from
15:21 optimal demonstrations, but you let an agent explore by itself, by trial and error, optimizing
15:27 behaviors. Since we can scale simulation to a much larger degree, that can provide a way to
15:33 increase data generation. And the third one is learning directly by observing human behavior. If you design
15:39 a robot that somewhat mimics, or somewhat gets close to, how a human operates, in its form factor,
15:47 in its hands, you can learn some degree of common sense about physical behavior from
15:57 observing humans. And then there's a second way by which we can accumulate knowledge at scale
16:06 in this pre-training phase, which is what we normally refer to as the data flywheel, right? Once
16:12 you've done this for enough tasks and you get to deployments, the data that gets generated by
16:19 those deployments is very rich; it's data that actually represents the environments and the tasks
16:25 where you want your robots to operate. So that generates information, data, experience that
16:32 goes back into pre-training. In a way, that's why it's important, in the short term, to be able to do
16:39 deployments, even without the robot brain yet being perfect.
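The flywheel can be summarized as a loop. A sketch, with `pretrain` and `post_train` standing in for the stages discussed above and every other name illustrative:

```python
# Sketch of the data-flywheel loop: deployment rollouts flow back into the
# pre-training corpus. pretrain/post_train are passed in as callables so the
# sketch stays self-contained; factory objects are hypothetical.
def data_flywheel(pretrain, post_train, corpus, factories, n_cycles: int = 5):
    generalist = None
    for _ in range(n_cycles):
        generalist = pretrain(corpus)                    # good initial guess
        for factory in factories:
            policy = post_train(generalist, factory.demos)  # short adaptation
            rollouts = factory.deploy(policy)            # real tasks, real parts
            corpus.extend(rollouts)                      # richest data there is
    return generalist
```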
16:47 So that's the process we want to execute, and those are the kinds of data inputs we need.
16:53 There's the matter of what kinds of ML models we are training here. There are a lot of
17:00 different things floating around: there are LLMs, there are VLAs, there are VLMs. Do you want to walk
17:07 us through a little bit of a map from how robotics was previously to the world we're headed to?
17:13 Yeah. So there's a spectrum, right? In classical robotics, like four years ago,
17:23 we built specialized models, as I described earlier: for perception, specialized models for
17:30 high-level coordination, specialized models for manipulation, even specialized models for mobility.
17:38 Now we've agreed that maintaining and evolving all of those models, with all that complexity, is
17:45 not a scalable process. On the other end, you have a fully end-to-end solution where robots ingest
17:53 pixels, or raw information from sensors, and at the output produce torques or currents that go
18:01 to the actuators. We know how to train these more abstract systems through experience. The difficulty
18:09 is the scale of data that is necessary, right? LLMs today are a perfect example: if you have
18:15 enough data, you can rely on raw representations, not just for LLMs, but for things like video prediction
18:25 models, right? You can rely on raw representations to train really high-performance systems.
18:33 Where in between is an actual practical solution going to land,
18:39 for example, for manufacturing? It's still a question, right? You might want to benefit from
18:45 the fact that manufacturing provides a little bit of structure: you know ahead of time which objects you're going
18:50 to encounter, you know the furniture and fixtures you're going to deal with.
18:57 So you might want to put that into pre-training. Yeah. So on that spectrum, from fully model-based
19:04 to fully end-to-end trained, there are different progressions of abstracting away more and more layers
19:11 of perception, high-level cognition, dexterity, and mobility. Got it. So, in
19:20 the large behavior model video and the paper associated with it, where in that progression was
19:26 Atlas at that point in time? So that's a great example. That is not a system that ingests pixels
19:33 and outputs torques, right? It's a system that ingests pixels and outputs commands at a
19:41 more abstract representation of what the robot should do. It provides commands for what
19:46 the end effector should do, what the feet should do, what the torso and the hands should do. Yes,
19:52 what hands, feet, and torso should do. Below that, we have a very powerful whole-body controller that consumes
20:00 that and realizes it on the robot. Yeah. Right. So that provides a layer of abstraction
20:07 that makes learning simpler and reduces the amount of data that we need to generate, but still allows
20:14 us to exploit the high agility and strength of a humanoid platform like Atlas.
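In code terms, the interface described here might look roughly like the following. The field names and message format are assumptions for illustration, not the actual Atlas API:

```python
# Sketch of the layered interface: the learned policy emits an abstract
# body-level command, and a whole-body controller turns it into torques.
from dataclasses import dataclass

@dataclass
class BodyCommand:
    left_hand_pose: list[float]      # desired end-effector pose (xyz + quaternion)
    right_hand_pose: list[float]
    torso_pose: list[float]
    gripper_open: tuple[bool, bool]  # left, right

def control_step(policy, whole_body_controller, camera, robot_state):
    # Learned layer: pixels in, abstract command out (not joint torques).
    cmd: BodyCommand = policy(camera.capture(), robot_state)
    # Model-based layer: realize the command on the robot's actuators.
    torques = whole_body_controller.solve(cmd, robot_state)
    return torques  # sent to the actuators
```

The design point is the one Alberto names: the abstraction shrinks what the learned layer has to master, and therefore how much data it needs.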
20:20 I mean, that's tricky, because that's not just a technology-progression thing. There's
20:26 real frequency separation between what Atlas is doing with its actuators versus looking
20:36 at a scene and deciding what you need to do next. Is it necessarily a given that it should be one
20:44 neural network? No, absolutely, it's not a given that it should be just one single neural network. In fact,
20:50 it is pretty common, especially for dynamically balancing robots like humanoids or quadrupeds, to rely
20:57 on what we call System 1 / System 2 separation, where you have a system at the lowest level
21:06 that takes care of whole-body control, the system that needs to run very fast and needs to know
21:11 and understand the extent of the strength and the balance of the robot. It needs to know that you
21:17 shouldn't move your hand too fast in this direction without counterbalancing.
21:21 So in humans, that'd be like your cerebellum in your nervous system.
21:26 Right. So it's unclear that you'd want to force everything else in your system, the part that is in charge
21:32 of acquiring common sense, to also have to learn that at the same time. So there's some value in separating the two.
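A minimal sketch of that System 1 / System 2 frequency separation as a two-rate loop; the rates and interfaces are illustrative assumptions, not Atlas's actual numbers:

```python
# Two-rate control sketch: a slow "deliberate" policy picks commands while a
# fast whole-body controller keeps the robot balanced between policy updates.
def run(policy, wbc, robot, policy_hz: int = 10, control_hz: int = 500):
    steps_per_decision = control_hz // policy_hz  # 50 control ticks per command
    command = None
    tick = 0
    while robot.ok():
        if tick % steps_per_decision == 0:
            # System 2: slow, learned; looks at the scene, decides what to do.
            command = policy(robot.observe())
        # System 1: fast; knows the robot's strength and balance limits,
        # e.g. counterbalances before the hand moves quickly to one side.
        robot.apply(wbc.solve(command, robot.state()))
        tick += 1
```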
21:38 Let's jump in and talk about some of these data sources in a little bit more depth.
21:47 So let's talk about teleoperation. Yeah. I would say that teleoperation is today
21:51 the most valuable sort of data for early deployments, for early behaviors. If you want your robot to
22:04 exhibit a certain behavior in the next couple of weeks, teleoperation is your friend. It's the most
22:11 valuable kind of data because it's fully embodied. Teleoperation means that you
22:17 are controlling the robot, and the robot is experiencing what it means to do a certain behavior.
22:22 So the data you collect, that experience, is through the body of the robot. In a way,
22:29 there's zero gap between the demonstration and the actual data you're collecting.
22:35 Why is that important? That's useful
22:38 because that kind of data is not polluted by artifacts of your interface to the robot.
22:47 Okay.
22:47 It's like ground truth data, right? If it worked, it means that the robot will be able to
22:54 do it again by using the same commands, sending the same signals to the actuators.
22:58 So that data is valuable. It's probably the hardest to scale, because it requires building a
23:07 very proficient teleoperation system, right? Our demonstrators at BD learn
23:16 over a few weeks; they become experienced in understanding the extent of what Atlas is capable of doing
23:23 by spending hours moving robots through this teleoperation system, which basically
23:29 takes the form of a VR system where they see through the eyes of the robot, and they have
23:36 trackers on their body so that the robot reacts live, in real time, to the same motions they're making.
23:43 Once they become experienced in doing that, they are free, so to speak, to command the robot and
23:50 get it to do anything that teleoperation interface allows, within its observability.
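A sketch of why teleoperated data is "zero gap": what gets logged is exactly what the robot sensed and exactly the commands its actuators received, so replaying the same signals reproduces the behavior. All interfaces here are hypothetical:

```python
# Sketch of an embodied teleoperation logging loop. The robot mirrors the
# operator live; every (observation, action) pair is ground truth because it
# passed through the robot's own sensors and actuators.
def record_teleop_episode(robot, vr_interface, log):
    while vr_interface.session_active():
        obs = robot.observe()                  # through the robot's own sensors
        action = vr_interface.read_operator()  # tracked body motion -> command
        robot.apply(action)                    # robot reacts live, in real time
        log.append((obs, action))              # embodied, ground-truth pair
    return log  # ready for behavior cloning; no interface artifacts to correct
```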
23:57 The scale thing going on here is important. We're talking about trying
24:02 to build something like an LLM for the physical movement of a robot. Again, those are trained on
24:11 close to the entirety of human knowledge, and we're talking about collecting thousands,
24:19 tens of thousands, maybe, generously, hundreds of thousands of teleoperated
24:25 motions and hours of data. There's an immense scale gap between what we want out of
24:32 these models and that kind of data source, right? Yeah, that's correct. And
24:38 it's no secret, right, that that's probably not going to be sufficient. We think it's
24:44 still very valuable, because that sort of zero-gap transfer between the data we're collecting and
24:51 the behavior, the motions the robot can execute, has tremendous value when it comes to demonstrating
24:58 embodied behavior. If you want the robot to crouch, if you want the robot to reach, if you want
25:06 the robot to do a whole-body coordinated motion, it's a great way to generate high quality data.
25:13 Yeah. And it's definitely how we're going to pilot the giant Atlas to fight Godzilla.
25:18 Yes, exactly. Will you be inside? No, no, no. Someone much better at it than me will be.
25:25 All right. Let's talk about that second kind of data then. You mentioned reinforcement learning,
25:30 or trial and error, for manipulation. What's going on there? How does it work?
25:35 So reinforcement learning, on the scale of data quality, would be below
25:43 teleoperation, because now you have this sim-to-real domain gap: the data you generate in simulation
25:49 doesn't translate perfectly to the robot. We're pretty good at Boston Dynamics at
25:56 understanding that sim-to-real gap, but there is still a gap. However, simulation gives you the
26:03 option to scale the data generation workflow massively, which is a great benefit, but it also allows you
26:12 to control the scenarios where you're learning, right? You can expose the robot to small variations of
26:18 anything you want in a controlled manner. Oh, you want to grasp this object? Well, do the same thing,
26:23 but also with the object one centimeter to the right, or one to the left, in thousands of variations.
26:28 And in that context, and because of the massive scale, we have access to algorithms that
26:37 are very hard to replicate with real data or with real hardware: algorithms
26:44 that are based on trial and error and optimization, to fine-tune behaviors that become better and better
26:51 over time. They can become even better than demonstrations, just because they're trying
26:58 to maximize some objective.
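A sketch of the two advantages described here, controlled variation and trial-and-error optimization. The simulator interface is hypothetical, and the update step is deliberately abstract rather than a specific RL algorithm:

```python
# Sketch of simulation-scale training with domain randomization: expose the
# policy to small, controlled variations (the part shifted about a centimeter
# in any direction, thousands of times) and optimize an explicit objective.
import random

def train_specialist(sim, policy, n_episodes: int = 100_000):
    for _ in range(n_episodes):
        # Domain randomization: +/- 1 cm offsets on the part's position.
        dx = random.uniform(-0.01, 0.01)
        dy = random.uniform(-0.01, 0.01)
        sim.reset(part_offset=(dx, dy))
        episode = sim.rollout(policy)   # trial...
        policy.update(episode.reward)   # ...and error: maximize the objective,
                                        # which is why results can exceed
                                        # demonstration quality
    return policy
```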
27:04 Yeah. So I guess an important thing we didn't quite hit on earlier is that with teleoperation, you always have this interface that makes it a little
27:13 hard to do really precise, really fast, really dexterous things. You're saying that that's really
27:22 not a thing for reinforcement learning. Yeah. Actually, that's a pet peeve of mine
27:30 with teleoperation: we use teleoperation to provide high quality demonstrations
27:37 that are meant to build this physical understanding of what it means to move in the world. However,
27:45 demonstrators, unless they're very, very skilled, and because of the complexities of the teleoperation
27:53 interface, end up demonstrating behaviors that are somewhat subpar; they are kind of slow or
28:03 too sequential. Instead of using dynamic behaviors, they use quasi-static, slower behaviors. And it
28:11 requires conscious design of the teleoperation interface, but also conscious training of
28:17 demonstrators, to avoid that. Simulation, or reinforcement learning, doesn't have that limitation.
28:22 You can optimize the performance of any given task to its maximum. You can force
28:30 it to be fast and dynamic, and you can force it to be robust by exposing it to many variations.
28:37 And we've been very happy with the success we're having in that direction,
28:43 in particular for dexterous manipulation. Yeah. So you have some pretty cool examples here of
28:49 three tasks that our reinforcement learning folks just trained up recently. You want
28:55 to talk about those three? Yeah. So we have here examples that are sort of the
28:59 bread and butter of what it means to do work on the assembly line: insertion tasks that are
29:06 haptically driven, like inserting a steering wheel into its socket, or inserting a plug
29:14 into a socket, or inserting a small part into a jig. Any of those tasks is very, very hard to
29:20 teleoperate, either because it requires small, very controlled finger motions, or because it requires
29:29 sensing haptic signals that are not directly visible. And so, looking at these,
29:34 they look like specialist policies. You trained a thing that specifically only knows how
29:40 to pick up a steering wheel and put it on the steering column. How do these end up part of this
29:47 robot brain we're trying to train? Yeah, that's a fun question. That's correct: each one of these
29:52 is its own individual policy. We call these specialists. They're really good at doing one
29:57 thing, within some domain of generality or robustness, but just that one thing. And we're very excited about
30:03 the fact that we can scale that process of creating thousands, tens of thousands,
30:10 hundreds of thousands of specialists pretty seamlessly. In a way, these become your teleoperated examples, right? If you do
30:19 rollouts of these policies, that generates a kind of data you can then condense into a generalist,
30:25 like you would do with behavior-cloned teleoperated data.
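A sketch of that specialist-to-generalist step: roll out each RL-trained specialist, pool the resulting traces, and behavior-clone one network on the pooled data. It reuses the hypothetical `post_train` from the earlier sketch; all other names are illustrative:

```python
# Sketch of distilling many specialists into one generalist. Rollouts of each
# specialist policy play the same role teleoperated demonstrations would.
def distill(specialists, sim, generalist, rollouts_per_task: int = 1_000):
    dataset = []
    for task, specialist in specialists.items():  # steering wheel, plug, jig...
        for _ in range(rollouts_per_task):
            sim.reset(task)
            episode = sim.rollout(specialist)
            dataset.extend(episode.transitions)   # (observation, action) pairs
    # Condense thousands of specialists into a single network via cloning.
    return post_train(generalist, dataset)
```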
30:32 Cool. And it's not just manipulation, but also the locomotion end of Atlas, too. Yeah, absolutely.
30:38 Locomotion, especially agile locomotion, if you want to go fast, if you want to do natural walking
30:46 behaviors, if you want to do cartwheels, suffers from the same issue. It's very unrealistic to
30:52 expect we're going to get one of our demonstrators to get Atlas to do a cartwheel. Yeah. That's something
30:57 that you really want to optimize for, because it's a dynamic behavior that has very contrived regions
31:04 of attraction where you need to get the robot. You need the robot to accelerate to a certain
31:08 velocity so that it can jump and do a cartwheel. It's not something you want to do other than in
31:14 simulation. And then this third kind of data, human demonstration, or, you know, human observation
31:22 data. What does that mean? That's the biggest bet, I would say. It's the longer-
31:28 horizon bet, but also the one with the highest potential scope. It means
31:37 anything up to, all the way, learning directly from YouTube videos, right? Can you get robots
31:44 to learn what it means to interact with the physical world, with dexterity and common sense, by
31:51 looking at videos of people repairing bicycles in their backyard, or looking at videos of people
32:00 constructing things? That's the extreme of it, but there's a spectrum of ways in which you can
32:06 use more open-ended demonstrations from people; it doesn't have to go all the way to
32:15 teleoperation. For example, something we're very excited about is egocentric demonstrations, where
32:21 you get people to wear sensors that make it easier to capture what the person is doing. It could be
32:28 the same sensors that Atlas has in its head; it could be gloves on the hands that
32:35 capture tactile signals. And you just get them to live their own life, or get them to do their own job, and
32:44 you just learn what it means to do that job from that data.
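A hedged sketch of that egocentric pipeline, under strong assumptions: head-camera frames plus glove signals get retargeted onto the robot's hands to form weakly supervised training pairs. Every name here is hypothetical:

```python
# Sketch of harvesting egocentric human data for pre-training. The retargeting
# function, which maps a human hand pose onto the robot's hand, is assumed to
# exist and is the hard part in practice.
def harvest_egocentric(worker_stream, retarget_hand, dataset):
    for frame in worker_stream:            # head camera + glove signals
        robot_hand_action = retarget_hand(frame.hand_pose, frame.tactile)
        dataset.append((frame.head_image, robot_hand_action))
    return dataset  # pre-training signal: common sense and task sequencing
```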
32:51 Cool. And then, I mean, that does sound like, or rather, reinforcement learning and that one both sound like the kinds of
32:56 things that conceivably could get to these scales of data that are being used to train LLMs.
33:03 Yes, I agree, both of them. And I would say that it's not necessarily either/or;
33:11 each one of those two has its own strengths and weaknesses, right? Clearly, simulation,
33:18 from our experience, is a really good choice for dexterous manipulation, but it might be more
33:26 difficult to learn things that involve interacting with highly deformable objects,
33:33 for example, because they're just naturally difficult to simulate, or require a lot of compute.
33:38 Direct egocentric demonstrations, with people wearing sensors, are great at capturing
33:47 the breadth of common sense of what it means to interact with objects, and the sequence of things
33:54 that you should do to accomplish a certain task. But it will be more difficult to learn
33:59 dexterous manipulation directly from that, because they're not embodied in the body of the real robot
34:06 with its own sensors. Cool. Okay. So we've got a complete plan to build a robot brain. Yeah. We just
34:13 need to execute now. So let's go back and recap.
34:21 What does it mean to build a generalist, and what are the essential characteristics of that
34:28 thing? Or, what are the characteristics that thing will have to have for us to be successful in
34:34 putting humanoids to work? I would say that there are four core tenets, right? General purpose
34:41 hardware, and you've talked about that. Ease of retasking, very important, to avoid the issues
34:49 with the cost of integrating every single task: ease of retasking means getting to a
34:56 point where it's not months, but rather days or hours, to get the robot to do a
35:02 certain job. Natural interfaces for interacting with a robot, rather than requiring expert programming;
35:09 our customers will want a more natural way to tell robots what to do, either with natural
35:17 language or with direct demonstrations. And the fourth one: safety around people.
35:23 Did you get reliability in there?
35:24 Uh, I did not get it, but now we've got five. That's okay. Let's say five.
35:33 Yeah. So we're building this general purpose robot, and we're building a general purpose
35:38 robot brain in conjunction with it. One of the really cool things about working with the
35:45 Hyundai group is that we're not just building this robot in a vacuum. We're also redesigning
35:52 car plants and redesigning general manufacturing facilities alongside understanding what Atlas's
36:00 capabilities will be. Can you tell me a little bit about what's exciting about that to you?
36:06 Let's say one thing... okay, maybe two things. The first one is that I think working with Hyundai
36:15 has made the mission very clear. Yeah. Our mission is to transform manufacturing.
36:21 And once you know that, once that's clear, it relieves a lot of the pressure and stress
36:29 of thinking and dreaming about any other application domain where you could also be
36:36 working, of which there are many, and which we might also go after. But what's non-negotiable is that we
36:44 need to solve and transform manufacturing. So that has been exciting to me,
36:49 the fact that we have a clear mission, which is very hard, but it is clear.
36:52 Yeah. I mean, that was honestly one of the hardest parts of the early days of Spot:
36:58 we had this amazing robot, and we had so many application ideas for
37:06 what it could be really, really useful for. Today we look at Spot and we see it being used
37:12 all over in these industrial inspection applications, and it seems obvious now. But in the early
37:20 days, we were spread really thin between so many different applications, testing and trying to do
37:28 demos of each of them. It's really refreshing with Atlas, knowing that
37:36 if we can just solve this manufacturing problem, we can be successful, and then we
37:43 can go on to the rest of the stuff. We're not going to stop at manufacturing. We're
37:49 going to do curb-to-door delivery. We're going to put robots in your home to clean your tables and make
37:54 your bed. But we know which one we have to do first. Exactly. Yes. Yeah. I mean, at a much
38:00 smaller scale, a smaller scale personally also: before joining BD, I was faculty at MIT, and as
38:08 faculty, the journey is always about finding the right motivation to solve the problem that you
38:17 think you should be solving, right? You always have to convince yourself: am I actually solving the
38:21 right problem, or am I confused? Should I be solving a different one? So the change to BD has been
38:27 refreshing because of that, too. I don't have to ask myself that question anymore. I know what
38:32 problem I need to solve. Yeah. I feel the same way. So part of the reason we're talking about
38:38 this today is that we need more folks to join us on this mission of building a humanoid robot brain.
38:44 Can you tell me a little bit about what it's like day to day on our machine learning team?
38:48 Yeah, absolutely. That's the largest area in Atlas where we're hiring: people with experience
38:55 building large behavior models, and people with experience in reinforcement learning.
39:02 The experience in the team, I would say, is hybrid, right? There are some people on
39:08 the team who are very excited about research, and they spend their time solving the
39:16 problems that we know we will have three years from now, or five years from now. But there are
39:22 also people on the team who are very enthusiastic about how we can get to
39:28 deploying these robots for real as fast as possible. And those are also people who need to be very
39:34 experienced in these modern techniques, like behavior cloning, like efficient data collection
39:41 techniques, like efficient co-training techniques to fuse data between simulation
39:48 and teleoperation, and who are also good roboticists. Right. So the experience, I would say,
39:55 is on both sides, and we're always looking to hire good people. What do you think your team
40:01 looks forward to and enjoys the most day to day? I think the team is very excited about...
40:08 well, I don't know about the team, but myself, I am very excited about the prospects of really enabling
40:14 dexterous manipulation. I think that is the ultimate frontier of what it means to do general
40:23 purpose work. And BD is a place where you get to experience both the design and fabrication of
40:36 high-end dexterous grippers, and the development of the techniques
40:44 that are going to get those grippers to do very cool things, like using tools, like handling objects.
40:52 And to me, experiencing that, and seeing it progress so rapidly, is fascinating.
41:00 I've got to answer this myself; I have so much fun. The really special
41:06 thing about robotics, as opposed to any other place you could be working in machine learning, or
41:11 really in any of the functional disciplines, is that we do the whole thing. So you
41:19 mentioned folks at Boston Dynamics design grippers. If you're a machine learning
41:24 researcher, you get to talk to the folks who design the fingertips day to day, argue with them. You get
41:31 to see your stuff show up in the hands three months later. And it extends all the way through to the
41:37 other side, too. We have folks who work on robots day to day and diagnose things. We also have
41:44 folks who sit 15 feet further away from the robots in the lab, who think about how to
41:50 wrangle a thousand GPUs to run this training job faster. You see every aspect of putting
41:57 a robot together, all in one room. Yep. There are very few places in the world
42:03 where you can get that. It's very exciting. Well, thanks, Alberto. Thanks for telling us what's
42:09 going on on the ML side of Atlas. And I know we'll have to see how fast we get to
42:16 that full generalist. Yes.