  • 2 days ago
Genesis AI is a full-stack physical AI company, building the universal foundation model for general-purpose robotics. Our breakthrough combines a robotics-native AI brain, a human-scale robotic hand, an invisible data-collection glove, and the world's most accurate simulator. Our latest model, GENE-26.5, achieves human-level manipulation — fully autonomous. We will share the latest release around model, simulation and concrete experiences.  

Category

🤖
Tech
Transcript
00:00Ladies and gentlemen, we're back on the discovery stage.
00:05Our next guest is Genesis AI, a French-American startup with offices in Paris and San Francisco.
00:14They build foundation models for the physical world.
00:19Where ChatGPT learned to talk, Genesis is teaching machines to act,
00:26to pick up things, to move in space, the brain that will power the next generation of robots.
00:34Please welcome Théophile Gervais, co-founder of Genesis AI.
00:44Hello, my friend.
00:49Hello, everyone.
00:53So we started Genesis about a year ago with a simple question.
00:56How do we bring 10 billion robots to the world within the next decade?
01:00And so we already have a lot of robots today, but typically none of you see them
01:03because they are either Roombas or industrial arms doing the same thing all day.
01:08And what's missing to get robots everywhere else is a lot more AI.
01:12And so we started with this vision about a year ago.
01:15And a year into it, we've built every piece of our vision, and that's what I want to walk you
01:21through.
01:25So the first piece is the brain of a robot.
01:27This is the model that takes as input the video that the robot can see
01:31and outputs the actions that the robot needs to take.
01:35And so we train the general-purpose robot brain that can do many types of tasks
01:41using very complex dexterous hands, like, for example, cooking a 20-step meal over five minutes,
01:47making an omelette with eggs as well as tomatoes.
01:50All of this is real-time.
01:54For example, it can also play piano or do very complex lab automation.
01:59Lab pipetting is a very common workflow where you need millimeter-level precision
02:04to insert the tip, or cable harnessing for industrial use cases.
02:13So all of these are a single model that can do all these types of use cases.
02:23And so the way we get to train a really general-purpose robot brain is
02:27that we need to scale data in the same way we've been able to scale data for language models.
02:32And so for language models, we start from all of the Internet.
02:35But for robotics, we don't have the same equivalent of what is the large-scale data set
02:38of human actions to train robots.
02:41And the issue is today's methods to collect data to train robots are very hacky.
02:46Typically, robots have grippers, while we humans have very complex dexterous hands.
02:50And that makes the data collection devices very complex.
02:53Like, for example, on the left is a device called teleoperation.
02:56So you have a human that moves an exoskeleton, a plastic thing, to move the robots.
03:02But the human cannot feel what the robot feels.
03:04And that makes data collection very slow and clunky.
03:07It would take you half an hour to teach the robot how to make a salad.
03:10And you need to do this hundreds of times.
03:13Like, you need tens of thousands of hours of data collection.
03:15So not scalable.
03:17A slightly more scalable approach is to remove the robot completely
03:20and have the human wear grippers in their hands.
03:23So have the human adapt to the robot's embodiment, which is grippers.
03:27That's more scalable.
03:28But you can still imagine that if you're a professional technician or in a lab or a cook,
03:32you just can't wear grippers to do your job.
03:34So this is not the right scalable way to train robots.
03:38So we thought, instead of adapting the human to the robot,
03:42what about adapting the robot to the human?
03:44If you have a dexterous hand for the robot that has the same form and function as our human hands,
03:49with the full 20 degrees of freedom of our fingers,
03:52then you can directly transfer human data of us doing different tasks to the robots.
03:57And so one very natural data collection device is to just wear a glove
04:01that precisely tracks the position of your fingers as well as pressure within the palm
04:06and use the human data of experts doing their job to train the robots.
04:09And so that's how we were able to scale data to train our first version of the brain.
04:19And so even if you're able to truly scale data,
04:22and we were talking about we need to get to tens of millions of hours of data of human manipulation,
04:27you still have a lot of workflows in robotics that are not scalable.
04:31Like the first one, for example, is...
04:36How do you test your models?
04:38Typically, you train your model on your big GPU cluster for many weeks,
04:42and then you put it on your robot,
04:43and you have a human spending the whole day resetting the objects to where they should be,
04:48letting the robot try different tasks,
04:50and then repeat this 500 times in a row during the day
04:53to get a single metric of how well does the robot do at a single task in a single environment.
04:58But we're trying to train general-purpose robot brains,
05:01so they're supposed to do hundreds of tasks across hundreds of scenarios,
05:04and there is no way to do this in a way that makes sense operationally.
05:08Also, the robot wears out, so it's not the same from one day to the other.
05:11The lighting changes.
05:12The operator from one day to the other is different,
05:14so you don't get a repeatable metric.
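[Editor's illustration, not part of the talk] The "single metric" described above is typically a success rate over repeated trials. The sketch below is a minimal example assuming each trial is recorded as a pass/fail boolean. It reports the success rate with a Wilson score interval, which gives a sense of how much the number can move between runs. The function name and the 400-out-of-500 figures are hypothetical and do not come from Genesis AI.

```python
import math


def success_rate_with_interval(outcomes, z=1.96):
    """Return (rate, low, high) for a list of pass/fail trial outcomes.

    Uses the Wilson score interval; z=1.96 corresponds to roughly 95% confidence.
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one trial")
    successes = sum(1 for ok in outcomes if ok)
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, center - half_width, center + half_width


# Hypothetical day of evaluation: 500 trials, 400 of them successful.
trials = [True] * 400 + [False] * 100
rate, low, high = success_rate_with_interval(trials)
print(f"success rate {rate:.2f}, interval [{low:.3f}, {high:.3f}]")
```

The interval narrows as the number of trials grows, which is one reason a single task in a single environment already needs hundreds of repetitions.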
05:16So the way we solve this is through simulation.
05:25So instead of doing evaluation of robots in the real world,
05:28you can do it in virtual worlds.
05:29So we spent a lot of time building a virtual world
05:33that is both the physics of how do objects interact with each other,
05:36whether it's rigid bodies or deformable objects,
05:39as well as rendering,
05:40with people coming from video games to make very realistic pixels.
05:44And so this is our Paris office.
05:46You are in the real world, and now you're in the matrix.
05:48This is a virtual world,
05:50and you can see it's still very realistic compared to reality.
05:56So here you can see the underlying 3D structure of the world
06:00that has been scanned.
06:07And you can even simulate the effects of the camera,
06:11like this lens simulation.
06:13You can do the underlying 3D mesh of the robots.
06:23And this is, for example, a side-by-side comparison
06:25between simulation on the left and the real world on the right.
06:29It's really hard to tell the difference,
06:31and that's what lets you train,
06:33like evaluate your robots in virtual worlds at scale,
06:36hundreds of thousands of different environments,
06:38and then have the results in simulation
06:42match the results in the real world.
06:50So I'll skip ahead the yo-yo.
06:52So the last piece is,
06:54even if you have a really general-purpose robot brain
06:56that can do many types of tasks,
06:57because you train on a lot of human data
07:00using dexterous hands that let you
07:02transfer human data to the robot directly,
07:04and you have a simulator to evaluate your robot at scale,
07:07you still need the robot product,
07:08because nobody cares about just the brain.
07:11At the end of the day,
07:11customers just want a robot to solve use cases.
07:14And so that's what we released last week.
07:18So this is Inno.
07:19It's our first robot.
07:21We spent a lot of time thinking about
07:23what the first robot should be,
07:24and it's really important for it to have the upper body be human,
07:28because to have human hands,
07:30human arms, and human hands,
07:32because you learn from human data.
07:33The rest of the body does not need to be human.
07:35You don't need a face,
07:37because you don't have a brain.
07:38You don't necessarily need a humanoid body.
07:40It's also more stable on wheels,
07:42because you don't need to balance your body.
07:46So this is what we ended up with,
07:48as the simplest possible form of a wheeled robot
07:50we could think of.
07:51And the idea is to have it be applicable
07:53to many different types of environments,
07:55whether it's in data centers,
07:57in industrial environments,
07:58in labs,
07:58or eventually in the home.
08:04And so the next step for us,
08:07now that we've built all the building blocks of the tech,
08:09is how do we bring this to market?
08:12And so we're partnering with industrials
08:16and pharma companies or data centers,
08:19or even services,
08:20and if you're interested, please reach out.
08:33We have time,
08:34if we have questions for my friend.
08:37Do you want to answer to your question?
08:39Do you have questions for my friend, Théophile?
08:41No? Yes?
08:42We understand everything.
08:44It was quite clear.
08:49No?
08:50Okay.
08:51That's a wrap.
08:52Round of applause again for you.
08:54Ah!
09:01What technology do you use to simulate the environment?
09:05So we write,
09:06like you have to write the physics,
09:08we write low-level code for the physics,
09:10like how do objects interact with each other
09:13for rigid bodies, deformables,
09:15and we write rendering.
09:17Just like for video games,
09:18if you go from the 3D state of the world
09:19to how does the light interact with it,
09:21to bounce back to the camera.
09:23And so that's how we do it.
09:25So if you're working in video games
09:26and interested in building physics for robots,
09:29please reach out.
09:30We're hiring in robotics, AI, and simulation.
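[Editor's illustration, not part of the talk] To give a feel for what "low-level code for the physics" can mean, here is a toy sketch of one simulation time step: a ball falling under gravity and bouncing on the ground, integrated with semi-implicit Euler. This is a generic textbook technique chosen for illustration and makes no claim about how the Genesis simulator is implemented. The `Ball` class, the `step` function, and the restitution value are all hypothetical.

```python
from dataclasses import dataclass

GRAVITY = -9.81  # m/s^2, negative means downward


@dataclass
class Ball:
    height: float    # metres above the ground
    velocity: float  # m/s, positive means upward


def step(ball, dt, restitution=0.5):
    """Advance the ball by one time step using semi-implicit Euler."""
    # Update velocity first, then use the new velocity to update position.
    velocity = ball.velocity + GRAVITY * dt
    height = ball.height + velocity * dt
    # Very simple contact handling: clamp to the ground and reflect
    # the velocity, losing some energy on each bounce.
    if height < 0.0 and velocity < 0.0:
        height = 0.0
        velocity = -velocity * restitution
    return Ball(height, velocity)


# Drop the ball from 1 m and simulate 10 seconds at 1 ms per step.
ball = Ball(height=1.0, velocity=0.0)
heights = []
for _ in range(10_000):
    ball = step(ball, dt=0.001)
    heights.append(ball.height)
print(f"max height {max(heights):.3f} m, final height {ball.height:.4f} m")
```

A production simulator has to handle many bodies, friction, joints, and deformable objects, but the loop has the same shape: apply forces, integrate, resolve contacts, repeat.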