How Adobe Builds And Trains Its Generative AI Models

Forbes

Dr. Gavin Miller, the Head of Adobe Research, spoke at Imagination in Action's 'Forging the Future of Business with AI' Summit about how Adobe trained its generative AI models.

Transcript

00:00 So how do we see GenAI transforming business from an Adobe point of view?

00:05 And in particular, in Adobe research, what are we inventing that will help our customers

00:11 transform their businesses?

00:13 Broadly speaking, you can think of it in two broad areas.

00:17 On the left, there's the creation and editing of media.

00:21 This includes images, video, audio, and so on.

00:24 And on the right, there's the generation of campaigns and then the analysis

00:30 of response to those campaigns, which are often powered by the digital media that's

00:34 created with our mainstream media editing tools.

00:40 So one of the things that's come up today and always comes up in AI is where does the

00:46 data come from?

00:47 So of course, we exploit many different sources, including our stock photography business that

00:53 then gives us a highly moderated source of images.

00:56 But we're also interested in capturing unusual or unique data sets.

01:00 So on the left is a light stage that we commissioned.

01:04 We previously built our own, but the off-the-shelf one was actually better.

01:08 And this is used for capturing data sets under a variety of lighting conditions so we can

01:13 train models for relighting.

01:16 And then the topic of the next video is the image on the right, which is how do you get

01:22 ground truth matting data for natural subjects?

01:26 So this sounds easy.

01:27 We have blue screen, but blue screen is actually an under-constrained problem.

01:32 And we found that we were training things based on the output of previous algorithms

01:35 that were themselves trained.

01:36 And so we wanted to go back to ground truth and try to do that.
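
A small worked example of why a single backing color leaves the matte under-constrained. This is a sketch, not something shown in the talk: it assumes the standard compositing equation C = alpha * F + (1 - alpha) * B, and the colors and the function name `composite` are made up for illustration. With the background B known, each pixel gives three equations (one per color channel) but four unknowns (alpha plus the three foreground channels), so different answers can explain the same pixel.

```python
def composite(alpha, foreground, background):
    """Blend a foreground color over a background, channel by channel."""
    return tuple(alpha * f + (1.0 - alpha) * b
                 for f, b in zip(foreground, background))


BLUE = (0.0, 0.0, 1.0)  # known backing color (illustrative value)

# Hypothesis 1: a fully opaque mid-gray subject.
opaque_gray = composite(1.0, (0.5, 0.5, 0.5), BLUE)

# Hypothesis 2: a half-transparent yellow subject.
sheer_yellow = composite(0.5, (1.0, 1.0, 0.0), BLUE)

# Both hypotheses produce the same observed pixel, so the camera
# alone cannot tell which alpha is the true one.
print(opaque_gray)   # (0.5, 0.5, 0.5)
print(sheer_yellow)  # (0.5, 0.5, 0.5)
```

This ambiguity is why an extra measurement per pixel, beyond a single color image, is needed to pin down alpha.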

01:41 And the way we did that was to leverage an idea that came out at SIGGRAPH in about 2005

01:47 of using cross polarizers with two cameras.

01:51 So there's a new device from Sony which has little micro polarizing filters in each subpixel.

01:57 And if you do that, then hopefully the video will play.

02:01 One more click.

02:04 You can get really strong ground truth data.

02:07 So you actually use an LCD screen as the background, just displaying white.

02:12 And this camera gives you four channels of polarization-related information.

02:16 If you use it in the way that was previously published, you end up with very noisy images.

02:20 But we have a new breakthrough idea which is about to be presented at CVPR.

02:27 And these are some of the results from it.

02:28 And you can see that you get very, very clean ground truth data.
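
The talk does not describe the new method, so the following is only a sketch of the earlier crossed-polarizer idea under idealized assumptions that are not from the talk: the LCD background emits linearly polarized light, light coming off the subject is unpolarized, and the polarizing filters are perfect. Under those assumptions a channel aligned with the screen sees the subject plus the background, a crossed channel sees the subject only, and the difference isolates the background that shows through. The function names and the single-intensity pixel model are hypothetical.

```python
def simulate_pixel(alpha, foreground, background):
    """Forward model: unpolarized subject light splits evenly between
    the two channels; polarized background light reaches only the
    parallel channel."""
    i_crossed = alpha * foreground / 2.0
    i_parallel = i_crossed + (1.0 - alpha) * background
    return i_parallel, i_crossed


def alpha_from_polarized_pair(i_parallel, i_crossed, background):
    """Recover alpha for one pixel under the idealized model."""
    if background <= 0.0:
        raise ValueError("background intensity must be positive")
    transmitted = i_parallel - i_crossed   # equals (1 - alpha) * background
    alpha = 1.0 - transmitted / background
    return min(1.0, max(0.0, alpha))       # clamp against measurement noise


i_par, i_cross = simulate_pixel(alpha=0.25, foreground=0.8, background=1.0)
recovered = alpha_from_polarized_pair(i_par, i_cross, background=1.0)
print(recovered)
```

Because alpha comes from a difference of two measurements, sensor noise in either channel feeds directly into the matte, which is consistent with the remark that the previously published approach gives noisy images.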

02:33 Even though this setup is not practical for doing a large movie shoot, we can capture

02:37 a data set that then lets us train better blue screen or green screen extraction algorithms.

02:43 So it's really an example of an AI-inspired data capture with a subsequent use case using

02:50 a derived model.

02:53 So one of the challenges if you have highly moderated data where you've removed all of

03:01 the trademark content, because that's part of your policy, is if you then sell it to

03:05 an enterprise that wants to create their trademark content, there isn't an easy way to do that.

03:10 So we have also developed fast algorithms for train your own model.

03:15 So this takes our core Firefly generative model and then lets enterprises customize

03:21 it with their own content.

03:23 And so they won't accidentally create content from any other vendor in their backgrounds

03:28 or anything, but they'll definitely have full control and end up with a very high quality

03:33 example where they upload their own data to the server.

03:37 They get to have a versioned model which is just for them to use.

03:42 And so we have a multi-tenant model for this.
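
One way to picture a per-customer, versioned model in a multi-tenant service is a registry keyed by tenant. This is a minimal sketch and not Adobe's implementation: the class `CustomModelRegistry`, its methods, the tenant names, and the training-set labels are all hypothetical, and the actual fine-tuning step is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class CustomModelRegistry:
    """Hypothetical registry: each tenant sees only its own versions."""
    _versions: dict = field(default_factory=dict)  # tenant -> list of labels

    def publish(self, tenant, training_set):
        """Record a new customized version; return its version number."""
        history = self._versions.setdefault(tenant, [])
        history.append(training_set)
        return len(history)

    def latest(self, tenant):
        """Return (version, training_set) for the tenant's newest model."""
        history = self._versions.get(tenant)
        if not history:
            raise KeyError(f"no custom model for tenant {tenant!r}")
        return len(history), history[-1]


registry = CustomModelRegistry()
registry.publish("drip", "drip-product-shots-a")
registry.publish("drip", "drip-product-shots-b")
registry.publish("other-brand", "other-brand-assets")

print(registry.latest("drip"))  # (2, 'drip-product-shots-b')
```

Keeping the tenant as the lookup key means one customer's uploads never appear in another customer's version history.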

03:48 So here you just upload the images.

03:52 And then once you've done that, you can-- we have a pseudo brand that we invented just

03:58 to demonstrate the idea without using proprietary content called Drip, which is a drinks brand.

04:05 And then once you have it, you can use it to generate large variations and so on.

04:13 So one of the things about scaling content is that we want to generate multiple variations

04:20 for all different form factors of display and different styles.

04:25 So here is an example of this, where it's sort of the combinatorial explosion of customizing

04:31 something for a target segment and then all of the devices that that segment might have.

04:37 And it rapidly means that creative people are not doing creative tasks and get grumpy

04:42 about it.

04:43 And also, it's easy to either make a mistake or just burn through a lot of budget doing

04:49 that.

04:50 Whereas by doing this automatically with Gen AI, it's one of those everyday tasks that

04:54 lets them focus on being creative.
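
The combinatorial explosion mentioned above can be made concrete with a quick count. The segment, form-factor, and style names below are invented for illustration; only the idea of multiplying the options together comes from the talk.

```python
from itertools import product

# Illustrative option lists (not from the talk).
segments = ["students", "commuters", "gym-goers"]
form_factors = ["phone", "tablet", "desktop", "billboard"]
styles = ["minimal", "bold", "retro"]

# Every campaign asset is one combination of segment, device, and style.
variants = list(product(segments, form_factors, styles))

print(len(variants))  # 3 * 4 * 3 = 36
print(variants[0])    # ('students', 'phone', 'minimal')
```

Adding one more dimension, such as a handful of languages, multiplies the total again, which is why producing each variant by hand becomes expensive quickly.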

04:58 So one of the other variations that you have to think about, of course, being a global

05:02 company, is all of the different language versions that you need to do for anything

05:06 related to text, audio, and video.

05:09 And so the grandest challenge is probably redubbing videos in other languages.

05:15 And so we have developed models to do that for a wide variety of spoken language and

05:22 also reanimate the lips to match.

05:25 So this isn't creating arbitrary video from text.

05:30 It's really taking a pre-existing recording, translating it, and so on.

05:35 So if the demo gods are friendly...

05:38 What's up, Zeyun?

05:39 What are you doing?

05:40 [in Spanish] Hi, we're working on an easy way to translate and dub videos.

05:41 Wait, what?

05:42 [in French] Oh, it's an AI-generated translation.

05:43 I'm working on this with my team, the Speech AI team at Adobe Research.

05:44 Hi.

05:45 Can we talk about the translation?

05:46 Yes, of course.

05:47 So, we're going to be using the speech AI team to translate the video.

05:53 Hi.

05:54 Can we take a sneak into what that means?

05:57 Sure.

05:58 With this technology, you can upload your video and have it dubbed and translated into

06:02 various languages.

06:03 It will match your voice and it will generate new lip motions to match those languages.

06:08 Wait, wait, wait.

06:09 This is easy.

06:10 We're very happy about it.

06:11 We're very excited about this.

06:22 Keep on hearing about it.

06:23 So there, he's wonderful.

06:24 He has great energy.

06:25 He doesn't speak all those languages, but we do have some people who do so they can sort

06:26 of sanity check it.

06:27 I think the main thing there is not only is it very fast and efficient, particularly for,

06:28 say, lower budget movies where you want to be on social quickly, but it captures the

06:29 voice of the original performer rather than having another actor speaking in a foreign

06:30 language.

06:31 So in some ways, it's very, very powerful.
