Brainstorm AI 2024: A New Tool to Master The Mix
Fortune
1 year ago
Jessica Powell, Co-founder and CEO, AudioShake
Category: Tech
Transcript
00:00
Hi, thanks all for coming.
00:01
I'm Jessica Powell, the CEO and co-founder of AudioShake.
00:06
And I am waiting for the slides to start.
00:10
Oh, there we go.
00:11
All right.
00:12
So today, I'm going to talk to you about sound.
00:14
But before we do that, we're going to do a sound experiment.
00:17
So in just a moment, I'm going to ask you all
00:19
to clap and to cheer as if you were at a sports
00:22
stadium or a concert.
00:24
And I'm going to talk.
00:25
And can we get some music?
00:28
Great.
00:29
OK, so louder, louder.
00:32
Can you hear me?
00:33
Can you hear me?
00:35
Great.
00:37
I really just wanted applause.
00:40
Great, so now let's see.
00:41
If I had asked you what the person two feet from you
00:44
had said during that time, you would have had no idea.
00:48
Or if it had been a concert, and you had actually
00:50
wanted to hear what was happening,
00:52
maybe you wanted to hear the music more than the crowd,
00:55
you'd be stuck.
00:56
And that's generally how we experience sound
00:58
in the real world, right?
00:59
Think about being in a noisy bar.
01:01
Or you're on the street, and you're recording a video.
01:05
And all of a sudden, an ambulance
01:06
comes by or a police siren.
01:09
Or since we're in Silicon Valley,
01:10
let's say you're developing an application that
01:12
requires voice input.
01:14
Well, if your user is in a call center,
01:16
on a noisy street corner, or simply
01:18
has toddlers running around in the background,
01:21
good luck getting the input that you need.
01:23
But what if that didn't have to be the case?
01:25
What if we could actually extract what we needed to hear?
01:29
So let's listen to something that's very, very noisy,
01:31
like this.
01:32
This is the mission.
01:33
And liftoff of the Space Shuttle Discovery,
01:36
returning to the Space Station.
01:39
And now let's isolate the voice.
01:42
And liftoff of the Space Shuttle Discovery,
01:45
returning to the Space Station.
01:47
So this is what we do at AudioShake.
01:49
We make sound work better for everyone.
01:51
We split sound into its different components
01:54
in order to make audio editable, accessible, and useful.
01:58
We're a core audio infrastructure company,
02:00
a bit like a Dolby.
02:02
Now this is actually pretty hard to do,
02:04
and let me show you why.
02:06
This is an image of an audio recording.
02:08
And you probably don't know,
02:10
is this a whole bunch of people speaking in a crowded room?
02:12
Is this a music recording?
02:15
Is it a bunch of sound effects in a movie?
02:17
And all I'm gonna tell you is that the y-axis
02:19
is the frequency, and the x-axis is time.
02:23
And that's about all you're gonna get.
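The image described here is a spectrogram. One common way to compute one is a short-time Fourier transform: slice the signal into overlapping, windowed frames (the x-axis, time) and take the magnitude of each frame's discrete Fourier transform (the y-axis, frequency bins). The sketch below is a minimal pure-Python illustration under that assumption; it uses a naive DFT for readability rather than speed, and the function names, frame size, and hop are illustrative choices, not anything described in the talk.

```python
import cmath
import math
from typing import List


def periodic_hann(n: int) -> List[float]:
    """Periodic Hann window of length n, used to taper each frame."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def magnitude_spectrogram(signal: List[float], frame_size: int, hop: int) -> List[List[float]]:
    """Return spec[t][k]: magnitude at time frame t (x-axis) and frequency bin k (y-axis).

    Uses a naive O(frame_size^2) DFT per frame for illustration; real code would use an FFT.
    """
    window = periodic_hann(frame_size)
    n_bins = frame_size // 2 + 1  # bins 0..Nyquist for a real-valued signal
    spec = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_size)]
        row = []
        for k in range(n_bins):
            coeff = sum(
                frame[i] * cmath.exp(-2j * math.pi * k * i / frame_size)
                for i in range(frame_size)
            )
            row.append(abs(coeff))
        spec.append(row)
    return spec


def bin_to_hz(k: int, frame_size: int, sample_rate: float) -> float:
    """Centre frequency of bin k in Hz."""
    return k * sample_rate / frame_size
```

For example, a pure 1 kHz tone sampled at 8 kHz with 64-sample frames peaks in bin 1000 × 64 / 8000 = 8 in every frame. A real recording spreads energy across many bins and frames, which is why drums, bass, and vocals overlap in the picture.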
02:25
Now what I'm gonna do is I'm actually gonna color
02:26
this in for you so you can see what's actually going on.
02:29
So this is a song recording.
02:31
And you can see we've got some vocals, some drums,
02:34
some bass, a collection of other instruments.
02:37
And again, this is not actually how sound looks.
02:41
You have no way of actually knowing
02:42
what's in these different parts.
02:44
Now recording engineers have a hack for this,
02:47
developed back in the 60s, around the time of the Beatles,
02:50
which is multi-track recording.
02:52
You send the vocalist into the studio
02:54
to lay down their track, then the drummer,
02:56
then the bass player, and you now have
02:58
all these granular elements that you can do
03:01
granular audio editing with.
03:02
You can remix the track, you could put those different
03:05
sound objects in different perceptual fields,
03:08
the bass here, the drum here, to make a surround sound mix.
03:11
You could correct imbalances in the audio,
03:13
all because you have those parts.
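Once the stems exist, remixing is mostly arithmetic: scale each stem and place it in the stereo field. This is a minimal sketch, assuming mono stems stored as lists of floating-point samples at a shared sample rate. The constant-power pan law is one standard choice, and the function names are illustrative. A surround mix applies the same idea with more output channels.

```python
import math
from typing import Dict, List, Tuple


def constant_power_pan(pan: float) -> Tuple[float, float]:
    """Left/right gains for pan in [-1, 1] (-1 = hard left, 0 = centre, +1 = hard right).

    cos/sin keeps left^2 + right^2 == 1, so perceived loudness stays roughly constant.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be in [-1, 1]")
    angle = (pan + 1.0) * math.pi / 4.0  # maps [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)


def remix_stems(
    stems: Dict[str, List[float]],
    gains: Dict[str, float],
    pans: Dict[str, float],
) -> Tuple[List[float], List[float]]:
    """Mix mono stems into a stereo pair with per-stem gain and pan.

    Stems missing from `gains` or `pans` default to unity gain and centre pan.
    """
    length = max((len(s) for s in stems.values()), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for name, samples in stems.items():
        gain = gains.get(name, 1.0)
        gl, gr = constant_power_pan(pans.get(name, 0.0))
        for i, x in enumerate(samples):
            left[i] += gain * gl * x
            right[i] += gain * gr * x
    return left, right
```

For example, `remix_stems(stems, {"vocals": 0.5}, {"vocals": -1.0, "bass": 1.0})` halves the vocals and puts them hard left, while the bass goes hard right at full level.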
03:16
But real-world sound doesn't come to us like this.
03:19
It's more like what we just experienced
03:21
in our little experiment.
03:23
And so we have to turn to AI and deep learning
03:26
to help us fix this problem,
03:27
and be able to get at these individual parts.
03:30
So let me show you how we do that.
03:31
And in fact, before I move slides,
03:33
if you look here, if you look at the yellow and the orange,
03:36
the drums and the bass, this is a really great example.
03:39
So you can see that these are not in one tidy band.
03:42
They're all over the place, right?
03:44
They're overlapping with all the other frequencies as well.
03:46
And remember, we don't see any color in this image.
03:49
So all we can see is essentially sound patterns.
03:54
So what we do is we train on hundreds of thousands
03:57
of these parts, which are called stems (let's say drums),
03:59
in order to learn
04:02
what these sound patterns are.
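The talk doesn't describe AudioShake's training recipe. One common supervised setup in the source-separation literature, sketched here only as an assumption, sums isolated stems into a synthetic mixture. It then computes a ratio mask for the target stem (drums, say), which gives the target's share of the summed stem magnitudes in each time-frequency cell. A model is trained to predict that mask from the mixture alone. Spectrograms here are plain nested lists of magnitudes that are assumed to be precomputed, and all names are hypothetical.

```python
from typing import Dict, List

Spectrogram = List[List[float]]  # [time frame][frequency bin] magnitudes


def mix_stems(stems: Dict[str, List[float]]) -> List[float]:
    """Sum equal-length stems sample by sample to synthesize a training mixture."""
    lengths = {len(s) for s in stems.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one stem, all of the same length")
    return [sum(samples) for samples in zip(*stems.values())]


def ideal_ratio_mask(
    stem_mags: Dict[str, Spectrogram], target: str, eps: float = 1e-8
) -> Spectrogram:
    """Per time-frequency cell, the target's share (0..1) of the summed stem magnitudes."""
    names = list(stem_mags)
    mask = []
    for t, row in enumerate(stem_mags[target]):
        mask_row = []
        for k, value in enumerate(row):
            total = sum(stem_mags[n][t][k] for n in names)
            mask_row.append(value / (total + eps))  # eps avoids 0/0 in silent cells
        mask.append(mask_row)
    return mask
```

A cell where the bass contributes 3 and the drums 1 gets a bass mask value of 0.75. A silent cell gets 0.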
04:04
So let's say we can then take an audio recording
04:07
that we've never seen before,
04:08
and recognize that there are drums in there.
04:11
This matches the sound pattern of a drum.
04:14
So we figured out what something is,
04:16
but now we need to disentangle it from everything else.
04:19
So let's say we want to do that with the bass.
04:20
We want to isolate the bass from the recording.
04:24
Think of an image editor.
04:25
So what we're going to do is we're going to leave
04:27
all the pixels that we've identified
04:30
as corresponding to the bass,
04:31
we're going to leave those there.
04:33
And everything that's not the bass,
04:34
we're going to black out.
04:36
And essentially we're going to silence that audio.
04:38
And that's going to leave us with the bass.
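The image-editor analogy corresponds to time-frequency masking. This is a minimal sketch, assuming a model has already produced a per-cell "how bass-like is this pixel?" score (a hypothetical input here). The scores are thresholded into a binary mask, and the mask is multiplied into the mixture's STFT so that every cell not assigned to the bass goes to zero. Converting the masked STFT back to audio is omitted; this step is typically an inverse STFT that reuses the mixture's phase. The talk does not say whether AudioShake uses hard binary masks, soft masks, or a different approach.

```python
from typing import List, Sequence


def binary_mask(scores: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[float]]:
    """Turn per-cell scores into a keep (1.0) / black-out (0.0) mask."""
    return [[1.0 if s >= threshold else 0.0 for s in row] for row in scores]


def apply_mask(
    mixture: Sequence[Sequence[complex]], mask: Sequence[Sequence[float]]
) -> List[List[complex]]:
    """Multiply each time-frequency cell by its mask value; 0 silences the cell."""
    if len(mixture) != len(mask):
        raise ValueError("mixture and mask must have the same number of frames")
    out = []
    for mix_row, mask_row in zip(mixture, mask):
        if len(mix_row) != len(mask_row):
            raise ValueError("mixture and mask must have the same number of bins")
        out.append([m * x for m, x in zip(mask_row, mix_row)])
    return out
```

Because the mask is applied by multiplication, the same `apply_mask` also works unchanged with a soft mask of values between 0 and 1.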
04:40
Now this might sound, as my parents said to me,
04:43
like a very weird hobby,
04:46
but actually it's very, very practical.
04:48
And AudioShake works across a ton of entertainment
04:50
and speech workflows today.
04:52
So in film and TV, we can isolate dialogue,
04:55
music, and effects to take old film
04:57
and open it up to localization in other languages.
05:00
Or remove on-set wind and noise.
05:04
In music, we can separate the different instrument stems
05:06
for remixing or immersive sound.
05:09
In sports, we can boost what a player is saying
05:12
on the field, or remove music from the environment
05:15
so that teams and leagues can avoid millions of dollars
05:17
in copyright fines.
05:19
In transcription and captioning,
05:21
our tech is used to isolate the voice
05:23
before it goes through transcription
05:25
so that you get much higher accuracy.
05:28
And in generative AI, we're used by some of the world's
05:30
largest foundation models to learn human conversation.
05:34
So for example, disentangling overlapping speakers
05:37
or extracting the uh-huhs and yeahs
05:40
that make human conversation human.
05:43
So let me show you a quick demo
05:45
of what this all looks like in practice.
05:48
♪ Oh oh oh oh
05:52
♪ I've a feeling coming down low
05:55
♪ Gonna stare at the ceiling but there's n-
06:00
The lock of Dream Come True, innit really.
06:03
Did you, uh, we talked about nerves
06:04
and coming into a game like this.
06:05
It's the magnitude of this game, massive game.
06:08
Did, was there a hangover at all from last season?
06:10
The result and the coming to...
06:12
Well, when you came to Earth,
06:13
you couldn't be more mistaken.
06:14
We're here to help you by deactivating them.
06:18
The loot, yeah.
06:19
(In German) Something has nested in there, Noah.
06:22
I think that you should get to the airport.
06:24
No, I think that you should get to the airport
06:26
like 30 minutes before boarding, maximum.
06:29
I want to walk right through security.
06:31
Enjoy it and lounge.
06:33
That's what I call a lounge.
06:35
In fact, there's going to be so much more that you're going
06:43
to be able to power with sound separation in the future
06:45
and in the near future.
06:47
So we've already heard a little bit today about music.
06:49
In music, you're going to be able to remix and reimagine
06:52
everything.
06:53
And all content, including user-generated content (UGC), is
06:56
going to be able to be made immersive,
06:58
both through sound separation and things
07:00
that are happening on the vision side as well.
07:04
We're also going to be able to do all kinds of measurement
07:06
and analysis thanks to sound separation.
07:09
So imagine being able to measure the degradation
07:12
or the health of an ecological environment
07:15
based on the presence or the absence
07:17
and the change over time in different kinds of animal
07:19
sounds.
07:21
And in generative AI,
07:24
the foundation models are going to go deeper
07:26
and deeper into audio, to the point where they're
07:28
going to be able to understand audio the way we do.
07:31
So all the complexity, the way any human here
07:33
could have said, that was crowd noise
07:36
and there was a dog barking over there.
07:38
Computers will be able to do that too.
07:40
And they're going to be able to learn to talk like us
07:42
as well, which is going to change
07:44
so many workflows from the factory floor
07:46
through to kiosks and amusement parks.
07:49
Finally, because of advances not just in AI,
07:52
but also chips and hardware, we're
07:54
going to be able to solve the noisy bar problem.
07:58
Now one big technical challenge in all this is speed.
08:01
So in sound separation, we are taking large audio files,
08:05
running them through large deep learning models,
08:07
and then outputting large high resolution audio files.
08:11
And that's why for the past year,
08:12
AudioShake's been working on building faster and faster
08:15
models.
08:16
And today, we're launching here at Fortune,
08:18
the AudioShake voice SDK, which isolates voice
08:21
from noisy backgrounds in real time
08:24
and is streaming-capable on edge devices.
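Real-time, streaming separation generally means processing audio in small chunks as it arrives rather than as whole files. One standard building block is overlap-add. It is sketched here as an assumption and does not describe how the SDK works. Each new chunk of `hop` samples completes a 50%-overlapped, Hann-windowed frame, and a per-frame processor runs on that frame. The processor is a hypothetical stand-in for a voice-isolation model; a spectral model would FFT, mask, and inverse-FFT inside it. The overlapping outputs are then summed. With a periodic Hann window at 50% overlap, the shifted windows sum to exactly one, so an identity processor returns the input delayed by `hop` samples.

```python
import math
from typing import Callable, List

Frame = List[float]


def periodic_hann(n: int) -> List[float]:
    """Periodic Hann window; at 50% overlap, shifted copies sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


class StreamingOverlapAdd:
    """Chunk-in, chunk-out processor using 50%-overlapped Hann frames."""

    def __init__(self, frame_size: int, process: Callable[[Frame], Frame]):
        if frame_size <= 0 or frame_size % 2:
            raise ValueError("frame_size must be a positive even number")
        self.frame_size = frame_size
        self.hop = frame_size // 2
        self.window = periodic_hann(frame_size)
        self.process = process
        self.prev_chunk = [0.0] * self.hop  # zeros stand in for audio before the stream starts
        self.accum = [0.0] * frame_size     # overlap-add accumulator

    def push(self, chunk: Frame) -> Frame:
        """Consume exactly `hop` new samples and emit `hop` finished output samples."""
        if len(chunk) != self.hop:
            raise ValueError(f"expected {self.hop} samples, got {len(chunk)}")
        frame = self.prev_chunk + list(chunk)
        self.prev_chunk = list(chunk)
        windowed = [x * w for x, w in zip(frame, self.window)]
        processed = self.process(windowed)  # must return frame_size samples
        for i, y in enumerate(processed):
            self.accum[i] += y
        out = self.accum[: self.hop]         # this half has now received both overlapping frames
        self.accum = self.accum[self.hop:] + [0.0] * self.hop
        return out
```

At 16 kHz, for example, a 512-sample frame gives a 256-sample hop, so the output stream lags the input by 256 samples (16 ms). The time spent collecting each chunk and running the model adds to that, and the model cost is what faster models reduce.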
08:27
And I think, while we were all talking and doing
08:29
our experiment, in the back, they were running exactly that,
08:33
or we're about to find out if they did.
08:35
Can you play the full mix of what
08:36
that room sounded like when everyone was cheering?
08:39
Louder.
08:39
Can you hear me?
08:41
Can you hear me?
08:43
All right, and what did you isolate?
08:45
Louder.
08:46
Can you hear me?
08:47
Can you hear me?
08:49
Great.
08:50
I should have said something like, "You all get a free car,"
08:53
or something like that.
08:54
So as you go out today into the noisy hall,
08:56
or on a street corner later, or at a bar tonight watching
09:00
a DJ remix a track, know that all these experiences are going
09:03
to be enhanced by sound separation,
09:05
and AudioShake's going to help make it possible.