Your AI Is Building a File on You. Tim Berners-Lee Has a Fix.

Vivatech

Every AI platform you use is quietly building a file on you. They call it "memory" and say it improves the experience. Maybe, but your preferences, finances, relationships, and intentions are accumulating inside systems you don't control and can't see into. We've watched this pattern before, with the web itself. Sir Tim Berners-Lee saw the web's centralization coming and built an alternative. Now he's pointing it at AI. With Inrupt CEO John Bruce, he'll walk through Charlie — an agent that sits between you and the tools you already use, deciding what each one actually needs to know and keeping your data portable rather than locked to one platform. The premise is simple and far from settled: AI can be useful without owning you.

Transcript

00:03That was some dramatic entrance music. Sir Tim, do you always get that music?

00:08That's my... Yes, in the right.

00:11Wonderful. All right. So, we are going to talk about this fascinating new product that you guys

00:18have built. You're going to explain how it works, why it's important. But first,

00:23what is the problem you're trying to solve? What has gone wrong with the world that you

00:28are trying to fix? Either one of you can answer that. Although, what's gone wrong is that it's

00:37no fun to be on the web anymore. When the web started, anybody could make their own website,

00:46and that was very empowering. Now, everybody's on Facebook. People are using

00:58LLMs to ask questions where they could use the search engine just as well. So, we've got

01:15disempowered people using AI and where they don't have control of their own data. And we have

01:23people and people using... And the web is under threat. And it's from the point of view of personal

01:31empowerment, personal... We call it... Jan was talking about... They were talking about data sovereignty.

01:39About AI sovereignty of a country. We talk about sovereignty of an individual. You, an individual,

01:47ought to have control of your data.

01:50It's a powerful statement coming from a man who built the web.

01:54John, do you want to explain? Given this problem, what is it that you gentlemen have cooked up?

02:00Well, as Tim said, you know, I mean, Jan was talking about macro sovereignty, geo sovereignty. We are

02:07focused on individual sovereignty. And the essence of that is having access to your own data. Because

02:14that's the last thing we've got at the moment. And with the advent of the LLMs, it's becoming increasingly

02:20clear that they're going to be your memory. I mean, the amount of data individuals are

02:29providing to the LLMs. I mean, you get a good return for the investment. You know, you get good

02:35service. But the trade-off is that they become your memory. And we think that that's fraught with

02:43problems. And what do you mean by your data? So obviously, it's your specific information,

02:50your date of birth, your health information, your bank account information. But most of the stuff on the

02:54web is shared. You know, you and I send an email. It's your data. It's my data. What exactly

02:59do you define as your data? Is it every search query you put into an LLM? What is it?

03:04Yeah, all of those things. I mean, and largely, you get to decide what's important to you. But in our

03:09world, you should. So, and what I think we're all finding is the implications when you don't

03:17keep control of your own data. I mean, you know, we talk, I mean, you and I were talking the

03:22other day

03:26about experiences that we're beginning to have where you realize, hang on a minute. How on earth

03:32did it know that about me? How did it know the name of my kids? Or how did it know

03:38that I've got

03:39a doctor's appointment that I'm due to be at in a couple of hours? I mean, the degree of intimacy

03:43that

03:44the LLMs are beginning to build about you is way beyond where it ever was with search terms and

03:51the like. So, the answer to your question largely is, I think every piece of data about you should

03:57be available to you, actually. And you should be the custodian, largely, of who gets to use it.

04:03Okay. So, there's a huge trade-off there, right? So, I will often ask my LLM of choice,

04:10hey, you know, given what you know about my health situation, my training needs,

04:14what kind of workout should I do today? And obviously, there's a trade-off between having

04:19all of my data and giving a good answer, right? So, your view would be, I should make that choice,

04:24right? I can say, hey, you can have my resting heart rate and my iron levels. Or I can say,

04:30no, you shouldn't have that, and you should just train me as a generic person.

04:35But that leads to the problem where I don't want to have to go in every question and say,

04:39yes, no, yes, no, yes, no. So, explain these two tricky trade-offs. Either of you, explain

04:43how you think about utility versus privacy and also the user having to spend a lot of time

04:51figuring out what to give and what not to give.

04:53Yeah. And candidly, that's the best of all worlds, right? Where the user should be able to get the

04:59utility out of the LLMs, of course. I mean, I want it, you want it, we all want it. But

05:04by the

05:04same token, I don't want it to know me intimately. And I think that what that requires is my agent.

05:12I need to have something that works for me, not for the LLMs, and takes care of the housekeeping

05:19that you described. You know, what consents should it grant? And it should be granting them

05:24at machine speed. And I don't want to be clicking all day with yes, no, yes, no. I mean, that's

05:28impractical. It would be like going to Europe. You know, you go to Europe and you have to click

05:31no on every single browser tab. And I would go crazy. Yeah.

05:35Tim, will you explain your philosophy on what data should be private? What is the data that's

05:41important that should never go into an LLM? Well, basically, when I have a conversation

05:54with the LLM, it's often going to be about what workout should I do? It could be about

05:58what vacation should I go on? All that sort of stuff. That's completely private. I might

06:09want to share it later on. I might want to use it to help me assemble a definition

06:19of my perfect vacation. I might want to put the definition of my perfect vacation out there

06:25in some sort of marketplace where travel agents can come to, and airlines can come and sort

06:34of match up and provide me with, so basically, we might want to flip around the current attention

06:43economy to be an intention economy. And so, in that case, I'd be making public things about

06:49what I want. But everything starts off private. Yeah. So, all right. So, you have this system.

06:56It's called Charlie. What is it? How does it work? You've just announced it. It's not your first

07:00interview, but this is your second public interview about it. Well, pretty early. So, explain what it

07:06is. Has anybody here used it? Hands up if you've used it. Early days. All right. Good. At the

07:14beginning. So, now, explain to all these people, because we're going to have you all come back here

07:18tomorrow. I'm going to ask you that again, and I expect every hand to go up. So, explain what it

07:22is

07:22and how it works. Yeah. So, first, let me tell you about Charlie. Let me tell you that when Tim

07:27and I

07:27started the company, we have a company together called Inrupt. And for some years now, we've been

07:32advocating the use of this technology, which gives everybody a data vault, a data wallet,

07:39if you like. And we keep all our stuff in it. And then applications can come to us and ask

07:44for access to it. And it's all based on open source protocols. Actually, there's quite a

07:50flourishing open source community around it. So, the Solid protocols were invented some years

07:55ago. And when we started the company, we started it to mobilize around Solid, to bring resources

08:01to help drive the adoption of Solid and give everybody their own data vaults.

08:08And back then, Tim said, you know, and you have to remember what it was like then. There

08:13were two things. One was called Siri, and one was called Alexa, and that was it. And Tim

08:18said, you know, at some point in time, we're all of us going to have our own. One that

08:21doesn't work for Amazon or Google or Apple, but one that works for us. And he wrote about

08:29it, and he called it Charlie. And we said back then, in 20 years, we're going to build

08:33it. It'll take us that long to build it. 20 years, so the technology didn't exist. But

08:38then about 18 months ago, it was evident that the way the LLMs were evolving, maybe it could

08:44exist. So we built a demonstration of it. And we built a demonstration that showed you

08:49can actually do it. You can build a thing that works for us, not for them. Not to oversimplify,

08:55but that's what it did. And we began to show it to folks quietly, and they said, love the

09:01idea of a data vault. Love the idea of data wallets. But how do you get the data in it

09:06and to your point? Does that mean all day I'm clicking yes, no, yes, no, yes, no to grant

09:11consented access to it? So that thing called Charlie, that would be really interesting,

09:18because if you can build that to take care of all this stuff. So we said, okay, let's

09:24see if we can build it. And we did.

09:27But Charlie is an intermediary when I go to OpenAI, or Charlie is its own system? Do I

09:34go to Charlie, or I use Charlie in the middle between my journey?

09:37Yeah, that's right. Think of it that way.

09:39It's always a layer. It's always a layer between you and the LLM.

09:44Yeah.

09:44So I log into Charlie, and then I go to the LLM, and then I say, hey, help me understand

09:49this next workout. And Charlie, the LLM says, hey, Charlie, can you give me Nick's iron levels?

09:55And Charlie's like, nah, no, I'm not going to give you that. Is that how it works?

09:58Well, no, no, no, no, no, no, no. It's better than that. Because we don't want to stop you

10:03using the LLMs, of course not. But at the same time, we don't want you necessarily being

10:09at risk in terms of all your personal data being available to the LLMs.

10:13So what Charlie does, it does a number of things, and I don't have the time to explain it all,

10:19but in simple terms, what it does is, in the first instance, before you submit the prompt,

10:25and this applies to any LLM, Anthropic, OpenAI, Mistral, before you submit the prompt,

10:31Charlie says, ha, these pieces of data would be pertinent to this prompt. I'll package it up.

10:40But before I send it, I'm going to strip out all your PII. Now, you're in Europe, so you

10:46fundamentally understand PII like they don't in the States, but Charlie strips out all your PII.

10:52And then before it submits it, it obfuscates you. So it doesn't send your

11:01absolute data. It sends just a jittered version, just an approximation. So you can engage with the

11:09LLM. You get all the kind of guidance you're looking for, but it doesn't get a fix on you.

11:15It doesn't get to know you intimately.
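
Editor's sketch: the three steps described here (pick the data pertinent to the prompt, strip PII, then send a jittered approximation) can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not Charlie's implementation. The vault contents, the `TOPIC_FIELDS` keyword map, the `PII_FIELDS` set, and every function name are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical vault contents and topic map, invented for illustration.
VAULT = {
    "name": "Nick Example",
    "email": "nick@example.com",
    "resting_heart_rate": 52.0,
    "iron_level": 80.0,
    "bank_balance": 12500.0,
}
TOPIC_FIELDS = {
    "workout": ["name", "resting_heart_rate", "iron_level"],
    "mortgage": ["name", "email", "bank_balance"],
}
PII_FIELDS = {"name", "email"}


def select_relevant(vault, prompt):
    """Step 1: pick fields that look pertinent to the prompt (naive keyword match)."""
    picked = {}
    for topic, fields in TOPIC_FIELDS.items():
        if topic in prompt.lower():
            picked.update({f: vault[f] for f in fields if f in vault})
    return picked


def strip_pii(fields):
    """Step 2: drop directly identifying fields before anything leaves the vault."""
    return {k: v for k, v in fields.items() if k not in PII_FIELDS}


def jitter(fields, pct, rng):
    """Step 3: replace each number with a value within +/- pct of the original."""
    return {
        k: v * (1 + rng.uniform(-pct, pct)) if isinstance(v, float) else v
        for k, v in fields.items()
    }


def build_context(vault, prompt, pct=0.10, rng=None):
    """Run the three steps in order and return what would accompany the prompt."""
    rng = rng or random.Random()
    return jitter(strip_pii(select_relevant(vault, prompt)), pct, rng)


context = build_context(VAULT, "What workout should I do today?")
print(context)  # approximate heart rate and iron level, no name or email
```

A real system would need a far better relevance test than keyword matching and a far broader notion of PII than two field names. The sketch only shows the order of operations.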

11:16Does it send false data? Like, does it say, well, his birthday is June 32nd, right? Or does it just

11:22say, let's say my birthday was June 15th. It says his birthday is June 16th, June 14th?

11:26It depends on the use case. Now, the way Charlie works, and it sounds like it's a heavy lift, but it's not,

11:31it's quite simple to use. You can throttle it. You can say, in certain circumstances, you're going to

11:37need to know my date of birth. So you can be very deterministic. You can say, in these use cases,

11:46be explicit. Tell the real me. But in those use cases, not. I want you to put jitter into the

11:53financial numbers. I want you to obfuscate my, I mean, you know, the reality of me.

11:58So still, yeah, you get the best of all worlds.
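
Editor's sketch: the "throttle" described here, exact values in some use cases and jitter in others, could be expressed as a per-use-case policy table. The table layout, the use-case names, the default of 10%, and the `disclose` function are all assumptions made for illustration. The talk does not say how Charlie stores these rules.

```python
import random

# Hypothetical policy: use case -> field -> jitter fraction.
# A fraction of 0.0 means "be explicit, send the real value".
POLICY = {
    "passport_renewal": {"birth_year": 0.0},
    "mortgage_advice": {"bank_balance": 0.15, "credit_score": 0.05},
}
DEFAULT_JITTER = 0.10  # assumed fallback when no rule matches


def disclose(use_case, field, value, rng=None):
    """Return the value to send for this field under this use case's rule."""
    rng = rng or random.Random()
    pct = POLICY.get(use_case, {}).get(field, DEFAULT_JITTER)
    if pct == 0.0:
        return value  # deterministic: the exact value is allowed here
    return value * (1 + rng.uniform(-pct, pct))


print(disclose("passport_renewal", "birth_year", 1980.0))   # exact value
print(disclose("mortgage_advice", "bank_balance", 10000.0))  # within 15% of 10000
```

Because the rules are decided once and applied by the agent, the user is not asked yes or no on every prompt, which is the point made earlier in the conversation.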

12:01I mean, this is my favorite thing about the product, the whole obfuscating data thing.

12:04I've seen other people build data vaults, privacy, data on the blockchain, own your own privacy. I've

12:09been hearing that for a while. I've never heard feed false data into an LLM to protect yourself, which is

12:14just fabulous. So give me, you know, you can invent the web. You can invent a new way of lying.

12:20Congratulations, Sir Tim. Explain a query where it's very useful to obfuscate data. Let's make

12:28this like, give me a query that you've put into one of the big models and how much data you

12:34sent

12:34along with it and how that data was obfuscated. Let me tell you the high-level version.

12:40But so if you, and good examples at the moment are financial services. You know, particularly now that

12:46we've seen OpenAI introduce finance manager, you know, hook me up to all your bank accounts

12:52and I'll look after you. I mean, frightening, isn't it? But anyway, so...

12:54I did that. I made a billion dollars yesterday. It was amazing.

12:57Wow.

12:57Just kidding.

12:58Yeah, I know you are.

12:59Go on.

12:59So in the context of financial services, it's fascinating. And we have a number of projects

13:06underway in this where if you want straightforward advice, can I afford a mortgage? There's two

13:10ways you can go about doing it. You can provide your actual bank balances. You can upload financial

13:17statements in order to get an answer back from ChatGPT. Alternatively, you can use Charlie.

13:24And with Charlie, it wouldn't send your actual balances. It wouldn't send your actual credit

13:30scores. It sends a little approximation, just a little approximated, and that's that notion

13:36of jitter. You can introduce how much jitter you have. Charlie takes care of that. You don't

13:39have to worry about it. So it approximates you enough where you still get the guidance

13:43you're looking for without submitting your real data.
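
Editor's sketch: one way to see why an approximation can still yield the same guidance is to compare a simple yes/no affordability answer computed from exact income with the answer computed from jittered income. The 28% payment-to-income rule used here is an assumed rule of thumb, not something stated in the talk, and the function names are hypothetical.

```python
import random

FRONT_END_RATIO = 0.28  # assumed rule of thumb, not from the talk


def affordable(monthly_income, monthly_payment):
    """Toy rule: the payment must not exceed a fixed share of income."""
    return monthly_payment <= FRONT_END_RATIO * monthly_income


def agreement_rate(monthly_income, monthly_payment, pct, trials=1000, seed=0):
    """Fraction of jittered queries whose answer matches the exact-data answer."""
    rng = random.Random(seed)
    exact = affordable(monthly_income, monthly_payment)
    matches = 0
    for _ in range(trials):
        approx_income = monthly_income * (1 + rng.uniform(-pct, pct))
        if affordable(approx_income, monthly_payment) == exact:
            matches += 1
    return matches / trials


# Far from the threshold: 10% jitter cannot change the answer, so this prints 1.0.
print(agreement_rate(6000.0, 1200.0, pct=0.10))
# Close to the threshold: some jittered queries get a different answer.
print(agreement_rate(6000.0, 1650.0, pct=0.10))
```

The second case is where the cost of jitter shows up: guidance degrades only when the true figures sit near a decision boundary.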

13:47Right. And you presumably get slightly worse guidance, right? Because if you have the exact

13:51credit score and your exact income and your exact bank statement, they can make a more

13:55precise calculation. And your argument is that the tradeoff is worth it.

13:59100%.

14:00Right. Because if you upload your credit score while trying to get your mortgage application,

14:05the AI company's going to hold onto that forever.

14:08Forever. It's your memory.

14:09And it could be used against you.

14:10Yeah. Yeah. Yeah. Yeah. Exactly.

14:11What is the worst example of an AI company holding onto data and actually using it in

14:16a way that was harmful to a user that you've heard about?

14:19We probably don't know.

14:21Yeah. But that we know about.

14:25Small examples, but dynamic pricing is one such, right? And I don't know that it's the

14:29LLMs necessarily. I was meeting with somebody earlier today. And she was telling me how,

14:35you know, around here you click, no, don't, do you consent to, and you click don't. So they did

14:42an analysis. And 35% of the time, when you click don't, they still do it. Because nothing stops

14:50them.

14:50You click on the no button and it says yes.

14:53Yeah. It still does yes. The code still does yes.

14:55It still does yes.

14:56So that's two questions. So there's stuff going on at the moment we don't appreciate. But the ones that are

15:03making it into the public domain,

15:06two levels, public domain stuff, written about in the New York Times and so on, dynamic pricing, all of that stuff's going

15:12on.

15:12But on a micro level, I know you've experienced this. I suspect we all have. We all get to a

15:19WTF moment.

15:22We all get to a point where we think, hang on, it should not know that. I've had it. I

15:28know you, I'm sure we're all appreciating it.

15:30If you haven't, I guarantee you will. And it shows the intimacy with which these models are getting to know

15:37us.

15:37And a consequence of that is, unfortunately, the business model has skewed quite negatively the web we've got.

15:48It will emerge on steroids because of the intimacy with which they have this data available.

15:54So help me understand the trust question. So part of the reason why we don't want to upload our data

16:01is because we don't totally trust the companies

16:03and there's a long history of corporate malfeasance. But in order to use Charlie, I have to trust you. Absolutely.

16:12I trust you guys. You're lovely on stage. But how is the user supposed to trust your company and your

16:17vault when it's still a company?

16:19It's still a vault managed by board directors with financial incentives?

16:24Sure. You don't. You don't have to trust us. You have to trust in the people we're working with who

16:31intend to distribute Charlie.

16:35And you trust corporations. Not all of them, but some of them you do. You trust your bank. You have

16:41to. They've got your money.

16:43You're sending me an overdrawn notice, though, so I trust you.

16:45Yeah, well, that's different. And some banks are different. But, I mean, you know, generally there are trusted entities, banks

16:51being one such, where they say,

16:53look, if you trust us to look after your money, trust us to look after your data. Here's Charlie.

16:59And Charlie's going to help you operate in a world of the LLMs in a safe way.

17:04And the good news is it helps them, too, because we're in an interesting point in time, actually, where, for

17:13all the disadvantage, I think, that I fear lies ahead of us as individuals, if we're not careful,

17:19the same kind of disadvantages exist for corporations.

17:25And financial companies, retailers, telecommunications carriers, insurance companies can all get disintermediated by the LLMs and the agents and by

17:36agents of the LLMs.

17:37So, you know, if I'm a bank and I'm sitting there strategically, all the things that I normally grant to

17:43my customer, I offer my customer counsel and guidance on financial matters,

17:49all that gets swept away by ChatGPT or similar, what am I left with?

17:55I end up being disintermediated.

17:57So that's interesting.

17:58So you think, let's go back to our mortgage example.

18:00You both think the bank will want the customer to have Charlie because if the customer is just uploading their

18:06bank statements up into OpenAI to ask the loan information,

18:09that's actually worse for the bank because now they no longer have the power of the control of your data.

18:14100%.

18:15Okay.

18:15So these guys are on, these guys want you to succeed.

18:18The big LLMs don't want you to succeed because they want more data.

18:21So how does the market play out?

18:24Does Chase Manhattan Bank, do they encourage me and nudge me to install Charlie?

18:30Yeah.

18:31Yes.

18:32Yeah.

18:32So explain what's going to happen.

18:34Well, I can't speak explicitly for Jamie Dimon and all his crew, but I mean, you know.

18:38Theoretically.

18:39So you could start going out.

18:40What you might do, maybe there'll be people here in the room who will talk to you.

18:44You may go out and find partners and say, look, you guys are risking everything.

18:47If all of this data is being uploaded into these LLMs, work with us, and we've got a cool way

18:53of making sure it doesn't happen.

18:55Yeah.

18:55Is that the next business, the biz dev step for you?

18:58Yeah.

19:01All right.

19:03So, and yeah, financial companies in particular.

19:08So, because for the people who trust.

19:12Yeah, trusted agents.

19:13I mean, and that's the way it should be.

19:16I mean, these are institutions that have spent a lot of time earning your trust.

19:21They're heavily regulated as well.

19:24They sit in a position of trust and oversight where they should be the kind of people that you would

19:30say, yeah, you know, if I'm going to look after my memory and I want to store it someplace, I

19:35want to make sure that there's a trusted entity looking after it with me, why wouldn't you use me?

19:40So, is your business model, individuals are going to, because you're going to need a business model.

19:45You have 20 employees, right?

19:47You know, it's not going to be a gigantic company.

19:48You don't need trillions of dollars.

19:49But your business model will be somebody subscribes and pays a fee or your business model will be you'll have

19:55like a commission from these trusted entities.

19:57How is it going to work?

19:58Great question.

19:59And I truly mean that because what we're finding is a fascinating point in time actually for corporations.

20:06And we're spending a good deal of time teasing this out with them.

20:10It used to be the case that when you were a company, you know, you knew where you wanted to

20:15get to and the job of the leadership team and everybody else was to get you there.

20:20How do I get to that place?

20:22And they're all sitting there now in a totally different mode.

20:26They're sitting there not knowing where that place is going to be, but they can't sit doing nothing.

20:32So, somebody said to me the other day, you know, we used to be pathfinders and now we're wayfinders.

20:39We have to figure out generally how to head in the right direction and we'll figure out then where next

20:45to go because a lot of us are, and I think the LLM vendors are no different, we're not clear

20:52where it all ends up.

20:53So, the answer to your question generally is that the job of work is for folks to appreciate that they

21:03can do things now and they should do things now strategically to make sure that they're safeguarded and they're safeguarding

21:11their customers for the future.

21:13Right.

21:13Is Charlie open source?

21:15Yeah, well, excuse me, the code underneath, all open source.

21:19The way we obfuscate, the way we strip PII, the protocols, all of it open source.

21:25What we've configured with Charlie, we've made closed source only because we've experienced you can move a damn sight faster in

21:34certain regards.

21:35If you generate stuff, closed source, get early adoption and then open it up.

21:39Wait, which parts are open, which parts are closed?

21:41Sorry, which parts of Charlie are open and which parts are closed?

21:45It's all available open source.

21:47We just implemented it using our resources.

21:52We have a closed source Solid server called Enterprise Solid Server.

21:56Okay.

21:57That's closed source.

21:59Okay.

21:59So if we put together a Charlie implementation with all the protocols, and so the ESS, the Enterprise Solid Server,

22:10implements the Solid protocol, so the protocol is open.

22:14Let me ask one last question as we kind of run out of time about the obfuscation of data because

22:18I think it's so interesting the way you do it.

22:19So one of the things about data is that if you get 10 pieces of blurry data about me, you

22:26can fold it back and figure out who I am, right?

22:28You get one piece of clear data, you can figure out who I am, and 10 pieces of blurry data,

22:32right?

22:32So how do you make sure that over time the LLMs, which are very smart and getting much smarter, are

22:39not able to piece together the data that you've blurred to figure out all the information you were trying to

22:44keep from them?

22:44Oh, there's no absolutes.

22:46I mean, you know, I don't think we can guarantee that they couldn't figure it out.
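
Editor's sketch: the concern raised in the question can be made concrete. If every query carries a freshly randomized copy of the same value, an observer who averages many copies converges on the true value. One possible mitigation, which is an assumption here and not something the speakers describe, is to derive the offset once per field from a secret so that repeated queries always reveal the same approximation. The function names and the HMAC-based derivation are illustrative only.

```python
import hashlib
import hmac
import random
import statistics


def fresh_jitter(value, pct, rng):
    """New random offset on every call: each copy is blurry in a different way."""
    return value * (1 + rng.uniform(-pct, pct))


def averaged_estimate(true_value, pct, n, rng):
    """What an observer recovers by averaging n independently jittered copies."""
    return statistics.fmean(fresh_jitter(true_value, pct, rng) for _ in range(n))


def sticky_jitter(value, pct, secret, field):
    """Same offset every time for a given field, derived from a user-held secret."""
    digest = hmac.new(secret, field.encode(), hashlib.sha256).digest()
    unit = int.from_bytes(digest[:8], "big") / 2**64  # uniform-looking value in [0, 1)
    return value * (1 + (2 * unit - 1) * pct)


rng = random.Random(42)
true_balance = 12500.0
for n in (1, 10, 1000):
    estimate = averaged_estimate(true_balance, 0.10, n, rng)
    print(n, round(abs(estimate - true_balance), 2))  # error tends to shrink as n grows

# Repeating the sticky version gives the observer nothing new to average.
print(sticky_jitter(true_balance, 0.10, b"user-secret", "bank_balance"))
print(sticky_jitter(true_balance, 0.10, b"user-secret", "bank_balance"))
```

A fixed offset does not help if other fields, taken together, identify the person, which is consistent with the answer given here that there are no absolutes.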

22:51Yeah.

22:52But what we can, with a high degree of confidence, assure anybody is that using Charlie gives you a chance, if you

23:01will, to see what next.

23:04And in terms of the what next, you know, these are businesses, so they go for ROI.

23:09And if there's easy things to do and difficult things to do, they'll do the easy ones first.

23:13Right.

23:14And I believe that if they can get an approximation of you, that may be just enough to satisfy their

23:23needs for general learning.

23:26And then for the kind of things they want to do, which is largely get access to that multi-trillion

23:31dollar, you know, advertising and transaction market out there,

23:35then the kind of things they can do, an approximation is sort of kind of okay.

23:39Do you, last question, do you think the big large language models, even if you say, hey, don't train on

23:44my data, even if you have a corporate account,

23:47do you think they're storing all that and that they'll use it in the future, or do you think they're

23:49actually deleting it or not?

23:51We noticed that Anthropic just changed their privacy policies with Fable.

23:54They're going to store all your corporate data for 30 days.

23:56Do you think they're storing everything on a long-term basis and that'll come back to haunt us?

23:59Uh, I couldn't say they're not.

24:04Tim?

24:07So, when, uh, are the companies lying about their privacy policies?

24:13Yeah.

24:15Well, time will tell, but some of them have already been exposed as doing that.

24:20Yeah.

24:21All right.

24:21Well, great.

24:22Everybody, delete your data.

24:24Use Charlie.

24:25Thank you so much for joining us on stage.

24:27And thank you for building the web.

24:30You're welcome.
