Transcript
00:00 My name is Dasha Metropolitansky, and I'm a research data scientist on the Microsoft Special Projects Resilience Team. I developed a system called Claimify, which is a claim extraction system. You're probably wondering: what is claim extraction? Well, there are two keywords, claim and extraction. A claim, as I define it, is a simple factual statement that can be verified as true or false, and extraction is the process of breaking a text down into claims.

00:27 To illustrate with a simple example, let's say you have the sentence, "Some notable examples of technology executives include Satya Nadella and Bill Gates." If I had to break this sentence down into claims, there would be two of them: "Satya Nadella is a technology executive" and "Bill Gates is a technology executive." This is a really simple example, but you can already start to see some of the properties we care about when we do claim extraction. One is that I got rid of the word "notable," because what does that even mean? It's not something I can verify as true or false. The other is that I created separate claims, one for Bill Gates and one for Satya Nadella, because we want claims to be the simplest possible independent statements.
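To make this concrete, here's a minimal sketch of what a claim-extraction output could look like for that sentence. The `Claim` class and `extract_claims` stub are hypothetical illustrations of the output shape, not Claimify's actual API; a real system would call a language model rather than hard-code the result.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str             # a simple factual statement, verifiable as true/false
    source_sentence: str  # the sentence the claim was extracted from

def extract_claims(sentence: str) -> list[Claim]:
    """Hypothetical stub: hard-codes the result for the talk's example."""
    if sentence.startswith("Some notable examples"):
        return [
            Claim("Satya Nadella is a technology executive.", sentence),
            Claim("Bill Gates is a technology executive.", sentence),
        ]
    return []

sentence = ("Some notable examples of technology executives "
            "include Satya Nadella and Bill Gates.")
for claim in extract_claims(sentence):
    print(claim.text)
```

Note that the unverifiable word "notable" is gone from the output, and each executive gets their own self-contained claim.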
01:10 So, taking a step back: Claimify takes in a text of any length, usually much longer than the single-sentence example I just gave you, and decomposes it into these high-quality claims.

01:20 I'm now working on a system that does hallucination detection. Let's say your question-answering application answers questions based on some source documents, like news articles. You want to make sure the language model is answering those questions based on the source documents, not just making things up. But that's a really hard evaluation to perform when you have a paragraph or multi-paragraph answer packed with information. Now imagine you could distill that answer into a simple set of standalone factual statements: it becomes much easier to check those independently.

01:51 But this isn't just about hallucination detection; you can do other sorts of evaluations as well. Back to the use case: you're building an application, and you want to know how relevant the answers are to the question that was asked. If an answer contains 20 or 30 distinct points, it's hard to say how relevant the entire answer is. Maybe some points are relevant and others aren't. But if you can take the individual factual claims and say this one is relevant, this one isn't, you can easily aggregate that into one composite measure. Our team is also using the number of claims in the answer as a proxy for how comprehensive it is.
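As a rough illustration of that aggregation idea, here is a sketch of turning per-claim relevance judgments into one composite score, with the claim count doubling as a comprehensiveness proxy. The yes/no judgments are assumed to come from some external judge (an LLM or a human rater); nothing here is Claimify's own code.

```python
def composite_relevance(judgments: list[bool]) -> float:
    """Fraction of extracted claims judged relevant to the question."""
    return sum(judgments) / len(judgments) if judgments else 0.0

# Example: an answer decomposed into 20 claims, 14 judged relevant.
judgments = [True] * 14 + [False] * 6
print(f"composite relevance: {composite_relevance(judgments):.2f}")  # 0.70
print(f"comprehensiveness proxy: {len(judgments)} claims")
```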
02:26 So, to summarize: why does claim extraction matter? Because it unlocks the ability to evaluate long-form content generated by language models.

02:32 We don't try to do claim extraction on the whole text at once. We break the text down into sentences and do claim extraction on each sentence independently. To ensure those sentences are interpreted accurately, we include some context, which is basically a window of text around the sentence, as sketched below. So that's number one.
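Here is a minimal sketch of that sentence-plus-context step, under the assumption that the context is simply a fixed window of neighboring sentences. The naive regex splitter and the window size are illustrative choices, not Claimify's actual implementation.

```python
import re

def sentences_with_context(text: str, window: int = 2):
    """Yield (sentence, context) pairs, where the context is the sentence
    plus up to `window` neighboring sentences on each side."""
    # Naive splitter for illustration; a real system would use a
    # more robust sentence tokenizer.
    sents = re.split(r"(?<=[.!?])\s+", text.strip())
    for i, sent in enumerate(sents):
        lo, hi = max(0, i - window), min(len(sents), i + window + 1)
        yield sent, " ".join(sents[lo:hi])

answer = ("Inflation has depreciated the currency. Many residents face "
          "economic hardship. Experts predicted rates above 300%.")
for sent, ctx in sentences_with_context(answer, window=1):
    print(f"sentence: {sent}\ncontext:  {ctx}\n")
```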
02:48 Number two: we don't treat claim extraction as one monolithic task. We break it down into three parts: selection, disambiguation, and decomposition.

02:55 Selection means we're filtering out sentences that do not contain any verifiable claims. For example, if I gave you the sentence, "Companies should embrace AI," that's not a factual claim, it's an opinion, so we would filter it out.

03:14 Second, we have disambiguation. This stage detects whether there's ambiguity and, if there is, decides whether it can be resolved using the context, or flags it if it can't be resolved. Ambiguity here just means there are multiple plausible interpretations, and depending on which interpretation you pick, you're going to get a very different set of claims. This is one aspect of Claimify that is really unique and powerful, especially the ability to determine whether or not the ambiguity can be resolved.

03:38 The last stage is decomposition, which takes the disambiguated sentence and breaks it down into simple, standalone factual statements.
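Putting the three stages together, here is a hedged sketch of the pipeline's control flow. The stub functions stand in for the LLM-backed stages described above; only the shape of the flow (select, then disambiguate or bail out, then decompose) is meant to mirror the talk.

```python
from enum import Enum, auto

class Ambiguity(Enum):
    NONE = auto()
    RESOLVED = auto()
    CANNOT_BE_DISAMBIGUATED = auto()

# Placeholder stages; the real system uses language-model prompts.
def contains_verifiable_claim(sentence: str, context: str) -> bool:
    """Selection stub: treat an obvious opinion marker as unverifiable."""
    return not sentence.lower().startswith("companies should")

def disambiguate(sentence: str, context: str) -> tuple[Ambiguity, str]:
    """Disambiguation stub: pass the sentence through unchanged."""
    return Ambiguity.NONE, sentence

def decompose(sentence: str, context: str) -> list[str]:
    """Decomposition stub: return the sentence as a single claim."""
    return [sentence]

def extract_claims(sentence: str, context: str = "") -> list[str]:
    # Stage 1: selection -- filter out sentences with no verifiable claims.
    if not contains_verifiable_claim(sentence, context):
        return []
    # Stage 2: disambiguation -- resolve from context, or flag and stop.
    status, resolved = disambiguate(sentence, context)
    if status is Ambiguity.CANNOT_BE_DISAMBIGUATED:
        return []
    # Stage 3: decomposition -- split into simple standalone statements.
    return decompose(resolved, context)

print(extract_claims("Companies should embrace AI."))              # []
print(extract_claims("Satya Nadella is a technology executive."))  # one claim
```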
03:49 So let's take a closer look at Claimify in action.
03:51 Imagine you're developing a chatbot. You ask it to provide an overview of challenges in emerging markets, and it generates this answer. Assessing the quality of the answer is really hard: it's packed with information, and there's no gold standard to compare against. Instead of processing the entire text at once, Claimify extracts claims from each sentence independently, including context for each sentence to ensure accurate interpretation.

04:11 Recall the sentence, "The UN found that the resulting contaminated water caused many residents to fall ill, highlighting the need for improved water management." The baseline prompt ignored the phrase "highlighting the need for improved water management"; it only extracted claims from the first part of the sentence. Claimify, however, reasoned, quote: the sentence could be interpreted as "the UN found that the contaminated water caused illness and also highlighted the need for improved water management," or it could be interpreted as "the UN only found the contamination and illness, and the author is adding the interpretation about the need for improved water management." In other words, this may or may not be a verifiable claim. Claimify decided that the context did not clearly support either interpretation, so it flagged the sentence as "cannot be disambiguated" and did not proceed to the decomposition stage.
05:00 Here are the sentences where at least one claim was extracted; I'll highlight a few examples. Recall the sentences about Argentina's inflation, where the baseline missed the claims about economic hardship and the prediction of rates greater than 300%. Claimify did not miss these claims. Also, the baseline just said that Argentina's currency value has plunged; Claimify correctly specified that inflation has depreciated the currency.

05:21 Consider the sentence, "Countries like Afghanistan and Sudan have experienced similar challenges to those of Libya," where the baseline claims never specified what "those" refers to. The context discusses public health crises, flooding, and contaminated water, so Claimify made specific claims about these issues.

05:37 In the sentence, "Nigeria is striving to become self-sufficient in wheat production, but is hindered by climate change and violence," the baseline had claims like "Nigeria's wheat production is hindered by climate change and violence." Claimify captured that it's Nigeria's efforts to become self-sufficient in wheat production that are being hindered.
05:58 So, one of the most popular use cases for language models is generating long-form content. Unfortunately, it's really hard to evaluate the quality of that content. Claim extraction can help, and Claimify is a really powerful tool for generating high-quality claims.