Originally published in Forbes, August 21, 2024
This is the first of a three-article series covering the business value of predictive AI, using misinformation detection as an example for illustration: article 1, article 2, article 3.
Not all business problems are best addressed with generative AI. For some, prediction is the solution.
Take misinformation detection. Imagine you run a social media platform and, for certain high-risk channels, a third of user posts convey misinformation. As a result, your business is getting slammed in the press.
As with fighting fraud, managing credit risk and targeting marketing, this is exactly the kind of problem that demands prediction. On the one hand, you can’t trust a machine to filter every post automatically. On the other, you can’t have every post manually reviewed—that’s too expensive.
Enter predictive AI, which flags (predicts) cases of interest. For misinformation detection, this means flagging the posts most likely to convey misinformation so that they can then be audited by humans and potentially blocked.
A single chart summarizes the value of doing so:
[Figure] A savings curve for misinformation detection. The horizontal axis represents the portion of posts manually audited and the vertical axis represents savings. (Credit: Eric Siegel)
This plots the money saved (vertical axis) against the portion of posts manually audited (horizontal axis). The leftmost position corresponds with auditing zero posts, which would mean no misinformation is blocked, and the rightmost position corresponds with auditing all posts, which could mean all misinformation is blocked, but would cost a lot of human labor.
The shape of this curve guides you to balance between auditing too few and too many. In this case, you would save the most by auditing the 51% of posts most likely to convey misinformation. That’s where you’d maximize the reduction in cost—in comparison to not auditing any posts and not blocking any misinformation.
These savings are estimated in part based on two different kinds of cost:
1) The cost to manually audit a post (set at $4 for this example).
2) The cost of misinformation going undetected (set at $10 for this example).
While the first of these two can be objectively established based on the cost of labor, the second is subjective, so there may be no definitive way to determine its setting. Many use cases of predictive AI face a similar dilemma. How do you put a number on the cost of a medical condition going undetected? Or the cost of an important email message being incorrectly relegated to your spam folder? For all such projects, it’s vital that stakeholders be given the capability to vary the cost setting so that they can see how it affects the shape of this curve. That is the topic of this follow-up article.
The peak estimated savings comes to $442,000, based also on an assumed population of 200,000 posts (within the high-risk channels). This means that, if there are 200,000 posts each week, the right setting stands to save the company $442,000 per week.
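To make the arithmetic concrete, here is a minimal sketch of how such a savings curve can be computed. It assumes you already have a predictive score and a ground-truth label for each post in a validation set; the synthetic data and variable names below are illustrative, not the data behind the chart above. Savings at each cutoff are measured relative to the baseline of auditing nothing and blocking no misinformation.

```python
# Minimal sketch: computing a savings curve for misinformation detection.
# Assumptions (illustrative, not the data behind the chart above):
#   - scores[i] is the model's predicted likelihood that post i is misinformation
#   - labels[i] is 1 if post i actually conveys misinformation, else 0
import numpy as np

AUDIT_COST = 4    # cost to manually audit one post ($)
MISS_COST = 10    # cost of one piece of misinformation going undetected ($)

def savings_curve(scores, labels, audit_cost=AUDIT_COST, miss_cost=MISS_COST):
    """Savings vs. the do-nothing baseline, for every possible audit cutoff."""
    order = np.argsort(-np.asarray(scores))          # audit highest-risk posts first
    caught = np.cumsum(np.asarray(labels)[order])    # misinformation caught so far
    audited = np.arange(1, len(order) + 1)           # posts audited at each cutoff
    # Savings = avoided miss costs minus the labor spent auditing
    return caught * miss_cost - audited * audit_cost

# Example on synthetic data: the best cutoff is wherever the curve peaks
rng = np.random.default_rng(0)
labels = rng.random(200_000) < 0.37                  # roughly 37% misinformation
scores = np.where(labels, rng.random(200_000), rng.random(200_000) * 0.8)
curve = savings_curve(scores, labels)
best = int(np.argmax(curve))
print(f"Audit top {100 * (best + 1) / len(curve):.0f}% of posts; "
      f"estimated savings ${curve[best]:,.0f}")
```

The peak of this curve is the optimal decision threshold for this one metric; as discussed below, other tradeoffs also factor into where the line is actually drawn.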
Most predictive AI projects see a similar effect, where there’s a “Goldilocks zone” somewhere in the middle: The value is maximized by not inspecting too many cases, and yet not inspecting too few. After all, that’s the value of predictive scores: They prioritize cases so that you can decide where to draw the line—where to set the decision threshold—as to which cases get “treated” (audited, contacted, approved, etc.). The horizontal axis reflects this ordering and the dotted vertical line represents an example setting for the decision threshold.
Yet, most predictive AI projects don’t actually “see” this effect because they don’t go as far as plotting a curve like this. Tracking a business metric like savings is critical, but is not yet common practice. Instead, projects usually only track technical metrics that don’t provide clear insight into the potential business value.
Predicting With A Large Language Model
Predictive models come in all shapes and sizes, including decision trees, logistic regression and ensemble models. But for language-heavy tasks such as misinformation detection, a large language model may be well-suited. Such models usually serve generative AI—to generate draft content—but they can also serve to predict.
To try this out, we tapped a Stanford project that tested various LLMs on various benchmarks, including one that gauges how often a model can establish whether a given statement is true or false. The test cases were designed to evaluate reading comprehension—they didn’t represent the kinds of public misinformation that would more typically appear on social media. So we’ve used this testbed only to illustrate the mechanics of how an LLM could serve for misinformation detection; this illustration should not be considered a rigorous research project into the effectiveness of this approach.
For each case, OpenAI’s GPT-3 (175 billion parameters) was asked multiple times—with some simple, rote variations in the wording—whether the statement was true or false. Each of the LLM’s responses counted as a “vote,” turning these outputs into a predictive score for each test case. Those scores, in turn, are reflected in the chart’s ordering from left to right: Cases on the left are predicted as more likely to be false and those on the right as more likely to be true. A bit more than one third of the test cases, 37%, were false statements and the rest true.
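As an illustration of that voting mechanism, here is a rough sketch. The prompt variations and the query_llm helper are placeholders rather than the exact prompts or API calls used; the point is only how repeated true/false answers can be aggregated into a single predictive score.

```python
# Rough sketch: turning repeated LLM true/false answers into a predictive score.
# query_llm is a hypothetical placeholder for whatever call your LLM provider
# exposes (e.g., a completion or chat request); wire it up accordingly.

def query_llm(prompt: str) -> str:
    """Placeholder: send the prompt to the LLM and return its raw text reply."""
    raise NotImplementedError("connect this to your LLM provider")

# Simple, rote variations in the wording, as described above (illustrative)
PROMPT_TEMPLATES = [
    "Is the following statement true or false? Answer with one word.\n{statement}",
    "True or false: {statement}",
    "Consider this claim: {statement}\nAnswer 'true' or 'false'.",
]

def misinformation_score(statement: str) -> float:
    """Fraction of prompt variations for which the model votes 'false'.

    Higher scores mean the statement is more likely misinformation, matching
    the left-to-right ordering of the savings curve above.
    """
    votes_false = 0
    for template in PROMPT_TEMPLATES:
        reply = query_llm(template.format(statement=statement)).strip().lower()
        if reply.startswith("false"):
            votes_false += 1
    return votes_false / len(PROMPT_TEMPLATES)
```

Any model that yields such a score per post can feed the savings curve shown earlier; the aggregation step is what converts raw generative output into a ranking.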
No matter which kind of model you use to predict, the value of doing so can usually be represented with a chart like that shown above. The model’s utility comes down to that left-to-right ordering. With that valuable prioritization, you then get to choose where to draw the line (the decision threshold).
Beyond the goal of maximizing any one business metric, such as savings or profit, establishing the decision threshold must also take other tradeoffs into consideration. I have continued with this example in a follow-up article so that we can more holistically navigate the options.
About the author
Eric Siegel is a leading consultant and former Columbia University professor who helps companies deploy machine learning. He is the founder of the long-running Machine Learning Week conference series, the instructor of the acclaimed online course “Machine Learning Leadership and Practice – End-to-End Mastery,” executive editor of The Machine Learning Times and a frequent keynote speaker. He wrote the bestselling Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, which has been used in courses at hundreds of universities, as well as The AI Playbook: Mastering the Rare Art of Machine Learning Deployment. Eric’s interdisciplinary work bridges the stubborn technology/business gap. At Columbia, he won the Distinguished Faculty award when teaching the graduate computer science courses in ML and AI. Later, he served as a business school professor at UVA Darden. Eric also publishes op-eds on analytics and social justice. You can follow him on LinkedIn.