This article was originally published on SAS Knowledge Exchange
Who benefits by predicting your behavior? Organizations do—companies, governments, hospitals, and political campaigns. They employ predictive analytics, technology that learns from data to render per-person predictions, one individual at a time.
People have been struck by the final words in the title of my new book on this subject, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die (www.thepredictionbook.com).
An old friend even sent me a photo of the book beside an onion, suggesting the material might be lightened to predict who will click, buy, lie, or cry. Or we might consider changing it to "The power to predict who will drink Coke, choke, or croak."
Joking aside, this exercise in enumerating verbs serves to demonstrate just how wide a variety of human actions and behavior can be predicted, such as whether an individual will buy, steal, drop out of school, quit his or her job, donate, crash his or her car, or vote.
Prediction is possible when we have at our disposal pertinent data that records such behavior. And we do! In case you haven't noticed, there's a well-publicized flood of data. Data is a recording of history, of things that have happened and actions people have taken. We aren't drowning in data; we're drowning in experience from which to learn.
Predictive analytics is the technology that leverages data to predict such human behavior at the individual level. Its capacity to do so reflects the power intrinsic to the data from which it learns. The value is realized only when organizations put those predictions to active use, driving operational decisions one person at a time. Lying and dying are pertinent examples.
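To make the idea concrete, here is a minimal sketch in Python of learning from recorded behavior and then scoring one person at a time. Everything in it is an assumption made for illustration: the two features (site visits and days since last purchase, both rescaled to the range -1 to 1), the synthetic data, the hand-rolled logistic regression, and the 0.5 decision threshold. It is not the method of any organization mentioned in this article.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(person, weights, bias):
    """Return the predicted probability that this one person will buy."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, person)))

def train(people, outcomes, epochs=50, learning_rate=0.1):
    """Fit a logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(people[0])
    bias = 0.0
    for _ in range(epochs):
        for person, bought in zip(people, outcomes):
            error = predict(person, weights, bias) - bought
            bias -= learning_rate * error
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, person)]
    return weights, bias

# Synthetic "history": hypothetical features, each scaled to [-1, 1].
# person = [site_visits, days_since_last_purchase]
rng = random.Random(0)
people = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(400)]
# Assumed ground truth: more visits raise the chance of buying,
# a longer gap since the last purchase lowers it.
outcomes = [1 if rng.random() < sigmoid(3 * visits - 1 * gap) else 0
            for visits, gap in people]

weights, bias = train(people, outcomes)

# Per-person decision: contact only those scored as likely buyers.
frequent_visitor = [0.9, 0.0]
rare_visitor = [-0.9, 0.0]
for name, person in [("frequent", frequent_visitor), ("rare", rare_visitor)]:
    probability = predict(person, weights, bias)
    action = "send offer" if probability >= 0.5 else "skip"
    print(f"{name}: p(buy)={probability:.2f} -> {action}")
```

The point of the sketch is the last loop: the model's output is a probability per individual, and the organization's action is chosen separately for each one.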
Predicting Lying
Law enforcement is improving lie detection with predictive analytics. As with medical diagnosis or assessing the risk of an insurance applicant, machine learning augments established methodology, improving the ability to assess the risk that an individual is lying based on the known characteristics of that individual.
For example, University at Buffalo researchers trained a system to detect lies with 82 percent accuracy by observing eye movements alone. In another project, researchers detected deception with 76 percent accuracy in written statements by persons of interest in military base criminal investigations.
Predicting Dying
With all the human behavior being predicted, how about the final thing each of us does: die? In fact, there are five reasons organizations may predict your death. Sometimes they do it with altruistic intent, for healthcare-related purposes. In other cases, there's a financial incentive: they predict death for the money.
Healthcare providers predict death to help prevent it. For example, Riskprediction.org.uk predicts your risk of death in surgery, based on aspects of you and your condition, in order to help inform medical decisions.
Law enforcement and the military predict who is likely to be killed in order to protect them, and safety institutes predict casualties from system failures to help avert them.
Life insurance prices policies according to predicted life expectancy. A growing number of life insurance companies go beyond conventional actuarial tables and employ predictive analytics to establish mortality risk.
Beyond life insurance, it turns out that health insurance companies also predict the death of their policyholders. Until recently, death prediction was not within the usual domain of health insurance. I got the inside scoop, anonymously, from a top-five U.S. health insurance company. I'll reserve the details for my book (Predictive Analytics); you can also find more in my Smart Blogs article, Deathwatch: Five Reasons Organizations Predict When You Will Die.
Eric Siegel, Ph.D., is the founder of Predictive Analytics World (www.pawcon.com)—coming in 2013 to Toronto, San Francisco, Chicago, Washington D.C., Boston, Berlin, and London—and the author of Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die (February 2013, published by Wiley). For more information about predictive analytics, see the Predictive Analytics Guide (www.pawcon.com/guide).
In light of this article and the book about predictive analytics, an open question I would like to ask is whether researchers have an effective way of measuring the impact that our attempts to predict behavior have on behavior itself. In other words, does the awareness that a certain metric is widely used sometimes make that metric less effective for predictive purposes because people learn how to “game” it? Paradoxically, can the very use of the metric also sometimes increase its predictive accuracy?
Consider the gaming effect first, since examples are legion. Standardized test scores, for example, are weighted so heavily in the admissions process that a whole industry of test prep materials and classes has emerged; a good score may now mean that you simply had the money to pay for test prep and a mentor who steered you in that direction. The search-engine optimization industry likewise seeks to uncover which metrics Google uses to rank pages so that it can raise clients’ website rankings. Law schools and universities have been known to hire their own graduates in temporary jobs (to inflate graduate employment statistics) and even to raise tuition in order to obtain higher rankings in U.S. News & World Report. In a competitive world, knowing what metrics are used to rank and categorize people, and knowing how to game them, is potentially a matter of survival. In general, people behave a little differently when they know they’re being observed; data mining is, in a sense, a form of observation.
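A toy simulation can illustrate how gaming erodes a metric's predictive value. The sketch below is built entirely on assumptions chosen for illustration: a hidden "ability" drives both a test score and a later outcome, and a random half of people buy test prep that adds a fixed boost to the score without changing ability. None of the numbers come from real admissions data.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(1)
n_people = 5000

ability = [rng.gauss(0, 1) for _ in range(n_people)]
# Later outcome (e.g. performance after admission) depends on ability.
outcome = [a + rng.gauss(0, 0.5) for a in ability]
# Honest test score: a noisy measurement of ability.
honest_score = [a + rng.gauss(0, 0.5) for a in ability]
# Gamed score: a random half of people buy test prep, which adds a
# fixed boost to the score without changing ability or the outcome.
bought_prep = [rng.random() < 0.5 for _ in range(n_people)]
gamed_score = [s + (2.0 if prep else 0.0)
               for s, prep in zip(honest_score, bought_prep)]

corr_honest = pearson(honest_score, outcome)
corr_gamed = pearson(gamed_score, outcome)
print(f"score vs. outcome, no gaming:   {corr_honest:.2f}")
print(f"score vs. outcome, with gaming: {corr_gamed:.2f}")
```

Under these assumptions the gamed score correlates less strongly with the outcome, because part of its variation now reflects who paid for prep rather than who is able.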
Now consider the other side of the coin. People are perceived differently by others, and by themselves, when they are known to have been tagged with a positive or negative label with regard to talent, intellect, or growth potential. These perceptions can have a very strong influence on an individual’s own behavior and on others’ behavior toward the individual; those induced behavioral changes, in turn, help determine the outcomes that the original metrics were intended to predict. Consider the example of the hockey players discussed in Malcolm Gladwell’s Outliers. The children who were tagged as having more talent or potential due to some extrinsic factor like the relative age effect got more attention from coaches and more opportunities to practice in competitive leagues while growing up, and thus became more likely to make it to the NHL. (Carol Dweck’s research, which explores the effect of self-perceptions, is also worth mentioning.)
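A small simulation can illustrate this self-fulfilling effect. The sketch below rests on assumptions chosen purely for illustration: an early assessment is a very noisy read of ability, the top half of children receive a "talented" label, and in one scenario the label brings a fixed coaching boost to the later outcome. The numbers are invented, not drawn from Gladwell or Dweck.

```python
import random

def labeled_share_of_top(labeled, outcome, top_fraction=0.1):
    """Share of the top performers who carried the 'talented' label."""
    top_count = int(len(outcome) * top_fraction)
    ranked = sorted(range(len(outcome)),
                    key=lambda i: outcome[i], reverse=True)
    top = ranked[:top_count]
    return sum(labeled[i] for i in top) / len(top)

rng = random.Random(2)
n_children = 5000

ability = [rng.gauss(0, 1) for _ in range(n_children)]
# Early assessment: ability plus heavy noise, standing in for
# extrinsic factors such as relative age.
early_score = [a + rng.gauss(0, 1) for a in ability]
median = sorted(early_score)[n_children // 2]
labeled = [s > median for s in early_score]

noise = [rng.gauss(0, 0.5) for _ in range(n_children)]
# Scenario A: the label is recorded but nobody acts on it.
outcome_ignored = [a + e for a, e in zip(ability, noise)]
# Scenario B: labeled children get extra coaching (assumed fixed boost).
outcome_acted_on = [a + e + (1.5 if tag else 0.0)
                    for a, e, tag in zip(ability, noise, labeled)]

share_ignored = labeled_share_of_top(labeled, outcome_ignored)
share_acted_on = labeled_share_of_top(labeled, outcome_acted_on)
print(f"labeled share of top 10%, label ignored:  {share_ignored:.2f}")
print(f"labeled share of top 10%, label acted on: {share_acted_on:.2f}")
```

In this toy setup the label looks like a better predictor when it is acted on, even though the underlying abilities are identical in both scenarios; the extra accuracy is produced by the intervention itself.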
People are generally not used to the idea that the act of trying to predict an outcome can have a significant effect on the outcome itself, but this principle seems to operate to a larger extent than we appreciate when it comes to data mining and predictive analytics. It doesn’t make these disciplines any less valuable, of course, but it does suggest that some predictive models could be improved if there were some way to account for these types of effects.