Originally published in Data Science Central, July 4, 2018
Editor’s note: Although this author absolves the researchers (from Google) and blames only the journalists for the widespread false claims of a 95% accuracy level for mortality prediction, note that the research paper itself does indeed use the word “accuracy” multiple times as a synonym for AUROC, thus “starting it” among the less technical journalists at large.
Having my newsfeed cluttered with articles about Google creating an AI that beats hospitals by predicting death with 95% accuracy (or some other erroneous claim), I dug up the original research paper to fact-check this wondrous new advancement. Many of these articles used this quote from the abstract (academia’s equivalent of a paperback blurb):
These models outperformed traditional, clinically used predictive models in all cases. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios.
but what the researchers actually said in the paper was:
To the best of our knowledge, our models outperform existing EHR [Electronic Health Record] models in the medical literature.
After reviewing the paper, it was clear that the erroneous claims came not from the researchers but from journalists’ and laypersons’ misinterpretation. Henceforth, I will be referring to it as Google’s NN (Neural Network), because this over-hyped AI over-labelling must stop.
Journalists latched onto Google’s NN 0.95 score vs. the comparison model’s 0.86 (see EWS Strawman below) as the accuracy of determining mortality. However, the actual metric the researchers used is AUROC (Area Under the Receiver Operating Characteristic curve), which is not a measure of predictive accuracy that indexes the difference between predicted and actual values the way RMSE (Root Mean Squared Error) or MAPE (Mean Absolute Percentage Error) does. Some articles even erroneously try to explain the 0.95 as an odds ratio.
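To make the distinction concrete, here is a minimal sketch in Python (synthetic data; the scores and class balance are hypothetical illustrations, not figures from the paper) showing how a model can post a high AUROC while plain “accuracy” measures something else entirely on imbalanced mortality data:

import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score

rng = np.random.default_rng(0)

# Imbalanced outcome: roughly 2% positives (deaths), as is typical of
# in-hospital mortality cohorts. All numbers here are made up for illustration.
y_true = (rng.random(10_000) < 0.02).astype(int)

# Hypothetical risk scores: patients who die tend to score higher,
# with plenty of overlap between the two groups.
scores = np.where(y_true == 1,
                  rng.normal(0.7, 0.15, y_true.size),
                  rng.normal(0.3, 0.15, y_true.size))

# AUROC rewards correct *ranking* of patients by risk.
print("AUROC:", round(roc_auc_score(y_true, scores), 3))

# Meanwhile, a useless model that predicts "survives" for everyone is
# ~98% "accurate" on this data while discriminating nothing (AUROC = 0.5).
print("Accuracy of always predicting survival:",
      round(accuracy_score(y_true, np.zeros_like(y_true)), 3))

The point of the sketch: on rare outcomes, “accuracy” and AUROC answer different questions, which is exactly why conflating them produces misleading headlines.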
Just as the concept of significance has different meanings to statisticians and laypersons, AUROC as a measure of model accuracy does not mean the probability of Google’s NN predicting mortality correctly, as journalists/laypersons have taken it to mean. The ROC (see sample above) is a plot of a model’s True Positive Rate (i.e. correctly predicting mortality) against its False Positive Rate (i.e. predicting mortality where there is none) as the classification threshold varies. A larger area under the curve (AUROC) means the model attains a higher True Positive Rate at any given False Positive Rate, i.e. it is better at separating patients who die from those who do not; it does not express the certainty of mortality, as journalists erroneously suggest.
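There is in fact a precise way to read the number: AUROC equals the probability that a randomly chosen patient who died was scored higher by the model than a randomly chosen patient who survived. A short sketch (again synthetic, illustrative data) verifying this rank-based interpretation:

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
y_true = (rng.random(5_000) < 0.05).astype(int)
scores = np.where(y_true == 1,
                  rng.normal(0.7, 0.2, y_true.size),
                  rng.normal(0.4, 0.2, y_true.size))

pos = scores[y_true == 1]   # scores of patients who died
neg = scores[y_true == 0]   # scores of patients who survived

# Fraction of (positive, negative) pairs ranked correctly,
# counting ties as half a correct ranking.
pairwise = ((pos[:, None] > neg[None, :]).mean()
            + 0.5 * (pos[:, None] == neg[None, :]).mean())

print("Pairwise ranking estimate:", round(pairwise, 4))
print("roc_auc_score:            ", round(roc_auc_score(y_true, scores), 4))

# Both print the same number: AUROC summarizes ranking quality across all
# thresholds; it does not say "95% of mortality predictions are correct".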
To continue reading this article in Data Science Central, click here.
About the Author:
Throughout my career, I have rapidly transformed businesses in diverse industries (including financial, retail, CPG, automotive, telco, healthcare, and agency) with best-of-breed analytics and insights. I was a data scientist long before it was fashionable. My work has had far-reaching impacts: at times negative (inadvertently getting an entire marketing department fired), but mainly positive (single-handedly changing the law of a US state).