Abbott Analytics
Portions excerpted from Chapter 2 of Mr. Abbott's book Applied Predictive Analytics (Wiley 2014, http://amzn.com/1118727967). Successful predictive modeling is more than identifying the right algorithms. And, even though 60-90% of our time is spent on data preparation before deploying the first predictive model built from a new data set, successful predictive modeling goes well beyond effective
In my last post, “Coefficients are not the same as variable influence”, I argued that coefficients in a linear regression model are useful but limited in answering the question, “which variables are most influential in model predictions?”...
When we build predictive models, we often want to understand why the model behaves the way it does, or in other words, which variables are the most influential in the predictions. But how can we tell which...
Excerpted and modified from Chapters 3 and 4 of Mr. Abbott’s book Applied Predictive Analytics (Wiley 2014). The Data Understanding stage of a predictive analytics project is intended to uncover the characteristics of the data available for...
In my last two posts I described why overfitting predictive models is dangerous beyond the most obvious problem, namely that accuracy on new data is lower than expected. In the next few posts, I’ll describe how to...
By: Victoria Garment, Content Editor, Software Advice
As featured on Software Advice's Plotting Success blog
Editor’s note: This article compares measures for model performance. Note that “accuracy” is a specific such measure, but that this article uses the word “accuracy” to generically refer to measures in general. In data mining, data scientists...
Arguably, the most important safeguard in building predictive models is complexity regularization to avoid overfitting the data. When models are overfit, their accuracy is lower on new data that wasn’t seen during training, and therefore when these...
This speaker session is from Predictive Analytics World, September 30-October 1, 2013 in Boston, MA.
Predictive modeling competitions, once the arena for a few data mining conferences, have now become big business. Kaggle (kaggle.com) is perhaps the most well-known forum for modeling competitions, using a crowd-sourcing mentality: if more people try to...