(Adapted from Chapter 13 of the Handbook of Statistical Analysis and Data Mining Applications.)

After a first pass of training and evaluating a model, you may find you need to improve its results. Here is a checklist of ten practical actions that I’ve found usually help:

Transform real-valued inputs to be approximately Normal in distribution. Regression, for instance, behaves better if the inputs are Gaussian, because extreme values have too much influence on squared error. For variables that are typically log-normally distributed, like income, this means transforming the variable with a logarithm or the more general Box-Cox function.

Remove outliers.
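To make the transformation step concrete, here is a minimal sketch in standard-library Python. It is an illustration, not the handbook's own code: the helper names `box_cox` and `skewness` are hypothetical, the "income" values are synthetic draws from a log-normal distribution, and the power parameter `lam` is chosen by hand rather than estimated from the data.

```python
import math
import random
import statistics


def box_cox(values, lam):
    """Box-Cox transform for strictly positive values.

    lam == 0 gives the natural log; otherwise (x**lam - 1) / lam.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def skewness(values):
    """Population skewness; close to 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Synthetic, made-up "income" data (assumption: log-normally distributed).
rng = random.Random(0)
income = [rng.lognormvariate(10.5, 0.8) for _ in range(5000)]

log_income = box_cox(income, 0)     # plain logarithm
mild_income = box_cox(income, 0.5)  # a milder power transform

# Skewness should shrink toward 0 as the data becomes more Normal-shaped.
print("skew, raw:      ", round(skewness(income), 2))
print("skew, lam = 0.5:", round(skewness(mild_income), 2))
print("skew, lam = 0:  ", round(skewness(log_income), 2))
```

In practice the power parameter is usually estimated from the data instead of fixed by hand, and the same fitted parameter should then be applied to any new data the model scores.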