As I have stated in previous articles, the most difficult challenge in building predictive models is creating the analytical file. Typically, this work takes 80% to 90% of the data scientist’s time, with the remaining 10% to 20% spent on the actual run or runs of the different mathematical and statistical algorithms. The design of the analytical file has two elements: the development of the target variable and the development of the independent variables, or potential predictor variables. Data challenges are a reality in creating the right analytical file. Yet, with certain models such as fraud, the level of