By: Eric Siegel, Founder, Predictive Analytics World

In anticipation of his upcoming conference co-presentation, Advanced Analytics and the Corporate Audit Function, at Predictive Analytics World San Francisco, April 3-7, 2016, we asked Matthew Pietrzykowski, Senior Data Scientist at General Electric, a few questions about his work in predictive analytics.

Q: In your work with predictive analytics, what behavior or outcome do your models predict?

A: The predictive models generated in GE’s Corporate Audit Staff (CAS) are heavily focused on classification outcomes, forecasting and optimization. The types of models used range from logistic regression to random forest classification models. Typically, the models are built to help auditors assess whether there is evidence to support an auditable event, or to find the optimal or a reasonable outcome. The input data tend to be of mixed types, and some models are augmented with the results of text mining short-form narrative fields.

Q: How does predictive analytics deliver value at your organization – what is one specific way in which it actively drives decisions or operations?

A: In the internal audit space, predictive analytics has seen only sporadic use, since most internal audit work is retrospective, with a focus on uncovering mechanisms of potential failure rather than predicting new cases. However, we have found great potential for it in reducing false positives through targeted reviews of audit field work, as well as in risk assessment. As an example, predictive analytics is being used in executive-level planning to help with auditor deployment. The model predicts which business sites have a greater risk of showing audit violations.

Q: Can you describe a quantitative result, such as the predictive lift of your model or the ROI of an analytics initiative?

A: One of the models produced predicts the risk of an auditable outcome by classifying business sites using multiple disparate data sets. The goal was to compile, as potential inputs, the different resources that are used in a typical audit analysis. These data came from different sources with different schemas, so the blending problem was of particular concern. The final model achieved ~90% classification accuracy on test data, which is a ~23% improvement over the base rate.
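For readers less familiar with the comparison, the following is a minimal sketch of how classification accuracy can be set against a base rate. It assumes the base rate means the accuracy of always predicting the most common class, which the interview does not state. The labels and predictions are made-up toy values, not GE data, and all function names are hypothetical. The sketch reports both the absolute and the relative gain, since either reading of "improvement" is plausible.

```python
from collections import Counter


def base_rate(labels):
    """Accuracy obtained by always predicting the most common class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def improvement_over_base_rate(labels, predictions):
    """Compare model accuracy with the majority-class base rate."""
    base = base_rate(labels)
    acc = accuracy(labels, predictions)
    return {
        "base_rate": base,
        "accuracy": acc,
        "absolute_gain": acc - base,           # difference in accuracy
        "relative_gain": (acc - base) / base,  # gain as a fraction of the base rate
    }


# Toy data (illustrative only): 1 = site with an audit finding, 0 = no finding.
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

summary = improvement_over_base_rate(labels, predictions)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

On imbalanced data such as audit findings, reporting accuracy next to the base rate matters because a model that never flags a site can still score well on accuracy alone.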

Q: What surprising discovery or insight have you unearthed in your data?

A: Some of the more surprising outcomes and insights came from necessity. Most of our data is of mixed types, with continuous, categorical, and short-form text fields. Text mining the narrative fields has produced overall modeling results that are both more insightful and more impactful than if the narrative fields were omitted. As an example, we helped one of the businesses leverage their short-form narrative fields by mining them, summarizing them into semantic clouds, and aggregating the results into a summary measure over time. This time series can then be analyzed for trends that are potential markers for risk events. We are even seeing evidence to suggest that document term matrices can be used as differentiable attribute data in classification models.
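To make the idea of a document term matrix concrete, here is a minimal sketch using only the Python standard library. It is not the GE CAS pipeline: the narratives are invented examples, the tokenizer is deliberately naive, and every function name is hypothetical. Each row counts how often each vocabulary term appears in one narrative, and those counts can then sit next to continuous or categorical fields as inputs to a classifier.

```python
from collections import Counter

PUNCTUATION = ".,;:!?()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [token for token in tokens if token]


def document_term_matrix(docs):
    """Return (vocabulary, rows), where rows[i][j] counts term j in document i."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({token for tokens in tokenized for token in tokens})
    column = {term: j for j, term in enumerate(vocab)}
    rows = []
    for tokens in tokenized:
        row = [0] * len(vocab)
        for term, count in Counter(tokens).items():
            row[column[term]] = count
        rows.append(row)
    return vocab, rows


# Invented short-form narratives, for illustration only.
narratives = [
    "Invoice missing approval",
    "Approval received, invoice paid",
]

vocab, dtm = document_term_matrix(narratives)
print(vocab)
for row in dtm:
    print(row)

# Term counts can be appended to other (hypothetical) numeric fields
# to form one feature vector per record.
other_fields = [[1250.0], [980.5]]
features = [numeric + counts for numeric, counts in zip(other_fields, dtm)]
```

In practice a production pipeline would usually add stop-word removal, stemming, or term weighting, none of which are described in the interview.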

Q: Sneak preview: Please tell us a take-away that you will provide during your talk at Predictive Analytics World.

A: Data science, and predictive analytics in particular, has a place in the corporate audit function. In fact, it’s a strategic part of GE CAS. Advanced analytics is a core requirement for our auditors so that they can leverage it in a scientific manner while they are actively auditing our business sites. The value is seen not only in risk abatement, planning, and forecasting; it is also forcing a paradigm shift in the organization.

———————

Don't miss Matthew’s conference co-presentation, Advanced Analytics and the Corporate Audit Function, on Monday, April 4, 2016, from 11:20 am to 12:05 pm at Predictive Analytics World San Francisco. Click here to register to attend.
