The work of interpreting data to help decision-makers goes back some 5,000 years to the bureaucrats and businessmen of ancient Sumer. But dealing with the astronomical size and complexity of modern data sets requires a new, multifaceted set of computational, statistical, communication, and people skills, collectively called “data science.”
Data scientists are among the most sought-after professionals today, and the need for them will only increase as our world grows ever more complex and interconnected.
As Mark A. Smith of Ventana Research blogged in January, a key challenge for organizations in 2014 is acquiring expertise at extracting the best possible insights from big data. That means finding people with a uniquely balanced set of data science skills that enable them to take on today’s challenge of making sense of petascale and larger data sets from the enterprise, the cloud and the Web.
But what do these well-rounded data scientists look like, exactly, and where can they be found?
The function of the data scientist is, in a word, sensemaking — providing a clear understanding of an organization’s universe through data analysis, helping improve decision-making and supporting leadership.
The best data scientists also will be intensely curious and interested in discovering new insights. They will be creative in their approach to identifying and solving problems.
Their professional expertise rests on three pillars: (i) deep theoretical knowledge of statistics and computability, (ii) practical knowledge of diverse data science tools (and the ability to create them when needed), and (iii) an ability to communicate very complex technical material effectively to people with no technical background.
Specifically, well-rounded data scientists have the following skills:
The first generation of data scientists was largely self-taught. They started from backgrounds in physics/science, statistics, mathematics, or computer science, and learned the other necessary skills and knowledge along the way. But universities (including Illinois Institute of Technology) are providing new multidisciplinary degree programs to teach students data science, which should help to take the guesswork out of finding data scientists.
These programs go beyond traditional degrees in statistics, mathematics, computer science, and business intelligence by teaching a broad set of both technical and soft skills to prepare students for careers in data science.
But not all such programs are created equal, and it is important to be aware of the differences among them.
Some programs focus on teaching students specific tool-based skills and application areas. These programs can produce graduates who can start work on well-defined projects fairly quickly. Other programs with deeper theoretical content produce graduates who will be able to more easily work outside of their initial comfort zone, and who can learn, grow and adapt as the field changes.
Similarly, programs that focus deeply on mathematical and computational content may produce more technically knowledgeable graduates. But unless those graduates entered the program with excellent communication skills already, they may not be well suited to real-world data science jobs, where communicating with nontechnical people is an essential part of the work. Programs that more fully integrate soft skills into the curriculum will produce more well-rounded data scientists who can take on leadership roles.
Data science brings a distinct set of challenges to business communication. How does one explain statistical evidence and analytical results without oversimplifying or creating confusion? Students need to learn how to weave results into a coherent story, how to explain statistical assumptions and caveats clearly, and how to create data visualizations that give insight and are not just pretty pictures. The only way to learn these skills is by practicing them at the same time as learning the related technical material.
Finally, a critical component of any quality data science education program is practical experience, whether a student project, an internship, or a guided practicum. Until they have worked on real-world data science problems, students will not fully understand how to perform central data science activities that cannot be taught in a classroom setting: struggling to define the analytical problem correctly, dealing with real data complexities and inconsistencies, and communicating results in a clear, enlightening, and satisfying fashion to nontechnical, nonacademic clients.
Our model is to place students into teams of two to four individuals who work on projects for industrial partners with academic guidance. In such team-based work, students are forced to work together, exercising interpersonal communication skills. They can learn from and teach each other, seeing in the process how people with different talents and knowledge can together achieve more than they could as individuals. And by working with real clients from industry, students get direct experience and feedback on both their technical performance and their communication skills.
Understanding what makes for a good data scientist and how to evaluate different educational programs is essential for effective recruiting. There is no question that recruiting will become easier as the field matures and general standards for data science education start to emerge.
The field of data science will of course continue to evolve and change, as the nature and complexity of data continue to evolve and change. But one constant will remain: well-rounded data scientists will be needed to help us all make sense of our changing world.
By: Shlomo Argamon
Originally published at www.information-management.com
Shlomo Argamon is Professor of Computer Science and Director of the Data Science Program at the Illinois Institute of Technology, Chicago, IL.