My POV: What is Data Science and the Role of a Data Scientist?

The field of Data Science and the role of a Data Scientist are hot commodities right now. IBM predicts that demand for Data Scientists will increase 28% by the year 2020. [14] Demand is so high, in fact, that the screenshot below shows an advertisement from one of many companies that offer data science/data scientist training with a guarantee: they will find you a job immediately after you complete your training, or you get your money back!

In this blog post, I will first explore what the field of Data Science is, and then discuss the role of a Data Scientist. Sources used have been noted and provided at the end of this article.

Data Science Defined

In 1998, Chikio Hayashi defined Data Science as a “concept to unify statistics, data analysis, machine learning [added to the definition later] and their related methods” in order to “understand and analyze actual phenomena” with data. It includes three phases: design for data, collection of data, and analysis on data. It employs techniques and theories drawn from many fields within the context of mathematics, statistics, information science, and computer science. [1]

Turing Award winner Jim Gray imagined data science as a “fourth paradigm” of science (empirical, theoretical, computational, and now data-driven) and asserted that “everything about science is changing because of the impact of information technology” and the data deluge. [2][3]

Data Science – The New Buzzword

After Mr. Hayashi’s paper was published, the term “data science” instantly became a buzzword. Back in 2012, the Harvard Business Review called Data Scientist “The Sexiest Job of the 21st Century.” The term is now often used interchangeably with earlier concepts like business analytics, business intelligence, predictive modeling, and statistics. Hans Rosling was featured in a 2011 BBC documentary with the quote, “Statistics is now the sexiest subject around.” Nate Silver referred to data science as a sexed-up term for statistics. In many cases, earlier approaches and solutions are now simply rebranded as “data science” to be more attractive, which can cause the term to become “dilute[d] beyond usefulness.” While many university programs now offer a data science degree, there is no consensus on a definition or on suitable curriculum contents.

Data Scientist Defined

Back in April 2015, Jojo John Moolayil, IoT & Data Science author, provided a nice, simple definition on Quora of what a data scientist is. [11] Mr. Moolayil stated:

“The area of study which involves extracting knowledge from data is called Data Science and people practicing in this field are called as Data Scientists.”

Mr. Moolayil points out that businesses generate a large amount of data which includes “transactional, inventory, sales, marketing, customer, external and many other dimensions.” This data has enormous value embedded in it, but the data is latent (today, we may think of this latency in terms of a data lake). There are many trends and patterns that can be mined from this latent data, so the business can make insightful, actionable decisions.

So, here is where the data scientist comes in. The process of extracting information and meaning from this data is a monumental task. It requires a variety of skill sets which are interdisciplinary in nature. 

The data in front of the data scientist was probably accumulated from a variety of disparate sources. It may be structured, unstructured, or semi-structured. Amalgamating all these sources coherently into a proper dataset on which analysis can be performed requires a lot of technical skill. [11] Languages, platforms, and techniques such as R, Python, Java, Perl, C/C++, Hadoop, MapReduce, SQL, Hive, Pig, Apache Spark, data visualization, machine learning and AI, and unstructured data analysis (“Dark Analytics”) are among the most widely used. Yet this list of toolsets is far from exhaustive.
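To make the idea of amalgamating disparate sources a little more concrete, here is a minimal Python sketch. It assumes two made-up sources: a structured CSV export of sales and a semi-structured JSON list of customers. The field names, the sample values, and the `build_dataset` helper are all hypothetical and only illustrate the kind of join a data scientist might perform, not any particular tool's approach.

```python
import csv
import io
import json

# Hypothetical structured source: sales rows exported as CSV.
SALES_CSV = """customer_id,amount
1,120.50
2,75.00
1,30.25
"""

# Hypothetical semi-structured source: customer records as nested JSON.
CUSTOMERS_JSON = (
    '[{"id": 1, "profile": {"region": "East"}},'
    ' {"id": 2, "profile": {"region": "West"}}]'
)


def build_dataset(sales_csv, customers_json):
    """Join sales rows to customer regions on the customer id."""
    # Flatten the nested JSON into a simple id -> region lookup.
    regions = {c["id"]: c["profile"]["region"] for c in json.loads(customers_json)}
    rows = []
    for record in csv.DictReader(io.StringIO(sales_csv)):
        customer_id = int(record["customer_id"])
        rows.append({
            "customer_id": customer_id,
            # Customers missing from the JSON source get a placeholder region.
            "region": regions.get(customer_id, "unknown"),
            "amount": float(record["amount"]),
        })
    return rows


dataset = build_dataset(SALES_CSV, CUSTOMERS_JSON)
```

Each row in `dataset` now carries fields from both sources, which is the kind of single, analysis-ready table the paragraph above describes. Real projects would usually do this with a database or a data-processing library rather than by hand.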

Now that you have the data in a format that you can manipulate, you will need mathematical and statistical skills to begin analyzing it. The types of analysis used to obtain more information from the data may include, but are not limited to, exploratory data analysis, statistical tests, and regression models. Your toolsets for this may include machine learning, deep learning, and statistical inference.
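As a small illustration of that analysis step, the Python sketch below summarizes two variables and then fits a simple linear regression using ordinary least squares. The numbers are invented for the example, and `fit_line` is a hypothetical helper written only with the standard library. In practice, a data scientist would more likely reach for a statistics or machine learning package.

```python
from statistics import mean

# Hypothetical data: monthly ad spend and sales, both in thousands of dollars.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_line(xs, ys):
    """Ordinary least squares fit of ys = slope * xs + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Exploratory step: summarize each variable before modeling.
summary = {"mean_spend": mean(ad_spend), "mean_sales": mean(sales)}

# Modeling step: fit a simple linear regression of sales on ad spend.
slope, intercept = fit_line(ad_spend, sales)
```

With this made-up data, the fitted slope estimates how much sales change for each additional unit of ad spend, which is the sort of result you would then need to explain to your business partners.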

Finally, you have results you want to share with your business partners (e.g., the people who requested the results). This is where the data scientist needs solid skills in communication: the ability to present the data so it is understandable and meaningful, the ability to persuade, design thinking, problem solving, data ethics, data visualization, and so on.

So, we have crossed several disciplines here regarding the kinds of skillsets a data scientist needs. We have technical skills, mathematical and statistical skills, and human centric & investigational skills.

[NOTE: This classification term, human centric & investigational skills, was provided to me by Bridget Cogley, Senior Consultant at Teknion Data Solutions and Data Ethicist. Neither of us liked the term “soft skills” since these skills are often difficult to achieve and more inherent in some people than others, and not everyone can easily learn them. We also did not like the term “non-technical skills” as these skills have an important cohesion and symmetry with a data scientist’s technical skills.]

Becoming a Data Scientist

As of the writing of this article, a Google search for Data Scientist Training returns 146 million results. Here are a few example screenshots from that search showing how you, too, can become a Data Scientist.

Summary

On several occasions, I have heard IT managers and recruiters refer to the data scientist as a unicorn. That is, they believe one exists; it is just really, really hard to find. The Data Scientist’s skills cross several disciplines, requiring technical skills, mathematical and statistical skills, and human centric & investigational skills. Some of these skills can be learned through proper education and hard work. Other skills, like the human centric & investigational skills, come with experience or are a natural gift that some possess. Hopefully, with many colleges and universities developing curricula in Data Science, organizations requiring this special skill set will find their unicorn sooner rather than later.

Sources

[1] Hayashi C. (1998) What is Data Science? Fundamental Concepts and a Heuristic Example. In: Hayashi C., Yajima K., Bock HH., Ohsumi N., Tanaka Y., Baba Y. (eds) Data Science, Classification, and Related Methods. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Tokyo.

[2] Stewart Tansley; Kristin Michele Tolle (2009). The Fourth Paradigm: Data-intensive Scientific Discovery. Microsoft Research. ISBN 978-0-9825442-0-4.

[3] Bell, G.; Hey, T.; Szalay, A. (2009). “COMPUTER SCIENCE: Beyond the Data Deluge”. Science. 323 (5919): 1297–1298. doi:10.1126/science.1170411. ISSN 0036-8075.

[4] Wikipedia, Data Science, https://en.wikipedia.org/wiki/Data_science.

[5] KDNuggets, 9 Must-have skills you need to become a Data Scientist, updated, KDNuggets.com, May 2018, https://www.kdnuggets.com/2018/05/simplilearn-9-must-have-skills-data-scientist.html.

[6] Bowne-Anderson, Hugo, What Data Scientists Really Do, According to 35 Data Scientists, Harvard Business Review, August 15, 2018, https://hbr.org/2018/08/what-data-scientists-really-do-according-to-35-data-scientists.

[7] Cearley, David W. and Brian Burke, Samantha Searle, Mike J. Walker, Top 10 Strategic Technology Trends for 2018, October 3, 2017, Gartner ID: G00327329

[8] Idoine, Carlie, Citizen Data Scientists and Why They Matter, Gartner Research Blog, May 13, 2018, https://blogs.gartner.com/carlie-idoine/2018/05/13/citizen-data-scientists-and-why-they-matter/.

[9] Loshin, David, Empowering the Citizen Analyst: Agile Techniques for Enhancing Self-Service for Data Science, Knowledge Integrity, TDWI Webinar, October 11, 2018.

[10] –, Gartner Says More Than 40 Percent of Data Science Tasks Will Be Automated by 2020, Sydney, Australia, January 16, 2017, https://www.gartner.com/en/newsroom/press-releases/2017-01-16-gartner-says-more-than-40-percent-of-data-science-tasks-will-be-automated-by-2020.

[11] Moolayil, Jojo John, What is a data scientist?, Quora, April 27, 2015, https://www.quora.com/What-is-a-data-scientist-3.

[12] Smith, Stephen J., The Demise of the Data Warehouse, Eckerson Group, July 19, 2017, https://www.eckerson.com/articles/the-demise-of-the-data-warehouse.

[13] Harris, Jeremie, Why you shouldn’t be a data science generalist, Towards Data Science, November 1, 2018, https://towardsdatascience.com/why-you-shouldnt-be-a-data-science-generalist-f69ea37cdd2c.

[14] Columbus, Louis, IBM Predicts Demand For Data Scientists Will Soar 28% By 2020, Forbes Magazine, May 13, 2017, https://www.forbes.com/sites/louiscolumbus/2017/05/13/ibm-predicts-demand-for-data-scientists-will-soar-28-by-2020/#137cc51c7e3b.
