NOTE: In my regular evening trolling of blogs related to data visualization, I came across a blog by Jalil Farid called [physics. mathematics. sports. technology. fun]. In particular, I liked his thoughts on what he perceived a data scientist to be. So today, I am showcasing portions of his blog on this topic.
Jalil drew the above image for his boss in mid-January of this year. He was trying to explain how the term “Data Scientist” is used for a range of industry roles with varying skill sets, and he wanted to organize what he had seen using his knowledge of mathematics. The image captures some of what he observed in the short time he had spent exploring the field of data science.
The most striking thing he noticed was the comparability to physics, with one difference: physics began with large amounts of mathematics and no connection to data, while data science appears to have a large amount of data that we are only starting to connect to mathematics. Physics now seems happily divided between two schools of thought. They are two distinct scientific “directions” on the same road, and together they bring us the awesome science we see today. Experimentalists are the people who design and engineer experiments that produce quality data. He described physicists who make a living coding in Fortran, who may occasionally build electronic components from scratch, who can quickly set up a numerical solution to a PDE if it’s bounded, and who are known for working with quality data with lots of floating point precision (they have to know what floating point precision is). Then there are the physicists who know what a Lebesgue integral is and who work with functions that are defined at every point yet differentiable nowhere. These guys are great at group theory, and they enjoy the calculus of variations and working with Green’s functions.
So Mr. Farid asked whether data science will go the same way. Will a distinction form between people who can communicate with serious engineers to monetize applications and keep the field going, and people who can communicate with mathematicians who can continue to develop new ideas? Will there be more interest in stochastic modeling and in mathematically proving why a given model simulates data nicely? Will there be continued growth in technologies that meet the high performance computing requirements of today’s big data problems?
Time will tell…
Jalil’s blog is http://www.jalilfarid.com/