[NOTE: This classification term, human centric & investigational skills, was provided to me by Bridget Cogley, Co-Founder and Chief Visualization Officer for Versalytix in 2018. Neither of us liked the term “soft skills” since these skills are often difficult to achieve and more inherent in some people than others, and not everyone can easily learn them. We also did not like the term “non-technical skills” as these skills have an essential cohesion and symmetry with a data scientist’s technical skills.]
CAVEAT: This is from a blog post I wrote 7 years ago. It was written with a Data Scientist role in mind, but it can really apply to any technical role, and I believe most of it still holds up, but I am open to enhancements and other topics you feel are part of the set of “soft skills.”
#1 – Data Ethics
I probably could have started with interpersonal skills or communication skills, but if you lack data ethics, regardless of the number of degrees you have or the number of years you have worked in the profession, you may have the data scientist title, but you are no data scientist.
Hilary Mason, a Data Scientist in Residence at Accel, noted three key challenges facing the data science community:
- Imprecise Ethics
- No standards of practice
- A lack of a consistent vocabulary
We work in a profession with a great deal of uncertainty. Too often, our interactions with our business communities are determined by the algorithms and machine learning developed by data scientists.
Omoju Miller, the senior machine learning data scientist at GitHub, noted in a Harvard Business Review interview by Hugo Bowne-Anderson:
We need to have that ethical understanding, we need to have that training, and we need to have something akin to a Hippocratic oath. And we need to actually have proper licenses so that if you actually do something unethical, perhaps you have some kind of penalty, or disbarment, or some kind of recourse, something to say this is not what we want to do as an industry, and then figure out ways to remediate people who go off the rails and do things because people just aren’t trained and they don’t know.
#2 – You Really Need to Understand the Data
“I am drowning in data, yet I am starving” – Unknown
Data scientists often deal with very large amounts of data. Oftentimes, we have a lot of data about a subject area, but it is in a form that is not consumable. As a Data Architect by profession, I have often been surprised when I ask a business partner questions about their data and they are unsure or do not have an answer. This is not something I am just now experiencing, but something I have experienced over the past 45+ years in this profession.
Referring back to Hilary Mason’s three major concerns, I have often walked into situations where no documentation exists, no data dictionary exists, no abbreviation lists (her version of a vocabulary), no data models, no data lineage, etc.
I often use the telephone area codes as an example when discussing the importance of understanding your data. Back in the early 1980s, if I wanted to validate an area code (I coded in COBOL back then), I could depend on checking the middle digit to be a zero or a one. Arizona only had one area code back then, 602. Today, Arizona has five area codes: 480, 520, 602, 623, and 928. My old validation rule will no longer work. I remember when we first had to convert full telephone numbers to the new nomenclature, which was very messy.
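To make the area code example concrete, here is a minimal sketch in Python (not the original COBOL). The function names are hypothetical and chosen only for illustration. The first function encodes the old middle-digit rule described above. The second is a looser structural check, assuming only that an area code today is three digits and does not start with 0 or 1. Neither is a substitute for checking against a current list of assigned area codes.

```python
def old_rule_is_valid(area_code: str) -> bool:
    """Early-1980s style check: three digits with a middle digit of 0 or 1."""
    return (
        len(area_code) == 3
        and area_code.isascii()
        and area_code.isdigit()
        and area_code[1] in "01"
    )


def loose_rule_is_valid(area_code: str) -> bool:
    """Looser structural check (an assumption for this sketch):
    three digits, first digit 2 through 9."""
    return (
        len(area_code) == 3
        and area_code.isascii()
        and area_code.isdigit()
        and area_code[0] in "23456789"
    )


# The five Arizona area codes mentioned in the text.
ARIZONA_AREA_CODES = ["480", "520", "602", "623", "928"]

# Codes the old rule would wrongly reject today.
rejected_by_old_rule = [
    code for code in ARIZONA_AREA_CODES if not old_rule_is_valid(code)
]

# Codes that pass the looser check.
accepted_by_loose_rule = [
    code for code in ARIZONA_AREA_CODES if loose_rule_is_valid(code)
]
```

Under the old rule, only 602 passes; the other four Arizona codes are rejected because their middle digit is neither zero nor one. The rule was not wrong when it was written. The data changed underneath it, which is why you have to keep understanding your data over time.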
As a data scientist, you must be curious. I would even dare to say you must be more curious about your business partner’s data than they are! It is O.K. to let your guard down and say, “I don’t understand this. Please explain it to me.” Oftentimes, they don’t know the answers to questions about their data either. Your non-functional requirements are just as important as your functional ones.
Curiosity leads to knowledge. I spend most of my time discovering and prepping data. This is starting to change with what I refer to as self-service ETL tools, such as Alteryx and Tableau Prep.
#3 – You Have to Read!
I have always been a voracious reader. All the magazine subscriptions I have sometimes drive my wife crazy. I have always tried to be aware of the latest technologies and trends related to data science.
It is difficult to stay on top of everything related to data science. It embarrasses me to say this, but I am a slow reader. When I was young, my father encouraged me to read a paragraph at a time. He told me once I understood what I just read, to go on to the next paragraph. Repeat. And continue. My life as a life-long reader had begun.
Today, we have the Internet at our disposal. There are a lot of excellent sites to find articles about almost every subject imaginable. There is also a lot of “noise” on the Internet. You must train yourself to determine what is valuable to read and what is just noise. I don’t personally have any guidance to offer on how to tell the two apart, but it is a kind of gut feel for me when I find something of great value to read versus something that is just a vendor preening or someone spewing a biased beef.
I recommend you settle on some key focus areas such as data preparation, data visualization, key statistical concepts, machine learning, or Tableau. In my own technical learning, I have found the Tableau community to be a prolific one. You will find tricks & tips, how-to articles, deep discussions on data visualization philosophy, etc. As you read, visit the sites of some of the products being discussed and read the product information, capabilities, etc. Even if you probably would not buy or use that product soon, you at least have the knowledge about it in your “bag of tricks.”
#4 – Know Your Business!
In a previous life, I used to do a lot of interviews to help our HR Department hire application developers and data modelers. Here is my favorite question to ask in these interviews:
How does your current company make a profit?
Out of, let’s say, 50 of these interviews over the years, I only had one person who was able to answer this question. Typically, they would respond by telling me about an application they just developed or some key reports they had created for senior management. I would tell them I understand that is what you did from the technical side, but I want to know your understanding of how your company makes money!
So, it is always important that you understand the business of your company. I recommend you read your company’s annual reports, their product descriptions, know the features and advantages of these products, competitive intelligence, etc.
My personal goal, when I am working with a department at work, is to know as much or more about their department than the people I am working with. Perhaps it is an unrealistic goal, but oftentimes, I come pretty darn close.
In terms of the data science aspect of the business, you should be able to discern which problems the business considers critical. In addition, you should always be thinking of new ways for your business partners to leverage their data to make actionable decisions.
To be able to do this, you must understand how the problems you solve can impact the business. Therefore, as I mentioned earlier, you need to know how your business operates, what it needs to do to make a profit, your business partner’s “pain points”, and what you can do to steer them in the right direction.
#5 – Communication Skills
First, check your technical jargon at the door. You are talking to your business partners, and if you want to get their attention, you need to talk in their language. You need to take all that fancy tech-speak and translate it to language your business partners understand and care about. Think of the departments within your company and the types of things they are interested in. For example, when I talk to our Finance Department, they are interested in being able to quickly access information about next year’s budget, expenditures by department, how much revenue has come in so far this year, and which departments are bringing in the most revenue. A data scientist must be able to provide quantified insights to their business partners for them to be able to make actionable decisions on a timely basis.
Storytelling is a highly desirable skill for data scientists. Being able to weave a compelling story around the business partner’s data will draw them in and make them want to know more details. It will also prompt questions from them, which may help bring new questions and data needs to the surface. Data visualizations and infographics, used properly, are great tools for conveying large amounts of data in understandable, digestible visuals that your business partners can quickly consume, so they gain the knowledge they need to make actionable decisions.
#6 – Be a Team Player
A data scientist does not work in a vacuum. Nor should they want to. In most cases, you will work with individuals throughout your organization, from company executives, product managers, and department heads to staff-level employees. Often, you will even have to work with external customers, or in my case, citizens within our City.
When I first entered the IT profession in the late 1970s, I literally worked in the cold, damp basement of the building. A separate (non-IT) group gathered requirements and told us what reports to develop. Once I completed coding the report, I turned it over to the person who gathered the requirements to present to our business partners. If there were any changes, we would have to repeat this entire process (often several times) until they were satisfied with the report. Talking directly to the business partners was a no-no for coders. We have learned over the years that this Waterfall, coder-in-the-basement method of development does not work well, and that more dynamic, iterative methods better serve our business partners.
Fortunately, the methods used in IT have matured, as has the business partner’s perception of the IT professional. Iterative, collaborative processes now have IT personnel talking directly to the business partners, which helps them hear the needs of the business and ask questions in real time. Now, this notion of not talking directly to the business may sound silly or antiquated to some of you, but 40 years ago, programmers were relegated to the basement and just coded. Back then, most coders would have loved to talk to the business to get the requirements right the first time instead of using a “middleman” to go back and forth. Times have changed, IT is more sophisticated, and the data scientist is now front and center with the business partners. Don’t take for granted the evolution of the IT profession that got us where we are today. These relationships with your business partners are critical to your very existence. Everyone around you is part of your team. Treat everyone with respect, and remember that the information they have is important and that their subject matter knowledge is essential to how successful you will be as a data scientist. To borrow an old phrase, there is no “I” in “Team.”