The first 90 days in my first data science role: A retrospective on the most important lessons
I previously wrote about how I first entered the data science field. This is a follow-up article on what happens next: how to make the most of a new role and hit the ground running. Inspired by questions from my mentoring sessions on this topic, I outline examples of (sometimes unexpected) things that helped me at the time. Hindsight is 20/20. (Happy 2021, by the way!)
- Learning the business logic is necessary to even begin contributing
- Try things and ask lots of questions
- Translate algorithms to plain English words
Learning the business logic is necessary to even begin contributing
When I started my first data scientist role, I was excited yet nervous. I didn't expect Python coding to be an issue, since I had already used it in game development and had, after all, passed interviews that involved Python…
The thing is, before I could provide any value as a data scientist, I first had to figure out how to access the relevant data, navigate the data warehouse, and join different tables to build the datasets I needed.
It wasn't like school or Kaggle, with a pre-built dataset, or like my student web-scraping projects, where I controlled all the logic. For the most part, we had to construct the data we needed from what was available in a vast data warehouse. Big thanks to the data engineering team for making sure that data existed in the first place.
In the first few weeks, I was pretty much dead weight while I learned these things, bombarding my onboarding buddy with questions such as:
- “Hey, what’s the table name for Ontario [line of business]?”
- “Is the column [customer ID] the same as [another name for the same ID, but in another table]?”
- “Do I subtract a day from this date column to get the real time range I need?”
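Questions like the customer ID one usually come down to how tables are joined. Below is a minimal sketch, assuming two hypothetical tables (`ontario_customers` and `web_sessions`) where the same identifier is called `customer_id` in one and `cust_no` in the other; an in-memory SQLite database stands in for the real warehouse, whose names and schema are not given here. It joins on the differently named columns and runs a quick overlap check, the kind of thing worth trying before asking.

```python
import sqlite3

# Hypothetical tables and column names, for illustration only.
# The same customer identifier is `customer_id` in one table and `cust_no` in the other.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ontario_customers (customer_id INTEGER, signup_date TEXT);
CREATE TABLE web_sessions (cust_no INTEGER, session_date TEXT);
INSERT INTO ontario_customers VALUES (1, '2020-01-05'), (2, '2020-02-10');
INSERT INTO web_sessions VALUES (1, '2020-03-01'), (1, '2020-03-02'), (3, '2020-03-03');
""")

def sessions_per_customer(conn):
    """Count sessions per customer, joining on the differently named ID columns."""
    query = """
    SELECT c.customer_id, COUNT(s.cust_no) AS n_sessions
    FROM ontario_customers AS c
    LEFT JOIN web_sessions AS s
      ON c.customer_id = s.cust_no
    GROUP BY c.customer_id
    ORDER BY c.customer_id
    """
    # LEFT JOIN keeps customers with no sessions; COUNT ignores the NULLs, giving 0.
    return conn.execute(query).fetchall()

def unmatched_ids(conn):
    """Session IDs with no matching customer: a rough check that the columns line up."""
    query = """
    SELECT DISTINCT s.cust_no
    FROM web_sessions AS s
    LEFT JOIN ontario_customers AS c ON c.customer_id = s.cust_no
    WHERE c.customer_id IS NULL
    """
    return conn.execute(query).fetchall()

print(sessions_per_customer(conn))  # [(1, 2), (2, 0)]
print(unmatched_ids(conn))          # [(3,)]
```

A high overlap suggests the two columns refer to the same thing, but it doesn't prove they mean the same thing in the business sense; that part is still worth confirming with someone who knows the data.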
Even months (and years) later, I was still learning new bits and pieces of logic behind the data. For example, I learned where to find an internal Wiki page about data outages, which explained why some table had fewer records for a given month.
As a brand-new data scientist, I hadn't expected this process at all. Once again, I was reminded that real-world data isn't pre-made like the Iris or Titanic datasets, nor is it like the small-scale web-scraped datasets from my student days.
I am grateful that the onboarding process at the time paired me up with a buddy for my first modeling project, and for my amazingly patient coworkers. For teams, this means that providing this contextual knowledge is a necessary investment: it helps a new hire get up to speed and start contributing sooner.
For my past self, it was eye-opening to learn that "hey, I can code! I know stats!" doesn't cut it at all as a data scientist. In subsequent roles, I once again had to learn the business meaning of the data before I could provide much value as a data scientist. And now, as my perspective shifts toward being the one who provides this contextual knowledge, I keep in mind that it's all worth it.
Try things and ask lots of questions
This rule applies not only in the first 90 days but throughout one's career; the earlier one realizes this, the better.
One behavior that was essential for me to start contributing to projects was asking questions. I know what it's like to feel embarrassed about asking a "silly" question, but for the team to deliver, it's more productive to simply set those thoughts aside: staying quiet benefits… no one.
With the amount of business logic to learn, no experienced person on the team would expect a new hire to know it all. In fact, experienced people are likely still learning too, as I mentioned in the previous section.
A rule of thumb I use is to at least try some things out before asking. People don't like it when someone asks without spending any effort to find a solution first, but they will understand if you've tried and couldn't find or recall the answer.
It's much more productive to ask "Hey, what's the table name for Ontario [line of business]?" when you can also say "I've tried looking at the Wiki with table names for Quebec, and tried guessing the Ontario table name based on the naming convention, but it didn't show up." (This is purely an example: the data warehouse in that role had thousands of tables, so simply listing all the table names might not have gotten me anywhere. And as a newbie, I didn't know how to do that anyway.)
This isn't just lip service. As a new hire, when I at least try the resources at my disposal, even if they're low-hanging fruit, I practice finding information on my own. This is quite valuable, and it builds independence.
The key isn't to spend all day trying to figure something out with the limited knowledge at the top of my head, which, as a new hire, probably isn't comprehensive anyway. Instead, I time-box how long I spend figuring something out before simply reaching out to ask.
The time-box varies: for trivial things like table names, I might try a quick search in the database, and for code, I might spend a little more time trying a few Stack Overflow solutions.
Having attempted a few approaches also saves the time of the person I ask: they might suggest the first solution that comes to mind, which I would already know doesn't work because I had tried it. This lets the coworker brainstorm solutions beyond the most obvious one, which is a better use of problem-solving time.
All in all, trying things and asking questions to unblock oneself is important for being an effective developer. Even as a senior developer, I have no qualms about asking "silly" questions after attempting a few approaches; most of the time it's not embarrassing, since asking helps the project progress faster in the grand scheme of things.
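A quick search for table names can be as simple as filtering the database's catalog by a naming-convention guess instead of listing everything. The sketch below uses SQLite's `sqlite_master` catalog and a few made-up table names (`qc_policy_daily`, `on_policy_daily`, and so on); the real warehouse, its naming convention, and its catalog are assumptions here, though many SQL databases expose a similar catalog such as `information_schema.tables`.

```python
import sqlite3

# Hypothetical table names; the real warehouse and its naming convention are unknown.
conn = sqlite3.connect(":memory:")
for name in ["qc_policy_daily", "on_policy_daily", "on_claims_monthly", "marketing_events"]:
    conn.execute(f"CREATE TABLE {name} (id INTEGER)")

def find_tables(conn, pattern):
    """Return table names matching a SQL LIKE pattern, e.g. '%policy%'."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name LIKE ? ORDER BY name",
        (pattern,),
    ).fetchall()
    return [row[0] for row in rows]

# Guess based on the Quebec table's name, then narrow down by prefix.
print(find_tables(conn, "%policy%"))  # ['on_policy_daily', 'qc_policy_daily']
print(find_tables(conn, "on%"))       # ['on_claims_monthly', 'on_policy_daily']
```

If a search like this comes up empty, that's a good point to stop and ask, and the attempted patterns are useful context to include in the question.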
Translate algorithms to plain English words
Don't be the person walking around the office mumbling things like "I did ALS (Alternating Least Squares)." Let me explain what this means…
We as data scientists enjoy chatting about niche optimizations and academic journals at work, and that’s fine. In fact, the group I co-run (outside of work), Aggregate Intellect, invites researchers to speak about their work on machine learning at a highly technical level, and has 12k+ YouTube subscribers.
But even as part of groups that love nerd talk, I know that in a workplace, where attention is limited, the best way to get someone to care about my work is to describe these "cool things" in plain English.
To be frank, in my first 90 days as a data scientist, I was the person walking around mumbling algorithm names. It does take more than 90 days to cultivate the communication skills I mention here, but I note it in hopes that one can start developing those skills earlier!
Along with naming an algorithm and explaining what it does in the mathematical and computational sense, I've learned to also mention what it actually does for the business (e.g. recommending something on a website to increase user engagement).
Otherwise, it's easy for one's project to fall into obscurity. I've seen plenty of teams build POCs (proofs of concept) that end up abandoned, not because they couldn't have been useful with some more effort, but because their impact was never communicated clearly. I think this is a mistake many highly technical people make.
It’s invaluable to describe technical work in ways that help anyone see the value of having a person stationed on the project. Otherwise, it’s possible for one to be pulled off a project they hold dear, with the repository becoming a code graveyard.
Of course, if the project doesn't make sense, no amount of communication will turn it around. The point is that poor communication about technical topics often causes unnecessary harm to one's own projects and limits one's ability to work on the better projects in the company, which in turn can hold back career growth.
It's worth spending time on communication: it has gotten me onto large stages, internally and externally, despite my relatively low tenure at the time. I've written about a technique I use in this post about data science storytelling.
As I continue learning in my career as a principal data scientist, I've been reflecting on what I experienced when I began this journey, especially what I (accidentally) did well. Hindsight is 20/20. I have many more thoughts and observations on how to make the most of one's data science learning, and I'll be gathering more material for a future article on this topic.
I hope that some of these thoughts were helpful! As usual, you can find me on LinkedIn or email@example.com to discuss this post.