We recently interviewed Nick Larusso, Data Scientist at Graphiq, about the role they're currently hiring for.
Nick himself is a senior engineering manager, heading up the data engineering efforts on the engineering team at Graphiq. He joined the company in January 2013 after graduating from UCSB with a PhD in computer science in the area of data management and data mining. Nick received his Bachelor of Science in computer science and engineering from Ohio State University.
Prior to starting work at Graphiq, Nick held various engineering positions at Citrix Online, National Instruments, American Electric Power, and Intel.
Nick, thanks for taking the time to chat with us - we're excited to learn more about the role you're hiring for and any advice you have for candidates!
Let's get into it :) ...
Q. What's one thing that's really compelling about the role that doesn't necessarily come through in the job description?
Our engineering team has a huge degree of latitude around both what should be built as well as how to build it. Same goes for our data engineers and data scientists. We really get to act as product managers as well as engineers, which means you’re always working on really interesting problems and you control your own destiny. While this is an amazing opportunity, wearing both hats takes a lot of discipline, and not everyone can handle it.
Q. Out of all the requirements listed, which are the most critical for a candidate to possess? Why?
I think this largely depends on the culture of the team, so the requirements I’ll talk about are specific to Graphiq.
Competitive: We look for people who are motivated, driven, and willing to fight to win. Not physically, of course (unless it involves Nerf guns), but we tackle some very difficult problems and sometimes come up against seemingly unsolvable technical challenges. When that happens, I want people on my team who are excited to solve the problem and will find a way to win.
Data driven: Good data engineers and scientists understand the importance of the scientific method (building hypotheses, experimenting, and using those results to make informed decisions). Software systems can be incredibly complicated, and when you combine that with user behavior data, you have yourself a system with a level of complexity that rivals the weather or human biology. With such complexity, it’s not uncommon to find unexpected results, so I’ve found that it is incredibly important that candidates be driven to understand through experimentation and verification rather than speculation.
Q. How do you typically read a data science resume? (i.e., Are there sections you skim, places you focus more time etc.)
I definitely scan the whole resume. Below is a brief outline of what I look for in each section.
Education: Here, I focus on the candidate’s major and their grades. High grades are good, but not required. If a candidate has lower grades, I will usually look for what extracurriculars or work they were involved in during school. My best advice here is to always list your GPA, even if it’s low; it’s better to have it than not. Missing GPAs are a red flag.
Work experience: I look at what internships and jobs the candidate held previously. I’m usually looking for the standard stuff here. Has the candidate had relevant experience? Does she hop around from job to job? What were her main accomplishments? I’ve found work experience to be a good indicator of candidate quality so I think this section is very important. With internships being quite common in today’s world, most new grads have something to talk about here. My advice here is to talk about your impact at each job, not just what you did or what machine learning methods or big data systems you were exposed to. I much prefer depth over breadth.
Academic & personal projects: Candidates will often combine these within one section, but I view them as very different. Academic projects are things you work on as part of a course assignment, usually in teams, which does not include independent research! These projects are quite similar among candidates across universities, so they have very little discriminative power. I tend to scan this section very quickly or sometimes just skip it.
I find personal projects quite interesting. These are projects the candidate took on outside of their regular coursework, driven by the candidate herself because she was interested in a particular topic. These projects show that the candidate has areas of interest and enough drive to initiate a project in order to learn more or solve a problem they have personally faced. Both are great things to see in a candidate.
Personal projects are also fantastic because they give me something to talk about in phone interviews. Candidates are rarely more passionate than when they are discussing a project they initiated because of their own personal interests.
Technical skills: This is used mostly as a filter. The data science position attracts a wide range of candidates, everything from software engineers to pure statisticians. We have a very hands-on group at Graphiq: you’ll be implementing your solution in production rather than just prototyping all day, so you’ll definitely need to know how to code. I’m less concerned with the specific languages you know and more concerned that you know more than just R or Matlab. I usually give extra points to candidates who have experience in at least one statically typed language and one dynamic scripting language. This shows that they’ve taken the time to learn at least two languages and usually indicates they have some concept of using the right tool for the job (e.g. using Python to handle some simple file parsing rather than writing it in C++).
Q. What are the hallmarks of what you'd consider a "stand-out" data science resume?
I honestly haven’t found any yet. Resumes tell such a limited part of the story, so I have my doubts that such indicators would exist, but I have not given up on my search!
Q. What should candidates avoid doing on their resume?
One pattern I’ve observed recently is when a candidate lists out every Big Data or machine learning method/tool they have ever heard of or read about. Usually these are the same candidates that fail to mention any results or impact from any of their projects. We look for people who want to make an impact on the organization they choose to join, and I’ve found that failing to talk about the outcomes of past accomplishments is a really good indicator of a poor fit for Graphiq.
If you're interested in learning more about the role at Graphiq, you can find the full job posting here. Below is a summary ...
As a Data Scientist on Graphiq’s data team, you will eat, breathe, and sleep data. You will be responsible for building full-fledged data products that help our product managers provide users with deeper insights and tell a more compelling story with our data. You will research, design, and implement robust methods for statistical analysis that can be used to help understand our billions of data points. You will build scalable solutions for analyzing our large and very connected knowledge graph.