Do I need a strong math background to pursue a career as a data scientist?
We see a lot of questions like this. It's hard when you're trying to break into the field to know exactly how much math and stats you need, and part of the reason is that it really depends.
First, it depends on how a company defines "data scientist." Some companies say "data scientist" but really mean "data engineer", a role much more focused on the software engineering side of things: coding production systems, data storage and extraction, cluster management, etc. That kind of role is less math/stats focused and more CS focused.
Second, it depends on how a company divides responsibilities. Some look for people who are strong in either programming or mathematics/statistics, and then combine them in a team. Others look for "fully fledged" data scientists who have deep insight into different models, know when to apply which algorithm, and can also do all of the implementation work on the data. How the role you're looking at fits into these descriptions will affect how much math/stats you need to demonstrate.
Given the variance, the trick is to carefully dissect the job posting and dig into the background of the current team. LinkedIn is a great place to do this. You can generally figure out the different roles (job titles) as well as see the skills and background of the people in those roles.
That said, there are a few mainstays that, irrespective of role, you should be demonstrating on your resume, whether through your academic coursework, online courses you've taken, or project work you've completed (including write-ups that demonstrate your understanding). Specifically:
- Linear algebra (and ideally basic multivariate calculus)
- Regression ... linear regression and the things that violate the assumptions of linear models (e.g., autocorrelation in time series data, non-independent observations)
- Probability theory ... especially Bayes' Law and Central Limit Theorem
- Numerical analysis (e.g., time series analysis and forecasting)
- Core machine learning methods (clustering, decision trees, k-NN)
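To give a sense of the kind of small, self-contained write-up that can show understanding of one of these topics, here is a minimal sketch of Bayes' Law in Python. The function name `posterior` and the numbers (a 1% base rate, a 95% true positive rate, a 5% false positive rate) are made up for illustration and are not taken from any real dataset.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """Return P(condition | positive test) using Bayes' Law.

    prior               -- P(condition), the base rate
    true_positive_rate  -- P(positive | condition)
    false_positive_rate -- P(positive | no condition)
    """
    # Total probability of a positive test, with or without the condition.
    evidence = true_positive_rate * prior + false_positive_rate * (1 - prior)
    return true_positive_rate * prior / evidence


# Hypothetical numbers: rare condition, fairly accurate test.
p = posterior(prior=0.01, true_positive_rate=0.95, false_positive_rate=0.05)
print(f"{p:.3f}")  # prints 0.161
```

Even with a fairly accurate test, the probability of having the condition after a positive result is only about 16%, because the condition is rare to begin with. Being able to explain a result like this in plain language is the sort of understanding a project write-up can demonstrate.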
How to take action now?
Compare this list of mainstays against your resume. Which do you cover? Which are you missing? Of the missing ones, which have you actually used or are you proficient with? It's time to make space to mention them, and if it is via project work, think about linking to a more detailed write-up (for example on GitHub) so you can highlight a deeper level of understanding. This is especially important for non-Math/Stats candidates, as the burden of proof is higher! If you've covered more than the above, great! Make sure the most relevant courses shine through and get you noticed :)