How To Start A Data Science Project When You Are A Beginner


You know you should have some data science projects on your resume or portfolio to show what you know. The only problem is that although you've taken some intro courses at your school, gone through some MOOCs, and read a few blog posts, when you look at other people's work you think it's out of your league.

You are trying to break into the field

You want to start working on a data set, yet you're not quite sure what to do with it. At this point, you have some ideas, but you're worried that they're very basic or simplistic. You just want to get your feet wet and learn by doing while proving your abilities to a future employer. Nearly everything you've learned you've taught yourself out of curiosity, so you want to start working on something, anything really.

Though you've gotten great advice, it's still hard to know where to start

Most of the advice you have been given about starting data science and building a portfolio falls into three buckets: a) go to Kaggle, b) find a data set you like, and c) think of questions you want answered and then answer them using data science. These are all great approaches to learning data science by doing. The only issue is that, since you're just starting out, it's hard to know where to begin or what to do once you have a data set or are on a website full of data sets. Further, you are still learning data science, so it's not like you can build a super-sophisticated model and call it a day.

Make a data visualization!

Regardless of how big the data is or how clean it looks, you don't need a state-of-the-art ML algorithm to get something useful out of it. You can (and should!) start with something as simple as a visualization of the data. If the data set is too big, choose a part of it and visualize that part.
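To make the "choose a part and visualize it" idea concrete, here is a minimal sketch that uses only Python's standard library. The ages are made-up numbers standing in for a large data set, and sample_rows and text_histogram are hypothetical helper names chosen for this example. A plotting library would give you a nicer chart, but a text histogram is enough to see the shape of the data.

```python
import random
from collections import Counter


def sample_rows(rows, k, seed=0):
    """Return up to k rows chosen at random, reproducibly for a given seed."""
    if len(rows) <= k:
        return list(rows)
    return random.Random(seed).sample(rows, k)


def text_histogram(values, bin_width):
    """Count values per bin and return lines such as '  20-29 | ####'."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    lines = []
    for start in sorted(counts):
        label = f"{start}-{start + bin_width - 1}"
        lines.append(f"{label:>7} | {'#' * counts[start]}")
    return lines


# Made-up data standing in for a data set that is too big to eyeball:
# 10,000 ages between 18 and 79.
rng = random.Random(42)
ages = [rng.randint(18, 79) for _ in range(10_000)]

# Visualize only a small random part of it.
subset = sample_rows(ages, k=200)
for line in text_histogram(subset, bin_width=10):
    print(line)
```

Each row of output is one age bracket, and the length of the bar shows how many of the sampled people fall into it.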

Once you've done one type of visualization, you can make a few other types too. By thinking through how to make a visualization (do you have text, numerical, nominal, categorical, or range values?), you'll be a few steps closer to understanding the data.
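One way to practice this thinking is to write down a rough rule for what kind of values each column holds and which chart you would try first. The sketch below is only an illustration: column_kind, CHART_FOR_KIND, the sample table, and the "few distinct values means categorical" rule are all assumptions made for this example, not a standard method.

```python
def column_kind(values):
    """Roughly classify a column as 'numerical', 'categorical', or 'text'."""
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "numerical"
    # Assumption: few distinct values relative to the column length
    # suggests categories rather than free text.
    if len(set(values)) <= max(2, len(values) // 2):
        return "categorical"
    return "text"


# A first chart to try for each kind of column (a suggestion, not a rule).
CHART_FOR_KIND = {
    "numerical": "histogram",
    "categorical": "bar chart of counts",
    "text": "word-frequency bar chart",
}

# Made-up table: column name -> list of values.
table = {
    "age": [34, 27, 45, 51],
    "color": ["red", "blue", "red", "red"],
    "review": ["loved it", "too slow for me", "would buy again", "arrived broken"],
}

for name, values in table.items():
    kind = column_kind(values)
    print(f"{name}: {kind} -> try a {CHART_FOR_KIND[kind]}")
```

The rule will guess wrong on some columns, and noticing when it does is part of getting to know the data.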

As you are just starting, the important thing is to think of questions and check whether the data can answer them. This way you can see what insights the data holds. For instance: are there any outliers in the data you visualized? Did you find anything interesting just by looking at the visualization? Can you start to see what the summary statistics (variance, standard deviation, mean, etc.) would look like?
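Once a chart makes you suspect an outlier, you can check your guess with a few lines of code. The sketch below uses Python's built-in statistics module on made-up commute times. The 1.5 x IQR cutoff is one common rule of thumb for flagging outliers, chosen here as an assumption, and summarize and iqr_outliers are hypothetical helper names.

```python
import statistics


def summarize(values):
    """Return a few basic summary statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance
        "stdev": statistics.stdev(values),        # sample standard deviation
    }


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the middle half of the data."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up commute times in minutes; one of them looks suspicious.
commute_minutes = [22, 25, 19, 30, 27, 24, 21, 95, 26, 23]

for name, value in summarize(commute_minutes).items():
    print(f"{name}: {value:.2f}")
print("possible outliers:", iqr_outliers(commute_minutes))
```

Comparing the mean with the median is a quick follow-up question: when a single extreme value pulls them far apart, that is worth a closer look.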

Data Visualization leads to questions which lead to doing and learning deeper Data Science

You can start simply by making a data visualization and go from there. The questions you ask of your data and your visualization will lead to some answers. Then those answers will lead to more questions you can try to answer. As you iterate through this process, you'll find yourself asking questions that need some math, statistics, computer science, and data science to answer. Once you reach that point, you can explore the internet, books, and blogs to figure out the next steps.

Start with data visualization because it's the easiest and biggest win you can achieve without learning a ton of new material, and you can use tools already available on your computer or the web.

So the next time you see or receive advice about finding a data set, going to Kaggle, or thinking of questions, your mind should instantly start thinking about how you are going to visualize the data set. This way you can start by asking and answering simple questions and as you go through the process you will learn more and think of deeper questions to ask and answer.

Good luck and start visualizing your data today!
