Behind the Scenes at Dataiku - Interview with Data Scientist Matthieu Scordia

We recently interviewed Matthieu Scordia, a Data Scientist at Dataiku, about his background, his work at Dataiku and the role they're currently hiring for.

Matthieu graduated with a master's degree in artificial intelligence from Pierre et Marie Curie University in Paris 3 years ago and joined Dataiku right away, first as an intern. He was their first employee! You can find him on Twitter @mattsco.

Dataiku Data Scientist Matthieu Scordia

Matthieu, thanks for taking the time to chat with us. We're excited to learn more about your work at Dataiku.

Let's get into it, and first chat a bit more about your background :) ...

Q. How did you get interested in working with data?

I studied Artificial Intelligence at university. I'm a big fan of science fiction movies like The Matrix, so I thought that would be a fun subject to get into. During my master's degree, Machine Learning was the speciality I was most interested in, so I started to play around with Kaggle competitions, and it became a bit of an addiction. University was very theoretical, and at the time it was hard for me to imagine all the possibilities of Data Science. However, the variety of Kaggle competitions really showed me the superpower I was gaining by mastering data.

Q. Was there a specific "aha" moment when you realized the power of data?

When I was working on Kaggle competitions, I started to see all the powerful things that could be done with data. But it wasn't until I started working and developed my own project from scratch that I really saw what 'working in data' was all about. That first project was my very own movie recommender, which I developed with Data Science Studio. I started with a dataset of all the movies I'd seen over the years, rated those hundreds of movies with a star system, enriched that with data from online movie databases, and started trying out different algorithms to see how I could get my perfect movie recommendations!

I still use it today: I set up a fun interface to easily rate movies as I see them, and every week my project rates new releases and tells me which ones I'm most likely to go to and which I'm most likely to enjoy. My girlfriend gets a little upset that I trust my algorithm more than her advice, but to this day I use it to pick my weekly movie outings!

This project had me working on all the different steps of a data project: collecting data, cleaning it, enriching it, building and testing models, making webapps with the results, and automating the process so the model gets better with every movie I see. It was a real aha moment to see that even outside of work, I just loved working on data and could make fun things out of it as well!
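Matthieu doesn't say which algorithms his recommender ended up using, so the following is only a minimal sketch of one common approach, a content-based nearest-neighbour recommender: each new release gets a predicted star rating equal to the similarity-weighted average of the ratings of the k most similar movies already rated. The feature layout (genre flags plus an online-database score), the movie titles and all the numbers are hypothetical placeholders, not his data.

```python
from math import sqrt

# Hypothetical feature layout: genre flags plus a 0-1 score from an online movie database.
FEATURES = ["sci_fi", "action", "drama", "comedy", "online_score"]


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_rating(candidate, rated, k=3):
    """Predict stars for `candidate` as the similarity-weighted mean of the
    star ratings of its k most similar already-rated movies."""
    neighbours = sorted(
        ((cosine(candidate, feats), stars) for feats, stars in rated.values()),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None  # no similar movie rated yet, so no prediction
    return sum(sim * stars for sim, stars in neighbours) / total


def rank_new_releases(releases, rated, k=3):
    """Return (title, predicted_stars) pairs for new releases, best first."""
    scored = [(title, predict_rating(feats, rated, k)) for title, feats in releases.items()]
    scored = [(title, stars) for title, stars in scored if stars is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative personal ratings: title -> (feature vector, stars out of 5).
rated = {
    "The Matrix": ([1, 1, 0, 0, 0.87], 5),
    "Inception": ([1, 1, 1, 0, 0.88], 5),
    "Some Rom-Com": ([0, 0, 1, 1, 0.60], 2),
    "Slapstick Sequel": ([0, 0, 0, 1, 0.45], 1),
}
# Illustrative weekly releases to score.
releases = {
    "New Space Epic": [1, 1, 0, 0, 0.80],
    "New Comedy": [0, 0, 0, 1, 0.55],
}

ranking = rank_new_releases(releases, rated)
print(ranking)
```

Re-running the ranking each week after adding the latest ratings to `rated` is what makes this kind of recommender "get better with every movie", since every new rating becomes a potential neighbour.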

That's great! Let's switch gears and talk a bit more about Dataiku.

Q. What are the biggest areas of opportunity/questions you want to tackle at Dataiku?

The main opportunity for us today is to democratise data science in companies. We want to tackle problems like:

  • The fact that data scientists are in such demand that they're hard to find, and companies often don't know how to hire this type of profile yet.
  • Even once data scientists are hired, data preparation takes up SO MUCH of their time, and it definitely isn't the most fun part, or the part where they bring the most value.
  • There are lots of incredible big data technologies being developed, but they're hard to assemble and often don't work well together. (I recommend the great Technoslavia blog post.)
  • Because of communication problems between different teams and different profiles (engineers vs. data scientists vs. marketers vs. managers), models are rarely deployed into production in the end, and resources go to waste.

Q. What projects are you currently working on, and why/how are they interesting to you?

Right now I'm working for one of our favorite clients. Their issues are similar to those of an e-commerce platform: they want to optimise their conversion rates through recommendations, homepage personalisation and email targeting. It's great to work on concrete marketing issues and deliver value to their end users.

But what interests me the most right now is the period when the model is done, the data flow has been tested, and I can help our client put it into production. Seeing the client start A/B testing the model and find out how well it's performing is awesome!
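The interview doesn't describe how the client evaluates its A/B tests, but a common first check when comparing conversion rates is a pooled two-proportion z-test. The sketch below assumes a control group (A) that doesn't see the model and a variant group (B) that does; the visitor and conversion counts are invented for illustration.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test comparing conversion rates of
    control A and variant B. Returns (lift, z, two-sided p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # conversion rate under the null hypothesis
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return p_b - p_a, 0.0, 1.0  # degenerate case: all or no visitors converted
    z = (p_b - p_a) / se
    # Standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_b - p_a, z, p_value


# Hypothetical numbers: 10,000 visitors per arm, 5% vs 6% conversion.
lift, z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=600, n_b=10_000)
print(f"lift={lift:.2%}, z={z:.2f}, p={p:.4f}")
```

A small p-value only says the difference is unlikely to be noise; in practice the sample size is usually fixed before the test starts, rather than stopping as soon as the result looks good.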

Q. What tools, techniques, programming languages, etc. do you use on a regular basis? Has this changed at all recently?

As you can imagine, I only work with Data Science Studio, the platform we developed at Dataiku. I get to test it and suggest new features to make the jobs of data scientists like me easier. DSS is great because I get to work with all the technologies and tools I like, integrated in the software: Python, SQL, R (if you want to), as well as Hadoop and Spark. I'm a Python guy myself, so I love to code and analyse data in Jupyter notebooks. Recently I started to learn JavaScript and D3.js to make cool data visualisations. I love building webapps to share results and have fun with them!

Q. What has been the most surprising insight you have found?

One time, I was working for a client, and we noticed that there were curious anomalies in a lot of the timestamp data. We looked into it and found that the data was collected by a fleet of cellphones, and the people collecting the data had been manually changing the time on their devices, which was weird. The reason they were doing this was that they were playing Candy Crush on the phones, and to get more lives they went "back in time" on their smartphones, altering the data we were working on! I love the investigative part of working with data; it always comes down to people in the end.
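One way this kind of anomaly could surface is as a device clock running backwards: an event recorded later carries an earlier timestamp than the one before it. The sketch below assumes each record carries a device ID and a per-device sequence number giving the true recording order (a hypothetical field; the interview doesn't describe the client's schema), and flags every event whose timestamp is earlier than that of the previous event on the same device.

```python
from collections import defaultdict
from datetime import datetime


def find_clock_rollbacks(events):
    """events: iterable of (device_id, seq_no, iso_timestamp) tuples, where the
    hypothetical seq_no is the order in which the device recorded the events.
    Returns (device_id, seq_no, previous_ts, ts) for each event whose timestamp
    is earlier than the event recorded just before it on the same device."""
    by_device = defaultdict(list)
    for device_id, seq_no, ts in events:
        by_device[device_id].append((seq_no, datetime.fromisoformat(ts)))

    anomalies = []
    for device_id, rows in by_device.items():
        rows.sort()  # put each device's events back in recording order
        for (_, prev_ts), (seq_no, ts) in zip(rows, rows[1:]):
            if ts < prev_ts:  # clock went "back in time"
                anomalies.append((device_id, seq_no, prev_ts, ts))
    return anomalies


# Illustrative records: phone-1's clock is set back a day before its third event.
events = [
    ("phone-1", 1, "2015-06-01T09:00:00"),
    ("phone-1", 2, "2015-06-01T09:30:00"),
    ("phone-1", 3, "2015-05-31T09:35:00"),
    ("phone-1", 4, "2015-06-01T10:00:00"),
    ("phone-2", 1, "2015-06-01T09:05:00"),
    ("phone-2", 2, "2015-06-01T09:40:00"),
]
anomalies = find_clock_rollbacks(events)
print(anomalies)
```

This only catches the moment a clock jumps backwards; events logged while the clock stays wrong still look locally consistent, which is why the investigative part (asking why the clocks changed) still matters.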

That's amazing! Incredible what lengths people will go to to win at Candy Crush! :) ... Ok, let's chat a bit about the role you're currently hiring for at Dataiku...

Q. What's one thing that's really compelling about the role that doesn't necessarily come through in the job description?

We are looking for passionate people and candidates who are proactive and have lots of ideas to contribute. A major part of the job is to love the tool, Data Science Studio, and want to help make it better. Our boss always says that he only recruits people who are smarter than him, from all kinds of different backgrounds, and that makes going to work every day so much fun!

Q. Out of all the requirements listed, which are the most critical for a candidate to possess? Why?

As a data scientist, you need to have a really solid knowledge of machine learning and be able to code fluently in Python or R. And if you also spend time building compelling stories around data, or spend all your free time on Kaggle, for instance, you're probably a great candidate for us. You have to like playing with data and have a bit of a hacker mindset.

Q. How do you typically read a data science resume? (i.e., Are there sections you skim, places you focus more time etc.)

Well, before we even start reading the resume, we reply to candidates by sending them a data science exercise, like a small Kaggle competition. Usually, the candidate has to make predictions on a small dataset, using the language of his choice.

Only about half of the candidates send the exercise back to us. Then we look especially at how the candidate handles difficulties and what hacks he finds to get around them. This is a great way to gauge a data scientist's real value, before even looking at his studies or his background. After that, we have him come in for a job interview to see how he would fit in with the team and interact with clients.

Q. What should candidates avoid doing on their resume?

Telling us you graduated from Stanford when you actually just completed Andrew Ng's Coursera MOOC. True story, candidates have done this.

Matthieu, thanks for spending the time chatting with us. We really enjoyed learning more about your work at Dataiku, and your passion for everything data certainly shines through :)

Readers, if you're interested in learning more about the role at Dataiku, you can find the full job posting here.
