We recently caught up with Joseph Misiti, co-founder of Math & Pencil, SocialQ and more! We were keen to learn more about his background, his work at SocialQ, and thoughts on how Data Science is evolving. Also, given his thought-provoking article "Why becoming a data scientist is NOT actually easier than you think", we were keen to garner his advice on how best to enter the field…
Hi Joseph, firstly thank you for the interview. Let's start with your background and how you became interested in working with data...
Q - What is your 30 second bio?
A - I'm 31 years old and live in New York City. I hold a BS in Electrical Engineering which focused on signal processing and numerical analysis, and an MS in Applied Math with a focus on computer vision, data mining and wavelet analysis. I started out in DOD, building SATCOM radios at Harris Corporation, moved on to missile defense algorithms at Lockheed Martin, and capped my work in that sector with building a lie detector (Thin Slice Detector if you have read the book Blink) using computer vision and wavelet analysis. I moved to New York City three years ago and started a consultancy called Math & Pencil which is behind start ups including:
Q - How did you get interested in working with data?
A - I have always loved math and computer science, so analyzing data was a natural next step. I suppose I really got excited after studying numerical solutions to partial differential equations while an undergrad, because that was the first time I really saw the power of computer modeling/applied mathematics.
Q - So, what was the first data set you remember working with? What did you do with it?
A - The first data set I can recall playing with (I think) was the Hair-Eye Color data set. I was using it in an introduction to statistics course in undergrad to learn about linear regressions, coefficients, p-values, etc.
Q - Was there a specific "aha" moment when you realized the power of data?
A - When I was working for Lockheed Martin, we used Kalman filters to model errors in missile measurements. The actual algorithms are fairly simple from a math perspective, but when applied to missile data, they were really good at predicting errors in the measurements. Removing these errors would theoretically reduce friendly fire, so I would say this was the first time I saw a simple algorithm applied to a real-life data set that could literally save lives. It was pretty amazing at the time.
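To give a feel for how simple the math behind a Kalman filter can be, here is a minimal scalar sketch. It is an illustration only, not the algorithm used at Lockheed Martin: the constant-state model, the function name `kalman_1d`, the noise variances, and the synthetic data are all assumptions made for this example.

```python
import random


def kalman_1d(measurements, meas_var, process_var=1e-4,
              init_est=0.0, init_var=1000.0):
    """Filter noisy scalar measurements of a (nearly) constant quantity."""
    est, var = init_est, init_var
    estimates = []
    for z in measurements:
        # Predict: the state is modeled as constant, so only the
        # uncertainty grows (by the assumed process variance).
        var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = var / (var + meas_var)
        est += gain * (z - est)
        var *= (1.0 - gain)
        estimates.append(est)
    return estimates


# Hypothetical data: a true value of 10.0 observed with Gaussian noise (std 2.0).
rng = random.Random(0)
true_value = 10.0
measurements = [true_value + rng.gauss(0.0, 2.0) for _ in range(200)]
estimates = kalman_1d(measurements, meas_var=4.0)

print(f"last raw measurement:   {measurements[-1]:.2f}")
print(f"last filtered estimate: {estimates[-1]:.2f}")
```

Each step is one predict and one update, and the gain decides how much to trust the new measurement relative to the running estimate. Real tracking problems use vector states and matrix covariances, but the structure is the same.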
Q - Wow, that's very powerful! … On that note, what excites you most about recent developments in Data Science?
A - The most exciting thing about data science in my opinion is the open source movement combined with Amazon EC2 prices dropping. For the first time, I do not need to have access to or purchase a cluster of computers to run an experiment. With a lot of recent developments in deep learning being run on GPUs rather than CPUs, I can very easily rent a GPU instance on EC2, install the software, and use an open source library written in Python like pylearn2 to test out a hypothesis.
The open source movement in general is really amazing. I would say that, mostly because of the rise in popularity of GitHub, it’s very easy to contribute to projects now. For instance, I created an open source project last month called awesome-machine-learning which is basically a list of all the machine learning resources on the web. Within a few weeks, over 2.9K people had starred it and I have had 48 contributors help me out. If you step back and think about it, this is really amazing (and most of us just take it for granted).
It is amazing - and that is a terrific resource you've put together - thanks! Let's switch gears and talk about your current work at SocialQ...
Q - What attracted you to the intersection of Data Science and Social Media? Where can Data Science create most value?
A - I was originally attracted to the idea of building a company with the potential of using my machine learning skill set, but also realized the company would have to have some level of success to get there. SocialQ started as a SaaS-based tool to help marketing researchers dig into their social data via a few dashboards. After a few years, we have built up a rather large data set and we are now able to offer statistical tools. Also, we work directly with customers/marketing researchers to figure out what type of questions they want answered, and then come up with statistical solutions to these problems. It has been a really interesting learning experience.
Q - So what specifically led you to found SocialQ? What problem does SocialQ solve?
A - One of the problems SocialQ solves is what marketing researchers can do with their social data, when they don't necessarily have comprehensive tools or the math background to make sense of what they've collected. We have created a platform that not only helps them answer those questions, but also makes the collection of the data easier. It is bundled into a set of SaaS-based tools so the researcher can initiate a study and then log in the next day and see the results.
Q - What are the biggest areas of opportunity/questions you want to tackle?
A - I am interested in helping companies improve their brands on social media using mathematics. There are a lot of different ways to do that, but that is the problem area I am trying to tackle currently.
Q - What learnings/skills from your time building data models for the US intelligence community are most applicable in your current role?
A - The skill set I was using previously is still being applied today in my day-to-day; the only difference is the features I'm extracting. Computer vision features are very specific to the field (SIFT, wavelets, etc.). My new job requires more NLP techniques, but a lot of the algorithms I am using are the same (SVMs, PCA, k-means, etc.). I have more freedom now, because the code I am using is not owned by the US government and its contractors. I can download a package off GitHub and start playing around without having to go through six different people for approval.
Q - Makes sense! So what projects are you currently working on, and why/how are they interesting to you?
A - Recently I have accepted roles as a consultant for a few interesting companies. One was using NLP to build classifiers/recommendations around an RSS feed data set for a start-up; another is a computer vision problem involving human handwriting. In my spare time, I have been studying a lot of Bayesian inference, reading papers, and listening to lectures on deep learning. I find almost all aspects of statistics/math interesting, so any chance I get where someone will compensate me to solve such a problem, provided I have the time and interest in the query, I'm in.
Q - And how is data science helping? What techniques, models, software etc are you using?
A - For NLP I have been using a lot of Latent Dirichlet allocation to build features. For computer vision, it has been a lot of OpenCV, and almost all classifiers are trained in Scikit-Learn. All data analysis, pre-processing, and exploratory statistical analysis is done using iPython notebooks, Pandas, Statsmodels, and either Matplotlib or ggplot2. Basically the same toolset that everyone else who has moved on from R is using.
Q - How do you/your group work with the rest of the team?
A - My technical team consists of two other engineers and one designer. The designer is Tim Hungerford, and the two engineers are Andrew Misiti and Scott Stafford. I have worked with them so long at this point that everything just seems to work, but interestingly enough, all of us work remotely, with Andrew and Tim based in Buffalo, Scott based in DC, and myself based in Manhattan. We do all our work using a combination of GitHub issues, Campfire, Gmail/Hangouts, and sometimes (although rarely these days) Asana.
Thanks for sharing all that detail - very interesting! Finally, let's talk a bit about the future and share some advice…
Q - What does the future of Data Science look like?
A - It's exciting. I think the open source movement around data science will continue to be the driving force. I am excited about new languages like Julia and frameworks like Scikit-learn/Pandas/iPython. I think more and more, we are going to see researchers moving from R/MATLAB to Python. I am not a MATLAB/R hater at all, but I do think Python is much easier to work with, document, and read. In the end, the reason for not using R/MATLAB anymore is simply that you cannot easily integrate them into a web service. Unless it's some simple exploratory analysis, you're going to need to eventually convert the code into another language anyway, so why not just avoid that step (also MATLAB isn’t always cost-effective).
I think because the demand for data science is increasing, more smart people are going to go into it. I also think long-term, it will solve a lot of larger problems like bringing down health care costs, reducing government waste, detecting government/private company fraud, etc.
Q - Any words of wisdom for Data Science students or practitioners starting out?
A - Never stop learning. Take as many math and statistics courses as you can, and implement all the basic algorithms on your own before you use open source versions. I am only advocating "reinventing the wheel" because I truly believe the only way to understand something is to build it from scratch, even though it is extremely time consuming.
Also, write blog posts and open source code, because that is what potential employers in the future are going to want to see.
Finally, do not be afraid to learn this stuff on your own rather than going back to school for it. If you already have an MS, I would save the money and invest in yourself. The job is more about getting meaningful results than about the school or degree you have, especially to me as an employer. You can see this by looking at the wide range of backgrounds employed data scientists have.
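As one illustration of the "build it from scratch" advice above, here is a minimal sketch of k-means (one of the algorithms Joseph mentions using) written in plain Python. The function names, the toy points, and the starting centroids are assumptions made for this example; a library version such as Scikit-Learn's would add smarter initialization and many other refinements.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centroids, max_iters=100):
    """Lloyd's algorithm: alternate assignment and centroid updates."""
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(max_iters):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if updated == centroids:  # no centroid moved, so we have converged
            break
        centroids = updated
    return centroids


# Hypothetical toy data: two small, well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
print(kmeans(points, init_centroids=[(0.0, 0.0), (1.0, 1.0)]))
```

Writing even this much by hand surfaces details a library hides, such as how ties and empty clusters are handled and how much the result depends on the starting centroids.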
Joseph - Thank you ever so much for your time! Really enjoyed learning more about your background, your work at SocialQ, and your thoughts on how Data Science is evolving. Good luck with all your ongoing projects!
P.S. If you enjoyed this interview and want to learn more about
- what it takes to become a data scientist
- what skills do I need
- what type of work is currently being done in the field