We recently caught up with Ravi Parikh, Co-Founder of Heap (YC W13), which is harnessing the power of Big Data to modernize web and iOS analytics...
Hi Ravi, firstly thank you for the interview. Let's start with your background...
Q - What is your 30 second bio?
A - I studied computer science at Stanford, where I did research with Professor Jeff Heer on data visualization. In 2012 I co-founded Heap, a user analytics company, and I've been working on that since. I also do quite a bit of data visualization work independently.
Q - What was the first data set you remember working with? What did you do with it?
A - When I was young I really enjoyed filling out March Madness brackets. I loved poring over statistics and historical trends in order to "engineer" a perfect bracket. Ironically, though, the only bracket I ever filled out that did well was one where I didn't do any analysis and instead put my hometown team in the finals.
One of the lessons I learned from all the analysis I did was the importance of avoiding "data dredging" - the practice of blindly mining data to find relationships. If you look long enough and hard enough at a large set of data you'll find plenty of seemingly interesting relationships that are just products of random chance. It's important to be disciplined and use methods like multiple hypothesis testing correction to avoid being misled.
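Editor Note - To make "multiple hypothesis testing correction" concrete, here is a minimal Python sketch of two common corrections, Bonferroni and Benjamini-Hochberg. The choice of these two methods, the function names, and the p-values are our own illustrative assumptions, not something Ravi described using.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is <= alpha / (number of tests)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate).

    Sort the p-values, find the largest rank k with p_(k) <= alpha * k / m,
    and reject every hypothesis whose rank is <= k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected


# Made-up p-values from ten hypothetical tests on the same data set.
P_VALUES = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

naive = [p <= 0.05 for p in P_VALUES]        # no correction at all
strict = bonferroni(P_VALUES)                # most conservative
fdr = benjamini_hochberg(P_VALUES)           # less conservative

print("uncorrected:", sum(naive), "significant")
print("Bonferroni:", sum(strict), "significant")
print("Benjamini-Hochberg:", sum(fdr), "significant")
```

With these made-up numbers, five results look "significant" before correction, but only one survives Bonferroni and two survive Benjamini-Hochberg, which is the kind of discipline the answer above is pointing at.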
Q - Was there a specific "aha" moment when you realized the power of data?
A - For me that "aha" moment was when I learned about Anscombe's quartet. It's a group of four datasets, each of which consists of several (x, y) pairs. Each of these datasets has the same mean of x, mean of y, variance of x, variance of y, x/y correlation, and the same linear regression line. Basically, many of the "standard" summary statistics we might use to characterize these datasets are identical for all four. However, when visualized, each of the four datasets looks significantly different. This was when I truly understood that asking deeper questions about data and visualizing data is incredibly important and powerful.
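Editor Note - For readers who want to check this themselves, the sketch below computes the summary statistics for the four datasets using only the Python standard library. The data values are the published Anscombe (1973) quartet; the `summarize` helper is our own illustrative name.

```python
from statistics import mean, variance

X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x values for datasets I-III

ANSCOMBE = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the usual summary statistics plus the least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": variance(xs),          # sample variance
        "var_y": variance(ys),
        "corr": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in ANSCOMBE.items()}

for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Rounded to two decimal places, all four datasets report a mean of x of 9, a mean of y of about 7.50, a correlation of about 0.82, and a fitted line of roughly y = 3.00 + 0.50x. Plotting them (for example with a scatter plot per dataset) is what reveals a linear trend, a curve, an outlier-driven slope, and a single high-leverage point.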
Ravi, very interesting background and context - thank you for sharing! Next, let's talk more about what you are working on at Heap.
Q - What specific problem is Heap trying to solve? How would you describe it to someone who is not familiar with it?
A - Heap is a web and iOS analytics tool that captures every user interaction on a website or mobile app: every click, form submission, pageview, tap, swipe, etc. Instead of having to write tracking code, Heap captures everything upfront and lets you analyze it later. When you want to answer a question with data, you can do it immediately, instead of writing code, deploying it, and waiting for metrics to trickle in.
Q - How did you come to found Heap?
A - Matin Movassate, my co-founder at Heap, had the initial idea. He used to work at Facebook as a product manager. To make any data-driven decision, he was forced to figure out what he wanted to track, ask a developer to write event tracking code, wait for the next product release cycle, wait for data to trickle in, and then finally have an answer. This is a process that could take weeks or months just to answer simple questions like "How many people are using the messages feature?" We decided to build Heap to eliminate that entire cycle.
Q - What have you been working on this year, and why/how is it interesting to you?
A - One of the coolest things I've built at Heap is the iOS tracking library, which automatically grabs touch and gesture events on mobile apps. Figuring out how to automatically capture event data from iOS apps while taking into account performance and network overhead was a fun challenge.
Q - What has been the most surprising insight you have found?
A - We built Heap because we, as developers, were frustrated with the current state of the art in analytics. However, we've found that our approach to tracking data without writing code has enabled product managers, marketers, and other non-technical folks to conduct end-to-end analysis on their data. We're looking forward to a future where anyone can be a data scientist.
Q - What technology are you using?
A - Our stack is Node + Redis + Postgres + Backbone + D3. Some things we're working on:
- Data capture. We're integrating with more clients and frameworks, including Android, AngularJS, and Backbone.js, all with virtually no performance overhead or integration cost.
- Real-time infrastructure. We support an expressive set of queries that allow our users to slice and dice the data in arbitrary ways. The results need to come back with sub-second latencies and reflect up-to-the-minute data.
- Data visualization. Simple pre-generated graphs just don't cut it. There's an enormous number of ways to organize the data. Existing tools only scratch the surface.
Q - What else should we know about Heap?
A - Heap is a small, engineering-focused company with a growing user base. We collect orders of magnitude more data than other analytics products, and it's a complex technical problem to store and analyze that volume of data.
Editor Note - If you are interested in more detail on some of the neat properties of Heap's approach, this Quora discussion is very insightful. Here are a few highlights:
Automatically retroactive - Heap captures all raw interactions since install time, so your analysis isn't constrained by events you remembered to log upfront.
Super granular - You can drill down into a cohort of users (or a specific user) and visualize their precise path through your app... You can define cohorts (without shipping code) as things like "users who added items to their shopping cart but never checked out".
Untouched application code - As the surface area of your application increases, sprinkling tracking/logging calls across your app can be error-prone and difficult to manage. Heap entirely decouples analytics from development.
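Editor Note - As a rough illustration of the "capture everything, define it later" idea behind these highlights, the Python sketch below defines a cohort after the fact over a log of raw interactions. This is not Heap's API or data model: the event fields, the CSS-style targets, and the helper names are all hypothetical.

```python
# Hypothetical raw interaction log, captured without any event-specific tracking code.
RAW_EVENTS = [
    {"user_id": "u1", "type": "click", "target": "#add-to-cart"},
    {"user_id": "u1", "type": "submit", "target": "#checkout-form"},
    {"user_id": "u2", "type": "click", "target": "#add-to-cart"},
    {"user_id": "u2", "type": "pageview", "target": "/pricing"},
    {"user_id": "u3", "type": "pageview", "target": "/home"},
]


def users_matching(events, predicate):
    """Return the set of user ids with at least one event satisfying predicate."""
    return {event["user_id"] for event in events if predicate(event)}


# Event definitions written after the data was collected (hypothetical selectors).
def added_to_cart(event):
    return event["type"] == "click" and event["target"] == "#add-to-cart"


def checked_out(event):
    return event["type"] == "submit" and event["target"] == "#checkout-form"


# Cohort: users who added items to their cart but never checked out.
abandoned_cart = users_matching(RAW_EVENTS, added_to_cart) - users_matching(RAW_EVENTS, checked_out)
print(sorted(abandoned_cart))
```

Because the definitions are just predicates over data that already exists, changing what counts as "added to cart" requires no new deployment and applies to historical events as well.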
Editor Note - Back to the interview!...
Finally let's talk a little about the YC experience, helpful resources and the future of Web/Mobile Analytics...
Q - How did you find the YC experience? What was most surprising?
A - YC was an amazing experience. There's not much I can say about it that hasn't already been said more eloquently by someone smarter than me, but I will reiterate that anyone in the early stages of building a technology company should consider it very strongly. I was most surprised by the incredibly high caliber of everyone else in my batch. If nothing else, YC puts you in proximity with other talented and unique people.
Q - What publications, websites, blogs, conferences and/or books do you read/attend that are helpful to your work?
A - I follow a number of blogs, websites, and people who are always teaching me new things about data science and visualization. The NYTimes graphics department is incredibly high-quality and staffed with some very impressive people. The Economist also puts out a daily chart which, while simple, is well done and insightful. Visualizing.org is a great website that hosts challenges for any visualization designer to hone their skills. One website that I'm looking forward to following once it's up and running again is Nate Silver's FiveThirtyEight.com, which is currently in the process of relaunching.
Q - What does the future of Web/Mobile Analytics look like?
A - We're moving towards a more integrated future. Currently the landscape is fragmented. A large, modern organization takes advantage of hundreds of disparate data sources, but the real power comes from integrating these and finding deeper insights that way.
Q - What is something a smallish number of people know about that you think will be huge in the future?
A - It's probably not fair to say a small number of people know about this, but I'm incredibly excited about the future of bioinformatics. The cost of genome sequencing and other technologies is dropping rapidly, and we're on the verge of an explosion in the amount of data that researchers will have access to.
Q - Any words of wisdom for Data Science students or practitioners starting out?
A - Get your hands dirty. There's no faster way to learn than finding an interesting data set and playing around with it.
Ravi - Thank you so much for your time! Really enjoyed learning more about your background and what you are building at Heap.
Heap can be found online at https://heapanalytics.com, and Ravi Parikh at @ravisparikh.
Readers, thanks for joining us!
P.S. If you enjoyed this interview and want to learn more about
- what it takes to become a data scientist
- what skills do I need
- what type of work is currently being done in the field