We recently caught up with Chul Lee, Director of Data Engineering & Science at MyFitnessPal. We were keen to learn more about his background, how Data Science is shaping the Health and Fitness industry and what he is working on at MyFitnessPal...
Hi Chul, firstly thank you for the interview. Let's start with your background and how you became interested in working with data.
Q - What is your 30 second bio?
A - I am highly multi-cultural: born in South Korea, grew up in Mexico, and received my higher education in Canada. I obtained my Ph.D. in CS from the University of Toronto specializing in Web Algorithms. After my Ph.D., I co-founded a social news aggregation startup, Thoora. I joined LinkedIn with other Thoora members in early 2012. At LinkedIn, I led a team that operated several relevance engines for LinkedIn's content products. I became the head of data engineering & science at MyFitnessPal just 3 months ago.
Q - How did you get interested in working with data?
A - I have always been amazed by the power of computing and data in general. At the same time, I have always appreciated the beauty of mathematics. I originally studied mathematics at college but eventually switched to computer science once I realized that computer science would allow me to pursue all my passions. At graduate school, I studied various web algorithms including PageRank and I further developed my interest in working with data.
Q - What was the first data set you remember working with? What did you do with it?
A - A bunch of text files in MS-DOS. When I was a child, I wrote a simple BASIC program that was able to count the number of lines, words, characters, spaces, etc. in these files. Using this simple program, I wrote another program that constructed uni-grams and bi-grams and did a very rudimentary plotting of their distributions.
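Editor Note: for readers curious what such a program might look like today, here is a minimal Python sketch of the idea Chul describes: counting lines, words, characters and spaces, then building uni-gram and bi-gram counts and drawing a crude text plot. It is not his original BASIC code; the function names (`text_stats`, `ngram_counts`, `plot_distribution`) and the sample text are assumptions made purely for illustration.

```python
from collections import Counter


def text_stats(text):
    """Count lines, words, characters, and spaces in a string."""
    return {
        "lines": len(text.splitlines()),
        "words": len(text.split()),
        "characters": len(text),
        "spaces": text.count(" "),
    }


def ngram_counts(text, n):
    """Count n-grams (tuples of n consecutive lowercased words)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def plot_distribution(counts, top=5):
    """Return a rudimentary text bar chart of the most frequent n-grams."""
    rows = []
    for gram, count in counts.most_common(top):
        rows.append(f"{' '.join(gram):<15} {'#' * count}")
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical sample text, standing in for the contents of a file.
    sample = "the cat sat on the mat\nthe cat ran"
    print(text_stats(sample))
    print(plot_distribution(ngram_counts(sample, 1)))
    print(plot_distribution(ngram_counts(sample, 2)))
```

Splitting on whitespace is the simplest possible tokenizer; a real text-processing pipeline would also need to handle punctuation and encoding.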
Q - Was there a specific "aha" moment when you realized the power of data?
A - The core essence of Thoora, the first startup I worked for, was to rank and present each news article by the volume of social buzz around it. At Thoora, I was amazed by how quickly and accurately social media can break interesting news stories. For instance, when Michael Jackson passed away, the social media space became full of eulogies for him in a matter of a few seconds, and it was an incredible experience to observe that phenomenon in actual numbers and stats.
Very interesting background - thanks for sharing! Let's talk in more detail about the Health & Fitness space and how data/data science are shaping it...
Q - What attracted you to the intersection of data/data science and Health & Fitness?
A - I learned a valuable lesson at LinkedIn in terms of how data products and data science could create tremendous value for users. I had the feeling that similar success could be replicated at the intersection of data/data science and Health & Fitness. Interestingly enough, the nature of many data problems in health & fitness is very similar to that of problems I have worked on previously. The overall idea of being a pioneer in a new field was exciting to me.
Q - Which Health & Fitness companies do you consider pace-setters in terms of their use of data science? What in particular are they doing to distinguish themselves from peers?
A - I think health & fitness big data innovation is in its nascent stage and therefore it is not clear which companies are pace-setters in terms of their use of data science. I think companies like Jawbone, Zephyr Health, Zynopsis, explorys, HealthTap, etc. are attempting interesting and novel ways of applying data science in their product offerings. I personally find the possibility of using IBM's Watson in healthcare very interesting. I would like to say that our endeavor at MyFitnessPal of using data science in some of our product offerings is also interesting.
Q - I think you're right to say so :) ... Given the current nascent stage, which companies/industries do you think Health & Fitness should emulate with respect to their use of Data Science? Why?
A - I think big internet and social media companies are the true pioneers of data science, even though they did not call their approach "data science". The success of data products developed by companies like Google, eBay, Yahoo, Facebook, LinkedIn, Twitter, etc. clearly demonstrates the power of data science. For instance, the Google Translate Service is one of the most interesting and powerful statistical engines based on big data techniques. LinkedIn's well-known PYMK (People You Might Know) is another big success story. Also note that these companies pioneered several tools (e.g. Hadoop, Pig) that eventually became essential in data science. Thus, I think health & fitness companies should try to emulate the success of these companies in the use of data science to tackle their own data problems, especially in the development of new products.
Q - Makes a lot of sense! ... On that note, where do you think Data Science can create most value in Health & Fitness?
A - Almost everywhere! I might be slightly over-optimistic here but I think data science can create value in almost every area of health & fitness because the right usage of data science will increase the overall information processing power of health & fitness data while providing insights about our health habits, consumption, treatments and medication. Thus, the room for growth of data science in health & fitness for the next few years is big.
That's great to hear - and makes for interesting times to be working at MyFitnessPal! Let's talk more about that ...
Q - What are the biggest areas of opportunity/questions you want to tackle at MyFitnessPal?
A - I think offering Amazon-like recommendations on what you should eat and what exercise you should do, based on your dietary preferences, would be the biggest area of opportunity since it will have a big product impact and will trigger technology innovation as well. To achieve that goal, discovering hidden patterns for diets that people are following, how and why their behaviors change over time, why certain diets work while others don't, etc. would be very important.
Q - What learnings from your time at LinkedIn will be most applicable in your new role?
A - At LinkedIn I learnt that feature engineering and reducing noise in data are crucial for the successful development of data products. I also learnt the importance of building and operating large-scale data processing infrastructure. Thus, I am currently paying special attention to the development of a scalable data processing infrastructure while making sure that all data cleaning, standardization and feature extraction tasks are well supported.
Q - What projects are you currently working on, and why/how are they interesting to you?
A - There are several feature extraction and data processing projects that I am working on with other team members. One project that I am particularly interested in is the food categorization project, which is an attempt to categorize a given food item into one or many food categories. This is particularly interesting because it sits at the intersection of data science and engineering, involving different large-scale machine learning, natural language processing, and data analysis techniques. In addition, it will have a direct impact on other data science problems that we are trying to tackle.
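Editor Note: to make the "one or many categories" idea concrete, here is a toy Python sketch of multi-label food categorization by keyword matching. It is only an illustrative baseline and not MyFitnessPal's approach, which Chul describes as involving large-scale machine learning and natural language processing. The category names, the keyword lists and the `categorize` function are all hypothetical.

```python
import re

# Hypothetical category vocabulary, invented for illustration only.
CATEGORY_KEYWORDS = {
    "dairy": {"milk", "cheese", "yogurt"},
    "fruit": {"apple", "banana", "strawberry"},
    "grain": {"bread", "rice", "oats"},
}


def tokenize(name):
    """Lowercase a food name and split it into a set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", name.lower()))


def categorize(food_name, keywords=CATEGORY_KEYWORDS):
    """Return every category whose keyword set overlaps the food name.

    A food can receive zero, one, or many labels (multi-label output).
    """
    tokens = tokenize(food_name)
    return sorted(cat for cat, words in keywords.items() if tokens & words)


if __name__ == "__main__":
    for item in ["Strawberry Yogurt", "Banana Bread", "Grilled Chicken"]:
        print(item, "->", categorize(item))
```

A keyword table like this breaks down quickly on misspellings, brand names and unseen foods, which hints at why the problem calls for learned models at scale.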
Q - That sounds really interesting! What has been the most surprising insight you have found so far in your work?
A - The overall variety and complexity of different data problems that have to be tackled in health & fitness. Note that health & fitness touches pretty much every aspect of human activity. Different companies in health & fitness have been accumulating different types of semi-structured and structured data related to health & fitness. The overall variety and complexity of problems that need to be solved in order to better understand health & fitness data is way bigger than I originally envisioned.
Q - And to help you do so you have recently recruited a top-notch team from Google, eBay, LinkedIn, etc. - how did you find the recruiting process?
A - As you know, there is fierce competition to attract top engineering & data science talent in the startup world. Thus, it was not easy to find and secure top talent from Google, eBay & LinkedIn. We were specifically looking for candidates who had previous experience at companies that had a strong presence in data science since we wanted to emulate their success with respect to their use of data science. We were also looking for candidates who were passionate about building great data products that could have real impact on users. I think our current team members joined MyFitnessPal because they were impressed by the nature and scale of data problems that need to be tackled; MyFitnessPal’s overall vision for data products also helped a lot. [Editor Note: see, for example, Growing the World's Largest Nutrition Database!]
Q - How does your team interact with the rest of the organization?
A - Since we are a startup, my team works with a wide range of functional teams. More specifically, we mainly collaborate on different data-related projects with product managers, platform engineers and app engineers. In terms of the decision-making process, I would say that we participate in almost every decision as long as the given project has some "data" component. It is sometimes challenging to communicate our work/findings to the rest of the organization since not everyone has a strong quantitative mindset. I do not think this is necessarily bad since many times different perspectives can lead to creative solutions. Thus, for a data scientist, having story-telling skills is important (as has been pointed out by other data scientists). Yet, we have to admit to ourselves that data science, like any other scientific discipline, involves a certain level of complexity and therefore it is important to make sure we pursue scientific rigor while maintaining the accessibility of our work/findings.
Chul, thanks so much for all the insights and details behind what you are working on at MyFitnessPal - sounds like you have a lot of data to play with and are building some very valuable tools/products! Finally, it's time to look to the future and share some advice...
Q - What excites you most about recent developments in Data Science?
A - I am most excited about the resurgence of traditional optimization and machine learning techniques at large scale in a wide range of application scenarios. Due to this new trend, some traditional algorithms are being re-evaluated and revised to accommodate different computation models at large scale, especially in distributed and parallelized environments. Meanwhile, new computation infrastructures are being proposed to support these new algorithms.
Q - What does the future of Data Science look like?
A - Very promising and rosy! There is no doubt that data is becoming the definitive currency of today's digital economy. I believe that data science will continue finding new applications in many domains. Thus, new problems and challenges in these applications will continue stimulating innovation in data science itself.
Q - Any words of wisdom for Data Science students or practitioners starting out?
A - Don’t be afraid to explore exciting new and emerging areas in data science like healthcare, ecology, agriculture, green tech, etc. I believe that these new areas will be booming for the next few years and you shouldn't let these opportunities go, especially when you are starting out as a data scientist. As such, I think it is important to have a good preparation in the fundamentals of data science as many of the techniques are somewhat universal and transferable from one domain to another - and these new areas are no exception.
Chul - Thank you so much for your time! Really enjoyed learning more about your background, how data and data science are influencing the Health & Fitness space and what you are working on at MyFitnessPal. MyFitnessPal can be found online at http://www.MyFitnessPal.com/ and Chul is on Twitter @Chul_Lee.
Readers, thanks for joining us!
P.S. If you enjoyed this interview and want to learn more about
- what it takes to become a data scientist
- what skills do I need
- what type of work is currently being done in the field