As the year end approaches, we decided to dig through the 2017 archives to find the most-read articles of the year. It is an interesting mix: deep dives into what’s going on in Data Science at top tech companies, interesting applications of data science, advice on how best to get into the field, unique explorations (or even debates) around some of the core concepts and techniques, and some helpful practical posts. Plus some emerging new sites that are already proving very popular…
First up, a look inside some of the leading tech firms …
Google Brain Residency Last year, after nerding out a bit on TensorFlow, I applied and was accepted into the inaugural class of the Google Brain Residency Program. The program invites two dozen people, with varying backgrounds in ML, to spend a year at Google's deep learning research lab in Mountain View to work with the scientists and engineers pushing on the forefront of this technology. The year has just concluded and this is a summary of how I spent it.
Two Decades of Recommender Systems at Amazon.com Amazon is well-known for personalization and recommendations, which help customers discover items they might otherwise not have found. In this update to our original paper, we discuss some of the changes as Amazon has grown.
Second, some interesting (and diverse) applications of Data Science …
Linear Programming and Healthy Diets My dad’s an interesting guy. Every so often he picks up a health trend and/or weight loss goal that would make many people’s jaw drop. Recently he asked me to help him optimize his diet. He was essentially solving a linear program by hand, roughly as best as one can, with a few hundred variables! After asking me whether there was “any kind of math” that could help him automate his laborious efforts, I decided to lend a hand.
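For readers who have not met the "diet problem" before, here is a minimal sketch of what a linear program of this kind looks like. It is not the article's code: the two foods, their costs, and the nutrient requirements are made-up numbers, and `solve_diet` is a hypothetical helper that only handles two variables by checking every corner of the feasible region. A real diet with a few hundred variables would call for a proper LP solver.

```python
from itertools import combinations


def solve_diet(costs, constraints):
    """Minimize costs[0]*x + costs[1]*y subject to a1*x + a2*y >= b for each
    (a1, a2, b) in constraints, with x >= 0 and y >= 0.

    Toy approach: with positive costs the optimum sits at a corner of the
    feasible region, so we intersect every pair of constraint lines and keep
    the cheapest feasible intersection point.
    """
    rows = list(constraints) + [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # x >= 0, y >= 0
    best = None
    for (a1, a2, b), (c1, c2, d) in combinations(rows, 2):
        det = a1 * c2 - a2 * c1
        if abs(det) < 1e-12:
            continue  # parallel lines, no single intersection point
        # Cramer's rule for the 2x2 system formed by the two lines.
        x = (b * c2 - a2 * d) / det
        y = (a1 * d - b * c1) / det
        feasible = all(r1 * x + r2 * y >= r - 1e-9 for r1, r2, r in rows)
        if feasible:
            cost = costs[0] * x + costs[1] * y
            if best is None or cost < best[0]:
                best = (cost, x, y)
    return best


# Made-up data: food A costs 2 per serving, food B costs 3 per serving.
costs = (2.0, 3.0)
constraints = [
    (3.0, 1.0, 9.0),  # protein: 3 per serving of A, 1 per serving of B, need >= 9
    (1.0, 2.0, 8.0),  # fiber:   1 per serving of A, 2 per serving of B, need >= 8
]
best = solve_diet(costs, constraints)
print(best)
```

With these illustrative numbers the cheapest plan is 2 servings of A and 3 of B, for a total cost of 13.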
Next, articles and resources for those looking to break into Data Science …
Advice For New and Junior Data Scientists I have been working at Airbnb for a little bit less than two years and have recently become a senior data scientist — an industry title used to signal that one has acquired a certain level of technical expertise. As I reflect on my journey so far and imagine what’s next to come, I once again wrote down a few lessons that I wish I had known in the earlier days of my career.
Advice for non-traditional data scientists When I started, I honestly didn’t have any particular skills or capacity which would have made data science a good career choice. I studied philosophy in undergrad, and while I had done a bit of statistics, it wasn’t something I would have said I was comfortable with. All I really had was an interest and the capacity to learn new things. If you’re in a similar boat, here is some advice about the process.
Some Reflections on Being Turned Down for a Lot of Data Science Jobs In the last five years, I've clearly interviewed for a lot of data science jobs, and I've also been turned down for a lot of data science jobs. I've spent a good bit of time reflecting on why I wasn't offered this job or that. Several folks have asked me if I had any advice to share on the experience, and I hope to offer that here.
How to Get a Job In Deep Learning A lot of people think you need a PhD or tons of experience to get a job in deep learning, but if you're already a decent engineer, you can pick up the requisite skills and techniques pretty quickly. At least, that's our philosophy.
And finally in this section, a plug for our own guide, the Get A Data Science Job Course, which distills all our real-world-tested advice into a self-directed course.
Fourth, for those already up and running, some helpful practical posts…
How to unit test machine learning code. Over the past year, I’ve spent most of my working time doing deep learning research and internships. And a lot of that year was making very big mistakes that helped me learn not just about ML, but about how to engineer these systems correctly and soundly. One of the main principles I learned during my time at Google Brain was that unit tests can make or break your algorithm and can save you weeks of debugging and training time.
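To give a flavor of what such a test can look like, here is a minimal sketch in plain Python. It is not taken from the article: the tiny linear model, `train_step`, `mse`, and the test name are all hypothetical, chosen only to show the idea of asserting that a training step actually updates the parameters and lowers the loss.

```python
def train_step(w, b, xs, ys, lr=0.1):
    """One gradient-descent step for the model y ~ w*x + b under mean squared error."""
    n = len(xs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return w - lr * grad_w, b - lr * grad_b


def mse(w, b, xs, ys):
    """Mean squared error of the model on the given points."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def test_step_updates_parameters_and_reduces_loss():
    xs, ys = [0.0, 1.0, 2.0], [1.0, 3.0, 5.0]  # points on the line y = 2x + 1
    w0, b0 = 0.0, 0.0
    w1, b1 = train_step(w0, b0, xs, ys)
    # A silently frozen parameter is a common bug, so check that both moved.
    assert w1 != w0 and b1 != b0
    # On this easy problem a single small step should lower the loss.
    assert mse(w1, b1, xs, ys) < mse(w0, b0, xs, ys)


test_step_updates_parameters_and_reduces_loss()
```

A test like this runs in milliseconds, so it can catch a broken update rule long before an expensive training run does.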
Technical Debt in Machine Learning Experienced teams know when to back up seeing a piling debt, but technical debt in machine learning piles extremely fast. You can create months worth of debt in a matter of one working day and even the most experienced teams can miss a moment when the debt is so huge that it sets them back for half a year, which is often enough to kill a fast-pacing project.
Rules of Machine Learning This document is intended to help those with a basic knowledge of machine learning get the benefit of best practices in machine learning from around Google. It presents a style for machine learning, similar to the Google C++ Style Guide and other popular guides to practical programming.
Facets: An Open Source Visualization Tool for Machine Learning Training Data Getting the best results out of a machine learning (ML) model requires that you truly understand your data. However, ML datasets can contain hundreds of millions of data points, each consisting of hundreds (or even thousands) of features, making it nearly impossible to understand an entire dataset in an intuitive fashion. Visualization can help unlock nuances and insights in large datasets. A picture may be worth a thousand words, but an interactive visualization can be worth even more.
Finally, some underlying research trends and debate on core concepts / techniques…
A Peek at Trends in Machine Learning Have you looked at Google Trends? It’s pretty cool — you enter some keywords and see how Google Searches of that term vary through time. I thought — hey, I happen to have this arxiv-sanity database of 28,303 (arxiv) Machine Learning papers over the last 5 years, so why not do something similar and take a look at how Machine Learning research has evolved over the last 5 years? The results are fairly fun, so I thought I’d post.
Unlearning descriptive statistics If you've ever used an arithmetic mean, a Pearson correlation or a standard deviation to describe a dataset, I'm writing this for you. Better numbers exist to summarize location, association and spread: numbers that are easier to interpret and that don't act up with wonky data and outliers.
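As a quick illustration of the point about outliers, here is a small sketch comparing the mean and standard deviation with two common robust alternatives, the median and the median absolute deviation (MAD). The data are made up, `mad` is a hypothetical helper, and these are not necessarily the measures the article itself recommends.

```python
import statistics


def mad(values):
    """Median absolute deviation: the median distance from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


# Made-up data: four ordinary readings and one wild outlier.
data = [10, 11, 12, 13, 1000]

mean_value = statistics.mean(data)      # dragged far upward by the outlier
median_value = statistics.median(data)  # stays with the bulk of the data
stdev_value = statistics.stdev(data)    # inflated by the outlier
mad_value = mad(data)                   # barely notices the outlier

print(mean_value, median_value, stdev_value, mad_value)
```

Here the mean lands above 200 even though four of the five values sit between 10 and 13, while the median is 12 and the MAD is 1.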
A Litany of Problems With p-values In my opinion, null hypothesis testing and p-values have done significant harm to science. The purpose of this note is to catalog the many problems caused by p-values. As readers post new problems in their comments, more will be incorporated into the list, so this is a work in progress.
Engineering is the bottleneck in (Deep Learning) research When I was in graduate school working on NLP and information extraction I spent most of my time coding up research ideas. That’s what grad students with advisors who don’t like to touch code, which are probably 95% of all advisors, tend to do. When I raised concerns about problems I would often hear the phrase “that’s just an engineering problem; let’s move on”. I later realized that’s code speech for “I don’t think a paper mentioning this would get through the peer review process”. This mindset seems pervasive among people in academia. But as an engineer I can’t help but notice how the lack of engineering practices is holding us back.
And, as a bonus, some up-and-coming sites that are already proving very popular.
AI Workbox Bite-size video tutorials for Deep Learning Developers: Learn the latest cutting-edge tools and frameworks. Short screencasts with full transcripts.
Statistical Thinking Blog devoted to statistical thinking and its impact on science and everyday life.
Springboard Blog A range of posts, several showing applications of Data Science or offering practical guides and advice on how to break into the field.
Thanks to everyone for being part of the DataScienceWeekly community - we look forward to many more interesting reads in 2018! Happy New Year!