As year-end approaches, we decided to dig through the 2017 archives to find the most-read articles of the year. It is an interesting mix: deep-dives into what’s going on in Data Science at top tech companies, interesting applications of data science, advice on how best to get into the field, unique explorations (or even debates) around some of the core concepts and techniques, and some helpful practical posts. Plus some emerging new sites that are already proving very popular…
First up, a look inside some of the leading tech firms …
- Google Brain Residency
Last year, after nerding out a bit on TensorFlow, I applied and was accepted into the inaugural class of the Google Brain Residency Program. The program invites two dozen people, with varying backgrounds in ML, to spend a year at Google's deep learning research lab in Mountain View to work with the scientists and engineers pushing on the forefront of this technology. The year has just concluded and this is a summary of how I spent it.
- I Took the AI Class Facebookers are literally sprinting to get into
Facebook is now organizing formal classes and longterm research internships in an effort to build new deep learning talent and spread it across the company. "We have incredibly smart people here," Zitnick says. "They just need the tools."
- Two Decades of Recommender Systems at Amazon.com
Amazon is well-known for personalization and recommendations, which help customers discover items they might otherwise not have found. In this update to our original paper, we discuss some of the changes as Amazon has grown.
Second, some interesting (and diverse) applications of Data Science …
- Linear Programming and Healthy Diets
My dad’s an interesting guy. Every so often he picks up a health trend and/or weight loss goal that would make many people’s jaw drop. Recently he asked me to help him optimize his diet. He was essentially solving a linear program by hand, roughly as best as one can, with a few hundred variables! After asking me whether there was “any kind of math” that could help him automate his laborious efforts, I decided to lend a hand.
- How I replicated an $86 million project in 57 lines of code
When an experiment with existing open source technology does “good enough”
- First Evidence That Online Dating Is Changing the Nature of Society
Dating websites have changed the way couples meet. Now evidence is emerging that this change is influencing levels of interracial marriage and even the stability of marriage itself.
Next, articles and resources for those looking to break into Data Science …
- Advice For New and Junior Data Scientists
I have been working at Airbnb for a little bit less than two years and have recently become a senior data scientist — an industry title used to signal that one has acquired a certain level of technical expertise. As I reflect on my journey so far and imagine what’s next to come, I once again wrote down a few lessons that I wish I had known in the earlier days of my career.
- Advice for non-traditional data scientists
When I started, I honestly didn’t have any particular skills or capacity which would have made data science a good career choice. I studied philosophy in undergrad, and while I had done a bit of statistics, it wasn’t something I would have said I was comfortable with. All I really had was an interest and the capacity to learn new things. If you’re in a similar boat, here is some advice about the process.
- Some Reflections on Being Turned Down for a Lot of Data Science Jobs
In the last five years, I've clearly interviewed for a lot of data science jobs, and I've also been turned down for a lot of data science jobs. I've spent a good bit of time reflecting on why I wasn't offered this job or that. Several folks have asked me if I had any advice to share on the experience, and I hope to offer that here.
- I ranked every Intro to Data Science course on the internet, based on thousands of data points
I’ve taken many data science-related courses and audited portions of many more. I know the options out there, and what skills are needed for learners preparing for a data analyst or data scientist role. A few months ago, I started creating a review-driven guide that recommends the best courses for each subject within data science.
- How to Get a Job In Deep Learning
A lot of people think you need a PhD or tons of experience to get a job in deep learning, but if you're already a decent engineer, you can pick up the requisite skills and techniques pretty quickly. At least, that's our philosophy.
And finally in this section, a plug for our own guide, the Get A Data Science Job Course, which distills all our real-world-tested advice into a self-directed course.
Fourth, for those already up and running, some helpful practical posts…
- How to unit test machine learning code.
Over the past year, I’ve spent most of my working time doing deep learning research and internships. And a lot of that year was making very big mistakes that helped me learn not just about ML, but about how to engineer these systems correctly and soundly. One of the main principles I learned during my time at Google Brain was that unit tests can make or break your algorithm and can save you weeks of debugging and training time.
- Technical Debt in Machine Learning
Experienced teams know when to back up seeing a piling debt, but technical debt in machine learning piles extremely fast. You can create months worth of debt in a matter of one working day and even the most experienced teams can miss a moment when the debt is so huge that it sets them back for half a year, which is often enough to kill a fast-pacing project.
- Building ML models is hard. Deploying them in real business environments is harder
A few months ago, we described on our blog how machine learning (ML) improved efficiency in our contact centre. Today we would like to tell you how we built this system, what we have learned along the way, and how we were able to reduce response times for customer emails by up to 4x.
- Rules of Machine Learning
This document is intended to help those with a basic knowledge of machine learning get the benefit of best practices in machine learning from around Google. It presents a style for machine learning, similar to the Google C++ Style Guide and other popular guides to practical programming.
- Facets: An Open Source Visualization Tool for Machine Learning Training Data
Getting the best results out of a machine learning (ML) model requires that you truly understand your data. However, ML datasets can contain hundreds of millions of data points, each consisting of hundreds (or even thousands) of features, making it nearly impossible to understand an entire dataset in an intuitive fashion. Visualization can help unlock nuances and insights in large datasets. A picture may be worth a thousand words, but an interactive visualization can be worth even more.
Finally, some underlying research trends and debate on core concepts / techniques…
- A Peek at Trends in Machine Learning
Have you looked at Google Trends? It’s pretty cool — you enter some keywords and see how Google Searches of that term vary through time. I thought — hey, I happen to have this arxiv-sanity database of 28,303 (arxiv) Machine Learning papers over the last 5 years, so why not do something similar and take a look at how Machine Learning research has evolved over the last 5 years? The results are fairly fun, so I thought I’d post.
- Unlearning descriptive statistics
If you've ever used an arithmetic mean, a Pearson correlation or a standard deviation to describe a dataset, I'm writing this for you. Better numbers exist to summarize location, association and spread: numbers that are easier to interpret and that don't act up with wonky data and outliers.
- A Litany of Problems With p-values
In my opinion, null hypothesis testing and p-values have done significant harm to science. The purpose of this note is to catalog the many problems caused by p-values. As readers post new problems in their comments, more will be incorporated into the list, so this is a work in progress.
- What a nerdy debate about p-values shows about science — and how to fix it
The case for, and against, redefining “statistical significance.”
- Engineering is the bottleneck in (Deep Learning) research
When I was in graduate school working on NLP and information extraction I spent most of my time coding up research ideas. That’s what grad students with advisors who don’t like to touch code, which are probably 95% of all advisors, tend to do. When I raised concerns about problems I would often hear the phrase “that’s just an engineering problem; let’s move on”. I later realized that’s code speech for “I don’t think a paper mentioning this would get through the peer review process”. This mindset seems pervasive among people in academia. But as an engineer I can’t help but notice how the lack of engineering practices is holding us back.
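To make the point from "Unlearning descriptive statistics" above concrete, here is a minimal sketch comparing the mean and standard deviation with two outlier-resistant alternatives, the median and the median absolute deviation (MAD). The data is made up for illustration, and the `mad` helper is our own hypothetical function rather than anything taken from that article.

```python
from statistics import mean, median, stdev


def mad(values):
    """Median absolute deviation: the median distance from the median.

    Hypothetical helper written for this illustration.
    """
    center = median(values)
    return median([abs(v - center) for v in values])


# Made-up data: the second list is the first with one value
# replaced by an extreme outlier.
clean = [10, 11, 12, 13, 14]
wonky = clean[:-1] + [1000]

for label, data in (("clean", clean), ("wonky", wonky)):
    print(
        label,
        "| mean:", mean(data),
        "| stdev:", round(stdev(data), 1),
        "| median:", median(data),
        "| MAD:", mad(data),
    )
```

Running it shows the mean and standard deviation jumping once the outlier is introduced, while the median and MAD stay where they were. That is the sense in which those summaries "don't act up with wonky data".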
And, as a bonus, some up-and-coming sites that are already proving very popular …
- AI Workbox
Bite-size video tutorials for Deep Learning Developers: Learn the latest cutting-edge tools and frameworks. Short screencasts with full transcripts.
- Statistical Thinking
Blog devoted to statistical thinking and its impact on science and everyday life.
- Springboard Blog
A range of posts, several showing applications of Data Science or offering practical guides and advice on how to break into the field.
Thanks to everyone for being part of the DataScienceWeekly community - we look forward to many more interesting reads in 2018! Happy New Year!