Data Science Weekly Newsletter - Issue 191


July 20, 2017

Editor Picks
 
  • Facets: An Open Source Visualization Tool for Machine Learning Training Data
    Getting the best results out of a machine learning (ML) model requires that you truly understand your data. However, ML datasets can contain hundreds of millions of data points, each consisting of hundreds (or even thousands) of features, making it nearly impossible to understand an entire dataset in an intuitive fashion. Visualization can help unlock nuances and insights in large datasets. A picture may be worth a thousand words, but an interactive visualization can be worth even more...Working with the PAIR initiative, we’ve released Facets, an open source visualization tool to aid in understanding and analyzing ML datasets...
  • How Checkers Was Solved
    The story of a duel between two men, one who dies, and the nature of the quest to build artificial intelligence...
  • How To Make A Racist AI Without Really Trying
    My purpose with this tutorial is to show that you can follow an extremely typical NLP pipeline, using popular data and popular techniques, and end up with a racist classifier that should never be deployed...There are ways to fix it. Making a non-racist classifier is only a little bit harder than making a racist classifier. The fixed version can even be more accurate at evaluations. But to get there, you have to know about the problem, and you have to be willing to not just use the first thing that works...
 
 

A Message from this week's Sponsor:

 

 
The AAAS Science & Technology Policy Fellowships (STPF) program is the premier opportunity for outstanding scientists and engineers to learn first-hand about policymaking while contributing their knowledge and analytical skills to address some of today’s most pressing societal challenges. Enhance your career while engaging with policy administrators and thought leaders.

For over 43 years, doctoral level scientists, social scientists, engineers, and health/medical professionals have applied their knowledge and technical expertise to policymaking at the national and international levels. Fellows serve yearlong assignments in all three branches of the federal government and represent a broad range of backgrounds, disciplines and career stages.

For more information, visit: go.stpf-aaas.org/DSW
 

 

Data Science Articles & Videos

 
  • Apple Machine Learning Journal
    Welcome to the Apple Machine Learning Journal. Here, you can read posts written by Apple engineers about their work using machine learning technologies to help build innovative products for millions of people around the world. If you’re a machine learning researcher or student, an engineer or developer, we’d love to hear your questions and feedback...
  • Using Machine Learning to Predict Value of Homes On Airbnb
    Recently, advances in Airbnb’s machine learning infrastructure have significantly lowered the cost of deploying new machine learning models to production. For example, our ML Infra team built a general feature repository that allows users to leverage high-quality, vetted, reusable features in their models. Data scientists have started to incorporate several AutoML tools into their workflows to speed up model selection and performance benchmarking...In this post, I will describe how these tools worked together to expedite the modeling process and hence lower the overall development costs for a specific use case of LTV modeling — predicting the value of homes on Airbnb...
  • This One Weird Trick Will Simplify Your ETL Workflow
    An unfortunate truth that every hopeful, young data scientist has to come to terms with: Most of the time spent developing a production-ready model is invested in writing ETL that gets your data in a clean format...In this post aimed at SQL practitioners who would rather spend their time writing Python, I’ll outline a trick that can make your ETL more maintainable and easier to write using one of my favorite libraries, jinja2. R enthusiasts can also follow along, developing analogous patterns using one of the language’s many text templating packages, such as whisker...
  • Open Up Guide: Using Open Data to Combat Corruption
    Corruption has a devastating impact on the lives of people around the world. When money that should be spent on schools, hospitals and other government services ends up in the hands of dishonest officials, everyone suffers. This briefing explains how open data can be used to tackle corruption and accompanies the launch of Open Up: A Guide to Using Open Data to Combat Corruption...
  • How to Visualize Your Recurrent Neural Network with Attention in Keras
    In this tutorial, we will write an RNN in Keras that can translate human dates (“November 5, 2016”, “5th November 2016”) into a standard format (“2016-11-05”). In particular, we want to gain some intuition into how the neural network did this. We will leverage the concept of attention to generate a map (like that shown in Figure 1) that shows which input characters were important in predicting the output characters...
  • The Future of Deep Learning
    Given what we know of how deep nets work, of their limitations, and of the current state of the research landscape, can we predict where things are headed in the medium term? Here are some purely personal thoughts. Note that I don't have a crystal ball, so a lot of what I anticipate might fail to become reality. This is a completely speculative post. I am sharing these predictions not because I expect them to be proven completely right in the future, but because they are interesting and actionable in the present...
  • Promotional Song Targeting
    Here, we describe our technique for promoting tracks on Pandora’s radio service. The goal of this approach is to simultaneously maximize the reach of these tracks (i.e., the number of unique listeners and contexts we’re finding for these tracks) as well as the overall positivity of the feedback (i.e., we want listeners to thumb these tracks up). We start off by describing and presenting the results of our promotional techniques on Featured Tracks — a free program we offer to artists on Pandora. Next, we describe how we accomplished these results through a technique we refer to as song targeting...
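The ETL post listed above doesn't spell out its trick in the excerpt, so here is only a hedged illustration of the general idea of generating SQL with jinja2, not necessarily the post's own pattern. It renders a per-user rollup query from a list of metric columns; the table name, column names and the `render_rollup` helper are hypothetical.

```python
from jinja2 import Template

# Hypothetical rollup template: one SUM(...) line per metric column.
# "{%-" strips the whitespace (newline) before a tag, so the loop emits clean lines.
SQL_TEMPLATE = Template("""
SELECT
    user_id,
{%- for col in metrics %}
    SUM({{ col }}) AS total_{{ col }}{{ ',' if not loop.last }}
{%- endfor %}
FROM {{ table }}
GROUP BY user_id
""")


def render_rollup(table, metrics):
    """Render a per-user rollup query for a (hypothetical) table and metric columns."""
    return SQL_TEMPLATE.render(table=table, metrics=metrics)


if __name__ == "__main__":
    print(render_rollup("events", ["clicks", "purchases"]))
```

Adding a new metric then means editing a Python list rather than copy-pasting another `SUM(...)` line, which is the kind of maintainability gain the post describes. Note that templating only builds query text; values coming from users should still go through the database driver's parameter binding.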
 
 

Jobs

 
  • Data Scientist - Hello Fresh - Berlin, Germany

    We are looking for a smart, result-oriented individual who can translate data insights into recommendations driving high-end business value across areas of demand management, marketing, customer lifecycle, and product development. Our ideal candidate has a solid background in data science, including predictive modeling, forecasting and validation techniques. So if you are passionate about finding answers in scientific investigation and leading new solutions, feel invited to apply!...
 
 

Training & Resources

 
  • Machine Learning Crash Course: Part 4 - The Bias-Variance Dilemma
    There is a very delicate balancing act when machine learning algorithms try to predict things. On the one hand, we want our algorithm to model the training data very closely; otherwise we’ll miss relevant features and interesting trends. On the other hand, we don’t want our model to fit too closely and risk over-interpreting every outlier and irregularity...
  • What I’ve Learned About Neural Network Quantization
    As we keep pushing on quantization, this sort of co-design between researchers and implementers is crucial to get the best results. I think there’s a whole new field beginning to emerge, which I’m not sure whether to call ML Engineering or ML Systems, looking at the whole lifecycle of a deep learning solution, all the way from initial research through to deployment in production. It’s only with that sort of integrated view that we’re going to be able to solve some of the outstanding problems we’re still facing...
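The balancing act described in the bias-variance crash-course excerpt above is commonly formalized as the bias-variance decomposition (the crash course itself may present it differently). Assuming y = f(x) + ε, where ε has zero mean and variance σ² and is independent of the training set, the expected squared error of a fitted model \hat{f} at a point x, averaged over training sets and noise, splits into three terms:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

A model that fits the training data too loosely tends to have a large bias term, while one that fits every outlier tends to have a large variance term; the σ² term cannot be reduced by any choice of model.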
 
 

Books

 

 
 
P.S. Looking to hire a Data Scientist? Find an awesome one among our readers! Email us for details on how to post your job :) - All the best, Hannah & Sebastian
 
Sign up to receive the Data Science Weekly Newsletter every Thursday

Easy to unsubscribe. No spam — we keep your email safe and do not share it.