Data Science Weekly Newsletter - Issue 128


May 5, 2016

Editor Picks
 
  • The Special Relationship Between Noodles and Qdoba
    I’ve had a theory that for every Noodles, there’s a Qdoba right next door. It might be some sort of selection bias, however, since I can think of a couple of locations where they’re directly next to each other. To me, Noodles and Qdoba have a special relationship, at least compared to other restaurants. I figured now was about the time I should test this, and I can use Chipotle to test...
  • A Neural Network that Dreams in Resumes
    If a neural network can write Shakespeare, could it write a resume for you? Inspired by the remarkable results of Recurrent Neural Networks and using thousands of anonymized resumes from untapt, I’ve been experimenting with applying deep learning techniques to the CV...
 
 

A Message from this week's Sponsor:

 

  • Whitepaper: A Practical Guide to Building Data Driven Products Beyond Analysts' Laptops via @YhatHQ
    Learn how to apply data science insights to the real world. Discover the implications beyond analysts’ laptops and answer the question of what to do with predictive models once they’re built.
 
 

Data Science Articles & Videos

 
  • How to get into the top 15 of a Kaggle competition using Python
    Doing well in a Kaggle competition requires more than just knowing machine learning algorithms. It requires the right mindset, the willingness to learn, and a lot of data exploration. Many of these aspects aren’t typically emphasized in tutorials on getting started with Kaggle, though. In this post, I’ll cover how to get started with the Kaggle Expedia hotel recommendations competition, including establishing the right mindset, setting up testing infrastructure, exploring the data, creating features, and making predictions...
  • Finding Similar Music using Matrix Factorization
    This post is a step by step guide on how to calculate related artists using a couple of different matrix factorization algorithms. The code is written in Python using Pandas and SciPy to do the calculations and D3.js to interactively visualize the results...
  • [Video] How Machine Learning Amplifies Inequality in Society
    In this talk, Mike Williams, Research Engineer at Fast Forward Labs, looks at how supervised machine learning has the potential to amplify power and privilege in society. Using sentiment analysis, he demonstrates how text analytics often favors the voices of men. Mike discusses how bias can inadvertently be introduced into any model, and how to recognize and mitigate these harms...
  • Neural Networks Are Impressively Good At Compression
    I hope I have given you an intuition for how neural networks can compress patterns in few weights. They use the full range of the weights to the point where a connection activated with a strong input can mean something entirely different than the same connection activated with a weak input. And best of all I didn’t have to teach them to do this. They just start behaving like this if you force them to express a complex pattern in few connections...
  • Artistic style transfer for videos
    We present an approach that transfers the style from one image (for example, a painting) to a whole video sequence. Supplementary video accompanying the paper...
  • Terrain-Adaptive Locomotion Skills Using Deep Reinforcement Learning
    Reinforcement learning offers a promising methodology for developing skills for simulated characters, but typically requires working with sparse hand-crafted features. Building on recent progress in deep reinforcement learning (DeepRL), we introduce a mixture of actor-critic experts (MACE) approach that learns terrain-adaptive dynamic locomotion skills using high-dimensional state and terrain descriptions as input, and parameterized leaps or steps as output actions...
  • The Descriptor Protocol, and Python Black Magic
    Since I graduated last summer, I have been writing lots of both Python 2 and 3. This snippet seemed like something I should understand well. However, I did not, so this post is an attempt to solve that...
 
 

Jobs

 
  • Senior Data Scientist - SimpleReach - New York

    SimpleReach is seeking a seasoned data scientist to join our ranks. This mathematically savvy individual will be on the front lines, wrangling data and investigating our massive stores of traffic events while also building machines to intelligently classify content and build recommendation engines for a wide range of applications...
 
 

Training & Resources

 
  • An Introduction to Scientific Python (and a Bit of the Maths Behind It) - Matplotlib
    In this series of posts, we will take a look at the main libraries used in scientific Python and learn how to use them to bend data to our will. We won't just be learning to churn out template code, however; we will also learn a bit of the maths behind it so that we can understand what is going on a little better. So let's kick things off with an incredibly useful little number that we will be using throughout this series of posts: Matplotlib...
  • D3 Basic Pie Chart Video Tutorial
    You will use the CSV data from the D3js.org website Pie Chart example to see how a full D3 Pie Chart example data visualization is built...
  • Identify, describe, plot, and remove the outliers from the dataset
    There are different methods for detecting outliers, including the standard deviation approach and Tukey’s method, which uses the interquartile range (IQR). In this post I will use Tukey’s method because I like that it does not depend on the distribution of the data...
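
For readers who want a feel for the Tukey approach mentioned in the last item, here is a minimal sketch in Python. It is not taken from the linked post. The function name `tukey_outliers`, the sample data, and the conventional fence multiplier of 1.5 are all assumptions made for illustration, and the quartiles come from the standard library's `statistics.quantiles` with the "inclusive" method, which may differ slightly from the quartile definition the post uses.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Split values into (kept, outliers) using Tukey's IQR fences.

    k is the fence multiplier; 1.5 is the conventional choice (assumed here).
    """
    # Three cut points for quartiles; the middle one (the median) is unused.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return kept, outliers


# Hypothetical sample with one obviously extreme value.
sample = [10, 12, 11, 13, 12, 11, 95]
kept, outliers = tukey_outliers(sample)
print("kept:", kept)
print("outliers:", outliers)
```

Because the fences are built from quartiles rather than the mean and standard deviation, a single extreme value does little to move them, which is the property the excerpt points to.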
 
 

Books

 

 
 
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :) - All the best, Hannah & Sebastian
 