Data Science Weekly Newsletter - Issue 149

Issue #149

Sept 29 2016

Editor Picks
 
  • Estimating Delivery Times: A Case Study In Practical Machine Learning
    I would like to share with you some insights I gained from the development process for Estimated Delivery Time [at Postmates], and hopefully illustrate how powerful the proper application of some simple and accessible Machine Learning techniques can be when applied to the right problem...
  • Why the Father of Modern Statistics Didn’t Believe Smoking Caused Cancer
    In the summer of 1957, Ronald Fisher, one of the fathers of modern statistics, sat down to write a strongly worded letter in defense of tobacco...The letter was addressed to the British Medical Journal, which, just a few weeks earlier, had taken the editorial position that smoking cigarettes causes lung cancer. According to the journal's editorial board, the time for amassing evidence and analyzing data was over. Now, they wrote, “all the modern devices of publicity” should be used to inform the public about the perils of tobacco...According to Fisher, this was nothing short of statistically illiterate fear mongering...
  • Riding Dirty: The Science of Cars and Rap Lyrics
    Fancy cars have always been an important element in rap music. You can find many articles online talking about what cars rappers love the most, but they all lack the research...Let’s turn to data science to settle this debate once and for all. By analyzing all lyrics on Rap Genius, we’ll see which rides have been celebrated the most...
 
 

A Message from this week's Sponsor:

 

  • Learn to solve real-world problems using data science from an expert

    Learning alone is difficult. Getting stuck while learning alone is even more frustrating. What if you could learn with peers and receive instant feedback from an instructor, live?

    With 8 hours of live sessions, you will get hands-on practice and have your questions answered. Join our machine learning live class in Python and learn how to tackle real-world analyses with an expert mentor at Codementor. Class starting soon - check out the curriculum here.
 
 

Data Science Articles & Videos

 
  • Cubr: A Rubik's Cube Solver Written in Python
    Cubr is a project I completed in three weeks at the end of my introductory computer science class at CMU. The idea is simple: you mix up a Rubik’s cube. You show the cube to your computer’s webcam. Some magic happens, and your cube appears onscreen. Then, the cube begins to solve itself, and all you have to do is follow along and you will have solved your cube!...
  • Amazon Says It Puts Customers First. But Its Pricing Algorithm Doesn’t.
    We [ProPublica] looked at 250 frequently purchased products over several weeks to see which ones were selected for the most prominent placement on Amazon’s virtual shelves — the so-called “buy box” that pops up first as a suggested purchase. About three-quarters of the time, Amazon placed its own products and those of companies that pay for its services in that position even when there were substantially cheaper offers available from others...
  • ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning
    The recent advances in deep neural networks have led to effective vision-based reinforcement learning methods that have been employed to obtain human-level controllers in Atari 2600 games from pixel data. Atari 2600 games, however, do not resemble real-world tasks since they involve non-realistic 2D environments and the third-person perspective. Here, we propose a novel test-bed platform for reinforcement learning research from raw visual information which employs the first-person perspective in a semi-realistic 3D world...
  • A.I. Doesn't Get Black Twitter
    To study the use of African-American language online, researchers examined 59 million tweets from 2.8 million users, collecting what they believe is the largest data set to date...When researchers evaluated their model against natural language processing tools, such as Google’s SyntaxNet, researchers found that the software flagged African-American English as “not English” at a much higher frequency than standard English. In Twitter’s own language identifier, identification based on African-American language was twice as bad, despite the large presence of African-American users on the site...
  • Generating Faces with Deconvolution Networks
    One of my favorite deep learning papers is Learning to Generate Chairs, Tables, and Cars with Convolutional Networks. It’s a very simple concept – you give the network the parameters of the thing you want to draw and it does it – but it yields an incredibly interesting result...I happened to stumble upon the Radboud Faces Database some time ago, and wondered if something like this could be used to generate and interpolate between faces as well...The results are actually pretty exciting!...
  • An Introduction to Stock Market Data Analysis with Python (Part 2)
    In these posts, I will discuss basics such as obtaining the data from Yahoo! Finance using pandas, visualizing stock data, moving averages, developing a moving-average crossover strategy, backtesting, and benchmarking. This second post discusses topics including devising a moving average crossover strategy, backtesting, and benchmarking, along with practice problems for readers to ponder...
  • Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
    Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google's Neural Machine Translation system, which attempts to address many of these issues...
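
For readers curious about the moving-average crossover strategy mentioned in the stock market item above, here is a minimal sketch of the general idea. It is not the linked post's code (which works with pandas and Yahoo! Finance data); it uses only the Python standard library, and the function names, window sizes, and price series are all hypothetical, chosen for illustration. A "buy" is flagged when the fast average moves above the slow one, and a "sell" when it moves back below.

```python
def moving_average(prices, window):
    """Simple moving average; None until `window` prices are available."""
    averages = []
    running = 0.0
    for i, price in enumerate(prices):
        running += price
        if i >= window:
            # Drop the price that just left the window.
            running -= prices[i - window]
        averages.append(running / window if i >= window - 1 else None)
    return averages


def crossover_signals(prices, fast=3, slow=5):
    """Return (index, 'buy' | 'sell') wherever the fast average crosses the slow one.

    The window sizes are illustrative defaults, not recommendations.
    """
    fast_ma = moving_average(prices, fast)
    slow_ma = moving_average(prices, slow)
    signals = []
    prev_diff = None
    for i, (f, s) in enumerate(zip(fast_ma, slow_ma)):
        if f is None or s is None:
            continue  # not enough history yet
        diff = f - s
        if prev_diff is not None:
            if prev_diff <= 0 < diff:
                signals.append((i, "buy"))
            elif prev_diff >= 0 > diff:
                signals.append((i, "sell"))
        prev_diff = diff
    return signals


# Made-up price series: flat, then a rise, then a decline.
prices = [10, 10, 10, 10, 10, 11, 12, 13, 12, 11, 10, 9, 8]
signals = crossover_signals(prices)
```

A real backtest would also need transaction costs, position sizing, and a benchmark, which this sketch leaves out.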
 
 

Jobs

 
  • Research Lead / Data Scientist - Spotify - NYC

    You will apply your mathematical or scientific training to analyze large volumes of diverse data, model complex human-scale problems, and develop algorithms to serve various needs. You will work in collaboration with other mathematicians and scientists in Research and with data engineers and design technologists across Analytics to imagine and build creative solutions to challenging questions, most often with a clear line of sight from your work to real-world impact. You will work on projects that cut across various facets of Spotify (from Product to Marketing to Content)...
 
 

Training & Resources

 
  • From both sides now: the math of linear regression
    Linear regression is the most basic and the most widely used technique in machine learning; yet for all its simplicity, studying it can unlock some of the most important concepts in statistics...If you have a basic understanding of linear regression, but don’t have a background in statistics and find statements like “ridge regression is equivalent to the maximum a posteriori (MAP) estimate with a zero-mean Gaussian prior” bewildering, then this post is for you...
  • snorkel - information extraction systems using data programming
    Snorkel is intended to be a lightweight but powerful framework for developing structured information extraction applications for domains in which large labeled training sets are not available or easy to obtain, using the data programming paradigm...In the data programming approach to developing a machine learning system, the developer focuses on writing a set of labeling functions, which create a large but noisy training set. Snorkel then learns a generative model of this noise—learning, essentially, which labeling functions are more accurate than others—and uses this to train a discriminative classifier...
  • 20 Python Libraries You Aren't Using (But Should)
    This week you'll meet Caleb Hattingh who wrote a great book called 20 Python Libraries You Aren't Using (But Should). He and I spend an hour digging into all the very powerful and interesting packages that you probably haven't heard of but will be super excited to use after you learn about them...
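
The statement quoted in the linear regression item above, that ridge regression is the MAP estimate under a zero-mean Gaussian prior, can be unpacked in a few lines. This is a standard textbook derivation rather than an excerpt from the linked post, and the symbols (design matrix X, weights w, noise variance sigma^2, prior variance tau^2) are notation assumed here for illustration.

```latex
% Assumed model: Gaussian noise and a zero-mean Gaussian prior on the weights.
y = Xw + \varepsilon, \qquad
\varepsilon \sim \mathcal{N}(0, \sigma^2 I), \qquad
w \sim \mathcal{N}(0, \tau^2 I)

% MAP estimate: maximize the log posterior, i.e. log likelihood plus log prior.
\hat{w}_{\mathrm{MAP}}
  = \arg\max_w \left[ \log p(y \mid X, w) + \log p(w) \right]
  = \arg\min_w \left[ \frac{1}{2\sigma^2} \lVert y - Xw \rVert^2
                    + \frac{1}{2\tau^2} \lVert w \rVert^2 \right]

% Multiplying by 2\sigma^2 gives the ridge objective with \lambda = \sigma^2 / \tau^2.
\hat{w}_{\mathrm{MAP}}
  = \arg\min_w \left[ \lVert y - Xw \rVert^2 + \lambda \lVert w \rVert^2 \right]
  = (X^\top X + \lambda I)^{-1} X^\top y
```

A tighter prior (smaller tau^2) means a larger lambda and stronger shrinkage of the weights toward zero.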
 
 

Books

 

  • Business Intelligence with R - From Acquiring Data to Pattern Exploration

    Business Intelligence with R is a practical, hands-on overview of many of the major BI/analytic tasks that can be accomplished with R. It is not meant to be exhaustive--there is always more than one way to accomplish a given task in R, so this book aims to provide the simplest and/or most robust approaches to meet daily workflow needs. It can serve as the go-to desk reference for the professional analyst who needs to get things done in R...


    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
 
 
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :) - All the best, Hannah & Sebastian
 