Data Science Weekly Newsletter - Issue 146

Issue #146

Sep 08 2016

Editor Picks
 
  • A Technical Primer On Causality
    What does “causality” mean, and how can you represent it mathematically? How can you encode causal assumptions, and what bearing do they have on data analysis? These types of questions are at the core of the practice of data science, but deep knowledge about them is surprisingly uncommon...
  • The Palettes of Earth
    Take a satellite image and extract the pixels into a uniform 3-D color space. Then run a clustering algorithm on those pixels to extract a number of clusters. The centroids of those clusters then make a representative palette of the image...
  • Deep Neural Networks for YouTube Recommendations
    YouTube represents one of the largest scale and most sophisticated industrial recommendation systems in existence. In this paper, we describe the system at a high level and focus on the dramatic performance improvements brought by deep learning...
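To make the causality primer's central question concrete, here is a toy structural causal model. The numbers and function names are illustrative assumptions, not taken from the primer. A confounder Z drives both X and Y, so conditioning on X = x (observation) gives a different answer than setting X = x (intervention, do(X = x)). Exact fractions keep the arithmetic transparent.

```python
from fractions import Fraction as F

# Hypothetical structural causal model (illustrative numbers only):
#   Z ~ Bernoulli(1/2)               confounder
#   P(X=1 | Z=0) = 1/5, P(X=1 | Z=1) = 4/5
#   P(Y=1 | X, Z) = 1/10 + 2/10*X + 6/10*Z
P_Z1 = F(1, 2)
P_X1_GIVEN_Z = {0: F(1, 5), 1: F(4, 5)}

def p_z(z):
    return P_Z1 if z == 1 else 1 - P_Z1

def p_x_given_z(x, z):
    return P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]

def p_y1(x, z):
    """P(Y=1 | X=x, Z=z)."""
    return F(1, 10) + F(2, 10) * x + F(6, 10) * z

def observational(x):
    """P(Y=1 | X=x): seeing X=x also shifts our belief about Z, to P(Z | X=x)."""
    joint = {z: p_z(z) * p_x_given_z(x, z) for z in (0, 1)}
    total = sum(joint.values())
    return sum(joint[z] / total * p_y1(x, z) for z in (0, 1))

def interventional(x):
    """P(Y=1 | do(X=x)): setting X cuts the Z -> X edge, so Z keeps its prior."""
    return sum(p_z(z) * p_y1(x, z) for z in (0, 1))

print(observational(1) - observational(0))    # 14/25
print(interventional(1) - interventional(0))  # 1/5
```

The naive observational difference (14/25) overstates the true causal effect of X (1/5) because high-Z units are both more likely to have X = 1 and more likely to have Y = 1. Here `interventional` is the backdoor adjustment formula, which is valid in this model because Z blocks the only backdoor path from X to Y.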
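The palette recipe from "The Palettes of Earth" (pixels in a 3-D color space, cluster them, keep the centroids) can be sketched in a few lines. This sketch makes two assumptions: it uses k-means as the clustering algorithm, and it works in plain RGB, whereas a perceptually uniform space such as CIELAB would match the description more closely. The function names are hypothetical.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two 3-D color points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_palette(pixels, k, iterations=20, seed=0):
    """Cluster 3-D color points with k-means; the k centroids form the palette."""
    rng = random.Random(seed)
    centroids = rng.sample(pixels, k)  # start from k randomly chosen pixels
    for _ in range(iterations):
        # Assignment step: attach each pixel to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in pixels:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(c[d] for c in cluster) / len(cluster) for d in range(3))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:  # converged: centroids stopped moving
            break
        centroids = updated
    return centroids

# Two reddish and two bluish pixels (hypothetical RGB values).
pixels = [(250, 0, 0), (240, 10, 0), (0, 0, 250), (10, 0, 240)]
palette = sorted(kmeans_palette(pixels, k=2))
print([tuple(round(v) for v in c) for c in palette])  # [(5, 0, 245), (245, 5, 0)]
```

On a real satellite image you would flatten the H x W x 3 pixel array into this list (or subsample it, since k-means cost grows with the pixel count) and pick k as the desired palette size.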
 
 

A Message from this week's Sponsor:

 

  • Rodeo + ScienceOps Demo Webinar

    Join Yhat cofounder and CTO Greg Lamp & Rodeo Product Manager Colin Ristig for a live product tour of Yhat's open-source Python IDE, Rodeo, and enterprise model deployment platform, ScienceOps. Greg and Colin will walk through a demo of both products using a beer recommender algorithm and web app as an example. The webinar will take place on Wednesday, September 21 at 2 PM EST.

    Get your invite to the Yhat webinar today!
 
 

Data Science Articles & Videos

 
  • Artificial Intelligence Swarms Silicon Valley on Wings and Wheels
    For more than a decade, Silicon Valley’s technology investors and entrepreneurs obsessed over social media and mobile apps that helped people do things like find new friends, fetch a ride home or crowdsource a review of a product or a movie. Now Silicon Valley has found its next shiny new thing. And it does not have a “Like” button...
  • How a Japanese Cucumber Farmer is using Deep Learning and TensorFlow
    It’s not hyperbole to say that use cases for machine learning and deep learning are only limited by our imaginations. About one year ago, a former embedded systems designer from the Japanese automobile industry named Makoto Koike started helping out at his parents’ cucumber farm, and was amazed by the amount of work it takes to sort cucumbers by size, shape, color and other attributes...
  • Experimentation in a Ridesharing Marketplace
    Technology companies strive to make data-driven product decisions — and Lyft is no exception. Because of that, online experimentation, or A/B testing, has become ubiquitous. The way it’s bandied about, you’d be excused for thinking that online experimentation is a completely solved problem. In this post, we’ll illustrate why that’s far from the case for systems — like a ridesharing marketplace — that evolve according to network dynamics. As we’ll see, naively partitioning users into treatment and control groups can bias the effect estimates you care about...
  • A Decomposable Attention Model for Natural Language Inference
    We propose a simple neural architecture for natural language inference. Our approach uses attention to decompose the problem into subproblems that can be solved separately, thus making it trivially parallelizable. On the Stanford Natural Language Inference (SNLI) dataset, we obtain state-of-the-art results with almost an order of magnitude fewer parameters than previous work and without relying on any word-order information...
  • Craigslist and U.S. Rental Housing Markets
    The UC Berkeley Urban Analytics Lab collected, validated, and analyzed 11 million Craigslist rental listings to discover fine-grained patterns across metropolitan housing markets in the United States. I’ll summarize our findings below and explain the methodology at the bottom...
  • Hierarchical Multiscale Recurrent Neural Networks
    Learning both hierarchical and temporal representation has been among the long-standing challenges of recurrent neural networks. Multiscale recurrent neural networks have been considered as a promising approach to resolve this issue, yet there has been a lack of empirical evidence showing that this type of model can actually capture the temporal dependencies by discovering the latent hierarchical structure of the sequence. In this paper, we propose a novel multiscale approach, called the hierarchical multiscale recurrent neural networks...
  • A Survival Guide to a PhD
    Now that my PhD has come to an end, I wanted to compile a similar retrospective document in hopes that it might be helpful to some. Unlike the undergraduate guide, this one was much more difficult to write because there is significantly more variation in how one can traverse the PhD experience. Therefore, many things are likely contentious and a good fraction will be specific to what I’m familiar with (Computer Science / Machine Learning / Computer Vision research). But disclaimers are boring, let's get to it!...
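Why can naive user-level A/B splits mislead in a marketplace, as the Lyft post on ridesharing experimentation argues? Here is a deliberately tiny expected-value model. It is a hypothetical illustration, not Lyft's model: treated and control riders share one fixed pool of rides, treatment raises the request probability, and every request is equally likely to be filled.

```python
from fractions import Fraction as F

def rides_per_rider(n_treated, n_control, p_treated, p_control, supply):
    """Expected completed rides per rider in each arm when both arms share one supply pool.

    Toy assumption: every request has the same fill rate, min(1, supply / total requests).
    """
    requests = n_treated * p_treated + n_control * p_control
    fill_rate = min(F(1), F(supply) / requests)
    return p_treated * fill_rate, p_control * fill_rate

N, SUPPLY = 100, 40                       # riders and available rides (hypothetical)
P_CONTROL, P_TREATED = F(1, 2), F(7, 10)  # request probability per rider in each arm

# Naive A/B test: split riders 50/50, but both arms compete for the same drivers.
treated, control = rides_per_rider(50, 50, P_TREATED, P_CONTROL, SUPPLY)
naive_effect = treated - control

# Global effect: "everyone treated" versus "everyone in control".
all_treated, _ = rides_per_rider(N, 0, P_TREATED, P_CONTROL, SUPPLY)
_, all_control = rides_per_rider(0, N, P_TREATED, P_CONTROL, SUPPLY)
global_effect = all_treated - all_control

print(naive_effect, global_effect)  # 2/15 0
```

Because supply is binding, the treatment only moves rides from control riders to treated riders. The experiment therefore reports a positive lift (2/15 extra rides per rider) even though launching to everyone changes nothing. This is the kind of bias from shared network dynamics that the post describes.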
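The "decompose into subproblems" idea in the decomposable attention paper starts with a soft alignment ("attend") step. Below is a minimal pure-Python sketch of that step only. Two simplifications are assumptions on our part: the feed-forward network that the paper applies to word vectors before the dot product is replaced by the identity, and the later compare and aggregate steps are omitted.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def attend(a, b):
    """Soft-align sentence a with sentence b (each a list of word vectors).

    e[i][j] = a_i . b_j
    beta_i  = sum_j softmax_j(e[i][:]) * b_j  -> the part of b aligned to a_i
    alpha_j = sum_i softmax_i(e[:][j]) * a_i  -> the part of a aligned to b_j
    """
    e = [[dot(ai, bj) for bj in b] for ai in a]
    beta = []
    for i in range(len(a)):
        w = softmax(e[i])
        beta.append([sum(w[j] * b[j][d] for j in range(len(b))) for d in range(len(b[0]))])
    alpha = []
    for j in range(len(b)):
        w = softmax([e[i][j] for i in range(len(a))])
        alpha.append([sum(w[i] * a[i][d] for i in range(len(a))) for d in range(len(a[0]))])
    return beta, alpha

a = [[10.0, 0.0], [0.0, 10.0]]  # toy "premise" word vectors
b = [[0.0, 10.0], [10.0, 0.0]]  # toy "hypothesis" word vectors
beta, alpha = attend(a, b)
print([round(v, 3) for v in beta[0]])  # [10.0, 0.0]: a_0 aligns with b_1
```

Each beta_i (and each alpha_j) is computed independently of the others, which is what makes the model easy to parallelize. Shuffling the words of b leaves every beta_i unchanged, matching the claim that the model does not rely on word order.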
 
 

Jobs

 
  • Data Scientist - Blue Owl - San Francisco

    A million people a year die in car collisions around the world. That number should be zero. You can help us create a new insurance company that uses the latest technology and data science methods to save lives by preventing car collisions before they happen. The field is rich with data and we will be pushing the boundaries of what is possible...
 
 

Training & Resources

 
  • Auto-sklearn - Automated Machine Learning Toolkit
    auto-sklearn frees a machine learning user from algorithm selection and hyperparameter tuning. It leverages recent advances in Bayesian optimization, meta-learning and ensemble construction...
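The problem auto-sklearn automates is choosing an algorithm and its hyperparameters jointly. To show the shape of that search, here is a pure-Python sketch that swaps in plain random search for auto-sklearn's Bayesian optimization, meta-learning and ensembling. The two toy models, the search space and all function names are hypothetical. With the real package, you would instead construct `autosklearn.classification.AutoSklearnClassifier` and call its `fit` and `predict` methods.

```python
import random

def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x (1-D features, binary labels)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)  # odd k avoids ties for binary labels

def stump_predict(train, x, threshold):
    """Decision stump: predict 1 above the threshold (ignores the training data)."""
    return int(x > threshold)

# Hypothetical joint search space: pick an algorithm, then sample its hyperparameters.
MODELS = {"knn": knn_predict, "stump": stump_predict}
SEARCH_SPACE = {
    "knn": lambda rng: {"k": rng.choice([1, 3, 5, 7])},
    "stump": lambda rng: {"threshold": rng.uniform(0.0, 10.0)},
}

def validation_accuracy(algo, params, train, valid):
    predict = MODELS[algo]
    hits = sum(predict(train, x, **params) == y for x, y in valid)
    return hits / len(valid)

def random_search(train, valid, n_trials=30, seed=0):
    """Random search over (algorithm, hyperparameters); returns the best triple found."""
    rng = random.Random(seed)
    best = (None, None, -1.0)
    for _ in range(n_trials):
        algo = rng.choice(sorted(SEARCH_SPACE))
        params = SEARCH_SPACE[algo](rng)
        score = validation_accuracy(algo, params, train, valid)
        if score > best[2]:
            best = (algo, params, score)
    return best

train = [(i + 0.5, int(i + 0.5 > 5)) for i in range(10)]  # toy 1-D data, label = x > 5
valid = [(1.0, 0), (3.0, 0), (7.0, 1), (9.0, 1)]
algo, params, score = random_search(train, valid)
print(algo, params, score)
```

Random search treats every trial independently. Bayesian optimization, as used by auto-sklearn, instead fits a model of past trial scores to decide which configuration to try next, which usually needs far fewer evaluations on expensive real-world pipelines.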
 
 

Books

 

  • Python Pocket Reference

    Updated for both Python 3.4 and 2.7, this convenient pocket guide is the perfect on-the-job quick reference. You’ll find concise, need-to-know information on Python types and statements, special method names, built-in functions and exceptions, commonly used standard library modules, and other prominent Python tools...


    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
 
 
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :) - All the best, Hannah & Sebastian
 
Sign up to receive the Data Science Weekly Newsletter every Thursday
