Data Science Weekly Newsletter - Issue 171

March 2, 2017

Editor Picks
 
  • On the Origin of Deep Learning
    This paper is a review of the evolutionary history of deep learning models. It covers from the genesis of neural networks when associationism modeling of the brain is studied, to the models that dominate the last decade of research in deep learning like convolutional neural networks, deep belief networks, and recurrent neural networks, and extends to popular recent models like variational autoencoder and generative adversarial nets...
  • Mathematicians becoming data scientists: Should you? How to?
    I was talking the other day with a former student at UW, Sarah Rich, who’s done degrees in both math and CS and then went off to Twitter. I asked her: so what would you say to a math Ph.D. student who was wondering whether they would like being a data scientist in the tech industry? How would you know whether you might find that kind of work enjoyable? And if you did decide to pursue it, what’s the strategy for making yourself a good job candidate?...
  • Self-driving cars in the browser
    The goal of this project was to create a fully self-learning agent that would be able to control a car in a 2D top-down environment. Written solely in JavaScript...
 
 

A Message from this week's Sponsor:

 

 
  • Yhat Demo Webinar

    Join Yhat cofounder Greg Lamp for a live tour of Yhat's product suite using a beer recommender algorithm as an example. We'll demo our open-source Python IDE, Rodeo, our centralized data science hub, Bandit, and finally our model deployment platform, ScienceOps. The webinar will take place on Wednesday, March 22 at 2 PM EST. Get your invite to the Yhat webinar today!
 
 

Data Science Articles & Videos

 
  • Beyond The Tip: A Data-Driven Exploration of Archer
    Archer has run for 7 seasons with an 8th on the way. It follows the title character and a team of spies and administrative staff as they battle rival spy agencies, the KGB, arms dealers, drug lords, kidnappers, paramilitarios, Welsh separatists, cyborgs, clones, tigers, crocodiles, alligators, and if we're in the Orinoco drainage basin, the black caiman, which can grow up to 20 feet long. In an attempt to better see that structure, we've used data analysis and data visualization of the captioning of the shows...
  • Voronoï playground : interactive weighted Voronoï study
    This block experiments with the weighted Voronoï diagram. Weighted Voronoï diagrams come in several flavours (additive/multiplicative, powered/not-powered, 2D/3D and higher dimensions, ...), but this block focuses on the 2D additive weighted power diagram. It helps me to understand the basics (properties, underlying computations, meanings, ...) of such diagrams...
  • Using Deep Learning and Google Street View to Estimate the Demographic Makeup of the US
    Here, we present a method that determines socioeconomic trends from 50 million images of street scenes, gathered in 200 American cities by Google Street View cars. Using deep learning-based computer vision techniques, we determined the make, model, and year of all motor vehicles encountered in particular neighborhoods. Data from this census of motor vehicles, which enumerated 22M automobiles in total (8% of all automobiles in the US), was used to accurately estimate income, race, education, and voting patterns, with single-precinct resolution...
  • Surprise Maps: Showing the Unexpected
    Surprise maps are useful when the raw numbers, by themselves, don’t tell us much: visual patterns might look complex but convey only statistical noise, or patterns may look simple but hide the really interesting features...
  • What’s wrong with my time series? Model validation without a hold-out set
    Time series modeling sits at the core of critical business operations such as supply and demand forecasting and quick-response algorithms like fraud and anomaly detection. Small errors can be costly, so it’s important to know what to expect of different error sources. The trouble is that the usual approach of cross-validation doesn’t work for time series models. The reason is simple: time series data are autocorrelated so it’s not fair to treat all data points as independent and randomly select subsets for training and testing. In this post I’ll go through alternative strategies for understanding the sources and magnitude of error in time series...
  • Billion-scale similarity search with GPUs
    This paper tackles the problem of better utilizing GPUs for this task. While GPUs excel at data-parallel tasks, prior approaches are bottlenecked by algorithms that expose less parallelism, such as k-min selection, or make poor use of the memory hierarchy. We propose a design for k-selection that operates at up to 55% of theoretical peak performance, enabling a nearest neighbor implementation that is 8.5x faster than prior GPU state of the art...
  • Deep and Hierarchical Implicit Models
    Implicit probabilistic models are a very flexible class for modeling data. They define a process to simulate observations, and unlike traditional models, they do not require a tractable likelihood function. In this paper, we develop two families of models: hierarchical implicit models and deep implicit models. They combine the idea of implicit densities with hierarchical Bayesian modeling and deep neural networks...
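
The time series item above notes that randomly selected train/test subsets are unfair for autocorrelated data. One common alternative is rolling-origin (forward-chaining) evaluation, where every test block comes strictly after its training data. The sketch below illustrates that general idea only; it is not necessarily the strategy the linked post describes, and the function name and parameters are hypothetical.

```python
def rolling_origin_splits(n_samples, n_folds, min_train):
    """Yield (train_indices, test_indices) pairs in time order.

    Hypothetical helper for illustration: each test block starts where
    its training block ends, so no future point leaks into training.
    """
    if min_train >= n_samples:
        raise ValueError("min_train must be smaller than n_samples")
    fold_size = (n_samples - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size  # training window grows each fold
        test_end = train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))


# Example: 10 time-ordered observations, 3 folds, at least 4 training points.
example_splits = list(rolling_origin_splits(n_samples=10, n_folds=3, min_train=4))
for train_idx, test_idx in example_splits:
    print(len(train_idx), "training points, test on", test_idx)
```

The training window here expands with each fold; a fixed-width sliding window is an equally reasonable design when older observations are thought to be less relevant.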
 
 

Jobs

 
  • Data Scientist - Gilt Groupe - NYC

    The Data team is composed of data engineers and data scientists, and sits within the Gilt Tech organization. Data engineers extract, load and transform data, then empower business users to build dashboards and interpret data. Data scientists use the tools of statistics and machine learning to solve hard problems around the business. We have data crying out for attention. Whether you’re interested in consumer behavior, pricing and online commerce, retail and fashion, logistics and operations - we have rich, clean data to tackle nearly any subject...
 
 

Training & Resources

 
  • Pre-trained word vectors
    We are publishing pre-trained word vectors for 90 languages, trained on Wikipedia. These are vectors in dimension 300, trained with the default parameters of fastText...
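
As a rough sketch of how pre-trained vectors like these might be used, the code below parses a tiny in-memory sample and compares words by cosine similarity. It assumes a text layout with a header line giving the word count and dimension, followed by one word per line with its components. The toy data uses 2 dimensions rather than 300, and the words, values, and helper names are made up for illustration.

```python
import io
import math


def load_vectors(handle):
    """Read word vectors from a text stream (assumed layout, see above)."""
    header = handle.readline().split()
    dim = int(header[1])  # header is assumed to be "<count> <dimension>"
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy stand-in for a real vector file (made-up values, 2 dimensions).
TOY_FILE = "3 2\nking 1.0 0.0\nqueen 0.9 0.1\napple 0.0 1.0\n"
vecs = load_vectors(io.StringIO(TOY_FILE))
print(cosine(vecs["king"], vecs["queen"]) > cosine(vecs["king"], vecs["apple"]))
```

For the real files, replacing `io.StringIO(TOY_FILE)` with an opened file handle should work if the layout assumption holds.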
 
 

Books

 

  • Data Scientists at Work

    "A collection of interviews with 16 of the world's most influential and innovative data scientists from across the spectrum of this hot new profession - from Yann LeCun at Facebook to Jake Porway at DataKind"...


    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.
 
 
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :) - All the best, Hannah & Sebastian
 