
Editor Picks
 On the Origin of Deep Learning
This paper is a review of the evolutionary history of deep learning models. It covers everything from the genesis of neural networks, when associationist modeling of the brain was studied, to the models that have dominated the last decade of deep learning research, such as convolutional neural networks, deep belief networks, and recurrent neural networks, and extends to popular recent models such as variational autoencoders and generative adversarial nets...
 Mathematicians becoming data scientists: Should you? How to?
I was talking the other day with a former student at UW, Sarah Rich, who’s done degrees in both math and CS and then went off to Twitter. I asked her: so what would you say to a math Ph.D. student who was wondering whether they would like being a data scientist in the tech industry? How would you know whether you might find that kind of work enjoyable? And if you did decide to pursue it, what’s the strategy for making yourself a good job candidate?...
 Self-driving cars in the browser
The goal of this project was to create a fully self-learning agent that would be able to control a car in a 2D top-down environment. Written solely in JavaScript...
A Message from this week's Sponsor:
 Yhat Demo Webinar
Join Yhat co-founder Greg Lamp for a live tour of Yhat's product suite using a beer recommender algorithm as an example. We'll demo our open-source Python IDE, Rodeo, our centralized data science hub, Bandit, and finally our model deployment platform, ScienceOps. The webinar will take place on Wednesday, March 22 at 2 PM EST. Get your invite to the Yhat webinar today!
Data Science Articles & Videos
 Beyond The Tip: A Data-Driven Exploration of Archer
Archer has run for 7 seasons, with an 8th on the way. It follows the title character and a team of spies and administrative staff as they battle rival spy agencies, the KGB, arms dealers, drug lords, kidnappers, paramilitaries, Welsh separatists, cyborgs, clones, tigers, crocodiles, alligators, and, if we're in the Orinoco drainage basin, the black cayman, which can grow up to 20 feet long. In an attempt to better see that structure, we've used data analysis and data visualization of the shows' captions...
 Voronoï playground: interactive weighted Voronoï study
This block experiments with weighted Voronoï diagrams. Weighted Voronoï diagrams come in several flavours (additive/multiplicative, powered/not-powered, 2D/3D and higher dimensions, ...), but this block focuses on the 2D additive weighted power diagram. It helps me understand the basics (properties, underlying computations, meanings, ...) of such diagrams...
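To make the idea of a power diagram concrete, here is a minimal brute-force sketch, assuming the standard power distance d(x, s) = ||x - s||^2 - w_s (squared Euclidean distance minus the site's weight). It does not reproduce the block's own computation; it only decides which cell a query point falls in. The function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def power_distance(x: Point, site: Point, weight: float) -> float:
    """Power distance: squared Euclidean distance minus the site's weight."""
    dx = x[0] - site[0]
    dy = x[1] - site[1]
    return dx * dx + dy * dy - weight


def power_cell(x: Point, sites: Sequence[Point], weights: Sequence[float]) -> int:
    """Index of the site whose power cell contains x (brute force over all sites)."""
    return min(range(len(sites)), key=lambda i: power_distance(x, sites[i], weights[i]))


if __name__ == "__main__":
    sites = [(0.0, 0.0), (4.0, 0.0)]
    # Equal weights: an ordinary Voronoi diagram, so (1.5, 0) belongs to site 0.
    print(power_cell((1.5, 0.0), sites, [0.0, 0.0]))
    # Raising site 1's weight grows its cell until it captures (1.5, 0).
    print(power_cell((1.5, 0.0), sites, [0.0, 8.0]))
```

With all weights equal the power diagram reduces to the ordinary Voronoi diagram; increasing one site's weight pushes its cell boundaries outward.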
 Using Deep Learning and Google Street View to Estimate the Demographic Makeup of the US
Here, we present a method that determines socioeconomic trends from 50 million images of street scenes, gathered in 200 American cities by Google Street View cars. Using deep learning-based computer vision techniques, we determined the make, model, and year of all motor vehicles encountered in particular neighborhoods. Data from this census of motor vehicles, which enumerated 22M automobiles in total (8% of all automobiles in the US), was used to accurately estimate income, race, education, and voting patterns, with single-precinct resolution...
 Surprise Maps: Showing the Unexpected
Surprise maps are useful when the raw numbers, by themselves, don’t tell us much: visual patterns might look complex but convey only statistical noise, or patterns may look simple but hide the really interesting features...
 What’s wrong with my time series? Model validation without a holdout set
Time series modeling sits at the core of critical business operations such as supply and demand forecasting and quick-response algorithms like fraud and anomaly detection. Small errors can be costly, so it's important to know what to expect of different error sources. The trouble is that the usual approach of cross-validation doesn't work for time series models. The reason is simple: time series data are autocorrelated, so it's not fair to treat all data points as independent and randomly select subsets for training and testing. In this post I'll go through alternative strategies for understanding the sources and magnitude of error in time series...
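One widely used alternative to shuffled cross-validation (not necessarily one of the strategies the post itself covers) is rolling-origin, or expanding-window, evaluation: train only on observations before a cutoff, score on the next few, then advance the cutoff. The sketch below assumes a plain list of values and uses a last-value (naive) forecast as a stand-in model; the function names are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def rolling_origin_splits(
    n: int, initial: int, horizon: int, step: int = 1
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges; training data always precede test data."""
    if initial < 1:
        raise ValueError("need at least one training observation")
    origin = initial
    while origin + horizon <= n:
        yield range(0, origin), range(origin, origin + horizon)
        origin += step


def naive_forecast_errors(series: Sequence[float], initial: int, horizon: int) -> List[float]:
    """Mean absolute error of a last-value forecast at each forecast origin."""
    errors = []
    for train, test in rolling_origin_splits(len(series), initial, horizon):
        last = series[train[-1]]  # the "model": repeat the last observed value
        abs_errs = [abs(series[i] - last) for i in test]
        errors.append(sum(abs_errs) / len(abs_errs))
    return errors


if __name__ == "__main__":
    print(naive_forecast_errors([1, 2, 3, 4, 5, 6], initial=3, horizon=2))
```

Because no test point ever precedes its training window, the per-origin errors reflect how the model would actually have performed when deployed, and their spread gives a rough sense of error variability over time.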
 Billion-scale similarity search with GPUs
This paper tackles the problem of better utilizing GPUs for similarity search. While GPUs excel at data-parallel tasks, prior approaches are bottlenecked by algorithms that expose less parallelism, such as k-min selection, or make poor use of the memory hierarchy. We propose a design for k-selection that operates at up to 55% of theoretical peak performance, enabling a nearest neighbor implementation that is 8.5x faster than prior GPU state of the art...
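For readers unfamiliar with k-selection: it means keeping the k smallest of many candidate distances. The paper's contribution is a GPU design for this step; the sketch below is only a sequential CPU reference using a bounded max-heap from Python's standard `heapq` module, paired with exhaustive (brute-force) nearest-neighbor search. The function names are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def k_select(distances: Sequence[float], k: int) -> List[Tuple[float, int]]:
    """Return the k smallest (distance, index) pairs in ascending order."""
    if k <= 0:
        return []
    heap: List[Tuple[float, int]] = []  # max-heap emulated by negating distances
    for i, d in enumerate(distances):
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif -heap[0][0] > d:  # current worst kept candidate is farther than d
            heapq.heapreplace(heap, (-d, i))
    return sorted((-neg_d, i) for neg_d, i in heap)


def brute_force_knn(
    query: Sequence[float], database: Sequence[Sequence[float]], k: int
) -> List[Tuple[float, int]]:
    """Exact k-NN by squared Euclidean distance against every database vector."""
    dists = [sum((q - x) ** 2 for q, x in zip(query, row)) for row in database]
    return k_select(dists, k)


if __name__ == "__main__":
    print(brute_force_knn([0.0, 0.0], [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]], k=2))
```

The heap keeps memory at O(k) per query, but each update is inherently sequential, which hints at why the paper needs a different, more parallel selection scheme on GPUs.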
 Deep and Hierarchical Implicit Models
Implicit probabilistic models are a very flexible class for modeling data. They define a process to simulate observations, and unlike traditional models, they do not require a tractable likelihood function. In this paper, we develop two families of models: hierarchical implicit models and deep implicit models. They combine the idea of implicit densities with hierarchical Bayesian modeling and deep neural networks...
Jobs

The Data team is composed of data engineers and data scientists, and sits within the Gilt Tech organization. Data engineers extract, load and transform data, then empower business users to build dashboards and interpret data. Data scientists use the tools of statistics and machine learning to solve hard problems around the business. We have data crying out for attention. Whether you're interested in consumer behavior, pricing and online commerce, retail and fashion, or logistics and operations, we have rich, clean data to tackle nearly any subject...
Training & Resources
 Pre-trained word vectors
We are publishing pre-trained word vectors for 90 languages, trained on Wikipedia. These are 300-dimensional vectors, trained with the default parameters of fastText...
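fastText's text `.vec` format is simple: a header line with the vocabulary size and the dimension, then one word per line followed by its values. The sketch below assumes that layout, parses lines from any iterable so no file is needed, and compares two words by cosine similarity. The function names are hypothetical, and reading a full multi-gigabyte file this way into Python lists would be slow and memory-hungry.

```python
import math
from typing import Dict, Iterable, List


def parse_vec(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse text vectors: header 'count dim', then 'word v1 ... v_dim' per line."""
    it = iter(lines)
    _count, dim = map(int, next(it).split())
    vectors: Dict[str, List[float]] = {}
    for line in it:
        if not line.strip():
            continue  # tolerate blank lines, e.g. at end of file
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim} values for {parts[0]!r}")
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


if __name__ == "__main__":
    toy = ["2 3", "cat 1 0 0 ", "dog 1 1 0\n"]  # tiny stand-in for a real .vec file
    vecs = parse_vec(toy)
    print(cosine(vecs["cat"], vecs["dog"]))
```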
Books

"A collection of interviews with 16 of the world's most influential and innovative data scientists from across the spectrum of this hot new profession, from Yann LeCun at Facebook to Jake Porway at DataKind"...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :)
All the best,
Hannah & Sebastian



