Data Science Weekly Newsletter

Issue

January 9, 2014

Editor's Picks

How Google Cracked House Number Identification In Street View
Google can identify and transcribe all the views it has of street numbers in France in less than an hour, thanks to a neural network that’s just as good as human operators. Now its engineers reveal how they developed it...

Future Of Neural Networks & MLaaS: Dave Sullivan Interview (Ersatz CEO)
We recently caught up with Dave Sullivan, Founder and CEO of Blackcloud BSG (the company behind Ersatz) and host of the San Francisco Neural Network Aficionados group. We were keen to learn more about his background (from history grad to Machine Learning expert), recent developments in Neural Networks/Deep Learning, and how Machine Learning as a Service (MLaaS) is evolving...

ConvNetJS: Deep Learning In Your Browser
ConvNetJS is a JavaScript library for training Deep Learning models (mainly Neural Networks) entirely in your browser. Open a tab and you're training. No software requirements, no compilers, no installations, no GPUs, no sweat...

5 Things I’ve Learned About Data Science
A few months ago I started a new job for the first time in 10 years, leaving my comfortable home at a government FFRDC for an exciting opportunity with the new data science and analytics team at FitnessKeeper. Here are a few of the things I’ve learned...

How I Made $500k With Machine Learning And High Frequency Trading
This post will detail what I did to make approx. 500k from high frequency trading from 2009 to 2010. Since I was trading completely independently and am no longer running my program I’m happy to tell all. My trading was mostly in Russell 2000 and DAX futures contracts...

The Mathematics Of Gamification
At Foursquare, we maintain a database of 60 million venues. Like many existing crowd-sourced datasets (Quora, Stack Overflow, Amazon Reviews), we assign users points or votes based on their tenure, reputation, and the actions they take. Superusers like points and gamification. But data scientists like probabilities and guarantees. We’re interested in making statements like, “we are 99% confident that each entry is correct.” How do we allocate points to users in a way that rewards them for behavior but allows us to make guarantees about the accuracy of our database?...
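One way to frame the post's question is as Bayesian vote counting. A minimal sketch, assuming each user's accuracy p (the probability that their vote is right) is known and that votes are independent given the truth: each vote then adds ±log(p/(1 − p)) to the log-odds that an entry is correct, which makes log-odds a natural candidate for "points". The function names and the 0.99 threshold are our own illustration, not necessarily Foursquare's formulation.

```python
import math

def vote_weight(accuracy):
    """Log-odds weight of one vote from a user who is right with probability `accuracy`."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return math.log(accuracy / (1.0 - accuracy))

def confidence_correct(votes, prior=0.5):
    """Posterior probability that an entry is correct.

    `votes` is a list of (accuracy, agrees) pairs: `agrees` is True when the user
    confirmed the entry and False when they flagged it as wrong. Assumes votes are
    conditionally independent given whether the entry is actually correct.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for accuracy, agrees in votes:
        w = vote_weight(accuracy)
        log_odds += w if agrees else -w
    return 1.0 / (1.0 + math.exp(-log_odds))

def is_trusted(votes, threshold=0.99):
    """True once the posterior confidence reaches the (assumed) threshold."""
    return confidence_correct(votes) >= threshold

two = [(0.9, True), (0.9, True)]
three = two + [(0.9, True)]
print(round(confidence_correct(two), 4))    # 0.9878
print(round(confidence_correct(three), 4))  # 0.9986
```

Under this model, two confirmations from 90%-accurate users leave the entry just short of 99% confidence (81/82); a third pushes it past (729/730). A single dissenting vote from an equally accurate user exactly cancels one confirmation.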

How Netflix Re-Engineered Hollywood
To understand how people look for movies, the video service created 76,897 micro-genres. We took the genre descriptions, broke them down to their key words, … and built our own new-genre generator. Through a combination of elbow grease and spam-level repetition, we discovered that Netflix possesses not several hundred genres, or even several thousand, but 76,897 unique ways to describe types of movies...
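The authors describe breaking genre descriptions into keywords and recombining them. A toy sketch of why that yields so many labels: a few independent keyword slots multiply. The slot vocabularies below are invented for illustration and are not Netflix's actual keywords or grammar.

```python
from itertools import product

# Hypothetical keyword slots; an empty string means "leave this slot out".
ADJECTIVES = ["", "Dark", "Critically-acclaimed"]
GENRES = ["Comedies", "Thrillers"]
SETTINGS = ["", "Set in Europe"]
ERAS = ["", "from the 1980s"]

def generate_genres(*slots):
    """Yield one genre label per combination of keyword choices (one choice per slot)."""
    for combo in product(*slots):
        yield " ".join(word for word in combo if word)

labels = list(generate_genres(ADJECTIVES, GENRES, SETTINGS, ERAS))
print(len(labels))  # 3 * 2 * 2 * 2 = 24
```

Adding a slot, or a word to any slot, multiplies the total rather than adding to it, which is how a modest keyword vocabulary can reach tens of thousands of distinct labels.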

How To Build A Mind
Joscha Bach presents a foray into the present, future and ideas of Artificial Intelligence. Are we going to build (beyond) human-level artificial intelligence one day? Very likely. When? Nobody knows, because the specs are not fully done yet. But let me give you some of those we already know, just to get you started...

FrankenImage: An Image-Based, Non-Photorealistic Renderer
The goal of FrankenImage is to reconstruct input (target) images with pieces of images from a large image database. FrankenImage is deliberately in contrast with traditional photomosaics... [it] aims for component database images to be as large as possible in the final composition, taking advantage of structure in each database image, instead of just its average color...

Six Novel Machine Learning Applications
Cutting-edge startups (as well as established tech companies and Universities) are increasingly finding new, novel, and exciting ways to apply powerful machine learning tools such as neural networks to existing problems in many different industries. Below is a list of 10 of the most interesting applications...

How Google Is Using People Analytics To Completely Reinvent HR
Most companies on the top 20 market cap list could accurately be described as “old school,” because most can attribute their success to being nearly half a century old, having a long-established product brand, or making great acquisitions. Google’s market success can instead be attributed to what can only be labeled as extraordinary people management practices that result from its use of “people analytics.”...

Prismatic's Schema For Server & Client-Side Data Shape Validation
Aria Haghighi introduces Schema, a Clojure and ClojureScript library for declaring and validating the shape of data. One of the difficulties with bringing Clojure into a team is the overhead of understanding the kind of data (e.g., list of strings versus nested map from long to string to double) that a function expects and returns. While a full-blown type system is one solution to this problem, we present a lighter weight solution: schemas...

Senior Data Engineer/Scientist - Streaming Activity - Netflix, Los Gatos, CA
Netflix takes its data seriously and leverages it as part of our core culture to make data-driven decisions to steer product development, and we've only scratched the surface in the types of deeper analytics we'd like to do! Want to help? We're looking for an additional data engineer/scientist to work directly with the product development and streaming platform teams to build better data frameworks and dig into analytics regarding the quality and performance of the streaming experience...

Learning From Data: Caltech Online Course
Hours of Caltech Online Machine Learning Course Materials! ...

Some Things R Can Do You Might Not Be Aware Of
There is a lot of noise around the "R versus Contender X" debate for Data Science. I think the two main competitors I hear about right now are Python and Julia. I'm not going to weigh in on the debates because I go by the motto: "Why not just use something that works?". So I thought I'd point out a few cool things that R can do...

How To Identify Outliers In Your Data
There are many methods for outlier detection, and much research has gone into them. Start by making some assumptions, and design experiments where you can clearly observe the effects of those assumptions against some performance or accuracy measure. I recommend working through a stepped process, from extreme value analysis to proximity methods and then projection methods...
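The first step, extreme value analysis, can be sketched with two common univariate rules: a z-score cutoff and Tukey's IQR fences. This is our illustration rather than code from the linked post, and the thresholds (3 standard deviations, 1.5 × IQR) are conventional defaults assumed here.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Values more than `threshold` population standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]

def iqr_outliers(values, k=1.5):
    """Values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

data = [10, 12, 11, 13, 12, 11, 10, 12, 95]
print(iqr_outliers(data))     # [95]
print(zscore_outliers(data))  # []
```

The two rules disagree on this small sample: the IQR fences flag 95, but the 3-sigma rule flags nothing. With the population standard deviation, no point in a sample of n values can have |z| above √(n − 1), about 2.83 for n = 9, and the outlier itself inflates the standard deviation. That is one reason to test such assumptions against data, as the post suggests.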

A Simple Machine Learning Method To Detect Covariate Shift
In this post, I’m going to show you how to use Machine Learning (as it couldn’t be otherwise) to quickly check whether there’s a covariate shift between training data and production data. You read it right: Machine Learning to learn whether machine-learned models will perform well or not...
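This idea is often called adversarial validation: label training rows 0 and production rows 1, fit a classifier, and check how well it separates them on held-out rows. Below is a sketch in plain Python. Logistic regression trained by gradient descent is our stand-in classifier (the post may use a different model), and the function names, holdout fraction, and learning rate are illustrative assumptions.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.1, epochs=200):
    """Batch gradient-descent logistic regression, no regularisation."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def covariate_shift_auc(train_rows, prod_rows, holdout=0.3, seed=0):
    """Held-out AUC of a classifier separating training rows (0) from production rows (1)."""
    rows = [(x, 0) for x in train_rows] + [(x, 1) for x in prod_rows]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - holdout))
    fit, test = rows[:cut], rows[cut:]
    w, b = train_logistic([x for x, _ in fit], [label for _, label in fit])
    scores = [b + sum(wj * xj for wj, xj in zip(w, x)) for x, _ in test]
    return auc(scores, [label for _, label in test])
```

An AUC near 0.5 means the classifier cannot tell the two samples apart; a value well above 0.5 suggests the feature distribution has shifted between training and production. Because AUC depends only on how the scores rank the rows, even a lightly trained model gives a usable signal.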
