Data Science Weekly Newsletter - Issue 45

Oct 2, 2014

Editor Picks

  • How BuzzFeed Thinks About Data Science
    Jonah Peretti hired BuzzFeed’s first data scientist in 2010, to predict when and how articles would go viral on the Internet. It’s a hard problem. We are still thinking about this same question today, but our canvas has changed. BuzzFeed now covers news, politics, business, tech, entertainment, food, international coverage and much more, reaching over 150 million unique visitors a month. The data science team has evolved alongside it...
  • Teach statistics before calculus! [TED Talk]
    Someone always asks the math teacher, "Am I going to use calculus in real life?" And for most of us, says Arthur Benjamin, the answer is no. He offers a bold proposal on how to make math education relevant in the digital age...
 
 

Data Science Articles & Videos

 
  • How big data is unfair
    As we’re on the cusp of using machine learning for rendering basically all kinds of consequential decisions about human beings in domains such as education, employment, advertising, health care and policing, it is important to understand why machine learning is not, by default, fair or just in any meaningful way...
  • Integrating Kafka & Spark Streaming: Code Examples & State of the Game
    Spark Streaming has been getting some attention lately as a real-time data processing tool, often mentioned alongside Apache Storm. If you ask me, no real-time data processing tool is complete without Kafka integration (smile), hence I added an example Spark Streaming application to kafka-storm-starter that demonstrates how to read from Kafka and write to Kafka, using Avro as the data format and Twitter Bijection for handling the data serialization...
  • Artificial General Intelligence that plays Atari video games: How did DeepMind do it?
    Last December, an article named “Playing Atari with Deep Reinforcement Learning” was uploaded to arXiv by employees of a small AI company called DeepMind. Two months later Google bought DeepMind for 500 million euros, and this article is almost the only thing we know about the company. Currently our team is trying to replicate their artificial mind, and in this post we describe its inner workings...
  • Gilt and Preemptive Shipping: A Q&A with Our Chief Data Scientist
    The Gilt tech team doesn’t need an in-house psychic to help us predict which customers will buy products we’ve never sold before. Instead, we rely on the data wizardry performed by our Principal Data Scientist, Igor Elbert, who has been helping us to refine our product performance predictions (say that three times fast) by using various machine learning and predictive modeling techniques...
  • In Chicago, food inspectors are guided by big data
    In Chicago, just 32 food inspectors — called sanitarians — are responsible for auditing the city’s more than 15,000 restaurants. Traditionally, sanitarians are assigned beats, or groups of restaurants, that they inspect a few times a year, depending on a restaurant’s assessed risk level: How complex a restaurant’s menu items are, and how likely ingredients are to trigger food poisoning. Today, the city is experimenting with a new technology to guide where those inspections should occur, based on factors such as current weather, nearby construction and past health code violations...
 
 

Jobs

 
  • Data Scientist, Tradesy - Santa Monica, CA
    Tradesy is a new kind of peer-to-peer marketplace that addresses the pain points associated with selling on sites like eBay and Craigslist. Our mission is to make it simple and delightful for anyone to sell the unused or underused goods cluttering their closets. We have millions of passionate members, a product that people love, and an office with an ocean view in sunny Santa Monica. We're backed by some of the best investors around, including KPCB and Sir Richard Branson. But enough about us, let's talk about you! You're a Data Scientist who pushes the limits of what can be possible with data. You have a high level of ownership over your work and absolutely hate to be micro-managed...
 
 

Training & Resources

 
  • Hacker's guide to Neural Networks
    I've worked on Deep Learning for a few years as part of my research and among several of my related pet projects is ConvNetJS - a JavaScript library for training Neural Networks. JavaScript allows one to nicely visualize what's going on and to play around with the various hyperparameter settings, but I still regularly hear from people who ask for a more thorough treatment of the topic. This article (which I plan to slowly expand out to lengths of a few book chapters) is my humble attempt. It's on the web instead of PDF because all books should be, and eventually it will hopefully include animations/demos etc...
  • Bayesian models in R
    There are many ways to run general Bayesian calculations in or from R. To get an idea of how they compare, I decided to run a number of calculations through all of them...
 
 

Books

 

  • Thoughtful Machine Learning: A Test-Driven Approach

    JUST RELEASED! Learn how to apply test-driven development (TDD) to machine-learning algorithms and catch mistakes that could sink your analysis...

    "In this practical guide, author Matthew Kirk takes you through the principles of TDD and machine learning, and shows you how to apply TDD to several machine-learning algorithms, including Naive Bayesian classifiers and Neural Networks..."

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
 
 
P.S. Enjoyed the newsletter? Please forward it to friends and peers - we'd love to have them onboard too :-) - All the best, Hannah & Sebastian
 