Data Science Weekly Newsletter - Issue 87

July 23, 2015

Editor Picks
  • Two examples of why machine learning is becoming the most powerful way to increase revenue
    From recommendations and personalization to ads and e-commerce, companies like Google, Facebook, Amazon, Netflix, and LinkedIn have been increasing revenue and engagement with machine learning for years. The success stories that follow show how we’re leveling the playing field by helping product teams and publishers leverage the same technology as these tech giants, without the need to build it in house...
  • Exploring the shapes of stories using Python and sentiment APIs
    Using two hacks, a pre-trained sentiment model (a multinomial logistic regression over TF-IDF n-gram features) can score the long-range sentiment of the text of stories, books, and movies. The models do a reasonable job of summarizing the “shapes of stories” directly from text...
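
The story-shapes item scores sentiment over long stretches of text. The two hacks aren't spelled out here, so the sketch below only illustrates the general idea of turning per-segment sentiment scores into a smoothed arc with a sliding-window average. It assumes each segment (sentence, paragraph or page) has already been scored by some classifier, for example P(positive) minus P(negative); the function name `story_shape` and the window size are hypothetical choices, not the article's method.

```python
def story_shape(scores, window):
    """Smooth per-segment sentiment scores into a story arc.

    Assumption: `scores` come from any sentiment classifier, one number per
    segment of the text. Returns the mean of every full window of `window`
    consecutive scores (no padding at the edges).
    """
    if not 1 <= window <= len(scores):
        raise ValueError("window must be between 1 and len(scores)")
    running = sum(scores[:window])
    shape = [running / window]
    for i in range(window, len(scores)):
        # Slide the window one segment: add the new score, drop the oldest one
        running += scores[i] - scores[i - window]
        shape.append(running / window)
    return shape
```

A larger window flattens local noise and leaves only the long-range rise and fall of the story, at the cost of blurring sharp turning points.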

A Message from this week's Sponsor


  • Want to be a Data Scientist, but don't know where to start?
    Learn essential Data Science skills in SlideRule's Intro to Data Science Workshop. In this online bootcamp, you'll learn R, data wrangling, analytics and visualization by working on real projects, with 1-on-1 mentorship from expert Data Scientists at LinkedIn, Glassdoor, Trulia and Stripe.

    Spots are limited; registration ends in 48 hours!

Data Science Articles & Videos

  • Deriving the Reddit Formula
    A few things about Reddit's hot formula have always bothered me. The formula has obviously been a success when it comes to setting the Internet on fire, but I have to wonder...
  • Is there a simple algorithm for intelligence?
    The question I explore here is whether there is a simple set of principles which can be used to explain intelligence. In particular, and more concretely, is there a simple algorithm for intelligence?...
  • Split Testing for Geniuses
    You are sitting at a slot machine with two levers, labeled A and B. When you pull a lever, sometimes a dollar comes out of the slot and sometimes not. The casino tells you that each lever has a fixed chance of giving you a dollar (its success rate) but, of course, they don’t tell you what it is. Since you don’t have any way of distinguishing them to start, you pull lever A and a dollar comes out (Yippee!). What do you do next?...
  • Kaggle Competition Tips & Summaries
    Over the years, I’ve participated in a few Kaggle competitions and wrote a bit about my experiences. This page contains pointers to all my posts, and will be updated if/when I participate in more competitions....
  • Data Science at Agari: Forwarder Classification
    Among the challenges our engineering team faces is classifying an email-sending entity as a forwarder. At Agari, we are primarily interested in the authentication of emails from originating senders...
  • Machine learning to predict San Francisco crime
    In today’s post, we document our submission to the recent Kaggle competition aimed at predicting the category of San Francisco crimes, given only their time and location of occurrence. As a reminder, Kaggle is a site where one can compete with other data scientists on various data challenges. We took this competition as an opportunity to explore the Naive Bayes algorithm. With the few steps discussed below, we were able to quickly move from the middle of the pack to the top 33% on the competition leaderboard, all the while continuing with this simple model!...
  • Deepdream: Avoiding Kitsch
    Yes yes, #deepdream. But as Memo Akten and others point out, this is going to kitsch as rapidly as Walter Keane and lolcats unless we can find a way to stop the massive firehose of repetitive #puppyslug that has been opened by a few websites letting us upload selfies...
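
The Reddit item questions the site's "hot" ranking formula. For reference, here is a minimal Python rendering of the version most often quoted from reddit's formerly open-sourced ranking code; the article may derive or critique a different form, so treat this as a sketch. Timestamps are plain Unix epoch seconds here to keep it self-contained.

```python
import math

# Seconds from the Unix epoch to 2005-12-08 07:46:43 UTC, the offset used in the quoted code
REDDIT_EPOCH_OFFSET = 1134028003


def hot(ups, downs, posted_at):
    """Hot score for a submission; posted_at is Unix epoch seconds."""
    s = ups - downs
    # Net votes count logarithmically: 10x the votes adds one point
    order = math.log10(max(abs(s), 1))
    sign = 1 if s > 0 else -1 if s < 0 else 0
    seconds = posted_at - REDDIT_EPOCH_OFFSET
    # 45000 s (12.5 hours) of recency is worth one order of magnitude of votes
    return round(sign * order + seconds / 45000, 7)
```

Because the time term grows without bound while the vote term is logarithmic, newer posts steadily overtake older ones unless the older ones keep gaining votes tenfold.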
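
The lever puzzle in the split-testing item is the classic two-armed bandit problem. One well-known Bayesian strategy for it is Thompson sampling with Beta priors: sample a plausible success rate for each lever from its posterior and pull the lever with the highest draw. The article may argue for a different approach, so this is only an illustrative sketch; the "true" success rates passed in are hypothetical values used solely to simulate the casino.

```python
import random


def thompson_sampling(true_rates, n_pulls, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling on levers with hidden success rates.

    Returns (pulls per lever, posterior alpha per lever, posterior beta per lever).
    """
    rng = random.Random(seed)
    n = len(true_rates)
    # Beta(1, 1) (uniform) prior: alpha = successes + 1, beta = failures + 1
    alpha = [1] * n
    beta = [1] * n
    pulls = [0] * n
    for _ in range(n_pulls):
        # Draw one plausible success rate per lever from its current posterior
        samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        lever = max(range(n), key=samples.__getitem__)
        pulls[lever] += 1
        # The simulated casino pays out with the lever's hidden probability
        if rng.random() < true_rates[lever]:
            alpha[lever] += 1
        else:
            beta[lever] += 1
    return pulls, alpha, beta
```

For example, `thompson_sampling([0.6, 0.4], 2000)` should spend most of its pulls on the first lever: early on both posteriors are wide, so both levers get tried, and as evidence accumulates the worse lever's draws rarely win.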
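
The San Francisco crime item predicts a crime category from time and location with Naive Bayes. The post's exact features and preprocessing are not given here, so the following is only a minimal categorical Naive Bayes sketch with add-alpha (Laplace) smoothing. The feature tuples (a district name and an hour of day) and category names in the example are made-up toy values.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class priors and per-feature value counts for categorical Naive Bayes."""
    class_counts = Counter(labels)
    # value_counts[label][feature_index][value] -> count
    value_counts = defaultdict(lambda: defaultdict(Counter))
    vocab = defaultdict(set)  # distinct values seen for each feature
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[label][i][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict_proba(model, row, alpha=1.0):
    """Posterior probability of each category, with add-alpha smoothing."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior
        for i, value in enumerate(row):
            k = len(vocab[i])
            count = value_counts[label][i][value]
            score += math.log((count + alpha) / (n_label + alpha * k))
        log_scores[label] = score
    # Normalize with log-sum-exp so the probabilities sum to 1
    m = max(log_scores.values())
    exp_scores = {lab: math.exp(s - m) for lab, s in log_scores.items()}
    z = sum(exp_scores.values())
    return {lab: v / z for lab, v in exp_scores.items()}
```

With toy rows `[("NORTHERN", 23), ("NORTHERN", 22), ("SOUTHERN", 9), ("SOUTHERN", 10)]` labeled `ASSAULT, ASSAULT, THEFT, THEFT`, the query `("NORTHERN", 23)` gets P(ASSAULT) = 6/7. Returning a full probability per category, rather than a single label, is what a probabilistic scoring metric needs.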

Jobs

  • Data Scientist - City of New York - NYC

    The successful candidate will serve as a Data Scientist reporting to the Mayor’s Office of Criminal Justice. Responsibilities will include: gather and convert data into insights to guide policy development and evaluation; work with City agencies to integrate data sets and develop data-informed strategies; utilize programming languages such as SAS, SQL, R, SPSS, and Python; develop new approaches to collecting data not presently incorporated into City systems; work closely with operations, policy, and analytic teams to establish and validate models and approaches; and perform special projects and initiatives as assigned...

Training & Resources

  • Pyxley: Python Powered Dashboards
    We have written a Python package, called Pyxley, not only to help simplify the development of web applications, but also to provide a way to easily incorporate custom JavaScript for maximum flexibility...

Books

  • Discovering Statistics Using R

    Recommended by several readers of the newsletter...

    "The book is a great overview of statistics concepts and provides a gentle, yet comprehensive, introduction to the R language..."

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :) - All the best, Hannah & Sebastian
Sign up to receive the Data Science Weekly Newsletter every Thursday