Data Science Weekly Newsletter - Issue 92

August 27, 2015

Editor Picks
 
  • The Most Timeless Songs of All Time
    This is a story about proving, with data, that No Diggity by Blackstreet is timeless. 20 years have passed since No Diggity's release. Its popularity on Spotify, relative to every other song from the 90s, is a strong signal for whether it will be remembered by our children’s children. So let's examine every song that ever charted, 1990 - 1999, and rank them by number of plays on Spotify, today...
 
 

A Message from this week's Sponsor:
O'Reilly Media

 

  • With datasets growing ever larger, the need for custom data solutions has soared as well. O’Reilly’s new Learning Path, Architect and Build Big Data Applications, is a logical, well-crafted sequence of video training from people who really know what they're talking about. It takes you through the entire process of designing and building data applications that can visualize, navigate, and interpret reams of data. Until September 2, buy this Learning Path (28 hours of video training) for just $99. It's yours forever, to learn at your own pace. For more topics, see the full set.
 
 

Data Science Articles & Videos

 
  • Python, Machine Learning, and Language Wars.
    A Highly Subjective Point of View

    Oh god, another one of those subjective, pointedly opinionated click-bait headlines? Yes! Why did I bother writing this? Well, here is one of the most trivial yet life-changing insights and worldly wisdoms from my former professor that has become my mantra ever since: “If you have to do this task more than 3 times just write a script and automate it.”...
  • Inside the Zestimate: Data Science at Zillow
    If you’re like most homeowners, you probably sneak a peek at your ‘Zestimate’ from time to time to see how your home’s value might have changed. Getting a Zestimate is very easy and straightforward for users, but behind the scenes, there’s a hefty amount of data science that goes into the equation...
  • Detecting Diabetic Retinopathy in Eye Images
    For almost four months I have been competing in a Kaggle competition about diabetic retinopathy grading based on high-resolution eye images. In this post I try to reconstruct my progression through the competition: the challenges I had, the things I tried, what worked and what didn’t...
  • Style in the Age of Instagram:
    Predicting Success within the Fashion Industry using Social Media

    We therefore seek to understand the ingredients of success of fashion models in the age of Instagram. Combining data from a comprehensive online fashion database and the popular mobile image-sharing platform, we apply a machine learning framework to predict the tenure of a cohort of new faces for the 2015 Spring / Summer season throughout the subsequent 2015-16 Fall / Winter season. Our framework successfully predicts most of the new popular models who appeared in 2015...
  • Meet the Guy who Sorts All the World's Numbers in His Attic
    Neil Sloane is considered by some to be one of the most influential mathematicians of our time. That’s not because of any particular theorem the 75-year-old Welsh native has proved, though over the course of a more than 40-year research career at Bell Labs (later AT&T Labs) he won numerous awards for papers in the fields of combinatorics, coding theory, optics and statistics. Rather, it’s because of the creation for which he’s most famous: the Online Encyclopedia of Integer Sequences (OEIS), often simply called “Sloane” by its users...
  • Large Scale Decision Forests: Lessons Learned
    Fast forward several months and 100+ experiments later, we now have a global decision forest model working as a productive member of our modeling stack. Along the way to shipping our decision forest model we learned quite a few things about working with decision forests for the fraud detection use case. The most interesting lessons we've learned are detailed below...
  • 15 Most Read Data Science Articles in 2015. So far...
    We've compiled the latest set of "most read" articles from the Data Science Weekly Newsletter. This is what is most popular thus far in 2015 - a mix of interesting applications of data science, advice on how best to get into the field, and unique explanations of some of the core concepts / techniques...
  • You Don't Need a Data Scientist (Yet)
    The hype around big data has caused many organisations to hire data scientists without giving much thought to what these data scientists are going to do and whether they’re actually needed. This is a source of frustration for all parties involved. This post discusses some questions you should ask yourself before deciding to hire your first data scientist...
 
 

Jobs

 
  • Data Scientist, Service Operations - Tesla - Fremont, CA

    Tesla Motors uses proprietary technology, world-class design, and state-of-the-art manufacturing processes to create a new generation of highway capable electric vehicles. The service team is looking for a Data Scientist to join its team to exploit the benefits of machine learning and big data to model service use and cost as well as delight our customers with exceptional support after they purchase our vehicle...
 
 

Training & Resources

 
  • Cohort Analysis with Python
    Despite having done it countless times, I regularly forget how to build a cohort analysis with Python and pandas. I’ve decided it’s a good idea to finally write it out - step by step - so I can refer back to this post later on. Hopefully others find it useful as well...
  • Announcing CGT
    I would like to announce a library that I have been working on with a few collaborators, called the Computation Graph Toolkit (CGT): GitHub / Documentation. For those of you who are familiar with Theano, the main upshot of CGT is this: CGT replicates Theano’s API, but it has very short compilation time and supports multithreading...
 
 

Books

 

  • The History of Statistics: The Measurement of Uncertainty before 1900

    Comprehensive history of statistics from its beginnings around 1700 to its emergence as a distinct and mature discipline around 1900...

    "This book is THE definitive work on the early development of statistics. Obviously written by a man in love with his subject. Bernoulli, de Moivre, Bayes, Laplace, Gauss, Quetelet, Lexis, Galton, Edgeworth and Pearson all but come alive. I particularly enjoyed the reproductions of first sources included that you would otherwise have to travel to Paris to see..."

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
 
 
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :) - All the best, Hannah & Sebastian
 