Data Science Weekly Newsletter - Issue 33

Issue #33

July 10, 2014

Editor Picks

  • Understanding Random Forests: From Theory to Practice

    The goal of this thesis is to provide an in-depth analysis of random forests, consistently calling into question each and every part of the algorithm, in order to shed new light on its learning capabilities, inner workings and interpretability...
  • Frequentism and Bayesianism: What's the Big Deal?

    Statistical analysis comes in two main flavors: frequentist and Bayesian. The subtle differences between the two can lead to widely divergent approaches to common data analysis tasks. After a brief discussion of the philosophical distinctions between the views, I’ll utilize well-known Python libraries to demonstrate how this philosophy affects practical approaches to several common analysis tasks...
  • Feature Learning Escapades

    My summer internship work at Google has turned into a CVPR 2014 Oral titled "Large-scale Video Classification with Convolutional Neural Networks". Politically correct, professional, and carefully crafted scientific exposition in the paper and during my oral presentation at CVPR last week is one thing, but I thought this blog might be a nice medium to also give a more informal and personal account of the story behind the paper and how it fits into a larger context...

Data Science Articles & Videos

  • Project Jupyter
    Launching Project Jupyter, the evolution of the language-agnostic parts of IPython into an open platform for interactive computation, research and education...
  • Modeling Musical Influence with Topic Models
    The role of musical influence has long been debated by scholars and critics in the humanities, but never in a data-driven way. In this work we approach the question of influence by applying topic-modeling tools to a dataset of 24941 songs by 9222 artists from the years 1922 to 2010...
  • Replicating 538's plot styles in Matplotlib
    After pulling a few graphs locally, sampling colors, and crowd-sourcing the fonts used, I was able to come pretty close to replicating the style in Matplotlib styles. Here's an example (my figure dropped into an article on...
  • An Exhaustive List of Google's Ranking Factors
    [...] scoured the internet to find all of the mentions of Google ranking factors and created an all-inclusive infographic listing and categorizing 200 factors that make up Google's ranking algorithm...
  • Elo ratings (part 4)
    We’ve almost arrived at the end of the ratings and rankings tutorials. I’ll do one more post on Markov ratings, then a couple of posts on ensemble ratings, and then it’ll almost be time for season. This week I’ll be talking about Elo ratings. Originally used to rate and rank chess players, Elo ratings are now used in a number of sports, including by Jeff Sagarin for USA Today. They’re a very simple and elegant way to create ratings...


Jobs

  • Data Scientist - Predictive Modeler, Humana Inc - Dallas, TX

    Humana needs your analytical skills to help us tell a compelling story about healthcare today. As a Data Scientist/Predictive Modeler, you will conduct cutting-edge analyses that drive Humana’s Behavioral Health strategy and operations. You will collect and analyze large amounts of data from disparate sources to develop and implement sophisticated predictive models that will help improve health outcomes. You will be responsible for presenting technical information to a broad audience. You will use claims, business, consumer and other data sources to develop innovative solutions that ultimately lead to improved clinical outcomes for our members...

Training & Resources

  • SciPy 2014 Videos
    SciPy is a community dedicated to the advancement of scientific computing through open source Python software for mathematics, science, and engineering. The annual SciPy Conference allows participants from all types of organizations to showcase their latest projects, learn from skilled users and developers, and collaborate on code development...



Books

  • Love and Math: The Heart of Hidden Reality

    A New York Times Science Bestseller...

    "An interesting combination of an autobiography and the latest developments in the unification of branches of mathematics and their potential application in theoretical physics."

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.

P.S. Did you enjoy the newsletter? Do you have friends/colleagues who might like it too? If so, please forward it along - we would love to have them onboard :)