Data Science Weekly Newsletter - Issue 130


May 19, 2016

Editor Picks
 
  • The Rise of the Data Natives
    We are now witnessing a new revolution — that of data natives who expect their world to be “smart” and seamlessly adapt to their taste and habits...
  • Statistical Power Analysis
    Imagine a scientist planning to run an experiment. A power analysis can help answer questions like: a) Will this experiment work, that is, how likely is it to detect a statistically significant effect? b) How much data needs to be collected? c) What is the smallest effect this experiment can measure? This visualization illustrates how assumptions about the data generating process affect the likelihood of detecting a significant effect...
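
The three power-analysis questions above can be sketched in a few lines of Python. This is only a minimal illustration, not the method used in the linked visualization: it assumes a two-sided, two-sample z-test with equal group sizes and a standardized effect size (difference in means divided by the common standard deviation). The function names are made up for this example, and it needs only the standard library (Python 3.8+).

```python
from math import ceil, sqrt
from statistics import NormalDist

# Standard normal distribution, used for all quantiles and probabilities.
_STD_NORMAL = NormalDist()


def power_two_sample_z(effect_size, n_per_group, alpha=0.05):
    """(a) Probability of detecting a true standardized effect.

    Assumes a two-sided, two-sample z-test with equal group sizes.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    # Chance of landing in either rejection region under the alternative.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def n_per_group_for_power(effect_size, power=0.8, alpha=0.05):
    """(b) Approximate observations needed per group for a target power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / effect_size) ** 2)


def min_detectable_effect(n_per_group, power=0.8, alpha=0.05):
    """(c) Smallest standardized effect detectable at the target power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return (z_crit + z_power) / sqrt(n_per_group / 2)


if __name__ == "__main__":
    n = n_per_group_for_power(effect_size=0.5)
    print("observations per group:", n)
    print("power at that size:", round(power_two_sample_z(0.5, n), 3))
    print("smallest detectable effect:", round(min_detectable_effect(n), 3))
```

Under these assumptions, halving the effect size roughly quadruples the required sample, which is the kind of trade-off a power analysis makes visible before any data is collected.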
 
 

A Message from this week's Sponsor:

 

  • Catenus Science Apprenticeship Program
    The Catenus Science Apprenticeship Program identifies top data scientists who will raise the bar when hired at a startup. To help meet this goal, the program will train qualified candidates to have an immediate, meaningful impact as data scientists in some of the top data startups in the world. This program will hone their skills in statistics, machine learning, programming, and product development by presenting them with real-world challenges put forth by startups in Silicon Valley and the Bay Area.

    We offer a fully-paid, 13-week apprenticeship during which we reinforce technical and business skills. We do this via a mix of formal instruction and hands-on application of data science in some of the best startups in the world...
 
 

Data Science Articles & Videos

 
  • Liberating Data from NYC Property Tax Bills
    So there you have it… we turned 1.1 million PDFs into a high-quality open dataset on NYC property taxes, including all exemptions and abatements. Data scientists everywhere, go forth and crunch the numbers...
  • Shot Blocking in the NHL Playoffs
    As a casual observer, it seemed to me like shot blocking was more prevalent during playoff games than the regular season. Intuitively, this would make sense since there is more on the line for each game, but I wanted to take a look at some data to see whether my suspicions were correct...
  • Wikipedia Navigation Vectors
    Wikipedia Navigation Vectors: a semantic embedding of Wikipedia learned from 370M sessions. In this project, we learned embeddings for Wikipedia articles and Wikidata items by applying Word2vec models to a corpus of reading sessions...
  • Feed-forward neural doodle
    Do you sometimes sigh that you cannot draw? It takes time to master the skills, and you have more important things to do :) What if you could just sketch a picture like a 3-year-old and have a computer do everything else, so your sketch looks like a real painting? We take a step towards making such things available to everybody and present an online demo of our fast algorithm...
  • How to create your own Machine Learning Predictive System in the NBA using Python
    Which sports geek wouldn’t like to create their own system for predicting matches, whether for betting or just out of intellectual curiosity? This is not going to be a comprehensive DIY guide; I’m just going to talk about what I found when playing with this stuff for a few months and share some code that will be very useful for anyone who wants to get started with this...
  • Visualising Random Variables
    When teaching mathematics, the traditional method of lecturing in front of a blackboard is still hard to improve upon, despite all the advances in modern technology. However, there are some nice things one can do in an electronic medium, such as this blog. Here, I would like to experiment with the ability to animate images, which I think can convey some mathematical concepts in ways that cannot be easily replicated by traditional static text and images...
 
 

Jobs

 
  • Data Scientist - Jawbone - San Francisco

    Our mission is to deliver better living through data -- from wearable tech to best-in-class Bluetooth headsets and wireless speakers. We’re looking for data scientists who share our passion to transform massive amounts of data into products that delight our customers and improve their lives...
 
 

Training & Resources

 
  • Easier data analysis in Python with pandas (video series)
    pandas is a powerful, open source Python library for data analysis, manipulation, and visualization. If you're working with data in Python and you're not using pandas, you're probably working too hard! In this video series, we'll focus on the functionality that is most important to master, as well as making use of the latest and greatest pandas features. You'll learn current best practices, and you can follow along with every video at home because all datasets used in the series are available online...
  • Impatient R
    Impatient R: a well-done guide to getting started with the R programming language for impatient, clever people...
 
 

Books

 

  • The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century

    An insightful, revealing history of how mathematics transformed our world...

    "I have taken courses in statistics, taught it many times and solved several statistical problems that have appeared in journals. But until I read this book, I never really thought about it in so deep and philosophical a manner..."

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.
 
 
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :) - All the best, Hannah & Sebastian
 