Data Science Weekly Newsletter - Issue 77


May 14, 2015

Editor Picks
 
  • Software development skills for data scientists
    Data scientists often come from diverse backgrounds and frequently don't have much, if any, in the way of formal training in computer science or software development. That being said, most data scientists at some point will find themselves in discussions with software engineers because of some code that already is or will be touching production code...
 
 

Data Science Articles & Videos

 
  • Analytics Of Optimal 2-for-1 Strategy In NBA Basketball
    For those that don’t watch basketball, the two for one shot involves a strategic play at the end of the quarter... Jeff Van Gundy... said all the analytics guys push to take the 2 for 1 shot in every situation, but asked if it’s a tied fourth quarter Game 7 of the NBA finals, would it be smart to go for the 2 for 1 situation... It’s a hard data science question as there aren’t that many past Game 7’s in the NBA even, much less ones that get decided by fewer than one or two possessions. But still a 2 for 1 situation just makes intuitive sense at the very least. Two chances to score are always better than one. But how much better is it?...
  • Forbidden Data - Wyoming just criminalized citizen science
    Wyoming doesn’t, of course, care about pictures of geysers or photo competitions. But photos are a type of data, and the new law makes it a crime to gather data about the condition of the environment across most of the state if you plan to share that data with the state or federal government...
  • Using The Right Tool For the Job - Prototyping in R, Productionalizing in Go
    Slides from a talk given by Drew Lanenga, a Data Scientist at Lytics, about how we try to balance the best of both languages at Lytics. We'll walk through a changepoint detection package we recently made and open sourced, prototyped and validated in R, then ported to Go to fit in nicely with the rest of our stack...
  • Deep Learning for Image Understanding in Planetary Science
    I stumbled upon a tweet by Leon Palafox, a Postdoctoral Fellow at The University of Arizona Lunar and Planetary Laboratory, and reached out to him to discuss his success with GPUs and share it with other developers interested in using deep learning for image processing...
  • Genetic Programming in Python, with a scikit-learn inspired API
    While Genetic Programming (GP) can be used to perform a very wide variety of tasks, gplearn is purposefully constrained to solving symbolic regression problems. This is motivated by the scikit-learn ethos of having powerful estimators that are straightforward to implement...
  • Movie Recommendation with MLlib
    In this chapter, we will use MLlib to make personalized movie recommendations tailored for you. We will work with 10 million ratings from 72,000 users on 10,000 movies, collected by MovieLens. This dataset is pre-loaded in the HDFS on your cluster in /movielens/large. For quick testing of your code, you may want to use a smaller dataset under /movielens/medium, which contains 1 million ratings from 6000 users on 4000 movies...
  • Logistic Regression and Gradient Descent
    Logistic regression is an excellent tool to know for classification problems. Classification problems are problems where you are trying to classify observations into groups. To make our examples more concrete, we will consider the Iris dataset. The Iris dataset contains 4 attributes for 3 types of iris plants. The purpose is to classify which plant you have based just on the attributes...
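
For readers who want a feel for the logistic regression item above before clicking through, here is a minimal sketch of a two-class logistic regression fitted by batch gradient descent. It is not taken from the linked article: the function names, the learning rate, the epoch count, and the six toy data points are all made up for illustration, and the toy data is not the Iris dataset.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit weights and a bias by batch gradient descent on the mean log loss.

    lr and epochs are arbitrary illustrative defaults, not tuned values.
    """
    n = len(xs)
    n_features = len(xs[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of the log loss with respect to the score
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        # Step against the averaged gradient
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return class 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical, linearly separable toy data: two measurements per observation
xs = [[1.0, 0.2], [1.3, 0.3], [1.5, 0.2], [4.5, 1.5], [4.7, 1.4], [5.0, 1.7]]
ys = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(xs, ys)
predictions = [predict(w, b, x) for x in xs]
```

The sketch handles only two classes; a three-class problem such as Iris would need an extension like one-vs-rest or softmax regression.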
 
 

Jobs

 
  • Data Scientist - GrubHub - NYC

    GrubHub Holdings Inc. is the nation's leading online and mobile food-ordering company dedicated to connecting hungry diners with local takeout restaurants. The GrubHub Holdings Inc. portfolio of brands includes GrubHub, Seamless, MenuPages and Allmenus. The company's online and mobile ordering platforms allow diners to order directly from thousands of takeout restaurants across the country and London, and every order is supported by the company's 24/7 customer service. GrubHub Holdings Inc. has offices in Chicago, New York City and London. With a career at GrubHub Holdings Inc., you can order your cake and eat it, too!...
 
 

Training & Resources

 
  • New York R Conference Videos Are Up
    2015 New York City R Conference - videos include talks by Chris Wiggins (Chief Data Scientist, The New York Times), Hilary Parker (Senior Data Analyst, Etsy, Inc.), Andrew Gelman (Statistics Professor, Columbia University), Saar Golde (Chief Data Scientist, Knowledgent), and many more...
  • PyNeural: A simple but fast Python library for training neural networks
    PyNeural is a neural network library written in Cython which is powered by a simple but fast C library under the hood. PyNeural uses the cblas library to perform the backpropagation algorithm efficiently on multicore processors. PyNeural exposes a simple Python API that plays nicely with NumPy, making it easy for you to munge your data sets as needed and quickly use them to train a neural network for classification...
 
 

Books

 

  • Statistics Done Wrong: The Woefully Complete Guide

    A pithy, essential guide to statistical blunders in modern science that will show you how to keep your research blunder-free...

    "If you're interested in learning about the current debates/problems/challenges in statistics, this is a good primer. The book is written for scientists, but it can benefit anyone who deals with data and analysis on a regular basis..."

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
 
 
P.S. Enjoyed the newsletter? Please forward it along to friends and colleagues - we'd love to have them onboard! - All the best, Hannah & Sebastian
 