Data Science Weekly Newsletter - Issue 96

September 24, 2015

Editor Picks
 
  • What it's like to be on the Data Science job market
    Sooner or later you're going to find yourself looking for a data science job. Maybe it's your first one or maybe you're changing jobs. Even if you're fully confident in your skills, have no impostor syndrome, and have tons of inside leads at great companies, it's a tremendously stressful experience. The process of looking for a new job is often one that occurs secretly and confidentially and then is so exhausting that discussing the process is the last thing you want to do. I hope to change that...
  • From Political Science to Data Science
    Today it's been a year since I started working as a data scientist. Before that I was doing my Ph.D. in political science. I wonder what other people who've made this sort of transition - from some social science to data science - have learned. Here's what I've found out (so far). Maybe this will encourage others to share their own experiences...
 
 

A Message from this week's Sponsor:
Continuum Analytics

 

 
 

Data Science Articles & Videos

 
  • A Short(-ish) Introduction to Using R Packages for Baseball Research
    There are some great resources out there for learning R and for learning how to analyze baseball data with it. In fact, a few pretty smart people wrote a fantastic book on the subject, coincidentally titled Analyzing Baseball Data with R. I can’t say enough about this book as a reference, both for baseball analysis and for R. Go and buy it. What follows is in no way a substitute for that book; instead, think of this as a quick reference based on some of the tools that I regularly use (or in some cases, should probably use more)...
  • Making Meaningful Restaurant Recommendations at OpenTable
    At OpenTable, recommendations play a key role in connecting diners with restaurants. Our methods go beyond using the diner-restaurant interaction history as the sole input — we use click and search data, the metadata of restaurants, as well as insights gleaned from reviews, together with any contextual information to make meaningful recommendations. In this talk, I will highlight the main aspects of our recommendation stack built with Scala using Apache Spark...
  • Facebook’s Cyborg Virtual Assistant Is Learning from Its Trainers
    Late last month a few hundred lucky users of Facebook’s mobile messaging app got an unusual new contact to talk with: M, a virtual assistant powered by a mixture of algorithms and human operators. That cyborg design makes M capable of handling much more complex requests than the mobile app assistants that Apple, Microsoft, and Google offer in their smartphone software...
  • Is Personalized Discovery A Feature, Category Or New Paradigm
    There are so many great choices available in the digital realm, and new stuff is pouring in every second. Many times we feel helpless in front of such an abundance of endless possibilities. Nevertheless, so far no one has created a solution that would automatically bring all the interesting options to your fingertips without you asking for it. A universal personalized Discovery solution doesn’t exist yet. Why?...
  • Deep Style: Inferring the Unknown to Predict the Future of Fashion
    In this post we'll look specifically at how to build an automated process using photographs of clothing to quantify the style of some of the items in our collection. Then we will use this style model to make new computer-generated clothing like the image to the right...
  • Some Nice ML Libraries
    I recently had a go at the Kaggle Acquire Valued Shoppers Challenge. This competition was a bit special in that the dataset was 22 GB, one of the biggest datasets they’ve had in a competition. While 22 GB may not quite qualify as big data, it’s certainly something that your average laptop will choke on when using standard methods. Ordinarily I’d reach for scikit-learn for these tasks, but in this case some of the methods in scikit-learn were a bit slow, and some other libraries had nice methods that scikit-learn didn’t have, so I decided to try out some other libraries...
  • Predicting Cab Booking Cancellations
    The business problem tackled here is trying to improve customer service for YourCabs, a cab company in Bangalore. The problem of interest is booking cancellations by the company due to unavailability of a car. The challenge is that cancellations can occur very close to the trip start time, thereby causing passengers inconvenience...
  • How To Show Awareness Of The Wider Commercial Impact Of Data Science
    When interviewing for a data science job you will have to show your programming abilities. You will have to show your math and stats abilities. You will also have to show your business expertise. It is this last step, "showing your business expertise", that will often trip people up. If you trip up on this last part of the interview, the feedback that companies will give you will be around your lack of awareness of the wider commercial impact of data science on their business...
 
 

Jobs

 
  • Data Analyst - Memorial Sloan Kettering Cancer Center - NYC

    As one of the world's premier cancer centers, Memorial Sloan Kettering Cancer Center is committed to exceptional patient care, leading-edge research, and superb educational programs. The Development Analytics group within the Office of Development provides information, analysis, and tools to support strategic management decision-making and effective fundraising programs. Our team is small and collaborative, working with IT, marketing, and fundraisers to strengthen our operation. We are currently seeking an enthusiastic Data Analyst to join our team...
 
 

Training & Resources

 
  • Machine Learning Cheat Sheet - Classical equations and diagrams
    This cheat sheet contains many classical equations and diagrams on machine learning, which will help you quickly recall knowledge and ideas on machine learning. The cheat sheet will also appeal to someone who is preparing for a job interview related to machine learning...
 
 

Books

 

  • The Signal and the Noise: Why So Many Predictions Fail — but Some Don't

    Very well reviewed...

    "This is the best general-readership book on applied statistics that I've read. Short review: if you're interested in science, economics, or prediction: read it. It's full of interesting cases, builds intuition, and is a readable example of Bayesian thinking."

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
 
 
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :) - All the best, Hannah & Sebastian
 