Data Science Weekly Newsletter - Issue 44

Issue #44

Sept 25, 2014

Editor Picks

  • Predicting NYC Taxi Tips
    After cleaning the original dataset and taking a sample from it, it's possible to predict, with an accuracy of 71.74%, whether the tip for a NYC taxi trip will be less than 20% of the charge or at least 20%, without using any information about the passengers...
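
    As a rough illustration of the target described above (not the article's own code), the tip can be turned into a binary label at the 20% cut-off, and accuracy is then the share of trips whose predicted label matches the actual one. The function names, the sample trips, and the always-high "model" below are assumptions invented for this sketch.

```python
from typing import Iterable

TIP_THRESHOLD = 0.20  # the 20% cut-off quoted in the blurb


def is_high_tip(tip: float, charge: float, threshold: float = TIP_THRESHOLD) -> bool:
    """Return True when the tip is at least `threshold` of the charge."""
    if charge <= 0:
        raise ValueError("charge must be positive")
    return tip / charge >= threshold


def accuracy(predicted: Iterable[bool], actual: Iterable[bool]) -> float:
    """Fraction of predictions that match the actual labels."""
    pairs = list(zip(predicted, actual))
    if not pairs:
        raise ValueError("need at least one prediction")
    return sum(p == a for p, a in pairs) / len(pairs)


# Hypothetical (tip, charge) pairs, made up purely for illustration.
trips = [(2.0, 10.0), (1.0, 10.0), (5.0, 20.0), (0.0, 8.0)]
labels = [is_high_tip(tip, charge) for tip, charge in trips]

# A stand-in "model" that always predicts a high tip (an assumption,
# not the classifier used in the article).
predictions = [True] * len(trips)
score = accuracy(predictions, labels)
```

    A real model would replace the constant prediction with one learned from trip features such as distance, time of day, or pickup location.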
 
 

Data Science Articles & Videos

 
  • From Reducing Friendly Fire to Analyzing Social Data: Joseph Misiti Interview
    We recently caught up with Joseph Misiti, co-founder of Math & Pencil, SocialQ and more! We were keen to learn more about his background, his work at SocialQ, and his thoughts on how Data Science is evolving. Also, given his thought-provoking article "Why becoming a data scientist is NOT actually easier than you think", we were keen to get his advice on how best to enter the field...
  • Political Ideology Detection Using Recursive Neural Networks
    Taking inspiration from recent work in sentiment analysis that successfully models the compositional aspect of language, we apply a recursive neural network (RNN) framework to the task of identifying the political position evinced by a sentence...
  • Comparing machine learning models in R
    While preparing for the DataWeek R Bootcamp that I conducted this week, I came across the following gem. This code, based directly on a Max Kuhn presentation from a couple of years back, compares the efficacy of two machine learning models on a training data set...
  • Multicore LDA in Python: from over-night to over-lunch
    Latent Dirichlet Allocation (LDA), one of the most used modules in gensim, has received a major performance revamp recently. Now that it uses all your machine's cores at once, chances are the new LdaMulticore class is limited by the speed at which you can feed it input data. Make sure your CPU fans are in working order!...
  • Anomaly Detection: A Survey
    Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection...
 
 

Jobs

 
  • Data Scientist - Birchbox - NYC
    Birchbox is looking for a data scientist who is a great software engineer – or a great software engineer who really knows statistics and machine learning – to work on data-driven tasks...
 
 

Training & Resources

 
  • Machine Learning for (Smart) Dummies
    One of the benefits of the open academic collaborations that Yahoo Labs encourages, including mine, is the knowledge transfer each party brings to the table. It is in the same spirit of collaboration and open discourse that we are offering all of the seven classes below for your professional and/or personal enrichment...
 
 

Books

 

  • The Cartoon Guide to Statistics
    Covers all the central ideas of modern statistics...
    "This book is exceptional in its ability to communicate difficult concepts in a light and entertaining manner..."

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.
 
 
P.S. Enjoyed the newsletter? Please forward it to friends and peers - we'd love to have them onboard too :-) - All the best, Hannah & Sebastian
 