Data Science Weekly Newsletter - Issue 24

Issue #24

May 8, 2014

Editor Picks

 
  • Wine Classification using Neural Networks

    Neural networks can solve some really interesting problems once they are trained. They are particularly well suited for complex decision boundary problems over many variables. In this demo we will try to build a neural network that can classify wines from three wineries by thirteen attributes...
  • Spark is on fire

    Spark is on the rise, to an even greater degree than I thought last month...
 
 

Data Science Articles & Videos

 
  • How to create a Data-Driven Organization: One Year On
    A year ago, I wrote a well-received post here entitled "How do you create a data-driven organization?". I had just joined Warby Parker and set out my various thoughts on the subject at the time, covering topics such as understanding the business and customer, skills and training, infrastructure, dashboards and metrics. One year on, I decided to write an update. So, how did we do?...
  • Spatial Localization of Recent Ancestors for Admixed Individuals
    Ancestry analysis from genetic data plays a critical role in studies of human disease and evolution. Recent work has introduced explicit models for the geographic distribution of genetic variation and has shown that such explicit models yield superior accuracy in ancestry inference over non-model-based methods. Here we extend such work to introduce a method that models admixture between ancestors from multiple sources...
  • Smart Umbrellas 'could collect Rain Data'
    How would you fancy being a mobile weather station? Rolf Hut, from Delft University of Technology in The Netherlands, plans to turn our umbrellas into rain gauges. His prototype smart brolly has a sensor that detects raindrops falling on its canvas, and uses Bluetooth to send this information via a phone to a computer...
  • Eurovision 2014: First predictions
    For the last two years, I’ve been publishing the results of a statistical model for predicting the results of the Eurovision Song Contest. This year’s final takes place on Saturday in an abandoned shipyard in Copenhagen, so it’s time for some more predictions. I’ve made some small changes to the model this year, which have had huge consequences for the results, which I think should be a lot more accurate now...
  • Kaggle LSHTC4 Winning Solution
    Our winning submission to the 2014 Kaggle competition for Large Scale Hierarchical Text Classification (LSHTC) consists mostly of an ensemble of sparse generative models extending Multinomial Naive Bayes. This document describes the models and software used to build our solution...
  • Intuition for Simulated Annealing
    This post develops the intuition behind simulated annealing via lots of pictures. It's self-contained and ought to be accessible to those without a math-centric background. It also serves as a gentle introduction to more technical discussions...
 
 

Training & Resources

 
  • Yann LeCun will be doing an AMA in /r/MachineLearning on May 15 4PM EST
    I'm happy to announce Director of AI Research at Facebook/NYU Professor Yann LeCun will be stopping by /r/MachineLearning on May 15 4:00-6:00 PM EST for an AMA. Based on the success of the last AMA, a thread will be created before the official AMA time for those who won't be able to attend...
  • Billion Words: Because today's language modeling standard should be higher

    We [Google Research] are releasing scripts that convert a set of public data into a language model consisting of over a billion words, with standardized training and test splits, described in an arXiv paper. Along with the scripts, we’re releasing the processed data in one convenient location, along with the training and test data...
  • JHU Data Science: More is More
    Today Jeff Leek, Brian Caffo, and I are launching 3 new courses on Coursera as part of the Johns Hopkins Data Science Specialization...
  • 15 In-Depth Data Scientist Interviews

    Over the past few months we have been lucky enough to conduct in-depth interviews with 15 different Data Scientists for our blog. The 15 interviewees have varied roles and focus areas: from start-up founders to academics to those working at more established companies; working across healthcare, energy, retail, agriculture, travel, dating, SaaS and more...
 
 

Books

 

  • Data Just Right: Introduction to Large-Scale Data & Analytics

    Released in Dec 2013, this book is well rated (4.7 out of 5 stars on Amazon)...

    "If you work with expensive enterprise strength data management/analysis products like SAS and Oracle and you want a book that will give you a map to cover the open source tools for dealing with 'big data' (i.e., Hadoop, Hive, and Pig) get this. It does an amazingly good job of explaining the utility of the various tools that are used to manage *HUGE* data."

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
 
 
P.S. Did you enjoy the newsletter? Do you have friends/colleagues who might like it too? If so, please forward it along - we would love to have them onboard :)
 