Receive the Data Science Weekly Newsletter every Thursday

Easy to unsubscribe at any time. Your e-mail address is safe.

Data Science Weekly Newsletter
Issue 35
July 24, 2014

Editor's Picks

  • Dropout: A Simple Way to Prevent Neural Networks from Overfitting
    Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem...
  • Introducing tidyr
    tidyr is a new package that makes it easy to “tidy” your data. Tidy data is data that’s easy to work with: it’s easy to munge (with dplyr), visualise (with ggplot2 or ggvis) and model (with R’s hundreds of modelling packages)...
  • Airbnb Is Quietly Building the Smartest Travel Agent of All Time
    Under the covers, Airbnb has quietly begun an ambitious effort to painstakingly mine the treasure trove of data contained in the site’s customer reviews and host descriptions to create a smarter way of traveling. It turns out Airbnb is more than a travel website; it’s a stealth big data company...
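
    The dropout teaser above stops before describing the mechanism. As a rough sketch of the commonly described form (not the paper's code): during training each unit is kept with some probability and zeroed otherwise, and at test time every unit is kept but scaled by that same probability. The function names, the keep probability of 0.5, and the sample activations below are illustrative assumptions.

```python
import random


def dropout_train(activations, keep_prob, rng):
    """Training pass: keep each unit with probability keep_prob, otherwise zero it."""
    return [a if rng.random() < keep_prob else 0.0 for a in activations]


def dropout_test(activations, keep_prob):
    """Test pass: keep every unit, scaled by keep_prob.

    Scaling the activations here stands in for scaling the outgoing weights.
    """
    return [a * keep_prob for a in activations]


# Hypothetical hidden-layer activations, for illustration only.
rng = random.Random(0)
hidden = [0.5, 1.2, -0.3, 2.0]

train_out = dropout_train(hidden, keep_prob=0.5, rng=rng)  # some units zeroed
test_out = dropout_test(hidden, keep_prob=0.5)             # all units, halved
```

    Passing a seeded random.Random instance keeps the training mask reproducible, which is convenient when experimenting.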



Data Science Articles & Videos

  • Leading from the Back: Making Data Science Work at a UX-driven Business
    MailChimp's success as a start-up wasn't built on data. It was built on a user experience that placed an intuitive and friendly interface on email marketing and removed much of the busy work. So how does a company whose business is not data use its massive data set? John Foreman, author of the Excel-based data science book Data Smart and Chief Scientist for MailChimp, discusses what it means to "lead from the back" in data science, even if that sometimes means breaking out a spreadsheet instead of Hadoop...
  • Neglected Machine Learning ideas
    This post is inspired by the “metacademy” suggestions for “leveling up your machine learning.” They make some halfway decent suggestions for beginners. The problem is, these suggestions won’t give you a view of machine learning as a field; they’ll only teach you about the subjects of interest to authors of machine learning books, which is different...
  • From Boom to Bust: Building a Predictive Quarterback Model
    This past off-season I took it upon myself to develop a metric for evaluating quarterback prospects for the NFL draft. My goal was to create a metric that could ultimately help predict which draft-eligible quarterbacks would be most likely to succeed in the NFL by identifying which traits quarterback prospects had in common with successful NFL quarterbacks when they were coming out of college...
  • Data Mining at NASA to Teaching Data Science at GMU: Kirk Borne Interview
    We recently caught up with Kirk Borne, trans-disciplinary Data Scientist and Professor of Astrophysics and Computational Science at George Mason University. We were keen to learn more about his background, his ground-breaking work in data mining and how it was applied at NASA, as well as his perspectives on teaching data science and how he is contributing to the education of future generations...
  • Doing Data Science in a Startup: The Hard Truth
    I hate to break it to you, but a high-tech Internet startup is not a natural environment to do research. Most startups come into existence around a very applicable and practical idea (hopefully), which either requires no scientific research or the core research was already done by the founders before the startup came to be. However, there are a number of advantages that can make startups a much more attractive working experience than classic academic-style research...
  • Aspiring Data Scientist? Here Are Some At Work Project Ideas
    Do you find yourself wanting to move into Data Science but keep hearing "get some data, analyze it, and you'll be fine..."? Have you developed many of the base skills for data science, such as programming, data analysis, and/or visualization, but are unsure of how to apply them? Are you looking to differentiate yourself from the ever-growing pile of aspiring "data scientists" who have taken the usual Coursera classes and done Kaggle competitions? You are not alone...



Jobs

  • Software Developer - Data Science - MailChimp
    MailChimp's Data Science Team is seeking a software developer to help us build internal tools and processes. We don’t care about pedigree or what languages or stacks you’ve worked in; we’re just looking for performance-minded developers who listen hard and change fast. In fact, if you’d rather send us code than polish up your resumé, that works for us. You’ll work with our data scientists and our product developers to turn research into internal services that can move enormous piles of data for statistical analysis...


Training & Resources

  • Fuzzy Matching with Yhat
    Ever had to manually comb through a database looking for duplicates? Anyone who's ever had a data entry job probably knows what I'm talking about. It's not fun! In this post I'm going to show you how you can write a simple yet effective algorithm for finding duplicates in your data...
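
    The linked post's own algorithm is not reproduced here. As a minimal sketch of the general idea, the example below scores every pair of records with SequenceMatcher from Python's standard difflib module and flags pairs above a cutoff. The function names, the 0.85 threshold, and the sample records are illustrative assumptions, not taken from the post.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Return a ratio in [0, 1]; 1.0 means the normalized strings are identical."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicates(records, threshold=0.85):
    """Return (i, j, score) for every pair of records scoring at or above threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


# Hypothetical records, for illustration only.
records = ["Acme Corporation", "ACME  corporation", "Acme Corporatoin", "Globex Inc."]
duplicates = find_duplicates(records)
for i, j, score in duplicates:
    print(f"{records[i]!r} ~ {records[j]!r} (score {score:.2f})")
```

    Comparing every pair is quadratic in the number of records, so this suits small tables; the threshold would need tuning against real data.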


Books


  • The History of Statistics: The Measurement of Uncertainty before 1900
    A definitive work on the early development of statistics...
    "Stigler is unrivaled as a statistician who researches the history of statistics. This covers the famous mathematicians and statisticians who developed the foundation on which probability and statistics blossomed in the 20th Century. He is thorough and accurate and his writing is always clear and interesting. ..."

