Data Science Weekly Newsletter - Issue 73

Issue #73

April 16, 2015

Editor Picks
 
  • Reddit AMA with Andrew Ng (Baidu)
    Dr. Andrew Ng is Chief Scientist at Baidu. He leads Baidu Research, which includes the Silicon Valley AI Lab, the Institute of Deep Learning and the Big Data Lab. The organization brings together global research talent to work on fundamental technologies in areas such as image recognition and image-based search, speech recognition, and semantic intelligence...
  • How Airbnb uses machine learning to detect host preferences
    What started as a small research project resulted in the development of a machine learning model that learns our hosts’ preferences for accommodation requests based on their past behavior. For each search query that a guest enters on Airbnb’s search engine, our model computes the likelihood that relevant hosts will want to accommodate the guest’s request...
  • Can Amazon Make Machine Learning More Accessible?
    First we had IBM Watson Analytics, then Microsoft launched Azure Machine Learning. With last week’s launch of Amazon Machine Learning, the e-commerce giant is the latest tech giant to attempt to democratize the development and use of machine learning models and make the technology useful to people who aren’t data scientists...
 
 

A Message from this week's Sponsor

 

  • Want to be a Data Scientist, but don't know where to start?
    Learn essential Data Science skills in SlideRule's Intro to Data Science Workshop. In this online bootcamp, you'll learn R, data wrangling, analytics and visualization by working on real projects, with 1-on-1 mentorship from expert Data Scientists from LinkedIn, Glassdoor, Trulia and Stripe.

    Spots are limited; registration ends in 48 hours!
 
 

Data Science Articles & Videos

 
  • Building a Client-side Blog Search Algorithm
    A few months ago we noticed that our blog was getting really hard to navigate. One of the most frequent requests we would get from people was the ability to search for posts. Sounds a bit obvious but we were all a bit surprised. Searching for posts?!? How many posts can we possibly have? Do we really have that many that people need to be able to search for them?...
  • Topic modeling on beer reviews
    The purpose of this project is to investigate topic modeling in multi-aspect reviews. More specifically, I wanted to investigate a way to find the words in reviews which were associated with the different categories being rated. Since I, like seemingly all data scientists, love beer, I was thrilled to find a dataset containing about 1.5 million beer reviews from the beeradvocate website. Below is a summary of my workflow and findings in playing around with this dataset...
  • Chief Data Officer Role Shifts to Offense
    Many organizations are redefining the chief data officer role from one solely concerned with data governance to a role responsible for helping generate new information-based products and services...
  • Extracting Structured Data From Recipes Using Conditional Random Fields
    NYT Cooking launched last fall with over 17,000 recipes that users can search, save, rate and (coming soon!) comment on. The product was designed and built from scratch over the course of a year, but it relies heavily on nearly six years of effort to clean, catalogue and structure our massive recipe archive...
  • Becoming a Full-Stack Statistician in 6 Easy Steps
    It's been a fun challenge to go from being an academic statistician to a practicing data scientist deep in the trenches of the software industry. Here are a few essential skills that I have had to pick up along the way. Remember, to become a full-stack statistician, try to be as fast as possible in each of the following categories...
 
 

Jobs

 
  • Data Scientist - Custora - New York, NY

    Imagine you had a limitless amount of data for every single customer for a major retailer: every product purchased online and in-store, every email opened and clicked, every product viewed on the website, every advertisement impression viewed, and more. With all that data at your fingertips, how could you improve the retailer's marketing efforts?...
 
 

Training & Resources

 
  • Jake VanderPlas - Machine Learning with Scikit-Learn (I) - PyCon 2015
    This tutorial will offer an introduction to the core concepts of machine learning and the Scikit-Learn package. We will introduce the scikit-learn API, and use it to explore the basic categories of machine learning problems and related topics such as feature selection and model validation, and practice applying these tools to real-world data sets...
  • Research papers that changed the world of Big Data
    If you are looking for some of the most influential research papers that revolutionised the way we gather, aggregate, analyze and store increasing volumes of data in a short span of 10 years, you are in the right place!...
  • Advice for applying Machine Learning
    This post is based on a tutorial given in a machine learning course at University of Bremen. It summarizes some recommendations on how to get started with machine learning on a new problem...
 
 

Books

 

  • Learn R in a Day

    Clear and efficient way to get up and running in R...

    "I was delighted with this little book...it got me functional with R, able to enter, manipulate, and plot data usefully in less than 8 hours of work..."

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.
 
 
P.S. Enjoyed the newsletter? Please forward it along to friends and colleagues - we'd love to have them onboard! - All the best, Hannah & Sebastian
 