Data Science Weekly Newsletter - Issue 21

Issue #21

April 17, 2014

Editor Picks

 
  • Data Workflows for Machine Learning

    In this in-depth video, we compare/contrast several open source frameworks which have emerged for Machine Learning workflows, including KNIME, IPython Notebook and related Python libraries, Cascading, Cascalog, Scalding, Summingbird, Spark/MLbase, MBrace on .NET, etc...
  • AI Developers to power New Generation of Context-Driven AI

    Spurred by recent advances in machine learning and AI, context-aware intelligent assistants represent the new frontier of content search and discovery. Companies leverage unstructured data (things like photographs, videos, chat logs, and documents) to make better, more informed business decisions and to automate processes. They are now leveraging humanlike capabilities inside automated workflows to augment what’s possible in business and humanity...
 
 

Data Science Articles & Videos

 
  • Deep Learning (or not): The why's have it
    Deep Learning is the big thing in Machine Learning right now: it just netted a win in the Galaxy Zoo Kaggle contest, and it is the thing being talked about by major media outlets. The technique rightfully deserves serious attention, as it has proven effective in a large number of tasks, but I see one key problem with it that is similar to, but actually worse than, that of its cousin, Neural Networks: it is a black box that cannot explain "why" at even the highest level...
  • Netflix Reveals All (well, at least a lot)
    The Netflix content team is tasked with the challenge of licensing, purchasing, and developing the best TV and movies for its 44 million users in 41 countries. This talk gave an overview of what the content data science teams do for the organization toward the goals of identifying characteristics of an “ideal” content library, predicting demand for titles that Netflix does not have, determining the customer impact of adding or losing sets of content, and helping to identify the next original series...
  • Devising Our Data Destiny
    The Hadoop ecosystem is becoming the incumbent data platform. It brings powerful new capabilities for collecting and analyzing data. This technology can be used for both harm and benefit. As a society, we should deliberately address this potential, developing pragmatic approaches...
  • Using Machine Learning To Pick Your Lottery Numbers
    It is well known that people are not creative when they choose their lottery numbers: they pick their birth dates, draw particular shapes on the grid (lines, crosses, ...), etc. The goal of this notebook is to explore whether Machine Learning can help us discriminate "human generated" combinations from "machine generated" (a.k.a. random) combinations...
 
 

Jobs

 
  • Sr Data Scientist, Marketing Algorithms/Analytics - Netflix, Los Gatos, CA

    Netflix is seeking an outgoing, curious, interdisciplinary data expert to work as a data miner, statistical modeler and algorithm designer. You'll have the opportunity to work closely with marketing decision makers and other senior data scientists to better understand and optimize our different customer acquisition channels. You'll bring a combination of mathematical rigor and innovative algorithm design to create recipes that efficiently extract relevant insights from billions of rows of data to meaningfully improve our operations...
 
 

Training & Resources

 
  • Prediction.io Open Source Machine Learning Server
    Prediction.io is an open source machine learning server for predictive solutions, such as personalization or recommendations, built on top of scalable frameworks such as Hadoop and Cascading to handle Big Data...
  • Outliers

    Many machine learning and data analysis tutorials contain some version of the following phrase as one of the preliminary steps to building a model: “Identify outliers in your data and remove them.” Sounds simple, right? Unfortunately, almost none of these tutorials spend any time talking about what an outlier actually is, or what removing data that fairly or unfairly gets labeled as an outlier does to your model...
 
 

Books

 

  • Who's #1?: The Science of Rating and Ranking

    An interesting and approachable look at the world of ranking algorithms...

    "Who's #1? is an excellent survey of the fundamental ideas behind mathematical rating systems. Once a realm of sports enthusiasts, ranking things is becoming a vital tool in many information-age applications. Langville and Meyer compare and contrast a variety of models, explaining the mathematical foundations and motivation. Readers of this book will be inspired to further explore this exciting field."

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.
 
 
P.S. Did you enjoy the newsletter? Do you have friends/colleagues who might like it too? If so, please forward it along - we would love to have them onboard :)
 