Data Science Weekly Newsletter - Issue 15

Issue #15

March 6, 2014

Editor Picks

 
  • Machine Learning in 10 Pictures

    I find myself coming back to the same few pictures when explaining basic machine learning concepts. Below is a list I find most illuminating...
  • Using CART for Stock Market Forecasting

    Most of the time, the literature on market forecasting mixes two market features: Magnitude and Direction. In this article I focus on identifying the market direction only... market conditions when the odds are significantly biased toward an up or a down market. This post gives an example of how CART (Classification And Regression Trees) can be used in this context...
  • Why Apache Spark is a Crossover Hit for Data Scientists

    While discussion about Spark for data science has mostly noted its ability to keep data resident in memory... this is perhaps not even the big news, not to me. It does not solve every problem for everyone. However, Spark has a number of features that make it a compelling crossover platform for investigative as well as operational analytics...
 
 

Data Science Articles & Videos

 
  • Big Data correctly predicts winners at the Oscars
    Guessing the Oscar winners is a fun pastime for many movie fans and industry insiders. They are all thrilled when they guess right. It's all part of the fun of Oscar night. But last year, big data stole the show when Farsite analysts correctly predicted five out of the six winners, and they were ready to do it again. This year, six out of six predictions were correct...
  • ETL: The most important Acronym you've never heard of
    As impression volumes rise into trillions across all manner of devices, the focus of many ad tech engineering teams isn’t on ethereal ML algorithms, but something far less glamorous. The process is called ETL — the critical, painstaking work of cleansing and consolidating disparate datasets...
  • Bayesian Bandits testing for mobile apps
    While A/B testing is a competent tool in evaluating variants for a simple process – for example, the best-converting variant of an e-commerce landing page that isn't likely to change in the future relative to the rest of the site – it's perhaps not well-suited to dynamic mobile apps operating as services. An alternative to A/B testing is Bayesian Bandits testing...
  • AMA with Yoshua Bengio - Transcript & Comments
    Yoshua Bengio is one of the ML professors who led the deep learning renaissance of 2006, along with Geoff Hinton and Yann LeCun (and one of the last deep learning professors to remain completely in academia). This post features the top 200 comments/responses on his recent reddit AMA...
  • Is Julia the Future for Big Data Analytics?
    In many Big Data blogs, meetups and in the halls of the most recent O’Reilly Strata Conference, one of the most-discussed topics is which language is better for data analysis: Python or R. Some of the talk has even reached “religious” overtones not unlike previous discussions on Windows vs. Linux or Microsoft’s Internet Explorer vs. Mozilla Firefox. So what’s the issue here?...
  • Rendering scikit Decision Trees in D3.js
    Scikit-learn provides routines to export decision trees to a format called Graphviz, although typically this is used to provide an image of a chart. For some applications this is valuable, but if the product of machine learning is the ability to generate models (rather than predictions), it would be preferable to provide interactive models...
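
    For readers curious about the Bayesian Bandits item above, here is a minimal sketch of one common form of the idea, Thompson sampling with Beta priors over conversion rates. It is not taken from the linked article. The function names, the simulated conversion rates, and the uniform Beta(1, 1) prior are all assumptions made for illustration.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Simulate a Beta-Bernoulli bandit over hypothetical app variants.

    true_rates: assumed (made-up) conversion rate of each variant.
    Returns (successes, failures), each including the prior pseudo-count of 1.
    """
    rng = random.Random(seed)
    # Beta(1, 1) prior: one pseudo-success and one pseudo-failure per variant.
    successes = [1] * len(true_rates)
    failures = [1] * len(true_rates)
    for _ in range(n_rounds):
        # Draw one plausible conversion rate per variant from its posterior.
        samples = [rng.betavariate(s, f) for s, f in zip(successes, failures)]
        # Show the variant whose sampled rate is highest.
        arm = max(range(len(true_rates)), key=samples.__getitem__)
        # Simulate the user's response and update that variant's posterior.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


def pulls(successes, failures):
    """Number of times each variant was shown (prior pseudo-counts removed)."""
    return [s + f - 2 for s, f in zip(successes, failures)]


if __name__ == "__main__":
    s, f = thompson_sampling([0.05, 0.10, 0.50], n_rounds=3000, seed=1)
    print(pulls(s, f))
```

    Unlike a fixed-split A/B test, traffic shifts toward the better-performing variant as evidence accumulates, so the allocation can keep adapting while the app is live.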
 
 

Jobs

 
  • Data Scientist - Microsoft, Redmond, WA

    Join the excitement of Machine Learning in the Cloud at Microsoft! We are a fast-paced data science team in the Microsoft Cloud + Enterprise organization building machine-learning-powered intelligent web services and end-to-end solutions for scenarios in diverse enterprise and consumer verticals. We are looking for applied scientists who are passionate about applying machine learning and data mining techniques to a variety of exciting applications for enterprises and consumers...
 
 

Training & Resources

 
  • Vowpal_Wabbit: The Redis of the Data Science Community
    vowpal_wabbit, or vw, is an online learning program originally built by Yahoo! Research (now Microsoft Research). It's fairly basic to use, it's a command-line tool and it's mostly written in C++. Even the website has a great Web 1.0 feel to it. Using vw basically maxes out your data science style points. It's like not wearing a mask in hockey, or having lift tickets from 3 foreign countries on your ski jacket. Yeah, it's that cool...
  • Classification with scikit-learn
    This post looks into the problem of classification, a situation in which a response is a categorical variable. We will build upon the techniques that we previously discussed in the context of regression and show how they can be transferred to classification problems...
 
 
P.S. Did you enjoy the newsletter? Do you have friends/colleagues who might like it too? If so, please forward it along - we would love to have them onboard :)
 