Data Science Weekly Newsletter - Issue 9

Issue #9

January 23, 2014

Editor Picks

 
  • Meet the Man Google Hired to Make AI a Reality

    Geoffrey Hinton was in high school when a friend convinced him that the brain worked like a hologram. To create one of those 3-D holographic images, you record how countless beams of light bounce off an object and then you store these little bits of information across a vast database. While still in high school, back in 1960s Britain, Hinton was fascinated by the idea that the brain stores memories in much the same way. Rather than keeping them in a single location, it spreads them across its enormous network of neurons. This may seem like a small revelation, but it was a key moment for Hinton...
  • How a Math Genius Hacked OkCupid to Find True Love

    In June 2012 Chris McKinlay realized he had been approaching online matchmaking like any other user. Instead, he should be dating like a mathematician. If, through statistical sampling, McKinlay could ascertain which questions mattered to the kind of women he liked, he could construct a new profile that honestly answered those questions and ignored the rest. He could match every woman in LA who might be right for him, and none that weren’t...
  • Flexible Muscle-Based Locomotion for Bipedal Creatures

    Amazing video and paper on how a muscle-based control method lets simulated bipeds (robots) optimize their gaits for a target speed, steer toward target directions, and even cope with uneven terrain and external perturbations (e.g., boxes being thrown at them!) ... All actuation forces are the result of 3D simulated muscles, and a model of neural delay is included for all feedback paths...
 
 

Data Science Articles & Videos

 
  • Dirty Secrets of Data Science
    From the inaugural NYC Data Science Meetup: Hilary Mason fought off jetlag and a tough cold to give a great presentation. This is a 12-minute clip from Hilary’s talk, which she called “Dirty Secrets of Data Science.”...
  • How Search Engines Use Machine Learning for Pattern Detection
    Search engines use machine learning for pattern detection. While it’s impossible to explain in one short article how machine learning influences our lives, understanding the basics of machine learning can give you some insight into search algorithm updates...
  • Using Artificial Intelligence to detect Fraud, Waste & Abuse
    The emergence of new computing frameworks and capabilities has made it possible to implement complex processing frameworks like Machine Learning that can now make sense of huge amounts of data, understand the context and make decisions in real time with a very high degree of confidence. Let us look at a simple example of tracking fraud in contract awarding...
  • Big Data Finds What Football, Basketball and Hockey Have in Common
    What do basketball, football and hockey have in common? On the surface, not very much. After all, one sport is played on ice and the other two are played on either a grassy field or a wooden court. But according to University of Colorado computer science professor Aaron Clauset, when the games are analyzed from a Big Data perspective, patterns and similarities between the sports begin to emerge...
  • Digging into the Dirichlet Distribution
    When it comes to recommendation systems and natural language processing, data that can be modeled as a multinomial or as a vector of counts is ubiquitous. In the multi-dimensional case, the Dirichlet distribution is one of the basic probability distributions for describing the data. In this talk, Max Sklar, from Foursquare, takes a closer look at the Dirichlet distribution and its properties, as well as some of the ways it can be computed efficiently...
  • Big Data & Agriculture - The Next Green Revolution
    We recently caught up with Wolfgang van Loeper, Founder and CEO of MySmartFarm. Once a wine farmer himself, he has gone from cultivating grapes to cultivating Big Data to transform agriculture. We were keen to learn more about his background and what he is building at MySmartFarm (MSF)...
  • Algorithms are not enough: Lessons bringing CompSci to Journalism
    There are some amazing algorithms coming out of the computer science community which promise to revolutionize how journalists deal with large quantities of information. But building a tool that journalists can use to get stories done takes a lot more than algorithms. Closing this gap has been one of the most challenging and rewarding aspects of building Overview, and I really think we’ve learned something...
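
For readers curious about the Dirichlet item above, here is a minimal sketch (not taken from the talk) of two standard facts about the distribution: a sample can be drawn by normalizing independent Gamma draws, and observing a vector of counts simply adds those counts to the prior parameters. The function names, the uniform prior, and the example counts are all made up for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def posterior_alpha(prior_alpha, counts):
    """Conjugate update: a Dirichlet prior plus multinomial counts gives a Dirichlet posterior."""
    return [a + c for a, c in zip(prior_alpha, counts)]


def dirichlet_mean(alpha):
    """Expected value of each component under Dirichlet(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
prior = [1.0, 1.0, 1.0]         # assumed uniform prior over three categories
counts = [8, 1, 1]              # hypothetical observed counts per category
post = posterior_alpha(prior, counts)
theta = sample_dirichlet(post, rng)

print("posterior parameters:", post)
print("posterior mean:", dirichlet_mean(post))
print("one posterior sample:", theta)
```

Each sample is a probability vector (non-negative entries that sum to one), which is why the distribution is a natural fit for multinomial data such as word or venue counts.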
 
 

Jobs

 
  • Akamai: Principal Data Scientist, Cambridge MA

    Akamai is growing fast and so is the volume of data we manage. We need someone with the experience and ambition to use this data to help lead change in our fast-changing, entrepreneurial environment. If you love analyzing and extracting meaningful insight from data, and then seeing your insight practically applied, this role is full of opportunity for you...
 
 

Training & Resources

 
  • Stanford Statistical Learning MOOC

    The new StatLearning course from Stanford University begins this week. Built on the OpenEdX platform, the free massive open online course (MOOC) is an excellent way to get up to speed with state-of-the-art machine learning from two of the foremost experts in the field: professors Trevor Hastie and Robert Tibshirani. Trevor and Rob are also co-authors of the textbook for the class: “An Introduction to Statistical Learning” which is provided as a free download...
  • A List of Data Science and Machine Learning Resources

    Every now and then I get asked for some help or for some pointers on a machine learning/data science topic. I tend to respond with links to resources by folks that I consider to be experts in the topic area. Over time my list has gotten a little larger so I decided to put it all together in a blog post...
  • Learning Curves in scikit-learn
    scikit-learn is a great machine learning library for Python. It offers a broad range of machine learning algorithms and tools. Learning curves are a new tool that was merged last week and I want to use that feature to point out why scikit-learn is such a great library...
  • What's New in ggplot-0.4?
    Data Science in Python by yhat: ggplot is a graphics package for Python that aims to approximate R's ggplot2 package in both usage and aesthetics. This is a post summarising the latest fixes and enhancements in the ggplot-0.4 release...
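
As a rough illustration of the learning-curve idea mentioned above, the sketch below fits a model on growing slices of a training set and records the training and validation error at each size. It deliberately uses only the Python standard library rather than scikit-learn's own API; the helper names, the synthetic data, and the choice of a one-variable least-squares line are all assumptions made for the example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(model, xs, ys):
    """Mean squared error of a fitted line on the given points."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def learning_curve_points(train, valid, sizes):
    """For each training size, return (size, training error, validation error)."""
    valid_xs, valid_ys = zip(*valid)
    rows = []
    for n in sizes:
        xs, ys = zip(*train[:n])
        model = fit_line(xs, ys)
        rows.append((n, mse(model, xs, ys), mse(model, valid_xs, valid_ys)))
    return rows


# Synthetic data (hypothetical): y = 2x + 1 plus unit-variance Gaussian noise.
rng = random.Random(0)
data = [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 1.0))
        for x in (rng.uniform(0.0, 10.0) for _ in range(200))]
train, valid = data[:150], data[150:]

curve = learning_curve_points(train, valid, sizes=[5, 20, 50, 150])
for n, train_err, valid_err in curve:
    print(f"n={n:4d}  train MSE={train_err:.3f}  validation MSE={valid_err:.3f}")
```

Plotting the two error columns against the training size gives the learning curve: a large gap between them suggests overfitting, while two high, close values suggest the model is too simple for the data.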
 
 
P.S. Did you enjoy the newsletter? Do you have friends/colleagues who might like it too? If so, please forward it along - we would love to have them on board :)
 