Data Science Weekly Newsletter - Issue 143

Aug 18 2016

Editor Picks
 
  • Machine Learning meets ketosis: How to effectively lose weight
    Early in the process I figured I could use machine learning to identify the factors that made me gain or lose weight. I used a simple method: every morning I would weigh myself and record both my new weight and whatever I had done in the past ~24 hours: not just the food I ate, but also whether I exercised, slept too little or too much, etc. The file I kept was fairly simple. A CSV with 3 columns...
  • This Supercomputer Will Try to Find Intelligence on Reddit
    Is it possible that the secret to building machine intelligence lies in spending endless hours reading Reddit? That’s one question a team of researchers at OpenAI, a nonprofit backed by several Silicon Valley luminaries, hopes to answer with a new kind of supercomputer developed by chipmaker Nvidia...
 
 

Data Science Articles & Videos

 
  • Using Rodeo To Transform Olympics Data Into GIFs
    Like many people around the globe, I've spent the past week and a half pretty captivated by the biggest sporting spectacle in the world, the Olympics. As I sat on my couch consuming some combination of pizza and beer and watching each genetic superhuman rack up medals, I noticed that certain countries seemed to stockpile wins each year while leaving others in the dust, and wondered how that had changed over time. Even better, would it be possible to find data on this and create some kind of visual from it? Turns out it is!...
  • Deep Learning in Fashion (Part 3): Clothing Matching Tutorial
    In Part 2 of this series, we discussed how e-commerce fashion sites typically make clothing recommendations based on image similarity (here’s a great tutorial on how to do that, by the way). But what if you could also recommend products based on how well they matched with a past purchase? We built a fashion matching demo to show how you can do just that, and I’ll also teach you how to build that model in Python through this tutorial...
  • Investigating Worker Exploitation in California
    At a recent DataKind SF event, I was intrigued by the challenges faced in investigating wage theft and other labor violations, not just throughout the nation but also specifically in California and the Bay Area...
  • Instagram photos reveal predictive markers of depression
    Using Instagram data from 166 individuals, we applied machine learning tools to successfully identify markers of depression. Statistical features were computationally extracted from 43,950 participant Instagram photos, using color analysis, metadata components, and algorithmic face detection. Resulting models outperformed general practitioners’ average diagnostic success rate for depression...
  • Visualizing Clusters of Clickbait Headlines Using Spark, Word2vec, and Plotly
    Facebook recently announced that they will punish Facebook posts that link to articles using clickbait headlines by limiting their exposure on the News Feed. They also announced that they have a large team manually classifying what is and isn’t clickbait. From my analysis of BuzzFeed headlines last year, I found that clickbait typically follows very specific tropes with phrases such as “The [X] Most” or “You Should Do.” It shouldn’t be that difficult to identify clickbait using heuristics/machine learning...
  • Forget Python vs. R: how they can work together
    A few weeks ago I had the opportunity to speak at SciPy about how we use both Python and R at Civis. Why go all the way to a Python conference to talk about R? Was I fanning the flames of yet another Python vs R language war? No! It turns out arguing which language is “better” is not a very good use of our time...
 
 

Jobs

 
  • Lead Data Scientist - Johns Hopkins University Applied Physics Laboratory - Laurel, MD
    Spearhead a data science/data analytics service, leveraging strong hands-on technical expertise and exceptional communication and interpersonal skills. Build relationships across the enterprise, working with sector technical staff, executives/managers, and subject matter experts within IT, InfoSec and other disciplines to identify high-value use cases for data science...
 
 

Training & Resources

 
  • Reinforcement Learning - Part 1
    I'm going to begin a multipart series of posts on Reinforcement Learning (RL) that roughly follow the 1998 textbook "Reinforcement Learning: An Introduction" by Sutton and Barto. From my research, this text still seems to be the most thorough introduction to RL I could find...
  • Machine Learning Exercises In Python, Part 1
    This blog post will be the first in a series covering the programming exercises from Andrew Ng's Coursera class. One aspect of the course that I didn't particularly care for was the use of Octave for assignments. Although Octave/Matlab is a fine platform, most real-world "data science" is done in either R or Python (certainly there are other languages and tools being used, but these two are unquestionably at the top of the list). Since I'm trying to develop my Python skills, I decided to start working through the exercises from scratch in Python...
 
 

P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :) - All the best, Hannah & Sebastian
 