Data Science Weekly Newsletter

Issue 143

August 18, 2016

Editor's Picks

Machine Learning meets ketosis: How to effectively lose weight
Early in the process I figured I could use machine learning to identify the factors that made me gain or lose weight. I used a simple method: every morning I would weigh myself and record both the new weight and whatever I did in the past ~24 hours: not just the food I ate, but also whether I exercised, slept too little or too much, etc. The file I kept was fairly simple. A CSV with 3 columns...
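
One way to turn such a daily log into per-factor estimates is an ordinary least-squares fit of the day-to-day weight change on 0/1 indicators. This is a minimal sketch, not the author's method: the factor names (`exercised`, `slept_little`) and the toy numbers are hypothetical, and the post's actual CSV layout is not reproduced here.

```python
import numpy as np

def factor_effects(X, delta_weight):
    """Estimate the daily weight change associated with each binary factor.

    X: (n_days, n_factors) 0/1 matrix of hypothetical daily factors.
    delta_weight: (n_days,) change in morning weight vs. the previous day.
    Returns (intercept, coefficients) from an ordinary least-squares fit.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(delta_weight, dtype=float)
    # Prepend a column of ones so the fit includes an intercept term.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]

# Hypothetical log; columns are [exercised, slept_little].
X = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1]]
dw = [-0.4, 0.3, -0.2, 0.1, -0.4, 0.3]
intercept, effects = factor_effects(X, dw)
```

A negative coefficient marks a factor associated with weight loss on that day; with real, noisy data the estimates are correlational, not causal.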

Visualisation of Global Cargo Ships
Incredible visualization of global shipping traffic...

This Supercomputer Will Try to Find Intelligence on Reddit
Is it possible that the secret to building machine intelligence lies in spending endless hours reading Reddit? That’s one question a team of researchers at OpenAI, a nonprofit backed by several Silicon Valley luminaries, hopes to answer with a new kind of supercomputer developed by chipmaker Nvidia...

Doing data science in Python?

Try out YhatHQ's free, lightweight Python IDE, Rodeo, today!
Available on Mac, Windows and Linux.

Using Rodeo To Transform Olympics Data Into GIFs
Like many people around the globe, I've spent the past week and a half pretty captivated by the biggest sporting spectacle in the world, the Olympics. As I watched each genetic superhuman rack up medals while consuming some combination of pizza and beer on my couch, I noticed that certain countries seemed to stockpile wins each year while leaving others in the dust, and wondered how that had changed over time. Even better, would it be possible to find data on this and create some kind of visual from it? Turns out, it is!...

Deep Learning in Fashion (Part 3): Clothing Matching Tutorial
In Part 2 of this series, we discussed how e-commerce fashion sites typically make clothing recommendations based on image similarity (here’s a great tutorial on how to do that, by the way). But what if you could also recommend products based on how well they matched with a past purchase? We built a fashion matching demo to show how you can do just that, and I’ll also teach you how to build that model in Python through this tutorial...

Investigating Worker Exploitation in California
At a recent DataKind SF event, I was rather intrigued by the challenges faced in investigating wage theft and other labor violations not just throughout the nation, but also specific to California and the Bay Area regions...

Snakes & Ladders - Calculate the average number of moves
Because as a parent one gets roped into these board (boring?) games every so often, and I wanted to calculate the average duration of a snakes and ladders game...
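
The average game length can be computed exactly by treating the board as an absorbing Markov chain: with Q the transition matrix among non-final squares, the expected number of rolls is E = (I − Q)⁻¹·1. A minimal sketch, assuming a single player, a fair die, and the house rule that any roll overshooting the last square still finishes (rules vary, and the post may use a different one); the `jumps` mapping of snakes and ladders is supplied by the caller.

```python
import numpy as np

def expected_moves(n_squares, jumps, die_faces=6):
    """Expected number of die rolls to finish, via an absorbing Markov chain.

    Squares are numbered 0 (start, off-board) to n_squares (finish).
    jumps: dict mapping the foot of a ladder / head of a snake to its destination.
    Assumption: a roll that overshoots the last square still finishes the game.
    """
    n = n_squares                  # index of the absorbing finish square
    Q = np.zeros((n, n))           # transitions among transient squares 0..n-1
    for s in range(n):
        for roll in range(1, die_faces + 1):
            t = min(s + roll, n)
            t = jumps.get(t, t)    # follow a snake or ladder if we land on one
            if t < n:
                Q[s, t] += 1.0 / die_faces
    # Expected steps to absorption from every square: solve (I - Q) E = 1.
    E = np.linalg.solve(np.eye(n) - Q, np.ones(n))
    return E[0]
```

For a standard 100-square board, call `expected_moves(100, jumps)` with `jumps` filled in from the actual board; the result depends entirely on that layout.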

Instagram photos reveal predictive markers of depression
Using Instagram data from 166 individuals, we applied machine learning tools to successfully identify markers of depression. Statistical features were computationally extracted from 43,950 participant Instagram photos, using color analysis, metadata components, and algorithmic face detection. Resulting models outperformed general practitioners’ average diagnostic success rate for depression...
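
As an illustration of what per-photo "color analysis" features can look like, the sketch below averages hue, saturation, and value (brightness) over a photo's pixels using the standard library's `colorsys`. The exact features used in the study are not reproduced here; the input format (a list of 0–255 RGB tuples) and the choice of HSV averages are assumptions.

```python
import colorsys

def mean_hsv(pixels):
    """Average hue, saturation and value over a list of (r, g, b) pixels in 0-255.

    Note: hue is circular, so a plain arithmetic mean is only a rough summary.
    """
    if not pixels:
        raise ValueError("need at least one pixel")
    h_sum = s_sum = v_sum = 0.0
    for r, g, b in pixels:
        # colorsys expects channels scaled to [0, 1].
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        h_sum += h
        s_sum += s
        v_sum += v
    n = len(pixels)
    return h_sum / n, s_sum / n, v_sum / n
```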

Visualizing Clusters of Clickbait Headlines Using Spark, Word2vec, and Plotly
Facebook recently announced that they will punish Facebook posts which link to articles using clickbait headlines by limiting their exposure on the News Feed. They also announced that they have a large team manually classifying what is and isn't linkbait. From my analysis of BuzzFeed headlines last year, I found that clickbait typically follows very specific tropes with phrases such as "The [X] Most" or "You Should Do." It shouldn't be that difficult to identify clickbait using heuristics/machine learning...
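
The heuristic half of that claim can be sketched with regular expressions built from the two tropes quoted above. This is not the author's Spark/word2vec pipeline; the patterns are hypothetical and deliberately narrow (the `[X]` in "The [X] Most" is assumed to be a number).

```python
import re

# Hypothetical patterns derived from the two tropes quoted above; a usable
# classifier would need many more, or a learned model.
CLICKBAIT_PATTERNS = [
    re.compile(r"\bthe\s+\d+\s+most\b", re.IGNORECASE),   # "The 17 Most ..."
    re.compile(r"\byou\s+should\s+do\b", re.IGNORECASE),  # "... You Should Do"
]

def looks_like_clickbait(headline: str) -> bool:
    """Return True if the headline matches any of the known clickbait tropes."""
    return any(p.search(headline) for p in CLICKBAIT_PATTERNS)
```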

1.1 Billion Taxi Rides with MapD & 8 Nvidia Pascal Titan Xs
In this blog post I'm going to see how much of an upgrade the new Pascal-based cards offer MapD when querying 1.1 billion taxi trips made in New York City over the course of six years...

Forget Python vs. R: how they can work together
A few weeks ago I had the opportunity to speak at SciPy about how we use both Python and R at Civis. Why go all the way to a Python conference to talk about R? Was I fanning the flames of yet another Python vs R language war? No! It turns out arguing which language is “better” is not a very good use of our time...

Lead Data Scientist - Johns Hopkins University Applied Physics Laboratory - Laurel, MD
Spearhead a data science/data analytics service, leveraging strong hands-on technical expertise, exceptional communications and interpersonal skills. Build relationships across the enterprise, working with sector technical staff, executives/managers, and subject matter experts within IT, InfoSec and other disciplines to identify high-value use cases for data science...

Reinforcement Learning - Part 1
I'm going to begin a multipart series of posts on Reinforcement Learning (RL) that roughly follow an old 1998 textbook, "Reinforcement Learning: An Introduction" by Sutton and Barto. From my research, this text still seems to be the most thorough introduction to RL I could find...
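
An early example in Sutton and Barto's text is the n-armed bandit, solved with ε-greedy action selection and incremental sample-average value estimates, Q ← Q + (R − Q)/N. The sketch below follows that textbook scheme; whether this series starts there is not stated in the excerpt, and the Gaussian reward model and arm means are assumptions.

```python
import random

def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a Gaussian n-armed bandit.

    true_means: hypothetical mean reward of each arm (unknown to the agent).
    Returns the agent's final action-value estimates and pull counts.
    """
    rng = random.Random(seed)
    n = len(true_means)
    Q = [0.0] * n      # estimated value of each arm
    N = [0] * n        # number of times each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(n)                    # explore: random arm
        else:
            a = max(range(n), key=lambda i: Q[i])   # exploit: best estimate
        reward = rng.gauss(true_means[a], 1.0)      # unit-variance reward
        N[a] += 1
        # Incremental sample average: Q <- Q + (R - Q) / N
        Q[a] += (reward - Q[a]) / N[a]
    return Q, N
```

Setting `epsilon=0` gives the purely greedy agent, which can lock onto a suboptimal arm early; the small exploration rate is what lets the estimates for every arm keep improving.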

JupyterLab: the next generation of the Jupyter Notebook
It's been a long time in the making, but today we want to start engaging our community with an early (pre-alpha) release of the next generation of the Jupyter Notebook application, which we are calling JupyterLab...

Machine Learning Exercises In Python, Part 1
This blog post will be the first in a series covering the programming exercises from Andrew Ng's Coursera class. One aspect of the course that I didn't particularly care for was the use of Octave for assignments. Although Octave/Matlab is a fine platform, most real-world "data science" is done in either R or Python (certainly there are other languages and tools being used, but these two are unquestionably at the top of the list). Since I'm trying to develop my Python skills, I decided to start working through the exercises from scratch in Python...
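
The first programming exercise in that course is linear regression fit by batch gradient descent. A minimal NumPy sketch of the update θ := θ − (α/m)·Xᵀ(Xθ − y) and the squared-error cost it minimizes; the toy data below is made up rather than taken from the course dataset, and the function names are illustrative.

```python
import numpy as np

def compute_cost(X, y, theta):
    """Squared-error cost J(theta) = 1/(2m) * sum((X @ theta - y)^2)."""
    m = len(y)
    residual = X @ theta - y
    return residual @ residual / (2 * m)

def gradient_descent(X, y, theta, alpha, iters):
    """Batch gradient descent for linear regression; returns theta and cost history."""
    m = len(y)
    costs = []
    for _ in range(iters):
        # Simultaneous update of all parameters from the full-batch gradient.
        theta = theta - (alpha / m) * (X.T @ (X @ theta - y))
        costs.append(compute_cost(X, y, theta))
    return theta, costs

# Toy data lying exactly on y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1 + 2 * x
X = np.column_stack([np.ones_like(x), x])  # prepend the intercept column
theta, costs = gradient_descent(X, y, np.zeros(2), alpha=0.05, iters=2000)
```

Plotting `costs` is a quick check that the learning rate is small enough: the curve should decrease steadily rather than oscillate or blow up.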

Machine Learning in Python: Essential Techniques for Predictive Analysis
Machine Learning in Python shows you how to successfully analyze data using only two core machine learning algorithms, and how to apply them using Python...

For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page...

‍