Data Science Weekly Newsletter

Issue 182

May 18, 2017

Editor's Picks

Applying Artificial Intelligence in Medicine: Our Early Results
Picture a world where your heart can be monitored continuously using a device you could purchase at a Best Buy or Target. Algorithms transform the raw data coming from your watch into diagnoses, and your doctor will be notified when a problem is detected. Today, Cardiogram is taking the first step down that path. We’ve developed an algorithm to use the Apple Watch to detect atrial fibrillation...

How Our Company Learned to Make Better Predictions About Everything
If an individual can gain a predictive edge, so can a company. At Twitch, a subsidiary of Amazon, we created a program that teaches all our employees to become better forecasters regardless of their quantitative background, organizational role, or area of expertise...

The Cost of Doing Data Science on Laptops
At the heart of the data science process are the resource-intensive tasks of modeling and validation. During these tasks, data scientists will try and discard thousands of temporary models to find the optimal configuration. Even for small data sets, this could take hours to process. Because of this, data scientists who rely on their laptops or departmental servers for processing power must choose between fast processing time and model complexity. In either case, performance and revenue suffer...

A Message From This Week's Sponsor

[WHITEPAPER] Applied Data Science by Yhat, Inc.

This is a white paper about data science teams and how companies apply their insights to the real world. You’ll learn how successful data science teams are composed and operate and which tools and technologies they are using.

Data Science Articles & Videos

Are Pop Lyrics Getting More Repetitive?
In 1977, the great computer scientist Donald Knuth published a paper called The Complexity of Songs, which is basically one long joke about the repetitive lyrics of newfangled music (example quote: "the advent of modern drugs has led to demands for still less memory, and the ultimate improvement of Theorem 1 has consequently just been announced"). I'm going to try to test this hypothesis with data. I'll be analyzing the repetitiveness of a dataset of 15,000 songs that charted on the Billboard Hot 100 between 1958 and 2017...

I don’t know Fisher’s exact test, but I know Stan
A few days ago, I watched a terrific lecture by Bob Carpenter on Bayesian models. He started with a Bayesian approach to Fisher’s exact test. I had never heard of this classical procedure, so I was curious to play with the example. In this post, I use the same data that he used in the lecture and in an earlier, pre-Stan blog post. I show how I would go about fitting the model in Stan and inspecting the results in R...

Home advantages and wanderlust
Analyzing the home advantage in English soccer, with R...

Google’s neural network-generated custom face stickers are like Bitmoji that aren’t horrible
So let me just say really quick that I really dislike Bitmoji. Pretty much everything about it/them. The one thing that’s good about Bitmoji is that the user really can easily customize a representative avatar, which is good for inclusion, even if the results are universally terrible in every way. Fortunately Google has just blown Bitmoji out of the water with a genuinely excellent alternative...

Facebook Wants to Merge AI Systems for a Smarter Chatbot
A framework called ParlAI will let researchers combine dialogue systems and get feedback from real humans...

The Making of the Weighted Pivot Scatter Plot
I recently published a story that tried to answer the question: what city is the microbrew capital of the US? One of the graphics in the story allows the user to adjust some parameters to change the rankings, and see how the data can be manipulated to yield different results. It looks like this...

Understanding deep learning requires re-thinking generalization
This paper has a wonderful combination of properties: the results are easy to understand, somewhat surprising, and then leave you pondering over what it all might mean for a long while afterwards! The question the authors set out to answer was this: What is it that distinguishes neural networks that generalize well from those that don’t?...

From Physics to Finance: My First Year in [Data Science] Industry
As the first data science hire in a startup, the opportunities to interact with and learn from other groups are limitless. I communicate with engineers, business development, marketing, customer success and design teams on a regular basis: a unique opportunity to learn about other fields, how different departments operate, and financial technology as a whole. Here are some lessons I have learned along the way...

Jobs

Senior Data Engineer - Sprout Social - Chicago, IL
Sprout’s data team uses code, statistics, and machine learning to inform our products and organization. We’re looking for an experienced Data Engineer to join our team of data scientists and engineers.
Our team uses Python and its data stack (pandas, scikit-learn, Apache Airflow), along with a little Apache Spark and a lot of Amazon Redshift to drive decisions and power software that is used by more than 17,000 brands around the world. Companies like Microsoft, Zipcar, Hyatt, Google, and Zendesk rely on Sprout to create stronger relationships with their customers through social media.
We’re looking for curious, analytical, and creative people to help utilize the vast amount of data we have. If you love finding ways of using data to build better products and solve problems, we’d love to talk with you...

Training & Resources

Interesting talks from PyData London 2017
I made the time to watch all of the talks from the conference, and write a blog post about the ones I found the most interesting...

The Behance Artistic Media Dataset
2.5M artwork URLs, 393K attribute labels, 74K short image descriptions/captions...

The Hitchhiker’s Guide to d3.js
This guide is meant to prepare you mentally as well as give you some fruitful directions to pursue. There is a lot to learn besides the d3.js API, both technical knowledge around web standards like HTML, SVG, CSS and JavaScript as well as communication concepts and data visualization principles. Chances are you know something about some of those things, so this guide will attempt to give you good starting points for the things you want to learn more about...

Books

Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia
"Critical book for anyone interested in the way technology is shaping our cities. Great collection of many historical and current endeavors, and a strong perspective on the different approaches to solving urban issues with technology."...

For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page...

Looking to hire a Data Scientist? Find an awesome one among our readers! Email us for details on how to post your job :)

All the best,
Hannah & Sebastian
