Data Science Weekly Newsletter

Issue 194

August 10, 2017

Editor's Picks

A computer was asked to predict which start-ups would be successful. The results were astonishing
In 2009, Ira Sager of Businessweek magazine set a challenge for Quid AI's CEO Bob Goodson: programme a computer to pick 50 unheard-of companies that are set to rock the world. Nearly eight years later, the magazine revisited the list to see how “Goodson plus the machine” had performed. The results surprised even Goodson: Evernote, Spotify, Etsy, Zynga, Palantir, Cloudera, OPOWER – the list goes on...

Dots vs. polygons: How I choose the right visualization
When I start designing a map I consider: How do I want the viewer to read the information on my map? Do I want them to see how a measurement varies across a geographic area at a glance? Do I want to show the level of variability within a specific region? Or do I want to indicate busy pockets of activity or the relative volume/density within an area?...

PyTorch vs TensorFlow — spotting the difference
In this post I want to explore some of the key similarities and differences between two popular deep learning frameworks: PyTorch and TensorFlow. Why those two and not the others? There are many deep learning frameworks and many of them are viable tools; I chose those two just because I was interested in comparing them specifically...

A Message From This Week's Sponsor

Get started with Python for data science in minutes

Using Python for data science and machine learning is easy with ActiveState’s Python distribution. Pre-bundled with 300+ packages, ActivePython includes NumPy, SciPy, scikit-learn, TensorFlow, Theano and Keras, and leverages the Intel Math Kernel Library, so you can focus on your data and not on setting up software. Download ActivePython and start developing for free.

Data Science Articles & Videos

What New York Subway Stations Actually Look Like
Subway stations’ complex tunnel systems are a mystery even to most regular riders. Architect Candy Chan’s new X-ray maps demystify the paths in and around them...

Transitioning entirely to neural machine translation
Creating seamless, highly accurate translation experiences for the 2 billion people who use Facebook is difficult. We need to account for context, slang, typos, abbreviations, and intent simultaneously. To continue improving the quality of our translations, we recently switched from using phrase-based machine translation models to neural networks to power all of our backend translation systems, which account for more than 2,000 translation directions and 4.5 billion translations each day...

An Algorithm Trained on Emoji Knows When You’re Being Sarcastic on Twitter
Detecting the sentiment of social-media posts is already useful for tracking attitudes toward brands and products, and for identifying signals that might indicate trends in the financial markets. But more accurately discerning the meaning of tweets and comments could help computers automatically spot and quash abuse and hate speech online. A deeper understanding of Twitter should also help academics understand how information and influence flow through the network. What’s more, as machines become smarter, the ability to sense emotion could become an important feature of human-to-machine communication...

Whiz Kid Invents an AI System to Diagnose Her Grandfather's Eye Disease
Kopparapu and her team—including her 15-year-old brother, Neeyanth, and her high school classmate Justin Zhang—trained an artificial intelligence system to recognize signs of diabetic retinopathy in photos of eyes and offer a preliminary diagnosis. She presented the system last month...

How Machine Learning Is Helping Neuroscientists Crack Our Neural Code
A big challenge in neuroscience is understanding how the brain encodes information. Neural networks are turning out to be great code crackers...

On the Effects of Batch & Weight Normalization in GANs
We introduce a weight normalization (WN) approach for GAN training that significantly improves the stability, efficiency and the quality of the generated samples...
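
For readers unfamiliar with the term: in its standard form, weight normalization rewrites each weight vector as a direction scaled by a learned length, w = g * v / ||v||. The sketch below shows only that standard reparameterization in plain Python. The function name and example values are illustrative assumptions, and the GAN-specific variant the paper proposes may differ.

```python
import math


def weight_norm(v, g):
    """Standard weight normalization: return g * v / ||v||.

    v is an unnormalized direction vector (a list of floats) and g is a
    scalar length. Both names are illustrative, not taken from the paper.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction vector v must be non-zero")
    return [g * x / norm for x in v]


# The resulting vector points along v and has Euclidean length g.
w = weight_norm([3.0, 4.0], g=2.0)
print(w)
```

Decoupling length from direction in this way means the scale of each weight vector is controlled by a single parameter, which is the property usually credited with making training better behaved.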

Getting Deep Recommenders Fit: Bloom Embeddings for Sparse Binary Input/Output Networks
We propose Bloom embeddings, a compression technique that can be applied to the input and output of neural network models dealing with sparse high-dimensional binary-coded instances. Bloom embeddings are computationally efficient, and do not seriously compromise the accuracy of the model up to 1/5 compression ratios...
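
To make the idea concrete: a Bloom-filter-style encoding hashes each active index of a sparse binary vector to k positions in a much shorter vector of length m. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the MD5-based hashing, and the sizes (an original dimension of 1000 compressed to m = 200, a 1/5 ratio) are all chosen here for illustration.

```python
import hashlib


def bloom_positions(index, m, k):
    """Return the k positions in [0, m) that one original index hashes to.

    Seeded MD5 digests stand in for k independent hash functions; this is
    an illustrative choice, not the hashing scheme from the paper.
    """
    positions = []
    for seed in range(k):
        digest = hashlib.md5(f"{seed}:{index}".encode("utf-8")).hexdigest()
        positions.append(int(digest, 16) % m)
    return positions


def bloom_encode(active_indices, m, k):
    """Compress a sparse binary vector, given by its active indices, to length m."""
    encoded = [0] * m
    for index in active_indices:
        for pos in bloom_positions(index, m, k):
            encoded[pos] = 1
    return encoded


def maybe_active(index, encoded, k):
    """Bloom-style check: False means definitely inactive, True means possibly active."""
    return all(encoded[pos] == 1 for pos in bloom_positions(index, len(encoded), k))


# Three active items out of a hypothetical 1000, compressed to 200 dimensions.
active = {3, 417, 998}
encoded = bloom_encode(active, m=200, k=4)
print(len(encoded), sum(encoded))
print(all(maybe_active(i, encoded, k=4) for i in active))
```

As with any Bloom filter, the check can return false positives but never false negatives, so shrinking m trades memory for a higher chance of collisions.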

Exploring the census income dataset using bubble plot
When exploring a data set, we look at the connections between different features in the data and between the features and the target. This can give us a lot of insight into how we should formulate the problem, the required preprocessing (missing values, normalization), which algorithm we should use to build our model, whether we should segment our data and build different models for different subsets of our dataset, etc...

Jobs

Data Scientist - BuzzFeed - New York City, USA
BuzzFeed’s data science team is diverse, coming from varying backgrounds, experiences, and skill sets. The team uses data-driven methods to power decisions, inform strategy, build robust data products, and identify opportunities for innovation across the company. We are true hybrids - software engineers, statisticians, mathematicians, domain experts and analysts - who specialize in translating questions into methodical approaches, experiments, and products. We think deeply about the limitations of data, and communicate our output coherently...

Training & Resources

hipsteR: re-educating people who learned R before it was cool
I was an early adopter of R, having first learned S (yay!) and then S-plus (yuck!). But at times my knowledge of R seems stuck in 2001. I keep finding out about “new” R functions (like replicate, which was new in 2003). This is a tutorial for people like me, or people who were taught by people like me...

Tidyverse
Welcome to the new and improved tidyverse website. We are working hard to make tidyverse.org the place to go to learn the tidyverse and to keep up to date with it as it evolves...

Diamond Part 1
We are excited to announce Diamond, an open-source Python solver for certain kinds of generalized linear models. This post covers the mathematics used by Diamond. The sister post covers the specifics of Diamond. If you just want to use the package, check out the GitHub page...

Books

The Book of R: A First Course in Programming and Statistics
"The Book of R is a comprehensive, beginner-friendly guide to R, the world’s most popular programming language for statistical analysis. Even if you have no programming experience and little more than a grounding in the basics of mathematics, you’ll find everything you need to begin using R effectively for statistical analysis"...

For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page...

Want to be a Data Scientist? We've put together a comprehensive guide to help get you started. Check it out here!
:) - All the best, Hannah & Sebastian
