Data Science Weekly Newsletter

Robots helped inspire deep learning and might become its killer app
Andrew Ng knows a lot about both deep learning and robotics, and he recently presented on how the former might affect the latter. Robot brains will need to train on a lot of data, and deep learning seems like a good way to do it...

Why local state is a fundamental primitive in stream processing
What do you get if you cross a distributed database with a stream processing system?...

SparklingPandas
SparklingPandas aims to make it easy to use the distributed computing power of PySpark to scale your data analysis with Pandas...

How to Use a Decision Tree to Trade Bank of America Stock
In our last article we went through a basic example of building a machine-learning algorithm to predict the direction of Apple stock. Now we’ll explore how you can actually use these algorithms to help you come up with your own strategy...

Social Media & Machine Learning Tell Stores Where to Locate: Dmytro Karamshuk Interview
We recently caught up with Dmytro Karamshuk, Researcher in Computer Science and Engineering at King's College London - investigating data mining, complex networks, human mobility and mobile networks. We were keen to learn more about his background, how human mobility modeling has evolved and what his research has uncovered in terms of applying machine learning to social media data to determine optimal retail store placement...

Decision Tree (CART) – Retail Case Study
In this article, we will discuss a type of decision tree called classification and regression tree (CART) to develop a quick & dirty model for a retail case...

Tracking Movers & Shakers: Tackling Human Resources News Classification
A detailed review of CB Insights' HR news classification algorithms using supervised machine learning...

Big Public Data to Predict Crowd Behavior: Nathan Kallus Interview
We recently caught up with Nathan Kallus, PhD Candidate at the Operations Research Center at MIT. We were keen to learn more about his background, his research into data-driven decision making and the recent work he's done using big public data to predict crowd behavior - especially as it relates to social unrest...

Sentiment Analysis on Movie Reviews
I've been playing with this problem the open-source way, putting all my code on GitHub. The other day I got lucky and reached 2nd place with a score of 0.65 and I thought it would be nice to share what I did with everybody...

Markov Chains - A Visual Explanation
Markov chains, named after Andrey Markov, are mathematical systems that hop from one "state" (a situation or set of values) to another...

Scaled Inference Wants To Be The Google Brain For Everyone
Google Brain, an artificial intelligence and machine learning project at Google, has been used to power services like Android’s speech recognition system and photo search on Google+. Now, two of the most longstanding machine learning engineers, one of whom worked on Google Brain, have left the search giant to start a new company...

Six Steps To Take Before Pursuing Education To Get A Data Science Job
Do you find yourself thinking I'm a bit of an unlikely candidate to do Data Science because I don't have a PhD in Computer Science or Mathematics? Do you feel like almost all of the companies hiring Data Scientists are looking for people with advanced degrees, so you find yourself wondering if you should go through 5-6 years in a PhD program? As you look at the "Data Science Venn Diagram" or the "Becoming a Data Scientist – Curriculum via Metromap" do you find yourself wondering if you have what it takes to make it?...

Data Scientist, Analytics - Facebook - New York
We’re looking for data scientists with a passion for social media to work on our core products and help drive informed business decisions for Facebook. You will enjoy working with one of the richest data sets in the world, cutting edge technology, and the ability to see your insights turned into real products on a regular basis. The perfect candidate will have a background in computer science or a related technical field, will have experience working with large data stores, and will have some experience building software...

One Hundred Million Creative Commons Flickr Images for Research
Today, we are announcing the Flickr Creative Commons dataset as part of Yahoo Webscope’s datasets for researchers. The dataset, we believe, is one of the largest public multimedia datasets that has ever been released—99.3 million images and 0.7 million videos, all from Flickr and all under Creative Commons licensing...

Probabilistic Programming in Python
Talk and supporting slides and code samples...

Probabilistic Approaches to Recommendations
Just released!...
"This book starts with a brief summary of the recommendation problem and its challenges and a review of some widely used techniques. Next, we introduce and discuss probabilistic approaches for modeling preference data. We focus our attention on methods based on latent factors, such as mixture models, probabilistic matrix factorization, and topic models, for explicit and implicit preference data. These methods represent a significant advance in the research and technology of recommendation. The resulting models allow us to identify complex patterns in preference data, which can be exploited to predict future purchases effectively..."

Editor's Picks