**Special Notice**: Folks, this is our 100th Edition. We just wanted to take this opportunity to thank you for joining, sharing and being part of this community!
How Tesla is ushering in the age of the Learning Car
Tesla’s new autopilot system relies on the cutting edge of machine learning, connectivity and mapping data. While Tesla’s new hands-free driving is drawing a lot of interest this week, it’s the technology behind the scenes of the company’s newly enabled autopilot service that should be getting more attention...
Data Infrastructure at IFTTT
Since data is so critical to IFTTT, and given that our services generate billions of events per day, our data infrastructure must be highly scalable, available, and flexible enough to keep up with rapid product iteration. In this post, we’ll walk you through a high level overview of our data infrastructure and architecture. We’ll also share some of the insights we’ve gained building and operating data at IFTTT...
A Message From This Week's Sponsor
Query 1.7 Billion Reddit Comments on Hadoop with Anaconda Platform
Continuum Analytics data scientists Kris Overholt and Daniel Rodriguez walk through spinning up a Hadoop cluster on AWS, querying and exploring 975GB of Reddit data, and creating interactive visualizations in the browser with Bokeh. Read the Tutorial Here.
Data Science Articles & Videos
How To Start A Data Science Project When You Are A Beginner
You know you should have some data science projects on your resume/portfolio to show what you know. The only problem is that although you've taken some intro courses at your school, gone through some MOOCs, and read a few blog posts, when you look at other people's work you think it's out of your league...
Premier League Projections and New Expected Goals
Once more into the breach. We're eight weeks into the English Premier League season, and things are (still) moderately weird, with Chelsea nosediving and Liverpool and Tottenham trailing Leicester City and Crystal Palace. Can Chelsea recover? Are any of the newly-risen top four actual contenders for those positions? Is Tottenham in trouble? To answer these questions, I am running projections...
Improving YouTube video thumbnails with deep neural nets
Inspired by the recent remarkable advances of deep neural networks (DNNs) in computer vision, such as image and video classification, our team has recently launched an improved automatic YouTube "thumbnailer" in order to help creators showcase their video content. Here is how it works...
Deep Learning Startups, Applications and Acquisitions – A Summary
Most major tech companies use Deep Learning techniques in one way or another, and many have new initiatives on the way. Naturally there are a lot of startups doing cool things in the space. I tried to do my best to categorize the companies below based on where their main focus seems to be...
How intelligent data platforms are powering smart cities
According to a 2014 U.N. report, 54% of the world's population resides in urban areas, with further urbanization projected to push that share up to 66% by the year 2050. This projected surge in population has encouraged local and national leaders throughout the world to rally around "smart cities" — a collection of digital and information technology initiatives designed to make urban areas more livable, agile, and sustainable...
Win probability plots -- useful tool?
It's been an interesting weekend for win probability models. In case you missed it, on Saturday, Michigan State improbably returned a fumble for a touchdown to win a game in which they never held a lead. This morning, ESPN Stats and Info tweeted the following, indicating that MSU had, in a single play, transitioned from a 0.2% win probability to 100%...
Understanding empirical Bayes estimation (using baseball statistics)
This post isn’t really about baseball, I’m just using it as an illustrative example. (I actually know very little about sabermetrics. If you want a more technical version of this post, check out this great paper). This post is, rather, about a very useful statistical method for estimating a large number of proportions, called empirical Bayes estimation...
Analyzing Pronto CycleShare Data with Python and Pandas
This week Pronto CycleShare, Seattle's Bicycle Share system, turned one year old. To celebrate this, Pronto made available a large cache of data from the first year of operation and announced the Pronto Cycle Share's Data Challenge, which offers prizes for different categories of analysis. There are a lot of tools out there that you could use to analyze data like this, but my tool of choice is (obviously) Python...
Data Scientist - About.com - NYC
At About we have Internet data stretching back before the existence of Google – millions of articles created over several decades. The Data Science team works on high-impact projects utilizing predictive analytics, natural language processing and machine learning. As a Data Scientist it will be your responsibility to use the unique datasets at our fingertips to drive insights and innovation across the company...
Training & Resources
Theoretical Motivations for Deep Learning
This post is based on the lecture “Deep Learning: Theoretical Motivations” given by Dr. Yoshua Bengio at Deep Learning Summer School, Montreal 2015. I highly recommend the lecture for a deeper understanding of the topic...
Data Scientists at Work
Collection of interviews with sixteen of the world's most influential and innovative data scientists...
"Excellent book. It was fascinating to learn how the great minds behind our most popular Internet sites evolved and are affecting our future..."...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page...
Enjoyed the first 100 newsletters and want to energize us for the next 100? Buy us a coffee to keep us going :)