Data Science Weekly Newsletter

Issue 116

February 11, 2016

Editor's Picks

Text Mining South Park
South Park, an adult animated television series spanning nearly 20 years, follows four main characters (Stan, Kyle, Cartman and Kenny) and an extensive ensemble cast of recurring characters. This analysis reviews their speech to determine which words and phrases are distinct for each character...

What has Kaggle learned from 2 million machine learning models?
Kaggle is a community of almost 450K data scientists who have built nearly 2MM machine learning models to participate in our competitions. Data scientists come to Kaggle to learn, collaborate and develop the state of the art in machine learning. This talk will cover some of the lessons on winning techniques we have learned from the Kaggle community...

Two Minute Papers - How Do Genetic Algorithms Work?
Genetic algorithms are in the class of evolutionary algorithms that build on the principle of "survival of the fittest". By recombining the best solutions of a population and every now and then mutating them, one can solve remarkably difficult problems that would otherwise be hopelessly difficult to write programs for...
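
The select, recombine and mutate loop described in this blurb can be sketched in a few lines of Python. This is a minimal illustration and not code from the video: the `evolve` function, its parameters, and the toy goal (maximize the number of 1s in a bit string) are all assumptions chosen for brevity.

```python
import random

def evolve(fitness, genome_length, population_size=30, generations=60,
           mutation_rate=0.02, seed=0):
    """Toy genetic algorithm over bit strings (hypothetical helper)."""
    rng = random.Random(seed)
    # Start from a random population of bit strings.
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # "Survival of the fittest": keep the better half as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[:population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            # Recombine two distinct parents at a random cut point.
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_length)
            child = mother[:cut] + father[cut:]
            # Every now and then, flip a bit (mutation).
            child = [1 - gene if rng.random() < mutation_rate else gene
                     for gene in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Assumed toy objective: the fitness of a genome is its number of 1s.
best = evolve(fitness=sum, genome_length=20)
print(best, sum(best))
```

Because the parents survive unchanged into the next generation, the best fitness in the population never decreases from one generation to the next.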

DataNerd
Start a FREE Trial with New Relic and we'll send you this geek-chic shirt 4 FREE

Go, Marvin Minsky, and the Chasm that AI Hasn’t Yet Crossed
An expert in AI separates fact from hype in the wake of DeepMind’s victory over humans in the most challenging game of all...

The Ethical Data Scientist
People place too much trust in numbers as intrinsically objective...

The happiness paradox: your friends are happier than you
Most individuals in social networks experience a so-called Friendship Paradox: they are less popular than their friends on average. This effect may explain recent findings that widespread social network media use leads to reduced happiness. However, the relation between popularity and happiness is poorly understood...
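
The popularity half of the paradox is easy to check on a small graph. The sketch below is not taken from the paper: the helper names and the five-person star network are assumptions for illustration, and it measures only friend counts, not happiness.

```python
def build_graph(edges):
    """Build an undirected friendship graph as {person: set_of_friends}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def fraction_less_popular_than_friends(graph):
    """Share of people with fewer friends than their friends have on average."""
    count = 0
    for person, friends in graph.items():
        mean_friend_degree = sum(len(graph[f]) for f in friends) / len(friends)
        if len(friends) < mean_friend_degree:
            count += 1
    return count / len(graph)

# Assumed toy network: one well-connected "hub" with four friends
# who know nobody else.
edges = [("hub", name) for name in ("a", "b", "c", "d")]
graph = build_graph(edges)
fraction = fraction_less_popular_than_friends(graph)
print(fraction)  # 0.8: four of the five people are less popular than their friends
```

Each of the four peripheral people has one friend (the hub, who has four), so only the hub escapes the paradox.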

AI is Transforming Google Search. The Rest of the Web is Next
Yesterday the Google veteran who oversees the company’s search engine, Amit Singhal, announced his retirement. And in short order, Google revealed that Singhal’s rather enormous shoes would be filled by a man named John Giannandrea. On one level, these are just two guys doing something new with their lives. But you can also view the pair as the ideal metaphor for a momentous shift in the way things work inside Google—and across the tech world as a whole. Giannandrea, you see, oversees Google’s work in artificial intelligence...

Introducing Vector Networks
The pen tool as we know it today was originally introduced in 1987 and has remained largely unchanged since then. We decided to try something new when we set out to build the vector editing toolset for Figma. Instead of using paths like other tools, Figma is built on something we’re calling vector networks which are backwards-compatible with paths but which offer much more flexibility and control...

Landscapes of Data Infection
Plants “infected” with data: information can be stored in seeds, grown into readable landscapes...

Class visualization with bilateral filters
A while ago I played with style visualizations and bilateral filters. The latter have the nice property of filtering out noise but preserving edges. Here are some example classes from GoogLeNet (Inception network)...

Python and Parallelism or Dask
Talk by Matthew Rocklin (Continuum Analytics) at the International Conference on Machine Learning...

Interviewing Data Science Interns at Analytical Flavor Systems
We could exclusively hire interns who have previous experience with machine learning or “data science”... but we’d miss out on great candidates who are smart and driven to learn. So how can we structure a data science interview for students who may not know data splitting, feature engineering, pre-processing, model building and hyperparameter optimization, model stacking, and withholding set validation?...

Data Scientist - Murmuration - New York
Murmuration seeks massively improved education outcomes for kids by providing information, infrastructure, and support for education-related public advocacy and community building efforts. We are looking for an experienced, innovative Data Scientist to join our rapidly growing internal analytics team. The ideal candidate is more than a number cruncher. The role calls for strong expertise in predictive modeling, statistical analysis, and data visualization as well as the ability to clearly communicate complex analysis to non-technical audiences...

How to become a Bayesian in eight easy steps: An annotated reading list
We wrote an annotated reading list to get you started in learning Bayesian statistics...

Auto-scaling scikit-learn with Spark
We are excited to release a scikit-learn integration package for Spark that dramatically simplifies the life of data scientists using Python. This package, published as databricks:spark-sklearn (or spark-sklearn for short), automatically distributes the most repetitive tasks of model tuning on a Spark cluster, without impacting the workflow of data scientists:...

Data Science Deep Dive: Using the RevoScaleR Packages
This tutorial is an introduction to the enhanced R packages provided in SQL Server R Services. You will learn how to use the scalable enterprise framework for execution of R packages in Microsoft SQL Server 2016. A data scientist can use this new service to build custom R solutions that run in either local or server contexts, to support high-performance big data analytics...

Introduction to Algorithms: Comprehensive textbook covering the full spectrum of modern algorithms
"I have studied algorithms using several books, and this is by far the best..."... For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page...
