Data Science Weekly Newsletter - Issue 116

Issue #116

February 11, 2016

Editor Picks
 
  • Text Mining South Park
    South Park, an adult animated television series spanning nearly 20 years, follows four main characters (Stan, Kyle, Cartman and Kenny) and an extensive ensemble cast of recurring characters. This analysis reviews their speech to determine which words and phrases are distinct for each character...
  • What has Kaggle learned from 2 million machine learning models?
    Kaggle is a community of almost 450K data scientists who have built nearly 2MM machine learning models to participate in our competitions. Data scientists come to Kaggle to learn, collaborate and develop the state of the art in machine learning. This talk will cover some of the lessons on winning techniques we have learned from the Kaggle community...
  • Two Minute Papers - How Do Genetic Algorithms Work?
    Genetic algorithms are in the class of evolutionary algorithms that build on the principle of "survival of the fittest". By recombining the best solutions of a population and every now and then mutating them, one can solve remarkably difficult problems that would otherwise be hopelessly difficult to write programs for...
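
    The recombine-and-mutate loop described in the genetic algorithms item above can be sketched in a few lines of Python. This is a minimal illustration, not the method from the video: the toy "count the 1s" objective, the function names (fitness, evolve) and every parameter value are assumptions chosen for brevity.

```python
import random


def fitness(bits):
    """Toy objective (an assumption for illustration): count the 1s."""
    return sum(bits)


def evolve(n_bits=20, pop_size=30, generations=60, mutation_rate=0.02, seed=0):
    """Run a tiny genetic algorithm and return (best individual, fitness history)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Rank the population; the fitter half gets to reproduce.
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]
        # Elitism: carry the current best over unchanged.
        children = [population[0][:]]
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randint(1, n_bits - 1)  # single-point crossover
            child = mum[:cut] + dad[cut:]
            # Every now and then, flip a bit.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


if __name__ == "__main__":
    best, history = evolve()
    print("best fitness:", fitness(best), "out of", len(best))
```

    Because the best individual is always copied into the next generation, the recorded best fitness can never go down from one generation to the next.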
 
 

Data Science Articles & Videos

 
  • The happiness paradox: your friends are happier than you
    Most individuals in social networks experience a so-called Friendship Paradox: they are less popular than their friends on average. This effect may explain recent findings that widespread social network media use leads to reduced happiness. However, the relation between popularity and happiness is poorly understood...
  • AI is Transforming Google Search. The Rest of the Web is Next
    Yesterday the Google veteran who oversees the company’s search engine, Amit Singhal, announced his retirement. And in short order, Google revealed that Singhal’s rather enormous shoes would be filled by a man named John Giannandrea. On one level, these are just two guys doing something new with their lives. But you can also view the pair as the ideal metaphor for a momentous shift in the way things work inside Google—and across the tech world as a whole. Giannandrea, you see, oversees Google’s work in artificial intelligence...
  • Introducing Vector Networks
    The pen tool as we know it today was originally introduced in 1987 and has remained largely unchanged since then. We decided to try something new when we set out to build the vector editing toolset for Figma. Instead of using paths like other tools, Figma is built on something we’re calling vector networks which are backwards-compatible with paths but which offer much more flexibility and control...
  • Class visualization with bilateral filters
    A while ago I played with style visualizations and bilateral filters. The latter have the nice property of filtering out noise but preserving edges. Here are some example classes from GoogLeNet (Inception network)...
  • Interviewing Data Science Interns at Analytical Flavor Systems
    We could exclusively hire interns who have previous experience with machine learning or “data science”…but we’d miss out on great candidates who are smart and driven to learn. So how can we structure a data science interview for students who may not know data splitting, feature engineering, pre-processing, model building and hyperparameter optimization, model stacking, and withholding set validation?...
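
    The Friendship Paradox from the first item in this list (most people are less popular than their friends on average) is easy to check on a small graph. The sketch below is only an illustration: the four-person network and the names in it are made up, and "popularity" is assumed to mean number of friends.

```python
# Hypothetical undirected friendship graph: person -> set of friends.
FRIENDS = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"ann"},
}


def friend_count(graph, person):
    """Popularity of a person, taken here to be their number of friends."""
    return len(graph[person])


def mean_friend_count_of_friends(graph, person):
    """Average popularity of this person's friends."""
    friends = graph[person]
    return sum(friend_count(graph, f) for f in friends) / len(friends)


def less_popular_than_friends(graph):
    """People whose friend count is below their friends' average."""
    return {
        p
        for p in graph
        if friend_count(graph, p) < mean_friend_count_of_friends(graph, p)
    }


if __name__ == "__main__":
    for person in sorted(FRIENDS):
        own = friend_count(FRIENDS, person)
        theirs = mean_friend_count_of_friends(FRIENDS, person)
        print(f"{person}: {own} friends, friends average {theirs:.2f}")
    print("less popular than their friends:", sorted(less_popular_than_friends(FRIENDS)))
```

    In this toy network only the well-connected "ann" escapes the paradox; the other three each have fewer friends than their friends do on average.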
 
 

Jobs

 
  • Data Scientist - Murmuration - New York

    Murmuration seeks massively improved education outcomes for kids by providing information, infrastructure, and support for education-related public advocacy and community building efforts. We are looking for an experienced, innovative Data Scientist to join our rapidly growing internal analytics team. The ideal candidate is more than a number cruncher. The role calls for strong expertise in predictive modeling, statistical analysis, and data visualization as well as the ability to clearly communicate complex analysis to non-technical audiences...
 
 

Training & Resources

 
  • Auto-scaling scikit-learn with Spark
    We are excited to release a scikit-learn integration package for Spark that dramatically simplifies the life of data scientists using Python. This package, published as databricks:spark-sklearn (or spark-sklearn for short), automatically distributes the most repetitive tasks of model tuning on a Spark cluster, without impacting the workflow of data scientists:...
  • Data Science Deep Dive: Using the RevoScaleR Packages
    This tutorial is an introduction to the enhanced R packages provided in SQL Server R Services. You will learn how to use the scalable enterprise framework for execution of R packages in Microsoft SQL Server 2016. A data scientist can use this new service to build custom R solutions that run in either local or server contexts, to support high-performance big data analytics...
 
 

Books

 

  • Introduction to Algorithms

    Comprehensive textbook covering the full spectrum of modern algorithms

    "I have studied algorithms using several books, and this is by far the best..."

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.
 
 
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :) - All the best, Hannah & Sebastian
 