Data Science Weekly Newsletter - Issue 132

June 2, 2016

Editor Picks
  • The Making Of A Machine Learning Cheatsheet: Emoji Edition
    I've mentioned this before, but I really love emoji. I spend so much of my time communicating with friends and family on chat, and emoji bring necessary animation to words that might otherwise look flat on the screen...Another thing I love is data science. The more I learn about machine learning algorithms, the more challenging it is to keep these subjects organized in my brain to recall at a later time. So, I decided to marry these two loves in as productive a fashion as possible...

A Message from this week's Sponsor:


  • “The Science of Data-Driven Storytelling”
    DataScience Inc. and the National Science Foundation’s West Big Data Innovation Hub have brought together leaders in academia, the non-profit sector, government, data science and publishing to discuss best practices for creating impactful data-driven stories. Click here to register for the live-streamed workshop, “The Science of Data-Driven Storytelling”.

Data Science Articles & Videos

  • Visualizing City Similarity
    This blog post explains an alternative way to figure out how similar cities are. After you read it, you will realize why I think Madison and Reykjavik are very similar cities...
  • Accurate prediction of single-cell DNA methylation states using deep learning
    Recent technological advances have enabled assaying DNA methylation in single cells. Current protocols are limited by incomplete CpG coverage and hence methods to predict missing methylation states are critical to enable genome-wide analyses. We here report DeepCpG, a computational approach based on deep neural networks to predict DNA methylation states from DNA sequence and incomplete methylation profiles in single cells...
  • How to build up a data team (everything I ever learned about recruiting)
    During my time at Spotify, I’ve reviewed thousands of resumes and interviewed hundreds of people. Lots of them were rejected but lots of them also got offers. Finally, I’ve also had my share of offers rejected by the candidate...that being said, here are some things I learned from recruiting...
  • Finding Similar Sounding Names – Some Basics
    Since my wife and I have a baby on the way, we've spent a lot of time thinking about names lately...After playing around with all of those baby naming tools, I recently took a stab myself and built a website that lets you find names that sound like ones you already like...For today's post, I'll simply be highlighting some of the algorithms I used to find words that sound similar, and how to implement them in SQL...
  • Deep Reinforcement Learning: Pong from Pixels
    RL is hot! You may have noticed that computers can now automatically learn to play ATARI games (from raw game pixels!), they are beating world champions at Go, simulated quadrupeds are learning to run and leap, and robots are learning how to perform complex manipulation tasks that defy explicit programming. It turns out that all of these advances fall under the umbrella of RL research...
  • What is software engineering for data science?
    One question that you’ll find yourself asking is: at what point do you need to systematize common tasks and procedures across projects, rather than recreating code or writing new code from scratch on every new project? It depends on a variety of factors, and answering this question may require communication within your team and with people outside of your team...


Jobs

  • Data Scientist - MediaMath - New York
    MediaMath is a technology platform that brings together all forms of digital media, massive amounts of data, and sophisticated algorithms to power smarter marketing for the world’s leading advertisers. We’ve enjoyed massive growth since our founding in 2007, and we’re now a global company with offices in 20 locations – but this revolution has just begun! We are currently looking for a Data Scientist to support the ongoing development of MediaMath’s proprietary algorithms and analytics...

Training & Resources

  • Non-Metric Space Library (NMSLIB)
    Non-Metric Space Library (NMSLIB): A similarity search library and a toolkit for evaluation of k-NN methods for generic non-metric spaces...
  • How to Build a Grouped Bar Chart in D3
    In this tutorial you will use the CSV data from the Grouped Bar Chart Example website to see how a full D3 grouped bar chart visualization is built...
  • Introducing our Hybrid lda2vec Algorithm
    The goal of lda2vec is to make volumes of text useful to humans (not machines!) while still keeping the model simple to modify. It learns the powerful word representations in word2vec while jointly constructing human-interpretable LDA document representations...



Books

  • Introducing Data Science: Big Data, Machine Learning, and more, using Python tools
    Introducing Data Science explains vital data science concepts and teaches you how to accomplish the fundamental tasks that occupy data scientists. You’ll explore data visualization, graph databases, the use of NoSQL, and the data science process. You’ll use the Python language and common Python libraries as you experience firsthand the challenges of dealing with data at scale. Discover how Python allows you to gain insights from data sets so big that they need to be stored on multiple machines, or from data moving so quickly that no single machine can handle it. This book gives you hands-on experience with the most popular Python data science libraries, Scikit-learn and StatsModels...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.

P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :)

All the best,
Hannah & Sebastian