Data Science Weekly Newsletter - Issue 201

Sept 28 2017

Editor Picks
  • Linear Programming and Healthy Diets
    My dad’s an interesting guy. Every so often he picks up a health trend and/or weight loss goal that would make many people’s jaws drop. He’s tried the “high fat” and “no fat” diets, and quite a few others. He’s concerned with losing weight, but also living longer, so he’s into caloric restriction among other things. Recently he asked me to help him optimize his diet...
  • What is the yearly risk of another Harvey-level flood in Houston?
    You’ve likely heard that the flooding in Houston following Hurricane Harvey is reportedly a 500-year or even 1000-year flood event. You’ve perhaps also heard that this is the third 500-year flood that Houston has experienced in a three-year span, which calls into serious question the usefulness or accuracy of the “500-year flood” designation. This made me wonder: what’s our actual best estimate of the yearly risk for a Harvey-level flood, according to the data? That is the question I will attempt to answer here...
  • The Language Of Hip-Hop: The Words Rappers Use (And Don't Use)
    Last year, data scientist Iain Barr determined the most “metal” words using an elegant methodology and machine learning. We were blown away and eagerly waited for someone to replicate it for other genres. One year later, we were still waiting. To start, we need a dataset that represents hip hop. We decided to use 26 million words from the lyrics of the top 500 charting artists on Billboard’s Rap Chart (about 50,000 songs)...

A Message from this week's Sponsor:


Looking for your next great data job?

We're taking the headache out of the job search. Skip the slog through the swamp of irrelevant job listings that include the word “analyst”, and find a job at a company using cutting-edge data analysis to tackle impactful problems.

At Mode, we want to give you everything you need to be a great analyst, whether that's an online community of like-minded data people, learning resources like SQL School and Udacity courses, or powerful, collaborative software for SQL and Python analysis. Now, we're making it easy to find great data jobs.

Check out the Data Jobs Board: a curated list of the best jobs for data analysts, data scientists and data engineers.

Data Science Articles & Videos

  • When Websites Design Themselves
    Today, we’re on the verge of another revolution, as artificial intelligence and machine learning turn the graphic design field on its head again. The vision is, to quote one project’s slogan, “websites that just make themselves.”...
  • Apache Arrow and the "10 Things I Hate About pandas"
    In this post I hope to explain as concisely as I can some of the key problems with pandas's internals and how I've been steadily planning and building pragmatic, working solutions for them. To the outside eye, the projects I've invested in may seem only tangentially-related: e.g. pandas, Badger, Ibis, Arrow, Feather, Parquet. Quite the contrary, they are all closely-interrelated components of a continuous arc of work I started almost 10 years ago...
  • PixelNN: Example-based Image Synthesis
    We present a simple nearest-neighbor (NN) approach that synthesizes high-frequency photorealistic images from an "incomplete" signal such as a low-resolution image, a surface normal map, or edges...
  • Chiron: Translating nanopore raw signal directly into nucleotide sequence using deep learning
    Sequencing by translocating DNA fragments through an array of nanopores is a rapidly maturing technology which offers faster and cheaper sequencing than other approaches. However, accurately deciphering the DNA sequence from the noisy and complex electrical signal is challenging. Here, we report the first deep learning model - Chiron - that can directly translate the raw signal to DNA sequence without the error-prone segmentation step...
  • How You Can Use the New Stack Overflow Bot from Microsoft
    So when Microsoft showed us how they were bringing AI to every developer through their platforms and tools, and asked if they could partner with us to create an AI driven experience for developers to use and learn with, we of course said yes...


Jobs

  • Data Analyst - Glossier - New York, NY

    Glossier is looking for a Senior Data Analyst to take our data practice to the next level. You will work closely with our Head of Data to provide data-driven insights to teams across the organization in order to inform strategic decision-making. You will take a leading role in shaping our data practices, and you will use your insights to scope projects, propose approaches, and help to drive them to completion. If you enjoy finding the signal in the noise, bringing order and structure to inefficiencies, and know the mean time and standard deviation of your commute, then please apply...

Training & Resources

  • Neural Network from Scratch
    I've recently written an article on creating a simple neural network from scratch (perceptron linear classifier) in Python...
  • NVIDIA Deep Learning Accelerator (NVDLA)
    The NVIDIA Deep Learning Accelerator (NVDLA) is a free and open architecture that promotes a standard way to design deep learning inference accelerators. With its modular architecture, NVDLA is scalable, highly configurable, and designed to simplify integration and portability. The hardware supports a wide range of IoT devices. Delivered as an open source project under the NVIDIA Open NVDLA License, all of the software, hardware, and documentation will be available on GitHub. Contributions are welcome...



Books

  • Reproducible Research with R and R Studio

    "a very practical book that teaches good practice in organizing reproducible data analysis and comes with a series of examples..."

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
Reminder: if you enjoyed the first 200 newsletters and want many more, please make a donation to help keep us going :)

All the best,
Hannah & Sebastian