Data Science Weekly Newsletter

Issue 336

April 30, 2020

Editor's Picks

What's up with tornado plots?
There's been attention recently on a peculiar kind of graph. I don't know if it has a name, but let's call it a "tornado plot."...A tornado plot is a way to chart a time series. Unlike a conventional graph, which shows time on the x-axis and a value on the y-axis, a tornado plot has a twist. The value is still on the y-axis, but the x-axis shows the rate of change at each moment in time: that is, how much the value is increasing or decreasing...To understand how to read this kind of plot, I made the interactive tool below. You can draw your own time series in the first chart (in the form of a normal graph) and then see how a tornado plot shows the same data. Try it out!...
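To make the axis swap concrete, here is a minimal sketch of how the points of such a plot could be computed. It is not the article's interactive tool: the function name `tornado_points` and the sample series are hypothetical, and it assumes the rate of change is approximated by the difference between consecutive observations.

```python
from typing import List, Sequence, Tuple


def tornado_points(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (rate_of_change, value) pairs for a time series.

    Assumption: the rate of change at step t is approximated by the
    backward difference values[t] - values[t - 1], so the first
    observation produces no point.
    """
    return [(values[t] - values[t - 1], values[t]) for t in range(1, len(values))]


# Hypothetical series: fast growth that gradually levels off.
series = [1, 2, 4, 8, 12, 14, 15, 15]

# x is the change since the previous step, y is the value itself.
for change, value in tornado_points(series):
    print(f"x (change) = {change:>3}, y (value) = {value:>3}")
```

Plotting these pairs in order and connecting them gives the curve: it drifts right while growth accelerates and returns toward x = 0 as the series flattens.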

Categorizing Products at Scale
With over 1M business owners now on Shopify, there are billions of products being created and sold across the platform. Just like those business owners, the products that they sell are extremely diverse! Even when selling similar products, they tend to describe products very differently. One may describe their sock product as a “woolen long sock,” whereas another may have a similar sock product described as a “blue striped long sock.”...How can we identify similar products, and why is that even useful?...

A state-of-the-art open source chatbot
Facebook AI has built and open-sourced Blender, the largest-ever open-domain chatbot. It outperforms others in terms of engagement and also feels more human, according to human evaluators...We achieved this milestone through a new chatbot recipe that includes improved decoding techniques, novel blending of skills, and a model with 9.4 billion parameters, which is 3.6x more than the largest existing system...Today we’re releasing the complete model, code, and evaluation set-up, so that other AI researchers will be able to reproduce this work and continue to advance conversational AI research...

A Message From This Week's Sponsor

Help meet the growing demand in data science.

The Data Science Career Track is a 6-month, self-paced online course that will pair you with your own industry expert mentor as you learn skills like data wrangling and data storytelling, and build your unique portfolio to stand out in the job market.
Land your dream job as a data scientist within six months of graduating or the course is free.

Data Science Articles & Videos

Female Pioneers in Computer Science You May Not Know
Whilst many will be familiar with our Women in AI lists which include those currently pushing boundaries in the present day, we thought we would put together a list of women who have been instrumental in the advancement of Computer Science and Data Science, providing the foundations for AI in the 21st Century. How many of the below are you familiar with?...

The Future of Natural Language Processing [Video]
Transfer Learning in Natural Language Processing (NLP): Open questions, current trends, limits, and future directions...A walk through interesting papers and research directions in late 2019/early-2020 on: a) model size and computational efficiency, b) out-of-domain generalization and model evaluation, c) fine-tuning and sample efficiency, d) common sense and inductive biases...

Jukebox - a neural net that generates music
We introduce Jukebox, a model that generates music with singing in the raw audio domain. We tackle the long context of raw audio using a multiscale VQ-VAE to compress it to discrete codes, and modeling those using autoregressive Transformers. We show that the combined model at scale can generate high-fidelity and diverse songs with coherence up to multiple minutes. We can condition on artist and genre to steer the musical and vocal style, and on unaligned lyrics to make the singing more controllable. We are releasing thousands of non cherry-picked samples, along with model weights and code...

How to make your data team efficient for times of crisis
Times have changed and caught most of us unprepared. It is always a part of Bolt’s culture to move quickly and adapt — and the crisis situation that is unfolding due to a pandemic definitely requires significant adaptation. This is a look from inside Bolt’s data team — data analysts, data engineers, data scientists — as we share our experience and advice for times of crisis with all the similar teams out there...

Tonks: Building One (Multi-Task) Model to Rule Them All!
This post is the story of how we, Nicole Carlson and Michael Sugimura, built Tonks, a library that streamlines the training of multi-task PyTorch networks, together. We will discuss technical details of the library as well as interpersonal challenges we faced along the way. This project ended up being a rich experience for us, in ways we never could have guessed...

The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies
Economic inequality is accelerating globally and is a growing concern due to its negative impact on economic opportunity, health and social welfare. Taxes are important tools for governments to reduce inequality. However, finding a tax policy that optimizes equality along with productivity is an unsolved problem. The AI Economist brings reinforcement learning (RL) to tax policy design for the first time to provide a purely simulation and data-driven solution...

Chip Design with Deep Reinforcement Learning
With the slowing of Moore’s Law and Dennard scaling, the world is moving toward specialized hardware to meet the exponentially growing demand for compute. However, today’s chips take years to design, resulting in the need to speculate about how to optimize the next generation of chips for the machine learning (ML) models of 2-5 years from now. Dramatically shortening the chip design cycle would allow hardware to adapt to the rapidly advancing field of ML. What if ML itself could provide the means to shorten the chip design cycle, creating a more integrated relationship between hardware and ML, with each fueling advances in the other?...

YOLOv4: Optimal Speed and Accuracy of Object Detection
There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale datasets; while some features, such as batch-normalization and residual-connections, are applicable to the majority of models, tasks, and datasets....

Why We Need DevOps for ML Data
Getting machine learning (ML) into production is hard. In fact, it’s possibly an order of magnitude harder than getting traditional software deployed. As a result, most ML projects never see the light of production-day and many organizations simply give up on using ML to drive their products and customer experiences...This blog discusses why the industry needs to solve DevOps for ML data, and how ML’s unique data challenges stifle efforts to get ML operationalized and launched in production...

Training

Quick Question For You: Do you want a Data Science job?

After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.
The course is broken down into three guides:

Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore).

Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate.

Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!

Click here to learn more
...
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Data Scientist - Amazon Demand Forecasting - New York

The Amazon Demand Forecasting team seeks a Data Scientist with strong analytical and communication skills to join our team. We develop sophisticated algorithms that involve learning from large amounts of data, such as prices, promotions, similar products, and a product's attributes, in order to forecast the demand of over 190 million products world-wide. These forecasts are used to automatically order more than $200 million worth of inventory weekly, establish labor plans for tens of thousands of employees, and predict the company's financial performance. The work is complex and important to Amazon. With better forecasts we drive down supply chain costs, enabling the offer of lower prices and better in-stock selection for our customers...

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

Intro to Keras for Researchers
Are you a machine learning researcher? Do you publish at NeurIPS and push the state-of-the-art in CV and NLP? This guide will serve as your first introduction to core Keras API concepts...

Intro to Keras for Engineers
Are you a machine learning engineer looking to use Keras to ship deep-learning powered features in real products? This guide will serve as your first introduction to core Keras API concepts...

Intro to Automated Question Answering
Welcome to the first edition of the Cloudera Fast Forward blog on Natural Language Processing for Question Answering! Throughout this series, we’ll build a Question Answering (QA) system with off-the-shelf algorithms and libraries and blog about our process and what we find along the way. We hope to wind up with a beginning-to-end documentary that provides a) insight into QA as a tool, b) useful context to make decisions for those who might build their own QA system, c) tips and tricks we pick up as we go, and d) sample code and commentary...

Books

Data Science in Production: Building Scalable Model Pipelines with Python
This book provides a hands-on approach to scaling up Python code to work in distributed environments in order to build robust pipelines. Readers will learn how to set up machine learning models as web endpoints, serverless functions, and streaming pipelines using multiple cloud environments. It is intended for analytics practitioners with hands-on experience with Python libraries such as Pandas and scikit-learn, and will focus on scaling up prototype models to production....
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
