Data Science Weekly Newsletter

Issue 334

April 16, 2020


Editor's Picks


Build a Rover, Send It to the Moon, Sell the Movie Rights: 30 Years of iRobot
Build a rover, send it to the Moon, sell the movie rights...That was our first business model at iRobot. Way back in 1990. We thought it would be how we’d first change the world. It’s ironic, of course, that through that model, changing the world meant sending a robot to another one. Sadly, that business model failed. And it wouldn’t be our last failed business model. Not by a long shot...Why? Because changing the world through robots, it turns out, is no easy task...

Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period
It is urgent to understand the future of severe acute respiratory syndrome–coronavirus 2 (SARS-CoV-2) transmission. We used estimates of seasonality, immunity, and cross-immunity for betacoronaviruses OC43 and HKU1 from time series data from the USA to inform a model of SARS-CoV-2 transmission. We projected that recurrent wintertime outbreaks of SARS-CoV-2 will probably occur after the initial, most severe pandemic wave. Absent other interventions, a key metric for the success of social distancing is whether critical care capacities are exceeded. To avoid this, prolonged or intermittent social distancing may be necessary into 2022...

Interpretability - An Applied Research Report
This rise in the use of algorithms coincides with a surge in the capabilities of black-box techniques, or algorithms whose inner workings cannot easily be explained. The question of interpretability has been important in applied machine learning for many years, but as black-box techniques like deep learning grow in popularity, it’s becoming an urgent concern...In this report, we explore two areas of progress in interpretability: systems designed to be perfectly interpretable, or white-box algorithms, and emerging research on approaches for inspecting black-box algorithms...


A Message From This Week's Sponsor


The Answer Is 42.

If you’re wondering what the question is, then you’re off to a great start for succeeding in our online M.S. in Business Analytics. We’ll not only help you elevate your proficiency in quantitative analytics — we’ll teach you how to ask the right questions and enrich your insights.
Download a Program Brochure.


Data Science Articles & Videos


A call to honesty in pandemic modeling
Recently there has been a proliferation of modeling work which has been used to make the point that if we can stay inside, practice extreme social distancing, and generally lock-down nonessential parts of society for several months, then many deaths from COVID-19 can be prevented...There is a simple truth behind the problems with these modeling conclusions. The duration of containment efforts does not matter, if transmission rates return to normal when they end, and mortality rates have not improved. This is simply because as long as a large majority of the population remains uninfected, lifting containment measures will lead to an epidemic almost as large as would happen without having mitigations in place at all...
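
To make the argument concrete, here is a minimal sketch using a textbook SIR model. It is not the article's own model: the transmission rates, recovery rate, and lockdown window below are illustrative assumptions, and `simulate_sir` is a hypothetical helper. It compares the fraction of the population ever infected with and without a temporary lockdown after which transmission returns to normal.

```python
def simulate_sir(beta_normal, gamma, days, lockdown=None, beta_lockdown=None,
                 initial_infected=1e-4, dt=0.1):
    """Euler-integrate a basic SIR model and return the fraction ever infected.

    lockdown is an optional (start_day, end_day) window during which the
    transmission rate drops to beta_lockdown, then returns to beta_normal.
    """
    s, i = 1.0 - initial_infected, initial_infected
    for step in range(int(days / dt)):
        t = step * dt
        in_lockdown = lockdown is not None and lockdown[0] <= t < lockdown[1]
        beta = beta_lockdown if in_lockdown else beta_normal
        new_infections = beta * s * i * dt
        recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - recoveries
    return 1.0 - s


# Assumed values for illustration only: R0 = 2.5 normally, 0.8 under lockdown.
no_lockdown = simulate_sir(beta_normal=0.25, gamma=0.1, days=1000)
with_lockdown = simulate_sir(beta_normal=0.25, gamma=0.1, days=1000,
                             lockdown=(20, 110), beta_lockdown=0.08)
print(f"ever infected, no lockdown:     {no_lockdown:.3f}")
print(f"ever infected, 90-day lockdown: {with_lockdown:.3f}")
```

Under these assumptions the lockdown delays the epidemic but barely changes its final size, because nearly everyone is still susceptible when the measures end.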

How to know if artificial intelligence is about to destroy civilization
Could we wake up one morning dumbstruck that a super-powerful AI has emerged, with disastrous consequences? Books like Superintelligence by Nick Bostrom and Life 3.0 by Max Tegmark, as well as more recent articles, argue that malevolent superintelligence is an existential risk for humanity. But one can speculate endlessly. It’s better to ask a more concrete, empirical question: What would alert us that superintelligence is indeed around the corner?..

Image Segmentation: tips and tricks from 39 Kaggle competitions
Imagine if you could get all the tips and tricks you need to hammer a Kaggle competition. I have gone over 39 Kaggle competitions...and extracted that knowledge for you. Dig in...

I’m the lead researcher at Waymo and I’m here to answer your questions on the Waymo Open Dataset - Ask Me Anything! [Reddit discussion]
Hi Reddit, I’m Drago Anguelov, Principal Scientist and Head of Research at Waymo. We have seen an exciting amount of interest from the community about the Waymo Open Dataset Challenges, and I am here to answer as many of your questions about the dataset and tasks as possible. Whether you’re interested in learning more about available data labels, working on your submission for the Challenges, or just curious about using machine learning for self-driving tech, I’m happy to chat...

micrograd - a tiny autograd engine (~50 LOC) and a neural net library (~60 LOC) on top of it
A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API...Implements backpropagation (reverse-mode autodiff) over a dynamically built DAG and a small neural networks library on top of it with a PyTorch-like API...
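
For readers new to the idea, a rough sketch of a scalar reverse-mode autodiff engine over a dynamically built DAG might look like the following. This is an illustration written for this summary, not micrograd's actual source; the class name `Value` and its methods are assumptions chosen for the example.

```python
import math


class Value:
    """A scalar that records how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents           # nodes this value was computed from
        self._backward = lambda: None     # pushes self.grad to the parents

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def _backward():
            self.grad += (1.0 - t * t) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        """Visit the graph in reverse topological order, applying the chain rule."""
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


x, y = Value(2.0), Value(-3.0)
z = (x * y + x).tanh()
z.backward()
print(x.grad, y.grad)  # dz/dx and dz/dy
```

Each operation creates a new node and stores a small closure that knows its local derivative; `backward()` then walks the graph once from the output to the inputs.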

Natural Language Processing for Competitive Market Analysis
This article discusses how NLP can be leveraged to discover insights that complement traditional market mapping...When considering new market opportunities or potential investments, evaluating the competitive landscape is standard due diligence...it would require a regiment of motivated MBAs to analyze nearly 3,000 companies...The question then emerged: rather than manually reviewing 1000s of companies, how can a method be developed to programmatically quantify a company’s position on the competitive landscape?...

Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims
In order for AI developers to earn trust from system users, customers, civil society, governments, and other stakeholders that they are building AI responsibly, they will need to make verifiable claims to which they can be held accountable. Those outside of a given organization also need effective means of scrutinizing such claims. This report suggests various steps that different stakeholders can take to improve the verifiability of claims made about AI systems and their associated development processes, with a focus on providing evidence about the safety, security, fairness, and privacy protection of AI systems....

An In-depth Walkthrough on Evolution of Neural Machine Translation
Neural Machine Translation (NMT) methodologies have burgeoned from using simple feed-forward architectures to the state of the art; viz. BERT model. The use cases of NMT models have been broadened from just language translations to conversational agents (chatbots), abstractive text summarization, image captioning, etc. which have proved to be a gem in their respective applications. This paper aims to study the major trends in Neural Machine Translation, the state of the art models in the domain and a high level comparison between them...

A Content Based Live Music Recommender
For years now I’ve been heavily involved in local live music. I go to shows, perform in shows, and have even booked and organized a lot of shows. To pull together my live music and data science interests, I thought it would be interesting to take a data science approach to helping people discover live music in their area...The result of this exploration utilizes unsupervised learning techniques, audio feature extraction with the LibROSA python library, and both the Spotify and Songkick APIs to generate a playlist of songs by artists with upcoming shows in the user’s city based on the user’s favorite artists...


Conference


The Premier Machine Learning Conference

5 days, 8 tracks, 160 speakers and over 150 exciting sessions
Join Machine Learning Week 2020, May 31 – June 4, Las Vegas! It brings together five co-located events: PAW Business, PAW Financial, PAW Industry 4.0, PAW Healthcare, Deep Learning World. This event is where to meet the who’s who and keep up on the latest techniques, making it the leading machine learning event that excites and unites. You can expect top-class experts from world-famous companies such as Google, Microsoft, Lyft, Verizon, Visa and LinkedIn!
Secure your ticket now!
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!


Jobs


Head of Data Science - Tessian - London, United Kingdom

Our mission is to secure the Human Layer. This involves deploying near real-time machine learning models at massive scale to some of the world’s largest organisations to keep their most sensitive data private and secure. To do this, we're looking for an inspiring Head of Data Science ready to lead and grow our Data Science team, who is excited about the opportunities and challenges that come with building and deploying real-time production models.
Find out more about life as a Tessian Engineer...

Want to post a job here? Email us for details >> team@datascienceweekly.org


Training & Resources


KeraStroke
KeraStroke is a Python package that implements generalization-improvement techniques for Keras models in the form of custom Keras Callbacks. These techniques function similarly but have different philosophies and results. The techniques are: a) Stroke: Re-initializing random weight/bias values, b) Pruning: Reducing model size by setting weight/bias values that are close to 0, to 0, and c) NeuroPlast: Re-initializing any weight/bias values that are 0 or close to 0...
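
As a rough illustration of the three ideas on a plain list of weights, consider the sketch below. It does not use KeraStroke's API or Keras Callbacks; the function names, the threshold, and the re-initialization range are all hypothetical choices made for this example.

```python
import random


def prune(weights, threshold=0.05):
    """Pruning: set every weight whose magnitude is below threshold to 0."""
    return [0.0 if abs(w) < threshold else w for w in weights]


def stroke(weights, fraction, rng, low=-0.05, high=0.05):
    """Stroke: re-initialize a randomly chosen fraction of the weights."""
    out = list(weights)
    count = int(len(out) * fraction)
    for idx in rng.sample(range(len(out)), count):
        out[idx] = rng.uniform(low, high)
    return out


def neuroplast(weights, threshold, rng, low=-0.05, high=0.05):
    """NeuroPlast: re-initialize only the weights that are 0 or close to 0."""
    return [rng.uniform(low, high) if abs(w) < threshold else w
            for w in weights]


rng = random.Random(0)  # local generator so the example is repeatable
weights = [0.5, -0.01, 0.03, -0.9]
print(prune(weights))
print(stroke(weights, fraction=0.5, rng=rng))
print(neuroplast(weights, threshold=0.05, rng=rng))
```

The contrast is in which weights get touched: Stroke picks them at random, Pruning zeroes the small ones, and NeuroPlast gives the small ones a fresh start.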

How to Use Random Seeds Effectively
This post covers an aspect of the model-building process that doesn’t typically get much attention: random seeds...Despite their importance, random seeds are often set without much effort. I’m guilty of this. I typically use the date of whatever day I’m working on (so on March 1st, 2020 I would use the seed 20200301). Some people use the same seed every time, while others randomly generate them...Overall, random seeds are typically treated as an afterthought in the modeling process. This can be problematic because, as we’ll see in the next few sections, the choice of this parameter can significantly affect results...
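
One way to see the effect for yourself is to hold the data fixed and vary only the seed that controls the train/test split. The sketch below is an assumption-laden toy, not code from the post: the synthetic data, the least-squares line, and the helper names are all made up for illustration.

```python
import random
import statistics


def make_data(n=60, data_seed=0):
    """Build one fixed noisy linear dataset (the data itself never changes)."""
    rng = random.Random(data_seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0, 3.0)) for x in xs]


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def holdout_mse(data, seed, test_fraction=0.25):
    """Split the data with the given seed and return the held-out squared error."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * test_fraction)
    held_out, train = shuffled[:cut], shuffled[cut:]
    slope, intercept = fit_line(train)
    errors = [(y - (slope * x + intercept)) ** 2 for x, y in held_out]
    return sum(errors) / len(errors)


data = make_data()
scores = [holdout_mse(data, seed) for seed in range(20)]
print(f"min={min(scores):.2f} max={max(scores):.2f} "
      f"stdev={statistics.stdev(scores):.2f}")
```

Reporting the spread across several seeds, rather than a single number from one seed, gives a fairer picture of how a model performs.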

Mel Frequency Cepstral Coefficient (MFCC) tutorial
The first step in any automatic speech recognition system is to extract features, i.e. identify the components of the audio signal that are good for identifying the linguistic content and discarding all the other stuff which carries information like background noise, emotion etc...Mel Frequency Cepstral Coefficients (MFCCs) are a feature widely used in automatic speech and speaker recognition...We will give a high level intro to the implementation steps, then go in depth why we do the things we do. Towards the end we will go into a more detailed description of how to calculate MFCCs...
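
The excerpt above does not spell out the steps, so the sketch below covers only one building block: converting between hertz and the mel scale and spacing filter edges evenly in mel. It assumes the commonly used formula m = 2595 * log10(1 + f / 700); the function names are hypothetical, and the tutorial's full pipeline is not reproduced here.

```python
import math


def hz_to_mel(hz):
    """Convert a frequency in hertz to mels (assumed 2595 * log10 formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_frequencies(low_hz, high_hz, count):
    """Return count frequencies (in Hz) evenly spaced on the mel scale."""
    low, high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high - low) / (count - 1)
    return [mel_to_hz(low + k * step) for k in range(count)]


# Example: 12 filter-edge frequencies between 300 Hz and 8 kHz.
for freq in mel_spaced_frequencies(300.0, 8000.0, 12):
    print(f"{freq:8.1f} Hz")
```

Because the mel scale is roughly logarithmic, the resulting frequencies sit close together at the low end and spread out toward the high end, mirroring how human hearing resolves pitch.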


Books


Data Science in Production: Building Scalable Model Pipelines with Python
This book provides a hands-on approach to scaling up Python code to work in distributed environments in order to build robust pipelines. Readers will learn how to set up machine learning models as web endpoints, serverless functions, and streaming pipelines using multiple cloud environments. It is intended for analytics practitioners with hands-on experience with Python libraries such as Pandas and scikit-learn, and will focus on scaling up prototype models to production....
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
