Data Science Weekly Newsletter

Issue 387

April 22, 2021

Editor's Picks

Traffic prediction with advanced Graph Neural Networks
DeepMind partnered with Google Maps to help improve the accuracy of their ETAs around the world. While Google Maps’ predictive ETAs have been consistently accurate for over 97% of trips, we worked with the team to minimise the remaining inaccuracies even further - sometimes by more than 50% in cities like Taichung. To do this at a global scale, we used a generalised machine learning architecture called Graph Neural Networks that allows us to conduct spatiotemporal reasoning by incorporating relational learning biases to model the connectivity structure of real-world road networks. Here’s how it works...
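
For readers new to GNNs, the core idea of message passing over a road network can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not DeepMind's model: each road segment is a node with a single scalar feature (a hypothetical normalised speed), neighbouring segments are averaged, and the weights `w_self` and `w_neigh` are made-up constants rather than learned parameters. The temporal half of the spatiotemporal reasoning is left out entirely.

```python
from typing import Dict, List


def message_passing_step(
    features: Dict[str, float],
    neighbours: Dict[str, List[str]],
    w_self: float = 0.6,
    w_neigh: float = 0.4,
) -> Dict[str, float]:
    """One round of mean-aggregation message passing followed by a ReLU.

    Hypothetical toy version: scalar features and fixed (not learned) weights.
    """
    updated: Dict[str, float] = {}
    for node, h in features.items():
        nbrs = neighbours.get(node, [])
        # Aggregate "messages" from adjacent road segments as the mean of their features.
        agg = sum(features[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
        # Combine the segment's own state with its neighbourhood, then apply ReLU.
        updated[node] = max(0.0, w_self * h + w_neigh * agg)
    return updated


# Toy road network: A - B - C in a line, with side street D joining at B.
road_graph = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
speeds = {"A": 1.0, "B": 0.2, "C": 0.8, "D": 0.5}  # hypothetical normalised speeds

h1 = message_passing_step(speeds, road_graph)
h2 = message_passing_step(h1, road_graph)  # a second step reaches two hops away
```

Each step lets information travel one hop along the road graph, so stacking two steps lets a segment "see" traffic conditions two segments away.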

Finding magnetic eruptions in space with an AI assistant
The latest spacecraft observations are ready. You now have 24 hours to scour 84 hours-worth of data, selecting the most promising split-second moments you can find. The data points you choose, depending on how you rank them, will download from the spacecraft in the highest possible resolution; researchers may spend months analyzing them. Everything else will be overwritten like it was never collected at all...These are the stakes facing the Scientist in the Loop, one of the most important roles on the Magnetospheric Multiscale mission team...

AI researchers use heartbeat detection to identify deepfake videos
Existing deepfake detection models focus on traditional media forensics methods, like tracking unnatural eyelid movements or distortions at the edge of the face...In work released last week, Binghamton University and Intel researchers introduced AI that goes beyond deepfake detection to recognize which deepfake model made a doctored video. The researchers found that deepfake model videos leave behind unique biological and generative noise signals — what they call “deepfake heartbeats.” The detection approach looks for residual biological signals from 32 different spots in a person’s face, which the researchers call PPG cells...they propose a deepfake source detector that predicts the source generative model for any given video...
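
To make "residual biological signals" concrete: a photoplethysmography (PPG) signal is the faint periodic change in skin colour caused by blood flow, and a pulse rate can be read off as the dominant frequency of a face region's average brightness over time. The sketch below is a generic illustration of that idea only, not the Binghamton/Intel detector; the 30 fps frame rate, the synthetic 1.2 Hz signal and the `dominant_bpm` helper are all assumptions made for the example.

```python
import math
from typing import List


def dominant_bpm(signal: List[float], fps: float,
                 lo_bpm: float = 40.0, hi_bpm: float = 180.0) -> float:
    """Return the strongest frequency (in beats per minute) within a plausible pulse band.

    Uses a naive discrete Fourier transform so the example needs no third-party libraries.
    """
    n = len(signal)
    mean = sum(signal) / n
    centred = [s - mean for s in signal]  # remove the constant skin-brightness level
    best_bpm, best_power = 0.0, -1.0
    for k in range(1, n // 2):
        bpm = k * fps / n * 60.0  # frequency of DFT bin k, converted from Hz to BPM
        if not lo_bpm <= bpm <= hi_bpm:
            continue
        angle = 2.0 * math.pi * k / n
        re = sum(v * math.cos(angle * t) for t, v in enumerate(centred))
        im = sum(v * math.sin(angle * t) for t, v in enumerate(centred))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm


# Synthetic stand-in for one face region's mean green-channel value:
# 10 seconds at 30 fps with a faint 1.2 Hz (72 BPM) pulse on top of a constant level.
fps = 30.0
signal = [0.5 + 0.01 * math.sin(2.0 * math.pi * 1.2 * t / fps) for t in range(300)]
estimated = dominant_bpm(signal, fps)
```

A real pipeline would, roughly speaking, compute signals like this for many face regions (the researchers' PPG cells) from actual video frames and pass them to a learned model, rather than picking a single spectral peak.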

A Message From This Week's Sponsor

Proactively monitoring your AI performance with Mona

Mona is a SaaS platform that enables teams to proactively monitor data and model performance in production for biases, concept drifts, and data integrity issues. Mona takes a platform approach to monitoring, placing the data scientist/ML engineer in control with flexible configuration to ensure each team's unique monitoring needs are met. Mona can be deployed in 2 hours or less on any tech stack and in any ML use-case.

Data Science Articles & Videos

Enabling reproducible ML and Systems research: the good, the bad, and the ugly
In this talk, I will describe my 10-year effort to solve numerous reproducibility issues in ML & systems research. I will share my experience reproducing 150+ systems and ML papers during artifact evaluation at ASPLOS, MLSys, CGO, PPoPP and Supercomputing. This tedious experience motivated me to develop the cKnowledge.org framework and the open cKnowledge.io portal to bring DevOps principles to our research. I will also present cKnowledge solutions - a new way to package and share research artifacts and results with common Python APIs, CLI actions, portable workflows and JSON meta descriptions. ...

Google Scanned Objects
Scanned Objects by Google Research is a dataset of common household objects that have been 3D scanned for use in robotic simulation and synthetic perception research. The dataset is licensed under the CC-BY 4.0 License, which gives you freedom in using these assets within your latest projects...

A Wholistic View of Continual Learning with Deep Neural Networks: Forgotten Lessons and the Bridge to Active and Open World Learning
Current deep learning research is dominated by benchmark evaluation. A method is regarded as favorable if it empirically performs well on the dedicated test set... In this work we argue that notable lessons from open set recognition, the identification of statistically deviating data outside of the observed dataset, and the adjacent field of active learning, where data is incrementally queried such that the expected performance gain is maximized, are frequently overlooked in the deep learning era. Based on these forgotten lessons, we propose a consolidated view to bridge continual learning, active learning and open set recognition in deep neural networks...

Humans vs AI: A/B testing OpenAI's GPT-3
This is a friendly competition between human copywriters and copy generated by the new VWO feature powered by OpenAI's GPT-3 API. In this competition, we will test AI-generated copy for headlines, buttons or product descriptions against existing (or new) human-written copy for participating websites...
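
Deciding whether the AI-written or the human-written copy "won" is a standard A/B comparison of conversion rates. As a hedged sketch, here is a classic two-proportion z-test in plain Python; the visitor and conversion counts are invented for illustration, and this frequentist test is not necessarily the statistics VWO itself applies.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: variants A and B share one conversion rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate, which is the best estimate if H0 is true.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1.0 - p_pool) * (1.0 / n_a + 1.0 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function: Phi(x) = (1 + erf(x / sqrt(2))) / 2.
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return z, 2.0 * (1.0 - phi)


# Hypothetical counts: A = human-written headline, B = GPT-3 headline.
z, p = two_proportion_z_test(conv_a=120, n_a=2400, conv_b=150, n_b=2400)
```

With these made-up counts the GPT-3 variant converts better (6.25% vs 5.0%), but the two-sided p-value lands around 0.06, so at the usual 0.05 threshold the test would not yet declare a winner.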

Fast Block Sparse Matrices for PyTorch
This PyTorch extension provides a drop-in replacement for torch.nn.Linear using block sparse matrices instead of dense ones...It enables very easy experimentation with sparse matrices since you can directly replace Linear layers in your model with sparse ones...
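
The storage idea behind block sparsity can be illustrated without the extension itself: keep only the non-zero dense tiles of a weight matrix and skip every all-zero tile during multiplication. The pure-Python sketch below is only that illustration; `block_sparse_matvec` and the dictionary-of-tiles layout are hypothetical and are not the extension's actual API or kernels.

```python
from typing import Dict, List, Tuple

Tile = List[List[float]]


def block_sparse_matvec(tiles: Dict[Tuple[int, int], Tile], block_size: int,
                        n_block_rows: int, x: List[float]) -> List[float]:
    """Compute y = W @ x for a weight matrix W stored as a dict of non-zero tiles.

    tiles[(i, j)] is the dense block_size x block_size tile at block-row i and
    block-column j; any tile missing from the dict is treated as all zeros.
    """
    y = [0.0] * (n_block_rows * block_size)
    for (bi, bj), tile in tiles.items():
        # Only stored tiles are visited, so all-zero tiles cost no work at all.
        for r in range(block_size):
            acc = 0.0
            for c in range(block_size):
                acc += tile[r][c] * x[bj * block_size + c]
            y[bi * block_size + r] += acc
    return y


# A 4x4 matrix with 2x2 tiles where only the two diagonal tiles are non-zero.
tiles = {(0, 0): [[1.0, 2.0], [3.0, 4.0]], (1, 1): [[5.0, 6.0], [7.0, 8.0]]}
y = block_sparse_matvec(tiles, block_size=2, n_block_rows=2, x=[1.0, 1.0, 1.0, 1.0])
```

Note that torch.nn.Linear computes x @ W.T + b; the sketch shows only the matrix-vector core, without the bias term.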

How to win Kaggle competitions with Anthony Goldbloom
Anthony Goldbloom is the founder and CEO of Kaggle. In 2011 & 2012, Forbes Magazine named Anthony as one of the 30 under 30 in technology. In 2011, Fast Company featured him as one of the innovative thinkers who are changing the future of business. He joins Lukas to talk about his vision for Kaggle, how Kaggle & the competitions have changed over the years, how competitive data science can prepare you for the real world, whether he likes Python or R better – and which jobs we should be worried about losing to AI in the next few decades...

Awesome production machine learning
This repository contains a curated list of awesome open source libraries that will help you deploy, monitor, version, scale, and secure your production machine learning...

ETL & ELT, a comparison
When designing and building data pipelines to load data into data warehouses you might have heard of the common ETL and ELT paradigms. This post goes over what they mean, their differences and which paradigm you might want to choose...
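
The difference between the two paradigms comes down to where the transformation step runs. The sketch below uses Python's built-in sqlite3 module as a stand-in warehouse: the ETL path cleans records in application code before loading, while the ELT path loads the raw records untouched and transforms them with SQL inside the database. The table names and sample records are hypothetical.

```python
import sqlite3

# Hypothetical source records: (order id, amount in cents as text, country code).
RAW_ORDERS = [("o1", "1999", "us"), ("o2", "550", "DE"), ("o3", "12000", "us")]


def run_etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in application code first, then load only the clean rows."""
    conn.execute("CREATE TABLE orders_etl (id TEXT, amount REAL, country TEXT)")
    cleaned = [(oid, int(cents) / 100, country.upper()) for oid, cents, country in RAW_ORDERS]
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def run_elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw rows untouched, then transform inside the database with SQL."""
    conn.execute("CREATE TABLE orders_raw (id TEXT, amount_cents TEXT, country TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", RAW_ORDERS)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT id,
               CAST(amount_cents AS INTEGER) / 100.0 AS amount,
               UPPER(country) AS country
        FROM orders_raw
        """
    )


conn = sqlite3.connect(":memory:")  # in-memory database standing in for a warehouse
run_etl(conn)
run_elt(conn)
```

Both paths end with the same clean table; the ELT path additionally keeps `orders_raw`, which lets you re-run or change the transformation later without extracting from the source again.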

Building EnjoyingTheShow.com for Real-time Feedback
Real-time AI facial expression gathering with Amplify GraphQL and TensorFlow.js...This app uses face-api to gauge your facial expression, and then sends all faces in a particular URL (or room) to a "Watch" page. The watch page summarizes all the faces in a single Victory Pie chart...

Training

Quick Question For You: Do you want a Data Science job?

After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.
The course is broken down into three guides:

Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore)

Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate

Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!

Click here to learn more
...
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Data Scientist - JetBlue - Long Island, NY

The Data Scientist applies machine learning and statistical techniques to help solve JetBlue’s most complex commercial and operational challenges. The Data Scientist will be responsible for exploring and creating compelling visualizations of new datasets, identifying key features and engineering new ones to be used in modeling, and discovering the modeling approaches that deliver the best results based on appropriate evaluation metrics...

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

Book Release: Machine Learning From Scratch
This book provides complete derivations of the most common algorithms in ML (OLS, logistic regression, naive Bayes, trees, boosting, neural nets, etc.) both in theory and math. It also demonstrates constructions of each of these methods from scratch in Python using only NumPy. My aim with the book is to provide a very thorough rundown of the fitting process behind the algorithms we see every day. I hope that seeing the models derived in math or constructed in code helps readers understand the models at a deeper level and feel more comfortable optimizing them to their own work....
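
As a taste of the from-scratch approach, the single-regressor case of OLS has a closed-form solution that fits in a few lines. This sketch is not taken from the book and uses only the Python standard library rather than NumPy; `ols_fit` is a hypothetical helper name.

```python
from typing import List, Tuple


def ols_fit(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Fit y ≈ b0 + b1 * x by ordinary least squares (single regressor).

    Setting the derivatives of the squared-error loss to zero gives
    b1 = sum((x - x̄)(y - ȳ)) / sum((x - x̄)^2) and b0 = ȳ - b1 * x̄.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Points lying exactly on y = 1 + 2x, so the fit should recover b0 = 1 and b1 = 2.
b0, b1 = ols_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

The multi-feature case replaces these sums with the normal equations, β = (XᵀX)⁻¹Xᵀy.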

Pattern Recognition and Machine Learning (PRML) Jupyter Notebooks
This [GitHub] project contains Jupyter notebooks of many of the algorithms presented in Christopher Bishop's Pattern Recognition and Machine Learning book, as well as replicas of many of the graphs presented in the book. ...

Book release: Machine Learning Engineering
I've been working on the book for the last eleven months and I'm happy (and relieved!) that the work is now over. Just like my previous The Hundred-Page Machine Learning Book, this new book is distributed on the “read-first, buy-later” principle. That means that you can freely download the book, read it, and share it with your friends and colleagues, before buying...

Books

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
