Data Science Weekly Newsletter

Issue 351

August 13, 2020

Editor's Picks

The Mona Lisa Effect
Urban legend says that Mona Lisa's eyes follow you as you move around the room. If you enable your webcam, the eyes of this digital portrait will do just that...

A Practical Guide to Maintaining Machine Learning in Production
In the previous post, we discussed six little-known challenges after deploying machine learning...This follow-up will share some practices I’ve found useful for maintaining machine learning in production. They are: a) Monitor Training and Serving Data for Contamination, b) Monitor Models for Misbehaviour When Retraining, c) Simplify Engineering to Reduce Operational Burden, d) Useful Practices to Minimize Feedback Loops and Bias, e) Structure Teams for Iteration and Innovation, f) Crowdsource the Handling of Customer Complaints...

Adventures in Improving AI Economics
While our [A16Z] previous post [The New Business of AI] outlined the challenges facing AI businesses, the goal of this post is to provide some guidance on how to tackle them. We share some of the lessons, best practices, and earned secrets we learned through formal and informal conversations with dozens of leading machine learning teams. For the most part, these are their words – not ours. In the first part, we’ll explain why problem understanding is so important – particularly in the presence of long-tailed data distributions – and link it to the economic challenges raised in our last post. In the second part, we’ll share some strategies that can help ML teams build more performant applications and more profitable AI businesses....

A Message From This Week's Sponsor

Are You Asking Your Data the Right Questions?

Before you can get answers from your data, you need to know which questions to ask. At CMU’s Tepper School of Business, we help you build your analytical expertise and business acumen so that you can take your insights to the next level.
Download a Program Brochure

Data Science Articles & Videos

ArXiv’s 1.7M+ Research Papers Now Available on Kaggle
To help make the world’s largest free scientific paper repository even more accessible, arXiv announced yesterday that all of its research papers are now available on Kaggle...It’s hoped the arXiv and Kaggle collaboration will empower new use cases and lead to the exploration of richer machine learning techniques that combine multi-modal features in applications such as trend analysis, paper recommender engines, category prediction, co-citation networks, knowledge graph construction, semantic search interfaces and more...

On-device Supermarket Product Recognition
One of the greatest challenges faced by users who are visually impaired is identifying packaged foods, both in a grocery store and also in their kitchen cupboard at home. This is because many foods share the same packaging, such as boxes, tins, bottles and jars, and only differ in the text and imagery printed on the label. However, the ubiquity of smart mobile devices provides an opportunity to address such challenges using machine learning (ML)...we recently released Lookout, an Android app that uses computer vision to make the physical world more accessible for users who are visually impaired. When the user aims their smartphone camera at the product, Lookout identifies it and speaks aloud the brand name and product size...

Unbundling Data Science Workflows with Metaflow and AWS Step Functions
Today, we are releasing a new job scheduler integration with AWS Step Functions. This integration allows the users of Metaflow [a data science framework open-sourced by Netflix and designed around the idea of independent layers] to schedule their production workflows using a highly available, scalable, maintenance-free service, without any changes in their existing Metaflow code...

REALM: Integrating Retrieval into Language Representation Models
Recent advances in natural language processing have largely built upon the power of unsupervised pre-training, which trains general purpose language representation models using a large amount of text, without human annotations or labels. These pre-trained models, such as BERT and RoBERTa, have been shown to memorize a surprising amount of world knowledge...In [this paper], we share a novel paradigm for language model pre-training, which augments a language representation model with a knowledge retriever, allowing REALM models to retrieve textual world knowledge explicitly from raw text documents, instead of memorizing all the knowledge in the model parameters...

Finite Versus Infinite Neural Networks: an Empirical Study
We perform a careful, thorough, and large scale empirical study of the correspondence between wide neural networks and kernel methods. By doing so, we resolve a variety of open questions related to the study of infinitely wide neural networks. Our experimental results include: kernel methods outperform fully-connected finite-width networks, but underperform convolutional finite width networks...

Stanford AIMI Symposium 2020 (Aug 5th) Videos
The 2020 AIMI Symposium is a virtual conference convening experts from Stanford and beyond to advance the field of AI in medicine and imaging. This conference will cover everything from a survey of the latest machine learning approaches, many use cases in depth, metrics unique to healthcare, important challenges and pitfalls, and best practices for designing, building, and evaluating machine learning in healthcare applications...

Hopfield Networks is All You Need (Paper Explained) [Video]
Hopfield Networks are one of the classic models of biological memory networks. This paper generalizes modern Hopfield Networks to continuous states and shows that the corresponding update rule is equal to the attention mechanism used in modern Transformers. It further analyzes a pre-trained BERT model through the lens of Hopfield Networks and uses a Hopfield Attention Layer to perform Immune Repertoire Classification...
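
For readers who want to see the connection concretely, here is a minimal sketch (ours, not code from the paper or the video) of one continuous Hopfield update step, new_state = X softmax(beta * X^T state), written so it reads like attention with the stored patterns acting as both keys and values. The pattern values, the noisy query, and beta = 4.0 are illustrative assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hopfield_update(patterns, query, beta):
    """One update step: a softmax-weighted average of the stored patterns.

    Each weight depends on the dot product between the query and a stored
    pattern, which is the same computation as attention with keys = values.
    """
    scores = [beta * sum(p_i * q_i for p_i, q_i in zip(p, query)) for p in patterns]
    weights = softmax(scores)
    return [sum(w * p[j] for w, p in zip(weights, patterns)) for j in range(len(query))]

# Hypothetical stored patterns and a corrupted copy of the first one.
patterns = [
    [1.0, 1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0, 1.0],
    [1.0, -1.0, -1.0, 1.0],
]
noisy = [0.9, 0.8, -1.0, -0.2]

retrieved = hopfield_update(patterns, noisy, beta=4.0)
print(retrieved)  # lands very close to the first stored pattern
```

A larger beta sharpens the softmax, so a single step snaps to the nearest stored pattern; a smaller beta blends several patterns together.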

Floating-Point Formats and Deep Learning
Floating-point format is not a crucial consideration in deep learning, but it can make a significant difference. What is floating-point, why should you (a deep learning practitioner) care, and what can you do about it?...
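
As a quick illustration of why the format matters (our own sketch, not taken from the linked post), the snippet below round-trips values through 16-bit and 32-bit IEEE 754 floats using Python's standard `struct` module. The helper names and the sample values (0.1 and a "gradient" of 1e-8) are assumptions chosen for the demonstration.

```python
import struct

def round_trip(value, fmt):
    """Store value in the given struct format and read it back as a Python float."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

def to_half(value):
    """Round value to IEEE 754 half precision (16 bits)."""
    return round_trip(value, "<e")

def to_single(value):
    """Round value to IEEE 754 single precision (32 bits)."""
    return round_trip(value, "<f")

# Rounding error: half precision keeps far fewer significant bits.
half_error = abs(to_half(0.1) - 0.1)
single_error = abs(to_single(0.1) - 0.1)
print("half error:", half_error, "single error:", single_error)

# Underflow: a very small value survives in single precision but
# is flushed to zero in half precision.
tiny_gradient = 1e-8
print("half:", to_half(tiny_gradient), "single:", to_single(tiny_gradient))
```

The underflow case is one reason mixed-precision training setups commonly scale the loss before backpropagation.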

Are we in an AI overhang?
An overhang is when you have had the ability to build transformative AI for quite some time, but you haven't because no-one's realised it's possible. Then someone does and surprise! It's a lot more capable than everyone expected...I am worried we're in an overhang right now. I think we right now have the ability to build an orders-of-magnitude more powerful system than we already have, and I think GPT-3 is the trigger for 100x larger projects at Google, Facebook and the like, with timelines measured in months...

Training

Quick Question For You: Do you want a Data Science job?

After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.
The course is broken down into three guides:

Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore)

Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate

Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!

Click here to learn more
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Data Scientist (Entry Level) - Saturn Cloud - Remote

Saturn Cloud helps companies perform data science at a new level of scale, with one-click solutions, to solve the world’s hardest problems. Our product is a SaaS platform which equips data science teams with high-leverage automation tools, eliminating hours of traditional, manual work. The platform is user-friendly, scalable and secure.
You will be an entry-level Data Scientist for Saturn Cloud, an exciting new venture founded by the creators of Anaconda, NumPy, and SciPy. The role features drafting the first generation of Saturn resource materials, tutorials, and technical content...

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

OpenCV Sudoku Solver and OCR
In this tutorial, you will create an automatic sudoku puzzle solver using OpenCV, Deep Learning, and Optical Character Recognition (OCR)...

Intro to Autoencoders
This tutorial introduces autoencoders with three examples: the basics, image denoising, and anomaly detection...An autoencoder is a special type of neural network that is trained to copy its input to its output. For example, given an image of a handwritten digit, an autoencoder first encodes the image into a lower dimensional latent representation, then decodes the latent representation back to an image. An autoencoder learns to compress the data while minimizing the reconstruction error...
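
To make the encode, decode, and reconstruction-error idea tangible, here is a toy sketch in plain Python. It is not the tutorial's code: it is a tiny linear autoencoder that squeezes 2-D points through a 1-D latent value and is trained with hand-written gradient descent. The data (points on the line y = 2x), the starting weights, the learning rate, and the step count are all assumptions picked to keep the example small.

```python
# Toy data: 2-D points on the line y = 2x, so a 1-D latent code is
# enough to reconstruct them (an assumption made for this demo).
data = [(t, 2.0 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]

def encode(w_enc, x):
    """Compress a 2-D point into a single latent number."""
    return w_enc[0] * x[0] + w_enc[1] * x[1]

def decode(w_dec, z):
    """Expand the latent number back into a 2-D point."""
    return (w_dec[0] * z, w_dec[1] * z)

def reconstruction_error(w_enc, w_dec):
    """Mean squared distance between each point and its reconstruction."""
    total = 0.0
    for x in data:
        x_hat = decode(w_dec, encode(w_enc, x))
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)

def train(steps=300, lr=0.05):
    """Full-batch gradient descent on the reconstruction error."""
    w_enc = [0.1, 0.1]
    w_dec = [0.1, 0.1]
    for _ in range(steps):
        g_enc = [0.0, 0.0]
        g_dec = [0.0, 0.0]
        for x in data:
            z = encode(w_enc, x)
            x_hat = decode(w_dec, z)
            err = (x_hat[0] - x[0], x_hat[1] - x[1])
            # Gradient of the squared error with respect to the latent value.
            dz = 2.0 * (err[0] * w_dec[0] + err[1] * w_dec[1])
            for i in range(2):
                g_dec[i] += 2.0 * err[i] * z / len(data)
                g_enc[i] += dz * x[i] / len(data)
        for i in range(2):
            w_enc[i] -= lr * g_enc[i]
            w_dec[i] -= lr * g_dec[i]
    return w_enc, w_dec

initial_error = reconstruction_error([0.1, 0.1], [0.1, 0.1])
w_enc, w_dec = train()
final_error = reconstruction_error(w_enc, w_dec)
print("error before:", initial_error, "after:", final_error)
```

Real autoencoders use nonlinear layers and a deep learning library, but the loop is the same: encode, decode, measure the reconstruction error, and adjust the weights to shrink it.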

Deep Learning for Computer Vision [22 videos]
During this [free] course, students will learn to implement, train and debug their own neural networks and gain a detailed understanding of cutting-edge research in computer vision. We will cover learning algorithms, neural network architectures, and practical engineering tricks for training and fine-tuning networks for visual recognition tasks...

Books

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian