Data Science Weekly Newsletter

Issue 385

April 8, 2021

Editor's Picks

Effective testing for machine learning systems
In this blog post, we'll cover what testing looks like for traditional software development, why testing machine learning systems can be different, and discuss some strategies for writing effective tests for machine learning systems. We'll also clarify the distinction between the closely related roles of evaluation and testing as part of the model development process. By the end of this blog post, I hope you're convinced of both the extra work required to effectively test machine learning systems and the value of doing such work...

What are the untold truths of being a machine learning engineer? [Reddit Discussion - 200+ comments]
I'm working as a web developer and looking into entering the field of machine learning. My motivation is to work on self-driving cars, or on projects related to biology and medicine...What are the things that one doesn't learn from books? What are the biggest technical and non-technical challenges?...

fast.ai releases new deep learning course, four libraries, and 600-page book
Today is fast.ai’s biggest day in our four year history. We are releasing a) fastai v2: A complete rewrite of fastai, b) fastcore, fastscript, and fastgpu: Foundational libraries used in fastai v2, c) Practical Deep Learning for Coders 2020 course, part 1, and d) Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD: A book from O’Reilly...

A Message From This Week's Sponsor

Data scientists are in demand on Vettery

Vettery is an online hiring marketplace that's changing the way people hire and get hired. Ready for a bold career move? Make a free profile, name your salary, and connect with hiring managers from top employers today.

Data Science Articles & Videos

New milestones in embodied AI
Smarter assistants will require new advances in embodied AI, which seeks to teach machines to understand and interact with the complexities of the physical world as people do...we’re announcing several new milestones that introduce important capabilities to push the limits of embodied agents even further. This foundational research introduces state-of-the-art embodied agents that learn how to explore and understand more complex, realistic spaces from egocentric views or multimodal signals...

Designing for Human Rights in AI
In the age of big data, companies and governments are increasingly using algorithms to inform hiring decisions, employee management, policing, credit scoring, insurance pricing, and many more aspects of our lives. AI systems can help us make evidence-driven, efficient decisions, but can also confront us with unjustified, discriminatory decisions wrongly assumed to be accurate because they are made automatically and quantitatively....

Understanding self-supervised and contrastive learning with "Bootstrap Your Own Latent" (BYOL)
Unlike prior work like SimCLR and MoCo, the recent paper Bootstrap Your Own Latent (BYOL) from DeepMind demonstrates a state of the art method for self-supervised learning of image representations without an explicitly contrastive loss function. This simplifies training by removing the need for negative examples in the loss function. We highlight two surprising findings from our work on reproducing BYOL: (1) BYOL generally performs no better than random when batch normalization is removed, and (2) the presence of batch normalization implicitly causes a form of contrastive learning...

Fast reinforcement learning with generalized policy updates
The combination of reinforcement learning with deep learning is a promising approach to tackle important sequential decision-making problems that are currently intractable. One obstacle to overcome is the amount of data needed by learning systems of this type. In this article, we propose to address this issue through a divide-and-conquer approach. We argue that complex decision problems can be naturally decomposed into multiple tasks that unfold in sequence or in parallel. By associating each task with a reward function, this problem decomposition can be seamlessly accommodated within the standard reinforcement-learning formalism....

Reducing the Cost of Neural Network Inference with Residue Number Systems
We have developed a technique that allows the complexity-reducing Winograd transform to be applied to convolutional neural networks with INT8 parameters. The foundation of our technique is the use of a residue number system (RNS). An RNS is used to represent integers by their values modulo pairwise co-prime integers...The RNS representation enables us to perform the transformations and operations required to execute the network in the Winograd domain, without suffering the numerical problems (underflow and overflow) that typically result in a loss of prediction accuracy. This means that the resulting lower-complexity network incurs no degradation of prediction accuracy compared to the original INT8 network...
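
To make the residue number system idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation and does not touch the Winograd transform; it only shows how an integer can be stored as residues modulo pairwise co-prime integers, operated on component by component, and recovered with the Chinese Remainder Theorem. The moduli (251, 253, 255) and all function names are assumptions chosen for illustration.

```python
from itertools import combinations
from math import gcd, prod

# Illustrative moduli (an assumption, not taken from the article).
# 251 is prime, 253 = 11 * 23, 255 = 3 * 5 * 17, so they are pairwise co-prime.
MODULI = (251, 253, 255)


def pairwise_coprime(moduli):
    """Return True if every pair of moduli shares no common factor."""
    return all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))


def to_rns(x, moduli=MODULI):
    """Represent a non-negative integer by its residues modulo each modulus."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    """Add two RNS values component by component."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    """Multiply two RNS values component by component."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Recover the integer via the Chinese Remainder Theorem.

    The result is only correct if the true value is below prod(moduli).
    """
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        # pow(partial, -1, m) is the modular inverse (Python 3.8+).
        total += r * partial * pow(partial, -1, m)
    return total % big_m
```

For example, `from_rns(rns_mul(to_rns(123), to_rns(45)))` gives back 5535, because the product stays below the product of the moduli (16,193,265). Each residue channel works with small numbers, which is what lets intermediate values stay within a narrow integer range.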

The Hessian Penalty: A Weak Prior for Unsupervised Disentanglement
Existing disentanglement methods for deep generative models rely on hand-picked priors and complex encoder-based architectures. In this paper, we propose the Hessian Penalty, a simple regularization term that encourages the Hessian of a generative model with respect to its input to be diagonal. We introduce a model-agnostic, unbiased stochastic approximation of this term based on Hutchinson's estimator to compute it efficiently during training...

OpenAI’s language generator (GPT-3) has no idea what it’s talking about
Is GPT-3 an important step toward artificial general intelligence...We doubt it. At first glance, GPT-3 seems to have an impressive ability to produce human-like text...But accuracy is not its strong point. If you dig deeper, you discover that something’s amiss: although its output is grammatical, and even impressively idiomatic, its comprehension of the world is often seriously off, which means you can never really trust what it says...

Tomato ID
This model [GitHub - MIT License] showcases object detection on iOS by detecting tomato variants, in particular 16 different types of tomato...What's the point?...To demonstrate a fun proof-of-concept and to show how useful machine learning can really be in our day to day lives!...

Lexicon Of Lies: Terms for Problematic Information
Whether "post-fact" or propaganda, the public sphere is inundated with problematic information. Lexicon of Lies is an essential guide by Data & Society Postdoctoral Scholar Caroline Jack that covers terms and concepts for information that is inaccurate, misleading, inappropriately attributed, or altogether fabricated....

Training

Quick Question For You: Do you want a Data Science job?

After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.
The course is broken down into three guides:

Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore).

Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate.

Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!

Click here to learn more...
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Data Scientist (Entry Level) - Saturn Cloud - Remote

Saturn Cloud helps companies perform data science at a new level of scale, with one-click solutions, to solve the world’s hardest problems. Our product is a SaaS platform which equips data science teams with high-leverage automation tools, eliminating hours of traditional, manual work. The platform is user-friendly, scalable and secure.
You will be an entry-level Data Scientist for Saturn Cloud, an exciting new venture founded by the creators of Anaconda, NumPy, and SciPy. The role features drafting the first generation of Saturn resource materials, tutorials, and technical content...

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

Graph Representation Learning Book
This [free online pre-publication] book is my attempt to provide a brief but comprehensive introduction to graph representation learning, including methods for embedding graph data, graph neural networks, and deep generative models of graphs...

Activation Functions in Deep Learning: From Softmax to Sparsemax — Math Proof
The objective of this post is three-fold. The first part discusses the motivation behind sparsemax and its relation to softmax, summary of the original research paper in which this activation function was first introduced, and an overview of advantages from using sparsemax. Part two and three are dedicated to the mathematical derivations, concretely finding a closed-form solution as well as an appropriate loss function...
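
As a rough illustration of how sparsemax differs from softmax, the sketch below uses the standard closed-form solution for sparsemax (a Euclidean projection onto the probability simplex, computed by sorting and thresholding). It is written from that general formulation rather than taken from the post, and the function names are assumptions.

```python
import math


def softmax(z):
    """Standard softmax; every output is strictly positive."""
    shift = max(z)  # subtract the max for numerical stability
    exps = [math.exp(zi - shift) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(z):
    """Project z onto the probability simplex; small scores become exactly 0."""
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for j, zj in enumerate(z_sorted, start=1):
        cumsum += zj
        # Entries satisfying this condition form a prefix of the sorted scores,
        # so the last j that passes gives the support size and threshold tau.
        if 1.0 + j * zj > cumsum:
            tau = (cumsum - 1.0) / j
    return [max(zi - tau, 0.0) for zi in z]
```

With scores `[3.0, 0.0]`, `sparsemax` returns `[1.0, 0.0]`, while `softmax` still assigns a small positive probability to the second entry. With tied scores `[1.0, 1.0]`, both return `[0.5, 0.5]`.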

Fun with array rotations
Arrays are one of the most versatile data structures out there. Arrays form the basis of so many applications and numerous algorithms and data structures are based on them...This particular article will deal with programming problems related to rotation in arrays and strings...Before starting off with the actual problems, let us first look at some examples of rotations and try and answer the following questions about rotations: 1) What is a rotation?, 2) How many rotations are possible for an array containing N elements?, 3) What is the time complexity of rotating an array by one element?, 4) Some Python magic to implement rotations...
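
A minimal sketch of the rotation questions listed above, using Python slicing. The function names are assumptions for illustration and are not taken from the article. A sequence of N elements has N rotations counting the unrotated one (some may coincide if elements repeat), and a rotation built this way copies every element, so it takes O(N) time.

```python
def rotate_left(items, k):
    """Return a copy of a list or string rotated left by k positions."""
    if not items:
        return items[:]
    k %= len(items)  # rotating by len(items) gives back the original
    return items[k:] + items[:k]


def all_rotations(items):
    """Return every rotation, one for each starting offset."""
    return [rotate_left(items, k) for k in range(len(items))]


def is_rotation(a, b):
    """Check whether string b is a rotation of string a.

    Every rotation of a appears as a substring of a + a.
    """
    return len(a) == len(b) and b in a + a
```

For example, `rotate_left([1, 2, 3, 4, 5], 2)` returns `[3, 4, 5, 1, 2]`, and `is_rotation("abcde", "cdeab")` is `True`.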

Books

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
