Data Science Weekly Newsletter

Issue 435

March 24, 2022

Editor's Picks

Algorithmic impact assessment: a case study in healthcare
This report sets out the first-known detailed proposal for the use of an algorithmic impact assessment for data access in a healthcare context – the UK National Health Service (NHS)’s proposed National Medical Imaging Platform (NMIP)...It proposes a process for AIAs, which aims to ensure that algorithmic uses of public-sector data are evaluated and governed to produce benefits for society, governments, public bodies and technology developers, as well as the people represented in the data and affected by the technologies and their outcomes...

Deep Learning on Electronic Medical Records is doomed to fail
A few years ago, I worked on a project to investigate the potential of machine learning to transform healthcare through modeling electronic medical records. I walked away deeply disillusioned with the whole field and I really don’t think that the field needs machine learning right now. What it does need is plenty of IT support. But even that’s not enough. Here are some of the structural reasons why I don’t think deep learning models on EMRs are going to be useful any time soon. ...

What’s wrong with “explainable A.I.”
A.I. has an explainability crisis. But it’s not the one you probably think...“Everyone who is serious in the field knows that most of today’s explainable A.I. is nonsense,” Zachary Lipton, a computer science professor at Carnegie Mellon University, recently told me...

A Message From This Week's Sponsor

A self-service image labelling platform
Frustrated by complicated data labelling platforms with long-winded manuals, inconsistent output quality, and slow turnaround time? Try bolt! bolt lets you take control of your image annotation projects and leave all the annotating to us. Set up your project easily, review annotated tasks and progress, and receive the labelled data back within hours.
Why bolt:

Easy to use: Follow our simple step-by-step process to upload images, create instructions, evaluate quality and export labelled data.
Quick results: A bolt user once completed 5 different projects with 2,500 annotations in total - in less than 2 hours!
High quality: We make it easy for you to iterate and improve your projects.

Try bolt now

Data Science Articles & Videos

Efficient Deep Learning: From Theory to Practice
In this thesis, we develop theoretically-grounded algorithms to reduce the size and inference cost of modern, large-scale neural networks. By taking a theoretical approach from first principles, we intend to understand and analytically describe the performance-size trade-offs of deep networks, i.e., the generalization properties...

How a Kalman filter works, in pictures
I have to tell you about the Kalman filter, because what it does is pretty damn amazing...Surprisingly few software engineers and scientists seem to know about it, and that makes me sad because it is such a general and powerful tool for combining information in the presence of uncertainty...
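For readers who want a feel for "combining information in the presence of uncertainty" before clicking through, here is a minimal one-dimensional sketch. It is not taken from the linked article; the function names and the scalar Gaussian setup are assumptions chosen for illustration.

```python
def kalman_predict(mean, var, motion, motion_var):
    """Predict step: shift the estimate by the motion and grow the uncertainty."""
    return mean + motion, var + motion_var


def kalman_update(mean, var, measurement, meas_var):
    """Update step: fuse the current Gaussian estimate with a noisy measurement."""
    # The gain weights the measurement by how uncertain the estimate is
    # relative to the measurement noise.
    gain = var / (var + meas_var)
    new_mean = mean + gain * (measurement - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var


if __name__ == "__main__":
    # Hypothetical numbers: prior N(0, 1), measurement 2.0 with variance 1.
    mean, var = kalman_update(0.0, 1.0, 2.0, 1.0)
    print(mean, var)  # 1.0 0.5
    mean, var = kalman_predict(mean, var, motion=1.0, motion_var=0.25)
    print(mean, var)  # 2.0 0.75
```

The fused variance after an update is smaller than both the prior variance and the measurement variance, which is the sense in which two uncertain sources combine into a better estimate.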

Auto-generated Summaries in Google Docs
We recently announced that Google Docs now automatically generates suggestions to aid document writers in creating content summaries, when they are available. Today we describe how this was enabled using a machine learning (ML) model that comprehends document text and, when confident, generates a 1-2 sentence natural language description of the document content...

Solving for Why
Thanks to large datasets and machine learning, computers have become surprisingly adept at finding statistical relationships among many variables—and exploiting these patterns to make useful predictions...Yet for many tasks, that is not enough. "In reality, we often want to not only predict things, but we want to improve things...

Inferring Articulated Rigid Body Dynamics from RGBD Video
Being able to reproduce physical phenomena ranging from light interaction to contact mechanics, simulators are becoming increasingly useful in more and more application domains where real-world interaction or labeled data are difficult to obtain. Despite recent progress, significant human effort is needed to configure simulators to accurately reproduce real-world behavior. We introduce a pipeline that combines inverse rendering with differentiable simulation to create digital twins of real-world articulated mechanisms from depth or RGB videos...

Universities do a terrible job teaching machine learning [twitter thread]
Not only do they give you critically out-of-date information, but they focus most of their time on the least important aspects...Here are 5 things everyone in industry WISHES your professor taught you:...

R in Visual Studio Code
The R programming language is a dynamic language built for statistical computing and graphics. R is commonly used in statistical analysis, scientific computing, machine learning, and data visualization...The R extension for Visual Studio Code supports extended syntax highlighting, code completion, linting, formatting, interacting with R terminals, viewing data, plots, workspace variables, help pages, managing packages and working with R Markdown documents...

Assessing Generalization of SGD via Disagreement
Estimating the generalization error of a model — how well the model performs on unseen data — is a fundamental component in any machine learning system. Generalization performance is traditionally estimated in a supervised manner, by dividing the labeled data into a training set and test set...in many real-world settings, a large amount of unlabeled data is readily available. How can we tap into the rich information in these unlabeled data and leverage them to assess a model’s performance without labels? In this work (full paper), we demonstrate that a simple procedure can accurately estimate the generalization error with only unlabeled data. ...
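The blurb does not spell out the procedure, but the title points at disagreement between models. As a rough sketch only, assuming the idea is to train the same model twice with different SGD runs and compare their predictions on unlabeled inputs, the measurement itself could look like this. The function name and the toy predictions are hypothetical, and whether the disagreement rate tracks test error in your setting is something the linked work addresses, not this snippet.

```python
def disagreement_rate(preds_a, preds_b):
    """Fraction of unlabeled examples on which two models predict different labels."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if not preds_a:
        raise ValueError("need at least one prediction")
    differing = sum(a != b for a, b in zip(preds_a, preds_b))
    return differing / len(preds_a)


if __name__ == "__main__":
    # Hypothetical predicted labels from two independently trained models.
    run_1 = [0, 1, 1, 0]
    run_2 = [0, 1, 0, 1]
    print(disagreement_rate(run_1, run_2))  # 0.5
```

No ground-truth labels appear anywhere in the computation, which is the appeal: only model outputs on unlabeled data are needed.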

Your Policy Regulariser is Secretly an Adversary
Policy regularisation can be interpreted as learning a strategy in the face of an imagined adversary; a decision-making principle which leads to robust policies. In our recent paper, we analyse this adversary and the generalisation guarantees we get from such a policy...

MetaMorph: Learning Universal Controllers with Transformers
Multiple domains like vision, natural language, and audio are witnessing tremendous progress by leveraging Transformers for large scale pre-training followed by task specific fine tuning. In contrast, in robotics we primarily train a single robot for a single task. However, modular robot systems now allow for the flexible combination of general-purpose building blocks into task optimized morphologies...In this work, we propose MetaMorph, a Transformer based approach to learn a universal controller over a modular robot design space...

Sentiment Analysis on News Headlines: Classic Supervised Learning vs Deep Learning Approach
An explanatory guide to develop a binary classifier to detect positive and negative news headlines using classic machine learning and deep learning techniques...

Webinar*

Expert discussion on what’s next in data science, AI, and ML! Hear from Netflix, Meta, Wikimedia Foundation experts, and Anaconda’s co-founder and CEO, Peter Wang, about predicted trends in 2022 data science and AI/ML. We’ll reflect on lessons learned and discuss the primary factors driving change and innovation this year, including the crucial role the open-source community will play in shaping the future of the data science field. Watch it here!
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Summit

You're invited to the first-ever Metrics Store Summit
Transform is hosting the first industry summit on the metrics layer. The Metrics Store Summit on April 26, 2022 will bring discussions around the semantic layer into one event, providing context with use cases for metrics stores, highlighting applications for metrics, and sharing ideas from leaders across the modern data stack. You can expect to hear from Airbnb, Slack, Spotify, Atlan, Hex, Mode, Hightouch, AtScale and many more in this action-packed 1-day event. We would love to see you there! Register today for free.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Lead Data Engineer - electricityMap - Copenhagen, Denmark
The electricityMap team is hiring a data engineer to help us build and maintain a scalable data pipeline and database that form the foundation of our mission to accelerate the energy system to a zero-carbon future. In your role, you’ll be making sure the quality and availability of our data are stellar by building and improving our data infrastructure, as well as managing our internal tools. You will also be responsible for managing our machine learning pipelines at scale. We’re a small team, so you’ll be owning a lot of your own work and initiatives, but we will be there to support you!

Training & Resources

ML Course Notes
A place [on GitHub] to collaborate and share course notes on all topics related to machine learning, NLP, and AI...

Step-by-step Approach to Build Your Machine Learning API Using Fast API
No matter how efficient your Machine Learning model is, it will only be useful when it creates value for the Business. This cannot happen when it’s stored in a folder on your computer. In this fast-growing environment, speed and good deployment strategies are required to get your AI solution to the market!...This article explains how Fast API can help on that matter. We will start by having a global overview of Fast API and its illustration by creating an API...

Random Forests Algorithm explained with a real-life example and some Python code
Random Forests is a Machine Learning algorithm that tackles one of the biggest problems with Decision Trees: variance...

Books

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page...

P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :)
All the best,
Hannah & Sebastian
