
Data Science Weekly Newsletter
Issue 412
October 14, 2021

Editor's Picks

  • Machine learning is not nonparametric statistics
    Many times in my career, I’ve been told by respected statisticians that machine learning is nothing more than nonparametric statistics. The longer I work in this field, the more I think this view is both misleading and unhelpful. Not only can I never get a consistent definition of what “nonparametric” means, but the jump from statistics to machine learning is considerably larger than most expect. Statistics is an important tool for understanding machine learning and randomness is valuable for machine learning algorithm design, but there is considerably more to machine learning than what we learn in elementary statistics...
  • State of AI Report 2021
    Now in its fourth year, the State of AI Report 2021 is reviewed by AI practitioners in industry and research, and features invited contributions from a range of well-known and up-and-coming companies and research groups. The Report considers the following key dimensions: a) Research: Technology breakthroughs and capabilities, b) Talent: Supply, demand and concentration of AI talent, c) Industry: Areas of commercial application for AI and its business impact, d) Politics: Regulation of AI, its economic implications and the emerging geopolitics of AI, and e) Predictions: What we believe will happen and a performance review to keep us honest...
  • Deploying Machine Learning Models Safely and Systematically
    The Data Exchange Podcast interviews Hamel Husain on CI/CD for ML, MLOps tools and processes, and how much software engineering data scientists should know...Hamel Husain, Staff Machine Learning Engineer at GitHub and a core developer for fastai, previously worked on machine learning applications and systems at Airbnb and DataRobot...



A Message From This Week's Sponsor

Live Webinar | How to leverage AI for BI at scale. Thursday, Oct 21, 2021 at 2PM ET (11AM PT). Learn how to harness the power of AI for BI, democratize data, and improve analytics at scale with featured speakers from Snowflake and Cardinal Health.


Data Science Articles & Videos

  • OpenPrompt: An Open-Source Framework for Prompt-learning
    Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks: it modifies the input text with a textual template and directly uses the PLM to conduct its pre-training task. This library provides a standard, flexible and extensible framework to deploy the prompt-learning pipeline. OpenPrompt supports loading PLMs directly from huggingface transformers, and, in the future, will also support PLMs implemented by other libraries (a minimal usage sketch appears after this list)...
  • The Dawn of Quantum Natural Language Processing
    In this paper, we discuss initial attempts at improving the understanding of human language by combining deep-learning models with quantum computing. We successfully train a quantum-enhanced Long Short-Term Memory network to perform part-of-speech tagging via numerical simulations. Moreover, a quantum-enhanced Transformer is proposed to perform sentiment analysis on an existing dataset (see the hybrid-layer sketch after this list)...
  • Self-Supervised Learning Advances Medical Image Classification
    In recent years, there has been increasing interest in applying deep learning to medical imaging tasks, with exciting progress in various applications like radiology, pathology and dermatology...In “Big Self-Supervised Models Advance Medical Image Classification”, to appear at ICCV 2021, we study the effectiveness of self-supervised contrastive learning as a pre-training strategy within the domain of medical image classification (the core contrastive loss is sketched after this list)...
  • Balancing Average and Worst-case Accuracy in Multitask Learning
    When training and evaluating machine learning models on a large number of tasks, it is important not only to look at average task accuracy -- which may be biased by easy or redundant tasks -- but also at worst-case accuracy (i.e. the performance on the task with the lowest accuracy). In this work, we show how to use techniques from the distributionally robust optimization (DRO) literature to improve worst-case performance in multitask learning (a toy task-weighting scheme is sketched after this list)...
  • Neural Tangent Kernel Eigenvalues Accurately Predict Generalization
    Finding a quantitative theory of neural network generalization has long been a central goal of deep learning research. We extend recent results to demonstrate that, by examining the eigensystem of a neural network's "neural tangent kernel", one can predict its generalization performance when learning arbitrary functions. Our theory accurately predicts not only test mean-squared-error but all first- and second-order statistics of the network's learned function. Furthermore, using a measure quantifying the "learnability" of a given target function, we prove a new "no-free-lunch" theorem characterizing a fundamental tradeoff in the inductive bias of wide neural networks: improving a network's generalization for a given target function must worsen its generalization for orthogonal functions (the shape of this result is sketched after this list)...
  • A Few More Examples May Be Worth Billions of Parameters
    We investigate the dynamics of increasing the number of model parameters versus the number of labeled examples across a wide variety of tasks. Our exploration reveals that while scaling parameters consistently yields performance improvements, the contribution of additional examples highly depends on the task's format. Specifically, in open question answering tasks, enlarging the training set does not improve performance. In contrast, classification, extractive question answering, and multiple choice tasks benefit so much from additional examples that collecting a few hundred examples is often "worth" billions of parameters...
  • Facebook Loves Self-Supervised Learning. Period.
    What was once a research strategy for Facebook AI teams has, over the years, turned into an area of scientific breakthrough: the company has been delivering strong internal results, with some self-supervised language understanding models, libraries, frameworks, and experiments consistently beating traditional systems or fully supervised models...
  • Duke Computer Scientist, Cynthia Rudin, Wins $1 Million AI Prize
    Duke University computer scientist Cynthia Rudin wants AI to show its work. Especially when it’s making decisions that deeply affect people’s lives...She chose to pursue opportunities to apply machine learning techniques to important societal problems, and in the process, realized that AI’s potential is best unlocked when humans can peer inside and understand what it is doing...Now, after 15 years of advocating for and developing “interpretable” machine learning algorithms that allow humans to see inside AI, Rudin’s contributions to the field have earned her the $1 million Squirrel AI Award for Artificial Intelligence for the Benefit of Humanity...
  • Primary visual cortex straightens natural video trajectories
    Many sensory-driven behaviors rely on predictions about future states of the environment. Visual input typically evolves along complex temporal trajectories that are difficult to extrapolate. We test the hypothesis that spatial processing mechanisms in the early visual system facilitate prediction by constructing neural representations that follow straighter temporal trajectories...our findings reveal that the early visual system uses a set of specialized computations to build representations that can support prediction in the natural environment (the standard curvature metric is sketched after this list)...
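
A quick illustration of the OpenPrompt item above: the sketch below follows the pattern of the project's README for a tiny sentiment classification task. The template text, label words, and the "bert-base-cased" checkpoint are illustrative choices, and the exact API may vary between OpenPrompt versions, so treat this as a sketch rather than canonical usage.

    import torch
    from openprompt.data_utils import InputExample
    from openprompt.plms import load_plm
    from openprompt.prompts import ManualTemplate, ManualVerbalizer
    from openprompt import PromptForClassification, PromptDataLoader

    classes = ["negative", "positive"]
    dataset = [InputExample(guid=0, text_a="Albert Einstein was one of the greatest intellects of his time.")]

    # Load a PLM directly from huggingface transformers.
    plm, tokenizer, model_config, WrapperClass = load_plm("bert", "bert-base-cased")

    # Wrap the input with a textual template; {"mask"} is where the PLM predicts a word.
    template = ManualTemplate(tokenizer=tokenizer,
                              text='{"placeholder":"text_a"} It was {"mask"}.')

    # Map the words predicted at the mask position onto class labels.
    verbalizer = ManualVerbalizer(tokenizer=tokenizer, classes=classes,
                                  label_words={"negative": ["bad"],
                                               "positive": ["good", "wonderful"]})

    model = PromptForClassification(template=template, plm=plm, verbalizer=verbalizer)
    loader = PromptDataLoader(dataset=dataset, tokenizer=tokenizer, template=template,
                              tokenizer_wrapper_class=WrapperClass)

    model.eval()
    with torch.no_grad():
        for batch in loader:
            logits = model(batch)
            print(classes[torch.argmax(logits, dim=-1).item()])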
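
For the quantum NLP paper, the usual recipe for a "quantum-enhanced" network is to swap a classical linear map (for example, inside an LSTM gate) for a small variational quantum circuit run in simulation. The PennyLane sketch below shows that pattern; PennyLane, the 4-qubit device, and the circuit layout are our illustrative assumptions, not the paper's actual code.

    import torch
    import pennylane as qml

    n_qubits = 4
    dev = qml.device("default.qubit", wires=n_qubits)  # classical simulator

    @qml.qnode(dev, interface="torch")
    def circuit(inputs, weights):
        # Encode classical features as rotation angles, then entangle.
        qml.AngleEmbedding(inputs, wires=range(n_qubits))
        qml.BasicEntanglerLayers(weights, wires=range(n_qubits))
        return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

    # A drop-in torch module that could stand in for an LSTM gate's linear layer.
    qlayer = qml.qnn.TorchLayer(circuit, weight_shapes={"weights": (2, n_qubits)})
    out = qlayer(torch.rand(n_qubits))  # four expectation values in [-1, 1]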
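
The medical-imaging result above builds on SimCLR-style contrastive pre-training, whose core is the NT-Xent loss: two augmented views of the same image should embed close together relative to everything else in the batch. Below is a generic PyTorch rendering (not Google's implementation; the temperature value is an assumption).

    import torch
    import torch.nn.functional as F

    def nt_xent_loss(z1, z2, temperature=0.1):
        """NT-Xent over a batch: z1[i] and z2[i] are embeddings of two
        augmented views of the same image, each of shape (n, d)."""
        n = z1.size(0)
        z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2n, d)
        sim = z @ z.t() / temperature                        # cosine similarities
        sim.fill_diagonal_(float("-inf"))                    # drop self-pairs
        # The positive for row i is its counterpart in the other view.
        targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
        return F.cross_entropy(sim, targets)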
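
To make the multitask DRO idea concrete: rather than minimizing the mean of task losses, keep a weight per task that grows exponentially with that task's loss, so the worst-performing tasks dominate the gradient. The sketch below is a generic group-DRO-style update, not the authors' code; the step size eta is an assumption.

    import torch

    class GroupDROWeights:
        """Online DRO-style task weighting: high-loss tasks get
        exponentially larger weight in the training objective."""
        def __init__(self, n_tasks, eta=0.01):
            self.log_w = torch.zeros(n_tasks)
            self.eta = eta

        def loss(self, task_losses):
            losses = torch.stack(task_losses)
            self.log_w += self.eta * losses.detach()   # upweight lagging tasks
            w = torch.softmax(self.log_w, dim=0)
            return (w * losses).sum()                  # replaces losses.mean()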
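
And a taste of the NTK paper's machinery, in our paraphrase (the paper's exact statement is more careful): decompose the kernel into eigenmodes; each eigenfunction is learned to a degree between 0 and 1, and in the ridgeless case these "learnabilities" are conserved, which is the source of the no-free-lunch tradeoff:

    K(x, x') = \sum_i \lambda_i \,\phi_i(x)\,\phi_i(x'), \qquad
    \mathcal{L}(\phi_i) = \frac{\lambda_i}{\lambda_i + \delta}, \qquad
    \sum_i \mathcal{L}(\phi_i) = n,

so raising the learnability of one mode, with the sample size n fixed, must lower it for orthogonal ones.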
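
Finally, the "straightening" in the V1 paper is quantified by the curvature of a representation's temporal trajectory: the average angle between successive displacement vectors, which is zero for a perfectly straight path. A generic NumPy rendering of that metric (not the authors' code):

    import numpy as np

    def mean_curvature(traj):
        """traj: (T, d) array, one d-dimensional representation per frame.
        Returns the mean angle (radians) between successive steps."""
        d = np.diff(traj, axis=0)                        # displacement vectors
        d /= np.linalg.norm(d, axis=1, keepdims=True)    # unit directions
        cos = np.clip(np.sum(d[:-1] * d[1:], axis=1), -1.0, 1.0)
        return np.arccos(cos).mean()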



Tools



Create AI-powered search and recommendation apps with Pinecone
Pinecone is a fully managed vector database that makes it easy to add vector search to production applications. It combines state-of-the-art vector search libraries, advanced features such as filtering, and distributed infrastructure to provide high performance and reliability at any scale. Get started now — it's free!
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!





Training & Resources

  • Bayesian Optimization Book [semi-final-draft]
    The book aims to provide a self-contained and comprehensive introduction to Bayesian optimization, starting “from scratch” and carefully developing all the key ideas along the way. The intended audience is graduate students and researchers in machine learning, statistics, and related fields. However, I also hope that practitioners and researchers from more distant fields will find some utility here (a toy acquisition-function sketch follows below)...
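
As a taste of the book's subject matter, the workhorse acquisition function in Bayesian optimization is expected improvement under a Gaussian posterior. A generic NumPy/SciPy rendering (a textbook formula, not code from the book; the minimization convention and the xi jitter are our choices):

    import numpy as np
    from scipy.stats import norm

    def expected_improvement(mu, sigma, best, xi=0.01):
        """EI for minimization at candidate points with posterior mean mu
        and standard deviation sigma; `best` is the best value seen so far."""
        sigma = np.maximum(sigma, 1e-12)      # guard against zero variance
        z = (best - mu - xi) / sigma
        return (best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)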



P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :)

All the best,
Hannah & Sebastian
