
Data Science Weekly Newsletter
Issue 412
October 14, 2021

Editor's Picks

  • Machine learning is not nonparametric statistics
    Many times in my career, I’ve been told by respected statisticians that machine learning is nothing more than nonparametric statistics. The longer I work in this field, the more I think this view is both misleading and unhelpful. Not only can I never get a consistent definition of what “nonparametric” means, but the jump from statistics to machine learning is considerably larger than most expect. Statistics is an important tool for understanding machine learning and randomness is valuable for machine learning algorithm design, but there is considerably more to machine learning than what we learn in elementary statistics...
  • State of AI Report 2021
    Now in its fourth year, the State of AI Report 2021 is reviewed by AI practitioners in industry and research, and features invited contributions from a range of well-known and up-and-coming companies and research groups. The Report considers the following key dimensions: a) Research: Technology breakthroughs and capabilities, b) Talent: Supply, demand and concentration of AI talent, c) Industry: Areas of commercial application for AI and its business impact, d) Politics: Regulation of AI, its economic implications and the emerging geopolitics of AI, and e) Predictions: What we believe will happen and a performance review to keep us honest...
  • Deploying Machine Learning Models Safely and Systematically
    The Data Exchange Podcast interviews Hamel Husain on CI/CD for ML, MLOps tools and processes, and how much software engineering data scientists should know...Hamel Husain, Staff Machine Learning Engineer at GitHub and a core developer for fastai, previously worked on machine learning applications and systems at Airbnb and DataRobot...



A Message From This Week's Sponsor

Live Webinar | How to leverage AI for BI at scale. Thursday, Oct 21, 2021 at 2PM ET (11AM PT). Learn how to harness the power of AI for BI, democratize data, and improve analytics at scale with featured speakers from Snowflake and Cardinal Health.


Data Science Articles & Videos

  • OpenPrompt: An Open-Source Framework for Prompt-learning
    Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks: it modifies the input text with a textual template and directly uses the PLM to conduct its pre-training task. This library provides a standard, flexible and extensible framework to deploy the prompt-learning pipeline. OpenPrompt supports loading PLMs directly from huggingface transformers, and, in the future, will also support PLMs implemented by other libraries (a minimal usage sketch appears after this list)...
  • The Dawn of Quantum Natural Language Processing
    In this paper, we discuss initial attempts at improving the understanding of human language by combining deep-learning models with quantum computing. We successfully train a quantum-enhanced Long Short-Term Memory network to perform part-of-speech tagging via numerical simulations. Moreover, a quantum-enhanced Transformer is proposed to perform sentiment analysis on an existing dataset (see the hybrid-layer sketch after this list)...
  • Self-Supervised Learning Advances Medical Image Classification
    In recent years, there has been increasing interest in applying deep learning to medical imaging tasks, with exciting progress in various applications like radiology, pathology and dermatology...In “Big Self-Supervised Models Advance Medical Image Classification”, to appear at ICCV 2021, we study the effectiveness of self-supervised contrastive learning as a pre-training strategy within the domain of medical image classification (the core contrastive loss is sketched after this list)...
  • Balancing Average and Worst-case Accuracy in Multitask Learning
    When training and evaluating machine learning models on a large number of tasks, it is important not only to look at average task accuracy -- which may be biased by easy or redundant tasks -- but also at worst-case accuracy (i.e. the performance on the task with the lowest accuracy). In this work, we show how to use techniques from the distributionally robust optimization (DRO) literature to improve worst-case performance in multitask learning (a toy task-weighting scheme is sketched after this list)...
  • Neural Tangent Kernel Eigenvalues Accurately Predict Generalization
    Finding a quantitative theory of neural network generalization has long been a central goal of deep learning research. We extend recent results to demonstrate that, by examining the eigensystem of a neural network's "neural tangent kernel", one can predict its generalization performance when learning arbitrary functions. Our theory accurately predicts not only test mean-squared-error but all first- and second-order statistics of the network's learned function. Furthermore, using a measure quantifying the "learnability" of a given target function, we prove a new "no-free-lunch" theorem characterizing a fundamental tradeoff in the inductive bias of wide neural networks: improving a network's generalization for a given target function must worsen its generalization for orthogonal functions (the shape of this result is sketched after this list)...
  • A Few More Examples May Be Worth Billions of Parameters
    We investigate the dynamics of increasing the number of model parameters versus the number of labeled examples across a wide variety of tasks. Our exploration reveals that while scaling parameters consistently yields performance improvements, the contribution of additional examples highly depends on the task's format. Specifically, in open question answering tasks, enlarging the training set does not improve performance. In contrast, classification, extractive question answering, and multiple choice tasks benefit so much from additional examples that collecting a few hundred examples is often "worth" billions of parameters...
  • Facebook Loves Self-Supervised Learning. Period.
    What was once a research strategy for Facebook AI teams has, over the years, turned into an area of scientific breakthrough: the company has been delivering strong internal results, with some self-supervised language understanding models, libraries, frameworks, and experiments consistently beating traditional systems or fully supervised models...
  • Duke Computer Scientist, Cynthia Rudin, Wins $1 Million AI Prize
    Duke University computer scientist Cynthia Rudin wants AI to show its work. Especially when it’s making decisions that deeply affect people’s lives...She chose to pursue opportunities to apply machine learning techniques to important societal problems, and in the process, realized that AI’s potential is best unlocked when humans can peer inside and understand what it is doing...Now, after 15 years of advocating for and developing “interpretable” machine learning algorithms that allow humans to see inside AI, Rudin’s contributions to the field have earned her the $1 million Squirrel AI Award for Artificial Intelligence for the Benefit of Humanity...
  • Primary visual cortex straightens natural video trajectories
    Many sensory-driven behaviors rely on predictions about future states of the environment. Visual input typically evolves along complex temporal trajectories that are difficult to extrapolate. We test the hypothesis that spatial processing mechanisms in the early visual system facilitate prediction by constructing neural representations that follow straighter temporal trajectories...our findings reveal that the early visual system uses a set of specialized computations to build representations that can support prediction in the natural environment (the standard curvature metric is sketched after this list)...
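
A quick illustration of the OpenPrompt item above: the sketch below follows the pattern of the project's README for a tiny sentiment classification task. The template text, label words, and the "bert-base-cased" checkpoint are illustrative choices, and the exact API may vary between OpenPrompt versions, so treat this as a sketch rather than canonical usage.

    import torch
    from openprompt.data_utils import InputExample
    from openprompt.plms import load_plm
    from openprompt.prompts import ManualTemplate, ManualVerbalizer
    from openprompt import PromptForClassification, PromptDataLoader

    classes = ["negative", "positive"]
    dataset = [InputExample(guid=0, text_a="Albert Einstein was one of the greatest intellects of his time.")]

    # Load a PLM directly from huggingface transformers.
    plm, tokenizer, model_config, WrapperClass = load_plm("bert", "bert-base-cased")

    # Wrap the input with a textual template; {"mask"} is where the PLM predicts a word.
    template = ManualTemplate(tokenizer=tokenizer,
                              text='{"placeholder":"text_a"} It was {"mask"}.')

    # Map the words predicted at the mask position onto class labels.
    verbalizer = ManualVerbalizer(tokenizer=tokenizer, classes=classes,
                                  label_words={"negative": ["bad"],
                                               "positive": ["good", "wonderful"]})

    model = PromptForClassification(template=template, plm=plm, verbalizer=verbalizer)
    loader = PromptDataLoader(dataset=dataset, tokenizer=tokenizer, template=template,
                              tokenizer_wrapper_class=WrapperClass)

    model.eval()
    with torch.no_grad():
        for batch in loader:
            logits = model(batch)
            print(classes[torch.argmax(logits, dim=-1).item()])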
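
For the quantum NLP paper, the usual recipe for a "quantum-enhanced" network is to swap a classical linear map (for example, inside an LSTM gate) for a small variational quantum circuit run in simulation. The PennyLane sketch below shows that pattern; PennyLane, the 4-qubit device, and the circuit layout are our illustrative assumptions, not the paper's actual code.

    import torch
    import pennylane as qml

    n_qubits = 4
    dev = qml.device("default.qubit", wires=n_qubits)  # classical simulator

    @qml.qnode(dev, interface="torch")
    def circuit(inputs, weights):
        # Encode classical features as rotation angles, then entangle.
        qml.AngleEmbedding(inputs, wires=range(n_qubits))
        qml.BasicEntanglerLayers(weights, wires=range(n_qubits))
        return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

    # A drop-in torch module that could stand in for an LSTM gate's linear layer.
    qlayer = qml.qnn.TorchLayer(circuit, weight_shapes={"weights": (2, n_qubits)})
    out = qlayer(torch.rand(n_qubits))  # four expectation values in [-1, 1]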
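
The medical-imaging result above builds on SimCLR-style contrastive pre-training, whose core is the NT-Xent loss: two augmented views of the same image should embed close together relative to everything else in the batch. Below is a generic PyTorch rendering (not Google's implementation; the temperature value is an assumption).

    import torch
    import torch.nn.functional as F

    def nt_xent_loss(z1, z2, temperature=0.1):
        """NT-Xent over a batch: z1[i] and z2[i] are embeddings of two
        augmented views of the same image, each of shape (n, d)."""
        n = z1.size(0)
        z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2n, d)
        sim = z @ z.t() / temperature                        # cosine similarities
        sim.fill_diagonal_(float("-inf"))                    # drop self-pairs
        # The positive for row i is its counterpart in the other view.
        targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
        return F.cross_entropy(sim, targets)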
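
To make the multitask DRO idea concrete: rather than minimizing the mean of task losses, keep a weight per task that grows exponentially with that task's loss, so the worst-performing tasks dominate the gradient. The sketch below is a generic group-DRO-style update, not the authors' code; the step size eta is an assumption.

    import torch

    class GroupDROWeights:
        """Online DRO-style task weighting: high-loss tasks get
        exponentially larger weight in the training objective."""
        def __init__(self, n_tasks, eta=0.01):
            self.log_w = torch.zeros(n_tasks)
            self.eta = eta

        def loss(self, task_losses):
            losses = torch.stack(task_losses)
            self.log_w += self.eta * losses.detach()   # upweight lagging tasks
            w = torch.softmax(self.log_w, dim=0)
            return (w * losses).sum()                  # replaces losses.mean()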
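
And a taste of the NTK paper's machinery, in our paraphrase (the paper's exact statement is more careful): decompose the kernel into eigenmodes; each eigenfunction is learned to a degree between 0 and 1, and in the ridgeless case these "learnabilities" are conserved, which is the source of the no-free-lunch tradeoff:

    K(x, x') = \sum_i \lambda_i \,\phi_i(x)\,\phi_i(x'), \qquad
    \mathcal{L}(\phi_i) = \frac{\lambda_i}{\lambda_i + \delta}, \qquad
    \sum_i \mathcal{L}(\phi_i) = n,

so raising the learnability of one mode, with the sample size n fixed, must lower it for orthogonal ones.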
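
Finally, the "straightening" in the V1 paper is quantified by the curvature of a representation's temporal trajectory: the average angle between successive displacement vectors, which is zero for a perfectly straight path. A generic NumPy rendering of that metric (not the authors' code):

    import numpy as np

    def mean_curvature(traj):
        """traj: (T, d) array, one d-dimensional representation per frame.
        Returns the mean angle (radians) between successive steps."""
        d = np.diff(traj, axis=0)                        # displacement vectors
        d /= np.linalg.norm(d, axis=1, keepdims=True)    # unit directions
        cos = np.clip(np.sum(d[:-1] * d[1:], axis=1), -1.0, 1.0)
        return np.arccos(cos).mean()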



Tools



Create AI-powered search and recommendation apps with Pinecone
Pinecone is a fully managed vector database that makes it easy to add vector search to production applications. It combines state-of-the-art vector search libraries, advanced features such as filtering, and distributed infrastructure to provide high performance and reliability at any scale. Get started now — it's free!
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!





Training & Resources

  • Bayesian Optimization Book [semi-final-draft]
    The book aims to provide a self-contained and comprehensive introduction to Bayesian optimization, starting “from scratch” and carefully developing all the key ideas along the way. The intended audience is graduate students and researchers in machine learning, statistics, and related fields. However, I also hope that practitioners and researchers from more distant fields will find some utility here (a toy acquisition-function sketch follows below)...
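
As a taste of the book's subject matter, the workhorse acquisition function in Bayesian optimization is expected improvement under a Gaussian posterior. A generic NumPy/SciPy rendering (a textbook formula, not code from the book; the minimization convention and the xi jitter are our choices):

    import numpy as np
    from scipy.stats import norm

    def expected_improvement(mu, sigma, best, xi=0.01):
        """EI for minimization at candidate points with posterior mean mu
        and standard deviation sigma; `best` is the best value seen so far."""
        sigma = np.maximum(sigma, 1e-12)      # guard against zero variance
        z = (best - mu - xi) / sigma
        return (best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)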



P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :)

All the best,
Hannah & Sebastian
