Data Science Weekly Newsletter

Issue 423

December 30, 2021

Editor's Picks

2021: A Year Full of Amazing AI papers - A Review
A curated list of the latest breakthroughs in AI by release date, each with a clear video explanation, a link to a more in-depth article, and code...

Graph ML in 2022: Where Are We Now?
It’s been quite a year for Graph ML: thousands of papers, numerous conferences and workshops… How do we keep up with so many cool things happening? Well, we are puzzled as well, so we decided to present a structured look at Graph ML, highlighting trends and major advancements...

Papers with Code 2021: A Year in Review
Papers with Code indexes various machine learning artifacts — papers, code, results — to facilitate discovery and comparison. Using this data we can get a sense of what the ML community found useful and interesting this year. Below we summarize the top trending papers, libraries and datasets for 2021 on Papers with Code...

A Message From This Week's Sponsor

High-quality data labeling, consistently
Edge cases are the most common challenges that ML teams face when training their AI models, making it difficult to reach 95+% accuracy. This becomes even more complex once you need to scale and start working with third-party data labeling solutions. The evaluation metrics we use to measure the quality of labeled data, Intersection over Union (IoU) and F1 score, have allowed us to make swift adjustments on the go and continuously improve the quality of our labeling standards. To find out more and start exploring our end-to-end data labeling service, speak to the team at Supahands today.

Data Science Articles & Videos

The Statistical Complexity of Interactive Decision Making
A fundamental challenge in interactive learning and decision making, ranging from bandit problems to reinforcement learning, is to provide sample-efficient, adaptive learning algorithms that achieve near-optimal regret...The main result of this work provides a complexity measure, the Decision-Estimation Coefficient, that is proven to be both necessary and sufficient for sample-efficient interactive learning...

Visual Tutorial NER Chunking Token Classification
Why is there so much talk about Named Entity Recognition in a competition that requires us to detect and classify spans of text? It looks like most of the top public notebooks used an NER approach to solve the Feedback Prize challenge. In this tutorial, I’d like to provide a beginner-friendly, visual explanation of this approach...
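The core of the span-to-token trick can be sketched in a few lines of Python. This is a minimal illustration and not the tutorial's code. It assumes whitespace tokenization, character-offset spans, and the BIO tagging scheme. The labels "Claim" and "Evidence" and the helpers `spans_to_bio` and `bio_to_spans` are hypothetical names chosen for the example.

```python
import re

def tokenize_with_offsets(text):
    """Split on whitespace, keeping each token's character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def spans_to_bio(text, spans):
    """Turn (start_char, end_char, label) spans into one BIO tag per token.

    Hypothetical helper: a token is tagged only if it lies fully inside a span.
    """
    tokens = tokenize_with_offsets(text)
    tags = ["O"] * len(tokens)
    for span_start, span_end, label in spans:
        first = True
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            if tok_start >= span_start and tok_end <= span_end:
                tags[i] = ("B-" if first else "I-") + label
                first = False
    return [tok for tok, _, _ in tokens], tags

def bio_to_spans(tags):
    """Group per-token BIO tags back into (first_token, last_token, label) spans."""
    spans, current = [], None
    for i, tag in enumerate(tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and (current is None or current[2] != tag[2:])
        )
        if starts_new:
            if current:
                spans.append(tuple(current))
            current = [i, i, tag[2:]]
        elif tag.startswith("I-"):
            current[1] = i  # extend the currently open span
        else:  # an "O" tag closes any open span
            if current:
                spans.append(tuple(current))
            current = None
    if current:
        spans.append(tuple(current))
    return spans

text = "Schools should start later. Students need more sleep."
tokens, tags = spans_to_bio(text, [(0, 27, "Claim"), (28, 53, "Evidence")])
print(list(zip(tokens, tags)))
print(bio_to_spans(tags))  # [(0, 3, 'Claim'), (4, 7, 'Evidence')]
```

The decoder treats a stray `I-` tag as the start of a new span. That is one common convention; stricter decoders discard such tags instead.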

tree-math: mathematical operations for JAX pytrees
tree-math makes it easy to implement numerical algorithms that work on JAX pytrees, such as iterative methods for optimization and equation solving. It does so by providing a wrapper class, tree_math.Vector, that defines array operations such as infix arithmetic and dot products on pytrees as if they were vectors...
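The idea behind such a wrapper can be shown without a JAX dependency. Below is a pure-Python sketch of vector arithmetic over nested dicts, lists and tuples of numbers. `TreeVector`, `tree_map` and `tree_leaves` are hypothetical names written for illustration. This is not tree-math's implementation or API, and real pytrees usually hold arrays rather than plain scalars.

```python
def tree_map(fn, *trees):
    """Apply fn leaf-by-leaf across trees with identical dict/list/tuple structure."""
    first = trees[0]
    if isinstance(first, dict):
        return {key: tree_map(fn, *(t[key] for t in trees)) for key in sorted(first)}
    if isinstance(first, list):
        return [tree_map(fn, *children) for children in zip(*trees)]
    if isinstance(first, tuple):
        return tuple(tree_map(fn, *children) for children in zip(*trees))
    return fn(*trees)  # a leaf: here, a plain number

def tree_leaves(tree):
    """Flatten a tree into a list of leaves in a deterministic (sorted-key) order."""
    if isinstance(tree, dict):
        return [leaf for key in sorted(tree) for leaf in tree_leaves(tree[key])]
    if isinstance(tree, (list, tuple)):
        return [leaf for child in tree for leaf in tree_leaves(child)]
    return [tree]

class TreeVector:
    """Hypothetical stand-in for a pytree vector wrapper: arithmetic on a nested tree."""

    def __init__(self, tree):
        self.tree = tree

    def __add__(self, other):
        return TreeVector(tree_map(lambda a, b: a + b, self.tree, other.tree))

    def __sub__(self, other):
        return TreeVector(tree_map(lambda a, b: a - b, self.tree, other.tree))

    def __mul__(self, scalar):
        return TreeVector(tree_map(lambda a: a * scalar, self.tree))

    __rmul__ = __mul__  # lets `0.5 * vec` work as well as `vec * 0.5`

    def __matmul__(self, other):
        # Dot product: multiply matching leaves and sum over the whole tree.
        return sum(a * b for a, b in zip(tree_leaves(self.tree), tree_leaves(other.tree)))

params = TreeVector({"w": [1.0, 2.0], "b": 0.5})
grads = TreeVector({"w": [0.1, 0.2], "b": 1.0})
updated = params - 0.5 * grads  # one gradient-descent step, written like vector math
print(updated.tree)
print(params @ grads)
```

The design point is that an optimizer written against `+`, `-`, scalar `*` and `@` never needs to know how the parameters are nested.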

Bits Of Deep Learning Podcast Episode #1 with Eric Jang
A conversation with Eric Jang on the Present and Future of Robotics...a) Introduction, b) Eric's Background, c) The “Just Ask For Generalization” recipe, d) Meta-Learning vs Supervised Learning in Robotics, e) The compute concern around GPT-like models, f) How to scale to real-world robots? The importance of simulators, g) End-to-end in robotics: pros and cons, h) Who will solve robotics, and how?, i) Self-supervised learning: what is it and why is it useful?, j) Biggest obstacles in robotics?, k) General-purpose robots and Tesla Bot, and l) Conclusion...

AI Accelerators — Part I: Intro
I will give a high-level overview of accelerators for artificial intelligence applications: what they are and how they became so popular. As discussed in later posts, accelerators stem from a broader concept rather than from a particular type of system or implementation. Nor are they purely hardware-driven; in fact, much of the AI accelerator industry’s focus has been on building robust and sophisticated software libraries and compiler toolchains...

The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning
The paper concerns convergence and asymptotic statistics for stochastic approximation driven by Markovian noise...

Generative Modeling by Estimating Gradients of the Data Distribution
This blog post focuses on a promising new direction for generative modeling. We can learn score functions (gradients of log probability density functions) on a large number of noise-perturbed data distributions, then generate samples with Langevin-type sampling. The resulting generative models, often called score-based generative models, have several important advantages over existing model families: GAN-level sample quality without adversarial training, flexible model architectures, exact log-likelihood computation, and inverse problem solving without re-training models. In this blog post, we will show you in more detail the intuition, basic concepts, and potential applications of score-based generative models...
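The sampling half of this recipe fits in a few lines. In the sketch below, a one-dimensional Gaussian target with a closed-form score stands in for the learned score network the post describes. The sketch uses plain Langevin dynamics rather than an annealed schedule over noise levels. The target N(2, 0.5²), the step size, and the step count are illustrative assumptions.

```python
import math
import random
import statistics

MU, SIGMA = 2.0, 0.5  # assumed target distribution: N(MU, SIGMA^2)

def score(x):
    """Exact score of N(MU, SIGMA^2): d/dx log p(x) = -(x - MU) / SIGMA^2.

    A score-based model would replace this with a network trained on noisy data.
    """
    return -(x - MU) / SIGMA ** 2

def langevin_sample(score_fn, x0, rng, step_size=1e-2, n_steps=500):
    """Unadjusted Langevin dynamics: x <- x + eps * score(x) + sqrt(2 * eps) * z."""
    x = x0
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        x = x + step_size * score_fn(x) + math.sqrt(2 * step_size) * z
    return x

rng = random.Random(0)
# Start each chain from a deliberately poor guess; the score pulls it toward the data.
samples = [langevin_sample(score, rng.uniform(-5.0, 5.0), rng) for _ in range(1000)]
print(statistics.mean(samples), statistics.pstdev(samples))  # should land near 2.0 and 0.5
```

The sampler only ever queries the gradient of the log density, never the density itself. This is why learning the score is enough to generate samples.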

Learning Bayesian Statistics Podcast Episode #51: Election forecasting models in Germany, with Marcus Gross
Our German friends held federal elections a few weeks ago, and consequential ones, since they had the hard task of replacing Angela Merkel after 16 years in power...To talk about this election, I invited Marcus Gross on the show, because he worked on a Bayesian forecasting model to try to predict the results of this election: who will be elected Chancellor, by how much, and with which coalition?...

Engineering Trade-Offs in Automatic Differentiation: from TensorFlow and PyTorch to Jax and Julia
To understand the differences between automatic differentiation libraries, let's talk about the engineering trade-offs that were made. I would personally say that none of these libraries is "better" than another; they all simply make engineering trade-offs based on the domains and use cases they were aiming to satisfy. The easiest way to describe these trade-offs is to follow the evolution and see how each new library tweaked the trade-offs made by the previous one...
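For readers new to the topic, here is a minimal sketch of reverse-mode AD on a graph recorded while the program runs. This is the "define-by-run" style commonly associated with PyTorch. It illustrates the general technique only and supports just `+`, `*` and `sin`. It is not how any of the libraries discussed are actually implemented, and the class name `Var` is hypothetical.

```python
import math

class Var:
    """A scalar that records how it was computed, so gradients can flow back."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents  # pairs of (input Var, local partial derivative)

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def sin(self):
        return Var(math.sin(self.value), ((self, math.cos(self.value)),))

    def backward(self):
        """Reverse sweep: visit nodes in reverse topological order, applying the chain rule."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad

x, y = Var(2.0), Var(3.0)
f = x * y + x.sin()  # the graph is recorded while this line runs
f.backward()
print(x.grad, y.grad)  # df/dx = y + cos(x), df/dy = x
```

Because the graph is rebuilt on every run, ordinary Python control flow works unchanged. The cost is bookkeeping on every single operation.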

CPPE-5: Medical Personal Protective Equipment Dataset
We present a new challenging dataset, CPPE-5 (Medical Personal Protective Equipment), with the goal of enabling the study of subordinate categorization of medical personal protective equipment, which is not possible with other popular datasets that focus on broad-level categories (such as PASCAL VOC, ImageNet, Microsoft COCO, OpenImages, etc.). To make it easy for models trained on this dataset to be used in practical scenarios in complex scenes, our dataset mainly contains images that show complex scenes with several objects in each scene in their natural context...

Tools

Free Course: Natural Language Processing (NLP) for Semantic Search
Learn how to build semantic search applications by making machines understand language as people do. This free course covers everything you need to build state-of-the-art language models, from machine translation to question answering, and more. Brought to you by Pinecone. Start reading now. *Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Data Scientist, Decisions - Lyft - New York, NY
Data Science is at the heart of Lyft’s products and decision-making. As a member of the Science team, you will work in a dynamic environment, where we embrace moving quickly to build the world’s best transportation. Data Scientists take on a variety of problems ranging from shaping critical business decisions to building algorithms that power our internal and external products. We’re looking for passionate, driven Data Scientists to take on some of the most interesting and impactful problems in ridesharing...

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

How Graph Neural Networks (GNN) work: introduction to graph convolutions from scratch
In this tutorial, we will explore graph neural networks and graph convolutions. Graphs are an extremely general representation of data with intrinsic structure. I will clarify some concepts that tend to be fuzzy for beginners in this field...The most intuitive transition to graphs is by starting from images...
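As a concrete reference point, here is one graph-convolution layer in the widely used GCN form of Kipf and Welling, written with plain Python lists. It is a sketch under that assumption only; the tutorial may build up its layers or normalization differently. The toy graph, features and weights are made up for illustration.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Self-loops let each node keep part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization so high-degree nodes don't dominate the sum.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbour features, then apply the shared linear transform W.
    out = matmul(matmul(norm, features), weights)
    return [[max(0.0, v) for v in row] for row in out]

# Path graph 0 - 1 - 2; only node 0 starts with a nonzero feature.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0], [0.0], [0.0]]
weights = [[1.0]]
print(gcn_layer(adj, features, weights))
```

After one layer, node 2 still reads 0.0 because information moves only one hop per layer. Stacking layers widens the receptive field, much as stacked convolutions do on images.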

40 Open-Source Audio Datasets for ML
We have datasets in seven(!) different languages, from various domains and sources...

Good beginner exercise for improving programming skills: Monte Carlo simulation to approximate the number pi (π)
The beauty of Monte Carlo sampling is that we can use it to solve mathematical problems that are hard to solve in other ways...
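A minimal Python version of the exercise follows. It assumes the usual setup: sample points uniformly in the unit square and count how many fall inside the quarter circle of radius 1. The linked post may set the problem up differently.

```python
import math
import random

def estimate_pi(n_samples, seed=None):
    """Estimate pi from the fraction of random points in the unit square under the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Quarter-circle area / square area = (pi / 4) / 1, so scale the hit rate by 4.
    return 4.0 * inside / n_samples

for n in (1_000, 100_000):
    estimate = estimate_pi(n, seed=42)
    print(n, estimate, abs(estimate - math.pi))
```

The error shrinks roughly like 1/sqrt(n). Each extra correct digit therefore costs about 100 times more samples, which makes this a good first lesson in why Monte Carlo is easy to write but slow to converge.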

Books

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page...

P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them on board :) All the best, Hannah & Sebastian
