Data Science Weekly Newsletter

Issue 415

November 4, 2021

Editor's Picks

Spectrograms or: How I Learned to Stop Worrying and Love Audio Signal Processing for Machine Learning
If you’ve ever decided to take on an A.I. sound project...you will soon realize you've taken on two projects...The first is the audio signal processing required to convert the original signal into spectrograms. (Spectrograms are images of time-frequency domain features that were extracted from wave signals.) Once you have those, you can move forward with a straightforward image classification deep learning project using those spectrograms...and the purpose of this article is to walk you through the process...
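
For readers who have not met spectrograms before, here is a minimal sketch of the idea in plain Python. It is not the article's code: the function name `spectrogram`, the frame size, the hop size, and the test tone are all assumptions chosen for illustration, and the naive DFT stands in for the FFT routines a real project would use.

```python
import cmath
import math


def spectrogram(signal, frame_size=64, hop=32):
    """Return one list of magnitudes (bins 0..frame_size//2) per frame."""
    # A Hann window tapers each frame's edges to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_size)]
        # Naive O(N^2) DFT, kept only for readability (illustrative choice).
        mags = []
        for k in range(frame_size // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Hypothetical test signal: a 1 kHz sine sampled at 8 kHz, 256 samples long.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]

spec = spectrogram(tone)
# The strongest bin in the first frame, converted back to a frequency in Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each row of `spec` is one time slice and each column one frequency bin, which is why the result can be treated as an image and handed to an image classifier.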

True Stories of Algorithmic Improvement
In May 2020, OpenAI released a report on algorithmic efficiency improvements in deep learning. Main headline: Compared to 2012, it now takes 44 times less compute to train a neural network to the level of AlexNet (by contrast, Moore’s Law would yield an 11x cost improvement over this period)...I’ve had various experiences over the years which made the result not-that-surprising. Algorithms beating compute is the sort of thing I expect by default, on a gut level. The point of this post is to tell a few of the stories which underlie that intuition, aimed especially toward people who don’t have much first-hand experience with software engineering, ML, or simulation...

Yes but what is a Gaussian process? or, Once, twice, three times a definition; or A descent into madness
I guess I’m going to talk about Gaussian processes now...I am going to define this stuff three times. Once for mum, once for dad, and once for the country...You’ve got to wonder why anyone would introduce something three ways. There are some reasons. The first is, of course, that each definition gives you a different insight into different aspects of Gaussian processes (the operational, the boundless generality, the functional). And the second is because I’ve had to use all three of these ideas (and several more) over the years in order to understand how Gaussian processes work...

A Message From This Week's Sponsor

Quit writing SQL. Find answers faster.
Tired of building dashboards and writing SQL queries for colleagues? PostHog enables teams to get answers by themselves quickly and easily, without needing to write any code. And it can be deployed on your own infrastructure, which is nice. PostHog offers everything product-led teams need to grow, including funnel analysis, session recordings and feature flags, all in one platform, all without SQL. Deploy PostHog today for free.

Data Science Articles & Videos

58 Ways to Visualize Alice in Wonderland
How many ways are there to visualize a book?...Ever so curious, I decided to find out. To come up with some kind of method to search broadly, I picked one book, Lewis Carroll’s Alice’s Adventures in Wonderland, and decided to find all the possible visualizations that might pop up on Google/Bing text search, image search, scholar search. I found more than 40!...

AI and Financial Services Industry Brief
The spike in demand since the onset of COVID-19 for digital services from grocery delivery to banking has catalyzed a reimagining of the digital infrastructure we will need to power our post-pandemic world. In this brief you will learn where researchers lead the charge in identifying opportunities and pitfalls in deploying AI in financial services. Their work sheds light on how integrating AI can make financial services and their delivery more inclusive, accessible, and effective...

The Opportunity within AMTRAK: America’s Railroad
Using interactive Tableau maps to locate Amtrak’s future customers...

Towards a Theory of Justice for Artificial Intelligence
This paper explores the relationship between artificial intelligence and principles of distributive justice. Drawing upon the political philosophy of John Rawls, it holds that the basic structure of society should be understood as a composite of socio-technical systems, and that the operation of these systems is increasingly shaped and influenced by AI. As a consequence, egalitarian norms of justice apply to the technology when it is deployed in these contexts. These norms entail that the relevant AI systems must meet a certain standard of public justification, support citizens' rights, and promote substantively fair outcomes -- something that requires specific attention be paid to the impact they have on the worst-off members of society...

From Machine Learning to Robotics: Challenges and Opportunities for Embodied Intelligence
Contrary to viewing embodied intelligence as another application domain for machine learning, here we argue that it is in fact a key driver for the advancement of machine learning technology. In this article our goal is to highlight challenges and opportunities that are specific to embodied intelligence and to propose research directions which may significantly advance the state-of-the-art in robot learning...

Explaining Machine Learning Models: A Non-Technical Guide to Interpreting SHAP Analyses
With interpretability becoming an increasingly important requirement for machine learning projects, there's a growing need to communicate the complex outputs of model interpretation techniques to non-technical stakeholders. SHAP (SHapley Additive exPlanations) is arguably the most powerful method for explaining how machine learning models make predictions, but the results from SHAP analyses can be non-intuitive to those unfamiliar with the approach...For data scientists, this guide outlines a structured approach for presenting the results from a SHAP analysis, and how to explain the recommended plots to an audience unfamiliar with SHAP...

Canonical Capsules: Self-Supervised Capsules in Canonical Pose
We propose an unsupervised capsule architecture for 3D point clouds. We compute capsule decompositions of objects through permutation-equivariant attention, and self-supervise the process by training with pairs of randomly rotated objects. Our key idea is to aggregate the attention masks into semantic keypoints, and use these to supervise a decomposition that satisfies the capsule invariance/equivariance properties...

Dense Vectors: Capturing Meaning with Code
There is perhaps no greater contributor to the success of modern Natural Language Processing (NLP) technology than vector representations of language. The meteoric rise of NLP was ignited with the introduction of word2vec in 2013...Word2vec is one of the most iconic and earliest examples of dense vectors representing text. But since the days of word2vec, developments in representing language have advanced at ludicrous speeds...This article will explore why we use dense vectors — and some of the best approaches to building dense vectors available today...

Rliable: Better Evaluation for Reinforcement Learning - A Visual Explanation
It is critical for Reinforcement Learning (RL) practitioners to properly evaluate and compare results. Reporting results with poor comparison leads to a progress mirage and may underestimate the stochasticity of the results. To this end, Deep RL at the Edge of the Statistical Precipice (NeurIPS Oral) provides recommendations for a more rigorous evaluation of Deep RL algorithms. The paper comes with an open-source library named rliable...This blog post is meant to be a visual explanation of the tools used by the rliable library to better evaluate and compare RL algorithms...

Deep Learning Optimization Theory — Introduction
Understanding the theory of optimization in deep learning is crucial to enable progress. This post introduces the experimental and theoretical approaches to studying it...

Tools

Retool is the fast way to build an interface for any database
With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data, whether it's a simple read-write visualization or a full-fledged ML workflow. Drag and drop UI components, like tables and charts, to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result: less time on repetitive work and more time to discover insights.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Entry Level Data Scientist: 2022 - IBM - Multiple Locations
As a Data Scientist at IBM, you will help transform our clients’ data into tangible business value by analyzing information, communicating outcomes and collaborating on product development. Work with best-in-class open source and visual tools, along with the most flexible and scalable deployment options. Whether it’s investigating patient trends or weather patterns, you will work to solve real-world problems for the industries transforming how we live.

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

Machine Learning with JAX - From Zero to Hero | Tutorial #1
With this video I'm kicking off a series of tutorials on JAX!...JAX is a powerful and increasingly popular ML library built by the Google Research team. The 2 most popular deep learning frameworks built on top of JAX are Haiku (DeepMind) and Flax (Google Research)...In this video I cover the basics as well as the nitty-gritty details of jit, grad, vmap, and various other idiosyncrasies of JAX...

Survival Analysis in Python
This tutorial is an introduction to survival analysis using computation rather than math...I presented this tutorial for PyData Global 2021...Here are the slides I presented...And here is the video...
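
To give a flavor of what "computation rather than math" can look like, here is a small sketch of the Kaplan-Meier estimator, one standard survival analysis technique. This is an assumption about the kind of method involved, not a reproduction of the tutorial's material; the function name `kaplan_meier` and the toy data are hypothetical.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each time with an observed event.

    durations: how long each subject was followed.
    observed:  True if the event happened at that time, False if censored.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        # Subjects whose event was actually seen at time t.
        events = sum(1 for d, o in zip(durations, observed) if d == t and o)
        # Everyone who leaves the risk set at time t (events plus censored).
        leaving = sum(1 for d in durations if d == t)
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical toy data: five subjects, two of them censored.
durations = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
```

Censored subjects never trigger a drop in the curve, but they still count toward the number at risk until they leave, which is the core idea the estimator captures.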

Teaching Data Visualization with Mini-Projects
There’s a special assignment I have designed for my visualization course that I have been using successfully for a number of years. I call it “mini-projects”. A mini-project is a project students can develop over 1 or 2 weeks. It’s big enough to simulate nontrivial decisions designers have to make in real projects, and it’s small enough to be carried out in a small amount of time. That is, it’s efficient and effective...

Books

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page...

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
