A Deep Dive into NLP Tokenization and Encoding with Word and Sentence Embeddings
This is a deep dive: over 8,000 words long. Don’t be afraid to bookmark this article and read it in pieces. There is a lot to cover. We will start with basic One-Hot encoding, move on to word2vec word and sentence embeddings, build our own custom embeddings using R, and finally, work with the cutting-edge BERT model and its contextual embeddings...
Making Deep Learning Go Brrrr From First Principles
So, you want to improve the performance of your deep learning model. How might you approach such a task? Often, folk fall back to a grab-bag of tricks that might've worked before or saw on a tweet. "Use in-place operations! Set gradients to None! Install PyTorch 1.10.0 but not 1.10.1!"...It's understandable why users often take such an ad-hoc approach - performance on modern systems (particularly deep learning) often feels as much like alchemy as it does science. That being said, reasoning from first principles can still eliminate broad swathes of approaches, thus making the problem much more approachable...So, if you want to keep your GPUs going brrrr, let's discuss the three components your system might be spending time on - compute, memory bandwidth, and overhead...
Announcing the 2022 AI Index Report
The AI Index is an independent initiative at the Stanford Institute for Human-Centered Artificial Intelligence (HAI), led by the AI Index Steering Committee, an interdisciplinary group of experts from across academia and industry. The annual report tracks, collates, distills, and visualizes data relating to artificial intelligence, enabling decision-makers to take meaningful action to advance AI responsibly and ethically with humans in mind...
A Message From This Week's Sponsor
Retool is the fast way to build an interface for any database
With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.
Data Science Articles & Videos
The “0 / 1 / Done” Strategy for Data Science
To achieve operational excellence in applied data science delivery, aim for: 0-day Handovers, 1-day Prototyping, and declaring projects Done only when Completely Done. Work backwards from these goals, conducting a gap analysis between those and current team capabilities, to identify process, tooling and governance initiatives to reach them...
What Good Data Product Managers Do – And Why You Probably Need One
We are seeing data departments modernize their team structure with data product managers at the helm of such projects...In this article, we’ll walk through: a) What is a data product manager? How did the role evolve?, b) What is a data product?, c) What does a data product manager do? What skills do they need?, d) What background do product managers need? Who do they report to?, e) Data product manager vs product manager, f) Data product manager vs data scientist, and g) The future of the data product manager...
How to Build Effective (and Useful) Dashboards
With practice I have been developing a four-step approach (that I am still fine-tuning) to build dashboards that are, first of all, effective, but also useful...In this article I want to take you through these four steps. Whether you are an experienced analyst building data visualizations all day long or a business user using dashboards from time to time, I hope you will find these guidelines useful...
Future ML Systems Will Be Qualitatively Different
In 1972, the Nobel prize-winning physicist Philip Anderson wrote the essay "More Is Different". In it, he argues that quantitative changes can lead to qualitatively different and unexpected phenomena...In this post, I'll argue that emergence often occurs in the field of AI, and that this should significantly affect our intuitions about the long-term development and deployment of AI systems. We should expect weird and surprising phenomena to emerge as we scale up systems. This presents opportunities, but also poses important risks...
Building systems to securely reason over private data
People today rely on AI systems such as assistants and chatbots to help with countless tasks...For systems to execute these tasks, users must provide them with relevant information — such as one’s location or work calendar. In some cases, however, people would prefer to keep information private, which means not uploading it to cloud-based AI systems or sharing it with others...Meta AI is releasing ConcurrentQA, the first public data set for studying information retrieval and question answering (QA) with data from multiple privacy scopes. Alongside the data set and problem exploration, we have developed a new methodology as a starting point for thinking about privacy in retrieval-based settings called Public-Private Autoregressive Information Retrieval (PAIR)...
Generative Flow Networks
I [Yoshua Bengio] have rarely been as enthusiastic about a new research direction. We call them GFlowNets, for Generative Flow Networks. They live somewhere at the intersection of reinforcement learning, deep generative models and energy-based probabilistic modelling...What I find exciting is that they open so many doors, but in particular for implementing the system 2 inductive biases I have been discussing in many of my papers and talks since 2017, that I argue are important to incorporate causality and deal with out-of-distribution generalization in a rational way...
The promise of AI with Demis Hassabis
Last episode of Season 2 of DeepMind: The Podcast...Hannah wraps up the series by meeting DeepMind co-founder and CEO, Demis Hassabis. In an extended interview, Demis describes why he believes AGI is possible, how we can get there, and the problems he hopes it will solve. Along the way, he highlights the important role of consciousness and why he’s so optimistic that AI can help solve many of the world’s major challenges. As a final note, Demis shares the story of a personal meeting with Stephen Hawking to discuss the future of AI and discloses Hawking’s parting message...
Winning at Competitive ML in 2022
Hoping to win a machine learning competition in 2022? Here’s what you need to know. I collaborated with ML Contests, using their database of over 80 competitions that took place in 2021 across Kaggle, DrivenData, AICrowd, Zindi, and 13 other platforms. Wherever the information was available, we categorized winners to figure out what made them win...
You're invited to the first-ever Metrics Store Summit
The first-ever Metrics Store Summit, an industry summit on the metrics layer, takes place on April 26, 2022. It will bring discussions around the semantic layer into one event, providing context with use cases for metrics stores, highlighting applications for metrics, and sharing ideas from leaders across the modern data stack. You can expect to hear from Airbnb, Slack, Spotify, Atlan, Hex, Mode, Hightouch, AtScale and many more in this action-packed 1-day event. We would love to see you there!
Register today for free.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Training & Resources
Efficient Transformers: A Survey
Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning...Recently, a dizzying number of "X-former" models have been proposed - Reformer, Linformer, Performer, Longformer, to name a few - which improve upon the original Transformer architecture, many of which make improvements around computational and memory efficiency. With the aim of helping the avid researcher navigate this flurry, this paper characterizes a large and thoughtful selection of recent efficiency-flavored "X-former" models, providing an organized and comprehensive overview of existing work and models across multiple domains...
A GFlowNet is a trained stochastic policy or generative model, trained such that it samples objects $x$ through a sequence of constructive steps, with probability proportional to $R(x)$, where $R$ is some given non-negative integrable reward function...
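To make "probability proportional to $R(x)$" concrete, here is a minimal Python sketch. It is not a GFlowNet implementation: a real GFlowNet learns its step-by-step policy with a neural network, whereas this toy computes the exact flows by brute-force enumeration. The object space (bit strings of length 3), the reward function, and all function names are hypothetical choices made for illustration only.

```python
from itertools import product

# Hypothetical toy setup: objects are bit strings built one bit at a time.
LENGTH = 3


def reward(x: str) -> float:
    """Illustrative non-negative reward: 1 plus the number of '1' bits."""
    return 1.0 + x.count("1")


def flow(prefix: str) -> float:
    """Total reward of all complete strings reachable from this prefix."""
    if len(prefix) == LENGTH:
        return reward(prefix)
    return flow(prefix + "0") + flow(prefix + "1")


def step_prob(prefix: str, bit: str) -> float:
    """Probability of appending `bit`: the child's flow over the parent's flow."""
    return flow(prefix + bit) / flow(prefix)


def sampling_prob(x: str) -> float:
    """Probability that the sequence of constructive steps ends at x."""
    p = 1.0
    for i, bit in enumerate(x):
        p *= step_prob(x[:i], bit)
    return p


# Z is the total reward over all objects (the normalizing constant).
Z = flow("")
objects = ["".join(bits) for bits in product("01", repeat=LENGTH)]
probs = {x: sampling_prob(x) for x in objects}

for x in objects:
    print(x, round(probs[x], 3), round(reward(x) / Z, 3))
```

Each printed row shows an object, the probability of reaching it through the constructive steps, and $R(x)/Z$; the two columns agree because the per-step ratios telescope to $R(x)/Z$. In a trained GFlowNet the learned policy would only approximate these ratios.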
The Bayesian Learning Rule
We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide-range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout...
Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them on board :) All the best, Hannah & Sebastian