Data Science Weekly Newsletter

Issue 426

January 20, 2022

Editor's Picks

These Bored Apes Do Not Exist
What’s worse than 10,000 Bored Ape Yacht Club NFTs? 100,000. In this blog post, we’ll attempt to train a GAN + super-resolution model to produce 100,000 Bored Ape Yacht Club ape images that do not actually exist...

Prospective Learning: Back to the Future
Research on both natural intelligence (NI) and artificial intelligence (AI) generally assumes that the future resembles the past: intelligent agents or systems (what we call 'intelligence') observe and act on the world, then use this experience to act on future experiences of the same kind. We call this 'retrospective learning'...We argue that this is not what true intelligence is about...In many real world problems, both NIs and AIs will have to learn for an uncertain future. Both must update their internal models to be useful for future tasks...and using these objects effectively in a new context or to achieve previously unencountered goals. This ability to learn for the future we call 'prospective learning'. We articulate four relevant factors that jointly define prospective learning....

AI in health and medicine - Nature Review
AI is poised to broadly reshape medicine, potentially improving the experiences of both clinicians and patients. We discuss key findings from a 2-year weekly effort to track and share key developments in medical AI. We cover prospective studies and advances in medical image analysis, which have reduced the gap between research and deployment. We also address several promising avenues for novel medical AI research, including non-image data sources, unconventional problem formulations and human–AI collaboration. Finally, we consider serious technical and ethical challenges in issues spanning from data scarcity to racial bias...

A Message From This Week's Sponsor

Retool is the fast way to build an interface for any database
With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data, whether it's a simple read-write visualization or a full-fledged ML workflow. Drag and drop UI components, like tables and charts, to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result: less time on repetitive work and more time to discover insights.

Data Science Articles & Videos

How vectorization speeds up your Python code
Python is not the fastest programming language. So when you need to process a large amount of homogeneous data quickly, you’re told to rely on “vectorization.”...This leads to more questions: a) What does “vectorization” actually mean? b) When does it apply? c) How does vectorization actually make code faster?...To answer these questions, we’ll consider interesting performance metrics, learn some useful facts about how CPUs work, and discover that NumPy developers are working hard to make your code faster...

TwitterEng Data Science Team ideas for our annual learning stipends
The Data Science team @TwitterEng crowdsourced ideas for our annual learning stipends, and I thought I would share more broadly for folks looking to level up their #causalinference #experimentation skills. Just in time for #TidyTuesday #RStats and New Year's resolution season 🤓🧵...

My Journey Building PyMC Labs: Five Principles from Open Source that Boost Innovation at any Company
I've always wondered why open-source software (OSS) was so much better and more innovative than what companies with armies of highly paid and trained programmers could produce. And if we figured out why, could we apply these same principles to make companies more innovative and fun places to work at?...

Casual Robotics with Kevin Zakka - a new podcast about Robots x AI
A show where people at the intersection of Robotics and Artificial Intelligence can have relaxed conversations about the field, its history, and its exciting future directions...Episode #1, Progress Towards General Purpose Robots, features Eric Jang (Research Scientist on the Robotics team at Google) and Pete Florence (Research Scientist at Google Brain, MIT PhD in Robotics)...

Learning-theoretic Perspectives on Model Predictive Control (MPC) via Competitive Control
Since the 1980s, Model Predictive Control (MPC) has been one of the most influential and popular process control methods in industry. The key idea of MPC is straightforward: with a finite look-ahead window of the future, MPC solves a finite-time optimal control problem at each time step, but only executes the control for the current time slot and then optimizes again at the next time step, repeatedly...In this blog, we will discuss some recent results which took the first step in understanding MPC from learning-theoretic perspectives. More specifically, we will show that MPC is a competitive online learner and it enjoys near-optimal dynamic regret guarantees...
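
The plan, execute one step, replan loop described above can be sketched in a few lines. This is only a toy illustration, not the method from the linked post: it assumes a scalar linear system x_next = a*x + b*u with a quadratic cost, no constraints and no disturbances, and the names `plan_first_control` and `run_mpc` as well as all parameter values are hypothetical.

```python
def plan_first_control(x, a, b, q, r, horizon):
    """Solve the finite-horizon problem from state x; return only the first control.

    Assumed cost: sum of q*x^2 + r*u^2 over the window, plus terminal cost q*x^2.
    """
    p = q        # cost-to-go weight at the end of the look-ahead window
    gain = 0.0
    # Backward Riccati recursion; the last gain computed belongs to the first step.
    for _ in range(horizon):
        gain = (a * b * p) / (r + b * b * p)
        p = q + a * a * p - a * b * p * gain
    return -gain * x


def run_mpc(x0, steps, a=1.2, b=1.0, q=1.0, r=0.1, horizon=5):
    """Receding-horizon loop: plan over the window, apply one control, replan."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        u = plan_first_control(x, a, b, q, r, horizon)  # optimize over the window
        x = a * x + b * u                               # execute only the first control
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    print(run_mpc(5.0, 10))
```

With a = 1.2 the uncontrolled system is unstable, so the shrinking trajectory comes entirely from the replanned controls. In this disturbance-free toy every replan yields the same feedback gain; replanning starts to matter once the dynamics, costs or disturbances change over time, which is the setting the post analyzes.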

How to master Streamlit for data science
To build a web app you’d typically use such Python web frameworks as Django and Flask. But the steep learning curve and the big time investment for implementing these apps present a major hurdle...Streamlit makes the app creation process as simple as writing Python scripts!...In this article, you’ll learn how to master Streamlit when getting started with data science...

When less is more: Simplifying inputs aids neural network understanding
How do neural network image classifiers respond to simpler and simpler inputs? And what do such responses reveal about the learning process? To answer these questions, we need a clear measure of input simplicity (or inversely, complexity), an optimization objective that correlates with simplification, and a framework to incorporate such an objective into training and inference. Lastly, we need a variety of testbeds to experiment and evaluate the impact of such simplification on learning. In this work, we measure simplicity with the encoding bit size given by a pretrained generative model, and minimize the bit size to simplify inputs in training and inference...

Next-generation seaborn interface
Over the past 8 months, I have been developing an entirely new interface for making plots with seaborn. This page demonstrates some of its functionality...

Write Better Scientific Papers by Understanding Your Readers' Thoughts
In this article, I will share a few methods to structure your writing on three levels. We start with thinking about your key messages and building the story. On the next level we use paragraphs to convey your main arguments. Your sentences, finally, ensure that your reader thinks about the right things at the right time...As a result, your readers will understand your story more easily and quickly. And you will have a better, less painful experience bringing your messages onto paper and out into the world...

A friendly introduction to linear algebra for ML (ML Tech Talks) [Video]
In this session of Machine Learning Tech Talks, Tai-Danae Bradley, Postdoc at X, the Moonshot Factory, will share a few ideas for linear algebra that appear in the context of Machine Learning. Chapters: a) Introduction, b) Data Representations, c) Vector Embeddings, d) Dimensionality Reduction, and e) Conclusion...

Webinar

Live Webinar | How to Align AI & BI to Business Outcomes
Wednesday, Jan 26 at 2PM ET (11AM PT)
Get practical advice from Global 1000 data leaders at Visa, Cigna, Amazon, and HCL Technologies on how they are aligning AI & BI toward business outcomes at their organizations.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

(Senior) Analytics Engineer - Fabulous - Remote
Fabulous is a mobile app helping thousands of people every day to change their lifestyles by integrating healthy habits into their lives. Fabulous is using a behavioral economics lens to help everyone achieve their fullest potential. We work closely with researchers based at Duke University and our advisor is Dan Ariely, author of NYT bestseller Predictably Irrational. We are looking for an experienced Analytics Engineer to consolidate the Data Science team and lead the development and enrichment of our Data Pipelines. We have a modern Data-Stack based on Fivetran, dbt, BigQuery, Amplitude, Metabase...

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

A probabilistic programming language in 70 lines of Python
In this post I will explain how Probabilistic Programming Languages (PPLs) work by showing step-by-step how to build a simple one in Python...

What Is Causal Inference? An Introduction for Data Scientists
We have heuristics around when causality may not exist, such as “correlation doesn’t imply causation” and “past performance is no indication of future returns,” but pinning down causal effects rigorously is challenging. It’s not an accident that most heuristics about causality are negative—it’s easier to disprove causality than to prove it. As data science, statistics, machine learning, and AI increase their impact on business, it’s all the more important to re-evaluate techniques for establishing causality...

Overview of Supervised Machine Learning Algorithms
There are so many machine learning algorithms out there, and we can find different kinds of overviews and cheat sheets. Why another overview? I tried to build this different overview with these three main focuses in mind: a) a complete hierarchical relationship for all algorithms, b) the apparent relationships between them, and c) a structure to better answer usual ML questions — for example, the effect of numerical scaling, the feature importance, linear model vs non-linear model...

Books

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page...

P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :)

All the best,
Hannah & Sebastian
