Data Science Weekly Newsletter

Issue 432

March 3, 2022

Editor's Picks

The Weird and Wonderful World of AI Art
The reality right now is really, really crazy...I don’t think the majority of AI researchers would even have suspected that these images could be created with current tools. The rapidity of the past year’s developments has surprised even some of the most bullish technologists...The AI art we had *before* 2021 was intriguing, but tended to be abstract, esoteric, and just not that relatable to a human. The AI art we have *now* is fully controllable, and can be about whatever you want it to be...What changed? Well, there’s something to be said for the new wave of publicity and interest, which certainly accelerated the pace of our art-generation techniques. But the main development is the rise of **multimodal learning**....

Conversational Agents: Theory and Applications
We provide a review of conversational agents (CAs), discussing chatbots, intended for casual conversation with a user, as well as task-oriented agents that generally engage in discussions intended to reach one or several specific goals, often (but not always) within a specific domain. We also consider the concept of embodied conversational agents, briefly reviewing aspects such as character animation and speech processing....A brief historical overview is given, followed by an extensive overview of various applications, especially in the fields of health and education. We end the chapter by discussing benefits and potential risks regarding the societal impact of current and future CA technology. ...

CX-ToM: Counterfactual Explanations with Theory-of-Mind for Enhancing Human Trust in Image Recognition Models
We propose CX-ToM, short for counterfactual explanations with theory-of-mind, a new explainable AI (XAI) framework for explaining decisions made by a deep convolutional neural network (CNN). In contrast to current XAI methods, which generate explanations as a single-shot response, we pose explanation as an iterative communication process, i.e. a dialog, between the machine and the human user. More concretely, our CX-ToM framework generates a sequence of explanations in a dialog by mediating the differences between the minds of the machine and the human user. To do this, we use Theory of Mind (ToM), which helps us explicitly model the human's intention, the machine's mind as inferred by the human, and the human's mind as inferred by the machine...

A Message From This Week's Sponsor

Free Course: Natural Language Processing (NLP) for Semantic Search
Learn how to build semantic search applications by making machines understand language as people do. This free course covers everything you need to build state-of-the-art language models, from machine translation to question-answering, and more. Brought to you by Pinecone. Start reading now.

Data Science Articles & Videos

Why it's best to keep software and data analysis repositories separate
There are many best practices behind developing R packages, but one that wasn’t very clear to me at first when I started writing my own software was: Software and data analysis repositories are not the same and should be kept in separate places...The problem?...Let me give you an example of something I’ve seen lately...

Data Observability vs. Data Testing: Everything You Need to Know
You already test your data. Do you need data observability, too?...In the article, we highlight when it makes sense to test your data and when it makes sense to rely on observability to catch bugs in your data, as well as four ways data observability differs from data testing...While both observability and testing help you achieve reliable and trustworthy data, each method solves for high-quality data differently...
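Data testing, in the sense used above, means writing explicit assertions for conditions you already expect to hold. A minimal sketch, assuming a hypothetical list of order records with `order_id` and `amount` fields (names invented for illustration, not taken from the article):

```python
# Hypothetical order records; field names are illustrative only.
orders = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": 5.00},
    {"order_id": 3, "amount": None},
    {"order_id": 3, "amount": -4.50},
]


def run_data_tests(rows):
    """Return a list of failure messages for a few explicit data tests."""
    failures = []
    # Test 1: a required field must not be null.
    nulls = [r["order_id"] for r in rows if r["amount"] is None]
    if nulls:
        failures.append(f"null amount for order_id(s) {nulls}")
    # Test 2: the primary key must be unique.
    ids = [r["order_id"] for r in rows]
    dupes = sorted({i for i in ids if ids.count(i) > 1})
    if dupes:
        failures.append(f"duplicate order_id(s) {dupes}")
    # Test 3: amounts must be non-negative.
    negative = [r["order_id"] for r in rows
                if r["amount"] is not None and r["amount"] < 0]
    if negative:
        failures.append(f"negative amount for order_id(s) {negative}")
    return failures


for message in run_data_tests(orders):
    print("FAILED:", message)
```

Checks like these only catch failures someone anticipated and wrote a test for; per the blurb, observability approaches data quality differently.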

Building a brand as a scientist
Making discoveries and contributing your ideas and/or work are fundamental components of being a scientist (I am treating the word “scientist” very broadly here). However, another important component of being a scientist is learning how to build your “brand” as a scientist...Now, lots of people have written about this topic, but I do not think this topic is discussed enough with early-career researchers and scientists, in particular at the stage of a graduate student or postdoctoral scientist. However, it can have a potentially large impact on someone’s career. Therefore, I thought I would write down some of the things I have found important in thinking about building my own brand...

What's your favorite unpopular/forgotten Machine Learning method? [Reddit Discussion]
It seems there's a lot of attention (ha ha) on developing the most promising methods/models in Machine Learning, but there are a lot of less popular methods that fly under the radar or die out. I want to learn more about the nooks-and-crannies of ML techniques, so in this spirit I have a few questions for discussion!...

Datacast Episode #85 - Ad Exchange, Stream Processing, and Data Discovery with Shinji Kim
Our wide-ranging conversation touches on her Software Engineering education at Waterloo; her time as a Management Consultant at Deloitte; her first entrepreneurship stint building the mobile game Shufflepix that led to working as a product manager at YieldMo; her second startup Concord solving stream processing and being acquired by Akamai; her current journey with Select Star solving the data discovery problems; lessons learned from finding early adopters, hiring, and fundraising; and much more...

Introducing TorchRec, a library for modern production recommendation systems
We are excited to announce TorchRec, a PyTorch domain library for Recommendation Systems. This new library provides common sparsity and parallelism primitives, enabling researchers to build state-of-the-art personalization models and deploy them in production...

ML & Neuroscience: January 2022 must-reads
What can Machine Learning do for Neuroscience? In this new series, we are going to explore the relationship between ML and neuroscience. This month, we look at findings in ML and neuroscience from researchers at Oxford, Stanford, UCL, MIT, Fujitsu and Harvard Medical School...

Apache Airflow Tutorial for Beginners Part 1: Introduction and local installation [Video]
Today I am going to introduce Apache Airflow and guide you through installing Airflow locally in your Python environment, so that you have a properly running Airflow on your local machine. By watching this video, you will learn what Apache Airflow is, which problems it tries to solve, and how to install it on your local machine...

A Data-Driven Approach to Understanding How the Brain Works
A meta-analysis of 18,000 fMRI studies challenges neuroscientists’ understanding of brain functions and reaffirms the need for more targeted treatments for mental disorders...

Weaviate Podcast #9: Karen Beckers about the role of vector search in eCommerce
Karen Beckers, Data Scientist from Squadra Machine Learning Company, gives insightful information about how to use vector search in eCommerce in this podcast with Connor Shorten. Some topics are image-based datasets, vector search for data scientists, the future of eCommerce, and many more!...
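At its simplest, vector search ranks stored embeddings by their similarity to a query embedding. A minimal brute-force sketch using cosine similarity; the product names and 3-dimensional vectors are made up for illustration, and production vector databases typically use approximate nearest-neighbour indexes rather than a linear scan like this one:

```python
import math

# Toy catalogue: product name -> embedding (made-up 3-d vectors).
catalogue = {
    "red running shoe": [0.9, 0.1, 0.0],
    "blue running shoe": [0.8, 0.2, 0.1],
    "leather handbag": [0.1, 0.9, 0.2],
    "wool scarf": [0.0, 0.3, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query, items, k=2):
    """Return the k item names whose vectors are most similar to the query."""
    ranked = sorted(items, key=lambda name: cosine(query, items[name]),
                    reverse=True)
    return ranked[:k]


# A query vector that (in this toy space) sits near the running shoes.
print(search([0.85, 0.15, 0.05], catalogue))
```

In an eCommerce setting, the vectors would come from a text or image embedding model rather than being written by hand.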

A Decade of Deep Learning: How the AI Startup Experience Has Evolved
The artificial intelligence field has evolved dramatically since the deep learning revolution kicked off in 2012, and Richard Socher has been around for all of it...In this interview, Socher discusses a number of topics, including: how things have changed for AI startups in the last decade; the differences between doing AI in startups, enterprises, and academia; and how new machine learning techniques, such as transformer models, empower companies to build advanced products with a fraction of the resources they would have needed in the past...

Webinar*

Upcoming Webinar, March 9th at 11 am PT/2 pm ET - Scaling conda: A Faster conda for a Growing Community
With more than 25 million users, Anaconda is the world’s most popular data science platform and the foundation of modern machine learning. The conda package manager is used by many people in the multidisciplinary scientific computing community. This webinar will introduce one of conda’s latest features: a new solver backend based on the community project “libmamba,” which was added to improve speed and accuracy and handle increasingly complex conda environments. Register here!
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Tools

Build Smart Data Pipelines with StreamSets Summer ‘21 Beta

Run your first smart data pipeline in StreamSets. It’s easy and completely free. Quickly build and deploy streaming, batch, CDC, ETL and ML pipelines. Handle data drift automatically so you can keep jobs running even when schemas and structures change. Deploy across hybrid and multi-cloud platforms.
Imagine less hands-on maintenance and reliable scalability so you can focus on responding to business requests and needs as quickly as possible.
Learn more about StreamSets Summer ‘21 Beta.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Lead Data Engineer - electricityMap - Copenhagen
The electricityMap team is hiring a data engineer to help us build and maintain a scalable data pipeline and database that forms the foundation of our mission to accelerate the energy system to a zero-carbon future.

Our mission is to organise the world’s electricity data to drive tangible reductions in carbon emissions. electricityMap started as a popular open-source project 5 years ago and is now used every day by citizens, companies, universities, NGOs, and policy makers around the world to understand and reduce the climate impact of electricity.

You will be joining a fun, international and inclusive team in our mission to tackle climate change – while simultaneously building your professional experience in the rapidly growing industry of climate tech...

Training & Resources

How To Create a SQL Practice Database with Python
We should probably invest some time to learn SQL...But there is just one problem: How to practice querying a database when there is no database, to begin with?...In the following sections, we are going to address this fundamental problem and learn how to create our own MySQL database from scratch. With the help of Python and some external libraries, we will create a simple script that automatically creates and populates our tables with randomly generated data....
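The tutorial builds a MySQL database with Python and external libraries. A minimal sketch of the same idea that swaps in SQLite from Python's standard library, so it runs without a database server; the table layout, column names, and fake-data lists are hypothetical, not the tutorial's schema:

```python
import random
import sqlite3

# SQLite stands in for MySQL here so the sketch needs no server.
random.seed(42)  # reproducible fake data

conn = sqlite3.connect(":memory:")  # pass a file path to keep the database
cur = conn.cursor()

cur.execute("""
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT NOT NULL
    )
""")
cur.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL
    )
""")

# Hypothetical value pools for the randomly generated rows.
first_names = ["Ana", "Ben", "Chen", "Dara", "Eli"]
last_names = ["Okafor", "Silva", "Tanaka", "Weber"]
cities = ["Berlin", "Lagos", "Lima", "Osaka"]

# Populate customers with 20 random rows.
customers = [
    (i, f"{random.choice(first_names)} {random.choice(last_names)}",
     random.choice(cities))
    for i in range(1, 21)
]
cur.executemany("INSERT INTO customers VALUES (?, ?, ?)", customers)

# Populate 100 orders, each pointing at an existing customer id.
orders = [(i, random.randint(1, 20), round(random.uniform(5, 500), 2))
          for i in range(1, 101)]
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
conn.commit()

# A practice query: total spend per city, highest first.
for city, total in cur.execute("""
    SELECT c.city, ROUND(SUM(o.amount), 2)
    FROM orders o JOIN customers c ON c.id = o.customer_id
    GROUP BY c.city
    ORDER BY 2 DESC
"""):
    print(city, total)
```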

Introduction to Continual Learning
This is a tutorial to connect the mathematics and machine learning theory to practical implementations addressing the continual learning problem of artificial intelligence. We will learn this in Python by examining and deconstructing a method called elastic weight consolidation (EWC)....
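EWC keeps a network from forgetting task A while it trains on task B by adding a quadratic penalty that anchors each parameter to its task-A value, weighted by the diagonal Fisher information: L(θ) = L_B(θ) + Σ_i (λ/2) F_i (θ_i − θ*_A,i)². A minimal sketch of just that penalty, with made-up parameter and Fisher values; it does not estimate F_i (which in practice comes from squared gradients of the log-likelihood on task-A data) and is not the tutorial's code:

```python
def ewc_penalty(params, old_params, fisher, lam):
    """Sum over i of (lam / 2) * F_i * (theta_i - theta*_A,i) ** 2."""
    return sum(0.5 * lam * f * (p - p_old) ** 2
               for p, p_old, f in zip(params, old_params, fisher))


def ewc_loss(task_b_loss, params, old_params, fisher, lam):
    """Task-B loss plus the EWC anchor toward the task-A solution."""
    return task_b_loss + ewc_penalty(params, old_params, fisher, lam)


# Made-up numbers: the first weight mattered a lot for task A (high Fisher),
# the second barely mattered, so moving it is cheap.
theta_a = [1.0, -2.0]  # parameters after training on task A
fisher = [10.0, 0.1]   # diagonal Fisher information estimates
theta = [1.5, 0.0]     # candidate parameters while training on task B

print(ewc_penalty(theta, theta_a, fisher, lam=1.0))
```

The asymmetry is the point: moving the high-Fisher weight by 0.5 costs 1.25, while moving the low-Fisher weight by 2.0 costs only 0.2.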

How do Vision Transformers work? – Paper explained | multi-head self-attention & convolutions [YouTube Video]
It turns out that multi-head self-attention and convolutions are complementary. So, what makes multi-head self-attention different from convolutions? How and why do Vision Transformers work? In this video, we will find out by explaining the paper “How Do Vision Transformers Work?” by Park & Kim, 2022....
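Multi-head self-attention is compact enough to sketch directly. A minimal pure-Python sketch, not the paper's code: it splits each token's features across heads, runs scaled dot-product attention, softmax(QKᵀ/√d)V, inside each head, and concatenates the results. Real Vision Transformers also apply learned query, key, value, and output projections, which are omitted here as a simplification, and the token values are made up:

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention, softmax(q k^T / sqrt(d)) v, row by row."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


def multi_head_self_attention(x, num_heads):
    """Split features into heads, attend within each head, concatenate.

    Simplification: identity projections instead of learned W_Q, W_K, W_V, W_O.
    """
    dim = len(x[0])
    assert dim % num_heads == 0, "feature size must divide evenly into heads"
    hd = dim // num_heads
    heads = []
    for h in range(num_heads):
        part = [row[h * hd:(h + 1) * hd] for row in x]
        heads.append(attention(part, part, part))
    # Concatenate the head outputs token by token.
    return [sum((head[t] for head in heads), []) for t in range(len(x))]


# Three "patch" tokens with 4 features each (made-up values).
tokens = [[1.0, 0.0, 0.5, 0.5],
          [0.0, 1.0, 0.5, -0.5],
          [1.0, 1.0, 0.0, 0.0]]
for row in multi_head_self_attention(tokens, num_heads=2):
    print([round(v, 3) for v in row])
```

Unlike a convolution, whose kernel sees only a fixed local neighbourhood with fixed weights, each token here mixes information from every token using data-dependent weights.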

Books

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page...

P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them on board :) All the best, Hannah & Sebastian
