Receive the Data Science Weekly Newsletter every Thursday

Easy to unsubscribe at any time. Your e-mail address is safe.

Data Science Weekly Newsletter
March 3, 2022

Editor's Picks

  • The Weird and Wonderful World of AI Art
    The reality right now is really, really crazy...I don’t think the majority of AI researchers would even have suspected that these images could be created with current tools. The rapidity of the past year’s developments has surprised even some of the most bullish technologists...The AI art we had *before* 2021 was intriguing, but tended to be abstract, esoteric, and just not that relatable to a human. The AI art we have *now* is fully controllable, and can be about whatever you want it to be...What changed? Well, there’s something to be said for the new wave of publicity and interest, which certainly accelerated the pace of our art-generation techniques. But the main development is the rise of **multimodal learning**...
  • Conversational Agents: Theory and Applications
    We provide a review of conversational agents (CAs), discussing chatbots, intended for casual conversation with a user, as well as task-oriented agents that generally engage in discussions intended to reach one or several specific goals, often (but not always) within a specific domain. We also consider the concept of embodied conversational agents, briefly reviewing aspects such as character animation and speech processing....A brief historical overview is given, followed by an extensive overview of various applications, especially in the fields of health and education. We end the chapter by discussing benefits and potential risks regarding the societal impact of current and future CA technology. ...
  • CX-ToM: Counterfactual Explanations with Theory-of-Mind for Enhancing Human Trust in Image Recognition Models
    We propose CX-ToM, short for counterfactual explanations with theory-of-mind, a new explainable AI (XAI) framework for explaining decisions made by a deep convolutional neural network (CNN). In contrast to the current methods in XAI that generate explanations as a single-shot response, we pose explanation as an iterative communication process, i.e., dialog, between the machine and human user. More concretely, our CX-ToM framework generates a sequence of explanations in a dialog by mediating the differences between the minds of machine and human user. To do this, we use Theory of Mind (ToM), which helps us in explicitly modeling the human's intention, the machine's mind as inferred by the human, as well as the human's mind as inferred by the machine...

A Message From This Week's Sponsor

Free Course: Natural Language Processing (NLP) for Semantic Search
Learn how to build semantic search applications by making machines understand language as people do. This free course covers everything you need to build state-of-the-art language models, from machine translation to question-answering, and more. Brought to you by Pinecone. Start reading now.

Data Science Articles & Videos

  • Why it's best to keep software and data analysis repositories separate
    There are many best practices behind developing R packages, but one that wasn’t very clear to me at first when I started writing my own software was: Software and data analysis repositories are not the same and should be kept in separate places...The problem?...Let me give you an example of something I’ve seen lately...
  • Data Observability vs. Data Testing: Everything You Need to Know
    You already test your data. Do you need data observability, too?...The article highlights when it makes sense to test your data and when it makes sense to rely on observability to catch bugs in your data. It also highlights four ways data observability differs from data testing...While both observability and testing help you achieve reliable and trustworthy data, each method solves for high-quality data differently...
  • Building a brand as a scientist
    Making discoveries and contributing your ideas and/or work are fundamental components of being a scientist (I am treating the word “scientist” very broadly here). However, another important component of being a scientist is learning how to build your “brand” as a scientist...Now, lots of people have written about this topic, but I do not think this topic is discussed enough with early career researchers and scientists, in particular at the stage of a graduate student or postdoctoral scientist. However, it can have a potentially large impact on someone’s career. Therefore, I thought I would write down some of the things I have thought about as important for helping me think about building my own brand...
  • Datacast Episode #85 - Ad Exchange, Stream Processing, and Data Discovery with Shinji Kim
    Our wide-ranging conversation touches on her Software Engineering education at Waterloo; her time as a Management Consultant at Deloitte; her first entrepreneurship stint building the mobile game Shufflepix that led to working as a product manager at YieldMo; her second startup Concord solving stream processing and being acquired by Akamai; her current journey with Select Star solving the data discovery problems; lessons learned from finding early adopters, hiring, and fundraising; and much more...
  • ML & Neuroscience: January 2022 must-reads
    What can Machine Learning do for Neuroscience? In this new series, we are going to explore the relationship between ML and neuroscience. This month: researchers from Oxford, Stanford, UCL, MIT, Fujitsu and Harvard Medical School and their findings in ML and neuroscience...
  • A Decade of Deep Learning: How the AI Startup Experience Has Evolved
    The artificial intelligence field has evolved dramatically since the deep learning revolution kicked off in 2012, and Richard Socher has been around for all of it...In this interview, Socher discusses a number of topics, including: how things have changed for AI startups in the last decade; the differences between doing AI in startups, enterprises, and academia; and how new machine learning techniques, such as transformer models, empower companies to build advanced products with a fraction of the resources they would have needed in the past...


Upcoming Webinar, March 9th at 11 am PT/2 pm ET
Scaling conda: A Faster conda for a Growing Community
With more than 25 million users, Anaconda is the world’s most popular data science platform and the foundation of modern machine learning. The conda package manager is used by many people in the multidisciplinary scientific computing community. This webinar will introduce one of conda’s latest features: a new solver backend based on the community project “libmamba,” which was added to improve speed and accuracy and handle increasingly complex conda environments. Register here!
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!


Build Smart Data Pipelines with StreamSets Summer ’21 Beta

Run your first smart data pipeline in StreamSets. It’s easy and completely free. Quickly build and deploy streaming, batch, CDC, ETL and ML pipelines. Handle data drift automatically so you can keep jobs running even when schemas and structures change. Deploy across hybrid and multi-cloud platforms.
Imagine less hands-on maintenance and reliable scalability so you can focus on responding to business requests and needs as quickly as possible.
Learn more about StreamSets Summer ’21 Beta.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!


Training & Resources

  • How To Create a SQL Practice Database with Python
    We should probably invest some time to learn SQL...But there is just one problem: How to practice querying a database when there is no database to begin with?...In the following sections, we are going to address this fundamental problem and learn how to create our own MySQL database from scratch. With the help of Python and some external libraries, we will create a simple script that automatically creates and populates our tables with randomly generated data...
  • Introduction to Continual Learning
    This is a tutorial to connect the mathematics and machine learning theory to practical implementations addressing the continual learning problem of artificial intelligence. We will learn this in Python by examining and deconstructing a method called elastic weight consolidation (EWC)...
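
For readers new to EWC, here is a minimal sketch of its core idea: while learning a new task, the loss gets a quadratic penalty that pulls each parameter back toward the value it had after the old task, weighted by how important that parameter was (its diagonal Fisher information estimate). This is not code from the linked tutorial. The function names, the plain-list parameters, and the toy numbers are all assumptions made for illustration.

```python
# Illustrative sketch of the elastic weight consolidation (EWC) penalty.
# Helper names and toy values are hypothetical, not taken from the tutorial.

def ewc_penalty(params, anchor_params, fisher_diag, lam):
    """Return (lam / 2) * sum_i F_i * (theta_i - theta_star_i) ** 2."""
    return 0.5 * lam * sum(
        f * (p - a) ** 2
        for p, a, f in zip(params, anchor_params, fisher_diag)
    )


def ewc_loss(new_task_loss, params, anchor_params, fisher_diag, lam):
    """Total objective: loss on the new task plus the EWC penalty."""
    return new_task_loss + ewc_penalty(params, anchor_params, fisher_diag, lam)


# Toy numbers, chosen only to make the arithmetic easy to follow.
theta_star = [1.0, -2.0]  # parameters after learning the old task
fisher = [4.0, 0.5]       # per-parameter importance (diagonal Fisher estimate)
theta = [1.5, 0.0]        # candidate parameters while learning the new task

penalty = ewc_penalty(theta, theta_star, fisher, lam=2.0)
total = ewc_loss(1.0, theta, theta_star, fisher, lam=2.0)
```

In this toy case the second parameter moves four times as far as the first, yet its low importance weight keeps its contribution to the penalty at only twice the first's. A real implementation would estimate the Fisher values from gradients on the old task's data.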


P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :)
All the best, Hannah & Sebastian
