Receive the Data Science Weekly Newsletter every Thursday

Easy to unsubscribe at any time. Your e-mail address is safe.

Data Science Weekly Newsletter
August 19, 2021

Editor's Picks

  • On the Opportunities and Risks of Foundation Models
    AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations)...
  • Causal Inference for The Brave and True
    A light-hearted yet rigorous approach to learning impact estimation and sensitivity analysis. Everything in Python and with as many memes as I could find...Part I of the book contains core concepts and models for causal inference. You will learn how to represent causal questions with potential outcome notation, learn about causal graphs, and learn what bias is and how to deal with it. Most of the content here is well established...Part II (WIP) contains modern developments and applications of causal inference in the (mostly tech) industry. While Part I focuses mostly on identifying average treatment effects, Part II shifts to personalization and heterogeneous effect estimation with CATE models...
  • Systems for Machine Learning
    Machine learning’s increasing importance to real-world applications brought awareness of a new field focused on ML in practice - machine learning systems (or, as some call it, MLOps). This field acts as a bridging point between the domains of computer systems and machine learning, considering the new challenges of machine learning with a lens shaped by traditional systems research...So what are these “ML challenges”?...
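The Causal Inference for The Brave and True entry mentions potential outcomes, average treatment effects and bias. As a rough illustration (not code from the book), the sketch below simulates both potential outcomes for every unit, reveals only one of them, and compares the naive difference-in-means estimate under randomized versus confounded treatment assignment. All names and numbers here (`difference_in_means`, the true effect of 2.0, the assignment probabilities) are hypothetical choices made for the example.

```python
import random
import statistics


def difference_in_means(outcomes, treated):
    """Naive ATE estimate: mean outcome of treated units minus mean of untreated units."""
    y_t = [y for y, t in zip(outcomes, treated) if t]
    y_c = [y for y, t in zip(outcomes, treated) if not t]
    return statistics.fmean(y_t) - statistics.fmean(y_c)


rng = random.Random(0)
n = 10_000
true_effect = 2.0  # hypothetical constant effect, so the true ATE is 2.0

# Potential outcomes: Y(0) for every unit, and Y(1) = Y(0) + effect.
y0 = [rng.gauss(10.0, 1.0) for _ in range(n)]
y1 = [y + true_effect for y in y0]

# Randomized assignment: treatment is independent of the potential outcomes.
t_rand = [rng.random() < 0.5 for _ in range(n)]
obs_rand = [b if t else a for a, b, t in zip(y0, y1, t_rand)]

# Confounded assignment: units with a high baseline Y(0) are more likely to be treated.
t_conf = [rng.random() < (0.8 if a > 10.0 else 0.2) for a in y0]
obs_conf = [b if t else a for a, b, t in zip(y0, y1, t_conf)]

ate_rand = difference_in_means(obs_rand, t_rand)
ate_conf = difference_in_means(obs_conf, t_conf)
print(f"randomized: {ate_rand:.2f}  confounded: {ate_conf:.2f}")
```

Under randomization the estimate lands close to the true effect of 2.0. Under confounded assignment it also absorbs the baseline gap between treated and untreated units (roughly +1 in this setup), which is exactly the selection bias that causal inference methods try to remove.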

A Message From This Week's Sponsor

FREE Intro to Classification and kNN Workshop: Join us on Tuesday, August 24, from 6:00-7:00pm ET, and dive into classification, the branch of supervised machine learning for predicting categories. Register now and join Kimberly Fessel, Lead Data Scientist at Metis, as she walks students through how to identify classification problems and provides an intro to k-nearest neighbors (kNN).

Data Science Articles & Videos

  • Annotated PyTorch Paper Implementations
    This is a collection of simple PyTorch implementations of neural networks and related algorithms. These implementations are documented with explanations, and the website renders these as side-by-side formatted notes. We believe these would help you understand these algorithms better...
  • Supporting COVID-19 policy response with large-scale mobility-based modeling
    Social distancing measures, such as restricting occupancy at venues, have been a primary intervention for controlling the spread of COVID-19. However, these mobility restrictions place a significant economic burden on individuals and businesses. To balance these competing demands, policymakers need analytical tools to assess the costs and benefits of different mobility reduction measures. In this paper, we present our work motivated by our interactions with the Virginia Department of Health on a decision-support tool that utilizes large-scale data and epidemiological modeling to quantify the impact of changes in mobility on infection rates...
  • Introducing the Center for Research on Foundation Models (CRFM)
    The Center for Research on Foundation Models (CRFM) is a new interdisciplinary initiative born out of the Stanford Institute for Human-Centered Artificial Intelligence (HAI) that aims to make fundamental advances in the study, development, and deployment of foundation models...A major focus of CRFM is to develop open, easy-to-use tools, as well as rigorous principles, for training and evaluating foundation models so that a more diverse set of participants can meaningfully critique and improve them...
  • Getting to the Point. Index Sets and Parallelism-Preserving Autodiff for Pointful Array Programming
    We present a novel programming language design that attempts to combine the clarity and safety of high-level functional languages with the efficiency and parallelism of low-level numerical languages. We treat arrays as eagerly-memoized functions on typed index sets, allowing abstract function manipulations, such as currying, to work on arrays. In contrast to composing primitive bulk-array operations, we argue for an explicit nested indexing style that mirrors application of functions to arguments. We also introduce a fine-grained typed effects system which affords concise and automatically-parallelized in-place updates...
  • robomimic - a framework for robot learning from demonstration
    robomimic is a framework for robot learning from demonstration. It offers a broad set of demonstration datasets collected on robot manipulation domains, and learning algorithms to learn from these datasets. This project is part of the broader Advancing Robot Intelligence through Simulated Environments (ARISE) Initiative, with the aim of lowering the barriers to entry for cutting-edge research at the intersection of AI and Robotics...
  • Complete guide to understanding Node2Vec algorithm
    A typical input for a machine learning model is a vector that represents each data point. The process of translating relationships and network structure into a set of vectors is named node embedding. A node embedding algorithm aims to encode each node into an embedding space while preserving the network structure information. Encoding entity relationships allows you to capture the context of each data point rather than just focusing on its attributes...
  • Impressions of the GDMC AI Settlement Generation Challenge in Minecraft
    The GDMC AI settlement generation challenge is a procedural content generation (PCG) competition about producing an algorithm that can create an "interesting" Minecraft settlement for a given map. This paper contains a collection of written experiences with this competition from participants, judges, organizers and advisors. We asked people to reflect both on the artifacts themselves and on the competition in general. The aim of this paper is to offer a shareable and edited collection of experiences and qualitative feedback, which seem to contain a lot of insights on PCG and computational creativity but would otherwise be lost once the output of the competition is reduced to scalar performance values...
  • aorist - Data without hassle
    aorist is a Python library that generates Python, Jupyter, and Airflow data pipelines. You can seamlessly move between these dialects with a one-line change to your code...
  • The hacker who spent a year reclaiming his face from Clearview AI
    Matthias Marx is a hacker and security researcher from Hamburg, Germany. For a year, he pursued the controversial facial recognition company Clearview AI after lodging a complaint with the Hamburg Data Protection Authority that the company was using his biometric data without his consent...In January, the Hamburg Data Protection Authority ordered Clearview to delete the code that identified Marx’s face, saying that the technology violated European data protection rules. His campaign has been at the forefront of an international push by privacy activists condemning the company and calling for more stringent controls on facial recognition tech. We spoke to Marx about his odyssey to get his face back...
  • Maze: Applied Reinforcement Learning for Real-World Problems
    We are excited to announce Maze, a new framework for applied reinforcement learning (RL). Beyond introducing Maze, this blog post also outlines the motivation behind it, what distinguishes it from other RL frameworks, and how Maze can support you when applying RL — and hopefully prevent some headaches along the way...
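The Node2Vec entry describes node embeddings. Node2Vec builds them by sampling second-order biased random walks and then training a skip-gram (word2vec) model on those walks as if they were sentences. The minimal sketch below covers only the walk-sampling step, for an unweighted, undirected graph stored as a dict of neighbour sets; the function name, the toy graph and the parameter values are hypothetical and not taken from the guide.

```python
import random


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Sample one node2vec-style biased random walk.

    graph: dict mapping each node to the set of its neighbours (undirected, unweighted).
    p: return parameter; a large p makes stepping back to the previous node less likely.
    q: in-out parameter; q > 1 keeps the walk local (BFS-like), q < 1 pushes it outward (DFS-like).
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(graph[cur])  # sorted for reproducibility with a seeded rng
        if not neighbours:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbours))
            continue
        prev = walk[-2]
        weights = []
        for x in neighbours:
            if x == prev:            # distance 0 from prev: return
                weights.append(1.0 / p)
            elif x in graph[prev]:   # distance 1 from prev: stay close
                weights.append(1.0)
            else:                    # distance 2 from prev: move outward
                weights.append(1.0 / q)
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


# Hypothetical toy graph: a triangle a-b-c with a tail b-d-e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b", "e"},
    "e": {"d"},
}
rng = random.Random(42)
walks = [node2vec_walk(graph, node, length=5, p=1.0, q=0.5, rng=rng)
         for node in graph for _ in range(3)]
print(walks[0])
```

The resulting walks would then be passed to a skip-gram model to learn the vectors. This sketch recomputes the transition weights at every step for clarity; implementations aimed at large graphs typically precompute them per edge instead.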


Quick Question For You: Do you want a Data Science job? After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course. The course is broken down into three guides:
  1. Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore).

  2. Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate.

  3. Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!
Click here to learn more ... *Sponsored post. If you want to be featured here, or as our main sponsor, contact us!


  • Senior Data Analyst - HER - Remote

    We are looking for a Senior Data Analyst to help us re-develop our existing data workflow, enable better scalability, and improve accuracy. In addition to this, we’re looking for someone to help improve our ability to discover the relevant information in our data, driving our decisions in delivering an ever-improving service.

    The primary focus of the role will be establishing a new data gathering pipeline, doing statistical analysis, and helping build the analytical basis for the prediction systems. This is the perfect opportunity to be closely involved in running analytical experiments methodically and to help us improve the next generation of recommendation systems that power our social experience.
Want to post a job here? Email us for details >>

Training & Resources

  • Quantile Regression - A simple method to estimate uncertainty in Machine Learning
    When generating predictions about an output, it is sometimes useful to get a confidence score or, similarly, a range of values around the expected value within which the actual value might be found. Practical examples include estimating upper and lower bounds when predicting an ETA or a stock price, since you care not only about the average outcome but also about the best-case and worst-case scenarios when trying to minimize risk (e.g., avoiding being late or losing money)...While most Machine Learning techniques do not provide a natural way of doing this, in this article, we will be exploring Quantile Regression as a means of doing so...
  • Interactive Animated Visualization with AnimatPlot
    There are many Python libraries that help with visualizing data, like Matplotlib, Seaborn, etc...What if I told you that you can animate your visualizations? Pretty interesting, right? How cool would that be? Let’s unravel the mystery behind this and learn how we can animate our regular plots...AnimatPlot is an open-source Python library built on top of Matplotlib for creating highly interactive animated plots. In this article, we will explore some of the functionalities that AnimatPlot provides. Let’s get started...
  • A Survey of Transformers
    Transformers have achieved great success in many artificial intelligence fields, such as natural language processing, computer vision, and audio processing...In this survey, we provide a comprehensive review of various X-formers. We first briefly introduce the vanilla Transformer and then propose a new taxonomy of X-formers. Next, we introduce the various X-formers from three perspectives: architectural modification, pre-training, and applications. Finally, we outline some potential directions for future research...
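The Quantile Regression entry rests on one idea: fitting a model with the pinball (quantile) loss at level τ makes its predictions estimate the τ-quantile rather than the mean, so models trained at, say, τ = 0.1 and τ = 0.9 give a lower and an upper bound. The standard-library sketch below is not taken from the article; `pinball_loss` and `best_constant` are hypothetical helper names. It defines the loss and checks that the best constant prediction under it is the empirical quantile of the data.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (quantile) loss for a quantile level tau in (0, 1).

    Under-predictions (y > pred) cost tau per unit; over-predictions cost (1 - tau).
    """
    total = 0.0
    for y, pred in zip(y_true, y_pred):
        diff = y - pred
        total += max(tau * diff, (tau - 1.0) * diff)
    return total / len(y_true)


def best_constant(y, tau):
    """Constant prediction minimising the pinball loss on y.

    The loss is convex and piecewise linear in the constant, so a minimiser
    can always be found among the observed values.
    """
    return min(sorted(y), key=lambda c: pinball_loss(y, [c] * len(y), tau))


data = list(range(101))  # toy data: 0, 1, ..., 100
print(best_constant(data, 0.5))  # → 50 (the median)
print(best_constant(data, 0.9))  # → 90 (the 90th percentile)
```

In a real model the same loss replaces squared error in the training objective; for example, scikit-learn's `GradientBoostingRegressor` accepts `loss="quantile"` together with an `alpha` parameter for the quantile level.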


P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
