Data Science Weekly Newsletter - Issue 420

Issue #388

Apr 29 2021

Editor Picks
  • Clustergram: visualisation of cluster analysis
    When we want to do some cluster analysis to identify groups in our data, we often use algorithms like K-Means, which require the specification of a number of clusters. But the issue is that we usually don’t know how many clusters there are...There are many methods for determining the correct number, like silhouettes or the elbow plot, to name a few. But they usually don’t give much insight into what is happening between different options, so the numbers are a bit abstract...Matthias Schonlau proposed another approach – a clustergram. A clustergram is a two-dimensional plot capturing the flows of observations between classes as you add more clusters...
  • Bayesian and frequentist results are not the same, ever
    I often hear people say that the results from Bayesian methods are the same as the results from frequentist methods, at least under certain conditions. And sometimes it even comes from people who understand Bayesian methods...You can’t compare results from Bayesian and frequentist methods because the results are different kinds of things. Results from frequentist methods are generally a point estimate, a confidence interval, and/or a p-value. Each of those results is an answer to a different question...
  • Inside Netflix’s Quest to End Scrolling: Inside the Play Something Feature
    How the company is working to solve one of its biggest threats: decision fatigue...Ten years ago, Netflix got the idea that its app should work more like regular TV...Instead of having subscribers start their streaming sessions scrolling through rows and rows of content, they wondered what would happen if a show or movie simply began playing as soon as someone clicked into the app — you know, like turning on your family’s boxy old Zenith...

A Message from this week's Sponsor:


Get exclusive content to fuel your breakthroughs at The Edge –
powered by Z by HP & Nvidia

Meet the demands of your workflows with articles, case studies, videos, podcasts, webinars and more, at the new Z by HP data science center. Hit the ground running with the latest research and industry trends, and–for an extra dose of motivation–check out our Ambassador section. There you’ll find our ambassadors’ experiences, favorite tools, and data science goals for the future, all to help you turn your data into transformative business results.

Check it out.


Data Science Articles & Videos

  • A Learning Theoretic Perspective on Local Explainability
    There has been a growing interest in interpretable machine learning (IML), towards helping users better understand how their ML models behave...In what follows, we’ll first provide a quick introduction to local explanations. Then, we’ll motivate and answer our two main questions by discussing a pair of corresponding generalization guarantees that are in terms of how “accurate” and “complex” the explanations are...Finally, we’ll examine our insights in practice by verifying that these guarantees capture non-trivial relationships in a few real-world datasets...
  • The Fourier transform is a neural network
    “Can neural networks learn the Fourier transform?”...We can consider the discrete Fourier transform (DFT) to be an artificial neural network: it is a single layer network, with no bias, no activation function, and particular values for the weights. The number of output nodes is equal to the number of frequencies we evaluate...
  • Search: Query Matching via Lexical, Graph, and Embedding Methods
    Search and recommendations have a lot in common. They help users learn about new products, and need to retrieve and rank millions of products in a very short time (<150ms). They’re trained on similar data, have content and behavioral-based approaches, and optimize for engagement (e.g., click-through rate) and revenue (e.g., conversion, gross merchandise value)...In this post, we’ll explore various ways to normalize, rewrite, and retrieve (aka match) documents using the query. (Though the ranking step is also important, we’ll focus on query processing and matching for now). We’ll compare three main approaches: Lexical, Graph, and Embedding...
  • GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields
    Deep generative models allow for photorealistic image synthesis at high resolutions. But for many applications, this is not enough: content creation also needs to be controllable. While several recent works investigate how to disentangle underlying factors of variation in the data, most of them operate in 2D and hence ignore that our world is three-dimensional. Further, only few works consider the compositional nature of scenes. Our key hypothesis is that incorporating a compositional 3D scene representation into the generative model leads to more controllable image synthesis...TL;DR: We incorporate a compositional 3D scene representation into the generative model which leads to more controllable image synthesis...
  • Sceneformer: Indoor Scene Generation with Transformers
    We address the task of indoor scene generation by generating a sequence of objects, along with their locations and orientations conditioned on a room layout. Large-scale indoor scene datasets allow us to extract patterns from user-designed indoor scenes, and generate new scenes based on these patterns. Existing methods rely on the 2D or 3D appearance of these scenes in addition to object positions, and make assumptions about the possible relations between objects. In contrast, we do not use any appearance information, and implicitly learn object relations using the self-attention mechanism of transformers. We show that our model design leads to faster scene generation with similar or improved levels of realism compared to previous methods...
  • Gradient-based Adversarial Attacks against Text Transformers
    We propose the first general-purpose gradient-based attack against transformer models. Instead of searching for a single adversarial example, we search for a distribution of adversarial examples parameterized by a continuous-valued matrix, hence enabling gradient-based optimization. We empirically demonstrate that our white-box attack attains state-of-the-art attack performance on a variety of natural language tasks. Furthermore, we show that a powerful black-box transfer attack, enabled by sampling from the adversarial distribution, matches or exceeds existing methods, while only requiring hard-label outputs...
  • Self-supervised Video Object Segmentation by Motion Grouping
    Animals have evolved highly functional visual systems to understand motion, assisting perception even under complex environments. In this paper, we work towards developing a computer vision system able to segment objects by exploiting motion cues, i.e. motion segmentation...
  • dualFace: Two-Stage Drawing Guidance for Freehand Portrait Sketching
    In this paper, we propose dualFace, a portrait drawing interface to assist users with different levels of drawing skills to complete recognizable and authentic face sketches. dualFace consists of two-stage drawing assistance to provide global and local visual guidance: global guidance, which helps users draw contour lines of portraits (i.e., geometric structure), and local guidance, which helps users draw details of facial parts (which conform to user-drawn contour lines), inspired by traditional artist workflows in portrait drawing...
  • Model-Based RL for Decentralized Multi-agent Navigation
    As robots become more ubiquitous in day-to-day life, the complexity of their interactions with each other and with the environment grows. In a controlled environment, such as a lab, multiple robots can coordinate their actions and efforts through a centralized planner that facilitates communication between individual agents. And while much research has been done to address reliable sensor-informed goal navigation, in many real-world applications aligning goals across independent robotic agents must be done without a centralized planner, which poses non-trivial challenges...
  • From Sequential to Parallel, a story about Bayesian Hyperparameter Optimization
    Do you think that it is possible to improve a process by making each step more inefficient? Every data scientist knows what a time-consuming headache the hyperparameter search can be. In his newest blog post, Andres shares how we reduce the run time of our Bayesian hyperparameter search to a third by making each step more inefficient...
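
To make the Fourier transform item above concrete, here is a minimal sketch of the DFT written as a single linear layer: no bias, no activation function, and fixed weights. It assumes complex-valued weights for brevity, and the helper names dft_weights and dft_layer are hypothetical; the linked post may set the network up differently.

```python
import numpy as np


def dft_weights(n):
    """Fixed weight matrix W with W[k, t] = exp(-2j * pi * k * t / n)."""
    k = np.arange(n).reshape(-1, 1)  # output node index (one per frequency)
    t = np.arange(n).reshape(1, -1)  # input node index (one per sample)
    return np.exp(-2j * np.pi * k * t / n)


def dft_layer(x):
    """Single linear layer: weights only, no bias and no activation."""
    x = np.asarray(x, dtype=complex)
    return dft_weights(len(x)) @ x


# Compare the layer's output with NumPy's FFT on a random signal.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16)
layer_out = dft_layer(signal)
matches_fft = np.allclose(layer_out, np.fft.fft(signal))
```

With 16 input samples the layer has 16 output nodes, one per frequency evaluated, and matches_fft records whether the matrix product agrees with np.fft.fft.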

Conference Videos*


Scale Transform:
AI Leaders Showcase Research Breakthroughs and Real-World Impact

Scale Transform broke records and brought together more than 9,000 leading researchers, practitioners, and executives.

The conference featured an all-star line-up of 27 leading AI researchers and practitioners across 19 sessions, ranging from the latest research breakthroughs to real-world impact across industries.

Our speakers, representing a variety of backgrounds, industries, and use cases, include:
  • Kevin Scott, CTO of Microsoft
  • Andrew Ng, Founder of DeepLearning.AI
  • Fei-Fei Li, CS Professor at Stanford University and Co-Director at Stanford Institute for Human-Centered AI
  • Sam Altman, Co-Founder and CEO at OpenAI

Register to get access to all Transform videos...

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs



  • Tecton AI - SF / NYC

    Tecton is building an enterprise Feature Store that is transforming the way companies solve real-world problems with machine learning at scale. Our founding team created Uber's Michelangelo ML Platform, which has become the blueprint for modern ML platforms in large organizations. We recently received Series B funding from Sequoia Capital and Andreessen Horowitz, have paying enterprise customers, and have growing engineering teams in SF and NYC. The team has years of experience building and operating business-critical machine learning systems at scale at places like Uber, Google, Facebook, Airbnb, Twitter, Quora, and AdRoll...

    Software Engineer, Machine Learning

    Software Engineer, Data Infrastructure

    Software Engineer, Frontend

        Want to post a job here? Email us for details >>


Training & Resources

  • Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
    In this text, we make a modest attempt to apply the Erlangen Programme mindset to the domain of deep learning, with the ultimate goal of obtaining a systematisation of this field and ‘connecting the dots’. We call this geometrisation attempt ‘Geometric Deep Learning’...We believe this text would appeal to a broad audience of deep learning researchers, practitioners, and enthusiasts. A novice may use it as an overview and introduction to Geometric Deep Learning. A seasoned deep learning expert may discover new ways of deriving familiar architectures from basic principles and perhaps some surprising connections. Practitioners may get new insights on how to solve problems in their respective fields...
  • The Algorithms - Python
    All algorithms implemented in Python (for education)...These implementations are for learning purposes only. Therefore they may be less efficient than the implementations in the Python standard library...
  • Explainable AI Cheat Sheet
    Your high-level guide to the set of tools and methods that helps humans understand AI/ML models and their predictions...


Books




  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits

    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian