Receive the Data Science Weekly Newsletter every Thursday

Easy to unsubscribe at any time. Your e-mail address is safe.

Data Science Weekly Newsletter
September 9, 2021

Editor's Picks

  • A friendly introduction to machine learning compilers and optimizers
    This post is a (hopefully) friendly, tearless introduction to ML compilers. It starts with the rise of edge computing, which, I believe, brought compilers out of the realm of system engineers into the realm of general ML practitioners...The next section is about the two major problems with deploying ML models on the edge: compatibility and performance, how compilers can address these problems, and how compilers work. It ends with a few resources on how to significantly speed up your ML models with just a few lines of code...
  • scikit-learn reaches Version 1.0.0 - release notes
    scikit-learn was started in 2007 as a Google Summer of Code project by David Cournapeau. Later that year, Matthieu Brucher started work on this project as part of his thesis...In 2010 Fabian Pedregosa, Gael Varoquaux, Alexandre Gramfort and Vincent Michel of INRIA took leadership of the project and made the first public release, February the 1st 2010. Since then, several releases have appeared following a ~ 3-month cycle, and a thriving international community has been leading the development...This release hits version 1.0.0!...
  • Getting started with 3D content for synthetic data
    Building computer vision systems that use synthetic data is a transformative shift from ones that use real data...Luckily, the film and video game industries have faced the same content challenges for more than forty years. In that time they have developed new techniques for content creation as well as vast repositories of content, much of which is already perfect for synthetic data...This post will introduce the best ways for sourcing 3D content, each lending itself to a different type of computer vision application...

A Message From This Week's Sponsor

Quick Question For You: Do you want a Data Science job? After helping hundred of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course. The course is broken down into three guides:
  1. Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore)

  2. Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate

  3. Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!
Click here to learn more ...

Data Science Articles & Videos

  • The State of Creative Automation
    Like many other parts of our life and work, creativity, the ability to invent or design something new from scratch, might seem like it’s the last safe harbour from automation/machines. But with new advances in AI, creativity is now being transformed by technology and offered as a service to consumers and companies...In this post I cover a non-exhaustive list of content categories that are being re-invented with AI...
  • Readying Medical Students for Medical AI: The Need to Embed AI Ethics Education
    Medical students will almost inevitably encounter powerful medical AI systems early in their careers. Yet, contemporary medical education does not adequately equip students with the basic clinical proficiency in medical AI needed to use these tools safely and effectively. Education reform is urgently needed, but not easily implemented, largely due to an already jam-packed medical curricula. In this article, we propose an education reform framework as an effective and efficient solution, which we call the Embedded AI Ethics Education Framework...
  • The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning
    In complex systems, we often observe complex global behavior emerge from a collection of agents interacting with each other in their environment, with each individual agent acting only on locally available information, without knowing the full picture. Such systems have inspired development of artificial intelligence algorithms in areas such as swarm optimization and cellular automata. Motivated by the emergence of collective behavior from complex cellular systems, we build systems that feed each sensory input from the environment into distinct, but identical neural networks, each with no fixed relationship with one another. We show that these sensory networks can be trained to integrate information received locally, and through communication via an attention mechanism, can collectively produce a globally coherent policy...
  • Datasets: A Community Library for Natural Language Processing
    The scale, variety, and quantity of publicly-available NLP datasets has grown rapidly as researchers propose new tasks, larger models, and novel benchmarks. Datasets is a community library for contemporary NLP designed to support this ecosystem. Datasets aims to standardize end-user interfaces, versioning, and documentation, while providing a lightweight front-end that behaves similarly for small datasets as for internet-scale corpora. The design of the library incorporates a distributed, community-driven approach to adding datasets and documenting usage. After a year of development, the library now includes more than 650 unique datasets, has more than 250 contributors, and has helped support a variety of novel cross-dataset research projects and shared tasks...
  • Punctuation in novels
    When we think of novels, of newspapers and blogs, we think of words. We easily forget the little suggestions pushed in between: the punctuation. But how can we be so cruel to such a fundamental part of writing? Inspired by a series of posters, I wondered what did my favorite books look like without words. Can you tell them apart or are they all a-mush? In fact, they can be quite distinct...
  • Is traditional higher education in trouble from online course providers?
    I'm going to answer this with a definitive Yes, but actually no. Given my recent experience completing the six-course Project Management certificate offered jointly between Google and Coursera --- which by all measures should be the state of the art in massive online courses --- there are real reasons traditional higher education ought to be paying careful attention, but also some advantages that we in traditional universities still have that keep us ahead of the competition... for now...
  • A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning
    The rapid recent progress in machine learning (ML) has raised a number of scientific questions that challenge the longstanding dogma of the field. One of the most important riddles is the good empirical generalization of overparameterized models. Overparameterized models are excessively complex with respect to the size of the training dataset, which results in them perfectly fitting (i.e., interpolating) the training data, which is usually noisy...This paper provides a succinct overview of this emerging theory of overparameterized ML (henceforth abbreviated as TOPML) that explains these recent findings through a statistical signal processing perspective. We emphasize the unique aspects that define the TOPML research area as a subfield of modern ML theory and outline interesting open questions that remain...


TransformX Conference: Driving AI from Experimentation to Reality Join Scale AI for our two-day, virtual conference featuring 100+ speakers and 60+ sessions. We’re bringing together a community of leaders, visionaries, practitioners, and researchers across industries as we explore the shift from research to reality within AI and Machine Learning. Register now to secure your free ticket... *Sponsored post. If you want to be featured here, or as our main sponsor, contact us!


  • Senior Data Scientist - TikTok - LA

    TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy by offering a home for creative expression and an experience that is genuine, joyful, and positive.
    • Generate useful features from large amount of data
    • Apply supervised and unsupervised machine learning techniques, such as linear and logistic regression, decision trees, and k-means clustering
    • Develop segmentation models, classification models, propensity models, LTV models, experimental design, optimization models
    • Perform statistical analysis such as KPI deep dives, performance marketing efficiency, behavioral clustering, and user journey analytics
    • Curate audiences and inform engagement tactics to enable differentiated, relevant marketing touches across channels (social, email, in app, push)
    • Synthesize analytics and statistical approaches into easy-to-consume storylines, both visually and verbally, and provide indicated actions for executive audiences
    • Capture business requirements for data and analytic solutions and collaborate XFN to ensure business requirements align with business needs
    • Analyze creatives and surface insights that will help drive engagement and retention
    • Support day-to-day collaboration with performance marketing to communicate insights and recommend data informed strategies
Want to post a job here? Email us for details >>

Training & Resources

  • Physics-based Deep Learning Book [free]
    This document contains a practical and comprehensive introduction of everything related to deep learning in the context of physical simulations. As much as possible, all topics come with hands-on code examples in the form of Jupyter notebooks to quickly get started. Beyond standard supervised learning from data, we’ll look at physical loss constraints, more tightly coupled learning algorithms with differentiable simulations, as well as reinforcement learning and uncertainty modeling...
  • Reinforcement Learning Lecture Series 2021
    Taught by DeepMind researchers, this series was created in collaboration with University College London (UCL) to offer students a comprehensive introduction to modern reinforcement learning...Comprising 13 lectures, the series covers the fundamentals of reinforcement learning and planning in sequential decision problems, before progressing to more advanced topics and modern deep RL algorithms...
  • CMU's 11-767: On-Device Machine Learning - Fall 2021
    On-Device Machine Learning is a project-based course covering how to build, train, and deploy models that can run on low-power devices (e.g. smart phones, refrigerators, and mobile robots). The course will cover advances topics on distillation, quantization, weight imprinting, power calculation and more. Every week we will discuss a new research paper and area in this space one day, and have a lab working-group the second. Specifically, students will be provided with low-power compute hardware (e.g. SBCs and inference accelerators) in addition to sensors (e.g. microphones, cameras, and robotics) for their course project...


P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian

Easy to unsubscribe at any time. Your e-mail address is safe.