Data Science Weekly Newsletter - Issue 379

Issue #379

Feb 25 2021

Editor Picks
  • I wrote a book about using data science to solve “everyday” problems
    I've always wanted to write a book. I have helped write 3 different deeply technical books (and one solutions manual), but I wanted something fun, interesting, and valuable...So I wrote "Everyday Data Science" which is a collection of stories, tutorials, jokes, math, and code all written to inspire people to analyze their personal data...In general, I was also inspired by the challenge to "make $100 online" which I have done in the past month since launching. It was daunting, and I felt quite vulnerable, but overall I'm pleased with what I've made...I wrote up this quick post to give you an idea of the process I followed to write the book, and some of the content...
  • Python Concurrency: The Tricky Bits
    As a data scientist who is spending more time on software engineering, I was recently forced to confront an ugly gap in my knowledge of Python: concurrency. To be honest, I never completely understood how the terms async, threads, pools and coroutines were different and how these mechanisms could work together. Every time I tried to learn about the subject, the examples were a bit too abstract for me, and I had a hard time internalizing how everything worked...This changed when a friend of mine recommended a live coding talk by David Beazley, an accomplished Python educator...This blog post documents what I learned along the way so others can benefit, too...

A Message from this week's Sponsor:


Feature store: The data platform for building, deploying, and using ML features

The Uber Michelangelo team built the first feature store to scale Uber’s Machine Learning to 1000s of production models in just a few years. Feature stores have now become an essential part of the modern stack for operational ML. They bring DevOps principles to ML data, and allow data scientists to build great ML features, get them to production instantly, and share them across teams. Mike Del Balso, Co-Founder of Tecton, and Willem Pienaar, creator of Feast, teamed up to provide a joint definition of feature stores and how they can solve the data problem for ML.


Data Science Articles & Videos

  • Workloads of Counting Queries: Enabling Rich Statistical Analyses with Differential Privacy
    In this post, we will look at answering a collection of counting queries—which we call a workload—under differential privacy. This has been the subject of considerable research effort because it captures several interesting and important statistical tasks. By analyzing the specific workload queries carefully, we can design very effective mechanisms for this task that achieve low error...
  • First return, then explore
    We introduce Go-Explore, a family of algorithms that addresses these two challenges directly through the simple principles of explicitly ‘remembering’ promising states and returning to such states before intentionally exploring. Go-Explore solves all previously unsolved Atari games and surpasses the state of the art on all hard-exploration games, with orders-of-magnitude improvements on the grand challenges of Montezuma’s Revenge and Pitfall. We also demonstrate the practical potential of Go-Explore on a sparse-reward pick-and-place robotics task...
  • Towards Simple, Interpretable, and Trustworthy AI
    In this episode of the Data Exchange [Podcast] I speak with Sheldon Fernandez, CEO at DarwinAI, and Alex Wong, Professor at the University of Waterloo, Co-Founder of DarwinAI (Chief Scientist) and Euclid Labs, about building tools to help companies operationalize machine learning and AI...
  • Fast Inverse Square Root — A Quake III Algorithm [Video]
    In this video we will take an in-depth look at the fast inverse square root and see where the mysterious number 0x5f3759df comes from. This algorithm became famous after id Software open sourced the engine for Quake III. On the way we will also learn about floating point numbers and Newton's method...
  • The Technology Behind Cinematic Photos
    Looking at photos from the past can help people relive some of their most treasured moments. Last December we launched Cinematic photos, a new feature in Google Photos that aims to recapture the sense of immersion felt the moment a photo was taken, simulating camera motion and parallax by inferring 3D representations in an image. In this post, we take a look at the technology behind this process, and demonstrate how Cinematic photos can turn a single 2D photo from the past into a more immersive 3D animation...
  • Launching the Facebook Map
    For the past year and a half, it’s been our privilege to work on one of our largest and most ambitious undertakings ever: collaborating closely with a team of Facebook engineers, designers, and data experts to roll out a global, multi-scale base map for all of Facebook’s billions of users. In late 2020, this map went live, and we’re extremely proud of the results...Here's how we did it, what we did, and why...
  • A Data Pipeline is a Materialized View
    Materialized views never saw widespread adoption as a primary tool for building data pipelines, likely due to their limitations and ties to relational database technologies. Perhaps with this new wave of tools like dbt and Materialize we’ll see materialized views used more heavily as a primary building block in the typical data pipeline...Regardless of whether we see that kind of broad change, materialized views are still a useful design tool for conceptualizing what we are doing when we build data pipelines...
  • Autonomous navigation of stratospheric balloons using Reinforcement Learning [Video]
    Marlos C. Machado: Efficiently navigating a superpressure balloon in the stratosphere requires the integration of a multitude of cues, such as wind speed and solar elevation, and the process is complicated by forecast errors and sparse wind measurements. Coupled with the need to make decisions in real time, these factors rule out the use of conventional control techniques. This talk describes the use of reinforcement learning to create a high-performing flight controller for Loon superpressure balloons. Our algorithm uses data augmentation and a self-correcting design to overcome the key technical challenge of reinforcement learning from imperfect data, which has proved to be a major obstacle to its application to physical systems...
  • The Future of PyMC3, or: Theano is Dead, Long Live Theano
    TL;DR: PyMC3 on Theano with the new JAX backend is the future; PyMC4 based on TensorFlow Probability will not be developed further...With the ability to compile Theano graphs to JAX and the availability of JAX-based MCMC samplers, we are at the cusp of a major transformation of PyMC3. Without any changes to the PyMC3 code base, we can switch our backend to JAX and use external JAX-based samplers for lightning-fast sampling of small-to-huge models...



Score 200 Free Cores with Coiled Cloud Before 3/12

Coiled, the company providing scalable data science and machine learning with Dask, turned 1 year old this month - and they want to give you a gift to celebrate.

They’re building Coiled Cloud, which provides hosted Dask clusters, docker-less managed software, and zero-click deployments. Through March 12th, all Coiled Cloud users get 200 free cores.

Burst to the cloud with your data science and ML workflows. Help Coiled burn their cloud startup credits. No credit card required.

Come to the Dask side! Sign up with Coiled today and get your free cores here.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!



Jobs

  • Senior Revenue Data Scientist - Mozilla - Remote

    Come join a company with open-source at its heart! Mozilla is wholly owned by a non-profit and strives to build products that keep the Internet open, accessible, and secure for everyone. You’ll be part of our Data Org where you’ll join a talented team of Data Scientists and Data Engineers. We have a mature data pipeline that processes terabytes of data per day.

    We’re looking for a Senior Data Scientist to join our Revenue Data Science team. You’ll work with a cross-functional team to understand and strengthen Mozilla’s financial health. You’ll have a chance to collaborate with folks from across the company and have a visible impact on our success...

        Want to post a job here? Email us for details >>


Training & Resources




  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits

    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

    P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them on board :) All the best, Hannah & Sebastian