Receive the Data Science Weekly Newsletter every Thursday

Easy to unsubscribe at any time. Your e-mail address is safe.

Data Science Weekly Newsletter
January 27, 2022

Editor's Picks

  • DeepMind: The Podcast, Season 2 (12 episodes)
    In the highly-praised, award-nominated "DeepMind: The Podcast", mathematician and broadcaster Hannah Fry goes behind the scenes of world-leading research lab DeepMind to find out how AI can benefit our lives and the society we live in....
  • Overdebunked! Six Statistical Critiques That Don’t Quite Work
    Statistical results and data analyses are quite often wrong. Sometimes they’re wrong because of carelessness, sometimes they’re wrong even though we cared a lot because it’s just really hard to get them right, and other times they’re wrong on purpose. It shouldn’t shock anyone to hear this...below, I’ve listed six statistical critiques I commonly see on social media, and why they’re not great critiques...These aren’t technical errors - they’re not about misinterpreting a p-value or whatever, but more about common-sense critiques of published statistical results that anyone could make...

A Message From This Week's Sponsor

A Two-Day Virtual Interactive ML Community Event for AI/ML Developers and Data Scientists. Learn from 35+ AI experts from DeepMind, Spotify, Twitter, Disney, HuggingFace, Instacart, Colgate, LinkedIn, Pinterest, Mobileye, HSBC, AstraZeneca, Verizon, BBC and more in sessions about building real-world AI and machine learning applications, best practices and strategies in AI infrastructure, ML in production, and exciting research that you can apply to your next ML or DL projects.

Data Science Articles & Videos

  • The Non-Engineer’s Guide to Bad Data
    This article is written by a data engineer for a non-technical audience troubleshooting the “broken dashboard” problem and can help data teams educate their stakeholders on the process of tackling broken data pipelines...the reader will learn: a) The role the data engineering team plays in troubleshooting data quality issues and their current responsibilities, b) The impact "bad" data can have on their business, c) A simplified explanation of why data breaks and why it takes time to discover and fix data quality issues, and d) How data teams rely on data observability to reduce the likelihood of "bad" data entering their Tableau or Looker dashboards and reports...
  • ML and NLP Research Highlights of 2021
    2021 saw many exciting advances in machine learning (ML) and natural language processing (NLP). In this post, I will cover the papers and research areas that I found most inspiring...1) Universal Models, 2) Massive Multi-task Learning, 3) Beyond the Transformer, 4) Prompting, 5) Efficient Methods, 6) Benchmarking, 7) Conditional Image Generation, 8) ML for Science, 9) Program Synthesis, 10) Bias, 11) Retrieval Augmentation, 12) Token-free Models, 13) Temporal Adaptation, 14) The Importance of Data, and 15) Meta-learning...
  • How do you document predictive models just in case they are audited?
    [Reddit Discussion]...I work at a bank and am about to start building my first predictive model. I'm curious how you document your models in case auditors ask to see them? I'm also meeting with our internal auditors next week to come up with a plan, but I'd love to know what you do at your organization if you are willing to share...
  • Two reasons Kubernetes is so complex
    While some of those feelings are fairly universal of learning any new system, Kubernetes really does feel a lot bigger, scarier, and more intractable than some other systems I’ve worked with. As I’ve learned it and worked with it, I’ve tried to understand why it looks the way it does, and which design decisions and tradeoffs lead to it looking the way it does. I don’t claim to have the full answer, but this post is an attempt to commit to paper two specific thoughts or paradigms I have that I reach for as I try to understand why working with Kubernetes feels so hairy sometimes....
  • How to navigate ML research literature
    My slides on "How to navigate ML research literature" for Winter ML school...How to read papers?...How to filter out?...Where to get?...What does peer review mean?...
  • How to run effective ML research
    Gave a talk about ML paper reproducibility at winter school on "How to run effective ML research". Discussed some challenges 🥲 during implementation, objectives, and tips ✍️. Here are the slides...
  • Beginner mistakes to avoid in building Data Pipeline
    [Reddit Discussion] I've recently been promoted to a Data Engineering position at work. That being said, my first project is helping migrate data from SAP ECC to SQL Server and solidify our data pipeline so my Analytics team can extract data in a more streamlined way for our dashboards and modeling...I don't have much guidance from technical leadership or access to technical expertise in this undertaking, and I wanted to see if there were any Sr. DE's that had common "rookie" mistakes they've seen in similar initiatives that I should look out for...
  • Topology and Computability
    Readers of this blog are familiar with notions of computability – basically, the question is, what can machines do without human assistance? And you are familiar with machines. Electronic ones of course, but I always like to think of machines as composed of gears, levers and pulleys...Topology? That’s another story. Rubber doughnuts being continuously stretched but always preserving that hole. Or calculus and differential equations...So what’s the connection? You’d be surprised...


Check out the new Anaconda Community for all-things data! Want insights into the newest developments in the world of data, or need help getting “unstuck” on a problem? Our Community Forums is the place to go! Be the first to engage with other professionals and ask questions to the broader data community. Users can join in conversations around trends, debate new features, post questions to the community, and more. Plus, it’s another avenue for technical help! Create your free Anaconda Community account now.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!


Training & Resources

  • A method for explaining machine learning models: Shapley values (SHAP)
    A prediction can be explained by assuming that each feature value of the instance is a “player” in a game where the prediction is the payout. Shapley values – a method from coalitional game theory – tell us how to fairly distribute the “payout” among the features...Shapley values: a) Model-agnostic: Use with any model, b) Theoretic foundation: Game theory, c) Good software ecosystem, and d) Local and global explanations...
  • Regression and Other Stories [Book PDF, Free]
    Most textbooks on regression focus on theory and the simplest of examples. Real statistical problems, however, are complex and subtle. This is not a book about the theory of regression. It is about using regression to solve real problems of comparison, estimation, prediction, and causal inference. Unlike other books, it focuses on practical issues such as sample size and missing data and a wide range of goals and techniques. It jumps right in to methods and computer code you can...
  • Modern Robotics: Mechanics, Planning, and Control [Book PDF, Free]
    This introduction to robotics offers a distinct and unified perspective of the mechanics, planning and control of robots. Ideal for self-learning, or for courses, as it assumes only freshman-level physics, ordinary differential equations, linear algebra and a little bit of computing background. Modern Robotics presents the state-of-the-art, screw-theoretic techniques capturing the most salient physical features of a robot in an intuitive geometrical way. With numerous exercises at the end of each chapter, accompanying software written to reinforce the concepts in the book and video lectures aimed at changing the classroom experience, this is the go-to textbook for learning about this fascinating subject...
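
For readers curious about the Shapley values item above, here is a minimal sketch of the "players and payout" idea in plain Python. It is not taken from the linked article and does not use the SHAP library: the function `shapley_values`, the `toy_model`, and the all-zeros baseline are hypothetical names and choices made for illustration. The sketch assumes that a feature "absent" from a coalition is replaced by its baseline value, which is only one of several common conventions, and it enumerates every coalition, so it is practical only for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction by enumerating all coalitions.

    Assumption of this sketch: features outside a coalition are set to their
    baseline value before calling the model.
    """
    n = len(instance)

    def value(coalition):
        # Payout of a coalition: the prediction with only those features "present".
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(x)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when it joins coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(x):
    # Hypothetical model: two linear terms plus one interaction term.
    return 2.0 * x[0] + 1.0 * x[1] + 3.0 * x[0] * x[2]


instance = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)
print(phi)
```

In this toy setup the contributions add up to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is split evenly between the two features involved, which is the "fair distribution" the blurb refers to.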


P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
