Data Science Weekly Newsletter - Issue 410


Feb 18 2021

Editor Picks
  • DSPods: Data Science Podcast Directory
    DSPods is an open-source collection of podcasts about Data Science. The plain-text podcast list can be found at dspods/ and the website just adds a few bells and whistles around this list to make it easy to peruse. What are those bells and whistles?...1) Find all Data Science podcasts here; no need to trawl search engine results and blog posts, 2) The list is community created and curated, so it has a good signal-to-noise ratio, 3) Easily find the active podcasts and avoid the heartbreak of realising that a podcast you liked is no longer releasing new episodes, and 4) Since the details of the podcasts are updated daily, the list is never static...

A Message from this week's Sponsor:


Secure Your Spot in a Future TDI Cohort

Take Data Science Essentials

Whether you’re looking to improve your core data wrangling and analysis skills, or qualify for our data science fellowship program, TDI’s online class, Data Science Essentials, will help you achieve your goals.

This 8-week, evening online class was developed with insights from our industry partners and uses the same rigorous methodology as our data science fellowship.

Successfully complete Data Science Essentials and you’ll earn a spot in a future TDI fellowship cohort, plus a $1,000 scholarship!

The next class begins March 23.
Take Data Science Essentials.


Data Science Articles & Videos

  • Learn About Transformers: A Recipe
    Transformers have accelerated the development of new techniques and models for natural language processing (NLP) tasks. While they have mostly been used for NLP tasks, they are now seeing heavy adoption for computer vision tasks as well. That makes them a very important technique to understand and be able to apply...
  • PyTextRank
    PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension, used to: a) extract the top-ranked phrases from text documents, b) run low-cost extractive summarization of text documents, c) help infer links from unstructured text into structured data...One of the motivations for PyTextRank is to provide support (eventually) for entity linking, in contrast to the more commonplace usage of named entity recognition. These approaches can be used together in complementary ways to improve the results overall...
  • High-Performance Large-Scale Image Recognition Without Normalization
    Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples. Although recent work has succeeded in training deep ResNets without normalization layers, these models do not match the test accuracies of the best batch-normalized networks, and are often unstable for large learning rates or strong data augmentations. In this work, we develop an adaptive gradient clipping technique which overcomes these instabilities, and design a significantly improved class of Normalizer-Free ResNets...
  • Structured Prediction part one - Deriving a Linear-chain CRF
    In this post, we’ll talk about linear-chain CRFs applied to part-of-speech (POS) tagging. In POS tagging, we label all words with a particular class, like verb or noun. This can be useful for things like word-sense disambiguation, dependency parsing, machine translation, and other NLP applications. In the following, we will cover the necessary background to understand the motivation behind using a CRF, talking about things like discriminative and generative modeling, probabilistic graphical models, training a linear-chain CRF with efficient inference methods using dynamic programming like belief propagation (AKA the forward-backward algorithm), and decoding with Viterbi...
  • Machine learning accelerated computational fluid dynamics
    Numerical simulation of fluids plays an essential role in modeling many physical phenomena, such as weather, climate, aerodynamics and plasma physics. Fluids are well described by the Navier-Stokes equations, but solving these equations at scale remains daunting, limited by the computational cost of resolving the smallest spatiotemporal features. This leads to unfavorable trade-offs between accuracy and tractability. Here we use end-to-end deep learning to improve approximations inside computational fluid dynamics for modeling two-dimensional turbulent flows. For both direct numerical simulation of turbulence and large eddy simulation, our results are as accurate as baseline solvers with 8-10x finer resolution in each spatial dimension, resulting in 40-80x computational speedups...
  • D3.js Turns 10: A Decade of Growth and Gratitude
    Many of the world's iconic data visualizations have been created using D3. And that's because D3 has changed how the world visualizes data. From its release 10 years ago to today, it has transformed what and how dataviz is created in the world. In 10 years, so much can happen, especially in the world of open-source JavaScript libraries. Today, we're taking a moment to reflect, express our gratitude, and talk about what comes next...
  • ML Reproducibility Challenge 2020 and Spring 2021
    The primary goal of this event is to encourage the publishing and sharing of scientific results that are reliable and reproducible. In support of this, the objective of this challenge is to investigate reproducibility of papers accepted for publication at top conferences by inviting members of the community at large to select a paper, and verify the empirical results and claims in the paper by reproducing the computational experiments, either via a new implementation or using code/data or other information provided by the authors...
  • Is back-prop biologically plausible?
    One of the common criticisms of Deep Learning is that its training algorithm, back-propagation of error (back-prop), has no biologically plausible implementation, despite evidence of something like it occurring in the brain. The default implementation is considered biologically implausible due to its reliance on bi-directional synapses. However, many publications have proposed implementations that should be plausible...
  • Don't Mess with Backprop: Doubts about Biologically Plausible Deep Learning
    Biologically Plausible Deep Learning (BPDL) is an active research field at the intersection of Neuroscience and Machine Learning, studying how we can train deep neural networks with a "learning rule" that could conceivably be implemented in the brain... My somewhat contrarian opinion is that designing biologically plausible alternatives to backprop is the wrong question to be asking...
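The ranking behind PyTextRank (above) is essentially PageRank run over a word co-occurrence graph. Here is a minimal pure-Python sketch of that idea; note this is not the PyTextRank/spaCy API, and the window size and damping factor are illustrative assumptions:

```python
from collections import defaultdict

def textrank_keywords(words, window=2, damping=0.85, iters=50):
    """Rank words by running PageRank over a co-occurrence graph."""
    # Link words that co-occur within `window` positions (undirected).
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)

    # Power iteration: the standard PageRank update on the graph.
    score = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        score = {w: (1 - damping) + damping * sum(
                     score[u] / len(neighbors[u]) for u in neighbors[w])
                 for w in neighbors}
    return sorted(score, key=score.get, reverse=True)

tokens = ("deep learning models learn representations "
          "deep models learn features").split()
print(textrank_keywords(tokens)[:3])
```

PyTextRank adds the pieces that matter in practice (lemmatization, phrase chunking via spaCy, summarization), but the core ranking step is this power iteration.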
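The adaptive gradient clipping (AGC) rule from the Normalizer-Free ResNets paper above is easy to state: rescale a gradient whenever its norm grows too large relative to the norm of the weights it updates. The paper applies this unit-wise (per output unit); the sketch below clips a single flat parameter vector, with the threshold lam and eps chosen purely for illustration:

```python
import math

def agc_clip(weights, grads, lam=0.01, eps=1e-3):
    """Adaptive gradient clipping: if ||g|| / ||w|| exceeds lam,
    rescale g so the update stays proportional to the weight norm."""
    w_norm = max(math.sqrt(sum(w * w for w in weights)), eps)
    g_norm = math.sqrt(sum(g * g for g in grads))
    if g_norm > lam * w_norm:
        return [g * lam * w_norm / g_norm for g in grads]
    return list(grads)

w = [0.5, -0.5]        # ||w|| ~ 0.707
g = [3.0, 4.0]         # ||g|| = 5.0, well above lam * ||w||
print(agc_clip(w, g))  # gradient rescaled so ||g|| == lam * ||w||
```

The eps floor keeps freshly initialized near-zero weights from having their gradients clipped to nothing.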
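From the linear-chain CRF post above, Viterbi decoding is the piece that fits in a few lines: dynamic programming over emission and transition scores to recover the highest-scoring tag sequence. A toy sketch (the tags and scores here are made up for illustration, not taken from the post):

```python
def viterbi(emissions, transitions, tags):
    """Find the best tag sequence by dynamic programming.

    emissions:   list over positions of {tag: score}
    transitions: {(prev_tag, tag): score}
    """
    # delta[t] = best score of any path ending in tag t; back stores pointers.
    delta = {t: emissions[0][t] for t in tags}
    back = []
    for em in emissions[1:]:
        new_delta, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: delta[p] + transitions[(p, t)])
            new_delta[t] = delta[prev] + transitions[(prev, t)] + em[t]
            pointers[t] = prev
        delta, back = new_delta, back + [pointers]
    # Trace the pointers backwards from the best final tag.
    best = max(tags, key=delta.get)
    path = [best]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

tags = ["NOUN", "VERB"]
emissions = [{"NOUN": 2.0, "VERB": 0.5},   # e.g. "dogs"
             {"NOUN": 0.5, "VERB": 2.0}]   # e.g. "bark"
transitions = {("NOUN", "NOUN"): 0.0, ("NOUN", "VERB"): 1.0,
               ("VERB", "NOUN"): 0.5, ("VERB", "VERB"): 0.0}
print(viterbi(emissions, transitions, tags))  # ['NOUN', 'VERB']
```

The forward-backward algorithm mentioned in the post has the same structure, with max replaced by sum to compute marginals instead of the single best path.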



NEW from Gradient Flow: 2021 Trends in Data, Machine Learning, and AI

Read up on current trends and predictions for growth and development by Ben Lorica, Mikio Braun, and Jenn Webb.

Topics include: Tools for Building Machine Learning & AI Applications, Data Management & Data Engineering, Cloud Computing, and Emerging AI Trends.

Get the free report

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!



Jobs

  • Senior Revenue Data Scientist - Mozilla - Remote

    Come join a company with open-source at its heart! Mozilla is wholly owned by a non-profit and strives to build products that keep the Internet open, accessible, and secure for everyone. You’ll be part of our Data Org where you’ll join a talented team of Data Scientists and Data Engineers. We have a mature data pipeline that processes terabytes of data per day.

    We’re looking for a Senior Data Scientist to join our Revenue Data Science team. You’ll work with a cross-functional team to understand and strengthen Mozilla’s financial health. You’ll have a chance to collaborate with folks from across the company and have a visible impact on our success....

        Want to post a job here? Email us for details >>


Training & Resources

  • Patterns, predictions, and actions: A story about machine learning [Free Textbook]
    This graduate textbook on machine learning tells a story of how patterns in data support predictions and consequential actions. Starting with the foundations of decision making, we cover representation, optimization, and generalization as the constituents of supervised learning. A chapter on datasets as benchmarks examines their histories and scientific bases. Self-contained introductions to causality, the practice of causal inference, sequential decision making, and reinforcement learning equip the reader with concepts and tools to reason about actions and their consequences...
  • How do you deal with imposter syndrome at work? [ Reddit Discussion ]
    At my new job I often feel like I don’t know enough compared to my co-workers. I know this is normal, and it is sometimes a motivating factor to learn and get better. But often it can be very demotivating as well, feeling like you aren’t enough and got here just by luck...I just feel like the field is so vast; however much I read up, there are definitely going to be things I don’t know...It would be great to hear from veterans and seniors out there in research and academia about how you overcame this or how you deal with this feeling...Also, is this often dependent on work environments, or is it just everywhere? And do women and minorities feel like this much more?...



Books

  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits

    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

    P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
Sign up to receive the Data Science Weekly Newsletter every Thursday

Easy to unsubscribe. No spam — we keep your email safe and do not share it.