Data Science Weekly Newsletter - Issue 372

Issue #372

Jan 07 2021

Editor Picks
  • DALL·E: Creating Images from Text
    We’ve trained a neural network called DALL·E that creates images from text captions for a wide range of concepts expressible in natural language...We decided to name our model using a portmanteau of the artist Salvador Dalí and Pixar’s WALL·E...DALL·E is a 12-billion parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text–image pairs. We’ve found that it has a diverse set of capabilities, including creating anthropomorphized versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images...
  • Medicine's Machine Learning Problem
    Last year MIT researchers trained an algorithm that was more accurate at predicting the presence of cancer within five years of a mammogram than techniques typically used in clinics, and a 2018 survey found that 84 percent of radiology clinics in the United States are using or plan to use machine learning software. The sense of excitement has been captured in popular books such as Eric Topol’s Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again (2019). But despite the promise of these data-based innovations, proponents often overlook the special risks of datafying medicine in the age of artificial intelligence...
  • How Transformers work in deep learning and NLP: an intuitive introduction
    The famous paper “Attention is all you need” in 2017 changed the way we were thinking about attention. With enough data, matrix multiplications, linear layers, and layer normalization we can perform state-of-the-art machine translation...Nonetheless, 2020 is definitely the year of transformers! From natural language now they are into computer vision tasks. How did we go from attention to self-attention? Why does the transformer work so damn well? What are the critical components for its success?...Read on and find out...

A Message from this week's Sponsor:


Online Data Science Programs from Drexel University

Find your algorithm for success with an online data science degree from Drexel University. Gain essential skills in tool creation and development, data and text mining, trend identification, and data manipulation and summarization by using leading industry technology to apply to your career.

Learn more.


Data Science Articles & Videos

  • Finding the Narrative with Natural Language Processing
    There’s no shortage of articles detailing how various topic modeling algorithms work and tutorials on how to code them up (and I relied heavily on them while working through this project) — what felt missing, however, was a high-level framework for implementing them into a comprehensive workflow and how to distill findings and insights into a compelling narrative...With this in mind, I’ll be outlining my process for topic modeling and creating a few insightful visualizations using the Russian Troll Tweets dataset...
  • Reflections on my (Machine Learning) PhD Journey
    With this year coming to an end, I’ve put together some of my reflections and lessons learned from my (Machine Learning) PhD experiences. I discuss topics including expectations going in, common challenges during the PhD (and some strategies for helping with them), keeping up with papers, the community nature of research and developing a research vision. I hope that these topics are helpful in navigating the PhD and research in Machine Learning!...
  • CLIP: Connecting Text and Images
    We’re introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision. CLIP (Contrastive Language–Image Pre-training) can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized, similar to the “zero-shot” capabilities of GPT-2 and 3...
  • Understanding Deadly Police Encounters with Data Science
    As Americans continue to confront systemic racism and its fatal manifestations in policing, I’ve sought to gain insight into where deadly police encounters are likely to happen where the victim was a person of color using the tools of data science. More specifically, I built a classification model that can identify where deadly encounters are likely to happen based on socio-economic characteristics of the communities in which they occur and created an interactive map of the United States to visualize their geographic spread using Tableau...
  • A simple solution for monitoring ML systems
    This blog post aims to provide a simple, open-source solution for monitoring ML systems. We'll discuss industry-standard monitoring tools and practices for software systems and how they can be adapted to monitor ML systems...
  • Object Detection at 2530 FPS with TensorRT and 8-Bit Quantization
    My previous post Object Detection at 1840 FPS made some readers wonder who would need to detect anything at 1840 FPS, but my good friend and “performance geek” Tanel Põder had a different response: Nice article, I wonder if you could get to 2000 FPS?...Challenge accepted...This article is a deep dive into the techniques needed to get there. We will rewrite PyTorch model code, perform ONNX graph surgery, optimize a TensorRT plugin and finally we’ll quantize the model to bits (to 8 bit precision, that is). We will also keep track of divergence from full-precision accuracy with the COCO2017 validation dataset...
  • Creating “Unbiased News” Using Data Science
    Finding Common Ground Amongst Media Outlets Across the Political Spectrum...I decided to see how data science can help with this matter. I wanted to create “Unbiased News” for the reader to be able to get more impartial information on current events...
  • Which Machine Learning Classifiers are best for small datasets? An empirical study
    Although "big data" and "deep learning" are dominant, my own work at the Gates Foundation involves a lot of small (but expensive) datasets, where the number of rows (subjects, samples) is between 100 and 1000. For example, detailed measurements throughout a pregnancy and subsequent neonatal outcomes from pregnant women. A lot of my collaborative investigations involve fitting machine learning models to small datasets like these, and it's not clear what best practices are in this case...Along with my own experience, there is some informal wisdom floating around the ML community...
  • Controllable Neural Text Generation
    In this post, we will delve into several approaches for controlled content generation with an unconditioned language model. Note that model steerability is still an open research question. Each introduced method has certain pros & cons: 1) Apply guided decoding strategies and select desired outputs at test time, 2) Optimize for the most desired outcomes via good prompt design, and 3) Fine-tune the base model or steerable layers to do conditioned content generation...



Jumpstart Your Data Career Today

One Week Left for Priority Enrollment for TDI’s Spring Data Science Fellowship

Apply by January 15 and you could earn a priority enrollment scholarship, and increase your chances of getting a full-tuition scholarship!

Master the skills you need to succeed in the business world with TDI. You’ll work with:
  • Live code
  • Real data sets
  • Experienced, live instructors
Plus, you can attend full-time for 8 weeks, or part-time for 20 weeks and get the same great training. Early applications close on January 15. Apply Now.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!



Jobs

  • Data Scientist - Apple Pay Analytics - NYC

    You will play a key role improving the Apple Pay product experience. As a member of the analytics team you will be supporting a product function. You will partner with business owners, understand goals, craft KPIs and measure ongoing performance. You will initially engage with the product and engineering teams in ensuring that we have the appropriate instrumentation in place to deliver on these metrics. You will subsequently use advanced statistical, ML and analytical techniques to analyze product performance and identify key insights that inform product improvements and business strategy. The role requires a high degree of independence, ownership and collaboration working cross functionally across all levels of a highly matrixed organization...

        Want to post a job here? Email us for details >>


Training & Resources

  • Probabilistic Machine Learning: An Introduction [ 2021 Edition out now ]
    "My favorite machine learning book just received a face-lift! 'Probabilistic Machine Learning: An Introduction' is the most comprehensive and accessible book on modern machine learning by a large margin. It now also covers the latest developments in deep learning and causal discovery. With this upgrade it will remain the reference book for our field that every respected researcher needs to have on their desk."...
  • The Illustrated Transformer
    In the previous post, we looked at Attention – a ubiquitous method in modern deep learning models. Attention is a concept that helped improve the performance of neural machine translation applications. In this post, we will look at The Transformer – a model that uses attention to boost the speed with which these models can be trained. The Transformer outperforms the Google Neural Machine Translation model in specific tasks. The biggest benefit, however, comes from how The Transformer lends itself to parallelization. It is in fact Google Cloud’s recommendation to use The Transformer as a reference model to use their Cloud TPU offering. So let’s try to break the model apart and look at how it functions...



Books

  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits

    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian