Data Science Weekly Newsletter

Issue 391

May 20, 2021

Editor's Picks

Get started with machine learning on Arduino
Arduino is on a mission to make machine learning simple enough for anyone to use. We’ve been working with the TensorFlow Lite team over the past few months and are excited to show you what we’ve been up to together: bringing TensorFlow Lite Micro to the Arduino Nano 33 BLE Sense. In this article, we’ll show you how to install and run several new TensorFlow Lite Micro examples that are now available in the Arduino Library Manager...The first tutorial below shows you how to install a neural network on your Arduino board to recognize simple voice commands...

Clarifying exceptions and visualizing tensor operations in deep learning code
One of the biggest challenges when writing code to implement deep learning networks, particularly for us newbies, is getting all of the tensor (matrix and vector) dimensions to line up properly. It's really easy to lose track of tensor dimensionality in complicated expressions involving multiple tensors and tensor operations...To help myself and other programmers debug tensor code, I built a new library called TensorSensor. TensorSensor clarifies exceptions by augmenting messages and visualizing Python code to indicate the shape of tensor variables. It works with TensorFlow, PyTorch, and NumPy, as well as higher-level libraries like Keras and fastai.

Papers with Code partners with arXiv
We are excited to announce our partnership with arXiv to support links to code on arXiv...Machine learning articles on arXiv now have a Code tab to link official and community code with the paper...

A Message From This Week's Sponsor

Are You Asking Your Data the Right Questions?

Before you can get answers from your data, you need to know which questions to ask. At CMU’s Tepper School of Business, we help you build your analytical expertise and business acumen so that you can take your insights to the next level.
Download a Program Brochure

Data Science Articles & Videos

Feature Store: The Missing Data Layer in ML Pipelines?
A feature store is a central vault for storing documented, curated, and access-controlled features. In this blog post, we discuss the state of the art in data management for deep learning and present the first open-source feature store...

A new open source framework for automatic differentiation with graphs
GTN is an open source framework for automatic differentiation with a powerful, expressive type of graph called weighted finite-state transducers (WFSTs). Just as PyTorch provides a framework for automatic differentiation with tensors, GTN provides such a framework for WFSTs. AI researchers and engineers can use GTN to more effectively train graph-based machine learning models...

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer can perform very well on image classification tasks when applied directly to sequences of image patches...
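To make "sequences of image patches" concrete, here is a minimal sketch in plain Python of cutting an image into 16x16 patches and flattening each one into a vector. The function name `image_to_patches` and the nested-list image layout (rows of pixels, each pixel a list of channel values) are assumptions for illustration, not code from the paper; a real model would use a tensor library and then apply a learned linear projection to each patch.

```python
def image_to_patches(image, patch_size=16):
    """Split an image into a row-major sequence of flattened square patches.

    `image` is a nested list indexed as image[row][column][channel].
    This layout is an assumption made for the sketch.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch: walk its rows, then its pixels, then channels.
            patch = [
                value
                for row in image[top:top + patch_size]
                for pixel in row[left:left + patch_size]
                for value in pixel
            ]
            patches.append(patch)
    return patches


# A blank 224x224 RGB image becomes 14 * 14 = 196 patches,
# each of length 16 * 16 * 3 = 768.
image = [[[0.0, 0.0, 0.0] for _ in range(224)] for _ in range(224)]
sequence = image_to_patches(image)
print(len(sequence), len(sequence[0]))  # 196 768
```

Each flattened patch plays the role that a word token plays in a language model, which is where the "16x16 words" in the title comes from.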

Knowledge Distillation: Can a neural network train other networks?
Currently, there are three main methods out there to compress a neural network while preserving the predictive performance: a) weight pruning, b) quantization, and c) knowledge distillation...In this post, my goal is to introduce you to the fundamentals of knowledge distillation, which is an incredibly exciting idea, building on training a smaller network to approximate the large one...
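As a rough illustration of "training a smaller network to approximate the large one", the sketch below computes one common form of distillation loss: the KL divergence between temperature-softened teacher and student output distributions, scaled by the squared temperature. This formulation, the function names, and the default temperature of 2.0 are assumptions for illustration; the linked post may present the idea differently.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of one row of logits."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(value - peak) for value in scaled))
    return [value - log_total for value in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) over a batch, on softened distributions.

    Multiplying by temperature ** 2 is a common convention that keeps
    gradient magnitudes comparable across temperatures (an assumption here).
    """
    total = 0.0
    for student_row, teacher_row in zip(student_logits, teacher_logits):
        student_log_probs = log_softmax(student_row, temperature)
        teacher_log_probs = log_softmax(teacher_row, temperature)
        total += sum(
            math.exp(t) * (t - s)
            for t, s in zip(teacher_log_probs, student_log_probs)
        )
    return (total / len(teacher_logits)) * temperature ** 2


# Hypothetical logits for a batch of two examples and three classes.
teacher = [[4.0, 1.0, 0.5], [0.2, 3.5, 0.1]]
student = [[2.0, 1.5, 0.5], [0.5, 2.0, 0.3]]
print(distillation_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's logits exactly and grows as the two softened distributions diverge; in practice it is usually combined with the ordinary loss on the true labels.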

Machine Learning Software is Now Doing the Task of Counting Craters On Mars
Mars is the subject of intense scientific scrutiny. Telescopes, rovers, and orbiters are all working to unlock the planet’s secrets. There are a thousand questions concerning Mars, and one part of understanding the complex planet is understanding the frequency of meteorite strikes on its surface...On some date between March 2010 and May 2012, a meteor slammed into Mars’ thin atmosphere. It broke into several pieces before it struck the surface, creating what looks like nothing more than a black speck in CTX camera images of the area. The new AI tool, called an automated fresh impact crater classifier, found it. Once it did, NASA used HiRISE to confirm it...That was the classifier’s first find, and in the future, NASA expects AI tools to do more of this kind of work, freeing human minds up for more demanding thinking...

Random Forest on GPUs: 2000x Faster than Apache Spark
In this article, we explore implementations of distributed random forest training on clusters of CPU machines using Apache Spark and compare that to the performance of training on clusters of GPU machines using RAPIDS and Dask. While GPU computing in the ML world has traditionally been reserved for deep learning applications, RAPIDS is a library that executes data processing and non-deep learning ML workloads on GPUs, leading to immense performance speedups when compared to executing on CPUs...

Research highlights: Robustness of Bayesian Neural Networks to Gradient-Based Attacks
Deep learning’s vulnerability to adversarial attacks has become a hot topic in machine learning research. While so far there have (thankfully) been few examples of true adversarial attacks impacting deployed models ‘in the wild’, the increasing reliance on ML models for practical applications means that this is an important weakness to protect against. In fact, there doesn’t even need to be an ‘adversary’ with bad intentions present for this to be something worth considering. A few innocent but unusual examples can be enough to cause catastrophic failure in a deployed model, as neural networks can be sensitive to data points which are far in distribution from their training dataset, leading to predictions which are both over-confident and wrong...

Fake News Detection From Ideation to Deployment: Setting Up Your Machine Learning Project
In this first of a series of posts, we will be describing how to build a machine learning-based fake news detector from scratch. That means we will literally construct a system that learns how to discern reality from lies, using nothing but raw data. And our project will take us all the way from initial ideation to deployed solution...

The Art of Learning by Example
We are entering an age where machine models can use a real-time camera to determine if a bike lane is blocked by a truck, extract sidewalk data from the mass of inputs delivered by aerial images, and count users of an intersection over time without a person expending any time at all...This article will delve into...three current applications of computer vision that can most facilitate better planning and design: digitizing the built environment at scale, reimagining urban observation, and empirically evaluating the impacts of changes to the built environment...

Training

Quick Question For You: Do you want a Data Science job?

After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.
The course is broken down into three guides:

Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore)

Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate

Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!

Click here to learn more
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Data Scientist - Associated Press (AP) - New York, NY

The Associated Press is the essential global news network, delivering fast, unbiased news from every corner of the world to all media platforms and formats. Founded in 1846, AP today is the largest and most trusted source of independent news and information. On any given day more than half the world's population sees news from AP.
The Associated Press seeks a Data Science Manager based in New York, NY. The Data Science Manager will help manage data analysis, data science and data engineering solutions supporting business intelligence, news search, content enrichment and metadata services. We are a small focused team within Metadata Technology working closely with various departments and functions across the organization to design and build solutions with data, analytics and machine learning methods...

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

Understanding the Role of Individual Units in a Deep Neural Network [PDF]
Deep neural networks excel at finding hierarchical representations that solve complex tasks over large data sets. How can we humans understand these learned representations? In this work, we present network dissection, an analytic framework to systematically identify the semantics of individual hidden units within image classification and image generation networks...

Imaginaire
Imaginaire is a PyTorch library that contains optimized implementations of several image and video synthesis methods developed at NVIDIA...

Yann LeCun’s Deep Learning Course at CDS is Now Fully Online & Accessible to All
CDS is excited to announce the release of all materials for Yann LeCun’s Deep Learning, DS-GA 1008, co-taught in Spring 2020 with Alfredo Canziani. This unique course material consists of a mix of closed-captioned lecture videos, detailed written overviews, and executable Jupyter Notebooks with PyTorch implementations. The course covers the latest techniques in both deep learning and representation learning, focusing on supervised/self-supervised learning, embedding methods, metric learning, convolutional and recurrent nets, with applications to computer vision, natural language understanding, and speech recognition...

Books

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.

P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them on board :) All the best, Hannah & Sebastian
