Data Science Weekly Newsletter

Issue 332

April 2, 2020

Editor's Picks

Agent57: Outperforming the human Atari benchmark
The Atari57 suite of games is a long-standing benchmark to gauge agent performance across a wide range of tasks. We’ve [DeepMind] developed Agent57, the first deep reinforcement learning agent to obtain a score that is above the human baseline on all 57 Atari 2600 games. Agent57 combines an algorithm for efficient exploration with a meta-controller that adapts the exploration and long- vs. short-term behaviour of the agent....
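A meta-controller that adapts an agent's behaviour over time can be framed as a multi-armed bandit: each arm is one exploration policy, and the controller favours whichever arm has returned the best rewards so far while still occasionally retrying the others. The sketch below is a plain UCB1 bandit over simulated rewards. It only illustrates the idea and is not Agent57's actual meta-controller; the functions `ucb1_select` and `run_meta_controller` and the reward means are hypothetical.

```python
import math
import random


def ucb1_select(counts, totals, t, c=1.0):
    """Pick the arm with the highest UCB1 score; try every arm once first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        totals[arm] / counts[arm] + c * math.sqrt(math.log(t) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(counts)), key=lambda arm: scores[arm])


def run_meta_controller(arm_means, steps=2000, noise=0.1, seed=0):
    """Simulate a bandit meta-controller choosing among exploration policies.

    arm_means are hypothetical average returns for each policy; a real
    controller would observe episode returns from the agent instead.
    """
    rng = random.Random(seed)
    counts = [0] * len(arm_means)
    totals = [0.0] * len(arm_means)
    for t in range(1, steps + 1):
        arm = ucb1_select(counts, totals, t)
        reward = rng.gauss(arm_means[arm], noise)  # simulated episode return
        counts[arm] += 1
        totals[arm] += reward
    return counts


counts = run_meta_controller([0.2, 0.5, 0.8])
print(counts)
```

The exploration bonus shrinks as an arm is tried more often, so over time the controller concentrates on the best-performing policy without ever fully abandoning the others.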

How to be Curious Instead of Contrarian About COVID-19: Eight Data Science Lessons From ‘Coronavirus Perspective’
How should non-epidemiologists publicly discuss COVID-19 data and models? When leaders and citizens are especially sensitive to signals on public health, what is our intellectual responsibility to defer to the analysis of more expert speakers? I argue that our responsibility during crisis is the same as it was before: to do good work, to the best of our abilities, with the scientific principles of curiosity and honesty...To demonstrate the point, this note provides a detailed review of a recent piece “Coronavirus Perspective” (Epstein 2020a). By applying and illustrating data science principles point for point to this non-epidemiological take on epidemiological questions, it is hoped that the reader will take away not why they should avoid working on new topics but rather how they should approach those topics in an honest, curious, and rigorous way...

Open Access to ACM Digital Library During Coronavirus Pandemic
As the coronavirus/COVID-19 pandemic continues, we at ACM would like to do what we can to help support the computing community. Many computing researchers and practitioners are now working remotely. In addition, teaching and learning have also moved online as more and more campuses close...We believe that ACM can help support research, discovery and learning during this time of crisis by opening the ACM Digital Library to all. For the next three months, there will be no fees assessed for accessing or downloading work published by ACM. ...

A Message From This Week's Sponsor

Are You Asking Your Data the Right Questions?

Before you can get answers from your data, you need to know which questions to ask. At CMU’s Tepper School of Business, we help you build your analytical expertise and business acumen so that you can take your insights to the next level.
Download a Program Brochure

Data Science Articles & Videos

When to assume neural networks can solve a problem
The question “What are the problems we should assume can be solved with machine learning?”, or the narrower one, more focused on current developments, “What are the problems we should assume a neural network is able to solve?”, is one I haven’t seen addressed much...so when someone asks me this question about a specific problem, I can often give a reasonably confident answer, provided I can take a look at the data...Thus, I thought it might be helpful to lay down the heuristic that generates such answers. I by no means claim these are precise or evidence-based in the scientific sense, but I think they might be helpful, maybe even a good starting point for further discussion on the subject.

An Illustrated Guide to Graph Neural Networks - A breakdown of the inner workings of GNNs…
Graph Deep Learning (GDL) is an up-and-coming area of study. It’s super useful when learning over and analysing graph data. Here, I’ll cover the basics of a simple Graph Neural Network (GNN) and the intuition behind its inner workings. Don’t worry, there are tons of colourful diagrams for you to visualise what’s happening!...
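The core idea behind most simple GNNs is message passing: each node updates its feature vector by aggregating its neighbours' vectors and passing the result through a learned transformation shared by all nodes. The sketch below is a minimal pure-Python version, assuming mean aggregation with a self-loop, a single weight matrix and a ReLU; the article's own model may aggregate differently, and `gnn_layer`, the graph, the features and the weights are hypothetical.

```python
def gnn_layer(features, adjacency, weight):
    """One round of message passing with mean aggregation.

    features:  list of node feature vectors, one list of floats per node
    adjacency: dict mapping a node index to the indices of its neighbours
    weight:    out_dim x in_dim matrix (list of rows) shared by every node
    """
    updated = []
    for node, h in enumerate(features):
        # Each node receives "messages" from itself and its neighbours.
        neighbourhood = [node] + adjacency.get(node, [])
        in_dim = len(h)
        # Average the incoming feature vectors dimension by dimension.
        aggregated = [
            sum(features[n][d] for n in neighbourhood) / len(neighbourhood)
            for d in range(in_dim)
        ]
        # Apply the shared linear transform, then a ReLU non-linearity.
        transformed = [
            sum(row[d] * aggregated[d] for d in range(in_dim)) for row in weight
        ]
        updated.append([max(0.0, v) for v in transformed])
    return updated


# A three-node path graph 0 - 1 - 2 with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = {0: [1], 1: [0, 2], 2: [1]}
identity = [[1.0, 0.0], [0.0, 1.0]]
print(gnn_layer(features, adjacency, identity))
```

Stacking k such layers lets information from a node reach nodes up to k hops away, which is why GNNs are usually built from several layers.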

Lessons Learned from my Failures as a Grad Student Focused on AI [Reddit Discussion]
Hey ML subreddit. I posted on here a little while back with my blog post about lessons learned from failures after 3 years of grad school, and people seemed to like it. So, just posting a link to a video version with most of the same content but more graphics / examples....

Machine translation of cortical activity to text with an encoder–decoder framework
A decade after speech was first decoded from human brain signals, accuracy and speed remain far below those of natural speech. Here we show how to decode the electrocorticogram with high accuracy and at natural-speech rates. Taking a cue from recent advances in machine translation, we train a recurrent neural network to encode each sentence-length sequence of neural activity into an abstract representation, and then to decode this representation, word by word, into an English sentence....

AI Ethics Newsletter #3: Privacy face masks, ML Fairness, Deepfakes, Franken-algorithms and more ...
Welcome to the third edition of our weekly newsletter that will help you navigate the fast-changing world of AI Ethics! Every week we dive into research papers that caught our eye, sharing a summary of those with you and presenting our thoughts on how it links with other work in the research landscape. We also share brief thoughts on interesting articles and developments in the field....

My First Year as a Data Scientist
I was very fortunate to be given the opportunity by my employer to change roles from full-stack software developer to data scientist with the in-house Data Team. I've now spent over a year in this new data science role and thoroughly enjoy the different challenges it brings compared with software development. I wanted to write this blog post, firstly, to help me reflect on what I've been doing over the past year: what changes I had to undergo, what I did well, and what I could improve on. Secondly, I wanted to help others in a similar situation, whether thinking about transitioning or already transitioning, by writing about some of the pitfalls and challenges I've faced....

Modeling 3D Shapes by Reinforcement Learning [Video & Paper]
We explore how to enable machines to model 3D shapes like human modelers using reinforcement learning (RL). In 3D modeling software like Maya, a modeler usually creates a mesh model in two steps: (1) approximating the shape using a set of primitives; (2) editing the meshes of the primitives to create detailed geometry. Inspired by such artist-based modeling, we propose a two-step neural framework based on RL to learn 3D modeling policies....

Physics Informed Deep Learning - Data-driven solutions and discovery of Nonlinear Partial Differential Equations
We introduce physics informed neural networks – neural networks that are trained to solve supervised learning tasks while respecting any given law of physics described by general nonlinear partial differential equations. We present our developments in the context of solving two main classes of problems: data-driven solution and data-driven discovery of partial differential equations....
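The training objective behind this idea can be written compactly. A sketch, assuming the continuous-time setting in which the PDE takes the form u_t + N[u] = 0 with N a nonlinear differential operator: a neural network approximates u(t, x), the residual f is obtained by automatically differentiating that network, and the loss sums a data term over N_u initial/boundary training points and a physics term over N_f collocation points.

```latex
f := u_t + \mathcal{N}[u], \qquad
\mathrm{MSE}
  = \underbrace{\frac{1}{N_u} \sum_{i=1}^{N_u}
      \left| u(t_u^i, x_u^i) - u^i \right|^2}_{\mathrm{MSE}_u}
  + \underbrace{\frac{1}{N_f} \sum_{i=1}^{N_f}
      \left| f(t_f^i, x_f^i) \right|^2}_{\mathrm{MSE}_f}
```

Driving MSE_f toward zero is what makes the network respect the physical law between the data points; in the data-driven discovery problem, unknown parameters inside N are learned jointly with the network weights.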

Build an app to generate photorealistic faces using TensorFlow and Streamlit
This overview will walk you through creating a Streamlit app [Streamlit - a Python framework that makes writing apps as easy as writing Python scripts] to play with one of the hairiest and black-box-iest models out there: a deep Generative Adversarial Network (GAN). In this case, we’ll visualize Nvidia’s PG-GAN [1] using TensorFlow to synthesize photorealistic human faces from thin air....

Webinar

Webinar: Why Understanding CVEs Is Critical for Data Scientists

CVEs are Common Vulnerabilities and Exposures found in software components. Because modern software is complex, vulnerabilities tend to emerge over time. A golden rule in security is that wherever valuable data can be found, hackers will go.
Ignoring a CVE with a high severity score can result in security breaches and unstable applications. Read the blog “Why Understanding CVEs Is Critical for Data Scientists” and register for our upcoming webinar “What are CVEs and Why Do They Matter?” to learn what you should look out for when using open-source software for your data science projects.
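CVE entries are usually scored with the Common Vulnerability Scoring System (CVSS). Under CVSS v3.x, a base score of 0.0 is rated None, 0.1 to 3.9 Low, 4.0 to 6.9 Medium, 7.0 to 8.9 High and 9.0 to 10.0 Critical. The sketch below maps scores to those ratings and flags dependencies whose worst known CVE meets a threshold; the functions and the dependency names and scores are hypothetical, and real scores would come from a vulnerability database such as the NVD.

```python
def cvss_v3_severity(score: float) -> str:
    """Map a CVSS v3.x base score (0.0 to 10.0) to its qualitative rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS scores range from 0.0 to 10.0, got {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def flag_risky(dependencies, threshold="High"):
    """Return dependency names whose worst known CVE meets the threshold."""
    order = ["None", "Low", "Medium", "High", "Critical"]
    cutoff = order.index(threshold)
    return sorted(
        name
        for name, scores in dependencies.items()
        if scores and order.index(cvss_v3_severity(max(scores))) >= cutoff
    )


# Hypothetical packages mapped to the base scores of their known CVEs.
deps = {"pkg-a": [5.3], "pkg-b": [7.5, 4.0], "pkg-c": []}
print(flag_risky(deps))
```

Ranking by the worst score per dependency keeps the check conservative: one High or Critical CVE is enough to flag a package for review.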
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Head of Data Science - Tessian - London, United Kingdom

Our mission is to secure the Human Layer. This involves deploying near real-time machine learning models at massive scale to some of the world’s largest organisations to keep their most sensitive data private and secure. To do this, we're looking for an inspiring Head of Data Science ready to lead and grow our Data Science team, who is excited about the opportunities and challenges that come with building and deploying real-time production models.
Find out more about life as a Tessian Engineer...

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

Fast and Reproducible Deep Learning [Video / Slides from Chicago ML Meetup]
There are endless resources for someone who wants to learn to train a deep learning model, but running a successful deep learning project requires managing many additional moving parts that are much less discussed...This talk provides best practices for accomplishing these tasks efficiently and reproducibly. Tools that are covered include the Creevey library for processing large collections of files; pip-tools and nvidia-docker for managing dependencies; and MLflow Tracking for tracking experiments....

Autoencoders for Content-based Image Retrieval with Keras and TensorFlow
In this tutorial, you will learn how to use convolutional autoencoders to create a Content-based Image Retrieval system (i.e., image search engine) using Keras and TensorFlow...
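Once the autoencoder is trained, retrieval itself is a nearest-neighbour search: encode every image in the collection with the encoder half, encode the query image the same way, and return the images whose feature vectors are closest. The sketch below shows only that search step in pure Python with Euclidean distance, assuming the feature vectors have already been computed; the tutorial's own Keras/TensorFlow code may differ, and `search`, the image ids and the vectors are hypothetical.

```python
import math


def search(query_features, index, k=3):
    """Return the k (image_id, distance) pairs nearest to query_features.

    index maps an image id to its feature vector, e.g. the bottleneck
    output of a trained convolutional encoder.
    """
    scored = [
        (image_id, math.dist(query_features, features))
        for image_id, features in index.items()
    ]
    scored.sort(key=lambda pair: pair[1])  # smallest distance = most similar
    return scored[:k]


# Hypothetical 2-D embeddings; real encoder outputs are much larger.
index = {
    "cat_1": [0.9, 0.1],
    "cat_2": [0.8, 0.2],
    "car_1": [0.1, 0.9],
}
print(search([0.9, 0.1], index, k=2))
```

A linear scan like this (`math.dist` needs Python 3.8 or later) is fine for small collections; larger indexes usually switch to an approximate nearest-neighbour library.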

Getting started with TinyML [Video]
If you're interested in running machine learning on embedded devices but aren't sure how to get started, Pete Warden from Google's TensorFlow Micro team will run through how to build and run your own TinyML applications. This will include an overview of the different boards that are available, the software frameworks, and tutorials to get you up and running....

Books

Data Science in Production: Building Scalable Model Pipelines with Python
This book provides a hands-on approach to scaling up Python code to work in distributed environments in order to build robust pipelines. Readers will learn how to set up machine learning models as web endpoints, serverless functions, and streaming pipelines using multiple cloud environments. It is intended for analytics practitioners with hands-on experience with Python libraries such as Pandas and scikit-learn, and focuses on scaling up prototype models to production....
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian