Data Science Weekly Newsletter - Issue 364

Issue #332

Apr 2 2020

Editor Picks
 
  • Agent57: Outperforming the human Atari benchmark
    The Atari57 suite of games is a long-standing benchmark to gauge agent performance across a wide range of tasks. We’ve [DeepMind] developed Agent57, the first deep reinforcement learning agent to obtain a score that is above the human baseline on all 57 Atari 2600 games. Agent57 combines an algorithm for efficient exploration with a meta-controller that adapts the exploration and long vs. short-term behaviour of the agent....
  • How to be Curious Instead of Contrarian About COVID-19: Eight Data Science Lessons From ‘Coronavirus Perspective’
    How should non-epidemiologists publicly discuss COVID-19 data and models? When leaders and citizens are especially sensitive to signals on public health, what is our intellectual responsibility to defer to the analysis of more expert speakers? I argue that our responsibility during crisis is the same as it was before: to do good work, to the best of our abilities, with the scientific principles of curiosity and honesty...To demonstrate the point, this note provides a detailed review of a recent piece “Coronavirus Perspective” (Epstein 2020a). By applying and illustrating data science principles point for point to this non-epidemiological take on epidemiological questions, it is hoped that the reader will take away not why they should avoid working on new topics but rather how they should approach those topics in an honest, curious, and rigorous way...
  • Open Access to ACM Digital Library During Coronavirus Pandemic
    As the coronavirus/COVID-19 pandemic continues, we at ACM would like to do what we can to help support the computing community. Many computing researchers and practitioners are now working remotely. In addition, teaching and learning have also moved online as more and more campuses close...We believe that ACM can help support research, discovery and learning during this time of crisis by opening the ACM Digital Library to all. For the next three months, there will be no fees assessed for accessing or downloading work published by ACM. ...
 
 

A Message from this week's Sponsor:

 

 
Are You Asking Your Data the Right Questions?

Before you can get answers from your data, you need to know which questions to ask. At CMU’s Tepper School of Business, we help you build your analytical expertise and business acumen so that you can take your insights to the next level.

Download a Program Brochure
 

 

Data Science Articles & Videos

 
  • When to assume neural networks can solve a problem
    The question: “What are the problems we should assume can be solved with machine learning?”, or even narrower and more focused on current developments “What are the problems we should assume a neural network is able to solve?”, is one I haven’t seen addressed much...so when someone asks me this question about a specific problem, I can often give a fairly reasonable confidence answer provided I can take a look at the data...Thus, I thought it might be helpful to lay down the heuristic that generates such answers. I by no means claim these are precise or evidence-based in the scientific sense, but I think they might be helpful, maybe even a good starting point for further discussion on the subject.
  • Machine translation of cortical activity to text with an encoder–decoder framework
    A decade after speech was first decoded from human brain signals, accuracy and speed remain far below that of natural speech. Here we show how to decode the electrocorticogram with high accuracy and at natural-speech rates. Taking a cue from recent advances in machine translation, we train a recurrent neural network to encode each sentence-length sequence of neural activity into an abstract representation, and then to decode this representation, word by word, into an English sentence....
  • My First Year as a Data Scientist
    I was very fortunate to be given the opportunity by my employer to change role from a full-stack software developer to data scientist with the in-house Data Team. I've now spent over a year in this new data science role and thoroughly enjoy the different challenges it brings over software development. I wanted to write this blog post firstly to help me reflect on what I've been doing over the past year: what changes I had to undergo, what I did well, what I could improve on. And secondly, to help others in a similar situation to me, either thinking about transitioning or already transitioning, by writing about some of the pitfalls and challenges I've had....
  • Modeling 3D Shapes by Reinforcement Learning [Video & Paper]
    We explore how to enable machines to model 3D shapes like human modelers using reinforcement learning (RL). In 3D modeling software like Maya, a modeler usually creates a mesh model in two steps: (1) approximating the shape using a set of primitives; (2) editing the meshes of the primitives to create detailed geometry. Inspired by such artist-based modeling, we propose a two-step neural framework based on RL to learn 3D modeling policies....
  • Build an app to generate photorealistic faces using TensorFlow and Streamlit
    This overview will walk you through creating a Streamlit app [Streamlit - a Python framework that makes writing apps as easy as writing Python scripts] to play with one of the hairiest and black-box-iest models out there: a deep Generative Adversarial Network (GAN). In this case, we’ll visualize Nvidia’s PG-GAN [1] using TensorFlow to synthesize photorealistic human faces from thin air....
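
The encoder-decoder idea described in the cortical-activity item above (encode a sequence of signal frames into one abstract representation, then decode it word by word) can be sketched in a few lines. This is a toy illustration only, not the paper's model: the weights are random and untrained, and the vocabulary, layer sizes, and all function names are hypothetical choices made for this sketch.

```python
import math
import random


def rnn_step(W_in, W_h, x, h):
    """One vanilla RNN update: h' = tanh(W_in @ x + W_h @ h)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W_in[j], x))
            + sum(w * hi for w, hi in zip(W_h[j], h))
        )
        for j in range(len(h))
    ]


def rand_matrix(rng, rows, cols):
    """Random weights; a real model would learn these from data."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def encode(signal, W_in, W_h, hidden):
    """Fold a whole sequence of frames into one fixed-size state vector."""
    h = [0.0] * hidden
    for frame in signal:
        h = rnn_step(W_in, W_h, frame, h)
    return h


def decode(h, W_emb, W_h, W_out, vocab, max_len=10):
    """Greedy decoding: emit one word per step until <eos> or max_len."""
    words = []
    prev = [0.0] * len(vocab)  # all zeros stands for "no previous word"
    for _ in range(max_len):
        h = rnn_step(W_emb, W_h, prev, h)
        scores = [sum(w * hi for w, hi in zip(row, h)) for row in W_out]
        idx = max(range(len(vocab)), key=scores.__getitem__)
        if vocab[idx] == "<eos>":
            break
        words.append(vocab[idx])
        prev = [1.0 if i == idx else 0.0 for i in range(len(vocab))]
    return words


# Hypothetical sizes and vocabulary, chosen only for the demo.
VOCAB = ["<eos>", "the", "signal", "decodes"]
CHANNELS, HIDDEN = 3, 4
rng = random.Random(0)
enc_in = rand_matrix(rng, HIDDEN, CHANNELS)
enc_h = rand_matrix(rng, HIDDEN, HIDDEN)
dec_emb = rand_matrix(rng, HIDDEN, len(VOCAB))
dec_h = rand_matrix(rng, HIDDEN, HIDDEN)
dec_out = rand_matrix(rng, len(VOCAB), HIDDEN)

# Five frames of a made-up three-channel recording.
signal = [[rng.uniform(-1, 1) for _ in range(CHANNELS)] for _ in range(5)]
summary = encode(signal, enc_in, enc_h, HIDDEN)
print(decode(summary, dec_emb, dec_h, dec_out, VOCAB))
```

Because the weights are untrained, the printed words are arbitrary; the point is only the data flow, in which the decoder sees nothing of the input except the encoder's final state.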
 
 

Webinar*

 

 
Webinar: Why Understanding CVEs Is Critical for Data Scientists

CVEs are Common Vulnerabilities and Exposures found in software components. Because modern software is complex, vulnerabilities tend to emerge over time. A golden rule in security is that wherever valuable data can be found, hackers will go.

Ignoring a high CVE score can result in security breaches and unstable applications. Read the blog “Why Understanding CVEs Is Critical for Data Scientists” and register for our upcoming webinar “What are CVEs and Why Do They Matter?” to learn what you should look out for when using open-source software for your data science projects.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
 

 

Jobs

 
  • Head of Data Science - Tessian - London, United Kingdom

    Our mission is to secure the Human Layer. This involves deploying near real-time machine learning models at massive scale to some of the world’s largest organisations to keep their most sensitive data private and secure. To do this, we're looking for an inspiring Head of Data Science ready to lead and grow our Data Science team, who is excited about the opportunities and challenges that come with building and deploying real-time production models.

    Find out more about life as a Tessian Engineer...

        Want to post a job here? Email us for details >> team@datascienceweekly.org
 

 

Training & Resources

 
  • Fast and Reproducible Deep Learning [Video / Slides from Chicago ML Meetup]
    There are endless resources for someone who wants to learn to train a deep learning model, but running a successful deep learning project requires managing many additional moving parts that are much less discussed...This talk provides best practices for accomplishing these tasks efficiently and reproducibly. Tools that are covered include the Creevey library for processing large collections of files; pip-tools and nvidia-docker for managing dependencies; and MLflow Tracking for tracking experiments....
  • Getting started with TinyML [Video]
    If you're interested in running machine learning on embedded devices but aren't sure how to get started, Pete Warden from Google's TensorFlow Micro team will run through how to build and run your own TinyML applications. This will include an overview of the different boards that are available, the software frameworks, and tutorials to get you up and running....

 

Books

 

  • Data Science in Production: Building Scalable Model Pipelines with Python

    This book provides a hands-on approach to scaling up Python code to work in distributed environments in order to build robust pipelines. Readers will learn how to set up machine learning models as web endpoints, serverless functions, and streaming pipelines using multiple cloud environments. It is intended for analytics practitioners with hands-on experience with Python libraries such as Pandas and scikit-learn, and will focus on scaling up prototype models to production....

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.
     


    P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them on board :) All the best, Hannah & Sebastian
 