Data Science Weekly Newsletter

Issue 353

August 27, 2020

Editor's Picks

How to hire smarter than the market: a toy model
If you ask the average recruiter how they find people, it’s usually some type of Boolean search on LinkedIn, and when you ask them how they grade resumes, it’s typically some combination of having CS degrees from fancy schools, having the exact experience with the tech stack you have (down to frameworks), etc. God forbid someone has a gap on their resume, or needs visa sponsorship...What my model implies is that there’s an “arbitrage opportunity” here. In fact, it’s a bit of a silver lining to the fact that the market is biased...

The Uncanny Valley of ML [Video]
June Andrews, Stitch Fix: Every so often, the conundrum of the Uncanny Valley re-emerges as advanced technologies evolve from clearly experimental products to refined, accepted technologies...When machine learning is added to human decision systems, a similar effect can be measured in increased response time and decreased accuracy. These systems include radiology, judicial assignments, bus schedules, housing prices, power grids and a growing variety of applications. Unfortunately, the Uncanny Valley of ML can be hard to detect in these systems and can lead to degraded system performance when ML is introduced, at great expense. Here, we’ll introduce key design principles for introducing ML into human decision systems to navigate around the Uncanny Valley and avoid its pitfalls...

Why TinyML is a giant opportunity
TinyML broadly encapsulates the field of machine learning technologies capable of performing on-device analytics of sensor data at extremely low power. Between hardware advancements and the TinyML community’s recent innovations in machine learning, it is now possible to run increasingly complex deep learning models directly on microcontrollers...In other words, those 250 billion microcontrollers in our printers, TVs, cars, and pacemakers can now perform tasks that previously only our computers and smartphones could handle...

A Message From This Week's Sponsor

Watch Now: Hands-On Tutorial for Generative Adversarial Networks (GANs)

Are you interested in generating data to improve model accuracy? Are you concerned about training instability or failure to converge issues? Watch this recorded webinar about GANs, presented by the Principal Data Scientist for EMEA at Domino Data Lab.
This webinar covers the GAN framework, how to implement a basic GAN model, how adversarial networks are used to generate training samples, training difficulties, and recent research to improve upon GANs' training, including Wasserstein GAN (WGAN).
Watch Now.

Data Science Articles & Videos

Are your coding skills good enough for a Data Science job?
Coding in the data science industry is very different from software development. Your coding skills are not simply restricted to your technical know-how but require a good amount of data and business understanding. Today I am going to talk about “Consistency” and how to bring it into your coding practices...

An algorithm that learns through rewards may show how our brain does too
By optimizing reinforcement-learning algorithms, DeepMind uncovered new details about how dopamine helps the brain learn...

Autonomous Dog Training with Companion
In this post, I will share how we developed systems to understand dog behavior and influence it through training, and our journey to miniaturizing our computing to fit in a B2B dog training product...In the past two years, our small dedicated team of animal behaviorists and engineers has been working on a new device capable of engaging dogs and working on training autonomously, when people can't be with them. Our first product, the CompanionPro, interacts with dogs through the use of lights, sounds, and a treat dispenser to work on basic obedience behaviors like sit, down, stay, and recall...

VizSeq: A visual analysis toolkit for accelerating text generation research
A Python toolkit that simplifies visual analysis on a wide variety of text generation tasks. The outputs of models for tasks such as machine translation, image captioning, and speech recognition are blocks of text, which aren’t easy to inspect with the naked eye. Existing automatic evaluation tools generally rely on task-specific metrics, such as BLEU (bilingual evaluation understudy, a common machine translation metric). But these abstract numbers don’t always match up with a human assessment. VizSeq provides a unified, scalable solution. With a user-friendly interface and the latest NLP advancements, VizSeq improves productivity via visualization in Jupyter Notebook and a web app. Moreover, it provides a collection of multiprocess scorers for fast evaluation on large data sets...

COTA: Improving Uber Customer Care with NLP & Machine Learning
Uber’s Customer Obsession team leverages five different customer-agent communication channels powered by an in-house platform that integrates customer support ticket context for easy issue resolution. With hundreds of thousands of tickets surfacing daily on the platform across 400+ cities worldwide, this team must ensure that agents are empowered to resolve them as accurately and quickly as possible...Enter COTA, our Customer Obsession Ticket Assistant, a tool that uses machine learning and natural language processing (NLP) techniques to help agents deliver better customer support...In this article, we discuss our motivations behind creating COTA, outline its backend architecture, and showcase how the powerful tool has led to increased customer satisfaction...

3D People Dataset
First dataset of dressed humans with specific geometry representation for the clothes. It contains ~2 Million images with 40 male/40 female performing 70 actions. Every subject-action sequence is captured from 4 camera views and annotated with: RGB, 3D skeleton, body part and cloth segmentation masks, depth map, optical flow, and camera parameters....

For Fluid Equations, a Steady Flow of Progress
A startling experimental discovery about how fluids behave started a wave of important mathematical proofs....The discoveries center on the Euler equations, posed by Leonhard Euler in 1757. Mathematicians and physicists have used them to model how fluids evolve over time. If you toss a rock into a still pond, how will the water be moving five seconds later? The Euler equations can tell you...

DDSP: Differentiable Digital Signal Processing [TensorFlow]
Today, we’re pleased to introduce the Differentiable Digital Signal Processing (DDSP) library. DDSP lets you combine the interpretable structure of classical DSP elements (such as filters, oscillators, reverberation, etc.) with the expressivity of deep learning...

AlphaFold: Using AI for scientific discovery
In our study published today in Nature, we [DeepMind] demonstrate how artificial intelligence research can drive and accelerate new scientific discoveries. We’ve built a dedicated, interdisciplinary team in hopes of using AI to push basic research forward: bringing together experts from the fields of structural biology, physics, and machine learning to apply cutting-edge techniques to predict the 3D structure of a protein based solely on its genetic sequence...

Training

Launch your new career in data science today!

The Data Science Career Track is a 6-month, self-paced online course that will pair you with your own industry expert mentor as you learn skills like data wrangling and data storytelling, and build your unique portfolio to stand out in the job market. Land your dream job as a data scientist within six months of graduating or the course is free.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Senior Data Scientist - TRANZACT - NJ or Raleigh, NC

Tranzact is a fast-paced, entrepreneurial company offering a well-rounded suite of marketing solutions to help insurance companies stay ahead of the competition. The Senior Data Scientist will be solving the toughest problems at Tranzact by using data. More specifically, the role is responsible for gathering data, conducting analysis, building predictive algorithms and communicating findings to drive profitable growth and performance across Tranzact. Must have a strong grasp of the data structure, business needs, and statistical and predictive modeling. Minimum 7 years of experience building predictive algorithms...

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

Introduction to R
This repository contains the documentation, background material, scripts, and data for an introductory tutorial on R. This introductory workshop is geared towards people with an interest or background in geography and environmental science...This tutorial requires two things to be installed: the language R and the IDE RStudio...

Optimizing sample sizes in A/B testing, Part I: General summary
If you’re a data scientist, you’ve surely encountered the question, “How big should this A/B test be?”...The standard answer is to do a power analysis...In most business decisions, you want to choose a policy that maximizes your benefits minus your costs...In this three-part blog post, I’ll present a new way of determining optimal sample sizes that completely abandons the notion of statistical significance...
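For readers who want to see what the "standard answer" looks like in practice, here is a minimal sketch of a conventional power analysis for comparing two conversion rates. It is not the new method the blog post proposes. The function name, the example rates, and the defaults (5% significance, 80% power) are assumptions chosen for illustration, and the formula is the usual normal approximation for a two-sided two-proportion test.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate users needed in each arm of an A/B test.

    Uses the normal approximation for a two-sided two-proportion z-test.
    The name and default values are illustrative assumptions.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = standard_normal.inv_cdf(power)           # quantile for the desired power
    # Sum of the Bernoulli variances of the two arms
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control                  # minimum detectable difference
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


if __name__ == "__main__":
    # Hypothetical scenario: 10% baseline conversion, hoping to detect 12%
    print(sample_size_per_arm(0.10, 0.12))
```

Note that the required sample size grows quickly as the detectable difference shrinks, which is part of why the question of test size comes up so often.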

The Case for Bayesian Deep Learning
I posted a response to recent misunderstandings around Bayesian deep learning. I have since been urged to collect and develop my remarks into an accessible and self-contained reference. For this purpose, I have written the note posted here. I hope this exposition will be helpful to those seeking to understand what makes Bayesian inference distinctive, and why Bayesian inference is worthwhile in deep learning. This note is also intended to help clarify the poorly understood connections between approximate Bayesian inference and deep ensembles, in light of the recent misunderstanding that deep ensembles and Bayesian methods are competing approaches...

Books

Data Science in Production: Building Scalable Model Pipelines with Python
This book provides a hands-on approach to scaling up Python code to work in distributed environments in order to build robust pipelines. Readers will learn how to set up machine learning models as web endpoints, serverless functions, and streaming pipelines using multiple cloud environments. It is intended for analytics practitioners with hands-on experience with Python libraries such as Pandas and scikit-learn, and will focus on scaling up prototype models to production....
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
