Data Science Weekly Newsletter

Issue 291

June 20, 2019

A Message From This Week's Sponsor

Python & R + SQL. Powerful & Shareable

Mode Studio combines a SQL editor, native Python & R notebooks, and visualization builder in one platform. Connect data from anywhere and analyze with your preferred language. Layer custom visualizations with HTML, CSS, and JS or use out-of-the-box charts.
Sign Up - Free Forever.

Data Science Articles & Videos

Adobe’s new AI tool can spot when a face has been Photoshopped
It was nearly twice as good at identifying manipulated images as humans...

Language as an Abstraction for Hierarchical Deep Reinforcement Learning
We propose to use language as the abstraction between the high-level and low-level policy, and demonstrate that the resulting policy can successfully solve long-horizon tasks with sparse reward and can generalize well using the compositionality of language even in challenging high-dimensional observation and action spaces. First, we demonstrate the benefit of our method in a low-dimensional observation space through various ablation and comparison against different HRL methods, and then scale our methods to challenging pixel observation space where the baselines cannot make progress...

Bill Gates just backed a chip startup that uses light to turbocharge AI
Luminous Computing has developed an optical microchip that runs AI models much faster than other semiconductors while using less power...

Your next doctor’s appointment might be with an AI
A new wave of chatbots are replacing physicians and providing frontline medical advice—but are they as good as the real thing?...

XLNet: Generalized Autoregressive Pretraining for Language Understanding
With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation...

Data-Free Quantization through Weight Equalization and Bias Correction
We introduce a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection. It achieves near-original model performance on common computer vision architectures and tasks...

Off-Policy Classification - A New Reinforcement Learning Model Selection Method
We propose a new off-policy evaluation method, called off-policy classification (OPC), that evaluates the performance of agents from past data by treating evaluation as a classification problem, in which actions are labeled as either potentially leading to success or guaranteed to result in failure. Our method works for image (camera) inputs, and doesn’t require reweighting data with importance sampling or using accurate models of the target environment, two approaches commonly used in prior work. We show that OPC scales to larger tasks, including a vision-based robotic grasping task in the real world...

The fourth Industrial revolution emerges from AI and the Internet of Things
IoT has arrived on the factory floor with the force of Kool-Aid Man exploding through walls...

Open-sourcing PyRobot to accelerate AI robotics research
PyRobot is a framework and ecosystem that enables AI researchers and students to get up and running with a robot in just a few hours, without specialized knowledge of the hardware or of details such as device drivers, control, and planning. PyRobot will help Facebook AI advance our long-term robotics research, which aims to develop embodied AI systems that can learn efficiently by interacting with the physical world. We are now open-sourcing PyRobot to help others in the AI and robotics community as well...

Webinar

Data-driven to Model-Driven: The Strategic Shift Being Made by Leading Organizations

Join us on July 10th at 10 a.m. PT for a webinar featuring Forrester Research Senior Analyst Kjell Carlsson and Domino Chief Data Scientist Josh Poduska, as they demystify the model-driven business.
Register now to attend live, or receive the recording following the webinar.

Want to post here? Email us for details >> team@datascienceweekly.org

Jobs

Data Scientist, Analytics - DoorDash - NYC

The Analytics team is looking for Data Scientists to drive measurement, strategy, and tactical decision-making across the company. You’ll be solving problems that range from customer acquisition to balancing supply and demand to new city launches to marketplace efficiency. You’ll be designing and analyzing A/B tests for new product features, generating and sizing opportunities to prioritize new initiatives, and defining key performance metrics for the team. Data Scientists at DoorDash work cross-functionally to uncover insights and turn them into actionable recommendations...

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

Transpose A Matrix In PyTorch
Learn how to transpose a matrix in PyTorch by using the PyTorch T operation, via a screencast video and full tutorial transcript...
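For readers who want a quick taste before watching, here is a minimal sketch of transposing a matrix in PyTorch. It assumes the "T operation" mentioned above refers to the `Tensor.t()` method for 2-D tensors; the matrix values are made up for illustration and are not taken from the tutorial.

```python
import torch

# A small 2x3 matrix with hypothetical values, chosen only for illustration.
matrix = torch.tensor([[1, 2, 3],
                       [4, 5, 6]])

# .t() transposes a 2-D tensor: rows become columns.
transposed = matrix.t()

print(matrix.shape)      # torch.Size([2, 3])
print(transposed.shape)  # torch.Size([3, 2])
```

Transposing twice gives back the original matrix, which is a handy sanity check when reshaping data for a model.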

Automate executing AWS Athena queries and moving the results around S3 with Airflow: a walk-through
If you happen to store structured data on AWS S3, chances are you already use AWS Athena. It is a hosted version of Facebook’s PrestoDB and provides a way of querying structured data (stored in, say, .csv or .json files) using standard ANSI SQL. Given its competitive pricing structure (5 USD for 1 TB of scanned data), it currently seems to be the best tool for digging through data saved in “cold storage” in S3 (as opposed to “hot storage” in, for instance, Amazon Redshift or some other analytical or transactional database)...
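To put the quoted price in perspective, here is a rough back-of-the-envelope sketch. It takes only the "5 USD for 1 TB of scanned data" figure from the blurb; the function name, the binary definition of a terabyte, and the 200 GiB example are assumptions for illustration, and any per-query minimums or rounding rules are ignored.

```python
# Assumption: 1 TB is treated as 1024**4 bytes for this estimate.
BYTES_PER_TB = 1024 ** 4


def estimate_scan_cost_usd(bytes_scanned, usd_per_tb=5.0):
    """Estimate query cost from bytes scanned at a flat per-TB price.

    usd_per_tb defaults to the 5 USD figure quoted in the blurb.
    """
    return bytes_scanned / BYTES_PER_TB * usd_per_tb


# Hypothetical example: a query that scans 200 GiB of data.
cost = estimate_scan_cost_usd(200 * 1024 ** 3)
print(f"${cost:.2f}")  # $0.98
```

Because cost scales with bytes scanned, partitioning or compressing the data in S3 would be expected to lower the bill for the same query.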

ICML 2019 Tutorial: Recent Advances in Population-Based Search for Deep Neural Networks
Quality Diversity, Indirect Encodings, and Open-Ended Algorithms. We will cover new, exciting, unconventional techniques for improving population-based search. These ideas are already enabling us to solve hard problems. They also hold great promise for further advancing machine learning, including deep neural networks...

Books

Guesstimation: Solving the World's Problems on the Back of a Cocktail Napkin
"Guesstimation enables anyone with basic math and science skills to estimate virtually anything--quickly--using plausible assumptions and elementary arithmetic"...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

P.S., Want to reach our audience / fellow readers? Consider sponsoring - grab a spot now; first come first served! All the best, Hannah & Sebastian