Receive the Data Science Weekly Newsletter every Thursday

Easy to unsubscribe at any time. Your e-mail address is safe.

Data Science Weekly Newsletter
June 20, 2019

Editor's Picks

A Message From This Week's Sponsor

Python & R + SQL. Powerful & Shareable

Mode Studio combines a SQL editor, native Python & R notebooks, and visualization builder in one platform. Connect data from anywhere and analyze with your preferred language. Layer custom visualizations with HTML, CSS, and JS or use out-of-the-box charts.
Sign Up - Free Forever.

Data Science Articles & Videos

  • Language as an Abstraction for Hierarchical Deep Reinforcement Learning
    We propose to use language as the abstraction between the high-level and low-level policy, and demonstrate that the resulting policy can successfully solve long-horizon tasks with sparse reward and can generalize well using the compositionality of language even in challenging high-dimensional observation and action spaces. First, we demonstrate the benefit of our method in a low-dimensional observation space through various ablation and comparison against different HRL methods, and then scale our methods to challenging pixel observation space where the baselines cannot make progress...
  • XLNet: Generalized Autoregressive Pretraining for Language Understanding
    With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation...
  • Off-Policy Classification - A New Reinforcement Learning Model Selection Method
    We propose a new off-policy evaluation method, called off-policy classification (OPC), that evaluates the performance of agents from past data by treating evaluation as a classification problem, in which actions are labeled as either potentially leading to success or guaranteed to result in failure. Our method works for image (camera) inputs, and doesn’t require reweighting data with importance sampling or using accurate models of the target environment, two approaches commonly used in prior work. We show that OPC scales to larger tasks, including a vision-based robotic grasping task in the real world...
  • Open-sourcing PyRobot to accelerate AI robotics research
    PyRobot is a framework and ecosystem that enables AI researchers and students to get up and running with a robot in just a few hours, without specialized knowledge of the hardware or of details such as device drivers, control, and planning. PyRobot will help Facebook AI advance our long-term robotics research, which aims to develop embodied AI systems that can learn efficiently by interacting with the physical world. We are now open-sourcing PyRobot to help others in the AI and robotics community as well...


Data-driven to Model-Driven:
The Strategic Shift Being Made by Leading Organizations

Join us on July 10th at 10 a.m. PT for a webinar featuring Forrester Research Senior Analyst Kjell Carlsson and Domino Chief Data Scientist Josh Poduska, as they demystify the model-driven business.
Register now to attend live, or receive the recording following the webinar.

Want to post here? Email us for details >>


  • Data Scientist, Analytics - DoorDash - NYC
    The Analytics team is looking for Data Scientists to drive measurement, strategy, and tactical decision-making across the company. You’ll be solving problems that range from customer acquisition to balancing supply and demand to new city launches to marketplace efficiency. You’ll be designing and analyzing A/B tests for new product features, generating and sizing opportunities to prioritize new initiatives, and defining key performance metrics for the team. Data Scientists at DoorDash work cross-functionally to uncover insights and turn them into actionable recommendations...

Want to post a job here? Email us for details >>

Training & Resources

  • Transpose A Matrix In PyTorch
    Learn how to transpose a matrix in PyTorch by using the PyTorch T operation, via a screencast video and full tutorial transcript...
  • Automate executing AWS Athena queries and moving the results around S3 with Airflow: a walk-through
    If you happen to store structured data on AWS S3, chances are you already use AWS Athena. It is a hosted version of Facebook’s PrestoDB and provides a way of querying structured data (stored in, say, .csv or .json files) using standard ANSI SQL. Given its competitive pricing structure (5 USD for 1 TB of scanned data), it currently seems to be the best tool for digging through data saved in “cold storage” in S3 (as opposed to “hot storage” in, for instance, Amazon Redshift or some other analytical or transactional database)...
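
To put the quoted Athena price in perspective, here is a minimal back-of-the-envelope sketch. It uses only the figure from the blurb above (5 USD per 1 TB of scanned data). The function name, the decimal-terabyte convention, and the sample scan sizes are illustrative assumptions, and the sketch ignores any minimum charges or rounding rules the service may apply.

```python
# Rough cost estimate for a query billed by the amount of data scanned.
# The only figure taken from the article blurb is the price: 5 USD per 1 TB.
PRICE_PER_TB_USD = 5.0

# Assumption: a decimal terabyte (10**12 bytes). A binary convention (2**40)
# would give slightly lower estimates for the same byte count.
BYTES_PER_TB = 10**12


def estimate_scan_cost_usd(bytes_scanned: int) -> float:
    """Return the estimated cost in USD of scanning `bytes_scanned` bytes.

    Hypothetical helper for illustration only; it does not model minimum
    charges, rounding, or any other billing detail.
    """
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    # Illustrative scan sizes, not measurements from the article.
    examples = [
        ("10 GB", 10 * 10**9),
        ("250 GB", 250 * 10**9),
        ("1 TB", 10**12),
    ]
    for label, size in examples:
        print(f"{label}: about {estimate_scan_cost_usd(size):.2f} USD")
```

Because the price scales with bytes scanned rather than with query runtime, an estimate like this depends entirely on how much of the stored data a query has to read.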


  • Guesstimation: Solving the World's Problems on the Back of a Cocktail Napkin
    "Guesstimation enables anyone with basic math and science skills to estimate virtually anything--quickly--using plausible assumptions and elementary arithmetic"...
    For a detailed list of books covering Data Science, Machine Learning, AI, and associated programming languages, check out our resources page.

    P.S. Want to reach our audience / fellow readers? Consider sponsoring - grab a spot now; first come, first served! All the best, Hannah & Sebastian
