Data Science Weekly Newsletter - Issue 282

Apr 18 2019

Editor Picks
  • What Machine Learning needs from Hardware
    The single most important message I want to get across is that there are a lot of new applications that are blocked from launching because we don’t have enough computing power. Many other existing products would be improved if we could run models that require more computing power than is available. It doesn’t matter whether you’re on the cloud, a mobile device, or even embedded, teams have models they’d like to run that they can’t...
  • Why software projects take longer than you think – a statistical model
    Anyone who has built software for a while knows that estimating how long something is going to take is hard. It’s hard to come up with an unbiased estimate of how long something will take when, fundamentally, the work itself is about solving something new. One pet theory I’ve had for a really long time is that some of this is really just a statistical artifact...
  • One Model to Rule Them All
    The machine learning community focuses too much on predictive performance. But machine learning models are always a small part of a complex system. This post discusses our obsession with finding the best model and emphasizes what we should do instead: Take a step back and see the bigger picture in which the machine learning model is embedded...
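The statistical-artifact argument in the estimation piece above lends itself to a quick simulation. Here is a minimal sketch, assuming (as the post argues) that each task's blowup factor, actual time divided by estimated time, is log-normally distributed; the sigma value and variable names are illustrative, not taken from the article:

```python
import random
import statistics

random.seed(0)

# Assume each task's blowup factor (actual / estimated time) is
# log-normal: log(blowup) ~ Normal(0, sigma).
sigma = 1.0
blowups = [random.lognormvariate(0, sigma) for _ in range(100_000)]

# The median stays near 1 (the typical task is roughly on time), but the
# mean is pulled up by the fat tail (theoretically exp(sigma^2 / 2) ≈ 1.65),
# so a project summed over many tasks runs long even with unbiased medians.
print(f"median blowup ≈ {statistics.median(blowups):.2f}")
print(f"mean blowup   ≈ {statistics.mean(blowups):.2f}")
```

The gap between median and mean is the whole effect: individual estimates can be "right" in the median sense while the total still overruns.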

A Message from this week's Sponsor:


Find A Data Science Job Through Vettery

Vettery specializes in tech roles and is completely free for job seekers. Interested? Submit your profile, and if accepted onto the platform, you can receive interview requests directly from top companies growing their data science teams.

Get started.

Data Science Articles & Videos

  • Solving Problems With Data Science & Uber
    In today's episode we have the grandfather of data science at Uber, Bradley Voytek, to discuss his work with one of the most famous and successful startups out there and his current work at UCSD...
  • New Google Brain Optimizer Reduces BERT Pre-Training Time
    Google Brain researchers have proposed LAMB (Layer-wise Adaptive Moments optimizer for Batch training), a new optimizer which reduces training time for its NLP training model BERT (Bidirectional Encoder Representations from Transformers) from three days to just 76 minutes....
  • The Unreasonable Effectiveness of Mathematics in the Natural Sciences
    The first point is that the enormous usefulness of mathematics in the natural sciences is something bordering on the mysterious and that there is no rational explanation for it. Second, it is just this uncanny usefulness of mathematical concepts that raises the question of the uniqueness of our physical theories...
  • An Environment for Machine Learning of Higher-Order Theorem Proving
    We present an environment, benchmark, and deep learning driven automated theorem prover for higher-order logic. Higher-order interactive theorem provers enable the formalization of arbitrary mathematical theories and thereby present an interesting, open-ended challenge for deep learning. We provide an open-source framework based on the HOL Light theorem prover that can be used as a reinforcement learning environment....
  • End-to-End Robotic Reinforcement Learning without Reward Engineering
    We propose a method for end-to-end learning of robotic skills in the real world using deep reinforcement learning. We learn these policies directly on pixel observations, and we do so without any hand-engineered or task-specific reward functions, and instead learn the rewards for such tasks from a small number of user-provided goal examples (around 80), followed by a modest number of active queries (around 25-75)...
  • Data as Prior/Innate knowledge for Deep Learning models
    Despite the general agreement that innate structure and learned knowledge need to be combined, there is no simple approach to incorporate this innate structure into learning systems. As a scientific community, we are just starting to research approaches to incorporate prior into deep learning models...
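For readers curious what "layer-wise adaptive" means in the LAMB item above, here is a rough sketch of the core idea: an Adam-style update rescaled per layer by a trust ratio. This is a simplification (single tensor, no bias correction, made-up hyperparameters), not the authors' exact algorithm:

```python
import numpy as np

def lamb_step(w, m, v, grad, lr=1e-3, b1=0.9, b2=0.999, eps=1e-6, wd=0.01):
    """One simplified LAMB-style step for a single weight tensor (one 'layer')."""
    m = b1 * m + (1 - b1) * grad           # first moment (Adam-style)
    v = b2 * v + (1 - b2) * grad ** 2      # second moment
    update = m / (np.sqrt(v) + eps) + wd * w
    # Layer-wise trust ratio: scale the step by ||w|| / ||update||, so each
    # layer's step size is proportional to its own weight norm. This is what
    # lets LAMB stay stable at very large batch sizes.
    trust = np.linalg.norm(w) / (np.linalg.norm(update) + eps)
    w = w - lr * trust * update
    return w, m, v

# Illustrative usage on a toy 10-weight "layer":
w = np.ones(10)
m = np.zeros(10)
v = np.zeros(10)
w2, m2, v2 = lamb_step(w, m, v, grad=np.full(10, 0.5))
```

In the full algorithm this ratio is computed separately for every layer of the network, which is the "layer-wise" part of the name.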



The premier machine learning conference series is back!

Predictive Analytics World (PAW) brings together five co-located industry-specific events in Las Vegas: PAW Business, PAW Financial, PAW Industry 4.0, PAW Healthcare and Deep Learning World, gathering the top practitioners and the leading experts in data science and machine learning. By design, this mega-conference is the place to meet the who's who and keep up on the latest techniques, making it the leading machine learning event. On stage: Google, Apple, Uber, Facebook, LinkedIn, Twitter and more...


Want to post here? Email us for details >>


Jobs
  • Data Scientist – Personalization - Spotify - NYC

    We are seeking a Data Scientist to join our Product Insights team, focusing on Personalization. We have several roles open across different levels of seniority. You will be solving complex data problems and delivering the insight that helps to define our understanding of music, audio and our listeners, shaping how we personalize at Spotify. This role works closely with a multidisciplinary team of data scientists, user researchers, data engineers, product teams and designers. Your work will impact the way the world experiences music and audio!...

        Want to post a job here? Email us for details >>


Training & Resources

  • Tutorials on getting started with PyTorch and TorchText for sentiment analysis
    This repo contains tutorials covering how to do sentiment analysis using PyTorch 1.0 and TorchText 0.3 with Python 3.7. The first two tutorials cover getting started with the de facto approach to sentiment analysis: recurrent neural networks (RNNs). The third notebook covers the FastText model, and the final one covers a convolutional neural network (CNN) model...
  • A Repository of Conversational Datasets
    Progress in Machine Learning is often driven by large datasets and consistent evaluation metrics. To this end, PolyAI is releasing a collection of conversational datasets consisting of hundreds of millions of examples, and a standardised evaluation framework for models of conversational response selection...



  • Reproducible Research with R and R Studio

    "a very practical book that teaches good practice in organizing reproducible data analysis and comes with a series of examples..."
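The PyTorch tutorials listed above begin with a plain RNN sentiment classifier. A minimal sketch of that starting point is below; the dimensions, class name, and vocabulary size are illustrative assumptions, not the tutorials' own values:

```python
import torch
import torch.nn as nn

class SentimentRNN(nn.Module):
    """Tiny RNN classifier: token ids -> embedding -> RNN -> one logit."""

    def __init__(self, vocab_size=10_000, embed_dim=100, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)  # one logit: positive vs. negative

    def forward(self, token_ids):              # token_ids: (batch, seq_len)
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        _, hidden = self.rnn(embedded)         # hidden: (1, batch, hidden_dim)
        return self.fc(hidden.squeeze(0))      # (batch, 1) logits

model = SentimentRNN()
# Four fake "reviews" of 20 random token ids each:
logits = model(torch.randint(0, 10_000, (4, 20)))
```

The later tutorials swap this core for FastText and CNN architectures while keeping the same embedding-in, logit-out shape.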

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

    P.S. Want to reach our audience / fellow readers? Consider sponsoring - grab a spot now; first come first served!

    All the best,
    Hannah & Sebastian
Sign up to receive the Data Science Weekly Newsletter every Thursday

Easy to unsubscribe. No spam — we keep your email safe and do not share it.