Data Science Weekly Newsletter - Issue 344

Issue #344

Jun 25 2020

Editor Picks
  • Who Is Responsible When Autonomous Systems Fail
    As more autonomous and artificial intelligence (AI) systems operate in our world, the need to address issues of responsibility and accountability has become clear. However, if the outcome of the Uber self-driving accident is a harbinger of what lies ahead, there is cause for concern. Is it an appropriate allocation of responsibility for Rafaela Vasquez alone — and neither Uber, the actor who developed and deployed the technology, nor the state of Arizona, which allowed the testing to be conducted in the first place — to be held accountable?...
  • A strange earthquake swarm lasted for years. Scientists finally know why.
    Early in 2016, without any fanfare, a swarm of earthquakes silently revved up in Southern California...Over the past four years there have been more than 22,000 temblors. Yet the source behind all this activity has been a mystery...Now, in one of the highest resolution looks at a seismic swarm yet, scientists have zeroed in on a likely cause...The analysis used a computer algorithm to tease out the locations and timing of the tiny temblors, creating a stunningly detailed portrait of the swarm activity as it unfolded along a spidery network of fractures. This intimate picture of the swarm’s progression suggests that the cluster of quakes was triggered by fluids being naturally injected into the fault system. The work hints that fluids may play a role in other swarms detected around the world—and the method used could prove useful for improving global seismic analysis...
  • What I learned from looking at 200 machine learning tools
    To better understand the landscape of available tools for machine learning production, I decided to look up every AI/ML tool I could find...After filtering out applications companies, tools that aren’t being actively developed, and tools that nobody uses, I got 202 tools...This post consists of 6 parts: I. Overview, II. The landscape over time, III. The landscape is under-developed, IV. Problems facing MLOps, V. Open source and open-core, VI. Conclusion...

A Message from this week's Sponsor:


Take the new Developer Economics survey

In 2019, Python was used by 8.4M developers working in data science. What will change in 2020 and beyond? We want to know! Take this survey and share your views about the most important tools, platforms, and resources. You could win one of $15,000 worth of prizes! Open until August 10th. Start now!


Data Science Articles & Videos

  • Is depth useful for self-attention? - A theoretical perspective
    This work is the first to explain a surprising empirical phenomenon: in BERT/Transformer-like architectures, deepening doesn't seem to be better than widening (increasing representation dimension), in contrast to the fundamental *deep* learning premise. This is supported by extensive ablations...
  • NLP & my Grandfather’s Letters During WWII
    Years ago my aunt had typed up 300+ letters my grandfather, Silvio Tontar, wrote to my grandmother, Annette, during WWII...This is not exactly a research question, but I suppose I wanted to know how he felt during his time overseas. I wanted to see if I could use the text data to empathize with his situation, even just a little bit. When was he happy, if at all? When was he upset? How did his language change throughout time? My goal was to use data science to answer these questions more easily, as opposed to re-reading the letters over and over again...
  • Natural Language Processing Advancements By Deep Learning: A Survey
    Natural Language Processing (NLP) helps empower intelligent machines by enhancing a better understanding of the human language for linguistic-based human-computer communication...This survey categorizes and addresses the different aspects and applications of NLP that have benefited from deep learning. It covers core NLP tasks and applications and describes how deep learning methods and models advance these areas. We further analyze and compare different approaches and state-of-the-art models...
  • Discovering Symbolic Models from Deep Learning with Inductive Biases
    We develop a general approach to distill symbolic representations of a learned deep model by introducing strong inductive biases. We focus on Graph Neural Networks (GNNs). The technique works as follows: we first encourage sparse latent representations when we train a GNN in a supervised setting, then we apply symbolic regression to components of the learned model to extract explicit physical relations. We find the correct known equations, including force laws and Hamiltonians, can be extracted from the neural network. We then apply our method to a non-trivial cosmology example, a detailed dark matter simulation, and discover a new analytic formula which can predict the concentration of dark matter from the mass distribution of nearby cosmic structures...
  • Evaluation of ML Model to Predict Outcomes of Antidepressant Treatment
    Can machine learning models predict improvement of various depressive symptoms with antidepressant treatment based on pretreatment symptom scores and electroencephalographic measures?...In this prognostic study, using the machine learning approach of gradient-boosted decision trees, the ElecTreeScore algorithm could reliably distinguish the patients who responded to treatment from those who did not based on various depressive symptoms using pretreatment symptom scores and electroencephalographic features (using the cross-validation approach on 518 patients)...
  • Democratizing Kaplan-Meier
    If you want to understand customer retention, you might ask ‘of those who have made one purchase from us, how many have made two’? Unfortunately, this has a lot of potential to be misleading because not all of your customers have the same ‘exposure’...this problem can be solved with Kaplan-Meier analysis...We’ve used Looker on PostgreSQL to allow anyone in the company to perform Kaplan-Meier analysis on our retention data to drive insights and make decisions...
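
To make the 'exposure' point concrete, here is a minimal sketch of the Kaplan-Meier estimator in plain Python. It is not the Looker/PostgreSQL setup from the linked post; the function name `kaplan_meier` and the toy data are assumptions made up for illustration. Customers who have not yet made a second purchase when the data is pulled are treated as censored: they count toward the at-risk group up to their observed duration, but not as events.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] for each time t at which at least one event occurred.

    durations: how long each customer was followed (e.g. days since first purchase)
    observed:  True if the event (e.g. a second purchase) happened at that time,
               False if the customer was censored (no event yet when data was pulled)
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)   # everyone leaves the risk set at their own duration
    at_risk = len(durations)     # customers still being followed just before time t
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the share of at-risk customers who did NOT have the event at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: five customers, two of them censored.
durations = [5, 10, 10, 20, 30]
observed = [True, True, False, True, False]
for t, s in kaplan_meier(durations, observed):
    print(f"day {t}: {s:.2f} have not yet repurchased")
```

Here S(t) is the estimated share of customers who have not yet made a second purchase by day t, so 1 - S(t) is the repurchase rate adjusted for unequal exposure.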



Quick Question For You: Do you want a Data Science job?

After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.

The course is broken down into three guides:
  1. Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore)

  2. Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate

  3. Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!
Click here to learn more ...

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs



  • Data Scientist - Amazon Demand Forecasting - New York

    The Amazon Demand Forecasting team seeks a Data Scientist with strong analytical and communication skills to join our team. We develop sophisticated algorithms that involve learning from large amounts of data, such as prices, promotions, similar products, and a product's attributes, in order to forecast the demand of over 190 million products world-wide. These forecasts are used to automatically order more than $200 million worth of inventory weekly, establish labor plans for tens of thousands of employees, and predict the company's financial performance. The work is complex and important to Amazon. With better forecasts we drive down supply chain costs, enabling the offer of lower prices and better in-stock selection for our customers...

    Want to post a job here? Email us for details.


Training & Resources

  • Logistic Regression from scratch
    In this article we'll take a deep dive into the Logistic Regression model to learn how it differs from other regression models such as Linear- or Multiple Linear Regression, how to think about it from an intuitive perspective and how we can translate our learnings into code while implementing it from scratch...
  • Structured data classification from scratch
    This example demonstrates how to do structured data classification, starting from a raw CSV file. Our data includes both numerical and categorical features. We will use Keras preprocessing layers to normalize the numerical features and vectorize the categorical ones...
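
For the logistic regression item above, the idea of implementing the model from scratch can be sketched in a few lines of plain Python. This is not the linked article's code: the function names, learning rate, epoch count, and toy data are all assumptions chosen for illustration. The model passes a weighted sum of the features through a sigmoid and is fit with batch gradient descent on the mean log loss.

```python
import math


def sigmoid(z):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    """Estimated probability that feature vector x belongs to class 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log loss."""
    n, n_features = len(xs), len(xs[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(x, w, b) - y  # derivative of log loss w.r.t. the logit
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical toy data: one feature, label flips between x = 1 and x = 2.
xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [0, 0, 1, 1]
w, b = fit_logistic(xs, ys)
print(predict_proba([0.0], w, b), predict_proba([3.0], w, b))
```

Unlike linear regression, the output is a probability, so a prediction is made by thresholding it (commonly at 0.5).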




Books

  • Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement

    "A book that tries to cover multiple databases is a risky endeavor; a book that also provides hands-on exercises for each is even riskier, but if implemented well it leads to a great package. I loved the specific exercises the authors covered. A must-read for all big data architects who don’t shy away from coding..."

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian