Receive the Data Science Weekly Newsletter every Thursday

Easy to unsubscribe at any time. Your e-mail address is safe.

Data Science Weekly Newsletter
May 21, 2020

Editor's Picks

  • Practical Skills for The AI Product Manager
    In our previous article, What You Need to Know About Product Management for AI, we discussed the need for an AI Product Manager. This role includes everything a traditional PM does, but also requires an operational understanding of machine learning software development, along with a realistic view of its capabilities and limitations...In this article, we shift our focus to the AI Product Manager’s skill set, as it is applied to day-to-day work in the design, development, and maintenance of AI products. To understand the skills that product managers need, we’ll start with the process of product development, then consider how this process differs across different kinds of organizations...
  • Best Practices for Technical Debt in Machine Learning
    I recently revisited the paper Hidden Technical Debt in Machine Learning Systems (Sculley et al. 2015)...This post covers some of the relevant points of the Tech Debt Paper, while also adding advice that isn’t 5 years out of date. Some of this advice is in the form of tools that didn’t exist back then... and some is in the form of tools/techniques that definitely did exist, which the authors missed a huge opportunity by not bringing up...
  • China’s Approach to AI Ethics
    It is...surprising that the Western misconception that China lacks debate around AI ethics seems to prevail, when in fact: i) existing Chinese AI principles largely align with global ones; ii) Chinese discussions enjoy unprecedented government support; and iii) the country is already investigating the technical and social implementation of these principles by exploring how they interact with its distinct cultural heritage...In this essay I explore AI ethics in China through the lens of healthcare, smart cities and social credit, and consider which, if any, ethical issues are unique to China and which should be seen as global concerns...

A Message From This Week's Sponsor

Data scientists are in demand on Vettery

Vettery is an online hiring marketplace that's changing the way people hire and get hired. Ready for a bold career move? Make a free profile, name your salary, and connect with hiring managers from top employers today.

Data Science Articles & Videos

  • A Visual Survey of Data Augmentation in NLP
    Unlike Computer Vision, where image data augmentation is standard practice, augmentation of text data in NLP is pretty rare. This is because trivial image operations, like rotating an image a few degrees or converting it to grayscale, don’t change its semantics. The presence of semantically invariant transformations is what made augmentation an essential tool in Computer Vision research...I was curious whether there had been attempts at developing augmentation techniques for NLP, so I explored the existing literature. In this post, I will give an overview of the current approaches being used for text data augmentation based on my findings...
  • Is the Brain a Useful Model for Artificial Intelligence?
    In the summer of 2009, the Israeli neuroscientist Henry Markram strode onto the TED stage in Oxford, England, and made an immodest proposal: Within a decade, he said, he and his colleagues would build a complete simulation of the human brain inside a supercomputer. They'd already spent years mapping the cells in the neocortex, the supposed seat of thought and perception. “It's a bit like going and cataloging a piece of the rain forest,” Markram explained. “How many trees does it have? What shapes are the trees?” Now his team would create a virtual rain forest in silicon, from which they hoped artificial intelligence would organically emerge. If all went well, he quipped, perhaps the simulated brain would give a follow-up TED talk, beamed in by hologram...
  • Grounding Language in Play - A scalable approach for controlling robots with natural language
    Natural language is perhaps the most versatile and intuitive way for humans to communicate tasks to a robot. Prior work on Learning from Play provides a simple approach for learning a wide variety of robotic behaviors from general sensors. However, each task must be specified with a goal image—something that is not practical in open-world environments. In this work we present a simple and scalable way to condition policies on human language instead...
  • Risks in Building Machine Learning Systems
    Building ML systems can be complicated and challenging...especially since best practices in the nascent field of AI engineering are still coalescing. Consequently, a surprising fraction of ML projects fail or underwhelm. Behind the hype, there are three essential risks to analyze when building an ML system: 1) poor problem-solution alignment, 2) excessive time or monetary cost, and 3) unexpected behavior once deployed. In this post I'll discuss each risk and provide a way of thinking about risk analysis in ML systems...
  • Differentiable Reasoning over Text
    We all rely on search engines to navigate the massive amount of online information published every day. Modern search engines not only retrieve a list of pages relevant to a query but often try to directly answer our questions by analyzing the content of those pages. One area they currently struggle with, however, is multi-hop Question Answering, which requires reasoning over information taken from multiple documents to arrive at the answer...
  • Improving performance and scalability of data science libraries
    In this episode of the Data Exchange [Podcast] I speak with Wes McKinney, Director of Ursa Labs and an Apache Arrow PMC Member. Wes is the creator of pandas, one of the most widely used Python libraries for data science. He is also the author of the best-selling book, “Python for Data Analysis” – a book that has become essential reading for both aspiring and experienced data scientists...
  • OpenTPOD - Create deep learning based object detectors without writing a single line of code
    OpenTPOD is an all-in-one open-source tool for nonexperts to create custom deep neural network object detectors. It is designed to lower the barrier to entry and facilitate the end-to-end authoring workflow of custom object detection using state-of-the-art deep learning methods. It provides several features via an easy-to-use web interface, including: training data management, data annotation through seamless integration, one-click training / fine-tuning of object detection deep neural networks, one-click model export for use with TensorFlow Serving, and an extensible architecture for easy addition of new deep neural network architectures...


Quick Question For You: Do you want a Data Science job?

After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.
The course is broken down into three guides:
  1. Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore)

  2. Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate

  3. Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!
Click here to learn more
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!


  • Data Scientist - Amazon Demand Forecasting - New York

    The Amazon Demand Forecasting team seeks a Data Scientist with strong analytical and communication skills to join our team. We develop sophisticated algorithms that learn from large amounts of data, such as prices, promotions, similar products, and a product's attributes, in order to forecast demand for over 190 million products worldwide. These forecasts are used to automatically order more than $200 million worth of inventory weekly, establish labor plans for tens of thousands of employees, and predict the company's financial performance. The work is complex and important to Amazon. With better forecasts we drive down supply chain costs, enabling us to offer lower prices and better in-stock selection for our customers...

        Want to post a job here? Email us for details >>

Training & Resources

  • CMU Neural Nets for NLP 2020 [24 Video Playlist]
    This class will start with a brief overview of neural networks, then spend the majority of the class demonstrating how to apply neural networks to natural language problems. Each section will introduce a particular problem or phenomenon in natural language, describe why it is difficult to model, and demonstrate several models that were designed to tackle this problem. In the process of doing so, the class will cover different techniques that are useful in creating neural network models, including handling variably sized and structured sentences, efficient handling of large data, semi-supervised and unsupervised learning, structured prediction, and multilingual modeling...


40% off at Manning

Do more with your data!
If you're looking to make your data skills stand out, then be sure to check out Manning's range of books and video courses. They're offering 40% off everything in their catalog, so there's no better time to learn something new...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.
  P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
