Data Science Weekly Newsletter

Issue 374

January 21, 2021

Editor's Picks

No Recent Automation Revolution
An automation revolution driven by a new kind of automation tech should induce changes in the total amount and rate of automation, and in which kinds of jobs get more automated. But looking at all U.S. jobs 1999-2019, we find no change whatsoever in the kinds of jobs more likely to be automated. We don’t even see a net change in overall level of automation, though language habits may be masking such changes. And having a job get more automated is not correlated at all with changes in its pay or employment...

OpenAI technology, just an HTTPS call away
Apply our API to any language task — semantic search, summarization, sentiment analysis, content generation, translation, and more — with only a few examples or by specifying your task in English...

Explore connected papers in a visual graph
Connected Papers is a unique, visual tool to help researchers and applied scientists find and explore papers relevant to their field of work...

A Message From This Week's Sponsor

Are You Asking Your Data the Right Questions?

Before you can get answers from your data, you need to know which questions to ask. At CMU’s Tepper School of Business, we help you build your analytical expertise and business acumen so that you can take your insights to the next level.
Download a Program Brochure

Data Science Articles & Videos

Container technologies at Coinbase - Why Kubernetes is not part of our stack
Container orchestration platforms are complex and amazing technologies, helping some businesses and teams solve a whole suite of problems. What’s commonly overlooked, however, is that container technologies also create a large set of challenges that must be overcome to prevent failures...

Unsupervised Translation of Programming Languages
A transcompiler, also known as a source-to-source translator, is a system that converts source code from one high-level programming language (such as C++ or Python) to another...Although neural models significantly outperform their rule-based counterparts in the context of natural language translation, their applications to transcompilation have been limited due to the scarcity of parallel data in this domain. In this paper, we propose to leverage recent approaches in unsupervised machine translation to train a fully unsupervised neural transcompiler...

What's your plan to manage technical debt in ML development? [Reddit Discussion]
I have spoken to many young people in the ML community who are unaware of what technical debt is and how it can be dealt with in a production setting. I have a feeling that there are many talented data scientists out there who are simply not taught how to use tried-and-tested software engineering practices, either at university or in online courses like Coursera and Udacity. This leads to scattered scripts, notebooks and glue-code in the exploration phase, which take weeks of effort to get into production and/or automate...

A framework for feature engineering and machine learning pipelines
This blog post presents a simple yet efficient framework to structure machine learning pipelines and aims to avoid the following pitfalls: a) Spaghetti code, b) Long debugging, and c) Asymmetries between training and inference...Conversely, it aims at: 1) Enhancing code maintainability, 2) Improving iteration speed, and 3) Making the transition to production easier...

Viewing machine learning and data science applications as sociotechnical systems
In this episode of the Data Exchange I speak with Chris Wiggins, Associate Professor at Columbia University, Chief Data Scientist at the New York Times, and co-founder of hackNY...on training the next-generation of data scientists on ethics and fairness in ML...

GPT-3 Language Model: A Technical Overview
GPT-3, The $4,600,000 Language Model...Some interesting take-aways...a) GPT-3 demonstrates that a language model trained on enough data can solve NLP tasks that it has never seen. That is, GPT-3 studies the model as a general solution for many downstream jobs without fine-tuning, b) It would take 355 years to train GPT-3 on a Tesla V100, the fastest GPU on the market, and c) It would cost ~$4,600,000 to train GPT-3 using the lowest-cost GPU cloud provider...
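
The two figures quoted above can be cross-checked with a little arithmetic. The sketch below is only a back-of-the-envelope check, not the article's own calculation: it assumes a single GPU running continuously for 365-day years, and the function name `implied_hourly_rate` is made up for illustration. It derives the hourly GPU price implied by the 355 GPU-years and the ~$4,600,000 total.

```python
# Back-of-the-envelope check of the GPT-3 figures quoted in the blurb.
# Assumptions (not from the source): 365-day years, one GPU running 24/7.
HOURS_PER_YEAR = 365 * 24


def implied_hourly_rate(total_cost_usd: float, gpu_years: float) -> float:
    """Return the GPU price per hour implied by a total cost and a GPU-year count."""
    gpu_hours = gpu_years * HOURS_PER_YEAR
    return total_cost_usd / gpu_hours


if __name__ == "__main__":
    rate = implied_hourly_rate(4_600_000, 355)
    print(f"Implied price: ${rate:.2f} per GPU-hour")  # prints $1.48
```

Under these assumptions the two numbers are consistent with a cloud price of roughly $1.50 per V100-hour.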

What is the most interesting idea in ML/DL that you think doesn't get enough attention? [Reddit Discussion]
It seems that hot areas in the field have a lot of inertia and pick up a lot of researchers, which makes complete sense. What are some underserved or underrated ideas that you think are particularly interesting?...

Exploration Strategies in Deep Reinforcement Learning
Exploitation versus exploration is a critical topic in Reinforcement Learning. We’d like the RL agent to find the best solution as fast as possible. However, in the meantime, committing to solutions too quickly without enough exploration sounds pretty bad, as it could lead to local minima or total failure. Modern RL algorithms that optimize for the best returns can achieve good exploitation quite efficiently, while exploration remains more like an open topic...I would like to discuss several common exploration strategies in Deep RL here...
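
To make the exploitation-versus-exploration trade-off concrete, here is a minimal sketch of epsilon-greedy action selection on a toy multi-armed bandit. It is a generic textbook illustration, not necessarily one of the strategies the linked post covers; the function name, the Gaussian reward model, and all parameter values are assumptions chosen for the example.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=2000, seed=0):
    """Run epsilon-greedy on a toy bandit with Gaussian rewards (hypothetical setup).

    With probability epsilon the agent explores (picks a random arm);
    otherwise it exploits (picks the arm with the highest current estimate).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                            # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])   # exploit
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return counts, estimates


if __name__ == "__main__":
    counts, estimates = epsilon_greedy_bandit([0.0, 0.5, 2.0])
    print("pulls per arm:", counts)
    print("estimated means:", [round(e, 2) for e in estimates])
```

Setting `epsilon` to 0 gives a purely greedy agent that can lock onto a poor arm early, while setting it to 1 gives uniformly random exploration that never uses what it has learned.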

The Obligatory [Slate Star Codex] GPT-3 Post
I would be failing my brand if I didn’t write something about GPT-3, but I’m not an expert and discussion is still in its early stages. Consider this a summary of some of the interesting questions I’ve heard posed elsewhere, especially comments by gwern and nostalgebraist...Both of them disagree pretty strongly on the implications of GPT-3. I don’t know enough to resolve that disagreement, so this will be a kind of incoherent post, and hopefully stimulate some more productive comments...

Survey

Take the new Developer Economics survey

In 2019, Python was used by 8.4M developers working in data science. What will change in 2020 and beyond? We want to know! Take this survey and share your views about the most important tools, platforms, and resources. You could win one of the prizes, worth $15,000 in total! Open until August 10th. Start now!
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Data Scientist - Amazon Demand Forecasting - New York

The Amazon Demand Forecasting team seeks a Data Scientist with strong analytical and communication skills to join our team. We develop sophisticated algorithms that involve learning from large amounts of data, such as prices, promotions, similar products, and a product's attributes, in order to forecast the demand of over 190 million products world-wide. These forecasts are used to automatically order more than $200 million worth of inventory weekly, establish labor plans for tens of thousands of employees, and predict the company's financial performance. The work is complex and important to Amazon. With better forecasts we drive down supply chain costs, enabling the offer of lower prices and better in-stock selection for our customers...

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

Stanford University - CS 330: Deep Multi-Task and Meta Learning

While deep learning has achieved remarkable success in supervised and reinforcement learning problems, such as image classification, speech recognition, and game playing, these models are, to a large degree, specialized for the single task they are trained for. This [lecture videos available] course will cover the setting where there are multiple tasks to be solved, and study how the structure arising from multiple tasks can be leveraged to learn more efficiently or effectively...

The Algorithms

Open Source resource for learning Data Structures & Algorithms and their implementation in any Programming Language...

Creating a Virtual Pen And Eraser with OpenCV

Wouldn’t it be cool if you could just wave a pen in the air to draw something virtually and have it actually drawn on the screen?...This whole application will fundamentally be built on Contour Detection. You can think of a contour as a closed curve of points sharing the same color or intensity, something like a blob...

Books

Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement
"A book that tries to cover multiple databases is a risky endeavor, a book that also provides hands-on work on each is even riskier but if implemented well leads to a great package. I loved the specific exercises the authors covered. A must read for all big data architects who don’t shy away from coding..."... For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.

P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
