Data Science Weekly Newsletter

Issue 345

July 2, 2020

Editor's Picks

Reflecting on a year of making machine learning actually useful
For those of you who don’t know my story, I’ll give you the short version: I did machine learning research for two years, decided not to get a PhD at the time, and became the first machine learning engineer at Viaduct, a startup that provides an end-to-end machine learning platform for automakers. Since we essentially sell machine learning as a service, we’re directly responsible for making machine learning actually work in the real world...

The Art of AI: An interview with Kai-Fu Lee
A leading figure in the Chinese tech scene and in artificial-intelligence development globally, Kai-Fu Lee earned a PhD in computer science from Carnegie Mellon University in 1988 before serving in executive roles at Apple, SGI, Microsoft, and Google, where he was president of Google China...Here, he discusses with Project Syndicate the global AI race, the current state of the field, and what may – and should – come next...

(Re)Discovering Protein Structure and Function Through Language Modeling
Trained solely on language modeling, the Transformer's attention mechanism recovers high-level structural and functional properties of proteins...In our study, we show how a language model, trained simply to predict a masked (hidden) amino acid in a protein sequence, recovers high-level structural and functional properties of proteins. In particular, we show how the Transformer language model uses attention (1) to capture the folding structure of proteins, connecting regions that are apart in the underlying sequence but spatially close in the protein structure, and (2) targets binding sites, a key functional component of proteins...

A Message From This Week's Sponsor

Take the new Developer Economics survey

In 2019, Python was used by 8.4M developers working in data science. What will change in 2020 and beyond? We want to know! Take this survey and share your views about the most important tools, platforms, and resources. You may win one of the prizes, worth $15,000 in total! Open until August 10th. Start now!

Data Science Articles & Videos

Building AI Trading Systems
Lessons learned building a profitable algorithmic trading system using Reinforcement Learning techniques...About two years ago I wrote a little piece about applying Reinforcement Learning to the markets. A few people asked me what became of it. So this post covers some high-level things I've learned...

Apple’s AI plan: a thousand small conveniences
Sprinkled throughout Apple’s WWDC announcements about iOS, iPadOS, and macOS were a number of features and updates that have machine learning at their heart. Some weren’t announced onstage, and some features that almost certainly use AI weren’t identified as such, but here’s a quick recap of the more prominent mentions that we spotted...

Common Voice Dataset Release - Mid Year 2020
We are halfway through 2020, and already it’s been an exciting year for Common Voice! Thanks to the enthusiasm and incredible engagement from our Common Voice communities, we are releasing an updated dataset with 7,226 total hours of contributed voice data. 5,591 of these hours have been confirmed valid by our diligent contributors. Dataset fun fact: this release comprises over 5.5 million clips!...

Announcing Pylance: Fast, feature-rich language support for Python in Visual Studio Code
To deliver an improved user experience, we’ve created Pylance as a brand-new language server based on Microsoft’s Pyright static type checking tool. Pylance leverages type stubs (.pyi files) and lazy type inferencing to provide a highly-performant development experience. Pylance supercharges your Python IntelliSense experience with rich type information, helping you write better code, faster. The Pylance extension is also shipped with a collection of type stubs for popular modules to provide fast and accurate auto-completions and type checking...

Fiber: Distributed Computing for AI Made Simple
To enable future generations of large-scale computation for algorithms...we [Uber] developed Fiber, a new distributed computing library that helps users scale local computation methods to hundreds or even thousands of machines with ease. Fiber makes it fast, easy, and resource-efficient to power large-scale computation projects using Python, simplifying the ML model training process and leading to more optimal results...

Model Cards for Model Reporting
Trained machine learning models are increasingly used to perform high-impact tasks in areas such as law enforcement, medicine, education, and employment. In order to clarify the intended use cases of machine learning models and minimize their usage in contexts for which they are not well suited, we recommend that released models be accompanied by documentation detailing their performance characteristics. In this paper, we propose a framework that we call model cards, to encourage such transparent model reporting...
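
As a rough illustration of the idea, a model card can be thought of as a small structured record that travels with a released model. The sketch below is not the paper's schema: the class name, field names, and all values are hypothetical placeholders chosen only to show intended use and per-group performance sitting side by side.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ModelCard:
    """Hypothetical, minimal model card; field names are illustrative only."""
    model_name: str
    intended_use: str
    out_of_scope_uses: List[str] = field(default_factory=list)
    # metric name -> {evaluation group -> value}
    performance: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def worst_group(self, metric: str) -> Tuple[str, float]:
        """Return the evaluation group with the lowest value for a metric."""
        groups = self.performance[metric]
        name = min(groups, key=groups.get)
        return name, groups[name]

    def render(self) -> str:
        """Render the card as plain text, one fact per line."""
        lines = [f"Model: {self.model_name}",
                 f"Intended use: {self.intended_use}"]
        for use in self.out_of_scope_uses:
            lines.append(f"Out of scope: {use}")
        for metric, groups in self.performance.items():
            for group, value in groups.items():
                lines.append(f"{metric} [{group}]: {value:.2f}")
        return "\n".join(lines)


# Placeholder values, not real evaluation results.
card = ModelCard(
    model_name="example-classifier",
    intended_use="Research demos on public benchmark data",
    out_of_scope_uses=["High-stakes decisions about individuals"],
    performance={"accuracy": {"group_a": 0.93, "group_b": 0.81}},
)
print(card.render())
```

Reporting the metric per group, rather than as a single aggregate, is what lets a reader see where the model is not well suited.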

Smooth Adversarial Training
It is commonly believed that networks cannot be both accurate and robust, that gaining robustness means losing accuracy. It is also generally believed that, unless making networks larger, network architectural elements would otherwise matter little in improving adversarial robustness. Here we present evidence to challenge these common beliefs by a careful study about adversarial training. Our key observation is that the widely-used ReLU activation function significantly weakens adversarial training due to its non-smooth nature. Hence we propose smooth adversarial training (SAT), in which we replace ReLU with its smooth approximations to strengthen adversarial training...
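
To make "smooth approximation of ReLU" concrete, here is a minimal sketch using softplus, one standard smooth stand-in for ReLU. It is an assumption for illustration, not necessarily the activation the authors settle on; the function names and the `beta` sharpness parameter are ours. The point it shows is that ReLU's gradient jumps at zero while the softplus gradient changes continuously.

```python
import math


def relu(x: float) -> float:
    return max(0.0, x)


def relu_grad(x: float) -> float:
    # Gradient jumps from 0 to 1 at x = 0 (non-smooth).
    return 1.0 if x > 0 else 0.0


def softplus(x: float, beta: float = 1.0) -> float:
    """log(1 + exp(beta * x)) / beta, computed in a numerically stable form."""
    z = beta * x
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta


def softplus_grad(x: float, beta: float = 1.0) -> float:
    """Derivative of softplus: the logistic sigmoid of beta * x."""
    z = beta * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


for x in (-1e-9, 1e-9):
    print(x, relu_grad(x), softplus_grad(x))
```

Larger `beta` values bring softplus closer to ReLU while keeping the gradient continuous.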

The machine learning community has a toxicity problem
[Reddit Discussion - 556 comments]
The peer-review process is broken...there is a reproducibility crisis...there is a worshiping problem...the way Yann LeCun talked about biases and fairness topics was insensitive...machine learning, and computer science in general, have a huge diversity problem...morals and ethics are set arbitrarily...cut-throat publish-or-perish mentality...discussions have become disrespectful...

Auxiliary Tuning and its Application to Conditional Text Generation
We designed a simple and efficient method, called Auxiliary Tuning, for adapting a pre-trained Language Model (LM) to a novel task, and demonstrated the approach on the task of conditional text generation. Our approach supplements the original pre-trained model with an auxiliary model that shifts the output distribution according to the target task...
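
One plausible reading of "shifts the output distribution" is that the auxiliary model's scores are added to the frozen model's next-token logits before the softmax. The sketch below assumes exactly that; the function names, the three-word vocabulary, and the logit values are hypothetical and only illustrate the mechanics.

```python
import math


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shifted_distribution(base_logits, aux_logits):
    """Add auxiliary logits to the frozen base model's logits, then normalize."""
    combined = [b + a for b, a in zip(base_logits, aux_logits)]
    return softmax(combined)


vocab = ["good", "bad", "okay"]   # toy vocabulary (assumption)
base_logits = [1.0, 1.0, 1.0]     # frozen pre-trained LM: no preference
aux_logits = [2.0, 0.0, 0.0]      # auxiliary model favors "good" for the task

probs = shifted_distribution(base_logits, aux_logits)
for token, p in zip(vocab, probs):
    print(f"{token}: {p:.3f}")
```

With all-zero auxiliary logits the combined distribution is just the base model's, so under this reading the pre-trained model's behavior is the starting point and only the auxiliary model needs to learn.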

Training

Quick Question For You: Do you want a Data Science job?

After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.
The course is broken down into three guides:

Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore)

Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate

Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!

Click here to learn more
...
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Data Scientist - DrivenData - Remote

DrivenData brings the transformative power of data science to organizations tackling the world’s biggest challenges. We run online machine learning challenges with social impact, and we work directly with mission-driven organizations to drive change through data science and engineering.
We are looking for a skilled data scientist who is interested in using their job to take on tough social challenges, grow their skillset, and build real-world applications. As a core member of a small team, your role will include writing code (Python) every day, brainstorming approaches to data science problems, working closely with other data scientists and engineers on the team, and taking an open and constructive mindset to getting things done across multiple projects...

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

DeepMind AI reading list - AtHomeWithAI
Curated Resource List - A list of educational resources curated by DeepMind Scientists and Engineers for students interested in learning more about artificial intelligence, machine learning and other related topics...

The Machine Learning Summer School
The Machine Learning Summer School [28 June - 10 July 2020] by the Max Planck Institute for Intelligent Systems, Tübingen, Germany...Lectures are livestreamed to YouTube and will remain accessible to the general public. Please click on the link labelled "Video" to access the videos on YouTube. If you have questions for the speakers, please submit them on Reddit. See the full program in full screen here...

The Analytics Setup Guidebook

Restructure your knowledge of the complex data analytics landscape, and learn how to build scalable analytics & BI stacks in the modern cloud era...Get the PDF here [PDF]

Books

Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement
"A book that tries to cover multiple databases is a risky endeavor, a book that also provides hands-on on each is even riskier but if implemented well leads to a great package. I loved the specific exercises the authors covered. A must read for all big data architects who don’t shy away from coding..."... For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
