Data Science Weekly Newsletter

Issue 312

November 14, 2019

Editor's Picks

Computers Evolve a New Path Toward Human Intelligence
Neural networks that borrow strategies from biology are making profound leaps in their abilities. Is ignoring a goal the best way to make truly intelligent machines?...

History’s message about regulating AI
As we consider artificial intelligence, we would be wise to remember the lessons of earlier technology revolutions—to focus on the technology’s effects rather than chase broad-based fears about the technology itself...

Highlights from the 2019 Google AI Residency Program
The program’s latest installment was our most successful yet, as residents advanced progress in a broad range of research fields, such as machine perception, algorithms and optimization, language understanding, healthcare and many more. Below are a handful of innovative projects from some of this year’s alumni...

A Message From This Week's Sponsor

Introducing Helix: the first dynamic data engine for data science teams

Helix is the first instant responsive data engine that creates a dual backbone of modern business intelligence and interactive data science. Now, self-serve dashboards can be built with a single query. Every report or ad hoc exploration can be visually explored and extended by stakeholders...

Data Science Articles & Videos

AI and Compute
We’re releasing an analysis showing that since 2012, the amount of compute used in the largest AI training runs has been increasing exponentially with a 3.4-month doubling time (by comparison, Moore’s Law had a 2-year doubling period)...
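To put the two doubling times side by side, here is a rough back-of-the-envelope sketch in Python. Only the 3.4-month and 2-year doubling times come from the blurb above; the function name `growth_factor` and the one-year horizon are illustrative assumptions, not part of the analysis itself.

```python
def growth_factor(months_elapsed: float, doubling_months: float) -> float:
    """Multiplicative growth after `months_elapsed`, assuming a fixed doubling time."""
    return 2.0 ** (months_elapsed / doubling_months)


AI_DOUBLING_MONTHS = 3.4      # doubling time quoted for the largest AI training runs
MOORE_DOUBLING_MONTHS = 24.0  # the 2-year Moore's Law doubling period quoted above

if __name__ == "__main__":
    months = 12  # illustrative horizon (an assumption, not from the analysis)
    ai = growth_factor(months, AI_DOUBLING_MONTHS)
    moore = growth_factor(months, MOORE_DOUBLING_MONTHS)
    print(f"Growth over {months} months at a 3.4-month doubling time: {ai:.1f}x")
    print(f"Growth over {months} months at a 2-year doubling time: {moore:.2f}x")
```

Under these assumptions, a 3.4-month doubling time compounds to roughly an order of magnitude per year, while a 2-year doubling time gives about 1.4x over the same period.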

Predicting Airbnb prices with machine learning and location data
A case study using data from the City of Edinburgh, Scotland...

Key challenges for delivering clinical impact with artificial intelligence
Artificial intelligence (AI) research in healthcare is accelerating rapidly, with potential applications being demonstrated across various domains of medicine. However, there are currently limited examples of such techniques being successfully deployed into clinical practice. This article explores the main challenges and limitations of AI in healthcare, and considers the steps required to translate these potentially transformative technologies from research to clinical practice...

UR-FUNNY: A Multimodal Language Dataset for Understanding Humor
They used TED Talk transcripts with laughter cues to create a humor dataset that can be used for humor detection and other humor analyses...

The Measure of Intelligence
I've just released a fairly lengthy paper on defining & measuring intelligence, as well as a new AI evaluation dataset, the "Abstraction and Reasoning Corpus". I've been working on this for the past 2 years, on & off...

Releasing Spleeter: Deezer Research source separation engine
Spleeter is an open-source project from Deezer (the French Spotify) that uses Deep Learning to do source separation on musical tracks. Built with Keras and TensorFlow. It runs out-of-the-box on CPU!...

The AI hiring industry is under scrutiny—but it’ll be hard to fix
An artificial-intelligence tool has already been used on over a million applicants. But critics worry that these types of algorithm are trained on limited data and so will be more likely to mark “traditional” applicants (white, male) as more employable...

Fast Transformer Decoding: One Write-Head is All You Need
Multi-head attention layers, as used in the Transformer neural sequence model, are a powerful alternative to RNNs for moving information across and between sequences. While training these layers is generally fast and simple, due to parallelizability across the length of the sequence, incremental inference (where such parallelization is impossible) is often slow, due to the memory-bandwidth cost of repeatedly loading the large "keys" and "values" tensors. We propose a variant called multi-query attention, where the keys and values are shared across all of the different attention "heads", greatly reducing the size of these tensors and hence the memory bandwidth requirements of incremental decoding...
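As a rough illustration of why sharing keys and values shrinks those tensors, the sketch below counts the elements held in the cached "keys" and "values" for one attention layer. The tensor layout, the function name `kv_cache_elements`, and the example sizes are assumptions chosen for illustration; none of them are taken from the paper.

```python
def kv_cache_elements(batch: int, seq_len: int, head_dim: int, kv_heads: int) -> int:
    """Element count of the cached keys plus values for one attention layer.

    Assumes each of the two tensors has shape [batch, kv_heads, seq_len, head_dim].
    """
    return 2 * batch * kv_heads * seq_len * head_dim


# Hypothetical sizes, for illustration only.
NUM_HEADS = 8
BATCH, SEQ_LEN, HEAD_DIM = 1, 1024, 64

# Multi-head attention: every head keeps its own keys and values.
multi_head = kv_cache_elements(BATCH, SEQ_LEN, HEAD_DIM, kv_heads=NUM_HEADS)

# Multi-query attention: one set of keys and values shared by all heads.
multi_query = kv_cache_elements(BATCH, SEQ_LEN, HEAD_DIM, kv_heads=1)

if __name__ == "__main__":
    print(f"multi-head cache elements:  {multi_head}")
    print(f"multi-query cache elements: {multi_query}")
    print(f"reduction factor:           {multi_head // multi_query}x")
```

Under this layout, the cache shrinks by a factor equal to the number of heads, which is the quantity that has to be reloaded at every incremental decoding step.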

How should a Data Scientist's resume differ from an Academic CV?
Your academic CV is very coursework and research focused. You've heard business resumes need to be more action and results oriented, but you're not sure what that means for you. You're looking for advice on how to re-work your academic CV and not finding much advice out there. To help get you started, here are some thoughts on what you'll need to do...

Training

Create D3 Data Visualizations As Fast As You Can Sketch

You need to create a D3.js data visualization to communicate your insights. But... #d3BrokeAndMadeArt! This time, your data join appears to have broken and the JavaScript console shows an error you don't recognize. Last time, you got stuck trying to figure out how to make axes that didn't look like a 3rd grader made them. It makes you want to strangle D3 with your bare hands. Just how steep does the D3 learning curve need to be?!
What if you could learn and master D3 quickly and deeply?
Great news! - You can ... Check out DashingD3js.com Screencasts today!

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Data Scientist - Driven Brands - Charlotte, NC

The Data Scientist for Driven Brands will be responsible for providing reliable marketing, media and promotional performance analysis and reporting to Senior Executives and Business Unit Management, to be used in making decisions that impact the performance of the business...

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

XLM-R: State-of-the-art cross-lingual understanding through self-supervision
XLM-R is a new model that uses self-supervised training techniques to achieve state-of-the-art performance in cross-lingual understanding, a task in which a model is trained in one language and then used with other languages without additional training data. Our model improves upon previous multilingual approaches by incorporating more training data and languages, including so-called low-resource languages, which lack extensive labeled and unlabeled data sets...

All you need to know about text preprocessing for NLP and Machine Learning
I thought of shedding some light around what text preprocessing really is, the different methods of text preprocessing, and a way to estimate how much preprocessing you may need...

New R Support in Azure Machine Learning
A new R package, azuremlsdk (available to install from GitHub now, and from CRAN soon), provides the interface to the Azure Machine Learning service...

Books

The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century
An insightful, revealing history of how mathematics transformed our world...
"I have taken courses in statistics, taught it many times and solved several statistical problems that have appeared in journals. But until I read this book, I never really thought about it in so deep and philosophical a manner..."...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.

P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :)

All the best, Hannah & Sebastian
