Data Science Weekly Newsletter

Issue 287

May 23, 2019

Editor's Picks

A.I. Took a Test to Detect Lung Cancer. It Got an A
Artificial intelligence may help doctors make more accurate readings of CT scans used to screen for lung cancer...

Inside Facebook's New Robotics Lab, Where AI and Machines Friend One Another
But, like a hare zigzagging back and forth to avoid a falcon, this robot’s seeming madness is in fact a special brand of cleverness, one that Facebook thinks holds the key not only for better robots, but for developing better artificial intelligence. This robot, you see, is teaching itself to explore the world. And that, Facebook says, could one day lead to intelligent machines like telepresence robots...

How Data (and Some Breathtaking Soccer) Brought Liverpool to the Cusp of Glory
The club is finishing a phenomenal season — thanks in part to an unrivaled reliance on analytics...

A Message From This Week's Sponsor

Become a Data Analyst with Thinkful

The Data Analytics program is for people who are starting from the very beginning. Learn how to scrape, collect and analyze data, use SQL and Tableau, and get an introduction to Python. We'll get you a job within six months of graduating or you'll get your tuition back.

Data Science Articles & Videos

Grocery bills can predict diabetes rates by neighborhood
Dietary habits are notoriously difficult to monitor. Now data scientists have analyzed sales figures from London’s biggest grocer to link eating patterns with local rates of high blood pressure, high cholesterol, and high blood sugar...

Maintainable ETLs: Tips for Making Your Pipelines Easier to Support and Extend
Core to any data science project is…wait for it…data! Preparing data in a reliable and reproducible way is a fundamental part of the process. If you’re training a model, calculating analytics, or just combining data from multiple sources and loading them into another system, you’ll need to build a data processing or ETL (extract, transform, load) pipeline...
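
As a rough illustration of the "easier to support and extend" idea, here is a minimal sketch that keeps extract, transform and load as small, separately testable functions. It assumes a toy in-memory CSV as the source and a plain Python list as the target system; the function names and data are hypothetical and not taken from the article.

```python
import csv
import io
from typing import Dict, Iterable, List

# Hypothetical raw input standing in for a real source system.
RAW_CSV = """order_id,amount,currency
1,10.50,usd
2,,usd
3,7.25,eur
"""

def extract(text: str) -> List[Dict[str, str]]:
    """Read raw rows from a CSV source (here, an in-memory string)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows: Iterable[Dict[str, str]]) -> List[dict]:
    """Drop rows with missing amounts and normalize types and casing."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records instead of failing the whole run
        cleaned.append({
            "order_id": int(row["order_id"]),
            "amount": float(row["amount"]),
            "currency": row["currency"].upper(),
        })
    return cleaned

def load(rows: List[dict], target: list) -> int:
    """Append rows to the target (a list here; a database table in practice)."""
    target.extend(rows)
    return len(rows)

def run_pipeline(text: str, target: list) -> int:
    # Each stage is a plain function, so it can be unit-tested and swapped independently.
    return load(transform(extract(text)), target)

warehouse: list = []
loaded = run_pipeline(RAW_CSV, warehouse)
print(loaded, warehouse[0])
```

Swapping the in-memory string for a file or API, or the list for a database writer, only touches one stage, which is the kind of separation that tends to keep pipelines maintainable.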

PaperRobot: Incremental Draft Generation of Scientific Ideas
We present a PaperRobot who performs as an automatic research assistant by (1) conducting deep understanding of a large collection of human-written papers in a target domain and constructing comprehensive background knowledge graphs (KGs); (2) creating new ideas by predicting links from the background KGs, by combining graph attention and contextual text attention; (3) incrementally writing some key elements of a new paper based on memory-attention networks: from the input title along with predicted related entities to generate a paper abstract, from the abstract to generate conclusion and future work, and finally from future work to generate a title for a follow-on paper...

Google’s AI can now translate your speech while keeping your voice
Researchers trained a neural network to map audio “voiceprints” from one language to another...

Cross-lingual Language Model Pretraining
Recent studies have demonstrated the efficiency of generative pretraining for English natural language understanding. In this work, we extend this approach to multiple languages and show the effectiveness of cross-lingual pretraining...

A Microprocessor implemented in 65nm CMOS with Configurable and Bit-scalable Accelerator for Programmable In-memory Computing
This paper presents a programmable in-memory-computing processor, demonstrated in a 65nm CMOS technology. For data-centric workloads, such as deep neural networks, data movement often dominates when implemented with today's computing architectures. This has motivated spatial architectures, where the arrangement of data-storage and compute hardware is distributed and explicitly aligned to the computation dataflow, most notably for matrix-vector multiplication...
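
To make "bit-scalable" a little more concrete, here is a generic sketch of bit-serial matrix-vector multiplication, one common way accelerators support configurable input precision: each bit plane of the input vector is multiplied with the matrix separately and the partial results are combined by shift-and-add. This is an illustrative assumption, not the processor's actual circuit or dataflow, and it assumes unsigned integer inputs.

```python
from typing import List

def matvec(matrix: List[List[int]], vec: List[int]) -> List[int]:
    """Reference integer matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def bit_serial_matvec(matrix: List[List[int]], vec: List[int], bits: int) -> List[int]:
    """Compute matrix @ vec by processing unsigned inputs one bit plane at a time.

    Each pass only needs a binary (0/1) input vector, so precision is configurable:
    more input bits simply means more passes.
    """
    if any(x < 0 or x >= (1 << bits) for x in vec):
        raise ValueError("inputs must be unsigned and fit in `bits` bits")
    result = [0] * len(matrix)
    for b in range(bits):
        plane = [(x >> b) & 1 for x in vec]   # b-th bit of every input element
        partial = matvec(matrix, plane)       # binary-input MVM (the "in-memory" step)
        result = [r + (p << b) for r, p in zip(result, partial)]  # shift-and-add
    return result

W = [[1, -2, 3], [4, 0, -1]]
x = [5, 2, 7]
print(bit_serial_matvec(W, x, bits=3), matvec(W, x))
```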

How we might protect ourselves from malicious AI
New research could make deep-learning models much harder to manipulate in harmful ways...
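
The article itself has no code. As background, a classic way to demonstrate this kind of manipulation is the fast gradient sign method (FGSM), which nudges each input feature by a small ε in the direction that increases the model's loss. The sketch below applies it to a hand-set logistic-regression model (weights, input and ε are made-up values) where the input gradient has a closed form; it illustrates the attack such research tries to resist, not the defense described in the article.

```python
import math
from typing import List

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict(w: List[float], b: float, x: List[float]) -> float:
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(g: float) -> int:
    return (g > 0) - (g < 0)

def fgsm(w: List[float], b: float, x: List[float], y: int, eps: float) -> List[float]:
    """Fast gradient sign method for logistic regression.

    With binary cross-entropy, the gradient of the loss w.r.t. the input is (p - y) * w,
    so each feature moves by eps in the sign of that gradient.
    """
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# Hypothetical model and input, correctly classified as class 1.
w, b = [2.0, -3.0, 1.0], 0.0
x, y = [0.5, 0.1, 0.2], 1
x_adv = fgsm(w, b, x, y, eps=0.3)
print(round(predict(w, b, x), 3), round(predict(w, b, x_adv), 3))
```

A perturbation of at most 0.3 per feature is enough to flip this toy model's prediction, which is the vulnerability robustness research aims to close.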

Auto-classification of NAVER Shopping Product Categories using TensorFlow
This article introduces the process of automatically matching NAVER Shopping product categories using TensorFlow, and explains how we solved a few problems arising during the process of applying machine learning to actual data used by our service...

Data-Efficient Image Recognition with Contrastive Predictive Coding
Large scale deep learning excels when labeled images are abundant, yet data-efficient learning remains a longstanding challenge. While biological vision is thought to leverage vast amounts of unlabeled data to solve classification problems with limited supervision, computer vision has so far not succeeded in this 'semi-supervised' regime. Our work tackles this challenge with Contrastive Predictive Coding, an unsupervised objective which extracts stable structure from still images...
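
Contrastive Predictive Coding is trained with a contrastive (InfoNCE) objective: given a prediction made from context, the model must pick out the true target representation among a set of negatives, scored here by a dot product. Below is a minimal sketch of that loss in plain Python, assuming the vectors are already computed; the image encoder, context network and patch layout from the paper are not modeled, and the example vectors are made up.

```python
import math
from typing import List

def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def info_nce(prediction: List[float], positive: List[float],
             negatives: List[List[float]]) -> float:
    """InfoNCE loss: cross-entropy of identifying the positive among all candidates.

    loss = -log( exp(s_pos) / sum_j exp(s_j) ), with dot-product scores s.
    """
    scores = [dot(prediction, positive)] + [dot(prediction, n) for n in negatives]
    m = max(scores)  # subtract the max for numerical stability (log-sum-exp trick)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[0]

pred = [1.0, 0.0]
pos = [0.9, 0.1]
negs = [[-1.0, 0.0], [0.0, 1.0]]
print(round(info_nce(pred, pos, negs), 4))
```

The loss shrinks as the prediction scores the true target above the negatives; when all candidates score equally it equals log N for N candidates.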

Event

Big Data and AI Toronto 2019

Big Data and AI Toronto is a 2-in-1 learning experience engineered to address the greatest business challenges technology leaders are facing today.
During 2 days of case studies, demos and panels, attendees will engage with global thought-leaders in Big Data and AI, including experts from Uber, Bloomberg and SAS!
Register for your free expo pass and join 5,000 attendees, 150 speakers, and 90 exhibiting brands on June 12-13th at the Metro Toronto Convention Centre.
Stay up-to-date on the newest speakers and program highlights by subscribing to the Big Data and AI Toronto newsletter.

Want to post here? Email us for details >> team@datascienceweekly.org

Jobs

Data Engineer / Data Scientist - Validate Health - Chicago

Interested in being part of a small founding team, so you can see your direct impact on improving the healthcare industry? Want to be one of the rockstars building an innovative product from the ground up?
Validate Health is an early-stage healthcare analytics company on a mission to improve accessibility to healthcare by enabling medical organizations to operate under stable and sustainable financial models.
This position is a versatile combination of Data Engineer and Data Scientist roles. You’ll get to play a key role in shaping the delivery of powerful data-driven products that enable sustainable value-based healthcare models...

Training & Resources

AvgPool2D: How to Incorporate Average pooling into a PyTorch Neural Network
Learn how to incorporate average pooling into a PyTorch Neural Network, via a screencast video and full tutorial transcript...
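
For a feel of what the layer computes, here is a minimal pure-Python sketch of 2D average pooling on a single-channel input: each output value is the mean of a kernel_size by kernel_size window, and the window moves by stride. The parameter names mirror PyTorch's torch.nn.AvgPool2d, including its default of stride = kernel_size, but this is not the PyTorch implementation; padding, ceil_mode, channels and batching are deliberately left out.

```python
from typing import List, Optional

def avg_pool2d(image: List[List[float]], kernel_size: int,
               stride: Optional[int] = None) -> List[List[float]]:
    """Average-pool a 2D grid with a square window and no padding.

    As in torch.nn.AvgPool2d, stride defaults to kernel_size when not given.
    """
    if stride is None:
        stride = kernel_size
    h, w = len(image), len(image[0])
    out_h = (h - kernel_size) // stride + 1
    out_w = (w - kernel_size) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect every value in the current window, then take the mean.
            window = [image[i * stride + di][j * stride + dj]
                      for di in range(kernel_size) for dj in range(kernel_size)]
            row.append(sum(window) / len(window))
        out.append(row)
    return out

x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
print(avg_pool2d(x, kernel_size=2))  # [[3.5, 5.5], [11.5, 13.5]]
```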

Eureka: MixMatch: A holistic approach to semi-supervised learning
Semi-supervised learning (SSL) is a form of supervised learning where we have a lot of extra information about the input data (X). SSL aims to use this extra information about the input data distribution to make predictions on unlabeled data using only a minuscule amount of labeled data. Alternatively, you can view SSL as unsupervised learning with certain constraints on clustering: points that have the same label should go to the same cluster, and so on. Some hyperparameters of unsupervised learning, such as the number of clusters, could be inferred from this extra data. Let us now look at the assumptions behind SSL...
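
Two of MixMatch's building blocks are easy to show in isolation: sharpening a guessed label distribution with a temperature T, and the MixUp variant that uses λ' = max(λ, 1 − λ) so each mixed example stays closer to its first input. The sketch below assumes plain Python lists for inputs and label distributions; augmentation, label guessing across K augmentations and the loss terms are left out, and the values of T and α are illustrative choices.

```python
import random
from typing import List, Tuple

def sharpen(p: List[float], T: float) -> List[float]:
    """Lower the entropy of a label distribution: p_i^(1/T) / sum_j p_j^(1/T)."""
    powered = [pi ** (1.0 / T) for pi in p]
    total = sum(powered)
    return [q / total for q in powered]

def mixup(x1: List[float], y1: List[float], x2: List[float], y2: List[float],
          alpha: float, rng: random.Random) -> Tuple[List[float], List[float]]:
    """MixUp as modified in MixMatch: lambda' = max(lambda, 1 - lambda)."""
    lam = rng.betavariate(alpha, alpha)
    lam = max(lam, 1.0 - lam)  # keep the mix closer to (x1, y1)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y

guess = [0.5, 0.3, 0.2]          # made-up averaged prediction over augmentations
print(sharpen(guess, T=0.5))     # ~[0.658, 0.237, 0.105]

rng = random.Random(0)
x_mix, y_mix = mixup([0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], alpha=0.75, rng=rng)
print(x_mix, y_mix)
```

Sharpening pushes the guessed label toward its most likely class, and taking max(λ, 1 − λ) keeps the first example dominant so labeled and unlabeled batches can be mixed without swapping their roles.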

ReshapeGAN: Object Reshaping by Providing A Single Reference Image
A TensorFlow-based framework for training and testing ReshapeGAN, the model introduced in "ReshapeGAN: Object Reshaping by Providing A Single Reference Image"...

Books

40% off at Manning

Do more with your data!
If you're looking to make your data skills stand out, then be sure to check out Manning's range of books and video courses. They're offering 40% off everything in their catalog, so there's no better time to learn something new...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

P.S. Want to reach our audience / fellow readers? Consider sponsoring - grab a spot now; first come first served! All the best, Hannah & Sebastian
