Data Science Weekly Newsletter

Issue 327

February 27, 2020

Editor's Picks

Andreessen-Horowitz craps on “AI” startups from a great height
Andreessen-Horowitz has always been the most levelheaded of the major current year VC firms...Their recent review on how “AI” differs from software company investments is absolutely brutal. I am pretty sure most people didn’t get the point, so I’ll quote it emphasizing the important bits....

A Deep Learning Approach to Antibiotic Discovery
Due to the rapid emergence of antibiotic-resistant bacteria, there is a growing need to discover new antibiotics. To address this challenge, we trained a deep neural network capable of predicting molecules with antibacterial activity. We performed predictions on multiple chemical libraries and discovered a molecule from the Drug Repurposing Hub—halicin—that is structurally divergent from conventional antibiotics and displays bactericidal activity against a wide phylogenetic spectrum of pathogens...This work highlights the utility of deep learning approaches to expand our antibiotic arsenal through the discovery of structurally distinct antibacterial molecules....

The 2010s: Our Decade of Deep Learning / Outlook on the 2020s
By: Jürgen Schmidhuber
The present post focuses on the recent decade's most important developments and applications based on our work, also mentioning related work, and concluding with an outlook on the 2020s, also addressing privacy and data markets....

A Message From This Week's Sponsor

Data scientists are in demand on Vettery

Vettery is an online hiring marketplace that's changing the way people hire and get hired. Ready for a bold career move? Make a free profile, name your salary, and connect with hiring managers from top employers today.

Data Science Articles & Videos

Thinking About ‘Ethics’ in the Ethics of AI
A major international consultancy firm identified ‘AI ethicist’ as an essential position for companies to successfully implement artificial intelligence (AI) at the start of 2019...To provide specific guidance and recommendations, the ethical analysis of AI further needs to specify the technology, e.g. autonomous vehicles, recommender systems, etc., the methods, e.g. deep learning, reinforcement learning, etc., and the sector(s) of application, e.g. healthcare, finance, news, etc. In this article, we shall focus on the ethical issues related to autonomous AI, i.e. artificial agents, which can decide and act independently of human intervention, and we shall illustrate the ethical questions of autonomous AI with plenty of examples....

YOLO Creator Joseph Redmon Stopped CV Research Due to Ethical Concerns
Joseph Redmon, creator of the popular object detection algorithm YOLO (You Only Look Once), tweeted last week that he had ceased his computer vision research to avoid enabling potential misuse of the tech — citing in particular “military applications and privacy concerns.”....

Lessons learned managing the GitLab Data team
From April 2018 to May 2019 I was the manager of the Data team for GitLab. I took this role after my manager left, when I started reporting directly to the CFO as a Data Engineer...What follows are a few lessons I learned (and relearned!) in my one-year stint as the manager of the Data team. Eventually, I aim to become a manager again and I hope to remember these lessons and learn even more...

Modern Data Lakes Overview
As Data volumes grow to new, unprecedented levels, new tools and techniques are coming into the picture to handle this growth. One of the fields that has evolved is Data Lakes. In this post we’ll take a look at the story of the evolution of Data Lakes and how modern Data Lakes like Iceberg and Delta Lake are solving important problems....

Comparison of the Open Source OLAP Systems for Big Data: ClickHouse, Druid, and Pinot
In this post I want to compare ClickHouse, Druid, and Pinot, the three open source data stores that run analytical queries over big volumes of data with interactive latencies...

How Modiface utilized TensorFlow.js in production for AR makeup try on in the browser
ModiFace has been creating artificial intelligence tech for the beauty industry for over a decade...As smartphones hit the market, ModiFace quickly took advantage of the platform to switch from a virtual try on for a 2D image, to a virtual try on for a live 3D video...since then we’ve expanded our live virtual try on to be even more accessible by expanding our reach to the web using TensorFlow.js....

Overview of Active Learning for Deep Learning
Most of the focus of the machine learning community is about creating better algorithms for learning from data. But getting useful annotated datasets is difficult. Really difficult. It can be expensive, time consuming, and you still end up with problems like annotations missing from some categories...I think that being able to build practical machine learning systems is a lot about tools to annotate data, and that a lot of the future innovation in building systems that solve real problems will be about being able to annotate high quality datasets quickly...Active Learning is a great building block for this, and is under utilized in my opinion...In this post I will give a short introduction to Classical Active Learning, and then go over several papers that focus on Active Learning for Deep Learning...

Lex Fridman's AI Podcast Episode 73: Andrew Ng: Deep Learning, Education, and Real-World AI [Video]
Andrew Ng is one of the most impactful educators, researchers, innovators, and leaders in the artificial intelligence and technology space in general. He co-founded Coursera and Google Brain, launched deeplearning.ai, Landing.ai, and the AI fund, and was the Chief Scientist at Baidu. As a Stanford professor, and with Coursera and deeplearning.ai, he has helped educate and inspire millions of students including me...

Using Kalman Filter to Predict Corona Virus Spread
This article presents the implementation of an online real-time Kalman filter algorithm to predict the spread of COVID19 per given region...

Conference

The Premier Machine Learning Conference

5 days, 8 tracks, 160 speakers and over 150 exciting sessions
Join Machine Learning Week 2020, May 31 – June 4, Las Vegas! It brings together five co-located events: PAW Business, PAW Financial, PAW Industry 4.0, PAW Healthcare, Deep Learning World. This event is where to meet the who’s who and keep up on the latest techniques, making it the leading machine learning event that excites and unites. You can expect top-class experts from world-famous companies such as Google, Microsoft, Lyft, Verizon, Visa and LinkedIn!
Secure your ticket now!
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Senior Principal Data Scientist - PepsiCo eCommerce - NYC

To ensure continued success in the food and beverage space, PepsiCo has assembled a dedicated eCommerce team – tasked with optimizing eCommerce operations and developing innovations that will give PepsiCo a sustainable competitive advantage. While tied closely to broader PepsiCo, the eCommerce group more closely resembles a start-up environment; embracing the core values of having a bias for action, being results-oriented, maintaining a community-focus, and prioritizing people.
PepsiCo’s Data Science and Analytics group is a team of data scientists, technology specialists, and business innovators who operate within eCommerce to build industry-leading systems and solutions. By focusing on machine learning and automation, the Data Science & Analytics group is pushing the bounds of possibility for PepsiCo and its strategic partners.

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

Torchmeta: A Meta-Learning library for PyTorch
Torchmeta is a collection of extensions and data-loaders for few-shot learning & meta-learning in PyTorch. Torchmeta received the Best in Show award at the Global PyTorch Summer Hackathon 2019. The library is open-source, and you can try it with pip install torchmeta....

Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer
In “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer”, we present a large-scale empirical survey to determine which transfer learning techniques work best and apply these insights at scale to create a new model that we call the Text-To-Text Transfer Transformer (T5). We also introduce a new open-source pre-training dataset, called the Colossal Clean Crawled Corpus (C4). The T5 model, pre-trained on C4, achieves state-of-the-art results on many NLP benchmarks while being flexible enough to be fine-tuned to a variety of important downstream tasks...

Deep Reinforcement Learning in PyTorch
Modular, optimized implementations of common deep RL algorithms in PyTorch, with unified infrastructure supporting all three major families of model-free algorithms: policy gradient, deep-q learning, and q-function policy gradient. Intended to be a high-throughput code-base for small- to medium-scale research (large-scale meaning like OpenAI Dota with 100's GPUs)...

Books

Data Science in Production: Building Scalable Model Pipelines with Python
This book provides a hands-on approach to scaling up Python code to work in distributed environments in order to build robust pipelines. Readers will learn how to set up machine learning models as web endpoints, serverless functions, and streaming pipelines using multiple cloud environments. It is intended for analytics practitioners with hands-on experience with Python libraries such as Pandas and scikit-learn, and will focus on scaling up prototype models to production....
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
