Data Science Weekly Newsletter

Issue 405

August 26, 2021

Editor's Picks

Season 1 finale of @therobotbrains podcast with the amazing @ilyasut, Co-Founder/Chief Scientist of @OpenAI
On the last episode (Ep. 22) of Season One of The Robot Brains Podcast, our guest is Ilya Sutskever, Co-Founder and Chief Scientist of OpenAI. As a PhD student at Toronto, Ilya was one of the authors of the 2012 AlexNet paper that completely changed the field of AI, leading to the widespread adoption of deep learning and the avalanche of AI breakthroughs we’ve seen over the past 10 years...His breakthroughs include AlexNet, seq2seq, MT, GPT, CLIP, DALL·E, Codex...

The Modern Data Experience: How a revolution comes together. Or doesn’t
Over the last several months, catalyzed by a post by Emilie Schario and Taylor Murphy, it’s become popular to say that data teams should think of everything they create as a product, and the rest of their colleagues as their customers. To build on this idea, what should that product be? What should it feel like to go from question, through technology and tools, through collaboration and conversation, to an answer?...

The 7 Reasons Most Machine Learning Funds Fail - Marcos López de Prado [Video]
This talk, titled The 7 Reasons Most Machine Learning Funds Fail, looks at the particularly high rate of failure in financial machine learning. The few managers who succeed amass a large number of assets and deliver consistently exceptional performance to their investors. However, that is a rare outcome. This presentation covers the 7 critical mistakes underlying most financial machine learning failures, based on Marcos López de Prado’s experiences and observations...

A Message From This Week's Sponsor

Get Retool free for up to a year and $160,000 in startup discounts
Why spend so much time on internal tooling, CRUD apps, and dashboards built from scratch? Retool is a 10x faster way to build custom internal tools, and now it's free for early-stage startups to use for up to a year. They've also created a deal book worth $160K in startup discounts, giving startups free access to the tools they need to build great internal tools. Get your discount here.

Data Science Articles & Videos

How DeepMind's Generally Capable Agents Were Trained
One of DeepMind's latest papers, Open-Ended Learning Leads to Generally Capable Agents, explains how DeepMind produced agents that can successfully play games as complex as hide-and-seek or capture-the-flag without even having trained on or seen these games before...As far as I know, this is an entirely unprecedented level of generality for a reinforcement-learning agent...The following is a high-level summary of the paper, meant to be accessible to non-specialists, that should nevertheless produce something resembling a gears-level model...

It’s a Peoples Game, Isn’t It?! A Comparison Between the Investment Returns of Business Angels and Machine Learning Algorithms
Investors increasingly use machine learning (ML) algorithms to support their early-stage investment decisions. However, it remains unclear if algorithms can make better investment decisions and if so, why. Building on behavioral decision theory, our study compares the investment returns of an algorithm with those of 255 business angels (BAs) investing via an angel investment platform...

Using ML and Optimization to Solve DoorDash’s Dispatch Problem
DoorDash delivers millions of orders every day with the help of DeepRed, the system at the center of our last-mile logistics platform. But how does DeepRed really work and how do we use it to keep the marketplace running smoothly? To power our platform, we needed to solve the “dispatch problem”: how to get each order from the store to the customer, via Dashers, as efficiently as possible. In this blog post, we will discuss the details of the dispatch problem, how we used ML and optimization to solve the problem, and how we continuously improve our solution with simulations and experimentation...
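
For readers new to dispatch, here is a toy sketch of its simplest core: assign each ready order to exactly one available Dasher so that the total estimated delivery time is as small as possible. This is a hypothetical illustration, not how DeepRed works. The travel-time matrix and the best_assignment name are made up, the times merely stand in for what an ML model might predict, and the brute-force search over permutations only scales to a handful of orders.

```python
from itertools import permutations

# Hypothetical estimated minutes for dasher i to pick up and deliver order j.
# In a real system these might come from ML predictions; here they are made up.
EST_MINUTES = [
    [12, 25, 18],  # dasher 0
    [20, 10, 22],  # dasher 1
    [15, 17, 9],   # dasher 2
]


def best_assignment(cost):
    """Brute-force the one-to-one dasher -> order assignment with minimal total time.

    Assumes a square cost matrix (as many dashers as orders). Tries every
    permutation, so it is O(n!) and only suitable for toy inputs.
    """
    n = len(cost)
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # perm[i] is the order assigned to dasher i
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_total, best_perm = total, perm
    return best_perm, best_total


if __name__ == "__main__":
    assignment, total = best_assignment(EST_MINUTES)
    for dasher, order in enumerate(assignment):
        print(f"dasher {dasher} -> order {order}")
    print("total minutes:", total)
```

Real dispatch layers batching, timing, and uncertainty on top of this; beyond toy sizes one would replace the brute force with a proper assignment solver such as the Hungarian algorithm.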

Reinforcement Learning Course Materials
Lecture notes, tutorial tasks (including solutions), and online videos for the reinforcement learning course hosted by Paderborn University. The source code for the entire course material is open, and everyone is cordially invited to use it for self-learning (students) or to set up their own course (lecturers)...

Knowledge Graphs 2021: A Data Odyssey [PDF]
Over the last 15 years, huge knowledge bases, also known as knowledge graphs, have been automatically constructed from web data, and have become a key asset for search engines and other use cases...This position paper reviews these advances and discusses lessons learned. It highlights the role of "DB thinking" in building and maintaining high-quality knowledge bases from web contents. Moreover, the paper identifies open challenges and new research opportunities...

gslides: Creating charts in Google Slides
gslides is a Python package that helps analysts turn pandas dataframes into Google Slides and Sheets charts by configuring and executing Google API calls...The package provides a set of classes that give the user full control over the creation of new visualizations through configurable parameters while eliminating the complexity of working directly with the Google API...

An oscilloscope for deep learning
The Data Exchange Podcast: Charles Martin on how ideas from physics can be used to build practical tools for evaluating and tuning neural networks...

AI-Generated Plant Collage
300 plants from an alternate reality. Created...using AI....

The Gartner Magic Quadrant for Metadata Management was just scrapped. Here’s everything you need to know.
This week, Gartner took a huge step toward this by scrapping its Magic Quadrant for Metadata Management Solutions and replacing it with a Market Guide for Active Metadata. This change heralds a new way of approaching metadata in today’s modern data stack...In this article, I try to unpack...where metadata management is headed...

Text Data Augmentation for Deep Learning
In this survey, we consider how the Data Augmentation training strategy can aid the development of deep learning for text. We begin with the major motifs of Data Augmentation, summarized as strengthening local decision boundaries, brute force training, causality and counterfactual examples, and the distinction between meaning and form. We follow these motifs with a concrete list of augmentation frameworks that have been developed for text data...
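
To make the "brute force training" motif a little more concrete, here is a minimal sketch of two very simple token-level augmentations, random swap and random deletion, in the spirit of rule-based methods such as EDA. It is our own illustration, not code from the survey or from any of the frameworks it lists; the function names, default probabilities, and whitespace tokenization are all assumptions.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random position pairs swapped."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)  # two distinct positions
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p, keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    # Never return an empty sentence: fall back to one random original token.
    return kept if kept else [rng.choice(tokens)]


def augment(sentence, n_copies=3, p_delete=0.1, n_swaps=1, seed=0):
    """Generate n_copies perturbed variants of a whitespace-tokenized sentence."""
    rng = random.Random(seed)  # seeded so results are reproducible
    tokens = sentence.split()
    variants = []
    for _ in range(n_copies):
        swapped = random_swap(tokens, n_swaps, rng)
        variants.append(" ".join(random_deletion(swapped, p_delete, rng)))
    return variants


if __name__ == "__main__":
    for v in augment("the quick brown fox jumps over the lazy dog"):
        print(v)
```

Each variant is a reshuffled, possibly shortened copy of the input; whether such noisy copies still carry the original label depends heavily on the task.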

Training

Quick Question For You: Do you want a Data Science job? After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course. The course is broken down into three guides:

Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore)

Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate

Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!

Click here to learn more ... *Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Senior Data Analyst - HER - Remote

We are looking for a Senior Data Analyst to help us re-develop our existing data workflow, enable better scalability, and improve accuracy. In addition to this, we’re looking for someone to help improve our ability to discover the relevant information in our data, driving our decisions in delivering an ever-improving service.

The primary focus of the role will be in establishing a new data gathering pipeline, doing statistical analysis, and helping build the analytical basis for the prediction systems. This is the perfect opportunity to be intricately involved in running analytical experiments in a methodical manner, and give us a hand in improving the next generation of recommendation systems that power our social experience.

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

Foundations of Deep RL -- 6-lecture series by Pieter Abbeel
Lecture 1: MDPs, Exact Solution Methods, Max-ent RL, Lecture 2: Deep Q-Learning, Lecture 3: Policy Gradients and Advantage Estimation, Lecture 4: TRPO and PPO, Lecture 5: DDPG and SAC, and Lecture 6: Model-based RL....

prettymaps: small Python library to draw customized maps from OpenStreetMap data
A small set of Python functions to draw pretty maps from OpenStreetMap data. Based on osmnx, matplotlib and shapely libraries...

useR! 2021 (The R Conference) Recordings
Recordings of Keynotes...Recordings of Talks (Regular Sessions)...Recordings of Tutorials...and... Recordings of Elevator Pitches...

Books

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page...

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them on board :) All the best, Hannah & Sebastian
