Data Science Weekly Newsletter

Issue 346

July 9, 2020

Editor's Picks

On Crashing the Barrier of Meaning in Artificial Intelligence
In October 2018, the Santa Fe Institute held a three-day workshop, organized by Barbara Grosz, Dawn Song, and myself [Melanie Mitchell], called Artificial Intelligence and the Barrier of Meaning. Thirty participants from a diverse set of disciplines — artificial intelligence, robotics, cognitive and developmental psychology, animal behavior, information theory, and philosophy, among others — met to discuss questions related to the notion of understanding in living systems and the prospect for such understanding in machines. In the hope that the results of the workshop will be useful to the broader community, this article summarizes the main themes of discussion and highlights some of the ideas developed at the workshop...

The Cost of AI Training is Improving at 50x the Speed of Moore’s Law: Why It’s Still Early Days for AI
The cost to train an artificial intelligence (AI) system is improving at 50x the pace of Moore’s Law...After just five years of development, deep learning...seems to have reached a tipping point in both cost and performance, paving the way for widespread adoption over the next decade...During the past ten years, the computing resources devoted to AI training models have exploded. After doubling every two years from 1960 to 2010, AI compute complexity has soared 10x every year...
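To get a feel for the two growth rates quoted in that excerpt, here is a quick back-of-the-envelope sketch (ours, not from the article). It compounds "doubling every two years" and "10x every year" over the same span. The ten-year span and the helper name `growth_factor` are assumptions chosen for illustration only, and the sketch says nothing about the separate 50x cost claim.

```python
def growth_factor(multiplier, period_years, span_years):
    """Total growth over span_years when a quantity is multiplied by
    `multiplier` once every `period_years` years."""
    return multiplier ** (span_years / period_years)


if __name__ == "__main__":
    span = 10  # assumed comparison window, in years
    moore_like = growth_factor(2, 2, span)   # doubling every two years
    recent_ai = growth_factor(10, 1, span)   # 10x every year
    print(f"Doubling every 2 years over {span} years: {moore_like:,.0f}x")
    print(f"10x every year over {span} years: {recent_ai:,.0f}x")
```

Over a decade, doubling every two years compounds to 2^5 = 32x, while 10x every year compounds to 10^10, which is why the article treats the recent trend as a break from the earlier one.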

Training GANs - From Theory to Practice
GANs, originally discovered in the context of unsupervised learning, have had far-reaching implications to science, engineering, and society. However, training GANs remains challenging (in part) due to the lack of convergent algorithms for nonconvex-nonconcave min-max optimization. In this post, we present a new first-order algorithm for min-max optimization which is particularly suited to GANs. This algorithm is guaranteed to converge to an equilibrium, is competitive in terms of time and memory with gradient descent-ascent and, most importantly, GANs trained using it seem to be stable...
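For readers who have not seen why plain gradient descent-ascent can misbehave, here is a minimal sketch of the baseline the excerpt mentions. It is not the post's new algorithm. It assumes the textbook toy objective f(x, y) = x * y (bilinear, so much simpler than a real GAN loss), with x minimizing and y maximizing; the function name, learning rate, and step count are our own illustrative choices.

```python
import math


def simultaneous_gda(x, y, lr=0.1, steps=100):
    """Simultaneous gradient descent-ascent on the toy objective f(x, y) = x * y.

    x takes a descent step (df/dx = y) and y takes an ascent step (df/dy = x).
    Returns the distance from the equilibrium (0, 0) before and after each step.
    """
    distances = [math.hypot(x, y)]
    for _ in range(steps):
        # Both updates use the values from the previous iteration.
        x, y = x - lr * y, y + lr * x
        distances.append(math.hypot(x, y))
    return distances


if __name__ == "__main__":
    d = simultaneous_gda(1.0, 1.0)
    print(f"start: {d[0]:.4f}, after {len(d) - 1} steps: {d[-1]:.4f}")
```

Each step multiplies the distance from (0, 0) by sqrt(1 + lr^2), so the iterates spiral away from the equilibrium instead of settling on it. That is the kind of instability that motivates algorithms with convergence guarantees.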

A Message From This Week's Sponsor

Take the new Developer Economics survey

In 2019, Python was used by 8.4M developers working in data science. What will change in 2020 and beyond? We want to know! Take this survey and share your views about the most important tools, platforms, and resources. You could win one of the prizes, worth $15,000 in total! Open until August 10th. Start now!

Data Science Articles & Videos

Packaging data and machine learning models for sharing
This blog is about several different ways of "releasing" data science projects, with an emphasis on preserving meaningful links about the origins of derived data and models. I'm not making any strong assumptions about whether project materials are released within an organization (only to teammates, for example) or to the whole internet...

On Moving from Statistics to Machine Learning, the Final Stage of Grief
I think I’ve gotten a good grasp of the mindset underlying machine learning and how it differs from traditional statistics, so I thought I’d write about it for those who have a similar background to me considering a similar move...This post is geared toward people who are excellent at statistics but don’t really “get” machine learning and want to understand the gist of it in about 15 minutes of reading...

Introducing neural supersampling for real-time rendering
Real-time rendering in virtual reality presents a unique set of challenges — chief among them being the need to support photorealistic effects, achieve higher resolutions, and reach higher refresh rates than ever before...Our [Facebook Reality Labs] SIGGRAPH technical paper, entitled “Neural Supersampling for Real-time Rendering,” introduces a machine learning approach that converts low-resolution input images to high-resolution outputs for real-time rendering. This upsampling process uses neural networks, training on the scene statistics, to restore sharp details while saving the computational overhead of rendering these details directly in real-time applications...

Business Card Neural Network
I set out to create a full neural network that could fit on the back of a business card. The code below is the result, creating a 3-layer fully-connected neural network with leaky-relu activations and training it to generate a small image of my name...

Machines for unlocking the deluge of COVID-19 papers, articles, and conversations
In this episode of the Data Exchange [Podcast] I speak with Amy Heineike, Principal Product Architect at Primer.ai, a startup building machines that can read and write. Primer recently used their technology to build COVID-19 Primer, a web site that provides an overview of the latest research papers, media coverage, and social media conversations pertaining to COVID-19...

Shopify's Data Science & Engineering Foundations
Shopify’s Data Science & Engineering team supports our internal teams, merchants, and partners with high quality, daily insights so they can “Make great decisions quickly.” Here are the foundational approaches to data warehousing and analysis that empower us to deliver the best results for our ecosystem...

A Graphical Analysis of Women's Tops Sold on Goodwill's Website
After 10ish years of second-hand shopping, I've started to ask myself a lot of questions about the clothes I've been buying...In the absence of any conclusive answers, I tried to get the data myself...I set up a script that collected information on listings for more than four million women's shirts for sale through Goodwill's website, going back to mid-2014...The information is deeply flawed—a Goodwill online auction is very different from a Goodwill store—but we can get an idea of how thrift store offerings have changed through the years...

Analysis of YouTube Trending Videos of 2019 (US)
Around 1.5 years ago, I did an analysis of YouTube trending videos in the US. That analysis was performed on trending videos of some months in 2017 and 2018. The analysis received a lot of interest on Kaggle and Reddit...Today, I present an improved and expanded version of that analysis. This analysis is more advanced and contains new interesting elements. In this analysis, all trending videos for the whole year of 2019 were analyzed (more than 70,000 videos)...

The NBA’s 2023 MVP Is...Predicting NBA Players’ Careers Using Recurrent Linear Regression
The general concept for this project is that a player’s past performance and age are predictive of their future performance. Barring injury, a player’s next season will usually be similar to their most recent season. With this in mind, it’s possible to use statistics of past seasons to predict 1 season into the future. The output prediction can then be used as an input to predict 2 seasons into the future, which can then be used to predict 3 seasons into the future, etc...
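The feed-the-prediction-back-in idea described above can be sketched in a few lines. This is a toy illustration under our own assumptions, not the author's model: the data is synthetic and noise-free, the only features are one made-up season statistic and the player's age, and the helper names (`make_toy_data`, `fit_linear`, `forecast`) are hypothetical.

```python
import numpy as np


def make_toy_data(n=50, seed=0):
    """Synthetic (stat, age) -> next-season stat pairs. All numbers are invented."""
    rng = np.random.default_rng(seed)
    stat = rng.uniform(5.0, 30.0, n)             # e.g. a per-game stat this season
    age = rng.integers(19, 38, n).astype(float)  # player age this season
    next_stat = 0.9 * stat - 0.1 * age + 5.0     # made-up, noise-free rule
    return np.column_stack([stat, age]), next_stat


def fit_linear(X, y):
    """Ordinary least squares with an intercept; the bias is the last weight."""
    A = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def forecast(w, stat, age, n_seasons):
    """Predict one season ahead, then reuse each prediction as the next input."""
    preds = []
    for _ in range(n_seasons):
        stat = float(w[0] * stat + w[1] * age + w[2])
        age += 1  # the player is a year older for the next step
        preds.append(stat)
    return preds


if __name__ == "__main__":
    X, y = make_toy_data()
    w = fit_linear(X, y)
    print(forecast(w, stat=20.0, age=25, n_seasons=3))
```

Because each step consumes the previous step's output, any error in an early prediction carries forward into later seasons, so long-range forecasts from a setup like this should be read with caution.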

Training

Quick Question For You: Do you want a Data Science job?

After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.
The course is broken down into three guides:

Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore)

Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate

Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!

Click here to learn more
...
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Senior Data Scientist - Grubhub - NY / Chicago

Grubhub is looking for a data scientist to join the Pricing team. As a part of Pricing, you’ll be a member of a small team of data scientists and engineers who shape and optimize how we charge our diners, influencing hundreds of millions in revenue annually. You will work closely with both financial stakeholders and engineers to ship models that make Grubhub more efficient in how it charges customers. You’ll construct models and A/B tests as well as write code to improve our modeling capabilities...

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

Mathematics for Machine Learning - Linear Algebra
Welcome to the “Mathematics for Machine Learning: Linear Algebra” course, offered by Imperial College London...This course was designed to help you quickly build an intuitive understanding of linear algebra...

How I Built a Convolutional Neural Network (CNN) That Detects Pneumonia From Chest X-Rays
Before getting into how we can use artificial neural networks to detect pneumonia, let’s understand a little bit about pneumonia as it will pave the road to building the model...

How to Install MySQL and Create an Employees Sample Database
Data scientists and analysts are expected to be able to write and execute complex queries in SQL. If you’re just getting started with SQL or are looking for a sandbox to test queries, then this guide is for you...

Books

Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement
"A book that tries to cover multiple databases is a risky endeavor, a book that also provides hands-on on each is even riskier but if implemented well leads to a great package. I loved the specific exercises the authors covered. A must read for all big data architects who don’t shy away from coding..."...

For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian