Data Science Weekly Newsletter

Issue 390

May 13, 2021

Editor's Picks

Why data quality is key to successful ML Ops
While we in the data community are all still riding the high of discovering and tuning predictive algorithms that can tell us whether a picture shows a dog or a blueberry muffin, we’re also beginning to realize that ML isn’t just a magic wand you can wave at a pile of data to quickly get insightful, reliable results...Instead, we are starting to treat ML like other software engineering disciplines that require processes and tooling to ensure seamless workflows and reliable outputs. Data quality, in particular, has been a consistent focus, as it often leads to issues that can go unnoticed for a long time, bring entire pipelines to a halt, and erode the trust of stakeholders in the reliability of their analytical insights...In this post, we are going to look at ML Ops, a recent development in ML that bridges the gap between ML and traditional software engineering, and highlight how data quality is key to ML Ops workflows in order to accelerate data teams and maintain trust in your data...

State of AI Report 2020
The State of AI Report analyses the most interesting developments in AI...Now in its third year, the State of AI Report 2020 features invited contributions from a range of well-known and up-and-coming companies and research groups. The Report considers the following key dimensions: a) Research: Technology breakthroughs and capabilities, b) Talent: Supply, demand and concentration of AI talent, c) Industry: Areas of commercial application for AI and its business impact, d) Politics: Regulation of AI, its economic implications and the emerging geopolitics of AI, and e) Predictions: What we believe will happen and a performance review to keep us honest...

What Does It Mean If a Vaccine Is ‘Successful’?
The pharma companies are all using different playbooks to test their Covid-19 shots, so the first team to claim victory may not have the best formula...

A Message From This Week's Sponsor

Data scientists are in demand on Vettery

Vettery is an online hiring marketplace that's changing the way people hire and get hired. Ready for a bold career move? Make a free profile, name your salary, and connect with hiring managers from top employers today.

Data Science Articles & Videos

It's time for statistics departments to start supporting their applied students
Statistics departments are failing their applied students. In this post, I have a lot of opinions and give two pieces of advice: statistics departments need to start supporting their applied students, and they need to hire applied faculty...

Fireside Chat: Diversity in AI with Dr. Timnit Gebru
Join us [AI Ethics Diaries Podcast] for a candid conversation with Dr. Timnit Gebru, senior research scientist at Google and eminent scholar, on her seminal work in uncovering racial bias in automated facial analysis algorithms and datasets, including the “Gender Shades” project, and get her thoughts on racial power dynamics, what meaningful actions for progress look like, and last but not least, get the inside scoop on Eritrean and Ethiopian food...

Human-First AI - Because AI Needs The Human Eye
Human-First AI solves the “cold-start” problem of Industrial AI: encoding human expertise to augment the lack of data, while bridging to powerful ML—based on experience building AI solutions at Panasonic: robotics predictive maintenance, cold-chain energy optimization, Gigafactory battery mfg, avionics, automotive cybersecurity, and more...

How Awa Dieng found her passion for machine learning
Welcome to the latest installment of our blog series “My Path to Google.” These are real stories from Googlers, interns and alumni highlighting how they got to Google, what their roles are like and even some tips on how to prepare for interviews...Today’s post is all about Awa Dieng, an AI Resident on the Google Brain team in our Ghana office. Awa shares her path to working in research and machine learning at Google and how her work ensures AI systems are beneficial for everyone...

The use of mobile phone data to inform analysis of COVID-19 pandemic epidemiology
The ongoing coronavirus disease 2019 (COVID-19) pandemic has heightened discussion of the use of mobile phone data in outbreak response. Mobile phone data have been proposed to monitor effectiveness of non-pharmaceutical interventions, to assess potential drivers of spatiotemporal spread, and to support contact tracing efforts. While these data may be an important part of COVID-19 response, their use must be considered alongside a careful understanding of the behaviors and populations they capture. Here, we review the different applications for mobile phone data in guiding and evaluating COVID-19 response, the relevance of these applications for infectious disease transmission and control, and potential sources and implications of selection bias in mobile phone data...

RecSys 2020 - Takeaways and Notable Papers
RecSys (Recommender Systems Conference) 2020 ran from 22nd - 26th September. It was a great opportunity to peek into some of the latest thinking about recommender systems from academia and industry. Here are some observations and notes on papers I enjoyed...

From Research to Production with Deep Semi-Supervised Learning
Semi-supervised learning (SSL), a subfield that combines supervised and unsupervised learning, has grown in popularity in the deep learning research community over the past few years. It’s very possible that, at least in the short term, SSL approaches could be the bridge between label-heavy supervised learning and a future of data-efficient modeling...In this post, we talk about when you should consider using SSL approaches in your production environments and the lessons we’ve learned using them to improve our object detection models at Uizard...

Strong Women Through the Lens of The New York Times
The goal of this project is to investigate women’s representation in The New York Times throughout the past 70 years by means of sentiment analysis, frequent term visualization and topic modeling...

Bringing the Mona Lisa Effect to Life with TensorFlow.js
Urban legend says that Mona Lisa's eyes will follow you as you move around the room. This is known as the “Mona Lisa effect.” For fun, I recently programmed an interactive digital portrait that brings this phenomenon to life through your browser and webcam. At its core, the project leverages TensorFlow.js, deep learning, and some image processing techniques. The general idea is as follows: first, we must generate a sequence of images of Mona Lisa’s head, with eyes gazing from the left to right. From this pool, we’ll continuously select and display a single frame in real-time based on the viewer’s location. In this post, I’ll walk through the specifics of the project’s technical design and implementation...

Training

Quick Question For You: Do you want a Data Science job?

After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.
The course is broken down into three guides:

Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore).

Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate.

Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!

Click here to learn more...
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Data Scientist - Associated Press (AP) - New York, NY

The Associated Press is the essential global news network, delivering fast, unbiased news from every corner of the world to all media platforms and formats. Founded in 1846, AP today is the largest and most trusted source of independent news and information. On any given day more than half the world's population sees news from AP.
The Associated Press seeks a Data Science Manager based in New York, NY. The Data Science Manager will help manage data analysis, data science and data engineering solutions supporting business intelligence, news search, content enrichment and metadata services. We are a small focused team within Metadata Technology working closely with various departments and functions across the organization to design and build solutions with data, analytics and machine learning methods...

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

Keeping your data pipelines healthy with the Great Expectations GitHub Action
In this post, we show you how you can use GitHub Actions together with the open source project Great Expectations to automatically test, document, and profile your data pipelines as part of your traditional CI workflows. Checking data in this way can help data teams save time and promote analytic integrity of their data. We’ll also show how to generate dashboards that give you insight into data problems as part of your CI process...

usemodels 0.0.1
We’re very excited to announce the first release of the usemodels package. The tidymodels packages are designed to provide modeling functions that are highly flexible and modular. This is powerful, but sometimes a template or skeleton showing how to start is helpful. The usemodels package creates templates for tidymodels analyses so you don’t have to write as much new code...

A programming language for scientific machine learning and differentiable programming
In this episode of the Data Exchange Podcast I speak with Viral Shah, co-founder and CEO of Julia Computing. Along with his Julia language co-creators, Viral was awarded the 2019 Wilkinson Prize for outstanding contributions in the field of numerical software...Over the past few years, Julia has continued to add packages at a steady pace, and the package manager is impressive and solid. We spent much of the podcast discussing the state of Julia, Julia 1.5, the Julia ecosystem and community...

Books

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
