Data Science Weekly Newsletter - Issue 384

Issue #384

Apr 01 2021

Editor Picks
  • Redefining what a map can be with new information and AI
    Sixteen years ago, many of us held a printout of directions in one hand and the steering wheel in the other to get around, without information about the traffic along your route or details about when your favorite restaurant was open. Since then, we’ve been pushing the boundaries of what a map can do, propelled by the latest machine learning. This year, we’re on track to bring over 100 AI-powered improvements to Google Maps so you can get the most accurate, up-to-date information about the world, exactly when you need it. Here's a snapshot of how we're using AI to make Maps work better for you with a number of updates coming this year...
  • Stop Calling Everything AI, Machine-Learning Pioneer Says
    “While the science-fiction discussions about AI and super intelligence are fun, they are a distraction,” he says. “There’s not been enough focus on the real problem, which is building planetary-scale machine learning–based systems that actually work, deliver value to humans, and do not amplify inequities...

A Message from this week's Sponsor:


Kickstart Your New Career with a Data Science & Analytics Bootcamp

Bootcamps are starting soon! Don’t miss your chance to join a Data Scientist-led, online Metis bootcamp with career support until you’re hired. Ready to take your data science or analytics career to the next level? Learn more about the Metis Online Data Science & Analytics Bootcamps.


Data Science Articles & Videos

  • Avoiding the 4 Major Pitfalls of Data Science Projects
    There are countless reasons a project could fail due...selection bias, target leakage, data drift, p-hacking, overfitting and more...In this article, however, we’ll focus on some fundamental issues pertaining to collaboration with the business...
  • The Language Interpretability Tool (LIT) is an open-source platform for visualization and understanding of NLP models
    The Language Interpretability Tool (LIT) is for researchers and practitioners looking to understand NLP model behavior through a visual, interactive, and extensible tool...Use LIT to ask and answer questions like: a) What kind of examples does my model perform poorly on?, b) Why did my model make this prediction? Can it attribute it to adversarial behavior, or undesirable priors from the training set?, c) Does my model behave consistently if I change things like textual style, verb tense, or pronoun gender?, and more...
  • The Chinese Approach to AI: An Analysis of Policy, Ethics, and Regulation
    This paper explores China’s current AI policies, its future plans, and the ethical standards it is working on. The authors zoom in on China’s country-wide strategic effort, i.e. the ‘New Generation Artificial Intelligence Development Plan’ (AIDP). The strategic aims of the plan can be divided up into 3 main goals: international competition, economic development, and social governance...
  • Announcing ICLR 2021 Outstanding Paper Awards
    We are thrilled to announce the winners of the ICLR 2021 Outstanding Paper Awards. While there are 860 excellent papers in our program, and many of them of exceptional quality, we would like to highlight 8 papers that are especially noteworthy...Award winners (in alphabetical order) are...
  • MADGRAD Optimization Method
    We introduce MADGRAD, a novel optimization method in the family of AdaGrad adaptive gradient methods. MADGRAD shows excellent performance on deep learning optimization problems from multiple fields, including classification and image-to-image tasks in vision, and recurrent and bidirectionally-masked models in natural language processing. For each of these tasks, MADGRAD matches or outperforms both SGD and ADAM in test set performance, even on problems for which adaptive methods normally perform poorly...
  • How we use AutoML, Multi-task learning and Multi-tower models for Pinterest Ads
    In this blog post, we explain how key technologies, such as AutoML, DNN, Multi-Task Learning, Multi-Tower models, and Model Calibration, allow for highly performant and scalable solutions as we build out the ads marketplace at Pinterest. We also discuss the basics of AutoML and how it’s used for Pinterest Ads...
  • Predicting your next query even before you type!
    On the Flipkart website, when you click inside the Search box, you’ll see a list of auto-suggestions that help you plan the best query with less typing effort...We introduced the ‘Discover More’ feature just below the auto-suggestions list, which displays personalized queries based on the user’s past shopping activities on Flipkart and some popular user queries across different categories...
  • How to Compare ML Experiment Tracking Tools to Fit Your Data Science Workflow
    Tracking your experiments in an efficient and organized manner is crucial. Having to try and recreate a model from a couple of months ago that holds an important result for your project is a frustrating situation that can be avoided with some foresight. Having so many options of different experiment management tools and platforms that offer experiment tracking can be more a hindrance than a help. It can be hard to tell the difference between them and can keep us from making a decision and picking one. We hope that this post will help you to discover different experiment tracking tools and pick the one that fits your data science workflow best...
  • Cross-validation: what does it estimate and how well does it do it? [PDF]
    Cross-validation is a widely-used technique to estimate prediction error, but its behavior is complex and not fully understood. Ideally, one would like to think that cross-validation estimates the prediction error for the model at hand, fit to the training data. We prove that this is not the case for the linear model fit by ordinary least squares; rather it estimates the average prediction error of models fit on other unseen training sets drawn from the same population...
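
    For readers new to the idea in the last item above, here is a minimal sketch of k-fold cross-validation for an ordinary least squares fit. It is only an illustration and is not taken from the paper: the helper names (fit_ols, kfold_cv_mse), the synthetic data, and the choice of five folds are all assumptions. Each fold's error comes from a model fit on a different subset of the data, which gives some intuition for why the result may be closer to an average over training sets than to the error of one particular fitted model.

```python
import random


def fit_ols(xs, ys):
    """Closed-form ordinary least squares for one feature plus an intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def kfold_cv_mse(xs, ys, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of roughly equal size.
    folds = [indices[i::k] for i in range(k)]
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        # Refit on the remaining folds only, so the held-out points are unseen.
        intercept, slope = fit_ols([xs[i] for i in train], [ys[i] for i in train])
        sq_errors = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in held_out]
        fold_errors.append(sum(sq_errors) / len(sq_errors))
    return sum(fold_errors) / k


# Hypothetical synthetic data: y = 1 + 2x + Gaussian noise (standard deviation 0.5).
rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(200)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 0.5) for x in xs]

cv_error = kfold_cv_mse(xs, ys, k=5)
print(f"5-fold CV estimate of MSE: {cv_error:.3f}")
```

    With noise of standard deviation 0.5, the printed estimate should land in the neighborhood of the noise variance (0.25); the exact value depends on the random draws.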



Quick Question For You: Do you want a Data Science job?

After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.

The course is broken down into three guides:
  1. Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore)

  2. Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate

  3. Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!
Click here to learn more ...

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!



  • Data Scientist - HelloFresh - Chicago, IL or New York, NY

    Embedded in the NYC Tech Hub, we are building a cross-functional team of data scientists, analysts and engineers with the mission to bring the modeling and analytical capabilities of our marketing organization to the next level.

    As a Data Scientist, you will support the analytic needs of our Growth organization comprising Technology, Digital Product and Marketing. You will play a pivotal role in helping us continue to succeed as the leading global meal kit provider. This role will solve challenging problems using vast repositories of customer data to provide detailed and actionable insights; core responsibilities include the development and automation of Marketing BI tools, predictive modeling, professional-grade dashboarding and reporting for some of our most critical initiatives and enhancing and facilitating the information extraction process...

        Want to post a job here? Email us for details >>


Training & Resources

  • Out of Distribution Generalization in Machine Learning
    Machine learning has achieved tremendous success in a variety of domains in recent years. However, a lot of these success stories have been in places where the training and the testing distributions are extremely similar to each other. In everyday situations when models are tested on slightly different data than they were trained on, ML algorithms can fail spectacularly. This research attempts to formally define this problem, what sets of assumptions are reasonable to make in our data and what kind of guarantees we hope to obtain from them...
  • What will the major ML research trends be in the 2020s? [Reddit Discussion]
    What do you think the next 10 years will bring in ML research? What conventionally accepted trend do you think will not happen?...e.g...a) Will deep learning continue to eat everything?, b) Will multi-task multi-domain learning make few-shot learning available for most domains? (Or is deep learning on the slow end of the sigmoid curve now?), c) Will safe, ethical, explainable AI rise, or is that hogwash?, d) Will advances decouple from compute power?, e) Will Gary Marcus and Judea Pearl win out in the symbolic/structural/causal war against deep learning?, f) Are there still major breakthroughs in language? Do we just finetune GPT-3?, g) Will we make big breakthroughs in theory and fundamental ML? Or is this the decade of application? (Healthcare will finally deploy models that beat logistic regression!)...
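
    As a toy illustration of the out-of-distribution item above, the sketch below fits a straight line to data drawn from one input range and then scores it on a shifted range. Everything here is an assumption made for illustration (the quadratic target, the two ranges, the helper names fit_line, mse and sample), and the shift is deliberately exaggerated; it is not the formal setup from the linked work.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(xs, ys, intercept, slope):
    """Mean squared error of the fitted line on the given points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)


def sample(low, high, n):
    """Draw inputs uniformly from [low, high]; the true relation is y = x^2."""
    xs = [rng.uniform(low, high) for _ in range(n)]
    return xs, [x ** 2 for x in xs]


# Train on inputs in [0, 1], where a line approximates x^2 reasonably well.
train_x, train_y = sample(0.0, 1.0, 500)
intercept, slope = fit_line(train_x, train_y)

# Same distribution as training vs. a shifted input range the model never saw.
in_x, in_y = sample(0.0, 1.0, 500)
out_x, out_y = sample(2.0, 3.0, 500)

in_dist_mse = mse(in_x, in_y, intercept, slope)
shifted_mse = mse(out_x, out_y, intercept, slope)
print(f"in-distribution MSE: {in_dist_mse:.4f}")
print(f"shifted-range MSE:   {shifted_mse:.4f}")
```

    The model looks accurate when tested on data like its training set, while the error on the shifted range is orders of magnitude larger, even though the underlying relationship never changed.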



  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits

    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian