Receive the Data Science Weekly Newsletter every Thursday

Easy to unsubscribe at any time. Your e-mail address is safe.

Data Science Weekly Newsletter
June 3, 2021

Editor's Picks

  • Emerging Architectures for Modern Data Infrastructure
    As an industry, we’ve gotten exceptionally good at building large, complex software systems. We’re now starting to see the rise of massive, complex systems built around data – where the primary business value of the system comes from the analysis of data, rather than the software directly...And yet, despite all of this energy and momentum, we’ve found that there is still a tremendous amount of confusion around what technologies are on the leading end of this trend and how they are used in practice. In the last two years, we talked to hundreds of founders, corporate data leaders, and other experts – including interviewing 20+ practitioners on their current data stacks – in an attempt to codify emerging best practices and draw up a common vocabulary around data infrastructure. This post will begin to share the results of that work and showcase technologists pushing the industry forward...
  • FermiNet: Quantum Physics and Chemistry from First Principles
    In an article recently published in Physical Review Research, we [DeepMind] show how deep learning can help solve the fundamental equations of quantum mechanics for real-world systems. Not only is this an important fundamental scientific question, but it also could lead to practical uses in the future, allowing researchers to prototype new materials and chemical syntheses in silico before trying to make them in the lab. Today we are also releasing the code from this study so that the computational physics and chemistry communities can build on our work and apply it...
  • Companies Are Rushing to Use AI—but Few See a Payoff
    A study finds that only 11 percent of firms that have deployed artificial intelligence are reaping a “sizable” return on their investments...The report, from Boston Consulting Group and MIT Sloan Management Review, is one of the first to explore whether companies are benefiting from AI. Its sobering finding offers a dose of realism amid recent AI hype. The report also offers some clues as to why some companies are profiting from AI and others appear to be pouring money down the drain...

A Message From This Week's Sponsor

Machine Learning for Everyone with Eric Siegel

Business-side leadership every machine learning practitioner needs to master
There are many how-to machine learning courses for hands-on techies, but there are practically none that also serve business leaders – a striking omission, since success with ML relies on a very particular business leadership practice just as much as it relies on adept number crunching. This end-to-end, three-course series on Coursera will empower you to launch machine learning. Accessible to business-level learners and yet vital to techies as well, it covers both the state-of-the-art techniques and the business-side leadership best practices.

Data Science Articles & Videos

  • A deep active learning system for species identification and counting in camera trap images
    Motion-activated cameras, also known as camera traps, are a critical tool for population surveys, as they are cheap and non-intrusive. However, extracting useful information from camera trap images is a cumbersome process: a typical camera trap survey may produce millions of images that require slow, expensive manual review...In this paper, we focus not on automating the labeling of camera trap images, but on accelerating this process. We combine the power of machine intelligence and human intelligence to build a scalable, fast, and accurate active learning system to minimize the manual work required to identify and count animals in camera trap images. Our proposed scheme can match the state of the art accuracy on a 3.2 million image dataset with as few as 14,100 manual labels, which means decreasing manual labeling effort by over 99.5%...
  • Mismatches between Traditional Optimization Analyses and Modern Deep Learning
    You may remember our previous blog post showing that it is possible to do state-of-the-art deep learning with a learning rate that increases exponentially during training. It was meant to be a dramatic illustration that what we learned in optimization classes and books isn’t always a good fit for modern deep learning, specifically, normalized nets, which is our term for nets that use any one of the popular normalization schemes, e.g. BatchNorm (BN), GroupNorm (GN), WeightNorm (WN). Today’s post (based upon our paper with Kaifeng Lyu at NeurIPS20) identifies other surprising incompatibilities between normalized nets and traditional analyses. We hope this will change the way you teach and think about deep learning!...
  • How to put machine learning models into production
    The goal of building a machine learning model is to solve a problem, and a machine learning model can only do so when it is in production and actively in use by consumers. As such, model deployment is as important as model building...In this article, I’m going to talk about some of the practices and methods that will help get machine learning models in production. I’ll discuss different techniques and use cases, as well as the pros and cons of each method...
  • Awesome Data Engineering
    Learning path and resources to become a data engineer...Best books, best courses and best articles on each subject...How to read it: First, not every subject needs to be mastered. Look for the "essentiality" measure. Then, each resource stands alone for its measurements. "coverage" and "depth" are relative to the subject of the specific resource, not the entire category...
  • Neural Databases
    In recent years, neural networks have shown impressive performance gains on long-standing AI problems, and in particular, answering queries from natural language text. These advances raise the question of whether they can be extended to a point where we can relax the fundamental assumption of database management, namely, that our data is represented as fields of a pre-defined schema...This paper presents a first step in answering that question. We describe NeuralDB, a database system with no pre-defined schema, in which updates and queries are given in natural language. We develop query processing techniques that build on the primitives offered by the state of the art Natural Language Processing methods...
  • Translating lost languages using machine learning
    Recent research suggests that most languages that have ever existed are no longer spoken. Dozens of these dead languages are also considered to be lost, or “undeciphered” — that is, we don’t know enough about their grammar, vocabulary, or syntax to be able to actually understand their texts...researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) recently made a major development in this area: a new system that has been shown to be able to automatically decipher a lost language, without needing advanced knowledge of its relation to other languages. They also showed that their system can itself determine relationships between languages, and they used it to corroborate recent scholarship suggesting that the language of Iberian is not actually related to Basque...
  • 2020 Summer Intern Projects at StitchFix
    Thank you to all the 2020 summer interns that worked with the Stitch Fix Algorithms team. For the first time, the internship program was fully remote, but that hasn’t stopped them from working on impactful projects. This post summarizes some of the projects they worked on...
  • Deep Learning in the Era of Edge Computing: Challenges and Opportunities
    Although the Internet is the backbone of edge computing, its true value lies at the intersection of gathering data from sensors and extracting meaningful information from the sensor data. We envision that in the near future, the majority of edge devices will be equipped with machine intelligence powered by deep learning. However, deep learning-based approaches require a large volume of high-quality data to train and are very expensive in terms of computation, memory, and power consumption. In this chapter, we describe eight research challenges and promising opportunities at the intersection of computer systems, networking, and machine learning. Solving those challenges will enable resource-limited edge devices to leverage the amazing capability of deep learning...
  • A radical new technique lets AI learn with practically no data
    “Less than one”-shot learning can teach a model to identify more objects than the number of examples it is trained on...a new paper from the University of Waterloo in Ontario suggests that AI models should also be able to do this—a process the researchers call “less than one”-shot, or LO-shot, learning. In other words, an AI model should be able to accurately recognize more objects than the number of examples it was trained on. That could be a big deal for a field that has grown increasingly expensive and inaccessible as the data sets used become ever larger...


Quick Question For You: Do you want a Data Science job?

After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.
The course is broken down into three guides:
  1. Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore)

  2. Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate

  3. Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!
Click here to learn more
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!


  • Data Scientist - Associated Press (AP) - New York, NY

    The Associated Press is the essential global news network, delivering fast, unbiased news from every corner of the world to all media platforms and formats. Founded in 1846, AP today is the largest and most trusted source of independent news and information. On any given day more than half the world's population sees news from AP.
    The Associated Press seeks a Data Science Manager based in New York, NY. The Data Science Manager will help manage data analysis, data science and data engineering solutions supporting business intelligence, news search, content enrichment and metadata services. We are a small focused team within Metadata Technology working closely with various departments and functions across the organization to design and build solutions with data, analytics and machine learning methods...
        Want to post a job here? Email us for details >>

Training & Resources

  • A Programmer’s Intuition for Matrix Multiplication
    What does matrix multiplication mean?...1) Matrix multiplication scales/rotates/skews a geometric plane...2) Matrix multiplication composes linear operations...We need another intuition for what's happening...I'll put a programmer's viewpoint into the ring: 3) Matrix multiplication is about information flow, converting data to code and back...
  • AdaBelief Optimizer: fast as Adam, generalizes as good as SGD, and sufficiently stable to train GANs
    Most popular optimizers for deep learning can be broadly categorized as adaptive methods (e.g. Adam) and accelerated schemes (e.g. stochastic gradient descent (SGD) with momentum). For many models such as convolutional neural networks (CNNs), adaptive methods typically converge faster but generalize worse compared to SGD; for complex settings such as generative adversarial networks (GANs), adaptive methods are typically the default because of their stability. We propose AdaBelief to simultaneously achieve three goals: fast convergence as in adaptive methods, good generalization as in SGD, and training stability...
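
    The second intuition quoted in the matrix multiplication item above ("matrix multiplication composes linear operations") can be checked with a few lines of code. This is only a rough sketch, not taken from the linked article: the helper names `matmul` and `apply_matrix` and the two example matrices are made up for illustration, and plain Python lists are used so nothing needs to be installed.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(cols)]
        for i in range(len(a))
    ]


def apply_matrix(m, v):
    """Apply matrix m to a vector v given as a flat list."""
    return [sum(row[t] * v[t] for t in range(len(v))) for row in m]


# Hypothetical example operations on the 2D plane.
scale = [[2, 0], [0, 3]]       # stretch x by 2 and y by 3
rotate90 = [[0, -1], [1, 0]]   # rotate 90 degrees counter-clockwise

point = [1, 1]

# Apply the operations one at a time: scale first, then rotate.
step_by_step = apply_matrix(rotate90, apply_matrix(scale, point))

# Or fold both operations into a single matrix and apply it once.
combined = matmul(rotate90, scale)
all_at_once = apply_matrix(combined, point)

print(step_by_step, all_at_once)
```

    Both routes give the same point, which is the sense in which the product `rotate90 * scale` "is" the operation "scale, then rotate". Note the order: the operation applied first sits on the right of the product.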


  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...
    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page


    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
