Receive the Data Science Weekly Newsletter every Thursday

Easy to unsubscribe at any time. Your e-mail address is safe.

Data Science Weekly Newsletter
July 15, 2021

Editor's Picks

  • AlphaFold: a solution to a 50-year-old grand challenge in biology
    Proteins are essential to life, supporting practically all its functions. They are large complex molecules, made up of chains of amino acids, and what a protein does largely depends on its unique 3D structure. Figuring out what shapes proteins fold into is known as the “protein folding problem”, and has stood as a grand challenge in biology for the past 50 years. In a major scientific advance, the latest version of our AI system AlphaFold has been recognised as a solution to this grand challenge by the organisers of the biennial Critical Assessment of protein Structure Prediction (CASP). This breakthrough demonstrates the impact AI can have on scientific discovery and its potential to dramatically accelerate progress in some of the most fundamental fields that explain and shape our world...
  • PyTorch Developer Day 2020 [Video]
    At PyTorch Developer Day 2020, we'll look at technical talks, core updates, and project deep dives. The video will cover a variety of topics, including updates to the core framework and new tools and libraries to support development across a number of domains. You'll also hear from the community on the latest research powered by PyTorch...
  • rjs: R in JavaScript
    Introducing R in JavaScript, a way to insert R code directly into websites, powered by OpenCPU...

A Message From This Week's Sponsor

Are You Asking Your Data the Right Questions?

Before you can get answers from your data, you need to know which questions to ask. At CMU’s Tepper School of Business, we help you build your analytical expertise and business acumen so that you can take your insights to the next level.
Download a Program Brochure

Data Science Articles & Videos

  • Named Tensor Notation
    Most papers about neural networks use the notation of vectors and matrices from applied linear algebra. This notation is very well-suited to talking about vector spaces, but less well-suited to talking about neural networks...In the realm of mathematical notation, then, we want two things: first, the flexibility of working with multidimensional arrays, and second, the perspicuity of identifying indices by name instead of by position. This document describes our proposal to do both...
  • Data versus Science: Contesting the Soul of Data-Science
    The post below...expresses my firm belief that the current data-fitting direction taken by “Data Science” is temporary (read my lips!), that the future of “Data Science” lies in causal data interpretation and that we should prepare ourselves for the backlash swing...
  • Precise spatial representations in the hippocampus of a food‑caching bird
    The hippocampus is an ancient neural circuit required for the formation of episodic memories. In mammals, this ability is thought to depend on well-documented patterns of neural activity, including place cells and sharp wave ripples...Our findings suggest a striking conservation of hippocampal mechanisms across distant vertebrates, in spite of vastly divergent anatomy and cytoarchitecture. At the same time, these results demonstrate that the exact implementation of such common mechanisms may conform to the unique ethological needs of different species...
  • Designing Data Science Tools at Spotify
    Spotify operates at a massive scale: We have millions of listeners whose activities generate huge amounts of raw data. Raw data by itself is not that helpful though; we need to be able to process, manage, and distill it into insights that can inform new features or improvements to the experience. And to do that, we need usable, well-designed tools that ensure these insights can be easily understood...
  • Bodywork
    Bodywork is a simple framework for machine learning engineers to run model-training workloads and deploy model-scoring services on Kubernetes...It automates the repetitive tasks that most machine learning engineers think of as DevOps, allowing them to focus their time on what they do best - machine learning...Bodywork uses Kubernetes for running machine learning workloads and services, because we believe that Kubernetes comes shipped with all the resources required for building an effective Machine Learning Operations (MLOps) platform...
  • Airbnb-quality data for all
    How to build and maintain high quality data without raising billions...Airbnb has always been a data driven company...Back in 2015, they were laying the foundation to ensure that data science was democratized at Airbnb. Meanwhile, they have grown to more than 6,000 people and have raised more than $6b of venture funding...Fortunately, companies no longer need to reinvent the wheel or make massive investments to improve and maintain high quality data. New startups...are building the technology needed to monitor, triage and root cause data quality issues efficiently at scale...
  • Google's AI can keep Loon balloons flying for over 300 days in a row
    Huge stratospheric balloons that act as floating cell towers in remote areas can stay in the air for hundreds of days thanks to an artificially intelligent pilot created by Google and Loon...Keeping these huge balloons in a fixed position is difficult as they can get blown off course. Now, researchers at Loon and Google have joined forces to create an AI controller that can counter the harsh winds of the stratosphere by releasing air to descend or adding it to ascend, riding atmospheric currents in the desired direction...
  • Language Interpretability Tool (LIT)
    The Language Interpretability Tool (LIT) is a visual, interactive model-understanding tool for NLP models...LIT is built to answer questions such as: a) What kind of examples does my model perform poorly on?, b) Why did my model make this prediction? Can this prediction be attributed to adversarial behavior, or to undesirable priors in the training set?, c) Does my model behave consistently if I change things like textual style, verb tense, or pronoun gender?...


Quick Question For You: Do you want a Data Science job?

After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.
The course is broken down into three guides:
  1. Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore)

  2. Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate

  3. Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!
Click here to learn more
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!


  • Data Scientist - Apple Pay Analytics - NYC

    You will play a key role improving the Apple Pay product experience. As a member of the analytics team you will be supporting a product function. You will partner with business owners, understand goals, craft KPIs and measure ongoing performance. You will initially engage with the product and engineering teams in ensuring that we have the appropriate instrumentation in place to deliver on these metrics. You will subsequently use advanced statistical, ML and analytical techniques to analyze product performance and identify key insights that inform product improvements and business strategy. The role requires a high degree of independence, ownership and collaboration working cross functionally across all levels of a highly matrixed organization...
        Want to post a job here? Email us for details >>

Training & Resources

  • Alibi Detect
    Alibi Detect is an open source Python library focused on outlier, adversarial and drift detection. The package aims to cover both online and offline detectors for tabular data, text, images and time series. The outlier detection methods should allow the user to identify global, contextual and collective outliers...
  • Why you shouldn't get your Ph.D. (Reddit Discussion w/ 254 comments)
    I'm a fifth year Ph.D. student studying Machine Learning. I just want to share some of the observations I've made throughout my "journey". Maybe my experience differs completely from others, but after talking with my colleagues about these things, I don't think I am unique in how I feel about getting a Ph.D...
  • Why you should get your Ph.D. (Reddit Discussion w/ 108 comments)
    I have been hearing some negativity about PhDs recently, much of it justified I am sure. However, as someone who has largely enjoyed their PhD in reinforcement learning, I thought I might explain some of the great things that can come from a PhD and give my advice on things to consider. My advice is not scientific and I am sure many others have written better advice you should also read...


  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...
    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page


    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
