Receive the Data Science Weekly Newsletter every Thursday

Easy to unsubscribe at any time. Your e-mail address is safe.

Data Science Weekly Newsletter
Issue 328
March 5, 2020

Editor's Picks

  • Why your brain is not a computer
    For decades it has been the dominant metaphor in neuroscience. But could this idea have been leading us astray all along?...Every day, we hear about new discoveries that shed light on how brains work, along with the promise – or threat – of new technology that will enable us to do such far-fetched things...And yet there is a growing conviction among some neuroscientists that our future path is not clear. It is hard to see where we should be going, apart from simply collecting more data or counting on the latest exciting experimental approach. As the German neuroscientist Olaf Sporns has put it: “Neuroscience still largely lacks organising principles or a theoretical framework for converting brain data into fundamental knowledge and understanding.”...
  • A Primer in BERTology: What we know about how BERT works
    Transformer-based models are now widely used in NLP, but we still do not understand a lot about their inner workings. This paper describes what is known to date about the famous BERT model (Devlin et al. 2019), synthesizing over 40 analysis studies. We also provide an overview of the proposed modifications to the model and its training regime. We then outline the directions for further research...
  • Machine Learning for Everyone
    In simple words. With real-world examples. Yes, again

    If you ever tried to read articles about machine learning on the Internet, most likely you stumbled upon two types of them: thick academic trilogies filled with theorems...or fishy fairytales about artificial intelligence, data-science magic, and jobs of the future...I decided to write a post I’ve been wishing existed for a long time. A simple introduction for those who always wanted to understand machine learning. Only real-world problems, practical solutions, simple language, and no high-level theorems. One and for everyone. Whether you are a programmer or a manager...



A Message From This Week's Sponsor



Help meet the growing demand in data science.

The Data Science Career Track is a 6-month, self-paced online course that will pair you with your own industry expert mentor as you learn skills like data wrangling and data storytelling, and build your unique portfolio to stand out in the job market.
Land your dream job as a data scientist within six months of graduating or the course is free.


Data Science Articles & Videos

  • Transformers are Graph Neural Networks
    Engineer friends often ask me: Graph Deep Learning sounds great, but are there any big commercial success stories? Is it being deployed in practical applications?...Besides the obvious ones–recommendation systems at Pinterest, Alibaba and Twitter–a slightly nuanced success story is the Transformer architecture...Through this post, I want to establish links between Graph Neural Networks (GNNs) and Transformers. I’ll talk about the intuitions behind model architectures in the NLP and GNN communities, make connections using equations and figures, and discuss how we could work together to drive progress....
  • How Hard Will The Robots Make Us Work?
    In warehouses, call centers, and other sectors, intelligent machines are managing humans, and they’re making work more stressful, grueling, and dangerous....
  • How I Did it - From Psych graduate to VP of Data Science at Lazada
    Many people are curious about how I got into the field of data science with a Psychology degree. They’re curious how I made the first step...Then, they find out I was also VP of Data Science at Lazada and they’re interested in how that happened...In a world where almost every data science leader has a technical Ph.D. (or two), I’m unusual. A statistical anomaly. I guess this is why people are curious about my story...
  • Deep learning-based model for detecting 2019 novel coronavirus pneumonia on high-resolution computed tomography: a prospective study
    Computed tomography (CT) is the preferred imaging method for diagnosing 2019 novel coronavirus (COVID-19) pneumonia...Our research aimed to construct a system based on deep learning for detecting COVID-19 pneumonia on high-resolution CT...With the assistance of the model, the reading time of radiologists was greatly decreased by 65%...The deep learning model showed a comparable performance with expert radiologists, and greatly improved the efficiency of radiologists in clinical practice. It holds great potential to relieve the pressure of frontline radiologists, improve early diagnosis, isolation and treatment, and thus contribute to the control of the epidemic....
  • Cultivating Algos - How we grow data science at Stitch Fix
    In this interactive visualization, we provide an articulation of our environment at Stitch Fix – the organizational structure, roles, and processes that define our unique way of working – and how it is different from the way data science operates at other companies. While some of this was intentionally designed, other parts evolved over time or became clear only in hindsight. Here’s a closer look at how we run our data science....
  • A critique of pure learning and what artificial neural networks can learn from animal brains
    Artificial neural networks (ANNs) have undergone a revolution, catalyzed by better supervised learning algorithms. However, in stark contrast to young animals (including humans), training such networks requires enormous numbers of labeled examples, leading to the belief that animals must rely instead mainly on unsupervised learning. Here we argue that most animal behavior is not the result of clever learning algorithms—supervised or unsupervised—but is encoded in the genome. Specifically, animals are born with highly structured brain connectivity, which enables them to learn very rapidly. Because the wiring diagram is far too complex to be specified explicitly in the genome, it must be compressed through a “genomic bottleneck”. The genomic bottleneck suggests a path toward ANNs capable of rapid learning....
  • How We Improved Data Discovery for Data Scientists at Spotify
    At Spotify, we believe strongly in data-informed decision making...To enable Spotifiers to make faster, smarter decisions, we’ve developed a suite of internal products to accelerate the production and consumption of insights. One of these products is Lexikon, a library of data and insights that help employees find and understand the data and knowledge generated by members of our insights community...In this blog post, we want to share the story of how we iterated on Lexikon to better support data discovery.
  • The Illustrated Self-Supervised Learning
    Yann LeCun...introduced the “cake analogy” to illustrate the importance of self-supervised learning...we have seen the impact of self-supervised learning in the Natural Language Processing field where recent developments (Word2Vec, GloVe, ELMo, BERT) have embraced self-supervision and achieved state of the art results...Curious to know how self-supervised learning has been applied in the computer vision field, I read up on existing literature on self-supervised learning applied to computer vision through a recent survey paper by Jing et al...This post is my attempt to provide an intuitive visual summary of the patterns of problem formulation in self-supervised learning...



Guide



Guide to Open-Source Tools and Libraries

The world of open-source tools and libraries is vast and difficult to navigate. With hundreds of thousands of packages available for unique purposes, how do you know where to look for what you need? Our guide to open-source tools will help any newcomer get started, and if you’re a veteran data scientist, you may discover some helpful tools you didn’t know about. Download the Guide to Open Source to start exploring.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!



Jobs

  • Data Scientist - Two Sigma - NYC

    Two Sigma is looking for Data Scientists from a variety of backgrounds to help propel and enhance its data-driven investment initiatives. As a Two Sigma Data Scientist, you will explore a breadth of challenges: identifying timely and unique data sets, diving deep into a diverse set of data domains, visualizing and exploring underlying data drivers, and developing data set features and forecasts...

        Want to post a job here? Email us for details >> team@datascienceweekly.org


Training & Resources

  • The fastai book - draft
    These draft notebooks cover an introduction to deep learning, fastai, and PyTorch...These notebooks will be used for a course we're teaching in San Francisco from March 2020, and will be available as a MOOC from around July 2020...
  • Pattern Recognition and Machine Learning algorithms implemented in Python
    Python code implementing algorithms described in Bishop's book "Pattern Recognition and Machine Learning"...[PRML Book description from Amazon - "The book presents approximate inference algorithms that permit fast approximate answers in situations where exact answers are not feasible. It uses graphical models to describe probability distributions when no other books apply graphical models to machine learning..."]...
  • NLP Paper Summaries
    This [GitHub] repository contains a list of NLP paper summaries intended to make NLP techniques and topics more approachable and accessible. Work in progress....


Books


  • Data Science in Production: Building Scalable Model Pipelines with Python
    This book provides a hands-on approach to scaling up Python code to work in distributed environments in order to build robust pipelines. Readers will learn how to set up machine learning models as web endpoints, serverless functions, and streaming pipelines using multiple cloud environments. It is intended for analytics practitioners with hands-on experience with Python libraries such as Pandas and scikit-learn, and will focus on scaling up prototype models to production....
    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

     

    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
