Data Science Weekly Newsletter

Issue #377

Feb 11 2021

Editor Picks
  • Ditching Excel for Python – Lessons Learned from a Legacy Industry
    Admittedly, the following observations do come from a very niche industry. But I believe that the broader trends discussed here will also apply to other legacy companies and sectors...In this post, I explore the following topics: Section 1 – An introduction to reinsurance, Section 2 – The need for technological disruption in the reinsurance industry, Sections 3 to 5 – How Python is replacing Excel, Section 6 – Arguments against adopting new technologies, Section 7 – The evolution of the reinsurance industry...If you are curious about the transition from Excel to Python but aren’t interested in reinsurance, skip to section 3...Without further ado, let’s get into it...
  • Papers with Code Newsletter #4
    Welcome to the 4th issue of the Papers with Code newsletter. In this edition, we cover: the latest progress in self-attention vision models 🏞, an efficient CapsNet architecture based on self-attention routing 👁, a unified framework for vision-and-language learning, a dataset feature release highlighting some new datasets added to Papers with Code 🗂, ...and much more!...

A Message from this week's Sponsor:


Don’t Miss Your Chance to Jumpstart Your Data Career

TDI Spring 2021 Fellowship Applications Close TOMORROW

Complete your application today and you could be on your way to becoming a leading data scientist. It’s as easy as 1, 2, 3:
  1. Attend part-time or full-time and master the in-demand skills employers are craving
  2. Work with our career services team and land a job with one of our hiring partners
  3. Take over the data world

Apply today and don’t pay a dime until you get a job. Applications close February 12.
Apply Now.


Data Science Articles & Videos

  • How Can We Fix the Data Science Talent Shortage?
    Data science has always had a high barrier to entry, with 42% of data science roles requiring a master’s degree or higher, according to a recent Burning Glass report. As automation takes over the menial parts of a data scientist’s job (80% of a data scientist’s time is spent cleaning data), the bar for landing a data science job is both higher and lower, said Lane—a trend somewhat analogous to America’s shrinking middle class...Companies seek to fill lower-paying data analyst roles that require fewer years of experience, while also hiring highly skilled data scientists with domain expertise who can solve a specific business problem—like determining risk factors for certain diseases or increasing voter turnout...
  • Exploring hyperparameter meta-loss landscapes with Jax
    In this post, we’ll walk through an example showing how extraordinarily complex meta-loss landscapes can emerge from a relatively simple setting, and how, as a result, gradients of these loss landscapes become a lot less useful...We’ll do this in a relatively new machine learning library: Jax. Jax is amazing. I’ve been using it more and more for my research and have noticed it gaining traction at Google. Despite this, very few people outside have even heard of it, let alone use it. In this post I also wanted to show some features that make doing this type of exploration easy...
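    To give a flavor of the post's setup, here is our own toy sketch (not the author's code): Jax can differentiate *through* an unrolled inner training loop, producing a "meta-gradient" of the final loss with respect to a hyperparameter such as the learning rate.

    ```python
    import jax
    import jax.numpy as jnp

    def inner_loss(w, x, y):
        # Simple 1-D least-squares fit: learn the slope w.
        return jnp.mean((w * x - y) ** 2)

    def meta_loss(log_lr, x, y, steps=5):
        lr = jnp.exp(log_lr)  # parameterize in log-space so lr stays positive
        w = 0.0
        for _ in range(steps):  # unrolled SGD; Jax traces every step
            w = w - lr * jax.grad(inner_loss)(w, x, y)
        return inner_loss(w, x, y)

    x = jnp.linspace(-1.0, 1.0, 32)
    y = 3.0 * x  # true slope is 3
    # Gradient of the post-training loss with respect to log(learning rate)
    meta_grad = jax.grad(meta_loss)(0.0, x, y)
    ```

    Even this quadratic toy problem has a meta-loss that depends on the learning rate in a nonlinear way; the post explores how much rougher these landscapes get in realistic settings.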
  • Machine Learning, Kolmogorov Complexity, and Squishy Bunnies
    Is there any reason to apply Machine Learning to problems we already have working solutions for? Tasks such as physics simulation, where the rules and equations governing the task are already well known and explored? Well it turns out in many cases there are good reasons to do this - reasons related to many interesting concepts in computer science such as the trade-off between memorization and computation, and a concept called Kolmogorov complexity...
  • The Doctor Will Sniff You Now
    That’s how Alexei Koulakov, a researcher at Cold Spring Harbor Laboratory, who studies how the human olfactory system works, envisions one possible future of our healthcare. A physicist turned neuroscientist, Koulakov is working to understand how humans perceive odors and to classify millions of volatile molecules by their “smellable” properties. He plans to catalogue the existing smells into a comprehensive artificial intelligence network. Once built, Deep Nose will be able to identify the odors of a person or any other olfactory bouquet of interest—for medical or other reasons. “It will be a chip that can diagnose or identify you,” Koulakov says. Scent uniquely identifies a person or merchandise, so Deep Nose can also help at the border patrol, sniffing travelers, cargo, or explosives...
  • Machine Learning Library Built by a 9th Grader: SeaLion
    Recently, during the first semester of my 9th grade, I worked on a library called SeaLion (from scratch). It is for machine learning algorithms, from basic linear regression up to neural networks...SeaLion is an extremely comprehensive machine learning library. Rather than assuming you know the theory behind the algorithms, it guides you every step of the way. It covers regression, unsupervised clustering, Bayesian models, dimensionality reduction, neural networks, and more. There are code examples that explain each function in SeaLion, how to use it, and most importantly why and when to use it...
  • A better way to build ML — why you should be using Active Learning
    Data labelling is often the biggest bottleneck in machine learning...Active learning lets you train machine learning models with much less labelled data...In active learning, you first provide a small number of labelled examples. The model is trained on this "seed" dataset. Then, the model "asks questions" by selecting the unlabeled data points it is unsure about, so the human can "answer" the questions by providing labels for those points. The model updates again and the process is repeated until the performance is good enough. By having the human iteratively teach the model, it's possible to make a better model, in less time, with much less labelled data...
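    The seed-train-query-label loop described above can be sketched in a few lines. This is our own minimal illustration (uncertainty sampling with a toy 1-D logistic model), not code from the article:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def fit(x, y, lr=0.5, steps=200):
        """Fit a 1-D logistic regression by gradient descent."""
        w, b = 0.0, 0.0
        for _ in range(steps):
            p = 1 / (1 + np.exp(-(w * x + b)))
            w -= lr * np.mean((p - y) * x)
            b -= lr * np.mean(p - y)
        return w, b

    def predict_proba(w, b, x):
        return 1 / (1 + np.exp(-(w * x + b)))

    # Pool of unlabelled points; true labels stay hidden until the model "asks".
    pool_x = rng.uniform(-3, 3, 500)
    true_y = (pool_x > 0).astype(float)

    labelled = list(rng.choice(len(pool_x), size=5, replace=False))  # seed set
    for _ in range(10):                          # query budget
        w, b = fit(pool_x[labelled], true_y[labelled])
        p = predict_proba(w, b, pool_x)
        p[labelled] = 1.0                        # never re-query a labelled point
        query = int(np.argmin(np.abs(p - 0.5)))  # most uncertain point
        labelled.append(query)                   # the "human" answers with a label

    w, b = fit(pool_x[labelled], true_y[labelled])
    accuracy = np.mean((predict_proba(w, b, pool_x) > 0.5) == (true_y > 0.5))
    ```

    Querying the point closest to p = 0.5 is the simplest acquisition strategy; the article's broader point holds for any strategy that prioritizes informative examples over random labelling.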
  • Gradient Descent Models Are Kernel Machines (Deep Learning)
    This paper shows that models which result from gradient descent training (e.g., deep neural nets) can be expressed as a weighted sum of similarity functions (kernels) which measure the similarity of a given instance to the examples used in training. The kernels are defined by the inner product of model gradients in the parameter space, integrated over the descent (learning) path...This result makes it very clear that without regularity imposed by the ground truth mechanism which generates the actual data (e.g., some natural process), a neural net is unlikely to perform well on an example which deviates strongly (as defined by the kernel) from all training examples...
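    As we read the abstract, the paper's central identity can be sketched (in our notation, simplified) as:

    ```latex
    % A model f trained by continuous-time gradient descent along the weight
    % path w(t) acts on a new input x as a kernel machine over the training set:
    \[
      f_{w}(x) \;=\; \sum_{i} a_i \, K(x, x_i) \;+\; b,
    \qquad
      K(x, x') \;=\; \int \nabla_{w} f_{w(t)}(x) \cdot \nabla_{w} f_{w(t)}(x') \, dt,
    \]
    % where the "path kernel" K integrates the inner product of model gradients
    % along the optimization trajectory, and the coefficients a_i depend on the
    % loss derivatives at each training example x_i.
    ```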
  • Is It Luck Or Skill: Establishing Role Of Skill In Mutual Fund Management And Fantasy Sports
    The emergence of Online Fantasy Sports Platforms (OFSP) has presented a challenge for regulatory bodies across the globe: do they represent a game of skill or a game of chance? A game of skill is a game where the outcome is determined mainly by the predominance of mental or physical skill, rather than chance. Gambling, or a game of chance, is where outcomes are driven entirely by luck and skill plays no role. In this work, we present a novel data-driven test that helps address this question. In particular, the failure of the test leads to the conclusion that the outcomes are based on the predominance of skill, and not based on luck...
  • Tools for building robust, state-of-the-art machine learning models
    In this episode of the Data Exchange [Podcast] I speak with Michael Mahoney, a researcher at UC Berkeley’s RISELab, ICSI, and Department of Statistics. Mike and his collaborators were recently awarded one of the best paper awards at NeurIPS 2020, one of the leading research conferences in machine learning...



Quick Question For You: Do you want a Data Science job?

After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.

The course is broken down into three guides:
  1. Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore)

  2. Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate

  3. Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!
Click here to learn more ...

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!



Jobs

  • Data Scientist - Apple Pay Analytics - NYC

    You will play a key role improving the Apple Pay product experience. As a member of the analytics team you will be supporting a product function. You will partner with business owners, understand goals, craft KPIs and measure ongoing performance. You will initially engage with the product and engineering teams in ensuring that we have the appropriate instrumentation in place to deliver on these metrics. You will subsequently use advanced statistical, ML and analytical techniques to analyze product performance and identify key insights that inform product improvements and business strategy. The role requires a high degree of independence, ownership and collaboration working cross functionally across all levels of a highly matrixed organization...

        Want to post a job here? Email us for details >>


Training & Resources

  • A Complete Machine Learning Project From Scratch: Setting Up
    In this first of a series of posts, I will be describing how to build a machine learning-based fake news detector from scratch. That means I will literally construct a system that learns how to discern reality from lies (reasonably well), using nothing but raw data. And our project will take us all the way from initial setup to deployed solution...
  • Deep Learning Theory [ Reddit Discussion ]
    I wonder what some recent advancements in the theory of deep learning are. I am curious about some of the more general and holistic theories. E.g., I know about the lottery ticket hypothesis (overparametrized networks contain many subnetworks, so there is a high probability that at least one will be near-optimal) or the double descent hypothesis (overparametrized networks generalize better). What are some other interesting hypotheses and theories related to deep learning that try to explain how it all works in the end? I am especially interested in the most recent work, e.g., was there anything that gained traction in the community at NeurIPS?...
  • Patterns, Predictions, and Actions: A Story About Machine Learning [ Free Book ]

    This is a book for all students in the sprawling field of machine learning. The material we cover supports a one semester graduate introduction to machine learning. We invite readers from all backgrounds; some experience with probability, calculus, and linear algebra suffices. In its conception, our book is both an old take on something new and a new take on something old. Looking at it one way, we return to the roots with our emphasis on pattern classification. We believe that the practice of machine learning today is surprisingly similar to pattern classification of the 1960s, with a few notable innovations from more recent decades...



  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits

    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

    P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them on board :) All the best, Hannah & Sebastian
Sign up to receive the Data Science Weekly Newsletter every Thursday

Easy to unsubscribe. No spam — we keep your email safe and do not share it.