Data Science Weekly Newsletter - Issue 405

Issue #373

Jan 14 2021

Editor Picks
  • What machine learning can do for developmental biology
    Developmental biology has grown into a data intensive science with the development of high-throughput imaging and multi-omics approaches. Machine learning is a versatile set of techniques that can help make sense of these large datasets with minimal human intervention, through tasks such as image segmentation, super-resolution microscopy and cell clustering. In this Spotlight, I introduce the key concepts, advantages and limitations of machine learning, and discuss how these methods are being applied to problems in developmental biology...
  • Deep learning-enabled medical computer vision
    Here we survey recent progress in the development of modern computer vision techniques—powered by deep learning—for medical applications, focusing on medical imaging, medical video, and clinical deployment...then...we discuss several example medical imaging applications that stand to benefit—including cardiology, pathology, dermatology, ophthalmology–and propose new avenues for continued work. We then expand into general medical video, highlighting ways in which clinical workflows can integrate computer vision to enhance care. Finally, we discuss the challenges and hurdles required for real-world clinical deployment of these technologies...
  • The White House Launches the National Artificial Intelligence Initiative Office
    Today the White House Office of Science and Technology Policy (OSTP) established the National Artificial Intelligence Initiative Office...After recognizing the strategic importance of AI to the Nation’s future economy and security, the...Administration issued the first ever national AI strategy, committed to doubling AI research investment, established the first-ever national AI research institutes, released the world’s first AI regulatory guidance, forged new international AI alliances, and established guidance for Federal use of AI...

A Message from this week's Sponsor:


Do You Have What it Takes to Become a Leading Data Scientist?

You Could Land Your Dream Data Job with TDI

Prove you have the skills needed to take on the future of data with TDI.

Our expert instructors will help you master the in-demand skills and programs you need to succeed in data science.

Work with live code, real-world data sets, and the best darn instructors this side of a spreadsheet to create a capstone project that will wow potential employers. 

Apply early to increase your chances of earning a scholarship.

Applications are due February 12. Apply Now.


Data Science Articles & Videos

  • We Don't Need Data Scientists, We Need Data Engineers
    There are 70% more open roles at companies in data engineering as compared to data science. As we train the next generation of data and machine learning practitioners, let’s place more emphasis on engineering skills...
  • How To Become a Data Engineer
    The demand for data engineers is growing rapidly: it reportedly increased by 45% in 2019. The median salary for Data Engineers in SF Bay Area is around $160k. So the question is: how to become a data engineer?...
  • How we rebuilt the Walmart Autocomplete Backend
    With this re-architecture, we were able to scale the application without compromising on latency or performance; eliminating the reverse proxy led to fewer network hops and services...We were also able to separate the data rendering service from the data generation/insertion service, making it easier to introduce new features without memory implications and with complete separation of responsibilities...
  • Convolutional Neural Nets: Foundations, Computations, and New Applications
    We review mathematical foundations of convolutional neural nets (CNNs) with the goals of: i) highlighting connections with techniques from statistics, signal processing, linear algebra, differential equations, and optimization, ii) demystifying underlying computations, and iii) identifying new types of applications...we show how to apply CNNs to new types of applications such as optimal control, flow cytometry, multivariate process monitoring, and molecular simulations...
  • Meta Pseudo Labels
    We present Meta Pseudo Labels, a semi-supervised learning method that achieves a new state-of-the-art top-1 accuracy of 90.2% on ImageNet, which is 1.6% better than the existing state-of-the-art. Like Pseudo Labels, Meta Pseudo Labels has a teacher network to generate pseudo labels on unlabeled data to teach a student network. However, unlike Pseudo Labels where the teacher is fixed, the teacher in Meta Pseudo Labels is constantly adapted by the feedback of the student's performance on the labeled dataset. As a result, the teacher generates better pseudo labels to teach the student...
  • The EfficientDet Architecture in PyTorch
    This blog post is a direct continuation of my previous blog post explaining EfficientDets. In my previous post, we looked at and understood what’s inside an EfficientDet and also read about the various components such as BiFPN and Compound Scaling that make an EfficientDet network so powerful...Today, our focus will be to build on top of that knowledge and showcase how to implement the network using PyTorch step-by-step...
  • The neural network of the Stockfish chess engine
    An important recent change to Stockfish was to introduce a neural network to evaluate the positions in the search tree, instead of just relying on hardcoded heuristics. It’s still much simpler than Leela Chess’s neural network, and only slows down Stockfish to exploring 50 million positions per second...The real cleverness of Stockfish’s neural network is that it’s an efficiently-updatable neural network (NNUE). Specifically, it’s a simple feedforward network with...
  • Ensemble Learning: Bagging & Boosting
    How to combine weak learners to build a stronger learner to reduce bias and variance in your ML model...The bias and variance tradeoff is one of the key concerns when working with machine learning algorithms. Fortunately, there are two Ensemble Learning techniques that machine learning practitioners can use to tackle the bias and variance tradeoff: bagging and boosting. So, in this blog we are going to explain how bagging and boosting work, what their components are and how you can implement them in your ML problem...
  • Real World Applications of Markov Decision Process
    Markov Decision Process (MDP) is a foundational element of reinforcement learning (RL). MDP allows formalization of sequential decision making where an action taken in a state influences not just the immediate reward but also the subsequent state...This article provides some real world examples of finite MDP. We also show the corresponding transition graphs, which effectively summarize the MDP dynamics. Such examples can serve as good motivation to study and develop skills to formulate problems as MDP...
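
To make the finite MDP idea in the last item above concrete, here is a minimal Python sketch. It is not taken from the linked article: the two states ("low", "high"), the actions, the probabilities and the rewards are all made-up assumptions, and value iteration is just one standard way to solve a small MDP. The table plays the role of a transition graph: each action from a state lists its possible next states with a probability and a reward.

```python
# Hypothetical two-state MDP (illustrative numbers only, not from the article).
# transitions[state][action] = list of (probability, next_state, reward)
transitions = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "recharge": [(1.0, "high", -1.0)],
    },
    "high": {
        "wait": [(1.0, "high", 0.0)],
        "work": [(0.8, "high", 2.0), (0.2, "low", 2.0)],
    },
}


def action_value(outcomes, values, gamma):
    """Expected immediate reward plus discounted value of the next state."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Repeat Bellman optimality updates until values change by less than tol."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            best = max(action_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(transitions, values, gamma=0.9):
    """Pick, in each state, the action with the highest expected value."""
    return {
        s: max(actions, key=lambda a: action_value(actions[a], values, gamma))
        for s, actions in transitions.items()
    }


values = value_iteration(transitions)
policy = greedy_policy(transitions, values)
print(policy)
```

With these assumed numbers, paying the one-off recharge cost in "low" is worth it because it leads to the state where "work" earns reward, which is the "action affects the subsequent state, not only the immediate reward" point from the abstract.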



Become a Machine Learning Engineer, program backed by Andrew Ng’s AI Fund

FourthBrain offers a hybrid online learning program, with live instruction, that trains applicants to become a Machine Learning Engineer.

Our project-based curriculum and comprehensive career services are designed to enable graduates to land a job as a Machine Learning Engineer.

The program is backed by Andrew Ng’s AI Fund and classes start in March.

Please join our Info Session and Q&A with current students on Tuesday, January 19th at 4pm PST / 7pm EST.

Register Now for FREE.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!



  • Data Scientist - Apple Pay Analytics - NYC

    You will play a key role improving the Apple Pay product experience. As a member of the analytics team you will be supporting a product function. You will partner with business owners, understand goals, craft KPIs and measure ongoing performance. You will initially engage with the product and engineering teams in ensuring that we have the appropriate instrumentation in place to deliver on these metrics. You will subsequently use advanced statistical, ML and analytical techniques to analyze product performance and identify key insights that inform product improvements and business strategy. The role requires a high degree of independence, ownership and collaboration working cross functionally across all levels of a highly matrixed organization...

        Want to post a job here? Email us for details >>


Training & Resources

  • Rules of Machine Learning: Best Practices for ML Engineering
    This document is intended to help those with a basic knowledge of machine learning get the benefit of Google's best practices in machine learning. It presents a style for machine learning, similar to the Google C++ Style Guide and other popular guides to practical programming. If you have taken a class in machine learning, or built or worked on a machine-learned model, then you have the necessary background to read this document...
  • ForML
    ForML is a framework for researching, implementing and operating data science projects...Use ForML to formally describe a data science problem as a composition of high-level operators. ForML expands your project into a task dependency graph specific to a given life-cycle phase and executes it using any of its supported runners...Solutions built on ForML are naturally easy to reuse, extend, reproduce, or share and collaborate on...
  • ML Visuals
    ML Visuals is a new collaborative effort to help the machine learning community in improving science communication by providing free professional, compelling and adequate visuals and figures. Currently, we have over 100 figures (all open community contributions). You are free to use the visuals in your machine learning presentations or blog posts. You don’t need to ask permission to use any of the visuals but it will be nice if you can provide credit to the designer/author (author information found in the slide notes). Check out the versions of the visuals below...



  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits

    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian