Data Science Weekly Newsletter - Issue 361

Issue #329

Mar 12 2020

Editor Picks
 
  • Google’s New Shoe Insole Analyzes Your Soccer Moves
    How powerful is your kick? Did you pass effectively? The latest Jacquard wearable uses machine learning to scrutinize a player’s footwork in real time...It's just the latest incursion into ambient computing from Google's Advanced Technology and Projects (ATAP) team, the folks behind Jacquard. I spoke to the team about how the Tag's new mechanics work and what the world will look like once the computers around us can sense our presence...
  • 10 Ways that Human-in-the-Loop Machine Learning is Used Today
    This article...highlights...10 different ways that people are using Human-in-the-Loop Machine Learning today, each focusing on a different advantage that it brings. Ten advantages of using Human-in-the-Loop systems over purely automated systems are: avoiding bias, creating employment, augmenting rare data, maintaining human-level precision, incorporating subject matter experts, ensuring consistency & accuracy, making work easier, improving efficiency, providing transparency & accountability, and increasing safety...
  • The evolution of deep learning and PyTorch
    Soumith Chintala, co-author and maintainer of PyTorch, shares the story of how the framework was created and evolved. He also shares the future directions of this community-driven framework...
 
 

A Message from this week's Sponsor:

 

 
Data scientists are in demand on Vettery

Vettery is an online hiring marketplace that's changing the way people hire and get hired. Ready for a bold career move? Make a free profile, name your salary, and connect with hiring managers from top employers today.
 

 

Data Science Articles & Videos

 
  • Podcast: Potential and Pitfalls of AI with Dr. Eric Horvitz
    Dr. Eric Horvitz is a technical fellow at Microsoft, and is director of Microsoft Research Labs, including research centers in Redmond, Washington; Cambridge, Massachusetts; New York, New York; Montreal, Canada; Cambridge, UK; and Bengaluru, India. He is one of the world’s leaders in AI, and a thought leader in the use of AI in the complexity of the real world...On this podcast, we talk to Dr. Horvitz about a wide range of topics, including his thought leadership in AI, his study of AI and its influence on society, the potential and pitfalls of AI, and how useful AI can be in a country like India...
  • Knowledge Graphs
    In this paper we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After a general introduction, we motivate and contrast various graph-based data models and query languages that are used for knowledge graphs. We discuss the roles of schema, identity, and context in knowledge graphs. We explain how knowledge can be represented and extracted using a combination of deductive and inductive techniques. We summarise methods for the creation, enrichment, quality assessment, refinement, and publication of knowledge graphs. We provide an overview of prominent open knowledge graphs and enterprise knowledge graphs, their applications, and how they use the aforementioned techniques. We conclude with high-level future research directions for knowledge graphs...
  • AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
    Machine learning research has advanced in multiple aspects, including model structures and learning methods. The effort to automate such research, known as AutoML, has also made significant progress. However, this progress has largely focused on the architecture of neural networks, where it has relied on sophisticated expert-designed layers as building blocks---or similarly restrictive search spaces. Our goal is to show that AutoML can go further: it is possible today to automatically discover complete machine learning algorithms just using basic mathematical operations as building blocks. We demonstrate this by introducing a novel framework that significantly reduces human bias through a generic search space. Despite the vastness of this space, evolutionary search can still discover two-layer neural networks trained by backpropagation. These simple neural networks can then be surpassed by evolving directly on tasks of interest, e.g. CIFAR-10 variants, where modern techniques emerge in the top algorithms, such as bilinear interactions, normalized gradients, and weight averaging...
  • Deep Reinforcement Learning For Trading Applications
    Reinforcement learning is a machine learning paradigm that can learn behavior to achieve maximum reward in complex dynamic environments, as simple as Tic-Tac-Toe, or as complex as Go, and options trading. In this post, we will try to explain what reinforcement learning is, share code to apply it, and references to learn more about it. First, we’ll learn a simple algorithm to play Tic-Tac-Toe, then learn to trade a non-random price series. Finally, we’ll talk about how reinforcement learning can master complex financial concepts like option pricing and optimal diversification...
  • Zoom In: An Introduction to Circuits
    Just as the early microscope hinted at a new world of cells and microorganisms, visualizations of artificial neural networks have revealed tantalizing hints and glimpses of a rich inner world within our models...This has led us to wonder: Is it possible that deep learning is at a similar, albeit more modest, transition point?...Most work on interpretability aims to give simple explanations of an entire neural network’s behavior. But what if we instead take an approach inspired by neuroscience or cellular biology — an approach of zooming in? What if we treated individual neurons, even individual weights, as being worthy of serious investigation? What if we were willing to spend thousands of hours tracing through every neuron and its connections? What kind of picture of neural networks would emerge?...
  • We made SQL visual - why and how
    I’ve spent almost a decade now obsessed with the problem of truly enabling anyone—not just data teams—to explore and understand their business data. I still obsess over this as passionately as ever. It’s a much harder problem than I ever realized, but it’s just as important...Today, we’re excited to announce that we have—through thousands of design iterations, dozens of functioning prototypes, several hundred user tests, and countless hours of development—created an interface that truly enables the business user to work with data. We call the interface Visual SQL...
  • SLIDE: In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems
    Deep Learning (DL) algorithms are the central focus of modern machine learning systems. As data volumes keep growing, it has become customary to train large neural networks with hundreds of millions of parameters to maintain enough capacity to memorize these volumes and obtain state-of-the-art accuracy. To get around the costly computations associated with large models and data, the community is increasingly investing in specialized hardware for model training...Our evaluations on industry-scale recommendation datasets, with large fully connected architectures, show that training with SLIDE on a 44 core CPU is more than 3.5 times (1 hour vs. 3.5 hours) faster than the same network trained using TF on Tesla V100 at any given accuracy level. On the same CPU hardware, SLIDE is over 10x faster than TF...
  • Learning to Simulate Complex Physics with Graph Networks
    Here we present a general framework for learning simulation, and provide a single model implementation that yields state-of-the-art performance across a variety of challenging physical domains, involving fluids, rigid solids, and deformable materials interacting with one another. Our framework---which we term "Graph Network-based Simulators" (GNS)---represents the state of a physical system with particles, expressed as nodes in a graph, and computes dynamics via learned message-passing. Our results show that our model can generalize from single-timestep predictions with thousands of particles during training, to different initial conditions, thousands of timesteps, and at least an order of magnitude more particles at test time... Our GNS framework is the most accurate general-purpose learned physics simulator to date, and holds promise for solving a wide range of complex forward and inverse problems...
 
 

Conference*

 

 
The Premier Machine Learning Conference

5 days, 8 tracks, 160 speakers and over 150 exciting sessions

Join Machine Learning Week 2020, May 31 – June 4, Las Vegas! It brings together five co-located events: PAW Business, PAW Financial, PAW Industry 4.0, PAW Healthcare, Deep Learning World. This event is where to meet the who’s who and keep up on the latest techniques, making it the leading machine learning event that excites and unites. You can expect top-class experts from world-famous companies such as Google, Microsoft, Lyft, Verizon, Visa and LinkedIn!

Secure your ticket now!


*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
 

 

Jobs

 
  • Head of Data Science - Tessian - London, United Kingdom

    Our mission is to secure the Human Layer. This involves deploying near real-time machine learning models at massive scale to some of the world’s largest organisations to keep their most sensitive data private and secure. To do this, we're looking for an inspiring Head of Data Science ready to lead and grow our Data Science team, who is excited about the opportunities and challenges that come with building and deploying real-time production models.

    Find out more about life as a Tessian Engineer...

        Want to post a job here? Email us for details >> team@datascienceweekly.org
 

 

Training & Resources

 
  • Deploying Apache Airflow to Google Kubernetes Engine
    Recently, I wanted to set up Airflow to orchestrate ETL jobs for a side-project of mine, and have been itching to wrap my head around basic Kubernetes, so I figured what better way to get my feet wet than to marry the two together!...The following tutorial is my quick guide to standing up an Apache Airflow deployment on a GKE cluster, after having struggled with some of the minor details of adapting a Bitnami template...
  • 5 lesser-known pandas tricks
    pandas needs no introduction as it became the de facto tool for data analysis in Python. As a Data Scientist, I use pandas daily and I am always amazed by how many functionalities it has. In this post, I am going to show you 5 pandas tricks that I learned recently and that help me to be more productive...
  • From PyTorch to JAX: towards neural net frameworks that purify stateful code
    JAX, Google's now-over-a-year-old Python library for machine learning and other numerical computing describes itself as “Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more”—and while that definition is certainly fitting, it is a bit intimidating. I would describe JAX as numpy, but on GPU, and then move on to the one feature we will be most concerned with in this post: its autodifferentiation capability...

 

Books

 

  • Data Science in Production: Building Scalable Model Pipelines with Python

    This book provides a hands-on approach to scaling up Python code to work in distributed environments in order to build robust pipelines. Readers will learn how to set up machine learning models as web endpoints, serverless functions, and streaming pipelines using multiple cloud environments. It is intended for analytics practitioners with hands-on experience with Python libraries such as Pandas and scikit-learn, and will focus on scaling up prototype models to production...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
     


    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
 