Data Science Weekly Newsletter - Issue 419

Issue #387

Apr 22 2021

Editor Picks
  • The Goodreads “Classics”: A Computational Study of Readers, Amazon, and Crowdsourced Amateur Criticism
    The question "What is a classic?" remains surprisingly powerful in the twenty-first century because the classics are alive and thriving on the internet, in the marketplace, and among readers, even if not in universities or among academics...With more than 120 million members, Goodreads is the largest social networking site for readers on the internet and a subsidiary of Amazon, one of the wealthiest and most influential corporations in the world. On Goodreads, internet users can categorize any book as a "classic" and publish their own responses to it: gushing praise, mean takedowns, critical analyses, snarky parodies, personal narratives, and more...
  • OpenAI-powered Linux shell uses AI to Do What You Mean
    It's like Alexa/Siri/Cortana for your terminal!...This is a basic Python shell (really, it's a fancy wrapper over the system shell) that takes a task and asks OpenAI for what Linux bash command to run based on your description. For safety reasons, you can look at the command and cancel before actually running it...To be clear, I'm not trying to convince you that having an AI model figure out what Linux command to run based on your written description is a good idea, but the commands that it generates are sometimes surprisingly good...
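
The look-then-cancel safety step described for the OpenAI-powered shell can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `suggest_command` is a hypothetical stand-in that returns canned strings instead of calling OpenAI, and the function and parameter names are assumptions made for the example.

```python
import subprocess
from typing import Callable, Optional


def suggest_command(task: str) -> str:
    """Hypothetical stand-in for the model call; a real shell would query an API here."""
    canned = {"list files": "ls -la", "show disk usage": "df -h"}
    return canned.get(task.lower().strip(), "echo 'no suggestion'")


def run_in_system_shell(command: str) -> int:
    """Run the command through the system shell and return its exit code."""
    return subprocess.run(command, shell=True).returncode


def run_with_confirmation(
    task: str,
    suggest: Callable[[str], str] = suggest_command,
    ask: Callable[[str], str] = input,
    execute: Callable[[str], int] = run_in_system_shell,
) -> Optional[int]:
    """Show the suggested command and run it only after an explicit 'y'."""
    command = suggest(task)
    answer = ask(f"Run `{command}`? [y/N] ")
    if answer.strip().lower() != "y":
        return None  # cancelled: nothing is executed
    return execute(command)
```

Defaulting to "no" unless the user types `y` keeps an accidental Enter from running a generated command, which seems in keeping with the cautious tone of the original post.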

A Message from this week's Sponsor:


Get exclusive content to fuel your breakthroughs at The Edge –
powered by Z by HP & Nvidia

Meet the demands of your workflows with articles, case studies, videos, podcasts, webinars and more, at the new Z by HP data science center. Hit the ground running with the latest research and industry trends and, for an extra dose of motivation, check out our Ambassador section. There you’ll find our ambassadors’ experiences, favorite tools, and data science goals for the future that’ll help turn your data into transformative business results.

Check it out.


Data Science Articles & Videos

  • Paper Notes by Vitaly Kurin
    Hi! I'm a PhD student at the University of Oxford working on Multitask Reinforcement Learning in Graph-Based environments with Shimon Whiteson. I take notes on the papers I read. I hope they might be helpful to you (please share if they are). I add a paper every day Monday-Friday...
  • Multi-Task Robotic Reinforcement Learning at Scale
    For general-purpose robots to be most useful, they would need to be able to perform a range of tasks, such as cleaning, maintenance and delivery. But training even a single task (e.g., grasping) using offline reinforcement learning (RL), a trial and error learning method where the agent trains on previously collected data, can take thousands of robot-hours, in addition to the significant engineering needed to enable autonomous operation of a large-scale robotic system...Today we present two new advances for robotic RL at scale, MT-Opt, a new multi-task RL system for automated data collection and multi-task RL training, and Actionable Models, which leverages the acquired data for goal-conditioned RL...
  • Carbon Emissions and Large Neural Network Training
    The computation demand for machine learning (ML) has grown rapidly recently, which comes with a number of costs. Estimating the energy cost helps measure its environmental impact and find greener strategies, yet it is challenging without detailed information. We calculate the energy use and carbon footprint of several recent large models (T5, Meena, GShard, Switch Transformer, and GPT-3) and refine earlier estimates for the neural architecture search that found Evolved Transformer. We highlight the following opportunities to improve energy efficiency and CO2 equivalent emissions (CO2e): Large but sparsely activated DNNs can consume <1/10th the energy of large, dense DNNs without sacrificing accuracy despite using as many or even more parameters...
  • Semantic Frame Forecast
    This paper introduces semantic frame forecast, a task that predicts the semantic frames that will occur in the next 10, 100, or even 1,000 sentences in a running story. Prior work focused on predicting the immediate future of a story, such as one to a few sentences ahead. However, when novelists write long stories, generating a few sentences is not enough to help them gain high-level insight to develop the follow-up story. In this paper, we formulate a long story as a sequence of "story blocks," where each block contains a fixed number of sentences (e.g., 10, 100, or 200). This formulation allows us to predict the follow-up story arc beyond the scope of a few sentences...
  • Winning at competitive ML in 2021: analysis of 100+ ML Contest winners
    You might be hoping to win a machine learning competition in 2021. If so, this post will tell you just what you need to know!...We collaborated with Eniola Olaleye, ranked #5 on Zindi, to look at what winners do...Together we analysed winners using the ML Contests database of over 100 competitions that took place in 2020 across Kaggle, DrivenData, AICrowd, Zindi, and 20 other platforms. Wherever the information was available, we categorised winners to figure out what made them win...
  • The Starry Cat
    What if Van Gogh had painted the Startled Cat? What would it look like? Let's use style transfer and try to build a possible replica...
  • Apartment Hunting with Python
    The time has come for a big decision: renew my lease, or make the move to a new apartment?...The easiest way to resolve this uncertainty is to systematically investigate apartment listings on a website like Zillow... if you came here looking for a technical article about Python, I promise you’ve found it at last. I’m not going to explain every aspect of the scraper I built; it’s a lot of code and makes use of some intermediate-level features of Python itself, like type hints and packaging. But I’ll walk through some of the interesting bits and try to keep the discussion at a level such that newcomers to web scraping should leave with a better understanding of how it works and how to get started...
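
The "story blocks" formulation from the Semantic Frame Forecast item above, where a long story becomes a sequence of blocks with a fixed number of sentences each, might look something like this. It is only a sketch of the chunking step: the function name is hypothetical, and keeping a shorter final block is an assumption, since the excerpt does not say how the paper handles leftover sentences.

```python
from typing import List


def to_story_blocks(sentences: List[str], block_size: int) -> List[List[str]]:
    """Group consecutive sentences into blocks of `block_size` sentences.

    Assumption: a shorter trailing block is kept rather than dropped.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        sentences[start:start + block_size]
        for start in range(0, len(sentences), block_size)
    ]


# A toy 25-sentence "story" split into blocks of 10 sentences.
story = [f"Sentence {i}." for i in range(25)]
blocks = to_story_blocks(story, block_size=10)
print([len(block) for block in blocks])
```

A forecasting model would then predict frames for the next block rather than the next sentence, which is what lets the task look further ahead.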



Quick Question For You: Do you want a Data Science job?

After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.

The course is broken down into three guides:
  1. Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore)

  2. Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate

  3. Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!
Click here to learn more ...

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!



  • Tecton AI - SF / NYC

    Tecton is building an enterprise Feature Store that is transforming the way companies solve real-world problems with machine learning at scale. Our founding team created Uber's Michelangelo ML Platform, which has become the blueprint for modern ML platforms in large organizations. We recently received Series B funding from Sequoia Capital and Andreessen Horowitz, have paying enterprise customers, and have growing engineering teams in SF and NYC. The team has years of experience building and operating business-critical machine learning systems at scale at places like Uber, Google, Facebook, Airbnb, Twitter, Quora, and AdRoll...

    Software Engineer, Machine Learning

    Software Engineer, Data Infrastructure

    Software Engineer, Frontend

        Want to post a job here? Email us for details >>


Training & Resources

  • An Introduction to Social Network Analysis with NetworkX: Two Factions of a Karate Club
    Networks (a.k.a. graphs) are one of the most interesting areas of data science and have been subject to an explosion of interest in recent years. The ability to model the relationship between data points is powerful. This article introduces some basic concepts in network science and gives examples in Python using NetworkX, the go-to Python package for network-related analysis...
  • A 13-tweet introduction to machine learning tensors
    A 13-tweet introduction to one of the most basic structures used in machine learning: a tensor...Understanding how tensors work is fundamental. They aren't complex but working with them may get confusing if you don't understand all the pieces...Let's solve that today...
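
As a small companion to the tensor introduction above, the idea of a tensor's shape and rank can be illustrated with plain nested Python lists. This is a sketch rather than anything taken from the thread: `tensor_shape` is a hypothetical helper, and it assumes the nesting is regular (every sub-list at a given depth has the same length), which is what array libraries enforce.

```python
def tensor_shape(tensor):
    """Return the shape of a regularly nested list as a tuple.

    Assumption: the input is not ragged; only the first element
    at each depth is inspected.
    """
    shape = []
    while isinstance(tensor, list):
        shape.append(len(tensor))
        tensor = tensor[0] if tensor else None
    return tuple(shape)


scalar = 7                                 # rank 0: a single number
vector = [1, 2, 3]                         # rank 1: one axis
matrix = [[1, 2, 3], [4, 5, 6]]            # rank 2: rows and columns
cube = [[[0, 0], [0, 0]], [[1, 1], [1, 1]], [[2, 2], [2, 2]]]  # rank 3

for name, value in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("cube", cube)]:
    shape = tensor_shape(value)
    print(name, "shape:", shape, "rank:", len(shape))
```

The rank is simply the number of axes, so it equals the length of the shape tuple.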




  • Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits

    Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

    P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian