Data Science Weekly Newsletter

Issue 410

September 30, 2021

Editor's Picks

Top Places to Work for Data Scientists
What would constitute a good place to work for a data scientist? How do you think about it at different stages of your career?...These are important questions to ponder as data science (DS) practitioners witness the field going through a phase of high growth of 37% per year...Depending on your career stage, different types of companies can help you evolve a career in DS. Let’s look at how to quantify this assessment and tailor the opportunities to data scientists of various career stages...

Nowcasting the Next Hour of Rain
At any moment in the UK, according to one study, one third of the country has talked about the weather in the past hour, reflecting the importance of weather in daily life... Our latest research and state-of-the-art model advances the science of Precipitation Nowcasting, which is the prediction of rain (and other precipitation phenomena) within the next 1-2 hours. In a paper written in collaboration with the Met Office and published in Nature, we directly tackle this important grand challenge in weather prediction...

Statistics as algorithmic summarization
Statistics gives us reasonable procedures to estimate properties of a general population by examining only a few individuals from the population. In this regard, statistics is algorithmic: it provides randomized algorithms for extrapolation. In this blog, I’ll review some elementary stats (with as little mathematical formalism as possible), and try to crystalize why this algorithmic view is illuminating...
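
As a rough illustration of that algorithmic view (our own sketch, not code from the linked post), estimating a population mean from a small random sample can be written as a short randomized algorithm. The synthetic population, the sample size of 400, and the function name `estimate_mean` are all hypothetical choices made for this example.

```python
import random
import statistics


def estimate_mean(population, sample_size, rng):
    """Randomized algorithm: estimate the population mean from a random sample."""
    sample = rng.sample(population, sample_size)
    mean = statistics.fmean(sample)
    # Standard error of the sample mean (ignores the finite-population correction).
    stderr = statistics.stdev(sample) / sample_size ** 0.5
    return mean, stderr


rng = random.Random(0)
# Hypothetical population: 100,000 synthetic measurements.
population = [rng.gauss(50.0, 10.0) for _ in range(100_000)]
true_mean = statistics.fmean(population)

# Examine only 400 individuals and extrapolate to the whole population.
estimate, stderr = estimate_mean(population, 400, rng)
print(f"true mean={true_mean:.2f}, estimate={estimate:.2f} +/- {2 * stderr:.2f}")
```

The estimate comes with its own uncertainty (the standard error), which shrinks roughly with the square root of the sample size.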

A Message From This Week's Sponsor

Impact 2021: The First-Ever Data Observability Summit
Join today's leading data pioneers on November 3, 2021. Hear from data leaders pioneering the technologies and processes shaping data engineering, featuring the first Chief Data Scientist of the U.S., the founder of the Data Mesh, and many more! Get your free ticket...

Data Science Articles & Videos

Chat with William Shatner about the future of AI
There’s no doubt that technology has enriched our lives. Our phones can easily answer questions for us. Our cars can drive themselves. Robots can perform complicated surgeries on our fragile, fleshy bodies. Artificial intelligence has come a long way in 50 years, but how far will it go in another 50?...

A Comprehensive Survey and Performance Analysis of Activation Functions in Deep Learning
In this paper, a comprehensive overview and survey is presented for Activation Functions (AFs) in neural networks for deep learning. Different classes of AFs such as Logistic Sigmoid and Tanh based, ReLU based, ELU based, and Learning based are covered. Several characteristics of AFs such as output range, monotonicity, and smoothness are also pointed out. A performance comparison is also performed among 18 state-of-the-art AFs with different networks on different types of data...

Deep Learning over the Internet: Training Language Models Collaboratively
In this blog post, we describe DeDLOC — a new method for collaborative distributed training that can adapt itself to the network and hardware constraints of participants. We show that it can be successfully applied in real-world scenarios by pretraining sahajBERT, a model for the Bengali language, with 40 volunteers. On downstream tasks in Bengali, this model achieves nearly state-of-the-art quality with results comparable to much larger models that used hundreds of high-tier accelerators...

Inconsistency in Conference Peer Review: Revisiting the 2014 NeurIPS Experiment
In this paper we revisit the 2014 NeurIPS experiment that examined inconsistency in conference peer review. We determine that 50% of the variation in reviewer quality scores was subjective in origin. Further, with seven years passing since the experiment we find that for accepted papers, there is no correlation between quality scores and impact of the paper as measured as a function of citation count. We trace the fate of rejected papers, recovering where these papers were eventually published. For these papers we find a correlation between quality scores and impact. We conclude that the reviewing process for the 2014 conference was good for identifying poor papers, but poor for identifying good papers. ...

Interview with Nihit Desai - Staff Engineer at Facebook
Hi! I’m Nihit. I am a Staff Engineer at Facebook, where I currently work on business integrity...Along with a friend of mine, I also write a biweekly newsletter focused on challenges and opportunities associated with real-world applications of ML...

Deep Learning's Diminishing Returns: The cost of improvement is becoming unsustainable
While deep learning's rise may have been meteoric, its future may be bumpy. Like Frank Rosenblatt before them, today's deep-learning researchers are nearing the frontier of what their tools can achieve. To understand why this will reshape machine learning, you must first understand why deep learning has been so successful and what it costs to keep it that way...

A Systematic Literature Review on the Use of Deep Learning in Software Engineering Research
This paper presents a systematic literature review of research at the intersection of SE & DL. The review canvases work appearing in the most prominent SE and DL conferences and journals and spans 128 papers across 23 unique SE tasks. We center our analysis around the components of learning, a set of principles that govern the application of machine learning techniques (ML) to a given problem domain, discussing several aspects of the surveyed work at a granular level. The end result of our analysis is a research roadmap that both delineates the foundations of DL techniques applied to SE research, and highlights likely areas of fertile exploration for the future...

A Plug-and-Play Method for Controlled Text Generation
In this work, we present a plug-and-play decoding method for controlled language generation that is so simple and intuitive, it can be described in a single sentence: given a topic or keyword, we add a shift to the probability distribution over our vocabulary towards semantically similar words. We show how annealing this distribution can be used to impose hard constraints on language generation, something no other plug-and-play method is currently able to do with SOTA language generators...
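
The single-sentence description above can be sketched in a few lines. This is a toy illustration under our own assumptions, not the authors' implementation: the four-word vocabulary, the 2-d embeddings, the logits, the keyword vector, and the `strength` parameter are all made up, and cosine similarity stands in for whatever similarity measure the paper uses.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def shifted_distribution(logits, embeddings, keyword_vec, strength):
    """Add strength * cosine(word, keyword) to each logit, then renormalize."""
    shifted = [
        logit + strength * cosine(emb, keyword_vec)
        for logit, emb in zip(logits, embeddings)
    ]
    return softmax(shifted)


# Toy vocabulary with made-up 2-d embeddings and made-up language-model logits.
vocab = ["rain", "storm", "pizza", "cloud"]
embeddings = [(0.9, 0.1), (0.8, 0.3), (0.0, 1.0), (0.7, 0.2)]
logits = [1.0, 0.5, 2.0, 0.2]
keyword_vec = (1.0, 0.0)  # hypothetical embedding of the keyword "weather"

base = softmax(logits)
steered = shifted_distribution(logits, embeddings, keyword_vec, strength=5.0)
for word, p, q in zip(vocab, base, steered):
    print(f"{word:>6}: {p:.3f} -> {q:.3f}")
```

In this toy setup, raising `strength` moves more probability mass onto words close to the keyword, which hints at how a schedule on the shift could tighten the constraint over time.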

Machine Learning in Astronomy and Physics
This week’s guest is Dr. Viviana Acquaviva, Professor in the Physics Department at the CUNY NYC College of Technology and at the CUNY Graduate Center...Viviana is currently writing a book for Princeton University Press entitled “Machine Learning techniques for Physics and Astronomy”...This conversation is focused on applications of machine learning and data science to physics, their impact on her research, and how the rise of ML has impacted her teaching and research...

Understanding AWK
I would hear people mention Awk and how often they used it, and I was pretty certain I was missing out on some minor superpower...Like this little offhand comment by Bryan Cantrill: "I write three or four Awk programs a day. And these are one-liners. These super quick programs"...It turns out Awk is pretty simple. It has only a couple of conventions and only a small amount of syntax. As a result, it’s straightforward to learn, and once you understand it, it will come in handy more often than you’d think...So in this article, I will teach myself, and you, the basics of Awk...

Training

How to streamline data science and feature creation workflows in Snowflake
Thurs, Oct 14th, 2:00 PM ET (11:00 AM PT). Learn how AtScale + Snowflake eliminates data marts like SSAS for native dimensional analysis in the Data Cloud.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Senior Data Scientist - TikTok - LA

TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy by offering a home for creative expression and an experience that is genuine, joyful, and positive.
- Generate useful features from large amounts of data
- Apply supervised and unsupervised machine learning techniques, such as linear and logistic regression, decision trees, and k-means clustering
- Develop segmentation models, classification models, propensity models, LTV models, experimental design, optimization models
- Perform statistical analysis such as KPI deep dives, performance marketing efficiency, behavioral clustering, and user journey analytics
- Curate audiences and inform engagement tactics to enable differentiated, relevant marketing touches across channels (social, email, in app, push)
- Synthesize analytics and statistical approaches into easy-to-consume storylines, both visually and verbally, and provide indicated actions for executive audiences
- Capture business requirements for data and analytic solutions and collaborate cross-functionally (XFN) to ensure business requirements align with business needs
- Analyze creatives and surface insights that will help drive engagement and retention
- Support day-to-day collaboration with performance marketing to communicate insights and recommend data informed strategies

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

Decision Trees: A Guide with Examples
A tutorial covering Decision Trees, complete with code and interactive visualizations...

Word2vec with PyTorch: Implementing the Original Paper
Covering all the implementation details, skipping high-level overview. Code attached...

Self Attention Tutorial Video
This is the first video on attention mechanisms. We'll start with self attention and try to explain why it's just a re-weighting tactic...
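
To make the "re-weighting" idea concrete, here is a minimal sketch of our own (not taken from the video). It assumes the simplest possible variant, with no learned query, key, or value projections: each output vector is just a weighted average of the input vectors, with weights from a softmax over scaled dot products. The three 2-d token vectors are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Re-weight inputs: each output is a weighted average of all input vectors."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Similarity of this token to every token (including itself).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical token embeddings: the first two are similar, the third differs.
tokens = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how much one token draws from every other token; similar tokens receive larger weights than dissimilar ones.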

Books

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page...

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
