Data Science Weekly Newsletter
September 23, 2021
Trees have long served as models of intellectual inquiry and as sites of religious and civic deliberation. Now, as we learn more about plant intelligence, they are inspiring deeper forms of ecological investigation...
Robots Must Be Ephemeralized
In this blog post, I outline why it is tempting for roboticists to iterate directly on real life, and how the difficulty of evaluating general-purpose robots will eventually force us to increasingly rely on offline evaluation techniques such as simulation...
Interview of Erik Bernhardsson - Former CTO @ Better.com
Up until quite recently, I was the CTO of Better.com for six years, taking the eng team from 1 person to 300, and doing all sorts of “CTO stuff” – mostly recruiting, but also lots of technical stuff, occasionally writing code. Before Better, I was at Spotify for 6.5 years, initially running the (very nascent) data/BI team, then later managing the music recommendation team. I built the first version of the music rec system at Spotify...
A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning
The rapid recent progress in machine learning (ML) has raised a number of scientific questions that challenge the longstanding dogma of the field. One of the most important riddles is the good empirical generalization of overparameterized models...This paper provides a succinct overview of this emerging theory of overparameterized ML (henceforth abbreviated as TOPML) that explains these recent findings through a statistical signal processing perspective. We emphasize the unique aspects that define the TOPML research area as a subfield of modern ML theory and outline interesting open questions that remain...
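A common toy setting in this literature is linear regression with more parameters than training samples (d > n), where infinitely many weight vectors fit the training data exactly. The sketch below, assuming made-up data and a hand-rolled solver used only for illustration, picks the minimum-ℓ2-norm interpolator w = Xᵀ(XXᵀ)⁻¹y, the solution much of the overparameterized-regression analysis focuses on.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def min_norm_interpolator(X, y):
    """Return w = X^T (X X^T)^{-1} y: the smallest-norm w with X w = y.

    Assumes the n rows of X (n < d) are linearly independent.
    """
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in X] for ri in X]  # n x n
    alpha = solve(gram, y)
    d = len(X[0])
    return [sum(alpha[i] * X[i][j] for i in range(len(X))) for j in range(d)]


# Hypothetical data: 2 samples, 4 features, so the model is overparameterized.
X = [[1.0, 0.0, 1.0, 2.0],
     [0.0, 1.0, 1.0, -1.0]]
y = [3.0, 1.0]
w = min_norm_interpolator(X, y)
preds = [sum(a * b for a, b in zip(row, w)) for row in X]
print([round(v, 4) for v in w], [round(p, 4) for p in preds])
```

Another interpolator such as w = [3, 1, 0, 0] also fits both points exactly, but with squared norm 10 versus about 2.29 here; which interpolator a training procedure lands on is part of what shapes generalization in this regime.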
Image Encoders: BigTransfer vs CLIP
I've been mucking around with building a meme search engine...To do so I’m testing a couple of different image encoders: a) the Big Transfer encoder from Google and b) the CLIP image encoder...In essence, these use a neural network to turn an image file into vector embeddings that can be compared in a similarity (“nearest neighbor”) search. Which one is best (at least for memes)? Let’s put them to the test. We’ll index 10,000 memes and compare...
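Once an encoder has turned each image into a vector, the search step itself can be as simple as ranking stored vectors by cosine similarity to the query vector. A minimal brute-force sketch, assuming tiny hypothetical 3-dimensional embeddings and made-up file names (real encoder outputs are much higher-dimensional, and larger indexes usually use an approximate nearest-neighbor library instead):

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(query, index, k=3):
    """Brute-force k-NN: score every stored embedding, return the top k (name, score) pairs."""
    scored = [(name, cosine(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings standing in for encoder outputs.
index = {
    "meme_001.jpg": [0.9, 0.1, 0.0],
    "meme_002.jpg": [0.1, 0.9, 0.2],
    "meme_003.jpg": [0.0, 0.2, 0.95],
}
results = nearest_neighbors([0.8, 0.2, 0.1], index, k=2)
print(results)
```

Comparing two encoders then amounts to building one index per encoder and checking which returns more relevant neighbors for the same queries.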
An End-to-End Guide to Photogrammetry with Mobile Devices
Constructing 3D models with photogrammetry allows journalists to share objects and environments with their audiences in a comprehensive, immersive way that can’t be achieved with photography or videography alone...Over the past several years, the R&D team at The Times has worked to simplify the production of photogrammetry-driven stories...This resource compiles what we've learned into a series of guides, demos and open-source software tools that we hope will aid anyone seeking to capture, process and deliver high-quality 3D models...
Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods
Weak supervision is a popular method for building machine learning models without relying on ground truth annotations. Instead, it generates probabilistic training labels by estimating the accuracies of multiple noisy labeling sources (e.g., heuristics, crowd workers). Existing approaches use latent variable estimation to model the noisy sources, but these methods can be computationally expensive, scaling superlinearly in the data. In this work, we show that, for a class of latent variable models highly applicable to weak supervision, we can find a closed-form solution to model parameters, obviating the need for iterative solutions like stochastic gradient descent (SGD)...
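To make the "closed-form" idea concrete, here is a sketch of the triplet trick under a deliberately simplified model, which is our assumption and not the paper's full setup or code: three binary sources with votes in {−1, +1}, conditionally independent given the true label Y, with accuracy that does not depend on Y and is better than random. Writing a_i = E[λ_i Y], independence gives E[λ_i λ_j] = a_i a_j, so each |a_i| = sqrt(E[λ_i λ_j] E[λ_i λ_k] / E[λ_j λ_k]) can be read off observable vote agreements, with no labels and no SGD.

```python
import math
import random


def triplet_accuracies(votes):
    """Estimate each source's accuracy from unlabeled votes via the triplet identity.

    votes: list of (l1, l2, l3) tuples with entries in {-1, +1}.
    Assumes conditional independence, symmetric errors, and better-than-random sources.
    """
    n = len(votes)
    # Observable second moments E[l_i * l_j] for each pair i < j.
    moment = {(i, j): sum(v[i] * v[j] for v in votes) / n
              for i in range(3) for j in range(i + 1, 3)}

    def m(i, j):
        return moment[(min(i, j), max(i, j))]

    accs = []
    for i in range(3):
        j, k = [x for x in range(3) if x != i]
        a_i = math.sqrt(abs(m(i, j) * m(i, k) / m(j, k)))  # sign fixed by a_i > 0 assumption
        accs.append((1 + a_i) / 2)  # a_i = 2 * accuracy - 1
    return accs


# Simulated, hypothetical labeling sources with known accuracies.
random.seed(0)
true_acc = [0.9, 0.75, 0.65]
votes = []
for _ in range(20000):
    y = random.choice([-1, 1])
    votes.append(tuple(y if random.random() < acc else -y for acc in true_acc))

est = triplet_accuracies(votes)
print([round(e, 3) for e in est])
```

The estimate costs a single pass over the votes, which is the kind of speedup over iterative latent-variable fitting the blurb describes.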
Machine Learning Hyperparameter Optimization with Argo
How the hyperparameters of our machine learning models are tuned at Canva...Canva uses a variety of machine learning (ML) models, such as recommender systems, information retrieval, attribution models, and natural language processing for various applications. A typical problem is the amount of time and engineering effort in choosing a set of optimal hyperparameters and configurations used to optimize a learning algorithm’s performance...
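The post covers Canva's own setup; as a generic illustration only, here is random search, one common tuning strategy, where every trial is independent and can therefore fan out in parallel (the job a workflow engine such as Argo can do across a cluster). In this sketch a local thread pool stands in for parallel workflow steps, and the search space and `train_and_evaluate` function are hypothetical placeholders, not Canva's code.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical search space.
SEARCH_SPACE = {
    "learning_rate": (1e-4, 1e-1),  # sampled log-uniformly
    "num_layers": [1, 2, 3, 4],
}


def sample_config(rng):
    """Draw one random configuration from the search space."""
    lo, hi = SEARCH_SPACE["learning_rate"]
    lr = 10 ** rng.uniform(math.log10(lo), math.log10(hi))
    return {"learning_rate": lr, "num_layers": rng.choice(SEARCH_SPACE["num_layers"])}


def train_and_evaluate(config):
    """Hypothetical stand-in for a training job; returns a validation loss (lower is better)."""
    return (math.log10(config["learning_rate"]) + 2) ** 2 + 0.1 * abs(config["num_layers"] - 3)


def random_search(n_trials, seed=0, workers=4):
    """Sample n_trials configs, evaluate them in parallel, return the best (config, loss)."""
    rng = random.Random(seed)
    configs = [sample_config(rng) for _ in range(n_trials)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        losses = list(pool.map(train_and_evaluate, configs))  # map preserves input order
    best = min(range(n_trials), key=lambda i: losses[i])
    return configs[best], losses[best]


best_config, best_loss = random_search(50)
print(best_config, round(best_loss, 4))
```

Swapping random search for a smarter strategy (e.g. Bayesian optimization) changes how configs are proposed, but the fan-out-then-pick-the-best structure stays the same.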
Is BI dead? On dismantling data's ship of Theseus
Over the last decade, many of the early BI functions have been stripped out of BI and relaunched as independent products...The splinter of the modern data stack that we call BI is diminished, but mostly unchanged. It’s as though we took our definition of BI from twenty years ago and started crossing off clauses, until we’re left with “visualization and reporting.”...BI tools should aspire to do one thing, and do it completely: They should be the universal tool for people to consume and make sense of data. If you—an analyst, an executive, or any person in between—have a question about data, your BI tool should have the answer...
Scaling TensorFlow to 300 million predictions per second
We present the process of transitioning machine learning models to the TensorFlow framework at a large scale in an online advertising ecosystem. In this talk we address the key challenges we faced and describe how we successfully tackled them; notably, implementing the models in TF and serving them efficiently with low latency using various optimization techniques...
TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy by offering a home for creative expression and an experience that is genuine, joyful, and positive.
Generate useful features from large amounts of data
Apply supervised and unsupervised machine learning techniques, such as linear and logistic regression, decision trees, and k-means clustering
Perform statistical analysis such as KPI deep dives, performance marketing efficiency, behavioral clustering, and user journey analytics
Curate audiences and inform engagement tactics to enable differentiated, relevant marketing touches across channels (social, email, in app, push)
Synthesize analytics and statistical approaches into easy-to-consume storylines, both visually and verbally, and provide indicated actions for executive audiences
Capture business requirements for data and analytic solutions and collaborate cross-functionally (XFN) to ensure those solutions align with business needs
Analyze creatives and surface insights that will help drive engagement and retention
Support day-to-day collaboration with performance marketing to communicate insights and recommend data informed strategies
Want to post a job here? Email us for details >> email@example.com
Training & Resources
River: Online machine learning in Python
River is a Python library for online machine learning. It is the result of a merger between creme and scikit-multiflow. River's ambition is to be the go-to library for doing machine learning on streaming data...
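Online learning means the model is updated one sample at a time as the stream arrives, and is typically evaluated "test-then-train": predict on each new sample before learning from it. River models expose this pattern through methods such as `learn_one` and `predict_one`; the sketch below mimics that shape in plain Python with a hypothetical stream and a hand-written SGD logistic regression, and is not River's implementation.

```python
import math
import random


class OnlineLogisticRegression:
    """Logistic regression updated one sample at a time with plain SGD (illustrative sketch)."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.weights = {}  # feature name -> weight, grown lazily as features appear
        self.bias = 0.0

    def predict_proba_one(self, x):
        z = self.bias + sum(self.weights.get(k, 0.0) * v for k, v in x.items())
        z = max(-35.0, min(35.0, z))  # clamp to keep math.exp well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def predict_one(self, x):
        return self.predict_proba_one(x) >= 0.5

    def learn_one(self, x, y):
        # Gradient of the log loss with respect to z is (p - y).
        err = self.predict_proba_one(x) - (1.0 if y else 0.0)
        self.bias -= self.lr * err
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * err * v


def stream(n, seed=0):
    """Hypothetical data stream: label is True when a + b > 0."""
    rng = random.Random(seed)
    for _ in range(n):
        x = {"a": rng.uniform(-1, 1), "b": rng.uniform(-1, 1)}
        yield x, x["a"] + x["b"] > 0


model = OnlineLogisticRegression()
correct = 0
n = 5000
for x, y in stream(n):
    correct += model.predict_one(x) == y  # test on the sample first...
    model.learn_one(x, y)                 # ...then train on it
accuracy = correct / n
print(f"progressive accuracy: {accuracy:.3f}")
```

Because each sample is seen once and then discarded, memory stays constant no matter how long the stream runs, which is the main appeal of this style for streaming data.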