Data Science Weekly Newsletter
February 17, 2022
The Economics of Data Businesses
A handful of business models dominate tech today: SaaS, marketplaces, e-commerce, on-demand, social networks and so on. Most of these business models have been studied widely, both their execution and their underlying dynamics...But there’s one notable exception: data businesses. Despite the fact that many of the largest and most dominant tech firms in the world are data businesses, there are not many resources on the what, how and why of this business model...This essay is an attempt to change that...
Machine Learning from the Viewpoint of Investors
In this podcast, we interview two investors who focus heavily on machine learning to get their take on the state of the machine learning industry today: Leigh-Marie Braswell at Founders Fund and Davis Treybig at Innovation Endeavors. We discuss their perspectives on opportunities within MLOps and applied machine learning, common pitfalls and challenges seen in machine learning startups, and new projects they find exciting and interesting in the space...
Compute Trends Across Three Eras of Machine Learning
Compute, data, and algorithmic advances are the three fundamental factors that guide the progress of modern Machine Learning (ML). In this paper we study trends in the most readily quantified factor - compute...Based on observations we split the history of compute in ML into three eras: the Pre Deep Learning Era, the Deep Learning Era and the Large-Scale Era. Overall, our work highlights the fast-growing compute requirements for training advanced ML systems...
A Message from this week's Sponsor:
Retool is the fast way to build an interface for any database
With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data—whether it's a simple read-write visualization or a full-fledged ML workflow.
After helping hundreds of readers like you get Data Science jobs, we've distilled all the real-world-tested advice into a self-directed course.
The course is broken down into three guides:
Data Science Getting Started Guide. This guide shows you how to figure out the knowledge gaps that MUST be closed in order for you to become a data scientist quickly and effectively (as well as the ones you can ignore)
Data Science Project Portfolio Guide. This guide teaches you how to start, structure, and develop your data science portfolio with the right goals and direction so that you are a hiring manager's dream candidate
Data Science Resume Guide. This guide shows how to make your resume promote your best parts, what to leave out, how to tailor it to each job you want, as well as how to make your cover letter so good it can't be ignored!
Textless NLP: Generating expressive speech from raw audio
Text-based language models such as BERT, RoBERTa, and GPT-3 have made huge strides in recent years...There is an important limitation, however: These applications are mainly restricted to languages with very large text data sets suitable for training AI models...We’re introducing Generative Spoken Language Model (GSLM), the first high-performance NLP model that breaks free of this dependence on text. GSLM leverages recent breakthroughs in representation learning, allowing it to work directly from only raw audio signals, without any labels or text...
Why You Should (or Shouldn't) Be Using JAX in 2022
JAX hit the scene in late 2018...DeepMind announced in 2020 that it is using JAX to accelerate its research, and a growing number of publications and projects from Google Brain and others are using JAX. With all of this buzz, it seems like JAX is the next big Deep Learning framework, right?...Wrong. In this article we’ll clarify what JAX is (and isn’t), why you should care (or shouldn't, but you probably should), and whether you should (or shouldn’t) use it...
New Podcast: Vanishing Gradients - a data podcast with Hugo Bowne-Anderson
A podcast about all things data, brought to you by data scientist Hugo Bowne-Anderson. It's time for more critical conversations about the challenges in our industry in order to build better compasses for the solution space! To this end, this podcast will consist of long-format conversations between Hugo and other people who work broadly in the data science, machine learning, and AI spaces. We'll dive deep into all the moving parts of the data world, so if you're new to the space, you'll have an opportunity to learn from the experts...
How Data Science Drives Private Equity
Jaclyn Rice Nelson talks to Drew Conway about data-driven private equity, and why it is one of the most exciting places to apply data science...From investment sourcing to due diligence and analyzing post-investment data assets, the range of challenges is matched by the rich data and potential for enormous impact...including: a) How PE investors and PE-backed companies can use data to build a competitive advantage and b) Why PE is one of the most exciting places to apply data science...
Red Flags to Look Out for When Joining a Data Team
Looking for new data science opportunities in this heated market? Before you accept that offer, here are some red flags to beware of. While these are from the perspective of data science, they would also apply to most tech roles...
Perspectives in machine learning for wildlife conservation
Inexpensive and accessible sensors are accelerating data acquisition in animal ecology. These technologies hold great potential for large-scale ecological understanding, but are limited by current processing approaches which inefficiently distill data into relevant information. We argue that animal ecologists can capitalize on large datasets generated by modern sensors by combining machine learning approaches with domain knowledge. Incorporating machine learning into ecological workflows could improve inputs for ecological models and lead to integrated hybrid modeling tools...
DeepMind: The Podcast - AI for science
Step inside DeepMind's laboratories and you'll find researchers studying DNA to understand the mysteries of life, seeking new ways to use nuclear energy, or putting AI to the test in mind-bending areas of maths. In this episode, Hannah meets Pushmeet Kohli, the head of science at DeepMind, to understand how AI is accelerating scientific progress. Listeners also join Hannah on a [virtual] safari in the Serengeti in East Africa to find out how researchers are using AI to conserve wildlife in one of the world’s most spectacular ecosystems...
EvoJAX: Hardware-Accelerated Neuroevolution
EvoJAX is a scalable, general purpose, hardware-accelerated neuroevolution toolkit. Built on top of the JAX library, this toolkit enables neuroevolution algorithms to work with neural networks running in parallel across multiple TPU/GPUs. EvoJAX achieves very high performance by implementing the evolution algorithm, neural network and task all in NumPy, which is compiled just-in-time to run on accelerators...
Introducing Hex Tiles: Large-scale spatial data prepped and ready for analytics in minutes
We have something spatial to announce: Hex Tiles, a next-generation tiling system that gives data scientists the ability to easily unify diverse spatial datasets, conduct on-the-fly analytics, and quickly visualize and explore big data on a planetary scale. What’s more, this can all be done in a matter of minutes and within your browser through the Unfolded Platform...
How to Think Less About Data Visualization
Is there a way we can think less about the process of producing charts? Are there heuristics we can follow that make data visualization feel more like speaking our first language?...Thanks to the pioneering work of the late Leland Wilkinson, the answer to these questions is Yes...
Check out the new Anaconda Community for all things data!
Want insights into the newest developments in the world of data, or need help getting “unstuck” on a problem?
Our Community Forums is the place to go! Be the first to engage with other professionals and ask questions to the broader data community. Users can join in conversations around trends, debate new features, post questions to the community, and more. Plus, it’s another avenue for technical help!
Create your free Anaconda Community account now.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Get data science interview questions frequently asked at top companies every Monday, Wednesday & Friday. Solve the problem before receiving the solution the next morning. Check your work and sharpen your skills! Join our free newsletter.
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!
Is the Normal Curve Too Good to Be True?
In this article, I’ll use charts and simulations to demonstrate how traditional methods that rely on normality can fail us. We’ll also look at alternative methods implemented in Python that are more powerful and accurate. All the code will be provided so that you can check the results for yourself...
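As a rough illustration of the kind of simulation the article describes, here is a minimal sketch using only the Python standard library. It is not the article's code: the sample size, the two distributions, and the function names are assumptions chosen for the example. It estimates how often a nominal 95% t-interval for the mean actually contains the true mean, first for normal data and then for skewed lognormal data.

```python
import math
import random
import statistics

N = 10              # sample size (assumed for this sketch)
T_CRIT_DF9 = 2.262  # two-sided 95% critical value of Student's t with N - 1 = 9 degrees of freedom


def t_interval_covers(sample, true_mean):
    """Return True if the classic 95% t-interval built from `sample` contains `true_mean`."""
    mean = statistics.fmean(sample)
    half_width = T_CRIT_DF9 * statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - half_width <= true_mean <= mean + half_width


def coverage(draw, true_mean, trials=5000, seed=0):
    """Estimate the fraction of simulated samples whose t-interval covers the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [draw(rng) for _ in range(N)]
        hits += t_interval_covers(sample, true_mean)
    return hits / trials


# Normal data: the assumption behind the t-interval holds.
normal_cov = coverage(lambda rng: rng.gauss(0.0, 1.0), true_mean=0.0)

# Lognormal data: heavily right-skewed; its true mean is exp(mu + sigma^2 / 2) = exp(0.5).
lognormal_cov = coverage(lambda rng: rng.lognormvariate(0.0, 1.0), true_mean=math.exp(0.5))

print(f"normal data:    {normal_cov:.3f}")
print(f"lognormal data: {lognormal_cov:.3f}")
```

With a small sample of skewed data, the estimated coverage typically falls below the nominal 95%, while the normal case stays close to it. That gap is the sort of failure the article sets out to demonstrate.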