A visual introduction to probability and statistics...
Album de Statistique Graphique
The Album de Statistique Graphique is a set of annual publications of data visualizations produced in France in the late 1800s. I first heard about them from Michael Friendly a decade ago and have always been on the lookout for them. Over the course of my thesis I did find a couple of copies in research libraries, but those libraries required signing agreements that I would not share the photos (why do libraries do this?). Now, finally, they are online, easily accessible, in high-quality scans courtesy of David Rumsey (thank you!). And they are amazing!...
Surnames and Social Mobility
To what extent do parental characteristics explain child social outcomes?
Typically, parent-child correlations in socioeconomic measures are in the
range 0.2-0.6. Surname evidence suggests, however, that the
intergenerational correlation of overall status is much higher. This paper
shows, using educational status in England 1170-2012 as an example, that
the true underlying correlation of social status is in the range 0.75-0.85...
Data Science Articles & Videos
Dynamic Planning Networks
We introduce Dynamic Planning Networks (DPN), a novel architecture for deep reinforcement learning, that combines model-based and model-free aspects for online planning. Our architecture learns to dynamically construct plans using a learned state-transition model by selecting and traversing between simulated states and actions to maximize valuable information before acting...
CycleGAN, a Master of Steganography
CycleGAN [Zhu et al., 2017] is one recent successful approach to learn a transformation
between two image distributions. In a series of experiments, we demonstrate
an intriguing property of the model: CycleGAN learns to “hide” information
about a source image into the images it generates in a nearly imperceptible, high-frequency...
"Modern" C++ Lamentations
This will be a long wall of text, and kinda random! My main points are: 1. C++ compile times are important, 2. Non-optimized build performance is important, 3. Cognitive load is important. I don’t expand much on this here, but if a programming language or a library makes me feel stupid, then I’m less likely to use it or like it. C++ does that a lot :) ...
Evolved Radio and its Implications for Modelling Evolution of Novel Sensors
This paper describes an evolvable hardware experiment that
resulted in a network of transistors sensing and utilising the
radio waves emanating from nearby PCs. We argue that this
evolved ‘radio’ is only the second device ever whose sensors
were constructed in a way that in key aspects is analogous to
that found in nature. We highlight the advantages and
disadvantages of this approach and show why it is practically
impossible to implement a similar process in simulation....
Probable more likely than probably
What kind of probability are people talking about when they say something is "highly likely" or has "almost no chance"? The chart below, created by Reddit user zonination, visualizes the responses of 46 other Reddit users to the question "What probability would you assign to the phrase: [phrase]" for various statements of probability. Each set of responses has been converted to a kernel density estimate and presented as a joyplot using R.
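For readers who have not met kernel density estimates before, here is a minimal sketch of the idea in Python (the original chart was made in R, and this is not its code). The `responses` values are invented for illustration, and the Gaussian kernel and the bandwidth of 5 are assumptions chosen only to keep the example short.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    n = len(samples)
    # Normalizing constant so the estimated density integrates to 1 over the real line.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump centred on each observed response.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical answers (in percent) for a phrase such as "highly likely".
# These numbers are made up; they are not the Reddit survey data.
responses = [90, 95, 85, 92, 80, 97, 88]

density = gaussian_kde(responses, bandwidth=5.0)

# Evaluate the smooth curve on a 0-100 grid, as one ridge of a joyplot would be.
grid = list(range(0, 101))
curve = [density(x) for x in grid]
peak = grid[curve.index(max(curve))]
print("Estimated density peaks near", peak, "percent")
```

A joyplot simply stacks one such curve per phrase, offset vertically, so the spread of answers for each phrase can be compared at a glance.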
Machine Learning Classification Methods and Factor Investing
In this piece, we’ll first review machine learning for classification, a problem which may be less familiar to investors, but fundamental to machine learning professionals. Next, we’ll apply classification to the classic value/momentum factors (spoiler: the results are pretty good)...
Data Scientist, Retention - Disney Streaming Services - NYC
The Data Scientist is a critical position within DSS and the Data organization, specializing in applying machine learning methods to meet optimization, personalization, recommendation, and efficiency-related challenges, in close collaboration with engineering and business partners. In this role, you will build and apply machine learning techniques and modern statistics to data, both to augment decision-making and to significantly improve operational processes through automation. You will collaborate across teams to define problems and develop automated solutions with the Data, Product and Engineering teams to be built into our products...
Training & Resources
Modern Deep Learning Techniques Applied to NLP
This project contains an overview of recent trends in deep learning-based natural language processing (NLP). It covers the theoretical descriptions and implementation details behind deep learning models, such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and reinforcement learning, used to solve various NLP tasks and applications. The overview also contains a summary of state-of-the-art results for NLP tasks such as machine translation, question answering, and dialogue systems...
Data Science from Scratch: First Principles with Python
"It does three things superbly: covers the basic low level tools of a data scientist (the "from scratch" part), gives a great overview of useful Python programming examples for those new to Python, and gives an amazingly succinct yet high level overview of the mathematics and statistics required for data science..."...
For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page.