Data Science Weekly Newsletter - Issue 145

Sep 01 2016

Editor Picks
  • The Three Faces of Bayes
    As I’ve read more outside the fields of machine learning and natural language processing — from psychometrics and environmental biology to hackers who dabble in data science — I’ve noticed three distinct uses of the term “Bayesian.”...I’ll present the three main uses of “Bayesian” as I understand them, all through the lens of a naïve Bayes classifier. I hope you find it useful and interesting!...

A Message from this week's Sponsor:

  • Harness the business power of big data.

    How far could you go with the right experience and education? Find out at Capitol Technology University. Earn your PhD in Management & Decision Sciences — in as little as three years — in convenient online classes. Banking, healthcare, energy and business all rely on insightful analysis. And business analytics spending will grow to $89.6 billion in 2018. This is a tremendous opportunity — and Capitol’s PhD program will prepare you for it. Learn more now!

Data Science Articles & Videos

  • Majority Of Mathematicians Hail From Just 24 Scientific ‘families’
    Most of the world’s mathematicians fall into just 24 scientific 'families', one of which dates back to the fifteenth century. The insight comes from an analysis of the Mathematics Genealogy Project (MGP), which aims to connect all mathematicians, living and dead, into family trees on the basis of teacher–pupil lineages, in particular who an individual's doctoral adviser was...
  • Debugging Machine Learning
    I've been thinking, mostly in the context of teaching, about how to specifically teach debugging of machine learning. Personally I find it very helpful to break things down in terms of the usual error terms: Bayes error, approximation error, estimation error, and optimization error. I've generally found that trying to isolate errors to one of these pieces, and then debugging that piece in particular, has been useful...For instance, my general debugging strategy involves steps like the following...
  • Infrastructure For Deep Learning
    In this post, we'll [OpenAI] share how deep learning research usually proceeds, describe the infrastructure choices we've made to support it, and open-source kubernetes-ec2-autoscaler, a batch-optimized scaling manager for Kubernetes. We hope you find this post useful in building your own deep learning infrastructure...
  • Learning From Imbalanced Classes
    If you’re fresh from a machine learning course, chances are most of the datasets you used were fairly easy. Among other things, when you built classifiers, the example classes were balanced, meaning there were approximately the same number of examples of each class...But when you start looking at real, uncleaned data, one of the first things you notice is that it’s a lot noisier and more imbalanced...where machine learning classifiers are used to sort through huge populations of negative (uninteresting) cases to find the small number of positive (interesting, alarm-worthy) cases...If you deal with such problems and want practical advice on how to address them, read on...
  • Densely Connected Convolutional Networks
    Recent work has shown that convolutional networks can be substantially deeper, more accurate and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper we embrace this observation and introduce the Dense Convolutional Network (DenseNet), where each layer is directly connected to every other layer in a feed-forward fashion...The DenseNet obtains significant improvements over the state-of-the-art on all five highly competitive object recognition benchmark tasks (e.g., yielding 3.74% test error on CIFAR-10, 19.25% on CIFAR-100 and 1.59% on SVHN)...
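The error decomposition in "Debugging Machine Learning" above can be sketched as a toy triage function. This is an illustrative sketch, not the author's code: the function name, inputs, and thresholds are all hypothetical, and `bigger_model_train_err` stands in for the training error of a much higher-capacity model fit to the same data.

```python
def triage_error(train_err, val_err, bigger_model_train_err,
                 bayes_err=0.0, tol=0.02):
    """Attribute the dominant term in the usual decomposition:
    total error ~ Bayes + approximation + estimation + optimization.

    All inputs are error rates in [0, 1]; `bayes_err` is an (often
    unknown) estimate of the irreducible error, and `tol` is an
    illustrative threshold for what counts as a meaningful gap.
    """
    if val_err - bayes_err <= tol:
        return "none"            # already near irreducible error
    if val_err - train_err > tol:
        return "estimation"      # large train/validation gap: overfitting
    if train_err - bigger_model_train_err > tol:
        return "approximation"   # a richer model fits train much better
    return "optimization"        # model class is fine; optimizer fell short
```

The point of the blurb's strategy is exactly this kind of one-gap-at-a-time reasoning: each comparison isolates one error term before you spend time debugging it.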
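For "Learning From Imbalanced Classes," the simplest baseline the topic implies is random oversampling of the minority class. A minimal sketch (function name and interface are hypothetical; real projects often prefer undersampling, SMOTE, or class-weighted losses):

```python
import random
from collections import Counter

def oversample_minority(X, y, seed=0):
    """Duplicate randomly chosen minority-class examples until every
    class is as frequent as the largest one. Apply only to the
    training split, never to validation/test data."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out
```

With 4 negatives and 1 positive, the resampled set contains 4 of each class; the duplicated positives make the classifier's loss weight the rare class as heavily as the common one.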
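The DenseNet connectivity described above has a simple arithmetic consequence: because each layer receives the concatenation of the block input and all preceding layers' outputs, input width grows linearly with depth. A small sketch of that bookkeeping (not the paper's code; the function name is ours):

```python
def densenet_input_channels(k0, growth_rate, num_layers):
    """Input channel count for each layer in one dense block.

    Layer l sees the block input (k0 channels) concatenated with the
    outputs of layers 0..l-1, each of which adds `growth_rate` feature
    maps, so its input has k0 + l * growth_rate channels.
    """
    return [k0 + l * growth_rate for l in range(num_layers)]
```

For example, with a 16-channel block input and growth rate 12, four layers see 16, 28, 40, and 52 input channels respectively, which is why DenseNets can use narrow layers yet still expose every earlier feature map to every later layer.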

Jobs

  • Data Scientist - WeWork - NYC

    We are seeking a mid-level data scientist to join WeWork’s Data Science team. This sits within the Data Team, a centralized and relatively small, collaborative, and tightly-knit team whose job is to support analysts and decision makers across the organization, and to support you. We want you to focus on shipping machine learning models and making a difference to the business...

Training & Resources

  • Deep Learning Part 1: Comparison of Symbolic Deep Learning Frameworks
    Deep learning is an emerging field of research with applications across multiple domains. I try to show how a transfer learning and fine-tuning strategy leads to re-usability of the same Convolutional Neural Network model in different disjoint domains. Application of this model across various different domains brings value to using this fine-tuned model...In this blog (Part 1), I describe and compare the commonly used open-source deep learning frameworks. I dive deep into different pros and cons for each framework, and discuss why I chose Theano for my work...
  • Conda: Myths and Misconceptions
    In the four years since its initial release, many words have been spilled introducing conda and espousing its merits, but one thing I have consistently noticed is the number of misconceptions that seem to remain in the (often fervent) discussions surrounding this tool. I hope in this post to do a small part in putting these myths and misconceptions to rest...

Books

  • Python Pocket Reference

    Updated for both Python 3.4 and 2.7, this convenient pocket guide is the perfect on-the-job quick reference. You’ll find concise, need-to-know information on Python types and statements, special method names, built-in functions and exceptions, commonly used standard library modules, and other prominent Python tools...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S. Interested in reaching fellow readers of this newsletter? Consider sponsoring! Email us for details :) - All the best, Hannah & Sebastian
Sign up to receive the Data Science Weekly Newsletter every Thursday

Easy to unsubscribe. No spam — we keep your email safe and do not share it.