Receive the Data Science Weekly Newsletter every Thursday

Easy to unsubscribe at any time. Your e-mail address is safe.

Data Science Weekly Newsletter
July 7, 2016

Editor's Picks

  • The history of R's predecessor, S, from co-creator Rick Becker
    Before there was R, there was S. R was modeled on a language developed at AT&T Bell Labs starting in 1976 by Rick Becker and John Chambers (and, later, Alan Wilks) along with Doug Dunn, Jean McRae, and Judy Schilling. At last week's useR! conference, Rick Becker gave a fascinating keynote address, Forty Years of S...

A Message From This Week's Sponsor

  • Where science and policy change the world. And You.

    Apply your knowledge & skills to federal policy via the AAAS Science & Technology Policy Fellowships. A year-long professional development opportunity for doctoral-level data scientists to serve in the federal government in Washington, D.C.
    STPF fosters a career-enhancing network of science leaders who understand policymaking & contribute to society...

Data Science Articles & Videos

  • Google’s DeepMind AI to use 1 million NHS eye scans to spot diseases earlier
    Google’s DeepMind division has announced a partnership with the NHS’s Moorfields Eye Hospital to apply machine learning to spot common eye diseases earlier. The five-year research project will draw on one million anonymous eye scans which are held on Moorfields’ patient database, with the aim to speed up the complex and time-consuming process of analysing eye scans...
  • Is It Brunch Time?
    We begin by using the Twitter Streaming API. This API allows us to subscribe to search terms, for example “brunch”, and have any tweet matching that term sent to our program in real time. Not only did we collect “brunch” tweets, but we also collected tweets containing “breakfast”, “lunch”, and “dinner” to use as controls (which we will review later). We let the program run from 2015-06-01 to 2016-05-31, which yielded 100M+ tweets for analysis...
  • Why Python is Slow: Looking Under the Hood
    When I teach courses on Python for scientific computing, I make this point very early in the course, and tell the students why: it boils down to Python being a dynamically typed, interpreted language, where values are stored not in dense buffers but in scattered objects...But I realized something recently: despite the relative accuracy of the above statements, the words "dynamically-typed-interpreted-buffers-vectorization-compiled" probably mean very little to somebody attending an intro programming seminar...So I decided I would write this post, and dive into the details that I usually gloss over...
  • Heavy Metal and Natural Language Processing - Part 1
    Iain scraped and built a dataset of lyrics to 222,623 songs by 7,364 metal bands, then used traditional natural language processing techniques to analyze them...this post is a good tour through the natural language processor's toolkit: bag-of-words Bayesian filtering, log-likelihood ratio, term frequency-inverse document frequency (TF-IDF), cosine distance, etc. The output of the analysis is sometimes fun and interesting, but the value here is mostly as a good primer on how the different techniques work and when you might use them...
  • The Toronto Raptors Are Using IBM’s Watson to Draft A Winning Team
    After falling to the eventual NBA champs in the Eastern Conference Finals, the Toronto Raptors are hungry for a championship title. Thursday’s draft will be crucial in crafting a winning lineup, and when it comes to deciding who makes the team, the Raptors will be able to consult their newest recruit: IBM’s Watson...
  • Going Beyond Full Utilization: The Inside Scoop On Nervana’s Winograd Kernels
    This is part 2 of a series of posts on how Nervana uses the Winograd algorithm to make convolutional networks faster than ever before. In the first part we focused on benchmarks demonstrating a 2-3x algorithmic speedup. This part will get a bit more technical and dive into the guts of how the Winograd algorithm works, and how we optimized it for GPUs...
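The brunch post collects a target term plus three control terms. A minimal sketch of how control terms can normalize raw counts, assuming tweets are already collected as (timestamp, text) pairs; the function name `brunch_share_by_hour`, the input format, and the "share of all meal-term matches per hour" metric are illustrative assumptions, not the post's actual method, and time zones are ignored.

```python
import re
from collections import Counter
from datetime import datetime

# Target term plus the control terms mentioned in the post.
TERMS = ("brunch", "breakfast", "lunch", "dinner")

def brunch_share_by_hour(tweets):
    """Return {hour: brunch matches / all meal-term matches} for every hour
    of the day with at least one match.

    tweets: iterable of (datetime, text) pairs (hypothetical format for
    tweets already pulled from the Streaming API; hours taken as-is).
    """
    counts = {term: Counter() for term in TERMS}
    for timestamp, text in tweets:
        # Lowercase word tokens, so "#Brunch!" becomes "brunch".
        words = set(re.findall(r"[a-z]+", text.lower()))
        for term in TERMS:
            if term in words:
                counts[term][timestamp.hour] += 1
    shares = {}
    for hour in range(24):
        total = sum(counts[term][hour] for term in TERMS)
        if total:
            shares[hour] = counts["brunch"][hour] / total
    return shares

sample = [
    (datetime(2016, 6, 5, 11, 0), "Sunday #Brunch with friends!"),
    (datetime(2016, 6, 5, 11, 30), "late breakfast again"),
    (datetime(2016, 6, 5, 19, 0), "dinner time"),
]
print(brunch_share_by_hour(sample))  # {11: 0.5, 19: 0.0}
```

Dividing by the control counts, rather than reporting raw "brunch" volume, keeps overall Twitter traffic swings (busy evenings, quiet nights) from masquerading as brunch interest.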
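The "Why Python is Slow" item contrasts dense buffers with scattered objects and mentions dynamic typing. A small sketch of both points, assuming CPython: a list of ints versus the standard-library `array.array`, plus a function whose `+` is resolved only at run time. The sizes printed will vary by platform and version.

```python
import array
import sys

n = 1_000
# A list stores a table of pointers; each element is a separate boxed int
# object carrying its own type pointer and reference count.
as_list = list(range(n))
# array.array stores the raw C values back to back in one contiguous buffer.
as_array = array.array("l", range(n))  # "l" = C signed long

# Pointer table plus every int object (approximate: CPython caches small ints,
# so some of these objects are shared rather than allocated per element).
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
# The array's reported size includes its data buffer.
array_bytes = sys.getsizeof(as_array)

print(f"list : ~{list_bytes} bytes")
print(f"array: {array_bytes} bytes ({as_array.itemsize} bytes per item)")

# Dynamic typing: "+" is dispatched at run time from the operands' types,
# so the interpreter must inspect each object before it can add anything.
def add(a, b):
    return a + b

print(add(1, 2), add(1.5, 2), add("a", "b"))
```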
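Two of the techniques in the heavy-metal NLP item, TF-IDF weighting and cosine similarity, fit in a few lines of plain Python. This is a sketch under stated assumptions: whitespace tokenization, raw counts for tf, and idf = log(N / df), which is one common variant among several (so a term present in every document gets weight 0). The toy lines are made up for illustration and are not from Iain's dataset; cosine distance is 1 minus the similarity returned here.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

lyrics = [
    "fire and steel and blood",
    "blood and fire in the night",
    "quiet morning rain",
]
vecs = tf_idf(lyrics)
print(cosine_similarity(vecs[0], vecs[1]), cosine_similarity(vecs[0], vecs[2]))
```

The first two lines share weighted terms and score above zero; the third shares nothing with the first and scores exactly 0.0.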
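The Nervana item concerns 2D Winograd kernels tuned for GPUs. The underlying idea shows up in the smallest 1D case, F(2,3): two outputs of a 3-tap filter computed with 4 data-dependent multiplications instead of 6, after a filter transform that is done once and reused across tiles. The sketch below is plain Python for illustration only, not Nervana's implementation; the function names are hypothetical, and it computes CNN-style correlation (no filter flip).

```python
import math

def transform_filter(g):
    """Winograd F(2,3) filter transform; computed once per filter."""
    return (g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2])

def winograd_f23(d, u):
    """Two correlation outputs from 4 inputs d and transformed filter u,
    using 4 multiplications."""
    m1 = (d[0] - d[2]) * u[0]
    m2 = (d[1] + d[2]) * u[1]
    m3 = (d[2] - d[1]) * u[2]
    m4 = (d[1] - d[3]) * u[3]
    return [m1 + m2 + m3, m2 - m3 - m4]

def correlate_direct(signal, g):
    """Reference 'valid' correlation: 3 multiplications per output."""
    return [sum(signal[i + k] * g[k] for k in range(3))
            for i in range(len(signal) - 2)]

def correlate_winograd(signal, g):
    """Same result, tiled into overlapping 4-input / 2-output blocks."""
    if len(signal) < 4 or (len(signal) - 2) % 2:
        raise ValueError("sketch assumes len(signal) - 2 is even and >= 2")
    u = transform_filter(g)  # reused for every tile
    out = []
    for i in range(0, len(signal) - 3, 2):
        out.extend(winograd_f23(signal[i:i + 4], u))
    return out

signal = [1, 2, 3, 4, 5, 6, 7, 8]
g = [0.5, -1.0, 2.0]
fast, slow = correlate_winograd(signal, g), correlate_direct(signal, g)
print(all(math.isclose(a, b) for a, b in zip(fast, slow)))
```

Expanding m1 + m2 + m3 gives d0·g0 + d1·g1 + d2·g2 and m2 − m3 − m4 gives d1·g0 + d2·g1 + d3·g2, which is why the two functions agree; the trade is fewer multiplications for a few extra additions.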

Jobs

  • Data Scientist - Rockstar - NYC
    Rockstar Games (developers of Grand Theft Auto, Max Payne, Red Dead Redemption, L.A. Noire, Bully & more) is seeking an experienced data scientist to join our Analytics practice and help advance our business intelligence capabilities. Successful candidates will work with analytics and product leadership to ensure that the most relevant real-time and historical data is identified, tracked, analyzed, and made actionable across all of our games...

Training & Resources

  • Basic Interactive Geospatial Analysis in Python
    Geospatial analysis is a massive field with a rich history. Python has some pretty slick packages for working with geospatial data such as, but not limited to, Shapely, Fiona, and Descartes...GeoPandas sits on top of these packages and exposes a familiar Pandas-like API that makes a series of element-wise and aggregation methods (from the base packages) easy to apply to dataframes containing geometry data...
