Data Science Weekly Newsletter - Issue 256

Oct 18 2018

Editor Picks
  • The Economist's Big Mac Index is calculated with R
    Since its inception in 1986, the Big Mac Index has been compiled and calculated manually, twice a year. But starting with the most recent published index, the index is now calculated with R. This is the first example of a new program at The Economist to publish the data and methods behind its journalism, and here the data and code behind the Big Mac Index have been published as a GitHub repository...
  • Will Compression Be Machine Learning’s Killer App?
    When I talk to people about machine learning on phones and devices I often get asked “What’s the killer application?”. I have a lot of different answers, everything from voice interfaces to entirely new ways of using sensor data, but the one I’m most excited about in the near term is compression. Despite being fairly well-known in the research community, this seems to surprise a lot of people, so I wanted to share some of my personal thoughts on why I see compression as so promising...
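The Big Mac Index item rests on simple arithmetic: dividing a Big Mac's local price by its US price gives an implied purchasing-power-parity exchange rate, and comparing that with the market rate shows how far a currency is over- or undervalued against the dollar. The Python sketch below covers only this raw calculation. The prices and rate are made-up placeholders rather than Economist data, the function names are hypothetical, and it is not The Economist's published R code.

```python
def implied_ppp_rate(local_price: float, us_price: float) -> float:
    """Exchange rate (local currency per US dollar) that would equalize Big Mac prices."""
    return local_price / us_price


def valuation_vs_dollar(local_price: float, us_price: float, market_rate: float) -> float:
    """Over (+) or under (-) valuation of the local currency against the dollar.

    market_rate is the actual exchange rate in local currency per US dollar.
    The result is a fraction, e.g. -0.5 means 50% undervalued.
    """
    implied = implied_ppp_rate(local_price, us_price)
    return implied / market_rate - 1.0


# Hypothetical inputs: 50 local units vs $5.00 in the US, market rate 20 units per dollar.
# Implied rate is 10 per dollar, so the local currency looks 50% undervalued.
print(f"{valuation_vs_dollar(50.0, 5.0, 20.0):+.0%}")  # prints -50%
```

Equivalently, the burger costs 50 / 20 = $2.50 locally versus $5.00 in the US, half the US price.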

A Message from this week's Sponsor:


Beginner Data Science Courses - Live & Online

Leading data science training provider, Metis, allows you the flexibility to skill up in data science from anywhere. Learn from industry experts in a unique Live Online format where you’ll interact with instructors and classmates in real time during class. Browse our selection of beginner-level courses and enroll today!



Data Science Articles & Videos

  • M.I.T. Plans College for Artificial Intelligence, Backed by $1 Billion
    Every major university is wrestling with how to adapt to the technology wave of artificial intelligence — how to prepare students not only to harness the powerful tools of A.I., but also to thoughtfully weigh its ethical and social implications. A.I. courses, conferences and joint majors have proliferated in the last few years. But the Massachusetts Institute of Technology is taking a particularly ambitious step, creating a new college backed by a planned investment of $1 billion...
  • Resource-efficient Machine Learning in 2 KB RAM for the Internet of Things
    This paper develops a novel tree-based algorithm, called Bonsai, for efficient prediction on IoT devices – such as those based on the Arduino Uno board having an 8-bit ATmega328P microcontroller operating at 16 MHz with no native floating point support, 2 KB RAM and 32 KB read-only flash...
  • Applying Deep Learning to Metastatic Breast Cancer Detection
    A pathologist’s microscopic examination of a tumor in patients is considered the gold standard for cancer diagnosis, and has a profound impact on prognosis and treatment decisions. One important but laborious aspect of the pathologic review involves detecting cancer that has spread (metastasized) from the primary site to nearby lymph nodes. Detection of nodal metastasis is relevant for most cancers, and forms one of the foundations of the widely-used TNM cancer staging...
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers...
  • Neural networks don’t understand what optical illusions are
    Machine-vision systems can match humans at recognizing faces and can even create realistic synthetic faces. But researchers have discovered that the same systems cannot recognize optical illusions, which means they also can’t create new ones...
  • Patient2Vec: A Personalized Interpretable Deep Representation of the Longitudinal Electronic Health Record
    The wide implementation of electronic health record (EHR) systems facilitates the collection of large-scale health data from real clinical settings. Despite the significant increase in adoption of EHR systems, this data remains largely unexplored, but presents a rich data source for knowledge discovery from patient health histories in tasks such as understanding disease correlations and predicting health outcomes. In this paper, we propose a computational framework, Patient2Vec, to learn an interpretable deep representation of longitudinal EHR data which is personalized for each patient...
  • Image-Based eCommerce Product Discovery: A Deep Learning Case Study
    To further improve discoverability of Macy’s product catalog online we introduced an easy shopping experience for finding products which are hard to describe using text-based search. In this talk, Macy’s engineers share how they’ve implemented visual similarity using CNNs...
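For the Bonsai item: as we read the paper, prediction projects the input into a low-dimensional space with a sparse matrix Z, routes it down a single shallow binary tree, and sums a gated score W_k^T Zx * tanh(sigma * V_k^T Zx) from every node k on the root-to-leaf path. The pure-Python sketch below illustrates only that prediction step under simplifying assumptions: dense matrices, random untrained parameters, heap-style node indexing and a hypothetical "go left when theta_k^T Zx > 0" branching rule. Training, sparsity, and the integer arithmetic a real ATmega328P deployment would need are all left out, and the helper names are illustrative.

```python
import math
import random


def matvec(m, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def random_params(d, d_hat, n_classes, depth, seed=0):
    """Random, untrained parameters for a complete binary tree of the given depth.

    Heap indexing (an assumption): root is 0, children of node k are 2k + 1 and 2k + 2.
    """
    rng = random.Random(seed)
    n_nodes = 2 ** (depth + 1) - 1
    n_internal = 2 ** depth - 1

    def mat(rows, cols):
        return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    Z = mat(d_hat, d)                                    # projection to d_hat dims (dense here)
    W = [mat(n_classes, d_hat) for _ in range(n_nodes)]  # per-node score weights
    V = [mat(n_classes, d_hat) for _ in range(n_nodes)]  # per-node gating weights
    theta = [[rng.gauss(0.0, 1.0) for _ in range(d_hat)] for _ in range(n_internal)]
    return Z, W, V, theta


def bonsai_style_predict(x, Z, W, V, theta, depth, sigma=1.0):
    """Sum W_k Zx * tanh(sigma * V_k Zx) over every node k on x's root-to-leaf path."""
    z = matvec(Z, x)
    scores = [0.0] * len(W[0])  # one score per class
    node = 0
    for level in range(depth + 1):
        w = matvec(W[node], z)
        v = matvec(V[node], z)
        for i in range(len(scores)):
            scores[i] += w[i] * math.tanh(sigma * v[i])
        if level < depth:  # internal node: branch on the sign of theta_k . Zx
            go_left = sum(t * zi for t, zi in zip(theta[node], z)) > 0
            node = 2 * node + 1 if go_left else 2 * node + 2
    return scores


Z, W, V, theta = random_params(d=10, d_hat=3, n_classes=2, depth=2)
scores = bonsai_style_predict([1.0] * 10, Z, W, V, theta, depth=2)
predicted_class = max(range(len(scores)), key=scores.__getitem__)
```

Only depth + 1 nodes are evaluated per input, which is part of why a shallow tree over a small projected vector can fit in a few kilobytes.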
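The BERT abstract mentions conditioning on left and right context at once; the paper makes that trainable with a masked language model objective. Roughly 15% of input positions are selected, and of those 80% are replaced by a [MASK] token, 10% by a random token and 10% left unchanged; the model then predicts the original tokens. The sketch below illustrates that corruption step only. Selecting each position independently with probability 0.15 is a simplification, and the toy vocabulary and function name are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    """BERT-style masked-LM corruption (simplified sketch).

    Returns (corrupted, targets): targets maps each selected position to its
    original token, which a model would predict from both left and right context.
    """
    corrupted = list(tokens)
    targets = {}
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:  # position not selected for prediction
            continue
        targets[i] = tok
        r = rng.random()
        if r < 0.8:        # 80%: replace with the mask token
            corrupted[i] = MASK
        elif r < 0.9:      # 10%: replace with a random vocabulary token
            corrupted[i] = rng.choice(vocab)
        # remaining 10%: leave the original token in place
    return corrupted, targets


vocab = ["the", "cat", "sat", "on", "mat", "dog"]  # hypothetical toy vocabulary
sentence = "the cat sat on the mat".split()
corrupted, targets = mask_tokens(sentence, vocab, random.Random(1))
```

Keeping some selected tokens unchanged or randomized means the model cannot assume that only [MASK] positions matter, since [MASK] never appears at fine-tuning time.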
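For the Macy's visual-similarity item: CNN-based similarity search is commonly built by turning each product image into an embedding vector (for example, activations from a late layer of a pretrained CNN) and ranking catalog items by how close their vectors are to the query image's vector. The talk summary does not give Macy's actual pipeline, so the sketch below assumes embeddings already exist and uses hypothetical item ids and toy 3-dimensional vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def most_similar(query, catalog, k=3):
    """Return the k catalog item ids whose embeddings are closest to query."""
    ranked = sorted(catalog, key=lambda item_id: cosine(query, catalog[item_id]), reverse=True)
    return ranked[:k]


# Hypothetical embeddings; real CNN embeddings have hundreds or thousands of dimensions.
catalog = {
    "red-dress": [0.9, 0.1, 0.0],
    "maroon-dress": [0.8, 0.2, 0.1],
    "blue-sofa": [0.0, 0.1, 0.9],
}
print(most_similar([1.0, 0.1, 0.0], catalog, k=2))  # → ['red-dress', 'maroon-dress']
```

A full sort is fine for a toy catalog; at catalog scale an approximate nearest-neighbour index is the usual replacement for the linear scan.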


Jobs

  • Data Scientist - Hearts & Science - NYC

    As every CMO knows, applying technology, data and analytics to business challenges is the cost of entry, not a nice-to-have. Unlocking the value of consumer data can deliver growth, revenue and ROI, and create differentiation to stay on top. Real-time action is instrumental in driving results within a highly fragmented marketplace. Hearts & Science’s Marketing Science solutions are built to deliver insightful results with speed, accuracy and a single version of the truth. In leveraging the DNA of a marketing agency with a talent pool of developers, data scientists and PhDs, we hold a unique position in the increasingly crowded ad tech and consulting space...


Training & Resources

  • Discriminator Rejection Sampling
    We propose a rejection sampling scheme using the discriminator of a GAN to approximately correct errors in the GAN generator distribution. We show that under quite strict assumptions, this will allow us to recover the data distribution exactly...
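For the Discriminator Rejection Sampling item: in the idealized setting the abstract alludes to, an optimal discriminator's logit equals log(p_data(x) / p_g(x)), so accepting a generator sample x with probability exp(logit(x) - max logit) is ordinary rejection sampling with the generator as the proposal distribution. The toy sketch below checks that on a three-outcome discrete distribution where both densities are known, so the "discriminator" is computed exactly rather than trained. The distributions and function names are hypothetical, and the paper's practical recipe for estimating the maximum and adjusting acceptance with a trained discriminator is not reproduced.

```python
import math
import random
from collections import Counter


def ideal_logit(x, p_data, p_gen):
    """Logit of the optimal discriminator: log(p_data(x) / p_gen(x)).

    Assumes x has positive probability under both distributions.
    """
    return math.log(p_data[x] / p_gen[x])


def drs_sample(p_data, p_gen, n, rng):
    """Draw from p_gen and keep each draw with probability exp(logit - max_logit)."""
    outcomes = list(p_gen)
    weights = [p_gen[o] for o in outcomes]
    max_logit = max(ideal_logit(o, p_data, p_gen) for o in outcomes)
    accepted = []
    while len(accepted) < n:
        x = rng.choices(outcomes, weights=weights)[0]  # sample from the "generator"
        if rng.random() < math.exp(ideal_logit(x, p_data, p_gen) - max_logit):
            accepted.append(x)
    return accepted


p_gen = {"a": 0.5, "b": 0.3, "c": 0.2}   # toy generator distribution
p_data = {"a": 0.2, "b": 0.3, "c": 0.5}  # toy target distribution
samples = drs_sample(p_data, p_gen, 20_000, random.Random(0))
freqs = {o: c / len(samples) for o, c in Counter(samples).items()}
```

Here the acceptance probabilities are 0.16, 0.4 and 1.0, so accepted mass is proportional to 0.08, 0.12 and 0.2, which normalizes to exactly p_data; with a real, imperfect discriminator the correction is only approximate.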




Books

  • Data Visualization with Python and JavaScript: Scrape, Clean, Explore & Transform Your Data

    Learn how to turn raw data into rich, interactive web visualizations with the powerful combination of Python and JavaScript. With this hands-on guide, author Kyran Dale teaches you how to build a basic dataviz toolchain with best-of-breed Python and JavaScript libraries—including Scrapy, Matplotlib, Pandas, Flask, and D3—for crafting engaging, browser-based visualizations...

    For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.

    P.S. Want to reach our audience / fellow readers? Consider sponsoring: grab a spot now; first come, first served! All the best, Hannah & Sebastian
Sign up to receive the Data Science Weekly Newsletter every Thursday
