Data Science Weekly Newsletter - Issue 42


Sept 11, 2014

Editor Picks

  • Recursive Deep Learning for NLP and Computer Vision
    As the amount of unstructured text data that humanity produces overall and on the Internet grows, so does the need to intelligently process it and extract different types of knowledge from it. My research goal in this thesis is to develop learning models that can automatically induce representations of human language, in particular its structure and meaning in order to solve multiple higher level language tasks...
  • Large-scale graph partitioning with Apache Giraph
    At the end of last year, we talked about the graph processing system Apache Giraph and our work to make it run at Facebook’s massive scale. Today, we’d like to present one of the use cases in which Giraph enabled computations that were previously difficult to process without incurring high latency...
  • Using Neo4J for Document Classification
    Graphs are a perfect solution to organize information and to determine the relatedness of content. In this webinar, Neo4j Developer Evangelist Kenny Bastani will discuss using Neo4j to perform document classification. He will demonstrate how to build a scalable architecture for classifying natural language text using a graph-based algorithm called Hierarchical Pattern Recognition...
 
 

Data Science Articles & Videos

 
  • New to Machine Learning? Avoid these three mistakes
    Modern machine learning (i.e. not the theoretical statistical learning that emerged in the 70s) is very much an evolving field, and despite its many successes we are still learning what exactly ML can do for data practitioners. I gave a talk on this topic earlier this fall at Northwestern University and I wanted to share these cautionary tales with a wider audience...
  • The Science of Crawl (Part 1): Deduplication of Web Content
    We've come to discover that building a functional crawler can be done relatively cheaply, but building a robust crawler requires overcoming a few technical challenges. In this series of blog posts, we will walk through a few of these technical challenges including content deduplication, link prioritization, feature extraction and re-crawl estimation...
  • Data science: how is it different to statistics?
    Recently, there has been much hand-wringing about the role of statistics in data science. In this and future columns, I’ll discuss both the threat and opportunity of data science. I believe that statistics is a crucial part of data science, but at the same time, most statistics departments are at grave risk of becoming irrelevant...
  • NYT Data Scientist Chris Wiggins on the way we create and consume content
    “At The New York Times, we produce a lot of content every day, but we also have a lot of data about the way people engage with that content,” Wiggins says. “[The Times] wanted to build out a data science function not only to curate and make available those data, but to learn from those data. In particular, the thing that the New York Times is interested in learning is: what makes for a good long-term relationship with a reader?”...
  • Long Memory and the Nile: Herodotus, Hurst and H
    The ancient Egyptians were a people with long memories. So, it seems reasonable, and maybe even appropriate, that one of the first attempts to understand long memory in time series was motivated by the Nile...
  • Accelerate Machine Learning with the cuDNN Deep Neural Network Library
    Because of the increasing importance of DNNs in both industry and academia and the key role of GPUs, NVIDIA is introducing a library of primitives for deep neural networks called cuDNN. The cuDNN library makes it easy to obtain state-of-the-art performance with DNNs, and provides other important benefits...
 
 

Jobs

 
  • Data Scientist - CIA - Washington DC
    Do you have a passion for creating data-driven solutions to the world's most difficult problems? The CIA needs technically-savvy specialists to organize and interpret Big Data to inform US decision makers, drive successful operations, and shape CIA technology and resource investments. The CIA is looking for individuals from diverse educational backgrounds to fill the role of data scientist. If you have experience in data analytics, computer science, mathematics, statistics, economics, operations research, computational social science, quantitative finance, engineering or other data analysis fields, consider a career as a Data Scientist at CIA...
 
 

Training & Resources

   
 

Books

 

 
 
P.S. Enjoyed the newsletter? Please forward it to friends and peers - we'd love to have them onboard too :-) - All the best, Hannah & Sebastian
 