Data Science Weekly Newsletter

Issue 193

August 3, 2017

Editor's Picks

Inside Salesforce’s Quest to Bring Artificial Intelligence to Everyone
Starting two years ago, a band of artificial-intelligence acolytes within Salesforce escaped the towering headquarters with the goal of crazily multiplying the impact of the machine learning models that increasingly shape our digital world—by automating the creation of those models. As shoppers checked out sofas above their heads, they built a system to do just that...

What a nerdy debate about p-values shows about science — and how to fix it
There’s a huge debate going on in social science right now. The question is simple, and strikes near the heart of all research: What counts as solid evidence?...

Machines Are Developing Language Skills Inside Virtual Worlds
Now teams at DeepMind, an AI-focused subsidiary of Alphabet, and Carnegie Mellon University have developed a way for machines to figure out simple principles of language for themselves inside 3-D environments based on first-person shooter computer games...

Get started with Python for data science in minutes

Using Python for data science and machine learning is easy with ActiveState’s Python distribution. Pre-bundled with 300+ packages, ActivePython includes NumPy, SciPy, scikit-learn, TensorFlow, Theano and Keras, and leverages the Intel Math Kernel Library, so you can focus on your data rather than on setting up software. Download ActivePython and start developing for free.

To Build a Smarter Chatbot, First Teach It a Second Language
Translation can help an algorithm’s overall language skills...

Machine Learning Infrastructure at Stripe
Machine learning at Stripe has a foundation built on Python and the PyData stack, with scikit-learn and pandas continuing to be core components of an ML pipeline that feeds a production system written in Scala. This talk will cover the ML Infra team’s work to bridge the serialization and scoring gap between Python and the JVM, as well as how ML Engineers ship models to production...

Fashioning with Networks: Neural Style Transfer to Design Clothes
In this paper, the neural style transfer algorithm is applied to fashion so as to synthesize new custom clothes. We construct an approach to personalize and generate new custom clothes based on a user's preferences and by learning the user's fashion choices from a limited set of clothes from their closet...

The AI Hierarchy of Needs
Think of AI as the top of a pyramid of needs. Yes, self-actualization (AI) is great, but you first need food, water and shelter (data literacy, collection and infrastructure)...

Natural Language Processing with Small Feed-Forward Networks
We show that small and shallow feed-forward neural networks can achieve near state-of-the-art results on a range of unstructured and structured language processing tasks while being considerably cheaper in memory and computational requirements than deep recurrent models...

Earth from Space
Analyzing DigitalGlobe’s high-resolution satellite data to obtain a detailed representation of land use on Earth...

Predicting Personality from Book Preferences with User-Generated Content Labels
Psychological studies have shown that personality traits are associated with book preferences. However, past findings are based on questionnaires focusing on conventional book genres and are unrepresentative of niche content. For a more comprehensive measure of book content, this study harnesses a massive archive of content labels, also known as 'tags', created by users of an online book catalogue, Goodreads.com...

Junior Data Scientist/Data Scientist - Penguin Random House US - New York City, USA
The Data Science & Analytics group at Penguin Random House is seeking a Junior Data Scientist/Data Scientist... In this role, you will have an opportunity to work on a variety of high-profile projects under the mentorship of Senior Data Scientists and in collaboration with key decision makers across the organization... We are an agile team of data scientists and software engineers. The team has a wide mandate encompassing pricing systems, recommendation / personalization systems, title segmentation, supply chain, as well as ad-hoc analysis and data exploration...

Introducing Vectorflow: a lightweight neural network library for sparse data [from Netflix]
We felt the need for a minimalist library that is specifically optimized for training shallow feedforward neural nets on sparse data in a single-machine, multi-core environment. We wanted something small and easy to hack, so we built Vectorflow, one of the many tools our machine learning scientists use...

Deep recommender models using PyTorch
Spotlight uses PyTorch to build both deep and shallow recommender models. By providing a slew of building blocks for loss functions (various pointwise and pairwise ranking losses) and representations (shallow factorization representations, deep sequence models), along with utilities for fetching (or generating) recommendation datasets, it aims to be a tool for rapid exploration and prototyping of new recommender models...

Presenting the d3.loom chart
A new plugin to create butterfly, fan-like, axe-shaped charts...

Text Mining with R: A Tidy Approach
Much of the data available today is unstructured and text-heavy, making it challenging for analysts to apply their usual data wrangling and visualization tools. With this practical book, you’ll explore text-mining techniques with tidytext, a package that authors Julia Silge and David Robinson developed using the tidy principles behind R packages like ggraph and dplyr. You’ll learn how tidytext and other tidy tools in R can make text analysis easier and more effective...

For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages, check out our resources page...
