Data Science Datasets

A list of publicly available datasets.

General

  • Amazon Public Data Sets
    Public Data Sets on AWS: a centralized repository of public data sets that can be integrated directly into cloud-based applications running on AWS.
  • Wikipedia
    Wikipedia offers free copies of all its available content to interested users. These database dumps can be used for mirroring, personal use, informal backups, offline use, or database queries.
  • Freebase
    A community-curated database of people, places, and things.
  • World Bank
    DataBank is an analysis and visualization tool containing collections of time series data on a variety of topics.
  • Windows Azure Marketplace
    Free datasets via the Windows Azure DataMarket, including academic data, speech recognition data, and more.
  • UCI Machine Learning Repository
    200+ datasets from the Center for Machine Learning and Intelligent Systems at UC Irvine.
  • Deep Learning Data Sets
    Datasets of music, natural images, text, speech, faces, and recommendation systems for benchmarking deep learning algorithms.
  • Stanford Large Network Dataset Collection
    A collection of about 50 large network datasets, ranging from tens of thousands of nodes and edges to tens of millions. It includes social networks, web graphs, road networks, internet networks, citation networks, collaboration networks, and communication networks.
  • Yahoo Datasets
    Yahoo shares various types of data, categorized into Ratings, Language, Graph, Advertising and Market Data, and Computing Systems, plus an appendix of other relevant data and resources available via the Yahoo! Developer Network.

    If you are looking for something specific, you can also try posting on the r/datasets subreddit or on Open Data StackExchange.
