Wednesday 8 November 2017

Public data corpus sources for Analytics/Analysis


Here is a collection of data sets from various public repositories, covering a range of industries.

I will keep updating the list over time. The sources for the links are listed in the References section at the bottom. My thanks to the original posters.

Broken links have been updated with the latest URLs.

These links lead to large volumes of data made available by governments and organizations across the world for the public good.

Please use them responsibly.

Aviation

National Flight Data Center (NFDC)

FAA Data & Research

Flight Delay Information

FAA Aviation Safety Information Analysis and Sharing (ASIAS)

Aircraft Situation Display to Industry (ASDI)

NTSB Accident Database & Synopses

OpenFlights.org

The Center for Innovation in Engineering and Science Education Real time data sites

MIT Airline Data Project

Space

Real-Time Space Weather Data Sources

Politics

Data on the U.S. Congress – A Joint Effort from Brookings and the American Enterprise Institute

Sports

Open Sports Data/API

Football (Soccer) Stats

Government

Public Government Data Sets

U.S. Department of Homeland Security Data

Public Data for the State of Utah

Compilations by others

Finding Data on the Internet - Inside-R

Nathan Yau's collection of data sets

Dr. Jerry A. Smith's Favorite Data sets

Hilary Mason's "Research Quality" Data-sets

https://bitly.com/bundles/hmason/1
This bundle gathers, in one place, public data sets that might interest researchers in a variety of fields.

Peter Skomoroch's list of data sets on Delicious

Data Wrangling blog data set list

Other

DonorsChoose.org - Hacking Education: A Contest for Developers and Data Crunchers

Datasets for "The Elements of Statistical Learning"

Enron Email Dataset

http://www.cs.cmu.edu/~enron/
This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains a total of about 0.5M messages.
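
For anyone who wants a quick first look at the Enron corpus, here is a minimal Python sketch that counts messages per user and reads the headers of a single message. It assumes the archive has been downloaded and extracted locally into a tree of the form `maildir/<user>/<folder>/<message file>`. That layout, the `maildir_root` argument, and the helper names `count_messages_per_user` and `read_headers` are my own assumptions for illustration, so adjust them to match what you actually have on disk.

```python
import os
from collections import Counter
from email import message_from_string


def count_messages_per_user(maildir_root):
    """Count message files under each top-level user folder.

    Assumes one directory per user directly under maildir_root,
    with messages stored as plain files in nested folders.
    """
    counts = Counter()
    for user in sorted(os.listdir(maildir_root)):
        user_dir = os.path.join(maildir_root, user)
        if not os.path.isdir(user_dir):
            continue  # skip stray files at the top level
        for _dirpath, _dirnames, filenames in os.walk(user_dir):
            counts[user] += len(filenames)
    return counts


def read_headers(path):
    """Parse one message file and return a few common headers."""
    # latin-1 never fails to decode, which is convenient for old mail dumps
    with open(path, encoding="latin-1") as fh:
        msg = message_from_string(fh.read())
    return {"from": msg["From"], "to": msg["To"], "subject": msg["Subject"]}
```

Calling `count_messages_per_user("maildir").most_common(10)` would then list the ten users with the most messages, which is a reasonable sanity check before any heavier analysis.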

Yandex

The Data Page

Public Data Sets on Amazon

Miami School of Business Statistical Data Sets

Public data put to good use

ASU GeoDA Center Data

UC Irvine Machine Learning Repository

European Cities 1M Data Sets

University of Edinburgh School of Informatics Data Sets for Data Mining

Opinion Mining, Sentiment Analysis, and Opinion Spam Detection

Quandl - Intelligent search for numerical data

Gephi Graph Visualization Sample Data Sets

CitiBike, by NYC Bike Share - Station data

Air Quality Notifications

The GDELT Project - Global Database of Events, Language, and Tone





References
https://gist.github.com/campeterson/5946446
