Data Sets
This is a collection of data sets (and data set collections) that you may find useful when we start to do some more independent work.
Interesting data sets
New York Times Covid data : The data the NYT has been collecting about the pandemic
RIAA sales database : Information about music sales.
Wildlife strike database : Information about encounters between animals and airplanes.
Current Population Survey : Monthly U.S. household survey data on employment and demographics
Eviction Lab data : Data about evictions in the United States
Baby names (Social Security website) : The full set of baby names, including names broken down by state
US census data : Data from the Census Bureau
Google NGrams : Google's massive collection of n-grams (words and phrases) compiled as part of its book scanning project
Million Song Dataset : Audio features and metadata from a million music tracks
OpenStreetMap : A community-driven alternative to Google Maps with fairly easy-to-use geographic data
Topical data set collections
World Health Organization : Health statistics from around the world
Data.gov : All kinds of interesting datasets from the U.S. government
The World Bank : A large collection of worldwide development data
Bureau of Labor Statistics : Data from the Department of Labor
GeoCommons : Geographic data and visualizations
Sports Reference : A ridiculous amount of sporting data
General data set collections
Vega Datasets : This is a collection of the common example data sets you will see in class and in many textbooks and tutorials.
Kaggle Datasets : A large repository of assorted datasets (they claim over 50,000 datasets).
Google Public Data : A big collection of public data, complete with some simple visualizations of it
Data Portals : A very meta listing of open data set collections
Visual Analytics Benchmark Repository : Yet another good collection of data sources
Last updated 09/23/2021