Data Sets

Published

September 1, 2022

This is a collection of data sets (and collections of data sets) that you may find useful once we start doing more independent work.

Interesting data sets

New York Times Covid data
Data the NYT has been collecting about the COVID-19 pandemic.
RIAA sales database
Information about music sales.
Wildlife strike database
Information about encounters between animals and airplanes.
Current Population Survey
Monthly survey of U.S. households on employment and the labor force, run by the Census Bureau and the Bureau of Labor Statistics.
Eviction Lab data
Data about evictions in the United States.
Baby names (social security website)
The full set of U.S. baby names from Social Security card applications, also broken down by state.
US census data
Data from the U.S. Census Bureau.
Google NGrams
Google’s massive collection of n-grams (words and phrases), compiled as part of its book-scanning project.
Million Song Dataset
Audio features and metadata for a million music tracks.
OpenStreetMap
A community-driven alternative to Google Maps, with fairly easy-to-use geographic data.

Topical data set collections

World Health Organization
Health statistics from around the world.
Data.gov
All kinds of interesting data sets from the U.S. federal government.
The World Bank
A large collection of worldwide development data.
Bureau of Labor Statistics
Employment, wage, and price data from the U.S. Department of Labor.
GeoCommons
Geographic data and visualizations.
Sports Reference
A ridiculous amount of sports data.

General data set collections

Vega Datasets
This is a collection of the common example data sets you will see in class and in many textbooks and tutorials.
Kaggle Datasets
A large repository of assorted data sets (Kaggle claims over 50,000).
Google Public Data
A big collection of public data, complete with some simple visualizations.
Data Portals
A very meta listing of open data set collections.
Visual Analytics Benchmark Repository
Yet another good collection of data sources.