CS 465 Information Visualization

Data Sets

This is a collection of data sets (and data set collections) that you may find useful when we start to do some more independent work.

Interesting data sets

New York Times Covid data : The data the NYT has been collecting about the pandemic

RIAA sales database : Information about music sales.

Wildlife strike database : Information about encounters between animals and airplanes.

Current Population Survey : Monthly U.S. household survey data on employment and demographics, from the Census Bureau and the Bureau of Labor Statistics

Eviction Lab data : Data about evictions across the United States

Baby names (Social Security website) : The full set of baby names. Also includes names broken down by state.

US census data : Data from the Census Bureau

Google NGrams : Google's massive collection of ngrams (words and phrases) that they have compiled as part of their book scanning project

Million Song Dataset : Audio features and metadata from a million music tracks

OpenStreetMap : A community-driven alternative to Google Maps with fairly easy-to-use geographic data

Topical data set collections

World Health Organization : Health statistics from around the world

Data.gov : All kinds of interesting datasets from the U.S. government.

The World Bank : A large collection of worldwide development data

Bureau of Labor Statistics : Data from the Department of Labor

GeoCommons : Geographic data and visualizations

Sports Reference : A ridiculous amount of sporting data

General data set collections

Vega Datasets : This is a collection of the common example data sets you will see in class and in many textbooks and tutorials.

Kaggle Datasets : A large repository of assorted datasets (they claim over 50,000 datasets).

Google Public Data : A big collection of public data, complete with some simple visualizations of it

Data Portals : A very meta listing of open data set collections

Visual Analytics Benchmark Repository : Yet another good collection of data sources


Last updated 09/23/2021