Data Sets
This is a collection of data sets (and data set collections) that you may find useful when we start to do some more independent work.
Interesting data sets
- New York Times Covid data
  - The data the NYT has been collecting about the pandemic
- RIAA sales database
  - Information about music sales
- Wildlife strike database
  - Information about collisions between animals and aircraft
- Current Population Survey
  - Census and survey data from around the world
- Eviction Lab data
  - Data about evictions in the United States
- Baby names (Social Security website)
  - The full set of US baby names from Social Security records, including names broken down by state
- US census data
  - Data from the US Census Bureau
- Google NGrams
  - Google’s massive collection of n-grams (words and phrases) compiled as part of its book-scanning project
- Million Song Dataset
  - Audio features and metadata for a million music tracks
- OpenStreetMap
  - A community-driven alternative to Google Maps with fairly easy-to-use geographic data
Topical data set collections
- World Health Organization
  - Health statistics from around the world
- Data.gov
  - All kinds of interesting datasets from the US government
- The World Bank
  - A large collection of worldwide development data
- Bureau of Labor Statistics
  - Employment, wage, and price data from the US Department of Labor
- GeoCommons
  - Geographic data and visualizations
- Sports Reference
  - A ridiculous amount of sports data
General data set collections
- Vega Datasets
  - A collection of the common example data sets you will see in class and in many textbooks and tutorials
- Kaggle Datasets
  - A large repository of assorted datasets (Kaggle claims over 50,000)
- Google Public Data
  - A big collection of public data, complete with some simple visualizations
- Data Portals
  - A very meta listing of open data set collections
- Visual Analytics Benchmark Repository
  - Yet another good collection of data sources