In our examples, we use publicly available data sources as much as possible. For your convenience, we include links to the data files we use.

Air Quality US Cities
CA College Readiness
California School and School District Performance

The following files have been adapted from the California Department of Education's API Data Files for the 2013 Growth API. They represent measurements of California school performance by district and by school.

Record layout appears here:

Flag values appear here:

Here are additional files for supplementing the data in the two main files:

Canadian Census
Capital Expenditure by Industry
Customer Value Analysis
Analysis of customers that lease automobiles: customer-value-analysis.csv.
Bureau of Transportation Statistics

These files represent data regarding On-Time Performance of Air Carriers, obtained from the following URL:

EPA Data 2015
HUD Residential/Business Ratios
Geographic Features

Geographic Features (for the US), found on the United States Board on Geographic Names website. We recommend downloading the latest All States Zip file.

Income Tax Returns

Recently published summary data on Individual Tax Returns:

Source: Internal Revenue Service (SOI Stats):

See also the supplemental data for US Regions at us_regions.csv, and for mapping zipcodes to latitude and longitude at zipcode-lat-long.csv.

NCAA Basketball

ncaa_basketball_results.csv covers the years 1985 through 2018. Original data courtesy of NCAA Tournament Results, a dataset compiled by Michael Roy.


Additional information can be found on the Wikipedia List of NCAA Division I men's basketball programs.

New York Taxicab Rides
New York Vehicle Collisions

nypd_motor_vehicle_collisions_2018.csv contains information on all vehicle collisions reported during 2018 in New York City, including time, latitude and longitude, approximate street location, number of vehicles, fatalities, injuries, and so on.

US State Information
  • State Abbreviations

    state-abbreviations.csv contains abbreviations, full names, and country designations for all US States, US Commonwealths and Territories, Canadian Provinces and Territories, and Mexican States, in a 3-column table.

  • State Names

    state-names.csv contains abbreviations and full names for all US States and Canadian Provinces and Territories, in a 2-column table.
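A 2-column file such as state-names.csv lends itself to a simple lookup table. The sketch below is only an illustration: it assumes the abbreviation comes first and the full name second, that the file is comma-delimited, and that any header row is skipped on request. None of these details are confirmed here, so check the actual file before relying on them. The function name load_state_names and the in-memory sample rows are hypothetical.

```python
import csv
import io


def load_state_names(source, has_header=False):
    """Build a dict of abbreviation -> full name from a 2-column CSV stream.

    Assumes column 0 is the abbreviation and column 1 is the full name;
    this ordering is an assumption, not a documented layout.
    """
    reader = csv.reader(source)
    if has_header:
        next(reader, None)  # discard the header row if the file has one
    # Skip blank or short rows rather than failing on them.
    return {row[0].strip(): row[1].strip() for row in reader if len(row) >= 2}


# Hypothetical stand-in for the real file, so the sketch runs without a download.
sample = io.StringIO("CA,California\nON,Ontario\n")
names = load_state_names(sample)
```

To use the downloaded file instead of the sample, pass an open file handle, for example open("state-names.csv", newline=""), and set has_header=True if the first row holds column titles.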

TV Home Viewing, with DMA
San Francisco Police Department Incidents, with Latitude and Longitude, for 2010 through 2015

San Francisco Cab Ride Distribution
S&P 500 Historical Monthly
Shakespeare's Works

This tab-delimited file contains the 154 sonnets attributed to William Shakespeare: shakespeare_sonnets.csv. The text of the sonnets is in the public domain.

This semicolon-delimited file contains the text of all plays authored by William Shakespeare: shakespeare_play_text.csv. We derived this file from the William Shakespeare Plays archive on Datahub; it is in the public domain.
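Although both Shakespeare files carry a .csv extension, neither uses commas as the separator, so a reader must be told which delimiter to expect. The following is a minimal sketch using Python's standard csv module. The helper name read_delimited and the sample rows are hypothetical placeholders, not the actual contents or column layout of the files.

```python
import csv
import io


def read_delimited(source, delimiter):
    """Return every row of a delimited text stream as a list of strings."""
    reader = csv.reader(source, delimiter=delimiter)
    return [row for row in reader]


# Hypothetical in-memory samples standing in for the downloaded files.
tab_sample = io.StringIO("1\tfirst line\n2\tsecond line\n")
semicolon_sample = io.StringIO("a;b;c\nd;e;f\n")

tab_rows = read_delimited(tab_sample, "\t")            # tab-delimited, like the sonnets file
semicolon_rows = read_delimited(semicolon_sample, ";")  # semicolon-delimited, like the plays file
```

For the real files, pass an open handle such as open("shakespeare_sonnets.csv", newline="", encoding="utf-8") with delimiter "\t", or open("shakespeare_play_text.csv", newline="", encoding="utf-8") with delimiter ";". The UTF-8 encoding is an assumption; adjust it if the files decode incorrectly.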

Encyclopedia of Stars

suns.csv, adapted from, and used with permission from, David Nash.

US Census
census-pop.csv, population by state, for 1790 - 2010
census-pop-curent.csv, population by state, for 1790 - 2017
pop-2011-2017.csv, population by state, for 2011 - 2017
US Counties
US Trade

Exports, Imports and Trade Balance by Country, Monthly totals, 1985-present: us-trade.csv.

Source: United States Census Bureau:

World Literacy Rates

A series that represents the literacy rate, in percentage, for various countries or regional entities, over time: cross-country-literacy-rates.csv

Source: Our World in Data: Literacy: