Our examples use publicly available data sources wherever possible. For your convenience, we include links to the data files we use.
The following files have been adapted from the California Department of Education's API Data Files for the 2013 Growth API. They represent measurements of California school performance by district and by school.
Record layout appears here: http://www.cde.ca.gov/ta/ac/ap/reclayout13g.asp.
Flag values appear here: http://www.cde.ca.gov/ta/ac/ap/flagdef13g.asp.
Here are additional files for supplementing the data in the two main files:
canada_census_population_dwellings.csv
Derived from the following source: Statistics Canada, 2011 Census of Population, Statistics Canada. 2012. Population and dwelling counts, for Canada and forward sortation areas as reported by the respondents. Population and Dwelling Count Highlight Tables. 2011 Census, 2006 Census, 2001 Census. For further information, refer to special notes for 2011 and 2006.
Derived from the following source: http://www.geonames.org/, used under Creative Commons Attribution 3.0 License.
These files contain data on the On-Time Performance of Air Carriers, obtained from the following URL: https://www.transtats.bts.gov/DL_SelectFields.asp.
2014: flights-2014.zip
January 2014: flights-2014-01.zip
January/February 2015: flights-2015-01-02.zip
Geographic Features (for the US), found on the United States Board on Geographic Names website. We recommend downloading the latest All States zip file.
AllStates_20160801.zip. It can be unzipped and uploaded as a partitioned table, by state code.
feature-classes.csv contains descriptions of geographic feature classes; the original source is the United States Board on Geographic Names.
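Partitioning the All States data by state code amounts to grouping its rows on the state column. The following is a minimal sketch in Python of that grouping step only, not of the upload itself. It assumes the unzipped file is pipe-delimited with a header row that includes a STATE_ALPHA column; both are assumptions to verify against the actual file. The helper name partition_by_column and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def partition_by_column(stream, column, delimiter="|"):
    """Group the rows of a delimited text stream by the value in `column`."""
    partitions = defaultdict(list)
    # DictReader uses the first row as the header (assumed to be present).
    for row in csv.DictReader(stream, delimiter=delimiter):
        partitions[row[column]].append(row)
    return dict(partitions)


# Hypothetical sample rows; the real file has more columns and real features.
sample = io.StringIO(
    "FEATURE_NAME|FEATURE_CLASS|STATE_ALPHA\n"
    "Example Lake|Lake|CA\n"
    "Example Peak|Summit|CA\n"
    "Example Creek|Stream|NY\n"
)
partitions = partition_by_column(sample, "STATE_ALPHA")
```

Each key of the resulting dictionary is a state code, and each value holds the rows that would go into that state's partition.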
Recently published summary data on Individual Tax Returns:
Source: Internal Revenue Service (SOI Stats): https://www.irs.gov/statistics/soi-tax-stats-individual-income-tax-statistics-2016-zip-code-data-soi
See also the supplemental data for US regions in us_regions.csv, and for mapping ZIP codes to latitude and longitude in zipcode-lat-long.csv.
ncaa_basketball_results.csv for years 1985 through 2018. Original data courtesy of data.world NCAA Tournament Results, a dataset compiled by Michael Roy.
Additional information can be found on the Wikipedia List of NCAA Division I men's basketball programs.
nypd_motor_vehicle_collisions_2018.csv contains information on all vehicle collisions reported during 2018 in New York City, including time, latitude and longitude, approximate street location, number of vehicles, fatalities, injuries, and so on.
state-abbreviations.csv contains abbreviations, full names, and country designations for all US States, US Commonwealths and Territories, Canadian Provinces and Territories, and Mexican States, in a 3-column table.
state-names.csv contains abbreviations and full names for all US States and Canadian Provinces and Territories, in a 2-column table.
sfpd-incidents.csv.zip, with Latitude and Longitude, for 2010 through 2015
This tab-delimited file contains the 154 sonnets attributed to William Shakespeare: shakespeare_sonnets.csv. The text of the sonnets is in the public domain.
This semicolon-delimited file contains the text of all plays authored by William Shakespeare: shakespeare_play_text.csv. We derived this file from the William Shakespeare Plays archive on Datahub; it is in the public domain.
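Because the two Shakespeare files use different delimiters, whatever reads them has to be told which one to expect. The following is a minimal sketch using Python's standard csv module. The helper name read_delimited and the sample rows are hypothetical, and the column layout shown here is an assumption rather than the actual layout of either file.

```python
import csv
import io


def read_delimited(stream, delimiter):
    """Return every row of a delimited text stream as a list of strings."""
    reader = csv.reader(stream, delimiter=delimiter)
    return [row for row in reader]


# Hypothetical in-memory samples; the real files may use different columns.
tab_sample = io.StringIO("1\tFrom fairest creatures we desire increase\n")
semicolon_sample = io.StringIO("Hamlet;1;To be, or not to be\n")

tab_rows = read_delimited(tab_sample, "\t")
semicolon_rows = read_delimited(semicolon_sample, ";")
```

Note that the comma inside the play text stays within a single field, which is one practical reason to use a delimiter other than a comma for prose. To read a downloaded file, pass a handle opened with open(path, newline="") in place of the in-memory sample.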
suns.csv, adapted from www.astronexus.com, and used by permission from David Nash.
Exports, Imports and Trade Balance by Country, Monthly totals, 1985-present: us-trade.csv.
Source: United States Census Bureau: https://www.census.gov/foreign-trade/statistics/country/index.html.
A series that represents the literacy rate, in percentage, for various countries or regional entities, over time: cross-country-literacy-rates.csv
Source: Our World in Data: Literacy: https://ourworldindata.org/literacy