Working with Non-Nested Census Geographies in Censusdis

Introduction

Most U.S. Census data sets are keyed by geography. Concepts like population, median income, and gender, race, or age ranges of residents are only meaningful when we tie them to geographies like states, counties, or census tracts. The U.S. Census provides data at a wide variety of geographies, which nest in a hierarchy as shown here:

[Image originally from https://www.census.gov/content/dam/Census/data/developers/geoareaconcepts.pdf.]

The lines in this image represent containment. Regions are fully contained in the nation; divisions are contained in regions; states are contained in divisions; counties are contained in states; census tracts are contained in counties; block groups are contained in tracts; and blocks are contained in block groups. All of the geographies down the center of the diagram are referred to as on-spine.
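One place this nesting is visible is in the GEOID codes the Census assigns from the state level down. A block's 15-digit GEOID is built from 2 state digits, 3 county digits, 6 tract digits, and 4 block digits, and the block group is the first digit of the block code. The sketch below is plain Python and is not part of censusdis; the function name is hypothetical and the sample GEOID is illustrative only, so it has not been checked against a real block. Regions and divisions are not encoded in the GEOID, so they do not appear here.

```python
def split_block_geoid(geoid: str) -> dict[str, str]:
    """Split a 15-digit block GEOID into the GEOIDs of its on-spine ancestors.

    Hypothetical helper for illustration; not a censusdis function.
    """
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError("expected a 15-digit block GEOID")
    return {
        "state": geoid[:2],          # 2 digits
        "county": geoid[:5],         # state + 3 county digits
        "tract": geoid[:11],         # county + 6 tract digits
        "block_group": geoid[:12],   # tract + first digit of the block code
        "block": geoid,              # tract + 4 block digits
    }


# Illustrative GEOID: state 36 is New York and county 061 is New York County.
# The tract and block digits are made up for this example.
parts = split_block_geoid("360610031001000")
for level, code in parts.items():
    print(f"{level:12s} {code}")
```

Each on-spine GEOID is a prefix of the GEOIDs of everything it contains. Off-spine geographies have no such prefix relationship with tracts or block groups, which is why a different approach is needed for them.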

But there are also off-spine geographies, such as Core Based Statistical Areas (CBSAs), congressional districts, and many others. They may be contained by larger on-spine geographies, but on-spine geographies larger than a block do not necessarily nest inside them.

For example, a CBSA is not necessarily contained in any on-spine geography below the nation. The Kansas City CBSA and the New York City CBSA both cross state lines.

Containing Geographies with censusdis

Often we want to look at smaller geographical areas, like census tracts, but only those that are contained within an off-spine geography like a CBSA or a congressional district. Unfortunately, the U.S. Census API does not let us do this directly, and neither did early versions of censusdis, the package that wraps the U.S. Census API for Python users. Now, through the censusdis.data.contained_within() API, we can easily make this kind of query.

Here is an example of how we can query all of the census tracts in the New York City area CBSA (note that in the Census API and censusdis, a CBSA is called a metropolitan_statistical_area_micropolitan_statistical_area):

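import censusdis.data as ced
from censusdis.datasets import ACS5
from censusdis.msa_msa import NEW_YORK_NEWARK_JERSEY_CITY_NY_NJ_PA_METRO_AREA

# First clause: restrict results to geographies contained within the NYC CBSA.
df_ny_tracts = ced.contained_within(
    metropolitan_statistical_area_micropolitan_statistical_area=NEW_YORK_NEWARK_JERSEY_CITY_NY_NJ_PA_METRO_AREA
).download(
    # Second clause: dataset, vintage, variables, and the geographies we want.
    ACS5,
    2020,
    ["NAME", "B19013_001E"],  # B19013_001E is median household income
    state="*",
    county="*",
    tract="*",
)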

The first clause of the query indicates that we are looking to download data for geographies that are contained within the NYC CBSA. This doesn’t mean we want data tied to that CBSA, but that we want to restrict the data from the next clause of the query to be for geographies that are contained by the CBSA.

The second clause is where we specify the dataset, vintage, variable names, and geographies we want, just like in a normal call to ced.download(). In this case, we are asking for all states, all counties, and all tracts in the country. That's a lot of census tracts. But because of the ced.contained_within() clause, we won't actually download all of them. Instead, censusdis will first use map data downloaded from the U.S. Census to figure out which states overlap the CBSA, get data only for those states, and then use the maps again to filter that data down to the tracts that are physically contained in the CBSA.
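The two-step filtering described above can be sketched with toy shapes. This is only a conceptual illustration, not the censusdis implementation: axis-aligned boxes stand in for real map boundaries, the names Box and contained_tracts are hypothetical, and the 0.8 area threshold is an assumption chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle standing in for a real map boundary."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def area(self) -> float:
        return (self.max_x - self.min_x) * (self.max_y - self.min_y)

    def intersection_area(self, other: "Box") -> float:
        width = min(self.max_x, other.max_x) - max(self.min_x, other.min_x)
        height = min(self.max_y, other.max_y) - max(self.min_y, other.min_y)
        return max(0.0, width) * max(0.0, height)


def contained_tracts(cbsa, states, tracts_by_state, area_threshold=0.8):
    """Return (overlapping states, tracts mostly inside the CBSA)."""
    # Step 1: coarse filter. Keep only states that overlap the CBSA at all,
    # so tracts in every other state are never fetched.
    overlapping = [
        name for name, box in states.items() if box.intersection_area(cbsa) > 0
    ]
    # Step 2: fine filter. Within those states, keep a tract only if at
    # least `area_threshold` of its area lies inside the CBSA.
    kept = [
        tract_name
        for state in overlapping
        for tract_name, tract in tracts_by_state[state].items()
        if tract.intersection_area(cbsa) / tract.area() >= area_threshold
    ]
    return overlapping, kept


# Toy data: a CBSA that straddles the border between states A and B.
cbsa = Box(4, 0, 12, 5)
states = {"A": Box(0, 0, 10, 10), "B": Box(10, 0, 20, 10), "C": Box(20, 0, 30, 10)}
tracts_by_state = {
    "A": {"A-1": Box(5, 1, 7, 3), "A-2": Box(0, 0, 2, 2), "A-3": Box(3, 0, 5, 2)},
    "B": {"B-1": Box(10, 1, 12, 3), "B-2": Box(11, 4, 13, 6)},
    "C": {"C-1": Box(21, 1, 22, 2)},
}

overlapping_states, tracts = contained_tracts(cbsa, states, tracts_by_state)
print(overlapping_states)
print(tracts)
```

State C is dropped in the first step, so its tracts are never examined. Tracts A-3 and B-2 straddle the CBSA boundary with less than 80% of their area inside, so the second step drops them as well.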

If we plot the data on a map, it looks like this:

Demonstration Notebooks

There is a lot more to this API, which is best demonstrated with sample notebooks. Some of the notebooks currently available are:

Here are some of the maps these notebooks produce:

Please see the notebooks themselves for more details.

Conclusions

The notion of querying geographies contained within others, even if they are not both on-spine, is powerful. And now, it is simple to do for a wide variety of use cases. We hope you will try it out for your own applications and raise any suggestions or feedback as an issue or discussion in the censusdis project.