One of the design goals of the censusdis project is that as new data sets are added, or as existing data sets and variables change from year to year, the system should be able to integrate the changes without any human intervention. On September 21, 2023, the U.S. Census Bureau released an important new data set called the Detailed Demographic and Housing Characteristic File A, also known as the DHC-A. In the words of the Census Bureau, DHC-A contains “population counts and sex by age statistics for approximately 1,500 detailed racial and ethnic groups, such as German, Lebanese, Jamaican, Chinese, Native Hawaiian, and Mexican, as well as American Indian and Alaska Native (AIAN) tribes and villages like the Navajo Nation.”
The day after DHC-A was released, I sat down to write a notebook to load DHC-A data and do some simple analysis. If I could make this work without needing to modify censusdis, that would be proof that we had passed an important test of our design goal.
The good news is that the test worked. The notebook is now available on GitHub. It downloads data from DHC-A, finds the data on a handful of the 1,500 groups, representing a small set of Asian populations, then determines what fraction of the Asian population these groups make up in counties across the United States that have at least 1,000 total Asian residents. It then plots this information on maps that look like this:
This is really just the tip of the iceberg for the type of analysis that can be done with this data. The key point of the notebook is to demonstrate how easy it is to access the DHC-A data in Python.
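To make the county-level calculation described above concrete, here is a minimal plain-Python sketch of the filtering and share computation. It is not the notebook's code and does not call censusdis; it assumes the county counts have already been downloaded into a simple list of records. The record fields (`county`, `total_asian`, `group_counts`) and the function name `group_shares` are hypothetical placeholders, not actual DHC-A variable names or censusdis APIs.

```python
from typing import Dict, List


def group_shares(
    records: List[dict],
    groups: List[str],
    min_total: int = 1_000,
) -> Dict[str, Dict[str, float]]:
    """For each county with at least `min_total` Asian residents, return the
    fraction of that county's Asian population belonging to each selected group.

    Hypothetical record layout (illustrative only, not DHC-A variable names):
        {"county": str, "total_asian": int, "group_counts": {group_name: int}}
    """
    shares: Dict[str, Dict[str, float]] = {}
    for rec in records:
        total = rec["total_asian"]
        # Skip counties below the threshold, where shares would be
        # based on very small (or zero) denominators.
        if total < min_total:
            continue
        counts = rec["group_counts"]
        # Groups missing from a county's counts are treated as zero.
        shares[rec["county"]] = {g: counts.get(g, 0) / total for g in groups}
    return shares


# Synthetic placeholder records, not real census figures.
records = [
    {"county": "A", "total_asian": 2000, "group_counts": {"Chinese": 500, "Korean": 250}},
    {"county": "B", "total_asian": 800, "group_counts": {"Chinese": 400}},
]
print(group_shares(records, ["Chinese", "Korean", "Japanese"]))
```

In practice, a DataFrame returned by censusdis would make this a filter plus a column-wise division, and the resulting per-county shares are what would be joined to county geometries for mapping.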
If you are already familiar with censusdis, please download the notebook and have a look. If you have not used censusdis before, you may want to start with this tutorial, and then look at some of the other introductory notebooks before you proceed to the new DHC-A notebook.
I’m looking forward to the U.S. Census Bureau publishing additional data sets in the future, and I am excited that people will be able to work with them using censusdis, just as they can with DHC-A and many other data sets.