This morning I woke up to an email from the U.S. Census Bureau announcing that the American Community Survey (ACS) 5-Year data for 2018-2022 was just released.
This is big news that I, and I’m sure many others, have been eagerly awaiting.
One of the core design goals of the censusdis project is that it works with any data set that the U.S. Census Bureau publishes. In particular, when the bureau releases new data sets or new vintages of existing data sets, censusdis should just work, without any need for updates or modifications to the code. We tested this theory a few months ago when the DHC-A data set was released. But the ACS is much more popular, so it made sense to try it out right away.
I decided to try to download the ACS5 2022 data in a Google Colab notebook. In the first cell, I installed censusdis, since it’s not part of the standard Colab environment.
!pip install censusdis
This installed various dependencies and then the latest version of censusdis. At the time of this writing, that’s 0.99.5.
Next, I imported what I needed to download data:
import censusdis.data as ced
And finally, I downloaded the new data set with
df_2022 = ced.download(
    dataset="acs/acs5",
    vintage=2022,
    download_variables=['NAME', 'B01003_001E'],
    state="*"
)
df_2022.head()
And sure enough, it produced the first few rows of a table of states, each with its name and its B01003_001E total population estimate.
That’s it, we have the latest state population estimates from the ACS5 2022 data set that was only publicly released nine hours ago. If I had been awake at 1am this morning, presumably it would have worked then too.
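Because the vintage is just an argument, the call above can be wrapped in a small helper. This is a minimal sketch, not part of censusdis: the name state_population and its download parameter are hypothetical, introduced here only for illustration, and the real call needs network access to the Census API.

```python
def state_population(vintage, download=None):
    """Download total population (B01003_001E) for every state for one ACS5 vintage.

    Hypothetical helper, not part of censusdis. The `download` parameter exists
    only so the function can be exercised offline with a stand-in; by default
    it is censusdis.data.download, called with the same arguments as above.
    """
    if download is None:
        # Imported lazily so the helper can be defined without censusdis installed.
        import censusdis.data as ced
        download = ced.download
    return download(
        dataset="acs/acs5",
        vintage=vintage,
        download_variables=["NAME", "B01003_001E"],
        state="*",
    )
```

With this in place, state_population(2022) makes the same request as the cell above, and asking for a different vintage only means changing the number.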
Now, by coincidence, version 0.99.5 of censusdis was just released yesterday. So what would have happened if I had already set up an environment with an older release of censusdis and I wanted to use the new data? Would that have worked? It was easy enough to try. I just changed the !pip install line at the top of the notebook to
!pip install censusdis==0.13.3
0.13.3 is a version from April of this year. There have been more than a hundred updates to the code between 0.13.3 and 0.99.5, so it’s pretty radically different from the censusdis of today.
The older release uses older versions of some dependencies like pandas, so I had to restart Colab before I ran it, but once I did that, lo and behold, it just worked with the 2022 data set.
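After pinning an older release and restarting, it can be worth confirming which versions are actually loaded before drawing conclusions. The following is a minimal sketch using only the Python standard library; installed_version is a hypothetical helper name, not something censusdis provides.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report what the current environment actually has installed.
for name in ("censusdis", "pandas"):
    print(name, installed_version(name))
```

If the pinned install took effect, the censusdis line should report the version requested in the !pip install cell.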
0.13.3 was missing a lot of the features of the newer versions of censusdis, but it has the core downloading functionality. Like 0.99.5, it had no problem downloading data from a data set that didn’t exist when it was released.
I’m very happy that censusdis lived up to the promise that it would be able to download new data sets as they were released. Even older versions now work well with newer data sets that were not available when they were released. A year from now, there is every reason to believe that the 2023 ACS5 data will work just fine with version 0.99.5 of censusdis. In the meantime, I’m going to update some of my ongoing research projects to use the new 2022 data, and I encourage others to do so no matter what version of censusdis they are currently using.