Command-Line censusdis for Plotting Maps

In an earlier post entitled Command-Line censusdis for Data Pipelines and One-Time Analysis, we discussed how the new command-line interface (CLI) to censusdis can be used to download and manipulate data from the U.S. Census API. In this post, we will build on that and show how the same CLI can be used to generate plots of data.

Getting Started

If you are new to censusdis, we suggest you go back to the previous post for information on how to install it and how to get started downloading data with the command-line interface. Once you have done that, you are ready to dive right in.

Plotting Data We Download

As we saw in the previous post, running

censusdis --help

gives us a help message like the following:

usage: censusdis [-h] [--log {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                 [--logfile LOGFILE]
                 {download,plot} ...

options:
  -h, --help            show this help message and exit
  --log {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Logging level.
  --logfile LOGFILE     Optional file path that logs should be appended to.
                        The file will be created if it does not exist.

command:
  Choose one of the following commands.

  {download,plot}
    download            Download data from the U.S. Census API.
    plot                Plot data on a map.

There are two commands: download, which we covered last time, and plot. Let’s begin by looking at the help for plot.

censusdis plot --help 

This produces the following help message:

usage: censusdis plot [-h] (--dataspec DATASPEC | -i INPUT_DATA_FILE)
                      [--api-key API_KEY] -o OUTPUT
                      plotspec

positional arguments:
  plotspec              A plot specification YAML file.

options:
  -h, --help            show this help message and exit
  --dataspec DATASPEC   A data specification YAML file. If provided,
                        data is downloaded from the U.S. Census API as
                        in the download command.
  -i INPUT_DATA_FILE, --input-data-file INPUT_DATA_FILE
                        Local file to load data from. This is normally
                        a .geojson file that was previously downloaded
                        with the download command.
  --api-key API_KEY     Optional API key. Ignored if data is loaded
                        from a local file with -i/--input-data-file.
                        Alternatively, store your key in
                        ~/.censusdis/api_key.txt. It you don't have a
                        key, you may get throttled or blocked. Get one
                        from
                        https://api.census.gov/data/key_signup.html
  -o OUTPUT, --output OUTPUT
                        Output file to store the plotted map in. Format
                        will be determined from the file extension.
                        .png or .jpeg typically.

The plotspec argument is the name of a plot specification YAML file. It’s a little different than the data spec YAML file we worked with last time, but can work in tandem with it to plot data that the data spec specifies.

We also need to know where to get the data. One option is to use a data spec to download it. The other option is to read it from a local file we previously downloaded.

Let’s look at what a minimal plot specification file looks like.

!PlotSpec
variable: B19013_001E

The only thing this tells us is the variable to plot. Save this in a file called plot-income.yaml. We chose that name because B19013_001E is a variable representing median household income.

Now we need a data spec to download it. Copy the following data spec into a file called income.yaml:

!DataSpec
dataset: ACS5
vintage: 2020
geography:
  state: '*'
specs:
  !VariableList
  variables:
    - NAME
    - B19013_001E
with_geometry: true

This looks an awful lot like some of the data specs we saw before. It says to download the median income variable we want to plot for all states. But there is one difference. In line 11, we added with_geometry: true. This indicates that we should not just download the data, but also a geometric representation of every state we download data for. We need these to plot maps.

Now we have everything we need to plot our first map. To do so, run

censusdis plot --dataspec income.yaml plot-income.yaml -o income.png

This will download the data specified in the data spec income.yaml, then plot it as specified by plot-income.yaml. The result is saved in the output file income.png, which looks like this:

The color of each state indicates the state’s median income. Brighter greens are higher-income states and darker greens are lower-income states.

Controlling the Appearance of the Plot

Our first plot was a good start, but we probably want other things, like a title, better colors, and better formatting of the numbers to complete the plot.

We can do all those things by extending our plot specification. Here’s what a more complete spec looks like:

- !PlotSpec
  variable: B19013_001E
  title: "Median Household Income by State"
  legend_format: dollar
  plot_kwargs:
    figsize: [12, 6]
    cmap: YlGn
    vmin: 0
- !PlotSpec
  boundary: true
  plot_kwargs:
    color: black
    linewidth: 0.5

Save this to plot-income2.yaml.

Before we run it, let’s look at the changes we made. First, notice that we have two plot specifications. When there is more than one, we plot them in the order they appear in the file. The first spec tells us to plot the variable we were already plotting. But we also specify a title for the plot on line 3. On line 4, we specify a format for the numbers on the legend. We can choose from int, float, dollar, or percent. If none of those are to our liking, we can specify a Python string format, for example '{x:0.3f}', which would format the numbers as floating point numbers with three digits after the decimal place.
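
To get a feel for what a custom format string like this produces, here is a small sketch in plain Python. It only exercises Python's built-in str.format and does not call censusdis. The dollar-style format string is our own illustration and is an assumption, not necessarily what censusdis uses internally for dollar.

```python
# Format strings of the kind that can be given as legend_format.
# The field name x matches the example '{x:0.3f}' above.
three_decimals = "{x:0.3f}"

# A hypothetical dollar-style format string, for illustration only.
dollars = "${x:,.0f}"


def format_tick(fmt: str, value: float) -> str:
    """Apply a Python format string to a single legend value."""
    return fmt.format(x=value)


print(format_tick(three_decimals, 67521.0))  # 67521.000
print(format_tick(dollars, 67521.0))         # $67,521
```

Trying a format string this way before putting it in the YAML file is a quick check that it does what you expect.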

The next part of the specification is plot_kwargs. This is a dictionary of optional arguments that are passed on to matplotlib, which is the underlying system that does the plotting. You can see all the optional arguments that are available here. For now, we have chosen three. The first, figsize, is the size of the figure. Notice how we changed the aspect ratio to make it wide. The second, cmap, is the color map to use. The colormaps we have to choose from are listed here. Finally, we added vmin. This specifies the minimum value of the variable we are plotting, for scaling the color map. Try running with and without the vmin line and note the difference in appearance.
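
To build some intuition for why vmin changes the appearance, the sketch below mimics a linear color scale in plain Python. It assumes that values are mapped linearly between vmin and vmax, and that when vmin is not given the scale starts at the smallest value in the data. The normalize function and the income values are made up for illustration and are not part of censusdis or matplotlib.

```python
def normalize(value: float, vmin: float, vmax: float) -> float:
    """Map value linearly onto [0, 1], clipping anything outside [vmin, vmax]."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    position = (value - vmin) / (vmax - vmin)
    return min(1.0, max(0.0, position))


# Made-up median incomes, for illustration only.
incomes = [45000.0, 60000.0, 90000.0]

# Without vmin, we assume the scale starts at the smallest value in the data.
auto = [normalize(v, min(incomes), max(incomes)) for v in incomes]

# With vmin: 0, the scale starts at zero instead.
from_zero = [normalize(v, 0.0, max(incomes)) for v in incomes]

print(auto)
print(from_zero)
```

Under these assumptions, the lowest-income value sits at the very bottom of the color map when vmin is omitted, but halfway up when vmin is 0, so the map uses a narrower band of colors.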

The second plot specification will be plotted on top of the first. We are going to use it to outline each state with a black line so the borders are more visible, especially in cases where neighboring states have very close to the same color. Instead of specifying a variable, we specify boundary: true to indicate that we just want to plot the boundaries of the geographies. Then as in the first plot spec, we add keyword args. The two we need are color on line 12 and linewidth on line 13. Matplotlib offers many colors to choose from. You can see a list of them here.

Now that we have our spec, we can generate the plot with the command

censusdis plot --dataspec income.yaml plot-income2.yaml -o income2.png

The result shows the same data we plotted before, but with all the configuration changes we added. It looks like this:

More Configuration Options

In the example above, we used some, but not all, of the plot configuration options available. All of the available options are documented in the (Python) documentation for the class PlotSpec, which is what the YAML is converted into when you run the command-line interface. Any of the parameters listed there can be put into the YAML file under the tag !PlotSpec, just as we did with variable, title, and so on in our example above.

Seeing White, Revisited

One of the first demo notebooks we created in the early days of the censusdis project was Seeing White.ipynb. It demonstrated how, using Python code in a notebook, we can download data on race and ethnicity and compute the fraction of the population in each county that identifies as white. It also plotted the results on a map of the entire country. We will now demonstrate how we can do approximately the same thing from the command line.

First, we need to download data. We can do that with a data configuration stored in a file called seeing-white-data.yaml. The configuration looks like this:

!DataSpec
dataset: DECENNIAL_PUBLIC_LAW_94_171
vintage: 2020
geography:
  state: ALL_STATES_AND_DC
  county: '*'
specs:
  - !VariableList
    variables:
      - NAME
  - !Group
    group: P1
    denominator: P1_001N
with_geometry: true

The data set we are using is the Public Law 94-171 data set that the U.S. Census Bureau publishes every 10 years. This is the redistricting data set that states use to redraw congressional and legislative district boundaries. It includes racial and demographic data to help states ensure that they are not biasing their districts for or against any particular groups.

We are going to download data for all counties in all states and DC. As usual, we get the NAME variable to help us with any spot-checking or debugging we want to do. The group we download is P1, which has counts of members of different racial groups. Since we want fractions, we need a denominator, which is P1_001N, the total population. Finally, we want geometry, since our whole goal is to plot a map.

Next, we need a plot configuration. We put that in seeing-white-plot.yaml. It looks like this:

- !PlotSpec
  variable: frac_P1_003N
  title: "White Alone Population as a Percent of County Population"
  legend_format: percent
  plot_kwargs:
    figsize: [12, 6]
    cmap: gray
    vmin: 0.0
    vmax: 100.0

All of this should look familiar from the examples above. We want to plot the fraction of the total population that is white alone. The count of people who identify as white alone is in P1_003N, so the fractional variable we generated will be in frac_P1_003N. This is simply P1_003N / P1_001N on a county-by-county basis based on the denominator config above.
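
The per-county arithmetic can be sketched in a few lines of plain Python. The county names and counts below are made up for illustration, and add_fraction is a hypothetical helper, not a censusdis function. The sketch follows the description above literally and does not model any further scaling the CLI might apply.

```python
# Made-up counts for two hypothetical counties, for illustration only.
counties = [
    {"NAME": "County A", "P1_001N": 1000, "P1_003N": 750},
    {"NAME": "County B", "P1_001N": 400, "P1_003N": 100},
]


def add_fraction(row: dict, variable: str, denominator: str,
                 prefix: str = "frac_") -> dict:
    """Return a copy of row with prefix + variable set to variable / denominator."""
    result = dict(row)
    result[prefix + variable] = row[variable] / row[denominator]
    return result


with_fractions = [add_fraction(c, "P1_003N", "P1_001N") for c in counties]

for row in with_fractions:
    print(row["NAME"], row["frac_P1_003N"])
```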

We want to format the legend as a percentage, so it is nice and human readable, so we specify that with legend_format. Finally, we specify a few additional details about the appearance of the plot in the plot_kwargs section.

Now, we can run with these configurations to download our data and generate our plot as follows:

censusdis plot seeing-white-plot.yaml --dataspec seeing-white-data.yaml -o seeing-white.png

This produces the following plot:

This is not 100% the same as what we produced in the notebook. The low-level Python API will always allow us to make additional tweaks and adjustments that are not fully supported in the command-line interface. But it takes much less effort to create a couple of configuration files like those above and get a nice initial plot to give you a feel for what the data looks like, or to generate large numbers of plots of different geographies or variables as part of a data pipeline.
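
For the pipeline case, one possible approach is a small Python driver that assembles one censusdis plot command per job. The sketch below uses only the flags shown in the plot help message earlier in this post. The build_plot_command helper and the jobs list are hypothetical names of our own, and the script only prints the commands rather than running them.

```python
import shlex
from typing import List, Optional


def build_plot_command(
    plotspec: str,
    output: str,
    dataspec: Optional[str] = None,
    input_data_file: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for one `censusdis plot` invocation."""
    # The help message shows --dataspec and -i as mutually exclusive and required.
    if (dataspec is None) == (input_data_file is None):
        raise ValueError("Provide exactly one of dataspec or input_data_file")
    command = ["censusdis", "plot"]
    if dataspec is not None:
        command += ["--dataspec", dataspec]
    else:
        command += ["-i", input_data_file]
    command += [plotspec, "-o", output]
    return command


# (data spec, plot spec, output file) for each plot we want.
jobs = [
    ("income.yaml", "plot-income2.yaml", "income2.png"),
    ("seeing-white-data.yaml", "seeing-white-plot.yaml", "seeing-white.png"),
]

for dataspec, plotspec, output in jobs:
    command = build_plot_command(plotspec, output, dataspec=dataspec)
    print(shlex.join(command))
    # To actually run it, pass the list to subprocess.run(command, check=True).
```

Keeping the command as a list of arguments, rather than one long string, avoids quoting problems if a file name contains spaces.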

Next Steps

With these plots, we have just scratched the surface of what censusdis can do. We encourage you to try out your own variations. As a starting point, the examples we just looked at are available in the censusdis-cli-demo repository. If there are other aspects of the plots you would like to be able to format or control, by all means let us know by filing an issue or starting a new discussion in the censusdis repository.