Diversity and Integration – Open Source Code

We are pleased to announce that we have released the code used to produce the data and visualizations behind I Want to Visit Ligonier, Indiana and Diversity and Integration in America: An Interactive Visualization. This code was built on top of the open source packages divintseg and censusdis, which we first released in September of 2022 and have updated regularly since.

The code is in two packages. The first, dis-datagen, generates the data behind the interactive map and builds the web site around the map. The second, islands-of-diversity, contains a notebook that was used to generate the smaller, more focused maps of various towns and census tracts that appear in I Want to Visit Ligonier, Indiana.

dis-datagen

Although one of the fundamental features of the map dis-datagen produces is that it is interactive, the site that hosts it is actually static. By static, I mean that no code runs on the server side to render it in the browser. Every request the browser makes is an ordinary HTTP GET. Once the site has been built, it can be uploaded to a cloud storage bucket or any other storage that can be accessed via HTTP. Everything on the server side is a static file, such as HTML, CSS, JavaScript, or map tiles that represent different areas of the country at different zoom levels.

Some of the map tiles are raster images. These are used at low zoom levels, when the map is showing all or nearly all of the country. Others are vector tiles, which represent the boundaries or locations of different features, like states, cities, and census tracts. The vector tiles are served to the browser, where JavaScript renders them for viewing. They are stored in the PMTiles format, which was specifically designed to handle this approach to map rendering. The rendering itself is coordinated by the open source package Leaflet.
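
For readers unfamiliar with tiled maps, here is a minimal sketch of how a location and a zoom level pick out a single tile. It assumes the standard Web Mercator "slippy map" scheme that Leaflet uses by default. The function names are illustrative and do not come from dis-datagen.

```python
import math


def tiles_at_zoom(zoom):
    """Number of tiles covering the whole world at a zoom level."""
    # Each zoom level splits every tile into four, so the grid is 2^zoom on a side.
    return 4 ** zoom


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the Web Mercator tile containing a point."""
    n = 2 ** zoom  # tiles per side at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge still land in a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)
```

At zoom 0 the whole world is one tile, which is why a handful of raster images is enough when the map shows the entire country. The tile count grows by a factor of four with each zoom level.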

dis-datagen builds the site using a GNU Makefile. GNU Make [1] has been around since 1988. Despite its age, it remains a useful tool for managing a build process in which each artifact depends on one or more others and can be built from them using one or more shell commands.
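
The rule Make applies to each artifact can be sketched in a few lines of Python. This only illustrates the timestamp comparison and is not how Make is implemented. The function name is hypothetical.

```python
import os


def needs_rebuild(target, prerequisites):
    """Return True if target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # A prerequisite modified after the target makes the target stale.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

Because the check is made per artifact, a build that is interrupted partway through can pick up where it left off instead of starting over.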

Without going into all the gory details, the main steps the Makefile orchestrates are:

  1. Download raw census data for census tracts and blocks from the U.S. Census API. Because of the large volume of data, and the fact that the process is not 100% reliable, this is done state-by-state. I often run this part of the process in parallel using make’s built-in -j flag.
  2. Compute diversity and integration at the census tract level and write the results to a GeoJSON file for each state.
  3. Generate raster tiles by clipping vector features and rendering them in the tile area, then saving the results to a series of PNG files.
  4. Generate vector tiles in PMTiles format by reading the GeoJSON files.
  5. Copy the HTML, CSS, and JS files that make up the site to a distribution folder.
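
To give a feel for step 2, here is a minimal sketch of the two metrics. It assumes diversity is the probability that two randomly chosen residents belong to different groups, and integration is the population-weighted average of block-level diversity within a tract. The function names are illustrative and are not the divintseg API, which does this work in the real pipeline.

```python
def diversity(counts):
    """Chance that two residents drawn at random are in different groups."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def tract_totals(blocks):
    """Add up per-group counts across all blocks in a tract."""
    return [sum(group) for group in zip(*blocks)]


def integration(blocks):
    """Population-weighted mean of block-level diversity within one tract."""
    total = sum(sum(block) for block in blocks)
    if total == 0:
        return 0.0
    return sum(sum(block) * diversity(block) for block in blocks) / total
```

A tract made up of two single-group blocks, such as [[50, 0], [0, 50]], is diverse as a whole but has an integration of zero, because no individual block is diverse. This gap between the two numbers is what the map is meant to reveal.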

Once all of this is done, the site simply has to be uploaded to the cloud storage bucket or other web server that will serve it. The whole process takes me about an hour end-to-end on an M1 Pro MacBook Pro, though this varies a lot with the speed of the internet connection I am on.

islands-of-diversity

The islands-of-diversity project loads the data generated by dis-datagen from the web site where it was published (see Diversity and Integration Data). It also uses ARNOLD highway map data and some U.S. Census data downloaded with censusdis.

The project consists of two notebooks. The first, Islands of Diversity.ipynb, loads the dis-datagen data for the more than eighty thousand census tracts in the United States and then computes the diversity of the neighborhood around each of them, as explained in I Want to Visit Ligonier, Indiana. It then sorts the islands of diversity (isolated areas of diversity surrounded by non-diverse areas), identifies various relevant facts, such as where the top 100 islands are, and groups them by state for further analysis.

The second notebook, Ligonier.ipynb, focuses on producing the specific maps used in I Want to Visit Ligonier, Indiana to illustrate the various points it makes about the location and nature of islands of diversity.

Feedback

We have made these projects open-source so that others can read them, reproduce the results, identify any bugs they might encounter, and possibly replicate our approaches in their own research. If you have any feedback, please feel free to submit an issue in either project, or contact us by email at info at datapinions dot com.

Footnotes

  1. We acknowledge that the primary author of GNU Make has been credibly accused of a variety of inappropriate behavior and that a number of organizations have severed financial ties with the Free Software Foundation as a result. We wish there were a viable cross-platform alternative to GNU Make, and if we find one we will attempt to switch to it.