On the Diversity of Similarity and the Similarity of Diversity

Introduction

After I published I Want to Visit Ligonier, Indiana, Peter Shapiro reached out to me with some interesting questions about the method I used to measure diversity and integration. I’ve thought about those questions a bit, and about how my method differs from the approach Peter has used in some of his work. My conclusion is that it’s important for anyone interested in the diversity and integration of communities to understand both methods, how they relate to one another, and what each can tell us.

Let’s start with a chart showing the demographics of three census tracts (areas where a few thousand people live) in different parts of the country.

All three of these tracts have one large group that makes up 50-60% of the population. In the tract in Hawaii, that group is Asian. In the tract in Georgia it is Black, and in the tract in Texas it is white. Another roughly 30% of each tract is made up of the next two largest groups, and all the other groups are small if not zero.

Under one school of thought, these tracts are about equally diverse. Sure, they have different large groups, but their relative sizes are close. Under another school of thought, these three tracts are very different. The tract in Hawaii does not look at all like the overall population of the U.S., whereas the tract in Texas does. The one in Georgia is somewhere in between.

Defining Diversity

When I talk about diversity, I tend to take the first view. I see these three tracts as almost exactly the same. Without going into the mathematical details of how the computation is done (see Diversity and Integration in America: An Interactive Visualization if you are interested), I give these three tracts diversity scores of 58.1%, 58.1%, and 58.0%.

The intuition behind this definition of diversity is that in a diverse area, any given resident has a high probability of encountering residents who belong to other ethnic groups in their day-to-day life. So an area that is 40% white, 30% Black, 20% Hispanic or Latino, and 10% Asian would have a diversity score of 70%. More importantly, a second area that was 10% white, 20% Black, 30% Hispanic or Latino, and 40% Asian would be exactly as diverse as the first one, with the same 70% diversity score. It doesn’t matter which particular group is the largest or smallest; it only matters that the four group sizes are the same.
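Both hypothetical scores above are consistent with a simple reading of that intuition: the chance that two residents picked at random belong to different groups, which is one minus the sum of the squared group shares. The sketch below assumes that formula. The function name is illustrative and is not taken from the divintseg package, which remains the authoritative implementation.

```python
def diversity(shares):
    """Chance that two residents drawn at random (with replacement)
    belong to different groups: 1 minus the sum of squared shares.
    This formula is an assumption that matches the 70% example above."""
    total = sum(shares)
    return 1.0 - sum((s / total) ** 2 for s in shares)


# The two hypothetical areas: same group sizes, different groups.
first = diversity([0.40, 0.30, 0.20, 0.10])
second = diversity([0.10, 0.20, 0.30, 0.40])
print(f"{first:.1%} {second:.1%}")
```

Because the formula only sees the sizes of the groups and not their labels, reordering the shares cannot change the result.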

Defining Similarity

Peter Shapiro’s work, in contrast to mine, relies on an approach that measures how similar each tract is to the country as a whole. I call this metric similarity. It is the opposite of a metric commonly called dissimilarity. (For more details on the similarity/dissimilarity approach, see this article.)

The whole country is 60% white, 18% Hispanic or Latino, 12% Black, 6% Asian, and smaller percentages of other groups. This is very close to the percentages in tract 37.11 in Texas. The result is that this tract has a similarity score of 97.8%, the highest of any tract in the country. The other two tracts are further from the nationwide numbers. When we calculate their similarity, the Hawaiian tract scores 14.9% and the Georgia tract scores 56.3%.

When it comes to similarity, in contrast to diversity, which group represents which percentage of the population matters. Thus our hypothetical examples get different similarity scores. An area that is 40% white, 30% Black, 20% Hispanic or Latino, and 10% Asian would have a similarity score of 69.8%. More importantly, a second area that was 10% white, 20% Black, 30% Hispanic or Latino, and 40% Asian would have a similarity score of only 47.9%. The reason is that the first area, which has more white people and fewer Asian people, more closely resembles the U.S. as a whole.

The intuition behind the similarity metric and why one might want to use it is as follows: if race and ethnicity were not factors in where people could or did choose to live, then every tract in the country would look like tract 37.11. People would be evenly distributed across the country without any regard to race or ethnicity. From a public policy standpoint, if our desire is to foster widespread, if not universal, integration, then the proportion of any demographic group in a specific location cannot substantially exceed the national average for that group. To put it another way, if Black residents make up 12% of the population, and one location is 25% Black, then other locations must be less than 12% Black to make up the difference. If our societal goal is to have a truly integrated nation, then similarity is a compelling goal.
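The "make up the difference" arithmetic is a weighted average, sketched below. The 12% and 25% figures come from the paragraph above. The assumption that the 25% location holds one tenth of the total population is a made-up number for illustration, and the function name is hypothetical.

```python
def share_elsewhere(national_share, local_share, local_weight):
    """Share of a group in the rest of the country, given its share in one
    location that holds `local_weight` of the total population."""
    return (national_share - local_weight * local_share) / (1.0 - local_weight)


# Hypothetical: the 25% location contains 10% of all residents.
rest = share_elsewhere(national_share=0.12, local_share=0.25, local_weight=0.10)
print(f"{rest:.1%}")
```

Whatever weight is chosen, any location above the national share forces the remainder below it.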

What’s the Difference?

So how do we reconcile these numbers? Three census tracts certainly aren’t enough to do so. So I widened the lens and looked at census tracts across the country. My goal was to try to understand how diversity and similarity are related and what they can tell us, either individually or together.

Specifically, I wanted to know:

  1. How are diversity and similarity related? When one goes up does the other also go up? Is one always larger or smaller than the other? Is one a simple function of the other?
  2. In what way do communities that get similar scores look similar or different? Since we are reducing the demographics of a community to a single number, we’d hope that communities that score similarly are similar in other ways.
  3. Conversely, in what ways do communities that look similar get similar or different scores? To some extent this depends on what we mean by looking similar.
  4. Taken together, do diversity and similarity tell us more than either does on its own?

Getting Started

To answer these questions, the first thing I did was generate a plot of diversity against similarity. Just eyeballing a scatter plot can tell us a lot before we start doing any mathematical or statistical analysis or start talking about correlation or any kind of functional relationship.

I started with data from the U.S. Census American Community Survey 5-Year data from 2020. This data covers all kinds of topics, but the group of variables I chose, which goes by the catchy name of B03002, estimates the number of people of each of several racial and ethnic groups in a given area. The areas I chose to look at are called census tracts. A typical census tract has a few thousand people. In rural areas some census tracts cover many square miles. In dense urban areas, a census tract may cover just a few blocks. Of the 84,414 tracts the census identifies, I looked at the 83,479 that had at least one hundred residents. Many of the rest had zero residents and only exist to fill in geographic space.

For people who don’t identify as Hispanic or Latino, I counted the number of people of each racial group the data tracks: white, Black, Asian, and so on. I counted all Hispanic and Latino people, regardless of race, as a single group. From here on out, when I refer to a racial group, such as white, Black, or Asian, I implicitly mean people of that race who do not identify as Hispanic or Latino. When I refer to people who are Hispanic or Latino, they could be any race.
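The grouping rule can be sketched in a few lines. The record layout, key names, and counts below are hypothetical, chosen only to illustrate the rule. They are not the actual B03002 variable names or real tract data.

```python
def group_counts(record):
    """Collapse one tract's counts into the groups used in this essay.

    `record` is a hypothetical dict holding two dicts of counts keyed by
    race: one for residents who are not Hispanic or Latino, and one for
    residents who are.
    """
    # One group per race among residents who are not Hispanic or Latino.
    groups = dict(record["not_hispanic_or_latino"])
    # All Hispanic or Latino residents, regardless of race, form one group.
    groups["Hispanic or Latino"] = sum(record["hispanic_or_latino"].values())
    return groups


# Made-up counts for a single tract.
tract = {
    "not_hispanic_or_latino": {"white": 1200, "Black": 300, "Asian": 100},
    "hispanic_or_latino": {"white": 250, "Black": 20, "Asian": 5, "other": 125},
}
counts = group_counts(tract)
print(counts)
```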

Calder Plots

I downloaded the B03002 data using the censusdis open source package. I then used the divintseg package to compute the diversity of each tract and each tract’s similarity to the demographics of the U.S. as a whole. I then plotted diversity vs. similarity for all 83,479 tracts. Here is the scatter plot I got:

I don’t know about you, but when I first saw this picture I was pretty shocked. It looks more like the silhouette of a sculpture by Alexander Calder than it looks like any scatter plot I am used to seeing. Not only is there no obvious correlation between the two variables, but there are also the strange legs stretching down from about 50% diversity to zero, with empty spaces between them. I went back to my calculations several times, certain that I must have done something wrong to produce such a strange shape.

The Toes

In the end, I convinced myself that my math was correct, nicknamed this plot a Calder plot, and started digging into why the shape might be so strange. The first clue was at the tip of the toes of this strange animal. These points all have zero or very near zero diversity. That means they are made up entirely or almost entirely of members of a single racial or ethnic group. Which were they? Here’s another Calder plot with the tips of the toes highlighted that tells us the answer:

The Legs

In the previous plot, I highlighted the tip of each toe. Notice that each one is a tract dominated by a different group. That seemed like a clue. I wondered where other tracts that were overwhelmingly populated by one group fell. So I plotted all the tracts where a single group made up more than 80% of the population and color-coded them by what that group was. Here are the results:

Each leg corresponds to a different group. The blue leg, which contains the 29,814 tracts that are more than 80% white, is the largest. The orange leg is barely visible. It contains the 46 tracts (roughly one out of every 1,815 in the country) that are more than 80% Asian. The brown, purple, and green legs are more than 80% Hispanic or Latino, Black, and Native American, respectively.
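Assigning a tract to a leg amounts to checking whether its largest group exceeds the 80% threshold. Here is a minimal sketch of that check. The function name and the example counts are made up for illustration.

```python
def dominant_group(counts, threshold=0.80):
    """Return the name of the group whose share of the tract exceeds
    `threshold`, or None if no single group does."""
    total = sum(counts.values())
    if total == 0:
        return None
    name, count = max(counts.items(), key=lambda item: item[1])
    return name if count / total > threshold else None


# Made-up tracts: one belongs to a leg, the other to the body of the plot.
leg = dominant_group({"white": 900, "Black": 60, "Hispanic or Latino": 40})
body = dominant_group({"white": 500, "Black": 300, "Asian": 200})
print(leg, body)
```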

So why are these highly homogeneous communities grouped into separate legs like this? Diversity-wise, all of the legs top out around the same level, right in the neighborhood of 35-36%. This makes sense. A tract that was 80% one group and 20% another would have 32% diversity according to the formula I used. If that 20% were broken up into several smaller groups, the number could get a little higher, theoretically as high as 36% if it were broken into a very large number of small groups. I’ll leave the computation of those bounds as an exercise for the interested reader.
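For readers who would rather check the bounds numerically, here is a small sketch. It assumes diversity is one minus the sum of squared group shares, which reproduces the 32% figure above, and it assumes the remaining 20% is split evenly among the smaller groups. The function name is illustrative.

```python
def leg_top_diversity(n_small_groups, majority=0.80):
    """Diversity of a tract with one group at `majority` and the rest of
    the population split evenly among `n_small_groups` other groups.
    Assumes diversity = 1 - sum of squared shares."""
    small = (1.0 - majority) / n_small_groups
    return 1.0 - majority ** 2 - n_small_groups * small ** 2


# One minority group gives the lower bound; many tiny groups approach
# the upper bound without ever reaching it.
for n in (1, 2, 5, 1000):
    print(n, f"{leg_top_diversity(n):.2%}")
```

The squared share of the 80% group contributes 0.64 no matter how the rest is divided, so the score can never reach 36%.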

The legs behave similarly when it comes to diversity, so similarity must be where the action is. Why is the blue leg (mostly white tracts) so far to the right and the green one (mostly Native American tracts) so far to the left? The answer starts at the toes. The tip of the blue leg is at 60% similarity. 60% is also the fraction of the overall U.S. population that is white. The tip of the brown leg is at 18%, which is the fraction of the overall U.S. population that is Hispanic or Latino. The tip of the purple leg is at 12%, which is the fraction of the overall U.S. population that is Black. And so on.

It turns out that the mathematical formula for similarity is such that whenever a tract is populated entirely by one group the similarity score for the tract is exactly equal to the fraction of the population in the country that belongs to that group.
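One formula with exactly this property is one minus half the sum of the absolute differences between the tract's shares and the national shares. The sketch below assumes that form. It is not confirmed here to be the formula divintseg uses, so scores from this sketch need not match the scores quoted elsewhere in this essay. The national shares are the rounded figures given earlier, with the remaining 4% lumped into a hypothetical "other" group.

```python
def similarity(tract, reference):
    """Assumed form: 1 - 0.5 * sum of |tract share - reference share|."""
    groups = set(tract) | set(reference)
    gap = sum(abs(tract.get(g, 0.0) - reference.get(g, 0.0)) for g in groups)
    return 1.0 - 0.5 * gap


# Rounded national shares from the text; "other" is the leftover 4%.
us = {
    "white": 0.60,
    "Hispanic or Latino": 0.18,
    "Black": 0.12,
    "Asian": 0.06,
    "other": 0.04,
}

# A tract made up entirely of one group scores that group's national share.
toes = {group: similarity({group: 1.0}, us) for group in us}
for group, score in toes.items():
    print(group, f"{score:.0%}")
```

Under this form, a single-group tract differs from the reference by one minus the group's national share on that group and by the combined share of every other group on the rest, and half the sum of those two equal quantities is one minus the group's national share.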

The diversity metric, on the other hand, gives all of these homogeneous tracts the same score: 0%.

Similar Diversity and Diverse Similarity

We’ve got some idea of what is going on down in the legs and toes, so what about the rest of the plot? In this section we’ll look at a series of groups of tracts that are similar in diversity but have different similarities. In the next section we’ll cut the other way and look at groups of tracts that are similar in similarity but have different levels of diversity. The goal is to help us understand what aspects of the demographics of a tract drive changes in diversity and similarity and what aspects don’t seem to affect them.

We should note that in choosing the tracts we studied we tried to avoid cherry picking. Instead, in most cases we looked for the min and max tracts on one or the other metric and as close to the median of those two as was available, subject to needing to shift in some cases to avoid the gaps between the legs in the Calder plots.

Revisiting our First Three Tracts

Let’s start by going back to the first three tracts we looked at, one in Hawaii, one in Georgia, and one in Texas. They all had almost exactly the same diversity, but different similarities. Where do they fall on the Calder plot? We highlighted them here:

Vertically they are at almost exactly the same level, 58.0-58.1% diversity. But horizontally they look completely different. Let’s look back at their racial breakdown again.

The one with the lowest similarity is the Hawaiian tract. Its largest group, around 60%, is Asian, which is a relatively small 6% of the U.S. population. Its second largest group, over 20%, is Native Hawaiian or other Pacific Islander, which is only 0.6% of the U.S. population. And it has hardly any white population at all. All three of those contribute to the very low similarity it has to the U.S. overall.

Now let’s look at the Georgia tract. Its largest group is Black, but it also has a large enough white population to keep its similarity score from falling as low as the Hawaiian tract.

Finally, the demographics of the Texas tract look almost exactly like those of the country as a whole, so it gets a very high score. As we noted before, and as is apparent on the Calder plot, it has the highest similarity score of any tract anywhere in the country.

75% Diversity Tracts

We saw some no-diversity tracts at the toes of our Calder plot, and some reasonably diverse ones in the body. What about high-diversity tracts? Here’s what three of them look like, first on the Calder plot and then on demographic bar charts.

In order to get such a high score (~75%) on the diversity scale, these tracts have to have significant numbers of each of four or five different groups. But just as with the previous set of three tracts, the one that is the most similar to the U.S. population as a whole is the one where white residents are the largest group and Hispanic or Latino residents are the second largest group.

50% Diversity Tracts

If we continue down to tracts with 50% diversity, and then to 25% diversity, we see increasingly large single groups that are larger than all others. And just as in our previous examples, the way to get a higher similarity score at a fixed level of diversity is to have the largest group be white.

25% Diversity Tracts

Given the emphasis similarity puts on the presence of white residents, there are some effects that appear counterintuitive if we want to use similarity as a measure, in some sense, of the goodness or desirability of a given distribution of racial and ethnic groups. For example, compare the bar chart in the 75% diversity section for tract 5112 in Plymouth County, Massachusetts to that of tract 1141.05 in Tarrant County, Texas in the 25% diversity section. 1141.05 is overwhelmingly white with a small Hispanic or Latino minority. But it gets a higher similarity score than 5112, which has no extremely large group and has a somewhat balanced ratio of Black and white residents.

Now compare 1141.05 to another one of the 25% diverse tracts, 22.03 in Bernalillo County, New Mexico. 1141.05 and 22.03 are almost mirror images of one another; they simply swap white and Hispanic or Latino populations. That one change reduces the similarity score from 74.4% to 32.8%.

One might argue that the existence of tracts like 22.03 is more problematic than tracts like 1141.05 because these majority-minority tracts concentrate groups in a way that government services like schools and libraries and private businesses like grocery stores are less likely to adequately serve. There is definitely something to this argument, and I am sure it has been studied to a greater extent than I am currently aware. This is the reason I very much hesitate to advocate for a universal preference for diversity over similarity as a metric, but instead advocate understanding how each behaves and applying them where appropriate given this knowledge.

Similar Similarity and Diverse Diversity

Just as we took horizontal cuts of the Calder plot at different levels of diversity, we can take vertical cuts at different levels of similarity. By this point I trust that you are well versed in reading Calder plots and the demographic bar charts that go along with them. So in the spirit of a picture being worth a thousand words, I’ll jump straight to the charts.

75% Similarity Tracts

50% Similarity Tracts

25% Similarity Tracts

For each group of three tracts at a given level of similarity, the bar charts make it clear how increasing diversity is driven by a move from one large group at low diversity, typically white or Hispanic or Latino, to a more even representation at higher diversity. As we move from 75% similarity down to 25%, the group that is large changes from white to Hispanic or Latino.

Conclusions

Returning to our four initial questions, I think we now know enough about how diversity and similarity are related to come up with some good answers.

  1. The empirical relationship between diversity and similarity is complex, as shown by the shape of the Calder plot of all 83,479 tracts.
  2. Tracts with similar diversity scores tend to have distributions that look alike, although not necessarily with the same groups represented in the same way. For example, tracts with high diversity tend to have a number of well-represented groups, whereas tracts with low diversity tend to have one large group. For similarity, the situation is a bit more complex. A tract with low similarity could have one large group, like tract 59.03 in Miami-Dade County, Florida does, or a number of well-represented groups, like tract 89.38 in Honolulu County, Hawaii.
  3. Tracts that look similar, in the sense that the fraction of all groups is about the same, get similar scores for both diversity and similarity. However, tracts that have similar distributions but different largest and second largest groups get similar diversity scores but can get very different similarity scores. For example, tract 78.04 in Honolulu County, Hawaii and tract 104.16 in Lubbock County, Texas have very similar looking distributions, except for the fact that in 78.04 the largest group is Asian and the second largest is two or more races, whereas in 104.16 the largest group is Hispanic or Latino and the second largest is white.
  4. In low-diversity scenarios, similarity actually helps a lot in differentiating tracts that are quite different. We saw this in the 25% diversity tracts. Knowing which leg of the Calder plot different low-diversity tracts fall in, which depends on their similarity score, tells us a lot about the communities that their diversity score alone does not.

There’s plenty more to explore here beyond the points we looked at, and I look forward to doing so in the future.

Acknowledgements

I’d like to thank Peter Shapiro not only for inspiring the research that led to this essay but also for his insightful comments on a draft version.

Update [April 10, 2023]

The code used to produce the plots above is now publicly available on GitHub.