

The tendency to associate with individuals of the same gender creates profound divisions within professional and social contexts. We investigate this homophily tendency within the scholarly literature. We examine co-authorships within millions of papers extracted from the JSTOR corpus from 1960 to 2011. We develop a set of methods with minimal assumptions that account for field structure and control for confounding induced by the heterogeneity of the scholarly landscape, most notably the differing representation of women across fields and subfields. We find that controlling for these confounding factors does not fully explain the tendency to co-author with individuals of the same gender. This suggests that, across wide swaths of the scholarly landscape, gender plays an active role in team formation. The interactive visualization presented above explores the results of this study for various disciplines. Details of the method and results can be found in the preprint article posted on the arXiv:

Wang, Y.S., Lee, C.J., West, J.D., Bergstrom, C.T., Erosheva, E.A. (2019) Gender-based homophily in collaborations across a heterogeneous scholarly landscape. arXiv:1909.01284

How to use it

Mouse over any field to reveal its histogram and associated statistics. Left-click on any field to zoom in to that field and its subdisciplines. Right-click, or click on the left arrow at top right, to move back up to higher levels of structure.

The gender browser provides a multiscale view of gender homophily across scholarly publishing. The size of each box indicates the size of a field relative to the top-level field. The grey boxes indicate nonexistent fields. Some fields, such as ecology and evolution, have many layers with many subfields. Smaller fields, such as history, have fewer subfields. The shade of green indicates the p-value obtained by comparing the expected alpha values with the observed alpha value. The observed alpha value is shown as a red line in each histogram. Darker greens are statistically significant; lighter greens are less so. For example, "Macroeconomics" has an expected alpha value around 0.03 and an observed alpha value of 0.07. The p-value indicated for this field is 0.030. The legend below provides the cutoff values for each color. A small number of fields have a p-value of "NA" because there were not enough authors to calculate a p-value. They are indicated with a light, faded blue color.
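To make the comparison between the histogram and the red line concrete, here is a minimal sketch of one common way to turn a set of expected (null) alpha values and an observed alpha into a p-value. It is an illustration under stated assumptions, not the procedure from the preprint: the function name `empirical_p_value`, the one-sided upper-tail test, and the example numbers are all hypothetical.

```python
def empirical_p_value(null_alphas, observed_alpha):
    """Share of null alpha values at least as large as the observed one.

    Assumption: a one-sided, upper-tail comparison. Returns None when there
    are no null values, loosely mirroring the "NA" fields in the browser.
    """
    if not null_alphas:
        return None
    exceed = sum(1 for a in null_alphas if a >= observed_alpha)
    return exceed / len(null_alphas)


# Hypothetical null values standing in for the bars of one histogram.
null_alphas = [i / 100 for i in range(100)]
p = empirical_p_value(null_alphas, observed_alpha=0.965)
print(p)
```

In this picture, the further the red line sits in the right tail of the histogram, the fewer null values reach it and the smaller the resulting p-value.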


How it works

We use the hierarchical map equation to uncover the structure of disciplines, subdisciplines, specialties, subspecialties, and so forth in the JSTOR corpus, based upon the network of citations among over 7,000,000 scholarly articles spanning the period from 1960 to 2011. This generates the hierarchical classification of scholarly activities revealed in the gender browser. We have named each field manually by inspecting the papers therein.

For each author of each paper in the collection, gender is determined by extracting the given (first) name and looking up the gender distribution of this name in the US census database; gender is recorded only when we can assign it with >95% confidence. This is a probabilistic approach and may incorrectly impute gender in some cases, including for intersex, transgender, and/or non-binary authors.
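The thresholding step described above can be sketched roughly as follows. This is only an illustration: the function `assign_gender`, the `name_counts` lookup table, and the counts in it are hypothetical stand-ins, and the actual name data and matching rules used for the browser may differ.

```python
def assign_gender(given_name, name_counts, threshold=0.95):
    """Return "female", "male", or None when the name is not decisive.

    name_counts maps a lowercase given name to (female_count, male_count).
    Assumption: gender is recorded only if one share exceeds the threshold.
    """
    counts = name_counts.get(given_name.strip().lower())
    if counts is None:
        return None  # name not found in the lookup table
    female, male = counts
    total = female + male
    if total == 0:
        return None
    if female / total > threshold:
        return "female"
    if male / total > threshold:
        return "male"
    return None  # ambiguous name: leave gender unrecorded


# Hypothetical counts, for illustration only.
name_counts = {"mary": (990, 10), "james": (5, 995), "jordan": (300, 700)}
for name in ["Mary", "James", "Jordan", "Xyz"]:
    print(name, assign_gender(name, name_counts))
```

Authors whose names fall below the threshold, or are missing from the table, get no recorded gender in this sketch rather than a guessed one.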


The JSTOR gender browser has been developed by: