Gender Homophily in Science
Background
The tendency to associate with individuals of the same gender creates profound divisions within professional and social contexts. We investigate this homophily tendency within the scholarly literature by examining co-authorships in millions of papers extracted from the JSTOR corpus from 1960 to 2011. We develop a set of methods with minimal assumptions that account for field structure and control for confounding induced by the heterogeneity of the scholarly landscape, most notably the differing representation of women across fields and subfields. We find that, even after controlling for these confounding factors, a tendency to co-author with individuals of the same gender remains. This suggests that, across wide swaths of the scholarly landscape, gender plays an active role in team formation. The interactive visualization presented above explores the results of this study for various disciplines. Details of the method and results can be found in the preprint article posted on the arXiv:
Wang, Y.S., Lee, C.J., West, J.D., Bergstrom, C.T., Erosheva, E.A. (2019) Gender-based homophily in collaborations across a heterogeneous scholarly landscape. arXiv:1909.01284
How to use it
Hover over any field to reveal its histogram and associated statistics. Left-click on any field to zoom in to that field and its subdisciplines. Right-click, or click on the left arrow at top right, to move back up to higher levels of structure.
The gender browser provides a multiscale view of gender homophily across scholarly publishing. The size of each box indicates the size of a field relative to the top-level field. Grey boxes indicate nonexistent fields. Some fields, such as Ecology and Evolution, have many layers with many subfields. Smaller fields, such as History, have fewer subfields. The shade of green indicates the p-value for comparing the expected alpha values with the observed alpha value. The expected alpha values are shown in the histogram, and the observed alpha value is marked with a red line. Darker greens indicate stronger statistical significance; lighter greens indicate weaker significance. For example, "Macroeconomics" has an expected alpha value around 0.03 and an observed alpha value of 0.07. The p-value indicated for this field is 0.030. The legend below provides the cutoff values for each color. A small number of fields have a p-value of "NA" because there were not enough authors to calculate a p-value. They are indicated with a light, faded blue color.
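To make the relationship between the histogram and the p-value concrete, here is a minimal sketch of an empirical, one-sided p-value. It is an illustration only, not the procedure used in the study (see the preprint for that). The function name `empirical_p_value`, the one-sided "greater than or equal" comparison, and the add-one correction are all assumptions made for this example.

```python
def empirical_p_value(null_alphas, observed_alpha):
    """Share of expected (null) alpha values at least as large as the observed one.

    Hypothetical helper: uses the common add-one correction so that the
    estimate is never exactly zero. Returns None when there are no null
    values to compare against.
    """
    n = len(null_alphas)
    if n == 0:
        return None
    # Count null draws that are as extreme as, or more extreme than, the observation.
    at_least_as_large = sum(1 for a in null_alphas if a >= observed_alpha)
    return (at_least_as_large + 1) / (n + 1)


# Made-up null values, for illustration only.
null_alphas = [0.01, 0.02, 0.03, 0.04, 0.05, 0.08]
p = empirical_p_value(null_alphas, observed_alpha=0.07)
```

In this picture, the red line sits at `observed_alpha`, and the p-value reflects how much of the histogram lies at or to the right of that line.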
How it works
We use the hierarchical map equation to uncover the structure of disciplines, subdisciplines, specialties, subspecialties, and so forth in the JSTOR corpus, based upon the network of citations among over 7,000,000 scholarly articles spanning the period from 1960 to 2011. This generates the hierarchical classification of scholarly activities revealed in the gender browser. We have named each field manually by inspecting the papers therein.
Given the scale of our dataset, we use Social Security and crowdsourced population records to impute the gender of each authorship (i.e., to estimate the missing values for the purpose of our analysis) based on how given names tend to be gendered. Due to limitations of the population-scale data, we can infer only whether an author is a man or a woman, and we acknowledge the inability to include intersex and non-binary identities in our imputation. For each authorship, we treat gender as known if the respective first name (or one of the first names, in the case of double names) is used for men at least 95% of the time or for women at least 95% of the time.
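The 95% rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the record format `(name, gender, count)`, the function names `build_name_table` and `impute_gender`, the splitting of double names on hyphens and spaces, and the choice to let the first qualifying part of a double name decide are all hypothetical.

```python
from collections import defaultdict


def build_name_table(records):
    """Aggregate (name, gender, count) records into per-name counts.

    Assumed input format: gender is "F" or "M"; count is a non-negative int.
    """
    counts = defaultdict(lambda: {"F": 0, "M": 0})
    for name, gender, count in records:
        counts[name.lower()][gender] += count
    return dict(counts)


def impute_gender(first_name, table, threshold=0.95):
    """Return "F" or "M" if a name part meets the threshold, else None."""
    # Treat hyphens and spaces as separators in double names (an assumption).
    parts = first_name.lower().replace("-", " ").split()
    for part in parts:
        c = table.get(part)
        if c is None:
            continue
        total = c["F"] + c["M"]
        if total == 0:
            continue
        if c["F"] / total >= threshold:
            return "F"
        if c["M"] / total >= threshold:
            return "M"
    # No part of the name is gendered strongly enough: gender stays unknown.
    return None


# Made-up counts, for illustration only.
table = build_name_table([
    ("Mary", "F", 990), ("Mary", "M", 10),
    ("Jordan", "F", 400), ("Jordan", "M", 600),
    ("John", "M", 1000),
])
```

With these made-up counts, `impute_gender("Mary", table)` yields `"F"`, while `impute_gender("Jordan", table)` yields `None`, so that authorship would be treated as having unknown gender.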
Personnel
The JSTOR gender browser has been developed by: