September 28, 2022

How do you study race and racism when there’s no consensus on how these terms are used in research? New research from a collaborative team of doctoral students at the Carolina Population Center (CPC), three of whom are also students at the UNC Gillings School of Global Public Health, tackles this question and provides guidance for researchers across disciplines.

The doctoral students – who come from the fields of epidemiology, sociology, anthropology and health policy – were inspired by Thomas LaVeist’s 1996 article, “Why we should continue to study race… but do a better job: an essay on race, racism and health,” and began to look at how race was conceptualized, operationalized and used in academic research journals.

Their findings were published recently in the American Journal of Epidemiology. Authors Rae Anne Martinez (Epidemiology), Nafeesa Andrabi (Sociology), Andrea Goodwin (Sociology), Rachel Wilbur (Anthropology), Natalie Smith (Health Policy and Management) and Paul Zivich (Epidemiology) recently answered questions in an interview with CPC about their work, which was also the subject of a six-part series on thoughtfully using race and ethnicity in population health research on the blog of the Interdisciplinary Association for Population Health Science (IAPHS).

How were authors defining (or not defining) race and ethnicity?

We reviewed United States human-subject research published between 1995 and 2018 in five epidemiology journals: American Journal of Epidemiology, Annals of Epidemiology, Epidemiology, Journal of Clinical Epidemiology, and Journal of Epidemiology and Community Health. Of all the articles we found, 329 included data on participants’ race and/or ethnicity, and only four of these studies defined race and/or ethnicity. One study combined race and ethnicity into an ethnoracial construct and defined it as a proxy measure of respondent culture. Two studies referenced the U.S. Census Office of Management and Budget’s definitions, and another conceptualized ethnicity as a marker of cultural beliefs about health. But the remaining 325 studies included race and/or ethnicity data without providing any definitions of the constructs.

Why does that matter?

We believe that the profound absence of definitions of race and/or ethnicity is indicative of a few key issues within the field and, broadly, U.S. society.

First, we may not define race and ethnicity because we believe that what these constructs mean is obvious… but it isn’t. There are competing perspectives on what race “is,” ranging from biologically essentialist views that wrongfully deem race an innate, biological characteristic to social constructionist views that deem race a categorizing system created by people in the last few centuries primarily to enact and establish white Christian supremacy. If we don’t define what race means or say why we’re including it in our research, we assume that it means the same thing to everyone and that it means the same thing across time. This allows for interpretations of the findings that can be biologically essentialist and cause further harm to racialized and minoritized people in the U.S.

There was a stark difference in how race and ethnicity data were treated in comparison to other data, which may represent a broader sense that race and ethnicity exist in the background but don’t matter enough to get words on the page. In contrast, we often found robust definitions for other variables like diabetes, which included information about the use of biomarker data or physician diagnosis, clinical or diagnostic cut-offs, type of diabetes studied, and justification for that type.

Second, we may not define race and/or ethnicity or explain mechanistically why we include them in our study design because we, in fact, don’t know the answer. Instead, we may be participating in the act of ritualistic regression. Increasingly, people living in the U.S., especially people racialized as white, are having to confront the notion of race and racism, yet the extent to which people understand the ideology and systems of racism and race may be limited. This is coming across in our science. What we find are study designs that go through the motions of including race and ethnicity data because we know we should, rather than engaging in meaningful reflection to ask how race and ethnicity would impact the outcomes we are looking at, what measure of race and ethnicity would be best suited to the kinds of research questions we are asking, and to what extent we are actually able to include the dimensions of race and/or ethnicity that are relevant to our studies (e.g., skin tone versus closed-ended self-identification).

How did you get the idea for doing this research?

We are all members of the inaugural “Biosocial” T32 training program cohort here at CPC. Through shared training in the T32 program, we had many discussions regarding our own disciplines and research. We noticed that, across our discipline-specific training, we had all been exposed to critiques, commentaries and recommendations for incorporating race and ethnicity thoughtfully into population health scholarship. These commentaries largely called for population health researchers to abandon practices of ritualistic regression and critically examine and explain the role of these constructs in health scholarship, though they differed across subjects. In an early group meeting we discussed Thomas LaVeist’s 1996 article, “Why we should continue to study race… but do a better job: an essay on race, racism and health,” and thought it was particularly motivating as an early call to action in better conceptualizing, operationalizing and interpreting race in health scholarship.

However, we also noticed a big discrepancy. The larger body of literature we were being exposed to as students didn’t seem to be matching the changes or actions these commentaries were calling for in terms of thoughtfully engaging with race and ethnicity. So, in this project, we set out to understand if our own disciplines – epidemiology, clinical medicine and medical sociology – had risen to these calls for action and if we were doing “a better job” conceptualizing, operationalizing, and using race and ethnicity in health scholarship. (Spoiler – we weren’t – and likely still aren’t.)

What are some of the complications in coding race and ethnicity?

There are a lot, and while we won’t go into detail here, we can share a bit about what stood out.

In the studies we reviewed, racial coding schemes centered whiteness through categories like “white, non-white.” This ties back to the value of defining race and ethnicity and justifying the inclusion of this data in study designs. What is the value of a non-white group, and what can we meaningfully infer about this group within this study design?

We commonly saw an ambiguous “other” category, where authors did not describe who was included in “other” or how to interpret this category within the analyses. The continued reliance on and inclusion of this “other” category may be systematically excluding smaller racial and ethnic groups and leading to their erasure, so that these groups ultimately may not get attention or distribution of resources despite vulnerabilities or health disparities.

We also found that most studies that included data on participants’ race and ethnicity collapsed this data into an ethnoracial construct. Again, this is largely happening without defining race and ethnicity and without describing why it’s theoretically appropriate to combine these into one construct. Collapsing race and ethnicity for the purposes of analyses assumes that these two constructs are capturing similar information and have similar relationships to health outcomes such that we can make meaningful comparisons between groups. In the absence of definitions and justifications, it’s unclear whether the decision to use an ethnoracial construct is intentional and motivated by theory or the study question, or unintentional in that it’s a limitation of the data structure or perhaps just ritualistic practice.

What should researchers think of as they do this work?

The elements we examined in our study – construct definitions, measures, coding and scientific rationale – we consider to be “core” or “base” methods. These correspond to relatively simple questions, like “what is this construct? Why is it important to the study question and population? How do I measure it? How do I code it?” We answer these all the time for our exposures, outcomes and primary effect measure modifiers, even if it is not a formalized part of our research practice or analysis plan. That’s just how fundamental they are.

However, these simple questions can be really difficult with respect to constructs that are normalized to be “common sense” or a natural part of the world (e.g., race, ethnicity, gender). We learn race over our lifetimes through interactions and social, political and educational institutions. What “race” is, the number of and boundaries between “racial groups,” and the scientific relevancy of race may all seem so obvious that we rarely, if ever, attempt to explicitly answer these questions. Ultimately, that impacts our science. When we do attempt to answer them, it may be so uncomfortable we are dissuaded from continuing. We may feel overwhelmed in not knowing where to start, we may feel shame in not having started earlier, or we may feel unsure in how to do so “correctly.”

So, more than anything else, we think it is about actually doing work to answer these seemingly simple questions. Answering these questions explicitly can help us identify or address internalized misconceptions. Moreover, omitting construct definitions, measures, coding and scientific rationale from publications is a threat to scientific rigor and reproducibility. We do recognize that not everyone is going to be a race scholar, but this is where we can lean on the strength of interdisciplinary teams.

We’ve presented more of these thoughts in an academic blog series: “Thoughtfully Measuring and Interpreting Race In Population Health Research.” Find the full series online.

What are you planning to do next?

Well, we are still working to publish the results from the other disciplines (medicine and medical sociology)!

We also used the empirical evidence from our review to inform the development of an interactive workshop for a range of audiences, which covers the theoretical considerations of race and ethnicity in health research, offers tools for communication, and highlights opportunities for improvement at the individual level. We’ve been offering the workshop to a variety of audiences this past year.

Our next virtual offering of the workshop will be in November, hosted by IAPHS; interested folks can check it out on the IAPHS website.

Is there anything else you’d like to add?

Since 2020, we’ve seen many population health researchers asking “what can I do” and “how do I improve?” when thinking about equitable and just research. There are many changes we can make as individuals and collectively. One of the small steps you can take right now is to take a look at your language. In the manuscript you are working on right now, you can examine the terms you use for racial and ethnic groups (e.g., “Hispanic” vs “Latino,” “Black” vs “African American”), what is being capitalized or not (e.g., “White” or “white”), or how you are describing communities (e.g., “vulnerable”) and the decisions behind these choices.

Language holds a lot of power and history. It can clarify or obscure, justify or delegitimize, and so we should be critical and thoughtful in all aspects of its use.
