Alternative Methods for Grouping Race and Ethnicity to Monitor COVID-19 Outcomes and Vaccination Coverage
Weekly / August 13, 2021 / 70(32);1075–1080
Paula Yoon, ScD1; Jeffrey Hall, PhD1; Jennifer Fuld, PhD1; S. Linda Mattocks, MPH1; B. Casey Lyons, MPH1; Roma Bhatkoti, PhD1; Jane Henley, MSPH1; A.D. McNaghten, PhD1; Demetre Daskalakis, MD1; Satish K. Pillai, MD1
Summary
What is already known about this topic?
Analyses of race and ethnicity in COVID-19 data to identify and monitor disparities are complicated by missing or unknown data.
What is added by this report?
Methods that use more race information when ethnicity information is missing resulted in higher estimated COVID-19 case counts, incidence, and vaccination coverage for most racial groups studied; however, these methods have limitations and warrant further examination of potential bias.
What are the implications for public health practice?
Ongoing work with experts is needed to identify methods for optimizing race and ethnicity data when data are incomplete. Multiple data sources are needed to monitor disparities and continued efforts are needed to strengthen the reporting of these data, consistent with CDC’s Data Modernization Initiative.
Population-based analyses of COVID-19 data by race and ethnicity can identify and monitor disparities in COVID-19 outcomes and vaccination coverage. CDC recommends that information about race and ethnicity be collected to identify disparities and ensure equitable access to protective measures such as vaccines; however, this information is often missing in COVID-19 data reported to CDC. Baseline data collection requirements of the Office of Management and Budget’s Standards for the Classification of Federal Data on Race and Ethnicity (Statistical Policy Directive No. 15) include two ethnicity categories and a minimum of five race categories (1). Using available COVID-19 case and vaccination data, CDC compared the current method for grouping persons by race and ethnicity, which prioritizes ethnicity (in alignment with the policy directive), with two alternative methods (methods A and B) that used race information when ethnicity information was missing. Method A assumed non-Hispanic ethnicity when ethnicity data were unknown or missing and used the same population groupings (denominators) for rate calculations as the current method (Hispanic persons for the Hispanic group and race category, non-Hispanic persons for the different racial groups). Method B grouped persons into ethnicity and race categories that are not mutually exclusive, unlike the current method and method A. Denominators for rate calculations using method B were Hispanic persons for the Hispanic group and persons of Hispanic or non-Hispanic ethnicity for the different racial groups. Compared with the current method, the alternative methods resulted in higher counts of COVID-19 cases and fully vaccinated persons across race categories (American Indian or Alaska Native [AI/AN], Asian, Black or African American [Black], Native Hawaiian or Other Pacific Islander [NH/PI], and White persons).
When method B was used, the largest relative increase in cases (58.5%) was among AI/AN persons and the largest relative increase in the number of fully vaccinated persons was among NH/PI persons (51.6%). Compared with the current method, method A resulted in higher cumulative incidence and vaccination coverage rates for the five racial groups. Method B resulted in lower cumulative incidence rates for two groups (AI/AN and NH/PI persons) and a lower cumulative vaccination coverage rate for AI/AN persons. The rate ratio for having a case of COVID-19 by racial and ethnic group compared with that for White persons varied by method but was <1 for Asian persons and >1 for other groups across all three methods. The likelihood of being fully vaccinated was highest among NH/PI persons across all three methods. This analysis demonstrates that alternative methods for analyzing race and ethnicity data when data are incomplete can lead to different conclusions about disparities. These methods have limitations, however, and warrant further examination of potential bias and consultation with experts to identify additional methods for analyzing and tracking disparities when race and ethnicity data are incomplete.
To improve monitoring of COVID-19–associated outcomes among racial and ethnic groups, CDC used three methods for grouping persons by race and ethnicity to analyze the following six indicators: 1) COVID-19 case counts, 2) cumulative incidence, 3) rate ratios for COVID-19 infection, 4) number of fully vaccinated persons, 5) cumulative vaccination coverage rates, and 6) rate ratios for being fully vaccinated. The method for grouping race and ethnicity used by CDC (current method) begins by grouping persons with Hispanic ethnicity as Hispanic, regardless of race, then groups persons with reported race and non-Hispanic ethnicity as race category, non-Hispanic (which excludes persons with missing or unknown ethnicity and those with non-Hispanic ethnicity and missing or unknown race). The current method was compared with two alternative methods (methods A and B) that have been used previously (2,3). Method A first groups persons based on Hispanic ethnicity (as with the current method) and then groups persons with known race and non-Hispanic ethnicity or unknown or missing ethnicity as race category, non-Hispanic (persons with missing or unknown race and missing or unknown or non-Hispanic ethnicity are excluded). Method B groups all persons with Hispanic ethnicity as Hispanic, regardless of race, and persons with reported race and Hispanic, non-Hispanic, unknown, or missing ethnicity are grouped by race category; persons with missing or unknown race and missing or unknown or non-Hispanic ethnicity are excluded. Notably, with method B, the groups are not mutually exclusive (Box).
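The three grouping rules described above can be sketched in Python as follows. This is an illustrative reading of the rules, not CDC's actual code: the function name `assign_groups`, the value labels ("Hispanic", "Non-Hispanic", None for missing or unknown), and the handling of multiracial or other-race records (treated here as having no analyzable race) are all assumptions made for the example.

```python
from typing import Optional, Set

# Race categories analyzed in the report; multiracial and other race are
# treated here as "no analyzable race" (an assumption for this sketch).
RACES = {"AI/AN", "Asian", "Black", "NH/PI", "White"}


def assign_groups(race: Optional[str], ethnicity: Optional[str], method: str) -> Set[str]:
    """Return the groups a record is counted in; an empty set means excluded."""
    race_known = race in RACES
    hispanic = ethnicity == "Hispanic"
    non_hispanic = ethnicity == "Non-Hispanic"
    # Any other ethnicity value (None, "Unknown") counts as missing or unknown.

    if method == "current":
        if hispanic:
            return {"Hispanic"}
        if race_known and non_hispanic:
            return {f"{race}, non-Hispanic"}
        return set()  # missing ethnicity, or non-Hispanic with missing race

    if method == "A":
        if hispanic:
            return {"Hispanic"}
        if race_known:
            # Missing or unknown ethnicity is assumed to be non-Hispanic.
            return {f"{race}, non-Hispanic"}
        return set()

    if method == "B":
        groups: Set[str] = set()
        if hispanic:
            groups.add("Hispanic")
        if race_known:
            groups.add(race)  # counted by race whatever the ethnicity value
        return groups  # may contain two groups: not mutually exclusive

    raise ValueError(f"unknown method: {method!r}")
```

For example, a record with Asian race and missing ethnicity is excluded under the current method, counted as "Asian, non-Hispanic" under method A, and counted as "Asian" under method B; a record with Asian race and Hispanic ethnicity is counted in both "Hispanic" and "Asian" under method B.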
Daily confirmed COVID-19 cases in the United States during January 1, 2020–May 31, 2021, were obtained from CDC’s case-based surveillance system.* Daily data about COVID-19 vaccine doses administered in the United States during December 14, 2020–May 31, 2021, including full vaccination status, were collected by vaccination providers and reported to CDC by multiple sources.† In the case and vaccination data sent to CDC, race was reported as White, Black, AI/AN, Asian, NH/PI, more than one race, other race, unknown race, or missing race. Ethnicity was reported as Hispanic or Latino (Hispanic), non-Hispanic, unknown ethnicity, or missing ethnicity. COVID-19 incidence and vaccination coverage rates were calculated using the 2019 U.S. Census Bureau’s annual resident population estimates.§ The current method and method A used the same population groupings (denominators) for rate calculations (Hispanic persons for the Hispanic group and race category, non-Hispanic persons for the different racial groups). Method B denominators were Hispanic persons for the Hispanic group and persons of Hispanic or non-Hispanic ethnicity for the different racial groups. Rate ratios were used to compare relative differences in COVID-19 incidence and full vaccination coverage rates between racial and ethnic groups. The comparator for the current method and method A was White, non-Hispanic persons and for method B was White persons. This activity was reviewed by CDC and was conducted consistent with applicable federal law and CDC policy.¶
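As a rough illustration of the rate and rate ratio calculations described above, the following Python sketch computes a cumulative rate per 100,000 population and its ratio to a reference group. The function names and all counts and populations are hypothetical placeholders, not values from the report or from Census estimates.

```python
def rate_per_100k(count: int, population: int) -> float:
    """Cumulative rate per 100,000 population (count / denominator)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


def rate_ratio(group_rate: float, reference_rate: float) -> float:
    """Ratio of a group's rate to the reference group's rate."""
    if reference_rate <= 0:
        raise ValueError("reference rate must be positive")
    return group_rate / reference_rate


# Hypothetical counts and populations, for illustration only.
counts = {"Group X": 1_500, "Reference group": 10_000}
populations = {"Group X": 50_000, "Reference group": 500_000}

rates = {g: rate_per_100k(counts[g], populations[g]) for g in counts}
ratios = {g: rate_ratio(rates[g], rates["Reference group"]) for g in rates}

for group in rates:
    print(f"{group}: {rates[group]:.1f} per 100,000, rate ratio {ratios[group]:.2f}")
```

In this sketch the choice of denominator and reference group is what would change between methods: the current method and method A would use non-Hispanic populations and a White, non-Hispanic reference, whereas method B would use populations of any ethnicity and a White reference.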
During January 1, 2020–May 31, 2021, U.S. states and four territories reported 26,724,149 COVID-19 cases to CDC. Among these reports, information on race, ethnicity, or both was missing from 26.7%, 35.2%, and 21.7% of reports received, respectively. During December 14, 2020–May 31, 2021, based on vaccine administration data reported to CDC, 126,692,891 fully COVID-19–vaccinated persons were reported in the United States; information on race, ethnicity, or both was missing from 23.1%, 31.7%, and 19.5% of these reports, respectively.
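Missingness percentages of this kind could be tallied from record-level data roughly as sketched below. The record layout (a race value and an ethnicity value per record), the labels treated as missing, and the function name `missingness_pct` are assumptions for illustration; the sketch also assumes that "race missing" includes records in which both fields are missing, which is consistent with the reported figures (the "both" percentage is smaller than either single-field percentage) but is not stated in the report.

```python
from typing import Dict, Iterable, Optional, Tuple

# Labels treated as "missing or unknown" in this sketch; real coding may differ.
MISSING_VALUES = {None, "Unknown", "Missing"}


def missingness_pct(
    records: Iterable[Tuple[Optional[str], Optional[str]]]
) -> Dict[str, float]:
    """Percentage of (race, ethnicity) records missing race, ethnicity, or both."""
    rows = list(records)
    if not rows:
        raise ValueError("no records supplied")
    n = len(rows)
    race = sum(1 for r, _ in rows if r in MISSING_VALUES)
    ethnicity = sum(1 for _, e in rows if e in MISSING_VALUES)
    both = sum(1 for r, e in rows if r in MISSING_VALUES and e in MISSING_VALUES)
    return {
        "race": 100 * race / n,
        "ethnicity": 100 * ethnicity / n,
        "both": 100 * both / n,
    }


# Hypothetical records, for illustration only.
example = [
    ("White", "Hispanic"),
    (None, "Non-Hispanic"),
    ("Asian", None),
    (None, "Unknown"),
]
result = missingness_pct(example)
```

Under the same reading, the reported case figures would imply that roughly 40.2% of case reports lacked race, ethnicity, or both (26.7 + 35.2 − 21.7), although the report does not state this number.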
Among persons of Hispanic ethnicity, the numbers of COVID-19 cases and persons fully vaccinated, and population incidence and vaccination coverage rates were the same across the three methods for grouping race and ethnicity (Table 1). Methods A and B resulted in more COVID-19 cases and fully vaccinated persons assigned to a racial group compared with the current method because of the inclusion of persons with unknown or missing ethnicity information. Compared with the current method, method A resulted in case counts that were 16.6% to 37.2% higher across race groups, with the largest relative increase in the AI/AN, non-Hispanic group (37.2%). For method B, for which racial and ethnic groups were not mutually exclusive, the percentage increase in case counts compared with the current method ranged from 25.7% to 58.5% among the five race categories. The largest relative increase in case counts was in the AI/AN group (58.5%); case counts in White persons also increased (45.1%). The estimated population incidence of COVID-19 varied depending on the classification method used. Compared with the current method, method A resulted in higher cumulative COVID-19 incidences among the five racial groups, with the largest increase among AI/AN, non-Hispanic persons (37.2%). Method B resulted in increased cumulative incidence among Asian persons (21.8%), Black persons (19.6%) and White persons (14.3%), and slight decreases among AI/AN persons (7.9%) and NH/PI persons (1.0%).
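The way method B can raise a case count while lowering the corresponding incidence follows from the change in denominator. Assuming incidence is simply count divided by population, the relative change in the denominator can be backed out from the two reported percentage changes. The sketch below does this arithmetic for AI/AN persons; the function name is hypothetical, and because the inputs are rounded percentages, the result is an approximation implied by the report rather than a value it states.

```python
def implied_denominator_ratio(count_change_pct: float, rate_change_pct: float) -> float:
    """Ratio of new to old denominator implied by changes in count and rate.

    Since rate = count / population, it follows that
    new_pop / old_pop = (new_count / old_count) / (new_rate / old_rate).
    """
    count_ratio = 1 + count_change_pct / 100
    rate_ratio = 1 + rate_change_pct / 100
    if rate_ratio <= 0:
        raise ValueError("rate change must be greater than -100%")
    return count_ratio / rate_ratio


# Method B, AI/AN persons: cases +58.5%, cumulative incidence -7.9%.
aian_method_b = implied_denominator_ratio(58.5, -7.9)

# Method A, AI/AN persons: cases +37.2%, incidence +37.2% (same denominator).
aian_method_a = implied_denominator_ratio(37.2, 37.2)

print(f"Method B implied denominator ratio: {aian_method_b:.2f}")
print(f"Method A implied denominator ratio: {aian_method_a:.2f}")
```

Under these assumptions, the method B denominator for AI/AN persons (all ethnicities) would be about 1.7 times the non-Hispanic AI/AN denominator, which is larger than the 58.5% growth in the numerator and therefore lowers the rate. For method A the ratio is 1, matching the report's statement that method A uses the same denominators as the current method.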
Compared with the current method, method A resulted in higher numbers of fully vaccinated persons across all racial groups, ranging from 17.8% (non-Hispanic Asian) to 37.3% (non-Hispanic NH/PI) higher. Method B resulted in 19.4% to 51.6% higher numbers of fully vaccinated persons across the racial groups, with the largest relative increase among NH/PI persons (51.6%). Full vaccination coverage also varied depending on the racial and ethnic classification method used. Compared with the current method, method A resulted in higher numbers of fully vaccinated persons per 100,000 for all racial groups, with the largest increase among non-Hispanic NH/PI persons (37.3%). Method B resulted in coverage increases among all racial groups except AI/AN persons, among whom a 23.7% decrease occurred.
When the current method was used, Hispanic and non-Hispanic NH/PI persons were twice as likely as non-Hispanic White persons to have COVID-19 (Table 2). When method A was used, the rate ratio was highest for non-Hispanic AI/AN (1.76) and non-Hispanic NH/PI (1.84) persons; when method B was used, the rate ratio relative to White persons was highest among Hispanic persons (1.72) and NH/PI persons (1.72). Among Asian persons, the rate ratio for COVID-19 was lower across all three methods (0.66–0.71). NH/PI persons had the highest likelihood of being fully vaccinated when the current method (1.70), method A (1.97), and method B (1.92) were used compared with each method’s reference group.
Discussion
Estimation of COVID-19 incidence and vaccination coverage by race and ethnicity is complicated by missing data. Previous studies have proposed methods for classifying race and ethnicity to address such complexities as multirace responses, but these methods do not consider missing data in circumstances such as a public health emergency in which real-time monitoring and action are needed to identify and address disparities (4,5). The alternative methods used in this study (methods A and B) resulted in the analyses of more data by race, which increased estimates of COVID-19 case counts, incidence, and vaccination coverage among most racial groups. The current method, used by CDC, and method A resulted in mutually exclusive racial and ethnic groups. The denominators for rate calculations are either persons reported as Hispanic or persons reported as a race category and non-Hispanic, with an assumption in method A that persons for whom ethnicity data were missing are non-Hispanic. Method A is more commonly used when ethnicity is missing from a small percentage of records and other information in the record supports a non-Hispanic designation. When approximately one-third of records are missing ethnicity, as in this report (35% for case and 32% for vaccination coverage data), that assumption might attenuate or amplify disparities for certain groups. With method B, the race and ethnicity groups are not mutually exclusive. This complicates comparisons that use a reference group (often White persons), because the race and ethnicity categories overlap.
The findings in this report are subject to at least four limitations. First, because the analysis did not include persons who identified as multiple races or other race, conclusions cannot be drawn about the use of the alternative methods for grouping and analyzing these racial categories. Second, this report did not explore all possible analytic methods for grouping race and ethnicity. For example, imputation (i.e., replacing missing data with other values) has been examined as a potential method to improve estimates of COVID-19 racial and ethnic disparities (6). Third, data shared with CDC might undercount COVID-19 cases and vaccination coverage and this undercount might differ by race or ethnicity. Finally, although progress has been made to incorporate the Office of Management and Budget standards (such as Statistical Policy Directive No. 15) into the collection and presentation of race and ethnicity data, some data collection efforts still do not fully use this guidance (7).
Although race and ethnicity are not the only measures for assessing health disparities, these measures have been integral to CDC’s understanding of the health outcomes associated with COVID-19 (8–10). This analysis demonstrates that alternative methods for analyzing race and ethnicity data when data are incomplete can lead to different interpretations about disparities and highlights the importance of working with experts to identify methods for analyzing and tracking disparities when race and ethnicity data are incomplete. CDC uses multiple data sources to monitor disparities in COVID-19 outcomes and will continue to optimize the available data and work with jurisdictions to strengthen reporting of these data consistent with CDC’s COVID-19 Response Health Equity Strategy** and Data Modernization Initiative.††
Acknowledgments
River Pugsley, Derek Cox, Elizabeth Arias, Jacqueline Lucas, Jasmine Nelson, Andrea Gentzke.
Corresponding author: Paula Yoon, pyoon@cdc.gov.
All authors have completed and submitted the International Committee of Medical Journal Editors form for disclosure of potential conflicts of interest. No potential conflicts of interest were disclosed.
* All 50 states, the District of Columbia, New York City, and four U.S. territories (Guam, Northern Mariana Islands, Puerto Rico, and U.S. Virgin Islands) electronically submit standardized information for individual cases of COVID-19 to CDC via a case report form developed for the CDC COVID-19 response (https://www.cdc.gov/coronavirus/2019-ncov/php/reporting-pui.html) or via the CDC National Notifiable Diseases Surveillance System (https://www.cdc.gov/nndss/action/covid-19-response.html).
† COVID-19 vaccine administration data are reported to CDC by multiple entities using immunization information systems, the Vaccine Administration Management System, pharmacy systems, or direct submission of electronic health records (https://www.cdc.gov/coronavirus/2019-ncov/vaccines/distributing/about-vaccine-data.html). CDC counts persons as being fully vaccinated if they received 2 doses on different days (regardless of time interval) of the 2-dose mRNA vaccine series or received 1 dose of a single-dose vaccine.
§ https://www.census.gov/data/tables/time-series/demo/popest/2010s-national-detail.html
¶ 45 C.F.R. part 46.102(l)(2), 21 C.F.R. part 56; 42 U.S.C. Sect. 241(d); 5 U.S.C. Sect. 552a; 44 U.S.C. Sect. 3501 et seq.
** https://www.cdc.gov/coronavirus/2019-ncov/downloads/community/CDC-Strategy.pdf
†† https://www.cdc.gov/surveillance/surveillance-data-strategies/data-IT-transformation.html
References
1. Office of Management and Budget. Revisions to the standards for the classification of federal data on race and ethnicity. Fed Registr 1997 Oct 30;62(210):1–9. https://www.govinfo.gov/content/pkg/FR-1997-10-30/pdf/97-28653.pdf
2. CDC. Sexually transmitted disease surveillance 2019. Technical notes. Atlanta, GA: US Department of Health and Human Services, CDC; 2021. Accessed July 29, 2021. https://www.cdc.gov/std/statistics/2019/technical-notes.htm
3. Wong CA, Dowler S, Moore AF, et al. COVID-19 vaccine administration, by race and ethnicity—North Carolina, December 14, 2020–April 6, 2021. MMWR Morb Mortal Wkly Rep 2021;70:991–6. https://doi.org/10.15585/mmwr.mm7028a2 PMID:34264909
4. Klein DJ, Elliott MN, Haviland AM, et al. A comparison of methods for classifying and modeling respondents who endorse multiple racial/ethnic categories: a health care experience application. Med Care 2019;57:e34–41. https://doi.org/10.1097/MLR.0000000000001012 PMID:30439794
5. Mays VM, Ponce NA, Washington DL, Cochran SD. Classification of race and ethnicity: implications for public health. Annu Rev Public Health 2003;24:83–110. https://doi.org/10.1146/annurev.publhealth.24.100901.140927 PMID:12668755
6. Labgold K, Hamid S, Shah S, et al. Estimating the unknown: greater racial and ethnic disparities in COVID-19 burden after accounting for missing race and ethnicity data. Epidemiology 2021;32:157–61. https://doi.org/10.1097/EDE.0000000000001314 PMID:33323745
7. Douglas MD, Respress E, Gaglioti AH, et al. Variation in reporting of the race and ethnicity of COVID-19 cases and deaths across US states: April 12, 2020, and November 9, 2020. Am J Public Health 2021;111:1141–8. https://doi.org/10.2105/AJPH.2021.306167 PMID:33856884
8. Barry V, Dasgupta S, Weller DL, et al. Patterns in COVID-19 vaccination coverage, by social vulnerability and urbanicity—United States, December 14, 2020–May 1, 2021. MMWR Morb Mortal Wkly Rep 2021;70:818–24. https://doi.org/10.15585/mmwr.mm7022e1 PMID:34081685
9. Garg S, Kim L, Whitaker M, et al. Hospitalization rates and characteristics of patients hospitalized with laboratory-confirmed coronavirus disease 2019—COVID-NET, 14 states, March 1–30, 2020. MMWR Morb Mortal Wkly Rep 2020;69:458–64. https://doi.org/10.15585/mmwr.mm6915e3 PMID:32298251
10. Smith AR, DeVies J, Caruso E, et al. Emergency department visits for COVID-19 by race and ethnicity—13 states, October–December 2020. MMWR Morb Mortal Wkly Rep 2021;70:566–9. https://doi.org/10.15585/mmwr.mm7015e3 PMID:33857062
BOX. Methods for grouping race and ethnicity* for COVID-19 cases, January 1, 2020–May 31, 2021, and fully vaccinated persons, December 14, 2020–May 31 — United States, 2021
Current method
Race/Ethnicity groups
- American Indian or Alaska Native, non-Hispanic
- Asian, non-Hispanic
- Black or African American, non-Hispanic
- Hispanic
- Native Hawaiian or Other Pacific Islander, non-Hispanic
- White, non-Hispanic
Grouping method
1. Persons with Hispanic ethnicity are grouped as Hispanic, regardless of race.
2. For the remaining records, persons with reported race and non-Hispanic ethnicity are grouped as race category, non-Hispanic.
3. Persons with missing or unknown ethnicity are excluded even if race is reported, and persons with non-Hispanic ethnicity and missing or unknown race are excluded.
Method A
Race/Ethnicity groups
- American Indian or Alaska Native, non-Hispanic
- Asian, non-Hispanic
- Black or African American, non-Hispanic
- Hispanic
- Native Hawaiian or Other Pacific Islander, non-Hispanic
- White, non-Hispanic
Grouping method
1. Persons with Hispanic ethnicity are grouped as Hispanic, regardless of race.
2. For the remaining records, persons with reported race and non-Hispanic, unknown, or missing ethnicity are grouped as race category, non-Hispanic.
3. Persons with missing or unknown race and missing or unknown or non-Hispanic ethnicity are excluded.
Method B
Race/Ethnicity groups
- American Indian or Alaska Native
- Asian
- Black
- Hispanic
- Native Hawaiian or Other Pacific Islander
- White
Grouping method
1. For all records, persons with Hispanic ethnicity are grouped as Hispanic, regardless of race.
2. Persons with reported race and ethnicity that is Hispanic, non-Hispanic, unknown, or missing are grouped by race category.
3. The groups are not mutually exclusive.
4. Persons with missing or unknown race and missing or unknown or non-Hispanic ethnicity are excluded.
*Multiracial and other race were excluded from analysis.
Suggested citation for this article: Yoon P, Hall J, Fuld J, et al. Alternative Methods for Grouping Race and Ethnicity to Monitor COVID-19 Outcomes and Vaccination Coverage. MMWR Morb Mortal Wkly Rep 2021;70:1075–1080. DOI: http://dx.doi.org/10.15585/mmwr.mm7032a2.
MMWR and Morbidity and Mortality Weekly Report are service marks of the U.S. Department of Health and Human Services.
Use of trade names and commercial sources is for identification only and does not imply endorsement by the U.S. Department of Health and Human Services.
References to non-CDC sites on the Internet are provided as a service to MMWR readers and do not constitute or imply endorsement of these organizations or their programs by CDC or the U.S. Department of Health and Human Services. CDC is not responsible for the content of pages found at these sites. URL addresses listed in MMWR were current as of the date of publication.