What to know
This page summarizes important information that researchers should understand before using the U.S. Cancer Statistics public use database.
Case inclusions and exclusions
Cancer registries that are supported by CDC's National Program of Cancer Registries (NPCR) or the National Cancer Institute's (NCI's) Surveillance, Epidemiology, and End Results (SEER) program report all incident cases coded as in situ (non-malignant), invasive (malignant; primary site only), and non-malignant (including borderline and benign) central nervous system tumors according to the International Classification of Diseases for Oncology, Third Edition (ICD-O-3), with the following exceptions:
- In situ cancers of the cervix are not reported.
- Basal and squamous cell carcinomas of the skin are not reported, except when these occur on the skin of the genital organs.
- In situ cancers of the urinary bladder are re-coded as invasive behavior because the information that distinguishes between in situ and invasive bladder cancers is not always available or reliable. Stage for these cases remains coded as in situ.[1]
Additionally, in this database:
- Cases with an unknown age or with sex other than male or female have been excluded from the database. The frequency counts will not change based on whether Known Age or Male or Female Sex is checked on the SEER*Stat Selection tab.
- Malignant Behavior is a default selection for this database, as this restriction is used by CDC's NPCR and NCI's SEER Program for generating most official cancer statistics. Malignant behavior is defined by the variable Behavior Code ICD-O-3. This database also includes in situ cases and nonmalignant central nervous system (CNS) cases. To analyze these cases, uncheck the Malignant Behavior check box on the SEER*Stat Selection tab.
Impact of COVID-19 on cancer incidence data
In March 2020, the World Health Organization declared COVID-19 a pandemic. Soon after, stay-at-home orders, business and school shutdowns, and travel advisories were implemented in the United States to prevent the spread of COVID-19. Additionally, some health care systems reduced access to routine care. These measures interrupted cancer screening, diagnosis, and care as people postponed or deferred health care visits, particularly between March and May 2020.
The 2023 data submission includes new cancer cases diagnosed in 2020 and 2021, the first and second years of the COVID-19 pandemic. The COVID-19 pandemic disrupted health services, leading to delays and reductions in cancer screening and diagnosis, which may have contributed to lower incidence for most cancer sites in 2020. The number of new cases diagnosed in 2021 is still slightly lower for some cancer types but has returned to pre-pandemic counts for others.[2]
Impact of COVID-19 on joinpoint trends
The decline in cancer incidence in 2020 was likely an effect of the COVID-19 pandemic; estimates such as cancer incidence trends may be biased as a result. The joinpoint regression model for the analysis of trends was not designed to accommodate a one-year anomaly in data. When using joinpoint regression, including the 2020 data may shift the location of joinpoints, change the value of the trend measure (annual percent change), worsen the fit of the model, and widen confidence intervals. This may lead to incorrect interpretations of population-level cancer prevention and early detection efforts.
CDC and NCI include the 2020 incidence rates in statistical reports and graphics, but do not include them in joinpoint models. The 2021 incidence data will be included in statistical reports and joinpoint models.[2] Joinpoint software allows researchers to exclude incidence data for 2020, 2021, or both years from trend analyses. Exclude 2020 data from incidence trend analyses; 2021 data can be included.
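To make the effect of a one-year anomaly concrete, the following minimal sketch fits a single log-linear trend with and without 2020 and converts the slope to an annual percent change (APC). It is not the Joinpoint software or its segmented model; the function name `annual_percent_change`, the `exclude_years` parameter, and the rates are hypothetical and chosen only for illustration.

```python
import math

def annual_percent_change(years, rates, exclude_years=()):
    """Fit ln(rate) = a + b * year by least squares; return APC = (e^b - 1) * 100."""
    points = [(y, math.log(r)) for y, r in zip(years, rates) if y not in exclude_years]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(v for _, v in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in points)
    slope = sxy / sxx
    return (math.exp(slope) - 1.0) * 100.0

# Hypothetical rates per 100,000: a steady 1% yearly rise, with an artificial dip in 2020.
years = list(range(2015, 2022))
rates = [100.0 * 1.01 ** (y - 2015) for y in years]
rates[years.index(2020)] *= 0.90  # illustrative anomaly only, not real data

apc_all = annual_percent_change(years, rates)
apc_excluding_2020 = annual_percent_change(years, rates, exclude_years={2020})
```

In this made-up series, dropping 2020 recovers the underlying 1% annual increase, while keeping it pulls the estimated APC downward.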
Suppression rules
Suppressing fewer than 16 cases
Counts of fewer than 16 cases for the time period are suppressed, a threshold based on rate stability.[3,4] This suppression rule is applied automatically in this database.
When the number of cases used to compute the incidence rates is small, those rates tend to have poor reliability. Therefore, to discourage misinterpretation and misuse of counts, rates, and trends that are unstable because of the small number of cases, these statistics are not shown in tables and figures if the counts are fewer than 16 for the time period. A count of fewer than about 16 in a numerator results in a standard error of the rate that is about 25% or more as large as the rate itself. Equivalently, a count of fewer than about 16 results in the width of the rate's 95% confidence interval being at least as large as the rate itself. These relationships were derived under the assumption of a Poisson process and with the standard population age distribution close to the observed population age distribution.
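The two relationships above can be checked with a short calculation. This sketch assumes a crude rate based on a Poisson count and a normal-approximation 95% confidence interval (multiplier 1.96); the function name `poisson_rate_precision` is hypothetical.

```python
import math

def poisson_rate_precision(count, z=1.96):
    """Return (relative standard error, CI width divided by the rate) for a Poisson count."""
    rse = 1.0 / math.sqrt(count)    # SE(rate) / rate = sqrt(n) / n
    ci_width_ratio = 2.0 * z * rse  # (upper - lower) / rate, normal approximation
    return rse, ci_width_ratio

rse_16, width_16 = poisson_rate_precision(16)
rse_15, width_15 = poisson_rate_precision(15)
```

At a count of 16, the relative standard error is exactly 25% and the interval width is about 98% of the rate; at 15, the relative standard error exceeds 25% and the width exceeds the rate itself.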
Another important reason for employing a cell suppression threshold value is to protect the confidentiality of patients whose data are included in a report by reducing or eliminating the risk of identity disclosure. The cell suppression threshold value of 16 is recommended to protect patient confidentiality given the low level of geographic and clinical detail provided.
Complementary cell suppression
Complementary cell suppression prevents users from subtracting to find suppressed counts. Use this practice when any suppression occurs in the data presentation. In addition, when information from other cells, tables, or figures can be used to determine a suppressed cell, suppress counts and rates for at least one other cell. Use this suppression when a single year or multiple years of data are presented.
- If a single state in the nation is suppressed, suppress counts for the nation. Rates, confidence intervals, and populations can be shown at the national level.
- If a single state in a region is suppressed, suppress counts for the region and the nation. Rates, confidence intervals, and populations can be shown at the regional and national levels.
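As a rough illustration of the first rule, the sketch below suppresses state counts under 16 and then suppresses the national count when exactly one state is hidden, since subtraction would otherwise reveal it. Reading "a single state" as "exactly one state" is an assumption of this sketch, regions are omitted for brevity, and the function name and state labels are hypothetical.

```python
THRESHOLD = 16  # counts below this value are suppressed

def suppress_counts(state_counts):
    """Return (displayed state counts, displayed national count); None marks suppression."""
    shown = {s: (c if c >= THRESHOLD else None) for s, c in state_counts.items()}
    n_suppressed = sum(1 for v in shown.values() if v is None)
    national = sum(state_counts.values())
    # With exactly one hidden state, national minus the visible states would expose it.
    if n_suppressed == 1 or national < THRESHOLD:
        national = None
    return shown, national

# Hypothetical counts, not real data.
shown_a, national_a = suppress_counts({"State A": 120, "State B": 9, "State C": 45})
shown_b, national_b = suppress_counts({"State A": 120, "State B": 20, "State C": 45})
```

Only counts are suppressed at the national level here; as noted above, rates, confidence intervals, and populations can still be shown.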
Race and ethnicity suppression
States have the option to suppress race-specific and Hispanic ethnicity-specific data every submission year. While these states can be included in an aggregated analysis, the affected state's race and ethnicity information cannot be reported at the state level.
The merged system-supplied variable, state race ethnicity suppress, can be used to restrict your analysis to the states that are eligible to be included in a state-level analysis of race and ethnicity combinations. If you are conducting a state-level analysis of race only or ethnicity only, apply the restrictions manually in the SEER*Stat Selection tab.
The following states have data presentation restrictions:
- Data for Hispanic and non-Hispanic American Indian and Alaska Native people cannot be displayed for Illinois, Kansas, New Jersey, and New York.
- Data for Hispanic Asian and Pacific Islander and Hispanic Black people cannot be displayed for Kansas.
For more information, please refer to the Race recode (W, B, AIAN, API), Origin recode NHIA (Hispanic, Non-Hisp), and Race and origin recode (NHW, NHB, NHAIAN, NHAPI, Hispanic) variable descriptions.
Case-level data
As a further mechanism to protect data confidentiality and due to data sharing agreements with some states, the case listing function in SEER*Stat has been disabled for this database.
Benign central nervous system (CNS) tumors
Cancer registries began collecting information on nonmalignant brain and other central nervous system tumors with cases diagnosed in 2004. Collection of these tumors is in accordance with Public Law 107-260, the Benign Brain Tumor Cancer Registries Amendment Act, which mandates that NPCR registries collect data on all brain and other central nervous system tumors with a behavior code of 0 (benign) or 1 (borderline), in addition to in situ and malignant tumors. Data for nonmalignant brain and other nervous system tumors were available from all registries contributing to this report.
Behavior
The behavior variable in the current database is Behavior Code ICD-O-3. Previous database releases included the variable Behavior Recode for Analysis.
The database's default is to restrict analyses to malignant cases. CDC's NPCR and NCI's SEER Program use this restriction when generating most official cancer statistics. To analyze benign, borderline, or in situ cases, uncheck the "Malignant Behavior" box in the SEER*Stat Selection tab.
To create comparable analyses using a database with data from submission years 2018 and earlier:
- Uncheck the "Malignant Behavior" box in the SEER*Stat Selection tab.
- Add the following selection criteria: {Site and Morphology.Behavior recode for analysis} = 'Malignant','Only malignant in ICD-O-3','Only malignant 2010+'.
Primary site variables
Beginning in diagnosis year 2010, some lymphoma and leukemia ICD-O-3 codes were updated based on changes from the World Health Organization. The appropriate site recode variable to include these updates for cancer cases of all ages is Site recode ICD-O-3/WHO 2008. For childhood cancers, the International Classification of Childhood Cancer (ICCC) recode 3rd edition ICD-O-3/IARC 2017 and ICCC recode extended 3rd edition ICD-O-3/IARC 2017 variable definitions are included in the database.[5-9]
Consider reviewing the variable Site recode ICD-O-3/WHO 2008 before using the directly coded primary site. See more information on the SEER primary site recodes.
Stage
A merged variable, Merged Summary Stage, is provided to span the time periods in which three different staging schemes were used. The following sections describe the coding logic for this merged variable.
For NPCR registries
- If a case was diagnosed in 2001, 2002, 2003, 2016 or 2017, stage at diagnosis is recorded using the SEER Summary Stage 2000 variable value.
- If a case was diagnosed in or between 2004 and 2015, stage at diagnosis is recorded using the Derived SEER Summary Stage 2000 variable value. If the Derived SEER Summary Stage 2000 variable is blank or unstaged, and the SEER Summary Stage 2000 variable has a valid value, that value is used to populate the merged variable.
- If a case was diagnosed in 2018 or later, stage at diagnosis is recorded using the Summary Stage 2018 variable value.
For SEER-only registries (Connecticut, Hawaii, Iowa, and New Mexico)
- If a case was diagnosed in 2001, 2002, or 2003, stage at diagnosis is recorded using the SEER Summary Stage 2000 variable value.
- If a case was diagnosed in or between 2004 and 2017, stage at diagnosis is recorded using the Derived SEER Summary Stage 2000 variable value.
- If a case was diagnosed in 2018 or later, stage at diagnosis is recorded using the Derived Summary Stage 2018 variable value.
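The coding logic for both registry groups can be summarized in a short sketch. This is an illustration of the rules as listed, not the code used to build the database; the function name, parameter names, and the values treated as blank or unstaged are assumptions.

```python
SEER_ONLY = {"Connecticut", "Hawaii", "Iowa", "New Mexico"}
UNUSABLE = {None, "", "Unstaged"}  # assumed encodings for blank or unstaged values

def merged_summary_stage(registry, dx_year, ss2000=None, derived_ss2000=None,
                         ss2018=None, derived_ss2018=None):
    """Pick the source stage variable by registry type and diagnosis year."""
    seer_only = registry in SEER_ONLY
    if dx_year >= 2018:
        return derived_ss2018 if seer_only else ss2018
    if 2004 <= dx_year <= (2017 if seer_only else 2015):
        # NPCR fallback: use SEER Summary Stage 2000 when the derived value is unusable.
        if not seer_only and derived_ss2000 in UNUSABLE and ss2000 not in UNUSABLE:
            return ss2000
        return derived_ss2000
    if 2001 <= dx_year <= 2003 or (not seer_only and dx_year in (2016, 2017)):
        return ss2000
    return None  # diagnosis years before 2001 are not covered by the rules above

# Hypothetical cases; "Example NPCR registry" stands in for any NPCR registry.
npcr_2016 = merged_summary_stage("Example NPCR registry", 2016,
                                 ss2000="Localized", derived_ss2000="Regional")
seer_2016 = merged_summary_stage("Iowa", 2016,
                                 ss2000="Localized", derived_ss2000="Regional")
npcr_2010 = merged_summary_stage("Example NPCR registry", 2010,
                                 ss2000="Distant", derived_ss2000="")
```

The 2016 examples show the main difference between the two groups: NPCR registries switch back to SEER Summary Stage 2000 for 2016 and 2017, while SEER-only registries keep the derived variable through 2017.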
Notes
- Due to changes made in the Summary Stage 2018 Coding Manual, for cases diagnosed in 2018 or later:
  - The category Regional, NOS (code 5) is no longer used.
  - There is an artificial increase in the category Regional by Direct Extension Only (code 2) for brain, CNS Other, and lymphoma cases. This is because Regional, NOS for these cases changed from code 5 to code 2.
- Merged Summary Stage data are not available for testis cases.
Reporting delay
NPCR and SEER registries annually submit all eligible years of data to CDC and NCI, respectively. As a result, cases submitted in previous years may be deleted, and new cases diagnosed in previous years may be added. This late addition of cases is called reporting delay. Because the most recent diagnosis years are the least complete, reporting delay may create the appearance of decreasing trends.[10] For example, reporting of melanoma cases diagnosed in an outpatient facility may be delayed. As a result, the number of incident melanoma cases might appear to have dropped in the most recent year.
Checking SEER*Stat frequencies
You can check the setup of your SEER*Stat program by comparing results to those published in the U.S. Cancer Statistics Data Visualizations tool. Note that most of the data in the Data Visualizations tool are restricted to malignant behaviors. Be sure the Malignant Behavior box is selected in the SEER*Stat Selection tab.
References
1. Surveillance, Epidemiology, and End Results Program. SEER Coding and Staging Manual. Bethesda, MD: U.S. Department of Health and Human Services, National Cancer Institute; 2023.
2. Surveillance, Epidemiology, and End Results Program. Impact of COVID on the April 2024 SEER Data Release. Bethesda, MD: U.S. Department of Health and Human Services, National Cancer Institute; 2024.
3. Federal Committee on Statistical Methodology. Report on Statistical Disclosure Limitations Methodology (Statistical Working Paper 22). Washington, DC: Office of Management and Budget; 2005.
4. Doyle P, Lane JI, Theeuwes JM, Zayatz LM. Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies. Amsterdam: Elsevier Science; 2001.
5. Fritz A, Percy C, Jack A, et al., editors. International Classification of Diseases for Oncology, Third Edition. Geneva: World Health Organization; 2000.
6. International Classification of Diseases for Oncology, Third Edition, First Revision. Geneva: World Health Organization; 2013.
7. Ruhl J, Adamo M, Dickie L, Negoita S. Hematopoietic and Lymphoid Neoplasm Coding Manual. Bethesda, MD: National Cancer Institute; 2020.
8. Surveillance, Epidemiology, and End Results Program. 2024 Solid Tumor Rules. Bethesda, MD: U.S. Department of Health and Human Services, National Cancer Institute; 2023.
9. Surveillance, Epidemiology, and End Results Program. Hematopoietic and Lymphoid Neoplasm Database. Bethesda, MD: U.S. Department of Health and Human Services, National Cancer Institute; 2016.
10. Clegg LX, Feuer EJ, Midthune DN, Fay MP, Hankey BF. Impact of reporting delay and reporting error on cancer incidence rates and trends. J Natl Cancer Inst. 2002;94(20):1537–1545.