Variable Definitions

Purpose

The following variables are available in the U.S. Cancer Statistics Public Use Database, U.S. data (2001–2021). They are listed by SEER*Stat category. Each variable entry gives the source, a description, and considerations for use.

Age at diagnosis

Source of standard: North American Association of Central Cancer Registries

Source item name: Derived from Age at Diagnosis

Source item number: 230

This variable indicates the age range at diagnosis by grouping patients into one of 19 categories (0, 1–4, 5–9, …, 75–79, 80–84, ≥85 years). It is derived from the NAACCR variable Age at Diagnosis [230], which is the age (in years) of the patient at diagnosis.
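The grouping described above can be sketched in Python. This is a minimal illustration, not the SEER*Stat recode itself: the function name `age_group` and the label strings are assumptions made for this example, and the sketch does not handle unknown or missing ages.

```python
def age_group(age_years: int) -> str:
    """Map an age in whole years to one of the 19 groups described above.

    Hypothetical helper for illustration; the labels are not official
    SEER*Stat value labels.
    """
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years == 0:
        return "0"
    if age_years <= 4:
        return "1-4"
    if age_years >= 85:
        return "85+"
    lower = (age_years // 5) * 5  # start of the 5-year band, e.g. 67 -> 65
    return f"{lower}-{lower + 4}"


# All 19 labels in order: "0", "1-4", sixteen 5-year bands, and "85+".
ALL_GROUPS = ["0", "1-4"] + [f"{a}-{a + 4}" for a in range(5, 85, 5)] + ["85+"]
```

For example, `age_group(67)` falls in the 65–69 band, and any age of 85 or older falls in the open-ended top group.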

Considerations for use

Different primary tumors for the same patient may have different values. Records for persons with multiple primary cancers cannot be identified in this database.

Race, sex, year of diagnosis, and registry

Source of standard: North American Association of Central Cancer Registries

Source item name: Sex

Source item number: 220

This variable indicates the sex of the patient.

Considerations for use

  • To get the correct population denominator, “female” must be selected when analyzing female-specific cancers (such as ovarian cancer or female breast cancer) and “male” for male-specific cancers (such as prostate cancer).
  • Due to small case counts and the lack of an associated population file, cases with a sex other than male or female are excluded from this database.

Source of standard: North American Association of Central Cancer Registries

Source item name: Derived from Date of initial diagnosis (CoC)

Source item number: 390

This variable indicates the year of initial diagnosis by a recognized medical practitioner for the cancer being reported, whether clinically or microscopically confirmed. Derived from Date of initial diagnosis (CoC) [390].

Considerations for use

As an additional confidentiality measure, date of diagnosis is not provided.

Source of standard: North American Association of Central Cancer Registries

Source item name: Addr at DX–State

Source item number: 80

This variable indicates the U.S. state in which the patient lived at the time the reportable tumor was diagnosed.

Considerations for use

  • If a patient has multiple tumors, the state of residence at the time of diagnosis may differ for each.
  • Multiple records for an individual with more than one primary cancer cannot be linked in this database.

Source of standard: CDC’s National Program of Cancer Registries

Source item name: Derived from Addr at DX–State and U.S. Cancer Statistics publication criteria

Source item number: Derived from NAACCR’s 80

This variable indicates whether a central cancer registry met the U.S. Cancer Statistics publication criteria for a single year of analysis.

Considerations for use

  • This variable allows the selection of only cancer registries with data that meet the U.S. Cancer Statistics publication criteria for an individual diagnosis year. Specify the year of diagnosis in the SEER*Stat Selection tab.
  • If you are conducting a multiyear analysis and want to restrict the analysis to the registries that met publication criteria for each of those years (for example, a trend analysis), we recommend using the predefined variables USCS1721 (includes diagnosis years 2017–2021), USCS1221 (includes diagnosis years 2012–2021), or USCS0121 (includes diagnosis years 2001–2021).
  • If you would like to analyze a range of years other than those predefined variables, please contact CDC at uscsdata@cdc.gov and we will create a new variable for you that can be imported into SEER*Stat.
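The idea behind the multiyear variables, keeping only registries that met the publication criteria in every year of the range, can be sketched as a set intersection. This is an illustration only: the input format (a dictionary from diagnosis year to the set of registries that met the criteria), the function name, and the registry names are all assumptions, and the data are made up.

```python
def registries_meeting_all_years(met_by_year, first_year, last_year):
    """Return the registries that met the criteria in every year of the range.

    met_by_year: dict mapping diagnosis year -> set of registry names that
    met the publication criteria in that year (hypothetical input format).
    """
    if first_year > last_year:
        raise ValueError("first_year must not be after last_year")
    years = range(first_year, last_year + 1)
    missing = [y for y in years if y not in met_by_year]
    if missing:
        raise KeyError(f"no criteria data for years: {missing}")
    # A registry qualifies only if it appears in the set for every year.
    return set.intersection(*(set(met_by_year[y]) for y in years))


# Made-up data for illustration only.
example = {
    2019: {"Registry A", "Registry B", "Registry C"},
    2020: {"Registry A", "Registry C"},
    2021: {"Registry A", "Registry B", "Registry C"},
}
eligible = registries_meeting_all_years(example, 2019, 2021)
```

In this made-up example, "Registry B" is dropped from a 2019–2021 analysis because it did not meet the criteria in 2020, even though it would qualify for a single-year analysis of 2021.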

Source of standard: North American Association of Central Cancer Registries

Source item name: Derived from Race 1 and IHS Link

Source item number: 160 (Race 1) and 192 (IHS Link)

This variable indicates the derived code for the patient’s race. Race is coded separately from Hispanic ethnicity.

Data quality checks code a non-White race before a White race. This variable is created using the NAACCR variables Race 1 and Indian Health Service (IHS) Link. If Race 1 is White and the IHS Link is positive, the race is set to American Indian/Alaska Native (AI/AN).
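The single reassignment rule stated above can be sketched as follows. This shows only that one rule, not the full race recode; the function name and the plain-text labels ("White", "AI/AN") are assumptions used in place of the actual NAACCR codes.

```python
def derive_race(race1: str, ihs_link_positive: bool) -> str:
    """Apply the one rule described above; otherwise keep Race 1 unchanged.

    race1 uses plain-text labels for readability (an assumption; the
    registry data use coded values).
    """
    if race1 == "White" and ihs_link_positive:
        return "AI/AN"
    return race1
```

Under this rule, a positive IHS link changes the derived race only when Race 1 is White; any other Race 1 value passes through unchanged.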

Considerations for use

States have the option to suppress race-specific and Hispanic-specific data every submission year. While these states can be included in an aggregated analysis, their race and ethnicity information cannot be reported at the state level. The following states have state-level race data presentation restrictions:

  • Data for Hispanic and non-Hispanic American Indian and Alaska Native people cannot be displayed for Illinois, Kansas, New Jersey, and New York.
  • Data for Hispanic Asian and Pacific Islander and Hispanic Black people cannot be displayed for Kansas.

Race is defined by specific physical, hereditary, and cultural traditions or origins, not necessarily by birthplace, place of residence, or citizenship. “Origin” is defined by the U.S. Census Bureau as the heritage, nationality group, lineage, or in some cases, the country of birth of the person or the person’s parents or ancestors before their arrival in the United States. As a standard practice, central cancer registries classify race as coded in the medical record. To address AI/AN misclassification in cancer registry data, registries supported by CDC’s National Program of Cancer Registries (NPCR) and the National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) Program link their central cancer registry data to the Indian Health Service (IHS) administrative records database.

  • SEER registries link their data annually, with the most recent linkage occurring among cases diagnosed from 1998 to 2021. Annually, 32 NPCR registries with at least one Purchase/Referred Care Delivery Area (PRCDA) county in their state link their data. All NPCR registries link every 5 years, with the most recent linkage occurring in 2021.
  • Although the linkage with IHS does not completely resolve the classification of race for AI/AN cases, it helps provide a more comprehensive and accurate picture of the cancer burden in this population.
  • When interpreting data results, be aware that AI/AN populations may be undercounted during years when linkages did not occur in all NPCR registries.
  • If a project is looking specifically at AI/AN populations, analysts may consider using the USCS Data Visualizations AI/AN restricted to PRCDA only module.

In all separate records of tumors for the same patient, the patient has the same race code.

This variable contains “other unspecified” and “unknown” categories. These groups are coded as “unknown race” for the purpose of analyses as specified in the SEER documentation. Population data are not available for the “other race” and “unknown race” categories.

NPCR–Indian Health Service (IHS) linkage schedule

All NPCR-funded registries link with the IHS every 5 years. The most recent linkage year was 2021.

All state central cancer registries with at least one Purchase/Referred Care Delivery Area (PRCDA) county (previously referred to as Contract Health Service Delivery Area [CHSDA] counties) link with the IHS every year. These include:

Source of standard: North American Association of Central Cancer Registries

Source item name: Derived from Race 1, IHS Link, and Origin recode NHIA (Hispanic, Non-Hisp)

Source item numbers: 160 (Race 1), 192 (IHS Link), and 191 (NHIA Derived Hisp Origin)

This variable indicates the derived code for the patient’s race and Hispanic ethnicity. It is obtained by merging the race variable, Race recode (W, B, AIAN, API), with the Hispanic ethnicity variable, Origin recode NHIA (Hispanic, Non-Hisp).

Considerations for use

States have the option to suppress race-specific and Hispanic-specific data every submission year. While these states can be included in an aggregated analysis, their race and ethnicity information cannot be reported at the state level. The following states have state-level race data presentation restrictions:

  • Data for Hispanic and non-Hispanic American Indian and Alaska Native people cannot be displayed for Illinois, Kansas, New Jersey, and New York.
  • Data for Hispanic Asian and Pacific Islander and Hispanic Black people cannot be displayed for Kansas.

Race is defined by specific physical, hereditary, and cultural traditions or origins, not necessarily by birthplace, place of residence, or citizenship. “Origin” is defined by the U.S. Census Bureau as the heritage, nationality group, lineage, or in some cases, the country of birth of the person or the person’s parents or ancestors before their arrival in the United States. As a standard practice, central cancer registries classify race as coded in the medical record. To address AI/AN misclassification in cancer registry data, registries supported by CDC’s National Program of Cancer Registries (NPCR) and the National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) Program link their central cancer registry data to the Indian Health Service (IHS) administrative records database.

  • SEER registries link their data annually, with the most recent linkage occurring among cases diagnosed from 1998 to 2021. Annually, 32 NPCR registries with at least one Purchase/Referred Care Delivery Area (PRCDA) county in their state link their data. All NPCR registries link every 5 years, with the most recent linkage occurring in 2021.
  • Although the linkage with IHS does not completely resolve the classification of race for AI/AN cases, it helps provide a more comprehensive and accurate picture of the cancer burden in this population.
  • When interpreting data results, be aware that AI/AN populations may be undercounted during years when linkages did not occur in all NPCR registries.
  • If a project is looking specifically at AI/AN populations, analysts may consider using the USCS Data Visualization Tools AI/AN restricted to PRCDA only module.

In all separate records of tumors for the same patient, the patient has the same race code.

This variable contains “other unspecified” and “unknown” categories. These groups are coded as “unknown race” for the purpose of analyses as specified in the SEER documentation. Population data are not available for the “other race” and “unknown race” categories.

NPCR–Indian Health Service (IHS) linkage schedule

All NPCR-funded registries link with the IHS every 5 years. The most recent linkage year was 2021.

All state central cancer registries with at least one Purchase/Referred Care Delivery Area (PRCDA) county (previously referred to as Contract Health Service Delivery Area [CHSDA] counties) link with the IHS every year. These include:

Source of standard: CDC’s National Program of Cancer Registries

Source item name: Derived from NAACCR’s Addr at DX–State and U.S. Cancer Statistics publication criteria

Source item number: Derived from NAACCR’s 80

This variable indicates whether a state is funded by CDC’s NPCR or NCI’s SEER Program.

Considerations for use

  • Central cancer registries that received funding from CDC and submitted data during any 2001–2021 diagnosis year are categorized as “NPCR registries.” They include Alabama, Alaska, Arizona, Arkansas, California, Colorado, Delaware, District of Columbia, Florida, Georgia, Idaho, Illinois, Indiana, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming.
  • “SEER registries” are central cancer registries receiving funding only from NCI during the 2001–2021 diagnosis years (Connecticut, Hawaii, Iowa, and New Mexico).

Source of standard: CDC’s National Program of Cancer Registries

Source item name: Derived from NAACCR’s Addr at DX–State and U.S. Census region

Source item number: Derived from NAACCR’s 80

This variable indicates the U.S. Census region in which the patient lived at the time of diagnosis. The NAACCR data item Address at Diagnosis-State is recoded into one of the four U.S. Census regions: Northeast, Midwest, South, and West.
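The recode from state to Census region can be sketched as a lookup table. The groupings below follow the standard four U.S. Census Bureau regions; the use of two-letter postal abbreviations as keys and the names `CENSUS_REGIONS` and `census_region` are assumptions for this example, and the database's own value labels may differ.

```python
# Standard U.S. Census Bureau regions (50 states plus DC).
CENSUS_REGIONS = {
    "Northeast": ["CT", "ME", "MA", "NH", "NJ", "NY", "PA", "RI", "VT"],
    "Midwest": ["IL", "IN", "IA", "KS", "MI", "MN", "MO", "NE", "ND",
                "OH", "SD", "WI"],
    "South": ["AL", "AR", "DE", "DC", "FL", "GA", "KY", "LA", "MD", "MS",
              "NC", "OK", "SC", "TN", "TX", "VA", "WV"],
    "West": ["AK", "AZ", "CA", "CO", "HI", "ID", "MT", "NV", "NM", "OR",
             "UT", "WA", "WY"],
}

# Invert the table so each state abbreviation points to its region.
STATE_TO_REGION = {
    state: region
    for region, states in CENSUS_REGIONS.items()
    for state in states
}


def census_region(state_abbr: str) -> str:
    """Return the Census region for a two-letter state abbreviation."""
    return STATE_TO_REGION[state_abbr.strip().upper()]
```

An abbreviation outside the 50 states and the District of Columbia raises a `KeyError`, which makes unexpected values visible rather than silently assigning them to a region.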

Considerations for use

  • There is a potential for bias in the incidence rates for Census regions, as only data from registries that met U.S. Cancer Statistics publication criteria are included in the database. We recommend presenting age-adjusted incidence rates for U.S. Census regions only if:
    • At least 80% of the population for the Census region was covered by cancer registries that met U.S. Cancer Statistics publication criteria.
    • The 95% confidence intervals around the observed age-adjusted regional incidence rates based on data from eligible registries for each of six major cancer sites (prostate, female breast, male colorectal, female colorectal, male lung and bronchus, and female lung and bronchus) included the estimate of the regional rate calculated.
  • If any state in a region has a case count of fewer than 16, then the case counts for U.S. Census regions cannot be presented. See the Census Geographic Areas Reference Manual, Chapter 6: Statistical Groupings of States and Counties for a list of states in each region.
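Two of the presentation rules above are simple numeric thresholds and can be sketched as checks. This is an illustration under stated assumptions: the function names and input formats are hypothetical, and the confidence-interval criterion for the six major cancer sites is not implemented here.

```python
MIN_STATE_COUNT = 16   # a region's counts are withheld if any state has fewer
MIN_COVERAGE = 0.80    # minimum share of the region's population covered


def region_counts_presentable(state_counts):
    """True if every state in the region has a case count of at least 16.

    state_counts: dict mapping state -> case count (hypothetical format).
    """
    return all(count >= MIN_STATE_COUNT for count in state_counts.values())


def region_coverage_ok(covered_population, total_population):
    """True if eligible registries cover at least 80% of the region's population."""
    if total_population <= 0:
        raise ValueError("total_population must be positive")
    return covered_population / total_population >= MIN_COVERAGE
```

For example, a region in which one state has 15 cases fails the count check even if every other state has thousands of cases.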

Source of standard: CDC’s National Program of Cancer Registries

Source item name: Not applicable

Source item number: Not applicable

This variable indicates whether a central cancer registry met the U.S. Cancer Statistics publication criteria for all cancer sites combined each year in 2001–2021.

Considerations for use

  • This variable is used for analysis of combined 2001–2021 data in the NPCR and SEER Incidence – U.S. Cancer Statistics 2001–2021 database. Data from registries that did not meet the U.S. Cancer Statistics publication criteria in any given year were excluded from the database for that year.
  • If you are conducting a multiyear analysis and want to restrict the analysis to the registries that met publication criteria for each of those years (for example, a trend analysis), use the predefined variables USCS1721 (includes diagnosis years 2017–2021), USCS1221 (includes diagnosis years 2012–2021), or this variable (includes diagnosis years 2001–2021).
  • If you would like to analyze a range of years other than those predefined variables, please contact CDC at uscsdata@cdc.gov and we will create a new variable for you that can be imported into SEER*Stat.

Source of standard: CDC’s National Program of Cancer Registries

Source item name: Derived from NAACCR’s Addr at DX–State and U.S. Cancer Statistics publication criteria

Source item number: Derived from NAACCR’s 80

This variable indicates whether a central cancer registry met the U.S. Cancer Statistics publication criteria for all cancer sites combined each year in 2012–2021. When using this variable, restrict the diagnosis years to 2012–2021. This is done in SEER*Stat on the Selection tab using the year of diagnosis variable.

Considerations for use

  • This variable is used for analysis of combined 2012–2021 data in the NPCR and SEER Incidence – U.S. Cancer Statistics 2001–2021 database. Data from registries that did not meet the U.S. Cancer Statistics publication criteria in any given year were excluded from the database for that year.
  • If you are conducting a multiyear analysis and want to restrict the analysis to the registries that met publication criteria for each of those years (for example, a trend analysis), use the predefined variables USCS1721 (includes diagnosis years 2017–2021), this variable (includes diagnosis years 2012–2021), or USCS0121 (includes diagnosis years 2001–2021).
  • If you would like to analyze a range of years other than those predefined variables, please contact CDC at uscsdata@cdc.gov and we will create a new variable for you that can be imported into SEER*Stat.

Source of standard: CDC’s National Program of Cancer Registries

Source item name: Derived from NAACCR’s Addr at DX–State and U.S. Cancer Statistics publication criteria

Source item number: Derived from NAACCR’s 80

This variable indicates whether a central cancer registry met the U.S. Cancer Statistics publication criteria for all cancer sites combined each year in 2017–2021. When using this variable, restrict the diagnosis years to 2017–2021. This is done in SEER*Stat on the Selection tab using the year of diagnosis variable.

Considerations for use

  • This variable is used for analysis of combined 2017–2021 data in the NPCR and SEER Incidence – U.S. Cancer Statistics 2001–2021 database. Data from registries that did not meet the U.S. Cancer Statistics publication criteria in any given year were excluded from the database for that year.
  • If you are conducting a multiyear analysis and want to restrict the analysis to the registries that met publication criteria for each of those years (for example, a trend analysis), use this predefined variable (includes diagnosis years 2017–2021), USCS1221 (includes diagnosis years 2012–2021), or USCS0121 (includes diagnosis years 2001–2021).
  • If you would like to analyze a range of years other than those predefined variables, please contact CDC at uscsdata@cdc.gov and we will create a new variable for you that can be imported into SEER*Stat.

Source of standard: North American Association of Central Cancer Registries

Source item name: NHIA Derived Hisp Origin

Source item number: 191

This variable was derived from the NAACCR standard variables Spanish/Hispanic Origin [190], Name-Last [2230], Name-Maiden [2390], Birthplace [250], Race 1 [160], IHS Link [192], and Sex [220].

The NAACCR Hispanic Identification Algorithm (NHIA) uses the combination of these variables to classify cases directly or indirectly as Hispanic for analytic purposes.

Considerations for use

States have the option to suppress race-specific and Hispanic-specific data every submission year. While these states can be included in an aggregated analysis, the race and ethnicity information cannot be reported at the state level. The following states have state-level race or ethnicity data presentation restrictions:

  • Data for Hispanic and non-Hispanic American Indian and Alaska Native people cannot be displayed for Illinois, Kansas, New Jersey, and New York.
  • Data for Hispanic and non-Hispanic Asian and Pacific Islander people cannot be displayed for Kansas.

Blank values are allowed for states that chose not to include data for NHIA in this file.

More information

NAACCR Race and Ethnicity Work Group. NAACCR Guideline for Enhancing Hispanic/Latino Identification: Revised NAACCR Hispanic/Latino Identification Algorithm [NHIA v2.2.1]. Springfield (IL): North American Association of Central Cancer Registries. September 2011.

Site and morphology

Source of standard: North American Association of Central Cancer Registries

Source item name: Derived from Primary Site

Source item number: 400

This variable indicates the topography code from The International Classification of Diseases for Oncology, Third Edition (ICD-O-3) for the primary site of the tumor being reported.

Considerations for use

Beginning in diagnosis year 2010, there were updates to some of the lymphoma and leukemia codes. To include these updates, the appropriate primary site variables to use are Site recode ICD-O-3/WHO 2008 for all ages, and ICCC site recode ICD-O-3/WHO 2008 for the childhood cancer recodes.

Source of standard: North American Association of Central Cancer Registries

Source item name: Histologic Type ICD-O-3

Source item number: 522

This variable indicates the morphology code from The International Classification of Diseases for Oncology, Third Edition (ICD-O-3) that describes the histologic type (the microscopic composition of cells and tissue for a specific primary tumor) of the primary tumor being reported.

Considerations for use

  • This data item is required for cancer cases diagnosed on or after January 1, 2001.
  • The histology codes for some tumors may be based on clinical diagnoses, not pathologic confirmation. When analyzing a specific histology, we suggest using the Diagnostic confirmation variable in conjunction with this variable. Beginning with 2010 diagnoses, this item also includes histology codes as specified in the WHO Classification of Tumours of Haematopoietic and Lymphoid Tissues (2008), which are listed on pages 4–6 of the NAACCR 2010 Implementation Guidelines.

Source of standard: North American Association of Central Cancer Registries

Source item name: Behavior code ICD-O-3

Source item number: 523

This variable is the code for the behavior of the tumor being reported using ICD-O-3. NAACCR adopted ICD-O-3 as the standard coding system for tumors diagnosed beginning January 1, 2001.

Considerations for use

  • This database includes cases with invasive (malignant) and in situ behavior, as well as benign and borderline behavior for brain and central nervous system cases diagnosed in 2004 and later.
  • In SEER*Stat’s Selection Tab, the “Malignant Behavior” check box corresponds with Behavior Code ICD-O-3=3 (malignant). This is the default selection in this database. This restriction is used by CDC’s NPCR and NCI’s SEER Program for generating most official cancer statistics. To analyze benign, borderline, or in situ cases, uncheck the “Malignant Behavior” box.
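Outside SEER*Stat, the effect of the “Malignant Behavior” check box can be approximated with a filter on the behavior code. This is a minimal sketch: the record layout (dictionaries with a `"behavior"` key holding a one-character code) and the function name are assumptions for this example.

```python
MALIGNANT = "3"  # ICD-O-3 behavior code for malignant tumors


def select_by_behavior(records, malignant_only=True):
    """Mimic the check box: keep only behavior code 3 unless it is unchecked.

    records: iterable of dicts with a "behavior" key (hypothetical layout).
    """
    if not malignant_only:
        return list(records)
    return [r for r in records if r["behavior"] == MALIGNANT]


# Made-up records for illustration only.
cases = [
    {"id": 1, "behavior": "3"},
    {"id": 2, "behavior": "2"},
    {"id": 3, "behavior": "0"},
]
malignant_cases = select_by_behavior(cases)
all_cases = select_by_behavior(cases, malignant_only=False)
```

Passing `malignant_only=False` corresponds to unchecking the box, so benign, borderline, and in situ records remain in the selection.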

More information

ICD-O-3 Coding Materials

Source of standard: North American Association of Central Cancer Registries

Source item name: Grade

Source item number: 440

This variable indicates the grade or degree of differentiation of the primary tumor being reported. The categories are well differentiated (Grade I); moderately differentiated (Grade II); poorly differentiated (Grade III); and undifferentiated or anaplastic (Grade IV).

For lymphomas and leukemias, this field also is used to indicate T-, B-, Null-, or NK-cell origin.

Considerations for use

  • Data are available for cases diagnosed in 2001 through 2017. This data item was not collected for cases diagnosed in 2018 or later.
  • The practice of grading varies greatly among pathologists throughout the world, and many malignant tumors are not graded routinely. Since different grading systems may be used, review the SEER site-specific modules and the FORDS manual that corresponds with the diagnosis year. Each module has an abstracting, coding, and staging section, which has a morphology and grading subsection. Some modules, but not all, contain notes about the grading system that may have been used. Currently, there is no variable to differentiate a specific grading system from another one if more than two grading systems are mentioned.
  • Diagnostic practices also influence coding practices. For example, preliminary analysis of tumor grade for prostate cancer showed an increase in the proportion of higher grades from 2002 to 2003. Additional review showed this increase to be artificial, as the International Society of Urological Pathology, in conjunction with the World Health Organization (WHO), had made a series of recommendations for modification of the Gleason grading system to reflect contemporary knowledge, alleviate uncertainty, and promote uniformity in its application. One recommendation was for pathologists to report all higher tertiary grade components of the tumor as part of the Gleason score. Another recommendation was for reporting of any higher-grade cancer, no matter how small quantitatively.
  • The percentage of cases with a known grade varies by primary cancer site. Rules for coding the tumor grade differ for some primary sites. As a result, it may be appropriate to have a tumor grade coded as “9 – unknown.”
  • For brain tumor cases diagnosed from 2011 to 2017, cancer registries were required to report the WHO grade classification. Please see the variable description CS site-specific factor 1 for more information on this brain-specific grade classification.

Restricted to diagnosis year 2018 or later

Source of standard: North American Association of Central Cancer Registries

Source item name: Grade Clinical

Source item number: 3843

This data item records the grade of a solid primary tumor before any treatment, including surgical resection or neoadjuvant therapy.

For cases diagnosed on January 1, 2018, or later, this data item, along with Grade Pathological, replaces the data item Grade as well as site-specific factors for cancer sites with alternative grading systems such as breast (Bloom-Richardson) and prostate (Gleason).

Considerations for use

  • Data are available for cases diagnosed in 2018 and later. This data item was not collected for cases diagnosed in 2001 to 2017.
  • For cases that are eligible for American Joint Committee on Cancer (AJCC) staging, the recommended grading system is specified in the AJCC Cancer Staging Manual chapter. The AJCC chapter-specific grading systems (codes 1–5) take priority over the generic grade definitions (codes A–E, L, H, and 9). For cases that are not eligible for AJCC staging, if the recommended grading system is not documented, the generic grade definitions apply.
  • Refer to the Site-Specific Data Item (SSDI) Manual and Grade Manual that corresponds with the year when the cases were diagnosed for additional site-specific instructions.

Restricted to diagnosis year 2018 or later

Source of standard: North American Association of Central Cancer Registries

Source item name: Grade Pathological

Source item number: 3844

This data item records the grade of a solid primary tumor that has been resected and for which no neoadjuvant therapy was administered. The highest grade documented from any microscopic specimen of the primary site, whether from the clinical workup or the surgical resection, will be recorded.

For cases diagnosed on January 1, 2018, or later, this data item, along with Grade Clinical, replaces the data item Grade as well as site-specific factors for cancer sites with alternative grading systems such as breast (Bloom-Richardson) and prostate (Gleason).

Considerations for use

  • Data are available for cases diagnosed in 2018 and later. This data item was not collected for cases diagnosed in 2001 to 2017.
  • For cases that are eligible for American Joint Committee on Cancer (AJCC) staging, the recommended grading system is specified in the AJCC Cancer Staging Manual chapter. The AJCC chapter-specific grading systems (codes 1–5) take priority over the generic grade definitions (codes A–E, L, H, and 9). For cases that are not eligible for AJCC staging, if the recommended grading system is not documented, the generic grade definitions apply.
  • Refer to the Site-Specific Data Item (SSDI) Manual and Grade Manual that corresponds with the year when the cases were diagnosed for additional site-specific instructions.
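The year restrictions on the three grade items described above (Grade for 2001–2017, Grade Clinical and Grade Pathological for 2018 and later) can be sketched as a small helper for choosing which items to request. The function name is hypothetical, and the returned strings are the item names used in this document rather than SEER*Stat variable names.

```python
def grade_items_for_year(diagnosis_year: int) -> list:
    """Return the grade data items collected for a given diagnosis year."""
    if diagnosis_year < 2001:
        raise ValueError("this database starts with diagnosis year 2001")
    if diagnosis_year <= 2017:
        return ["Grade"]
    # 2018 and later: the single Grade item is replaced by two items.
    return ["Grade Clinical", "Grade Pathological"]
```

An analysis that spans 2017 and 2018 therefore needs both the older item and the two newer items, and the values are not directly comparable across that boundary without further recoding.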

Source of standard: North American Association of Central Cancer Registries

Source item name: Diagnostic confirmation

Source item number: 490

This variable records the best method of diagnostic confirmation of the cancer being reported at any time in the patient’s history. The rules for coding differ between solid tumors and hematopoietic and lymphoid neoplasms.

Considerations for use

  • For analyses that include histology, use the following selection statement in the SEER*Stat Selection tab: “Diagnostic confirmation is = to Microscopically confirmed”.
  • Diagnostic confirmation is useful to calculate rates based on microscopically confirmed cancers. Complete incidence calculations also include cases that are only confirmed clinically. The percentage of cases that are “clinically diagnosed only” is an indication of whether case finding includes sources outside of pathology reports.
  • The microscopically confirmed method has the highest priority for diagnostic confirmation. The remaining values were assigned when the presence of cancer was confirmed with multiple diagnostic methods.
  • “Positive histology AND immunophenotyping AND/OR positive genetic studies” (used only for hematopoietic and lymphoid neoplasms M-9590/3-9992/3) was adopted for use beginning with 2010 diagnoses.

Source of standard: SEER*Stat recode

Source item name: ICD-O-3 Hist/behavior, labeled

Source item number: Not applicable

This variable indicates each ICD-O-3 histology code and behavior code and the respective name of that histology and behavior.

Considerations for use

  • This variable is a five-digit ICD-O-3 morphology code. The first four digits indicate the histology (cell type), and the fifth digit is the behavior code.
  • Please note that the ICD-O-3 morphology codes have been grouped by major morphology headings as found in the International Classification of Diseases for Oncology, Third Edition in the table shown below. However, the morphology codes are not grouped in the database.
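The structure of the five-digit morphology code described above can be sketched as a simple split. The function name is hypothetical; accepting an optional slash (as in the written form 9590/3) is a convenience assumed for this example.

```python
def split_morphology(code: str):
    """Split a five-digit ICD-O-3 morphology code into (histology, behavior).

    Accepts either "81403" or the slash form "8140/3".
    """
    digits = code.strip().replace("/", "")
    if len(digits) != 5 or not digits.isdigit():
        raise ValueError(f"expected five digits, got {code!r}")
    # First four digits: histology (cell type). Fifth digit: behavior.
    return digits[:4], digits[4]


histology, behavior = split_morphology("9590/3")
```

Here `histology` is `"9590"` and `behavior` is `"3"`; both are kept as strings so that the codes are not altered by numeric conversion.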

Source of standard: SEER*Stat recode

Source item name: ICD-O-3 Hist/behavior, labeled

Source item number: Not applicable

This variable identifies the side of a paired organ, or the side of the body on which the reportable tumor originated. This applies to the primary site only.

Source of standard: North American Association of Central Cancer Registries

Source item name: Derived from Primary Site and Histologic Code ICD-O-3

Source item numbers: 400 (Primary Site) and 522 (Histologic Code ICD-O-3)

The variable is defined by the SEER Program. Its values are based on NAACCR variables for ICD-O-3, the primary site and histology code of the primary tumor being reported, with updated information for hematopoietic codes based on WHO Classification of Tumours of Haematopoietic and Lymphoid Tissues (2008). The site recode variables define the major cancer sites commonly used in reporting cancer incidence data.

Considerations for use

This is the recommended variable for analyses by primary cancer site.

More information

Site recode

Source of standard: North American Association of Central Cancer Registries

Source item name: Derived from NAACCR Schema ID

Source item number: NAACCR 3800

Schema ID links site-specific data items (SSDIs) with the appropriate primary site/histology. The values for this data item are derived based on primary site, histology, and schema discriminator fields (when required).

Considerations for use

  • Schema ID is available for cases diagnosed in 2018 or later, and for specific primary site and histology groupings.
  • When analyzing SSDIs (such as Merged estrogen receptor, Merged progesterone receptor, and Merged HER2 summary variables), consider using this data item in the Selection tab to restrict the cases to the appropriate primary sites and histology. For cases diagnosed prior to 2018, use the primary site and histology combination defined by Schema ID to restrict the cases for a comparable analysis.

Source of standard: NCI’s Surveillance, Epidemiology, and End Results (SEER) Program

Source item name: Derived from NAACCR Primary site, Histologic code ICD-O-3, and Behavior code ICD-O-3

Source item numbers: NAACCR 400 (Primary site), 522 (Histologic code ICD-O-3), and 523 (Behavior code ICD-O-3)

This variable indicates the classification of childhood cancer, which is based on tumor morphology and primary site. Unlike the classification used for adult cancers, which emphasizes primary site, it emphasizes morphology.

Considerations for use

NCI's SEER Program defined this variable.

Source of standard: NCI’s Surveillance, Epidemiology, and End Results Program

Source item name: Derived from NAACCR Primary site, Histologic code ICD-O-3, and Behavior code ICD-O-3

Source item numbers: NAACCR 400 (Primary site), 522 (Histologic code ICD-O-3), and 523 (Behavior code ICD-O-3)

This variable indicates the classification of childhood cancer, which is based on tumor morphology and primary site. Unlike the classification used for adult cancers, which emphasizes primary site, it emphasizes morphology. This variable contains extended classification codes of childhood cancer, following the definitions in the International Classification of Childhood Cancer, Third Edition, based on ICD-O-3/IARC 2017.

Source of standard: NCI’s Surveillance, Epidemiology, and End Results Program

Source item name: Derived from NAACCR Primary site, Histologic code ICD-O-3, and Behavior code ICD-O-3

Source item numbers: NAACCR 400 (Primary site), 522 (Histologic code ICD-O-3), and 523 (Behavior code ICD-O-3)

This variable was developed to define the major cancer sites that affect adolescents and young adults (AYAs) between 15 and 39 years of age.

Considerations for use

The SEER Program defined this recode variable based on the classification scheme proposed by RD Barr and colleagues. Refer to the AYA Site Recode for the full list of 318 groups and additional information.

Source of standard: NCI’s Surveillance, Epidemiology, and End Results (SEER) Program

Source item name: Derived from NAACCR Primary site, Histologic code ICD-O-3, and Behavior code ICD-O-3

Source item numbers: NAACCR 400 (Primary site), 522 (Histologic code ICD-O-3), and 523 (Behavior code ICD-O-3)

This variable is based on ICD-O-3, with hematopoietic codes updated according to the WHO Classification of Tumours of Haematopoietic and Lymphoid Tissues (2008). It was designed to facilitate epidemiologic studies of lymphoma subtypes.

Considerations for use

Stage – local, regional, distant (LRD) [summary and historic]

Source of standard: CDC’s National Program of Cancer Registries

Source item name: Combined from Derived SS2000, SEER Summary Stage 2000, and Summary Stage 2018

Source item numbers: Derived from NAACCR 3020 (Derived SS2000), 759 (SEER Summary Stage 2000), and 764 (Summary Stage 2018)

This is a merged stage variable created using the variables SEER Summary Stage 2000, Derived SS2000, and Summary Stage 2018. This stage variable can be used for diagnosis years 2001–2021.

Considerations for use

This variable is not available for testis cases.

The coding logic for this merged variable is as follows:

For NPCR registries

  • If a case was diagnosed in 2001, 2002, 2003, 2016, or 2017, stage at diagnosis is recorded using the SEER Summary Stage 2000 variable value.
  • If a case was diagnosed from 2004 through 2015, stage at diagnosis is recorded using the Derived SEER Summary Stage 2000 variable value. If the Derived SEER Summary Stage 2000 variable is blank or unstaged, and the SEER Summary Stage 2000 variable has a valid value, that value is used to populate the merged variable.
  • If a case was diagnosed in 2018 or later, stage at diagnosis is recorded using the Summary Stage 2018 variable value.

For SEER-only registries (Connecticut, Hawaii, Iowa, and New Mexico)

  • If a case was diagnosed in 2001, 2002, or 2003, stage at diagnosis is recorded using the SEER Summary Stage 2000 variable value.
  • If a case was diagnosed from 2004 through 2017, stage at diagnosis is recorded using the Derived SEER Summary Stage 2000 variable value.
  • If a case was diagnosed in 2018 or later, stage at diagnosis is recorded using the Derived Summary Stage 2018 variable value.
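
The coding logic above can be sketched as a small function. This is a minimal illustration rather than the production logic used to build the database: the function name, parameter names, and stage labels are hypothetical, and a "valid value" is assumed to mean any non-blank value.

```python
from typing import Optional

# Registries listed in the text as SEER-only.
SEER_ONLY_REGISTRIES = {"Connecticut", "Hawaii", "Iowa", "New Mexico"}


def _is_blank(value: Optional[str]) -> bool:
    """True when a stage value is missing or empty."""
    return value is None or value.strip() == ""


def merged_stage(
    registry: str,
    dx_year: int,
    ss2000: Optional[str],
    derived_ss2000: Optional[str],
    stage_2018: Optional[str],
) -> Optional[str]:
    """Pick the source stage value for one case.

    stage_2018 stands for Summary Stage 2018 (NPCR registries) or
    Derived Summary Stage 2018 (SEER-only registries).
    "Unstaged" is a hypothetical label for an unstaged value.
    """
    if dx_year < 2001:
        raise ValueError("the merged variable starts with diagnosis year 2001")
    if dx_year >= 2018:
        return stage_2018
    if dx_year <= 2003:
        return ss2000
    if registry in SEER_ONLY_REGISTRIES:
        # SEER-only registries: 2004 through 2017 use the derived value.
        return derived_ss2000
    if dx_year <= 2015:
        # NPCR registries, 2004 through 2015: derived value, with a
        # fallback when it is blank or unstaged (assumption: "valid"
        # means non-blank).
        unusable = _is_blank(derived_ss2000) or derived_ss2000 == "Unstaged"
        if unusable and not _is_blank(ss2000):
            return ss2000
        return derived_ss2000
    # NPCR registries, 2016 and 2017.
    return ss2000
```

For example, under these assumptions an NPCR-registry case diagnosed in 2010 with an unstaged derived value and a directly coded value of "Regional" would be assigned "Regional".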

Therapy

Source of standard: NCI’s Surveillance, Epidemiology, and End Results (SEER) Program and Commission on Cancer

Source item name: RX Summ—Surg Prim Site

Source item number: NAACCR 1290

This variable records the site-specific code for the type of surgery to the primary site performed as part of the first course of treatment, including treatment given at any facility.

Considerations for use

Data for this variable are available for female breast cancer cases starting with diagnosis year 2010.

Extent of disease

Source of standard: American Joint Committee on Cancer

Source item name: CS Site-Specific Factor 1

Source item number: NAACCR 2880

The information recorded in this variable differs for each anatomic site. There are site-specific codes and coding structures for each anatomic site. In the U.S. Cancer Statistics Public Use Database, this variable records the World Health Organization (WHO) Grade Classification for brain and other nervous system sites.

Considerations for use

Data for this variable are available for brain and other nervous system cases diagnosed from 2011 through 2017.

Collection of Collaborative Stage (CS) Site-Specific Factor data items stopped in 2017. The information is now collected in the Grade Pathological and Grade Clinical variables for 2018 or later cases.

For the site-specific codes, please refer to the Collaborative Stage Data Collection System, brain cancer: World Health Organization (WHO) Grade Classification.

Source of standard: CDC’s National Program of Cancer Registries

Source item name: Combined from CS Site-Specific Factor 1 (breast) and Estrogen Receptor Summary

Source item number: Derived from NAACCR 2880 (CS Site-Specific Factor 1) and 3827 (Estrogen Receptor Summary from Site-Specific Data Item)

This is a merged variable created using the variables CS Site-Specific Factor 1 (breast) and Estrogen Receptor Summary and is the summary of results of the estrogen receptor (ER) assay.

Considerations for use

  • Data for this variable are available for female breast cancer cases diagnosed in 2004 or later.
  • When using this data item, restrict the query to the appropriate cases using the Schema ID variable for cases diagnosed in 2018 onwards. For cases diagnosed prior to 2018, use the primary site and histology combination defined by Schema ID to restrict the cases for a comparable analysis.

Source of standard: CDC’s National Program of Cancer Registries

Source item name: Combined from CS Site-Specific Factor 2 (breast) and Progesterone Receptor Summary

Source item number: Derived from NAACCR 2890 (CS Site-Specific Factor 2) and 3915 (Progesterone Receptor Summary from Site-Specific Data Item)

This is a merged variable created using the variables CS Site-Specific Factor 2 (breast) and Progesterone Receptor Summary and is the summary of results of the progesterone receptor (PR) assay.

Considerations for use

  • Data for this variable are available for female breast cancer cases diagnosed in 2010 or later.
  • When using this data item, restrict the query to the appropriate cases using the Schema ID variable for cases diagnosed in 2018 onwards. For cases diagnosed prior to 2018, use the primary site and histology combination defined by Schema ID to restrict the cases for a comparable analysis.

Restricted to female breast and diagnosis years 2010 or later

Source of standard: CDC’s National Program of Cancer Registries

Source item name: Combined from CS Site-Specific Factor 15 (breast) and HER2 Overall Summary

Source item number: Derived from NAACCR 2869 (CS Site-Specific Factor 15) and 3855 (HER2 Overall Summary from Site-Specific Data Item for breast)

This is a merged variable created using the variables CS Site-Specific Factor 15 (breast) and HER2 Overall Summary and is the summary of results from HER2 testing.

Considerations for use

  • Data for this variable are available for female breast cancer cases diagnosed in 2010 or later.
  • When using this data item, restrict the query to the appropriate cases using the Schema ID variable for cases diagnosed in 2018 onwards. For cases diagnosed prior to 2018, use the primary site and histology combination defined by Schema ID to restrict the cases for a comparable analysis.

Multiple primary fields

Source of standard: North American Association of Central Cancer Registries

Source item name: Sequence Number – Central

Source item number: 380

This variable indicates the sequence of all reportable neoplasms over the patient’s lifetime.

Considerations for use

  • The sequence number may change over the patient’s lifetime. If the patient was diagnosed with a single reportable neoplasm, and later diagnosed with a second reportable neoplasm, the sequence code for the first neoplasm changes from 00 to 01. A central registry may find that a patient with one or more known neoplasms had an earlier reportable neoplasm that had been unknown to the registry. Typically, a re-evaluation of all related sequence numbers is required whenever an additional neoplasm is identified.
  • Standards define which neoplasms are reportable. It is assumed that these standards are the minimum definition of reportability. Individual central cancer registries may define additional neoplasms as reportable. The way sequence numbers are assigned may vary over time and among registries, which may affect the coding of this variable.
  • Because the time period of Sequence Number is a person’s lifetime, reportable neoplasms not included in the central registry (those that occur outside the registry catchment area or before the reference date) also are allotted a sequence number. For example, a registry may contain a single record for a patient with a sequence number of 02 because the first reportable neoplasm preceded the central registry’s reference date.
  • If two or more reportable neoplasms are diagnosed at the same time, the lowest sequence number is assigned to the diagnosis with the worst prognosis. If no difference in prognosis is evident, the decision is arbitrary.
  • Reportable non-malignant tumors diagnosed on or after January 1, 2004 are represented by sequence numbers labeled as “…state registry-defined neoplasm”. Timing rules for sequencing these neoplasms are the same as timing rules for sequencing required in situ or invasive neoplasms.
  • The 2007 Multiple Primary and Histology Coding Rules may also affect the sequence number.
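
The re-sequencing behavior described above can be illustrated with a short sketch. It is a simplification under stated assumptions: the `Neoplasm` class, the `prognosis_rank` field (lower means worse prognosis), and the `prior_unregistered` parameter are hypothetical names, and the sketch ignores the separate numbering of state registry-defined neoplasms.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Neoplasm:
    label: str            # hypothetical identifier for the tumor record
    dx_date: date         # date of diagnosis
    prognosis_rank: int   # hypothetical: lower rank = worse prognosis


def assign_sequence_numbers(
    neoplasms: List[Neoplasm], prior_unregistered: int = 0
) -> Dict[str, str]:
    """Assign lifetime sequence numbers to the records a registry holds.

    prior_unregistered counts earlier reportable neoplasms that are not
    in the registry (for example, diagnosed before the reference date).
    """
    if not neoplasms:
        return {}
    # Earlier diagnoses first; ties go to the worse prognosis.
    ordered = sorted(neoplasms, key=lambda n: (n.dx_date, n.prognosis_rank))
    total = len(ordered) + prior_unregistered
    if total == 1:
        # A single lifetime neoplasm is coded 00.
        return {ordered[0].label: "00"}
    # Two or more lifetime neoplasms are numbered 01, 02, ...
    first = prior_unregistered + 1
    return {n.label: f"{i:02d}" for i, n in enumerate(ordered, start=first)}
```

Calling the function again after a second neoplasm is recorded shows the first record moving from 00 to 01, which is why all related sequence numbers are re-evaluated whenever a new neoplasm is identified.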

Dates

Source of standard: NCI’s Surveillance, Epidemiology, and End Results (SEER) Program and Commission on Cancer

Source item name: Date of Birth

Source item number: 240

The patient’s year of birth.

Considerations for use

  • The month and day of birth are not provided for confidentiality reasons.
  • If age at diagnosis and year of diagnosis are known, but year of birth is unknown, the year of birth is calculated and so coded. Only the year is entered. Per the NAACCR Data Dictionary, registrars are instructed to estimate a date of birth rather than leave the birth date unknown.
  • This variable includes only count data. Rates cannot be calculated using this variable as no population data are associated with it.

Source of standard: North American Association of Central Cancer Registries

Source item name: Derived from Date of Diagnosis

Source item number: 390

This variable is derived from Date of initial diagnosis, which is the date of initial diagnosis by a recognized medical practitioner for the cancer being reported, whether clinically or microscopically confirmed.

Considerations for use

  • The day of diagnosis is not provided as an additional confidentiality measure.
  • This variable includes only count data. Rates cannot be calculated using this variable as no population data are associated with it.

User-specified

Source of standard: North American Association of Central Cancer Registries

Source item name: Derived from RuralUrban Continuum 2013

Source item number: 3312

The U.S. Department of Agriculture Economic Research Service’s 2013 Rural-Urban Continuum Codes form a classification scheme that distinguishes metropolitan counties by the population size of their metro area, and nonmetropolitan counties by degree of urbanization and adjacency to a metro area.

In this database, this variable groups the 2013 Rural-Urban Continuum Codes (also referred to as the Beale Codes) into 3 categories: metropolitan counties (rural-urban continuum codes 1–3), nonmetropolitan counties (rural-urban continuum codes 4–9), and unavailable (blank or unknown).
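
The three-way grouping can be expressed as a short lookup. This is only a sketch of the mapping described above: the function name and category labels are hypothetical, and any value outside 1–9 is assumed to count as unavailable.

```python
from typing import Optional, Union


def rucc_group(code: Optional[Union[int, str]]) -> str:
    """Collapse a 2013 Rural-Urban Continuum Code into three groups."""
    if code is None:
        return "Unavailable"
    text = str(code).strip()
    if text.isdigit():
        value = int(text)  # accepts zero-padded input such as "03"
        if 1 <= value <= 3:
            return "Metropolitan"
        if 4 <= value <= 9:
            return "Nonmetropolitan"
    # Blank, unknown, or out-of-range codes.
    return "Unavailable"
```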

Categorizing counties by population size helps researchers investigate geographic correlates of the burden of cancer in the area of interest.

Considerations for use

These codes are derived electronically by the central cancer registry using patients’ county at diagnosis.

More information

U.S. Department of Agriculture Economic Research Service’s 2013 Rural-Urban Continuum Codes

Merged system-supplied

Source of standard: CDC’s National Program of Cancer Registries

Source item name: Derived from NAACCR’s Primary Site, Histologic Type ICD-O-3, and Sex

Source item numbers: Derived from NAACCR’s 400 (Primary Site), 522 (Histologic Type ICD-O-3), and 220 (Sex)

This is a predefined variable created using ICD-O-3 site, histology, and sex to define alcohol-related cancers.[1, 2]

Considerations for use

  • Cancer registries do not routinely collect data on alcohol use, so the number of cancers associated with this risk factor cannot be determined definitively.[3, 4, 5]
  • However, other sources of information can be used to obtain the proportion of cancers probably caused by the risk factor, known as the attributable fraction.[6] The number of attributable cancers can be estimated by multiplying the attributable fraction by the number of associated cancers.
  • For more information, please see the referenced publications and Predefined SEER*Stat Variables for Calculating the Number of Associated Cancers for Selected Risk Factors documentation.
  • Please note that in official federal cancer statistics publications, CDC reports only malignant (invasive) cancers, except for bladder cancer, for which both in situ and invasive cancers are reported.
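
The attributable-cancer estimate described in the second consideration above is a single multiplication, sketched below. The function name is hypothetical, and the figures in the usage note are invented for illustration only; they are not estimates from this database or from the cited publications.

```python
def attributable_cases(associated_cases: int, attributable_fraction: float) -> float:
    """Estimate cancers attributable to a risk factor.

    associated_cases: count of cancers associated with the risk factor
    attributable_fraction: proportion (0 to 1) taken from another source
    """
    if associated_cases < 0:
        raise ValueError("associated_cases must be non-negative")
    if not 0.0 <= attributable_fraction <= 1.0:
        raise ValueError("attributable_fraction must be between 0 and 1")
    return associated_cases * attributable_fraction
```

With a made-up count of 10,000 associated cancers and a made-up attributable fraction of 0.25, `attributable_cases(10_000, 0.25)` returns 2500.0.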

References

1. International Agency for Research on Cancer. IARC monographs on the evaluation of carcinogenic risks to humans: Volume 96: Alcohol Consumption and Ethyl Carbamate. Lyon, France: International Agency for Research on Cancer; 2010.

2. International Agency for Research on Cancer. IARC monographs on the evaluation of carcinogenic risks to humans: Volume 100E: Personal Habits and Indoor Combustions: Consumption of Alcoholic Beverages. Lyon, France: International Agency for Research on Cancer; 2012.

3. Henley SJ, Singh SD, King J, Wilson RJ, O’Neil ME, Ryerson AB. Invasive cancer incidence and survival—United States, 2013. MMWR Morb Mortal Wkly Rep. 2017;66:69–75.

4. Cogliano VJ, Baan R, Straif K, et al. Preventable exposures associated with human cancers. J Natl Cancer Inst. 2011;103(24):1827–1839.

5. World Cancer Research Fund / American Institute for Cancer Research. Food, Nutrition, Physical Activity, and the Prevention of Cancer: a Global Perspective. Washington, DC: AICR, 2007.

6. Levine B. What does the population attributable fraction mean? Prev Chronic Dis. 2007;4(1):A14.

Source of standard: CDC’s National Program of Cancer Registries

Source item name: Derived from NAACCR’s Primary Site, Histologic Type ICD-O-3, Sex, and Diagnostic Confirmation

Source item numbers: Derived from NAACCR’s 400 (Primary Site), 522 (Histologic Type ICD-O-3), 220 (Sex), and 490 (Diagnostic Confirmation)

This is a predefined variable created using ICD-O-3 site, histology, sex, and diagnostic confirmation to define human papillomavirus (HPV)-related cancers.[1, 2, 3, 4]

Considerations for use

  • Cancer registries do not routinely collect data on HPV diagnoses, so the number of cancers associated with this risk factor cannot be determined definitively.[5, 6, 7]
  • However, other sources of information can be used to obtain the proportion of cancers probably caused by the risk factor, known as the attributable fraction.[8] The number of attributable cancers can be estimated by multiplying the attributable fraction by the number of associated cancers.
  • For more information, please see the referenced publications and Predefined Variables for Risk Factor-Associated Cancers documentation.
  • Please note that in official federal cancer statistics publications, CDC reports only malignant (invasive) cancers, except for bladder cancer, for which both in situ and invasive cancers are reported.

References

1. Watson M, Saraiya M, Ahmed F, et al. Using population-based cancer registry data to assess the burden of human papillomavirus-associated cancers in the United States: overview of methods. Cancer. 2008;113(10 Suppl):2841–2854.

2. Saraiya M, Unger ER, Thompson TD, et al. US assessment of HPV types in cancers: implications for current and 9-valent HPV vaccines. J Natl Cancer Inst. 2015;107(6):djv086.

3. International Agency for Research on Cancer. IARC monographs on the evaluation of carcinogenic risks to humans. Volume 90: Human Papillomaviruses. Lyon, France: International Agency for Research on Cancer; 2007.

4. Viens LJ, Henley SJ, Watson M, et al. Human papillomavirus–associated cancers—United States, 2008–2012. MMWR Morb Mortal Wkly Rep. 2016;65(26):661–666.

5. Henley SJ, Singh SD, King J, Wilson RJ, O’Neil ME, Ryerson AB. Invasive cancer incidence and survival—United States, 2013. MMWR Morb Mortal Wkly Rep. 2017;66:69–75.

6. Cogliano VJ, Baan R, Straif K, et al. Preventable exposures associated with human cancers. J Natl Cancer Inst. 2011;103(24):1827–1839.

7. World Cancer Research Fund / American Institute for Cancer Research. Food, Nutrition, Physical Activity, and the Prevention of Cancer: a Global Perspective. Washington, DC: AICR, 2007.

8. Levine B. What does the population attributable fraction mean? Prev Chronic Dis. 2007;4(1):A14.

Source of standard: CDC’s National Program of Cancer Registries

Source item name: Derived from NAACCR’s Primary Site, Histologic Type ICD-O-3, Sex, Diagnostic Confirmation, and Age at Diagnosis

Source item numbers: Derived from NAACCR’s 400 (Primary Site), 522 (Histologic Type ICD-O-3), 220 (Sex), 490 (Diagnostic Confirmation), and 230 (Age at diagnosis)

This is a predefined variable created using ICD-O-3 site, histology, sex, diagnostic confirmation, and age at diagnosis to define obesity-related cancers.[1, 2, 3]

Considerations for use

  • Cancer registries do not routinely collect data on obesity, so the number of cancers associated with this risk factor cannot be determined definitively.[2, 4, 5]
  • However, other sources of information can be used to obtain the proportion of cancers probably caused by the risk factor, known as the attributable fraction.[6] The number of attributable cancers can be estimated by multiplying the attributable fraction by the number of associated cancers.
  • For more information, please see the referenced publications and Predefined Variables for Risk Factor-Associated Cancers documentation.
  • Please note that in official federal cancer statistics publications, CDC reports only malignant (invasive) cancers, except for bladder cancer, for which both in situ and invasive cancers are reported.

References

1. Eheman C, Henley SJ, Ballard-Barbash R, et al. Annual report to the nation on the status of cancer, 1975–2008, featuring cancers associated with excess weight and lack of sufficient physical activity. Cancer. 2012;118:2338–2366.

2. World Cancer Research Fund / American Institute for Cancer Research. Food, Nutrition, Physical Activity, and the Prevention of Cancer: a Global Perspective. Washington, DC: AICR, 2007.

3. Lauby-Secretan B, Scoccianti C, Loomis D, Grosse Y, Bianchini F, Straif K. Body fatness and cancer—viewpoint of the IARC Working Group. N Engl J Med. 2016;375:794–798.

4. Henley SJ, Singh SD, King J, Wilson RJ, O’Neil ME, Ryerson AB. Invasive cancer incidence and survival—United States, 2013. MMWR Morb Mortal Wkly Rep. 2017;66:69–75.

5. Cogliano VJ, Baan R, Straif K, et al. Preventable exposures associated with human cancers. J Natl Cancer Inst. 2011;103(24):1827–1839.

6. Levine B. What does the population attributable fraction mean? Prev Chronic Dis. 2007;4(1):A14.

Source of standard: CDC’s National Program of Cancer Registries

Source item name: Derived from NAACCR’s Primary Site, Histologic Type ICD-O-3, and Sex

Source item numbers: Derived from NAACCR’s 400 (Primary Site), 522 (Histologic Type ICD-O-3), and 220 (Sex)

This is a predefined variable created using ICD-O-3 site, histology, and sex to define physical inactivity-related cancers.[1, 2]

Considerations for use

  • Cancer registries do not routinely collect data on physical inactivity, so the number of cancers associated with this risk factor cannot be determined definitively.[2, 3, 4]
  • However, other sources of information can be used to obtain the proportion of cancers probably caused by the risk factor, known as the attributable fraction.[5] The number of attributable cancers can be estimated by multiplying the attributable fraction by the number of associated cancers.
  • For more information, please see the referenced publications and Predefined Variables for Risk Factor-Associated Cancers documentation.
  • Please note that in official federal cancer statistics publications, CDC reports only malignant (invasive) cancers, except for bladder cancer, for which both in situ and invasive cancers are reported.

References

1. Eheman C, Henley SJ, Ballard-Barbash R, et al. Annual report to the nation on the status of cancer, 1975–2008, featuring cancers associated with excess weight and lack of sufficient physical activity. Cancer. 2012;118:2338–2366.

2. World Cancer Research Fund / American Institute for Cancer Research. Food, Nutrition, Physical Activity, and the Prevention of Cancer: a Global Perspective. Washington, DC: AICR, 2007.

3. Henley SJ, Singh SD, King J, Wilson RJ, O’Neil ME, Ryerson AB. Invasive cancer incidence and survival—United States, 2013. MMWR Morb Mortal Wkly Rep. 2017;66:69–75.

4. Cogliano VJ, Baan R, Straif K, et al. Preventable exposures associated with human cancers. J Natl Cancer Inst. 2011;103(24):1827–1839.

5. Levine B. What does the population attributable fraction mean? Prev Chronic Dis. 2007;4(1):A14.

Source of standard: CDC’s National Program of Cancer Registries

Source item name: Derived from NAACCR’s Primary Site, Histologic Type ICD-O-3, and Sex

Source item numbers: Derived from NAACCR’s 400 (Primary Site), 522 (Histologic Type ICD-O-3), and 220 (Sex)

This is a predefined variable created using ICD-O-3 site, histology, and sex to define tobacco-related cancers.[1]

Considerations for use

  • Cancer registries do not routinely collect data on tobacco use, so the number of cancers associated with this risk factor cannot be determined definitively.[2, 3, 4]
  • However, other sources of information can be used to obtain the proportion of cancers probably caused by the risk factor, known as the attributable fraction.[5] The number of attributable cancers can be estimated by multiplying the attributable fraction by the number of associated cancers.
  • For more information, please see the referenced publications and Predefined Variables for Risk Factor-Associated Cancers documentation.
  • Please note that in official federal cancer statistics publications, CDC reports only malignant (invasive) cancers, except for bladder cancer, for which both in situ and invasive cancers are reported.

References

1. U.S. Department of Health and Human Services. The Health Consequences of Smoking—50 Years of Progress: A Report of the Surgeon General. Atlanta, GA: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Office on Smoking and Health; 2014.

2. Henley SJ, Singh SD, King J, Wilson RJ, O’Neil ME, Ryerson AB. Invasive cancer incidence and survival—United States, 2013. MMWR Morb Mortal Wkly Rep. 2017;66:69–75.

3. Cogliano VJ, Baan R, Straif K, et al. Preventable exposures associated with human cancers. J Natl Cancer Inst. 2011;103(24):1827–1839.

4. World Cancer Research Fund / American Institute for Cancer Research. Food, Nutrition, Physical Activity, and the Prevention of Cancer: a Global Perspective. Washington, DC: AICR, 2007.

5. Levine B. What does the population attributable fraction mean? Prev Chronic Dis. 2007;4(1):A14.

Source of standard: CDC’s National Program of Cancer Registries

Source item name: Derived from Addr at DX – state and state-level race or ethnicity reporting restrictions

Source item number: Derived from NAACCR’s 80

This variable was created specifically for this database. It identifies the states that are eligible to be included in a state-level analysis of race and ethnicity combined.

Considerations for use

  • States have the option to suppress race-specific and Hispanic ethnicity-specific data. While these states can be included in an aggregated analysis, the affected state’s information cannot be reported at the state level.
  • Use this variable when conducting state-level analyses of race and ethnicity combinations. If you are conducting a state-level analysis of race or ethnicity only, make restrictions manually in the SEER*Stat Selection tab.
  • The following states have state-level race or ethnicity data presentation restrictions:
    • Data for Hispanic and non-Hispanic American Indian and Alaska Native people cannot be displayed for Illinois, Kansas, New Jersey, and New York.
    • Data for Hispanic Asian and Pacific Islander and Hispanic Black people cannot be displayed for Kansas.
  • For more information, please refer to the Race recode (W, B, AIAN, API), Origin recode NHIA (Hispanic, Non-Hisp), and Race and origin recode (NHW, NHB, NHAIAN, NHAPI, Hispanic) variable descriptions.