You can use the following sample text to describe U.S. Cancer Statistics data methods in manuscripts.
Methods
Cancer incidence
U.S. Cancer Statistics data, which combine cancer registry data from the Centers for Disease Control and Prevention's (CDC's) National Program of Cancer Registries (NPCR) and the National Cancer Institute's (NCI's) Surveillance, Epidemiology, and End Results (SEER) Program, were analyzed.1 This dataset includes cancer incidence data from central cancer registries reported to NPCR in 46 states, the District of Columbia, [IF APPLICABLE]2 and SEER in 4 states. Data on all new cancer diagnoses from patient records at medical facilities such as hospitals, physicians' offices, therapeutic radiation facilities, freestanding surgical centers, and pathology laboratories are reported to central cancer registries, which collate these data and use state vital records to identify cancer deaths that were not reported as cases. The central cancer registries use uniform data items and codes as documented by the North American Association of Central Cancer Registries. These data are submitted annually to CDC and NCI and combined into one dataset.3 Cancer registries demonstrate that their data are of high quality by meeting U.S. Cancer Statistics publication criteria;1 during [YEARX–YEARY], data from [X] cancer registries met these criteria, covering [X%] of the United States population. This report includes new cases of primary invasive [CANCER TYPE] cancer (International Classification of Diseases for Oncology, Third Edition codes [CXX.X–CXX.X])4 diagnosed during [YEARX–YEARY]; [IF APPLICABLE] excluding histology codes 9050–9055, 9140, and 9590–9992 [OR] restricted to histology codes [XXXX–XXXX].
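The case-selection rules in the last two sentences can be written as a simple filter. This is an illustrative sketch only: the `Case` record, its field names, and the helper functions are hypothetical; the example site range C18.0–C20.9 is arbitrary; and it assumes that "primary invasive" corresponds to ICD-O-3 behavior code 3. Both the exclusion variant and the "restricted to" variant of the histology clause are exposed as parameters.

```python
# Illustrative case-selection filter (hypothetical record layout and names).
from dataclasses import dataclass
from typing import Optional

# Histology codes excluded in the [IF APPLICABLE] clause: 9050-9055, 9140, 9590-9992.
EXCLUDED_HISTOLOGY = (range(9050, 9056), range(9140, 9141), range(9590, 9993))


@dataclass
class Case:
    site: str       # ICD-O-3 topography code, e.g. "C18.7" (hypothetical field)
    histology: int  # ICD-O-3 4-digit histology code (hypothetical field)
    behavior: int   # ICD-O-3 behavior code; 3 is ASSUMED to mean primary invasive
    year: int       # year of diagnosis (hypothetical field)


def site_value(site: str) -> float:
    """Turn a topography code such as 'C18.7' into a comparable number (18.7)."""
    return float(site.lstrip("C"))


def is_included(case: Case, site_lo: str, site_hi: str, year_lo: int, year_hi: int,
                exclude_histology: bool = True,
                only_histology: Optional[range] = None) -> bool:
    """Apply the behavior, year, site, and histology rules to one case."""
    if case.behavior != 3:
        return False  # not invasive
    if not year_lo <= case.year <= year_hi:
        return False  # outside the diagnosis period
    if not site_value(site_lo) <= site_value(case.site) <= site_value(site_hi):
        return False  # outside the topography range
    if exclude_histology and any(case.histology in r for r in EXCLUDED_HISTOLOGY):
        return False  # histology explicitly excluded
    if only_histology is not None and case.histology not in only_histology:
        return False  # "restricted to histology codes" variant
    return True
```

For example, `is_included(Case("C18.7", 8140, 3, 2019), "C18.0", "C20.9", 2015, 2019)` returns `True`, while the same case with histology 9590 is excluded.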
[IF APPLICABLE] Race and ethnicity
Data were analyzed for five major racial/ethnic groups: White, Black, American Indian and Alaska Native (AI/AN), Asian/Pacific Islander (API), and Hispanic. Information about race and Hispanic ethnicity was collected separately. An algorithm was applied to Hispanic ethnicity data to reduce misclassification of Hispanic persons as being of unknown ethnicity.5 To reduce misclassification of AI/AN race, some central cancer registries link case data with the Indian Health Service (IHS) patient registration database, which contains records of members of federally recognized tribes; cases that linked to the IHS database were coded as AI/AN.6
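The coding steps above can be pictured as a small assignment function. This is a hypothetical sketch: the function and argument names are invented, the `hispanic` flag is assumed to already reflect the identification algorithm cited above, and the rule that Hispanic ethnicity takes precedence over race is an assumption made for illustration, not something this text specifies.

```python
# Hypothetical sketch of assigning a case to one of the five reporting groups.
from typing import Optional

RACE_GROUPS = {"White", "Black", "AI/AN", "API"}


def reporting_group(race: Optional[str], hispanic: bool, ihs_linked: bool) -> Optional[str]:
    """Return 'White', 'Black', 'AI/AN', 'API', 'Hispanic', or None (unknown)."""
    # Cases that linked to the IHS patient registration database are coded AI/AN.
    if ihs_linked:
        race = "AI/AN"
    # ASSUMPTION: Hispanic ethnicity (already adjusted by the identification
    # algorithm) takes precedence over race in this illustration.
    if hispanic:
        return "Hispanic"
    return race if race in RACE_GROUPS else None
```

Under these assumptions, a case recorded as White that links to the IHS database is reported as AI/AN, and a case with unknown race and non-Hispanic ethnicity falls outside all five groups.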
Because states can opt not to present state-specific counts and rates for [AS APPLICABLE: API, Hispanic, and AI/AN populations], these data are not shown for the following states: [CHECK STATE LIST]. [FOR EXAMPLE: Because states can opt not to present state-specific counts and rates for AI/AN populations, these data are not shown for Illinois, Kansas, New Jersey, and New York.]
[IF APPLICABLE] Histology
Analyses by histology included only cases that were microscopically confirmed ([X%] of cases).
[IF APPLICABLE] Stage
Stage was classified using a merged variable that spans the periods when three different staging schemes were in use: SEER Summary Stage 2000, Derived Summary Stage, and Summary Stage 2018. Cancers were classified as localized, regional, distant, or unknown stage. Localized cancer is confined to the primary site; regional cancer has spread directly beyond the primary site (regional extension) or to regional lymph nodes; and distant cancer has spread to other organs (distant extension) or to remote lymph nodes.7
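The stage definitions above amount to a precedence rule in which the most extensive documented spread determines the category. The sketch below encodes that reading with hypothetical boolean inputs; actual registry records carry coded stage values rather than these flags, and treating undocumented or contradictory extent as "unknown" is an assumption.

```python
# Hypothetical sketch: summary stage from illustrative extent-of-disease flags.
from typing import Optional


def summary_stage(confined: Optional[bool], regional_extension: bool,
                  regional_nodes: bool, distant_extension: bool,
                  remote_nodes: bool) -> str:
    """Return 'localized', 'regional', 'distant', or 'unknown'."""
    # The most extensive documented spread determines the stage.
    if distant_extension or remote_nodes:
        return "distant"
    if regional_extension or regional_nodes:
        return "regional"
    if confined:
        return "localized"
    # confined is None (not documented) or the flags are contradictory.
    return "unknown"
```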
Population estimates
Rate denominators were modified versions of the annual county population estimates by age, sex, bridged race, and ethnicity produced by the U.S. Census Bureau in collaboration with CDC and with support from NCI.8 The modifications incorporated bridged single-race estimates derived from the multiple-race categories in the census and accounted for known issues in certain counties. The modified county-level estimates were summed to the state and national levels and used as denominators in rate calculations.8
Statistical analysis
Incidence and death rates
Average annual rates per 100,000 population for [YEARX–YEARY] were age-adjusted by the direct method to the 2000 US standard population (19 age groups).9 Corresponding 95% confidence intervals (CIs) were calculated as modified gamma intervals.10 Rates based on fewer than 16 cases tend to be unreliable and were not presented. To compare subgroups, rate ratios and their 95% CIs were calculated;11 rates were considered statistically different if the 95% CI of the rate ratio excluded 1. Rates were calculated using SEER*Stat software version [X.X.X].12
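Direct age adjustment weights each age-specific rate by that age group's share of the standard population and sums the results. The sketch below assumes the weights listed are the 19-group 2000 US standard million (Census P25–1130; age groups 0, 1–4, 5–9, ..., 80–84, 85+); confirm them against the published standard before relying on them. Counts and populations are summed over all years of the period, so the result is an average annual rate, and the 16-case rule is applied to the total count. The function name is hypothetical, and the modified gamma confidence intervals are not reproduced here.

```python
# Sketch of direct age adjustment to the 2000 US standard population.
from typing import Optional, Sequence

# ASSUMED 2000 US standard million, 19 age groups: 0, 1-4, 5-9, ..., 80-84, 85+.
US_2000_STD_MILLION = [
    13_818, 55_317, 72_533, 73_032, 72_169, 66_478, 64_529, 71_044, 80_762,
    81_851, 72_118, 62_716, 48_454, 38_793, 34_264, 31_773, 26_999, 17_842, 15_508,
]

MIN_CASES = 16  # rates based on fewer cases are not presented


def age_adjusted_rate(counts: Sequence[int], populations: Sequence[float],
                      per: float = 100_000) -> Optional[float]:
    """Directly age-adjusted rate per `per` persons, or None if suppressed.

    `counts` and `populations` are per age group and summed over every year
    in the period, so the result is an average annual rate.
    """
    n_groups = len(US_2000_STD_MILLION)
    if len(counts) != n_groups or len(populations) != n_groups:
        raise ValueError("expected 19 age groups")
    if sum(counts) < MIN_CASES:
        return None  # too few cases for a reliable rate
    total_std = sum(US_2000_STD_MILLION)
    rate = 0.0
    for count, pop, std in zip(counts, populations, US_2000_STD_MILLION):
        if pop > 0:  # age groups with no population contribute nothing
            rate += (std / total_std) * (count / pop)
    return rate * per
```

As a check, 10 cases per 10,000 persons in every age group gives an age-specific rate of 100 per 100,000 everywhere, so the adjusted rate is also 100.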
[IF APPLICABLE] Trends in rates
The annual percentage change (APC) was used to quantify the change in rates during [YEARX–YEARY] and was calculated using weighted least squares regression.13 A two-sided t-test was used to test whether the APC differed from zero; rates were considered to increase or decrease if P < .05 and were otherwise considered stable. APCs were calculated using SEER*Stat software version [X.X.X].12
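A common way to compute an APC is to regress the natural log of the rates on calendar year and transform the slope b as APC = 100 * (exp(b) - 1). The sketch below assumes that model. The text does not say which weights are used, so they are left to the caller (equal weights by default), and the function name is hypothetical. It returns the t statistic and its degrees of freedom (n - 2); the two-sided P value can then be read from a t distribution, for example with `scipy.stats.t.sf(abs(t), df) * 2`.

```python
# Sketch: APC from weighted least squares on log rates (assumed model).
import math
from typing import Optional, Sequence, Tuple


def apc_wls(years: Sequence[float], rates: Sequence[float],
            weights: Optional[Sequence[float]] = None) -> Tuple[float, float, int]:
    """Return (APC in percent, t statistic for slope == 0, degrees of freedom)."""
    n = len(years)
    if n < 3 or len(rates) != n:
        raise ValueError("need at least 3 years with matching rates")
    w = list(weights) if weights is not None else [1.0] * n
    if len(w) != n:
        raise ValueError("weights must match years")
    y = [math.log(r) for r in rates]  # rates must be positive
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, years)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, years))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, years, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    df = n - 2
    # Weighted residual sum of squares gives the standard error of the slope.
    rss = sum(wi * (yi - (intercept + slope * xi)) ** 2
              for wi, xi, yi in zip(w, years, y))
    se = math.sqrt(rss / df / sxx)
    if se == 0:
        t_stat = 0.0 if slope == 0 else math.copysign(math.inf, slope)
    else:
        t_stat = slope / se
    apc = 100.0 * (math.exp(slope) - 1.0)
    return apc, t_stat, df
```

For rates that rise by exactly 2% a year, such as `50 * 1.02**k`, the function returns an APC of 2.0 up to floating-point rounding.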
[OR]
Change in rates during [YEARX–YEARY] was calculated using joinpoint regression, which fits a series of joined straight lines on a logarithmic scale to the trends in annual age-standardized rates;14 up to [X] joinpoints ([X] line segments) were allowed. The slope of each line segment was used to quantify its annual percentage change (APC), and a two-sided t-test was used to test whether each APC differed from zero. The average annual percentage change (AAPC) for [YEARX–YEARY] was calculated as a weighted average of the slope coefficients of the underlying joinpoint regression line, with weights equal to the length of each segment within the interval. To test whether the AAPC differed from zero, a two-sided t-test was used for models with 0 joinpoints, and a two-sided z-test was used for models with 1 or more joinpoints. Rates were considered to increase or decrease if P < .05; otherwise, rates were considered stable. Trends were calculated using the Joinpoint Regression Program version [X.X.X].15
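Once the segments have been fitted, the AAPC follows from the description above: convert each segment's APC back to its log-scale slope, average the slopes weighted by the number of years of the interval each segment covers, and transform back. This sketch takes hypothetical `(start, end, APC)` tuples, such as might be read off Joinpoint output, and reproduces only the point estimate, not the t- or z-test.

```python
# Sketch: AAPC as a length-weighted average of joinpoint segment slopes.
import math
from typing import Sequence, Tuple

Segment = Tuple[float, float, float]  # (start year, end year, APC in percent)


def aapc(segments: Sequence[Segment], start: float, end: float) -> float:
    """AAPC (percent) over [start, end] from fitted joinpoint segments."""
    total_len = 0.0
    weighted_slope = 0.0
    for seg_start, seg_end, apc in segments:
        # Length of the part of this segment that lies inside [start, end].
        overlap = min(seg_end, end) - max(seg_start, start)
        if overlap <= 0:
            continue
        slope = math.log(1.0 + apc / 100.0)  # slope on the log scale
        weighted_slope += overlap * slope
        total_len += overlap
    if total_len == 0:
        raise ValueError("no segment overlaps the interval")
    return 100.0 * (math.exp(weighted_slope / total_len) - 1.0)
```

For example, segments of 2010–2015 (APC 2.0) and 2015–2020 (APC -1.0) give an AAPC of about 0.49 for 2010–2020, whereas a simple average of the two APCs would give 0.5.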
Footnotes for tables
Standard footnotes from U.S. Cancer Statistics, or slight variations of them, are recommended for tables and figures.
For population coverage
Data are from population-based registries that participate in CDC's National Program of Cancer Registries and/or NCI's Surveillance, Epidemiology, and End Results Program and meet high-quality data criteria. These registries cover approximately [XX]% of the United States population.
For age-adjusted rates
Rates are per 100,000 persons and are age-adjusted to the 2000 US standard population (19 age groups – Census P25–1130).
References
1. Centers for Disease Control and Prevention. U.S. Cancer Statistics. https://www.cdc.gov/united-states-cancer-statistics/. Accessed [ENTER DATE].
2. Singh SD, Henley SJ, Ryerson AB. Surveillance for cancer incidence and mortality—United States, 2012. MMWR Morb Mortal Wkly Rep. 2016;63(55):17–58.
3. National Program of Cancer Registries and Surveillance, Epidemiology, and End Results SEER*Stat Database: [ENTER DATABASE TITLE]. United States Department of Health and Human Services, Centers for Disease Control and Prevention. Released [DATE], based on the November [YEAR] submissions. Available at https://www.cdc.gov/united-states-cancer-statistics/public-use/.
4. Fritz A, Percy C, Jack A, Shanmugaratnam K, Sobin L, Parkin D, et al., editors. International Classification of Diseases for Oncology. 3rd ed. Geneva, Switzerland: World Health Organization; 2000.
5. NAACCR Race and Ethnicity Work Group. NAACCR Guideline for Enhancing Hispanic/Latino Identification: Revised NAACCR Hispanic/Latino Identification Algorithm [NHIA v2.2.1]. Springfield (IL): North American Association of Central Cancer Registries; September 2011.
6. Jim MA, Arias E, Seneca DS, et al. Racial misclassification of American Indians and Alaska Natives by Indian Health Service Contract Health Service Delivery Area. Am J Public Health. 2014;104:S295–S302.
7. Young JL Jr, Roffers SD, Ries LAG, Fritz AG, Hurlbut AA, editors. SEER Summary Staging Manual – 2000: Codes and Coding Instructions. Bethesda, MD: National Cancer Institute; 2001.
8. National Cancer Institute. Surveillance, Epidemiology, and End Results (SEER) Program. Modifications to Census Bureau's County Population Data.
9. Anderson R, Rosenberg H. Age standardization of death rates: implementation of the year 2000 standard. Natl Vital Stat Rep. 1998;47:1–16.
10. Tiwari RC, Clegg LX, Zou Z. Efficient interval estimation for age-adjusted cancer rates. Stat Methods Med Res. 2006;15:547–569.
11. Fay MP. Approximate confidence intervals for rate ratios from directly standardized rates with sparse data. Commun Stat Theory Methods. 1999;28(9):2141–2160.
12. National Cancer Institute. SEER*Stat software. Bethesda, MD: National Cancer Institute, Surveillance Research Program; [YEAR]. Available at https://seer.cancer.gov/seerstat/.
13. Kleinbaum DG, Kupper LL, Muller KE. Applied Regression Analysis and Other Multivariable Methods. 2nd ed. Boston, MA: PWS-Kent; 1988.
14. Kim HJ, Fay MP, Feuer EJ, Midthune DN. Permutation tests for joinpoint regression with applications to cancer rates. Stat Med. 2000;19:335–351.
15. National Cancer Institute. Joinpoint Trend Analysis Software. Bethesda, MD: National Cancer Institute, Surveillance Research Program, Statistical Methodology and Applications Branch; [YEAR]. Available at https://surveillance.cancer.gov/joinpoint/.