Analyzing Death Certificate Data by Industry and Occupation

At a glance

  • Learn more about the industry and occupation variables included in the National Vital Statistics System (NVSS) public use and restricted data sets.
  • Learn how to download the data sets.
  • Access tools to assist with analyzing death certificate data by industry and occupation.

About the industry and occupation variables

Starting with the 2020 data, NCHS added four industry and occupation fields to the NVSS public use and restricted data sets:

  1. Four-digit CDC Census Occupation Codes
  2. Occupation grouping: National Health Interview Survey (NHIS) simple occupation recode, based on the Standard Occupational Classification (SOC)-informed codes obtained from the Census Bureau. There are 25 occupation groupings.
  3. Four-digit CDC Census Industry Codes
  4. Industry grouping: NHIS simple industry recode, based on the North American Industry Classification System (NAICS)-informed codes obtained from the Census Bureau. There are 22 industry groupings.

Periodically, Census codes are added or deleted. Each updated version of the codes is called a code set.

For the mortality data, the following CDC Census code sets were used in the corresponding years:

Learn more about these code sets.

Learn more about the NVSS industry and occupation variables

The following document provides more details about the industry and occupation variables included in the NVSS.

Other variables

In addition to industry and occupation, the following three variables are included in the NVSS public use data set.

Note: The International Classification of Diseases 10th Revision (ICD–10) codes included in the data are ICD–10 (mortality), not clinical modification (ICD–10–CM) codes.

  • Underlying Cause of Death: This is the disease or injury that leads directly or indirectly to death. It can also be the circumstances of the accident or violence that lead to death.
  • Contributing Causes of Death: These are referred to as "Multiple Conditions" in the record layout document. Often, it is the combined effect of two or more conditions that results in death. These conditions may be unrelated, independent of each other, or causally related (one cause may lead to another).

NCHS provides two types of codes for contributing causes of death:

  1. Record axis codes are the NCHS-edited version of the contributing cause data from the death certificates. The editing resolves inconsistent information, combines conditions listed separately, and standardizes the diagnoses. These codes do not include any information about where a condition was placed on the death certificate.
  2. Entity axis contributing cause codes are the conditions from the death certificate, listed in the same order in which they appear on the certificate. These data are unedited; the number of diagnoses and the level of detail on the certificate may vary widely from one certifier to another, even for the same conditions.

The National Occupational Mortality Surveillance (NOMS) program uses record axis codes. Both record and entity axis codes have important uses in mortality research, but they serve different purposes. We use record axis codes to standardize data across different sources so that we can compare patterns of deaths by industry and occupation.

Mortality data include individual ICD–10 codes as well as groupings of ICD–10 underlying cause of death codes. For a description of the groupings and the codes they include, see the 39-cause recode, 113-cause recode, and 358-cause recode in the file documentation and the Instruction Manual Part 9: "ICD–10 Cause-of-Death Lists for Tabulating Mortality Statistics."

See the NVSS Public Use File Documentation to learn about other available variables.

Download the public use data

Before downloading data, we recommend you review the following:

  1. Each year's record layout information.
  2. The 2020 data documentation that includes background on industry and occupation variables and what is in the mortality data files, by year.

Access the Data

The files are located under the header "Mortality Multiple Cause" (see the U.S. Data .zip files).

Note on data filename extension

The 2020 data filename extension is "dusmcpub," which is uncommon. The 2021 data filename extension is "txt". Both the 2020 and 2021 files are ASCII "text" flat files, so they can be read the same way despite the different extensions.
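Because both files are plain ASCII text, the extension does not affect how a program opens them. The following is a minimal Python sketch, not one of the programs provided on this page, that reads the first few records of a flat file and reports their lengths. The function name `peek_records` and the demo filename are hypothetical, and the demo content is synthetic rather than real NVSS data.

```python
import tempfile
from pathlib import Path


def peek_records(path, n=3):
    """Return the first n records of a flat text file, without line endings."""
    if n <= 0:
        return []
    records = []
    # The extension is irrelevant here: the file is opened as plain ASCII text.
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        for line in handle:
            records.append(line.rstrip("\r\n"))
            if len(records) == n:
                break
    return records


# Usage with a tiny synthetic stand-in file (hypothetical name, NOT real data).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo_mortality.dusmcpub"
    demo.write_text("0001A\n0002B\n0003C\n0004D\n", encoding="ascii")
    first = peek_records(demo, n=2)

# In a fixed-width flat file, every record is expected to have the same length.
record_lengths = {len(record) for record in first}
print(first, record_lengths)
```

Reading only a few records is a quick way to confirm that a file is fixed-width text before committing to a full import.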

Now you're ready to download the data!

  1. On the NVSS data access page and under the heading "Mortality Multiple Cause," click on the year of data that you would like to download.
  2. Move the .zip folder to the file location of your choice (do this before unzipping since it is a large file).
  3. Right-click the file and choose the option to extract all.
  4. Use the statistical program of your choice to read the file.

  1. This SAS program [TXT – 11 KB] can be edited and then used to read the data into SAS. Read the file documentation and edit the program as needed for any changes to variables or variable locations.
  2. This format program (2012 [TXT – 66 KB]; 2018 [TXT – 72 KB]) can be used with the read-in program to assign labels and formats.
  3. This separate program [TXT – 770 KB] includes formats for individual ICD–10 codes.

In R, the approach depends on the filename extension:

  1. For .txt filename extension: For the text data, this R program [TXT – 4 KB] can be edited and then used to read the data into R.
  2. For .dusmcpub filename extension (2020 data file): If you must read in the .dusmcpub file, be aware that you may need to run R overnight to fully import the data. For more information on reading this filename type into R, see the following resource.

Tip: When using R with the .dusmcpub filename type, it is best to save the data as a .csv file. This will prevent overly long loading times. If you have SAS, read in the data using this SAS program [TXT – 11 KB] and export the data as a .csv file that can be more easily read into R.
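If SAS is not available, the same idea (convert the fixed-width file to .csv once, then load the .csv) can be sketched in Python. This is an illustration under stated assumptions, not a replacement for the programs above: the `LAYOUT` column names and positions are placeholders and do NOT come from the NVSS record layout, so they must be replaced with the positions in the record layout document for the year being read. Positions are assumed to be 1-based and inclusive.

```python
import csv
import tempfile
from pathlib import Path

# HYPOTHETICAL layout: (column name, start, end), 1-based inclusive positions.
# Placeholder values only; take the real positions from the record layout.
LAYOUT = [
    ("occupation_code", 1, 4),
    ("industry_code", 5, 8),
    ("underlying_cause", 9, 12),
]


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width record into a list of trimmed field values."""
    return [line[start - 1:end].strip() for _, start, end in layout]


def fixed_width_to_csv(src_path, dst_path, layout=LAYOUT):
    """Stream a fixed-width file to CSV one record at a time.

    Returns the number of data rows written. Streaming keeps memory use
    low, which matters for a large file.
    """
    count = 0
    with open(src_path, "r", encoding="ascii", errors="replace") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow([name for name, _, _ in layout])
        for line in src:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            writer.writerow(parse_record(line, layout))
            count += 1
    return count


# Usage with two synthetic records (NOT real data).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "demo.dusmcpub"
    dst = Path(tmp) / "demo.csv"
    src.write_text("43506170I219\n91307770C349\n", encoding="ascii")
    n_written = fixed_width_to_csv(src, dst)
    with open(dst, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))

print(n_written, rows)
```

Fields are kept as strings so that leading zeros in four-digit codes are preserved in the .csv output.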

Tools for working with the data

Use the following to group Census Industry and Census Occupation codes into detailed NHIS industry and occupation recodes.

SAS codes (2012 | 2018) and R codes (2012 | 2018)

  • 544 individual CDC Census Occupation (2012) codes → 96 NHIS occupation groupings.
  • 272 individual CDC Census Industry (2012) codes → 80 NHIS industry groupings.

Simple recodes of industry (22 groupings) and occupation (25 groupings) variables are included in the public use data set.
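Grouping detailed codes into broader recodes amounts to a lookup from each four-digit code to a group label. The sketch below assumes the grouping can be written as inclusive code ranges, which may not match how the linked SAS and R programs are organized. The ranges and labels in `GROUP_RANGES` are hypothetical placeholders, not the actual NHIS groupings.

```python
# HYPOTHETICAL ranges and labels, for illustration only. The real groupings
# are defined in the SAS and R programs linked above.
GROUP_RANGES = [
    (10, 430, "Group A (hypothetical)"),
    (500, 950, "Group B (hypothetical)"),
]


def recode(code, ranges=GROUP_RANGES, default="Unclassified"):
    """Map a four-digit code string (e.g. "0010") to a group label."""
    try:
        value = int(code)
    except (TypeError, ValueError):
        return default  # blank or non-numeric code
    for low, high, label in ranges:
        if low <= value <= high:
            return label
    return default


# Usage: codes are strings so that leading zeros survive file import.
for example in ["0010", "0950", "0460", ""]:
    print(repr(example), "->", recode(example))
```

Returning an explicit default for unmatched codes makes it easy to count records that fall outside every grouping, for example when a file uses a different code set than the lookup expects.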

If you are an experienced SAS user, use the following SAS code [TXT – 29 KB] to compute adjusted proportionate mortality ratios (PMRs).
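For readers who want to see the arithmetic, a proportionate mortality ratio (PMR) compares the observed number of deaths from a cause in a group with the number expected if the group had the same cause-specific proportion of deaths as all decedents. The Python sketch below uses a common stratified (indirectly standardized) form, where expected deaths are summed over strata such as age groups. It is an assumption that this resembles the adjustment in the SAS code above; the function name, the choice of strata, and the demo records are all hypothetical.

```python
from collections import Counter


def adjusted_pmr(records, group, cause):
    """Stratum-adjusted PMR for one group and one cause.

    records: iterable of (stratum, group, cause) tuples, one per death.
    expected = sum over strata of
        (deaths in the group in that stratum)
        * (share of all deaths in that stratum due to the cause)
    Returns 100 * observed / expected, or None if expected is zero.
    """
    all_deaths = Counter()        # deaths per stratum, all groups
    all_cause_deaths = Counter()  # deaths from the cause per stratum
    group_deaths = Counter()      # deaths in the group per stratum
    observed = 0
    for stratum, rec_group, rec_cause in records:
        all_deaths[stratum] += 1
        if rec_cause == cause:
            all_cause_deaths[stratum] += 1
        if rec_group == group:
            group_deaths[stratum] += 1
            if rec_cause == cause:
                observed += 1
    expected = sum(
        n * all_cause_deaths[s] / all_deaths[s]
        for s, n in group_deaths.items()
    )
    if expected == 0:
        return None
    return 100.0 * observed / expected


# Synthetic demo records (NOT real data): two strata, two groups.
records = (
    [("young", "G", "X")] * 2 + [("young", "G", "other")] * 2
    + [("young", "H", "other")] * 6
    + [("old", "G", "X")] * 1 + [("old", "G", "other")] * 1
    + [("old", "H", "X")] * 4 + [("old", "H", "other")] * 4
)
pmr = adjusted_pmr(records, group="G", cause="X")
print(round(pmr, 1))
```

In the demo, group G has 3 observed deaths from cause X against 4 × 0.2 + 2 × 0.5 = 1.8 expected, so the PMR is above 100, meaning a higher proportion of deaths from that cause than expected.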

Quick access: All available programs

SAS
R