At a glance
- Learn more about the industry and occupation variables included in the National Vital Statistics System (NVSS) public use and restricted data sets.
- Learn how to download the data sets.
- Access tools to assist with analyzing death certificate data by industry and occupation.
About the industry and occupation variables
Starting with the 2020 data, the National Center for Health Statistics (NCHS) added four industry and occupation fields to the NVSS public use and restricted data sets:
- Four-digit CDC Census Occupation Codes
- Occupation grouping: the National Health Interview Survey (NHIS) simple occupation recode, based on the Standard Occupational Classification-informed codes obtained from the Census Bureau. There are 25 occupation groupings.
- Four-digit CDC Census Industry Codes
- Industry grouping: the NHIS simple industry recode, based on the North American Industry Classification System (NAICS)-informed codes obtained from the Census Bureau. There are 22 industry groupings.
Census codes are periodically added or deleted. Each updated version of the codes is called a code set.
For the mortality data, the following CDC Census code sets were used in the corresponding years:
- CDC code set used for Census codes:
- Data set documentation for industry and occupation groupings
- Industry Appendix page 624
- Occupation Appendix page 628
- CDC code set used for Census codes:
- Data set documentation for industry and occupation groupings
- Industry Appendix page 461
- Occupation Appendix page 465
Learn more about these code sets.
Learn more about the NVSS industry and occupation variables
Other variables
In addition to industry and occupation, the following variables are included in the NVSS public use data set.
Note: The International Classification of Diseases 10th Revision (ICD–10) codes included in the data are ICD–10 (mortality), not clinical modification (ICD–10–CM) codes.
- Underlying Cause of Death: This is the disease or injury that leads directly or indirectly to death. It can also be the circumstances of an accident or violence that lead to death.
- Contributing Causes of Death: This is referred to as "Multiple Conditions" in the record layout document. Often, it is the combined effect of two or more conditions that results in death. These conditions may be unrelated, independent of each other, or causally related (one cause may lead to another).
NCHS provides two types of codes for contributing causes of death:
- Record axis codes are the NCHS-edited version of the contributing cause data from the death certificates. NCHS edits these codes to resolve inconsistent information, combine conditions listed separately, and standardize the diagnoses. They do not include any information about where a condition was placed on the death certificate.
- Entity axis contributing cause codes are the conditions from the death certificate, listed in the order in which they appear on the certificate. These data are unedited; the number of diagnoses and level of detail on the certificate may vary widely from one certifier to another, even for the same conditions.
The National Occupational Mortality Surveillance (NOMS) program uses record axis codes. Both record and entity axis codes have important uses in mortality research, but they serve different purposes. We use record axis codes to standardize data across different sources so that we can compare patterns of deaths by industry and occupation.
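To make the distinction concrete, here is a minimal Python sketch of how the two kinds of codes might look once they have been read into memory. The class, field names, helper function, and the example codes are all hypothetical illustrations; they are not the NVSS record layout, so consult the file documentation for the actual fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EntityAxisCode:
    """One condition as written on the certificate (hypothetical structure)."""
    line: int       # certificate line on which the condition appeared
    position: int   # order of the condition within that line
    icd10: str      # ICD-10 (mortality) code for the condition


def has_condition(record_axis: List[str], prefixes: List[str]) -> bool:
    """Return True if any record axis code starts with one of the prefixes."""
    return any(code.startswith(p) for code in record_axis for p in prefixes)


# Made-up decedent: the entity axis keeps placement and repeats,
# while the record axis is an edited list with no placement information.
entity_axis = [
    EntityAxisCode(line=1, position=1, icd10="I219"),
    EntityAxisCode(line=2, position=1, icd10="E119"),
    EntityAxisCode(line=3, position=1, icd10="E119"),
]
record_axis = ["I219", "E119"]

# Flag the death if any record axis code falls in a chosen set of ICD-10 blocks.
flag_diabetes = has_condition(record_axis, ["E10", "E11", "E12", "E13", "E14"])
```

Because the record axis list carries no placement information, a helper like this can only answer whether a condition was mentioned, not where on the certificate it appeared.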
Mortality data include individual ICD–10 codes as well as groupings of ICD–10 underlying cause of death codes. For a description of the groupings and the codes they include, see the 39-cause recode, 113-cause recode, and 358-cause recode in the file documentation and the Instruction Manual Part 9: "ICD–10 Cause-of-Death Lists for Tabulating Mortality Statistics."
See the NVSS Public Use File Documentation to learn about other available variables.
Download the public use data
Before downloading data, we recommend you review the following:
- Each year's record layout information.
- The 2020 data documentation that includes background on industry and occupation variables and what is in the mortality data files, by year.
Access the Data
The files are located under the header "Mortality Multiple Cause" (see the U.S. Data .zip files).
Note on data filename extension
The 2020 data filename extension is "dusmcpub," which is uncommon. The 2021 data filename extension is "txt". Both the 2020 and 2021 files are ASCII text flat files.
Now you're ready to download the data!
- On the NVSS data access page and under the heading "Mortality Multiple Cause," click on the year of data that you would like to download.
- Move the .zip file to the location of your choice (do this before unzipping, since it is a large file).
- Right-click the file and extract all of its contents.
- Use the statistical program of your choice to read the file.
- This SAS program [TXT – 11 KB] can be edited and then used to read the data into SAS. Read the file documentation and edit the program as needed for any changes to variables or variable locations.
- This format program (2012 [TXT – 66 KB]; 2018 [TXT – 72 KB]) can be used with the read-in program to assign labels and formats.
- This separate program [TXT – 770 KB] includes formats for individual ICD–10 codes.
For the two different filename types:
- For .txt filename extension: For the text data, this R program [TXT – 4 KB] can be edited and then used to read the data into R.
- For .dusmcpub filename extension (2020 data file): If you must read in the .dusmcpub file, be aware that you may need to run R overnight to fully import the data. For more information on reading this filename type into R, see the following resource.
Tip: When using R with the .dusmcpub filename type, it is best to save the data as a .csv file. This will prevent overly long loading times. If you have SAS, read in the data using this SAS program [TXT – 11 KB] and export the data as a .csv file that can be more easily read into R.
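For readers who work in neither SAS nor R, the same idea (read the fixed-width text file once, then save it as a .csv file) can be sketched in Python. This is a minimal illustration, not one of the programs linked above. The field names and column positions in `LAYOUT` are hypothetical placeholders; replace them with the positions given in the record layout for the data year you downloaded.

```python
import csv
from typing import Dict, Tuple

# HYPOTHETICAL layout: field name -> (start, end), 1-based inclusive columns.
# Replace with positions from the record layout for your data year.
LAYOUT: Dict[str, Tuple[int, int]] = {
    "field_a": (1, 4),
    "field_b": (5, 6),
    "field_c": (7, 10),
}


def parse_line(line: str, layout: Dict[str, Tuple[int, int]]) -> Dict[str, str]:
    """Slice one fixed-width record into named, whitespace-trimmed fields."""
    return {name: line[start - 1:end].strip()
            for name, (start, end) in layout.items()}


def fixed_width_to_csv(src_path: str, dst_path: str,
                       layout: Dict[str, Tuple[int, int]]) -> int:
    """Stream a fixed-width text file to CSV line by line; return the row count."""
    rows = 0
    with open(src_path, "r", encoding="ascii", errors="replace") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=list(layout))
        writer.writeheader()
        for line in src:
            writer.writerow(parse_line(line.rstrip("\r\n"), layout))
            rows += 1
    return rows
```

Reading one line at a time keeps memory use flat regardless of file size. The function opens the file as plain text, so the filename extension itself does not affect how it is parsed.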
Tools for working with the data
Use the following to group Census Industry and Census Occupation codes into detailed NHIS industry and occupation recodes.
SAS (2012 | 2018) R codes (2012 | 2018)
- 544 individual CDC Census Occupation (2012) codes → 96 NHIS occupation groupings.
- 272 individual CDC Census Industry (2012) codes → 80 NHIS industry groupings.
Simple recodes of industry (22 groupings) and occupation (25 groupings) variables are included in the public use data set.
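Grouping individual codes into recodes is a many-to-one lookup. The Python sketch below shows only the shape of that operation. The crosswalk entries and group labels are hypothetical placeholders, and the zero-padding of short codes is an assumption about how codes might arrive after import; the actual crosswalks are in the SAS and R programs referenced in this section.

```python
from typing import Dict, Optional

# HYPOTHETICAL crosswalk: four-digit code -> grouping label.
# The real mappings live in the SAS and R recode programs.
EXAMPLE_CROSSWALK: Dict[str, str] = {
    "0001": "Example group A",
    "0002": "Example group A",
    "0003": "Example group B",
}


def recode(code: Optional[str], crosswalk: Dict[str, str],
           unknown: str = "Unclassified") -> str:
    """Map a four-digit code to its grouping, padding short codes with zeros."""
    if code is None or not code.strip():
        return unknown
    return crosswalk.get(code.strip().zfill(4), unknown)
```

Returning an explicit "Unclassified" label for blank or unmapped codes makes it easy to count how many records fell outside the crosswalk, which can signal that the wrong code set year was applied.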
If you are an experienced SAS user, use the following SAS code [TXT – 29 KB] to compute adjusted proportionate mortality ratios (PMRs).
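As a rough illustration of what a PMR measures, the Python sketch below divides the observed deaths from a cause in a group by the number expected if the group had the same cause-specific proportions as a reference population, then multiplies by 100. It assumes adjustment by summing expected deaths across strata (for example, age groups); the linked SAS program may adjust differently, so treat this as a sketch and not a reimplementation. The function name and all counts are made up.

```python
from typing import Dict, Tuple

# Each stratum (e.g. an age group) maps to (deaths from the cause, all deaths).
Counts = Dict[str, Tuple[int, int]]


def adjusted_pmr(group: Counts, reference: Counts) -> float:
    """Return 100 * observed / expected, with expected summed across strata."""
    observed = sum(cause for cause, _ in group.values())
    expected = 0.0
    for stratum, (_, group_total) in group.items():
        ref_cause, ref_total = reference[stratum]
        if ref_total > 0:
            # Deaths expected in this stratum at the reference proportion.
            expected += group_total * ref_cause / ref_total
    if expected == 0:
        raise ValueError("expected deaths is zero; PMR is undefined")
    return 100.0 * observed / expected


# Made-up counts for illustration only.
group = {"25-44": (10, 100), "45-64": (30, 200)}
reference = {"25-44": (500, 10000), "45-64": (2000, 20000)}
pmr = adjusted_pmr(group, reference)
```

With these made-up counts, 40 deaths are observed against 25 expected, so the PMR is 160. A PMR above 100 means the cause accounts for a larger share of deaths in the group than in the reference population.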
Quick access: All available programs
- Code to read in the raw data
- ICD–10 formats
- This format program can be used with the read-in program to assign labels and formats: 2012 | 2018
- Recoding individual census codes to NHIS detailed recodes for 2018 and 2012 census codes: 2023 data and forward | 2020-2022
- Code to read in or load data into R
- Recoding individual census codes to NHIS detailed recodes for 2018 and 2012 census codes: R Detailed Recodes 2018 | R Detailed Recodes 2012
- PMR Program SAS Code.txt [TXT – 29 KB]
- Steps required to calculate PMRs