Modernizing the Coding of Occupational Health Data

Key points

  • NIOCCS is a free, web-based application that codes industry and occupation text quickly and accurately.
  • This text comes from case report forms, surveys, and other records containing industry and occupation information.
  • NIOCCS gives our partners a modern tool for analyzing occupational health and safety data.

Full story

CDC's National Institute for Occupational Safety and Health (NIOSH) developed the NIOSH Industry and Occupation Computerized Coding System (NIOCCS). NIOCCS is a free web-based application to help public health professionals code industry and occupation text quickly and accurately.

This application provides state, tribal, local, and territorial health departments with a modern solution that they can use to analyze how people's jobs impact their health and safety. By leveraging new technologies to streamline outdated approaches to coding occupational health data, NIOCCS is an important step toward modernizing public health data systems.

Public health problem

To protect people's health in the workplace, public health professionals need to understand the risks and exposures workers experience. Public health investigators and researchers need standard industry and occupation codes to examine illnesses and injuries among specific jobs and industries.

As part of public health investigations, forms and surveys may ask questions such as:

  • "What is your job?"
  • "What type of business do you work in?"

Respondents may be given a blank line to write in their job information, with answers such as:

  • "Registered nurse" working in a hospital
  • "Chef" working in a restaurant
  • "Mechanic" working in an auto body shop

Public health professionals and researchers convert these text descriptions into standardized numeric codes to:

  • Analyze health trends.
  • Inform public health action.

Much like a ZIP code identifies a specific geographic area, industry and occupation codes identify specific industries or groups of workers.

For example, although teachers and teaching assistants both work in the education industry, they have different occupation codes. Assigning the wrong code may incorrectly show that one group has a higher (or lower) risk of illness than the other.
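As a toy illustration of the text-to-code step, the Python sketch below maps a few free-text job titles to codes with a plain dictionary lookup. The code values and the names `HYPOTHETICAL_OCCUPATION_CODES`, `normalize`, and `lookup_code` are hypothetical placeholders, not real NAICS, SOC, or NIOCCS codes, and an exact-match table like this is not how NIOCCS works (NIOCCS uses machine learning).

```python
import re
from typing import Optional

# HYPOTHETICAL placeholder codes for illustration only; real coding uses
# standard systems such as NAICS or SOC.
HYPOTHETICAL_OCCUPATION_CODES = {
    "registered nurse": "OCC-0001",
    "chef": "OCC-0002",
    "mechanic": "OCC-0003",
    "teacher": "OCC-0004",
    "teaching assistant": "OCC-0005",
}


def normalize(text: str) -> str:
    """Lowercase, replace non-letters with spaces, and collapse whitespace."""
    letters_only = re.sub(r"[^a-z\s]", " ", text.lower())
    return " ".join(letters_only.split())


def lookup_code(text: str) -> Optional[str]:
    """Return the code for an exact match after normalization, else None."""
    return HYPOTHETICAL_OCCUPATION_CODES.get(normalize(text))


if __name__ == "__main__":
    print(lookup_code("Registered Nurse"))  # prints OCC-0001
    print(lookup_code("Teacher"), lookup_code("Teaching assistant"))  # prints OCC-0004 OCC-0005
    print(lookup_code("teacher's aide"))  # prints None
```

Teachers and teaching assistants receive different codes, as the example above describes. A response written as "teacher's aide" matches no entry exactly and returns `None`; resolving free-text variations like this is the work a person, or in NIOCCS a machine learning model, has to do for every record.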

Before software applications like NIOCCS, occupational health specialists and researchers had to look up a code by hand for each person's response. Analyzing large amounts of information was a daunting task, and the manual approach was slow and costly. Manual coding made it difficult to analyze and share timely, high-quality data to help detect, prevent, and respond to emerging health threats in the workplace.

Action

To meet this pressing need, NIOSH developed NIOCCS. This web-based application allows public health professionals to automatically code, or autocode, industry and occupation text. NIOCCS uses machine learning to determine the appropriate industry or occupation code. The application assigns a standard code to every industry and occupation record uploaded.

NIOCCS offers a suite of options for coding data:

  • Code a single record
  • Upload a file (this requires the user to create a free account, but is a great option for large amounts of data that are already collected)
  • Code data as it is collected using the NIOCCS Web API (Application Programming Interface)

Impact

State and local jurisdictions have been using NIOCCS since the beta version was created in 2012. When NIOSH incorporated machine learning in 2021, use of the application rapidly increased. To date, over 150 million records have been coded using the NIOCCS application. Some of the reasons for this include:

  • Results are more accurate and consistent because of the machine learning platform.
  • Tens of thousands of records can be coded in minutes.
  • All records receive an industry and occupation code from the autocoder.

Learn how a few jurisdictions use NIOCCS to advance their public health mission.

Wisconsin

"We use NIOCCS to gather real-time coded industry and occupation data on various communicable disease interview forms (including COVID-19 forms). We also use the file coder to code large historical datasets."

"Where employment data would previously have been coded manually or assigned via limited dropdown menus, NIOCCS allows us to quickly assign detailed North American Industry Classification System and Standard Occupational Classification codes—enabling efficient analysis of large datasets. In turn, this makes rapid dissemination of important workplace measures to local and tribal health departments possible."

California

"The California Department of Public Health's (CDPH) Occupational Health Branch (OHB) used NIOCCS extensively during our COVID-19 response. We coded industry and occupation (I/O) for weekly fatalities, collected from workplace outbreak and survey data. The NIOCCS web application automated the processing of large weekly datasets with improved machine learning-based quality. We manually reviewed only the lowest probability results. Together these improvements enabled a quick COVID-19 response, reduced the team training, and modernized high throughput, high-quality I/O, simply not possible through manual coding alone."

"NIOCCS has transformed the field of occupational surveillance and epidemiology and its latest versions have accelerated CDPH OHB efforts to characterize the California workplace COVID-19 burden for prevention of worker exposures, illnesses, and deaths."
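CDPH describes manually reviewing only the lowest-probability results. One way to sketch that triage in Python, assuming each coded record comes back with a probability and that the team reviews a fixed fraction of each batch (CDPH's actual selection rule is not stated here), is shown below. The function `split_for_review`, the record IDs, and the codes are hypothetical.

```python
from typing import List, Tuple

# (record_id, assigned_code, probability); all values here are hypothetical.
CodedResult = Tuple[str, str, float]


def split_for_review(
    results: List[CodedResult], review_fraction: float
) -> Tuple[List[CodedResult], List[CodedResult]]:
    """Split coded results into (accepted, needs_review).

    The lowest-probability `review_fraction` of results goes to manual review;
    everything else is accepted as coded.
    """
    if not 0.0 <= review_fraction <= 1.0:
        raise ValueError("review_fraction must be between 0 and 1")
    ranked = sorted(results, key=lambda r: r[2])  # lowest probability first
    n_review = round(len(ranked) * review_fraction)
    return ranked[n_review:], ranked[:n_review]


if __name__ == "__main__":
    results = [
        ("rec-1", "IND-A", 0.95),
        ("rec-2", "IND-B", 0.40),
        ("rec-3", "IND-C", 0.88),
        ("rec-4", "IND-D", 0.15),
        ("rec-5", "IND-E", 0.99),
    ]
    accepted, needs_review = split_for_review(results, review_fraction=0.4)
    print([r[0] for r in needs_review])  # prints ['rec-4', 'rec-2']
```

A fixed review fraction keeps the manual workload predictable from week to week; a fixed probability threshold is the other common choice.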

Minnesota

"Minnesota uses NIOCCS to code industry and occupation fields in our death certificate data file to get up-to-date information to analyze suicides by industry and occupation. Free use of the automated system helped us obtain much more timely data. This facilitates our ongoing updates of data on farmer suicides and helped us begin to further explore suicide by industry and occupation."

"Minnesota also used NIOCCS in 2021–2022 to code a subset of COVID-19 case data to see if we could use the sporadic job information in the data file to study how workers in various industries and occupations experienced COVID-19. We performed simple quality analysis on the coded data to identify an appropriate cutoff level in the confidence value for accepting or rejecting the codes. Free use of the automated coding system enabled the use of the very large COVID-19 data file. The size of this file would prevent manual coding for anything more than a small sample of the records. Using the entire data file helps us provide a fuller picture of COVID-19 in Minnesota workers."
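Minnesota's quality analysis to pick a confidence cutoff could look something like the Python sketch below. It assumes the analysis means manually checking a sample of coded records and choosing the lowest cutoff whose accepted codes meet a target accuracy; the function `choose_cutoff`, the sample, and the 90% target are hypothetical, not Minnesota's documented procedure.

```python
from typing import List, Optional, Tuple


def choose_cutoff(
    sample: List[Tuple[float, bool]], target_accuracy: float
) -> Optional[float]:
    """Return the lowest confidence cutoff whose accepted codes meet the target.

    `sample` holds (confidence, code_was_correct) pairs from a manually
    reviewed subset. Codes with confidence >= cutoff are accepted; the rest
    are rejected. Returns None if no cutoff reaches `target_accuracy`.
    """
    for cutoff in sorted({conf for conf, _ in sample}):
        accepted = [ok for conf, ok in sample if conf >= cutoff]
        # `accepted` is never empty because `cutoff` comes from the sample itself.
        if sum(accepted) / len(accepted) >= target_accuracy:
            return cutoff
    return None


if __name__ == "__main__":
    # Hypothetical reviewed sample: (confidence, was the code correct?)
    sample = [
        (0.95, True), (0.90, True), (0.85, True), (0.70, False),
        (0.65, True), (0.40, False), (0.30, False),
    ]
    print(choose_cutoff(sample, target_accuracy=0.90))  # prints 0.85
```

Scanning from the lowest cutoff upward returns the cutoff that accepts the most records while still meeting the target, since every lower cutoff accepts a superset of the records a higher one accepts.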