NIOSH Industry and Occupation Computerized Coding System (NIOCCS)

NIOCCS Technical Information

System Features

  • Single Record Coding
  • Batch File Processing
  • Computer-Assisted Coding for records not automatically coded
  • Industry and Occupation Coding Classification Scheme options:
    • Census 2010 / NAICS 2007 / SOC 2010
    • Census 2002 / NAICS 2002 / SOC 2000
    • Census 2000 / NAICS 1997 / SOC 2000
  • Census Industry and Occupation Alphabetical Index Lookup
  • Crosswalk Coding forward or backward using Census, NAICS/SIC, or SOC coding classifications
    Crosswalk coding maps a code from one industry or occupation classification scheme to another scheme, or to the equivalent code in a different year's version of the same scheme.
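
A crosswalk can be pictured as a table of code pairs that is read in either direction: forward (older to newer) or backward (newer to older). The Python sketch below assumes such a table. The codes are placeholders, not real Census, NAICS, SIC, or SOC codes, and `build_crosswalk` and `crosswalk` are hypothetical names. Because one code can split into several codes, or several codes can merge into one, each lookup returns a list.

```python
from collections import defaultdict

# Hypothetical crosswalk rows: (code in the older scheme, code in the newer scheme).
# Placeholder codes only; real crosswalks are published tables.
CROSSWALK_ROWS = [
    ("OLD-100", "NEW-100"),  # one-to-one
    ("OLD-200", "NEW-210"),  # one old code split into two new codes
    ("OLD-200", "NEW-220"),
    ("OLD-300", "NEW-300"),  # two old codes merged into one new code
    ("OLD-310", "NEW-300"),
]


def build_crosswalk(rows):
    """Return forward (old -> new) and backward (new -> old) lookup tables."""
    forward = defaultdict(set)
    backward = defaultdict(set)
    for old, new in rows:
        forward[old].add(new)
        backward[new].add(old)
    return forward, backward


def crosswalk(code, lookup):
    """Map one code; an empty list means the table has no mapping for it."""
    return sorted(lookup.get(code, set()))


forward, backward = build_crosswalk(CROSSWALK_ROWS)
print(crosswalk("OLD-200", forward))   # ['NEW-210', 'NEW-220']
print(crosswalk("NEW-300", backward))  # ['OLD-300', 'OLD-310']
```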

How NIOCCS Works

  1. The user uploads the file to be coded. Minimum required fields: Record ID, Industry text, Occupation text.
  2. Data is processed by the NIOCCS coding engine.
  3. Records are flagged as autocoded or needing manual coding.
  4. Using the tools in the Computer-Assisted Coding Interface of NIOCCS, the user selects codes for records that need manual coding.
  5. The user downloads the coded file. The output file contains the input fields plus the Census industry code, Census occupation code, NAICS code, SOC code, and flags indicating which records were autocoded.
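
Before step 1, it can help to check locally that a file carries the three minimum fields. The sketch below assumes a comma-delimited file with a header row and uses hypothetical column names (`record_id`, `industry_text`, `occupation_text`); the actual NIOCCS input layout is defined in the NIOCCS user documentation.

```python
import csv
import io

# Hypothetical column names for the three minimum fields.
REQUIRED_FIELDS = ("record_id", "industry_text", "occupation_text")


def check_minimum_fields(csv_text):
    """Parse delimited text and report missing columns or empty record IDs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_FIELDS if name not in header]
    if missing:
        return [], [f"missing column: {name}" for name in missing]

    rows, problems = [], []
    for row_no, row in enumerate(reader, start=1):  # counts data rows only
        if not (row["record_id"] or "").strip():
            problems.append(f"row {row_no}: empty record_id")
        rows.append(row)
    return rows, problems


sample = (
    "record_id,industry_text,occupation_text\n"
    "1,MAIL DELIVERY,POSTAL TRUCK DRIVER\n"
    ",BEER,DRIVER\n"
)
rows, problems = check_minimum_fields(sample)
print(len(rows), problems)  # 2 ['row 2: empty record_id']
```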

Autocoding Process Overview

The NIOCCS industry and occupation coding process is based upon the U.S. Census Bureau Industry and Occupation Alphabetical Indexes supplemented with special codes developed by CDC/NIOSH for non-paid workers, non-workers, and the military (see NIOSH industry and occupation coding documentation for more information).

The Census Alphabetical Index of Industries and Occupations lists the industry and occupation titles used most often in the economy. These indexes were developed by the U.S. Census Bureau to classify a respondent's industry and occupation as reported in Census Bureau demographic surveys. They list over 21,000 industry titles and 31,000 occupation titles in alphabetical order. Each title is assigned a Census Industry Code or Census Occupation Code, along with the associated North American Industry Classification System (NAICS) code or Standard Occupational Classification (SOC) code. For more detailed information about the Census Alphabetical Indexes, go to the U.S. Census Bureau website at: https://www.census.gov/topics/employment/industry-occupation/guidance/indexes.html

NIOCCS codes input industry and occupation narratives (or NAICS codes in place of industry narratives) to Census industry and occupation codes for the user-specified target Census Industry and Occupation Classification year. Once a record is coded, the NAICS and SOC codes associated with the selected Census codes in the alphabetical index are included in the output.
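
Because each index line carries both a Census code and its associated NAICS or SOC code, building an output record can be pictured as copying the input fields and appending the codes from the two selected index lines. In the sketch below, `IndexLine`, `build_output_row`, the output field names, and all code values are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexLine:
    """One alphabetical-index line (hypothetical structure)."""
    title: str
    census_code: str
    linked_code: str  # NAICS code for industry lines, SOC code for occupation lines


def build_output_row(record, industry_line, occupation_line, autocoded):
    """Copy the input fields and append the codes carried by the chosen index lines."""
    out = dict(record)  # keep every input field unchanged
    out.update(
        census_industry=industry_line.census_code,
        naics=industry_line.linked_code,
        census_occupation=occupation_line.census_code,
        soc=occupation_line.linked_code,
        autocoded=autocoded,
    )
    return out


record = {"record_id": "1", "industry_text": "MAIL DELIVERY",
          "occupation_text": "POSTAL TRUCK DRIVER"}
ind = IndexLine("MAIL DELIVERY", "IND-0001", "NAICS-PLACEHOLDER")
occ = IndexLine("TRUCK DRIVER", "OCC-0001", "SOC-PLACEHOLDER")
row = build_output_row(record, ind, occ, autocoded=True)
```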

Industry and Occupation Candidate Pairs Generated

Standardized industry and occupation narrative inputs are used by the NIOCCS candidate generation process to produce lists of candidate lines from their respective alphabetic indexes.

For inputs that use NAICS codes rather than industry narratives, the input NAICS codes are validated and automatically crosswalked to the equivalent NAICS codes for the specified NAICS publication year. The resulting crosswalked NAICS codes serve as the industry candidate lines for this type of input.
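
The validate-then-crosswalk step for NAICS input might look like the sketch below. It checks only the format (NAICS codes are 2- to 6-digit hierarchical codes) and then looks the code up in a concordance table. The table, its placeholder codes, and the assumption that the caller supplies the source NAICS edition are all hypothetical; the source does not say how NIOCCS determines the input edition.

```python
import re

# Format check only: 2 to 6 digits.
NAICS_FORMAT = re.compile(r"[0-9]{2,6}")

# Hypothetical concordance between NAICS editions; placeholder codes only.
NAICS_CROSSWALK = {
    ("2002", "2007"): {
        "100001": ["100001"],            # unchanged between editions
        "100002": ["200021", "200022"],  # split into two codes
    },
}


def naics_candidates(code, source_year, target_year):
    """Validate an input NAICS code and return its equivalents in target_year."""
    code = code.strip()
    if not NAICS_FORMAT.fullmatch(code):
        raise ValueError(f"not a NAICS-format code: {code!r}")
    if source_year == target_year:
        return [code]
    table = NAICS_CROSSWALK.get((source_year, target_year), {})
    return list(table.get(code, []))  # empty list: no equivalent found
```

Each returned code would then act as one industry candidate line.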

In the NIOCCS industry and occupation candidate generation process:

  1. The alphabetic index dictionary is searched for possible matches using words in the standardized industry and occupation narrative inputs.
  2. Matches between words in the narrative inputs and words in the index titles are used to select industry and occupation candidate lines from the Census alphabetic indexes.
  3. The words in the narrative inputs are compared to the words in the index titles for the selected industry and occupation candidate lines.
  4. The presence or absence of words in the comparison is used to score these selected industry and occupation candidate lines, and low-scoring candidate lines are dropped from consideration.
  5. All industry candidate lines are paired with all occupation candidate lines to form a list of possible industry and occupation pairs. Each pair's score is determined by combining its industry candidate line score with its occupation candidate line score.
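
The five steps above can be sketched as follows. This is an illustration under stated assumptions, not the NIOCCS scoring model: standardization is assumed to be uppercasing plus whitespace splitting, the penalty weights and drop threshold are hypothetical, and pair scores are assumed to combine by simple addition. The penalties mirror the behavior described in the worked examples: input words missing from a title, and extra title words, both lower a line's score.

```python
from collections import defaultdict
from itertools import product


def standardize(text):
    # Assumed standardization: uppercase and split on whitespace.
    return set(text.upper().split())


def build_word_index(titles):
    """Step 1 helper: map each word to the positions of the titles containing it."""
    index = defaultdict(set)
    for pos, title in enumerate(titles):
        for word in standardize(title):
            index[word].add(pos)
    return index


def score(input_words, title_words, missing_penalty=1.0, extra_penalty=0.5):
    """Steps 3-4: reward shared words, penalize absent and extra words (hypothetical weights)."""
    shared = len(input_words & title_words)
    missing = len(input_words - title_words)
    extra = len(title_words - input_words)
    return shared - missing_penalty * missing - extra_penalty * extra


def candidates(narrative, titles, index, min_score=0.0):
    """Steps 1-4: select lines sharing any word with the narrative, score them, drop low scorers."""
    words = standardize(narrative)
    selected = set()
    for word in words:
        selected |= index.get(word, set())
    scored = [(titles[pos], score(words, standardize(titles[pos]))) for pos in sorted(selected)]
    return [(title, s) for title, s in scored if s >= min_score]


def pair_candidates(industry_cands, occupation_cands):
    """Step 5: pair every industry line with every occupation line; sum the two scores."""
    pairs = [((ind, occ), s_ind + s_occ)
             for (ind, s_ind), (occ, s_occ) in product(industry_cands, occupation_cands)]
    return sorted(pairs, key=lambda p: p[1], reverse=True)


# Toy index titles, not real Census index content.
IND_TITLES = ["MAIL DELIVERY", "MAIL ORDER HOUSES", "DELIVERY SERVICE"]
OCC_TITLES = ["TRUCK DRIVER", "TAXI DRIVER", "POSTAL CLERK"]
ind = candidates("mail delivery", IND_TITLES, build_word_index(IND_TITLES))
occ = candidates("postal truck driver", OCC_TITLES, build_word_index(OCC_TITLES))
print(pair_candidates(ind, occ))  # [(('MAIL DELIVERY', 'TRUCK DRIVER'), 3.0)]
```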

Industry and Occupation Restrictions Applied

Codes in the Census Alphabetical Indexes sometimes have restriction rules associated with them. These rules ensure that an occupation code selected for a given industry is valid for that industry, and vice versa for industry codes.

The index restrictions are examined for each industry and occupation candidate pair. If a pair violates an index restriction rule, it is dropped from consideration for autocoding.
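
A simplified way to picture restriction checking is to give each index line an optional set of codes from the other dimension that it may be paired with. The sketch below makes that assumption (real Census restrictions can be richer, for example class-of-worker restrictions such as OBNI or PR); `Line`, `pair_is_valid`, `apply_restrictions`, and all codes are hypothetical. The toy data echoes the semi-truck versus light-truck situation in example #1.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Line:
    title: str
    code: str
    # Hypothetical restriction: codes from the *other* dimension this line may be
    # paired with. None means the line carries no restriction.
    allowed: Optional[FrozenSet[str]] = None


def pair_is_valid(industry: Line, occupation: Line) -> bool:
    """A pair survives only if neither line's restriction excludes the other line's code."""
    industry_ok = industry.allowed is None or occupation.code in industry.allowed
    occupation_ok = occupation.allowed is None or industry.code in occupation.allowed
    return industry_ok and occupation_ok


def apply_restrictions(pairs):
    """Drop (industry, occupation, score) tuples that violate a restriction."""
    return [(ind, occ, s) for ind, occ, s in pairs if pair_is_valid(ind, occ)]


mail = Line("MAIL DELIVERY", "IND-A")
semi_truck = Line("TRUCK DRIVER", "OCC-1", frozenset({"IND-B"}))  # not allowed with IND-A
light_truck = Line("TRUCK DRIVER", "OCC-2", frozenset({"IND-A"}))
survivors = apply_restrictions([(mail, semi_truck, 3.0), (mail, light_truck, 3.0)])
print([occ.code for _, occ, _ in survivors])  # ['OCC-2']
```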

Learn more about Census Alphabetical Index industry and occupation restrictions, or review the Industry and Occupation Coding Instruction Manuals.

Does a Single Highest-Scoring Industry and Occupation Pair Remain?

If a single remaining industry and occupation pair has the highest remaining pair score, then the autocoder selects that industry and occupation pair.

If more than one of the remaining industry and occupation pairs share the highest remaining score, tiebreakers are used where possible to select one of these highest-scoring pairs.
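
The check for a single highest-scoring pair can be sketched as below. The function name `select_top` and the `(pair_label, score)` representation are hypothetical; `math.isclose` is used on the assumption that scores are floating-point sums, so tiny rounding differences should not break a genuine tie.

```python
import math


def select_top(pairs):
    """
    pairs: list of (pair_label, score).
    Returns (winner, tied): winner is the single highest-scoring pair, or None
    when several pairs share the top score; tied lists every top-scoring pair.
    """
    if not pairs:
        return None, []
    best = max(score for _, score in pairs)
    # isclose guards against float rounding when scores are sums of weights
    tied = [p for p in pairs if math.isclose(p[1], best)]
    winner = tied[0] if len(tied) == 1 else None
    return winner, tied
```

When `winner` is None, the pairs in `tied` are the ones passed on to the tiebreakers.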

Tiebreaker Rules Applied

Tiebreaker #1: Census industry and occupation coding rules are applied. For example, Census coding rules specify the default selection of occupation index lines with “own business not incorporated” (OBNI) or “private company” (PR) industry restrictions for certain occupation titles; see the Census Industry and Occupation Coding Instruction Manual (http://www.cdc.gov/niosh/topics/coding/nioccsuserdocumentation.html).

Tiebreaker #2: Bureau of Labor Statistics (BLS) occupation employment totals for each pair’s NAICS/SOC combination are examined. Under some circumstances, a pair whose employment total is much larger than those of the other best-scoring pairs may be selected.

Industry and occupation inputs for which a pair is selected are autocoded. Otherwise, the industry and occupation candidate lines and pairs are saved for review and possible selection during manual coding with the computer-assisted coding features of the system.
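
The BLS tiebreaker and the final autocode-or-manual decision might be sketched as follows. The source does not say how much larger an employment total must be, or under which circumstances it is used, so the `ratio=10.0` threshold is a purely hypothetical stand-in for "much more frequently". Tiebreaker #1 (Census coding rules) is not modeled, and the employment totals and codes in any usage are placeholders, not BLS data.

```python
def bls_tiebreak(tied, employment, ratio=10.0):
    """
    tied: (naics, soc) combinations that share the top score.
    employment: hypothetical mapping (naics, soc) -> BLS employment total.
    Returns the combination whose total is at least `ratio` times the next
    largest total, or None when no combination clearly dominates.
    """
    totals = sorted(((employment.get(c, 0), c) for c in tied), reverse=True)
    if not totals:
        return None
    if len(totals) == 1:
        return totals[0][1]
    (top_total, top), (next_total, _) = totals[0], totals[1]
    if top_total > 0 and top_total >= ratio * next_total:
        return top
    return None


def decide(tied, employment):
    """Autocode when one combination remains or wins the tiebreak; otherwise defer."""
    if len(tied) == 1:
        return "autocoded", tied[0]
    winner = bls_tiebreak(tied, employment)
    if winner is not None:
        return "autocoded", winner
    return "manual", tied  # saved for computer-assisted coding
```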

Example #1

Industry = MAIL DELIVERY
Occupation = POSTAL TRUCK DRIVER

The autocoder selects all industry lines containing “MAIL” and all industry lines containing “DELIVERY”. All lines that do not contain both “MAIL” and “DELIVERY” are penalized. There is a “MAIL DELIVERY” line in the industry index, and it receives the highest score.

The autocoder selects all occupation lines containing “POSTAL”, all occupation lines containing “TRUCK”, and all occupation lines containing “DRIVER”. There isn’t a “POSTAL TRUCK DRIVER” line in the occupation index, so all selected occupation lines are penalized. There are several “TRUCK DRIVER” occupation lines in the occupation index, and these lines receive the highest score. The “TRUCK DRIVER” lines are for different types of truck drivers, including drivers of semi-trucks and drivers of light trucks used for delivery, such as postal delivery trucks. These “TRUCK DRIVER” lines differ from each other in the occupation codes they categorize and in their industry restrictions.

All selected industry lines are paired with all selected occupation lines, and the pairs are scored by combining their industry line score and their occupation line score. The highest-scoring pairs are the “MAIL DELIVERY” and “TRUCK DRIVER” pairs. Restrictions are applied to the highest-scoring pairs, and pairs associated with “TRUCK DRIVER” occupation lines for drivers of semi-trucks are eliminated because their industry restrictions do not include the “MAIL DELIVERY” industry line’s industry code.

Once restrictions are applied to the highest-scoring pairs, only one pair remains, and the industry and occupation narrative inputs are autocoded to the industry and occupation codes represented by that remaining pair.

Example #2

Industry = BEER
Occupation = DRIVER

The autocoder selects all industry lines containing “BEER”. All lines containing more words than just “BEER”, such as “ROOT BEER”, are penalized.  There are 3 “BEER” lines in the industry index: beer manufacturing, beer wholesale and beer retail. These 3 “BEER” lines all receive the highest score.

The autocoder selects all occupation lines containing “DRIVER”. All lines containing more words than just “DRIVER”, such as “TRUCK DRIVER”, are penalized. There are several “DRIVER” occupation lines in the occupation index, and these lines receive the highest score. As with “TRUCK DRIVER” in example #1, the “DRIVER” lines are for different types of drivers, including drivers of semi-trucks, drivers of light trucks used for delivery, and drivers of taxicabs. These “DRIVER” lines differ from each other in the occupation codes they categorize and in their industry restrictions.

All selected industry lines are paired with all selected occupation lines, and the pairs are scored by combining their industry line score and their occupation line score. Restrictions are applied to the highest-scoring pairs.

Once restrictions are applied, only 3 pairs remain. There isn’t enough information for the autocoder to select between beer manufacturing, beer wholesale and beer retail. Therefore, the industry and occupation narrative inputs are not autocoded, and the selected industry and occupation index lines and industry and occupation pairs are saved for review and possible selection during manual coding using the computer-assisted coding features of NIOCCS.

Limitations

Speed

The user's internet bandwidth significantly affects the responsiveness of computer-assisted coding.
Autocoding may take a significant amount of time when the volume of data to be coded is very large. Turnaround time also depends on how many coding jobs are in the queue.

File Size Limitations

Upload file size is currently (as of December 2017) limited to 30MB. The number of records this equates to varies with how many of the optional fields in the input file format are used. For files that use the slim file format (ID, Industry, Occupation), it equates to approximately 300,000 records. However, it is recommended that each submission be limited to no more than 100,000 records; larger files diminish the performance of the computer-assisted coding user interface.
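
As a rough check, 30 MB for about 300,000 slim-format records works out to roughly 100 bytes per record (taking 1 MB as 1,000,000 bytes). To stay within the recommended 100,000 records per submission, a large file could be split before upload. The sketch below assumes a comma-delimited file with a single header row, which may not match the actual NIOCCS input format; `batched` and `split_csv` are hypothetical helper names.

```python
import csv
from itertools import islice
from pathlib import Path

# Recommended maximum number of records per submission.
MAX_RECORDS = 100_000


def batched(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def split_csv(src, out_dir, size=MAX_RECORDS):
    """Split a CSV with a header row into numbered part files, repeating the header in each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:  # empty file: nothing to split
            return written
        for part, batch in enumerate(batched(reader, size), start=1):
            path = out_dir / f"{Path(src).stem}_part{part:03d}.csv"
            with open(path, "w", newline="") as out:
                writer = csv.writer(out)
                writer.writerow(header)
                writer.writerows(batch)
            written.append(path)
    return written
```

For example, `split_csv("records.csv", "parts")` would write `parts/records_part001.csv`, `parts/records_part002.csv`, and so on, each holding at most 100,000 data rows.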