At a glance
- The Research Data Center (RDC) must review and approve any research output that you want to release.
- You must follow the RDC output rules outlined on this page to receive output.
- Some types of output cannot be released.
Overview
The National Center for Health Statistics (NCHS) has established Research Data Center (RDC) output policy and procedures to protect the confidentiality of study participants' identities. Researchers may receive the output their analysis generates, such as statistics and estimates, if the risk of disclosing confidential information is low.
The NCHS RDC will review your output for disclosure risk. You will not receive your output until it has been reviewed and approved by the RDC.
Output rules
These RDC output rules minimize disclosure risk of confidential information.
Rule 1. The creation of a data set is not appropriate output and will not be released.
Rule 2. Absolutely no output will leave the RDC facilities without an RDC analyst first reviewing it. Your RDC analyst will review the output for possible disclosures of confidential information.
Rule 3. Output submitted for review must match the research questions and the output described in the approved application.
Rule 4. Review and release of intermediate output is not allowed. If the output described in your approved application is later determined to be intermediate output, the RDC will not release it. You must constrain your output to what you need for a final research paper or journal article.
Rule 5. RDC analysts may apply cell suppression criteria. Guidelines may differ by data system and possibly by survey year because of sample size, sample design, and content. Sometimes, specific projects have additional disclosure risk that requires additional or more stringent cell suppression.
Rule 6. Researchers must plan ahead to allow the amount of time needed by the RDC to review and approve output. The RDC will usually return your approved output via email within 3 weeks of your request. Lengthy output not intended for a standard journal article or presentation may take more than 3 weeks to review.
Rule 7. Researchers may only use their output and statistics in a way that does not pose any additional disclosure risks to study participants.
Intermediate output
Intermediate output poses a disclosure risk. As a result, you must constrain your output to what you need for a final research paper or journal article. You can create and use intermediate output onsite at the RDC. However, the RDC does not allow release of intermediate output.
Examples of intermediate output include—
- Tables of unweighted n values
- Tables of preliminary descriptive statistics
- Large volumes of numbers or estimates
- Large volumes of initial and intermediate regression models
- Large volumes of tables based on different subsamples*
*Similar tables based on different subsamples may cause complementary disclosure problems. Comparison across tables could reveal information about the sample and individual characteristics.
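As a rough illustration of the complementary disclosure problem described in the footnote, the following Python sketch uses invented counts (not drawn from any NCHS data). It shows how subtracting a subsample table from a full-sample table yields counts for the remaining records, which can include small cells even though neither table contains one. The function name, the category labels, and the threshold of 5 (borrowed from the collapse rule in the output preparation steps) are assumptions made for this example. Actual RDC review criteria may differ.

```python
# Hypothetical cell counts by category; every number here is invented.
full_sample = {"A": 120, "B": 45, "C": 30}
subsample = {"A": 118, "B": 40, "C": 12}  # same table, restricted to one subgroup

def implied_complement(full, part):
    """Return the counts for records in `full` that are not in `part`."""
    return {key: full[key] - part[key] for key in full}

# Neither table has a cell below 5, but their difference does.
complement = implied_complement(full_sample, subsample)
small_cells = {key: n for key, n in complement.items() if 0 < n < 5}

print(complement)   # counts implied for the records outside the subsample
print(small_cells)  # cells a reader could derive that fall below 5
```

In this made-up case, category A has only 2 records outside the subsample, a count that a reader could recover by comparing the two tables.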
Output requirements
You must follow the rules and complete the steps on this page to request and receive approval of your output. You also must follow the steps outlined in the Disclosure Manual.
If you do not complete these steps, you may need to return to the RDC to redo your output or amend your application. The RDC will not review output that does not match the output described in an approved application. This ensures that your output conforms to RDC policy.
Preparing your output for review
Complete these steps before submitting your output for review.
Step 1. Make sure your output is in a format that RDC staff can open. Your output must be in a human-readable plain text file format. Examples include files that Windows Notepad can open and read, such as tab-delimited text (.txt) files or comma-separated values (.csv) files. The RDC will not accept output in other file formats.
Step 2. Provide the actual tables that will appear in your publication. They must match what you included in your approved application.
Step 3. Remove any output that could lead to the identification of an individual survey participant or institution. If you have questions about what types of output could lead to identification, contact your RDC analyst.
Step 4. Remove any individual-level data from your output. Output that has individual record-level information is not permitted.
Step 5. Remove extreme values and values representing an individual participant. Examples include minima, maxima, medians, and modes. If a procedure, such as PROC UNIVARIATE in SAS, lists extreme observations, remove those extreme values. Examples of extreme values include the 0th, 1st, 99th, and 100th percentiles.
Step 6. Collapse variable categories if any category has a frequency of less than 5. If you are unable to recategorize, replace every cell that has a frequency of less than 5 with an asterisk (*). Complete this before you submit the output to your RDC analyst.
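The fallback in Step 6 (marking small cells with an asterisk) and the plain text requirement in Step 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an RDC-provided tool: the function names, the age categories, and the frequencies are hypothetical, and your RDC analyst may apply stricter suppression criteria than the threshold of 5 used here.

```python
import csv
import io

THRESHOLD = 5  # from Step 6; actual criteria may be stricter for some projects

def suppress_small_cells(table, threshold=THRESHOLD):
    """Replace every frequency below `threshold` with an asterisk."""
    return {category: (count if count >= threshold else "*")
            for category, count in table.items()}

def to_csv_text(table):
    """Render the table as comma-separated plain text (see Step 1)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["category", "frequency"])
    for category, value in table.items():
        writer.writerow([category, value])
    return buffer.getvalue()

# Hypothetical frequencies, invented for illustration only.
frequencies = {"18-29": 42, "30-44": 3, "45-64": 57, "65+": 4}
suppressed = suppress_small_cells(frequencies)
print(to_csv_text(suppressed))
```

Collapsing categories remains the first choice under Step 6. This sketch covers only the case where recategorizing is not possible.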
Submitting output for review
Send a request to your RDC analyst when output is ready for review.
In your request, provide—
- A description of the output.
- For example, your title may begin "A regression of..."
- Descriptions of any samples and subsamples used in the analysis and output
- For example, black males ages 20–29