1999-2005 NHDS Pollution Exposure Data File
EPA Air Quality Data for linkage to the National Hospital Discharge Survey
Methodology and Description of the Data File
The National Center for Health Statistics (NCHS) has created data files with air pollution information, available from the Environmental Protection Agency, specifically designed to be linked to NHDS data files. The NHDS Pollution Exposure data files contain the NHDS admission dates and inpatient zip-code of residence with corresponding air pollution exposures. These air pollution exposures can be linked to restricted access NHDS data files by zip-code and admission date to obtain an analytic data file with both air pollution exposures and discharge information. The linked analytic data file can be used for examining associations between air quality and hospitalizations in the United States.
Because residential zip-code and admission date are confidential variables, the NHDS Pollution Exposure data files and the restricted access NHDS data files are available through the NCHS Research Data Center (RDC). Information about the RDC is available at the Research Data Center website.
National Hospital Discharge Survey
The NHDS is a national probability survey designed to provide information on characteristics of inpatients discharged from non-Federal short-stay hospitals in the United States. The NHDS collects data from a sample of approximately 300,000 inpatient records acquired from a national sample of about 500 hospitals. More information about the NHDS can be found at the National Hospital Discharge Survey website.
NHDS Linkage Variables
Air pollution exposures were calculated for the specific zip-codes of residence and dates of admission in the NHDS discharge records. Although exact dates and inpatient addresses are not available on public-use NHDS data files, in-house data files, available under restricted conditions, include residential zip-codes and admission dates.
We assigned each discharge record a residential location defined by the coordinates (latitude and longitude) of the geographic centroid of its zip-code at the time of admission. Zip-codes are defined to facilitate the delivery of mail, which has several consequences: 1) some zip-codes are designated for a single building with a high volume of mail, while others cover large geographic areas; 2) zip-codes can cross boundaries of Census geographic units, including county and state borders; and 3) mail delivery routes change frequently to keep delivery efficient, so zip-code numbers are reused, renumbered, or discontinued as their boundaries change. To address temporal change in zip-code data, we used earlier versions of Centrus™ software, and the databases in those versions, to assign geographic centroids for earlier data years.
Some discharge records were missing zip-code and others had zip-codes that could not be assigned a geographic centroid. Table 1 shows the percentage of discharges with assigned zip-codes by year; between 95% and 98% of records have a valid zip-code.
Of the records with valid zip-codes, date of admission is missing or implausible (e.g. February 31) for approximately 15 to 20 percent (Table 1). The percentage of records with valid dates is lower for the later years of NHDS than for the earlier years.
Because the NHDS Air Pollution Exposure Data File is designed specifically to be linked to the NHDS, it does not contain exposure estimates for every day of the year or every zip-code in the United States, only for the admission dates and zip-codes in the NHDS. There is one exception, however. Because some analyses can be conducted with an approximate admission date, the 15th of each month was used as an additional reference point. Analysts can use it to assign approximate exposure to discharge records that have a month of admission but lack a plausible date of admission.
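The choice of reference date described above can be sketched as follows. This is a minimal illustration, not the NCHS implementation; the function name and the returned "approximate" flag are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple


def reference_date(
    year: int, month: Optional[int], day: Optional[int]
) -> Optional[Tuple[date, bool]]:
    """Return (date, is_approximate) for a discharge record, or None.

    A plausible admission date is used as-is. If only the month is usable,
    the 15th of that month is returned and flagged as approximate.
    """
    if month is None or not 1 <= month <= 12:
        return None  # no usable month, so no reference date
    if day is not None:
        try:
            return date(year, month, day), False
        except ValueError:
            pass  # implausible date such as February 31
    return date(year, month, 15), True
```

For example, a record dated February 31, 2005 would fall back to February 15, 2005 with the approximate flag set.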
EPA Air Monitoring Data
Pollution measurements at air monitoring locations were obtained from the EPA and are available at: http://www.epa.gov/ttn/airs/airsaqs/detaildata/downloadaqsdata.htm. Briefly, air pollution monitors that record measurements for carbon monoxide (CO), sulfur dioxide (SO2), nitrogen dioxide (NO2), ozone (O3), and particulate matter (PM2.5 and PM10) were included. Experimental and source-oriented monitors were excluded.
Air monitoring data were categorized by location and date for linkage with the NHDS. The location of each monitor is identified by its longitude and latitude. Pollution measurements are recorded by specific date and intervals (hourly, daily).
Air Pollution Exposure
For each zip-code centroid and date of admission on the NHDS discharge data file, pollution measurements for the 7 days and the 6 weeks prior to the date of admission were averaged across monitors within 5 miles and within 20 miles of the zip-code centroid. The two time periods were chosen to approximate acute and short-term chronic exposure. The two radii were chosen to provide both the possibly more precise exposure from the 5 mile estimate and the more broadly defined exposure from the wider area; more discharges are linked to the 20 mile exposure than to the 5 mile exposure.
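Selecting the monitors that fall within each radius of a centroid might look like the sketch below. The text does not say how distances were computed; great-circle (haversine) distance and the Earth radius constant are assumptions of this example, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumption of this sketch


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def monitors_within(centroid, monitors, radius_miles):
    """Return ids of monitors no farther than radius_miles from the centroid.

    centroid is (lat, lon); monitors maps monitor id -> (lat, lon).
    """
    lat, lon = centroid
    return [
        monitor_id
        for monitor_id, (mlat, mlon) in monitors.items()
        if haversine_miles(lat, lon, mlat, mlon) <= radius_miles
    ]
```

Calling `monitors_within` once with a radius of 5 and once with a radius of 20 yields the two monitor sets; the 5 mile set is always a subset of the 20 mile set, which is why fewer discharges link at the smaller radius.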
For the pollutants measured hourly (CO, SO2, NO2, O3), daily averages were calculated and then used to calculate the 7 day estimates. The 7 day estimates were combined to calculate the 6 week exposure estimates. Three completeness rules applied: if fewer than 18 observations were recorded for a day, the daily average was set to missing; if fewer than 4 days in a 7 day period had daily averages, the 7 day exposure estimate was set to missing; and if fewer than 4 of the 7 day periods in the preceding 6 week period had 7 day estimates, the 6 week exposure estimate was set to missing.
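The completeness thresholds for hourly pollutants can be illustrated with a short sketch. It assumes that missing values are represented as None, that the 18-observation rule counts hourly readings within one day, and that 7 day estimates are "combined" by taking their simple mean; the text does not confirm the last point. All names are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence

MIN_HOURS_PER_DAY = 18   # hourly observations needed for a daily average
MIN_DAYS_PER_WEEK = 4    # daily averages needed for a 7 day estimate
MIN_WEEKS_PER_SIX = 4    # 7 day estimates needed for a 6 week estimate


def _mean_if_enough(values: Sequence[Optional[float]], minimum: int) -> Optional[float]:
    """Mean of the non-missing values, or None if fewer than `minimum` exist."""
    present = [v for v in values if v is not None]
    return mean(present) if len(present) >= minimum else None


def daily_average(hourly: Sequence[Optional[float]]) -> Optional[float]:
    return _mean_if_enough(hourly, MIN_HOURS_PER_DAY)


def seven_day_estimate(daily: Sequence[Optional[float]]) -> Optional[float]:
    return _mean_if_enough(daily, MIN_DAYS_PER_WEEK)


def six_week_estimate(weekly: Sequence[Optional[float]]) -> Optional[float]:
    # Assumption: the six 7 day estimates are combined by a simple mean.
    return _mean_if_enough(weekly, MIN_WEEKS_PER_SIX)
```

Each level feeds the next: a day with too few hourly readings becomes a missing daily average, which in turn counts against the 4-day requirement for that week.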
For particulate matter, measured on a daily schedule, a single daily measurement within a 7 day period was sufficient to assign the 7 day estimate. If fewer than 4 weeks in the preceding 6 week period had 7 day estimates, then the 6 week exposure estimate was set to missing.
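A sketch of the particulate matter rule, starting from 42 daily values for the 6 weeks before admission, is shown below. It assumes missing days are None, that the 42 days are split into six consecutive 7 day blocks, and that estimates are simple means; the block alignment and the function names are assumptions of this example rather than documented details.

```python
from statistics import mean
from typing import List, Optional, Sequence


def pm_seven_day_estimate(daily: Sequence[Optional[float]]) -> Optional[float]:
    """Mean of available daily values; one measurement is enough."""
    present = [v for v in daily if v is not None]
    return mean(present) if present else None


def pm_six_week_estimate(daily_42: Sequence[Optional[float]]) -> Optional[float]:
    """6 week estimate from 42 daily values, needing at least 4 non-missing weeks."""
    if len(daily_42) != 42:
        raise ValueError("expected 42 daily values (6 weeks)")
    weekly: List[Optional[float]] = [
        pm_seven_day_estimate(daily_42[start:start + 7]) for start in range(0, 42, 7)
    ]
    present = [w for w in weekly if w is not None]
    return mean(present) if len(present) >= 4 else None
```

Because many particulate monitors do not report every day, the one-measurement rule keeps far more 7 day estimates than the 4-day rule used for hourly pollutants would.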
Exposures for CO, SO2, NO2, and O3 were averaged using distance weighting, where measurements from monitors closer to the zip-code center contributed more than those farther away. Averages for particles (PM2.5 and PM10) were calculated without distance weights because particulate matter is more homogeneously distributed. Because not all NHDS inpatients live near monitors, most discharge records are missing values for one or more pollutants. More records have values for exposures calculated using 20-mile radii than using 5-mile radii.
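The difference between the weighted and unweighted averages can be sketched as follows. The text does not give the weighting function; inverse-distance weights, and the small floor on distance that avoids division by zero, are assumptions made purely for illustration. The function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Reading = Tuple[float, float]  # (measurement, distance from centroid in miles)


def distance_weighted_mean(
    readings: Sequence[Reading], min_distance: float = 0.1
) -> Optional[float]:
    """Average with weights 1/distance (assumed form), for CO, SO2, NO2, O3."""
    numerator = 0.0
    denominator = 0.0
    for value, distance in readings:
        weight = 1.0 / max(distance, min_distance)  # floor avoids dividing by zero
        numerator += weight * value
        denominator += weight
    return numerator / denominator if denominator else None


def unweighted_mean(readings: Sequence[Reading]) -> Optional[float]:
    """Plain average ignoring distance, as described for PM2.5 and PM10."""
    if not readings:
        return None
    return sum(value for value, _ in readings) / len(readings)
```

With one monitor reading 10 at 1 mile and another reading 20 at 4 miles, the weighted mean is pulled toward the closer monitor, while the unweighted mean sits midway between the two readings.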
Description of a Linked Analytic Data File
Exposure estimates in the NHDS Air Pollution Exposure Data File were merged with the NHDS by survey year, date of admission and zip-code to create a linked analytic data file. For this linkage, if a discharge record was missing date of admission but was not missing month of admission, exposures based on the 15th of the month were assigned for the 6 week period prior to admission but not for the 7 days prior to admission.
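The linkage rule above, including the 15th-of-month fallback, might be sketched like this. The record layout and all field names (survey_year, zip, admit_date, admit_year, admit_month, exp_7day, exp_6wk) are hypothetical and do not reflect the actual variable names on the files.

```python
from datetime import date


def link_discharge(discharge, exposures):
    """Attach exposure estimates to one discharge record.

    exposures maps (survey_year, zip, date) -> {"exp_7day": ..., "exp_6wk": ...}.
    Records with only a month of admission get the 6 week estimate for the
    15th of that month and no 7 day estimate.
    """
    key_base = (discharge["survey_year"], discharge["zip"])
    linked = dict(discharge, exp_7day=None, exp_6wk=None)

    admit = discharge.get("admit_date")
    if admit is not None:
        row = exposures.get(key_base + (admit,))
        if row is not None:
            linked["exp_7day"] = row["exp_7day"]
            linked["exp_6wk"] = row["exp_6wk"]
        return linked

    month = discharge.get("admit_month")
    if month is not None:
        mid_month = date(discharge["admit_year"], month, 15)
        row = exposures.get(key_base + (mid_month,))
        if row is not None:
            linked["exp_6wk"] = row["exp_6wk"]  # 7 day estimate stays missing
    return linked
```

Keeping unlinked records in the output with missing exposures, rather than dropping them, makes it easy to tabulate how many discharges could be linked.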
For analysis, all estimates were weighted using the sample weights and all standard errors were calculated using the survey design information available on the in-house files.
Because not all discharges can be assigned pollution exposures, the number of records for analysis is less than the total sample size and the weighted numbers of discharges on the linked file are not national estimates. Many of the summary statistics reported from the NHDS rely on Census population estimates to calculate discharge rates. However, because the weighted numbers of discharges on the linked files are not national estimates, Census population estimates cannot be used to calculate discharge rates using the linked files. Other potential statistical issues include spatial correlation among zip-code level exposures and sampling error. Consequently, analyses using these merged data must be undertaken with care. The survey design should be incorporated into analyses.
Table 2 shows the number of discharges with pollution measures by survey year, pollutant, and distance between monitors and zip-code centroid. The table shows variation in linkage by pollutant and distance. For example, more discharge records can be linked to PM2.5 exposure than can be linked to SO2 exposure. Fewer than half of the discharge records that can be linked to exposures from monitors within 20 miles can be linked to exposures from monitors within 5 miles.
Using data from 2005 as an example, Tables 3-8 show the number of discharges with pollution measures by pollutant, time period, and distance between monitors and zip-code centroid, for gender, age, race, region, length of stay, and expected source of payment; these variables are described in the NHDS public use data file documentation (information available at ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NHDS/). These tables demonstrate variability in the percentages of records linked by discharge characteristics. Similar tables for other data years are available upon request.
Data Availability and Use
The 1999-2005 NHDS Air Pollution Exposure Data Files and the restricted use NHDS data files are available through the NCHS Research Data Center (RDC). For more information on the NCHS RDC, go to the Research Data Center website. Records on the NHDS air pollution exposure data files are uniquely identified by NHDS year, zip-code and date of admission; these identification variables are needed to merge the air pollution variables to the restricted use NHDS files that contain the discharge information to create a linked analytic data file.
Table A lists the variables on the NHDS Air Pollution Exposure Data File that can be requested to be merged with the restricted use NHDS data files to create a linked data file for analysis.
Data users should contact Jennifer Parker (jdp3@cdc.gov) to provide specifications for a linked data file for use in the RDC.

