Important Analytic Considerations when Using the NHANES-CMS Linked Data
Purpose
This module provides information about important analytic considerations when using the NHANES-CMS linked data and key concepts about defining a study population.
Task 1: Considerations when Analyzing CMS Data
The advantages of CMS data are that they are population-based, not subject to recall bias, and can be linked to NCHS population health surveys to expand the analytic potential of both data sources. However, because CMS data are collected for the purpose of making healthcare payments, and not for research, there are some issues you should consider when analyzing the NHANES-CMS linked data.
Before you begin an analysis using the National Health and Nutrition Examination Survey (NHANES) and Centers for Medicare & Medicaid Services (CMS) linked data, it is important to consider a variety of analytic issues. In this task, some issues for you to consider prior to data analysis will be discussed (e.g., gaps in coverage, lack of claims data, unreported costs, and differences in state Medicaid programs).
It is helpful to understand the contents of the Medicare files. Refer to Course 1, Module 4, Task 1 for more information regarding the types of services which appear in the different claims files. A useful document to reference is the NCHS-CMS Medicare Matching Methodology and Analytic Considerations, which appears in the Resources section, below. You are also encouraged to visit the ResDAC website for more information on Medicare data. See the Resources section for the link to the website.
Information: Much of the information for this module is based on an NCHS technical paper, NCHS-CMS Medicare Matching Methodology and Analytic Considerations. The link is provided in the Resources section.
Importance of the Master Beneficiary Summary File
You will need the Master Beneficiary Summary File (MBSF) for the years of claims data that you wish to examine. The MBSF contains basic demographic and enrollment information about each beneficiary entitled to Medicare during each calendar year and is needed to construct an analytic data file, particularly to identify Medicare beneficiaries enrolled in a Medicare Advantage (MA) plan. MA plans, also referred to as Medicare Part C, include Health Maintenance Organizations (HMOs), Preferred Provider Organizations (PPOs), Private Fee-for-Service (PFFS) Plans, Special Needs Plans, and Medicare Medical Savings Account Plans.
Information: Although the MBSF contains gender, race, and date of birth variables, NCHS recommends that you use gender, race-ethnicity and date of birth provided by NHANES.
Exclusion of claims paid by a source other than Medicare (e.g., Medicare Part C plans)
Until recently, CMS generally did not receive claims data for Medicare beneficiaries who enroll in MA plans. However, there were a few exceptions. For example, all hospice claims are processed as Medicare claims regardless of whether the beneficiary is in a fee-for-service (FFS) or an MA plan. During the time covered by the linked data files, Medicare Advantage enrollment increased from approximately 18% of beneficiaries in 1999 to 28% in 2013.
In general, if your study is based on analyses of claims data, you should exclude MA enrollees from your beneficiary sample, and your analysis should note that the results may be representative only of the Medicare FFS population rather than the Medicare population at large. Variables on the MBSF enable you to identify, month-by-month, whether a beneficiary was enrolled in an FFS or an MA plan. For more information, refer to the Methodology and Analytic Guidelines.
You may wish to determine the approximate number of participants in your analyses after participants enrolled in MA plans have been excluded. Course 2, Module 9, Task 3 describes the process of determining your approximate sample size for proposed research projects using tables on the NCHS website. The link to the table showing percentage of NHANES survey participants who were enrolled in an MA plan by year and survey is provided in the Resources section.
If you are conducting a health outcome or epidemiologic study (as opposed to a health care utilization or cost study), an alternative approach for dealing with MA enrollees is to include them for the time period before they enter an MA plan and then censor them at the time they enter the plan.
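The censoring approach can be sketched in Python (the tutorial's sample programs use SAS). This is only an illustration under stated assumptions: the list `monthly_ma_flags` and its True/False coding are hypothetical stand-ins for the month-by-month MBSF enrollment variables, whose actual names and codes should be taken from the MBSF documentation.

```python
from typing import Optional, Sequence


def first_ma_month(monthly_ma_flags: Sequence[bool]) -> Optional[int]:
    """Return the 1-based index of the first month in an MA plan, or None."""
    for month, in_ma in enumerate(monthly_ma_flags, start=1):
        if in_ma:
            return month
    return None


def ffs_followup_months(monthly_ma_flags: Sequence[bool]) -> int:
    """Months of follow-up kept before censoring at MA entry."""
    first = first_ma_month(monthly_ma_flags)
    return len(monthly_ma_flags) if first is None else first - 1


# Hypothetical beneficiary: FFS in January through April, MA from May onward.
flags = [False] * 4 + [True] * 8
print(first_ma_month(flags), ffs_followup_months(flags))
```

For a cost or utilization study, the same flags could instead be used to drop any beneficiary with at least one MA month.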
Services not covered
Although Medicare covers a wide range of services, some health care services are not covered. Examples include routine physical exams, long-term care, and some cancer screening procedures. Because of these gaps in coverage, there are no claims records for these services or for certain time periods. You may find more information on what is not currently covered by Medicare in the “Medicare & You” handbook (link below in the Resources section).
Medicare data contain little information on prescription drugs prior to 2006. Beginning in 2006, prescription drug coverage for Medicare beneficiaries became available through the Medicare Part D program. Information about prescription drugs paid for by Medicare for 2006-2013 is available on the Part D Prescription Drug Event (PDE) File. Prescription drug information for data years 1999-2005 includes:
- Medication given in an inpatient/hospice/SNF setting, although specific medicines dispensed are rarely coded, if at all.
- Chemotherapy administered intravenously (IV), chemotherapy administered orally as a substitute for a medication that could be administered IV, and oral chemotherapeutic agents that break down to a compound comparable to a chemotherapeutic agent administered IV.
Medicare does not pay for chemotherapeutic agents that are administered exclusively in an oral form (e.g., Tamoxifen). Prior to 2006, most outpatient prescription drugs were not covered by Medicare.
Cost sharing
Medicare beneficiaries often have a number of cost sharing requirements (i.e., deductibles and coinsurance). Although claims are generated for services where beneficiary cost sharing is involved, the Medicare payment amount does not necessarily represent the full cost to the beneficiary for the service. It is not possible to determine whether the beneficiary paid the cost-sharing amount “out-of-pocket” or whether the cost-sharing was paid by a third party, such as a private supplemental policy or the Department of Veterans Affairs (VA).
The total payment on the claim refers to the Medicare payment, not all payments by third parties which may have been made for the service. It is possible that beneficiaries receive non-covered care, resulting in a total payment amount that is $0. However, this does not imply that the service was not received.
Discrepancies in the coverage period
Medicare enrollment and claims data linked to NCHS data are available for the years 1999-2013. Several of the surveys linked to the Medicare data, such as NHEFS (1971-1992) and NHANES III (1988-1994), have gaps of several years between the end of the study period and the beginning of the Medicare data.
Researchers should be aware that there may be differences in the availability of Medicare data on the linked files depending on the survey participant’s age and the year the survey was administered. Earlier Medicare data are only available for NCHS surveys that were included in the previous CMS linkage. For example, an NHIS participant who is 80 years old in 1999 at their interview and matched at some point to the 1999-2013 Medicare Enrollment Data Base (EDB) would only have Medicare data for 1999 to 2013. A similar NHIS participant, who was 80 years old at their interview in 1998 and linked to the 1999-2013 Medicare EDB, may have also linked in the previous Medicare linkage. This participant would hypothetically have Medicare data for 1991-1998 (available in a different format). This issue is particularly important for researchers to consider when combining data across survey and Medicare coverage years. Researchers need to determine how to address these discrepancies in coverage periods in their analyses.
Participants on the Master Beneficiary Summary File with no claims data
There may be instances where an NHANES participant is on the MBSF but there are no claims data. It is possible to be enrolled in Medicare but not utilizing Medicare services during the coverage period. In addition, there may be some record keeping inconsistencies because CMS data are collected for administrative, not research purposes.
Medicare entitlement and coverage variables
The MBSF includes three variables indicating Medicare entitlement: original reason for entitlement, current reason for entitlement, and Medicare status code.
A beneficiary’s original reason for Medicare entitlement is found in the variable BENE_ENTLMT_RSN_ORIG. This variable is coded by CMS using information provided by the Social Security Administration and/or Railroad Retirement Board. Knowing a beneficiary’s original reason for entitlement can be useful for identifying which aged beneficiaries were formerly Medicare disabled, since their cost and utilization profiles tend to differ from those of other aged beneficiaries, especially at ages 65-74. BENE_ENTLMT_RSN_ORIG values include: Old Age and Survivors Insurance (OASI), Disability Insurance Benefits (DIB) and End Stage Renal Disease (ESRD).
A beneficiary’s current reason for Medicare entitlement is found in the variable BENE_ENTLMT_RSN_CURR. Possible values include: Old Age and Survivors Insurance (OASI), Disability Insurance Benefits (DIB) and End Stage Renal Disease (ESRD). This variable is populated from the Medicare Enrollment Data Base (EDB). The EDB is a master enrollment file of all people ever entitled to Medicare. Many of the variables on the MBSF are extracted from the EDB. The EDB is not available to researchers.
The variable BENE_MDCR_STATUS_CD specifies the most recent status of the beneficiary’s entitlement to Medicare benefits. Medicare status code is a CMS coded variable that is created from the following variables available on the EDB: Age, original reason for entitlement, current reason for entitlement, and an indicator of End Stage Renal Disease (ESRD). Possible values include Aged without ESRD, Aged with ESRD, Disabled without ESRD, Disabled with ESRD, and ESRD only.
Impact of Prospective Payment System (PPS) on Medicare payments
Medicare’s PPS refers to a method of reimbursement where the Medicare payment is made based upon a predetermined, fixed amount. Medicare uses a separate PPS for each of several services, where the payment amount is derived from the classification system for that particular service (e.g., diagnosis-related groups for inpatient hospital services). Separate PPSs are used for reimbursement to acute inpatient hospitals, home health agencies, hospice, hospital outpatient, inpatient psychiatric facilities, inpatient rehabilitation facilities, long-term care hospitals, and skilled nursing facilities. PPS claims can be identified on the MedPAR file using the PPS claims indicator (PPS_IND_CD). For HHA and Outpatient claims, the variable CLM_PPS_IND_CD can be used. For more information on the PPS, please visit the websites referenced below in the Resources section.
Claims data on the Medicare Provider Analysis and Review (MedPAR) File
The MedPAR file includes all hospitalizations that had a discharge date during the calendar year and all SNF stays with an admission date during the calendar year. Hospital stays starting in one calendar year and continuing past the end of the calendar year are not provided on the MedPAR file until the year of discharge. To determine if a record is for a long- or short-stay hospitalization, use the short stay/long stay/SNF indicator variable SS_LS_SNF_IND_CD.
Each MedPAR record represents a stay in an inpatient “acute stay” or “long stay” hospital. An inpatient stay record summarizes all services rendered to a beneficiary from the time of admission to a facility through discharge. Each MedPAR record may represent one claim or multiple claims, depending on the length of a beneficiary’s stay and the amount of inpatient services used throughout the stay.
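The rule for which calendar-year MedPAR file holds a stay can be written as a small Python sketch (the tutorial's sample programs use SAS). The function name and the `is_snf` argument are hypothetical; in practice the stay type would be derived from SS_LS_SNF_IND_CD, whose coding is not given here.

```python
from datetime import date


def medpar_file_year(admission: date, discharge: date, is_snf: bool) -> int:
    """Calendar-year MedPAR file expected to hold the stay.

    Follows the rule stated in the text: hospital stays are filed by
    discharge year, SNF stays by admission year.
    """
    return admission.year if is_snf else discharge.year


# A stay that begins in late December 2010 and ends in early January 2011.
print(medpar_file_year(date(2010, 12, 28), date(2011, 1, 4), is_snf=False))
print(medpar_file_year(date(2010, 12, 28), date(2011, 1, 4), is_snf=True))
```

A hospital stay that spans the new year therefore appears only in the later year's file, which matters when you request a limited range of data years.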
The following fields on MedPAR files are not used for payment purposes and should be used with caution:
- Source of inpatient admission (SRC_IP_ADMSN_CD)
- Group health organization payment code (GHO_PD_CD)
You may wish to consider the portion of claims from home health agencies or hospice when determining the approximate number of participants in your analyses. Course 2, Module 9 describes the process of determining your approximate sample size for proposed research projects using tables on the NCHS website. Course 2, Module 8 describes the process of using the NHANES-CMS linked data feasibility files to determine if there are adequate numbers in your study population for an analysis.
Vital status
Mortality information is provided on the MBSF and MedPAR file. However, if the outcome of interest is mortality, researchers should use the NCHS Linked Mortality Files, which are described in Course 1, Module 2, Task 1 (linked below in the Resources section). No attempt has been made to reconcile inconsistent death information from CMS and these other sources.
Information: RDC research proposals that intend to analyze mortality outcomes should utilize death information from the NCHS linked mortality files.
Resources
- NCHS-CMS Medicare Matching Methodology and Analytic Considerations
- Linkage of NCHS Population Health Surveys to Administrative Records From Social Security Administration and Centers for Medicare & Medicaid Services
- Percentage of NHANES survey participants who were enrolled in a managed care plan by year and survey
- “Medicare & You” Handbook
- Medicare’s Prospective Payment Systems – General Information
- Research Data Assistance Center (ResDAC)
- ResDAC Technical Brief – TN-009: Medicare Managed Care Enrollees and the Medicare Utilization Files
- NCHS Linked Mortality Files
It is helpful to understand the contents of the MAX files. Refer to Course 1, Module 4, Task 1 for more information regarding the types of services which appear in the different claims files. A useful document to reference is the NCHS-CMS Medicaid Matching Methodology and Analytic Considerations, which appears in the Resources section below. You are also encouraged to visit the ResDAC website for more information on Medicaid data. See the Resources section for the link to the website.
Information: Much of the information for this module is based on an NCHS technical paper, NCHS-CMS Medicaid Matching Methodology and Analytic Considerations. The link is provided in the Resources section.
The advantages of Medicaid data are that they are population-based, not subject to recall bias, and can be linked to NCHS population health surveys to expand the analytic potential of both data sources. However, because Medicaid data were collected for the purpose of making healthcare payments, and not for research, there are certain considerations that should be taken into account when constructing an analytic data file and conducting analyses.
How to identify multiple records on the Person Summary (PS) file
Many NCHS survey participants will be linked to multiple MAX file PS records. Most often, this is because a participant is linked to several years of MAX data. However, a survey participant may be linked to multiple PS records within the same year. There are multiple explanations for this situation:
- Medicaid enrollees can move between states in a given year.
- Eligibility changes can result in survey participants disenrolling and re-enrolling in Medicaid within the same year.
- Administrative changes or errors with Medicaid reporting: Some administrative changes and errors can be state or year specific.
You may find more information on data anomalies for each state in the Medicaid Statistical Information Systems Anomalies/Issues Report, produced by CMS. See the Resources section for the link to this report.
Multiple PS records per year for a participant were generally due to MAX file records coming from multiple states.
Another source of multiple PS records within the same year could be from false matches due to misreporting of personally identifiable information or issues with linkage methodology. The validity of multiple records in the same year can be difficult to ascertain. While some records show eligibility in different states in non-overlapping months, others show eligibility in different states in the same months of the same year.
The existence of multiple PS records within one year, with overlapping months of Medicaid enrollment between the records, can complicate analyses. To assess Medicaid enrollment in the presence of multiple PS records within a year, researchers may consider using the variables that indicate enrollment by month in each record. The variables MAX_ELG_CD_MO_1 through MAX_ELG_CD_MO_12 indicate whether an enrollee was eligible for Medicaid in a given month and, if so, under what criteria. By determining whether a person was enrolled in each month across the multiple records within a year, the total number of months of enrollment across records can be obtained.
To help identify enrollees with multiple PS records within the same year, NCHS has added a set of flag variables to the PS file. FLG_MULT_RECS identifies whether enrollees have multiple records in any year. Additional variables identify enrollees who had multiple PS records in a year and whether the records came from the same state or from different states, overall and in a given year. The variable FLG_YEAR_MULT_RECS identifies enrollees with multiple records in the same year:
0 = no multiple records
1 = multiple records in the same state
2 = multiple records in different states
3 = multiple records in the same and different states
You may choose to exclude these records, depending on the research question being explored.
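If you keep participants with multiple PS records, the month-by-month approach described above can be sketched in Python (the tutorial's sample programs use SAS). This is an illustration under stated assumptions: the code "00" for "not eligible" and the example value "11" are placeholders, and each record is represented as a plain dictionary; check the MAX documentation for the actual MAX_ELG_CD_MO codes, including how unknown values are coded.

```python
from typing import Dict, List, Set

NOT_ELIGIBLE = "00"  # ASSUMPTION: placeholder for the "not eligible" code


def enrolled_months(record: Dict[str, str]) -> Set[int]:
    """Months (1-12) in which one PS record shows eligibility."""
    return {
        m
        for m in range(1, 13)
        if record.get(f"MAX_ELG_CD_MO_{m}", NOT_ELIGIBLE) != NOT_ELIGIBLE
    }


def total_months_enrolled(records: List[Dict[str, str]]) -> int:
    """Count each calendar month once, even if several records cover it."""
    months: Set[int] = set()
    for rec in records:
        months |= enrolled_months(rec)
    return len(months)


# Two hypothetical PS records for one person-year with overlapping months:
# one covers January-June, the other May-September.
rec_a = {f"MAX_ELG_CD_MO_{m}": "11" for m in range(1, 7)}
rec_b = {f"MAX_ELG_CD_MO_{m}": "11" for m in range(5, 10)}
print(total_months_enrolled([rec_a, rec_b]))
```

Taking the union of months avoids double counting May and June, which appear in both records.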
Medicaid services and payments not included on the MAX files
There are some considerations related to the information contained in the MAX files. Because these files contain only Medicaid-paid services, they do not capture service use or expenditures during periods of non-enrollment, services paid by other payers, or services provided at no charge.
The following Medicaid services and payments are not included on the MAX files:
- Services or expenditures not paid by Medicaid
- Prescription drug rebates
- Medicaid payments made to hospitals that serve a disproportionate share of low income patients with special needs (disproportionate share hospitals)
- Medicaid payments made through UPL (upper payment limit) programs
- Medicaid payments to states for administrative costs
How to identify dual eligible beneficiaries
Service information may be missing or incomplete in MAX files for certain groups of enrollees. This is particularly important for individuals enrolled in both Medicaid and Medicare, often referred to as dual eligible beneficiaries. Because Medicare is the first payer for services used by dual enrollees that are covered by both Medicare and Medicaid, MAX captures such service use only if additional Medicaid payments are made on behalf of the enrollee for Medicare cost sharing or for shared services. Medicare premiums paid by Medicaid on behalf of dual eligible beneficiaries are not included in MAX.
Dual eligible beneficiaries can be identified several ways in the MAX. However, the suggested method to identify dual eligible beneficiaries is to use the annual dual eligibility indicator (EL_MDCR_DUAL_ANN) provided on the PS File. Values 01–10 signify that the enrollee was not found in the Medicare database but was believed to be Medicare-eligible by the state. Values 50–60 signify that the enrollee was found in the Medicare database. Values 50–60 should be used to identify dual eligible beneficiaries as the Medicare enrollment database is the preferred indicator of dual enrollment.
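The suggested rule can be expressed as a short Python check (the tutorial's sample programs use SAS). This is a sketch only: the function name is hypothetical, the code is assumed to arrive as a character string, and missing or non-numeric values are treated as not dual, which is an assumption rather than NCHS guidance.

```python
def is_dual_eligible(el_mdcr_dual_ann) -> bool:
    """True when the annual dual code falls in the 50-60 range named in the text."""
    try:
        code = int(el_mdcr_dual_ann)
    except (TypeError, ValueError):
        return False  # ASSUMPTION: missing or non-numeric codes count as not dual
    return 50 <= code <= 60


print(is_dual_eligible("52"), is_dual_eligible("08"), is_dual_eligible(""))
```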
How to identify Children’s Health Insurance Program (CHIP) enrollees
M-CHIP is a state-administered expansion program offering Medicaid benefits to children who were previously ineligible due to their income. S-CHIP is a separate state-administered program, distinct from the state’s existing Medicaid program, that also provides health coverage to children.
There are two issues related to S-CHIP to consider when using the MAX data. First, states have the option of not reporting information on S-CHIP enrollees. Therefore, whereas the data on persons with Medicaid or M-CHIP can be considered universal, the MAX files do not include all S-CHIP enrollees. Second, for S-CHIP enrollees who are in the files, some data elements contain no information. As a result, variables that count months may not be accurate for persons enrolled in S-CHIP for one or more months, because those months are not included in the counts.
Although S-CHIP enrollees may be a group of particular interest for some researchers, they account for a small percentage of NCHS survey participants linked to the MAX files. For example, among the linked 2013 NHIS participants, less than 1.5% of enrollees in the MAX files are in S-CHIP in any given month.
The variables EL_CHIP_FLAG_1 through EL_CHIP_FLAG_12 document CHIP eligibility for each month, including whether an enrollee was in M-CHIP or S-CHIP. The values include:
- 0 = Not eligible for Medicaid or CHIP during this month
- 1 = Enrolled in Medicaid during this month
- 2 = M-CHIP during this month
- 3 = S-CHIP during this month
- 9 = CHIP status is unknown
For enrollees with a value of “3” (S-CHIP) for EL_CHIP_FLAG_1 through EL_CHIP_FLAG_12, no information is recorded for other monthly variables for that month.
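The monthly flags can be summarized per person-year with a few lines of Python (the tutorial's sample programs use SAS). This sketch assumes each record is a dictionary and that the flags are stored as integers; in the actual files they may be character values, and the function names here are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Labels follow the value list given in the text.
CHIP_LABELS = {0: "not eligible", 1: "Medicaid", 2: "M-CHIP", 3: "S-CHIP", 9: "unknown"}


def chip_month_counts(record: Dict[str, int]) -> Counter:
    """Tally the 12 monthly EL_CHIP_FLAG values by category label."""
    return Counter(CHIP_LABELS[record[f"EL_CHIP_FLAG_{m}"]] for m in range(1, 13))


def schip_months(record: Dict[str, int]) -> List[int]:
    """Months whose other monthly variables carry no information (S-CHIP)."""
    return [m for m in range(1, 13) if record[f"EL_CHIP_FLAG_{m}"] == 3]


# Hypothetical enrollee: Medicaid Jan-Mar, S-CHIP Apr-Jun, not eligible after.
record = {f"EL_CHIP_FLAG_{m}": 1 for m in range(1, 4)}
record.update({f"EL_CHIP_FLAG_{m}": 3 for m in range(4, 7)})
record.update({f"EL_CHIP_FLAG_{m}": 0 for m in range(7, 13)})
print(chip_month_counts(record))
print(schip_months(record))
```

Listing the S-CHIP months makes it easier to see which month counts on the record may be understated.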
How to identify Medicaid enrollees
Estimates of the number of Medicaid enrollees differ by data source. In general, population surveys such as the NHANES tend to yield lower estimates than Medicaid enrollment data collected by the states. There are likely several reasons for these differences: some enrollees may respond incorrectly to population surveys, survey populations may differ from the population included in administrative records, and the reference periods for administrative records and population surveys may differ.
A multi-phase research project referred to as the Medicaid Undercount Project was undertaken to explain why discrepancies exist between survey estimates of enrollment in Medicaid and the number of enrollees reported in state and national administrative data. This project is also called the SNACC project. SNACC is an acronym for the agencies conducting the project, which include the University of Minnesota’s State Health Access Data Assistance Center (SHADAC), the National Center for Health Statistics (NCHS), the Agency for Healthcare Research and Quality (AHRQ), the U.S. Department of Health and Human Services Assistant Secretary for Planning and Evaluation (ASPE), the Centers for Medicare & Medicaid Services (CMS), and the U.S. Census Bureau. More information on this project can be found on the U.S. Census Bureau website. See the Resources section for the link to this material.
The best method for identifying participants who were Medicaid enrollees depends on the exact research question. In general, however, the variable MSNG_ELG_DATA on the PS File indicates enrollment status:
. = enrolled in Medicaid during the year
2 = enrolled in S-CHIP
1 = enrolled in neither Medicaid nor S-CHIP. See section H below for further discussion of observations where MSNG_ELG_DATA is coded as “1.”
Of note, the variable MSNG_ELG_DATA exists on each of the MAX files (PS, RX, IP, LT, OT), and at times different values are assigned in the different files for the same person in the same year. However, the value assigned on the PS File is the most valid of these and should be used for all of the data for that person and year, regardless of the file from which the data originate.
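One way to carry the PS File value over to the other files is sketched below in Python (the tutorial's sample programs use SAS). The field names `person_id` and `year` are hypothetical, rows are plain dictionaries, and `None` stands in for the missing value "."; the sketch also assumes one PS record per person-year, so person-years with multiple PS records would need to be resolved first.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def apply_ps_status(ps_rows: List[Row], claim_rows: List[Row]) -> List[Row]:
    """Overwrite MSNG_ELG_DATA on claim rows with the PS value for the same person-year."""
    ps_status: Dict[Tuple[object, object], object] = {
        (r["person_id"], r["year"]): r["MSNG_ELG_DATA"] for r in ps_rows
    }
    fixed_rows = []
    for row in claim_rows:
        fixed = dict(row)  # copy so the input rows are left unchanged
        key = (row["person_id"], row["year"])
        if key in ps_status:
            fixed["MSNG_ELG_DATA"] = ps_status[key]
        fixed_rows.append(fixed)
    return fixed_rows


ps = [{"person_id": "A", "year": 2010, "MSNG_ELG_DATA": None}]  # None stands in for "."
ip = [
    {"person_id": "A", "year": 2010, "MSNG_ELG_DATA": 1},
    {"person_id": "B", "year": 2010, "MSNG_ELG_DATA": 1},
]
result = apply_ps_status(ps, ip)
print([r["MSNG_ELG_DATA"] for r in result])
```

Rows with no matching PS record (person "B" here) are left as they were so that they can be reviewed separately.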
How to identify managed care vs. fee-for-service (FFS) enrollees
Many Medicaid and CHIP enrollees are enrolled in managed care plans, and enrollment in these programs has expanded over time. Managed care enrollment also varies markedly across states. For enrollees in Medicaid managed care plans, information in MAX is restricted to premium payments and some service-specific utilization information. While records for services delivered (including diagnoses and procedures) are uniformly provided for recipients with fee-for-service coverage, encounter records for those with comprehensive managed care plans are not provided by all states. In some states, only a portion of managed care recipients have encounter data recorded. When included in the files, managed care encounter data list $0 as the amount paid for the services provided, even when the services are covered by the managed care plan.
The Person Summary File contains a variable (EL_PHP_PLN_MO_CNT_CMCP) that can be used to identify beneficiaries enrolled in any type of managed care plan and the number of months of enrollment in the plan.
Additional variables on the PS File (EL_PHP_TYPE_1_1 through EL_PHP_TYPE_4_12, 48 in total) identify each of up to 4 different types of managed care plans that a beneficiary could be enrolled in during each month of the year.
The following types of managed care plans can be identified using EL_PHP_TYPE_1_1 through EL_PHP_TYPE_4_12:
- Medical or comprehensive managed care plan
- Dental managed care plan
- Behavioral managed care plan
- Prenatal/delivery managed care plan
- Long-term care managed care plan
- Program of All-Inclusive Care for the Elderly (PACE) plan
- Primary care case management (PCCM) plan
- Other managed care plan
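Because the monthly plan-type variables follow a regular naming pattern, they can be generated rather than typed out. The Python sketch below (the tutorial's sample programs use SAS) assumes the pattern EL_PHP_TYPE_{slot}_{month}, inferred from the first and last variable names; the values "medical" and "dental" are made-up labels rather than actual codes, and an empty or missing value is assumed to mean no plan in that slot.

```python
from typing import Dict, List, Optional


def php_type_variables(slots: int = 4, months: int = 12) -> List[str]:
    """All monthly plan-type variable names, ordered by slot and then month."""
    return [
        f"EL_PHP_TYPE_{s}_{m}"
        for s in range(1, slots + 1)
        for m in range(1, months + 1)
    ]


def plan_types_in_month(
    record: Dict[str, Optional[str]], month: int, slots: int = 4
) -> List[str]:
    """Non-empty plan-type values recorded for one month across the slots."""
    values = (record.get(f"EL_PHP_TYPE_{s}_{month}") for s in range(1, slots + 1))
    return [v for v in values if v]  # ASSUMPTION: empty/None means no plan


names = php_type_variables()
print(len(names), names[0], names[-1])

record = {"EL_PHP_TYPE_1_3": "medical", "EL_PHP_TYPE_2_3": "dental"}
print(plan_types_in_month(record, 3))
```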
How to identify enrollees who are eligible for waivers
Section 1115 of the Social Security Act provides the Secretary of Health and Human Services broad authority to authorize experimental, pilot, or demonstration projects likely to assist in promoting the objectives of the Medicaid statute. These projects are intended to demonstrate and evaluate a policy or approach that has not been widely used. Some states expand eligibility to individuals not otherwise eligible under the Medicaid program, provide services that are not typically covered, or use innovative service delivery systems. Examples include expanding care for children in foster care, providing specialty mental health care and expanding Medicaid eligibility for family planning services to women of child-bearing age not otherwise eligible for Medicaid. Medicaid enrollees that are eligible for Medicaid as a result of one of these programs are referred to as eligible through a waiver or waiver program. General information about the Medicaid/CHIP waiver programs can be found on the CMS website. See the Resources section for the link to this material.
Waivers for specific groups make up one of the Maintenance Assistance Status (MAS) categories. The variables MAX_ELG_CD_MO_1 through MAX_ELG_CD_MO_12 can be used to identify monthly MAS and Basis of Eligibility (BOE) enrollment information, although the specific type of waiver cannot be identified before 2005. Starting in 2005, MAX files include three elements for each month (MAX_WAIVER_TYPE_1_MO_1 through MAX_WAIVER_TYPE_3_MO_12) that give detailed information on the type of waivers under which enrollees are eligible for Medicaid.
Child survey participants
Survey participants under 18 years of age at the time of the survey are considered linkage-eligible (that is, they meet the criteria by which survey participants can potentially be linked to CMS data) if consent is provided by their parent or guardian. Linkages to CMS administrative data connect survey data to multiple years of administrative data. Consequently, linkage-eligible child survey participants can be under 18 years of age for some years of linked administrative data and 18 years of age or older for later years. For example, a 15-year-old 2003-2004 NHANES participant can be linked to CMS data for 2006 and earlier years as a child but would be an adult in approximately 2007 and later years.
In accordance with NCHS Ethics Review Board (ERB) guidelines, for survey participants younger than 18 years of age at the time of the survey, NCHS will only provide linked CMS data generated for program participation, claims and other events that occurred prior to the participant’s 18th birthday. The linkage of NHANES to the CMS Medicaid data potentially has many child survey participants linked to one or more years of Medicaid data collected after age 18. This should be taken into consideration by analysts when estimating their potential sample size for RDC proposals. Analysts requiring more information about potential sample sizes for RDC proposals or more information on this NCHS ERB guidance should contact the NCHS Data Linkage team (datalinkage@cdc.gov).
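When estimating sample size, the effect of the age-18 cutoff can be illustrated with a Python sketch (the tutorial's sample programs use SAS). The dates are invented, exact birth dates are not available on public-use files, and treating March 1 as the 18th birthday of someone born on February 29 is a convention chosen for this example only.

```python
from datetime import date
from typing import List


def eighteenth_birthday(dob: date) -> date:
    """Date of the 18th birthday."""
    try:
        return dob.replace(year=dob.year + 18)
    except ValueError:
        # Born on Feb 29: the target year is never a leap year.
        # ASSUMPTION: use March 1 in that case.
        return date(dob.year + 18, 3, 1)


def events_before_18(event_dates: List[date], dob: date) -> List[date]:
    """Keep only events that occurred prior to the 18th birthday."""
    cutoff = eighteenth_birthday(dob)
    return [d for d in event_dates if d < cutoff]


dob = date(1988, 6, 15)  # hypothetical child participant
events = [date(2005, 3, 1), date(2006, 6, 14), date(2006, 6, 15), date(2007, 1, 1)]
print(events_before_18(events, dob))
```

Only the first two events fall before the cutoff, so the later two would not be provided for this participant.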
Resources
- NCHS-CMS Medicaid Matching Methodology and Analytic Considerations
- Linkage of NCHS Population Health Surveys to Administrative Records From Social Security Administration and Centers for Medicare & Medicaid Services
- Medicaid Statistical Information System Anomalies/Issues Report
- Medicaid Undercount Project Overview
- Medicaid and CHIP Waiver Programs – General Information
- Research Data Assistance Center (ResDAC)
Task 2: Assessment of a Study Population
One possible use of the NHANES-CMS linked data is to identify a study population with particular characteristics in NHANES and then to examine their health care utilization using CMS data.
The purpose of this task is to demonstrate how you might use the NHANES-CMS linked data feasibility files to determine if there are sufficient numbers in your study population for an analysis using the restricted-use NHANES-CMS linked data. In addition, this task will provide suggestions for further refinements to the study population that might be necessary when conducting the analysis.
This task will describe how to estimate the number of NHANES participants available for an analysis examining the association between obesity and prescription drug events identified from Medicare. If you determine your study is feasible and your proposal is approved, there are additional refinements you may want to make to your study population when working with the restricted use data.
Information: The NHANES-Medicare linked data are restricted-use. This example will utilize the public-use NHANES-CMS Medicare linked data Feasibility Files to demonstrate the steps for assessing the estimated size of your study population. Similar steps can be used with the restricted-use data when you are in the Research Data Center (RDC).
This example will use data from the 2007-2008 through 2011-2012 NHANES linked with 2013 Medicare data. For simplicity, our study population will include NHANES participants aged 65 years and older at the time of their exam, without missing body mass index (BMI) data. As one alternative, we could have included NHANES participants who turned 65 by 2013.
Step 1
After reading the documentation, the first step is to determine which public use files and variables are needed.
From the NHANES data, this analysis requires BMI measurements from the Body Measures File and age from the NHANES Demographic File.
Files and variables needed:
- NHANES Body Measures File (Variables: SEQN BMXBMI)
- NHANES Demographics File (Variables: SEQN RIDAGEYR)
An obesity indicator (obese, coded 1=yes and 0=no) will be created from the BMXBMI variable. Participants with missing BMXBMI are left with a missing value for obese.
If BMXBMI>=30 then obese=1; Else if 0<BMXBMI<30 then obese=0;
The NHANES files need to be merged together and then merged with the NHANES-CMS Medicare linked data feasibility files. Additional information on creating a study population from NHANES can be found in the Continuous NHANES tutorial, and information on how to merge NHANES data with the Feasibility Files can be found in Course 2, Module 7 of the NHANES-CMS tutorial. Code for merging these files can be found in the sample SAS program.
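The merge-and-recode logic described in this step can also be sketched outside SAS. The following is a minimal Python illustration, not the tutorial's sample SAS program. The helper names `obese_flag` and `merge_by_seqn` are hypothetical and the example records are made up. Only the variable names SEQN, RIDAGEYR, and BMXBMI come from the NHANES files.

```python
def obese_flag(bmi):
    """Return 1 if BMI >= 30, 0 if 0 < BMI < 30, and None if BMI is missing."""
    if bmi is None:
        return None
    if bmi >= 30:
        return 1
    if 0 < bmi < 30:
        return 0
    return None  # non-positive values are treated as invalid (an assumption)


def merge_by_seqn(demo_rows, bmx_rows):
    """Inner-join two lists of dicts on the participant identifier SEQN."""
    bmx_by_seqn = {row["SEQN"]: row for row in bmx_rows}
    merged = []
    for demo in demo_rows:
        bmx = bmx_by_seqn.get(demo["SEQN"])
        if bmx is not None:
            record = {**demo, **bmx}
            record["OBESE"] = obese_flag(record.get("BMXBMI"))
            merged.append(record)
    return merged


# Made-up example records (not real NHANES data).
demo = [
    {"SEQN": 1, "RIDAGEYR": 70},
    {"SEQN": 2, "RIDAGEYR": 66},
    {"SEQN": 3, "RIDAGEYR": 45},
]
bmx = [
    {"SEQN": 1, "BMXBMI": 31.2},
    {"SEQN": 2, "BMXBMI": None},
    {"SEQN": 3, "BMXBMI": 24.0},
]

merged = merge_by_seqn(demo, bmx)

# Study population: aged 65 or older with non-missing BMI.
study = [r for r in merged if r["RIDAGEYR"] >= 65 and r["OBESE"] is not None]
print(len(merged), len(study))
```

The same keyed merge would then be repeated against the feasibility file, again using SEQN as the key.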
Step 2
When estimating the sample size with the NHANES-CMS linked data feasibility files, it is important to determine which CMS files will be used in the analysis. The Master Beneficiary Summary File (MBSF) should be requested for all analyses of the NHANES-Medicare linked data because it includes the information needed to determine enrollment status and entitlement eligibility. Since this analysis examines prescription drug events as an outcome, the Medicare Part D Prescription Drug Event (PDE) File is also necessary.
CMS files needed:
- MBSF
- PDE
Step 3
As described in Course 2, Module 9, until recently Medicare did not receive individual claims on managed care [Medicare Advantage (MA) or Medicare Part C] enrollees. Therefore, researchers may want to exclude these beneficiaries from their analyses. Although the participant-specific information on managed care is not available in the NHANES-CMS Medicare linked data feasibility file, researchers can estimate how much their sample size will decrease by looking at the managed care percentages in managed care enrollment tables. The link to these tables can be found in the Resources section at the end of this task.
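The adjustment described above is simple arithmetic and can be sketched as follows. This is a rough illustration only: the function name `estimate_after_ma_exclusion` is hypothetical, and the managed care share of 0.30 is a placeholder, not a figure taken from the enrollment tables. Substitute the percentage that applies to your study years. The starting count of 960 is the unweighted number of linked, obese participants aged 65 and older from the feasibility example in this module.

```python
def estimate_after_ma_exclusion(n_linked, ma_share):
    """Estimate how many linked participants remain after excluding
    managed care (Medicare Advantage) enrollees.

    n_linked: unweighted count of linked participants in the study population.
    ma_share: proportion (0 to 1) of beneficiaries enrolled in managed care.
    """
    if not 0 <= ma_share <= 1:
        raise ValueError("ma_share must be a proportion between 0 and 1")
    return round(n_linked * (1 - ma_share))


n_linked_obese = 960     # from the feasibility example (unweighted)
placeholder_share = 0.30  # hypothetical value; look up the real share

remaining = estimate_after_ma_exclusion(n_linked_obese, placeholder_share)
print(f"Estimated participants remaining: {remaining}")
```

Because the managed care share is a population-level percentage, the result is only an approximation of how many participants would remain in the restricted-use data.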
Step 4
If the proposal describing the study and indicating which variables are needed is approved by the RDC (see Course 1, Module 1 for information on the RDC process), additional refinements to the study population may be needed once the data are ready for analysis. For example, you may want to:
- Identify participants in the linked file who were enrolled in MA.
- Identify and account for participants who died after the NHANES interview or examination and may have incomplete Medicare data during the time period of interest.
- Restrict analyses to beneficiaries whose original reason for entitlement was turning 65.
For more information and analytic guidelines refer to Task 1 of this module and the NCHS-CMS Medicare Data File Documentation and Analytic Guidelines.
Information: Due to NCHS confidentiality requirements, access to NHANES-CMS linked data is restricted. These data can only be accessed through the NCHS RDC Network.
Resources
This task will go through the steps needed to estimate the number of survey participants aged 65 years and older without missing body mass index (BMI) data. It will also examine how many had at least one prescription drug event in 2013.
Step 1: Produce frequency distributions of obesity for participants aged 65 and older, by Medicare linkage status and by whether they have claims on the 2013 Part D File.
Download, open, and run the SAS code for Medicare program.
Statements | Explanation |
---|---|
options ls=120 ps=42 missing=' ' nocenter validvarname=upcase compress=binary nodate nonumber; | Log/List options. The COMPRESS option reduces storage requirements for output datasets. |
Note: Please skip to line 84 to find the next set of statements. | |
proc format; value CMSMATCH 1 = 'Linked' 2 = 'Not Linked' 3 = 'Linked-Child' 9 = 'Ineligible'; value ON_FILE 1='Yes' 0='No'; value OBESE 1='Obese' 0='Not Obese'; | Creates the formats CMSMATCH, ON_FILE, and OBESE, which label the values of the variables CMS_MEDICARE_MATCH, ON_PDE_2013, and OBESE, respectively. |
Data Merged_0712; set Merged_0708 Merged_0910 Merged_1112; *create obesity indicator; If BMXBMI>=30 then obese=1; Else if 0<BMXBMI<30 then obese=0; run; | Creates a new dataset called “Merged_0712” by appending the merged MBSF, PDE, and feasibility files. It also creates an indicator variable for obesity. |
proc freq data=Merged_0712 ; | The FREQ procedure is used to produce a formatted table displaying counts. |
Tables (CMS_MEDICARE_MATCH ON_PDE_2013)*OBESE; | The TABLES statement requests two two-way tables: CMS_MEDICARE_MATCH by OBESE and ON_PDE_2013 by OBESE. |
Where ridageyr>=65; | Limits the study population to adults who were aged ≥65 years at the NHANES interview. |
format CMS_MEDICARE_MATCH cmsmatch. ON_PDE_2013 on_file. OBESE obese.; | This FORMAT statement uses the user-written formats that have been previously defined in PROC FORMAT. |
Title "Merged NHANES Feasibility Data Frequency Distribution of obesity by Medicare Match Status"; | This TITLE statement identifies the contents of the output. |
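
For readers who want to see the tabulation logic spelled out, the following Python sketch mimics what the FREQ step above does: it restricts to ages 65 and older, skips records with a missing value in either table variable (as PROC FREQ does by default), and counts each combination of labels. The function name `crosstab` and the example records are made up for illustration. The variable names and value labels come from the SAS statements above.

```python
from collections import Counter

# Value labels taken from the PROC FORMAT statements.
MATCH_LABELS = {1: "Linked", 2: "Not Linked", 3: "Linked-Child", 9: "Ineligible"}
ON_FILE_LABELS = {1: "Yes", 0: "No"}
OBESE_LABELS = {1: "Obese", 0: "Not Obese"}


def crosstab(records, row_var, row_labels, min_age=65):
    """Count records by (row label, obesity label) for participants
    aged min_age or older, skipping records with missing values."""
    counts = Counter()
    for rec in records:
        if rec["RIDAGEYR"] < min_age:
            continue
        if rec[row_var] is None or rec["OBESE"] is None:
            continue
        key = (row_labels[rec[row_var]], OBESE_LABELS[rec["OBESE"]])
        counts[key] += 1
    return counts


# Made-up records (not real NHANES or CMS data).
records = [
    {"RIDAGEYR": 70, "CMS_MEDICARE_MATCH": 1, "ON_PDE_2013": 1, "OBESE": 1},
    {"RIDAGEYR": 68, "CMS_MEDICARE_MATCH": 1, "ON_PDE_2013": 0, "OBESE": 0},
    {"RIDAGEYR": 81, "CMS_MEDICARE_MATCH": 2, "ON_PDE_2013": 0, "OBESE": 1},
    {"RIDAGEYR": 72, "CMS_MEDICARE_MATCH": 9, "ON_PDE_2013": 0, "OBESE": None},
    {"RIDAGEYR": 40, "CMS_MEDICARE_MATCH": 1, "ON_PDE_2013": 0, "OBESE": 1},
]

by_match = crosstab(records, "CMS_MEDICARE_MATCH", MATCH_LABELS)
by_pde = crosstab(records, "ON_PDE_2013", ON_FILE_LABELS)

for (row, col), n in sorted(by_match.items()):
    print(f"{row:<12} {col:<10} {n}")
```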
Step 2: Check the results
To check the results of your program, review the SAS log and output (.lst) reports. Your output should resemble the tables below. Note that the total count of participants aged 65 years and older who were eligible for linkage, linked to Medicare, and obese is 960. However, the number of obese participants with claims on the 2013 Prescription Drug Event File is 651. These counts are unweighted.
CMS_MEDICARE_MATCH | Not Obese | Obese |
---|---|---|
Linked | 1,818 | 960 |
Not Linked | 109 | 49 |
Ineligible | 663 | 378 |
Total | 2,590 | 1,387 |

ON_PDE_2013 | Not Obese | Obese | Total |
---|---|---|---|
No | 1,516 | 736 | 2,252 |
Yes | 1,074 | 651 | 1,725 |
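
As a quick arithmetic check on the output, the row and column totals and a few shares can be derived directly from the counts shown in the two tables. The Python sketch below is only a convenience for that check. The helper names are hypothetical, and the counts are copied from the tables above.

```python
# Unweighted counts copied from the two output tables (ages 65 and older).
match_by_obese = {
    "Linked": {"Not Obese": 1818, "Obese": 960},
    "Not Linked": {"Not Obese": 109, "Obese": 49},
    "Ineligible": {"Not Obese": 663, "Obese": 378},
}
pde_by_obese = {
    "No": {"Not Obese": 1516, "Obese": 736},
    "Yes": {"Not Obese": 1074, "Obese": 651},
}


def row_totals(table):
    """Sum the cells in each row of a nested-dict table."""
    return {row: sum(cells.values()) for row, cells in table.items()}


def column_totals(table):
    """Sum the cells in each column of a nested-dict table."""
    totals = {}
    for cells in table.values():
        for col, n in cells.items():
            totals[col] = totals.get(col, 0) + n
    return totals


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


match_totals = row_totals(match_by_obese)
pde_totals = row_totals(pde_by_obese)
obese_totals = column_totals(pde_by_obese)

pct_obese_among_linked = percent(
    match_by_obese["Linked"]["Obese"], match_totals["Linked"]
)
pct_obese_with_pde = percent(
    pde_by_obese["Yes"]["Obese"], obese_totals["Obese"]
)

print(f"Linked participants: {match_totals['Linked']}")
print(f"Percent obese among linked: {pct_obese_among_linked:.1f}")
print(f"Percent of obese participants with 2013 PDE claims: {pct_obese_with_pde:.1f}")
```

Because the counts are unweighted, these shares are useful for judging feasibility but should not be read as population estimates.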
This task will describe issues that must be considered when designing a study using the NHANES-Medicaid linked data.
Child survey participants
In accordance with NCHS Ethics Review Board (ERB) guidelines, for survey participants younger than 18 years of age at the time of the survey, NCHS will only provide linked CMS data generated for program participation, claims and other events that occurred prior to the participant’s 18th birthday.
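To make the rule concrete, the sketch below shows one way such a cutoff could be expressed. It is an illustration of the rule as stated above, not a description of how NCHS implements it. The function names, the record field `event_date`, the use of an exact birth date, and the treatment of February 29 birth dates are all assumptions made for the example.

```python
from datetime import date


def eighteenth_birthday(birth_date):
    """Return the date of the 18th birthday."""
    try:
        return birth_date.replace(year=birth_date.year + 18)
    except ValueError:
        # February 29 birth date: the target year is never a leap year,
        # so March 1 is used here (an assumption for this sketch).
        return date(birth_date.year + 18, 3, 1)


def releasable_events(events, birth_date, age_at_survey):
    """Keep only events dated before the 18th birthday for participants
    who were younger than 18 at the time of the survey."""
    if age_at_survey >= 18:
        return list(events)
    cutoff = eighteenth_birthday(birth_date)
    return [e for e in events if e["event_date"] < cutoff]


# Made-up example: a participant aged 10 at the time of the survey.
events = [
    {"event_date": date(2015, 1, 1)},
    {"event_date": date(2018, 6, 14)},
    {"event_date": date(2018, 6, 15)},  # 18th birthday: not released
    {"event_date": date(2019, 3, 2)},
]
kept = releasable_events(events, birth_date=date(2000, 6, 15), age_at_survey=10)
print(len(kept))
```

In practice this means that follow-up time for child participants is truncated, which should be taken into account when defining the observation period for a study.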
Additional refinements
If the proposal describing the study and indicating which variables are needed is approved (see Course 1, Module 1 for information on the RDC process), additional refinements to the study population may be needed once the data are ready for analysis. For example, analysts using the linked Medicaid data available in the RDC may want to:
- Identify and account for participants enrolled in managed care and S-CHIP.
- Identify and account for participants who died after the survey.
- Identify participants with multiple Person Summary records, potentially as a result of moving to a different state.
For more information and analytic guidelines refer to Task 1 of this module and the NCHS-CMS Medicaid Data File Documentation and Analytic Guidelines.
Information: Due to NCHS confidentiality requirements, access to NHANES-CMS linked data is restricted. These data can only be accessed through the NCHS RDC Network.