Merge Feasibility File and NHANES Data
Purpose
This module provides instruction on how to merge the SAS-formatted datasets from the feasibility and NHANES data files created in the previous module, into a single analytic dataset.
Task: Merge Feasibility Files and NHANES Demographic Datasets
Once you have created the SAS-formatted feasibility and NHANES datasets you will then need to merge them by SEQN. For analytic purposes, subsetting the data is not recommended. In this task, both files are sorted, and then merged to create a new SAS dataset. SAS coding examples are included.
Step 1: Run Program
Download, open and run the SAS Medicare program. Note that this is the same program that was downloaded in Module 6. If you have already downloaded the program and altered the LIBNAME statements, you do not need to download it again.
Statements | Explanation |
---|---|
Note: These commands can be found at the top of the program under the heading “Course 2, Module 7.” | |
options ls=120 ps=42 missing=’ ‘ nocenter validvarname=upcase compress=binary; | Log/List options. COMPRESS option reduces storage requirements for output datasets. |
libname nhanes ‘C:\NHANES’; | Provides location for reading and saving SAS datasets. |
data temp; | Creates a temporary SAS dataset, TEMP. |
set nhanes.nhanes_11_12_eligibility_Medicare; | Reads in the NHANES 2011-2012 Medicare feasibility data. |
proc sort data=temp; by SEQN; run; |
Sorts the temp dataset by the variable SEQN. |
proc sort data=nhanes.demo_g; by SEQN; run; |
Sorts the NHANES demographic dataset by the variable SEQN. |
data nhanes.merg1112_Medicare; merge temp nhanes.demo_g; by SEQN; /*Regroup race_ethnicity as 4 categories*/ if ridreth3=3 then raceth=1; /*Non-Hispanic White*/ else if ridreth3=4 then raceth=2; /*Non-Hispanic Black*/ else if ridreth3 in (1,2) then raceth=3; /*Mexican American/Other Hispanic*/ else raceth=4; /*Other*/ label raceth = ‘Race/Ethnicity (recode)’; run; |
Merges both datasets, by SEQN, which effectively adds variables from the file last referenced to TEMP. (Each file contains the same number of observations). Recodes race/ethnicity variable, RIDRETH3, into 4 categories. New variable is RACETH. Results are saved in permanent SAS dataset, MERG1112_MEDICARE.sas7bdat. |
proc contents data= nhanes.merg1112_Medicare varnum; run; |
Lists the contents of the SAS dataset. VARNUM option prints the list of variables by their position in the dataset. |
proc print data= nhanes.merg1112_Medicare (obs=10); title2 ‘Sample Listing – 10 Rows of select variables’; var seqn cms_medicare_match riagendr ridreth3 raceth wtmec2yr; run; |
Prints sample listing of first 10 records with a select set of variables. |
proc means data= nhanes.merg1112_Medicare ; title2 ‘Simple Means’; var seqn cms_medicare_match; run; |
Reports the number of observations, the mean, the standard deviation, the minimum value, and the maximum value for two variables in the SAS dataset. |
Step 2: Check the results.
To check the results of your program, go to the folder referenced in the LIBNAME. You should now see MERG1112_MEDICARE.sas7bdat in the folder.
Also, review the SAS log report on the number of observations and variables in the file. This information is also listed in the SAS (.lst) output report for PROC CONTENTS.
Highlights from the output:
- In the 2011-2012 NHANES demographic and NHANES-CMS feasibility file there are 9,756 records.
Step 1: Run Program
Download, open and run the SAS Medicaid program. Note that this is the same program that was downloaded in Module 6. If you have already downloaded the program and altered the LIBNAME statements, you do not need to download it again.
Statements | Explanation |
---|---|
Note: These commands can be found at the top of the program under the heading “Course 2, Module 7.” | |
options ls=120 ps=42 missing=’ ‘ nocenter validvarname=upcase compress=binary; | Log/List options. COMPRESS option reduces storage requirements for output datasets. |
libname nhanes ‘C:\NHANES’; | Provides location for reading and saving SAS datasets. |
data temp; | Creates a temporary SAS dataset, TEMP. |
set nhanes.nhanes_11_12_eligibility_Medicaid; | Reads in the NHANES 2011-2012 Medicaid feasibility data. |
proc sort data=temp; by SEQN; run; |
Sorts the temp dataset by the variable SEQN. |
proc sort data=nhanes.demo_g; by SEQN; run; |
Sorts the NHANES demographic dataset by the variable SEQN. |
data nhanes.merg1112_Medicaid; merge temp nhanes.demo_g; by SEQN; /*Regroup race_ethnicity as 4 categories*/ if ridreth3=3 then raceth=1; /*Non-Hispanic White*/ else if ridreth3=4 then raceth=2; /*Non-Hispanic Black*/ else if ridreth3 in (1,2) then raceth=3; /*Mexican American/Other Hispanic*/ else raceth=4; /*Other*/ label raceth = ‘Race/Ethnicity (recode)’; run; |
Merges both datasets, by SEQN, which effectively adds variables from the file last referenced to TEMP. (Each file contains the same number of observations). Recodes race/ethnicity variable, RIDRETH3, into 4 categories. New variable is RACETH. Results are saved in permanent SAS dataset, MERG1112_MEDICAID.sas7bdat. |
proc contents data= nhanes.merg1112_Medicaid varnum; run; |
Lists the contents of the SAS dataset. VARNUM option prints the list of variables by their position in the dataset. |
proc print data= nhanes.merg1112_Medicaid (obs=10); title2 ‘Sample Listing – 10 Rows of select variables’; var seqn cms_medicaid_match riagendr ridreth3 raceth wtmec2yr; run; |
Prints sample listing of first 10 records with a select set of variables. |
proc means data= nhanes.merg1112_Medicaid; title2 ‘Simple Means’; var seqn cms_medicaid_match; run; |
Reports the number of observations, the mean, the standard deviation, the minimum value, and the maximum value for two variables in the SAS dataset. |
Step 2: Check the results.
To check the results of your program, go to the folder referenced in the LIBNAME. You should now see MERG1112_MEDICAID.sas7bdat in the folder.
Also, review the SAS log report on the number of observations and variables in the file. This information is also listed in the SAS (.lst) output report for PROC CONTENTS.
Highlights from the output:
- In the 2011-2012 NHANES demographic and NHANES-CMS feasibility file there are 9,756 records.