Oh My ...You Haven't Aged a Bit!

Purpose

Learn more about how age values are calculated in ESSENCE to avoid common pitfalls that produce negative values or very old ages.

Graphic of people celebrating a birthday

Age calculations in ESSENCE

Have you noticed age values of “-1” in ESSENCE detailed data? Now, that's young! Seriously, though, what does this mean?

ESSENCE age-related data are based on NSSP Processed data in C_Patient_Age and C_Patient_Age_Units. The NSSP business rules for age calculation are hierarchically defined as follows:

  • Calculate the age based on a valid birth date and visit date (C_Visit_Date – Birth_Date)

Else

  • Use Age_Reported/Age_Units_Reported (sent in OBX)

Else

  • Use Age_Calculated/Age_Units_Calculated (sent in OBX, although rarely seen)

This calculation can produce invalid ages in NSSP Processed data when either the birth date or visit date is weird or the age actually reported in the OBX segment is weird. For example, in Processed data in C_Patient_Age, you could see age values of:

  • NULL (perhaps the birth date reported was NULL or year/month only, making it an invalid date, and no other Age data was reported via OBX)
  • Negative values (perhaps due to a typo making the admit date from the early 1900s; or, perhaps due to that being the value actually sent in Age_Reported OBX segment)
  • Very old ages (perhaps due to a typo making the birth date from the 1800s; or, perhaps due to that being the value actually sent in Age_Reported OBX segment)

Basically, as more data become available on the BioSense Platform, we find the content of various age-related columns have outlier values. Outlier age values are NULL and cannot be categorized into ESSENCE age groups used in Table Builder. Consequently, counts tallied by querying detailed ESSENCE data may not match counts generated using Table Builder's data cubes.

To solve this problem, ESSENCE will ingest data by setting the "Age" column to -1 if the following criteria are met: Age is NULL or <0 or >120. Further, ESSENCE will categorize an Age value of -1 into the "Unknown" age group to make sure records with NULL age or odd ages are counted.

When you see an Age value of -1 in ESSENCE detailed data, check out the data passed from the Processed table into ESSENCE detailed data: C_Patient_Age, C_Patient_Age_units, C_Patient_Age_source.

You may see a NULL value, negative value, or value >120 years (ESSENCE's cutoff).

Processing rules for the core NSSP Processed data also have checks for outlier values. These checks, however, are not the same ESSENCE checks implemented after the Processed data flow was defined.

Currently, NSSP runs a process check looking for instances where the birth date calculation produces an age >150. When that occurs, the data process will set the Birth_Date_Time to NULL. This forces the C_Patient_Age algorithm to move down the hierarchy and try to use ages sent in the OBX message including Age_Reported (per current guidelines) or Age_Calculated if Age_Reported was not sent. You can leverage str_birth_date_time in Processed data to determine if a birth date was sent but unable to "convert" to a datetime value in birth_date_time. Here, NSSP's processing goal is that C_Patient_Age has the opportunity to use Age_Reported if the birth date/visit date calculation was, well, weird.

String fields are another resource

Ever notice fields in NSSP Completeness DQ reports that begin with “str_”? What are these fields? We call them “string fields.”

These fields are populated with exactly what came in the message. There are associated columns that store data pulled from the string fields, and those data may be converted to a particular format such as datetime. Put another way ... a column that supports a date and time has a "string" column that stores data exactly as reported and has another column formatted as datetime2 to store the date time in an actual datetime format. For example, str_admit_date_time stores exactly what was reported, and admit_date_time stores the data in "datetime2" format. If you find "str" fields have high completeness but their associated fields have low completeness, this indicates a format issue with the data in the HL7 message.