Purpose
This is the sixth article in a series about how to write ESSENCE free-text queries. Free-text coding in ESSENCE, which is accessible to all users, follows distinct patterns. Learning to read these patterns allows users to take queries from many places and repurpose them to suit their unique needs. Syndromic surveillance depends heavily on sharing methods, and practitioners must understand the language.
Continuing to improve our query
Please see Part 1: Wildcards for background information about the search criteria for ESSENCE free-text queries, which are built around Boolean logical operators and regular expressions.
Building off Part 5: Putting Queries to the Test, we start with the query:
- (,^Fall^,ANDNOT,(,^Crestfallen^,OR,^Falling out with^,),OR,^Fell^,),OR,(,^[;/ ]W[01][0-9]^,)
We also noted negations needed for things falling on patients, medical devices "falling out," and other items like mentions of "fallopian tubes."
We can incorporate these changes to make the following query:
- (,^Fall^,ANDNOT,(,^Crestfallen^,OR,^Falling out^,OR,^Fallop^,OR,^Fall Out^,),OR,^Fell^,),ANDNOT,^[;/ ]W20^,OR,(,^[;/ ]W[01][0-9]^,)
Notice how these changes shorten the term ^Falling out with^ to broaden negated Chief Complaints (CCs) and to include medical devices that fall out; add a CC negation for ^Fallop^; and add a negation to ICD-10 codes for W20 – Struck by thrown, projected, or falling object. The placement of the W20 ICD-10 code negation is important. For example, if the negation is placed at the end, it will negate visits regardless of other criteria met. If we place it after the CC terms as shown above, it will not negate visits containing a positive W10-19 ICD-10 code or indicate a visit of interest.
Try it out!
Note the difference in W20 negations on these two queries. Run each query in the CCDD field of ESSENCE. Why do they give different results? Why are certain cases negated by one and not the other? If the queries don't show different results for your data, try running these in the CCQV data.
- (,^Fall^,ANDNOT,(,^Crestfallen^,OR,^Falling out^,OR,^Fallop^,OR,^Fall Out^,),OR,^Fell^,),ANDNOT,^[;/ ]W20^,OR,(,^[;/ ]W[01][0-9]^,)
- (,(,^Fall^,ANDNOT,(,^Crestfallen^,OR,^Falling out^,OR,^Fallop^,OR,^Fall Out^,),OR,^Fell^,),OR,(,^[;/ ]W[01][0-9]^,),),ANDNOT,^[;/ ]W20^
Reviewing our query
Now that we have a well-performing query, we can review all cases to assess performance (possible false positives and true positives), but that doesn’t enable us to review negated cases (possible false negatives and true negatives). Likewise, how do we isolate performance in DD fields versus CC fields? We do this by breaking our query apart and reassembling it in different ways. Let’s leverage ESSENCE to answer some of these questions.
Fall-related query assessment
Code
Description
(,^Fall^,ANDNOT,(,^Crestfallen^,OR,^Falling out^,OR,^Fallop^,OR,^Fall Out^,),OR,^Fell^,),ANDNOT,^[;/ ]W20^,OR,(,^[;/ ]W[01][0-9]^,)
Beginning Query
,^Fall^,AND,(,^Crestfallen^,OR,^Falling out^,OR,^Fallop^,OR,^Fall Out^,),),OR,(,^Fall^,OR,^Fell^,),AND,^[;/ ]W20^
Isolating Negations: By reorganizing our query and replacing the “ANDNOTs” with “ANDs,” we can assess just those visits we chose to negate.
(,^[;/ ]W[01][0-9]^,)
Field Isolation, Any DD Codes: How heavily does our query rely on the ICD10 codes within it? This query returns the visits that were picked based on the DD criteria.
(,^[;/ ]W[01][0-9]^,),ANDNOT,(,(,^Fall^,ANDNOT,(,^Crestfallen^,OR,^Falling out^,OR,^Fallop^,OR,^Fall Out^,),OR,^Fell^,),ANDNOT,^[;/ ]W20^,)
Field Isolation, DD Codes ONLY: How many of our visits were picked based on the DD criteria ONLY? This query negates all CC portions.
(,^[;/ ]W[01][0-9]^,),AND,(,(,^Fall^,ANDNOT,(,^Crestfallen^,OR,^Falling out^,OR,^Fallop^,OR,^Fall Out^,),OR,^Fell^,),ANDNOT,^[;/ ]W20^,)
Field Isolation, DD AND CC: How many of our visits were picked based on BOTH DD and CC criteria? This query shows the overlap—or, those patients that have a fall-related CC and received a fall-related ICD10 code.
(,^Fall^,ANDNOT,(,^Crestfallen^,OR,^Falling out^,OR,^Fallop^,OR,^Fall Out^,),OR,^Fell^,),
Field Isolation, Any CC Terms: How heavily does our query rely on the CC terms within it? This query returns the visits that were picked based on those criteria.
(,(,^Fall^,ANDNOT,(,^Crestfallen^,OR,^Falling out^,OR,^Fallop^,OR,^Fall Out^,),OR,^Fell^,),ANDNOT,^[;/ ]W20^,),ANDNOT,(,^[;/ ]W[01][0-9]^,)
Field Isolation, CC Terms ONLY: How many of our visits were picked based on fall-related CC terms but did NOT receive a fall-related ICD10 code?
The results (counts) from the queries in the preceding table should add up as follows:
- Beginning Query = CC Terms ONLY + DD Codes ONLY + DD and CC OverlapAny CC Terms = CC Terms ONLY + DD and CC OverlapAny DD Codes = DD Codes ONLY + DD and CC Overlap
This can be a useful way to understand how the different parts of your query work together.
Let’s take this a step further
ESSENCE has a Fall Subsyndrome that already does a great job capturing fall-related Chief Complaints. Did we really improve fall-related ED surveillance by making a new syndrome? Could we have leveraged the existing query in our new query? Here are some steps to leverage ESSENCE in answering these questions.
To see how our query performs compared with the Fall Subsyndrome, we can run the following queries:
Code
Description
CCDD: (,^Fall^,ANDNOT,(,^Crestfallen^,OR,^Falling out^,OR,^Fallop^,OR,^Fall Out^,),OR,^Fell^,),ANDNOT,^[;/ ]W20^,OR,(,^[;/ ]W[01][0-9]^,)
SubSyndrome Free Text: ^Fall^
New query and subsyndrome overlap: By running our new query in the CCDD field and the subsyndrome in the “Subsyndrome Free Text” field, ESSENCE will automatically apply an “AND” between these two criteria and return the overlap.
CCDD: (,^Fall^,ANDNOT,(,^Crestfallen^,OR,^Falling out^,OR,^Fallop^,OR,^Fall Out^,),OR,^Fell^,),ANDNOT,^[;/ ]W20^,OR,(,^[;/ ]W[01][0-9]^,)
SubSyndrome Free Text: ISNULL,or,^,ANDNOT,^Fall^
New query ONLY: If we run our new query normally and then add negations in the “Subsyndrome Free Text” field, we can ensure that Fall Subsyndrome visits are NOT returned.
CCDD: ^,ANDNOT,(,(,^Fall^,ANDNOT,(,^Crestfallen^,OR,^Falling out^,OR,^Fallop^,OR,^Fall Out^,),OR,^Fell^,),ANDNOT,^[;/ ]W20^,OR,(,^[;/ ]W[01][0-9]^,),)
SubSyndrome Free Text: ^Fall^
Subsyndrome ONLY: By following the same pattern as the “New query ONLY” example above, we can return only those visits that meet the existing subsyndrome criteria, while not meeting our new syndrome definition.
CCDD: (,^Fall^,ANDNOT,(,^Crestfallen^,OR,^Falling out^,OR,^Fallop^,OR,^Fall Out^,),OR,^Fell^,)
SubSyndrome Free Text: ISNULL,or,^,ANDNOT,^Fall^
New query CC portion ONLY: It isn’t fair to pit the Fall Subsyndrome against our query because ours leverages the DD field. To even the comparison, we can check how well our query fares by pitting just the CC portion against the Fall Subsyndrome. We can do this by removing the DD criteria and running it in the CCDD field.
There is a concept in the preceding table that was not previously covered in this series. We can illustrate this new concept by examining a hypothetical situation.
Let's say you're an analyst and have been asked to provide a timeseries graph of syndromic surveillance data for all visits excluding visits labeled with the ESSENCE "Occupational" or "ToolsOrMachinery" subsyndromes. What is the quickest way to exclude only these visits from your ESSENCE query?
By executing the following query in the Subsyndrome free-text field, you are asking "Give me Nulls or Non-Nulls, so long as the Non-Nulls do not contain the text 'Occupational' or 'ToolsOrMachinery.' "
- ISNULL,or,^,ANDNOT,(,^Occupational^,OR,^ToolsOrMachinery^,)
The string "ISNULL,or,^," will return all visits when run in a free-text field because, logically, a field is either null or has contents (^ also captures blanks). When using this format, the negation string should never apply to the ISNULL criteria. That's because the three-valued logic of SQL results in the ANDNOT statement being interpreted as "Not Null," and thus, the query is interpreted as (NULL) ANDNOT NULL. If ANDNOT statements are applied to the "ISNULL," you will get unexpected results.
Now let's examine a real-life example. In the first version of a Severe Respiratory Illness query shared in response to the Vaping-related Illness response, the query included (along with lots of other criteria) the following code in the Subsyndrome free-text field to exclude visits related to Alcohol Use and Altered Mental Status:
- (,^,or,ISNULL,),andnot,(,^alteredmental^,or,^Alcohol^,)
ESSENCE interprets this as nothing can be "ISNULL" if it contains ^alteredmental^ or ^Alcohol^. No NULL field can meet these criteria by using these negations. Consequently, the query excluded all NULL fields. This exclusion of NULL fields did not meet the intent of the query, which was designed to exclude only visits not labeled with AlteredMentalStatus or Alcohol SubSyndromes. On the other hand, the NULL fields (those not labeled with a subsyndrome) met the intended criteria quite well.
This became a problem for the health jurisdictions running the shared Severe Respiratory Illness query. An updated query was shared shortly afterward that remedied the problem by formatting these negations so that they would not apply the "ANDNOT" string to the ISNULL criteria, as follows:
- ISNULL,or,^,ANDNOT,(,^alteredmental^,or,^Alcohol^,)
After doing this work to compare the Fall Subsyndrome with a new query, we can see that our new query's Chief Complaint portion does little to improve upon the Fall Subsyndrome until the DD codes are added. Because the Fall Subsyndrome contains diverse processing like fuzzy matching and negative terms, one solution is to try capturing the best of both worlds by using the following query:
- (,^Fall^,ANDNOT,(,^Crestfallen^,OR,^Falling out^,OR,^Fallop^,OR,^Fall Out^,),OR,^Fell^,),ANDNOT,^[;/ ]W20^,OR,(,^[;/ ]W[01][0-9]^,),or,^;Fall^
Paste this query into the CCDD free-text field.Then select the option for "SubSyndrome Free Text" from the "Also apply the search string to:" box below it (Figure 1). Note the added term ^;Fall^ will capture these subsyndrome visits because the subsyndrome list field is separated by semicolons.
More tips for breaking down a query
Syndromic surveillance often depends on the ability to take a shared query or syndrome definition, break it down, and repurpose it. To do this, the user must be able to reliably parse how a definition works. There are multiple ways to do this, but here is a good way to get started:
Assess whether the parenthetical statements in a query are working
This can be difficult, especially in long queries. A tool like the free program NotePad++ will highlight the related closed parenthesis when you hover over an open parenthesis and vice versa. Tools that highlight parenthesis pairs are helpful to find where sections of code start and end.
Another tool that will help you understand how parentheses are being used is the "replace" tool in Microsoft Word. Paste the query into Word and try replacing different portions such as open parentheses, closed parentheses, or ANDNOT operators with the string "^p." The caret and lowercase p will insert a line break wherever the specified criteria are found.
For example, let's use the CCDD Category for CDC Chronic Hepatitis C v1:
- (,^chronic viral Hep[aei]t[ei]t[ei]s C^,or,^chronic Hep[aei]t[ei]t[ei]s C^,andnot,^Hep[aei]t[ei]t[ei]s C[a-z]^,),or,^[;/ ]B1[89].2^,or,^[;/ ]B1[89]2^,or,^[;/ ]831000119103^,or,^[;/ ]128302006^,or,^[;/ ]768127002^,or,^[;/ ]768289009^,or,^[;/ ]768288001^,or,^[;/ ]768125005^,or,^[;/ ]768006009^,or,^[;/ ]768126006^,or,^[;/ ]767810006^,or,^[;/ ]767810006^,or,^[;/ ]767809001^,or,^[;/ ]708198006^,or,^[;/ ]146371000119104^,or,^[;/ ]120241000119100^,or,^[;/ ]347891000119103^,or,^[;/ ]703866000^,or,^[;/ ]128971000119101^
If we paste the query into Word and then use the "Find and Replace" tool to replace the parenthesis with )^p as shown in Figure 2.
As a result, we get this modification to the code and can clearly see a separation between Chief Complaint terms and Discharge Diagnosis codes.
(,^chronic viral Hep[aei]t[ei]t[ei]s C^,or,^chronic Hep[aei]t[ei]t[ei]s C^,andnot,^Hep[aei]t[ei]t[ei]s C[a-z]^,)
(,^chronic viral Hep[aei]t[ei]t[ei]s C^,or,^chronic Hep[aei]t[ei]t[ei]s C^,andnot,^Hep[aei]t[ei]t[ei]s C[a-z]^,),or,^[;/ ]B1[89].2^,or,^[;/ ]B1[89]2^,or,^[;/ ]831000119103^,or,^[;/ ]128302006^,or,^[;/ ]768127002^,or,^[;/ ]768289009^,or,^[;/ ]768288001^,or,^[;/ ]768125005^,or,^[;/ ]768006009^,or,^[;/ ]768126006^,or,^[;/ ]767810006^,or,^[;/ ]767810006^,or,^[;/ ]767809001^,or,^[;/ ]708198006^,or,^[;/ ]146371000119104^,or,^[;/ ]120241000119100^,or,^[;/ ]347891000119103^,or,^[;/ ]703866000^,or,^[;/ ]128971000119101^
Now we can combine these sections of code and test how our query works. To fully understand a query, you may need to repeat the steps in the first half of this article. Then you should be able to answer the following questions: What proportion of these visits are due to the DD codes? What proportion is due to CC terms? What types of visits receive a positive CC text but don't get a related DD code?
Another example is the CDC Measles CCDD v1. When we follow the Word: Find and Replace process, we can see the two parts of this query break down into "Inclusion Criteria" and "Exclusion Criteria." That is exactly how the developers intended this query to perform, but it results in measles-positive diagnosis codes being negated by text strings like "Vaccine" or "Rubella," which a user might consider unintended consequences of the query. Anyone using this query must question whether a diagnosis code-positive visit is a positive regardless of other text terms. Both methods are reasonable, but only the users can decide what works best for their data and health jurisdiction.
- (,^measl^,or,^meez^,or,^mesles^,or,^rubeo^,or, ^Measel^,or,^Measul^,or,^[;/ ]b05^,or,^[;/ ]14189004 ^,or,^[;/ ]417145006^,or,^[;/ ]28463004^,or,^[;/ ]195900001^,or^[;/ ]74918002^,or,^[;/ ]38921001^,or,^[;/ ]240484000^,or,^[;/ ]240483006^,or,^[;/ ]60013002^,or,^[;/ ]111872008^,or,^[;/ ]444435003^,or,^[;/ ]186562009^,or,^[;/ ]416154000^,or,^[;/ ]426558008^,or,^[;/ ]191727003^,or,^[;/ ]111873003^,or,^[;/ ]230146001^,or,^[;/ ]186561002^,or,^[;/ ]424306000^,or,^[;/ ]13420004^,or,^[;/ ]406592004^,or,^[;/ ]427290009^,or,^[;/ ]427263000^,or,^[;/ ]426654002^,or,^[;/ ]426424002^,or,^[;/ ] 427706006^,or,^[;/ ]426091009^,or,^[;/ ]425684000^,or,^[;/ ]425970007^,or,^[;/ ]732209005^,or,^[;/ ]426188007^,or,^[;/ ]426640005^,or,^[;/ ]427073007^,or,^[;/ ]732207007^,or,^[;/ ]426535005^,or,^[;/ ]426028006^,or,^[;/ ]427182003^,or,^[;/ ]732210000^,or,^[;/ ]698204007^,or,^[;/ ]425966004^,or,^[;/ ]416435006^,or,^[;/ ]444974003^,),andnot,(,^titer^,or,^measles mumps rubella^,or,^mmr^,or,^vacc^,or,^shot^,or,^immun^,or,^rubel^,or,^proph^,or,^room b05^,or,^german^,or,^homesless^,or,^rule out measles^,or,^[;/ ]47435007^,or,^[;/ ]371085006^,or,^[;/ ]150961000119105^,or,^[;/ ]61153008^,or,^[;/ ]150971000119104^,or,^[;/ ]473166002^,or,^[;/ ]703347005^,or,^[;/ ]170433008^,or,^[;/ ]170431005^,or,^[;/ ]432636005^,or,^[;/ ]571591000119106^,or,^[;/ ]170432003^,or,^[;/ ]433733003^,or,^[;/ ]572511000119105^,or,^[;/ ]440075005 ^,)
We thank Senior Data Analyst Zachary Stein for volunteering to write a series of articles about free-text coding. Stein does epidemiological work to support NSSP efforts and is an active participant in our NSSP Community of Practice (CoP). He initially wrote about free-text coding as an entry on the NSSP-CoP Syndrome Definition Committee forum. The forum generated considerable interest, inspiring this series. Stein acknowledges input provided by others who contributed to the forum post.
If you have questions or suggestions, identify sections that need clarification, or think of a topic we overlooked, please contact Zachary Stein at oru8@CDC.Gov or the NSSP Service Desk.