Replication and the DataMart

Purpose

Many analysts perform validation and quality improvement activities on their site's incoming syndromic data. Most often this involves processed data, but it can also involve raw data. The NSSP team developed a replication process in the BioSense Platform to make sure sites would still have such capabilities. The Analytic DataMart resides on a server separate from all data processing into ESSENCE. This separation ensures that analytics and validation activities do not negatively affect data flow.


The process

When syndromic data are received by the BioSense Platform, they are scrubbed to remove personally identifying information (PII). Next, data are ingested and processed into the ESSENCE application for use in surveillance activities. To optimize processing, one database table is allocated to each site and data are indexed.

Shown below is a detailed view of how data are processed within the Archive database (Figure 1) and replicated to the Analytic DataMart, including how the archiving process fits into the larger scheme of data processing (Figure 2).

Figure 1. Data flow within the Archive (data processing)

Figure 2. Data flow overview including replication

As noted above, the Analytic DataMart was conceived to accommodate the validation and quality improvement activities that analysts and epidemiologists perform on their site's processed and raw data. Because the DataMart resides on a server separate from all data processing into ESSENCE, these activities do not negatively affect data flow. This separation is made possible by a process called replication.

Replication is the process by which all data in the Archive processing lane are copied to the DataMart incrementally, about every 10 to 15 minutes. All NSSP Data Quality Reports and Data Quality Dashboards are based on DataMart data. Authorized users may access these data on the BioSense_Platform using the Posit Workbench or SAS Studio web applications. Both applications may be accessed via the Access & Management Center (AMC) home page.

All tables accessible in the BioSense_Platform are replicated views that include:

  • XX_MFT—site Master Facility Table
  • XX_Operational_Crosswalk
  • Production Views:
    • XX_PR_Raw—original messages with key metadata
    • XX_PR_Processed—processed messages with calculated values that meet minimum processing criteria
    • XX_PR_Except—processed messages that did not meet minimum criteria
    • XX_PR_Except_Reason—Message_IDs with all reasons for being placed in exceptions
    • XX_Cache_ER_Base—one record per visit, as available in ESSENCE
  • Production Onboarding (Staging) Views:
    • XX_ST_Raw—original messages with key metadata
    • XX_ST_Processed—processed messages with calculated values that meet minimum processing criteria
    • XX_ST_Except—processed messages that did not meet minimum criteria
    • XX_ST_Except_Reason—Message_IDs with all reasons for being placed in exceptions
  • Filter_Reasons—Filter Reason Code definitions
  • Except_Reasons—Exception Reason Code definitions

Note: XX refers to the site's short name.
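
As a minimal sketch of the naming convention above, the following Python helper expands a site short name into the replicated view names. The helper name `site_views` and the example short name `ZZ` are hypothetical; only the suffixes come from the list above.

```python
# Suffixes taken from the list of replicated views above.
SITE_VIEW_SUFFIXES = [
    "MFT",
    "Operational_Crosswalk",
    "PR_Raw",
    "PR_Processed",
    "PR_Except",
    "PR_Except_Reason",
    "Cache_ER_Base",
    "ST_Raw",
    "ST_Processed",
    "ST_Except",
    "ST_Except_Reason",
]

# Reason code lookup views, which carry no site prefix.
SHARED_VIEWS = ["Filter_Reasons", "Except_Reasons"]


def site_views(short_name):
    """Return the view names for one site (hypothetical helper)."""
    if not short_name:
        raise ValueError("site short name must not be empty")
    prefixed = [f"{short_name}_{suffix}" for suffix in SITE_VIEW_SUFFIXES]
    return prefixed + SHARED_VIEWS


if __name__ == "__main__":
    # "ZZ" is a made-up short name used only for illustration.
    for name in site_views("ZZ"):
        print(name)
```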

These tables are located on a SQL server. Access is through standard ODBC (Open Database Connectivity) from Posit Workbench or SAS Studio, with the initial data pull in SQL. Because of this, you can run analytics using a combination of SQL and R, or SQL and SAS. Basic R and SAS code for running SQL queries on the BioSense_Platform has been provided in previous issues of NSSP Update (December 2020, March 2022). Please contact the NSSP Service Desk for assistance with quality improvement code, or visit the NSSP Technical Resource Center for links to user manuals and quick start guides.
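
The pattern of an initial pull in SQL followed by analysis in a host language can be sketched in Python. This is illustrative only, not NSSP-provided code: an in-memory SQLite database stands in for the ODBC connection to BioSense_Platform, the short name `ZZ` and all rows are made up, and the column `Arrived_Date` is a placeholder rather than the documented schema (consult the NSSP Data Dictionary for actual column names).

```python
import sqlite3
from collections import Counter


def pull_rows(conn, view):
    """Initial data pull in SQL: fetch only the columns needed."""
    # View names cannot be bound as query parameters, so accept only
    # names made of letters, digits, and underscores.
    if not view.replace("_", "").isalnum():
        raise ValueError(f"unexpected view name: {view!r}")
    cur = conn.execute(f"SELECT Message_ID, Arrived_Date FROM {view}")
    return cur.fetchall()


def messages_per_day(rows):
    """Follow-up analysis outside SQL: count messages by arrival date."""
    return Counter(arrived for _message_id, arrived in rows)


def demo_connection():
    """Build a small in-memory stand-in with made-up rows (not real data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ZZ_PR_Processed (Message_ID INTEGER, Arrived_Date TEXT)"
    )
    conn.executemany(
        "INSERT INTO ZZ_PR_Processed VALUES (?, ?)",
        [(1, "2024-01-01"), (2, "2024-01-01"), (3, "2024-01-02")],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    counts = messages_per_day(pull_rows(conn, "ZZ_PR_Processed"))
    for day, n in sorted(counts.items()):
        print(day, n)
```

Keeping the SQL pull narrow (only the columns and rows needed) and doing the remaining work in the host language mirrors the SQL-plus-R or SQL-plus-SAS approach described above.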

The data flow journey

An overview of data flow is described in the Data Dictionary: Data Elements Used in NSSP Data Processing Journey. More on data element processing and indexing is available in the NSSP Data Dictionary.

Identifying when replication is broken

If you're using Posit Workbench or SAS Studio and don't see recent data, look at these data in ESSENCE. If you see recent visits in ESSENCE that aren't reflected in the DataMart, replication is likely having issues. Keep in mind that replication is often up to 15 minutes behind and that, at times, ESSENCE can be up to four hours behind. If you see evidence of problems, please submit an NSSP Service Desk ticket.
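
The check described above can be sketched as a comparison of two timestamps. This is a heuristic under stated assumptions: the function name, its parameters, and the way the timestamps are obtained are hypothetical; only the 15-minute replication lag comes from the text.

```python
from datetime import datetime, timedelta

# Typical upper bound on replication lag, per the text above.
REPLICATION_LAG = timedelta(minutes=15)


def replication_may_be_broken(latest_in_essence, latest_in_datamart,
                              allowed_lag=REPLICATION_LAG):
    """Return True when ESSENCE is ahead of the DataMart by more than allowed_lag.

    Both arguments are datetimes for the most recent record seen in each
    system; how they are obtained is left to the caller (hypothetical inputs).
    """
    return latest_in_essence - latest_in_datamart > allowed_lag


if __name__ == "__main__":
    essence = datetime(2024, 1, 1, 12, 0)
    # DataMart is 10 minutes behind ESSENCE: within the usual lag.
    print(replication_may_be_broken(essence, datetime(2024, 1, 1, 11, 50)))
    # DataMart is 3 hours behind ESSENCE: worth a Service Desk ticket.
    print(replication_may_be_broken(essence, datetime(2024, 1, 1, 9, 0)))
```

Because ESSENCE itself can be up to four hours behind, a `False` result does not prove that replication is healthy; it only means ESSENCE is not visibly ahead of the DataMart.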

Access to apps

Contact your site administrator and request access to Posit Workbench or SAS Studio. Site administrators, please make sure the requestor has an account set up in the AMC. Once the user account is set up, grant access in the following ways:

  • DataMart: Edit the User Profile within the AMC. Check the box next to DataMart.
  • SAS Studio: Edit the User Profile within the AMC. Check the box next to SAS Studio.
  • Posit Workbench: Submit an NSSP Service Desk ticket requesting that the user be given access.

Note: Granting access to DataMart and using either analytic tool will allow access to line-level data received from all facilities within your site.

Resources