
Data quality: the key for integrated analytics

By Michael Simon, Arcadia Healthcare Solutions

As healthcare delivery continues to evolve, healthcare organizations often move too quickly from EHR implementation to population health to risk-based contracts, glossing over (or skipping entirely) the crucial step of evaluating the quality of the data that serves as the foundation of their strategic initiatives. As these organizations adopt population health-focused tools and methodologies, an integrated analytics platform and a trusted, high-quality, objective data asset are critical for success under these new payment models.

Most healthcare analytics platforms rely heavily on claims data, which are highly structured but lack the rich context afforded by clinical data. Further, the few analytics programs that do leverage clinical data typically depend on vendor-supplied integration messages, such as a Continuity of Care Document (CCD). While CCDs offer a compact and convenient way to integrate clinical data, they also impose limitations through both design and implementation that make them insufficient for population health and performance analytics.

There are many downstream processes, including EHR configuration, data transport, aggregation, normalization, and reporting mechanisms, that through omission or commission can negatively impact data quality. Even defining a data quality gap properly presents an analytic challenge. Examples of data quality issues one might encounter in the EHR include:

  • Erroneous patient identifiers, such as a missing social security number, misspelled name, incorrect sex, or transposed date of birth. 

  • A standard numerical metric, such as blood pressure, written in text in encounter notes rather than in appropriate structured fields. 

  • Generic diagnosis codes entered quickly or out-of-habit instead of more specific and actionable diagnosis codes appropriate to the patient. 

  • Crucial radiology images absent from reports resulting in insufficient information to consult or verify a diagnosis. 

  • Inconsistent entry of standard codes, such as National Drug Code (NDC) identifiers for drugs, derailing bulk analysis. 


Each of these cases involves very different causes, data elements, and outside standards, and each may result in one or more different types of gaps. Some arise as a result of standard reporting configurations that fail to transmit crucial information. Others are the result of clinical practices, which may in turn stem from EHR configuration, organizational workflow, or even user personalities.
Regardless of the cause, concerns about the quality of healthcare data generated in the clinical environment threaten to derail efforts to derive organizational and public value from healthcare data sets. 
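Some of these gaps lend themselves to automated detection. As a minimal sketch (the record fields and note format here are hypothetical, not drawn from any particular EHR), a periodic scan could flag encounters where a blood pressure reading appears only in free-text notes and never reaches the structured vitals field:

```python
import re

# Hypothetical encounter records; field names are illustrative only.
records = [
    {"note": "Pt seen today. BP 142/91, will recheck.", "bp_systolic": None},
    {"note": "Follow-up visit, no complaints.", "bp_systolic": 120},
]

# Blood pressure written as "systolic/diastolic" somewhere in the note text.
BP_PATTERN = re.compile(r"\b(\d{2,3})\s*/\s*(\d{2,3})\b")

def flag_unstructured_bp(record):
    """Flag encounters where a BP appears in the note but the structured field is empty."""
    found_in_text = BP_PATTERN.search(record["note"]) is not None
    return found_in_text and record["bp_systolic"] is None

flags = [flag_unstructured_bp(r) for r in records]
# flags -> [True, False]: only the first encounter has BP trapped in free text
```

A real implementation would need far more robust text handling, but even a crude scan like this can quantify how often structured fields are being bypassed.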


If the quality of clinical data is so important, how can users gain confidence in their EHR data?

The first step toward gaining confidence in an integrated analytics program is to develop a structured way to capture clinical data quality gaps. To address this challenge, consider the following three points at which data quality gaps can be introduced into your datasets:

  1. Capture. The stage at which clinical professionals and/or automated systems enter data into the EHR. Valid data capture requires that a clinical event happen – e.g. a patient encounter or a returned lab result – and that the results be accurately entered into the system. 

  2. Structure. The process by which captured data are stored in an appropriate format and location. Valid structure depends both on the way in which the data are entered as well as on the configuration of the EHR platform. If an integer is entered into a text field, its accessibility for reporting and analysis is reduced. Even if it is entered in a structured field, however, if the template is not mapped or configured properly, the value may still be stored to an inappropriate location.
  3. Transport. The method by which data are extracted from storage and made available to external systems for reporting or analysis. In the absence of a direct ("back-end") database connection, an extract may not include all pertinent data. Which fields are extracted, and how records are selected for inclusion, are key factors in how the Transport mechanism impacts the quality of outgoing data. 
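The three-stage framing above can double as a working taxonomy for cataloging the gaps you find. A simple sketch (the gap descriptions are invented examples, one per stage):

```python
from enum import Enum
from dataclasses import dataclass

class Stage(Enum):
    CAPTURE = "capture"      # data never accurately entered into the EHR
    STRUCTURE = "structure"  # entered, but in the wrong format or field
    TRANSPORT = "transport"  # stored correctly, but lost or mangled in the extract

@dataclass
class QualityGap:
    patient_id: str
    description: str
    stage: Stage

# Hypothetical examples, one for each stage:
gaps = [
    QualityGap("p1", "Faxed lab result never entered into the chart", Stage.CAPTURE),
    QualityGap("p2", "BP recorded in note text, not the vitals field", Stage.STRUCTURE),
    QualityGap("p3", "Allergy list omitted from the nightly export", Stage.TRANSPORT),
]

# Group gaps by the stage that introduced them, for targeted remediation.
by_stage = {s: [g for g in gaps if g.stage is s] for s in Stage}
```

Tagging each detected gap with its stage of origin makes it clear whether the fix belongs in clinician workflow (capture), template configuration (structure), or the extract itself (transport).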


Once you have established the points at which data quality gaps are introduced, you can begin to address those gaps through focused initiatives.

Rooting out unstructured data. The first step is to make sure providers are making use of structured fields whenever possible. Unfortunately, with all the checkboxes and drop-down lists, this can be overwhelming. When possible, work with providers to configure templates to minimize the number of clicks needed. Make sure numeric fields are obvious and intuitive, and make it easy to correctly select units. In cases where unstructured data come from a third party, such as lab results or an image, work with provider experts to identify key salient elements that can easily be coded in a structured format.

Finding the right structured element. Using the right structured element is harder than it seems. For example, if there are multiple fields and locations to enter vitals, consider whether the workflow can be refined to steer users toward the vitals data elements most likely intended for the visit and for future reporting. This problem also arises when providers configure their EHR client. For example, in some implementations, individuals may choose to create a template or “favorites” setting for a commonly used medication configuration. While this saves time, if the provider doesn’t tie the proper NDC code to the configured medication, NDC codes will be absent from future notations of this medication, resulting in gaps in downstream reporting. (A similar effect can be seen with LOINC codes in order templates.) If gaps in structured data are detected early, simple workflow or template changes can correct the issue before it becomes unmanageable.
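A recurring audit can catch the missing-NDC problem described above before it compounds. The sketch below assumes medication records have already been normalized to the 11-digit HIPAA NDC layout (5-4-2); raw 10-digit NDCs come in 4-4-2, 5-3-2, or 5-4-1 configurations and would need normalization first. Record fields are hypothetical:

```python
import re

# 11-digit HIPAA-normalized NDC (5-4-2 segments), with or without hyphens.
NDC_11 = re.compile(r"^\d{5}-?\d{4}-?\d{2}$")

def audit_medications(meds):
    """Return medication entries whose NDC is missing or malformed."""
    return [m for m in meds if not (m.get("ndc") and NDC_11.match(m["ndc"]))]

meds = [
    {"name": "lisinopril 10 mg", "ndc": "00093-1041-01"},
    {"name": "favorites-template med", "ndc": None},  # saved without an NDC
]
bad = audit_medications(meds)
# bad -> [{"name": "favorites-template med", "ndc": None}]
```

Run routinely, a report like this points directly at the templates or favorites entries that need their codes re-mapped.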

Knowing your reports. In our research, a major source of data quality gaps remains a mismatch between the system used for reporting data from the EHR and the system used for analysis and measurement calculations. When possible, evaluate the quality of the reporting mechanism early on, with an eye toward identifying common data quality gaps. Just because your system uses a major standard like CCDs or HL7 doesn’t mean that the standard will support the dataset you need right out of the box. Make sure your reporting system is configured to support a complete clinical dataset, and that you have transparency into how information is retrieved and packaged.
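One concrete way to evaluate a CCD feed is to verify that each document actually contains the sections your analytics depend on. CCD sections are identified by LOINC codes; the sketch below checks a few common ones (the "required" set here is an assumption for illustration, not a complete conformance check):

```python
import xml.etree.ElementTree as ET

NS = {"hl7": "urn:hl7-org:v3"}

# LOINC codes for a few common CCD sections (illustrative minimum set).
REQUIRED_SECTIONS = {
    "11450-4": "Problem list",
    "10160-0": "Medications",
    "48765-2": "Allergies",
    "30954-2": "Results",
}

def missing_sections(ccd_xml):
    """Return the required sections absent from a CCD document string."""
    root = ET.fromstring(ccd_xml)
    present = {
        code.get("code")
        for code in root.findall(".//hl7:section/hl7:code", NS)
    }
    return {c: name for c, name in REQUIRED_SECTIONS.items() if c not in present}

# A toy CCD fragment containing only a problem-list section:
sample = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <component><structuredBody>
    <component><section>
      <code code="11450-4" codeSystem="2.16.840.1.113883.6.1"/>
    </section></component>
  </structuredBody></component>
</ClinicalDocument>"""
# missing_sections(sample) flags Medications, Allergies, and Results as absent
```

Running a check like this across a day's worth of inbound documents quickly reveals whether the feed is configured for a complete clinical dataset or is silently dropping whole sections.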

It is possible to diagnose and fix data quality issues, but solutions like those above demand transparency and completeness from the dataset. Without the ability to identify gaps directly, and the ability to understand how data are structured and reported, even the most egregious data quality gaps remain obscured. In a fully integrated system with rich support for data quality analysis, a number of options for data quality interventions and remediation are at your fingertips.

By leveraging the power of an integrated data set, with an understanding of this continuum of data flow – from capture to structure to transport – healthcare organizations will have an improved understanding of and confidence in their data, granting them critical insight into their patient populations, leading to improved quality measures, incentive revenues, and most importantly, quality of care.