When I moved from engineering into healthcare in the early 1990s, I was shocked by the inability of healthcare providers to capture or use data to improve care delivery. Now, almost 30 years later, we do a much better job of capturing clinical data, thanks to the widespread implementation of electronic medical record (EMR) systems. However, the health data available electronically remains grossly underutilized because of the difficulty of integrating data from various sources to gain better insights. Four areas that are fundamental to improving population health, and that should be addressed using integrated data sets, are: (i) identifying unique patients; (ii) creating an inventory of health providers; (iii) linking patients to their providers; and (iv) better understanding each person’s social circumstances that influence health behaviors. Addressing these areas requires clinical data as well as data from medical claims and information collected outside of the healthcare system that describes behavioral and social context.
Clinical Data: Clinical data is generated when patients are seen within a healthcare setting and the provider uses an EMR system to document information about a single encounter. Clinical data usually includes the medical issue driving the visit along with demographic data, medications, procedures, and allergies. Ideally, a medical problem and/or diagnosis would be documented along with the patient’s family history and selected social factors like smoking. Much of the information documented in the EMR is unstructured, making extraction difficult. Furthermore, there are hundreds of different EMRs, and individual health systems often deploy multiple instances of them, which limits the ability to collect this data across different providers. EMR vendors have not been required to make their systems interoperable, and they are incentivized to keep their data models proprietary to make changing vendors harder. Despite these limitations, EMR data often represents the best and most rapidly available data for tracking patient health outcomes over time and the quality of medical care provided by physicians. This is particularly true for large healthcare systems that use a single instance of an EMR across multiple care locations.
Claims Data: Claims data is collected primarily by healthcare payers, including Medicare, Medicaid, and commercial insurance companies. This data is generated by physicians, hospitals, pharmacies, and other agencies associated with patient care. The data includes codes that describe medical diagnoses, procedures, medications, and medical equipment across all providers that a patient sees while covered by an insurer. The data provides key insights into medical utilization across different providers over time, medication compliance, and the provision of preventive procedures like mammograms. The claims data from Medicare has been de-identified and made publicly available to help administrators and policy makers better understand variation in care delivery and the relationship between cost and quality of care. However, this data is limited in that: (i) it contains no information on true healthcare outcomes, such as blood pressure or cholesterol levels; (ii) it does not include services for which insurance was not used (for example, purchasing generic medications from a low-cost retailer); (iii) it depends on accurate coding by the health provider; (iv) it does not track patients as they change health insurance coverage over time; and (v) it takes weeks to months to become available.
Social Determinants of Health and Non-Traditional Data: Clinical and claims data fail to capture important information required to understand underlying health-related behaviors. These factors, often referred to as social determinants of health, have been found to underlie approximately 80 percent of health-related outcomes. If healthcare providers want to move into value-based delivery models that improve overall health, they must understand these social determinants of health and how to influence them. The first step is collecting data that traditionally has not been used in healthcare. This includes geospatial data describing where the individual lives. Data on factors like socio-economic status and purchasing behaviors, developed and used by retailers and the financial industry, can provide important insight for care providers who are looking to keep their patients healthier. Surprisingly, healthcare providers also struggle even to identify patients under their care who have died; this data has to be purchased from an external source.
Fundamental Use Cases
Knowledge of your individual patients: This initial use of data may seem like a simple task. However, many health providers struggle to identify unique patients, particularly those with common surnames. Inability to track unique patients can lead to mistaken identity and result in significant adverse events, the least of which is inappropriate disclosure of personal health data. Knowledge of your unique patients improves markedly with increased access to and integration of multiple data sources.
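To make the matching problem concrete, here is a minimal sketch of how two records from different source systems might be compared. It is an illustration only, not a description of any particular vendor's method: the `PatientRecord` fields, the `likely_same_patient` function, and the 0.85 similarity threshold are all assumptions chosen for this example, and a production master patient index would use far more evidence than this.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical minimal demographic record from one source system."""
    source: str      # e.g. "emr" or "claims" (illustrative labels)
    first_name: str
    last_name: str
    birth_date: str  # ISO format, YYYY-MM-DD


def normalize(text: str) -> str:
    """Lowercase and drop punctuation and spaces so O'Neil == ONeil."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0 on the normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def likely_same_patient(a: PatientRecord, b: PatientRecord,
                        threshold: float = 0.85) -> bool:
    """Assumed rule: exact birth date and surname, fuzzy first name."""
    if a.birth_date != b.birth_date:
        return False
    if normalize(a.last_name) != normalize(b.last_name):
        return False
    return name_similarity(a.first_name, b.first_name) >= threshold
```

Under this rule, "Katherine O'Neil" in one system and "Catherine ONeil" in another, with the same birth date, would be flagged as a likely match, while two different people who share only a common surname and a birth date would not. Adding more sources (address, phone, insurer member ID) gives the rule more evidence to work with, which is why integration improves patient identification.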
Knowledge of your providers: This use case may also seem straightforward, but it continues to be an obstacle for many healthcare systems that have used different data systems to collect provider information for credentialing, hospital privileges, marketing, and licensing. Having a single source of truth is important for understanding future workforce needs, for identifying where patients are seeking care outside of a system, and for improving provider quality of care by appropriately linking patients to their doctors.
Householding: Understanding how patients are connected within a family or household presents a major opportunity for healthcare providers. This is of key importance because certain health behaviors, like smoking, and medical conditions that are inherited or linked to social determinants are likely to affect multiple members of a family. This type of data is not traditionally collected within an EMR, and again requires integration of data across multiple sources.
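One simple way to approximate households is to group patients who share a normalized address. The sketch below assumes that approach purely for illustration; the `ABBREVIATIONS` table, the function names, and the `(patient_id, street, zip_code)` tuple layout are hypothetical, and real householding typically draws on additional sources beyond address alone.

```python
from collections import defaultdict

# Small illustrative abbreviation table; a real one would be much larger.
ABBREVIATIONS = {
    "street": "st",
    "avenue": "ave",
    "road": "rd",
    "drive": "dr",
    "apartment": "apt",
}


def normalize_address(street: str, zip_code: str) -> str:
    """Build a comparable key from a street line and a ZIP code."""
    # Lowercase and drop punctuation, keeping letters, digits and spaces.
    cleaned = "".join(
        ch for ch in street.lower() if ch.isalnum() or ch.isspace()
    )
    # Map long forms such as "street" to one short form.
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in cleaned.split()]
    # Use only the five-digit ZIP so "30301-1234" matches "30301".
    return " ".join(tokens) + "|" + zip_code[:5]


def build_households(patients):
    """Group (patient_id, street, zip_code) tuples by normalized address."""
    households = defaultdict(list)
    for patient_id, street, zip_code in patients:
        households[normalize_address(street, zip_code)].append(patient_id)
    return dict(households)
```

A shared address is only a proxy: apartment buildings without unit numbers, nursing homes, and shelters would all be grouped as a single "household" by this rule, so the results need review before they are used to infer family relationships.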
Data Integration is a Foundational Competency for Population Health: Before healthcare providers can master even the basic use cases outlined above, we must advance our ability to integrate data at scale in near real time. Current approaches that rely on expensive on-premises data warehouses and traditional extract, transform, load (ETL) processes have consistently failed to scale or to manage the changes that invariably occur in the transactional systems. Providers also struggle to get EMR vendors to take up new solutions or bring in the outside data and insight needed to improve care. Future integration solutions will need to take advantage of next-generation, open-source technology like Hadoop to cheaply store and analyze big data using massively parallel commodity hardware. This approach will improve the quality of data, allow us to better identify unique patients, and provide insight into the behavioral and social factors that affect patients’ health. Together, this knowledge will allow us to build better care delivery systems that focus on keeping individuals healthy, making all of us healthier and happier.