America’s COVID-19 Data Gap
Co-authored by Niall Brennan, Mark Cullen, and Travis May. Mr. Brennan is the CEO of the Health Care Cost Institute and former Chief Data Officer of CMS. Dr. Cullen is a former Professor of Medicine at Stanford University and led the Center for Population Health Sciences. Mr. May is Founder & President of Datavant.
The United States’ health care system is one of the most technically advanced in the world, yet it is struggling to answer even simple questions about COVID-19. Despite millions of data points in the real world, billions of dollars of research money, and trillions of dollars of economic impact, there are a number of basic questions that we still cannot answer.
This is not a problem of data collection: there has been an explosion in the amount of health data collected in the US in the last 10 years, and the US has spent tens of billions of dollars incentivizing EHR adoption. It’s a problem of data collation, meaning the fragmentation of data across the US health care system. Different institutions separately hold data about vaccination history, testing history, primary care, treatment, hospitalization, and mortality, and there is no easy way to pull this data together at the patient level (let alone to pull in every other piece of medical, social, behavioral, occupational and environmental information that exists for that patient). There is no centralized infrastructure to link these separate datasets, even though linking them would enable the kinds of insights that come only from looking at the full picture. So despite the urgent need to answer these questions (and many more), and the millions of real-world data points that already exist to answer them, researchers and regulators have struggled to quickly draw meaningful conclusions to inform critical public health decisions.
A significant amount of the real-world research regarding the Delta variant has come out of Israel. It’s imperative that the US learns from what Israel is doing right, so that we can leverage the data being collected in the US for research to inform public health policy. While Israel is exemplary in its approach, its population is much smaller and likely not fully representative of US demographics. The US needs to power the same kind of research as Israel in order to better understand the nuances of COVID-19 and the Delta variant in the US, and better respond to future pandemics and public health crises.
Israel benefits from a public health infrastructure in which its population is enrolled in national health maintenance organizations (HMOs). Individual-level data is collected and linked together so that it can be used for research and analysis, as part of a rich data set with information from the entire country. While the US does not necessarily need complete data on all individuals (as a small country like Israel might), we would certainly benefit from samples of data that are fully representative of the whole US population.
While Israel deserves credit for the leading-edge insights that it is generating regarding COVID-19, it has not made its data accessible to the world’s scientists. Any solution to our national data needs regarding the pandemic must incorporate access for the scientific community, in order to validate answers to research questions and rebuild the public’s trust in science.
So what do we need going forward? First, we need to be far more effective with the information we are collecting and sharing with the research community. Comprehensive data on testing, infections, vaccines, social determinants and any health care system interactions is essential to better understand the effects of COVID-19. Scaling up our surveillance and tracking data regarding COVID-19 testing and breakthrough infections would be an excellent complement, so that we could more quickly address emerging policy questions and treatment guidance.
Moreover, we need a health data infrastructure where data can be collected and managed to ensure individual-level patient privacy while maintaining the data richness needed to answer difficult questions in a timely manner. Many aspects of our public health system are based at state and local levels, and are therefore highly variable, but data cannot remain fragmented along those lines if we are to succeed in rapidly using it to adjust policy as the pandemic evolves. While implementation of public measures may reasonably vary from place to place, our scientific understanding of what is happening should not be subject to the same variability. Needless to say, security and privacy must be among the highest priorities for any such infrastructure, given the highly sensitive nature of this rich data.
How can we leverage this type of model in the context of the US? Having a single public agency serve this data-organizing function, as some imagined the CDC might do, has proven difficult and is perhaps not the optimal solution. Other federal and state organizations are poised to play roles in a collaborative national effort to create repositories of real-world data to inform the COVID-19 pandemic response. Beyond federal and state resources, many organizations in the private and NGO sectors are already poised to contribute, bringing technical solutions, agility and resources that could prove invaluable to a US approach.
Implementing such a platform or platforms to connect previously siloed data would allow us to use US data to finally answer the types of questions that are top-of-mind for many, including:
- The safety and efficacy of vaccines generally, and in various subpopulations
- The incidence and progression of the disease overall and in various subpopulations
- Impacts of variants, both on breakthrough infections and general transmission
- Impacts of public health policies, including on health equity
In short, we have a tremendous amount of work to do to adequately scale up health data connectivity in the US. Doing so will help us catch up to countries like Israel in our understanding of COVID-19 and the Delta variant and, more broadly, help inform our critical public health decisions and policy moving forward.