80 COVID-19 Questions That Can Be Answered Using Real-World Data

Travis May
10 min read · Jun 19, 2020

As part of the tragedy of COVID-19, tens of thousands of new cases are diagnosed in the US each day. In America, the data from these patients are captured across electronic medical records, medical claims, diagnostic tests, pharmacies, and mortality records.

Buried in this data are the answers to many of the questions vexing researchers: how the disease progresses, the impact of various co-morbidities, the safety and efficacy of various therapeutics, the genetic correlations of the disease, and the demographic disparities of the disease’s impact. Tens of thousands of researchers are hard at work analyzing snapshots of the data to improve our understanding.

In our last post, we surveyed the major types of real-world data (RWD), the strengths and limitations of each type, and how such data can be combined to create “fit-for-purpose” data sets for analysis. The challenges with real-world data include fragmentation of information, difficulty accessing data at scale, lack of randomization, and lack of standardization. These challenges have made it difficult to rapidly utilize this information to better understand the disease. But the promise of using the collective information captured by the healthcare system to better understand, prevent, and treat COVID-19 has led thousands of researchers to begin focusing on the use of RWD.

Here, we focus on what questions can be answered with RWD, and provide eighty examples, as well as a short description of the types of RWD that need to be linked to answer those questions.

***

Disease characteristics

Today, much is still unknown about the fundamental characteristics of COVID-19, including how infectious or deadly the disease is. Understanding these characteristics is key to being able to design policy and clinical solutions to mitigate its impacts.

Sample Questions

  1. Are there long-lasting health impacts or changes to patients as a result of COVID-19 infection?
  2. What are early or unusual symptoms that may indicate that someone has been infected with the virus?
  3. Which comorbidities are correlated with severe outcomes?
  4. Do temperature or other geographic characteristics influence how rapidly COVID-19 spreads, or the severity of cases?
  5. Which genes are correlated with an increased risk of severe COVID-19 outcomes?
  6. What is the basic reproduction number (R0) of COVID-19 in different communities?
  7. How quickly can patients resume their normal activity levels after being infected?
  8. How often do antibody tests return false positives?
  9. What is the true fatality rate of COVID-19 infections?
  10. Are the results of antibody tests impacted by previous coronavirus or influenza infections?
  11. What percentage of patients are asymptomatic?
  12. Which genes are correlated with less susceptibility to contracting the infection, and/or experiencing severe outcomes?
  13. Does blood type impact the likelihood of severe COVID-19 outcomes?
  14. What is the incubation period of the virus?
  15. What is the correlation between obesity and severe COVID-19 outcomes?
  16. What is the correlation between high cholesterol and severe COVID-19 outcomes?
  17. Do individuals who have contracted a coronavirus within a certain time period demonstrate less susceptibility to infection?
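
As a small illustration of question 8 above, the share of positive antibody results that are false positives depends not only on the test itself but also on how common prior infection is in the tested population. The sketch below applies the standard positive-predictive-value arithmetic; the sensitivity, specificity, and prevalence figures are illustrative assumptions, not measured values for any real test.

```python
def false_positive_share(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Share of positive results that are false positives (1 - PPV)."""
    # Expected fraction of the tested population that is a true positive.
    true_pos = sensitivity * prevalence
    # Expected fraction of the tested population that is a false positive.
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("no positive results expected with these inputs")
    return false_pos / (true_pos + false_pos)


# Illustrative (assumed) inputs: 90% sensitivity, 95% specificity.
low_prevalence = false_positive_share(0.90, 0.95, prevalence=0.05)
high_prevalence = false_positive_share(0.90, 0.95, prevalence=0.30)
print(f"False-positive share at 5% prevalence:  {low_prevalence:.1%}")
print(f"False-positive share at 30% prevalence: {high_prevalence:.1%}")
```

With the same test, the false-positive share falls as prevalence rises, which is why lab data needs to be read alongside estimates of how widespread infection is in a given community.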

Required Data

  • Claims data: Standardized, longitudinal patient data
  • Lab data: Captures which patients tested positive or negative for COVID-19, as well as lab results relevant to their comorbidities
  • EHR data: Includes detailed clinical information, offering insight into how clinical decisions were made and into patients’ symptoms
  • Genomics data: Captures whether patients have certain genetic characteristics
  • Mortality data
  • Other relevant data sets: Travel data (to investigate infection rates), wearables (to measure activity levels before, during and after infection), weather data (to study the impact of weather patterns)
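
To make the linkage idea concrete, here is a minimal sketch of joining lab results to mortality records on a shared de-identified patient key and computing a crude fatality rate among patients who tested positive. The record layout, the `token` field, and the 60-day follow-up window are all assumptions made for illustration. Note that this measures deaths among confirmed cases only; estimating the true fatality rate of all infections would also require an estimate of undiagnosed infections.

```python
from datetime import date

# Hypothetical de-identified records; "token" stands in for whatever
# privacy-preserving patient key the linked data sets share.
lab_results = [
    {"token": "a1", "result": "positive", "date": date(2020, 4, 1)},
    {"token": "a2", "result": "positive", "date": date(2020, 4, 3)},
    {"token": "a3", "result": "negative", "date": date(2020, 4, 5)},
    {"token": "a4", "result": "positive", "date": date(2020, 3, 25)},
    {"token": "a4", "result": "positive", "date": date(2020, 3, 10)},
]
deaths = [
    {"token": "a1", "date_of_death": date(2020, 4, 20)},
    {"token": "a3", "date_of_death": date(2020, 5, 1)},
    {"token": "a4", "date_of_death": date(2020, 7, 1)},
]


def crude_case_fatality(lab_results, deaths, window_days=60):
    """Deaths within `window_days` of a first positive test / patients testing positive."""
    # Earliest positive test date per patient.
    first_positive = {}
    for row in lab_results:
        if row["result"] != "positive":
            continue
        token = row["token"]
        if token not in first_positive or row["date"] < first_positive[token]:
            first_positive[token] = row["date"]
    if not first_positive:
        return None

    death_dates = {row["token"]: row["date_of_death"] for row in deaths}
    n_deaths = sum(
        1
        for token, tested in first_positive.items()
        if token in death_dates
        and 0 <= (death_dates[token] - tested).days <= window_days
    )
    return n_deaths / len(first_positive)


rate = crude_case_fatality(lab_results, deaths)
print(f"Crude fatality rate among confirmed cases: {rate:.1%}")
```

In this toy data, only one of the three patients with a positive test died within the window; the patient who tested negative and the patient who died well after the window are excluded.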

Socioeconomic and demographic factors

Numerous open questions remain on how socioeconomic and demographic factors interact with COVID-19. Early indications are that disease outcomes vary significantly by gender, race, age, and income level, and much more work is needed to understand the interplay of these factors in disease trajectories.

Sample Questions

  1. Does socioeconomic status impact the risk of hospitalization and mortality?
  2. Does alcohol use increase or reduce risk of COVID-19 infection? What about disease severity?
  3. Does smoking increase or reduce risk of COVID-19 infection? What about disease severity?
  4. Which types of essential personnel are most likely to be infected by COVID-19?
  5. Does frequenting certain locations (e.g., gas stations during a long road trip; gyms; certain types of grocery stores) lead to higher COVID-19 susceptibility?
  6. Which geographies are most likely to have lab testing available?
  7. Does having adequate lab testing in the surrounding geographic area correlate to avoiding hospitalization and mortality?
  8. Does socioeconomic status dictate a patient’s likelihood of getting a lab test?
  9. Does insurance status influence the likelihood of receiving a lab test?
  10. Does the presence of a caregiver in the home make a person more or less likely to be tested, be hospitalized, and/or have a bad outcome?
  11. How is COVID-19 impacting people experiencing homelessness?
  12. Do socioeconomic factors predict which groups of patients are most likely to end up in the ICU?
  13. Does a higher proportion of intergenerational households in an area correlate with a higher rate of infection?
  14. Are elderly patients who receive food through support programs less likely to be hospitalized from COVID-19 infection?
  15. How do COVID-19 outcomes vary by race and gender, including hospitalization, likelihood of being put on a ventilator, and risk of mortality?
  16. Are “essential personnel” more likely to get infected by COVID-19?
  17. Is sedentariness correlated with disease incidence or severity?
  18. Are physical activity rates correlated with infection rates, incidence, or disease severity?
  19. Does living in public housing put patients at higher risk of being infected with COVID-19 and dying?
  20. Does being in prison put patients at higher risk of being infected with COVID-19 and dying?

Required data sets

  • Consumer data: May contain a variety of socioeconomic and demographic factors, including ethnicity, income bracket, address instability, and occupation, as well as propensity to purchase alcohol or tobacco products
  • Claims and EHR data: Understand clinical decision-making and hospital admissions, as well as which patients went to the ICU or were placed on a ventilator. Also show the site of care
  • Lab data: Determine which patients have tested positive or negative for COVID-19, as well as what types of patients were able to access lab tests
  • Mortality data

The safety and efficacy of different therapeutics

Today, providers are using various therapeutic options to attempt to treat COVID-19 with only anecdotal evidence of therapy safety and effectiveness. While there is no substitute for the gold standard of a randomized controlled trial, RWD can help researchers better understand which of these therapeutic options are safe and effective on a more accelerated timeline.

Sample Questions

  1. Which off-label drugs are being used with COVID patients?
  2. Which off-label drugs correlate with reduced hospitalization, ventilator use, and/or mortality?
  3. Are there genomic or physiological characteristics in patients that influence drug effectiveness?
  4. What is the long-term safety and efficacy of a COVID-19 vaccine (once developed)?
  5. What is the long-term safety of drugs used for COVID-19?
  6. Does receiving an influenza vaccine have preventative effects for COVID-19?
  7. What pre-existing prescription drug usage is correlated with protection against contracting COVID-19?
  8. What pre-existing prescription drug usage is correlated with protection against hospitalization, ventilator use, and mortality?
  9. What is the impact on patients with other diseases when anti-inflammatory and other drugs are repurposed for COVID-19 treatment?
  10. Do drugs used for COVID-19 have interactions with any commonly used medications?

Data sets

  • Clinical trial data: May be connected to RWD to follow patients longitudinally and understand long-term outcomes not captured in the clinical trial
  • EHR data: Contains information on what drug a provider chose to order for a given patient
  • Claims data: Offers information on medical outcomes and clinical treatment patterns
  • Pharmacy claims: For drugs prescribed by a doctor, pharmacy claims show what prescriptions were actually filled, not just ordered, and therefore can help reflect information about both adherence and drug shortages
  • Patient registry data: Deep, curated sources of disease-specific data pulled from multiple sources, including self-reported health outcomes
  • Mortality data
  • Genomics data: Identify genetic traits correlated with therapeutic outcomes
  • Other relevant data: Credit card data can capture what over-the-counter drugs a patient purchased
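
The distinction between what was ordered (EHR data) and what was actually filled (pharmacy claims) can be sketched as a simple comparison once the two sources are linked. The example below is a minimal illustration with made-up records; the `(token, drug)` layout and the placeholder drug names are assumptions, and a real analysis would also need to match on dates and handle refills.

```python
# Hypothetical linked records: (patient token, drug) pairs.
ehr_orders = [
    ("p1", "drug_a"),
    ("p2", "drug_a"),
    ("p3", "drug_a"),
    ("p1", "drug_b"),
    ("p4", "drug_b"),
]
pharmacy_fills = [
    ("p1", "drug_a"),
    ("p3", "drug_a"),
    ("p4", "drug_b"),
    ("p5", "drug_a"),  # a fill with no matching order in this EHR source
]


def fill_rate_by_drug(orders, fills):
    """Fraction of ordered prescriptions with a matching pharmacy fill, per drug."""
    filled = set(fills)
    ordered = {}
    matched = {}
    for token, drug in orders:
        ordered[drug] = ordered.get(drug, 0) + 1
        if (token, drug) in filled:
            matched[drug] = matched.get(drug, 0) + 1
    return {drug: matched.get(drug, 0) / n for drug, n in ordered.items()}


rates = fill_rate_by_drug(ehr_orders, pharmacy_fills)
for drug, fill_rate in sorted(rates.items()):
    print(f"{drug}: {fill_rate:.0%} of orders filled")
```

A fill rate that drops sharply for one drug could point to non-adherence or to a shortage; telling the two apart would require additional context, such as supply data.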

Impacts of clinical resource shifting

As a result of COVID-19, health systems have cancelled or delayed elective procedures, as well as reduced the amount of preventative care provided. This clinical resource shifting will likely have impacts long after the pandemic.

Sample questions

  1. What is the long-term impact on patients who are delaying or not getting elective surgeries?
  2. Did mortality rates for emergent conditions such as stroke change during the pandemic?
  3. What is the impact of delayed preventative diagnostic tests on overall disease burden?
  4. What is the impact of delayed preventative care on overall disease burden?
  5. What is the impact of delayed prenatal care?
  6. What is the short-term effect of any anti-inflammatory drug shortages (e.g., anti-IL-6) on patients who need those drugs for other diseases?
  7. Are there changes in prescription refill rates? What is the long-term effect on patient health from any COVID-19-related non-adherence?
  8. Are patients of higher or lower socioeconomic status more likely to experience delayed preventative care?
  9. Are drug shortages affecting care for patients with other diseases?
  10. Are there certain types of procedures that are more or less likely to be delayed?

Required data

  • Specialty EHR data: Investigate the impact of drug shortages or delayed treatments on specific types of diseases like cancer
  • Claims data: Establish the baseline number of surgeries or treatments that would have been expected, measure current levels of those same procedures, and quantify excess morbidity and mortality after the pandemic
  • Pharmacy claims data: Understand whether prescriptions found in EHR data were filled
  • Wearables and medical device data: See if outcomes such as spikes in blood sugar levels or declines in physical activity occurred concurrent with delayed or canceled care
  • Socioeconomic data: Understand what social determinants are correlated with adverse outcomes
  • Mortality data
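
As a rough sketch of how claims data could be used to establish a baseline and measure the drop in procedures, the example below compares monthly claim counts for one procedure against the same months of a prior year. The claim layout and the `knee_replacement` label are hypothetical, and using the prior year as the baseline is a simplifying assumption; a real analysis would adjust for trend, seasonality, and claims reporting lag.

```python
from collections import Counter

# Hypothetical claims, one dict per claim.
claims = (
    [{"procedure": "knee_replacement", "year": 2019, "month": 4}] * 10
    + [{"procedure": "knee_replacement", "year": 2020, "month": 4}] * 3
    + [{"procedure": "knee_replacement", "year": 2019, "month": 5}] * 8
    + [{"procedure": "knee_replacement", "year": 2020, "month": 5}] * 6
    + [{"procedure": "other", "year": 2020, "month": 4}] * 5
)


def procedure_shortfall(claims, procedure, baseline_year, current_year, months):
    """Baseline-year count minus current-year count for each month."""
    counts = Counter(
        (c["year"], c["month"]) for c in claims if c["procedure"] == procedure
    )
    # Counter returns 0 for months with no claims, so gaps show as full shortfalls.
    return {
        month: counts[(baseline_year, month)] - counts[(current_year, month)]
        for month in months
    }


shortfall = procedure_shortfall(claims, "knee_replacement", 2019, 2020, months=[4, 5])
for month, missing in shortfall.items():
    print(f"Month {month}: {missing} fewer procedures than baseline")
```

The resulting shortfall is a count of procedures that did not happen on the expected schedule; linking those patients to later claims and mortality data is what would show whether the delay changed their outcomes.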

Healthcare system delivery

COVID-19 has forced many health systems to quickly move to telehealth and remote patient monitoring. Understanding the scope of these changes, and how they have impacted quality, cost, and patient outcomes, will help offer guidance for how COVID-19 might reshape the U.S. healthcare system.

Sample questions

  1. How has the COVID-19 pandemic changed small group practices’ financial situation?
  2. How is the type of hospital (non-profit vs. for-profit, academic medical center vs. not, large vs. small) correlated with patient mortality or total caseload?
  3. For common conditions that are now being managed at home, are outcomes different when managed at home versus being managed in a medical setting?
  4. Are there certain specialties that have shifted to telemedicine more than others?
  5. How has the adoption of telemedicine impacted the likelihood of COVID-19 diagnosis? Is adoption of telemedicine correlated with other health outcomes?
  6. Have health systems that have dealt with previous natural disasters adapted better to the challenges of COVID-19?
  7. How has skilled nursing facility utilization changed during the COVID-19 pandemic period?
  8. Are certain types of patients more likely to utilize telehealth during the pandemic?
  9. Have reimbursement rates and patterns changed as a result of COVID-19?
  10. How have vaccination rates in children changed during the pandemic?
  11. Did health systems with value-based payment models, such as ACOs, respond differently to COVID-19?
  12. How have specialty pharmacy dispensing patterns changed during the pandemic?
  13. What is the financial impact on hospitals of delayed or canceled elective surgeries?

Required data

  • Medical claims data: Observe the amount of activity during the pandemic, and show whether treatment locations have shifted to telehealth or other alternative delivery models
  • Socioeconomic and demographic data: Determine if certain types of patients are more or less likely to take advantage of new treatment systems
  • Remittance data: See how the total amount of remittances, or payments, to providers and hospital systems changed during the pandemic
  • Specialty pharmacy data: See how specialty pharmacy dispensing patterns changed
  • Other relevant data: Provider demographic data, such as number of beds or academic medical center status, can be useful to see if trends are the same across all provider types

Second-order societal impacts of COVID-19

The second-order impacts of COVID-19 are wide ranging and include impacts on educational development, family structure, mental health, and physical health. Answering these questions can guide policymakers to adopt more tailored mitigation policies.

Sample questions

  1. Do school closures impact children’s health outcomes?
  2. Have shelter-in-place orders changed the amount of physical activity that Americans are engaging in?
  3. Have reductions in visitations at hospitals and nursing homes impacted patients’ health status?
  4. Has social isolation led to greater prevalence of mental health issues, such as anxiety and depression?
  5. Do the health effects of shelter-in-place orders vary by geography?
  6. Are areas with poor internet or phone connections impacted by shelter-in-place orders differently than urban areas? Do these areas see higher levels of anxiety or depression?
  7. How has widespread unemployment impacted the number of Americans who are covered by insurance, and how has that impacted their health outcomes?
  8. Has unemployment led to greater prevalence of mental health issues, such as depression and severe stress?
  9. What are the major factors driving excess mortality across the board (not just for COVID-19 patients)?
  10. Has prenatal care changed as a result of COVID-19? Have there been impacts on infant mortality and outcomes?
  11. Do veterans who received community care versus care at a VA center differ in risk of poor outcomes or mortality?
  12. How does prison density correlate with COVID-19 risk, and how do outcomes for prisoners with COVID-19 compare to members of the general population?
  13. Do shelter-in-place orders reduce the frequency and severity of COVID-19 infections?
  14. What is the correlation between reopening and an increase in COVID-19 infections? Are there different intermediate options in use that correlate to different levels of risk?

Data sets

  • Claims, lab, or EHR data: Identify patients with a COVID-19 diagnosis
  • Mortality data
  • Financial, socioeconomic, and consumer data: Indicate if patients lose jobs or have changes in their financial security as a result of the pandemic
  • Mobility data: Investigate the impact of shelter-in-place orders
  • Policy data: Understand policy measures in different jurisdictions

***

Travis May is the Chief Executive Officer of Datavant, a company whose mission is to connect the world’s health data to improve patient outcomes.

Together with Health Care Cost Institute, Change Healthcare, Veradigm, Symphony Health, Healthjump, Office Ally, Medidata, and Snowflake, Datavant is a founding member of the COVID-19 Research Database, whose goal is to make fit-for-purpose real-world data available pro bono for noncommercial COVID-related research.

Travis May

Entrepreneur, Investor, and Board Member. Founder & Fmr CEO of LiveRamp and Datavant.