Clinical Data Analysis: 3 Most Important Frontiers Right Now

Data collection and analysis are the keys to clinical research. Myriad tools for collecting and analyzing clinical data exist today that didn’t a decade ago, yet research still leaves many insights on the table. That’s because it’s impossible to explore all of the knowledge in the data we have right now.

Clinical data management best practices aim to solve this, but the core problem we touched on in 2021 remains the same today: Data that gets collected doesn’t necessarily get analyzed. In a given clinical trial, hundreds of key data points may be ignored or thrown out.

Below, we examine the three most important frontiers of clinical data analysis and how the technologies on those cutting edges are attempting to solve this problem.

Natural Language Processing

Natural language processing (NLP) is the branch of artificial intelligence concerned with teaching models to parse free-form human language, the kind of input we would recognize as conversational.

This is work that healthcare professionals have done manually for generations. When a patient tells their primary care doctor, “I have a fever, my head hurts, and I feel a little dizzy,” the doctor will later record that information in the patient’s electronic health record (EHR) as “R50. Fever of other and unknown origin,” “R51. Headache,” and “R42. Dizziness and giddiness.”
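To make that coding step concrete, here is a minimal sketch of turning a patient’s statement into ICD-10 categories. It is a toy, rule-based lookup, not how production clinical NLP works (real systems rely on trained language models and far larger vocabularies). The table SYMPTOM_CODES and the function code_symptoms are hypothetical names invented for this illustration.

```python
import re

# Hypothetical, hand-written lookup table for illustration only:
# each entry pairs a phrase pattern with an (ICD-10 category, label) tuple.
SYMPTOM_CODES = [
    (re.compile(r"\bfever\b"), ("R50", "Fever of other and unknown origin")),
    (re.compile(r"\bhead(ache| hurts)\b"), ("R51", "Headache")),
    (re.compile(r"\bdizz(y|iness)\b"), ("R42", "Dizziness and giddiness")),
]


def code_symptoms(utterance: str) -> list[tuple[str, str]]:
    """Return the codes whose pattern appears in the patient's statement."""
    text = utterance.lower()
    return [code for pattern, code in SYMPTOM_CODES if pattern.search(text)]


statement = "I have a fever, my head hurts, and I feel a little dizzy"
for icd_code, label in code_symptoms(statement):
    print(f"{icd_code}. {label}")
```

A lookup like this breaks as soon as a patient phrases something differently (“my temperature is up”), which is exactly the gap that trained NLP models are meant to close.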

NLP’s ability to automate that work opens up many opportunities for clinical data. The clearest use case is in freeing up healthcare providers’ attention. “Removing the administrative burden of manual data input using dictation tools alongside NLP would enhance patient-centred consultations whilst reducing an unpopular part of General Practitioner (GP) workload,” write Imperial College London medical students Yasin Uddin, Abhinav Nair, Sameed Shariq, and Sehaan Hasan Hannan. Their comment appears in an April 2023 response to a British Medical Journal article titled “How can we improve the quality of data collected in general practice?”

Uddin et al. cite research that found GPs spend twice as much time on EHR documentation as they do with patients.

Those time and cost savings create other opportunities, as well. Robert Y. Lee from the Cambia Palliative Care Center of Excellence at UW Medicine in Seattle is the lead author on a March 2023 paper in which the researchers conclude: “NLP may facilitate clinical research studies that would otherwise be infeasible due to the costs of manual medical record abstraction.”

Key challenges remain in the development of NLP as a clinical trial analysis tool.

For one thing, there is something of a language barrier. “There is a diversity of healthcare data being produced in English, especially research data and scientific literature,” Dr. Martin Krallinger and fellow researchers at the Barcelona Supercomputing Center write. “Nonetheless, most clinical content, in particular clinical records, are being written in a diversity of languages.”

Further, immense collaboration is necessary to train up machine learning models on parsing all of the human languages we use for recording medical data. The “challenges (or opportunities) faced by clinical NLP are too great to be tackled by individuals working alone or in small research groups,” write Honghan Wu, from the Institute of Health Informatics at University College London, and fellow researchers.

Wu et al. call for community- and national-level cooperation “to create reproducible streamlined procedures for facilitating access to free-text clinical data.”

Deep Learning and Explainability

Deep learning models and model explainability go hand-in-hand. As machine learning models get more sophisticated in the ways they make connections and recognize patterns, they become harder for our minds to understand.

Take, for instance, the semi-supervised deep learning model that Guangzhi Wang and fellow researchers reported on in a 2023 article for Heliyon. Those researchers taught their deep learning model to recognize airway conditions in patients by inputting thousands of photos of people’s faces as well as specific, identifying predictors of airway troubles (e.g., maximum mouth opening, chin-to-chest distance).

It’s easy enough for a lay person to understand, more or less, what that model does. What’s difficult is understanding how (i.e., by what calculations and assessments) the model makes the connections it does. How did it conclude that Patient A’s chin-to-chest distance makes that person more or less likely to have an airway condition?
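One simple, model-agnostic way to probe that kind of question is to swap each of a patient’s measurements for a “typical” value and watch how the model’s output moves. The sketch below illustrates that idea only; it is not the method Wang and colleagues used. The function black_box_risk is a hypothetical stand-in for an opaque model, and its weights, the direction of each effect, and all measurements are invented for illustration.

```python
from statistics import mean


def black_box_risk(patient: dict[str, float]) -> float:
    """Stand-in for an opaque model; weights and directions are invented."""
    score = 0.5
    score += 0.04 * (6.0 - patient["chin_to_chest_cm"])
    score += 0.10 * (4.0 - patient["mouth_opening_cm"])
    return min(1.0, max(0.0, score))  # clamp to a 0-1 risk score


def ablation_attribution(model, patient, cohort):
    """Score change when each feature is replaced by the cohort average."""
    base = model(patient)
    attributions = {}
    for feature in patient:
        neutral = dict(patient)
        neutral[feature] = mean(p[feature] for p in cohort)
        attributions[feature] = base - model(neutral)
    return attributions


# Invented measurements, not clinical reference values.
cohort = [
    {"chin_to_chest_cm": 7.0, "mouth_opening_cm": 4.5},
    {"chin_to_chest_cm": 6.0, "mouth_opening_cm": 4.0},
    {"chin_to_chest_cm": 5.0, "mouth_opening_cm": 3.5},
]
patient_a = {"chin_to_chest_cm": 3.0, "mouth_opening_cm": 4.0}

for feature, delta in ablation_attribution(black_box_risk, patient_a, cohort).items():
    print(f"{feature}: {delta:+.2f}")
```

In this toy setup, nearly all of Patient A’s shift in score traces back to chin-to-chest distance. With a real deep learning model the same probe still runs, but interactions between inputs make the answers much harder to interpret, which is part of why explainability remains an open research area.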

This is where conversations about explainable artificial intelligence (XAI) lead into realms such as usability. In a 2022 article for the journal Artificial Intelligence in Medicine, lead author Carlo Combi and fellow researchers argued that understandability and usability are two complementary dimensions. “That is, usability is enhanced via understandability: an AI application that is understandable is more likely to be usable,” they write.

The important lesson for now is that explainability is a relatively new, still-maturing notion. As Farzad V. Farahani and fellow authors note in a 2022 literature review, the most recent research suggests explainable AI models are indeed finding ways to demonstrate their rationale to the degree of trust needed in medical sciences.

“[T]he XAI in this research field is still immature and young,” Farahani et al. write. “If we expect to overcome XAI’s current imperfections, great effort is still needed to foster XAI research. Finally, medical AI and XAI’s needs cannot be achieved without keeping medical practitioners in the loop.”

Data Standardization and Harmonization

As important as data processing and analysis methods will be for the future of clinical research, a foundational challenge exists in the way data is collected.

Different people record information differently. In fields such as medicine, there are rigorous standards that keep data mostly harmonized from one organization to the next. But data from outside of that sphere — e.g., real-world data collected from consumer products — won’t necessarily be so standardized.

Machine learning processes data at massive scales, so standardizing and harmonizing input datasets is a tremendous challenge.
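As a small illustration of what harmonization involves, the sketch below maps records from two sources onto one shared schema. Everything here is an assumption made for the example: the field names, the two source formats (a metric clinic export and an imperial-unit consumer device export), and the Observation type are hypothetical, not part of any standard.

```python
from dataclasses import dataclass

LB_PER_KG = 2.20462


@dataclass(frozen=True)
class Observation:
    """Shared target schema (hypothetical): metric units, one ID field."""
    patient_id: str
    weight_kg: float
    temp_c: float


def from_clinic(row: dict) -> Observation:
    # Hypothetical clinic export: already metric, values stored as strings.
    return Observation(row["patient_id"], float(row["weight_kg"]), float(row["temp_c"]))


def from_wearable(row: dict) -> Observation:
    # Hypothetical consumer-device export: imperial units, different field names.
    return Observation(
        patient_id=str(row["user"]),
        weight_kg=round(float(row["weight_lb"]) / LB_PER_KG, 1),
        temp_c=round((float(row["temp_f"]) - 32) * 5 / 9, 1),
    )


clinic_rows = [{"patient_id": "P001", "weight_kg": "70.5", "temp_c": "37.2"}]
wearable_rows = [{"user": "P002", "weight_lb": 154.0, "temp_f": 98.6}]

harmonized = [from_clinic(r) for r in clinic_rows] + [from_wearable(r) for r in wearable_rows]
for obs in harmonized:
    print(obs)
```

Two sources and three fields are easy to reconcile by hand. The difficulty the article describes comes from repeating this for hundreds of sources, each with its own naming, units, and gaps.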

“I think the thing that has been most important is the old adage ‘it takes a village’ which really is true,” Amy Abernethy, president of product development and chief medical officer at Verily, said in a 2021 fireside chat. “It takes the software, data, and product collaborators, as well as the clinical, statistical, and quantitative sciences collaborators to come together to solve this problem.”

Learn More

There is a theme across all three frontiers mentioned above. Every one of these challenges requires broad efforts from numerous communities and groups to solve. These data challenges aren’t things clinical researchers can solve on their own.

As these technologies evolve, data consolidation, integration, and delivery will become even more important for research organizations and sponsors.

To learn more about how Anju’s data science services can help your organization manage and derive insights from your data, contact us today or request a product demo.
