The use of electronic data capture (EDC) tools is changing how clinical trials are conducted. Digital systems have allowed for more decentralized studies, particularly when meeting in person was impossible during the COVID-19 pandemic. EDC systems also provide value throughout the trial development and analysis processes, and they play a critical role in data cleaning, which in turn drives better results.
To better understand the value of EDC systems in data cleaning, it helps to take a closer look at the entire process. At their core, data cleaning systems sort through large volumes of data and eliminate errors, all while presenting the data in an easy-to-review format.
“The main problem is that logs, as well as metrics, come in different forms, making both analysis and correlation difficult for each point and almost impossible between the two,” says Vikas Bhatt, founder and chief revenue officer at Only B2B. “Data cleansing not only eradicates errors from both the data types but also transforms metrics and log data into a common format, providing teams with shared insights and views across the entire environment of the application.”
Data cleaning can affect the entire clinical trial. With incorrect or poor-quality data, teams can reach the wrong conclusions about whether treatments are effective and whether they are safe to bring to market.
The pandemic has recently highlighted the value of clean data. Some state health departments are scrambling to protect their COVID-19 reports to ensure governing bodies make the best decisions possible for constituents.
“In the effort to get information out quickly because of the high interest in COVID-19, data quality is sometimes sacrificed,” says Traci DeSalvo, director of the Bureau of Communicable Diseases at the Wisconsin Department of Health.
The state has worked to correct the results from rapid antigen tests that differed from the more accurate PCR tests. About 3,000 confirmed cases were updated or corrected within a few weeks in March, significantly changing the state’s COVID-19 outlook.
The need for high-quality data can be found across the healthcare field, from early- to late-stage trials and even in results tracking after a product debuts on the market.
Historically, data cleaning was viewed as something to rush through, or at least as a necessary evil that took up time before a team could get approval for the next part of the trial. However, more data experts and trial managers are starting to appreciate data cleaning and the value it brings to a trial.
“The act of cleaning data is the act of preferentially transforming data so that your chosen analysis algorithm produces interpretable results,” says Randy Au, quantitative UX researcher at Google. “That is also the act of data analysis.”
Au argues that time spent on data cleaning isn’t wasted. This is time spent improving and prioritizing data so that it can be presented. Yes, clinical trials will move faster with data cleaning technology, but there will always be researchers behind it to determine how data should be viewed or prioritized.
“Technical cleaning usually begins by surveying the data with simple yet effective summaries and sanity checks, validating the overall structure of the data, correcting the typing of the fields, formatting, and ensuring completeness of the data,” says senior data scientist Maria Pospelova. “Zooming in on the data with additional field-by-field investigations, the number and nature of investigations will greatly depend on the specific content and its structure.”
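To make the steps Pospelova lists more concrete, here is a minimal Python sketch of structure, typing, and completeness checks on raw trial rows. The field names, the schema, and the helper functions are all hypothetical and chosen only for illustration; a real EDC system would define its own.

```python
from datetime import date

# Hypothetical schema: field name -> converter that parses raw text into a typed value.
SCHEMA = {
    "subject_id": str,
    "visit_date": date.fromisoformat,
    "systolic_bp": int,
}

def clean_record(raw):
    """Return (typed_record, problems) for one raw row of string values."""
    typed, problems = {}, []
    # Structure check: flag fields the schema does not know about.
    for field in raw:
        if field not in SCHEMA:
            problems.append(f"unexpected field: {field}")
    for field, convert in SCHEMA.items():
        value = (raw.get(field) or "").strip()
        # Completeness check: every schema field must be present and non-empty.
        if not value:
            problems.append(f"missing value: {field}")
            continue
        # Typing check: convert text to the expected type and record failures.
        try:
            typed[field] = convert(value)
        except ValueError:
            problems.append(f"bad {field}: {value!r}")
    return typed, problems

def summarize(rows):
    """Count each kind of problem across all rows for a quick survey of the data."""
    counts = {}
    for row in rows:
        _, problems = clean_record(row)
        for problem in problems:
            kind = problem.split(":")[0]
            counts[kind] = counts.get(kind, 0) + 1
    return counts
```

Running `summarize` over a batch of rows gives the kind of simple summary the quote describes, and the per-record problem lists show reviewers where field-by-field investigation should start.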
EDC systems can make data cleaning not only easier but also more strategic. Teams can focus on checking for accuracy and eliminating outliers through detailed analysis and review.
Many EDC systems are increasing their value with the help of artificial intelligence (AI) and machine learning (ML). These technological advances make data cleaner and speed up the cleaning process.
“With Artificial Intelligence (AI) systems going mainstream, our ability to find and interpret new information patterns from this complex data and make more reliable and accurate predictions of clinical outcomes has improved drastically,” write David Wang, Masha Hoffey, and Dr. Simone Sharma at Drug Discovery World. “It also helps provide predictive long-term outcomes around safety and efficacy and reduces the time and cost of clinical trials which could vastly improve the drug development process.”
In some cases, EDC systems can prepare researchers for the types of data they are about to review. They can grade datasets to help trial developers understand if they are working with clean data or problematic information sets.
“Users should be presented with data based on the probability that the data is clean or the data points that require their attention,” says Prabha Ranganathan, director of life sciences data warehousing and analytics at Perficient. “The data reviewers can prioritize their review activity based on the input from the ML models.”
This can significantly reduce the amount of time it takes to review data and can guide researchers straight to problem areas within the research.
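As a rough illustration of the prioritization Ranganathan describes, the sketch below orders data points so that the ones least likely to be clean come first. It assumes some upstream model has already assigned each point a probability of being clean; the function name, the input shape, and the 0.9 threshold are hypothetical.

```python
def review_queue(scored_points, threshold=0.9):
    """Order data points for review, least likely to be clean first.

    scored_points: iterable of (point_id, p_clean) pairs, where p_clean is a
    probability in [0, 1] assumed to come from an upstream model.
    Only points whose p_clean falls below the threshold are queued.
    """
    flagged = [(pid, p) for pid, p in scored_points if p < threshold]
    return sorted(flagged, key=lambda item: item[1])
```

For example, `review_queue([("a", 0.99), ("b", 0.40), ("c", 0.85)])` skips point `a` and puts `b` ahead of `c`, so a reviewer starts with the record the model trusts least.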
Data cleaning systems save lives, and the COVID-19 vaccine clinical trials provide one of the best examples of this. The researchers at Pfizer credit a Smart Data Query (SDQ) tool developed by its “incubation sandbox” teams for speeding up the data cleaning process.
“It saved us an entire month,” says Demetris Zambas, vice president and global head of data monitoring and management at Pfizer. “It really has had a significant impact on the first-pass quality of our clinical data and the speed through which we can move things along and make decisions.”
It normally takes more than 30 days to clean up data, but this software system reviewed the trial information in 22 hours.
As clinical trial teams increase their data collection, the need for cleaning systems also grows. Clinical trial firms are developing their own cleansing tech stacks to keep their data quality high.
According to a study by the Tufts Center for the Study of Drug Development, 73.9 percent of clinical research organizations (CROs) use two or more EDC solutions, while only 26.1 percent use just one. The study surveyed 194 respondents who used 12 different EDC systems.
EDC systems get to the root of clinical data cleaning. Not only do they make it easy to sort through data sets, but they also improve data collection from the start.
“The efficiency of an EDC system is directly related to its design,” the team at Med Institute explains. “An effective framework enables users to perform data entry in less time with fewer errors. A good way to prevent erroneous data from being entered in the database and to minimize the need for subsequent queries is the judicious use of edit checks.”
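An edit check of the kind mentioned here can be as simple as a rule that runs when a form is saved and raises a query if the entry looks wrong. The sketch below shows one range check and one cross-field check. The field names and the 20 to 300 kg limits are hypothetical examples, not rules from any particular EDC product.

```python
from datetime import date

def edit_checks(entry):
    """Return a list of query messages for one form entry (empty if it passes)."""
    queries = []
    # Range check: hypothetical plausible limits for body weight in kg.
    weight = entry.get("weight_kg")
    if weight is None or not (20 <= weight <= 300):
        queries.append("weight_kg missing or outside 20-300")
    # Cross-field check: a visit cannot take place before informed consent.
    consent, visit = entry.get("consent_date"), entry.get("visit_date")
    if consent and visit and visit < consent:
        queries.append("visit_date is earlier than consent_date")
    return queries
```

Catching a weight of 725 kg or a visit dated before consent at entry time means the site can fix it on the spot, instead of answering a query weeks later.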
One of the easiest ways to reduce the time spent on data cleaning is to improve the quality of the data set before it reaches review.
“As the use of EDC in clinical studies increases, it is becoming clear that such systems present new challenges in ensuring data integrity and subject safety,” write Adam Donat, Martin Hamilton, Irfan Khan, and Nichole Chamberlain at The Society of Clinical Research Associates. “During the study-planning phase for example, designing staff training on the study’s data management procedures helps to ensure each site is collecting accurate data uniformly across the study.”
Even with advanced EDC systems, clinical trials are still focused on humans and conducted by humans. Proper training and the right tools can improve the data collection process and reduce the burden on researchers to clean data sets that are incorrect or disorganized.
Human error will always be a part of clinical trials. Even the best researchers can input incorrect numbers or tweak an algorithm by accident. Electronic data capture systems work to find these errors and serve as a system of checks and balances for research teams. This allows for better data and more confident results as different treatments are approved.