Post written by Sanjay Gadi, MD, and Jeremy Glissen Brown, MD, from Duke University Medical Center, Durham, North Carolina, USA.

Artificial intelligence-based computer-aided detection (CADe) software may improve colorectal cancer outcomes by increasing adenoma detection while reducing miss rates during colonoscopy. CADe has had promising results in controlled environments but mixed results when evaluated in real-world settings. Ongoing evaluation of CADe is needed to inform its ideal role, especially as new iterations emerge. Therefore, we aimed to identify priority scoring metrics for evaluation and comparison of CADe.
Multiple CADe systems have achieved regulatory approval in the United States, Europe, and Asia and are already used in the endoscopy suite. However, we still do not have a standardized approach to evaluate and compare the performance of various CADe systems.

In the literature on CADe, different studies measure different outcomes. Based on our review, there did not seem to be a “go-to” set of metrics that distinguished high-performing CADe algorithms from lower-performing ones. This lack of consensus on which metrics to prioritize inspired us to perform this study. We wanted to identify the high-priority metrics that would enable assessment of CADe performance in a standardized way.
To our knowledge, this is the first reported international consensus statement of priority scoring metrics for CADe in colonoscopy. Our study, using a modified Delphi approach to survey an international group of experts, identified the 6 highest-priority criteria that can be used for evaluation of CADe performance. These metrics are sensitivity, separate and independent validation of the algorithm, adenoma detection rate, false-positive rate, latency, and adenoma miss rate.
For statistical criteria such as sensitivity and false-positive rate, we also wanted to define these values precisely, as they can be measured in multiple ways. These statistical criteria can be defined for each frame of the video (per-frame definition), for each polyp (per-polyp definition), or for each detection box that appears (per-detection-box definition). Ultimately, our results indicated that per-frame or per-polyp definitions should be used when operationalizing statistical criteria.
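To show how much the choice of definition can matter, here is a minimal Python sketch that computes sensitivity per frame and per polyp, and a per-frame false-positive rate, from the same annotated video. The `Frame` structure, its field names, and the exact formulas are assumptions made for illustration, not definitions taken from the consensus statement. The sketch also assumes at most one polyp is visible in any frame.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Frame:
    """One annotated video frame (hypothetical structure for illustration)."""
    polyp_id: Optional[str]  # ground-truth polyp visible in this frame, or None
    true_hit: bool           # True if a CADe box correctly marks that polyp
    false_boxes: int         # number of CADe boxes that mark no polyp


def per_frame_sensitivity(frames: List[Frame]) -> Optional[float]:
    """Share of polyp-containing frames in which the polyp was marked."""
    polyp_frames = [f for f in frames if f.polyp_id is not None]
    if not polyp_frames:
        return None  # undefined when no frame contains a polyp
    return sum(f.true_hit for f in polyp_frames) / len(polyp_frames)


def per_polyp_sensitivity(frames: List[Frame]) -> Optional[float]:
    """Share of distinct polyps marked in at least one frame."""
    seen = {f.polyp_id for f in frames if f.polyp_id is not None}
    if not seen:
        return None
    detected = {f.polyp_id for f in frames
                if f.polyp_id is not None and f.true_hit}
    return len(detected) / len(seen)


def per_frame_false_positive_rate(frames: List[Frame]) -> Optional[float]:
    """Share of polyp-free frames that show at least one detection box."""
    negatives = [f for f in frames if f.polyp_id is None]
    if not negatives:
        return None
    return sum(f.false_boxes > 0 for f in negatives) / len(negatives)


# Made-up example: polyp "A" is marked in 2 of 3 frames, polyp "B" is never
# marked, and 2 of 4 polyp-free frames contain a spurious box.
frames = [
    Frame("A", True, 0),
    Frame("A", False, 0),
    Frame("A", True, 1),
    Frame("B", False, 0),
    Frame("B", False, 0),
    Frame(None, False, 1),
    Frame(None, False, 0),
    Frame(None, False, 0),
    Frame(None, False, 2),
]

print(per_frame_sensitivity(frames))          # 2 of 5 polyp frames
print(per_polyp_sensitivity(frames))          # 1 of 2 polyps
print(per_frame_false_positive_rate(frames))  # 2 of 4 polyp-free frames
```

In this made-up example the same video yields a per-frame sensitivity of 0.4 and a per-polyp sensitivity of 0.5, which is why agreeing on one definition matters before comparing systems.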
The next step in advancing this work is to validate the criteria on benchmark video datasets and to develop a standardized scoring instrument that can be used to grade CADe performance. We envision medical institutions, gastroenterology societies, and regulatory bodies at the national and international levels using this scoring instrument to standardize their review of CADe performance, informing the uptake of the technology and monitoring its performance.

Read the full article online.
The information presented in Endoscopedia reflects the opinions of the authors and does not represent the position of the American Society for Gastrointestinal Endoscopy (ASGE). ASGE expressly disclaims any warranties or guarantees, expressed or implied, and is not liable for damages of any kind in connection with the material, information, or procedures set forth.