Deep learning algorithms can identify abnormalities on head computed tomography (CT) scans in patients who present with head trauma or stroke symptoms, according to study results published in The Lancet.
The investigators retrospectively reviewed 313,318 head CT scans, together with their clinical reports, from approximately 20 medical centers in India. A randomly selected subset of these data, the Qure25k data set, was reserved for validation, and the remaining scans were used to develop the algorithms. The Qure25k data set comprised 21,095 head CT scans (mean patient age, 43 years) taken at 6 radiology centers in New Delhi. A second validation data set, CQ500, was collected in 2 batches from different centers in India (214 scans in batch 1 and 277 scans in batch 2).
The investigators assessed the algorithms using the area under the receiver operating characteristic curve (AUC). When run on a scan, the algorithms produced 9 confidence scores (range, 0-1), each indicating the likelihood that one of the following was present: intracranial hemorrhage, each of 5 hemorrhage subtypes, midline shift, mass effect, and calvarial fracture.
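For readers unfamiliar with the metric, the AUC for a single finding can be read as the probability that a randomly chosen scan with the finding receives a higher confidence score than a randomly chosen scan without it. The Python sketch below computes it that way (the Mann-Whitney formulation) for one finding. The function name `auc` and the toy scores and labels are hypothetical illustrations, not the study's code or data; in the study, each of the 9 scores would be evaluated separately against the reference standard.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney formulation.

    scores: confidence scores in [0, 1]; higher means the finding is more likely present.
    labels: 1 if the reference standard says the finding is present, else 0.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative scan")
    # Count how often a positive scan outscores a negative one; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical scores for one finding (e.g., intracranial hemorrhage) on 6 scans.
    toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
    toy_labels = [1, 1, 1, 0, 0, 0]
    print(round(auc(toy_scores, toy_labels), 3))  # prints 0.889 (8 of 9 pairs ranked correctly)
```

An AUC of 1.0 means every positive scan outscores every negative scan, while 0.5 corresponds to chance-level ranking. The pairwise loop is quadratic in the number of scans; it is written this way for clarity, and a rank-based computation gives the same result faster on large data sets.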
In the Qure25k data set, the algorithms achieved an AUC of 0.92 (95% CI, 0.91-0.93) for detecting intracranial hemorrhage. For the individual hemorrhage subtypes, AUCs were 0.90 (95% CI, 0.89-0.91) for intraparenchymal, 0.96 (95% CI, 0.94-0.97) for intraventricular, 0.92 (95% CI, 0.90-0.93) for subdural, 0.93 (95% CI, 0.91-0.95) for extradural, and 0.90 (95% CI, 0.89-0.92) for subarachnoid hemorrhage.
In the CQ500 data set, the AUC for detecting intracranial hemorrhage was 0.94 (95% CI, 0.92-0.97), and the AUCs for detecting intraparenchymal, intraventricular, subdural, extradural, and subarachnoid hemorrhage were 0.95 (95% CI, 0.93-0.98), 0.93 (95% CI, 0.87-1.00), 0.95 (95% CI, 0.91-0.99), 0.97 (95% CI, 0.91-1.00), and 0.96 (95% CI, 0.92-0.99), respectively. In the Qure25k data set, the algorithms also achieved AUCs of 0.92 (95% CI, 0.91-0.94) for calvarial fracture, 0.93 (95% CI, 0.91-0.94) for midline shift, and 0.86 (95% CI, 0.85-0.87) for mass effect.
According to the researchers, one limitation of the analysis was that follow-up scans of patients were not excluded from the CQ500 data set, “mainly because very few scans were reported with some of our target abnormalities such as extradural and intraventricular haemorrhages.”
The researchers concluded that deep learning algorithms may “be a helpful adjunct for identification of acute head CT findings in a trauma setting, providing a lower performance bound for quality and consistency of radiological interpretation” and could “automate the triage process of head CT scans.” They cautioned, however, that “over-reliance on such a triage might lead to automation bias in radiologists whereby false negative scans are overlooked.”
Chilamkurthy S, Ghosh R, Tanamala S, et al. Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study. Lancet. 2018;392(10162):2388-2396.