Text Analysis of Radiology Reports with Signs of Intracranial Hemorrhage on Brain CT Scans Using the Decision Tree Algorithm
The aim of the study was to develop, train, and test an algorithm that analyzes brain CT text reports with a decision tree model and classifies each report by the presence or absence of signs of intracranial hemorrhage (ICH).
Materials and Methods. The source data were exported from the Unified Radiological Information Service of the Unified Medical Information and Analytical System (URIS UMIAS) and comprised 34,188 non-contrast brain CT studies performed at 56 inpatient medical facilities. Data analysis and preprocessing were carried out using NLTK (Natural Language Toolkit, version 3.6.5), a library for symbolic and statistical processing of natural language, and scikit-learn, a machine learning library containing tools for classification tasks. CT studies were selected automatically using 14 ICH-related keywords and 33 stop-phrases containing keywords that denote the absence of ICH, and the selection was then verified by experts. From the resulting sample of 3,980 report texts, two classes were formed: reports that describe ICH and reports that do not. The binary classification problem was solved with a decision tree model. To evaluate model performance, the CT studies were split randomly in a 7:3 ratio: of the 3,980 reports, 2,786 were assigned to the training set and 1,194 to the test set.
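The keyword and stop-phrase selection and the 7:3 split described above can be illustrated with a minimal sketch. The English keyword and stop-phrase lists and the function names `screen_report` and `split_sizes` are hypothetical; the study's actual 14 keywords and 33 stop-phrases are not listed here, and this is not the authors' implementation.

```python
# Hypothetical lists for illustration only; the study's 14 keywords and
# 33 stop-phrases are not reproduced in this text.
ICH_KEYWORDS = ["hemorrhage", "hematoma", "hemorrhagic"]
STOP_PHRASES = ["no signs of hemorrhage", "no hemorrhage", "without hematoma"]


def screen_report(text):
    """Return True if the report mentions ICH outside of a stop-phrase."""
    lowered = text.lower()
    # Blank out stop-phrases first (longest first), so that negated
    # mentions such as "no hemorrhage" do not match a keyword.
    for phrase in sorted(STOP_PHRASES, key=len, reverse=True):
        lowered = lowered.replace(phrase, " ")
    return any(keyword in lowered for keyword in ICH_KEYWORDS)


def split_sizes(n_total, train_share=0.7):
    """Return (train, test) sizes for a random split with the given share."""
    n_train = round(n_total * train_share)
    return n_train, n_total - n_train


if __name__ == "__main__":
    print(screen_report("Subdural hematoma in the left hemisphere."))
    print(screen_report("No signs of hemorrhage or ischemia."))
    print(split_sizes(3980))
```

Reports flagged this way are only candidates: in the study, the automatic selection was followed by expert verification before the two classes were formed.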
Results. On the test set, the trained algorithm classified CT reports as “with signs of ICH” or “without signs of ICH” with a sensitivity of 0.94, a specificity of 0.88, and an F-score of 0.83.
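For reference, the reported metrics follow from the counts of a binary confusion matrix. The sketch below assumes that the F-score is the F1 score (the harmonic mean of precision and sensitivity), which the text does not state explicitly; the function names and the example counts are hypothetical and are not the study's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, precision, and F1 from counts."""
    sensitivity = tp / (tp + fn)   # share of ICH reports that were found
    specificity = tn / (tn + fp)   # share of non-ICH reports correctly rejected
    precision = tp / (tp + fp)     # share of positive calls that were correct
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def precision_from_f1(f1, sensitivity):
    """Solve F1 = 2PR / (P + R) for precision P, given recall R."""
    return f1 * sensitivity / (2 * sensitivity - f1)


if __name__ == "__main__":
    # Made-up counts, used only to show the calculation.
    print(binary_metrics(tp=90, fp=20, tn=80, fn=10))
    # Precision implied by the reported values, if the F-score is F1.
    print(round(precision_from_f1(0.83, 0.94), 3))
```

Under that F1 assumption, an F-score of 0.83 with a sensitivity of 0.94 would correspond to a precision of roughly 0.74, meaning that a noticeable share of reports flagged as ICH would be false positives.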
Conclusion. The developed and trained algorithm identified radiology reports of brain CT with signs of intracranial hemorrhage with high sensitivity (0.94) and specificity (0.88). It can be used to solve binary classification problems and to assemble the corresponding data sets. Its main limitation is that CT studies still require manual review for quality control.