A hybrid text classification and language generation model for automated summarization of Dutch breast cancer radiology reports

Authored by

E. Nguyen, D. Theodorakopoulos, S. Pathak, J. Geerdink, O. Vijlbrief, M. Van Keulen, C. Seifert

Abstract

Breast cancer diagnosis is based on radiology reports describing observations made from medical imagery, such as X-rays obtained during mammography. The reports are written by radiologists and contain a conclusion summarizing the observations. Manually summarizing the reports is time-consuming and leads to high text variability. This paper investigates the automated summarization of Dutch radiology reports. We propose a hybrid model consisting of a language model (encoder-decoder with attention) and a separate BI-RADS score classifier. The summarization model achieved a ROUGE-L F1 score of 51.5% on the Dutch reports, which is comparable to results reported for other languages and domains. For BI-RADS classification, the language model (accuracy 79.1%) was outperformed by a separate classifier (accuracy 83.3%), which led us to propose a hybrid approach to radiology report summarization. In our qualitative evaluation, experts found the generated conclusions comprehensible and largely relevant in content, indicating that future improvements should focus on factual correctness. While the current model is not yet accurate enough for clinical practice, our results indicate that hybrid models may be a worthwhile direction for future research.
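The summarization result above is reported as a ROUGE-L F1 score, which is based on the longest common subsequence (LCS) between a generated conclusion and the reference conclusion written by the radiologist. The sketch below shows how a sentence-level ROUGE-L F1 can be computed. It assumes lowercase whitespace tokenization with no stemming, which may differ from the exact ROUGE implementation and preprocessing used in the paper. The function names `lcs_length` and `rouge_l_f1` and the Dutch example sentences are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b (dynamic programming)."""
    # prev[j] holds the LCS length of the tokens of a processed so far and b[:j];
    # only one row of the DP table is kept at a time.
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0] * (len(b) + 1)
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Sentence-level ROUGE-L F1; lowercase whitespace tokenization is an assumption."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of generated tokens that lie on the LCS
    recall = lcs / len(ref)      # share of reference tokens that lie on the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical example: the conclusions differ only in the BI-RADS category.
    reference = "geen aanwijzingen voor maligniteit birads 1"
    generated = "geen aanwijzingen voor maligniteit birads 2"
    print(round(rouge_l_f1(generated, reference), 3))  # prints 0.833
```

In this made-up example, a wrong BI-RADS category costs only one token and still yields a high ROUGE-L F1. This illustrates why an overlap metric alone may not capture the clinically important category, and why reporting BI-RADS accuracy separately is informative.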

Details

External organization(s)
University of Twente (UT)
Universität Duisburg-Essen (UDE)
Type
Paper in conference proceedings
Pages
72-81
Number of pages
10
Publication date
28.10.2020
Publication status
Published
Peer-reviewed
Yes
ASJC Scopus subject areas
Artificial Intelligence, Computer Science Applications, Software, Cognitive Neuroscience
Sustainable Development Goals
SDG 3 - Good Health and Well-being
Electronic version(s)
https://doi.org/10.1109/CogMI50398.2020.00019 (Access: Unknown)
