Although large volumes of medical text are stored in electronic format, they are seldom reused because narrative text is difficult to process by computer. Morphological analysis, which parses a sentence into its smallest units (morphemes), is a key technology for extracting medical terms correctly and automatically. Phrases consisting of two or more technical terms, however, cause morphological analysis software to fail in parsing the sentence and to output the unprocessed terms as "unknown words." The purpose of this study was to reduce the number of unknown words in medical narrative text processing. Text from the national examination for radiological technologists was parsed with and without additional dictionaries, and the numbers of unknown words were compared. The ratio of unknown words was reduced from 1.0% to 0.36% by adding the terminology of radiological technology, MeSH, and ICD-10 labels. The terminology of radiological technology was the most effective resource, accounting for a reduction of 0.62%. This result clearly showed the importance of additional dictionary selection and of understanding trends in unknown words. The potential of this work is to make available a large body of clinical information that would otherwise be inaccessible for applications other than manual review by health care personnel.
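The measurement described above, the ratio of unknown words before and after adding a domain dictionary, can be illustrated with a minimal sketch. It is not the authors' method or software: it assumes a toy greedy longest-match segmenter in place of a real morphological analyzer, and the function names, the sample text, and the lexicon entries are all hypothetical.

```python
from typing import List, Set, Tuple

Token = Tuple[str, bool]  # (surface form, True if found in the lexicon)


def longest_match_segment(text: str, lexicon: Set[str]) -> List[Token]:
    """Toy segmenter (an assumption, not a real morphological analyzer).

    Scans left to right, taking the longest lexicon entry at each position.
    Consecutive characters that match nothing are merged into a single
    token flagged as unknown.
    """
    max_len = max((len(word) for word in lexicon), default=1)
    tokens: List[Token] = []
    unknown_buf = ""
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                match = candidate
                break
        if match is None:
            unknown_buf += text[i]
            i += 1
        else:
            if unknown_buf:
                tokens.append((unknown_buf, False))
                unknown_buf = ""
            tokens.append((match, True))
            i += len(match)
    if unknown_buf:
        tokens.append((unknown_buf, False))
    return tokens


def unknown_ratio(tokens: List[Token]) -> float:
    """Fraction of tokens that were not found in the lexicon."""
    if not tokens:
        return 0.0
    return sum(1 for _, known in tokens if not known) / len(tokens)


# Hypothetical data: a general lexicon and a small domain add-on.
base_lexicon = {"contrast", "scan"}
domain_lexicon = {"mri"}
sample = "mricontrastscan"  # unsegmented text, as in languages without spaces

before = unknown_ratio(longest_match_segment(sample, base_lexicon))
after = unknown_ratio(longest_match_segment(sample, base_lexicon | domain_lexicon))
print(f"unknown ratio before: {before:.2f}, after: {after:.2f}")
```

Comparing `before` and `after` for each candidate dictionary in turn gives a rough way to see which resource removes the most unknown words, which is the kind of comparison the study reports.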
CITATION STYLE
Tsuji, S., Nishimoto, N., & Ogasawara, K. (2008). Pilot study of domain-specific terminology adaptation for morphological analysis: research on unknown terms in national examination documents of radiological technologists. Nippon Hoshasen Gijutsu Gakkai Zasshi, 64(7), 791–794. https://doi.org/10.6009/jjrt.64.791