A word pair dataset for semantic similarity and relatedness in Korean medical vocabulary: Reference development and validation

Yunjin Yum; Jeong Moon Lee; Moon Joung Jang; Yoojoong Kim; Jong Ho Kim; Seongtae Kim; Unsub Shin; Sanghoun Song; Hyung Joon Joo

Journal Article, Open Access

JMIR Medical Informatics (2021) 9(6)

DOI: 10.2196/29667

Abstract

Background: Medical terms require special expertise and are becoming increasingly complex, which makes it difficult to apply natural language processing techniques in medical informatics. Several human-validated reference standards for medical terms have been developed to evaluate word embedding models using the semantic similarity and relatedness of medical word pairs. However, very few reference standards exist for non-English languages. In addition, because the existing reference standards were developed a long time ago, an updated standard is needed to reflect recent findings in the medical sciences.

Objective: We propose a new Korean word pair reference set to verify embedding models.

Methods: From January 2010 to December 2020, 518 medical textbooks, 72,844 health information news articles, and 15,698 medical research articles were collected, and the top 10,000 medical terms were selected to develop medical word pairs. Attending physicians (n=16) participated in the verification of the developed set of 607 word pairs.

Results: The proportion of word pairs answered by all participants was 90.8% (551/607) for the similarity task and 86.5% (525/607) for the relatedness task. The similarity and relatedness of the word pairs showed a high correlation (ρ=0.70, P

Cite

Yum, Y., Lee, J. M., Jang, M. J., Kim, Y., Kim, J. H., Kim, S., … Joo, H. J. (2021). A word pair dataset for semantic similarity and relatedness in Korean medical vocabulary: Reference development and validation. JMIR Medical Informatics, 9(6). https://doi.org/10.2196/29667
