Abstract
In this paper, we propose methods for discovering semantic differences in words appearing in two corpora. The key idea is to measure the coverage of meanings of a word in a corpus through the norm of its mean word vector, which is equivalent to examining a kind of variance of the word vector distribution. Unlike previous methods, the proposed methods require no alignment between words and/or corpora for comparison. All they require is computing the variance (or the norm of the mean word vector) for each word type. Nevertheless, they rival the best-performing system in SemEval-2020 Task 1. In addition, they are (i) robust to skew in corpus sizes; (ii) capable of detecting semantic differences in infrequent words; and (iii) effective in pinpointing word instances that have a meaning missing in one of the two corpora under comparison. We demonstrate these advantages on historical corpora as well as on native/non-native English corpora.
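The core idea from the abstract can be illustrated with a minimal sketch (not the authors' implementation; function names and the toy data are hypothetical). For unit-normalized vectors, the variance of the distribution equals 1 minus the squared norm of the mean vector, so a word whose occurrences span more senses yields a mean vector with a smaller norm, and this statistic is computed per word type within each corpus, with no alignment across corpora:

```python
import numpy as np

def meaning_coverage(vectors):
    """Norm of the mean of unit-normalized word vectors.

    A lower norm means the vectors are spread over more senses
    (higher variance); a higher norm means a concentrated meaning.
    For unit vectors, total variance = 1 - ||mean||^2.
    """
    vectors = np.asarray(vectors, dtype=float)
    # Unit-normalize each occurrence vector.
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return float(np.linalg.norm(vectors.mean(axis=0)))

def semantic_difference(vecs_corpus_a, vecs_corpus_b):
    """Score the semantic difference of one word across two corpora
    as the gap in meaning coverage; only per-corpus statistics are
    needed, so no word or corpus alignment is required."""
    return abs(meaning_coverage(vecs_corpus_a) - meaning_coverage(vecs_corpus_b))
```

As a toy check, occurrence vectors clustered around a single sense give a coverage norm near 1, while vectors split across two distinct sense clusters pull the mean toward the origin and lower the norm, producing a positive difference score.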
Citation
Nagata, R., Otani, N., Takamura, H., & Kawasaki, Y. (2023). Variance Matters: Detecting Semantic Differences without Corpus/Word Alignment. In EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 15609–15622). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.emnlp-main.965