Abstract
In this paper, we propose methods for discovering semantic differences in words appearing in two corpora. The key idea is to measure the coverage of meanings of a word in a corpus through the norm of its mean word vector, which is equivalent to examining a kind of variance of the word vector distribution. Unlike previous methods, the proposed methods require no alignment between words and/or corpora for comparison. All they require is computing the variance (or the norm of the mean word vector) for each word type. Nevertheless, they rival the best-performing system in SemEval-2020 Task 1. In addition, they are (i) robust to skew in corpus sizes; (ii) capable of detecting semantic differences in infrequent words; and (iii) effective in pinpointing word instances that have a meaning missing in one of the two corpora under comparison. We demonstrate these advantages on historical corpora as well as on native/non-native English corpora.
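The core idea from the abstract can be illustrated with a minimal sketch (not the authors' implementation; function names and the toy data are hypothetical). For unit-normalized vectors, the variance of the distribution equals 1 minus the squared norm of the mean vector, so a word whose occurrences span more senses yields a mean vector with a smaller norm, and this statistic is computed per word type within each corpus, with no alignment across corpora:

```python
import numpy as np

def meaning_coverage(vectors):
    """Norm of the mean of unit-normalized word vectors.

    A lower norm means the vectors are spread over more senses
    (higher variance); a higher norm means a concentrated meaning.
    For unit vectors, total variance = 1 - ||mean||^2.
    """
    vectors = np.asarray(vectors, dtype=float)
    # Unit-normalize each occurrence vector.
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return float(np.linalg.norm(vectors.mean(axis=0)))

def semantic_difference(vecs_corpus_a, vecs_corpus_b):
    """Score the semantic difference of one word across two corpora
    as the gap in meaning coverage; only per-corpus statistics are
    needed, so no word or corpus alignment is required."""
    return abs(meaning_coverage(vecs_corpus_a) - meaning_coverage(vecs_corpus_b))
```

As a toy check, occurrence vectors clustered around a single sense give a coverage norm near 1, while vectors split across two distinct sense clusters pull the mean toward the origin and lower the norm, producing a positive difference score.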
Citation
Nagata, R., Otani, N., Takamura, H., & Kawasaki, Y. (2023). Variance Matters: Detecting Semantic Differences without Corpus/Word Alignment. In EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 15609–15622). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.emnlp-main.965