Variance Matters: Detecting Semantic Differences without Corpus/Word Alignment

5Citations
Citations of this article
12Readers
Mendeley users who have this article in their library.
Get full text

Abstract

In this paper, we propose methods1 for discovering semantic differences in words appearing in two corpora. The key idea is to measure the coverage of meanings of a word in a corpus through the norm of its mean word vector, which is equivalent to examining a kind of variance of the word vector distribution. The proposed methods do not require alignments between words and/or corpora for comparison that previous methods do. All they require are to compute variance (or norms of mean word vectors) for each word type. Nevertheless, they rival the best-performing system in the SemEval-2020 Task 1. In addition, they are (i) robust for the skew in corpus sizes; (ii) capable of detecting semantic differences in infrequent words; and (iii) effective in pinpointing word instances that have a meaning missing in one of the two corpora under comparison. We show these advantages for historical corpora and also for native/non-native English corpora.

Cite

CITATION STYLE

APA

Nagata, R., Otani, N., Takamura, H., & Kawasaki, Y. (2023). Variance Matters: Detecting Semantic Differences without Corpus/Word Alignment. In EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 15609–15622). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.emnlp-main.965

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free