DartmouthCS at SemEval-2022 Task 8: Predicting Multilingual News Article Similarity with Meta-Information and Translation

Joseph Hajjar; Weicheng Ma; Soroush Vosoughi

Conference Proceedings, Open Access

SemEval 2022 - 16th International Workshop on Semantic Evaluation, Proceedings of the Workshop (2022) 1157-1162

DOI: 10.18653/v1/2022.semeval-1.163

Abstract

This paper presents our approach to SemEval-2022 Task 8: Multilingual News Article Similarity. Our experiments show that, even when using multilingual pre-trained language models (LMs), translating the texts into a single language yields the best evaluation performance. We also find that stylometric features of the text and meta-information about the news articles can be predicted from the text with low error rates, and that these predictions could be used to improve the predicted overall similarity scores. These findings suggest substantial correlations between authorship information and topical similarity estimation, which may inform future stylometric and topic modeling research.

Citation (APA)

Hajjar, J., Ma, W., & Vosoughi, S. (2022). DartmouthCS at SemEval-2022 Task 8: Predicting Multilingual News Article Similarity with Meta-Information and Translation. In SemEval 2022 - 16th International Workshop on Semantic Evaluation, Proceedings of the Workshop (pp. 1157–1162). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.semeval-1.163
