Comparing Contextual Embeddings for Semantic Textual Similarity in Portuguese

José E. Andrade Junior; Jonathan Cardoso-Silva; Leonardo C.T. Bezerra

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2021) 13074 LNAI 389-404

DOI: 10.1007/978-3-030-91699-2_27

Abstract

Semantic textual similarity (STS) measures how semantically similar two sentences are. For the Portuguese language, the STS literature is still incipient but includes important initiatives such as the ASSIN and ASSIN 2 shared tasks. The state of the art for those datasets is a contextual embedding produced by a pre-trained and fine-tuned Portuguese BERT model. In this work, we investigate the application of Sentence-BERT (SBERT) contextual embeddings to these datasets. Compared to BERT, SBERT is more computationally efficient, which enables its application to scalable unsupervised learning problems. Given the absence of SBERT models pre-trained in Portuguese and the computational cost of such training, we adopt multilingual models and also fine-tune them for Portuguese. Results showed that SBERT embeddings were competitive, especially after fine-tuning, numerically surpassing the results of BERT on ASSIN 2 and the results observed during the shared tasks for all datasets considered.
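
As a rough illustration of how sentence embeddings are typically used for STS, the sketch below scores sentence pairs by cosine similarity and compares the scores against gold annotations with Pearson correlation. The abstract does not state the scoring function or the evaluation metric, so both are assumptions here, and the three-dimensional vectors and gold scores are hypothetical toy values standing in for real SBERT embeddings and human ratings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy "embeddings" for three sentence pairs (not real model output).
pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),  # same direction: very similar
    ([1.0, 0.0, 0.0], [1.0, 1.0, 0.0]),  # partially aligned
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),  # orthogonal: unrelated
]

# Hypothetical gold similarity ratings, one per pair (higher means more similar).
gold = [5.0, 3.0, 1.0]

# Predicted similarity for each pair, then agreement with the gold ratings.
predicted = [cosine_similarity(u, v) for u, v in pairs]
correlation = pearson(predicted, gold)
```

Because each sentence is embedded once and pairs are compared with a cheap vector operation, this style of scoring scales to large collections, which is the efficiency argument the abstract makes for SBERT.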

Citation (APA)

Andrade Junior, J. E., Cardoso-Silva, J., & Bezerra, L. C. T. (2021). Comparing Contextual Embeddings for Semantic Textual Similarity in Portuguese. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13074 LNAI, pp. 389–404). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-91699-2_27
