Evaluation of Transformer-Based Models for Punctuation and Capitalization Restoration in Spanish and Portuguese

Abstract

Punctuation restoration plays a key role as a post-processing step for text generation methods such as Automatic Speech Recognition (ASR) and Machine Translation (MT). Despite its importance, ASR systems and other generation models used in these tasks often produce text that lacks punctuation. Such text is difficult for human readers to follow and may limit the performance of many downstream text processing tasks for web analytics, such as sentiment analysis, sarcasm detection, or hate-speech identification, including the detection of stereotypes, sexism, and misogyny. Many techniques exist for restoring text punctuation; among the most widely applied are Conditional Random Fields (CRF) and pre-trained models such as BERT. However, they focus only on English and only on restoring punctuation, without considering the restoration of capitalization. Recently, there has been growing interest in an alternative way of addressing punctuation restoration: casting it as a sequence labeling task. Following this approach, we propose a capitalization and punctuation restoration system for Spanish and Portuguese based on Transformer models and sequence labeling. Both models obtained good results: a macro-averaged F1-score of 59.90% and an overall performance of 93.87% for Spanish, and a macro-averaged F1-score of 76.94% and an overall performance of 93.66% for Portuguese. In addition, the models are able to restore capitalization, identifying proper names as well as the names of countries and organizations.
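To make the sequence labeling formulation concrete, the sketch below shows one way punctuated, cased text can be turned into per-token labels (one punctuation label and one capitalization label per word) and how predicted labels can be turned back into text, plus a plain macro-averaged F1. It is an illustrative sketch, not the authors' implementation. Assumptions: whitespace tokenization; a hypothetical label inventory (`O`, `COMMA`, `PERIOD`, `QUESTION` for punctuation; `LOWER`, `CAP`, `UPPER` for casing); only trailing `,`, `.` and `?` are handled, so Spanish opening marks such as `¿` and `¡` are ignored. The Transformer token classifier that would predict these labels is not shown; the helper names `encode`, `decode`, `case_label` and `macro_f1` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical label inventory; the paper's actual label set may differ.
PUNCT = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}
PUNCT_INV = {label: mark for mark, label in PUNCT.items()}


def case_label(word: str) -> str:
    """Classify a word's casing with a hypothetical 3-way scheme."""
    if len(word) > 1 and word.isupper():
        return "UPPER"  # acronyms such as "ONU"
    if word[:1].isupper():
        return "CAP"  # sentence starts and proper names
    return "LOWER"


def encode(text: str) -> List[Tuple[str, str, str]]:
    """Turn punctuated, cased text into (token, punct_label, case_label) triples.

    Tokens are lower-cased and stripped of one trailing mark, which mimics
    the unpunctuated, uncased output of an ASR system.
    """
    examples = []
    for raw in text.split():
        punct = "O"
        if raw[-1] in PUNCT:
            punct = PUNCT[raw[-1]]
            raw = raw[:-1]
        examples.append((raw.lower(), punct, case_label(raw)))
    return examples


def decode(tokens: Sequence[str], puncts: Sequence[str], cases: Sequence[str]) -> str:
    """Rebuild text from tokens plus (predicted) punctuation and case labels."""
    words = []
    for tok, punct, case in zip(tokens, puncts, cases):
        if case == "UPPER":
            tok = tok.upper()
        elif case == "CAP":
            tok = tok[:1].upper() + tok[1:]
        words.append(tok + PUNCT_INV.get(punct, ""))
    return " ".join(words)


def macro_f1(gold: Sequence[str], pred: Sequence[str], labels: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 scores (macro averaging)."""
    scores = []
    for lab in labels:
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    text = "Hola, me llamo Ana y trabajo en la ONU."
    triples = encode(text)
    tokens, puncts, cases = zip(*triples)
    print(triples[0], triples[-1])  # ('hola', 'COMMA', 'CAP') ('onu', 'PERIOD', 'UPPER')
    print(decode(tokens, puncts, cases))  # round-trips to the original sentence
```

Macro averaging gives every label equal weight, so rare marks (for example, question marks) count as much as the frequent no-punctuation label; this is one reason a macro-averaged score can sit well below an overall, frequency-weighted score.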

Citation (APA)

Pan, R., García-Díaz, J. A., & Valencia-García, R. (2023). Evaluation of Transformer-Based Models for Punctuation and Capitalization Restoration in Spanish and Portuguese. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13913 LNCS, pp. 243–256). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-35320-8_17