Exploiting languages proximity for part-of-speech tagging of three French regional languages

Pierre Magistry; Anne Laure Ligozat; Sophie Rosset

Journal Article

Language Resources and Evaluation (2019) 53(4) 865-888

DOI: 10.1007/s10579-019-09463-7

Abstract

This paper presents experiments in part-of-speech tagging of low-resource languages. It addresses the case where neither labeled data in the target language nor a parallel corpus is available, so we rely only on the proximity of the target language to a better-resourced language. We conduct experiments on three French regional languages and exploit this proximity with two main strategies: delexicalization and transposition. The general idea is to train a model on the (better-resourced) source language and then apply it to the (regional) target language. Delexicalization handles the difference in vocabulary by creating abstract representations of the data. Transposition consists of modifying the target corpus so that the source models can be used on it. We compare several methods and propose different strategies for combining them, improving the state of the art in part-of-speech tagging in this difficult scenario.
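The two strategies named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify which abstract representations or which corpus modifications are used, so the feature set in `delexicalize` (word shape, suffix, length) and the string-rewriting rules passed to `transpose` are assumptions chosen for illustration only, and the example tokens and rules are invented.

```python
def word_shape(token: str) -> str:
    """Map each character to a coarse class: X (upper), x (lower), d (digit)."""
    out = []
    for ch in token:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)  # keep punctuation as-is
    return "".join(out)


def delexicalize(token: str, suffix_len: int = 2) -> dict:
    """Delexicalization (assumed feature set): describe a token without
    using its word form, so a tagger trained on the source language does
    not depend on source-language vocabulary."""
    return {
        "shape": word_shape(token),
        "suffix": token[-suffix_len:].lower(),
        "length": len(token),
    }


def transpose(token: str, rules: list[tuple[str, str]]) -> str:
    """Transposition (assumed rule format): rewrite a target-language token
    so that it looks more like a source-language word, applying each
    (old, new) substring rule in order."""
    for old, new in rules:
        token = token.replace(old, new)
    return token


# Invented rule and token, for illustration only.
toy_rules = [("k", "c")]
print(delexicalize("Paris"))
print(transpose("kafé", toy_rules))
```

In a pipeline of this kind, a tagger would be trained on the source language using either the delexicalized features or the original word forms, and target-language text would be delexicalized or transposed before being passed to that tagger.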

Cite

Magistry, P., Ligozat, A. L., & Rosset, S. (2019). Exploiting languages proximity for part-of-speech tagging of three French regional languages. Language Resources and Evaluation, 53(4), 865–888. https://doi.org/10.1007/s10579-019-09463-7
