Getting the ##life out of living: How Adequate Are Word-Pieces for Modelling Complex Morphology?

Stav Klein; Reut Tsarfaty

Conference Proceedings (Open Access)

DOI: 10.18653/v1/2020.sigmorphon-1.24

Abstract

This work investigates the most basic units that underlie contextualized word embeddings such as BERT: the so-called word-pieces (WPs). In Morphologically-Rich Languages (MRLs) that exhibit morphological fusion and non-concatenative morphology, the different units of meaning within a word may be fused or intertwined, and cannot be separated linearly. Therefore, when using word-pieces in MRLs, we must consider that: (1) a linear segmentation into sub-word units might not capture the full morphological complexity of words; and (2) representations that leave morphological knowledge on sub-word units inaccessible might negatively affect performance. Here we empirically examine the capacity of word-pieces to capture morphology by investigating the task of multi-tagging in Modern Hebrew, as a proxy for evaluating the underlying segmentation. Our results show that, while models trained to predict multi-tags for complete words outperform models tuned to predict the distinct tags of WPs, we can improve WP tag prediction by purposefully constraining the word-pieces to reflect their internal functions. We suggest that linguistically informed word-piece schemes that make the morphological structure explicit might boost performance for MRLs.
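To make the notion of a linear segmentation concrete, the following is a minimal sketch of greedy longest-match-first word-piece splitting, the inference scheme commonly used by BERT-style tokenizers. The function name `wordpiece` and the toy vocabulary are assumptions made for illustration only; they are not the vocabulary, code, or experimental setup of the paper.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of a single word into word-pieces.

    Non-initial pieces carry the conventional "##" continuation prefix.
    Returns [unk] if some part of the word cannot be matched.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, chosen only for this illustration.
toy_vocab = {"liv", "##ing", "life", "##less"}

print(wordpiece("living", toy_vocab))
print(wordpiece("lifeless", toy_vocab))
```

Every piece produced this way is a contiguous substring of the word, read left to right. This is why such a scheme cannot, on its own, isolate morphemes that are fused or interleaved inside a word, as with the non-concatenative morphology discussed in the abstract.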

Cite

Klein, S., & Tsarfaty, R. (2020). Getting the ##life out of living: How Adequate Are Word-Pieces for Modelling Complex Morphology? (pp. 204–209). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.sigmorphon-1.24
