Getting the ##life out of living: How Adequate Are Word-Pieces for Modelling Complex Morphology?

  • Klein S
  • Tsarfaty R

Abstract

This work investigates the most basic units that underlie contextualized word embeddings such as BERT: the so-called word-pieces (WPs). In Morphologically-Rich Languages (MRLs), which exhibit morphological fusion and non-concatenative morphology, the different units of meaning within a word may be fused or intertwined, and cannot always be separated linearly. Therefore, when using word-pieces in MRLs, we must consider that: (1) a linear segmentation into sub-word units might not capture the full morphological complexity of words; and (2) representations that leave morphological knowledge on sub-word units inaccessible might negatively affect performance. Here we empirically examine the capacity of word-pieces to capture morphology by investigating the task of multi-tagging in Modern Hebrew, as a proxy for evaluating the underlying segmentation. Our results show that, while models trained to predict multi-tags for complete words outperform models tuned to predict the distinct tags of WPs, we can improve WP tag prediction by purposefully constraining the word-pieces to reflect their internal functions. We suggest that linguistically informed word-piece schemes that make the morphological structure explicit might boost performance for MRLs.
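
To make the notion of a linear segmentation into word-pieces concrete, here is a minimal sketch of greedy longest-match-first segmentation, the general strategy used by BERT-style WordPiece tokenizers. It is not the authors' code. The function name `wordpiece_tokenize` and the vocabulary `TOY_VOCAB` are hypothetical and chosen only for illustration, and the English words stand in for the Hebrew data studied in the paper.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split one word into word-pieces by greedy longest-match-first lookup.

    Non-initial pieces carry the conventional "##" continuation prefix.
    If some position cannot be matched, the whole word maps to `unk`.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary (an assumption, not taken from any real model).
TOY_VOCAB = {"liv", "s", "##ing", "##ang"}

for w in ["living", "sing", "sang"]:
    print(w, wordpiece_tokenize(w, TOY_VOCAB))
```

Under this toy vocabulary, "living" is split into `liv` and `##ing`, while "sing" and "sang" become `s` plus `##ing` or `##ang`. The piece `##ing` therefore covers a suffix in one word and part of a stem in another, and the vowel change that marks tense in "sang" is never isolated as a piece of its own. This is a small English analogue of the concern about fused and non-concatenative morphology raised in the abstract.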

Cite

Klein, S., & Tsarfaty, R. (2020). Getting the ##life out of living: How Adequate Are Word-Pieces for Modelling Complex Morphology? (pp. 204–209). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.sigmorphon-1.24
