Word Segmentation as Unsupervised Constituency Parsing

Raquel G. Alhama

Conference Proceedings (Open Access)

Proceedings of the Annual Meeting of the Association for Computational Linguistics (2022), Vol. 1, pp. 4103–4112

DOI: 10.18653/v1/2022.acl-long.283

Abstract

Word identification from continuous input is typically viewed as a segmentation task. Experiments with human adults suggest that familiarity with the syntactic structures of their native language also influences word identification in artificial languages; however, the relation between syntactic processing and word identification remains unclear. This work takes a step forward by exploring a radically different approach to word identification, in which segmentation of continuous input is viewed as a process isomorphic to unsupervised constituency parsing. Besides formalizing the approach, this study reports simulations of human experiments with DIORA (Drozdov et al., 2019), a neural unsupervised constituency parser. Results show that this model can reproduce human behavior in word identification experiments, suggesting that this is a viable approach to studying word identification and its relation to syntactic processing.

Cite

Alhama, R. G. (2022). Word Segmentation as Unsupervised Constituency Parsing. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 4103–4112). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.283
