Word identification from continuous input is typically viewed as a segmentation task. Experiments with human adults suggest that familiarity with the syntactic structures of their native language also influences word identification in artificial languages; however, the relation between syntactic processing and word identification is still unclear. This work takes a step forward by exploring a radically different approach to word identification, in which segmentation of continuous input is viewed as a process isomorphic to unsupervised constituency parsing. Besides formalizing the approach, this study reports simulations of human experiments with DIORA (Drozdov et al., 2019), a neural unsupervised constituency parser. Results show that this model can reproduce human behavior in word identification experiments, suggesting that this is a viable approach for studying word identification and its relation to syntactic processing.
Alhama, R. G. (2022). Word Segmentation as Unsupervised Constituency Parsing. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 4103–4112). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.283