Natural Language (NL) processing tools, such as tokenizers, part-of-speech taggers, or syntactic processors, obtain knowledge from a set of documents (e.g., tokens, syntactic patterns, etc.) and produce the different elements that take part in the discourse universe of an NL text (e.g., noun phrases, verbs, sentences, etc.). In this paper, we show how NL software systems can be developed incrementally using a high-performance specification language such as Maude. A generic algebraic specification for NL is defined. It includes sorts and subsorts, as well as equational properties such as associativity and commutativity for built-in lists and sets. Then, the full discourse universe available for NL processing is described in terms of this algebraic specification through a non-deterministic but terminating set of transformation rules. Finally, as a proof of concept, a set of documents for NL processing is given to Maude as an input term and successfully transformed into a proper document. This transformation explores all the non-deterministic possibilities and resolves linguistic ambiguity. The main advantages of implementing NL processing in this manner are generality, transparency, extensibility, reusability, and maintainability. To the best of our knowledge, this is the first attempt to represent and develop complex NL software systems with this formal notation. Based on the analysis conducted, this implementation constitutes the basis for designing and developing more specific NL processing applications, such as text summarization.
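The core idea of non-deterministic but terminating transformation rules that explore every possibility, and thereby resolve ambiguity, can be illustrated with a small sketch. The following Python code is not the authors' Maude specification. The lexicon, the category names (`N`, `V`, `ADV`, `NP`, `VP`, `S`), and the rules are hypothetical assumptions chosen for illustration. Terms are flat tuples, so associativity is implicit, much as Maude's `assoc` attribute lets lists be matched without bracketing. The breadth-first exploration loosely mirrors what Maude's `search` command does over rewrite rules. Each ambiguous word contributes every possible tag, and only the tag sequences that rewrite to a sentence survive.

```python
from itertools import product

# Hypothetical toy lexicon: each word maps to its possible categories.
# "time" and "flies" are ambiguous (noun or verb).
LEXICON = {
    "time": {"N", "V"},
    "flies": {"N", "V"},
    "fast": {"ADV"},
}

# Hypothetical rewrite rules (lhs, rhs): a contiguous run of categories
# equal to rhs may be rewritten into the single category lhs.
RULES = [
    ("NP", ("N",)),
    ("NP", ("N", "N")),
    ("VP", ("V",)),
    ("VP", ("V", "ADV")),
    ("S", ("NP", "VP")),
]


def one_step(term):
    """Yield every term reachable from `term` by applying one rule once."""
    for lhs, rhs in RULES:
        k = len(rhs)
        for i in range(len(term) - k + 1):
            if term[i:i + k] == rhs:
                yield term[:i] + (lhs,) + term[i + k:]


def reachable(start):
    """Explore all rewrite paths from `start` breadth-first.

    Rules never lengthen a term and the set of categories is finite,
    so the state space is finite; the `seen` set guarantees termination.
    """
    seen = {start}
    frontier = [start]
    while frontier:
        next_frontier = []
        for term in frontier:
            for succ in one_step(term):
                if succ not in seen:
                    seen.add(succ)
                    next_frontier.append(succ)
        frontier = next_frontier
    return seen


def parses(words):
    """Return every tag sequence for `words` that rewrites to ("S",)."""
    candidates = product(*(sorted(LEXICON[w]) for w in words))
    return {tags for tags in candidates if ("S",) in reachable(tuple(tags))}


if __name__ == "__main__":
    print(parses(["time", "flies", "fast"]))  # {('N', 'V', 'ADV')}
```

Of the four tag assignments for "time flies fast", only `N V ADV` reaches `S` (via `NP V ADV`, then `NP VP`). The ambiguity is therefore resolved by exhaustive exploration rather than by a heuristic choice.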
Lloret, E., Escobar, S., Palomar, M., & Ramos, I. (2014). Incremental and Adaptive Software Systems Development of Natural Language Applications. In Information System Development (pp. 511–523). Springer International Publishing. https://doi.org/10.1007/978-3-319-07215-9_41