We extend prior work on a model for natural language text representation and retrieval that uses a linguistic device called text grammar. We demonstrate the value of this approach in accessing relevant items from a collection of empirical abstracts in a medical domain. Compared with traditional keyword retrieval, the advantage of this approach is that it is a significant move toward knowledge representation and retrieval. Text representation in this model includes keywords and their conceptual roles in the text. In particular, it involves extracting TOPIC predicates, which represent the research issue addressed, and DESIGN predicates, which represent important methodological features of the empirical study. Preliminary experiments show, first, that keywords exhibit a variety of text-grammar roles in a text database. Second, as intuitively expected, retrieval using TOPIC predicates identifies a smaller subset of texts than Boolean retrieval does. These empirical results, together with the theoretical work, indicate that the proposed representation and retrieval strategies have significant potential. Finally, a prototype system, EMPIRICIST, is described. In it, the text representation predicates are implemented as a network, and retrieval uses constrained spreading activation strategies. © 1993, ACM. All rights reserved.
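The abstract does not give the predicate syntax. The following is therefore a minimal Python sketch of the contrast between keyword (Boolean AND) retrieval and role-constrained retrieval. It assumes that each text is reduced to a keyword set plus hypothetical (role, keyword) pairs for TOPIC and DESIGN. The `Text` class, the functions `boolean_and` and `role_retrieval`, and the three toy abstracts are all illustrative inventions. They are not EMPIRICIST's implementation, and the sketch does not model the network or constrained spreading activation. Because this toy model requires every role-bearing keyword to also be an ordinary keyword, a TOPIC match always implies a keyword match. The smaller TOPIC result therefore holds here by construction, whereas the paper reports its comparison as an empirical finding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    """Hypothetical representation: plain keywords plus (role, keyword)
    predicates such as ("TOPIC", "aspirin") or ("DESIGN", "randomized")."""
    doc_id: str
    keywords: frozenset
    predicates: frozenset

    def __post_init__(self):
        # Assumption of this sketch: every role-bearing keyword is also a keyword.
        if not {kw for _, kw in self.predicates} <= self.keywords:
            raise ValueError(f"{self.doc_id}: predicate keyword missing from keywords")

    def keywords_in_role(self, role):
        # Keywords that fill the given text-grammar role in this text.
        return {kw for r, kw in self.predicates if r == role}


def boolean_and(texts, query):
    # Keyword retrieval: every query term must occur somewhere in the text.
    return {t.doc_id for t in texts if query <= t.keywords}


def role_retrieval(texts, constraints):
    # Role-constrained retrieval: for each role, e.g. "TOPIC" or "DESIGN",
    # the text must use all of that role's query terms in that role.
    return {
        t.doc_id
        for t in texts
        if all(terms <= t.keywords_in_role(role) for role, terms in constraints.items())
    }


# Toy, made-up abstracts (not data from the paper).
corpus = [
    Text("d1", frozenset({"aspirin", "stroke", "randomized", "elderly"}),
         frozenset({("TOPIC", "aspirin"), ("TOPIC", "stroke"),
                    ("DESIGN", "randomized"), ("DESIGN", "elderly")})),
    Text("d2", frozenset({"aspirin", "stroke", "cohort", "bleeding"}),
         frozenset({("TOPIC", "aspirin"), ("TOPIC", "bleeding"),
                    ("DESIGN", "cohort")})),
    Text("d3", frozenset({"stroke", "rehabilitation", "randomized"}),
         frozenset({("TOPIC", "rehabilitation"), ("TOPIC", "stroke"),
                    ("DESIGN", "randomized")})),
]

query = {"aspirin", "stroke"}
print(sorted(boolean_and(corpus, query)))               # ['d1', 'd2']
print(sorted(role_retrieval(corpus, {"TOPIC": query})))  # ['d1']
print(sorted(role_retrieval(corpus, {"TOPIC": {"stroke"},
                                     "DESIGN": {"randomized"}})))  # ['d1', 'd3']
```

In `d2`, "stroke" appears only as a plain keyword and not as part of the TOPIC, so Boolean retrieval returns it while TOPIC retrieval does not. The third query shows how a DESIGN constraint can be combined with a TOPIC constraint.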
Rama, D. V., & Srinivasan, P. (1993). An Investigation of Content Representation Using Text Grammars. ACM Transactions on Information Systems (TOIS), 11(1), 51–75. https://doi.org/10.1145/151480.151490