Natural language analysis for semantic document modeling

Terje Brasethvik; Jon Atle Gulla

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2001) 1959 127-140

DOI: 10.1007/3-540-45399-7_11

Abstract

To ease the retrieval of documents published on the Web, the documents should be classified in a way that users find helpful and meaningful. This paper presents an approach to semantic document classification and retrieval based on Natural Language Analysis and Conceptual Modeling. A conceptual domain model is used in combination with linguistic tools to define a controlled vocabulary for a document collection. Users may browse this domain model and interactively classify documents by selecting model fragments that describe the contents of the documents. Natural language tools are used to analyze the text of the documents and propose relevant domain model concepts and relations. The proposed fragments are refined by the users and stored as XML document descriptions. For document retrieval, lexical analysis is used to pre-process search expressions and map these to the domain model for manual query-refinement. A prototype of the system is described, and the approach is illustrated with examples from a document collection published by the Norwegian Center for Medical Informatics (KITH).
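The pipeline described in the abstract (analyze document text, propose domain model concepts, store the refined result as an XML description, and map search expressions onto the same vocabulary) can be illustrated with a minimal sketch. This is not the authors' prototype: the vocabulary entries, the function names `tokenize`, `propose_concepts`, and `to_xml`, and the XML element names are all assumptions made for illustration, and simple term matching stands in for the linguistic tools the paper refers to.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary: domain concept -> surface terms.
# These entries are illustrative and not taken from the paper or from KITH.
VOCABULARY = {
    "Patient": {"patient", "patients"},
    "HealthRecord": {"record", "records", "journal"},
    "Referral": {"referral", "referrals"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def propose_concepts(text, vocabulary=VOCABULARY):
    """Return concepts whose terms occur in the text, most frequent first."""
    counts = {}
    for token in tokenize(text):
        for concept, terms in vocabulary.items():
            if token in terms:
                counts[concept] = counts.get(concept, 0) + 1
    # Sort by descending count, then by name for a stable order.
    return sorted(counts, key=lambda c: (-counts[c], c))


def to_xml(doc_id, concepts):
    """Serialize the selected concepts as a small XML description."""
    root = ET.Element("document", id=doc_id)
    for name in concepts:
        ET.SubElement(root, "concept", name=name)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    text = "The patient record lists two referrals for the patient."
    proposed = propose_concepts(text)
    print(proposed)
    print(to_xml("doc-1", proposed))
    # A search expression goes through the same mapping before refinement.
    print(propose_concepts("patient journal"))
```

In this sketch the same `propose_concepts` function serves both classification and query pre-processing, which mirrors the abstract's idea that documents and search expressions are mapped to one shared domain model. The interactive refinement step by users is left out.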

Citation (APA)

Brasethvik, T., & Gulla, J. A. (2001). Natural language analysis for semantic document modeling. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1959, pp. 127–140). Springer Verlag. https://doi.org/10.1007/3-540-45399-7_11
