Classifying texts using relevancy signatures

Ellen Riloff; Wendy Lehnert

Conference Proceedings

Proceedings Tenth National Conference on Artificial Intelligence (1992) 329-334

DOI: 10.3115/1075527.1075576

Abstract

Text processing for complex domains such as terrorism is complicated by the difficulty of reliably distinguishing relevant texts from irrelevant ones. We have discovered a simple and effective filter, the Relevancy Signatures Algorithm, and demonstrated its performance in the domain of terrorist event descriptions. The Relevancy Signatures Algorithm is based on the natural language processing technique of selective concept extraction and relies on text representations that reflect predictable patterns of linguistic context. This paper describes text classification experiments conducted in the domain of terrorism using the MUC-3 text corpus. A customized dictionary of about 6,000 words provides the lexical knowledge base needed to discriminate relevant texts, and the CIRCUS sentence analyzer generates relevancy signatures as an effortless side effect of its normal sentence analysis. Although we suspect that the training base available to us from the MUC-3 corpus may not be large enough to provide optimal training, we were nevertheless able to attain relevancy discriminations for significant levels of recall (ranging from 11% to 47%) with 100% precision in half of our test runs.
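
As an illustration only, the filtering idea summarized in the abstract might be sketched as follows. The abstract does not say how signatures are represented or scored, so everything here is an assumption: a signature is modeled as a hypothetical (word, concept) pair, a signature counts as "reliable" when the fraction of training texts containing it that are relevant meets a threshold, and a text is flagged relevant if it contains at least one reliable signature. The function names, the thresholds `min_count` and `min_reliability`, and the concept labels are invented for this sketch and are not taken from the paper or from CIRCUS.

```python
from collections import defaultdict


def train_signatures(training_texts, min_count=2, min_reliability=0.9):
    """Select signatures that are strongly associated with relevant texts.

    training_texts: iterable of (signatures, is_relevant) pairs, where
    signatures is a collection of hashable items, here hypothetical
    (word, concept) tuples. Both thresholds are assumptions of this sketch.
    """
    total = defaultdict(int)     # texts containing the signature
    relevant = defaultdict(int)  # relevant texts containing the signature
    for signatures, is_relevant in training_texts:
        for sig in set(signatures):  # count each signature once per text
            total[sig] += 1
            if is_relevant:
                relevant[sig] += 1
    return {
        sig
        for sig, n in total.items()
        if n >= min_count and relevant[sig] / n >= min_reliability
    }


def classify(signatures, reliable):
    """Flag a text as relevant if it contains any reliable signature."""
    return any(sig in reliable for sig in signatures)


def precision_recall(predictions, labels):
    """Precision and recall of boolean predictions against boolean labels.

    Returns None for a measure whose denominator is zero.
    """
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, l in pairs if p and l)
    fp = sum(1 for p, l in pairs if p and not l)
    fn = sum(1 for p, l in pairs if not p and l)
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall
```

A filter of this shape only says "relevant" when it has seen strong evidence, so it tends to trade recall for precision, which is consistent with the pattern the abstract reports (100% precision at recall between 11% and 47% in half of the test runs).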

Citation (APA)

Riloff, E., & Lehnert, W. (1992). Classifying texts using relevancy signatures. In Proceedings Tenth National Conference on Artificial Intelligence (pp. 329–334). Publ by AAAI. https://doi.org/10.3115/1075527.1075576
