Abstract
Given the exponential growth of online document corpora, even perfect retrieval will return more material than a user can cope with. One way to reduce this problem is automatic domain-specific summarization tailored to the user's needs, which is a kind of high-level data cleaning. This requires some method of discovering classes of similar items that may be grouped into predetermined domains. We explore whether a synergistic relationship exists between classification systems and summarization systems by composing the two subsystems. In other words, we examine whether prior summarization improves the performance of the classifier and whether prior classification improves the performance of the summarizer. As we show in this paper, the answer is affirmative in both cases. We propose a text-mining framework in which these subsystems are treated as constituents of a knowledge discovery process for text corpora.
Citation
Saravanan, M., Raj, P. C. R., & Raman, S. (2003). Summarization and categorization of text data in high-level data cleaning for information retrieval. Applied Artificial Intelligence, 17(5–6), 461–474. https://doi.org/10.1080/713827177