Text preprocessing using annotated suffix tree with matching keyphrase

Ionia Veritawati; Ito Wasito; T. Basaruddin

Journal Article, Open Access

International Journal of Electrical and Computer Engineering (2015) 5(3) 409-420

DOI: 10.11591/ijece.v5i3.pp409-420

Abstract

Text documents are an important source of information and knowledge. Much of the knowledge needed in various domains, for different purposes, is present only as implicit content. The content of a text is represented by keyphrases, each consisting of one or more meaningful words. Keyphrases can be extracted from text through several processing steps, including text preprocessing. An Annotated Suffix Tree (AST) built from the document collection itself is used to extract keyphrases after basic text preprocessing, namely stop-word removal and stemming, has been applied. A combination of four preprocessing variations is used. The extracted two-word (bi-word) and three-word phrases form a list of candidate keyphrases that helps users who need keyphrase information to understand the content of the documents. The candidates can be processed further by a learning step, with manual validation, to classify each as a keyphrase or non-keyphrase for the text domain. Experiments on a simulated corpus with predetermined keyphrases show that more than 90% of two- and three-word keyphrases can be extracted. On a real corpus from the economics domain, about 70% of keyphrases or meaningful phrases can be extracted. The proposed method can be an effective way to find candidate keyphrases in a collection of text documents: it reduces the number of non-keyphrases and non-meaningful phrases in the candidate list and can detect keyphrases whose words are separated by stop words.
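
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a word-level tree in which each node is annotated with the number of times its path occurs, a toy stop-word list (`STOP_WORDS`), and a frequency threshold (`min_count`), none of which come from the paper. Stemming is omitted for brevity, and the names `preprocess`, `build_ast`, and `candidates` are hypothetical.

```python
# Hypothetical toy stop-word list; the paper's actual list is not given in the abstract.
STOP_WORDS = {"a", "an", "the", "of", "in", "is", "and", "to", "for"}


def preprocess(text):
    """Lowercase, strip surrounding punctuation, and drop stop words (no stemming here)."""
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]


class Node:
    """Tree node annotated with how often the path from the root to it occurs."""

    def __init__(self):
        self.count = 0
        self.children = {}


def build_ast(docs, max_depth=3):
    """Insert every word-level suffix of every document, truncated to max_depth words."""
    root = Node()
    for doc in docs:
        tokens = preprocess(doc)
        for start in range(len(tokens)):
            node = root
            for tok in tokens[start:start + max_depth]:
                node = node.children.setdefault(tok, Node())
                node.count += 1
    return root


def candidates(root, length, min_count=2):
    """Return phrases of exactly `length` words whose annotation is at least min_count."""
    found = {}

    def walk(node, path):
        if len(path) == length:
            if node.count >= min_count:
                found[" ".join(path)] = node.count
            return
        for tok, child in node.children.items():
            walk(child, path + [tok])

    walk(root, [])
    return found


docs = [
    "The rate of inflation is rising.",
    "Inflation rate and the exchange rate.",
    "Rate of inflation in the economy.",
]
tree = build_ast(docs)
print(candidates(tree, length=2))               # bi-word candidates
print(candidates(tree, length=3, min_count=1))  # three-word candidates
```

Because stop words are removed before the tree is built, "rate of inflation" contributes the bi-word path "rate inflation", which is one way the detection of keyphrases separated by stop words can arise in such a scheme.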

Cite

Veritawati, I., Wasito, I., & Basaruddin, T. (2015). Text preprocessing using annotated suffix tree with matching keyphrase. International Journal of Electrical and Computer Engineering, 5(3), 409–420. https://doi.org/10.11591/ijece.v5i3.pp409-420