Text classification of technical papers based on text segmentation

Thien Hai Nguyen; Kiyoaki Shirai

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7934 LNCS, pp. 278–284 (2013)

DOI: 10.1007/978-3-642-38824-8_25

Abstract

The goal of this research is to design a multi-label classification model that determines the research topics of a given technical paper. Based on the idea that papers are well organized and that some parts of a paper are more important than others for text classification, segments such as the title, abstract, introduction and conclusion are used intensively in text representation. In addition, two new features, called Title Bi-Gram and Title SigNoun, are used to improve performance. The experimental results indicate that feature selection based on text segmentation and these two features are effective. Furthermore, we propose a new model for text classification based on the structure of papers, called the Back-off model, which achieves 60.45% Exact Match Ratio and 68.75% F-measure. The Back-off model also outperformed two existing methods, ML-kNN and Binary Approach. © 2013 Springer-Verlag Berlin Heidelberg.
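The abstract reports an Exact Match Ratio and an F-measure for a multi-label task. As a rough illustration of how such scores are commonly computed, the sketch below assumes each paper has a set of true topics and a set of predicted topics. The abstract does not say which F-measure variant the authors used; the example-based (per-paper) average shown here is an assumption, and the topic labels and function names are hypothetical.

```python
from typing import List, Set


def exact_match_ratio(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Fraction of papers whose predicted topic set equals the true set exactly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


def example_based_f1(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Average of per-paper F1 scores (assumed variant, not confirmed by the source)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0
    scores = []
    for t, p in zip(y_true, y_pred):
        if not t and not p:
            # Both sets empty: treat as a perfect prediction.
            scores.append(1.0)
        else:
            # F1 for one paper: 2 * |true ∩ predicted| / (|true| + |predicted|)
            scores.append(2 * len(t & p) / (len(t) + len(p)))
    return sum(scores) / len(scores)


# Hypothetical topic labels for four papers.
true_topics = [{"ml", "nlp"}, {"ir"}, {"nlp"}, {"ml"}]
pred_topics = [{"ml", "nlp"}, {"ir", "nlp"}, {"ml"}, {"ml"}]

emr = exact_match_ratio(true_topics, pred_topics)
f1 = example_based_f1(true_topics, pred_topics)
print(f"Exact Match Ratio: {emr:.4f}")
print(f"Example-based F1:  {f1:.4f}")
```

Exact Match Ratio is the stricter of the two: a paper counts only if every topic is predicted correctly with no extras, whereas the F-measure gives partial credit for overlapping topic sets. That is consistent with the reported Exact Match Ratio being lower than the reported F-measure.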

Citation (APA)

Nguyen, T. H., & Shirai, K. (2013). Text classification of technical papers based on text segmentation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7934 LNCS, pp. 278–284). https://doi.org/10.1007/978-3-642-38824-8_25
