Automatic document metadata extraction using support vector machines

  • Han H
  • Giles C
  • Manavoglu E
 et al. 

Abstract

Automatic metadata generation provides scalability and usability for
digital libraries and their collections. Machine learning methods offer
robust and adaptable automatic metadata extraction. We describe a
Support Vector Machine classification-based method for metadata
extraction from the header part of research papers and show that it
outperforms other machine learning methods on the same task. The method
first classifies each line of the header into one or more of 15 classes.
An iterative convergence procedure is then used to improve the line
classification by using the predicted class labels of its neighbor lines
in the previous round. Further metadata extraction is done by seeking the
best chunk boundaries of each line. We found that discovery and use of
the structural patterns of the data and domain-based word clustering can
improve the metadata extraction performance. An appropriate feature
normalization also greatly improves the classification performance. Our
metadata extraction method was originally designed to improve the
metadata extraction quality of the digital libraries Citeseer [17] and
EbizSearch [24]. We believe it can be generalized to other digital
libraries.
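
The two-stage line classification described in the abstract can be
illustrated with a minimal sketch. It is not the authors' implementation:
the function names, the four labels, and the keyword rules are all
hypothetical, and a toy rule-based function stands in for the trained
Support Vector Machine. For brevity each line gets a single label, whereas
the abstract allows one or more of 15 classes per line. The sketch only
shows the control flow: a context-free first pass, then repeated passes
that re-label each line using its neighbors' labels from the previous
round until the labels stop changing.

```python
from typing import Callable, List, Optional

# Hypothetical classifier signature: (line, prev_label, next_label) -> label.
# In the paper this role is played by trained SVMs; here it is a toy stand-in.
Classifier = Callable[[str, Optional[str], Optional[str]], str]


def toy_classify(line: str, prev_label: Optional[str],
                 next_label: Optional[str]) -> str:
    """Toy rules standing in for a trained classifier (assumption)."""
    text = line.lower()
    words = line.split()
    if "@" in text:
        return "email"
    if "department" in text or "university" in text:
        return "affiliation"
    # Contextual rule: a short line right after the title is likely an author.
    if prev_label == "title" and len(words) <= 3:
        return "author"
    if len(words) >= 5:
        return "title"
    return "other"


def iterative_line_classification(lines: List[str], classify: Classifier,
                                  max_rounds: int = 10) -> List[str]:
    """Label header lines, then refine using neighbor labels until stable."""
    # Round 0: no neighbor labels exist yet, so classify each line alone.
    labels = [classify(line, None, None) for line in lines]
    for _ in range(max_rounds):
        new_labels = []
        for i, line in enumerate(lines):
            # Neighbor labels always come from the previous round's output.
            prev_label = labels[i - 1] if i > 0 else "<start>"
            next_label = labels[i + 1] if i + 1 < len(lines) else "<end>"
            new_labels.append(classify(line, prev_label, next_label))
        if new_labels == labels:  # converged: nothing changed this round
            break
        labels = new_labels
    return labels


# Synthetic header, invented purely for illustration.
header = [
    "A Sample Paper Title About Something",
    "Jane Doe",
    "Department of Examples, Example University",
    "jane@example.org",
]
result = iterative_line_classification(header, toy_classify)
```

In this toy run the second line cannot be labeled from its text alone in
the first pass; it is only recognized as an author line in the next round,
once the line above it has been labeled as the title.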

Author-supplied keywords

  • Convergence
  • Data mining
  • Design methodology
  • Learning systems
  • Robustness
  • Scalability
  • Software libraries
  • Support vector machine classification
  • Support vector machines
  • Usability

Authors

  • Hui Han

  • C. L. Giles

  • E. Manavoglu

  • Hongyuan Zha

  • Zhenyue Zhang

  • E. A. Fox
