Two-tier machine learning using conditional random fields with constraints

Sebastian Lindner

Conference Proceedings

Communications in Computer and Information Science (2015) 454 80-95

DOI: 10.1007/978-3-662-46549-3_6

Abstract

This paper presents a novel two-tier machine learning approach for locating bibliographic references in HTML and separating them into fields. First, it demonstrates how Conditional Random Fields (CRFs) with constraints can be used to split bibliographic references into fields, e.g. authors and title. To this end, a unique feature set, constraints, and a method for automatic keyword extraction are introduced. The output of this reference-tagging CRF, together with Part-Of-Speech (POS) analysis and Named Entity Recognition (NER), forms the first tier, and this first-tier output is in turn used to locate the bibliographic reference section in the first place. For this purpose, the documents are split into blocks, which are then classified. For this classification task, a Support Vector Machine (SVM) approach is compared with one using a CRF. We demonstrate that this two-tier approach achieves very good results, while the reference tagging approach is able to compete with other state-of-the-art approaches.
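The data flow described in the abstract (split the HTML into blocks, tag tokens in a first tier, classify blocks in a second tier) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `BlockSplitter`, `split_blocks`, `tag_tokens`, `block_features` and `is_reference_block` are hypothetical, a few regular expressions stand in for the constrained CRF, POS and NER taggers of the first tier, and a simple threshold rule stands in for the SVM or CRF block classifier of the second tier.

```python
import re
from html.parser import HTMLParser

# Assumption: these tags delimit blocks; the paper's block definition may differ.
BLOCK_TAGS = {"p", "li", "div", "h1", "h2", "h3", "td"}


class BlockSplitter(HTMLParser):
    """Collect the text between block-level tags as separate blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []

    def flush(self):
        # Normalize whitespace and store the buffered text as one block.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        self._buf.append(data)


def split_blocks(html):
    parser = BlockSplitter()
    parser.feed(html)
    parser.close()
    parser.flush()  # keep any trailing text that follows the last block tag
    return parser.blocks


def tag_tokens(block):
    """Tier 1 stand-in: regex labels where the paper uses a constrained CRF."""
    labels = []
    for tok in block.split():
        if re.fullmatch(r"\(?(19|20)\d{2}\)?[.,;:]?", tok):
            labels.append("YEAR")
        elif re.fullmatch(r"[A-Z]\.,?", tok):
            labels.append("INITIAL")
        elif re.fullmatch(r"\d+[-–]\d+[.,]?", tok):
            labels.append("PAGES")
        else:
            labels.append("OTHER")
    return labels


def block_features(block):
    """Turn tier-1 labels into per-block features (fraction of tokens per label)."""
    labels = tag_tokens(block)
    n = len(labels) or 1
    return {name: labels.count(name) / n for name in ("YEAR", "INITIAL", "PAGES")}


def is_reference_block(block, threshold=0.1):
    """Tier 2 stand-in: a threshold rule where the paper compares an SVM and a CRF."""
    feats = block_features(block)
    return feats["YEAR"] > 0 and sum(feats.values()) >= threshold
```

Calling `split_blocks` on a page and then `is_reference_block` on each block yields one boolean per block. In a real system the threshold rule would be replaced by a trained classifier over richer features, which is the part the paper evaluates.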

APA

Lindner, S. (2015). Two-tier machine learning using conditional random fields with constraints. In Communications in Computer and Information Science (Vol. 454, pp. 80–95). Springer Verlag. https://doi.org/10.1007/978-3-662-46549-3_6
