Two-tier machine learning using conditional random fields with constraints

Abstract

This paper presents a novel two-tier machine learning approach for locating bibliographic references in HTML documents and splitting them into fields. First, it demonstrates how Conditional Random Fields (CRFs) with constraints can be used to split bibliographic references into fields such as authors and title. To this end, a unique feature set, constraints, and a method for automatic keyword extraction are introduced. This reference-tagging CRF, together with Part-of-Speech (POS) analysis and Named Entity Recognition (NER), forms the first tier, and their combined output is used in turn to locate the bibliographic reference section. For this purpose, the documents are split into blocks, which are then classified; for this classification task, a Support Vector Machine (SVM) approach is compared with a CRF-based one. We demonstrate that this two-tier approach achieves very good results and that the reference-tagging approach is competitive with other state-of-the-art approaches.
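The abstract does not say what form the CRF constraints take. One common way to impose hard constraints on a linear-chain CRF is at decoding time: any label transition or start label that violates a rule is treated as having a score of minus infinity in Viterbi decoding, so no returned labelling can use it. The sketch below assumes that formulation and is not the paper's implementation; the function name `constrained_viterbi`, the label set (AUTHOR, TITLE, YEAR), the scores, and the rule "a reference starts with its authors" are all hypothetical.

```python
from typing import AbstractSet, List, Sequence, Tuple

NEG_INF = float("-inf")


def constrained_viterbi(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    forbidden: AbstractSet[Tuple[int, int]] = frozenset(),
    forbidden_start: AbstractSet[int] = frozenset(),
) -> List[int]:
    """Best label sequence of a linear-chain CRF under hard constraints.

    emissions[t][y]   -- log-potential of label y at token t
    transitions[a][b] -- log-potential of label a followed by label b
    forbidden         -- (a, b) label pairs that may never be adjacent
    forbidden_start   -- labels the first token may not take
    """
    if not emissions:
        return []
    k = len(emissions[0])
    # score[y]: best total score of an allowed path ending in label y so far.
    score = [NEG_INF if y in forbidden_start else emissions[0][y] for y in range(k)]
    backptrs: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        ptr: List[int] = []
        for b in range(k):
            best_prev, best = -1, NEG_INF
            for a in range(k):
                # A violated constraint acts like a -inf transition score.
                if (a, b) in forbidden or score[a] == NEG_INF:
                    continue
                s = score[a] + transitions[a][b]
                if s > best:
                    best_prev, best = a, s
            new_score.append(best + emissions[t][b] if best_prev >= 0 else NEG_INF)
            ptr.append(best_prev)
        score = new_score
        backptrs.append(ptr)
    last = max(range(k), key=lambda y: score[y])
    if score[last] == NEG_INF:
        raise ValueError("the constraints admit no valid labelling")
    # Follow back-pointers from the best final label to recover the path.
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    LABELS = ["AUTHOR", "TITLE", "YEAR"]  # hypothetical field set
    # Hypothetical scores for the tokens "Lindner,", "Two-tier", "2015":
    # the first token is (wrongly) scored slightly higher as TITLE.
    emissions = [[1.0, 2.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 5.0]]
    transitions = [[0.0] * 3 for _ in range(3)]
    free = constrained_viterbi(emissions, transitions)
    fixed = constrained_viterbi(emissions, transitions, forbidden_start={1, 2})
    print([LABELS[y] for y in free])   # ['TITLE', 'TITLE', 'YEAR']
    print([LABELS[y] for y in fixed])  # ['AUTHOR', 'TITLE', 'YEAR']
```

Enforcing the rule only at decoding time leaves CRF training unchanged; a soft-constraint variant would instead add a finite penalty to the offending transitions rather than excluding them outright.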

Citation (APA)

Lindner, S. (2015). Two-tier machine learning using conditional random fields with constraints. In Communications in Computer and Information Science (Vol. 454, pp. 80–95). Springer Verlag. https://doi.org/10.1007/978-3-662-46549-3_6