A Bayesian network approach to semantic labelling of text formatting in XML corpora of documents

Florendia Fourli-Kartsouni; Kostas Slavakis; Georgios Kouroupetroglou; Sergios Theodoridis

Conference Proceedings (Open Access)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2007) 4556 LNCS(PART 3) 299-308

DOI: 10.1007/978-3-540-73283-9_34

Abstract

The widespread application of document digitization has led to the use of structured digital representation methods such as XML. Methods for extracting formatting metadata can be applied to such structured documents to enhance their accessibility, for example through augmented audio representations of documents. To the best of our knowledge, no system has yet been developed that automatically extracts the semantic information conveyed by document formatting solely from the document layout, without the use of natural language processing. In this study, a corpus of XML representations of several issues of a Greek newspaper is used to create and evaluate a Bayesian network classifier that assigns semantic labels to text formatting. © Springer-Verlag Berlin Heidelberg 2007.
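The abstract does not state the network structure or the layout features used. To illustrate the general idea of labelling text blocks from layout alone, the following is a minimal sketch that assumes the simplest Bayesian network, a naive Bayes classifier in which the semantic label is the only parent of each formatting feature. The feature names (`size`, `bold`, `italic`), the labels (`title`, `subtitle`, `body`), the class `NaiveBayesLabeller`, and the toy training data are all hypothetical and are not the paper's corpus or model.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: each text block is described only by layout
# features (no natural language processing) and paired with a semantic label.
TRAINING = [
    ({"size": "large", "bold": True, "italic": False}, "title"),
    ({"size": "large", "bold": True, "italic": False}, "title"),
    ({"size": "medium", "bold": True, "italic": True}, "subtitle"),
    ({"size": "medium", "bold": False, "italic": True}, "subtitle"),
    ({"size": "small", "bold": False, "italic": False}, "body"),
    ({"size": "small", "bold": False, "italic": False}, "body"),
    ({"size": "small", "bold": True, "italic": False}, "body"),
]


class NaiveBayesLabeller:
    """Naive Bayes: a Bayesian network where the label is the sole parent of every feature."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, samples):
        self.label_counts = Counter(label for _, label in samples)
        # counts[label][feature][value] = number of training blocks with that value
        self.counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # distinct observed values per feature
        for features, label in samples:
            for name, value in features.items():
                self.counts[label][name][value] += 1
                self.values[name].add(value)
        self.total = len(samples)
        return self

    def log_posterior(self, features, label):
        # log P(label) + sum_i log P(feature_i | label), up to a constant shared by all labels
        score = math.log(self.label_counts[label] / self.total)
        for name, value in features.items():
            num = self.counts[label][name][value] + self.alpha
            den = self.label_counts[label] + self.alpha * len(self.values[name])
            score += math.log(num / den)
        return score

    def predict(self, features):
        # Pick the label with the highest (unnormalised) posterior probability
        return max(self.label_counts, key=lambda lab: self.log_posterior(features, lab))


model = NaiveBayesLabeller().fit(TRAINING)
print(model.predict({"size": "large", "bold": True, "italic": False}))
```

With this toy data, a large bold non-italic block is labelled `title`. A richer Bayesian network would add dependencies between features (for example between size and boldness), which naive Bayes deliberately ignores.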

Citation (APA)

Fourli-Kartsouni, F., Slavakis, K., Kouroupetroglou, G., & Theodoridis, S. (2007). A Bayesian network approach to semantic labelling of text formatting in XML corpora of documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4556 LNCS, pp. 299–308). Springer Verlag. https://doi.org/10.1007/978-3-540-73283-9_34
