Extracting structured subject information from digital document archives

Jyi Shane Liu; Ching Ying Lee

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2006), Vol. 4312 LNCS, pp. 141–150

DOI: 10.1007/11931584_17

Abstract

Information extraction (IE) techniques are capable of decoding targeted subject information in documents and reducing text data to a set of structured core information. The implication for digital libraries is that IE potentially serves as an enabling tool to extend the value of digital document archives. We present an approach, called the sandwich extraction pattern, to address the closely coupled template relation tasks. The approach provides interactive capabilities for task specification, domain knowledge acquisition, and output evaluation. This allows users (e.g., librarians) to have direct control over the design of value-added content products and the performance of IE tools. We conducted empirical validation by implementing an IE system, called SEP, and field testing it in a practical document archive. Encouraged by successful test runs, the NCCU library has formally initiated a project to develop a value-added content product of government personnel gazettes, including document images, electronic texts, and a personnel changes database. © Springer-Verlag Berlin Heidelberg 2006.

Cite

Liu, J. S., & Lee, C. Y. (2006). Extracting structured subject information from digital document archives. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4312 LNCS, pp. 141–150). Springer Verlag. https://doi.org/10.1007/11931584_17
