Visualizing document image collections using image-based word clouds

Tomas Wilkinson; Anders Brun

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9474 297-306

DOI: 10.1007/978-3-319-27857-5_27

1 Citation

2 Readers

Abstract

In this paper, we introduce image-based word clouds as a novel tool for quick and aesthetic overviews of common words in collections of digitized text manuscripts. While OCR can be used to enable summaries and search functionality for printed modern text, historical and handwritten documents remain a challenge. By segmenting and counting word images, without applying manual transcription or OCR, we have developed a method that can produce word or tag clouds from document collections. Our new tool is not limited to any specific kind of text. We make further contributions in stop-word removal, class-based feature weighting, and visualization. An evaluation of the proposed tool includes comparisons with ground truth word clouds on handwritten marriage licenses from the 17th century and on the George Washington database of handwritten letters from the 18th century. Our experiments show that image-based word clouds capture the same information, albeit approximately, as regular word clouds based on text data.

Cite (APA)

Wilkinson, T., & Brun, A. (2015). Visualizing document image collections using image-based word clouds. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9474, pp. 297–306). Springer Verlag. https://doi.org/10.1007/978-3-319-27857-5_27
