Reliable editions from unreliable components: Estimating ebooks from print editions using profile hidden Markov models

Abstract

A profile hidden Markov model, a popular model in biological sequence analysis, can be used to model related sequences of characters transcribed from books, magazines, and other printed materials. This paper documents one application of a profile HMM: Automatically producing an ebook edition from distinct print editions. The resulting ebook has virtually all the desired properties found in a publisher-prepared ebook, including accurate transcription and an absence of print artifacts such as end-of-line hyphenation and running headers. The technique, which has particular benefits for readers and libraries that require books in an accessible format, is demonstrated using seven copies of a nineteenth-century novel.

CCS Concepts: Information systems → Digital libraries and archives.
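The idea of recovering one reliable text from several unreliable transcriptions can be illustrated with a much simpler stand-in than a profile HMM: a column-wise majority vote over transcriptions that are assumed to be already aligned. This is not the paper's method. A profile HMM would estimate the alignment and the per-position character probabilities itself, whereas this sketch takes the alignment as given. The function name `consensus`, the gap symbol `~`, and the sample witnesses are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical gap symbol: marks a position where a witness has no character.
GAP = "~"


def consensus(aligned):
    """Return the column-wise majority reading of pre-aligned transcriptions.

    Assumes every string in `aligned` has the same length and that the
    alignment was produced elsewhere (a profile HMM would infer it).
    """
    if len({len(row) for row in aligned}) != 1:
        raise ValueError("all aligned transcriptions must have the same length")
    out = []
    for column in zip(*aligned):
        # Pick the most frequent symbol in this column.
        symbol, _count = Counter(column).most_common(1)[0]
        # A winning gap means most witnesses have nothing here, so emit nothing.
        if symbol != GAP:
            out.append(symbol)
    return "".join(out)


# Three invented witnesses: one clean, one with an OCR-style substitution,
# one carrying an end-of-line hyphen that the others lack.
witnesses = [
    "It was the best of ti~mes",
    "It was tbe best of ti~mes",
    "It was the best of ti-mes",
]

print(consensus(witnesses))
```

With these three witnesses, the substitution and the stray hyphen are each outvoted by the other two copies, which mirrors on a small scale how independent errors in separate print copies can cancel out.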

Citation (APA)

Riddell, A. B. (2022). Reliable editions from unreliable components: Estimating ebooks from print editions using profile hidden Markov models. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3529372.3533292
