A profile hidden Markov model, a popular model in biological sequence analysis, can be used to model related sequences of characters transcribed from books, magazines, and other printed materials. This paper documents one application of a profile HMM: automatically producing an ebook edition from distinct print editions. The resulting ebook has virtually all the desired properties found in a publisher-prepared ebook, including accurate transcription and an absence of print artifacts such as end-of-line hyphenation and running headers. The technique, which has particular benefits for readers and libraries that require books in an accessible format, is demonstrated using seven copies of a nineteenth-century novel.
CCS CONCEPTS: Information systems → Digital libraries and archives.
CITATION STYLE
Riddell, A. B. (2022). Reliable editions from unreliable components: Estimating ebooks from print editions using profile hidden Markov models. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3529372.3533292