Abstract
Underlying any processing and analysis of texts is the need to represent the individual characters that make up those texts. For the first few decades, scholars pioneering digital classical philology had to adopt various workarounds for dealing with the scripts of historical languages on systems that were never intended for anything but English. The Unicode Standard addresses many of the issues with character encoding across the world's writing systems, including those used by historical languages, but its practical use in digital classical philology is not without challenges. This chapter will start with a conceptual overview of character coding systems, and the Unicode Standard in particular, but will also discuss practical issues relating to the input, interchange, processing and display of classical texts. As well as providing guidelines for interoperability in text representation, it will cover various aspects of text processing at the character level, including normalisation, search, regular expressions, collation, and alignment.
Tauber, J. K. (2019). Character encoding of classical languages. In Digital Classical Philology: Ancient Greek and Latin in the Digital Revolution (pp. 137–57). De Gruyter. https://doi.org/10.1515/9783110599572-009