The step to e-research in philosophy depends on the availability of high-quality, easily and freely accessible corpora in a sustainable format, composed of multi-language, multi-script books from different historical periods. Corpora matching these needs are at the moment virtually non-existent. Within @PhilosTEI, we have addressed this corpus-building problem by developing an open-source, web-based, user-friendly workflow from textual images to TEI, based on state-of-the-art open-source OCR software, namely Tesseract, and a multi-language version of TICCL, a powerful OCR post-correction tool. We have demonstrated the utility of the tool by applying it to a multilingual, multi-script corpus of important eighteenth- to twentieth-century European philosophical texts.
van Hessen, A. (2017). @PhilosTEI: Building Corpora for Philosophers. In CLARIN in the Low Countries (pp. 379–392). Ubiquity Press. https://doi.org/10.5334/bbi.32