Turkish language resources: Morphological parser, morphological disambiguator and web corpus

Haşim Sak; Tunga Güngör; Murat Saraçlar

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2008) 5221 LNAI 417-427

DOI: 10.1007/978-3-540-85287-2_40

Abstract

In this paper, we propose a set of language resources for building Turkish language processing applications. Specifically, we present a finite-state implementation of a morphological parser, an averaged perceptron-based morphological disambiguator, and the compilation of a web corpus. Turkish is an agglutinative language with highly productive inflectional and derivational morphology. The morphological parser is based on two-level morphology. It is one of the most complete parsers for Turkish and, in contrast to existing parsers, it runs independently of any external system such as PC-KIMMO. Due to the complex phonology and morphology of Turkish, parsing can yield several ambiguous parses for a word. We developed a morphological disambiguator with an accuracy of about 98% using the averaged perceptron algorithm. We also present our efforts to build a Turkish web corpus of about 423 million words. © 2008 Springer-Verlag Berlin Heidelberg.
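To make the disambiguation step in the abstract concrete, the following is a minimal sketch of an averaged perceptron that picks one parse from a set of candidates. It is not the authors' implementation: the tag notation (`root+Tag1+Tag2`), the feature templates, the use of the previous word as context, and the two toy examples are all assumptions made for illustration. The paper's actual feature set and decoding procedure are not described in the abstract.

```python
def features(prev_word, parse):
    """Hypothetical feature templates: each tag alone, and each tag paired with the previous word."""
    tags = parse.split("+")[1:]  # drop the root, keep the morphological tags
    return ["tag=" + t for t in tags] + ["prev=" + prev_word + "|tag=" + t for t in tags]


def score(weights, prev_word, parse):
    # Linear model: sum the weights of all active features.
    return sum(weights.get(f, 0.0) for f in features(prev_word, parse))


def predict(weights, prev_word, candidates):
    # Index of the highest-scoring candidate; ties go to the earliest one.
    return max(range(len(candidates)),
               key=lambda i: score(weights, prev_word, candidates[i]))


def train(data, epochs=5):
    """Perceptron updates on mistakes; returns the average of the weights over all steps."""
    weights, totals, steps = {}, {}, 0
    for _ in range(epochs):
        for prev_word, candidates, gold in data:
            guess = predict(weights, prev_word, candidates)
            if guess != gold:
                # Reward the features of the correct parse, penalize the wrong guess.
                for f in features(prev_word, candidates[gold]):
                    weights[f] = weights.get(f, 0.0) + 1.0
                for f in features(prev_word, candidates[guess]):
                    weights[f] = weights.get(f, 0.0) - 1.0
            # Naive averaging: accumulate the current weights after every example.
            steps += 1
            for f, w in weights.items():
                totals[f] = totals.get(f, 0.0) + w
    return {f: t / steps for f, t in totals.items()}


# Toy data (made up for this sketch): the surface form "evi" after two different words.
EVI = ["ev+Noun+A3sg+P3sg+Nom", "ev+Noun+A3sg+Pnon+Acc"]
TOY_DATA = [
    ("onun", EVI, 0),  # "onun evi": possessive reading
    ("bu", EVI, 1),    # "bu evi": accusative reading
]

if __name__ == "__main__":
    averaged = train(TOY_DATA)
    for prev_word, candidates, gold in TOY_DATA:
        print(prev_word, "->", candidates[predict(averaged, prev_word, candidates)])
```

Averaging the weights over all training steps, rather than keeping only the final weights, tends to make the model less sensitive to the last few updates. The summation after every example is the simplest way to express this and is chosen here for readability, not efficiency.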

Cite

Sak, H., Güngör, T., & Saraçlar, M. (2008). Turkish language resources: Morphological parser, morphological disambiguator and web corpus. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5221 LNAI, pp. 417–427). https://doi.org/10.1007/978-3-540-85287-2_40
