Persian pronoun resolution using data driven approaches

Aria Nourbakhsh; Mohammad Bahrani

Conference Proceedings

Communications in Computer and Information Science (2017) 756 574-585

DOI: 10.1007/978-3-319-67642-5_48

Abstract

Pronoun resolution is one of the challenges of natural language processing (NLP). Proposed solutions range from heuristic rule-based methods to data-driven machine learning approaches. In this article, we build on a previous machine learning approach to Persian pronoun anaphora resolution. The primary goal of this paper is to improve on its results, mainly by extracting more balanced data through heuristic rules in instance sampling and by using more relevant features in classification. Using the PCAC2008 dataset, we consider noun phrase structure as a way to extract more suitable training data. The incorporated features include syntactic and semantic information. Finally, we train and test different classifiers and compare their results. The best result is achieved with the C4.5 decision tree classifier. The results show a significant improvement over the previous work, achieving 75% F-measure compared to 45%. An analysis of the extracted features and their contribution is also presented.
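
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F-measure can be computed from classifier decisions. It assumes the balanced F1 variant (beta = 1), which the abstract does not specify; the function name `f_measure` and the counts in the usage line are hypothetical and are not taken from the paper.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from counts of true positives, false positives, false negatives.

    Assumption: beta = 1 (balanced F1) unless stated otherwise.
    """
    if tp == 0:
        # No correct positive decisions: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)  # share of proposed antecedent links that are correct
    recall = tp / (tp + fn)     # share of true antecedent links that were found
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic.
score = f_measure(tp=75, fp=25, fn=25)
print(f"{score:.2f}")
```

With equal precision and recall, as in the illustrative counts, the F-measure equals both of them; it falls below their arithmetic mean whenever the two diverge.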

Cite

Nourbakhsh, A., & Bahrani, M. (2017). Persian pronoun resolution using data driven approaches. In Communications in Computer and Information Science (Vol. 756, pp. 574–585). Springer Verlag. https://doi.org/10.1007/978-3-319-67642-5_48
