Pronoun resolution is one of the challenges of natural language processing (NLP). Proposed solutions range from heuristic rule-based approaches to data-driven machine learning approaches. In this article, we build on a previous machine learning approach to Persian pronoun anaphora resolution. The primary goal of this paper is to improve on its results, mainly by extracting more balanced training data through heuristic rules for instance sampling and by using more relevant features for classification. Using the PCAC2008 dataset, we take noun phrase structure into account in order to extract more suitable training data. The incorporated features include syntactic and semantic information. Finally, we train and test several classifiers and compare their results. The best result is achieved with the C4.5 decision tree classifier. The results show a significant improvement over the previous work, with an F-measure of 75% compared to 45%. An analysis of the extracted features and their contributions is also presented.
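The abstract does not spell out the exact heuristic rules used for instance sampling, so the following is only a minimal sketch of how noun phrase structure can be used to thin out negative training instances in a mention-pair setup. It assumes the closest-antecedent scheme common in mention-pair models (one positive pair per pronoun, negatives drawn from the mentions in between). The `Mention` class, its `nested` flag, the `window` parameter, and the rule of skipping nested NPs are all hypothetical illustrations, not the authors' method.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Mention:
    text: str
    is_pronoun: bool
    chain_id: Optional[int]  # gold coreference chain; None for singletons
    nested: bool = False     # hypothetical flag: NP is embedded in a larger NP


def sample_instances(mentions: List[Mention], window: int = 10) -> List[Tuple[int, int, int]]:
    """Build (pronoun, candidate, label) training triples from one document.

    Mentions are assumed to be in document order; triples use list positions.
    """
    instances: List[Tuple[int, int, int]] = []
    for i, m in enumerate(mentions):
        if not m.is_pronoun or m.chain_id is None:
            continue
        # Closest preceding mention in the same chain is the positive antecedent.
        ante = next((j for j in range(i - 1, -1, -1)
                     if mentions[j].chain_id == m.chain_id), None)
        if ante is None or i - ante > window:
            continue  # no usable antecedent within the (hypothetical) window
        instances.append((i, ante, 1))
        # Intervening mentions become negatives, except nested NPs
        # (hypothetical structural filter to reduce class imbalance).
        for j in range(ante + 1, i):
            if not mentions[j].nested:
                instances.append((i, j, 0))
    return instances


doc = [
    Mention("Ali", False, 1),
    Mention("the library of the city", False, None),
    Mention("the city", False, None, nested=True),
    Mention("he", True, 1),
]
print(sample_instances(doc))  # [(3, 0, 1), (3, 1, 0)]
```

Without the nested-NP filter, "the city" would add a second negative pair for "he"; dropping embedded NPs is one simple way structural information can make the positive/negative ratio less skewed before features are extracted and a classifier such as C4.5 is trained.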
Nourbakhsh, A., & Bahrani, M. (2017). Persian pronoun resolution using data driven approaches. In Communications in Computer and Information Science (Vol. 756, pp. 574–585). Springer Verlag. https://doi.org/10.1007/978-3-319-67642-5_48