Abstract
We propose a novel machine learning approach to the task of identifying definitions in Polish documents. We take the specifics of the problem domain and the characteristics of the available dataset into consideration by carefully choosing a classification method and adapting it to highly imbalanced and noisy data. We evaluate the performance of a Random Forest-based classifier in extracting definitional sentences from natural language text and compare the results with previous work. © 2008 Springer-Verlag Berlin Heidelberg.
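The abstract does not spell out how the classifier is adapted to imbalanced data, but the paper's title names balanced random forests. In the commonly cited formulation of Chen, Liaw and Breiman (2004), each tree is trained on a bootstrap sample drawn from the minority class plus an equally sized sample drawn with replacement from the majority class, and the trees vote. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. To stay short it uses depth-one trees (stumps) with a random feature subset instead of fully grown trees. The function names, the two numeric features, and the toy data are hypothetical.

```python
import random
from collections import Counter


def balanced_bootstrap(labels, minority, rng):
    """Indices of one balanced sample: a bootstrap of the minority class plus
    an equally sized draw (with replacement) from all other classes."""
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    n = len(minority_idx)
    return ([rng.choice(minority_idx) for _ in range(n)]
            + [rng.choice(majority_idx) for _ in range(n)])


def gini(counts):
    """Gini impurity of a Counter of class labels."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def fit_stump(X, y, idx, features):
    """Depth-one tree: pick the (feature, threshold) split with the lowest
    weighted Gini impurity among the candidate features."""
    best = None  # (impurity, feature, threshold, left_label, right_label)
    for f in features:
        for t in sorted({X[i][f] for i in idx}):
            left = Counter(y[i] for i in idx if X[i][f] <= t)
            right = Counter(y[i] for i in idx if X[i][f] > t)
            if not left or not right:
                continue  # split does not separate anything
            imp = (sum(left.values()) * gini(left)
                   + sum(right.values()) * gini(right)) / len(idx)
            if best is None or imp < best[0]:
                best = (imp, f, t, left.most_common(1)[0][0],
                        right.most_common(1)[0][0])
    if best is None:  # no usable split: predict the sample's majority label
        label = Counter(y[i] for i in idx).most_common(1)[0][0]
        return lambda x: label
    _, f, t, left_label, right_label = best
    return lambda x: left_label if x[f] <= t else right_label


def fit_balanced_forest(X, y, minority, n_trees=50, mtry=1, seed=0):
    """Train n_trees stumps, each on its own balanced bootstrap sample and a
    random subset of mtry features."""
    rng = random.Random(seed)
    n_features = len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = balanced_bootstrap(y, minority, rng)
        features = rng.sample(range(n_features), mtry)
        trees.append(fit_stump(X, y, idx, features))
    return trees


def predict(trees, x):
    """Majority vote over all trees."""
    return Counter(tree(x) for tree in trees).most_common(1)[0][0]


# Hypothetical toy data: 5 "definition" sentences (label 1) against
# 50 "non-definition" sentences (label 0), described by two numeric features.
X = [[1, 0.7 + 0.05 * k] for k in range(5)] + \
    [[0, (k % 10) * 0.05] for k in range(50)]
y = [1] * 5 + [0] * 50
trees = fit_balanced_forest(X, y, minority=1)
print(predict(trees, [1, 0.8]), predict(trees, [0, 0.0]))
```

Balancing each tree's sample, rather than the whole training set once, lets every tree see a different subset of the majority class, so over the whole forest most majority examples are still used while no single tree is dominated by them.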
Citation
Kobyliński, Ł., & Przepiórkowski, A. (2008). Definition extraction with balanced random forests. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5221 LNAI, pp. 237–247). https://doi.org/10.1007/978-3-540-85287-2_23