This paper presents a framework for aligning collections of comparable documents. Our feature-based model considers several characteristics of documents when evaluating their similarity. The model relies on document content alone, so it applies when no links, special tags, or metadata are available. We also apply a filtering mechanism that makes the model applicable to large data collections. According to the results, the model recognizes related documents in the target language with a recall of 45.67% for the 1-best and 62% for the 5-best candidates.
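The 1-best and 5-best recall figures can be read as recall at k: the share of source documents whose correct target document appears among the top k ranked candidates. The following is a minimal sketch of that computation, not the authors' evaluation script. The function name `recall_at_k`, the dictionary layout, and the assumption that each source document has exactly one gold target are all illustrative.

```python
from typing import Dict, List


def recall_at_k(ranked: Dict[str, List[str]], gold: Dict[str, str], k: int) -> float:
    """Fraction of source documents whose gold target is in the top-k candidates.

    Assumption (illustrative): each source document has exactly one gold target.
    """
    if not gold:
        return 0.0
    # A source document counts as a hit if its gold target is among the first k candidates.
    hits = sum(1 for src, tgt in gold.items() if tgt in ranked.get(src, [])[:k])
    return hits / len(gold)


# Toy data: candidate target documents for each source document, best first.
ranked = {
    "src1": ["t1", "t9", "t3"],
    "src2": ["t7", "t2", "t5"],
    "src3": ["t8", "t6", "t4"],
}
gold = {"src1": "t1", "src2": "t2", "src3": "t0"}

recall_1 = recall_at_k(ranked, gold, 1)  # only src1 is a hit at rank 1
recall_5 = recall_at_k(ranked, gold, 5)  # src1 and src2 are hits within the top 5
```

In this toy data, one of three source documents is matched at rank 1 and two of three within the top 5, which mirrors how recall rises from the 1-best to the 5-best setting.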
CITATION STYLE
Zafarian, A., Aghasadeghi, A., Azadi, F., Ghiasifard, S., Alipanahloo, Z., Bakhshaei, S., & Ziabary, S. M. M. (2015). AUT Document Alignment Framework for BUCC Workshop Shared Task. In 8th Workshop on Building and Using Comparable Corpora, BUCC 2015 - co-located with 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, ACL-IJCNLP 2015 - Proceedings (pp. 79–87). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w15-3412