This paper describes three unsupervised systems, submitted to SemEval-2016 Task 1, for determining the semantic similarity between two short texts or sentences. All three use only off-the-shelf software and data, which makes them easy to replicate. Two of the systems achieved similar Pearson correlation coefficients: 0.64661 for the simple vector system and 0.65319 for the word-alignment system. We also report experiments applying our alignment-based system to the evaluation data from the 2014 and 2015 STS shared tasks. The results suggest that, beyond the core similarity algorithm, factors such as data preprocessing and the use of domain-specific knowledge also matter for similarity prediction performance.
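The scores reported above are Pearson correlation coefficients between system predictions and human similarity judgments. As a minimal sketch of how such a figure is computed (not the authors' evaluation code), the Python below implements the standard formula; the function name `pearson` and the example scores are hypothetical and invented purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: invented gold judgments and system predictions
# for five sentence pairs (not taken from the paper or the task data).
gold = [0.0, 1.5, 3.0, 4.2, 5.0]
system = [0.4, 1.0, 2.8, 3.9, 4.6]

r = pearson(gold, system)
print(f"Pearson r = {r:.5f}")
```

Because the coefficient is invariant to linear rescaling of either sequence, a system only needs to rank and space its predictions consistently with the human judgments; it does not need to reproduce their absolute scale.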
Wu, H., Huang, H., & Lu, W. (2016). BIT at SemEval-2016 task 1: Sentence similarity based on alignments and vector with the weight of information content. In SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings (pp. 686–690). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/s16-1105