Learning text similarity with siamese recurrent networks


Abstract

This paper presents a deep architecture for learning a similarity metric on variable-length character sequences. The model combines a stack of character-level bidirectional LSTMs with a Siamese architecture. It learns to project variable-length strings into a fixed-dimensional embedding space using only information about the similarity between pairs of strings. The model is applied to the task of job title normalization based on a manually annotated taxonomy. A small data set is incrementally expanded and augmented with new sources of variance. The model learns a representation that is selective to differences in the input that reflect semantic differences (e.g., "Java developer" vs. "HR manager") but invariant to nonsemantic string differences (e.g., "Java developer" vs. "Java programmer").
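The Siamese objective described in the abstract can be illustrated with a contrastive loss on cosine similarity: similar pairs are pulled toward a cosine of 1, while dissimilar pairs are penalized only when their similarity exceeds a margin. The sketch below is a minimal, hedged illustration of that general formulation in NumPy; the exact loss, margin value, and encoder used in the paper may differ, and the mean-pooled character-embedding "encoder" here is a toy stand-in for the stacked bidirectional LSTM.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def contrastive_loss(a, b, similar, margin=0.5):
    """Contrastive loss on cosine similarity (illustrative formulation).

    Similar pairs are pushed toward cos = 1; dissimilar pairs are
    penalized only when cos exceeds the margin.
    """
    cos = cosine_similarity(a, b)
    if similar:
        return (1.0 - cos) ** 2
    return max(0.0, cos - margin) ** 2

def toy_encoder(s, dim=16):
    """Toy stand-in for the character-level BiLSTM encoder:
    mean-pools fixed random character embeddings (deterministic per char)."""
    vecs = []
    for ch in s:
        rng = np.random.default_rng(ord(ch))  # deterministic per character
        vecs.append(rng.standard_normal(dim))
    return np.mean(vecs, axis=0)

if __name__ == "__main__":
    e1 = toy_encoder("Java developer")
    e2 = toy_encoder("Java programmer")
    e3 = toy_encoder("HR manager")
    # In a trained model, loss gradients would push e1/e2 together
    # and keep e1/e3 apart; here we just evaluate the loss terms.
    print(contrastive_loss(e1, e2, similar=True))
    print(contrastive_loss(e1, e3, similar=False))
```

During training, both branches of the Siamese network share weights, so the same encoder produces both embeddings and the loss gradient shapes a single embedding space.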

Citation (APA)
Neculoiu, P., Versteegh, M., & Rotaru, M. (2016). Learning text similarity with Siamese recurrent networks. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 148–157). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-1617
