This paper presents a deep architecture for learning a similarity metric on variable-length character sequences. The model combines a stack of character-level bidirectional LSTMs with a Siamese architecture. It learns to project variable-length strings into a fixed-dimensional embedding space using only information about the similarity between pairs of strings. The model is applied to the task of job title normalization based on a manually annotated taxonomy. A small data set is incrementally expanded and augmented with new sources of variance. The model learns a representation that is selective to differences in the input that reflect semantic differences (e.g., "Java developer" vs. "HR manager") and invariant to nonsemantic string differences (e.g., "Java developer" vs. "Java programmer").
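The training signal described above can be illustrated with a small sketch: two strings are mapped to fixed-dimensional embeddings (here just given as vectors, standing in for the Siamese BiLSTM outputs), and a contrastive loss on their cosine similarity pulls similar pairs together and pushes dissimilar pairs below a margin. The loss form and the margin value below are illustrative assumptions, not necessarily the exact formulation used in the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(u, v, similar, margin=0.5):
    """Pairwise loss on two fixed-dimensional string embeddings.

    Similar pairs are pulled toward cosine similarity 1; dissimilar
    pairs incur a penalty only when their similarity exceeds the
    margin. This is a common contrastive form; the paper's exact
    loss may differ.
    """
    c = cosine(u, v)
    if similar:
        return (1.0 - c) ** 2
    return max(0.0, c - margin) ** 2
```

In training, these per-pair losses would be summed over a batch of annotated string pairs and backpropagated through the shared BiLSTM encoder, so that, e.g., embeddings for "Java developer" and "Java programmer" converge while "Java developer" and "HR manager" are driven apart.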
Neculoiu, P., Versteegh, M., & Rotaru, M. (2016). Learning text similarity with Siamese recurrent networks. In Proceedings of the 1st Workshop on Representation Learning for NLP (pp. 148–157). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-1617