FastKASSIM: A Fast Tree Kernel-Based Syntactic Similarity Metric


Abstract

Syntax is a fundamental component of language, yet few metrics have been employed to capture syntactic similarity or coherence at the utterance- and document-level. The existing standard document-level syntactic similarity metric is computationally expensive and performs inconsistently when faced with syntactically dissimilar documents. To address these challenges, we present FastKASSIM, a metric for utterance- and document-level syntactic similarity which pairs and averages the most similar constituency parse trees between a pair of documents based on tree kernels. FastKASSIM is more robust to syntactic dissimilarities and runs up to 5.32 times faster than its predecessor over documents in the r/ChangeMyView corpus. FastKASSIM's improvements allow us to examine hypotheses in two settings with large documents. We find that syntactically similar arguments on r/ChangeMyView tend to be more persuasive, and that syntax is predictive of authorship attribution in the Australian High Court Judgment corpus.
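
As a rough illustration of the "pair and average the most similar parse trees" idea described in the abstract, the following Python sketch scores two documents that are already parsed into constituency trees. It is not the authors' implementation: the nested-tuple tree encoding, the function names, the one-directional pairing (each tree in the first document is matched to its best partner in the second), and the simple production-overlap kernel are all assumptions chosen for brevity, and the kernel is a stand-in for the tree kernel the paper actually uses.

```python
from collections import Counter
from math import sqrt

# Assumed encoding: a parse tree is a nested tuple (label, child, ...);
# a leaf (word) is a plain string.

def productions(tree):
    """Count syntactic productions (parent label -> child labels) in a tree.

    Preterminal productions (tag -> word) are skipped so that only
    structure, not vocabulary, contributes to the score.
    """
    counts = Counter()
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue
        label, children = node[0], node[1:]
        if all(isinstance(c, str) for c in children):
            continue  # tag -> word, ignore lexical material
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        counts[(label, rhs)] += 1
        stack.extend(children)
    return counts

def kernel(t1, t2):
    """Stand-in kernel: dot product of production counts (hypothetical choice)."""
    p1, p2 = productions(t1), productions(t2)
    return sum(p1[k] * p2[k] for k in p1.keys() & p2.keys())

def normalized_kernel(t1, t2):
    """Scale the kernel to [0, 1] so trees of different sizes are comparable."""
    denom = sqrt(kernel(t1, t1) * kernel(t2, t2))
    return kernel(t1, t2) / denom if denom else 0.0

def document_similarity(doc_a, doc_b):
    """Pair each tree in doc_a with its most similar tree in doc_b, then average."""
    if not doc_a or not doc_b:
        return 0.0
    best = [max(normalized_kernel(a, b) for b in doc_b) for a in doc_a]
    return sum(best) / len(best)
```

With this encoding, two sentences that share the same phrase structure but use different words score as identical, while sentences with no productions in common score zero. A symmetric variant could average the scores computed in both directions.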

Cite

Chen, M., Chen, C., Yu, X., & Yu, Z. (2023). FastKASSIM: A Fast Tree Kernel-Based Syntactic Similarity Metric. In EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference (pp. 211–231). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.eacl-main.17
