FastKASSIM: A Fast Tree Kernel-Based Syntactic Similarity Metric

Maximillian Chen; Caitlyn Chen; Xiao Yu; Zhou Yu

Conference Proceedings, Open Access

EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference (2023) 211-231

DOI: 10.18653/v1/2023.eacl-main.17

Abstract

Syntax is a fundamental component of language, yet few metrics have been employed to capture syntactic similarity or coherence at the utterance- and document-level. The existing standard document-level syntactic similarity metric is computationally expensive and performs inconsistently when faced with syntactically dissimilar documents. To address these challenges, we present FastKASSIM, a metric for utterance- and document-level syntactic similarity which pairs and averages the most similar constituency parse trees between a pair of documents based on tree kernels. FastKASSIM is more robust to syntactic dissimilarities and runs up to 5.32 times faster than its predecessor over documents in the r/ChangeMyView corpus. FastKASSIM's improvements allow us to examine hypotheses in two settings with large documents. We find that syntactically similar arguments on r/ChangeMyView tend to be more persuasive, and that syntax is predictive of authorship attribution in the Australian High Court Judgment corpus.
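The "pair and average the most similar parse trees" idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes parse trees are already available as nested tuples `(label, child, ...)` with words omitted, and it substitutes a simple normalized kernel over production counts for the tree kernel used in the paper. The function names `productions`, `similarity`, and `document_similarity` are hypothetical, and the exact pairing and averaging scheme in FastKASSIM may differ.

```python
from collections import Counter
from math import sqrt


def productions(tree):
    """Count productions (label -> child labels) in a nested-tuple tree.

    Assumed tree format: (label, child1, child2, ...); a preterminal such
    as a POS tag is a tuple with no children, e.g. ("NN",).
    """
    counts = Counter()
    stack = [tree]
    while stack:
        label, *children = stack.pop()
        counts[(label, tuple(child[0] for child in children))] += 1
        stack.extend(children)
    return counts


def kernel(c1, c2):
    """Stand-in kernel: dot product of production counts (not a full tree kernel)."""
    return sum(n * c2[p] for p, n in c1.items())


def similarity(t1, t2):
    """Normalized kernel score in [0, 1] between two parse trees."""
    c1, c2 = productions(t1), productions(t2)
    denom = sqrt(kernel(c1, c1) * kernel(c2, c2))
    return kernel(c1, c2) / denom if denom else 0.0


def document_similarity(doc_a, doc_b):
    """Pair each tree in doc_a with its most similar tree in doc_b, then average."""
    if not doc_a or not doc_b:
        return 0.0
    best = [max(similarity(a, b) for b in doc_b) for a in doc_a]
    return sum(best) / len(best)


# Toy parse trees (hypothetical examples, words omitted).
s1 = ("S", ("NP", ("DT",), ("NN",)), ("VP", ("VBZ",)))
s2 = ("S", ("NP", ("DT",), ("NN",)), ("VP", ("VBZ",), ("NP", ("NN",))))
s3 = ("SBARQ", ("WHNP", ("WP",)), ("SQ", ("VBZ",)))

score = document_similarity([s1, s3], [s2])
```

As written, `document_similarity` is asymmetric, since it only searches for the best match of each tree in the first document. A symmetric variant could average the scores computed in both directions.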

Cite (APA)

Chen, M., Chen, C., Yu, X., & Yu, Z. (2023). FastKASSIM: A Fast Tree Kernel-Based Syntactic Similarity Metric. In EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference (pp. 211–231). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.eacl-main.17
