Feature analysis for duplicate detection in programming QA communities

Wei Emma Zhang; Quan Z. Sheng; Yanjun Shu; Vanh Khuyen Nguyen

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017) 10604 LNAI 623-638

DOI: 10.1007/978-3-319-69179-4_44

7 Citations

12 Readers

Abstract

In community question answering (CQA), duplicate questions are questions that were previously created and answered but occur again. These questions produce noise in CQA websites, which impedes users from finding answers efficiently. Programming CQA (PCQA), a branch of CQA that holds questions related to programming, also suffers from this problem. Existing work on duplicate detection in PCQA websites framed the task as supervised learning on question pairs and relied on a number of features extracted from those pairs. However, it extracted only textual features and did not consider the source code in the questions, which is linguistically very different from natural language. Our work focuses on developing novel features for PCQA duplicate detection. We leverage continuous word vectors from the deep learning literature, probabilistic models from information retrieval, and association pairs mined from duplicate questions using machine translation. We provide an extensive empirical analysis of the performance of these features and their various combinations using a range of learning models. Our work could be helpful for both research and practical applications that require extracting features from texts that are not entirely natural language.
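To make the pairwise framing concrete, here is a minimal sketch of turning a question pair into a feature vector that treats prose and source code separately. It is not the paper's feature set: a plain bag-of-tokens cosine similarity stands in for the word-vector, probabilistic, and association-pair features the abstract names. The `<code>...</code>` delimiter convention and the helper names `split_question`, `cosine`, and `pair_features` are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Assumption: source code inside a question body is wrapped in <code>...</code>.
CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL)
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def split_question(body):
    """Return (text_tokens, code_tokens) for one question body."""
    code = " ".join(CODE_RE.findall(body))
    text = CODE_RE.sub(" ", body)
    text_tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    # Keep the original case for code: identifiers are case-sensitive.
    code_tokens = TOKEN_RE.findall(code)
    return text_tokens, code_tokens


def cosine(a, b):
    """Cosine similarity between two bags of tokens (0.0 if either is empty)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def pair_features(q1, q2):
    """Feature vector for a question pair: [text similarity, code similarity]."""
    t1, c1 = split_question(q1)
    t2, c2 = split_question(q2)
    return [cosine(t1, t2), cosine(c1, c2)]
```

Under this framing, each labeled pair (duplicate or not) would contribute one such vector to a standard supervised classifier. Richer features would be appended as additional entries of the same vector.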

Cite

Zhang, W. E., Sheng, Q. Z., Shu, Y., & Nguyen, V. K. (2017). Feature analysis for duplicate detection in programming QA communities. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10604 LNAI, pp. 623–638). Springer Verlag. https://doi.org/10.1007/978-3-319-69179-4_44
