Developing a hyperparameter optimization method for classification of code snippets and questions of stack overflow: HyperSCC

Abstract

Although various machine learning and text mining techniques exist for identifying the programming language of complete code files, multi-label prediction for code snippets has not been considered by the research community. This work aims to devise a tuner for multi-label programming language prediction of Stack Overflow posts. To that end, a Hyper Source Code Classifier (HyperSCC) is devised, along with rule-based automatic labeling that takes the bottlenecks of multi-label classification into account. The proposed method is evaluated on seven multi-label predictors to support an extensive analysis. It is further compared with three competitive alternatives on one-label programming language prediction. HyperSCC outperformed the other methods in terms of the F1 score. Preprocessing yields a large reduction (50%) in training time when ensemble multi-label predictors are employed. In one-label programming language prediction, the Gradient Boosting Machine (GBM) yields the highest accuracy (0.99) in predicting R posts, which contain many distinctive words that determine their labels. The findings support the hypothesis that multi-label predictors can be strengthened with sophisticated feature selection and labeling approaches.
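To make the notions of rule-based labeling and multi-label F1 scoring concrete, the following is a minimal sketch in Python. It is not the HyperSCC implementation: the keyword rules in `RULES`, the function names `label_post` and `micro_f1`, and the sample posts are all hypothetical and chosen only for illustration. The abstract does not state which F1 averaging was used, so micro-averaging is an assumption here.

```python
import re

# Hypothetical keyword rules. These patterns are illustrative assumptions,
# not the labeling rules used in the paper.
RULES = {
    "python": [r"\bdef\s+\w+\(", r"\bimport\s+\w+"],
    "r": [r"<-", r"\blibrary\("],
    "java": [r"\bpublic\s+(static\s+)?\w+", r"System\.out\.println"],
}


def label_post(text):
    """Return the set of languages for which at least one rule matches."""
    return {
        lang
        for lang, patterns in RULES.items()
        if any(re.search(p, text) for p in patterns)
    }


def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over multi-label predictions (assumed metric)."""
    pairs = list(zip(true_sets, pred_sets))
    tp = sum(len(t & p) for t, p in pairs)  # labels correctly predicted
    fp = sum(len(p - t) for t, p in pairs)  # labels predicted but not true
    fn = sum(len(t - p) for t, p in pairs)  # true labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up sample posts; the third one matches rules for two languages.
posts = [
    "x <- c(1, 2, 3)\nlibrary(ggplot2)",
    "import os\ndef main():\n    pass",
    "How do I call System.out.println from R? library(rJava)",
]
predicted = [label_post(p) for p in posts]
```

A post can receive several labels at once, which is what separates this setting from one-label prediction: the third sample post matches both the R and the Java rules.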

Cite

Öztürk, M. M. (2023). Developing a hyperparameter optimization method for classification of code snippets and questions of stack overflow: HyperSCC. EAI Endorsed Transactions on Scalable Information Systems, 10(1). https://doi.org/10.4108/eai.27-5-2022.174084
