Taxonomies are becoming indispensable to a growing number of software engineering applications, such as software repository mining and defect prediction. However, the existing related taxonomies have all been constructed manually; they are small and their hierarchies are shallow. To show the full potential of taxonomies in software engineering applications, this paper presents the first large-scale software programming taxonomy, which is more comprehensive than any existing one. It contains 38,205 concepts and 68,098 subsumption relations. Instead of learning from an open domain, we focus on constructing the taxonomy from Stackoverflow, one of the largest Q&A websites about software programming. We propose a machine-learning-based method with novel features that builds a taxonomy capturing the hierarchical semantic structure of Stackoverflow tags. The method runs iteratively to discover as many relations as possible. Experimental results show that our approach achieves much higher accuracy than the baselines. We also compare it with the software-programming portions of general-purpose taxonomies such as WikiTaxonomy, Yago Taxonomy, and Schema.org. Against these, our taxonomy covers the most concepts, contains the most subsumption relations, and has the deepest semantic hierarchy.
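The comparison above rests on three statistics: the number of concepts, the number of subsumption relations, and the depth of the hierarchy. The following is a minimal sketch of how these could be computed. It assumes the taxonomy is given as a list of (child, parent) pairs forming an acyclic graph. The function name `taxonomy_stats` and the sample tags are hypothetical illustrations, not the paper's code or data.

```python
from collections import defaultdict
from functools import lru_cache


def taxonomy_stats(relations):
    """Return (num_concepts, num_relations, depth) for (child, parent) pairs.

    Hypothetical helper. Depth counts edges on the longest child-to-root
    chain and assumes the hierarchy contains no cycles.
    """
    edges = set(relations)  # drop duplicate subsumption pairs
    parents = defaultdict(set)
    concepts = set()
    for child, parent in edges:
        parents[child].add(parent)
        concepts.update((child, parent))

    @lru_cache(maxsize=None)
    def height(concept):
        # Longest upward path from this concept to a root (no parents).
        return max((1 + height(p) for p in parents[concept]), default=0)

    depth = max((height(c) for c in concepts), default=0)
    return len(concepts), len(edges), depth


if __name__ == "__main__":
    # Illustrative tag pairs only; not taken from the paper's taxonomy.
    sample = [
        ("python-3.x", "python"),
        ("python", "programming-languages"),
        ("django", "python"),
        ("django", "web-frameworks"),
    ]
    print(taxonomy_stats(sample))  # (5, 4, 2)
```

A concept may have several parents (here `django` sits under both `python` and `web-frameworks`). For that reason, the sketch measures depth as the longest path in a directed acyclic graph rather than as a tree height.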
Zhu, J., Shen, B., Cai, X., & Wang, H. (2015). Building a large-scale software programming taxonomy from stackoverflow. In Proceedings of the International Conference on Software Engineering and Knowledge Engineering, SEKE (Vol. 2015-January, pp. 391–396). Knowledge Systems Institute Graduate School. https://doi.org/10.18293/SEKE2015-135