To satisfy the demand for customized software solutions, companies commonly use so-called clone-and-own approaches, reusing functionality by copying existing realization artifacts and modifying them to create new product variants. Lacking clear documentation of the variability relations (i.e., the common and varying parts), the resulting variants have to be developed, maintained and evolved in isolation. In previous work, we introduced a semi-automatic mining algorithm that allows custom-tailored identification of distinct variability relations for block-based model variants (e.g., MATLAB/Simulink models or statecharts) using user-adjustable metrics. However, variants that are completely unrelated to the other variants (i.e., outliers) can reduce the usefulness of the generated variability relations for developers maintaining the variants (e.g., erroneous relations might be identified). In addition, splitting the compared models into smaller sets (i.e., clusters) can be useful for giving developers separate viewpoints on different variable system features. In other previous work, we proposed a statistical clustering approach capable of identifying such outliers and clusters. The contribution of this paper is twofold. First, we present guidelines and a generic implementation that both ease the adaptation of our variability mining algorithm to new languages. Second, we integrate our clustering approach as a preprocessing step for the mining. This allows users to remove outliers before executing variability mining on the suggested clusters. Using models from two industrial case studies, we show the feasibility of the approach and discuss how our clustering can help our variability mining identify sensible variability information.
Wille, D., Babur, Ö., Cleophas, L., Seidl, C., van den Brand, M., & Schaefer, I. (2018). Improving custom-tailored variability mining using outlier and cluster detection. Science of Computer Programming, 163, 62–84. https://doi.org/10.1016/j.scico.2018.04.002