In the field of NLP, word embeddings have recently attracted a lot of attention. A textual corpus is represented as a sparse word co-occurrence matrix. This matrix can then be factorized, for example using SVD, which yields a smaller matrix of dense, continuous vectors. To help SVD, the PMI measure is applied to the initial co-occurrence matrix, assigning a relevant weight to each co-occurrence by normalizing it with the frequencies of both words involved. In this paper, we follow this idea to study whether weighted networks can benefit from a preprocessing step that helps community detection. We first design a benchmark using LFR networks. Then, we consider PMI and another NLP-inspired measure as a preprocessing of the link weights, and show that PMI worsens the results while the other measure improves them. By separating links inside communities and links between communities into two classes, we show that this is due to the weight distributions of these links: links between communities have on average larger weights, leading to larger PMI values. From this analysis, we design another set of experiments showing that links can be classified efficiently into these two classes using a small set of features. Finally, we introduce the Supervised Label Propagation (SLP) algorithm, which takes the classification results into account during propagation. This algorithm clearly improves the results, which raises a major question: is community detection on weighted networks a fully unsupervised task? We conclude with our thoughts on this topic.
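As a rough illustration of the PMI weighting mentioned above, the sketch below applies the standard PMI definition, log(p(i,j) / (p(i) p(j))), to a weighted adjacency matrix. It is not taken from the paper: the function name `pmi_weights`, the toy matrix, and the choice to treat node strength (the sum of a node's link weights) as the analogue of word frequency are assumptions made for this example.

```python
import numpy as np


def pmi_weights(adj):
    """Reweight a symmetric, non-negative weighted adjacency matrix with PMI.

    Assumption for this sketch: node strength plays the role that word
    frequency plays in a word co-occurrence matrix.
    """
    adj = np.asarray(adj, dtype=float)
    total = adj.sum()                # total weight (each undirected link counted twice)
    strength = adj.sum(axis=1)       # weighted degree of each node
    # Weight expected between i and j if links were placed independently
    # of each other, given the node strengths.
    expected = np.outer(strength, strength) / total
    pmi = np.zeros_like(adj)
    mask = adj > 0                   # PMI is only defined for existing links
    pmi[mask] = np.log(adj[mask] / expected[mask])
    return pmi


# Hypothetical toy network with three nodes.
toy = np.array([
    [0.0, 2.0, 1.0],
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
])
pmi = pmi_weights(toy)
print(np.round(pmi, 3))
```

PMI is negative whenever a link is lighter than its strengths would suggest, so the reweighted values may need shifting or clipping before they are passed to a community detection algorithm that expects non-negative weights.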
CITATION STYLE
Connes, V., Dugué, N., & Guille, A. (2019). Is community detection fully unsupervised? The case of weighted graphs. In Studies in Computational Intelligence (Vol. 812, pp. 256–266). Springer Verlag. https://doi.org/10.1007/978-3-030-05411-3_21