A multiple kernel density clustering algorithm for incomplete datasets in bioinformatics


Abstract

Background: Although many bioinformatics datasets are available for clustering, a large share of them are incomplete, i.e., some data samples lack attribute values that clustering algorithms need. A variety of clustering algorithms have been proposed in recent years, but they are usually limited to clustering complete datasets. In addition, conventional clustering algorithms cannot balance the accuracy and efficiency of the clustering process, because many essential parameters are set according to the user's experience.

Results: The paper proposes a Multiple Kernel Density Clustering algorithm for Incomplete datasets, called MKDCI. The MKDCI algorithm consists of recovering the missing attribute values of the input data samples, learning an optimally combined kernel for clustering the input dataset, reducing dimensionality with the optimal kernel based on multiple basis kernels, detecting cluster centroids with the Isolation Forests method, assigning clusters of arbitrary shape, and visualizing the results.

Conclusions: Extensive experiments on several well-known clustering datasets in the bioinformatics field demonstrate the effectiveness of the proposed MKDCI algorithm. Compared with existing density clustering algorithms and parameter-free clustering algorithms, MKDCI tends to automatically produce clusters of better quality on incomplete bioinformatics datasets.
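To make the first stages of the pipeline concrete, the following is a minimal sketch of recovering missing values, combining several basis kernels, and estimating a kernel density for each sample. It is not the authors' implementation: the abstract does not specify how MKDCI recovers missing values or learns the kernel weights, so column-mean imputation, Gaussian (RBF) basis kernels, and fixed uniform weights are assumptions made here for illustration only. All function names are hypothetical.

```python
import math


def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column.

    Assumption: simple mean imputation stands in for the paper's recovery step.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def rbf_kernel(x, y, gamma):
    """Gaussian basis kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def combined_kernel_matrix(rows, gammas, weights):
    """Convex combination of RBF basis kernels: K = sum_m w_m * K_m.

    Assumption: weights are given, not learned as in MKDCI.
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(rows)
    return [[sum(w * rbf_kernel(rows[i], rows[j], g)
                 for g, w in zip(gammas, weights))
             for j in range(n)]
            for i in range(n)]


def kernel_density(kernel_matrix):
    """Density proxy: mean combined-kernel similarity of each sample to all samples."""
    n = len(kernel_matrix)
    return [sum(row) / n for row in kernel_matrix]


# Toy incomplete dataset: None marks a missing attribute value.
data = [[1.0, 2.0], [1.1, None], [0.9, 2.1], [8.0, 9.0], [None, 9.2]]
complete = impute_column_means(data)
K = combined_kernel_matrix(complete, gammas=[0.1, 1.0, 10.0],
                           weights=[1 / 3, 1 / 3, 1 / 3])
density = kernel_density(K)
```

In this sketch, samples with high density are the natural candidates for cluster centroids. The later MKDCI stages (kernel-based dimensionality reduction, centroid detection with Isolation Forests, and cluster assignment) are omitted because the abstract gives no further detail about them.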


Cite

Liao, L., Li, K., Li, K., Yang, C., & Tian, Q. (2018). A multiple kernel density clustering algorithm for incomplete datasets in bioinformatics. BMC Systems Biology, 12. https://doi.org/10.1186/s12918-018-0630-6
