Ensembles of randomized trees using diverse distributed representations of clinical events

Abstract

Background: Learning deep representations of clinical events based on their distributions in electronic health records has been shown to allow for subsequent training of higher-performing predictive models compared to the use of shallow, count-based representations. The predictive performance may be further improved by utilizing multiple representations of the same events, which can be obtained by, for instance, manipulating the representation learning procedure. The question, however, remains how to make best use of a set of diverse representations of clinical events - modeled in an ensemble of semantic spaces - for the purpose of predictive modeling.

Methods: Three different ways of exploiting a set of (ten) distributed representations of four types of clinical events - diagnosis codes, drug codes, measurements, and words in clinical notes - are investigated in a series of experiments using ensembles of randomized trees. Here, the semantic space ensembles are obtained by varying the context window size in the representation learning procedure. The proposed method trains a forest wherein each tree is built from a bootstrap replicate of the training set whose entire original feature set is represented in a randomly selected set of semantic spaces - corresponding to the considered data types - of a given context window size.

Results: The proposed method significantly outperforms concatenating the multiple representations of the bagged dataset; it also significantly outperforms representing, for each decision tree, only a subset of the features in a randomly selected set of semantic spaces. A follow-up analysis indicates that the proposed method exhibits less diversity while significantly improving average tree performance. It is also shown that the size of the semantic space ensemble has a significant impact on predictive performance and that performance tends to improve as the size increases.

Conclusions: The strategy for utilizing a set of diverse distributed representations of clinical events when constructing ensembles of randomized trees has a significant impact on predictive performance. The most successful strategy - significantly outperforming the considered alternatives - involves randomly sampling distributed representations of the clinical events when building each decision tree in the forest.
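
The per-tree sampling step described under Methods can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes one reading of the abstract, namely that each tree draws a single context window size and that all four data types are then represented in the semantic spaces of that size. The window sizes 1 to 10 and the names `DATA_TYPES`, `WINDOW_SIZES`, `sample_tree_inputs`, and `plan_forest` are hypothetical. Tree induction itself is omitted.

```python
import random

# Hypothetical setup: four event types and ten semantic spaces per type,
# one per context window size. The sizes 1..10 are an assumption; the
# abstract only states that ten representations were used.
DATA_TYPES = ["diagnoses", "drugs", "measurements", "words"]
WINDOW_SIZES = list(range(1, 11))


def sample_tree_inputs(n_examples, rng):
    """Draw the inputs for one tree: a bootstrap replicate of the
    training set and the semantic spaces used to represent it."""
    # Bootstrap replicate: sample example indices with replacement.
    bootstrap = [rng.randrange(n_examples) for _ in range(n_examples)]
    # Assumed reading: one window size per tree, shared by all data
    # types, so the entire feature set is represented (no subsetting).
    window = rng.choice(WINDOW_SIZES)
    spaces = {dtype: window for dtype in DATA_TYPES}
    return bootstrap, spaces


def plan_forest(n_trees, n_examples, seed=0):
    """Return the (bootstrap, spaces) pair for every tree in the forest."""
    rng = random.Random(seed)
    return [sample_tree_inputs(n_examples, rng) for _ in range(n_trees)]


plan = plan_forest(n_trees=5, n_examples=8)
for bootstrap, spaces in plan:
    print(sorted(bootstrap), spaces)
```

Each pair in `plan` would then be handed to a randomized tree learner: the bootstrap indices select the training examples, and `spaces` selects which representation of each data type the tree sees. Under this sketch, the concatenation baseline would correspond to giving every tree all ten spaces per data type at once.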

Cite

Henriksson, A., Zhao, J., Dalianis, H., & Boström, H. (2016). Ensembles of randomized trees using diverse distributed representations of clinical events. BMC Medical Informatics and Decision Making, 16. https://doi.org/10.1186/s12911-016-0309-0
