A grid infrastructure for text mining of full text articles and creation of a knowledge base of gene relations

Abstract

We demonstrate the application of a grid infrastructure for conducting text mining over distributed data and computational resources. The approach is based on using LexiQuest Mine, a text mining workbench, in a grid computing environment. We describe our architecture and approach and provide an illustrative example of mining full-text journal articles to create a knowledge base of gene relations. The number of patterns found increased from 0.74 per full-text article in a corpus of 1000 articles to 0.83 per article when the corpus contained 5000 articles. However, mining a corpus of 5000 full-text articles took 26 hours on a single computer, whereas the process completed in less than 2.5 hours on a grid comprising 20 computers. Thus, whilst increasing the size of the corpus improved the efficiency of the text-mining process, a grid infrastructure was required to complete the task in a timely manner. © Springer-Verlag Berlin Heidelberg 2005.
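
The figures quoted in the abstract can be put in perspective with a little arithmetic. The sketch below is not part of the paper; the function names are hypothetical, and it assumes the conventional definitions of speedup (single-machine time divided by grid time) and parallel efficiency (speedup divided by the number of machines). Because the grid run is reported only as "less than 2.5 hours", the computed speedup and efficiency are lower bounds, and the pattern totals are approximations derived from the rounded per-article rates.

```python
# Illustrative arithmetic on the figures reported in the abstract.
# Function names are hypothetical; definitions are the conventional ones.

def speedup(serial_hours: float, parallel_hours: float) -> float:
    """Ratio of single-machine run time to grid run time."""
    return serial_hours / parallel_hours


def parallel_efficiency(serial_hours: float, parallel_hours: float, workers: int) -> float:
    """Speedup divided by the number of machines in the grid."""
    return speedup(serial_hours, parallel_hours) / workers


def total_patterns(patterns_per_article: float, articles: int) -> float:
    """Approximate pattern count implied by a per-article rate."""
    return patterns_per_article * articles


if __name__ == "__main__":
    # 26 h on one computer vs. an upper bound of 2.5 h on 20 computers.
    print("speedup (lower bound):", speedup(26, 2.5))
    print("efficiency (lower bound):", parallel_efficiency(26, 2.5, 20))
    # Per-article rates of 0.74 (1000 articles) and 0.83 (5000 articles).
    print("patterns, 1000 articles (approx.):", round(total_patterns(0.74, 1000)))
    print("patterns, 5000 articles (approx.):", round(total_patterns(0.83, 5000)))
```

Under these assumptions the grid delivers a speedup of at least roughly 10.4 on 20 machines, which corresponds to a parallel efficiency of at least about 0.52.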

Natarajan, J., Mulay, N., DeSesa, C., Hack, C. J., Dubitzky, W., & Bremer, E. G. (2005). A grid infrastructure for text mining of full text articles and creation of a knowledge base of gene relations. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3745 LNBI, pp. 101–108). https://doi.org/10.1007/11573067_11
