A grid infrastructure for text mining of full text articles and creation of a knowledge base of gene relations

Jeyakumar Natarajan; Niranjan Mulay; Catherine DeSesa; Catherine J. Hack; Werner Dubitzky; Eric G. Bremer

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2005) 3745 LNBI 101-108

DOI: 10.1007/11573067_11


Abstract

We demonstrate the application of a grid infrastructure for conducting text mining over distributed data and computational resources. The approach is based on using LexiQuest Mine, a text mining workbench, in a grid computing environment. We describe our architecture and approach and provide an illustrative example of mining full-text journal articles to create a knowledge base of gene relations. The number of patterns found per full-text article increased from 0.74 for a corpus of 1000 articles to 0.83 for a corpus of 5000 articles. However, mining the 5000-article corpus took 26 hours on a single computer, whereas the same task was completed in less than 2.5 hours on a grid of 20 computers. Thus, although increasing the size of the corpus improved the efficiency of the text-mining process (more patterns per article), a grid infrastructure was required to complete the task in a timely manner. © Springer-Verlag Berlin Heidelberg 2005.
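The figures in the abstract can be turned into a few derived quantities. The sketch below computes the speedup and parallel efficiency implied by the reported runtimes (parallel efficiency here means speedup divided by node count, which is a different sense of "efficiency" from the abstract's patterns-per-article yield), and the approximate total pattern counts implied by the per-article rates. It also shows a simple round-robin split of a corpus across grid nodes. The split strategy and all function names (`speedup`, `parallel_efficiency`, `partition`) are hypothetical: the abstract does not say how the corpus was divided, and nothing here uses the LexiQuest Mine API.

```python
"""Derived numbers from the abstract, plus a hypothetical corpus split."""

from typing import List, Sequence


def speedup(serial_hours: float, parallel_hours: float) -> float:
    """Ratio of single-machine runtime to grid runtime."""
    return serial_hours / parallel_hours


def parallel_efficiency(serial_hours: float, parallel_hours: float, nodes: int) -> float:
    """Speedup divided by node count (1.0 would be perfect linear scaling)."""
    return speedup(serial_hours, parallel_hours) / nodes


def partition(articles: Sequence[str], nodes: int) -> List[List[str]]:
    """Assign articles to nodes round-robin so chunk sizes differ by at most one.

    Hypothetical: the paper does not state how the corpus was split.
    """
    chunks: List[List[str]] = [[] for _ in range(nodes)]
    for i, article in enumerate(articles):
        chunks[i % nodes].append(article)
    return chunks


# Reported runtimes: 26 h on one computer vs. under 2.5 h on 20 computers.
# Since 2.5 h is an upper bound, both values below are lower bounds.
s = speedup(26.0, 2.5)                   # about 10.4
e = parallel_efficiency(26.0, 2.5, 20)   # about 0.52

# Reported yields: 0.74 patterns/article (1000 articles), 0.83 (5000 articles).
patterns_small = round(0.74 * 1000)
patterns_large = round(0.83 * 5000)

# Placeholder article identifiers, split across 20 hypothetical grid nodes.
chunks = partition([f"article_{i}" for i in range(5000)], 20)

print(f"speedup >= {s:.1f}, parallel efficiency >= {e:.2f}")
print(patterns_small, patterns_large, len(chunks[0]))
```

Under these assumptions the grid run is at least about 10.4 times faster than the single machine, i.e. roughly half of ideal linear scaling across 20 nodes, and the larger corpus yields about 4150 patterns versus about 740 for the smaller one. Round-robin assignment is chosen only because it balances article counts; real articles vary in length, so a production scheduler would more likely balance by size or dispatch work dynamically.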

Citation (APA)

Natarajan, J., Mulay, N., DeSesa, C., Hack, C. J., Dubitzky, W., & Bremer, E. G. (2005). A grid infrastructure for text mining of full text articles and creation of a knowledge base of gene relations. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3745 LNBI, pp. 101–108). https://doi.org/10.1007/11573067_11
