We demonstrate the application of a grid infrastructure for conducting text mining over distributed data and computational resources. The approach is based on using LexiQuest Mine, a text mining workbench, in a grid computing environment. We describe our architecture and approach and provide an illustrative example of mining full-text journal articles to create a knowledge base of gene relations. The number of patterns found rose from 0.74 per full-text article for a corpus of 1000 articles to 0.83 per article for a corpus of 5000 articles. However, mining a corpus of 5000 full-text articles took 26 hours on a single computer, whilst the same process was completed in less than 2.5 hours on a grid comprising 20 computers. Thus, whilst increasing the size of the corpus improved the efficiency of the text-mining process, a grid infrastructure was required to complete the task in a timely manner. © Springer-Verlag Berlin Heidelberg 2005.
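The timing figures above imply a speedup and a per-node efficiency that the abstract does not state explicitly. The short sketch below derives them from the reported numbers only; the function and variable names are illustrative, and because the grid run is reported as "less than 2.5 hours", both results are lower bounds rather than measured values.

```python
def speedup(serial_hours: float, parallel_hours: float) -> float:
    """Ratio of single-machine run time to grid run time."""
    return serial_hours / parallel_hours


def parallel_efficiency(serial_hours: float, parallel_hours: float, nodes: int) -> float:
    """Speedup divided by the number of nodes (1.0 would be ideal linear scaling)."""
    return speedup(serial_hours, parallel_hours) / nodes


# Figures reported in the abstract for the 5000-article corpus.
SERIAL_HOURS = 26.0            # single computer
GRID_HOURS_UPPER_BOUND = 2.5   # grid run took "less than" this, so it is an upper bound
GRID_NODES = 20                # computers in the grid

min_speedup = speedup(SERIAL_HOURS, GRID_HOURS_UPPER_BOUND)
min_efficiency = parallel_efficiency(SERIAL_HOURS, GRID_HOURS_UPPER_BOUND, GRID_NODES)

print(f"speedup of at least {min_speedup:.1f}x")
print(f"parallel efficiency of at least {min_efficiency:.0%}")
```

On these figures the grid delivers at least a 10.4-fold speedup, which corresponds to at least 52% of ideal linear scaling across 20 nodes.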
CITATION STYLE
Natarajan, J., Mulay, N., DeSesa, C., Hack, C. J., Dubitzky, W., & Bremer, E. G. (2005). A grid infrastructure for text mining of full text articles and creation of a knowledge base of gene relations. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3745 LNBI, pp. 101–108). https://doi.org/10.1007/11573067_11