Searching through scientific PDF files supported by bi-clustering of key terms matrices

Abstract

We describe an original approach to exploring corpora of scientific texts in PDF format in the area of biomedical research, assembled around a broad topic of interest, e.g., cancer, thyroid cancer, or a biological process. Our methodology is based on indexing large lists of appropriate key terms and then bi-clustering the resulting term occurrence matrices. In our approach, the position of a phrase inside the document (abstract or main text) is not considered, but we include statistics based on occurrence frequency. We treat documents as bags of words, and the results are processed into a list of unique values. Bi-clustering is used to separate the lists of key terms so that they characterize sub-types of the studied category, e.g., different cancers or different sub-classes of a given cancer. We demonstrate the usefulness of the algorithm by searching for lists of genes characteristic of cancer types.
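
The pipeline outlined in the abstract can be illustrated with a minimal sketch. The abstract does not specify the bi-clustering algorithm, so the code below swaps in a deliberately simple stand-in: it counts key-term occurrences per document (bag of words, position ignored) and then groups documents and terms by connected components of the non-zero entries. This recovers biclusters only when the groups share no key terms. The function names `occurrence_matrix` and `block_biclusters`, the toy documents, and the key-term list are all hypothetical and are not taken from the paper.

```python
import re
from collections import Counter


def occurrence_matrix(documents, key_terms):
    """Count each key term in each document, ignoring where it appears."""
    rows = []
    for text in documents:
        # Bag of words: lowercase alphanumeric tokens with their frequencies.
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        rows.append([counts[term.lower()] for term in key_terms])
    return rows


def block_biclusters(matrix):
    """Simplified stand-in for bi-clustering (an assumption, not the paper's
    method): connected components of the document-term graph, where a
    document and a term are linked if the term occurs in the document."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    seen_rows, seen_cols = set(), set()
    biclusters = []
    for start in range(n_rows):
        if start in seen_rows or not any(matrix[start]):
            continue  # already assigned, or document has no key terms
        rows, cols = set(), set()
        seen_rows.add(start)
        stack = [("row", start)]
        while stack:
            kind, idx = stack.pop()
            if kind == "row":
                rows.add(idx)
                for c in range(n_cols):
                    if matrix[idx][c] and c not in seen_cols:
                        seen_cols.add(c)
                        stack.append(("col", c))
            else:
                cols.add(idx)
                for r in range(n_rows):
                    if matrix[r][idx] and r not in seen_rows:
                        seen_rows.add(r)
                        stack.append(("row", r))
        biclusters.append((sorted(rows), sorted(cols)))
    return biclusters


# Made-up toy corpus standing in for text extracted from PDF files.
documents = [
    "BRAF mutation in thyroid cancer; BRAF and RET",
    "RET rearrangement in thyroid carcinoma",
    "BRCA1 and BRCA2 in breast cancer",
    "BRCA1 carriers",
]
key_terms = ["BRAF", "RET", "BRCA1", "BRCA2"]

matrix = occurrence_matrix(documents, key_terms)
biclusters = block_biclusters(matrix)

for doc_ids, term_ids in biclusters:
    print("documents", doc_ids, "terms", [key_terms[i] for i in term_ids])
```

On real corpora the groups usually overlap, so a dedicated bi-clustering method would be needed in place of `block_biclusters`; the occurrence matrix it consumes would have the same shape.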

Cite

Łancucki, R., Foszner, P., & Polanski, A. (2018). Searching through scientific PDF files supported by bi-clustering of key terms matrices. In Advances in Intelligent Systems and Computing (Vol. 659, pp. 144–153). Springer Verlag. https://doi.org/10.1007/978-3-319-67792-7_15
