Searching through scientific PDF files supported by bi-clustering of key terms matrices

Rafal Łancucki; Pawel Foszner; Andrzej Polanski

Conference Proceedings

Advances in Intelligent Systems and Computing (2018) 659 144-153

DOI: 10.1007/978-3-319-67792-7_15

Abstract

We describe an original approach for exploring corpora of PDF-format scientific texts in the area of bio-medical research, assembled around a broad topic of interest, e.g., cancer, thyroid cancer, or a biological process. Our methodology is based on indexing large lists of appropriate key terms and then bi-clustering the resulting term occurrence matrices. In our approach the position of a phrase within the document (abstract or body text) is not considered, but we include statistics based on occurrence frequency. We treat documents as bags of words, and the results are reduced to lists of unique values. Bi-clustering is used to separate the lists of key terms so that each list characterizes a sub-type of the studied category, e.g., different cancers or different sub-classes of a given cancer. We demonstrate the usefulness of the algorithm by searching for lists of genes characteristic of cancer types.
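
The abstract does not spell out the procedure, so the following is only a minimal sketch of the two steps it names: building a term occurrence matrix from bag-of-words documents, and splitting that matrix into biclusters. Everything here is an assumption for illustration: the function names `occurrence_matrix` and `biclusters`, the toy documents, the restriction to single-token key terms, and above all the bi-clustering rule itself, which is a simple stand-in (connected components of the document-term graph) and not the algorithm used in the paper. Text is assumed to have been extracted from the PDF files already.

```python
import re
from collections import Counter

def occurrence_matrix(docs, key_terms):
    """Rows are documents, columns are key terms, cells are occurrence counts.
    Each document is treated as a bag of words; word position is ignored."""
    matrix = []
    for text in docs:
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        matrix.append([counts[term.lower()] for term in key_terms])
    return matrix

def biclusters(matrix, min_count=1):
    """Stand-in bi-clustering (an assumption, not the paper's method):
    link a document to a term when its count >= min_count, then return each
    connected component of that bipartite graph as (row indices, column indices)."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    seen_rows, seen_cols = set(), set()
    result = []
    for start in range(n_rows):
        if start in seen_rows:
            continue
        seen_rows.add(start)
        rows, cols = [start], []
        stack = [("row", start)]
        while stack:
            kind, idx = stack.pop()
            if kind == "row":
                for j in range(n_cols):
                    if matrix[idx][j] >= min_count and j not in seen_cols:
                        seen_cols.add(j)
                        cols.append(j)
                        stack.append(("col", j))
            else:
                for i in range(n_rows):
                    if matrix[i][idx] >= min_count and i not in seen_rows:
                        seen_rows.add(i)
                        rows.append(i)
                        stack.append(("row", i))
        if cols:  # documents containing no key term form no bicluster
            result.append((sorted(rows), sorted(cols)))
    return result

# Toy corpus (hypothetical text, not taken from the paper's data).
docs = [
    "BRAF mutation in thyroid cancer; BRAF and RET",
    "RET rearrangement and BRAF",
    "BRCA1 and BRCA2 in breast cancer",
    "BRCA1 carriers",
]
key_terms = ["BRAF", "RET", "BRCA1", "BRCA2"]

matrix = occurrence_matrix(docs, key_terms)
for rows, cols in biclusters(matrix):
    print(rows, [key_terms[j] for j in cols])
```

On this toy input the first two documents group with BRAF and RET, and the last two with BRCA1 and BRCA2. Real corpora rarely yield such clean blocks, which is why a dedicated bi-clustering algorithm is needed in practice; raising `min_count` is only a crude way to ignore incidental mentions.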

Cite

Łancucki, R., Foszner, P., & Polanski, A. (2018). Searching through scientific PDF files supported by bi-clustering of key terms matrices. In Advances in Intelligent Systems and Computing (Vol. 659, pp. 144–153). Springer Verlag. https://doi.org/10.1007/978-3-319-67792-7_15
