Protein Word Detection using Text Segmentation Techniques

G. Devi; Ashish V. Tendulkar; Sutanu Chakraborti

Conference Proceedings, Open Access

BioNLP 2017 - SIGBioMed Workshop on Biomedical Natural Language Processing, Proceedings of the 16th BioNLP Workshop (2017) 238-246

DOI: 10.24073/jga/6/01/06

3 Citations

71 Readers

Abstract

The molecular biology literature abounds with linguistic metaphors. Earlier work has attempted to draw parallels between linguistics and biology, driven by the fundamental premise that proteins have a language of their own. Since word detection is crucial to deciphering any unknown language, we attempt to establish a problem mapping from natural language text to protein sequences at the level of words. To this end, we explore the use of an unsupervised text segmentation algorithm for the task of extracting "biological words" from protein sequences. In particular, we demonstrate the effectiveness of using domain knowledge to complement data-driven approaches in the text segmentation task, as well as in its biological counterpart. We also propose a novel extrinsic evaluation measure for protein words based on protein family classification.

Citation (APA)

Devi, G., Tendulkar, A. V., & Chakraborti, S. (2017). Protein Word Detection using Text Segmentation Techniques. In BioNLP 2017 - SIGBioMed Workshop on Biomedical Natural Language Processing, Proceedings of the 16th BioNLP Workshop (pp. 238–246). Association for Computational Linguistics (ACL). https://doi.org/10.24073/jga/6/01/06
