Text document search typically retrieves documents by exact keyword matching. Exact matching does not perform well in every domain, because it ignores the morphemes and internal structure of words. The problem is especially significant in chemistry, where a user may search with a keyword that appears in a document only as part of a chemical name. For example, the chemical name pentanone indicates a ketone functional group, which can be identified through morphemic analysis guided by chemical nomenclature. A chemical name encodes substantial information about the compound it names. Hence, the chemical names in a document need to be tagged with all of their meaningful morphemes for search to perform well. A multi-perspective, domain-specific tagging system was designed based on the available chemical nomenclature, considering the type of bond, the number of carbon atoms, and the functional group of the chemical entity. The tagging system begins by extracting the chemical names in the document using morphological and domain-specific features. Based on these features and contextual knowledge, models were built with a linear-chain conditional random field of order two, and they serve as a baseline for the chemical entity extraction process. A morphemic, or structural, analysis of each extracted named entity was then performed to produce the multi-perspective tags.
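The morphemic analysis described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lookup tables, the function name `tag_chemical_name`, and the output keys are all hypothetical, and the sketch covers only simple straight-chain names of the form prefix + bond infix + suffix. It assumes the chemical name has already been extracted from the document, so the conditional random field step is not shown.

```python
import re

# Hypothetical lookup tables: a small subset of systematic nomenclature,
# chosen for illustration only.
CARBON_PREFIXES = {
    "meth": 1, "eth": 2, "prop": 3, "but": 4, "pent": 5,
    "hex": 6, "hept": 7, "oct": 8, "non": 9, "dec": 10,
}
# Infix describing the carbon-carbon bonds in the parent chain.
BOND_INFIXES = {"an": "single", "en": "double", "yn": "triple"}
# Suffix naming the principal functional group.
GROUP_SUFFIXES = {
    "e": "hydrocarbon", "ol": "alcohol", "al": "aldehyde",
    "one": "ketone", "oic acid": "carboxylic acid",
}


def _alternation(keys):
    """Build a regex alternation, longest key first."""
    return "|".join(re.escape(k) for k in sorted(keys, key=len, reverse=True))


NAME_PATTERN = re.compile(
    "({})({})({})".format(
        _alternation(CARBON_PREFIXES),
        _alternation(BOND_INFIXES),
        _alternation(GROUP_SUFFIXES),
    )
)


def tag_chemical_name(name):
    """Return multi-perspective tags for a simple name, or None if unsupported."""
    # Drop locants and hyphens, so "pentan-2-one" becomes "pentanone".
    stem = re.sub(r"[\d,\-]", "", name.lower().strip())
    match = NAME_PATTERN.fullmatch(stem)
    if match is None:
        return None
    prefix, infix, suffix = match.groups()
    return {
        "carbon_atoms": CARBON_PREFIXES[prefix],
        "bond_type": BOND_INFIXES[infix],
        "functional_group": GROUP_SUFFIXES[suffix],
        "morphemes": [prefix, infix, suffix],
    }


if __name__ == "__main__":
    print(tag_chemical_name("pentan-2-one"))
```

With tags of this kind attached to a document, a query for "ketone" could match a document that mentions only pentanone, which exact keyword matching would miss. Names outside the tables, such as branched or aromatic compounds, return `None` in this sketch.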
Deepika, S. S., Geetha, T. V., & Sridhar, R. (2018). Multi-perspective and Domain Specific Tagging of Chemical Documents. In Communications in Computer and Information Science (Vol. 804, pp. 72–85). Springer Verlag. https://doi.org/10.1007/978-981-10-8603-8_7