Review of techniques and models used in optical chemical structure recognition in images and scanned documents

7Citations
Citations of this article
15Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Extraction of chemical formulas from images was not in the top priority of Computer Vision tasks for a while. The complexity both on the input and prediction sides has made this task challenging for the conventional Artificial Intelligence and Machine Learning problems. A binary input image which might seem trivial for convolutional analysis was not easy to classify, since the provided sample was not representative of the given molecule: to describe the same formula, a variety of graphical representations which do not resemble each other can be used. Considering the variety of molecules, the problem shifted from classification to that of formula generation, which makes Natural Language Processing (NLP) a good candidate for an effective solution. This paper describes the evolution of approaches from rule-based structure analyses to complex statistical models, and compares the efficiency of models and methodologies used in the recent years. Although the latest achievements deliver ideal results on particular datasets, the authors mention possible problems for various scenarios and provide suggestions for further development.

Cite

CITATION STYLE

APA

Musazade, F., Jamalova, N., & Hasanov, J. (2022, December 1). Review of techniques and models used in optical chemical structure recognition in images and scanned documents. Journal of Cheminformatics. BioMed Central Ltd. https://doi.org/10.1186/s13321-022-00642-3

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free