Review of techniques and models used in optical chemical structure recognition in images and scanned documents

Fidan Musazade; Narmin Jamalova; Jamaladdin Hasanov

Article (Open Access)

Journal of Cheminformatics

DOI: 10.1186/s13321-022-00642-3

Abstract

For a long time, extracting chemical formulas from images was not a top priority among Computer Vision tasks. Complexity on both the input and the prediction side made the task challenging for conventional Artificial Intelligence and Machine Learning approaches. A binary input image, which might seem trivial for convolutional analysis, was not easy to classify because a single sample is not representative of the molecule it depicts: the same formula can be drawn in a variety of graphical representations that do not resemble one another. Given the variety of molecules, the problem shifted from classification to formula generation, which makes Natural Language Processing (NLP) a good candidate for an effective solution. This paper describes the evolution of approaches from rule-based structure analysis to complex statistical models and compares the efficiency of the models and methodologies used in recent years. Although the latest methods deliver ideal results on particular datasets, the authors point out possible problems in various scenarios and offer suggestions for further development.

Citation (APA)

Musazade, F., Jamalova, N., & Hasanov, J. (2022, December 1). Review of techniques and models used in optical chemical structure recognition in images and scanned documents. Journal of Cheminformatics. BioMed Central Ltd. https://doi.org/10.1186/s13321-022-00642-3
