Automated structure elucidation - the benefits of a symbiotic relationship between the spectroscopist and the expert system

by Mikhail E Elyashberg, Kirill A Blinov, Eduard R Martirosian, Sergey G Molodtsov, Antony J Williams, Gary E Martin
Journal of Heterocyclic Chemistry (2003)

Abstract

Characteristic features of a new expert system StrucEluc are described. The system is intended for the structure elucidation of complex organic molecules using a variety of spectroscopic data including 2D NMR. We review here the results of challenging this system with over 100 structure elucidation problems where the 2D NMR peak tables presented in original journal publications provided the input data. This contribution is focused on methods to overcome difficult situations that can arise when contradictions are present in the input data and/or when the structure is underdetermined as a result of insufficient 2D NMR correlations. Methods by which to address these situations are examined. It has been shown that synergy between the spectroscopist and the expert system allows the solution of problems that seemed to be hopeless at the outset of the structure elucidation process.

Automated Structure Elucidation – the Benefits of a Symbiotic
Relationship between the Spectroscopist and the Expert System
Mikhail E. Elyashberg, Kirill A. Blinov, Eduard R. Martirosian
Advanced Chemistry Development
6 Akademik Bakulev St, Moscow 117513, Russian Federation
Sergey G. Molodtsov
Novosibirsk Institute of Organic Chemistry
Siberian Branch of Russian Academy of Science
Lavrentiev Avenue 9, Novosibirsk 630090, Russian Federation
Antony J. Williams
Advanced Chemistry Development Inc.,
90 Adelaide Street West, Suite 600, Toronto, ON, M5H 3V9 Canada
Gary E. Martin
Michigan Pharmaceutical Sciences Structure Elucidation Group
Pfizer Global Research & Development
Pfizer Corp., Kalamazoo, Michigan 49001-0199, USA
Received May 16, 2003
J. Heterocyclic Chem., 40, 1017 (2003).
Introduction.
Advances in both hardware for data acquisition and software for data analysis have enabled structure elucidation. Nevertheless, the extraction of a chemical structure from a collection of analytical data remains a challenge for analytical laboratories. Laboratories in the chemical and pharmaceutical industries commonly isolate a large number of compounds in any given year, and many of these can be regarded as complex. To both simplify and speed up the analysis process, and hence the determination of the structure of interest, expert systems have been created that mainly use NMR spectral data as their foundation. A series of reports has been published in which expert systems developed to aid the elucidation process are described (for instance, [1-10]). In these systems 2D NMR data, presented in the form of connectivities between skeletal atoms of the molecule, serve as restrictions for the structure generation process, which proceeds from a given molecular formula. A typical input data set generally comprises both homonuclear ¹H-¹H COSY and heteronuclear direct (HMQC or HSQC) and long-range (HMBC or any of a number of more recently developed experiments [11,12]) connectivities. Recently, ¹⁵N-¹H HMBC correlations (see reviews [13,14]) and a series of new 2D NMR techniques have become more widely used. After the structure generation process produces a series of hypothetical structures consistent with the atom-to-atom connectivity (homo- and heteronuclear; direct, long-range, and through-space) information fed to the program, the most likely structures can be identified on the basis of a comparison of the predicted ¹³C chemical shifts of the candidate structures vs. the observed ¹³C chemical shifts for the molecule.
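
The final ranking step described above can be illustrated with a minimal sketch. It is not the StrucEluc implementation: the scoring metric (mean absolute deviation), the pairing of shifts by sorted order, the function names, and the shift values are all assumptions chosen for illustration.

```python
from statistics import mean


def mean_abs_deviation(predicted, observed):
    """Mean absolute deviation (ppm) between two 13C shift lists.

    Assumption: no atom-by-atom assignment is available, so the two
    lists are paired after sorting.
    """
    if len(predicted) != len(observed):
        raise ValueError("shift lists must have the same length")
    return mean(abs(p - o) for p, o in zip(sorted(predicted), sorted(observed)))


def rank_candidates(candidates, observed):
    """Return (name, deviation) pairs, smallest deviation first."""
    scored = [
        (name, mean_abs_deviation(shifts, observed))
        for name, shifts in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical observed spectrum and predicted shifts for two candidates.
observed = [170.0, 130.0, 60.0, 20.0]
candidates = {
    "candidate_A": [171.0, 129.0, 61.0, 21.0],
    "candidate_B": [180.0, 120.0, 70.0, 30.0],
}
ranking = rank_candidates(candidates, observed)
```

Here `ranking` lists `candidate_A` first because its predicted shifts deviate least from the observed ones.
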
Our work on the development of expert systems has shown that the best result is achieved when software allows synergistic interaction between the skills and insights of a spectroscopist and the unbiased nature of a computer program. This means that qualified spectroscopists should be given the freedom to apply their experience and knowledge regarding the elucidation of a compound under study in order to introduce additional constraints for the structures generated by the expert system. The implementation of such possibilities allows a symbiotic relationship between the scientist and a computer, a synergistic effect that is highly beneficial.
According to previous publications, expert systems [1-10] have only allowed restricted application of a priori information. In particular this can prevent the introduction of key fragments that can directly contribute to the structural hypothesis. Similarly, little attention has been paid to the detection of contradictions in 2D NMR data and methods by which these can be resolved. It is known [15,17] that one source of contradictions is the presence of cross peaks in COSY or HMBC spectra that correspond to couplings over four or more bonds. Software applications commonly default to correlations over fewer bonds.
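
To make the notion of a contradiction concrete, the sketch below checks a candidate skeleton against a set of long-range correlations. It is an illustration under stated assumptions, not the method of any cited system: the skeleton is a plain list of bonds between heavy atoms, each correlation is given as (carbon bearing the proton, correlated carbon), the coupling path is counted as the skeletal distance plus one bond for the C-H bond, and the default limit of three bonds is an assumed setting.

```python
from collections import deque


def bond_distance(bonds, start, goal):
    """Shortest number of bonds between two atoms (BFS); None if unconnected."""
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        atom, dist = queue.popleft()
        if atom == goal:
            return dist
        for nxt in neighbours.get(atom, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def violated_correlations(bonds, correlations, max_coupling_bonds=3):
    """List correlations whose coupling path exceeds the allowed length.

    The coupling length is the skeletal distance plus one (the C-H bond).
    Unconnected atom pairs are reported with a length of None.
    """
    violations = []
    for proton_carbon, carbon in correlations:
        dist = bond_distance(bonds, proton_carbon, carbon)
        length = None if dist is None else dist + 1
        if length is None or length > max_coupling_bonds:
            violations.append((proton_carbon, carbon, length))
    return violations


# Hypothetical five-carbon chain and two HMBC-type correlations.
bonds = [("C1", "C2"), ("C2", "C3"), ("C3", "C4"), ("C4", "C5")]
correlations = [("C1", "C3"), ("C1", "C4")]
violations = violated_correlations(bonds, correlations)
```

For this chain the H(C1)-to-C4 correlation spans four bonds and is reported, whereas the H(C1)-to-C3 correlation spans three bonds and is accepted.
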
A number of different ways of eliminating the contradictions have been proposed. In particular, searching for contradictions by repeating the process of structure generation and adding at each iterative step one correlation related to a weak 2D NMR peak has been utilized during this work. The authors of ref. [10] also apply this approach. If a particular correlation turns out to be of greater length than that set as a default, then the structure generator will produce no structures after this correlation has been added, and this indicates the presence of contradictions. However, as will be shown in the current work, the presence of contradictions may not prevent structure generation, and false structures may be generated. In order to overcome this difficulty the application of a stochastic generation algorithm that requires computer calculations on parallel processors has been proposed [8]. Our experience shows that the number of connectivities characterized by spin-spin couplings over four bonds or more can exceed ten in one set of 2D NMR data [10]. This makes the resolution of these contradictions using the methods discussed problematic. Our experience has also shown that important information regarding the structure of an unknown substance can frequently be derived from a sub-structure database and related ¹³C NMR sub-spectra when used in combination with 2D NMR data.
The drawbacks highlighted in the preceding discussion have mostly been overcome in the StrucEluc system [15-17]. During development, the capabilities of the StrucEluc program package have been challenged and evaluated by using the published 2D NMR data for more than one hundred natural products whose structures were elucidated. Data were fed to the system directly from the published 2D NMR peak tables. In the process, we have demonstrated that the system is very capable of automated structure elucidation. The work described in this report illustrates strategies for determining the structures of natural products in a number of challenging situations, when the solution of the problem by the "common" operation mode of StrucEluc fails. For several of the examples shown, the interaction of a qualified spectroscopist with the software system, which incorporates a capable knowledge base and diverse means of correct structure identification, leads to the successful determination of structures whose elucidation initially seemed doomed to failure.
Materials and Methods.
The StrucEluc system has been described in detail previously [15-17]. Here we shall provide only a brief overview of its unique capabilities. Relative to other systems designed to elucidate structures based on 2D NMR spectral inputs, StrucEluc is equipped with a knowledge base (KB) and three structure generators based on different mathematical algorithms.
The KB consists of three components: 1) a library of about 200,000 molecular structures and their assigned ¹³C NMR spectra; 2) a fragment library (FL) containing about 1,000,000 fragments with corresponding ¹³C NMR sub-spectral assignments created using proprietary algorithms from the full structures stored in the KB; 3) a Library of Spectrum-to-Structure Correlations (LSC) comprising the most common functional groups and their characteristics in both NMR and IR spectra.
The StrucEluc system is able to use both the 2D NMR data and the fragment library during the elucidation process. In those cases where the number of available 2D NMR correlations is insufficient to impose effective restrictions during the structure generation process (in this case the number of possible structures can be extremely large and the generation time will be unacceptable), the system searches for appropriate fragments in the library in accordance with their associated sub-spectra. Found fragments (FF) meeting the restrictions arising from the 2D NMR spectra are retained. Acceptable combinations of the retained fragments are "projected" onto the set of all atoms within the molecular formula. As a result, the program builds and displays one or more molecular connectivity diagrams (MCD), on which fragments, atoms, and connectivities of different lengths are graphically represented. Using a correlation table, the program automatically establishes the carbon atom properties, these being the atom hybridizations and proximity to heteroatoms (the number and type of heteroatoms are specified). At this stage a qualified specialist is given an opportunity to analyze the MCD and make appropriate revisions, including specifying particular atom properties, changing the lengths of certain connectivities, and drawing specific chemical bonds (for example, explicitly designating C=O, -C≡N, etc.). Subsequently process chemists could also choose to introduce fragments that in their opinion should be present in a molecule (so-called user fragments, UF). In this case the program is adjusted so that both the user and found fragments are used for creation of the next generation of MCDs.
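
The fragment search by sub-spectra can be pictured with a small sketch. The actual StrucEluc search algorithm is not described here, so everything below is an assumption made for illustration: the matching rule (every fragment shift must pair with a distinct observed peak within a fixed tolerance), the 2.0 ppm tolerance, the function name, and the fragment names and shift values, which are not taken from the real fragment library.

```python
def subspectrum_matches(fragment_shifts, observed_shifts, tolerance=2.0):
    """True if every fragment shift pairs with a distinct observed peak.

    Both lists are sorted and matched greedily from low to high shift,
    which is sufficient when all tolerance windows have the same width.
    """
    peaks = sorted(observed_shifts)
    start = 0
    for shift in sorted(fragment_shifts):
        # Skip peaks too low to match this or any later fragment shift.
        while start < len(peaks) and peaks[start] < shift - tolerance:
            start += 1
        if start == len(peaks) or peaks[start] > shift + tolerance:
            return False
        start += 1  # consume the matched peak so it is not reused
    return True


# Hypothetical observed 13C spectrum and a toy fragment library.
observed = [170.2, 128.5, 128.1, 55.3, 21.0]
library = {
    "acetyl": [170.0, 21.5],
    "methoxy": [56.0],
    "nitrile": [118.0],
}
found = [name for name, shifts in library.items()
         if subspectrum_matches(shifts, observed)]
```

In this toy case `found` keeps the acetyl and methoxy fragments and drops the nitrile, which has no observed peak within tolerance. A retained fragment would still have to pass the 2D NMR restrictions before being used.
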
Chemists frequently try to use the assigned spectra of structures related to the molecule being studied to aid in the structural determination and assignment of the NMR spectra of new compounds. In many cases this approach can be very successful. In order to implement this method within the StrucEluc system, algorithms enabling auto-
