An expert system for automated structure elucidation utilizing 1H-1H, 13C-1H and 15N-1H 2D NMR correlations.
Fresenius Journal Of Analytical Chemistry (2001)
- PubMed: 11371077
Abstract
A software program for the automated structure elucidation of complex organic molecules using an expert system and utilizing 2D homo- and heteronuclear correlation 1H, 13C and 15N NMR spectroscopy is described. The methodology is illustrated on the basis of the automated structure determination of strychnine and some other examples.
Introduction
Computerized approaches to molecular structure elucida-
tion from 2D NMR spectra continue to attract the growing
interest of spectroscopists. This is primarily driven by the
need for reliable identification of natural products as well
as the demands for improved timescales associated with
the elucidation process. Such approaches are supported by
the continued progress of new 2D pulse sequences and
NMR probe technologies where, along with 1H-1H and
13C-1H correlations, the connectivities between 1H and
15N nuclei have started to enter the mainstream (see re-
view [1]).
In the nineties, several expert systems for molecular structure elucidation from 2D NMR data were developed (for example, SESAMI [2,3], CISOC-SES [4], LUCY [5], LSD [6], CHEMICS [7], COCON [8]). The principles on which these systems are based were discussed in the review [9]. The results obtained are rather promising; however, further research is needed to turn 2D NMR systems into a practical tool for molecular structure elucidation.
In our previous work [10], a Structure Elucidator system that mainly utilizes 1D 13C NMR spectra for molecular structure determination was described. The knowledge base of the system comprises 135,000 molecular structures with assigned 13C NMR spectra, about 500,000 fragments with corresponding 13C NMR sub-spectra, as well as libraries of spectrum-structure correlations inferred for 13C NMR, 1H NMR and IR spectra. In this approach, the knowledge base is first searched for fragments with matching sub-spectra; the system then tries to build structures from the fragments found that share common atoms. If the program fails to build a structure, sets of non-overlapping fragments are formed and all structures fitting the spectral data and a priori information are generated. The program has been shown to determine structures of many molecules up to medium size (up to 20–25 heavy atoms). The chemistry of natural products, however, generally deals with large molecules containing 20–60 or more skeletal atoms, and for these this approach has commonly failed in our hands. Taking this into account, we have recently enhanced the system by incorporating a module that allows 1H-1H, 13C-1H and 15N-1H 2D NMR experiments to be used in combination with the knowledge base for the structure elucidation of larger molecules such as those encountered among natural products.
The present work describes the 2D module of the Structure Elucidator. The main stages of its operation are illustrated using the strychnine molecule as an example of an “unknown” compound.
Inputting and pre-processing experimental data
We used the following data for elucidation of the structure
of strychnine: the molecular weight of the “unknown”
K. A. Blinov · M. E. Elyashberg · S. G. Molodtsov ·
A. J. Williams · E. R. Martirosian
An expert system for automated structure elucidation
utilizing 1H-1H, 13C-1H and 15N-1H 2D NMR correlations
Fresenius J Anal Chem (2001) 369 :709–714 © Springer-Verlag 2001
Received: 27 September 2000 / Revised: 29 January 2001 / Accepted: 2 February 2001
SPECIAL ISSUE PAPER
K. A. Blinov · A. J. Williams
Advanced Chemistry Development Inc.,
90 Adelaide Street West, Suite 702,
Toronto, ON, M5H 3V9 Canada
M. E. Elyashberg () · E. R. Martirosian
All-Russian Research Institute of Organic Synthesis,
12 Radio Street, Moscow 107005, Russia
e-mail: elyas@molspec.msk.ru
S. G. Molodtsov
Novosibirsk Institute of Organic Chemistry,
Siberian Branch of Russian Academy of Science,
Lavrentiev Avenue 9, Novosibirsk 630090, Russia
(334 a.u.), the 13C NMR DEPT45°, DEPT135° and 1H
NMR spectra, and 2D data in the form of 1H-1H COSY,
1H-1H TOCSY, 13C-1H HMBC and 13C-1H HMQC exper-
iments. These data were complemented with the 15N and
1H-15N HMBC NMR spectra [11]. The empirical formula
(C21H22N2O2) was determined from the molecular mass
using system tools and using the combined 13C and 15N
NMR data and the approach described in ref. [12]. The
primary processing of the experimental data was carried
out by means of the ACD/SpecManager software module.
This program imports spectral parameters online into ta-
bles where chemical shifts, multiplicities (if determined)
and intensities of signals are shown for 1D NMR spectra,
and the chemical shifts of coupled nuclei and peak inten-
sities are given for 2D data. Tables can also be formed and
edited manually. Furthermore, the tables of 2D data are
transformed into tables of C-C and C-N connectivities
with the following distances between nuclei coupled in
the 2D NMR spectra set as default: 1H-1H COSY (2–3 for
strong peaks and 3–4 for weak ones), 1H-1H TOCSY
(2–6), HMBC and COLOC (1–3).
It is worth noting that long-range TOCSY correlations play a complementary role in the system. They are used together with other correlations as constraints imposed on the structures being generated. It should be taken into account that TOCSY cross-peaks are sometimes observed for spins separated by as many as eight bonds in the molecule. In these rare cases the data will contain contradictions under the default setting and, consequently, either the program will refuse to generate structures or the correct structure will not be generated. Therefore, if very distant couplings are possible, it is reasonable to set the TOCSY connectivity length to 2–8 bonds.
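The default distance ranges and the TOCSY adjustment described above can be sketched as a small lookup. This is only an illustration under stated assumptions: the names `DEFAULT_RANGES`, `Connectivity` and `to_connectivity`, and the atom labels, are hypothetical and are not taken from the Structure Elucidator software; only the numeric ranges come from the text.

```python
from dataclasses import dataclass

# Default bond-distance ranges (min, max) as quoted in the text.
# Keys are (experiment, intensity); intensity only matters for COSY.
DEFAULT_RANGES = {
    ("COSY", "strong"): (2, 3),
    ("COSY", "weak"): (3, 4),
    ("TOCSY", None): (2, 6),
    ("HMBC", None): (1, 3),
    ("COLOC", None): (1, 3),
}


@dataclass(frozen=True)
class Connectivity:
    """Hypothetical record: two atoms and the allowed distance range in bonds."""
    atom_a: str
    atom_b: str
    min_bonds: int
    max_bonds: int


def to_connectivity(experiment, atom_a, atom_b, intensity=None, overrides=None):
    """Turn one 2D cross-peak into a connectivity with a distance range."""
    ranges = dict(DEFAULT_RANGES)
    if overrides:
        ranges.update(overrides)  # user-edited ranges take precedence
    key = (experiment, intensity if experiment == "COSY" else None)
    lo, hi = ranges[key]
    return Connectivity(atom_a, atom_b, lo, hi)


# Allowing for very distant TOCSY couplings, as suggested in the text.
long_tocsy = to_connectivity(
    "TOCSY", "C1", "C9", overrides={("TOCSY", None): (2, 8)}
)
```

Keeping the ranges in a table rather than in code mirrors the statement that the defaults can be edited by the user.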
The connectivity tables can be refined as necessary us-
ing user interpretation of the 2D peak intensities. Exam-
ples of tables containing 13C-1H HMBC data and corre-
sponding connectivities are given in Fig.1 and Fig.2. In
total, 210 connectivities were formed from the supplied
2D data.
Setting the task for the structure generator
The problem can be processed further either automatically
or under user supervision. The full picture of skeletal
atom properties and connectivities can be viewed. The
program arranges quaternary C, CH, CH2 and CH3 groups
as well as free hydrogen atoms, if they exist, and het-
eroatoms in the display window as shown in Fig. 3. If the
user prefers, connectivities existing between atoms can be
visualized, with multiple bond connectivities displayed in
different colors. In addition, the chemical shifts of C, N, H
atoms and their numbers can be displayed. At this stage
the connectivity distances can be edited and new bonds
can be input by the user on the basis of prior information.
One of the possible sources of additional information
is the automatic formation of GOODLIST and BADLIST
[10]. In this case, 1653 fragments were chosen from the
knowledge base using the 13C NMR spectrum and it
turned out that 822 fragments (~ 50%) contain a carbonyl
group. As a result, a double bond was drawn between the carbon atom at 169.66 ppm and an oxygen atom. In the general case, if a molecule contains heteroatoms of only one type and there are free H atoms, the corresponding bonds (O-H, N-H, NH2, etc.) may also be drawn.
We have found that if the hybridization states of atoms and the possibility of neighboring heteroatoms are taken into account, the structure generation process speeds up dramatically and the number of possible structures is reduced. With this in mind, we automatically formed a 13C NMR correlation table from the system knowledge base.
The table contains carbon atom-centered fragments with
corresponding intervals of the chemical shift variation for
the central carbon atom. The program uses this table for
the automatic assignment of the hybridization (sp3, sp2,
sp, not defined) to all carbons and for assessing the possi-
bility of their neighboring heteroatoms (forbidden, oblig-
atory, not defined). The mark “not defined” is assigned to
a parameter if several conceivable possibilities are equally
probable. The properties assigned can be refined by the
Fig.1 An example of a table containing 13C-1H HMBC data
Fig.2 An example of a table containing carbon-carbon connectivities derived from the 13C-1H HMBC data
chemist who can take into account the δH values and a priori information. For instance, if three-membered rings are forbidden, a CH2 group with δC = 43.27 ppm and δH = 1.88 ppm cannot be adjacent to the heteroatoms O or N, whereas the chemical shifts δC = 78.01 ppm and δH = 4.29 ppm of a CH group allow us to assign the parameters sp3 and “neighboring heteroatom is obligatory” to the carbon C (78.01).
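The automatic assignment of hybridization from chemical shift intervals can be illustrated with a minimal sketch. The intervals below are rough, illustrative values chosen for this example; they are an assumption and are not the correlation table derived from the system knowledge base. The function name `assign_hybridization` is likewise hypothetical. A shift that fits more than one interval is marked “not defined”, which is one simple reading of the rule stated in the text.

```python
# Illustrative 13C shift intervals in ppm (assumed values, NOT the paper's table).
SHIFT_INTERVALS = {
    "sp3": (0.0, 100.0),
    "sp": (65.0, 95.0),
    "sp2": (100.0, 220.0),
}


def assign_hybridization(delta_c):
    """Return the single matching hybridization, or 'not defined' if ambiguous."""
    matches = [
        hyb for hyb, (lo, hi) in SHIFT_INTERVALS.items() if lo <= delta_c <= hi
    ]
    return matches[0] if len(matches) == 1 else "not defined"


# Carbon shifts quoted in the text for strychnine.
for shift in (43.27, 78.01, 169.66):
    print(shift, assign_hybridization(shift))
```

With these toy intervals the carbon at 78.01 ppm stays “not defined” on the 13C shift alone, which is the kind of case the chemist can then refine using δH and a priori information, as described above.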
It is worth noting that the final results are highly dependent on the assignment of atom properties, since a wrong assignment will obviously lead to an incorrect structure. It is common for spectroscopists to have prior information to aid in the assignment of certain atom properties within a spectrum.
Examining 2D connectivities for consistency
If all possible connectivities emerge from the available
2D NMR spectra and the number of bonds between cou-
pled nuclei can be determined accurately, we can expect
the output file to contain the correct structure. However,
the initial data becomes contradictory if erroneous corre-
lations occur in the connectivity tables. The main cause of
such contradictions is the fact that the distances between
coupled nuclei can be set as being smaller than they are in
reality. For example, if in reality the distance between two
coupled nuclei is equal to 4 bonds, but it is specified as
2–3 in the table, the final structure will be absent in the re-
sults file.
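The 4-bond example can be checked with a short sketch that measures the bond distance between two atoms of a candidate structure and compares it with the range specified in the connectivity table. The graph representation, the atom labels and the function names are assumptions made for illustration; the shortest path in bonds is used here as the distance.

```python
from collections import deque


def bond_distance(adjacency, start, goal):
    """Shortest path length in bonds between two atoms, or None if unconnected."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        atom, dist = queue.popleft()
        if atom == goal:
            return dist
        for neighbor in adjacency.get(atom, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def satisfies(adjacency, atom_a, atom_b, min_bonds, max_bonds):
    """True if the structure places the two atoms within the allowed range."""
    dist = bond_distance(adjacency, atom_a, atom_b)
    return dist is not None and min_bonds <= dist <= max_bonds


# Hypothetical five-atom chain: C1 and C5 are 4 bonds apart.
chain = {
    "C1": ["C2"],
    "C2": ["C1", "C3"],
    "C3": ["C2", "C4"],
    "C4": ["C3", "C5"],
    "C5": ["C4"],
}

print(satisfies(chain, "C1", "C5", 2, 3))  # range too short: structure rejected
print(satisfies(chain, "C1", "C5", 2, 4))  # lengthened range: structure accepted
```

A generator that filters candidates with such a test would discard the true structure whenever a range is set too short, which is why the result file then lacks it.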
Prior to starting structure generation, the program carries out an automatic check of the connectivity tables for the presence of contradictions. The program verifies the number of skeletal atoms that may occur in a sphere with a radius of 2 bonds around each carbon atom, taking the ascribed hybridization state into account. If the number of carbon atoms in this sphere exceeds the specific threshold value for a given central carbon atom, the program displays a message that explains the cause of the contradiction.
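A minimal sketch of such a sphere check follows. The text does not give the threshold values, so the rule used here is an assumption: the first shell is limited by the number of sigma-bonded neighbors allowed by the hybridization minus the attached hydrogens, and each first-shell atom is allowed at most three further skeletal neighbors. All names are hypothetical.

```python
# Maximum number of sigma-bonded neighbours (including H) per hybridization.
# "not defined" is treated as the most permissive case. Assumed values.
SIGMA_CAPACITY = {"sp3": 4, "sp2": 3, "sp": 2, "not defined": 4}


def sphere_threshold(hybridization, n_hydrogens):
    """Assumed upper bound on skeletal atoms within 2 bonds of a carbon."""
    first_shell = SIGMA_CAPACITY[hybridization] - n_hydrogens
    # Each first-shell atom can carry at most 3 further skeletal neighbours.
    return first_shell + 3 * first_shell


def find_overcrowded(atoms, connectivities):
    """Return {atom: count} for atoms with too many partners within 2 bonds.

    atoms: {label: (hybridization, n_hydrogens)}
    connectivities: iterable of (atom_a, atom_b, min_bonds, max_bonds)
    """
    required = {label: set() for label in atoms}
    for atom_a, atom_b, _min_bonds, max_bonds in connectivities:
        if max_bonds <= 2:  # partner must lie inside the 2-bond sphere
            required[atom_a].add(atom_b)
            required[atom_b].add(atom_a)
    return {
        label: len(partners)
        for label, partners in required.items()
        if len(partners) > sphere_threshold(*atoms[label])
    }


# Hypothetical case: an sp CH carbon forced to have five partners within 2 bonds.
atoms = {"C1": ("sp", 1), **{f"C{i}": ("sp3", 0) for i in range(2, 7)}}
conns = [("C1", f"C{i}", 1, 2) for i in range(2, 7)]
print(find_overcrowded(atoms, conns))
```

Only connectivities whose upper bound is 2 bonds or fewer are counted, because longer ranges do not force the partner into the sphere.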
The verification described above is only a part of the verification performed. When the complete verification of data is applied, the program finds a vertex Xi (Xi may be a C or N atom) or some correlations that contradict the already “established” data on bonds in the structure. A vertex is assumed to have a contradiction if it is not possible to determine all bonds of this vertex Xi with other vertices in such a way that they do not contradict the 2D connectivities related to Xi and the neighboring vertices.
When a contradiction is detected at vertex Xi, the program lengthens all 2D connectivities related to Xi. However, this method does not guarantee the elimination of contradictions, since contradictions may also have been present in the 2D connectivities related to neighboring vertices. Therefore, even after the connectivities related to vertex Xi have been lengthened, contradictions may remain in the 2D data.
When contradictions are analyzed, it is mainly the 2D connectivities with a length of 1–2 C-C bonds that are verified. Therefore, if contradictory 2D connectivities of greater length are present, this can be detected, as mentioned before, only in the process of structure generation.
Depending on the types of 2D experiments employed, various options controlling the strictness of the verification can be set. These options match the options of the structure generator, with the strictness of the check depending on the assumptions made about the properties of the connectivities obtained from the 2D spectra. The following four cases are considered:
1) For the given carbon atom, bonds are allowed only with those carbon atoms that appear together with it in the found correlations.
Fig.3 Display of structural units and connectivities
2) It is assumed that all bonds between CH, CH2 and CH3 groups are reflected in the 2D spectra (this is common for COSY data and, to a greater degree, for TOCSY).
3) It is assumed that all bonds of CH, CH2 and CH3 groups with other atoms (including quaternary ones) are reflected in the 2D spectra (this is common for HMBC data).
4) All bonds between carbon atoms are allowed.
The stricter the check that the connectivities undergo, the faster the structure generation. The speed-up is achieved by prohibiting bonds between certain atoms in accordance with the assumption being verified. In particular, when Case 2 is applied, the use of TOCSY correlations may be rather effective.
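One way to read the four cases is as a rule deciding whether a bond between two carbon atoms may be formed when no correlation links them. The sketch below encodes that reading; it is an interpretation made for illustration, not the program's actual logic, and the names `Strictness` and `bond_allowed` are hypothetical.

```python
from enum import Enum


class Strictness(Enum):
    CORRELATED_ONLY = 1            # Case 1
    PROTONATED_PAIRS_OBSERVED = 2  # Case 2
    PROTONATED_BONDS_OBSERVED = 3  # Case 3
    ALL_ALLOWED = 4                # Case 4


def bond_allowed(mode, a_has_h, b_has_h, correlated):
    """Decide whether a C-C bond may be generated under the given case.

    a_has_h / b_has_h: whether each carbon is a CH, CH2 or CH3 group.
    correlated: whether the two atoms appear together in a found correlation.
    """
    if correlated or mode is Strictness.ALL_ALLOWED:
        return True
    if mode is Strictness.CORRELATED_ONLY:
        return False
    if mode is Strictness.PROTONATED_PAIRS_OBSERVED:
        # Only bonds between two protonated carbons are assumed fully observed.
        return not (a_has_h and b_has_h)
    # Case 3: any bond involving a protonated carbon is assumed observed.
    return not (a_has_h or b_has_h)
```

Under this reading, each stricter case prohibits a larger set of uncorrelated atom pairs, which is consistent with the statement that stricter checks shorten generation.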
In general, the strategy of contradiction elimination is to establish a balance between all the distances involved in the task. When the program displays a message noting the absence of contradictions, the structure generation process may be initiated.
Structure generation and verification
For structure generation we utilized a “classic” generator [10] whose capabilities were enhanced to allow its application to data presented as 2D connectivities. Structure generation is controlled by options analogous to those described previously in Ref. [10]. The options comprise constraints on ring sizes and bond multiplicities, permission or prohibition of bonds between heteroatoms, and so on. The only constraint set for solving the problem outlined here was the requirement that ring sizes be between 5 and 7 atoms. Structure verification is performed by a series of filters, such as libraries of spectrum-structure correlations, GOODLIST, BADLIST, libraries of fragments that are unlikely in organic chemistry, and Bredt’s rule. All these options are used to reduce processing time, to save disk space and to aid in the detection of identical structures and their subsequent removal. As a result of structure generation and verification, 5 isomers including strychnine were generated in 1 min 10 s (a 500 MHz Pentium PC was used).
To select the most probable structure, the ACD/CNMR Predictor software program was used. The 13C NMR chemical shifts of all structures were calculated and the structures were ranked by the average deviation (d) of the calculated spectrum from the experimental one. The assignment of estimated shifts to the observed ones was performed automatically. The structures ranked in order of increasing average deviation are presented in Fig.4, which shows that the strychnine molecule is the most probable one (its deviation is the smallest). To make the structural displays of these complicated polycyclic compounds more intelligible, we used spatial structure optimization (the corresponding program is an integral part of the Structure Elucidator). It should be noted that the reliability of the structure ranking by d values depends on the method used for 13C NMR spectrum estimation; therefore, additional confirmation of the preferred structure is desirable. Such confirmation can be obtained by predicting 1H NMR spectra and some molecular properties such as logP, solubility, etc., using other programs incorporated into the system. It should be mentioned that when all 15N NMR data were ignored, the number of possible structures increased to 9 and the processing time was 3 times greater. This shows that the application of 1H-15N 2D correlations is indeed quite effective in providing additional filtering to aid automated elucidation.
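Ranking candidates by the average deviation d can be sketched as follows. Two points are assumptions, since the text does not specify them: d is taken here as the mean absolute difference in ppm, and calculated shifts are paired with observed ones simply by sorting both lists. The function names and the shift values are hypothetical.

```python
def average_deviation(calculated, experimental):
    """Mean absolute deviation in ppm after pairing shifts in sorted order."""
    if len(calculated) != len(experimental):
        raise ValueError("both spectra must contain the same number of shifts")
    pairs = zip(sorted(calculated), sorted(experimental))
    return sum(abs(calc - exp) for calc, exp in pairs) / len(experimental)


def rank_candidates(candidates, experimental):
    """Return (d, name) tuples ordered by increasing average deviation."""
    scored = [
        (average_deviation(shifts, experimental), name)
        for name, shifts in candidates.items()
    ]
    return sorted(scored)


# Hypothetical three-carbon example with made-up predicted shifts.
experimental = [20.0, 50.0, 170.0]
candidates = {
    "A": [171.0, 21.0, 49.0],
    "B": [25.0, 60.0, 150.0],
}
ranking = rank_candidates(candidates, experimental)
print(ranking[0][1])  # name of the best-ranked candidate
```

Sorted pairing is the simplest automatic assignment; a real predictor would normally keep track of which atom each calculated shift belongs to.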
We have further tested the 2D module of our Structure Elucidator using 2D NMR data published in a series of works devoted to natural product structure elucidation. In doing so, we manually entered the connectivities presented in the tables compiled by the authors of the corresponding original articles. Some examples are presented in Fig.5 (the data from [2–3, 5, 13–14] were employed).
Fig.4 Candidate structures ranked in order of increasing d values
groups have been reflected in 2D spectra (this is com-
mon for COSY data and in higher degree for TOCSY).
3) It is allowed that all bonds of CH, CH2, CH3 groups
with other atoms (including quaternary) have been re-
flected in 2D spectra (this is common for HMBC
data).
4) All bonds between carbon atoms are allowed.
The more severe is the check the connectivities under-
gone, the quicker will be structure generation. The speed
up of generation is achieved by prohibition of the bonds
between certain atoms in accordance with the assumption
being verified. In particular, when implementing Case 2,
use of TOCSY correlations might be rather effective.
In general, the strategy of the contradiction elimination
is to establish the balance between all the distances in-
volved in the task. When the program displays a message
noting the absence of contradictions, the structure genera-
tion process may be initiated.
Structure generation and verification
For structure generation we have utilized a “classic” gen-
erator [10] whose capabilities were enhanced to provide
its application to data presented as 2D connectivities. The
control of structure generation is realized by using options
analogous to those described previously in Ref. [10]. The
options comprise constraints on ring cycle sizes and bond
multiplicities, permission or prohibition of the formation
of bonds between heteroatoms and so on. The only con-
straint which was set for solving the problem outlined
here was the requirement that ring cycle sizes be between
5 and 7 atoms. The structure verification is performed by
employing a series of filters, such as libraries of spec-
trum-structure correlations, GOODLIST, BADLIST, li-
braries of fragments which are unlikely in organic chem-
istry and Bredt’s rule. All these options are used to reduce
processing time, to save disk space and to aid in the de-
tection of identical structures and their subsequent re-
moval. As a result of structure generation and verification,
5 isomers including strychnine were generated in a time
of 1 min 10 s (PC Pentium 500 MHz was used). To select
the most probable structure, the ACD\CNMR Predictor
software program was used. The 13C NMR chemical shifts
of all structures were calculated and the structures were
rated by the average deviation (d) of the calculated spec-
trum from the experimental one. The assignment of esti-
mated shifts to the observed ones was performed automat-
ically. The structures ranked in order of increasing aver-
age deviations are presented in Fig.4 which shows that
the strychnine molecule is the most probable one (its de-
viation is the smallest). To make the structural displays of
these complicated polycyclic compounds more intelligi-
ble, we used the spatial structure optimization (the corre-
sponding program is an integral part of the Structure Elu-
cidator). It should be noted that reliability of the structure
ranking by d values depends on the method utilized for
the 13C NMR spectrum estimation, therefore additional
confirmations of a preferable structure are desired. The
confirmation can be obtained by the prediction of 1H
NMR spectra and some molecular properties such as
logP, solubility, etc., using other programs incorporated
into the system. It should be mentioned that when all 15N
NMR data were ignored, the number of possible structures
increased to 9 and the processing time was 3 times
greater. This shows that the application of 1H-15N
2D correlations is indeed quite effective in providing additional
filtering to aid in automated elucidation.
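The ranking step described above can be illustrated with a minimal sketch. It is not the algorithm of the ACD/CNMR Predictor, whose automatic assignment procedure is not described here. It assumes that predicted and experimental shifts are paired in sorted order (which minimizes the total absolute deviation for lists of equal length) and that d is the mean absolute deviation in ppm. The function names and all shift values are hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Tuple


def average_deviation(predicted: List[float], experimental: List[float]) -> float:
    """Mean absolute deviation (ppm) after a simple one-to-one assignment."""
    if len(predicted) != len(experimental):
        raise ValueError("both spectra must contain the same number of shifts")
    # Assumption: pair the shifts in sorted order. For equal-length lists this
    # minimizes the total absolute deviation; the real assignment procedure
    # used by the prediction software is not specified in the text.
    pairs = zip(sorted(predicted), sorted(experimental))
    return sum(abs(p - e) for p, e in pairs) / len(experimental)


def rank_candidates(
    candidates: Dict[str, List[float]], experimental: List[float]
) -> List[Tuple[str, float]]:
    """Return (name, d) pairs ordered by increasing average deviation d."""
    scored = [
        (name, average_deviation(shifts, experimental))
        for name, shifts in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical 13C shifts (ppm), invented for illustration only.
experimental = [170.0, 128.5, 77.2, 52.1, 31.4]
candidates = {
    "candidate_A": [169.0, 129.5, 78.2, 51.1, 30.4],
    "candidate_B": [160.0, 135.0, 70.0, 60.0, 25.0],
}
ranking = rank_candidates(candidates, experimental)
for name, d in ranking:
    print(f"{name}: d = {d:.2f} ppm")
```

The first entry of the ranked list plays the role of the most probable structure. As noted above, such a ranking is only as reliable as the shift prediction behind it.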
We have further tested the 2D module of our Structure
Elucidator using 2D NMR data published in a series of
works devoted to natural product structure elucidation. In
so doing we manually entered the connectivities
presented in the tables compiled by the authors of the
corresponding original articles. Some examples are presented
in Fig.5 (the data from [2–3, 5, 13–14] were employed).
Fig.4 Candidate structures ranked in order of increasing d values
The table shows that the approach developed allows one
to elucidate complicated molecular structures of natural
products quickly.
We believe that the integration of the approaches out-
lined above, and the inclusion of 2D data in particular,
will be of extreme value in enabling automated elucida-
tion of natural products and other large molecules.
Fig.5 Examples of structures recognized. Here n is the number of heavy
atoms in a molecule, k is the number of candidate structures found,
t is the generation time (PC Pentium 500 MHz), r is the position of the
true structure in the ranked structural file, and d is the average
deviation of the experimental spectrum from the predicted one
Conclusions
An expert system for molecular structure elucidation from
2D NMR spectra has been developed and tested on many
examples. In addition to 2D NMR spectra, the system utilizes
a 13C NMR knowledge base containing full structures
and fragments as well as spectrum-structure correlations.
This enables the system to detect functional groups and bring
them into play, automatically assign the hybridization of carbon
atoms and the possibility of their neighborhood with
heteroatoms, form GOODLIST and BADLIST, as well as
perform spectral and structural filtering of the generated
structures. An important distinguishing feature of the program
is a module which in many
cases is able to detect and eliminate contradictions between
connectivities. A strategy of molecular structure
determination in the interactive mode has been developed,
and examples demonstrating the efficiency of the system
as a tool for structure determination of natural products
are given.
To develop the system further, it is necessary to improve
the method of primary processing of spectral data in
online mode, enhance the efficiency of the algorithm for
the search and elimination of contradictions in 2D NMR data,
and develop a strategy for the creation and application of user
databases aimed at the structure determination of compounds
of certain classes.
Acknowledgements We sincerely offer our gratitude to Gary
Martin and Chad Hadden of Pharmacia, Kalamazoo, MI for
encouraging us to include 1H-15N correlations in this approach as
well as for pointing us to useful literature.
References
1. Martin GE, Hadden CE (2000) J Nat Prod 63: 543–585
2. Christie BD, Munk ME (1991) J Am Chem Soc 113: 3750–3757
3. Munk ME, Madison MS, Schulz K-P, Korytko A (1998) 13th Workshop, Nov 13th–15th 1998, Bad Dürkheim http://www2.ccc.uni-erlangen.de/cic/workshop98/paper8.html
4. Peng C, Yuan S, Zheng C, Hui Y, Wu H, Ma K (1994) J Chem Inf Comput Sci 34: 814–819
5. Steinbeck C (1996) Angew Chem Int Ed Engl 35: 1984–1986
6. Nuzillard J-M, Massiot G (1991) Tetrahedron 47: 3655–3664
7. Funatsu K, Sasaki S (1996) J Chem Inf Comput Sci 36: 190–204
8. Lindel T, Junker J, Koeck M (1999) Eur J Org Chem 3: 573–577
9. Jaspars M (1999) Nat Prod Rep 16: 241–248
10. Elyashberg ME, Blinov KA, Martirosian ER (1999) Lab Automat Inf Management 34: 15–30
11. 1D and 2D 15N NMR spectral data of strychnine were kindly provided to us by Gary Martin and Chad Hadden, Pharmacia, Kalamazoo, MI
12. Elyashberg ME, Karasev YuZ, Martirosian ER (1999) Anal Chim Acta 388: 353–363
13. Koehn FE, Gunasekera SP, Niel DN, Cross SS (1991) Tetrahedron Letters 32: 169–172
14. Jimeno ML, Rumbero A, Vazquez P (1995) Magn Res Chem 33: 408–411
Note added in proof After the article was submitted to the journal,
we improved the method of automated search for and elimination
of contradictions in 2D NMR data. Furthermore, using
the structure determination of cryptospirolepine (C35H25N3O1) as an
example, a methodology for applying a user database in combination
with 2D connectivities has been developed. To check the
developed approaches, experimental data (usually COSY and
HMBC) from articles published in the Journal of Natural Products
were used. In the course of testing the system we additionally
determined the structures of 45 natural product molecules with
a number of skeletal atoms n equal to 20–45 and M = 250–650. It
turned out that in 60% of the tasks the time for generation and verification
of structures did not exceed 1 min, and in 10% the process took
from 1 to 3 h. The new methods and the results obtained will be
reported in our future publications.


