
Automated structure verification based on 1H NMR prediction.

by Sergey S Golotvin, Eugene Vodopianov, Brent A Lefebvre, Antony J Williams, Timothy D Spitzer
Magnetic resonance in chemistry MRC (2006)

Abstract

A unique opportunity exists when an experimental NMR spectrum is obtained for which a specific chemical structure is anticipated. A process of Verification (the confirmation of a postulated structure) is now possible, as opposed to Elucidation (the de novo determination of a structure). A method for automated structure verification is suggested, which compares the chemical shifts, intensities and multiplicities of signals in an experimental 1H NMR spectrum with those from a predicted spectrum for the proposed structure. A match factor (MF) is produced and used to classify the spectrum-structure match into one of three categories: correct, ambiguous, or incorrect. The verification result is also augmented by the spectrum assignment obtained as part of the verification process. This method was tested on a set of synthetic spectra and several sets of experimental spectra, all of which were automatically prepared from raw data. Even taking into account the most problematic structures, with many labile protons present and poor prediction accuracy, 50% of all spectra can still be automatically verified without any false positives or negatives. In a blind test on a typical set of data, it is shown that fewer than 31% of the structures would need manual evaluation. This means that a system is possible whereby 69% of the spectra are prepared and evaluated automatically, and never need to be seen or evaluated by a human.


MAGNETIC RESONANCE IN CHEMISTRY
Magn. Reson. Chem. 2006; 44: 524–538
Published online 17 February 2006 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/mrc.1781
Automated structure verification based on 1H NMR
prediction
Sergey S. Golotvin,1 Eugene Vodopianov,1 Brent A. Lefebvre,2∗ Antony J. Williams2 and
Timothy D. Spitzer3
1 Advanced Chemistry Development Inc., Moscow Department, 6 Akademik Bakulev Street, Moscow 117513, Russian Federation
2 Advanced Chemistry Development Inc., 110 Yonge Street, 14th floor, Toronto, Ontario M5C 1T4, Canada
3 GlaxoSmithKline Inc., 5 Moore Drive, Research Triangle Park, NC 27709, USA
Received 1 September 2005; Revised 10 December 2005; Accepted 10 December 2005
Copyright © 2006 John Wiley & Sons, Ltd.
KEYWORDS: NMR; 1H NMR; prediction; structure verification; automated analysis; high-throughput; computer assisted
INTRODUCTION
Recent years have witnessed dramatic improvements in
high-throughput NMR spectroscopy. New flow-NMR technologies
such as VAST1 and BEST2 have made it possible to
acquire spectra for hundreds of samples a day. This throughput
still far surpasses that of most speed-optimized 2D NMR
experiments, despite the greater information content of
2D spectra.3 Because 2D NMR has limited throughput
relative to 1D 1H NMR, the latter, in conjunction with other
analytical techniques such as MS, is the primary tool for
validating combinatorial libraries.4
The main bottleneck in the application of proton NMR
spectroscopy to combinatorial chemistry and parallel
synthesis is the analysis of the huge amounts of data generated.
In a previous demonstration by Hamper et al., a qualitative
manual inspection of a set of NMR spectra was performed
using stacked plots for each plate row (A–H) in a 96-well
plate,5 paying attention to the presence of peaks expected in
the desired product(s). Although the results were shown to
be very consistent with HPLC conversion data, the
amount of labor involved significantly hindered the analysis
of such a large amount of NMR data. A similar
approach to aid the interpretation of NMR spectra from
96-well plates involves a pseudo-2D map, with spectra joined
by row or column.1 Such a graphical presentation of the
data is capable of highlighting violations in the expected
systematic patterns of NMR signals, but it still requires a lot
of attention from a spectroscopist, and the approach is hard
to automate and its accuracy hard to quantify.
∗Correspondence to: Brent A. Lefebvre, Advanced Chemistry
Development Inc. (ACD/Labs), 110 Yonge Street, 14th floor,
Toronto, ON M5C 1T4, Canada. E-mail: brent@acdlabs.com
Several methods for structure validation that rely on
recognizing the spectral patterns of R-groups have been
suggested. In one instance, an unsupervised neural network
was used to cluster together NMR spectra containing a
common pattern of R-groups that had been introduced
during a reaction and to identify outliers within such
clusters.6 Here, the well-plate data is converted to a matrix
of 96 rows by 820 columns. Since this approach uses relative
peak heights within a spectrum, it is essential to make sure
that the signals from solvent peaks or impurities are not
defined as the most intense peak(s). This approach has been
validated for selecting NMR spectra that do not fit the pattern
common to a given substituent. However, the structure is
not necessarily incorrect, and it remains a challenge for the
user to identify why these spectra are not consistent with
the expected pattern. Also, the technique does not appear
to be reliable when significant contributions to the spectral
signals, derived from impurities having similar R-patterns
(starting materials or by-products), are present.
In the Autodrop approach7 (another R-group recognition
procedure), whole structures are considered as a combination
of different substructures. Correspondingly, 2D HSQC NMR
spectra are regarded as a sum of the spectral patterns
from the individual substructures. The proposed structure
is confirmed if the spectral patterns of all substructures are
present. While this method may offer a good visual aid to the
interpretation of results, it is restricted to 2D NMR spectral
data, which, as discussed earlier, has a lower throughput
than 1D 1H NMR spectra. Another possible source of error is
the assumption that the spectral patterns are stable. This can
sometimes become misleading because magnetically active
nuclei in the vicinity of the reaction site may, of course, cause
changes in the spectral patterns.
Recent progress has been reported in the extraction
of coupling information during the analysis of 1D 1H
NMR spectra.8–10 This work shows that individual
multiplets can be accurately identified, and coupling
information extracted where possible, during an automated
analysis. This capability offers a new avenue for structure
verification since it allows the comparison of the chemical
shifts and coupling constants derived from an experimental
1D 1H NMR spectrum with those same parameters predicted
for the proposed structure.
A common approach to 1H NMR prediction is based on
additivity rules.11,12 Using this approach for a given molecule,
a number of substructures with applicable additivity rules
are automatically identified. The rest of the molecule
is treated as substituents associated with each of the
substructures.13 It has been estimated11 that chemical
shifts can be predicted to within 0.3 ppm by this approach.
However, for structures where few or no additivity rules
are available, prediction accuracy suffers.14 This approach
is implemented commercially in packages such as
CambridgeSoft’s ChemDraw Ultra,15 was previously available
in Chemical Concepts’ SpecTool,16 and is presently
available online.17,18
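As a minimal illustration of how an additivity rule works, the sketch below applies a Shoolery-style rule to the protons of an X-CH2-Y fragment: a base shift plus one increment per substituent. This is a textbook simplification, not the rule set of any package named above. The function name and the increment table are assumptions made for this example, and the increments are commonly tabulated values that should be treated as illustrative only.

```python
# Shoolery-style additivity for X-CH2-Y protons (illustrative sketch only).
BASE_SHIFT_PPM = 0.23  # base value: the 1H shift of methane

# Substituent increments in ppm. Commonly tabulated values, used here
# purely for illustration; a real predictor uses a far larger rule set.
SUBSTITUENT_INCREMENTS = {
    "CH3": 0.47,
    "Cl": 2.53,
    "Br": 2.33,
    "OH": 2.56,
    "C6H5": 1.85,
}


def predict_ch2_shift(substituents):
    """Return base shift plus the sum of the substituent increments."""
    missing = [s for s in substituents if s not in SUBSTITUENT_INCREMENTS]
    if missing:
        # With no applicable rule, the additivity approach cannot predict.
        raise KeyError(f"no additivity rule for: {missing}")
    return BASE_SHIFT_PPM + sum(SUBSTITUENT_INCREMENTS[s] for s in substituents)


dichloromethane = predict_ch2_shift(["Cl", "Cl"])      # 0.23 + 2.53 + 2.53
benzyl_chloride = predict_ch2_shift(["C6H5", "Cl"])    # 0.23 + 1.85 + 2.53
print(round(dichloromethane, 2), round(benzyl_chloride, 2))
```

The KeyError branch mirrors the limitation noted above: when no rule covers a substituent, the method has nothing to add up.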
A more popular approach to the prediction of NMR
chemical shifts utilizes the availability of large NMR spec-
tral databases containing chemical structures with assigned
chemical shifts. In such databases, the structures are
described by a ‘spherical’ code, the HOSE19 code (Hierar-
chical Organization of Spherical Environments) and spectral
signals are assigned to the corresponding atoms. During
prediction, the algorithm generates a HOSE code for each
nuclear environment in the query molecule and searches the
database for matching HOSE codes, using the shifts assigned
to those matches to model the predicted shifts. If the database
of structures is of sufficient size and structural diversity,
the results can be very accurate. It should also be noted
that the HOSE code approach could be applied to the
calculation of coupling constants. To support the increasing
structural diversity of today’s chemistry, it is also possible
to populate user databases with additional chemical
structures and nuclear assignments, thereby expanding the
structural diversity supported by the prediction algorithms.
This HOSE code method is implemented in a number of
commercial software packages for NMR shift prediction,
including Sadtler’s Know-It-All package,20 Chemical
Concepts’ SpecInfo,21 and Advanced Chemistry Development’s
(ACD/Labs) ACD/NMR Predictors for 1H, 13C, 15N, 19F and
31P nuclei.22
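The database lookup described above can be sketched as follows. This is a deliberately simplified, HOSE-like encoding (element symbols per bond sphere, no bond orders, ring closures or ranking rules), not the actual HOSE code specification and not the algorithm of any commercial predictor. The function names, the dictionary layout and the toy database entries are all assumptions made for illustration.

```python
from collections import defaultdict


def sphere_code(atoms, bonds, start, max_spheres):
    """Build a simplified HOSE-like string: one field per bond sphere."""
    neighbors = defaultdict(set)
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)
    visited = {start}
    frontier = [start]
    spheres = [atoms[start]]
    for _ in range(max_spheres):
        next_frontier = []
        for atom in frontier:
            for nb in neighbors[atom]:
                if nb not in visited:
                    visited.add(nb)
                    next_frontier.append(nb)
        if not next_frontier:
            break
        # Sort symbols so the code does not depend on traversal order.
        spheres.append("".join(sorted(atoms[i] for i in next_frontier)))
        frontier = next_frontier
    return "/".join(spheres)


def predict_shift(database, atoms, bonds, start, max_spheres=3):
    """Try the most specific environment first, then fall back to fewer spheres."""
    for n in range(max_spheres, 0, -1):
        shifts = database.get(sphere_code(atoms, bonds, start, n))
        if shifts:
            return sum(shifts) / len(shifts), n
    return None, 0


# Toy database (made-up entries, not real records): code -> assigned 1H shifts
# of the protons attached to the central carbon.
toy_database = {"C/C/O": [1.18, 1.22], "C/CO": [3.65, 3.70]}

# Heavy-atom skeleton of 1-propanol: C0-C1-C2-O3; query the O-bound carbon.
atoms = {0: "C", 1: "C", 2: "C", 3: "O"}
bonds = [(0, 1), (1, 2), (2, 3)]
shift, spheres_used = predict_shift(toy_database, atoms, bonds, start=2)
print(shift, spheres_used)
```

Here the three-sphere code "C/CO/C" is absent from the toy database, so the lookup falls back to the one-sphere code "C/CO". This is why database size and diversity matter: the more spheres that match, the more specific the environment behind the predicted shift.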
Griffiths and Bright published their method for structure
validation23 utilizing the chemical shift prediction software
ACD/HNMR Predictor (Version 4.0). Their method created
a matrix of predicted and experimental shifts, and built the
largest possible diagonal using this matrix. Additional filters,
such as the number of coupling constants and total protons
in the molecule, were applied. The structure is confirmed
if the resulting mismatch value does not exceed a defined
threshold. With labile protons excluded, the prediction-based
approach delivered very good results in a test where
all structures within a small subset were proposed for each
spectrum. In their work, all possible combination pairs for
21 structures and spectra were formed and validated. With
a chosen threshold value of 0.24 ppm, 20 structures were
confirmed and only three false positives were obtained. One
of these false positives was nearly impossible to resolve by
1D 1H NMR alone, even by a human expert.
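The general idea of a shift-mismatch test against a threshold can be sketched as below. This is not Griffiths and Bright's published algorithm: their diagonal-building procedure and additional filters are not reproduced here. The sketch assumes equal numbers of predicted and experimental signals, pairs them in sorted order along the diagonal of the mismatch matrix, and takes the mean absolute difference as the mismatch value. Only the 0.24 ppm threshold comes from the text; the function names and example shifts are hypothetical.

```python
def mismatch_matrix(predicted, experimental):
    """Absolute shift differences: rows are predicted, columns experimental."""
    return [[abs(p - e) for e in experimental] for p in predicted]


def diagonal_mismatch(predicted, experimental):
    """Mean mismatch along the diagonal after sorting both shift lists.

    Assumption: both lists have the same length, so sorted pairing gives
    a one-to-one diagonal through the matrix.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("this sketch needs equal, non-zero signal counts")
    matrix = mismatch_matrix(sorted(predicted), sorted(experimental))
    diagonal = [matrix[i][i] for i in range(len(matrix))]
    return sum(diagonal) / len(diagonal)


def is_confirmed(predicted, experimental, threshold_ppm=0.24):
    """Confirm the structure if the mismatch does not exceed the threshold."""
    return diagonal_mismatch(predicted, experimental) <= threshold_ppm


predicted = [7.30, 3.70, 1.20]              # hypothetical predicted shifts, ppm
print(is_confirmed(predicted, [1.25, 3.62, 7.26]))   # close match
print(is_confirmed(predicted, [2.10, 2.50, 7.90]))   # poor match
```

Note that the function returns only a yes/no verdict from an aggregate value, which is the kind of output the following paragraph criticizes for hiding the individual signal pairings.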
This approach employs a window for mismatch calcula-
tion with a typical value of 2 ppm. As a result, it does not
provide information regarding which predicted resonance
is paired with which experimental signal. This limitation
denies the user the great value that is contained even in a
tentative assignment produced during structure confirma-
tion. An assignment would also allow the user to directly
compare the properties of predicted and experimental sig-
nals and increase the degree of discrimination during the
process of structure verification. Another significant draw-
back of this method is that the cutoff value for structure
validation depends solely on prediction accuracy. A correct
structure that is not well predicted will be rejected.
In our opinion, the most rigorous approach to structure
validation should ask the following questions:
(1) How well does the proposed structure correspond to the
spectrum?
(2) What is the probability of another structure being
coincident with this spectrum at the same level or with
an even better match?
(3) How reliable is this evaluation?
These questions are generally difficult to answer. Firstly,
there is no way to directly assess the consistency between
a spectrum and a structure; only the experimental spectrum
and a spectrum predicted for the proposed structure can
be compared. The second question is a challenge because
of the sheer number of structures that could be consistent
with the spectrum. The validation problem itself may be
approached from a practical point of view with the following
questions, which are somewhat easier to answer:
(1) Does the experimental spectrum contain the spectrum
predicted for the proposed structure?
(2) Does the experimental spectrum contain only the spec-
trum predicted for the proposed structure?
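The two practical questions above can be expressed as a pair of containment checks on lists of chemical shifts. The sketch below is a simplified illustration, not the verification method of this paper: it uses shifts only (no intensities or multiplicities), a greedy nearest-signal pairing, and a hypothetical tolerance of 0.3 ppm. All function names and example values are assumptions.

```python
def match_signals(predicted, experimental, tolerance_ppm=0.3):
    """Greedily pair each predicted shift with the nearest unused experimental shift.

    Returns (missing, extra): predicted shifts with no partner, and
    experimental shifts left unexplained by the prediction.
    """
    unmatched = sorted(experimental)
    missing = []
    for p in sorted(predicted):
        candidates = [e for e in unmatched if abs(e - p) <= tolerance_ppm]
        if candidates:
            best = min(candidates, key=lambda e: abs(e - p))
            unmatched.remove(best)
        else:
            missing.append(p)
    return missing, unmatched


def answer_questions(predicted, experimental, tolerance_ppm=0.3):
    """Return (contains predicted spectrum, contains only predicted spectrum)."""
    missing, extra = match_signals(predicted, experimental, tolerance_ppm)
    contains_predicted = not missing
    contains_only_predicted = contains_predicted and not extra
    return contains_predicted, contains_only_predicted


predicted = [1.20, 3.70]                                # hypothetical shifts, ppm
print(answer_questions(predicted, [1.22, 3.66]))        # both answers yes
print(answer_questions(predicted, [1.22, 2.05, 3.66]))  # extra signal at 2.05
print(answer_questions(predicted, [1.22]))              # predicted signal missing
```

An unexplained experimental signal, as in the second call, fails only the second question. This separates a wrong structure from a correct one whose spectrum also contains impurity or solvent signals.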