QSAR model predictions are most reliable when the query compound lies within the model's applicability domain. The Setubal Workshop report provides conceptual guidance for defining a (Q)SAR applicability domain. However, applying this guidance in practice requires an operational definition, one that also permits the design of an automatic (computerised) procedure for determining a model's applicability domain. This paper attempts to address this need for models that use a large number of descriptors (for example, group contribution-based models). The high dimensionality of these models imposes specific computational restrictions on estimating the interpolation region. The Syracuse Research Corporation KOWWIN model for prediction of the n-octanol/water partition coefficient is analysed as a case study. This is a linear regression model based on the group contribution approach, using 508 fragment counts and correction factors as descriptors. We conclude that estimating the applicability domain by descriptor ranges, combined with Principal Component rotation as a data pre-processing step, is an acceptable compromise between estimation accuracy and the amount of data in the training set.
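The range-based estimate with Principal Component rotation can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `fit_pca_range_domain` and `in_domain`, the tolerance `tol`, and the toy two-descriptor data are all assumptions made for illustration. The idea shown is to rotate the centred training descriptors onto their principal axes, record the minimum and maximum score along each axis, and treat a query compound as inside the domain only if all of its rotated scores fall within those ranges.

```python
import numpy as np

def fit_pca_range_domain(X_train):
    """Fit a bounding box in principal-component space (hypothetical helper).

    Assumes more training compounds than descriptors, so that the rotation
    covers every descriptor direction.
    """
    X_train = np.asarray(X_train, dtype=float)
    mean = X_train.mean(axis=0)
    centered = X_train - mean
    # Rows of vt are the principal axes (right singular vectors).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt.T
    return mean, vt, scores.min(axis=0), scores.max(axis=0)

def in_domain(X_query, mean, vt, lo, hi, tol=1e-9):
    """Return a boolean per query row: True if every PC score is in range."""
    scores = (np.atleast_2d(np.asarray(X_query, dtype=float)) - mean) @ vt.T
    return np.all((scores >= lo - tol) & (scores <= hi + tol), axis=1)

# Toy data (an assumption): two strongly correlated descriptors.
X_train = np.array([[0.0, 0.0], [1.0, 1.1], [2.0, 1.9], [3.0, 3.0]])
params = fit_pca_range_domain(X_train)

inside = in_domain([1.5, 1.5], *params)[0]
# (0, 3) lies within both raw descriptor ranges, but far from the
# training data once the correlation between descriptors is accounted for.
outside = in_domain([0.0, 3.0], *params)[0]
print(inside, outside)
```

In this toy case the query (0, 3) would pass a check on the raw descriptor ranges, since each coordinate lies between 0 and 3, yet it is rejected after rotation because it sits far off the axis along which the training compounds vary. This is the motivation for rotating before taking ranges when descriptors are correlated.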
Nikolova-Jeliazkova, N., & Jaworska, J. (2005). An approach to determining applicability domains for QSAR group contribution models: An analysis of SRC KOWWIN. Alternatives to Laboratory Animals, 33(5), 461–470. https://doi.org/10.1177/026119290503300510