Optimization of minimum set of protein-DNA interactions: A quasi exact solution with minimum over-fitting

N. A. Temiz; A. Trapp; O. A. Prokopyev; C. J. Camacho

Journal Article · Open Access

Bioinformatics (2009) 26(3) 319-325

DOI: 10.1093/bioinformatics/btp664

Abstract

Motivation: A major limitation in modeling protein interactions is the difficulty of assessing the over-fitting of the training set. Recently, an experimentally based approach that integrates crystallographic information of C2H2 zinc finger-DNA complexes with binding data from 11 mutants, 7 from EGR finger I, was used to define an improved interaction code (no optimization). Here, we present a novel mixed integer programming (MIP)-based method that transforms this type of data into an optimized code, demonstrating both the advantages of the mathematical formulation to minimize over- and under-fitting and the robustness of the underlying physical parameters mapped by the code.

Results: Based on the structural models of feasible interaction networks for 35 mutants of EGR-DNA complexes, the MIP method minimizes the cumulative binding energy over all complexes for a general set of fundamental protein-DNA interactions. To guard against over-fitting, we use the scalability of the method to probe against the elimination of related interactions. From an initial set of 12 parameters (six hydrogen bonds, five desolvation penalties and a water factor), we proceed to eliminate five of them with only a marginal reduction of the correlation coefficient to 0.9983. Further reduction of parameters negatively impacts the performance of the code (under-fitting). Besides accurately predicting the change in binding affinity of validation sets, the code identifies possible context-dependent effects in the definition of the interaction networks. Yet, the approach of constraining predictions to within a pre-selected set of interactions limits the impact of these potential errors to related low-affinity complexes.

Contact: ccamacho@pitt.edu; droleg@pitt.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

© The Author(s) 2009. Published by Oxford University Press.
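The scoring idea in the Results paragraph can be illustrated with a small Python sketch. It is not the authors' MIP formulation: it only assumes that each complex has a few feasible interaction networks, that a network's energy is a sum of per-interaction parameters, and that the complex is scored by its lowest-energy network. The parameter names, their values, the three toy complexes and the "measured" numbers are all hypothetical and chosen purely for illustration.

```python
from math import sqrt

# Hypothetical parameter values (energy per interaction, arbitrary units).
# The abstract names the parameter classes; these labels and numbers are invented.
FULL_PARAMS = {"hbond": -1.0, "desolv": 0.5, "water": -0.25}

# Toy complexes: each maps to its feasible networks, given as interaction counts.
COMPLEXES = {
    "A": [{"hbond": 2, "desolv": 1}, {"hbond": 1, "water": 1}],
    "B": [{"hbond": 3, "desolv": 2}],
    "C": [{"hbond": 1}, {"water": 2, "desolv": 1}],
}

# Made-up reference energies, standing in for experimental binding data.
MEASURED = {"A": -1.4, "B": -2.1, "C": -0.9}


def network_energy(counts, params):
    """Sum of count * parameter; a parameter absent from params counts as zero."""
    return sum(n * params.get(name, 0.0) for name, n in counts.items())


def predicted_energy(networks, params):
    """Assumption: a complex is scored by its lowest-energy feasible network."""
    return min(network_energy(counts, params) for counts in networks)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def score(params):
    """Correlation between predicted and reference energies over all complexes."""
    names = sorted(COMPLEXES)
    predicted = [predicted_energy(COMPLEXES[n], params) for n in names]
    reference = [MEASURED[n] for n in names]
    return pearson(predicted, reference)


def eliminate(params, name):
    """Return a copy of params with one parameter removed (treated as zero)."""
    return {k: v for k, v in params.items() if k != name}


r_full = score(FULL_PARAMS)
r_reduced = score(eliminate(FULL_PARAMS, "water"))
print(f"full set: r = {r_full:.4f}; without 'water': r = {r_reduced:.4f}")
```

Comparing the correlation before and after dropping a parameter mirrors, in a very reduced form, the elimination test described in the abstract: a parameter whose removal barely changes the fit is a candidate for elimination, while a sharp drop signals under-fitting. The actual method optimizes the parameter values themselves with a MIP solver, which this sketch does not attempt.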

Citation (APA)

Temiz, N. A., Trapp, A., Prokopyev, O. A., & Camacho, C. J. (2009). Optimization of minimum set of protein-DNA interactions: A quasi exact solution with minimum over-fitting. Bioinformatics, 26(3), 319–325. https://doi.org/10.1093/bioinformatics/btp664
