Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites

  • Gershenzon N
  • Stormo G
  • Ioshikhes I
  • 64

    Readers

    Mendeley users who have this article in their library.
  • 59

    Citations

    Citations of this article.

Abstract

Position-weight matrices (PWMs) are broadly used to locate transcription factor binding sites in DNA sequences. The majority of existing PWMs provide a low level of both sensitivity and specificity. We present a new computational algorithm, a modification of the Staden-Bucher approach, that improves the PWM. We applied the proposed technique on the PWM of the GC-box, binding site for Sp1. The comparison of old and new PWMs shows that the latter increase both sensitivity and specificity. The statistical parameters of GC-box distribution in promoter regions and in the human genome, as well as in each chromosome, are presented. The majority of commonly used PWMs are the 4-row mononucleotide matrices, although 16-row dinucleotide matrices are known to be more informative. The algorithm efficiently determines the 16-row matrices and preliminary results show that such matrices provide better results than 4-row matrices.

Get free article suggestions today

Mendeley saves you time finding and organizing research

Sign up here
Already have an account ?Sign in

Find this document

Get full text

Authors

Cite this document

Choose a citation style from the tabs below

Save time finding and organizing research with Mendeley

Sign up for free