The huge influx of protein sequence and structure information is becoming more a quagmire of data than the font of knowledge that was anticipated. The latest tally of sequences in GenBank stands at >100 gigabases, and the Protein Data Bank (PDB) holds 34,917 structures (as of January 31, 2006). Of those, ~40% and 1%, respectively, are characterized as "unknown function." The comparatively low fraction of unknowns in the PDB reflects the large effort spent solving each protein structure, part of which is directed to functional characterization. Nevertheless, the unknown-function segment is growing rapidly in the PDB and in Structural Genomics centers (Chandonia and Brenner 2006). With the advent of cheaper and faster techniques, both for sequencing and for solving protein structures, we can only expect this trend to accelerate. The best example of this trend is the recent flood of environmental genomic data (metagenomics), which is already dwarfing the output of all previous genome sequencing efforts and consists almost solely of predicted proteins with unknown functions (Tringe and Rubin 2005).
Friedberg, I., Jambon, M., & Godzik, A. (2006). New avenues in protein function prediction. Protein Science, 15(6), 1527–1529. https://doi.org/10.1110/ps.062158406