Unimod: Protein modifications for mass spectrometry.
Proteomics (2004)
- PubMed: 15174123
Available from www.ncbi.nlm.nih.gov
Abstract
Unimod is a database of protein modifications for use in mass spectrometry applications, especially protein identification and de novo sequencing. It contains accurate and verifiable values, derived from elemental compositions, for the mass differences introduced by both natural and artificial modifications.
David M. Creasy and John S. Cottrell
Matrix Science Ltd., London, UK

Keywords: Database / Mass spectrometry / Protein modifications

Received 30/9/03; Revised 29/12/03; Accepted 10/1/04
Proteomics 2004, 4, 1534-1536

1 Introduction

Unimod (http://www.unimod.org/) is a database of protein modifications for use in mass spectrometry applications, especially protein identification and de novo sequencing. It contains accurate and verifiable values, derived from elemental compositions, for the mass differences introduced by both natural and artificial modifications. There is no attempt to provide a biological context for natural modifications or experimental protocols for artificial modifications. Such information is readily available from other sources. In particular, the RESID database [1] contains extensive information on pre-, co- and post-translational modifications.

1.1 Scope

To be generally useful for mass spectrometry-based protein identification and characterization, a database of modifications needs to meet the following requirements:
(i) It should be comprehensive, covering pre-, co- and post-translational modifications, deliberately introduced chemical and isotopic modifications, and artefacts of sample handling.
(ii) Both monoisotopic and average mass values should be provided.
(iii) The mass values should be automatically computed from verifiable elemental compositions using accurate atomic weights [2] and atomic masses of selected nuclides [3].
(iv) For each modification, site specificity information should be provided. That is, which amino acid residues or termini are susceptible to modification and any constraints on the position of the modification within the protein or peptide.
(v) Related data of importance to mass spectrometry applications should be included. For example, details of any neutral loss (side chain cleavage) that may occur during analysis.
(vi) Records should be cross-referenced to other databases and the primary literature.
(vii) There should be a user interface for sorting, filtering and searching the records.
(viii) It should be possible to export the data in formats that can be used by search engines and other software applications.

To the best of our knowledge, no existing resource fulfils all these requirements. The most comprehensive compilation is probably Delta Mass [4]. Unfortunately, it is limited to integer values for average masses and most records do not include either specificity information or an elemental composition. FindMod [5], a web-based search tool from ExPASy (http://www.expasy.org/tools/findmod/findmod_masses.html), tabulates more complete information for a limited number of modifications, mostly post-translational.

Unimod is work in progress, and currently falls well short of the first requirement, to be comprehensive. In the belief that many hands make light work, Unimod has been implemented as an open access, Web-based resource, to which anyone can add a new record. Hopefully, if adding a new record to Unimod is no more work than hand calculating the mass values, coverage will grow steadily. The disadvantage of an open-access approach is that quality control is difficult. Many records lack even a single literature reference, and the overall result will not be to the taste of a perfectionist. We can only respond that, in this instance, an imperfect something is better than nothing.
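Requirements (ii) and (iii) above amount to summing per-element masses over an elemental composition, once with monoisotopic values and once with average atomic weights. The following is a minimal Python sketch of that arithmetic, not Unimod's own code. The function name delta_mass, the two lookup tables, and the rounded atomic values are assumptions made for illustration, and the compositions used (C2H2O for acetylation, O for oxidation) are standard chemistry rather than values quoted from the paper.

```python
# Illustrative sketch only: rounded atomic values, hypothetical names.

# Mass of the most abundant natural isotope of each element (approximate, in Da).
MONOISOTOPIC_MASS = {
    "H": 1.007825032,
    "C": 12.0,
    "N": 14.003074005,
    "O": 15.994914620,
    "S": 31.972071,
}

# Abundance-weighted atomic weight of each element (approximate, in Da).
AVERAGE_MASS = {
    "H": 1.00794,
    "C": 12.0107,
    "N": 14.0067,
    "O": 15.9994,
    "S": 32.065,
}


def delta_mass(composition, table):
    """Sum element masses over a composition such as {"C": 2, "H": 2, "O": 1}.

    Counts may be negative, because a modification can remove atoms.
    """
    try:
        return sum(table[element] * count for element, count in composition.items())
    except KeyError as exc:
        raise ValueError(f"no mass tabulated for element {exc.args[0]!r}") from None


# Example compositions (assumed for illustration).
ACETYL = {"C": 2, "H": 2, "O": 1}
OXIDATION = {"O": 1}

if __name__ == "__main__":
    for name, comp in (("Acetyl", ACETYL), ("Oxidation", OXIDATION)):
        mono = delta_mass(comp, MONOISOTOPIC_MASS)
        avg = delta_mass(comp, AVERAGE_MASS)
        print(f"{name}: monoisotopic {mono:.6f}, average {avg:.4f}")
```

Keeping the composition as the stored quantity and deriving both masses from it is what makes the values verifiable: anyone can recompute them from the same atomic tables.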
Average mass values are calculated using the atomic weight of each constituent element, which is the weighted average of all its natural isotopes. Monoisotopic mass values are calculated using the mass of the most abundant natural isotope of each constituent element. This definition of monoisotopic mass conforms to that given in "Standard Definitions of Terms Relating to Mass Spectrometry" [6]. Occasionally, literature sources mistakenly refer to monoisotopic mass as being calculated from the lightest natural isotope of each element. While this definition leads to the same result for elemental compositions containing just CHNOS, it produces incorrect values if the composition includes elements such as iron or selenium.

Correspondence: Dr. John S. Cottrell, Matrix Science Ltd., 8 Wyndham Place, London W1H 1PP, UK
E-mail: jcottrell@matrixscience.com
Fax: +44-20-7725-9360
© 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim, www.proteomics-journal.de
DOI 10.1002/pmic.200300744

Site specificity is defined by choosing from two lists of controlled terms. The first list, Site, contains entries for the twenty standard amino acid residues plus "C-term" and "N-term". The second list, Position, contains 5 entries: "Anywhere", "Any N-term", "Any C-term", "Protein N-term", "Protein C-term". A combination of entries from these two lists covers the majority of cases:
(i) Modifications that affect a residue or group of residues, independent of position, e.g., oxidation of methionine.
(ii) Modifications that affect a terminus, independent of the terminal residue, e.g., methyl esterification of a carboxy terminus.
(iii) Modifications that affect a residue, only when it is at a peptide terminus, e.g., transformation of methionine to homoserine following cleavage by cyanogen bromide.
(iv) Modifications that affect the original terminus of the intact protein, but not new peptide termini created by digestion, e.g., post-translational acetylation of the protein amino terminus.

Each specificity definition also includes an optional, free text comment field. This can be used for information relating to a particular specificity (e.g., "rare" or "only at low pH"), augmenting any notes that apply to the modification record as a whole.
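The Site and Position lists above can be read as a small data model plus a matching rule. The Python sketch below is one possible interpretation, not Unimod's implementation. The class name Specificity, the function candidate_indices, and the two boolean flags marking whether a peptide sits at the original protein terminus are all hypothetical names introduced here.

```python
from dataclasses import dataclass

# Controlled terms taken from the text; one-letter residue codes are assumed.
RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")
SITES = RESIDUES | {"N-term", "C-term"}
POSITIONS = {"Anywhere", "Any N-term", "Any C-term",
             "Protein N-term", "Protein C-term"}


@dataclass(frozen=True)
class Specificity:
    site: str
    position: str
    comment: str = ""  # optional free text, e.g. "rare"

    def __post_init__(self):
        if self.site not in SITES:
            raise ValueError(f"unknown site: {self.site!r}")
        if self.position not in POSITIONS:
            raise ValueError(f"unknown position: {self.position!r}")


def candidate_indices(peptide, spec, at_protein_nterm=False, at_protein_cterm=False):
    """Return 0-based residue indices of `peptide` where `spec` could apply."""
    if not peptide:
        return []
    last = len(peptide) - 1

    # Step 1: the Position term restricts which indices are eligible.
    if spec.position == "Anywhere":
        allowed = list(range(len(peptide)))
    elif spec.position == "Any N-term":
        allowed = [0]
    elif spec.position == "Any C-term":
        allowed = [last]
    elif spec.position == "Protein N-term":
        allowed = [0] if at_protein_nterm else []
    else:  # "Protein C-term"
        allowed = [last] if at_protein_cterm else []

    # Step 2: the Site term selects a terminus or a specific residue.
    if spec.site == "N-term":
        return [i for i in allowed if i == 0]
    if spec.site == "C-term":
        return [i for i in allowed if i == last]
    return [i for i in allowed if peptide[i] == spec.site]


if __name__ == "__main__":
    oxidation = Specificity("M", "Anywhere")
    acetyl = Specificity("N-term", "Protein N-term")
    print(candidate_indices("MKTAMR", oxidation))
    print(candidate_indices("MKTAMR", acetyl, at_protein_nterm=True))
```

Separating the two steps mirrors the two controlled lists: the four cases in the text differ only in which Site and Position terms are combined, so no case needs special handling.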
1.2 Nomenclature

There is a pressing need for a standardized nomenclature for modifications. Many software applications present lists of modifications to the user. At present, some applications describe modifications by the reagent (e.g., iodoacetamide), others describe the resulting entity (e.g., carbamidomethyl (C) or carboxyamidomethyl-cys), while others use ad hoc abbreviations (e.g., Cys_CAM). Not only is there potential for confusion, but this also increases the difficulty of creating standardized input and output formats for data interchange. From a practical perspective, standardized names need to be semidescriptive or, at least, recognizable, without becoming excessively long. Possibly, a limit on length of 16 characters, chosen from ASCII letters and digits, plus the punctuation characters plus, minus, underscore, and curly braces would provide sufficient flexibility. Names should not be case-sensitive, allowing case to be manipulated to increase readability. Finally, most modification names should not incorporate the specificity, because a single modification may have multiple specificities. For example, better to have a single modification named Me-Ester, which can be combined with specificity descriptors as required, than Me-Ester-D, Me-Ester-E, Me-Ester-C-term, etc. Naturally, there will be exceptions to this, such as Pyro-glu. The authors hope that an appropriate body will address this issue in the near future. Until then, Unimod only enforces a condition that the full and short names for each modification are unique within the database.

2 Practical information

In September 2003, Unimod contained 146 modification records, encompassing a total of 226 site specificities. An example of the detailed entry display is shown in Fig. 1. There are some 23 N-linked glycan records, corresponding to experimentally observed glycan compositions of less than 1 kDa.
Entries have not been added for remaining known N-linked and O-linked glycan compositions up to 2 kDa, because the number of records would increase by more than 400, making it difficult to browse non-glycan entries. This will be addressed in the near future by allowing entries to be filtered by classification.

The database is freely accessible on the Internet at http://www.unimod.org/. Anyone can register to add a new record on-line and, in doing so, becomes the curator of that record. Anyone wishing to correct or amend an existing record must send a change request to the record's curator by e-mail. E-mail addresses are not openly displayed, because this would expose the curator to spam. Instead, the Unimod interface includes an integrated form for sending an initial e-mail to the record curator. As of September 2003, 57 users had registered to curate records or to communicate with record curators by e-mail.

The entire contents of Unimod are available for download in two formats, with files updated weekly:
(i) Complete database dump in XML format, suitable for machine translation into other formats.
(ii) Mascot mod_file format [7].

Each new or modified record is checked for apparent validity before the downloadable files are rebuilt. The most common problem is duplication, where someone has created a new record for a modification that was already present in the database. To date, there has only been a single case of someone adding a record that was clearly intended to be meaningless.

Unimod has been placed in the public domain by means of a copyleft licence [8].
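The naming constraints floated in Section 1.2 (at most 16 characters drawn from ASCII letters, digits, plus, minus, underscore and curly braces, compared without regard to case) are concrete enough to check mechanically. The Python sketch below is a hypothetical validator for that proposal. The paper states that Unimod itself enforces only uniqueness of names, so the character and length checks, the case-insensitive comparison, and the names is_valid_name and find_duplicates are assumptions made for illustration.

```python
import re
from collections import Counter

# 1 to 16 characters: ASCII letters, digits, '+', '-', '_', '{', '}'.
NAME_RE = re.compile(r"[A-Za-z0-9+\-_{}]{1,16}")


def is_valid_name(name):
    """True if `name` fits the character set and length limit proposed in the text."""
    return NAME_RE.fullmatch(name) is not None


def find_duplicates(names):
    """Return names (case-folded, sorted) that occur more than once ignoring case."""
    counts = Counter(name.casefold() for name in names)
    return sorted(key for key, count in counts.items() if count > 1)


if __name__ == "__main__":
    for candidate in ("Me-Ester", "Pyro-glu", "Cys_CAM", "carbamidomethyl (C)"):
        print(candidate, is_valid_name(candidate))
    print(find_duplicates(["Me-Ester", "me-ester", "Pyro-glu"]))
```

Comparing case-folded names keeps the uniqueness check consistent with the suggestion that case should be free to vary for readability.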


