Genome annotation pipelines traditionally exclude open reading frames (ORFs) shorter than 100 codons to avoid false identifications. However, studies have shown that these small ORFs (smORFs) may encode functional microproteins with meaningful biological roles. We developed μProteInS, a proteogenomics pipeline that combines genomics, transcriptomics and proteomics to identify novel microproteins in bacteria. The pipeline employs a model to filter out low-confidence spectra, removing the need to inspect mass spectrometry data manually. It also overcomes a shortcoming of traditional approaches, which usually exclude overlapping genes, leaderless transcripts and non-conserved sequences: characteristics that are common among smORFs and hamper their identification.
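The 100-codon cutoff mentioned above can be illustrated with a short sketch. The Python below is not part of μProteInS and is not the authors' method; it is a hypothetical illustration that assumes the common bacterial start codons ATG, GTG and TTG, an arbitrary 10-codon lower bound (`min_codons`), and made-up helper names (`find_orfs`, `split_by_length`). It scans all six reading frames independently, so ORFs that overlap in different frames or on opposite strands are all kept, and then separates those under 100 codons that a length-filtered annotation would discard.

```python
# Hypothetical sketch, NOT the μProteInS implementation.
# Assumptions: bacterial start codons ATG/GTG/TTG, standard stop codons,
# and an arbitrary minimum ORF length of 10 codons.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
SMORF_MAX_CODONS = 100  # length cutoff used by traditional annotation

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (non-ACGT left as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 10):
    """Return (strand, start, end, n_codons) tuples for ORFs on both strands.

    Coordinates are 0-based and half-open on the strand that was scanned.
    n_codons counts from the start codon up to, but excluding, the stop codon.
    Each frame is scanned separately, so overlapping ORFs are all retained.
    Simplification: ORFs sharing a stop codon are reported once, using the
    most upstream start codon in that frame.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            # Step through complete codons only.
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand, start, i + 3, n_codons))
                    start = None
    return orfs


def split_by_length(orfs, cutoff: int = SMORF_MAX_CODONS):
    """Split ORFs into those below the cutoff (smORFs) and the rest."""
    small = [o for o in orfs if o[3] < cutoff]
    large = [o for o in orfs if o[3] >= cutoff]
    return small, large


if __name__ == "__main__":
    short_gene = "ATG" + "GCT" * 20 + "TAA"   # 21-codon ORF
    long_gene = "ATG" + "GCT" * 120 + "TAG"   # 121-codon ORF
    small, large = split_by_length(find_orfs(short_gene) + find_orfs(long_gene))
    print("smORFs:", small)
    print("longer ORFs:", large)
```

In this sketch the 21-codon ORF lands in the smORF list that a 100-codon filter would drop, while the 121-codon ORF passes. Real proteogenomics pipelines add evidence from transcriptomics and mass spectrometry on top of such candidate lists rather than relying on sequence scanning alone.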
De Souza, E. V., Dalberto, P. F., Machado, V. P., Canedo, A., Saghatelian, A., Machado, P., … Bizarro, C. V. (2022). μProteInS: A proteogenomics pipeline for finding novel bacterial microproteins encoded by small ORFs. Bioinformatics, 38(9), 2612–2614. https://doi.org/10.1093/bioinformatics/btac115