Research in Microbiology 156 (2005) 424–433
www.elsevier.com/locate/resmic

Repetitive extragenic palindromic sequences in the Pseudomonas syringae pv. tomato DC3000 genome: extragenic signals for genome reannotation

Raquel Tobes *, Eduardo Pareja

Bioinformatics Unit, Era7 Information Technologies, C/Río Tajo 49, Las Gabias, Granada 18110, Spain

Received 27 August 2004; accepted 13 October 2004
Available online 24 December 2004

* Corresponding author. E-mail address: rtobes@era7.com (R. Tobes).
0923-2508/$ – see front matter © 2004 Elsevier SAS. All rights reserved. doi:10.1016/j.resmic.2004.10.014

Abstract

Repetitive extragenic palindromic (REP) sequences were first described in Enterobacteriaceae and later in Pseudomonas putida. We have detected a new variant (51 base pairs) of REP sequences that appears to be disseminated in more than 300 copies in the Pseudomonas syringae DC3000 genome. The finding of REP sequences in P. syringae confirms the broad presence of this type of repetitive sequence in bacteria. We analyzed the distribution of REP sequences and the structure of the clusters, and we show that palindromy is conserved. REP sequences appear to be located in the extragenic space, with a marked preference for intergenic spaces bounded by convergent genes, while they are scarce between divergent genes. Using REP sequences as markers of extragenicity, we re-annotated a set of genes of the P. syringae DC3000 genome, demonstrating that REP sequences can be used to refine the annotation of a genome. The similarity detected between virulence genes from evolutionarily distant pathogenic bacteria suggests the acquisition of clusters of virulence genes by horizontal gene transfer. We did not detect P. syringae REP elements in the principal pathogenicity gene clusters. This absence suggests that genome fragments lacking REP sequences could point to regions recently acquired from other organisms, and REP sequences might be new tracers for gaining insight into key aspects of bacterial genome evolution, especially when studying pathogenicity acquisition. In addition, as the P. syringae REP sequence is species-specific with respect to the sequenced genomes, it is an exceptional candidate for use as a fingerprint in precise genotyping and epidemiological studies.

© 2004 Elsevier SAS. All rights reserved.

Keywords: Pseudomonas syringae; Repetitive extragenic palindromic sequences; Genome annotation; Pathogenicity gene clusters; Genotyping

1. Introduction

Higgins et al. [12] and Stern et al. [19] first detected repetitive extragenic palindromic (REP) sequences as a major component of the bacterial genome that occupied up to 1% of the genomes of Escherichia coli and Salmonella typhimurium. They defined the repetitive unit as a sequence of 35 base pairs. In 1991 Gilson et al. [11] detected REP elements forming part of a more complex structure termed BIME (bacterial interspersed mosaic element), composed of clusters of REP sequences with alternating orientation separated by other sequence motifs.

REP sequences are difficult to detect for three reasons: first, they change from species to species (species-specific); second, the sequence changes slightly from copy to copy within a species (imperfect repeats); and third, they are only partially palindromic. These idiosyncratic features explain why special search tools are needed to find them.

Stern et al. [19,20] first related REP sequences to mRNA degradation, chromosome structure and recombination. Yang and Ames [24] identified DNA gyrase, responsible for the maintenance of DNA supercoiling in bacteria, as a REP binding protein in 1988. They underlined the fact that the palindromic nature and the conservation of REP sequences were features especially appropriate for protein recognition. This finding was further confirmed by Espeli et al. [9], who proved that gyrase interacted in vivo with BIME-2 elements and that a pair of diverging REP sequences constituted the target of DNA gyrase. The specific binding of DNA polymerase I to REP sequences was also demonstrated in vitro, indicating that REP sequences could be specific anchorage sites for supercoiled nucleoid domains in prokaryotes, similar to the eukaryotic scaffold-associated regions (SARs) involved in control of chromatin organization and gene expression [10]. In addition, an integration host factor (IHF) recognition sequence was detected at 28 positions of the E. coli genome between two inverted REP sequences. These complex clusters were termed RIP (repetitive IHF binding palindromic), and specific binding of IHF to these RIP clusters was demonstrated experimentally [3,18]. In 1988, Higgins [13] proposed that differential mRNA stability mediated by REP sequences could be responsible for differential gene expression within polycistronic operons. This hypothesis has recently been confirmed in E. coli with the demonstration that REP elements act as mRNA stabilizers that protect 5′-proximal cistrons from 3′→5′ exonucleolytic degradation, determining different levels of gene expression in polycistronic messages [14].

REP sequences have also been proposed as a prokaryotic equivalent of "selfish DNA", considering that gene conversion may play a role in their evolution and maintenance [13].

There is some evidence associating REP elements with genomic plasticity. Thus, REP sequences have been found at the recombination junctions of lambda biotransducing phages [16], and it appears that amplification of plasmid F′128 is initiated by REP–REP recombination [15].
In addition, it was reported that IS1397 and IS621 insert specifically within REP sequences of E. coli and that ISKpn1 inserts into REP sequences of Klebsiella pneumoniae [5,6,22,23]. Thus, the characterization of some REP elements as hot spots for recombination and transposition suggests that REP elements play a key role in adaptive bacterial evolution.

Pseudomonas syringae is an agriculturally important plant pathogen with at least 50 pathovars defined by host specificity. P. syringae enters plant leaves through stomata and produces necrotic lesions that are often surrounded by chlorotic halos. The genome of P. syringae pathovar tomato DC3000 has recently been described [4] and is considered to be a model for most animal and plant pathogens in the gamma Proteobacteria. In this group, pathogenicity seems to rely on type III secretion systems (TTSS) that inject virulence effector proteins into host cells [4].

We detected a species-specific repetitive sequence, scattered throughout the chromosome of P. syringae DC3000, that has the features of REP sequences. We characterized these sequences and, based on REP sequence features, we refined the annotation of the P. syringae DC3000 genome.

2. Materials and methods

This study was carried out using the sequences and annotations of the P. syringae DC3000 genome available at the NCBI (http://www.ncbi.nlm.nih.gov/genomes/MICROBES/Complete.html). Sequence version: NC_004578.1; GI: 28867243.

To detect REP sequences we used a BLAST-based strategy specially designed to detect repetitive sequences in the extragenic space of genomes (R. Tobes and E. Pareja, manuscript in preparation).

To obtain the Logo representing the REP sequences present in P. syringae pv. tomato DC3000 we used the sequence logo generator Weblogo [8], available at http://weblogo.berkeley.edu/.

To predict RNA secondary structure we used the "Vienna RNA Secondary Structure Prediction" server (http://rna.tbi.univie.ac.at/cgi-bin/alifold.cgi).
For multiple alignment of the sequences we used the program Multalin [7], available at http://prodes.toulouse.inra.fr/multalin/multalin.html, and CLUSTAL W [21], available at http://www.ebi.ac.uk/clustalw/#.

We developed C++ programs to assist in the definition of REP sequences and in the analysis of their distribution and clustering.

Detected palindromy for each position of the first putative hairpin was calculated by counting the number of copies in which the symmetrical position contains the corresponding complementary base. Detected palindromy is expressed as a percentage. Expected palindromy for each position was obtained by adding the expected palindromy calculated independently for each base (A, C, G and T). The expected palindromy for each base depends on the frequency of this base at this position and on the frequency of its complementary base at the symmetrical position. Thus, for instance, the expected palindromy for A at position −2 was calculated by multiplying the percentage of A at position −2 by the percentage of T at position +2 and dividing by 100.

3. Results and discussion

We detected a repetitive palindromic sequence in the genome of P. syringae DC3000 that appears in 365 copies in the chromosome (Table 1, supplementary material) and in one copy in plasmid pDC3000B, representing about 2% of the extragenic space of the genome. The sequence of this REP element (Fig. 1) is species-specific and does not display significant similarity with the REP sequences detected in Pseudomonas putida and Enterobacteriaceae (Fig. 2).

The analysis of REP sequence positioning along the P. syringae DC3000 genome (Fig. 3) showed that REP sequences are not uniformly distributed. Some REP sequences are grouped in relatively dense clusters, while the rest are sparsely distributed along the chromosome (Table 1, supplementary material).
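
The detected and expected palindromy defined in Materials and methods can be illustrated with a minimal Python sketch. It is not the authors' C++ code. The function name `palindromy` and its two-column input are assumptions made for illustration: each string holds the bases that the aligned REP copies carry at one of two symmetrical hairpin positions (for example −2 and +2), one character per copy.

```python
from collections import Counter

# Watson-Crick complements used to decide whether two symmetric bases can pair.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def palindromy(left_column, right_column):
    """Return (detected %, expected %) for one pair of symmetric positions.

    left_column[k] and right_column[k] are the bases that copy k carries at
    position -i and at position +i of the putative hairpin (hypothetical
    input layout, chosen for this sketch).
    """
    n = len(left_column)
    if n == 0 or n != len(right_column):
        raise ValueError("columns must be non-empty and of equal length")

    # Detected palindromy: share of copies whose symmetric base is the complement.
    paired = sum(
        1 for a, b in zip(left_column, right_column) if COMPLEMENT.get(a) == b
    )
    detected = 100.0 * paired / n

    # Per-position base percentages.
    left_pct = {b: 100.0 * c / n for b, c in Counter(left_column).items()}
    right_pct = {b: 100.0 * c / n for b, c in Counter(right_column).items()}

    # Expected palindromy: for each base, % of the base on one side times
    # % of its complement on the other side, divided by 100, summed over A, C, G, T.
    expected = sum(
        left_pct.get(b, 0.0) * right_pct.get(COMPLEMENT[b], 0.0) / 100.0
        for b in "ACGT"
    )
    return detected, expected


if __name__ == "__main__":
    # Four hypothetical copies: A/T in three of them, C/G in the fourth.
    print(palindromy("AAAC", "TTTG"))
```

In this toy input every copy pairs, so detected palindromy is 100%, while the expected value is 75 × 75 / 100 + 25 × 25 / 100 = 62.5%. A detected value above the expected one is the kind of signal that would indicate conserved palindromy rather than pairing by base composition alone.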