Resistance genes plus adhesin in the chromosome?

Two new assemblies of the German E. coli outbreak strains were released today, one from BGI (452 scaffolds/contigs; Illumina HiSeq paired end, 500 bp inserts) and one from HPA (13 scaffolds, 454 mate pair). In the HPA assembly, the resistance genes for streptomycin, trimethoprim, sulfamethoxazole and mercury (some of which are carried by a Tn21 transposon and IntI1 integron) are present in the same scaffold as the Ec55989 chromosome (scaffold 2). The picture below shows the mapping of scaffold 2 to the chromosome in blue, the IncI plasmid (which carries the blaTEM and blaCTX-M genes) in green, and the resistance genes of pAKU_1 in red (a plasmid from Paratyphi A, used as a reference here because I’ve worked with it previously and am familiar with interpreting it).

What you can see is that most of the scaffold maps to the chromosome, but the red resistance genes are also present (near the 100 kbp marker). Upstream of the resistance genes, in this scaffold at least, there are some additional regions of low similarity with the Ec55989 chromosome, consistent with a whole stretch of DNA being inserted into the chromosome. Very little of this has any homology to the IncI plasmid which we know is present in the strain, consistent with the idea that these resistance genes are not present on the IncI plasmid. This whole region is conserved in the BGI genome, shown in purple.

This could all be a scaffolding and assembly error, but looking at the mate pair reads should confirm or deny this.
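As a rough illustration of the mate-pair check, here is a minimal Python sketch that counts read pairs whose two mates map to different, specified contigs. The input format (a list of contig-name pairs, one per mate pair) and the contig names are assumptions made for the example; a real analysis would take these from a read mapper's output and would also check orientation and insert size.

```python
def count_linking_pairs(pair_mappings, contig_a, contig_b):
    """Count mate pairs with one mate on contig_a and the other on contig_b.

    pair_mappings: iterable of (contig_of_mate1, contig_of_mate2) tuples.
    This is a hypothetical, simplified input; orientation and insert size
    are ignored here.
    """
    wanted = {contig_a, contig_b}
    return sum(
        1
        for c1, c2 in pair_mappings
        if c1 != c2 and {c1, c2} == wanted
    )


# Made-up example data: two pairs link contig_7 and contig_8.
example_pairs = [
    ("contig_7", "contig_8"),
    ("contig_8", "contig_7"),
    ("contig_7", "contig_7"),
    ("contig_8", "contig_9"),
]
links = count_linking_pairs(example_pairs, "contig_7", "contig_8")
```

Many pairs linking two neighbouring contigs would support the scaffold join; few or none would suggest the join needs a closer look.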

 

Update: A closer look at the region suggests the scaffold may be correct. Below is a mapping of the new scaffold (second line) against Ec55989 (top line) and E. coli S88 (bottom line), showing the site of the possible insertion. The novel sequence is inserted into a tRNA sequence, which is typical of many integrase-mediated insertions. On the right of the insertion is an integrase gene with 100% identity to the integrase sai in Shigella flexneri 2002017. Part of the tRNA is duplicated on the left-hand side of the insertion, again typical of a real integrase-mediated insertion.

Possible insertion of resistance genes and adhesin in chromosome
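One simple way to look for the partial tRNA duplication described above is to search for the longest stretch of sequence shared between the tRNA and the flank on the other side of the insertion. The Python sketch below does this with a plain dynamic-programming longest-common-substring search. The two sequences are invented for illustration and are not taken from the assembly; this is not necessarily how the comparison in the figure was made.

```python
def longest_shared_substring(a, b):
    """Return the longest substring present in both a and b (first found on ties)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous position in both strings.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


# Hypothetical flanking sequences (not real data).
left_flank = "AAGGCTTACG"
trna_end = "TTGGCTTAAA"
repeat = longest_shared_substring(left_flank, trna_end)
```

On real data this would be run on a few hundred bases either side of the insertion, and a repeat of meaningful length matching the 3' end of the tRNA would be the thing to look for.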

Acquired adhesin

So what exactly is in the insertion? To the left is a stretch of sequence with homology to pathogenicity island genes from several E. coli and Shigella genomes, including E. coli S88, E. coli 042, E. coli SE15 and the Shigella flexneri 2a SRL pathogenicity island. The homology to S88 is shown in the figure. This region contains a protein with an autotransporter domain, annotated in some genomes as flu or Ag43, in others as an aidA-like adhesin, etc. So it is associated with adhesion.

Here is a phylogenetic tree showing the closest matching proteins (by NCBI blastp):

Adhesin inserted in O104:H4 outbreak strain (tree of similar proteins)

And the closest matching DNA sequences (by NCBI blastn):

Adhesin inserted in O104:H4 outbreak strain (tree of DNA sequences)

Acquired multidrug resistance

The rest of the insertion contains small hypothetical genes of unknown function, plus several common mobile elements associated with drug resistance.

Immediately adjacent to the “pathogenicity island” sequences described above (and present in the same contig) are part of tniAdelta (part of Tn21), a pecM-like permease and two tetracycline resistance genes (tetA, tetR):

The next contig contains a mercury resistance operon usually found in Tn21:

And the next contig contains a sequence with strA, strB (resistance to streptomycin) and sul2 (resistance to sulfonamides)… and then a new contig containing part of a Tn21-like transposon including tnpA, tnpM, tnpR and an IntI1-like integron which appears to have two resistance genes in the cassette (sul1 and a dihydrofolate reductase; I think it is A7). Next door in the same contig is an IS1 transposase and then the sai integrase, and then we are back to the tRNA-Sec sequence and chromosomal genes:

So given the sequence context, it is likely that the scaffold is correct in grouping these contigs together in this order, as it looks like a common and plausible gene order, with a possible mechanism for mobilisation. I’m used to seeing these resistance genes in plasmids, so to convince myself they can also be integrated into the chromosome I had a quick look for similar integrations in other E. coli. Here is one with a very similar set of resistance elements, even in the same order, inserted into the chromosome of EAEC E. coli 042 (although in this case it is not associated with the adhesin element mentioned above).

A similar syntenic set of resistance genes is present in the E. coli 042 (EAEC) chromosome (top) and the German outbreak strain's chromosome (bottom)

Data: My manual annotation of this region in the HPA assembly is available here. It can be loaded into Artemis as an entry on top of the HPA scaffold. ACT comparisons on request but you can easily make your own using WebACT.


MLST of IncI blaCTX-M plasmid in German outbreak strain

Overnight I received an email from Scott Weissman at the Seattle Children’s Hospital. He has done some analysis of the IncI, blaCTX-M bearing plasmid from the outbreak strain using the plasmid MLST database. Here’s what he did:

To facilitate comparisons to other plasmids, I analyzed the LB226692 contigs in order to identify a plasmid Sequence Type (pST) for this outbreak strain’s IncI1 plasmid carrying CTX-M-15 and TEM-1. I extracted fragments for the 5 MLST loci (as described at http://pubmlst.org/plasmid/primers/incI1.shtml) from the GenBank contigs, and obtained allele assignments as follows: repI 3 | ardA 4 | trbA 6 | sogS 3 | pilL 3, which corresponds to pST31. (I should note that the extracted sequence for trbA contained a 1-nt “insertion” relative to reference allele 6. Given that the indel occurs within a poly-T tract of 4 T’s, I assume it is a sequencing artifact, although a novel allele cannot be excluded.)

The database contains 15 plasmid entries for IncI1 pST31 (see below), including pEC_Bactec (described by Smet et al, PLoS One, 2011;5:e11202).  All of these entries carry the CTX-M-15 and TEM-1 enzymes, so there are no headlines here.

I would note, however, that this CTX-M-15 plasmid is distinct from the IncF-family plasmids that have been globally distributed by E. coli ST131 (eg, pC15a-1a, as described by Boyd et al, AAC 2004;48:3758-64) and detected subsequently in multiple Klebsiella pneumoniae clones (see Oteo et al, JAC, 2009;64:524-528).

IncI pST31 entries in the IncI pMLST database

To supplement this I had a quick look at the latest BGI assembly of TY2482, the other outbreak strain that has been sequenced. I found the same results, but this time with a precise trbA allele 6 (i.e. Scott was right in guessing this is an error in the Ion Torrent data at a homopolymeric tract).
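The reasoning about the trbA discrepancy (a single-base indel sitting inside a homopolymer run is more likely a sequencing error than a new allele) can be written down as a small check. This is a minimal Python sketch under my own assumptions: the function name, the `min_run` threshold and the toy sequences are all hypothetical, and it is not the procedure used by the pMLST database.

```python
def single_indel_in_homopolymer(ref, query, min_run=3):
    """True if query differs from ref by exactly one inserted or deleted base
    that lies within (or extends) a run of at least min_run identical bases in ref."""
    if abs(len(ref) - len(query)) != 1:
        return False
    longer, shorter = (query, ref) if len(query) > len(ref) else (ref, query)

    # Walk to the first position where the two sequences disagree.
    i = 0
    while i < len(shorter) and longer[i] == shorter[i]:
        i += 1

    # Dropping that one base from the longer sequence must give the shorter one.
    if longer[:i] + longer[i + 1:] != shorter:
        return False

    base = longer[i]
    # Measure the run of `base` in the reference around the indel position.
    run = 0
    left = i - 1
    while left >= 0 and ref[left] == base:
        run += 1
        left -= 1
    right = i
    while right < len(ref) and ref[right] == base:
        run += 1
        right += 1
    return run >= min_run


# Toy sequences only; these are not the real trbA alleles.
toy_ref = "ACTTTTG"          # contains a poly-T tract of 4 T's
toy_query = "ACTTTTTG"       # one extra T in the tract
likely_artifact = single_indel_in_homopolymer(toy_ref, toy_query)
```

A `True` result only flags the difference as a plausible homopolymer artifact; confirming it still needs independent data, which is what the BGI assembly provided here.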

Re the table above, the paper describing the 2004 Shigella sonnei plasmid is here; I’m not sure if the others are published.

This is what the eBURST diagram looks like for the data in the IncI plasmid MLST database…the German outbreak sequence type, pST31, is pointed out with a red arrow.

eBURST diagram for IncI plasmid MLST

The pMLST sequences from TY2482 (BGI assembly 2, June 6) are:

>repI1-3
 GAGAGATGGCATGTACGGGCAGTAAGTCAGAAGACTGAAGATGCTCCGGAAGCCATAAAA
 GGAAAACCCCCACTATCTTTCTTACGAACTTGGCGGAACGACGAA
 >ardA-4
 AATACAACTGTGGAAGCATCGCCGGACGCTGGTTTGACCTGACCACGTTTGATGATGAGC
 GCGACTTTTTCGCCGCCTGCCGTGCTCTTCACCAGGATGAAGCCGATCCTGAACTGATGT
 TTCAGGATTATGAGGGATTCCCGGGGAATATGGCCTCTGAATGCCATATCAACTGGGCCT
 GGGTTGAAGGCTTCCGCCTGGCACGGGATGAAGGCTGCGAAGAGGCTTATCGTCTCTGGG
 TGGAGGATACCGGTGAGACGGATTTTGACACCTTCCGCGATGCCTGGTGGGGCGAGGCTG
 ACAGTGAGGAGGCTTTTGCGGTTGAGTTCGCCAGTGATACCGG
 >trbA-6
 GCAACCCGCCGCTCAGGCCGTTTGCCACCATGAAAGAGTTTTTCCGGATCACCATCTGCC
 AGTACTGGGGCGATAGCAGGGGACAACGAGGCAAAGATGTGTGGCAGTCGGGTAATATCT
 ACAGGTCTGCGGGTGAAACGGCTTTGTCCCGGGTGTTGATACCATTCCCATAAACACCAG
 AGTGTCACAGGTAAAAGATACATCCACAGAATACCTATGGTCTGCTCCATGACGTTAATC
 CACTGGCTATAGCTTATGTTGGCGGCGTTGTTACCGGTCATGGCAAGCAGGTTGTATCGT
 GGTGCTGCATAGTTATGAAATGGTCCCCAGTCGACCAGTCCCCATAAGGTATGAAGAATC
 AGGCAGCTGGCGTAAACCACTTCCGGCAAGAATAACCATATGACGAATAGCAGCAGGATA
 AGAAGGACACCAACAGCGCCCCATATCTGCATAGGATCTTCTGCGACAGGCTGTCGATTG
 TAAGA
 >sogS-3
 GTCGTCGTGGTTTCCGCTGAGGGCGTGGGATCACTGTTCTCATGCGCCTGTGAATCCGTT
 TTTTTACGCGTAAAAAGGCCACGCGCTTTGTCGAGAAACGATGAAGTATTATCAGAAGGT
 GATGTGCTCTGAACAGGTTGCTGCGGAGTGGGTTCATCCCGGACAGCCGGTTCTATAGTG
 GCTGTTGTGGCCTGAAGTTCTGACTCATCTGCCTGAACGTGGCCTGTGTCCGGTTGCGAC
 GGCATATCACTGTT
 >pilL-3
 TTGATGCCATGCTTTCGCATTTTGTTTCTTCTGCCCACTTAATAATGTTTTCCCTTAATG
 TAGTGCCTGCCGGCGCACGCCACTCTTTACCCTGAGATACCGGTTTGACAGGTGTCCCGG
 TCATGAGTGGGATAGACTTGACTGTAGAGCCGGTCGGAGTCGGGATTGCTGCGGGCGTAG
 ACGGAGATACGCTGTTTCCCCTGAATGGGTTTCGTGGTTTGTTCTGGCTATTTGCTGTCG
 TTGAAGACTCCGGGGAAGTGGATGTGGTTACCATGG

STEC/EHEC outbreak – horizontally transferred genes

In the German outbreak bacteria, as in most E. coli, plenty of horizontal transfer has gone on to create the genome we are now looking at.

I’ve done about all I’m going to on this analysis, at least until some more complete data is released… but I did generate a summary plot and have a quick look at the origins of the stx, ter and other acquired genes.

This is a quick look at what the outbreak strain’s genome looks like:

Previously known sequences that are present in the outbreak genome

What is this showing us? Firstly, as established by others’ work mapping reads and contigs to the available E. coli reference genome sequences, the chromosome of the outbreak strain is most similar to strain Ec55989, an enteroaggregative E. coli (EAEC) isolated in Africa over a decade ago [central circle in figure]. It shares with this strain part of the EAEC plasmid [55989p, top right] carrying aggregative adhesion operons aat, the regulator aggR and some other bits, but it has a different aggregative adhesion fimbrial complement (AAF/I) from Ec55989. It has also acquired the stx2 phage carrying Shiga toxin 2 genes stx2A, stx2B [top left]; a plasmid sharing high similarity with the IncI plasmid pEC_Bactec, including blaCTX-M and blaTEM-1 beta-lactamase (antibiotic resistance) genes [bottom left]; and a lot of sequence similar to plasmid pCVM29188_101 from Salmonella enterica Kentucky [bottom left]. The circles represent the sequence of the plasmids and phage (previously sequenced and deposited in GenBank) that are most similar to sequences in the novel strain. The green rings indicate which parts of these reference sequences are also present in the novel German strain (via BLAST comparison with TY2482/MIRA contigs)… so nearly all of the Ec55989 chromosome and pEC_Bactec plasmid, and not quite all of the other phage & plasmid sequences.

There is a further 300-500 kbp of sequence that doesn’t match any of these 5 reference sequences, but we can get a feel for these by searching deeper in the GenBank database via BLAST, and using the wonderful annotation provided by ERA7. [Annotation for just these contigs here.] I haven’t had a chance to look through these properly yet, but of course there is the tellurium resistance operon ter, which we expect because phenotypically the strain was noted as tellurium resistant some time ago.

The origin of the Shiga toxin phage is interesting. The toxin genes themselves (subunits A & B) are 100% identical at the nucleotide level to other stx2 toxins in NCBI, see alignment here showing precisely identical reference sequences. I mapped contigs (TY2482, MIRA assembly) to the VT2 phage to identify those that are likely to be part of the acquired phage. Using these sequences to search NCBI (nr, blastn), the closest match was to Stx2 phage I (accession AP004402, 100% identity across 81%)…but obviously the phage acquired by the German strain is a bit different because the whole of Stx2 phage I is not present (approx 20% missing, top left in figure above).
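A figure like "across 81%" is the fraction of the reference covered by at least one hit. As a minimal sketch of that calculation (not the exact method behind the numbers above), the Python below merges overlapping hit intervals on the reference and divides the covered length by the reference length. The coordinates in the example are invented.

```python
def covered_fraction(hits, ref_length):
    """Fraction of a reference covered by hits.

    hits: iterable of (start, end) positions on the reference, 1-based and
    inclusive; start may be greater than end for reverse-strand hits.
    """
    intervals = sorted((min(s, e), max(s, e)) for s, e in hits)
    covered = 0
    cur_start = cur_end = None
    for s, e in intervals:
        if cur_end is None or s > cur_end + 1:
            # Close the previous merged block and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = s, e
        else:
            # Overlapping or directly adjacent: extend the current block.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / ref_length


# Invented hit coordinates on a reference of length 100.
example_hits = [(1, 50), (40, 60), (81, 100), (100, 90)]
fraction = covered_fraction(example_hits, 100)
```

Merging first matters because contigs often overlap on the reference, and simply summing hit lengths would overstate the coverage.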

The tellurium resistance genes are also quite similar to those seen before in a variety of E. coli. I used the ERA7 annotation to identify contigs carrying the ter operon, and did a BLASTN search in NCBI for matches to these contigs. I aligned them properly with MUSCLE, made a bio-NJ tree and used the ‘Consensus’ function in Dendroscope (LSA tree) to combine the trees into a consensus tree. The result shows the ter operon is very similar to that found in other EHEC, especially O157:H7:

Consensus tree for ter operon (German outbreak strains highlighted)
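Neighbour-joining methods such as bio-NJ start from a matrix of pairwise distances computed from the alignment. The Python sketch below shows only that first step, using the simple p-distance (proportion of differing sites, skipping gapped columns). It is an illustration under my own assumptions: the tree-building itself is not implemented, the distance measure used by the actual tree software may differ, and the aligned sequences are made up.

```python
def p_distance(a, b, gap="-"):
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return diffs / compared


def distance_matrix(alignment):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    names = sorted(alignment)
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m in names
        for n in names
        if m < n
    }


# Made-up aligned sequences, for illustration only.
toy_alignment = {
    "outbreak": "ACGT-A",
    "O157": "ACCTTA",
    "other": "ACGTTA",
}
matrix = distance_matrix(toy_alignment)
```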

Finally, I had a look at one contig that I noticed wasn’t present in Ec55989 but had homology to the E. coli O157:H7 Sakai chromosome… it is contig husec41_c1441, containing a probable transporter protein and two other genes of unknown function. Interestingly, a BLAST search of NCBI showed this sequence is usually chromosomally encoded, and was most similar to genes in Shigella flexneri and Shigella boydii, which cause bacterial dysentery [alignment of BLAST hits; tree drawn with FigTree this time]. So this is just a hint that there are still plenty of novel and potentially important genes to be discovered in this genome!

TY2482/MIRA assembled, contig 1441

EHEC genomes – plasmid

A brief look at the genes present in the outbreak strains and not in the reference strain Ec55989 reveals a large number of contigs with similarity to the plasmid pEC_Bactec (by NCBI blast)… a quick MUMmer alignment and ACT visualisation confirms that an IncI plasmid similar to pEC_Bactec (GU371927.1, ref here) is present, including the beta-lactamase gene CTX-M-15 adjacent to ISEcp1 (MIRA assembly of TY2482, contig husec41_c30). In this image, the TY2482 assembly is on top, pEC_Bactec is on bottom.

MIRA assembly of TY2482 vs pEC_Bactec
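One way to pull out contigs that are absent from the reference is to BLAST all contigs against Ec55989 and keep those with no substantial hit. The Python sketch below assumes BLAST tabular output (outfmt 6, where the columns begin qseqid, sseqid, pident, length); the thresholds, contig names and hit lines are hypothetical, and this is not necessarily how the contigs above were selected.

```python
def contigs_without_hits(all_contigs, blast_lines, min_identity=90.0, min_length=200):
    """Return contigs that have no BLAST hit passing the thresholds.

    blast_lines: lines of tab-separated BLAST tabular output (outfmt 6),
    with the contig as query: qseqid, sseqid, pident, length, ...
    The threshold defaults are arbitrary illustrative values.
    """
    matched = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        qseqid, pident, length = fields[0], float(fields[2]), int(fields[3])
        if pident >= min_identity and length >= min_length:
            matched.add(qseqid)
    return [c for c in all_contigs if c not in matched]


# Hypothetical contig names and hit lines, for illustration only.
contigs = ["c1", "c2", "c3"]
hits = [
    "c1\tEc55989\t99.5\t5000\t20\t1\t1\t5000\t100\t5099\t0.0\t9000",
    "c2\tEc55989\t85.0\t150\t20\t2\t1\t150\t700\t849\t1e-20\t100",
]
novel = contigs_without_hits(contigs, hits)
```

The contigs returned this way are the candidates to search against NCBI for plasmid or phage matches.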

Update: Note this is also similar to pEK204, which was isolated from UPEC; we have recently found very similar plasmids (also carrying blaCTX-M-15) circulating in Shigella in Vietnam and Klebsiella pneumoniae in Australia. So this plasmid type would seem to be spreading around the Enterobacteriaceae.