In the German outbreak bacteria, as in most E. coli, plenty of horizontal transfer has gone on to create the genome we are now looking at.
I’ve done about all I’m going to on this analysis, at least until some more complete data is released… but I did generate a summary plot and have a quick look at the origins of the stx, ter and other acquired genes.
This is a quick look at what the outbreak strain’s genome looks like:
What is this showing us? Firstly, as established by others' work mapping reads and contigs to the available E. coli reference genome sequences, the chromosome of the outbreak strain is most similar to strain Ec55989, an enteroaggregative E. coli (EAEC) isolated in Africa over a decade ago [central circle in figure]. It shares with this strain part of the EAEC plasmid [55989p, top right] carrying the aggregative adhesion operon aat, the regulator aggR and some other bits, but it has a different aggregative adhesion fimbrial complement (AAF/I) from Ec55989. It has also acquired the stx2 phage carrying the Shiga toxin 2 genes stx2A and stx2B [top left]; a plasmid sharing high similarity with the IncI plasmid pEC_Bactec, including the blaCTX-M and blaTEM-1 beta-lactamase (antibiotic resistance) genes [bottom left]; and a lot of sequence similar to plasmid pCVM29188_101 from Salmonella enterica Kentucky [bottom left].
The circles represent the sequences of the plasmids and phage (previously sequenced and deposited in GenBank) that are most similar to sequences in the novel strain. The green rings indicate which parts of these reference sequences are also present in the novel German strain (via BLAST comparison with the TY2482/MIRA contigs): nearly all of the Ec55989 chromosome and the pEC_Bactec plasmid, and not quite all of the other phage and plasmid sequences.
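To make the "green ring" idea concrete, here is a minimal sketch of how the fraction of a reference covered by contig hits could be computed from BLAST tabular output (`-outfmt 6`, default columns). This is not the script used for the figure; the function names, the 90% identity cut-off and the toy hits are all assumptions for illustration.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def reference_coverage(blast_lines, ref_id, ref_length, min_identity=90.0):
    """Return (fraction of the reference covered, merged covered intervals).

    Assumes contigs were the BLAST query and the reference was the subject,
    so reference coordinates are columns 9 and 10 (sstart, send).
    min_identity is an arbitrary illustrative threshold.
    """
    intervals = []
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[1] != ref_id or float(fields[2]) < min_identity:
            continue
        sstart, send = int(fields[8]), int(fields[9])
        # Minus-strand hits are reported with sstart > send, so normalise.
        intervals.append((min(sstart, send), max(sstart, send)))
    merged = merge_intervals(intervals)
    covered = sum(end - start + 1 for start, end in merged)
    return covered / ref_length, merged


if __name__ == "__main__":
    # Toy hits against a hypothetical 1000 bp reference, not real outbreak data.
    toy_hits = [
        "contig1\tref\t99.5\t400\t2\t0\t1\t400\t1\t400\t0.0\t700",
        "contig2\tref\t98.0\t201\t4\t0\t1\t201\t500\t300\t0.0\t350",
    ]
    fraction, regions = reference_coverage(toy_hits, "ref", 1000)
    print(f"{fraction:.1%} of reference covered: {regions}")
```

Merging intervals before summing matters because overlapping contig hits would otherwise be counted twice and overstate how much of the reference is present.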
There is a further 300-500 kbp of sequence that doesn’t match any of these 5 reference sequences, but we can get a feel for it by searching deeper in the GenBank database via BLAST, and by using the wonderful annotation provided by ERA7. [Annotation for just these contigs here.] I haven’t had a chance to look through these contigs properly yet, but of course they include the tellurium resistance operon ter, which we expect because the strain was noted as phenotypically tellurium resistant some time ago.
The origin of the Shiga toxin phage is interesting. The toxin genes themselves (subunits A and B) are 100% identical at the nucleotide level to other stx2 toxins in NCBI; see the alignment here, which shows the precisely identical reference sequences. I mapped contigs (TY2482, MIRA assembly) to the VT2 phage to identify those that are likely to be part of the acquired phage. Using these sequences to search NCBI (nr, blastn), the closest match was to Stx2 phage I (accession AP004402, 100% identity across 81%)… but obviously the phage acquired by the German strain is a bit different, because not all of Stx2 phage I is present (approx 20% missing, top left in figure above).
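The distinction between "100% identity" and "across 81%" is easy to blur, so here is a small sketch that computes both numbers from a pairwise alignment. It assumes coverage is measured relative to the reference sequence, which is my reading of the figures quoted above rather than something BLAST guarantees; the function name and the toy sequences are made up for illustration.

```python
def identity_and_coverage(ref_aln, qry_aln):
    """Return (percent identity, percent reference coverage) for two aligned
    sequences of equal length, with '-' marking gaps.

    Identity is computed over columns where both sequences have a base.
    Coverage is the share of reference bases aligned to a query base.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have the same length")
    ref_bases = aligned = matches = 0
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r != "-":
            ref_bases += 1
        if r != "-" and q != "-":
            aligned += 1
            if r == q:
                matches += 1
    identity = 100.0 * matches / aligned if aligned else 0.0
    coverage = 100.0 * aligned / ref_bases if ref_bases else 0.0
    return identity, coverage


if __name__ == "__main__":
    # Toy alignment: the query matches perfectly but lacks the last two bases.
    print(identity_and_coverage("ACGTACGTAC", "ACGTACGT--"))
```

A sequence can therefore be a perfect match wherever it aligns and still be missing a sizeable chunk of the reference, which is the situation described for the phage here.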
The tellurium resistance genes are also quite similar to those seen before in a variety of E. coli. I used the ERA7 annotation to identify contigs carrying the ter operon, and did a BLASTN search in NCBI for matches to these contigs. I aligned the matches properly with MUSCLE, made a bio-NJ tree for each, and used the ‘Consensus’ function in Dendroscope (LSA tree) to combine the trees into a consensus tree. The result shows the ter operon is very similar to that found in other EHEC, especially O157:H7.
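For readers who haven't met distance-based trees, here is a rough sketch of plain neighbour-joining on a small distance matrix. Note that this is classic NJ, not the BioNJ variant used for the trees above, and the five-taxon matrix is a standard textbook toy example, not ter operon data. The function name and return format are my own choices.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Plain neighbour-joining. Returns (newick_string, joins), where joins
    lists each merge as (node_a, branch_a, node_b, branch_b)."""
    # Store distances keyed by unordered label pairs so nodes can come and go.
    dist = {}
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i < j:
                dist[frozenset((a, b))] = float(matrix[i][j])
    nodes = list(names)  # each node is labelled by its Newick subtree
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Total distance from each node to all the others.
        r = {a: sum(dist[frozenset((a, b))] for b in nodes if b != a)
             for a in nodes}
        # Join the pair that minimises the Q criterion.
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * dist[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        d_ab = dist[frozenset((a, b))]
        len_a = 0.5 * d_ab + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d_ab - len_a
        new = f"({a}:{len_a:g},{b}:{len_b:g})"
        joins.append((a, len_a, b, len_b))
        # Distance from the new internal node to every remaining node.
        for k in nodes:
            if k not in (a, b):
                dist[frozenset((new, k))] = 0.5 * (
                    dist[frozenset((a, k))] + dist[frozenset((b, k))] - d_ab
                )
        nodes = [k for k in nodes if k not in (a, b)] + [new]
    a, b = nodes
    # Connect the last two nodes; the tree is unrooted, so placement is arbitrary.
    return f"({a},{b}:{dist[frozenset((a, b))]:g});", joins


if __name__ == "__main__":
    toy_names = ["a", "b", "c", "d", "e"]
    toy_matrix = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    tree, _ = neighbor_joining(toy_names, toy_matrix)
    print(tree)
```

In practice the distances would come from the multiple alignment, and a dedicated phylogenetics package is a better choice than a hand-rolled loop like this one.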
Finally, I had a look at one contig that I noticed wasn’t present in Ec55989 but had homology to the E. coli O157:H7 Sakai chromosome… it is contig husec41_c1441, containing a probable transporter protein and two other genes of unknown function. Interestingly, a BLAST search of NCBI showed this sequence is usually chromosomally encoded, and was most similar to genes in Shigella flexneri and Shigella boydii, which cause bacterial dysentery [alignment of BLAST hits; tree drawn with FigTree this time]. So this is just a hint that there are still plenty of novel and potentially important genes to be discovered in this genome!