I’ve given up trying to call this EHEC, EAEC, STEC or HUSEC because to me the labels seem more confusing than helpful! But we are talking about the E. coli responsible for the outbreak of HUS in Germany.
Now that BGI has released a more complete assembly (using HiSeq data, now in 513 contigs), it is worthwhile attempting some analysis of phage sequences in the genome.
First I ordered the BGI contigs against a composite reference sequence including the Ec55989 chromosome and plasmid, the other two plasmids I discussed here and the VT2 (Shiga toxin) prophage as discussed here and here. I did this using Mauve. The alignment and ordered contigs are available here, and this is how it looks in Mauve (compound reference on top, Ty2482 contigs from BGI on bottom):
I then submitted the contigs to Prophage Finder, which uses BLASTX to compare contigs to predicted prophage sequences. The raw output is here. I then used a combination of manual processing in text editors and Excel, and a table-to-embl format conversion script I wrote some time ago, to generate an annotation of prophage sequences in the BGI contigs. You can see it as an Excel table, or open this EMBL file in Artemis (it contains the ordered contig sequences plus the phage annotations).
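For anyone wanting to repeat the table-to-EMBL step, here is a minimal sketch of the idea in Python. It is not my original conversion script. The column names (`start`, `end`, `strand`, `product`) and the function name are assumptions made for illustration, and it only writes CDS feature lines, not a complete EMBL record with header and sequence.

```python
import csv
import io


def rows_to_embl_features(rows):
    """Turn table rows into EMBL feature-table (FT) lines.

    Each row is a dict with the hypothetical keys 'start', 'end',
    'strand' ('+' or '-') and 'product'.
    """
    lines = []
    for row in rows:
        start, end = int(row["start"]), int(row["end"])
        location = f"{start}..{end}"
        if row["strand"] == "-":
            # Reverse-strand features are wrapped in complement()
            location = f"complement({location})"
        product = row["product"].strip()
        # Feature key starts at column 6, location and qualifiers at column 22
        lines.append("FT   " + "CDS".ljust(16) + location)
        lines.append("FT   " + " " * 16 + f'/product="{product}"')
    return lines


# Example: a tab-delimited table, as might be exported from Excel
# (coordinates and products are made up for the example)
table = (
    "start\tend\tstrand\tproduct\n"
    "100\t400\t+\tphage tail protein\n"
    "950\t500\t-\tintegrase\n"
)
reader = csv.DictReader(io.StringIO(table), delimiter="\t")
features = rows_to_embl_features(reader)
print("\n".join(features))
```

The resulting FT lines still need to be placed in a full EMBL record alongside the contig sequence before Artemis can show them against it.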
Basically, it shows that the outbreak strain contains several prophages, most of which it shares with Ec55989. However, it has acquired two additional prophages…
One of course is the Shiga toxin-producing phage, in which the Prophage Finder annotation found 64 genes in 46 kbp worth of contigs. This is shorter than most published Stx phages, and is consistent with my earlier findings (circular map; ACT diagram) that only part of the 90 kbp VT2-Sakai Shiga toxin phage was present in the German genome. Perhaps someone with expertise in Stx phages will know why this is.
The phylogenetic tree of NCBI blastn hits to these contigs (using ‘Distance tree of results’ option, Fast Minimum Evolution and other default settings) looks like this. No obvious source revealed here.
A second phage?
In addition to this there is a set of 20 phage genes in a single contig of ~14.1 kbp (BGI ), which is not present in Ec55989. The phage is most similar to one present in the E. coli UTI89 genome (accession CP000243), with 100% identity across ~80% of the sequence. I don’t know if this is a complete phage or if it could be part of the Stx2 phage. Perhaps someone who knows how to ‘read’ phage genomes can check if it has all the necessary bits to be its own phage! Download the sequence and annotation here.
Phylogenetic tree for this phage, based on blast hits in NCBI:
Keep up to date on crowd-sourced analysis at the github site here.