Nico Petty from the University of Queensland has done some additional analysis of the prophage, which she’s asked me to post here. Thanks Nico!
The stx phage and others in the O104 outbreak strain
Following on from our finding earlier in the week, that the O104 outbreak strain has acquired a phage that carries the stx2A and stx2B Shiga toxin genes, and Kat’s finding that only part of the stx phage in Sakai was present in the German outbreak strain, I’ve done a bit more analysis of the phage.
Using Kat’s ordering of the contigs of the latest BGI assembly (6/6 Illumina + Ion Torrent), I ran a BLAST comparison (see below) against the stx phages in two EHEC O157:H7 strains: Sp5 in the Sakai genome and 933W in the EDL933 genome. Even though the region is split across lots of small contigs in the O104 genome (middle, alternating orange and brown), there is some similarity from the start to the end of the stx phages. However, there is clearly much less similarity in the larger contigs in the region to the left of the stx genes (highlighted in red).
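For readers who want to put a rough number on "less similarity to the left of the stx genes", the sketch below shows one way to measure how much of a reference phage is covered by BLAST hits in a given region. It assumes the hits were saved in BLAST's standard 12-column tabular format with the contigs as query and the phage as subject. The contig names, coordinates and the 500 bp reference length are made-up placeholders, not values from this analysis.

```python
def parse_hits(tabular_text):
    """Return (contig, ref_start, ref_end) tuples from BLAST tabular lines.

    Assumes the standard 12 columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    hits = []
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        contig = fields[0]
        sstart, send = int(fields[8]), int(fields[9])
        # Reverse-strand hits have sstart > send, so normalise the order.
        hits.append((contig, min(sstart, send), max(sstart, send)))
    return hits


def covered_bases(intervals, region_start, region_end):
    """Count reference positions in [region_start, region_end] (1-based,
    inclusive) that are covered by at least one hit interval."""
    clipped = sorted(
        (max(s, region_start), min(e, region_end))
        for s, e in intervals
        if e >= region_start and s <= region_end
    )
    total = 0
    covered_up_to = region_start - 1
    for s, e in clipped:
        if e <= covered_up_to:
            continue  # entirely inside what has already been counted
        total += e - max(s, covered_up_to + 1) + 1
        covered_up_to = e
    return total


# Hypothetical hits against a hypothetical 500 bp reference phage.
example = "\n".join([
    "\t".join(["contig_A", "phage_ref", "95.0", "100", "5", "0",
               "1", "100", "1", "100", "1e-20", "180"]),
    "\t".join(["contig_B", "phage_ref", "90.0", "101", "10", "0",
               "1", "101", "150", "50", "1e-15", "150"]),
    "\t".join(["contig_C", "phage_ref", "98.0", "101", "2", "0",
               "1", "101", "300", "400", "1e-30", "190"]),
])

hits = parse_hits(example)
intervals = [(start, end) for _, start, end in hits]

left_fraction = covered_bases(intervals, 1, 250) / 250
right_fraction = covered_bases(intervals, 251, 500) / 250
print(f"left half covered: {left_fraction:.2f}, right half covered: {right_fraction:.2f}")
```

Coverage alone ignores percent identity, so this is only a first-pass summary of what the comparison figure shows visually.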
This could be a simple case of misassembly, as this is just an early draft genome and was ordered against the closely related EAEC 55989 genome, which doesn’t have this phage. I had a look through the rest of the O104 genome and found that there are contigs elsewhere in the genome assembly which have similarity to the left-hand side of the O157 stx phages. I reordered the contigs to place these contigs at the left-hand side of the stx phage in O104 (see below), and a BLAST comparison against Sp5 and 933W as before showed a little more similarity, particularly with phage 933W (bottom). These contigs (13 and 492) also result in a phage of similar size to Sp5 and 933W, which makes them a more likely fit. However, although contig 492 does encode phage-related genes, they are still quite different from those in the syntenic region of the related EHEC phages. The reason for this region of difference in the stx phage (other than misassembly) could be that phage genomes are chimeras: they consist of different genetic modules acquired from different ancestors.
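The size check described above can be sketched in a few lines. This assumes the reordering amounts to swapping the poorly matching contigs for the candidate contigs and then comparing the total length of the region with the length of a reference phage. Only the contig numbers 13 and 492 come from the analysis; every other contig name and all the lengths below are hypothetical placeholders.

```python
def replace_contigs(order, to_remove, replacement):
    """Return a new contig order in which the contigs in `to_remove` are
    dropped and `replacement` is inserted where the first of them stood."""
    new_order = []
    inserted = False
    for name in order:
        if name in to_remove:
            if not inserted:
                new_order.extend(replacement)
                inserted = True
            continue
        new_order.append(name)
    return new_order


def region_length(order, lengths):
    """Total length in bp of the contigs in `order`."""
    return sum(lengths[name] for name in order)


# Hypothetical contig lengths in bp (not taken from the real assembly).
lengths = {
    "left_1": 9000, "left_2": 7000,         # poorly matching contigs
    "contig_13": 12000, "contig_492": 8000,  # candidate replacements
    "small_1": 2000, "small_2": 1500, "stx_contig": 3000,
}
original_order = ["left_1", "left_2", "small_1", "stx_contig", "small_2"]

new_order = replace_contigs(
    original_order, {"left_1", "left_2"}, ["contig_13", "contig_492"]
)

reference_phage_length = 60000  # hypothetical reference size in bp
ratio = region_length(new_order, lengths) / reference_phage_length
print(new_order)
print(f"reordered region is {ratio:.2f}x the reference phage length")
```

A ratio close to 1 is consistent with the reordered contigs making up a phage of similar size to the reference, but it says nothing about whether the gene content matches.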
The genome of E. coli is full of repeat sequences, which contribute to gene flux both through the acquisition of new genetic material (e.g. phages and antibiotic resistance genes, as we have seen in this outbreak strain) and through recombination. This is particularly the case for the lambdoid prophages of E. coli, which are highly related to each other and are often the source of recombination: they swap phage modules amongst themselves and also contribute to rearrangements in the bacterial chromosome through recombination between highly repetitive sequences.
As in other E. coli strains, there are several lambdoid prophages in the genome of this outbreak strain (the stx phage is one of these). Because they share so much repetitive sequence, it can be difficult to tell which prophage regions belong to which prophage, which makes assembly of these regions in bacterial genomes very difficult. This is certainly the case in the German outbreak strain, as most of the prophage regions are split across several contigs. However, given the relative novelty of a Shiga toxin-producing EAEC, I suspect that this stx phage was only a recent acquisition and is likely to be intact. Comparison with the genome sequence of the 2001 German O104 strain, when it is available, will give us more clues about the evolution of this outbreak strain.
The other phage region not in EAEC 55989
The other phage that Kat mentioned (a set of 20 phage genes on a single contig not found in 55989) is less than half the size required to be an intact prophage by itself, and is either a prophage remnant or part of another phage in the O104 genome. A BLAST comparison shows this prophage region is similar to an intact prophage (highlighted in green in the picture) in the genome of UPEC UTI89 (see below), and also ExPEC S88. It is possible that a similar, intact phage is present in the O104 genome, as lots of the small contigs at the end of the ordered contigs match the rest of the UTI89 prophage, but it is impossible to tell with the current assembly.
More sequence data, particularly paired-end sequencing and longer reads than we have in the current assembly, are really needed to resolve these prophage regions and work out how many phages there are in the genome and which bit of the genome belongs to which phage. Once the phages are assembled properly, we will be able to determine whether they are intact and encode all the genes necessary to produce functional virions, and whether they carry any other virulence genes in addition to the stx2A and stx2B genes.