The Health Protection Agency in the UK has released a third E. coli O104:H4 genome from the German outbreak, strain H112180280. The 454 reads and scaffolds are available here: http://www.hpa-bioinformatics.org.uk/lgp/genomes (BGI has also released an updated assembly, but I'm not sure yet how it was done.)
It contains a 73 kbp scaffold with matches to the other available EAEC aggregative adhesion plasmids (scaffold 8 in the HPA assembly). Here is a plot of the newly assembled scaffold and its homology to the other plasmids:
This was made using BRIG, freely available here from the Beatson lab at UQ, in about 5 minutes. Great program.
The inner ring is the novel plasmid scaffold, and its GC content is shown in the black squiggly line. Because this is a scaffold, it contains some bits of unknown sequence between contigs (mate pair reads tell us the order and orientation of the contigs, but there are small gaps between them where we don’t know what the sequence is). These gaps are indicated with a series of N (as opposed to A, C, G or T) of the estimated length of the gap, so in the black squiggly line these gaps show up as solid blocks of black, e.g. the one around the 70 kbp mark at the top.
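To make the gap and GC content idea concrete, here is a minimal Python sketch. It is not how BRIG draws its rings; the function names `find_gaps` and `windowed_gc`, the window size and the toy sequence are all made up for illustration. It just finds the runs of N in a scaffold and computes GC content per window, skipping the unknown bases.

```python
import re

def find_gaps(seq):
    """Return (start, end) 0-based, end-exclusive coordinates of each run of N."""
    return [(m.start(), m.end()) for m in re.finditer(r"N+", seq.upper())]

def windowed_gc(seq, window=500):
    """Return (window_start, gc_fraction) for consecutive windows.

    Only A/C/G/T are counted in the denominator; a window made up
    entirely of N has no defined GC content and is reported as None.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        known = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, gc / known if known else None))
    return results

# Toy scaffold (hypothetical): two short contigs joined by a 10 bp gap of N
toy = "ATGC" * 5 + "N" * 10 + "GGCC" * 5
gaps = find_gaps(toy)
gc_windows = windowed_gc(toy, window=10)
```

On a real scaffold you would read the sequence from the FASTA file first; the window that falls entirely inside a gap is the kind of region that shows up as a solid block in the plot.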
The other three rings (blue, red, green) indicate where there are similar sequences in the other three plasmids, as shown in the central legend. You can see that pO86A1 is the most similar to the novel plasmid, as it has similar sequence in parts of the novel plasmid where the other two plasmids don’t have matches (e.g. between the 50-50kbp mark). There are also some parts where none of the three reference plasmids have good matches.
On the left in orange is the location of the agg operon encoding aggregative adhesion fimbriae (AAF/I). This has only very weak similarity to the other plasmids (the rings are very pale here, indicating low % identity), because the others encode type II and type III aggregative adhesion fimbriae instead (AAF/II, AAF/III). The bit upstream (10-15 kbp mark) that also has low homology contains insertion sequence (IS) elements and downstream is a resolvase, so it is possible the operon is mobile…although proper annotation is needed to sort this out.
As with Ion Torrent, 454 is prone to errors in homopolymeric tracts (i.e. runs of the same base, where it is difficult to tell AAAAA from AAAAAA), which introduces frameshifts into the assembly. So it would be great to have an ERA7 error-tolerant annotation to sort this out.
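As a rough illustration of where these errors are likely to bite, here is a small Python sketch that lists homopolymeric tracts in a sequence. The function name `homopolymer_runs`, the minimum length of 5 and the toy sequence are my own assumptions, not part of any assembler or annotation pipeline.

```python
import re

def homopolymer_runs(seq, min_len=5):
    """Return (start, base, length) for each run of one base of at least min_len.

    Coordinates are 0-based. A miscounted run of length n versus n+1
    inside a gene shifts the reading frame for everything downstream.
    """
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), m.end() - m.start())
            for m in pattern.finditer(seq.upper())]

# Toy sequence (hypothetical) with a 6 bp A tract and a 5 bp T tract
runs = homopolymer_runs("ACGTAAAAAACGTTTTTG", min_len=5)
```

Tracts like these that sit inside a predicted coding sequence are the first places to check when a gene appears to be broken by a frameshift.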
Update: Most of the sequence that is shared with pO86A1 but not the others encodes transposases, with the exception of a serine protease autotransporter from a family of mucinases including pic, ipd, sepA:
The gene is most similar to sepA genes found in the virulence plasmids of Shigella: