I’ve begun to look at the EAEC plasmids in the German outbreak sequence data, guided by the review of E. coli plasmids in Johnson & Nolan, 2009. I mapped the novel sequence data to the three available plasmids: 55989p (from the Ec55989 isolate, whose chromosome is most similar to the outbreak strains), p042 (from the 042 isolate), and pO86A1. None of them is present in its entirety; only 40-60% of plasmid genes were covered. The closest was 55989p (59% covered by the LB226692 assembly, 56% by the MIRA assembly of TY2482 and 54% by the BGI assembly of TY2482), which is perhaps unsurprising given that the outbreak strain’s chromosome is also most similar to that of Ec55989. The same was evident from mapping BGI reads to the reference plasmids.
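As a rough illustration of how a "percent of plasmid genes covered" figure like those above could be computed from alignment coordinates, here is a minimal Python sketch. It is not the procedure used for the numbers in this post: the rule that a gene counts as covered when at least 90% of its bases are overlapped by alignments, the function names, and the toy coordinates are all assumptions made for the example.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open intervals [start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bases(gene, merged):
    """Count the bases of a gene [start, end) overlapped by merged intervals."""
    g_start, g_end = gene
    total = 0
    for start, end in merged:
        overlap = min(g_end, end) - max(g_start, start)
        if overlap > 0:
            total += overlap
    return total


def fraction_genes_covered(genes, alignments, min_gene_fraction=0.9):
    """Fraction of genes with at least min_gene_fraction of their length covered.

    genes: dict mapping gene name -> (start, end) on the reference plasmid.
    alignments: list of (start, end) intervals where assembly or reads aligned.
    The 0.9 threshold is an assumption for this sketch, not a value from the post.
    """
    if not genes:
        return 0.0
    merged = merge_intervals(alignments)
    hits = 0
    for start, end in genes.values():
        if covered_bases((start, end), merged) >= min_gene_fraction * (end - start):
            hits += 1
    return hits / len(genes)


# Toy coordinates, invented for illustration (not real plasmid annotation).
toy_genes = {
    "geneA": (0, 100),
    "geneB": (150, 250),
    "geneC": (300, 400),
    "geneD": (450, 550),
}
toy_alignments = [(0, 60), (50, 120), (140, 260), (300, 340)]
print(fraction_genes_covered(toy_genes, toy_alignments))  # 0.5
```

In practice the alignment intervals would come from whatever mapper or aligner was used, and the threshold would need tuning; a looser threshold counts partially covered genes as present.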
The available plasmids have AAF/II and AAF/III (AAF = aggregative adherence fimbriae) clusters. As I reported earlier, the novel strains appear to contain a different AAF locus, AAF/I (aggABCD), which is quite distinct. One contig from the LB226692 genome (gi_334716751_gb_AFOB01000328.1_ Escherichia coli LB226692 Contig26) contains the N-terminal portion of aggC and the whole of aggD (from AAF/I), adjacent to an IS element (closest to IS1294, IS91 family) and >8 kbp of additional sequence.
That additional sequence in LB226692 Contig 26 maps to Salmonella enterica subsp. enterica serovar Kentucky str. CVM29188 pCVM29188_101 CP001121.1. Notably, the rest of this plasmid is also present in both outbreak strains, so it is possible that the agg operon was inserted into a pCVM29188_101-like plasmid and has somehow replaced the AAF genes in this strain. However, the appearance of the aggCD and pCVM plasmid genes in the same contig could well be down to mis-assembly; more data will be needed to sort this out. Using BWA to map TY2482 reads to LB226692 Contig 26, I couldn’t find any reads that cover the boundary between the agg and plasmid sequences, but to check the evidence for this junction in the LB226692 assembly itself I would need the LB226692 reads, which are not yet public. The genes in this shared sequence (SeKA_C0033-SeKA_C0041) include some with basic plasmid functions, which, if the assembly is correct, would be pretty good evidence that aggABCD integrated directly into the plasmid backbone.
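The boundary check described above (looking for reads that span the junction between the agg and plasmid sequences) can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the read alignments have already been extracted from the mapper output as 0-based, half-open (start, end) intervals on the contig, and the junction coordinate, the 20-base minimum overhang, and the function name are hypothetical values chosen for the example.

```python
def count_spanning_reads(alignments, junction, min_overhang=20):
    """Count alignments that cross a junction with enough sequence on both sides.

    alignments: iterable of (start, end) half-open intervals on the contig.
    junction: coordinate of the first base after the suspected boundary.
    min_overhang: minimum aligned bases required on each side of the junction
                  (an assumed value; reads that barely touch the boundary are
                  weak evidence that the two regions are really adjacent).
    """
    count = 0
    for start, end in alignments:
        left = junction - start   # aligned bases before the junction
        right = end - junction    # aligned bases from the junction onwards
        if left >= min_overhang and right >= min_overhang:
            count += 1
    return count


# Toy alignments around a made-up junction at position 1000.
toy_reads = [
    (900, 1000),   # ends exactly at the junction: does not span
    (950, 1050),   # spans with 50 bases on each side
    (990, 1090),   # only 10 bases on the left: too little overhang
    (1000, 1100),  # starts at the junction: does not span
    (975, 1075),   # spans with 25 and 75 bases
]
print(count_spanning_reads(toy_reads, junction=1000))  # 2
```

A count of zero, as in the TY2482 mapping, fits a mis-join in the assembly, but it is also what one would see if the two strains simply differ at that locus, which is why the LB226692 reads themselves are needed.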
Apparently EAEC usually have only one AAF cluster, and the outbreak strain is no exception: it does not carry the agg3 or aaf AAF operons of 55989p, p042 or pO86A1. I don’t know why this is the case… The outbreak strain also has the ipd gene from pO86A1 (encoding an extracellular serine protease) but not pet from 042. The aatPABCD operon and the aggR gene, plasmid-borne markers characteristic of EAEC, are present and appear to be intact.
A target for PCR typing?
I’m not familiar with EAEC typing, but the little I’ve read in the last few days suggests that (a) AAF operons are a marker for EAEC, and (b) AAF/I is relatively rare among EAEC. If so, PCR testing for a sequence within this AAF/I (aggABCD) cluster could be a good way to differentiate the outbreak strain (aggA+, stx2+, eae-) from other STEC.
Apparently the current strains are already being tested by PCR for aatA and coming up positive, consistent with the fact that read mapping and assembly identify the complete aatPABCD operon (antiaggregation protein transporter system) in the outbreak strains… but typing for aggA could eliminate other EAEC.
Update: The Robert Koch Institute is already doing exactly this; they have characterised the outbreak strain as aatA+, aggR+, aap+, aggA+ and aggC+ by PCR (ST678). This must have been missing from the earlier info I saw. Great to see that genomics can reach the same conclusion as the experts, even for an E. coli newbie.
Update 2: Still, the genomic analysis shows/confirms that aatPABCD, aggR and aap are conserved markers of EAEC, whereas aggA (AAF/I) is a bit more specific to this outbreak. I would argue that the combination of HUS symptoms, stx2+, eae- and aggA+ would be a pretty convincing first pass at typing, especially if you are limited (as most are) in the number of PCRs you can use to screen each isolate.
Clearly some data on frequency of this combination of markers from public health labs would be needed to confirm this has value…but given the initial ‘surprise’ at HUS caused by stx+, LEE-, EAEC (and rarity of reports in the literature) I would hazard a guess that this would be pretty discriminatory in Europe at the moment.
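The first-pass screen suggested in Update 2 (stx2+, eae-, aggA+) is simple enough to write down as a check on a set of PCR results. The sketch below is only an illustration of that rule: the dictionary layout, function name, and example isolates are invented for the purpose, and the clinical part of the proposal (HUS symptoms) is deliberately left out because it is not a PCR result.

```python
from typing import Mapping

# Expected PCR results for the proposed first-pass screen (from Update 2).
OUTBREAK_SCREEN = {"stx2": True, "eae": False, "aggA": True}


def matches_outbreak_screen(pcr_results: Mapping[str, bool]) -> bool:
    """Return True only if every screen marker was tested and matches.

    A marker that is missing from pcr_results counts as a non-match, so an
    incompletely tested isolate is never reported as outbreak-like.
    """
    return all(
        pcr_results.get(marker) == expected
        for marker, expected in OUTBREAK_SCREEN.items()
    )


# Hypothetical isolates, invented for illustration only.
isolates = {
    "outbreak_like": {"stx2": True, "eae": False, "aggA": True, "aatA": True},
    "typical_ehec": {"stx2": True, "eae": True, "aggA": False},
    "eaec_no_stx": {"stx2": False, "eae": False, "aggA": True},
    "not_fully_tested": {"stx2": True, "eae": False},
}

for name, results in isolates.items():
    print(name, matches_outbreak_screen(results))
```

Only the first isolate passes. Extra markers such as aatA are ignored by the check, which matches the argument above that they are shared with other EAEC and so add little discrimination.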
Update 3: BGI has released a new assembly incorporating 200x data from Illumina HiSeq, see post. But the agg cluster forms its own contig within this assembly (contig 43), without even the adjacent IS elements, so there are no further clues to its genomic context. Maybe if the reads were released this could be looked at?