Dealing with circular genomes

With a few exceptions, bacterial chromosomes and plasmids are circular, i.e. they have no beginning and no end. This can cause problems for read alignment, because reference genome sequences are always cut into a linear form with an arbitrary beginning and end. Reads that bridge this artificial junction only partially match the start and the end of the reference, match both equally well, and may be thrown away as a result. It can also cause problems for de novo assembly, because assemblers are generally unaware that sequences can be circular.

Sometimes this doesn’t matter. In my experience with read mapping, though, it causes an artificial drop in read depth at the artificial “start” and “end” of the sequence, which might affect some analyses.
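One workaround that is often used for the mapping problem is to copy the first part of the reference onto its end, map against this padded sequence, and then fold the extra coordinates back onto the true start. The following is a minimal sketch of that idea, not a description of what any particular mapper does. The function names and the `pad` parameter are hypothetical, and positions are assumed to be 0-based. Note that reads lying entirely within the padded region can now map to two places, so a mapper may give them a low mapping quality.

```python
def pad_circular_reference(seq, pad):
    """Return seq with its first `pad` bases appended to the end.

    Reads that span the artificial end/start junction can then align
    contiguously to the padded sequence. `pad` would typically be at
    least one read length (an assumption, not a rule).
    """
    if pad < 0 or pad > len(seq):
        raise ValueError("pad must be between 0 and the sequence length")
    return seq + seq[:pad]


def fold_position(pos, genome_length):
    """Map a 0-based position on the padded reference back onto the circle."""
    return pos % genome_length


def fold_depth(padded_depth, genome_length):
    """Add the depth over the padded region back onto the true start.

    `padded_depth` is a list of per-base depths along the padded reference.
    """
    folded = list(padded_depth[:genome_length])
    for i, depth in enumerate(padded_depth[genome_length:]):
        folded[i % genome_length] += depth
    return folded


if __name__ == "__main__":
    reference = "ACGTACGGTT"
    padded = pad_circular_reference(reference, 3)
    print(padded)
    print(fold_position(11, len(reference)))
    print(fold_depth([1, 2, 3, 4, 5, 6], 4))
```

Whether the folded depth fully removes the dip at the junction depends on how the mapper handles the duplicated region, so it is worth plotting depth before and after.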

A few suggestions for dealing with circularity in de novo assemblies were recently discussed on BioStar:

The take-home message is that you need to be aware of this issue, because the software packages aren’t! If it matters for your analysis, the solutions are quite simple, but you have to look for the problem yourself, as it will not be reported by mapping or assembly software.
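As one illustration of looking for this yourself in an assembly: a circular sequence assembled as a linear contig often ends with a copy of the bases it starts with. The sketch below checks for such an exact overlap and trims one copy. It is an assumption-laden example, not a method taken from any particular tool or discussion: the function names and the `min_overlap` and `max_overlap` defaults are hypothetical, and only exact matches are detected, so an overlap containing a sequencing error would be missed.

```python
def find_end_overlap(contig, min_overlap=20, max_overlap=1000):
    """Return the length of the longest exact match between the start
    and the end of `contig`, or 0 if none of at least `min_overlap` bases.

    The search is capped at `max_overlap` and at half the contig length
    to keep it fast and to avoid trivial self-matches.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    longest = min(max_overlap, len(contig) // 2)
    for k in range(longest, min_overlap - 1, -1):
        if contig[:k] == contig[-k:]:
            return k
    return 0


def trim_end_overlap(contig, min_overlap=20, max_overlap=1000):
    """Remove the duplicated end of a putatively circular contig."""
    k = find_end_overlap(contig, min_overlap, max_overlap)
    return contig[:-k] if k else contig


if __name__ == "__main__":
    unique = "ACGTTGCAGGCTAAC"
    contig = unique + unique[:5]  # simulate a contig that wraps around
    print(find_end_overlap(contig, min_overlap=4))
    print(trim_end_overlap(contig, min_overlap=4))
```

An overlap on its own suggests, but does not prove, that the contig is circular; repeats at the contig ends can produce the same signal, so a positive result is best confirmed by mapping reads across the joined ends.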