Phage annotation with PHAST

Just a quick post to say how much I love PHAST, the PHAge Search Tool.

It looks for possible prophages in your bacterial genomes, and makes such beautiful pictures of the results, like this summary of the five phage it found in a new Salmonella genome:

It also draws nice circular diagrams to show you where the phage are located, like this:

And it will even show you a nicely annotated figure of each individual phage it found, using an interactive Flash viewer:

My only gripe is that, unlike some of the more visualization-challenged phage finders, PHAST doesn’t output actual annotation files such as GenBank or GFF, or even a simple text table that would be straightforward to convert into GenBank. The format in which it prints out where each phage is located in your sequence seems to be a home-grown text format that is not easy to parse with existing tools.

Oh well, I suppose I will have to write a little script to turn PHAST’s phage hunt results into a proper annotation… unless someone else has already done this?
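As a starting point, here is a minimal sketch of the easy half of such a script: writing phage regions out as GFF3 once their coordinates are known. It deliberately does not parse PHAST's own output, since that format is not described here; the function name `phage_regions_to_gff3`, the `(start, end, label)` tuples, the `prophage_N` IDs and the generic `region` feature type are all assumptions made for illustration.

```python
from urllib.parse import quote


def phage_regions_to_gff3(seqid, regions, source="PHAST", feature_type="region"):
    """Format (start, end, label) tuples as GFF3 text.

    Coordinates are assumed to be 1-based and inclusive, as GFF3 expects.
    Pulling them out of PHAST's own report is NOT handled here.
    """
    lines = ["##gff-version 3"]
    for i, (start, end, label) in enumerate(regions, start=1):
        # GFF3 requires start <= end, so swap if they arrive reversed.
        if start > end:
            start, end = end, start
        # Percent-encode characters that are reserved in the attribute column.
        name = quote(str(label), safe=" ")
        attributes = f"ID=prophage_{i};Name={name}"
        # Columns: seqid, source, type, start, end, score, strand, phase, attributes
        lines.append("\t".join(
            [seqid, source, feature_type, str(start), str(end), ".", ".", ".", attributes]
        ))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up coordinates, purely to show the output shape.
    print(phage_regions_to_gff3("chr1", [(1200, 45000, "phage 1")]), end="")
```

The resulting file should load as a feature track in most genome browsers, and from there conversion to GenBank is a job for existing tools.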

Dealing with circular genomes

Bacterial chromosomes and plasmids are circular (i.e. have no beginning and no end), with a few exceptions. This can pose problems for read alignment: reference genome sequences are always cut into a linear form with a beginning and an end, so reads that bridge this artificial gap may be thrown away because they partially match the start and the end equally well. It can also pose problems for de novo assembly, since assemblers are generally unaware that sequences can be circular.

Sometimes this doesn’t matter, although in my experience with read mapping, it causes an artificial drop in the depth of reads mapping to the artificial “start” and “end” of the sequence which might affect some analyses.
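One common workaround for the mapping case is to pad the linear reference with a copy of its own start, so that reads spanning the artificial junction have somewhere to align, and then fold the coordinates back afterwards. The sketch below assumes a plain sequence string and 1-based positions; the function names and the `pad` parameter are made up for illustration, and a sensible pad length would be at least your read length.

```python
def extend_circular(seq, pad):
    """Append the first `pad` bases to the end of a linearised circular sequence."""
    pad = min(pad, len(seq))  # never append more than one full copy
    return seq + seq[:pad]


def to_circular_pos(pos, genome_length):
    """Map a 1-based position on the padded reference back to the original genome."""
    return (pos - 1) % genome_length + 1


if __name__ == "__main__":
    genome = "ACGTACGGTT"  # toy 10 bp "circular" sequence
    padded = extend_circular(genome, 3)
    print(padded)                                # original plus its first 3 bases
    print(to_circular_pos(11, len(genome)))      # first padded base is base 1 again
```

One caveat: reads that fall entirely inside the duplicated stretch now match two places equally well, so a mapper may place them at either copy or give them a low mapping quality. Depth in that stretch needs to be summed over both copies after folding the coordinates back.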

A few suggestions for dealing with circularity in de novo assemblies were recently discussed on BioStar:

The take home message is that you need to be aware of this issue, because the software packages aren’t! If it matters for your analysis then the solutions are quite simple, but you have to look for this yourself as it will not be reported by mapping or assembly software.
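For assemblies, one check you can do yourself is to look for a contig whose end repeats its own start, which is what you would expect if an assembler has read all the way around a circular molecule and kept going. This is a rough sketch of that idea, not necessarily what was suggested on BioStar: it only finds exact matches, and the function names and the `min_overlap` threshold are assumptions.

```python
def terminal_overlap(contig, min_overlap=20):
    """Length of the longest exact match between the contig's start and its end.

    Returns 0 if no overlap of at least `min_overlap` bases is found.
    Overlaps longer than half the contig are not considered.
    """
    for k in range(len(contig) // 2, min_overlap - 1, -1):
        if contig[:k] == contig[-k:]:
            return k
    return 0


def trim_circular(contig, min_overlap=20):
    """Remove the duplicated end so the circular sequence is represented once."""
    k = terminal_overlap(contig, min_overlap)
    return contig[:-k] if k else contig


if __name__ == "__main__":
    # Toy contig whose last 7 bases repeat its first 7.
    contig = "GATTACA" + "CCCTTTGG" + "GATTACA"
    print(terminal_overlap(contig, min_overlap=5))
    print(trim_circular(contig, min_overlap=5))
```

Real contigs usually carry a few sequencing errors near their ends, so in practice an alignment-based comparison of the two ends is more forgiving than exact string matching.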