NCBI’s new Microbial genome BLAST

A word of warning for anyone tempted to use NCBI’s new ‘Microbial genomes’ BLAST page, http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch&PROG_DEF=blastn&BLAST_PROG_DEF=megaBlast&SHOW_DEFAULTS=on&BLAST_SPEC=MicrobialGenomes, which is currently advertised on the NCBI front page.

This database appears to include chromosomal sequences only. There is an option to limit the search to ‘complete genomes’ (currently 2,102) or to include ‘draft genomes’ (currently 4,294), but neither includes plasmid or phage sequences. Of course, prophage sequences integrated into chromosomes will be in there, but they don’t reflect the known diversity of phages. Plasmid sequences present in whole-genome shotgun data will also be in the draft genomes, though not annotated as plasmids, and the reference plasmid sequences will not be found at all.

So, if you get a new genome sequenced, find something in it that isn’t in your favourite reference sequence, and want to know what it is and how novel it might be… DO NOT use the microbial genomes BLAST! If what you’re looking at is derived from a phage or plasmid (as most novel DNA will be), you might be misled into thinking you have found something new or different.

It is great to be able to limit searches to microbial DNA (this speeds things up hugely and is a big help, so thank you, NCBI!). But the page really should offer the option of including the plasmid and phage databases in the search; otherwise I suspect we will see a lot of “novel” sequences being reported… I hope they do keep a chromosome-only option as well, as it might be handy in some instances, although it only really works for finished genomes, not draft ones.

So, unless you specifically want to limit your search to chromosomal sequences, for now it is best to stick to the regular BLAST page and select the non-redundant ‘nr’ database for your bacterial searches.