A versatile computational pipeline for bacterial genome annotation improvement and comparative analysis, with Brucella as a use case
We present a bacterial genome computational analysis pipeline, called GenVar. The pipeline, based on the program GeneWise, is designed to analyze an annotated genome and automatically identify missed gene calls and sequence variants such as genes with disrupted reading frames (split genes) and those with insertions and deletions (indels). For a given genome to be analyzed, GenVar relies on a database containing closely related genomes (such as other species or strains) as well as a few additional reference genomes. GenVar also helps identify gene disruptions probably caused by sequencing errors. We exemplify GenVar's capabilities by presenting results from the analysis of four Brucella genomes. Brucella is an important human pathogen and zoonotic agent. The analysis revealed hundreds of missed gene calls, new split genes and indels, several of which are species specific and hence provide valuable clues to the understanding of the genome basis of Brucella pathogenicity and host specificity.