Accurate and Efficient Gene Function Prediction using a Multi-Bacterial Network





The rapid rise in newly sequenced genomes requires the development of computational methods to supplement experimental functional annotations. The challenge is to develop methods for gene function prediction that integrate information from multiple species while also operating on a genome-wide scale. We develop a label propagation algorithm called FastSinkSource and apply it to a sequence similarity network integrated with species-specific heterogeneous data for 19 pathogenic bacterial species. By using mathematically provable bounds on the rate of progress of FastSinkSource during power iteration, we decrease the running time by a factor of 100 or more without sacrificing prediction accuracy. To demonstrate scalability, we expand to a 73-million-edge network across 200 bacterial species while maintaining the improvements in accuracy and efficiency. Our results point to the feasibility and promise of multi-species, genome-wide gene function prediction, especially as more experimental data and annotations become available for a diverse variety of organisms.
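
To make the idea of label propagation by power iteration concrete, the following is a minimal sketch of a generic scheme in which annotated genes keep fixed scores and each unannotated gene repeatedly takes the weighted average of its neighbors' scores. It is not the FastSinkSource implementation: the function name `propagate_labels`, the dictionary-based network format, and the stopping rule (largest score change below a tolerance `tol`) are all assumptions made for illustration. In particular, the simple tolerance check stands in for the provable bounds mentioned above, which are not reproduced here.

```python
def propagate_labels(weights, positives, negatives, tol=1e-9, max_iters=1000):
    """Generic label propagation sketch (hypothetical helper, not FastSinkSource).

    weights:   dict mapping node -> {neighbor: edge weight}, assumed symmetric
    positives: set of nodes annotated with the function (score fixed at 1)
    negatives: set of nodes annotated as lacking it (score fixed at 0)
    Returns (scores, number of iterations performed).
    """
    scores = {u: 0.0 for u in weights}
    for u in positives:
        scores[u] = 1.0
    unlabeled = [u for u in weights if u not in positives and u not in negatives]

    iterations = 0
    for _ in range(max_iters):
        iterations += 1
        new_scores = dict(scores)
        max_change = 0.0
        for u in unlabeled:
            total = sum(weights[u].values())
            if total == 0:
                continue  # isolated node: score stays at 0
            # Weighted average of the neighbors' scores from the previous step.
            updated = sum(w * scores[v] for v, w in weights[u].items()) / total
            max_change = max(max_change, abs(updated - scores[u]))
            new_scores[u] = updated
        scores = new_scores
        # Assumed stopping rule: stop once no score moved by more than tol.
        if max_change < tol:
            break
    return scores, iterations


# Toy network: a path a - b - c - d with unit weights.
toy_network = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
toy_scores, toy_iters = propagate_labels(toy_network, {"a"}, {"d"})
```

In the toy network, gene `b` sits next to the positive example and gene `c` next to the negative one, so `b` ends with the higher score. When only the ranking of genes matters, iterating until the scores themselves converge can be more work than necessary, which is the kind of saving an earlier, bound-based stopping point would target.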