DeePSP-GIN: identification and classification of phage structural proteins using predicted protein structure, pretrained protein language model, and graph isomorphism network
Abstract
Phages are vital components of the microbial ecosystem, and their functions and roles are largely determined by their structural proteins. Accurately annotating phage structural proteins (PSPs) is essential for understanding phage biology and phage interactions with bacterial hosts, which can pave the way for innovative strategies to combat bacterial infections and develop phage-based therapies. However, the sequence diversity of PSPs makes their identification and annotation challenging. While various computational methods are available for predicting PSPs, they do not currently integrate protein structural information, an important aspect for understanding protein function. With the advent of deep learning models, protein structures can be predicted accurately and quickly from protein sequences, creating new opportunities for PSP prediction and analysis. We developed DeePSP-GIN, a graph isomorphism network (GIN)-based deep learning model that leverages predicted protein structures and a pretrained protein language model for PSP identification and classification. To the best of our knowledge, DeePSP-GIN is the first method to use predicted protein structural information for PSP prediction tasks. It offers dual functionality: distinguishing PSP from non-PSP sequences, and classifying PSPs into seven major classes. DeePSP-GIN converts the 3D coordinates of predicted protein structures (in PDB format) into graphs and takes node features from embeddings generated by the protein language model. The GIN is then applied to the constructed graphs to learn discriminating features. The experimental results show that DeePSP-GIN outperforms state-of-the-art methods in both the PSP identification and classification tasks in terms of F1-score. DeePSP-GIN achieves a 1.04% higher F1-score than the nearest competing method in the PSP identification task. Additionally, its overall F1-score in the PSP classification task is approximately 34.38% higher than that of the second-best method.
The source code of DeePSP-GIN is available at https://github.com/muhit-emon/DeePSP-GIN under the MIT license.
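The pipeline summarized in the abstract (structure to graph, then GIN over per-residue features) can be illustrated with a minimal sketch. This is not the authors' implementation; see the repository above for that. It assumes one node per residue placed at its C-alpha coordinate, an edge between residues within a hypothetical 8 Å cutoff, and toy two-dimensional feature vectors standing in for protein language model embeddings. The function names `contact_graph` and `gin_aggregate` are illustrative only. The aggregation shown is the standard GIN neighborhood sum, (1 + eps) * h_v plus the sum of neighbor features; the learned MLP that a full GIN layer applies afterwards is omitted.

```python
import math


def contact_graph(coords, cutoff=8.0):
    """Build adjacency lists from residue coordinates.

    Residues i and j are linked when their coordinates lie within
    `cutoff` angstroms. The 8.0 default is an assumption for this
    sketch, not a value taken from the paper.
    """
    n = len(coords)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def gin_aggregate(features, neighbors, eps=0.0):
    """GIN aggregation step: (1 + eps) * h_v + sum of neighbor features.

    A full GIN layer would pass each aggregated vector through a
    learned MLP; that part is left out here.
    """
    out = []
    for v, h_v in enumerate(features):
        agg = [(1.0 + eps) * x for x in h_v]
        for u in neighbors[v]:
            agg = [a + b for a, b in zip(agg, features[u])]
        out.append(agg)
    return out


# Toy input: four residues on a line; the last one is far from the rest.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (9.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
# Toy node features standing in for language-model embeddings.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]

neighbors = contact_graph(coords)
aggregated = gin_aggregate(features, neighbors)
print(neighbors)
print(aggregated)
```

In practice the coordinates would be parsed from a predicted-structure PDB file and the node features would be high-dimensional embeddings, with the aggregation, MLP, and graph-level readout handled by a graph learning library rather than written by hand.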