A Mutated Peptide Database for the Analysis of Proteomic Mass Spectrometry Data
Abstract
Cancer is often characterized by the accumulation of mutations in a cell population. These mutations can critically disrupt protein function by changing protein amino acid sequences. Proteomics seeks to characterize and understand proteins in terms of their relative abundance, function, structure, and interactions, among other properties. Proteomic approaches are becoming an increasingly important tool for cancer biologists in profiling cells, tissues, and various healthy or diseased cell states. Mass spectrometry is one of the key technologies that enable protein and peptide amino acid sequences to be determined, and it is also the premier technology for protein identification. To operate effectively, however, it requires reference protein databases. UniProt's reviewed Swiss-Prot database is one of the highest-quality protein sequence databases and a valuable tool for protein identification. This database, however, includes only canonical protein sequences, that is, sequences that are widely expressed, functional, and highly confirmed. It contains no reference to mutated sequences. In contrast, the largest cancer-related mutation repository is the COSMIC database, which contains 24,599,940 variants in total and continues to be updated regularly. In this work, we combined the well-researched canonical sequences of Swiss-Prot with the comprehensive list of mutations from COSMIC to create a new resource that contains mutated peptides of the human proteome. This process resulted in a new database, termed XMAn, comprising 3,793,617 missense and nonsense mutations. The XMAn v3 database was used to identify mutations in the MDA-MB-231 triple negative breast cancer cell line.
A total of 540 unique mutations were identified, of which 60 were also present in COSMIC's Cancer Gene Census database, which catalogs genes implicated in cancer development through their function as oncogenes and tumor suppressor genes, among others. Mutations in the oncogene KRAS, DNA Topoisomerase 1 (TOP1), and TGF-beta receptor type 2 (TGFBR2) represent only a small sample of the mutations that were detected. Overall, the XMAn v3 database proved very useful in enabling the identification and characterization of missense and nonsense amino acid level mutations, and in providing insights into the possible drivers of aberrant proliferation in MDA-MB-231 cells. This database will be a valuable resource to researchers characterizing the proteome of cancer cells, and it is intended to be updated regularly as COSMIC and UniProt release updates to their respective databases.
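The core idea of combining a canonical sequence with a listed mutation to obtain a mutated peptide can be illustrated with a minimal sketch. This is not the pipeline used to build XMAn, whose details the abstract does not give. It assumes mutations written in the common `p.E8D` (missense) or `p.Q9*` (nonsense) style and a simple trypsin rule (cleave after K or R unless followed by P). The function names and the toy sequence are hypothetical.

```python
import re


def apply_mutation(sequence, mutation):
    """Apply one amino acid change, e.g. 'p.E8D' (missense) or 'p.Q9*' (nonsense)."""
    match = re.fullmatch(r"p\.([A-Z])(\d+)([A-Z*])", mutation)
    if match is None:
        raise ValueError(f"unsupported mutation syntax: {mutation}")
    ref, alt = match.group(1), match.group(3)
    index = int(match.group(2)) - 1  # positions are 1-based
    if index < 0 or index >= len(sequence) or sequence[index] != ref:
        raise ValueError(f"{mutation} does not match the canonical sequence")
    if alt == "*":
        # Nonsense mutation: the protein is truncated before the mutated residue.
        return sequence[:index]
    return sequence[:index] + alt + sequence[index + 1:]


def tryptic_peptides(sequence):
    """Return (start_index, peptide) pairs, cleaving after K/R unless followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        at_end = i == len(sequence) - 1
        if at_end or (residue in "KR" and sequence[i + 1] != "P"):
            peptides.append((start, sequence[start:i + 1]))
            start = i + 1
    return peptides


def mutated_peptide(sequence, mutation):
    """Return the tryptic peptide of the mutated protein that carries the change."""
    mutated = apply_mutation(sequence, mutation)
    if not mutated:
        return ""
    index = int(re.search(r"\d+", mutation).group()) - 1
    # For a truncation, the affected peptide is the new C-terminal one.
    site = min(index, len(mutated) - 1)
    for start, peptide in tryptic_peptides(mutated):
        if start <= site < start + len(peptide):
            return peptide
    return ""


# Toy canonical sequence (hypothetical, not a real protein).
canonical = "MAGKTLPEQRSDK"
print(mutated_peptide(canonical, "p.E8D"))  # missense
print(mutated_peptide(canonical, "p.Q9*"))  # nonsense
```

Digesting the mutated sequence rather than the canonical one matters because a substitution to or from K or R changes the cleavage sites, and therefore the peptide that a mass spectrometry search would need to match.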