GenoLIB: a database of biological parts derived from a library of common plasmid features
Synthetic biologists rely on databases of biological parts to design genetic devices and systems. Because existing databases of plasmids and features are loosely organized, the sequences and descriptions of genetic parts are often derived from features of previously described plasmids through ad hoc, error-prone and time-consuming curation. These databases often identify and describe sequences inconsistently. Furthermore, legacy bioinformatics file formats such as GenBank do not provide enough information about the purpose of features. We analyzed the annotations of a library of approximately 2000 widely used plasmids to build a non-redundant database of plasmid features. We examined the variability of plasmid features, their usage statistics and their distribution by feature type, and we segmented the features by expression host. From this database of plasmid features we derived a library of biological parts. The library was formatted using the Synthetic Biology Open Language, an emerging standard developed to better organize libraries of genetic parts and to facilitate synthetic biology workflows. As a proof of concept, the library was converted into GenoCAD grammar files so that users can import and customize it according to the needs of their research projects.
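The idea of collapsing plasmid annotations into a non-redundant feature set with usage statistics can be illustrated with a minimal sketch. It is not the pipeline used to build GenoLIB: the function names, the input layout (plain tuples rather than parsed GenBank records) and the rule that two features are identical when their type and normalized sequence match are all assumptions made for illustration. Reverse-complement matches and near-identical variants are deliberately ignored.

```python
from collections import Counter, defaultdict


def normalize(seq):
    """Strip whitespace and upper-case a sequence so that trivially
    different spellings of the same sequence compare equal."""
    return "".join(seq.split()).upper()


def build_feature_library(plasmids):
    """Collapse annotated features into a non-redundant set.

    `plasmids` is a hypothetical input layout: an iterable of
    (plasmid_name, [(feature_type, label, sequence), ...]) pairs.

    Returns two mappings keyed by (feature_type, normalized_sequence):
      usage  -> number of distinct plasmids containing the feature
      labels -> set of all labels under which the feature was annotated
    """
    usage = Counter()
    labels = defaultdict(set)
    for _name, features in plasmids:
        seen_in_plasmid = set()
        for feature_type, label, seq in features:
            key = (feature_type, normalize(seq))
            labels[key].add(label)
            # Count each feature at most once per plasmid.
            if key not in seen_in_plasmid:
                usage[key] += 1
                seen_in_plasmid.add(key)
    return usage, labels


def counts_by_type(usage):
    """Number of distinct (non-redundant) features per feature type."""
    return Counter(feature_type for feature_type, _seq in usage)
```

In this sketch, a feature annotated under several different labels across plasmids ends up as a single entry whose label set exposes the naming inconsistency, and `counts_by_type` gives the distribution of distinct features by type.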