An Empirical Study of API Breaking Changes in Bioconductor
Files
TR Number
Date
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Bioconductor is the second largest R software package repository that is primarily used for the analysis of genomic and biological data. With downloads exceeding millions in recent years, the widespread growth of the repository's adoption can be attributed to it's diverse selection of community-created packages, written in the programming language R, that allow statistical methodologies for analysis and modelling of data. However, as these packages evolve, their APIs go through changes that can break existing user code. Fixing these API breaking changes whenever a package is updated can be frustrating and time-consuming, especially since a large fraction of the user community are researchers who do not necessarily have software engineering background. In that context, we first present a tool that can detect syntactic API breaking changes between two released versions of a library written in R through static analysis of the package source code. This tool can be of utility to R package developers, so that they can more comprehensively report or handle the breaking changes in their releases, and to R package users, who want to be aware of the API differences that may exist between two releases before upgrading the libraries in their code. Through the use of this tool and manual inspection, we also conducted an empirical study of the breaking changes and backward incompatibility in Bioconductor packages. We studied the 100 most downloaded packages in the repository and found that 28% of all packages releases are backward incompatible. We also found that 55% of these breaking changes go undocumented and developers don't maintain semantic versioning for 22% of the releases. Finally, we manually inspected 10 library releases that consisted of breaking changes and found 2% of the API-s to affect 31 client projects.