GRAPLEr: A Distributed Collaborative Environment for Lake Ecosystem Modeling that Integrates Overlay Networks, High-throughput Computing, and Web Services
The GLEON Research And PRAGMA Lake Expedition -- GRAPLE -- is a collaborative effort between computer science and lake ecology researchers. It aims to improve our understanding and predictive capacity of the threats to the water quality of our freshwater resources, including climate change. This paper presents GRAPLEr, a distributed computing system used to address the modeling needs of GRAPLE researchers. GRAPLEr integrates and applies overlay virtual network, high-throughput computing, and Web service technologies in a novel way. First, its user-level IP-over-P2P (IPOP) overlay network allows compute and storage resources distributed across independently-administered institutions (including private and public clouds) to be aggregated into a common virtual network, despite the presence of firewalls and network address translators. Second, resources aggregated by the IPOP virtual network run unmodified high-throughput computing middleware (HTCondor) to enable large numbers of model simulations to be executed concurrently across the distributed computing resources. Third, a Web service interface allows end users to submit job requests to the system using client libraries that integrate with the R statistical computing environment. The paper presents the GRAPLEr architecture, describes its implementation and reports on its performance for batches of General Lake Model (GLM) simulations across three cloud infrastructures (University of Florida, CloudLab, and Microsoft Azure).