Integrated approaches for monitoring sharks: Leveraging machine learning, big data, and molecular biology


Date

2025-10-24

Publisher

Virginia Tech

Abstract

Sharks are ecologically important predators facing severe global declines, yet conservation and management are hindered by data deficiencies in taxonomy, distribution, and abundance. In this dissertation, I develop and integrate complementary technological approaches (machine learning, big data workflows, and molecular techniques) to expand scalable, non-invasive monitoring of sharks with programmatic and practical field methodologies. First, I constructed the largest global shark image dataset to date and developed the Shark Detector, a pipeline combining object detection and hierarchical classification. This system automatically locates, identifies, and classifies sharks in heterogeneous media, achieving >90% recall for detection and up to 92% species-level classification accuracy across 80 species, outperforming existing biodiversity classifiers. Second, we refined these methods for ecological survey applications by packaging the models into sharkDetectoR (R package) and SharkByte (desktop application), enabling accessible, semi-automatic processing of baited remote underwater videos (BRUVs). These tools reduced annotation effort by up to 95% while preserving high taxonomic resolution, and demonstrated iterative improvement through survey-specific data boosting. Third, I designed scalable pipelines to mine and filter >5 million social network (Instagram, Flickr) and open-source (iNaturalist and Global Biodiversity Information Facility) posts and >600k opportunistic shark observations. By pairing automated classification with effort-standardized statistical models, we derived species-specific abundance indices that revealed regionally consistent population trends: increasing trajectories for coastal taxa in the Bahamas, and recent declines of reef-associated sharks in the Hawaiian Islands.
Finally, I piloted molecular monitoring of critically endangered white sharks (Carcharodon carcharias) in the Mediterranean Sea using complementary environmental DNA (eDNA) detection and validation workflows. I collected 204 samples across the Sicilian Channel and the Adriatic and Ligurian Seas, and detected white sharks at four stations. Detections were confirmed in the lab. Particle simulations indicated that the detected individuals were nearby, supporting efforts to track them in the field. A preliminary multi-species assay detected 12 elasmobranch species. These workflows provided novel spatiotemporal insights into white shark (and other elasmobranch) occurrence in hypothesized hotspots. Together, these chapters demonstrate how integrated computational and molecular approaches can overcome data limitations, provide reproducible ecological indices, and inform conservation of threatened shark populations in data-poor regions.

Keywords

Big data, Environmental DNA, Machine Learning, Sharks

Citation