Using Historical Atlas Data to Develop High-Resolution Distribution Models of Freshwater Fishes

TR Number



Journal Title

Journal ISSN

Volume Title


Public Library of Science


Understanding the spatial pattern of species distributions is fundamental in biogeography, and conservation and resource management applications. Most species distribution models (SDMs) require or prefer species presence and absence data for adequate estimation of model parameters. However, observations with unreliable or unreported species absences dominate and limit the implementation of SDMs. Presence-only models generally yield less accurate predictions of species distribution, and make it difficult to incorporate spatial autocorrelation. The availability of large amounts of historical presence records for freshwater fishes of the United States provides an opportunity for deriving reliable absences from data reported as presence-only, when sampling was predominantly community-based. In this study, we used boosted regression trees (BRT), logistic regression, and MaxEnt models to assess the performance of a historical metacommunity database with inferred absences, for modeling fish distributions, investigating the effect of model choice and data properties thereby. With models of the distribution of 76 native, non-game fish species of varied traits and rarity attributes in four river basins across the United States, we show that model accuracy depends on data quality (e.g., sample size, location precision), species' rarity, statistical modeling technique, and consideration of spatial autocorrelation. The cross-validation area under the receiver-operating-characteristic curve (AUC) tended to be high in the spatial presence-absence models at the highest level of resolution for species with large geographic ranges and small local populations. Prevalence affected training but not validation AUC. The key habitat predictors identified and the fish-habitat relationships evaluated through partial dependence plots corroborated most previous studies. The community-based SDM framework broadens our capability to model species distributions by innovatively removing the constraint of lack of species absence data, thus providing a robust prediction of distribution for stream fishes in other regions where historical data exist, and for other taxa (e.g., benthic macroinvertebrates, birds) usually observed by community-based sampling designs.



Freshwater fish, Habitats, Spatial autocorrelation, Decision trees, Land use, Invasive species, Rivers, Snakes


Huang J, Frimpong EA (2015) Using Historical Atlas Data to Develop High-Resolution Distribution Models of Freshwater Fishes. PLoS ONE 10(6): e0129995. doi:10.1371/journal.pone.0129995