Not seeing the forest for the trees: Generalised linear model out-performs random forest in species distribution modelling for Southeast Asian felids

Species Distribution Models (SDMs) are a powerful tool to derive habitat suitability predictions relating species occurrence data with habitat features. Two of the most frequently applied algorithms to model species-habitat relationships are Generalised Linear Models (GLM) and Random Forest (RF). Th...

Bibliographic Details
Main Authors: Chiaverini, Luca; Macdonald, David W.; Hearn, Andrew J.; Kaszta, Zaneta; Ash, Eric; Bothwell, Helen M.; Can, Ozgun Emre; Channa, Phan; Clements, Gopalasamy Reuben *; Haidir, Iding Achmad; Kyaw, Pyae Phyoe; Moore, Jonathan H.; Rasphone, Akchousanh; Tan, Cedric Kai Wei; Cushman, Samuel A.
Format: Article
Published: Elsevier 2023
Subjects: Q Science (General), QH Natural history, SD Forestry
Online Access: http://eprints.sunway.edu.my/2717/
https://doi.org/10.1016/j.ecoinf.2023.102026
id my.sunway.eprints.2717
record_format eprints
spelling my.sunway.eprints.2717 2024-07-02T02:06:53Z http://eprints.sunway.edu.my/2717/ Elsevier 2023 Article PeerReviewed Chiaverini, Luca and Macdonald, David W. and Hearn, Andrew J. and Kaszta, Zaneta and Ash, Eric and Bothwell, Helen M. and Can, Ozgun Emre and Channa, Phan and Clements, Gopalasamy Reuben * and Haidir, Iding Achmad and Kyaw, Pyae Phyoe and Moore, Jonathan H. and Rasphone, Akchousanh and Tan, Cedric Kai Wei and Cushman, Samuel A. (2023) Not seeing the forest for the trees: Generalised linear model out-performs random forest in species distribution modelling for Southeast Asian felids. Ecological Informatics, 75. ISSN 1878-0512 https://doi.org/10.1016/j.ecoinf.2023.102026
institution Sunway University
building Sunway Campus Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Sunway University
content_source Sunway Institutional Repository
url_provider http://eprints.sunway.edu.my/
topic Q Science (General)
QH Natural history
SD Forestry
description Species Distribution Models (SDMs) are a powerful tool for deriving habitat suitability predictions by relating species occurrence data to habitat features. Two of the most frequently applied algorithms to model species-habitat relationships are Generalised Linear Models (GLM) and Random Forest (RF). The former is a parametric regression model providing functional models with direct interpretability. The latter is a non-parametric machine learning algorithm, more tolerant than other approaches in its assumptions, which has often been shown to outperform parametric algorithms. Other approaches have been developed to produce robust SDMs, such as training data bootstrapping and spatial scale optimisation. Using felid presence-absence data from three study regions in Southeast Asia (mainland, Borneo and Sumatra), we tested the performance of SDMs by implementing four modelling frameworks: GLM and RF with bootstrapped and non-bootstrapped training data. With Mantel and ANOVA tests we explored how the four combinations of algorithms and bootstrapping influenced SDMs and their predictive performances. Additionally, we tested how scale optimisation responded to species' size, taxonomic associations (species and genus), study area and algorithm. We found that the choice of algorithm had a strong effect in determining the differences between SDMs' spatial predictions, while bootstrapping had no effect. Additionally, algorithm, followed by study area and species, were the main factors driving differences in the spatial scales identified. SDMs trained with GLM showed higher predictive performance; however, ANOVA tests revealed that algorithm had a significant effect only in explaining the variance observed in sensitivity and specificity and, when interacting with bootstrapping, in Percent Correctly Classified (PCC). Bootstrapping significantly explained the variance in specificity, PCC and True Skill Statistic (TSS). Our results suggest that there are systematic differences in the scales identified and in the predictions produced by GLM vs. RF, but that neither approach was consistently better than the other. The divergent predictions and inconsistent predictive abilities suggest that analysts should not assume machine learning is inherently superior and should test multiple methods. Our results have strong implications for SDM development, revealing the inconsistencies introduced by the choice of algorithm on scale optimisation, with GLM selecting broader scales than RF.
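The four performance measures named in the description (sensitivity, specificity, PCC and TSS) can all be derived from a presence-absence confusion matrix. The following is a minimal sketch under stated assumptions, not the authors' code: the function name `evaluate`, the `Scores` container, the 0/1 encoding of absence/presence, and the choice to report PCC as a percentage are illustrative. TSS is taken in its usual form, sensitivity + specificity - 1.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Scores:
    sensitivity: float  # proportion of observed presences predicted as presences
    specificity: float  # proportion of observed absences predicted as absences
    pcc: float          # Percent Correctly Classified (here as a percentage)
    tss: float          # True Skill Statistic = sensitivity + specificity - 1


def evaluate(observed: Sequence[int], predicted: Sequence[int]) -> Scores:
    """Score thresholded presence (1) / absence (0) predictions.

    Hypothetical helper: assumes both sequences are already binary and aligned.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")

    # Tally the four cells of the confusion matrix.
    tp = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 1)
    tn = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 0)
    fp = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 1)
    fn = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 0)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one observed presence and one absence")

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    pcc = 100.0 * (tp + tn) / len(observed)
    tss = sensitivity + specificity - 1.0
    return Scores(sensitivity, specificity, pcc, tss)


if __name__ == "__main__":
    # Made-up example data, not taken from the study.
    obs = [1, 1, 1, 1, 0, 0, 0, 0]
    pred = [1, 1, 1, 0, 0, 0, 1, 1]
    print(evaluate(obs, pred))
```

Because TSS combines sensitivity and specificity, a factor can affect one of the component measures without shifting TSS by the same amount, which may be one reason to report all four, as the description does.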
format Article
author Chiaverini, Luca
Macdonald, David W.
Hearn, Andrew J.
Kaszta, Zaneta
Ash, Eric
Bothwell, Helen M.
Can, Ozgun Emre
Channa, Phan
Clements, Gopalasamy Reuben *
Haidir, Iding Achmad
Kyaw, Pyae Phyoe
Moore, Jonathan H.
Rasphone, Akchousanh
Tan, Cedric Kai Wei
Cushman, Samuel A.
author_sort Chiaverini, Luca
title Not seeing the forest for the trees: Generalised linear model out-performs random forest in species distribution modelling for Southeast Asian felids
publisher Elsevier
publishDate 2023
url http://eprints.sunway.edu.my/2717/
https://doi.org/10.1016/j.ecoinf.2023.102026