Comparing the prediction performance, uncertainty quantification and extrapolation potential of regression kriging and random forest while accounting for soil measurement errors

Geostatistics and machine learning have been extensively applied for modelling and predicting the spatial distribution of continuous soil variables. In addition to providing predictions, both techniques quantify the uncertainty associated with the predictions, although geostatistics is more developed in this respect. Despite the increased use of these techniques, most algorithms ignore that soil measurements are not error-free. Recently, concern has also arisen about the extrapolation risk of these techniques, be it in geographic space, feature space, or both. In this paper, regression kriging (RK) and random forest (RF) were compared with respect to their ability to deliver accurate predictions and quantify prediction uncertainties, while accounting for measurement errors in the soil data. The sensitivity of the results of both models to soil measurement errors was also evaluated, as well as their spatial extrapolation potential. This was done for a case study in Cameroon where soil pH, clay and organic carbon were mapped from measurements obtained using both conventional and proximal soil sensing methods. The results showed that both models produced comparable ranges and maps of predicted values for the soil properties of interest. RK outperformed RF, with a generally higher Model Efficiency Coefficient (MEC), lower Root Mean Squared Error (RMSE) values and better extrapolation performance. The improvement in RMSE was about 10, 12 and 2 %, while the improvement in MEC was on average 5, 22 and 1 %, for pH, clay and soil organic carbon (SOC), respectively. Accuracy plots showed that RK overestimated the local uncertainty more than RF did, indicating that prediction uncertainties were better quantified by the RF model. RK derived better predictions than RF at unsampled locations, as shown by cross-validation metrics and scatter plots, particularly when the models were used for spatial extrapolation. The effects of incorporating measurement errors were not significant for either the predictions or the prediction uncertainties, because most calibration data had the same measurement error variance. Model comparison should go beyond common validation metrics that only evaluate prediction accuracy and must also account for the ability to quantify prediction uncertainty at unsampled locations.
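For readers unfamiliar with the two validation metrics named in the abstract, the sketch below shows how RMSE and MEC are commonly computed from paired observations and predictions. It assumes the usual definition of MEC as one minus the ratio of the sum of squared errors to the sum of squared deviations of the observations from their mean; the abstract does not state the formula, so this is an assumption. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root Mean Squared Error: square root of the mean squared difference."""
    n = len(observed)
    squared_errors = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_errors / n)


def mec(observed, predicted):
    """Model Efficiency Coefficient, assumed here to be 1 - SSE / SST.

    A value of 1 means perfect predictions; a value of 0 means the model
    does no better than always predicting the mean of the observations.
    """
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Hypothetical soil pH values, for illustration only.
observed_ph = [5.0, 6.0, 7.0]
predicted_ph = [5.2, 5.9, 6.8]
print(rmse(observed_ph, predicted_ph), mec(observed_ph, predicted_ph))
```

Under these definitions a lower RMSE and a higher MEC both indicate more accurate predictions, which is the sense in which the abstract reports RK outperforming RF. Neither metric says anything about how well the prediction uncertainty is quantified.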

DOI:
https://doi.org/10.1016/j.geoderma.2022.116192

    Publication year

    2021

    Authors

    Takoutsing, B.; Heuvelink, G.B.

    Language

    English

    Keywords

    soil, mapping, measurement, statistical models, soil properties

    Geographic

    Cameroon
