Spectroscopic measurements of soil samples are reliable because they are highly repeatable and reproducible. They characterise the samples' mineral–organic composition. Spectroscopic estimates of the concentrations of soil constituents are inevitably less precise than those obtained conventionally by chemical analysis, but each spectroscopic estimate costs at most one-tenth as much as a chemical determination. Spectroscopy is therefore cost-effective when we need many data, despite the costs and errors of calibration. Soil spectroscopists understand the risks of over-fitting models to high-dimensional multivariate spectra and have command of the mathematical and statistical methods to avoid them. Machine learning has fast become an algorithmic alternative to statistical analysis for estimating concentrations of soil constituents from reflectance spectra. As with any modelling, machine learning must be implemented judiciously because it too carries the risk of over-fitting predictions to irrelevant elements of the spectra. To use the methods confidently, we need to validate the outcomes with appropriately sampled, independent data sets. Not all machine learning algorithms should be considered ‘black boxes’. Their interpretability depends on the algorithm: some are highly interpretable and explainable, whereas others are difficult to interpret because of complex transformations or their huge and complicated networks of parameters. Research on explainable machine learning is advancing rapidly, however, and these methods are finding applications in soil science and spectroscopy. In many parts of the world, soil and environmental scientists recognise the merits of soil spectroscopy. They are building spectral libraries on which they can draw to localise the modelling and derive soil information for new projects within their domains. We hope our article gives readers a more balanced and optimistic perspective of soil spectroscopy and its future.
DOI: https://doi.org/10.1111/ejss.13271