AlradSpectra: a Quantification Tool for Soil Properties Using Spectroscopic Data in R

Dotto, André Carnieletto; Dalmolin, Ricardo Simão Diniz; Caten, Alexandre ten; Gris, Diego José; Ruiz, Luis Fernando Chimelo

doi:10.1590/18069657rbcs20180263

Acessibilidade / Reportar erro

Brasil

Revista Brasileira de Ciência do Solo

Español English

Brasil

Español English

sumário « anterior atual seguinte »

Sumário

Division - Soil in Space and Time, Commission – Pedometric • Rev. Bras. Ciênc. Solo 43 • 2019 • https://doi.org/10.1590/18069657rbcs20180263 copy

AlradSpectra: a Quantification Tool for Soil Properties Using Spectroscopic Data in R

Authorship SCIMAGO INSTITUTIONS RANKINGS

ABSTRACT

Soil reflectance spectroscopy has become an innovative method for soil property quantification supplying data for studies in soil fertility, soil classification, digital soil mapping, while reducing laboratory time and applying a clean technology. This paper describes the implementation of a Graphical User Interface (GUI) using R named AlradSpectra. It contains several tools to process spectroscopic data and generate models to predict soil properties. The GUI was developed to accomplish tasks such as perform a large range of spectral preprocessing techniques, implement several multivariate calibration methods, generate statistics assessment and graphical output, validate the models using independent dataset, and predict unknown variables using soil spectral data. AlradSpectra has four main modules: Import Data, Spectral Preprocessing, Modeling, and Prediction. The implementation of AlradSpectra is demonstrated by applying visible near-infrared reflectance spectroscopy for soil organic carbon (SOC) prediction. The data contains the value of SOC and Vis-NIR reflectance for 595 soil samples. The prediction statistic assessment of SOC was performed applying all spectral preprocessing and methods. The R ² considering all models ranged from 0.54 to 0.80. In the partial least squares regression (PLSR) models, the performances were similar to multiple linear regression (MLR) and support vector machines (SVM). The lowest error in the SOC prediction was achieved by PLSR method with standard normal variate (SNV) preprocessing reaching an R ² of 0.80, the smallest root mean square error (RMSE) of 0.47 %, and ratio of performance to inter-quartile distance (RPIQ) of 3.12. The capacity of performing multiple tasks, being free and open-source, easy to operate, and requiring no initial knowledge of R programming language are features that make AlradSpectra a useful tool to perform different modeling approaches and predict the desired soil variable.

GUI; R environment; multivariate calibration; spectral preprocessing; Pedometrics

Component	R Package ⁽¹⁾	Reference
Graphical Integration	devtools	Wickham et al. (2016)Wickham H , Hester J , Chang W . devtools: Tools to Make Developing R Packages Easier . R package version 1.12.0 ; 2016 . Available from: https://rdrr.io/cran/devtools/ https://rdrr.io/cran/devtools/...
Graphical Integration	gWidgetsRGtk2	Lawrence and Verzani (2014)Lawrence M , Verzani J . gWidgetsRGtk2: Toolkit implementation of gWidgets for RGtk2 . R package version 0.0-83 ; 2014 . Available from: https://rdrr.io/cran/gWidgetsRGtk2/ https://rdrr.io/cran/gWidgetsRGtk2/...
Descriptive statistics	fitdistrplus	Delignette-Muller and Dutang (2015)Delignette-Muller ML , Dutang C . fitdistrplus: an R package for fitting distributions . J Stat Softw . 2015 ; 64 : 1 - 34 . https://doi.org/10.18637/jss.v064.i04 https://doi.org/10.18637/jss.v064.i04...
Levene’s Test	car	Fox and Weisberg (2011)Fox J , Weisberg S . An R companion to applied regression . 2nd ed. Thousand Oaks : Sage Publications ; 2011 .
Plots	ggplot2	Wickham (2009)Wickham H . ggplot2: elegant graphics for data analysis . London : Springer ; 2009 .
	graphics	R Development Core Team (2018)R Development Core Team . R: A language and environment for statistical computing . R Foundation for Statistical Computing . Vienna, Austria ; 2018 . Available from: http://www.R-project.org/ . http://www.R-project.org/...
	gridExtra	Auguie (2016)Auguie B . gridExtra: miscellaneous functions for “Grid” graphics . R package version 2.2.1 ; 2016 . Available from: https://mran.microsoft.com/snapshot/2016-11-19/web/packages/gridExtra/index.html . https://mran.microsoft.com/snapshot/2016...
Spectral Preprocessing	clusterSim	Walesiak and Dudek (2016)Walesiak M , Dudek A . clusterSim: searching for optimal clustering procedure for a data set . R package version 0.45-1 ; 2016 . Available from: https://rdrr.io/cran/clusterSim/ https://rdrr.io/cran/clusterSim/...
	pls	Mevik et al. (2016)Mevik B-H , Wehrens R , Liland KH . pls: partial least squares and principal component regression . R Packag version 2.6-0 ; 2016 . Available from: http://mevik.net/work/software/pls.html http://mevik.net/work/software/pls.html...
	prospectr	Stevens and Ramirez-Lopez (2013)Stevens A , Ramirez-Lopez L . An introduction to the prospectr package . R package Vignette ; 2013 . Available from: ftp://200.236.31.2/CRAN/web/packages/prospectr/vignettes/prospectr-intro.pdf ftp://200.236.31.2/CRAN/web/packages/pro...
Modeling and Prediction	caret	Kuhn (2017)Kuhn M . caret: classification and regression training . R package version 6.0-73 ; 2017 . Available from: http://adsabs.harvard.edu/abs/2015ascl.soft05003K http://adsabs.harvard.edu/abs/2015ascl.s...
	e1071
	e1071	Dimitriadou et al. (2017)Dimitriadou E , Hornik K , Leisch F , Meyer D , Weingessel A , Meyer D , Weingessel A . Misc functions of the department of statistics (e1071), TU Wien . R Packag version 1.6-8 . 2017 . Available from: https://www.researchgate.net/profile/Friedrich_Leisch/publication/221678005_E1071_Misc_Functions_of_the_Department_of_Statistics_E1071_TU_Wien/links/547305880cf24bc8ea19ad1d/E1071-Misc-Functions-of-the-Department-of-Statistics-E1071-TU-Wien.pdf https://www.researchgate.net/profile/Fri...
	kernlab	Karatzoglou et al. (2004)Karatzoglou A , Smola A , Hornik K , Zeileis A . kernlab - an S4 package for kernel methods in R . J Stat Softw . 2004 ; 11 : 1 - 20 . https://doi.org/10.1016/j.csda.2009.09.023 https://doi.org/10.1016/j.csda.2009.09.0...
	pls	Mevik et al. (2016)Mevik B-H , Wehrens R , Liland KH . pls: partial least squares and principal component regression . R Packag version 2.6-0 ; 2016 . Available from: http://mevik.net/work/software/pls.html http://mevik.net/work/software/pls.html...
	randomForest
		Liaw and Wiener (2002)Liaw A , Wiener M . Classification and regression by randomForest . R News . 2002 ; 2 : 18 - 22 .

Spectral Preprocessing (SP)	Summary
Method (M)
Function (F)
Package (P)
SP: Smoothing F: movav P: prospect		It is a simple moving average of a spectral data using a convolution function.

SP: Binning F: binning P: prospectr	Binning is used to reducing the effects of minor observation errors by computing average values of spectral data. To perform spectral binning, the bin size has to be specified (bin size).
SP:Absorbance F: A = log10 1/R	Absorbance is based on measuring the amount of light absorbed by a sample at a given wavelength.
SP: Detrend F: detrend P: prospectr	Detrend normalizes the spectral data by applying a standard normal variate transformation followed by fitting a second-degree polynomial regression model and returning the fitted residuals.
SP: Continuum Removal (CR) F:continuumRemoval P: prospectr	Continuum Removal remove the continuous features of the spectra and is often used to isolate specific absorption features present in the spectrum to minimize the noise. The continuum is represented by a mathematical function used to separate and highlight specific absorption bands of the reflectance spectrum.
SP: Savitzky–Golay Derivative F: savitzkyGolay P: prospect	Derivatives are performed to remove unimportant baseline signal from samples by taking the derivative of the measured responses with respect to the variable number (wavelength). The Savitzky-Golay derivatization algorithm requires selection of smoothing points (filter width), the orders of polynomial and derivative.
SP: Standard Normal Variate (SNV) F:standardNormalVariate P: prospectr	Standard Normal Variate is performed in spectral data to remove scatter. It is applied to every spectrum individually. Standard Normal Variate is designed to operate based on centering the underlying linear slope of each individual sample spectrum.
SP: Multiplicative Scatter Correction (MSC) F: msc P: pls	Multiplicative Scatter Correction is achieved by regressing a measured spectrum against a reference spectrum. The MSC is effective in minimizing baseline offsets and multiplicative effect. The outcome of MSC, in many cases, is very similar to SNV, except SNV corrects each spectrum individually and does not need the entire data set.
SP: Normalization F: data.Normalization P: clusterSim	Normalization means adjusting values measured on different scales to a common scale, where these normalized values eliminate scattering effects. Five types of normalization were included in AlradSpectra: standardization, normalization in range, quotient transformation, normalization, and normalization with zero being the central point
SP: Multiple Linear Regression (MLR) F: glmStepAIC P: caret	Multiple Linear Regression is a statistical method that uses several explanatory variables to predict the outcome of a response variable in a simple linear model (Galton, 1886). The MLR assumes the relationships between independent variables and the dependent variable are linear.
M: Partial Least Squares Regression (PLSR) F: plsr P: pls	Partial Least Squares Regression can handle complicated relationships between predictors and responses and can deal with complex modeling problems. Additionally, PLSR is a method for constructing predictive models when the factors are many and highly collinear ( Wold et al., 1984Wold S , Ruhe A , Wold H , Dunn III WJ . The collinearity problem in linear regression. the Partial Least Squares (PLS) approach to generalized inverses . SIAM J Sci Stat Comp . 1984 ; 5 : 735 - 43 . https://doi.org/10.1137/0905052 https://doi.org/10.1137/0905052... ), which is the case of hyperspectral data.
M: Support Vector Machines (SVM) F: svm P: e1071	Support Vector Machines are a group of supervised learning methods, which represent an extension to nonlinear models of generalized algorithm with the capability of training nonlinear classifiers ( Ivanciuc, 2007Ivanciuc O . Applications of support vector machines in chemistry . In: Lipkowitz KB , Cundari TR , editors . Reviews in computational chemistry . New York : John Wiley & Sons, Inc .; 2007 . vol. 23 . p. 291 - 400 . ). Associated with SVM algorithm is the criteria of smaller number of support vectors yield a better model performance ( Loosli et al., 2007Loosli G , Bottou L , Canu S . Training invariant SVMs using selective sampling . In: Bottou L , Chapelle O , DeCoste D , Weston J , editors . Large-scale kernel mach . London : The MIT Press ; 2007 . p. 301 - 20 . ).
M: Random Forest (RF) F: randomForest P: randomFores	Random Forest is a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest ( Breiman, 2001Breiman L . Random forests . Mach Learn . 2001 ; 45 : 5 - 32 . https://doi.org/10.1023/A:1010933404324 https://doi.org/10.1023/A:1010933404324... ). The RF is versatile and flexible with a small or large data set. Model interpretability is an issue when compared to linear models.
M: Gaussian Process Regression (GPR) F: gausspr P: kernla	Gaussian Process Regression is a nonparametric regression using Gaussian processes, which applies a kernel function for training and predicting. In machine learning, kernel methods are a class of algorithms for pattern analysis. This approach replaces the features (predictors) by a kernel function. Several classes of kernels can be used for machine learning, and the selection of kernel is critical to the success of these algorithms ( Karatzoglou et al., 2004Karatzoglou A , Smola A , Hornik K , Zeileis A . kernlab - an S4 package for kernel methods in R . J Stat Softw . 2004 ; 11 : 1 - 20 . https://doi.org/10.1016/j.csda.2009.09.023 https://doi.org/10.1016/j.csda.2009.09.0... ).

	Set	Observation	Minimum	Maximum	Mean	Median	Standard deviation	Skewness	Kurtosis
Soil organic carbon (%)	Whole	500	0.02	6.87	1.95	1.86	1.08	0.79	4.06
	Training	350	0.02	6.87	1.98	1.87	1.11	0.88	4.38
	Validation	150	0.21	4.69	1.86	1.84	1.00	0.46	2.62
	Prediction	95	0.34	4.83	2.18	2.19	1.07	0.24	-0.59

Method	Preprocessing	Training set			Validation set
		R²	RMSE	RPIQ	R²	RMSE ⁽¹⁾	RPIQ
				---------------------- % ---------------------			---------------------- % ---------------------
		MLR	SNV	0.84	0.43	3.24	0.80	0.51	3.20
Smoothing	0.80		0.48	3.07	0.77	0.52	2.77
Detrend	0.84		0.44	3.37	0.76	0.52	2.76
CR	0.86		0.41	3.82	0.76	0.53	2.39
Absorbance	0.84		0.43	3.58	0.76	0.53	2.59
Normalization	0.84		0.41	3.49	0.78	0.55	2.90
Original	0.80		0.48	3.00	0.72	0.59	2.68
MSC	0.85		0.40	3.50	0.75	0.61	2.70
Binning	0.63		0.65	2.18	0.57	0.71	2.19
SGD	0.74		0.56	2.75	0.54	0.72	1.73
PLSR	SNV	0.84	0.44	3.34	0.80	0.47	3.12
	Detrend	0.83	0.46	3.24	0.75	0.51	2.83
	CR	0.86	0.40	3.91	0.78	0.53	3.08
	Absorbance Normalization	0.84	0.43	3.53	0.76	0.53	2.61
	Absorbance Normalization	0.82	0.44	3.31	0.79	0.54	2.94
	Original	0.76	0.51	2.77	0.75	0.56	2.83
	MSC	0.85	0.40	3.72	0.76	0.57	2.55
	Binning	0.78	0.50	2.84	0.71	0.59	2.64
	Smoothing	0.79	0.50	2.96	0.70	0.60	2.41
	SGD	0.75	0.54	2.85	0.56	0.71	1.77
	Absorbance	0.86	0.41	3.79	0.78	0.51	2.55
SVM	SNV	0.95	0.26	5.70	0.74	0.52	2.75
	Normalization	0.94	0.26	5.81	0.75	0.53	2.48
	Original	0.80	0.48	2.98	0.74	0.56	2.81
	MSC	0.95	0.24	6.18	0.73	0.61	2.38
	Smoothing	0.80	0.51	3.04	0.68	0.62	2.03
	Binning	0.79	0.49	2.83	0.69	0.63	2.60
	Detrend	0.98	0.15	9.31	0.66	0.72	2.09
	SGD	0.99	0.10	14.16	0.53	0.77	1.93
	CR	0.99	0.10	14.27	0.61	0.85	1.61
RF	Detrend	0.67	0.66	2.26	0.67	0.57	2.50
	CR	0.73	0.58	2.70	0.69	0.60	2.13
	SGD	0.68	0.66	2.32	0.58	0.71	1.76
	Smoothing	0.38	0.89	1.73	0.44	0.71	1.79
	SNV	0.60	0.70	2.15	0.51	0.72	1.87
	MSC	0.55	0.70	2.01	0.61	0.77	2.13
	Normalization	0.55	0.70	2.05	0.60	0.79	2.01
	Binning	0.39	0.84	1.69	0.40	0.84	1.85
	Absorbance	0.40	0.84	1.83	0.37	0.85	1.60
	Original	0.40	0.82	1.72	0.43	0.86	1.88
GPR	Absorbance	0.85	0.42	3.69	0.77	0.52	2.65
	Normalization	0.93	0.27	5.31	0.76	0.57	2.77
	SNV	0.95	0.26	5.84	0.72	0.58	2.34
	Original	0.81	0.46	3.06	0.73	0.58	2.75
	MSC	0.92	0.29	4.87	0.76	0.59	2.81
	Detrend	0.97	0.21	7.11	0.69	0.60	2.38
	Binning	0.72	0.57	2.48	0.64	0.65	2.38
	Smoothing	0.80	0.50	3.07	0.65	0.65	1.94
	CR	0.99	0.11	14.08	0.61	0.73	1.75
	SGD	0.99	0.00	461.00	0.48	0.83	1.51

Sociedade Brasileira de Ciência do Solo Sociedade Brasileira de Ciência do Solo, Departamento de Solos - Edifício Silvio Brandão, s/n, Caixa Postal 231 - Campus da UFV, CEP 36570-900 - Viçosa-MG, Tel.: (31) 3612-4542 - Viçosa - MG - Brazil
E-mail: sbcs@sbcs.org.br

Acompanhe os números deste periódico no seu leitor de RSS

[1] * Corresponding author: E-mail: dalmolin@ufsm.br