Multiple linear regression and Random Forest model to estimate soil bulk density in mountainous regions

Abstract

The objective of this work was to develop models, using different sets of data, for estimating soil bulk density in tropical mountainous regions from soil attributes commonly found in the analyses of soil profiles described in regional surveys. The complete dataset comprises 163 samples and was divided into six groups: three groups with 73 samples and a maximum of 32 covariates, and three groups with 163 samples and a maximum of 18 covariates. Multiple linear regression (RLM) and Random Forest (RF) models were tested. Among all models, the lowest uncertainty was achieved by RLM2, with an R² of 0.56, using 13 covariates and 73 samples. Among the groups with 163 samples, the best models were the RFs, with a mean R² of 0.48. The root mean squared error ranged from 0.09 to 0.14. The most important covariates in the RF model were organic carbon, hydrogen, fine and coarse sand, base saturation, and cation exchange capacity. In the backward stepwise regression, the main covariates were the silt-to-clay ratio, fine and coarse sand, organic carbon, base saturation, and potassium.
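The two metrics reported above, R² and root mean squared error, can be sketched as follows. This is a minimal illustration, assuming R² is computed as 1 - SS_res / SS_tot (the abstract does not state which definition was used); the function names and the sample values are hypothetical and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical bulk density values, for illustration only.
observed = [1.10, 1.25, 1.40, 0.95, 1.30]
predicted = [1.15, 1.20, 1.35, 1.05, 1.25]

print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"R2   = {r_squared(observed, predicted):.2f}")
```

The same two functions could be applied to the predictions of any fitted model, linear or tree-based, so that models trained on different groups of samples are compared on a common basis.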

Index terms:
carbon stock; pedotransfer functions; data-driven models; stepwise

Embrapa Secretaria de Pesquisa e Desenvolvimento; Pesquisa Agropecuária Brasileira, Caixa Postal 040315, 70770-901 Brasília, DF, Brazil. Tel. +55 61 3448-1813, Fax +55 61 3340-5483
E-mail: pab@embrapa.br