Acessibilidade / Reportar erro

Soil class prediction by data mining in an area of the sedimentary São Francisco basin

Abstract

The objective of this work was to evaluate different strategies for the prediction of soil class distribution on digital soil maps of areas without reference data, in the sedimentary basin of San Francisco, in the north of the state of Minas Gerais, Brazil. The strategies included: taxonomic generalization, training by field observations, training set expansion, and the use of different data mining algorithms. Four matrices were developed, differentiated by the volume of data for machine learning and by soil taxonomic levels to be predicted. The performance of the machine learning algorithms - Random Forest, J48, and MLP -, associated with discretization, class balancing, variable selection, and expansion of the training set was evaluated. Class balancing, variable discretization by equal frequencies, and the Random Forest algorithm showed the best performances. The representativeness extension of field observations, that assumes a larger training area, brought no predictive gain. Soil taxonomic generalization to the suborder level reduces the fragmentation of mapped polygons and improves the accuracy of digital soil maps. When generated by training on in situ soil observations at the mapping area, digital soil maps are as accurate as those trained on preexistent maps.

Index terms:
soil map accuracy; classification algorithms; digital soil map; predictive variables of the terrain

Embrapa Secretaria de Pesquisa e Desenvolvimento; Pesquisa Agropecuária Brasileira Caixa Postal 040315, 70770-901 Brasília DF Brazil, Tel. +55 61 3448-1813, Fax +55 61 3340-5483 - Brasília - DF - Brazil
E-mail: pab@embrapa.br