The objective of this work was to evaluate different strategies for predicting soil class distribution on digital soil maps of areas without reference data, in the São Francisco sedimentary basin, in the north of the state of Minas Gerais, Brazil. The strategies included taxonomic generalization, training on field observations, training set expansion, and the use of different data mining algorithms. Four data matrices were built, differing in the volume of data available for machine learning and in the soil taxonomic level to be predicted. The performance of three machine learning algorithms (Random Forest, J48, and MLP) was evaluated in combination with discretization, class balancing, variable selection, and expansion of the training set. Class balancing, variable discretization by equal frequencies, and the Random Forest algorithm showed the best performance. Extending the representativeness of field observations, which assumes a larger training area, brought no predictive gain. Generalizing soil taxonomy to the suborder level reduced the fragmentation of mapped polygons and improved the accuracy of the digital soil maps. Digital soil maps generated by training on in situ soil observations within the mapping area were as accurate as those trained on preexisting maps.
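As an illustration of the discretization by equal frequencies mentioned above, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the choice of four bins, and the slope values are hypothetical and serve only to show how cut points can be placed so that each bin receives roughly the same number of observations.

```python
from bisect import bisect_right


def equal_frequency_edges(values, n_bins):
    """Return the n_bins - 1 interior cut points that split the
    values into bins holding roughly the same number of items."""
    ordered = sorted(values)
    n = len(ordered)
    # Cut point k is the value found k/n_bins of the way through the sorted data.
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]


def discretize(values, edges):
    """Map each value to the index of the bin it falls into."""
    return [bisect_right(edges, v) for v in values]


# Hypothetical slope values (percent) for eight sample points; not data from the study.
slope = [2.0, 35.0, 4.5, 12.0, 8.0, 20.0, 1.0, 6.0]
edges = equal_frequency_edges(slope, 4)
bins = discretize(slope, edges)
print(edges)
print(bins)
```

Unlike equal-width binning, the cut points here follow the data distribution, so a skewed covariate does not leave most observations in a single bin.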
CITATION STYLE
Dias, L. M. da S., Coelho, R. M., Valladares, G. S., de Assis, A. C. C., Ferreira, E. P., & da Silva, R. C. (2016). Predição de classes de solo por mineração de dados em área da bacia sedimentar do São Francisco. Pesquisa Agropecuária Brasileira, 51(9), 1396–1404. https://doi.org/10.1590/S0100-204X2016000900038