
Performances of several machine learning algorithms and of logistic regression to predict Fasciola hepatica in cattle


Abstract

The objective of this work was to compare the performance of logistic regression and machine learning algorithms in predicting infection by Fasciola hepatica in cattle. A dataset of 30,151 cattle from Uruguay was used. Logistic regression (LR) was compared with the k-nearest neighbor (KNN), classification and regression tree (CART), and random forest (RF) algorithms. The interquartile range (IQR) and z-score methods were used to improve classification and were compared with each other. Sex, age, carcass conformation score, fat score, productive purpose, and carcass weight were used as independent variables for all algorithms, and infection by F. hepatica was used as the binary dependent variable. The accuracies of LR, KNN, CART, and RF were 0.61, 0.57, 0.57, and 0.58, respectively. The variable importance from LR showed that adult cattle were more likely to be infected by F. hepatica. All models showed low accuracy, but LR successfully identified the variables associated with F. hepatica infection. The IQR and z-score methods gave similar improvements in the classification metrics for this dataset. Adding data on climate, or on factors such as body weight, to the dataset could improve the reliability of the model in future studies.
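The abstract does not state exactly how the IQR and z-score were applied. A common practice, assumed here, is to flag outliers in a numeric predictor such as carcass weight before training the classifiers. The sketch below uses Python's standard `statistics` module, the conventional Tukey fence factor k = 1.5 and a z-score cut-off of 3 (both assumptions, not values reported by the authors), and a short list of made-up carcass weights that do not come from the Uruguayan dataset.

```python
import statistics

# Hypothetical carcass weights in kg (illustrative only, NOT from the study).
HYPOTHETICAL_WEIGHTS = [210, 220, 225, 230, 228, 235, 240, 232, 226, 238, 222, 600]


def iqr_bounds(values, k=1.5):
    """Return the Tukey fences (Q1 - k*IQR, Q3 + k*IQR); k=1.5 is an assumed default."""
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_filter(values, k=1.5):
    """Keep only the values that fall inside the Tukey fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


def zscore_filter(values, threshold=3.0):
    """Keep values whose absolute z-score is at most `threshold` (3.0 is an assumed cut-off)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        return list(values)  # all values identical: nothing can be an outlier
    return [v for v in values if abs((v - mean) / sd) <= threshold]


if __name__ == "__main__":
    print("IQR fences:", iqr_bounds(HYPOTHETICAL_WEIGHTS))
    print("kept by IQR:", iqr_filter(HYPOTHETICAL_WEIGHTS))
    print("kept by z-score:", zscore_filter(HYPOTHETICAL_WEIGHTS))
```

With this toy list both rules drop only the 600 kg value (the IQR fences come out at 207 and 253 kg). The z-score rule relies on the mean and standard deviation, which an extreme value itself inflates, so on heavily contaminated data the quantile-based IQR rule is usually the more robust of the two.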

Index terms: Fasciola hepatica; classification; data mining; fluke; machine learning

Embrapa Secretaria de Pesquisa e Desenvolvimento, Pesquisa Agropecuária Brasileira, P.O. Box 040315, 70770-901 Brasília, DF, Brazil. Tel. +55 61 3448-1813, Fax +55 61 3340-5483
E-mail: pab@embrapa.br