
Data processing to remove outliers and inliers: A systematic literature study

Research developed at Universidade Estadual do Oeste do Paraná, Cascavel, PR, Brazil

Processamento de dados para remoção de pontos outliers e inliers: Estudo sistemático da literatura

ABSTRACT

Outliers and inliers often arise during sample data acquisition. While outliers are anomalous observations, inliers are erroneous data points that lie within the main body of the dataset. This work aimed to conduct a systematic literature study (SLS) surveying the methods and software used for outlier and inlier removal, particularly in exploratory data analysis. The study was conducted in three phases: (i) systematic literature mapping (SLM), (ii) snowballing (SB), and (iii) systematic literature review (SLR). Initially, 772 scientific studies were identified, which were narrowed down to 86 after applying the selection criteria. Backward (BSB) and forward (FSB) snowballing yielded a further 16 studies, resulting in a final pool of 102 studies for analysis. Three outlier removal techniques (Chebyshev’s inequality, boxplot, and principal component analysis), one inlier removal technique (local Moran’s index), and thirteen commonly used software packages were identified.
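As an illustration of two of the outlier removal techniques named above, the following is a minimal sketch, not the procedure of any surveyed study. It assumes that the Chebyshev cutoff is derived from a chosen tail probability p (so that k = 1/sqrt(p)) and that the boxplot rule uses the conventional 1.5 × IQR fences. The function names, the default parameters, and the synthetic sample are all hypothetical.

```python
import math
import statistics


def chebyshev_filter(values, p=0.05):
    """Keep values within mean +/- k*sd, with k = 1/sqrt(p).

    Chebyshev's inequality guarantees that at most a fraction p of any
    distribution lies farther than k standard deviations from the mean.
    The default p = 0.05 is an assumption made for this sketch.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    k = 1.0 / math.sqrt(p)
    return [v for v in values if abs(v - mean) <= k * sd]


def boxplot_filter(values, whisker=1.5):
    """Keep values inside the boxplot fences [Q1 - w*IQR, Q3 + w*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if lower <= v <= upper]


# Synthetic sample: 50 plausible readings plus one gross error (50.0).
sample = [3.0 + 0.1 * (i % 5) for i in range(50)] + [50.0]

cleaned_chebyshev = chebyshev_filter(sample)
cleaned_boxplot = boxplot_filter(sample)
print(len(sample), len(cleaned_chebyshev), len(cleaned_boxplot))
```

Because Chebyshev's bound holds for any distribution, it is conservative: in small samples a single extreme value inflates the standard deviation enough that it may not be flagged, whereas the quartile-based fences are less affected.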

Key words:
exploratory analysis; precision agriculture; local Moran’s index; data cleaning

HIGHLIGHTS:

Data acquired by sampling often contain outliers and inliers.

Outlier and inlier removal techniques include Chebyshev’s inequality, boxplot, principal component analysis, and local Moran’s index.

This systematic literature search effectively identified and filtered the research papers.
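To make the inlier technique from the highlights more concrete, here is a minimal sketch of the local Moran's index, I_i = (z_i / m2) · (mean of z_j over the neighbours of i), where z is the deviation from the global mean and m2 is the mean squared deviation. The distance-band neighbourhood, the row-standardized weights, the rule "flag points with a negative index", and the synthetic grid are assumptions made for illustration. Practical workflows usually also test the statistical significance of each index before removing a point, which this sketch omits.

```python
import math


def local_moran(coords, values, radius):
    """Local Moran's index per point, using row-standardized weights over
    all neighbours within `radius` (a hypothetical distance band)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # assumes the values are not all equal
    indices = []
    for i in range(n):
        neighbours = [
            j for j in range(n)
            if j != i and math.dist(coords[i], coords[j]) <= radius
        ]
        # Spatial lag: average deviation of the neighbours (0 if isolated).
        lag = sum(z[j] for j in neighbours) / len(neighbours) if neighbours else 0.0
        indices.append(z[i] * lag / m2)
    return indices


# Synthetic 6 x 6 grid: a low zone (2.0) on the left and a high zone (8.0)
# on the right, with one low reading placed inside the high zone.
coords = [(x, y) for x in range(6) for y in range(6)]
values = [2.0 if x < 3 else 8.0 for x, y in coords]
values[coords.index((4, 3))] = 2.0  # within the global range, unlike its neighbours

moran = local_moran(coords, values, radius=1.5)
suspects = [coords[i] for i, index in enumerate(moran) if index < 0]
print(suspects)
```

The altered point would pass a boxplot filter, since 2.0 is a common value in the dataset as a whole; it is only flagged because it disagrees with its spatial neighbours, which is what distinguishes an inlier from an outlier.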

Unidade Acadêmica de Engenharia Agrícola, UFCG, Av. Aprígio Veloso 882, Bodocongó, Bloco CM, 1º andar, CEP 58429-140, Campina Grande, PB, Brazil, Tel. +55 83 2101 1056
E-mail: revistagriambi@gmail.com