
Advancing Gene Expression Data Analysis: An Innovative Multi-objective Optimization Algorithm for Simultaneous Feature Selection and Clustering

Abstract

Clustering algorithms play a crucial role in identifying co-expressed genes in microarray data, and identifying a relevant feature subset is equally important when dealing with large data matrices. In this paper, we address simultaneous feature selection and clustering of gene expression data within a multi-objective optimization framework. Our approach employs the Archived Multi-objective Simulated Annealing (AMOSA) algorithm to optimize three objective functions: two internal cluster validity indices and a feature weight index. A point symmetry-based distance is used to assign data points to clusters. We evaluate the proposed approach, unsupervised feature selection and clustering using a multi-objective optimization framework (UFSC-MOO), on three publicly available gene expression datasets using the Silhouette index, and compare its clustering results with those of nine existing techniques, showing its superior performance. The statistical significance of the results is confirmed by the Wilcoxon rank-sum test, and a biological significance test shows that the obtained clustering solutions are biologically enriched.
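The abstract does not define the point symmetry-based distance. As an illustration only, the sketch below assumes the formulation common in the symmetry-based clustering literature: reflect a point x through a cluster centre c to obtain x* = 2c - x, average the distances from x* to its `knear` nearest data points (d_sym), and multiply by the Euclidean distance between x and c. The function names, the `knear = 2` default and the toy data are hypothetical; published variants also fall back to the Euclidean distance when d_sym exceeds a threshold, which is omitted here.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Plain Euclidean distance between two points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def point_symmetry_distance(x: Point, center: Point, data: Sequence[Point], knear: int = 2) -> float:
    """Assumed formulation: d_ps(x, c) = d_sym(x, c) * d_e(x, c).

    This is a hypothetical sketch, not the paper's exact definition.
    """
    if len(data) < knear:
        raise ValueError("data must contain at least knear points")
    # Reflect x through the cluster centre: x* = 2c - x
    reflected = [2 * ci - xi for xi, ci in zip(x, center)]
    # Distances from the reflected point to every data point, smallest first
    dists = sorted(euclidean(reflected, p) for p in data)
    # d_sym: mean distance from x* to its knear nearest data points
    d_sym = sum(dists[:knear]) / knear
    return d_sym * euclidean(x, center)


def assign_clusters(data: Sequence[Point], centers: Sequence[Point], knear: int = 2) -> List[int]:
    """Assign each point to the centre with the smallest point symmetry-based distance."""
    labels = []
    for x in data:
        scores = [point_symmetry_distance(x, c, data, knear) for c in centers]
        labels.append(min(range(len(centers)), key=scores.__getitem__))
    return labels


if __name__ == "__main__":
    # Toy 2-D data with two symmetric groups (hypothetical, not from the paper)
    data = [(0, 0), (1, 0), (-1, 0), (10, 0), (11, 0), (9, 0)]
    centers = [(0, 0), (10, 0)]
    print(assign_clusters(data, centers))  # [0, 0, 0, 1, 1, 1]
```

The reflection step is what makes the measure favour clusters that are symmetric about their centre rather than merely compact; in practice the nearest-neighbour search would use a spatial index instead of the linear scan shown here.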

Keywords:
Gene expression data clustering; Feature selection; Point symmetry-based distance; AMOSA; Cluster validity index; Feature weight index

HIGHLIGHTS

A novel multi-objective optimization algorithm is proposed for simultaneous feature selection and gene expression data clustering.

An efficient feature selection approach is used to identify a relevant feature subset for gene expression data clustering.

A simulated annealing-based approach is employed to optimize three objective functions simultaneously.

The obtained clustering results are shown to be superior to those of nine existing clustering techniques.
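AMOSA maintains an archive of non-dominated solutions while it searches over the three objectives. The sketch below shows only the Pareto-dominance test and a basic archive update, under the assumption that all three objectives (two validity indices and the feature weight index) are to be maximised; it omits AMOSA's annealing schedule, its amount-of-domination acceptance rule and its archive-size limits. The function names and objective values are hypothetical.

```python
from typing import List, Sequence, Tuple

Objectives = Sequence[float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a Pareto-dominates b, assuming every objective is maximised."""
    no_worse = all(ai >= bi for ai, bi in zip(a, b))
    strictly_better = any(ai > bi for ai, bi in zip(a, b))
    return no_worse and strictly_better


def update_archive(archive: Sequence[Tuple[float, ...]], candidate: Objectives) -> List[Tuple[float, ...]]:
    """Return a new archive of mutually non-dominated objective vectors."""
    cand = tuple(candidate)
    # Reject the candidate if it is dominated by, or identical to, an archived solution
    if any(dominates(a, cand) or a == cand for a in archive):
        return list(archive)
    # Otherwise drop every archived solution the candidate dominates, then add it
    return [a for a in archive if not dominates(cand, a)] + [cand]


if __name__ == "__main__":
    archive: List[Tuple[float, ...]] = []
    # Hypothetical objective vectors (validity index 1, validity index 2, feature weight index)
    for objs in [(0.5, 0.2, 0.1), (0.6, 0.1, 0.1), (0.7, 0.3, 0.2), (0.1, 0.1, 0.1)]:
        archive = update_archive(archive, objs)
    print(archive)  # [(0.7, 0.3, 0.2)]
```

The first two vectors are mutually non-dominated and coexist in the archive until the third one dominates both; the last vector is rejected because the archive already holds a solution that dominates it.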

Instituto de Tecnologia do Paraná - Tecpar Rua Prof. Algacyr Munhoz Mader, 3775 - CIC, 81350-010 Curitiba PR Brazil, Tel.: +55 41 3316-3052/3054, Fax: +55 41 3346-2872 - Curitiba - PR - Brazil
E-mail: babt@tecpar.br