Acessibilidade / Reportar erro

Epigenomics Scientific Big Data Workflow Scheduling for Cancer Diagnosis in Health Care Using Heterogeneous Computing Environment

Abstract

DNA methylation and Histone are the main constituents to oversee the stable maintenance of cellular phenotypes. Any abnormalities in these components could cause cancer development and, therefore, must be potentially diagnostic. The Epigenomics research field is the study of epigenetic modification which involves gene expression control for better understanding of human biology. The Epigenomics applications are considered quite complex Big Data workflow applications which represent the data processing pipeline for automating the innumerable genome sequencing computation. The infrastructure of high-performance computing imparts heterogeneous computing resources for deploying such complex applications. Scheduling of workflow applications in the complex heterogeneous computing resources is considered an NP-complete problem; therefore, it requires an efficient scheduling approach. In this research work, an efficient list-based scheduling algorithm is proposed which efficiently minimizes the running time (makespan) of the Epigenomics application. In order to identify whether clustering and entry task duplication techniques improve the performance of the proposed algorithm, four versions of the algorithm such as list-based scheduling with clustering and duplication (LS-C-D), list-based scheduling with clustering and without duplication (LS-C-WD), list-based scheduling without clustering and with duplication (LS-WC-D), and list-based scheduling without clustering and without duplication (LS-WC-WD) has experimented. The experimental results prove that LS-WC-D is the best choice for scheduling Epigenomics applications. Further, the comparison of LS-WC-D and state-of-the-art algorithms also proves its significance.

Keywords:
Epigenomics; Big data; Workflow scheduling; Heterogeneous computing; Makespan minimization.

HIGHLIGHTS

• Analysis of different scheduling techniques for Epigenomics computation.

• An efficient algorithm that minimize computation time of Epigenomics Application.

• Performance analysis in High Performance Computing (HPC) environment.

• Shows better performance over state-of-art algorithms.

Instituto de Tecnologia do Paraná - Tecpar Rua Prof. Algacyr Munhoz Mader, 3775 - CIC, 81350-010 Curitiba PR Brazil, Tel.: +55 41 3316-3052/3054, Fax: +55 41 3346-2872 - Curitiba - PR - Brazil
E-mail: babt@tecpar.br