Stathopoulou KM¹*, Georgakopoulos S², Tasoulis S¹ and Plagianakos VP¹
¹Department of Computer Science and Biomedical Informatics, University of Thessaly, Greece
²Department of Mathematics, University of Thessaly, Greece
Advances in computer science, combined with next-generation sequencing (NGS), have opened a new era in biology by enabling state-of-the-art analysis of complex biological data. Bioinformatics is evolving as a field at the intersection of computer science and biology, supporting the representation, storage, management, analysis and exploration of many types of data with a wide range of machine learning algorithms and computing tools. In this study, we used machine learning algorithms to detect differentially expressed genes between different types of cancer and to show that the genes they select overlap with the final results of the RNA-sequencing analysis. The data were obtained from the National Center for Biotechnology Information (NCBI) resource, specifically dataset GSE68086, which corresponds to PMID:200068086. This dataset consists of 171 blood platelet samples collected from patients with six different tumor types and from healthy individuals. All steps of the RNA-sequencing analysis were followed: preprocessing, read alignment, transcriptome reconstruction, expression quantification and differential expression analysis. The machine learning-based Random Forest and Gradient Boosting algorithms were applied to predict significant genes. The analysis was carried out in RStudio.
Differentially expressed genes; Gene expression; Machine learning; Supervised; Unsupervised; RNA-seq analysis; NGS data; Bioconductor
Stathopoulou KM, Georgakopoulos S, Tasoulis S, Plagianakos VP. Investigation of Overlap of Machine Learning Algorithms in the Final Results of RNA-seq Analysis on Gene Expression Estimation. World J Phys Rehabil Med. 2023; 7(1): 1025.