|
In multi-omics data analysis, understanding the stability of
feature selection is critical for reliable biomarker discovery and clinical applications. This study
investigates the stability of feature-selection methods across various cancer types using 15
datasets from The Cancer Genome Atlas (TCGA). We employed classifiers with embedded feature
selection, including Support Vector Machines (SVM), Logistic Regression (LR), and Lasso regression,
each incorporating L1 regularization. Through a comprehensive evaluation using five-fold cross-
validation, we measured feature-selection stability and assessed the accuracy of predictions regarding
TP53 mutations, a known indicator of poor clinical outcomes in cancer patients. All three classifiers
achieved their highest feature-selection stability, as measured by the Nogueira metric, under higher
regularization (fewer selected features), while lower regularization generally resulted in decreased
stability across all omics layers. Our findings indicate differences in feature stability across the various
omics layers; miRNA consistently exhibited the highest stability across classifiers, while the mutation
and RNA layers were generally less stable, particularly with lower regularization. This work highlights
the importance of careful feature selection and validation in high-dimensional datasets to enhance
the robustness and reliability of multi-omics analyses.
|
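
As an illustration of the stability measure named above, the following minimal sketch computes the Nogueira stability estimator from a set of binary selection vectors, one per cross-validation fold. It follows the published definition of the estimator and is not the code used in this study; the function name `nogueira_stability` and the toy selections are hypothetical and chosen only for illustration.

```python
def nogueira_stability(selections):
    """Nogueira stability estimator for M binary selection vectors over d features.

    `selections` is a list of M equal-length lists of 0/1 values, where 1 means
    the feature was selected in that fold. (Hypothetical helper for illustration.)
    """
    m = len(selections)
    if m < 2:
        raise ValueError("at least two selection vectors are required")
    d = len(selections[0])
    if any(len(row) != d for row in selections):
        raise ValueError("all selection vectors must have the same length")

    # Selection frequency of each feature across the M folds.
    freqs = [sum(row[f] for row in selections) / m for f in range(d)]

    # Unbiased sample variance of each feature's selection indicator.
    variances = [m / (m - 1) * p * (1 - p) for p in freqs]
    mean_variance = sum(variances) / d

    # Average number of selected features per fold.
    k_bar = sum(sum(row) for row in selections) / m

    # Expected variance if k_bar features were drawn at random.
    denom = (k_bar / d) * (1 - k_bar / d)
    if denom == 0:
        raise ValueError("stability is undefined when no or all features are selected")

    return 1 - mean_variance / denom


# Toy example: five folds that select exactly the same two of four features.
identical = [[1, 1, 0, 0] for _ in range(5)]
print(nogueira_stability(identical))
```

A value of 1 indicates that every fold selected the same features. Values near 0 indicate agreement no better than random selection of the same number of features.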