Comparative evaluation of persistence diagram vectorisation methods in classification tasks
Journal article
MNiSW
100
2024 list
| Status: | |
| Authors: | Sulowska Dominika |
| Disciplines: | |
| Year of publication: | 2026 |
| Document version: | Print | Electronic |
| Language: | English |
| Issue number: | 7 |
| Volume: | 20 |
| Pages: | 54 - 70 |
| Impact Factor: | 1.3 |
| Databases: | BazTech |
| Result of statutory research: | No |
| Conference material: | No |
| OA publication: | Yes |
| License: | |
| Mode of access: | Open journal |
| Text version: | Final published version |
| Time of release: | At the time of publication |
| OA publication date: | 1 June 2026 |
| Abstracts: | English |
| Topological Data Analysis (TDA) enables the analysis of the geometric structure of data using tools from algebraic topology. A central technique in TDA is persistent homology, whose results are represented by persistence diagrams (PDs) describing the lifespan of topological features. Since PDs lack a natural vector-space representation, their direct use in machine learning (ML) classifiers is challenging. Therefore, several vectorisation methods have been proposed, including Persistence Image (PI), Persistence Landscapes (PL), Betti Curves (BC), and Persistence Silhouettes (PS). This study presents a comparative analysis of these vectorisation methods in classification tasks involving both synthetic and real-world datasets, using three classifiers: Logistic Regression (LR), XGBoost (XGB), and Multilayer Perceptron (MLP). Hyperparameter tuning and cross-validation were applied, and model performance was evaluated using accuracy, precision, recall, and F1-score. The results show that PI and PL consistently achieve the highest classification performance across different data types and classifiers. For synthetic datasets, these methods reached scores above 0.98, while for the ECG dataset, they outperformed alternative approaches by up to 30%. In contrast, all methods exhibited limited effectiveness on the MNIST dataset due to high geometric complexity and noise in pixel-based point cloud representations. For the ModelNet10 dataset, PI clearly outperformed other techniques, achieving scores of approximately 0.75. Overall, the results indicate that PI provides a robust and versatile topological representation for classification tasks, while PL stands out for its stability and interpretability in complex data analysis. |
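
To illustrate what vectorising a persistence diagram means, here is a minimal sketch of the simplest method named in the abstract, the Betti Curve. It is not the author's implementation: the function name `betti_curve`, the toy diagram, and the sampling grid are assumptions chosen for illustration. For each threshold on a fixed grid, the curve counts the (birth, death) intervals that are alive at that threshold, which turns a diagram of any size into a fixed-length vector that a classifier can consume.

```python
import numpy as np

def betti_curve(diagram, grid):
    """Sample a Betti curve on `grid`.

    For each threshold t, count the intervals with birth <= t < death.
    `diagram` is a sequence of (birth, death) pairs (hypothetical input format).
    """
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    grid = np.asarray(grid, dtype=float)
    births = diagram[:, 0][:, None]   # shape (n_points, 1)
    deaths = diagram[:, 1][:, None]   # shape (n_points, 1)
    alive = (births <= grid[None, :]) & (grid[None, :] < deaths)
    return alive.sum(axis=0)          # shape (len(grid),)

# Toy diagram with three features; the values are illustrative only.
diagram = [(0.0, 1.0), (0.5, 2.0), (1.5, 1.75)]
grid = np.linspace(0.0, 2.0, 5)       # thresholds 0.0, 0.5, 1.0, 1.5, 2.0
vector = betti_curve(diagram, grid)
print(vector)
```

The resulting vector has the same length for every diagram sampled on the same grid, so the vectors can be stacked into a feature matrix. Persistence Images, Landscapes, and Silhouettes follow the same pattern but weight or smooth the intervals rather than simply counting them.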
