A convolutional neural network-driven model with adaptive feature fusion for Polish national dance music recognition
Journal article
MNiSW
100
2024 list
| Status: | |
| Authors: | Chwaleba Kinga, Wach Weronika |
| Disciplines: | |
| Year of publication: | 2026 |
| Document version: | Printed | Electronic |
| Language: | English |
| Journal issue number: | 1 |
| Volume: | 20 |
| Pages: | 354 - 372 |
| Scopus® Citations: | 1 |
| Databases: | Scopus | BazTech |
| Result of statutory research | NO |
| Conference material: | NO |
| OA publication: | YES |
| License: | |
| Access method: | Open journal |
| Text version: | Final published version |
| Time of publication: | At the time of publication |
| OA publication date: | 21 November 2025 |
| Abstracts: | English |
| Mel spectrograms have been widely applied in music identification, often yielding successful results when combined with well-known pre-trained classification methods such as VGG16, DenseNet121, or ResNet50. However, the acquired performance may still be improved by employing fusion techniques and proposing a dataset consisting of more samples, which generally demonstrate superior results. Thus, a novel approach employing these methods with the formerly pre-trained classifiers has been introduced. The core innovation of our study is feature fusion utilizing Mel spectrograms, spectrograms, scalograms, and Mel-Frequency Cepstral Coefficients plots, created based on audio recordings from the created dataset encompassing Polish national dance music. The adaptive model is suggested as a mechanism adjusting the highly relevant features for Polish national dance music identification. Furthermore, the use of SHapley Additive exPlanations makes it possible to visualize which parts of the input feature maps are crucial to the model fusion decisions. Subsequently, the most prevalent classification metrics were employed including accuracy, precision, recall, and F1-score to compare the obtained results with state-of-the-art. Hence, the present method yields highly satisfactory results, exceeding 94% accuracy. Consequently, this study not only sets a new benchmark for Polish national dance recognition but also underscores the broader potential of multi-representation fusion as a general blueprint for next-generation audio classification systems. |
