Publication Database of the Staff of Politechnika Lubelska (Lublin University of Technology)

Status:
Authors:	Powroźnik Paweł, Skublewska-Paszkowska Maria
Year of publication:	2025
Document version:	Print | Electronic
Language:	English
Journal issue:	4
Volume:	21
Pages:	110-126
Scopus® Citations:	0
Databases:	Scopus | BazTech | Central & Eastern European Academic Source (CEEAS) | CNKI Scholar (China National Knowledge Infrastructure) | DOAJ (Directory of Open Access Journals) | EBSCO | ERIH PLUS | Index Copernicus | J-Gate
Result of statutory research:	NO
Conference material:	NO
OA publication:	YES
License:
Access mode:	Open journal
Text version:	Final published version
Release time:	At the time of publication
OA publication date:	31 December 2025
Abstracts:	English
	Speech emotion recognition has been gaining importance for years, but most existing models rely on a single signal representation or on conventional convolutional layers with a large number of parameters. In this study, we propose a compact multi-representation architecture that combines four images of the speech signal: a spectrogram, MFCC features, a wavelet scalogram, and fuzzy transform maps. Furthermore, we show how Kronecker convolution can be applied for efficient feature extraction with an extended receptive field. Another novelty is cross-fusion, a mechanism that models interactions between branches without significantly increasing complexity. The core of the network is complemented by a transformer-based block and language-independent adversarial learning. The model is evaluated in a four-fold cross-lingual test scenario covering four data corpora in four languages: English, German, Polish, and Danish. It is trained on three languages and tested on the fourth, achieving a weighted accuracy of 96.3%. In addition, the influence of selected activation functions on classification quality is investigated. Ablation analysis shows that removing the Kronecker convolution reduces the efficiency by 5.6%, and removing the fuzzy transform representation reduces it by 4.7%. The results indicate that the combination of Kronecker convolution, multi-channel fusion, and adversarial learning is a promising direction for building universal, language-independent emotion recognition systems.
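
The cross-lingual protocol described in the abstract (train on three languages, test on the fourth) can be illustrated with a minimal sketch. This is not the authors' code: the function name `leave_one_language_out` and the `LANGUAGES` list are hypothetical names chosen for illustration, and the sketch only enumerates the train/test language splits, without any data loading or model training.

```python
from typing import Iterator, List, Tuple

# The four languages named in the abstract; the corpora themselves are not
# identified in this record, so only language labels are used here.
LANGUAGES = ["English", "German", "Polish", "Danish"]


def leave_one_language_out(languages: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training languages, held-out test language), one pair per language."""
    for held_out in languages:
        train = [lang for lang in languages if lang != held_out]
        yield train, held_out


# Enumerate the four folds of the cross-lingual scenario.
folds = list(leave_one_language_out(LANGUAGES))
for train, test in folds:
    print(f"train on {', '.join(train)} -> test on {test}")
```

Each language is held out exactly once, so four languages give four folds, which matches the four-test scenario mentioned in the abstract.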

K4F-Net: Lightweight multi-view speech emotion recognition with Kronecker convolution and cross-language robustness

Journal article