Synthetic data generation for Kazakh speech separation and diarization based on the use of neural networks
Book chapter (chapter in a post-conference monograph)
| MNiSW points: | 20 |
| Level: | I |
| Authors: | Oralbekova Dina, Mamyrbayev Orken, Azarova Larysa E., Kurmetkan Turdybek, Gordiichuk Halyna, Zhumazhan Nurdaulet, Sawicki Daniel |
| Document version: | Print | Electronic |
| Language: | English |
| Pages: | 1–8 |
| Scopus® citations: | 0 |
| Indexed in: | Scopus |
| Statutory research output: | NO |
| Conference material: | YES |
| Conference name: | Photonics Applications in Astronomy, Communications, Industry, and High Energy Physics Experiments 2025 |
| Conference short name: | SPIE-IEEE-PSP 2025 |
| Conference series URL: | LINK |
| Conference dates: | 3 July 2025 to 4 July 2025 |
| Conference city: | Lublin |
| Conference country: | POLAND |
| Open Access publication: | NO |
| Abstracts: | English |
| This paper explores the impact of various synthetic data generation methods on the performance of speech separation and diarization models. Three approaches are considered: simple audio track overlay, synthetic dialogue generation, and acoustic condition modeling. To evaluate their effectiveness, we used Conv-TasNet for speech separation and EEND-Conformer for diarization, both trained on a 400-hour Kazakh speech corpus. The experiments demonstrated that synthetic data can significantly enhance model performance when adapting to low-resource languages. The most effective method was synthetic dialogue generation, which yielded results close to those obtained with real data for both speech separation and diarization. In contrast, acoustic condition modeling showed the largest deviations from real-data performance, indicating the need for further refinement. The findings confirm the potential of synthetic data for speech processing tasks: the proposed methods can improve the performance of automatic speech recognition models in scenarios with limited labeled data and challenging acoustic environments. |
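
The abstract gives no implementation details, so the following is only an illustrative sketch of the first method it names, simple audio track overlay, which is commonly realized by summing two single-speaker recordings at a sampled level difference. This is a minimal NumPy sketch under that assumption; the function name `mix_at_snr`, the ±5 dB overlap range, and the 16 kHz placeholder signals are hypothetical choices, not taken from the paper.

```python
import numpy as np

def mix_at_snr(speech_a: np.ndarray, speech_b: np.ndarray, snr_db: float) -> np.ndarray:
    """Overlay two single-speaker signals at a target level ratio (in dB).

    speech_a keeps its original level; speech_b is rescaled so that the
    power ratio power(a) / power(b) equals 10 ** (snr_db / 10).
    """
    # Truncate both signals to a common length before mixing.
    n = min(len(speech_a), len(speech_b))
    a = speech_a[:n].astype(np.float64)
    b = speech_b[:n].astype(np.float64)

    # Average powers; the epsilon guards against silent inputs.
    power_a = np.mean(a ** 2) + 1e-12
    power_b = np.mean(b ** 2) + 1e-12

    # Solve for the gain on b that yields the requested dB ratio.
    gain = np.sqrt(power_a / (power_b * 10.0 ** (snr_db / 10.0)))
    mixture = a + gain * b

    # Peak-normalize only if the sum would clip as audio in [-1, 1].
    peak = np.max(np.abs(mixture))
    return mixture / peak if peak > 1.0 else mixture

# Hypothetical usage: draw a random overlap ratio per synthetic mixture.
rng = np.random.default_rng(seed=0)
utt_a = rng.standard_normal(16000)  # placeholder for a real Kazakh utterance
utt_b = rng.standard_normal(16000)
mixture = mix_at_snr(utt_a, utt_b, snr_db=rng.uniform(-5.0, 5.0))
```

In a pipeline of this kind, the unscaled source signals and their speaker identities would typically be stored alongside each mixture, since separation models such as Conv-TasNet train against the clean sources and diarization models such as EEND-Conformer train against the speaker activity labels.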