Synthetic data generation for Kazakh speech separation and diarization based on the use of neural networks
Book chapter (chapter in a post-conference monograph)
| MNiSW points: | 20 |
| Level: | I |
| Authors: | Oralbekova Dina, Mamyrbayev Orken, Azarova Larysa E., Kurmetkan Turdybek, Gordiichuk Halyna, Zhumazhan Nurdaulet, Sawicki Daniel |
| Document version: | Print | Electronic |
| Language: | English |
| Pages: | 1–8 |
| Scopus® citations: | 0 |
| Indexed in: | Scopus |
| Statutory research output: | NO |
| Conference material: | YES |
| Conference name: | Photonics Applications in Astronomy, Communications, Industry, and High Energy Physics Experiments 2025 |
| Conference short name: | SPIE-IEEE-PSP 2025 |
| Conference series URL: | LINK |
| Conference dates: | 3 July 2025 to 4 July 2025 |
| Conference city: | Lublin |
| Conference country: | POLAND |
| Open Access publication: | NO |
| Abstracts: | English |
| This paper explores the impact of various synthetic data generation methods on the performance of speech separation and diarization models. Three approaches are considered: simple audio track overlay, synthetic dialogue generation, and acoustic condition modeling. To evaluate their effectiveness, we used Conv-TasNet for speech separation and EEND-Conformer for diarization, both trained on a 400-hour Kazakh speech corpus. The experiments demonstrated that synthetic data can significantly enhance model performance when adapting to low-resource languages. The most effective method was synthetic dialogue generation, which yielded results close to those obtained with real data for both speech separation and diarization. In contrast, acoustic condition modeling showed the largest deviations from real-data performance, indicating the need for further refinement. The findings confirm the potential of synthetic data for speech processing tasks: the proposed methods can improve the performance of automatic speech recognition models in scenarios with limited labeled data and challenging acoustic environments. |
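
The abstract gives no implementation details, so the following is only an illustrative sketch of the first method it names, simple audio track overlay, which is commonly realized by summing two single-speaker recordings at a sampled level difference. This is a minimal NumPy sketch under that assumption; the function name `mix_at_snr`, the ±5 dB overlap range, and the 16 kHz placeholder signals are hypothetical choices, not taken from the paper.

```python
import numpy as np

def mix_at_snr(speech_a: np.ndarray, speech_b: np.ndarray, snr_db: float) -> np.ndarray:
    """Overlay two single-speaker signals at a target level ratio (in dB).

    speech_a keeps its original level; speech_b is rescaled so that the
    power ratio power(a) / power(b) equals 10 ** (snr_db / 10).
    """
    # Truncate both signals to a common length before mixing.
    n = min(len(speech_a), len(speech_b))
    a = speech_a[:n].astype(np.float64)
    b = speech_b[:n].astype(np.float64)

    # Average powers; the epsilon guards against silent inputs.
    power_a = np.mean(a ** 2) + 1e-12
    power_b = np.mean(b ** 2) + 1e-12

    # Solve for the gain on b that yields the requested dB ratio.
    gain = np.sqrt(power_a / (power_b * 10.0 ** (snr_db / 10.0)))
    mixture = a + gain * b

    # Peak-normalize only if the sum would clip as audio in [-1, 1].
    peak = np.max(np.abs(mixture))
    return mixture / peak if peak > 1.0 else mixture

# Hypothetical usage: draw a random overlap ratio per synthetic mixture.
rng = np.random.default_rng(seed=0)
utt_a = rng.standard_normal(16000)  # placeholder for a real Kazakh utterance
utt_b = rng.standard_normal(16000)
mixture = mix_at_snr(utt_a, utt_b, snr_db=rng.uniform(-5.0, 5.0))
```

In a pipeline of this kind, the unscaled source signals and their speaker identities would typically be stored alongside each mixture, since separation models such as Conv-TasNet train against the clean sources and diarization models such as EEND-Conformer train against the speaker activity labels.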