Open Access

IMPROVING SPEECH NATURALNESS IN UZBEK TEXT-TO-SPEECH USING DEEP LEARNING-BASED PROSODY MODELING

Yuldasheva Umida Husniddin qizi, Samarkand Branch of Tashkent University of Information Technologies

Abstract

Speech naturalness is one of the most critical challenges in text-to-speech (TTS) systems, especially for low-resource languages such as Uzbek. Although recent advances in deep learning have significantly improved the intelligibility of synthesized speech, achieving natural prosody (appropriate intonation, rhythm, stress, and timing) remains a complex problem. This study addresses speech naturalness in Uzbek TTS systems through deep learning-based prosody modeling. The paper analyzes existing approaches to prosody modeling, discusses the linguistic characteristics of Uzbek that shape its prosodic patterns, and proposes integrating neural network-based methods to capture expressive, natural speech features. The findings highlight the potential of deep learning architectures to improve the quality and naturalness of Uzbek speech synthesis and to contribute to more human-like TTS systems.
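
To make "timing" and "intonation" concrete, here is a minimal sketch of one common way neural TTS models handle them. It is not the author's method. It assumes a FastSpeech-style design, in which a model predicts a duration (in acoustic frames) for each phoneme and then a "length regulator" repeats each phoneme-level encoding that many times. Pitch targets are often summarized per phoneme from frame-level F0. The function names `length_regulate` and `phoneme_mean_f0` are hypothetical, and the code is framework-free for illustration only.

```python
from typing import List, Sequence

# Hypothetical illustration of FastSpeech-style prosody handling;
# not taken from the paper being summarized.


def length_regulate(phoneme_features: Sequence[List[float]],
                    durations: Sequence[int]) -> List[List[float]]:
    """Repeat each phoneme-level feature vector durations[i] times.

    The predicted durations (in acoustic frames) control rhythm and timing:
    a longer duration stretches the phoneme over more output frames.
    """
    if len(phoneme_features) != len(durations):
        raise ValueError("one duration per phoneme is required")
    frames: List[List[float]] = []
    for vec, d in zip(phoneme_features, durations):
        if d < 0:
            raise ValueError("durations must be non-negative")
        # Copy the vector so frames do not share the same list object.
        frames.extend([list(vec) for _ in range(d)])
    return frames


def phoneme_mean_f0(frame_f0: Sequence[float],
                    durations: Sequence[int]) -> List[float]:
    """Average frame-level F0 (Hz) over each phoneme's frames.

    Unvoiced frames are assumed to be marked as 0.0 and are ignored;
    a phoneme with no voiced frames (or zero duration) gets 0.0.
    """
    if sum(durations) != len(frame_f0):
        raise ValueError("durations must sum to the number of frames")
    means: List[float] = []
    start = 0
    for d in durations:
        voiced = [f for f in frame_f0[start:start + d] if f > 0.0]
        means.append(sum(voiced) / len(voiced) if voiced else 0.0)
        start += d
    return means


if __name__ == "__main__":
    feats = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    durs = [2, 0, 3]
    print(len(length_regulate(feats, durs)))
    print(phoneme_mean_f0([120.0, 125.0, 0.0, 200.0, 210.0], durs))
```

With durations `[2, 0, 3]`, the three phoneme vectors expand to five frames (the middle phoneme is skipped), and the per-phoneme mean F0 is `[122.5, 0.0, 205.0]`. In a trained system, the durations and pitch values would come from neural predictors, and the targets would typically be derived from forced alignment and pitch extraction on recorded speech.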

Keywords

Text-to-speech, Uzbek language, speech naturalness, prosody modeling, deep learning, neural networks, speech synthesis.

How to Cite

IMPROVING SPEECH NATURALNESS IN UZBEK TEXT-TO-SPEECH USING DEEP LEARNING-BASED PROSODY MODELING. (2026). International Journal of Artificial Intelligence, 6(02), 465-468. https://www.academicpublishers.org/journals/index.php/ijai/article/view/10758