Speech synthesis HLT

noun

the artificial generation of spoken human language; more specifically, the computational task of producing speech from non-auditory input, such as text, lip movements, or symbolic linguistic data. The technology lets a computer simulate the human voice and is used to convert information into audible speech in applications including accessibility (e.g., screen readers for the visually impaired), mobile communication systems, and various human-computer interfaces. The field covers tasks such as text-to-speech (TTS) and techniques such as concatenative synthesis and formant synthesis, and it is conceptually complementary to speech recognition.

[Глоссариум по искусственному интеллекту (Glossary of Artificial Intelligence), 2024]

syn.: speech generation

Examples of speech synthesis in a sentence:

We used speech synthesis technology to synthesize the sentence “Have you eaten yet” in Mandarin and Cantonese, respectively, then we manually annotated the sound boundaries and Pinyin on a character level.

[Li et al., Artificial Intelligence Review, 2024]

The rule-based approach, reusing many components from other parts of the GiellaLT infrastructure, also means that high quality speech synthesis is within reach for most language communities.

[Wiechetek et al., 2022]