‘Code-switching in Indic Speech synthesisers’
Date: 20th Mar 2020
Time: 08:30 PM
Venue: A M Turing Hall (BSB 361)
Details
Abstract:
Human beings are no longer confined to a state, a country or a continent. People with different linguistic backgrounds interact with each other and have thus become multilingual. It is common to find people code-switching between the languages in which they are proficient.
In a country like India, which has a diverse linguistic culture, code-switching is very common. Most urban Indians are simultaneously exposed to at least two languages (English and their mother tongue). Migration from the native region leads to exposure to three languages: the language of the region, the mother tongue and English. This results in the borrowing of words from other languages into one’s vocabulary, leading to subconscious code-mixing and code-switching. Many everyday words in native languages have been replaced by their English counterparts (such as fan, switch, light and train).
Conversational speech may include sentences that each mix words from different languages, or a mixture of sentences from different languages. Such a scenario calls for the inclusion of code-switching ability in current text-to-speech (TTS) systems to produce natural-sounding speech. These systems should be capable of synthesising code-switched sentences that preserve the original pronunciation of the words while maintaining the accent of the speaker.
In multilingual text, the main challenge lies in accurately capturing the phonotactic variations at the code-switching points. Indian languages are, in general, low-resource in digital form. Further, the available data is mostly monolingual, and code-switched data is very limited. This research aims at building code-switchable TTS systems that, using monolingual data, can mimic the innate human ability to transition from language to language within an utterance.
In the first part of the work, bilingual HTS-STRAIGHT systems (HTS: HMM-based speech synthesis system; STRAIGHT: Speech Transformation and Representation using Adaptive Interpolation weiGHTed spectrum) are trained after the data is segmented using signal processing cues in an HMM-DNN framework. Experiments are conducted on three language pairs (Hindi+English, Tamil+English and Hindi+Tamil). Degradation mean opinion scores (DMOS) for monolingual sentences show marginal degradation relative to those of an equivalent monolingual TTS system, while the DMOS for bilingual sentences is significantly better than that of the corresponding monolingual TTS systems.
The second part of the work focuses on building code-switchable systems in the end-to-end framework. A phone-based approach that leverages the similarities among the pooled languages is proposed. Lack of data makes system building cumbersome in the end-to-end framework, even for monolingual systems. Two phone-based approaches are explored for building monolingual TTS systems, and the better of the two is used to build a bilingual system for Hindi+English. DMOS results show that this bilingual system synthesises bilingual text reasonably well.
Speakers
ANJU LEELA THOMAS, CS17S032
Computer Science and Engineering