From SMC Wiki

User:Ar rahul/GSoC2013/

2 bytes added, 05:33, 26 January 2017
Reverted edits by Sperminator (talk) to last revision by Ar rahul
:*ഫ (ph’a) is pronounced differently in ഫലം and ഫാന്‍. Likewise, ന (na) has two pronunciations (dental nasal and alveolar nasal) even though the grapheme is the same, e.g. നനക്കുക (nan’naykkuka). For such cases, phonological rules have been applied manually and the dictionary edited accordingly.
:*In continuous speech, word boundaries are also challenging. For instance, the word "thalasthanam" (തലസ്ഥാനം) can be misconstrued as "thala sthanam" (തല സ്ഥാനം).
:*The articulation of certain phonemes is context dependent. For example, the words ബലം and ജലം are pronounced as ബെലം and ജെലം respectively.
:*The prosody of spoken Malayalam makes it difficult to correctly identify the sound units (phonemes).
To address these language-specific issues in Malayalam speech recognition, we need a working acoustic model and language model. Unfortunately, for Malayalam these are either unavailable or still at a rudimentary stage. Our aim is to develop a working acoustic model and language model first, and then address the language-specific issues one by one, as far as the limited time allows.
# Language data is the key ingredient for research and development in language technology. The data (speech corpora and text corpora) collected for this project will be made publicly available for future work.
# A high-quality acoustic model and language model for Malayalam with a low WER (word error rate) will be developed, which can be used for research and development in Malayalam speech recognition and processing.
# The acoustic and language models can be used directly by programmers and developers to build solutions to the many existing problems that need speech recognition in the local language.
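
The WER mentioned above is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recognizer output, divided by the number of words in the reference. The following is a minimal sketch of that calculation, not the project's evaluation code; the function name <code>word_error_rate</code> and the romanized sample strings are assumptions chosen for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name), assuming whitespace-separated words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# A wrongly split word counts as one substitution plus one insertion.
print(word_error_rate("thalasthanam", "thala sthanam"))
```

Because the denominator is the reference length, a single mis-segmented word such as the "thalasthanam" example can push the WER above 1.0, which is one reason word-boundary errors are costly under this metric.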