Language model and Acoustic model for Malayalam

Little work on Malayalam ASR (automatic speech recognition) has been reported in the literature, and very few resources are available. We have to do the following:


 * Create a sufficient speech corpus: First, we have to select a suitable, representative text corpus. Movie conversation is not suitable, because speech sounds vary widely as the speaking style changes. It is very difficult to build an ASR system that works with all speaking styles, since acoustic variability is complex to model, and conversational speech in particular varies from person to person. It is advisable to start with read speech. There are two options: either take data from news recordings or make recordings by reading a selected text corpus. The second option is better, since we can compile a representative database in consultation with linguists.


 * Develop an acoustic model: The phones in the database have to be mapped to phonemes, and we have to train the acoustic model to do this mapping automatically during recognition. As far as I know, no extensive phonetic dictionary is readily available for Malayalam. In continuous speech, a phoneme's acoustic properties change depending on neighboring phonemes, context, and position of occurrence. Taking all of this into consideration, an acoustic model can be developed in consultation with linguists and with reference to existing linguistic work. We shall try adaptive acoustic modeling.


 * Develop a language model: Probabilistic modeling of word sequences can be done using the same text corpus.
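
As a rough illustration of the language-modeling step, the sketch below estimates bigram probabilities from a text corpus. It is only a minimal example under stated assumptions: the three transliterated sentences are invented toy data, the function names `train_bigram_model` and `bigram_probability` are hypothetical, and add-one (Laplace) smoothing is chosen here purely for simplicity. A real system would use a much larger corpus, proper Malayalam tokenization, and most likely a stronger smoothing method.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences.

    Each sentence is wrapped in <s> and </s> boundary markers.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_probability(prev, word, unigrams, bigrams):
    """Estimate P(word | prev) with add-one (Laplace) smoothing."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


# Toy transliterated sentences, invented for illustration only.
corpus = [
    "njan pusthakam vaayichu",
    "njan pusthakam vaangi",
    "aval pusthakam vaayichu",
]

unigrams, bigrams = train_bigram_model(corpus)

# A word pair seen in the corpus versus one that never occurs.
p_seen = bigram_probability("pusthakam", "vaayichu", unigrams, bigrams)
p_unseen = bigram_probability("pusthakam", "aval", unigrams, bigrams)
print(p_seen, p_unseen)
```

Smoothing matters here because the available Malayalam text is limited: without it, any word sequence missing from the corpus would receive zero probability and could never be recognized.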

Malayalam Audio Books

 * Vaayichalum Vaayichalum theeratha pusthakam - part 1
 * Vaayichalum Vaayichalum theeratha pusthakam - part 2
 * Blessy
 * Panchathanthram
 * Barcode
 * Higuita
 * Aanandamargam
 * kidappara samaram

Potential Mentors

 * Dr. Deepa P. Gopinath, Lecturer in Electronics and Communication, Department of Electronics Engg., College of Engineering Thiruvananthapuram, Kerala, India. Mobile: +919446583466