Language model and Acoustic model for Malayalam
From SMC Wiki
Latest revision as of 11:19, 20 March 2014
Little work on Malayalam automatic speech recognition (ASR) has been reported in the literature, and few resources are available. The following needs to be done:
- Create a sufficient speech corpus: First, we have to select a suitable text corpus that is representative of the language. Movie conversation is not suitable, because speech sounds vary widely as the speaking style changes, and conversational speech in particular varies from person to person. It is very difficult to build an ASR system that works with all speaking styles, since acoustic variability is complex to model. It is therefore advisable to start with read speech. There are two options: take data from news recordings, or record speakers reading a selected text corpus. The second option is better, since we can compile a representative database in consultation with linguists.
- Develop an acoustic model: The phones in the database have to be mapped to phonemes, and we have to train the acoustic model to do this mapping automatically during recognition. As far as I know, no extensive phonetic dictionary is readily available for Malayalam. In continuous speech, a phoneme's acoustic properties change depending on neighbouring phonemes, context, and position of occurrence. Taking all of this into consideration, an acoustic model can be developed in consultation with linguists and with reference to existing linguistic work. We will try adaptive acoustic modeling.
- Develop a language model: The probability of word sequences can be modeled using the same text corpus.
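As an illustration of the language modeling step, the sketch below estimates bigram probabilities from a tokenized text corpus. It is only a minimal example under stated assumptions: the page does not say which model order or smoothing method would be used, so the bigram order and add-one (Laplace) smoothing are choices made here for illustration, and the function names and the toy corpus (single letters standing in for words) are hypothetical.

```python
from collections import Counter
import math

BOS, EOS = "<s>", "</s>"  # sentence boundary markers (assumed convention)

def train_bigram(sentences):
    """Count histories and bigrams over whitespace-tokenized sentences."""
    history_counts, bigram_counts = Counter(), Counter()
    vocab = set()  # every token that can follow a history (words and EOS)
    for sentence in sentences:
        tokens = [BOS] + sentence.split() + [EOS]
        vocab.update(tokens[1:])
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return history_counts, bigram_counts, vocab

def bigram_prob(prev, word, history_counts, bigram_counts, vocab):
    """P(word | prev) with add-one (Laplace) smoothing."""
    return (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + len(vocab))

def sentence_logprob(sentence, history_counts, bigram_counts, vocab):
    """Natural-log probability of a sentence under the bigram model."""
    tokens = [BOS] + sentence.split() + [EOS]
    return sum(
        math.log(bigram_prob(p, w, history_counts, bigram_counts, vocab))
        for p, w in zip(tokens[:-1], tokens[1:])
    )

# Toy corpus: each letter is a placeholder for a word, not real data.
toy_corpus = ["a b c", "a b d", "a c"]
model = train_bigram(toy_corpus)
p_b_given_a = bigram_prob("a", "b", *model)
```

With a real corpus, the sentences would come from the same text that is read aloud for the speech recordings, so that the language model and the acoustic training data cover the same vocabulary.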
Potential Resources
Malayalam Audio Books
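- Vaayichalum Vaayichalum theeratha pusthakam, part 1 (https://soundcloud.com/kaaarvarnam/vaayichalum-vaayichalum-1)
- Vaayichalum Vaayichalum theeratha pusthakam, part 2 (https://soundcloud.com/kaaarvarnam/vaayichalum-vaayichalum-2)
- Blessy (https://soundcloud.com/kaaarvarnam/blessy)
- Panchathanthram (https://soundcloud.com/kaaarvarnam/panchatantra1)
- Barcode (https://soundcloud.com/kaaarvarnam/barcode-susmeshchandroth)
- Higuita (https://soundcloud.com/kaaarvarnam/higuita)
- Aanandamargam (https://soundcloud.com/kaaarvarnam/aanandamargam)
- Kidappara Samaram (https://soundcloud.com/kaaarvarnam/kidapparasamaram-p-v-shaji)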
Reading Materials
Potential Mentors
- Dr. Deepa P.Gopinath
Lecturer in Electronics and Communication
Department of Electronics Engg.
College of Engineering Thiruvananthapuram
Kerala, India
Mobile- +919446583466