User:Ar rahul/GSoC2013/

From SMC Wiki
Revision as of 22:19, 29 April 2013 by Ar rahul (talk | contribs)

Developing Acoustic and Language Model for Malayalam Recognition

Personal Information

  1. Email Address :
  2. Telephone : +919446048820
  3. University and Education : BTech in Computer Science, College of Engineering Trivandrum (University of Kerala)

Why do you want to work with the Swathanthra Malayalam Computing ?

I think most of the technological advancements in the field of computer science are inaccessible to the majority of the general public because of the lack of local-language support. SMC, with its slogan "എന്റെ കമ്പ്യൂട്ടറിനു് എന്റെ ഭാഷ" (my language for my computer), has always been at the forefront of addressing this. Malayalam being my mother tongue, I believe I can contribute to the SMC community.

Do you have any past involvement with the Swathanthra Malayalam Computing or another open source project as a contributor?

I have participated in a localisation camp organised by SMC. Other than that, I have also actively participated in developing a GNU/Linux distribution, based on Debian, aimed at students of technical courses.

Did you participate with the past GSoC programs, if so which years, which organizations?

No. I am applying to GSoC for the first time.

Do you have other obligations between May and August ?

No. I am confident that I can finish this project in time. I can devote 40 hours a week to this project.

Will you continue contributing/ supporting the Swathanthra Malayalam Computing after the GSoC 2013 program, if yes, which area(s), you are interested in?

Yes. Speech recognition and artificial intelligence are my areas of interest. The scope of this project goes well beyond a single SoC. My dream is to improve the speech recognition engine currently available for Malayalam to a level better than, or at least on par with, what is available for English.

Why should we choose you over other applicants?

For the past four months I have been working on a project that involved building a closed-vocabulary acoustic model for Malayalam. I have good experience working with the Sphinx engine, which is the speech recognition system that I am going to use for creating the acoustic model and language model. I have experience using sphinxtrain and cmuclmtk, which are used to train the acoustic model and the language model respectively. I also have experience writing Python scripts to automate the creation of database description files such as the dictionary and the transcription. With this experience, I am confident of creating an acoustic model and a language model for Malayalam with an acceptable WER (word error rate) in time.

Proposal Description


The project aims at building an acoustic model and a language model for the Malayalam language, which will be very useful for research and development in Malayalam speech recognition and processing.

Project Proposal

CMU Sphinx is an open source toolkit for speech recognition developed by Carnegie Mellon University. It contains a series of speech recognizers, of which the latest is Sphinx4, an acoustic model trainer (sphinxtrain) and a statistical language model builder (cmuclmtk). To develop a continuous speech recognition system we need a well-trained acoustic model and language model. An acoustic model is created by processing audio recordings together with their transcriptions to form statistical representations of the sounds that make up each word. A language model describes the likelihood, probability, or penalty taken when a sequence or collection of words is seen.
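To make the idea of a language model concrete, here is a minimal sketch of a bigram model with add-one smoothing. It is only an illustration under stated assumptions: the toy English corpus and the function names `train_bigram_model` and `sentence_log_prob` are hypothetical, and cmuclmtk's own estimation and discounting are more elaborate than this.

```python
from collections import Counter
import math


def train_bigram_model(sentences):
    """Count unigrams and bigrams over whitespace-tokenised sentences."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        # Sentence boundary markers, as commonly used in n-gram models.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def sentence_log_prob(sentence, unigrams, bigrams):
    """Log-probability of a sentence using add-one smoothed bigram estimates."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        numerator = bigrams[(prev, word)] + 1
        denominator = unigrams[prev] + vocab_size
        log_prob += math.log(numerator / denominator)
    return log_prob


# Toy corpus (an assumption for illustration, not project data).
corpus = ["the cat sat", "the cat ran", "a dog sat"]
unigrams, bigrams = train_bigram_model(corpus)
seen = sentence_log_prob("the cat sat", unigrams, bigrams)
unseen = sentence_log_prob("sat cat the", unigrams, bigrams)
```

A word order that occurs in the corpus scores higher than a scrambled one, which is the kind of preference a recognizer relies on when choosing between acoustically similar hypotheses.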

The CMUSphinx project comes with several high-quality acoustic models and language models for languages such as English, French and Spanish.

The aim of this project as a whole is to develop a high-quality acoustic model and language model for Malayalam.

The initial goal of the project is to create the required database, which involves:

  • Collecting voice data and preparing transcriptions for the acoustic model
  • Collecting text corpora for the language model
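As a rough sketch of how the transcription side of this database could be prepared, the snippet below builds a file-id list and transcription lines in the `<s> ... </s> (utterance_id)` layout used by sphinxtrain. The directory layout, the file names and the helper `build_training_lists` are assumptions made for illustration only.

```python
from pathlib import PurePosixPath


def build_training_lists(recordings):
    """Build fileids and transcription lines.

    recordings: list of (relative wav path, transcript text) pairs.
    """
    fileids = []
    transcriptions = []
    for wav_path, text in recordings:
        path = PurePosixPath(wav_path)
        # File ids are the recording paths without the extension.
        fileids.append(str(path.with_suffix("")))
        # Collapse repeated whitespace in the transcript.
        words = " ".join(text.split())
        transcriptions.append(f"<s> {words} </s> ({path.stem})")
    return fileids, transcriptions


# Hypothetical recordings; real data would be Malayalam utterances.
recordings = [
    ("speaker_1/utt_001.wav", "first  example sentence"),
    ("speaker_2/utt_002.wav", "second example sentence"),
]
fileids, transcriptions = build_training_lists(recordings)
```

Writing each list to disk, one entry per line, would give the two description files; the exact file names expected by a given sphinxtrain setup should be checked against its configuration.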

Once the database is ready, we can start training the acoustic model using sphinxtrain and building the language model using cmuclmtk. Although no optimisation will have been applied at this stage of the project, we will have a working acoustic model and language model.

Optimisations, such as careful selection of voice and text data that better represent the language, can be performed at this stage so that the quality of the resulting acoustic model and language model is improved.

  • Grapheme-to-phoneme converters and an optimal text selection algorithm can be used to select a set of phonetically rich sentences from a large text corpus.
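One possible form of such a text selection step is a greedy coverage heuristic: repeatedly pick the sentence that adds the most phonetic units not yet covered. The sketch below is an assumption, not the algorithm fixed by this proposal. `toy_g2p` is a placeholder that treats each letter as one unit, and a real Malayalam grapheme-to-phoneme converter would be passed in its place.

```python
def toy_g2p(sentence):
    """Placeholder grapheme-to-phoneme step: each letter counts as one unit."""
    return {ch for ch in sentence.lower() if ch.isalpha()}


def select_rich_sentences(sentences, max_sentences, g2p=toy_g2p):
    """Greedily select sentences that add the most uncovered units."""
    covered = set()
    selected = []
    remaining = list(sentences)
    while remaining and len(selected) < max_sentences:
        # Sentence contributing the largest number of new units.
        best = max(remaining, key=lambda s: len(g2p(s) - covered))
        gain = g2p(best) - covered
        if not gain:
            break  # Nothing new can be covered; stop early.
        selected.append(best)
        covered |= gain
        remaining.remove(best)
    return selected, covered


# Toy corpus (an assumption for illustration).
corpus = ["aaa bbb", "abc def", "ghi", "abc"]
selected, covered = select_rich_sentences(corpus, max_sentences=2)
```

The greedy choice does not guarantee the smallest possible sentence set, but it is simple and keeps the recording script short while still covering many distinct units.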


Technical Skills

  1. Languages : C, C++, Python, Java, Bash
  2. Software Packages : GDB, Emacs, Eclipse
  3. Embedded Platforms : Arduino , Atmel AVR , SiliconLabs CIP-

Free Software

  • Appropriate speaker selection and using data statistics can greatly improve the quality of collected acoustic data.