From SMC Wiki

GSoC/2013/Project ideas

1,054 bytes added, 06:40, 31 March 2013
== A spell checker for Indic languages that understands inflections ==
The SILPA project has a spellchecker written in Python that uses a non-trivial algorithm. Even so, it cannot handle the inflection and agglutination found in Indian languages, especially South Indian languages. The dictionary for the Malayalam spellchecker has 150,000 words. We could expand the dictionary, but that adds little value, because in Malayalam, Tamil and similar languages new words are formed by joining multiple words. In addition, words are inflected according to grammatical form (sandhi), number, gender and so on. Hunspell has a mechanism to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. A single Malayalam word can sometimes be formed by joining more than five words. We will need word-splitting logic or a table that covers all the patterns. The project is to attempt to solve this inside Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.

'''Expertise required''': Basic understanding of the grammar system of at least one Indian language

'''Mentor''': Santhosh Thottingal
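
To make the idea of multi-level suffix stripping combined with word splitting more concrete, here is a minimal Python sketch. It is not the SILPA or Hunspell algorithm. The function name <code>analyse</code>, the rule format (surface suffix plus the text to restore on the stem) and the tiny romanized word list are all assumptions made for illustration, and the rules are not a vetted description of Malayalam sandhi.

```python
# Toy, romanized data for illustration only (hypothetical, not a real lexicon).
DICTIONARY = {"maram", "kili", "koodu"}

# Hypothetical suffix rules: (surface suffix, text restored on the stem).
# The second field stands in for sandhi changes at the join.
SUFFIX_RULES = [("ngal", "m"), ("il", ""), ("athil", "am")]


def analyse(word, dictionary, suffix_rules, max_depth=6):
    """Return a list of segments explaining `word`, or None if none is found.

    Tries, recursively and up to `max_depth` levels:
      1. stripping one suffix and analysing the restored stem,
      2. splitting off a leading dictionary word and analysing the rest.
    """
    if word in dictionary:
        return [word]
    if max_depth == 0:
        return None

    # 1. Suffix stripping, one level at a time.
    for suffix, restore in suffix_rules:
        if word.endswith(suffix) and len(word) > len(suffix):
            stem = word[: -len(suffix)] + restore
            parts = analyse(stem, dictionary, suffix_rules, max_depth - 1)
            if parts is not None:
                return parts + ["-" + suffix]

    # 2. Compound splitting: dictionary word followed by an analysable rest.
    for i in range(1, len(word)):
        head, rest = word[:i], word[i:]
        if head in dictionary:
            parts = analyse(rest, dictionary, suffix_rules, max_depth - 1)
            if parts is not None:
                return [head] + parts

    return None


if __name__ == "__main__":
    for w in ["marangalil", "marathil", "kilikoodu", "xyz"]:
        print(w, "->", analyse(w, DICTIONARY, SUFFIX_RULES))
```

A word is accepted when <code>analyse</code> returns a segmentation and flagged as a possible misspelling when it returns <code>None</code>. The plain recursion is exponential in the worst case, so a real implementation would probably need memoisation or a table-driven approach, and real sandhi rules also change characters at the join between compound parts, which this sketch does not model.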
== Improving the webfonts module in Silpa using jquery.webfonts and providing more Indic and complex fonts as part of it. ==