Changes

From SMC Wiki

GSoC/2013/Project ideas

77 bytes added, 21:08, 2 April 2013
The SILPA project has a spellchecker written in Python that uses a fairly involved algorithm. Even so, it cannot handle the inflection and agglutination that occur in Indian languages, especially the south Indian languages. The dictionary we have for the Malayalam spellchecker contains about 150,000 words. We could expand the dictionary, but that would add little value, because words in Malayalam, Tamil and similar languages can be formed by joining multiple words. In addition, words are inflected according to grammatical forms (sandhi), number, gender and so on. Hunspell has a mechanism to handle this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a single Malayalam word is formed by joining more than five words. We will need either word-splitting logic or a table that covers all the patterns. The project is to attempt to solve this with Hunspell. If that is not feasible (Hunspell upstream is not active), the alternative is to develop an algorithm and implement it.
* '''[https://savannah.nongnu.org/task/index.php?12558 Savannah Task]'''
* '''Expertise required''': Basic understanding of the grammar system of at least one Indian language
* '''Mentor''': Santhosh Thottingal
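The word-splitting idea described in this project can be sketched roughly as follows. This is only an illustration and not the SILPA or Hunspell implementation: the function name <code>segment</code> and the <code>STEMS</code> and <code>SUFFIXES</code> tables are hypothetical, the entries are romanized placeholders rather than real dictionary data, and the sketch ignores the character changes that sandhi introduces at word boundaries, which a real solution would have to model.

```python
from functools import lru_cache

# Hypothetical toy tables: romanized placeholders, not real dictionary data.
STEMS = frozenset({"mara", "kutti", "veedu"})
SUFFIXES = frozenset({"kal", "ude", "il"})


def segment(word, stems=STEMS, suffixes=SUFFIXES):
    """Split `word` into known stems and suffixes.

    A word is accepted if it is one or more stems, each optionally
    followed by any number of suffixes. Returns a tuple of the pieces,
    or None if no such split exists.
    """

    @lru_cache(maxsize=None)
    def solve(pos, started):
        # `started` is True once at least one stem has been consumed,
        # so a word can never begin with a suffix.
        if pos == len(word):
            return () if started else None
        for end in range(pos + 1, len(word) + 1):
            piece = word[pos:end]
            if piece in stems or (started and piece in suffixes):
                rest = solve(end, True)
                if rest is not None:
                    return (piece,) + rest
        return None

    return solve(0, False)


if __name__ == "__main__":
    for candidate in ("marakalude", "kuttiveedukalil", "kalmara"):
        print(candidate, segment(candidate))
```

Memoizing on the position keeps the search polynomial even when a word is built from many joined parts. A spellchecker built on this idea would treat a word as correct when <code>segment</code> returns a split, and could use the pieces to generate suggestions.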
==Indic rendering support in ConTeXt==