The SILPA project has a spellchecker written in Python that uses a fairly sophisticated algorithm. Even so, it cannot handle the inflection and agglutination found in Indian languages, especially South Indian languages. The dictionary for the Malayalam spellchecker has 150,000 words. We could expand it, but that adds little value, because in Malayalam, Tamil and similar languages new words are formed by joining multiple words. In addition, words are inflected according to grammatical form (sandhi), number, gender and so on. Hunspell has a mechanism for this, but so far nobody has succeeded in getting it to work for the multi-level suffix stripping that Malayalam requires. Sometimes a Malayalam word is formed by joining more than 5 words. We will need either word-splitting logic or a table that covers all the patterns. The project is to attempt to solve this inside Hunspell. If that is not feasible (Hunspell upstream is not active), develop an algorithm and implement it.
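As a rough illustration of what "multi-level suffix stripping" combined with word splitting could look like outside Hunspell, here is a minimal Python sketch. Everything in it is an assumption made for the example: the function name `is_valid`, the tiny `DICTIONARY`, and the `SUFFIX_RULES` table are hypothetical, and the romanized strings are toy data, not a verified description of Malayalam morphology. It is not the algorithm used by SILPA or Hunspell.

```python
from functools import lru_cache

# Toy romanized data, purely illustrative (hypothetical, not real lexicon data).
DICTIONARY = {"maram", "kili", "veedu"}

# Each rule is (surface suffix, text restored to the stem after stripping).
# The "restore" part stands in for sandhi-like changes at the joint.
SUFFIX_RULES = [
    ("ngal", "m"),  # toy plural:   maram -> marangal
    ("il", ""),     # toy locative: marangal -> marangalil
    ("ude", ""),    # toy genitive
]


def is_valid(word, max_depth=6):
    """Return True if `word` reduces to dictionary words by repeatedly
    stripping suffixes and/or splitting off dictionary words at the front.
    `max_depth` bounds the number of stripping/splitting steps."""

    @lru_cache(maxsize=None)
    def check(w, depth):
        if w in DICTIONARY:
            return True
        if depth == 0 or not w:
            return False
        # Level-by-level suffix stripping: remove one suffix, then recurse.
        for suffix, restore in SUFFIX_RULES:
            if w.endswith(suffix) and len(w) > len(suffix):
                if check(w[: -len(suffix)] + restore, depth - 1):
                    return True
        # Compound splitting: a dictionary word followed by a valid remainder.
        for i in range(1, len(w)):
            if w[:i] in DICTIONARY and check(w[i:], depth - 1):
                return True
        return False

    return check(word, max_depth)


if __name__ == "__main__":
    for candidate in ["maram", "marangalil", "kilimarangalil", "maraxyz"]:
        print(candidate, is_valid(candidate))
```

In this sketch the suffix table and the splitter call each other recursively, so a form such as the toy `kilimarangalil` is accepted after two suffix strips and one split. A real solution would need rules derived from the actual grammar, and would probably need to restrict which suffixes may follow which stems to avoid accepting invalid words.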
'''Expertise required''': Basic understanding of the grammar system of at least one Indian language
'''Mentor''': Santhosh Thottingal