User:Jaseem/spellcheck

= Malayalam Spell-checker =

== Problem ==
Spell-checkers for languages like English "rely on complete lists of full word forms, a requirement that cannot be met for morphologically complex languages" such as Malayalam. In theory, Malayalam allows an unlimited number of words to be agglutinated into a single word; in practice, usually fewer than 10 are joined. Handling agglutination and inflection in a spell-checker is therefore challenging.

Refer to http://thottingal.in/documents/MalayalamComputingChallenges.pdf
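As a rough illustration of why a full word list does not scale, assume (hypothetically) a lexicon of R roots and S suffixes, where any suffix may follow any other and a word carries at most k suffixes. Ignoring sandhi and ordering constraints, the number of full forms such a list would have to store is bounded by:

```latex
N \;\le\; R \sum_{i=0}^{k} S^{i} \;=\; R\,\frac{S^{k+1}-1}{S-1}, \qquad S > 1
```

With illustrative figures R = 10,000, S = 20 and k = 3, the bound is 10,000 × 8,421 = 84,210,000 forms, whereas a root-plus-suffix lexicon stores only 10,020 entries.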

== Other challenges ==

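 * Similar-sounding (near-homophonic) root words can have different inflections:
 ** മറക്കുക (to forget) & മറയുക (to hide, to disappear); പറയുക (to say) & പറക്കുക (to fly)
 * The same word can inflect differently in the same context (not common):
 ** പോവുക, പോകുക (both "to go")
 * Sandhi rules are complex.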
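The first challenge can be sketched by giving every root in the dictionary its own inflection class, so that past-tense forms are generated from the class rather than guessed from the shared stem (മറ or പറ). The two classes and the four-verb lexicon below are hypothetical and cover only the example verbs; real Malayalam needs more classes (for instance വായിക്കുക, "to read", gives വായിച്ചു rather than a -ന്നു form), which is exactly why the class has to be stored per root.

```python
from __future__ import annotations

# Hypothetical inflection classes covering only the example verbs above.
# Each class maps an infinitive ending to the past-tense ending replacing it.
PAST_TENSE_RULES = {
    "kk": ("ക്കുക", "ന്നു"),  # മറക്കുക -> മറന്നു, പറക്കുക -> പറന്നു
    "y": ("യുക", "ഞ്ഞു"),    # മറയുക -> മറഞ്ഞു, പറയുക -> പറഞ്ഞു
}

# Toy lexicon: each root records its own class instead of it being inferred.
LEXICON = {
    "മറക്കുക": "kk",  # to forget
    "മറയുക": "y",     # to hide, to disappear
    "പറക്കുക": "kk",  # to fly
    "പറയുക": "y",     # to say
}


def past_tense(root: str) -> str:
    """Generate the past-tense form of a root from its lexicon class."""
    ending, replacement = PAST_TENSE_RULES[LEXICON[root]]
    if not root.endswith(ending):
        raise ValueError(f"{root!r} does not end with {ending!r}")
    return root[: -len(ending)] + replacement


def accepted_past_forms() -> set[str]:
    """All past-tense forms a checker built from LEXICON would accept."""
    return {past_tense(root) for root in LEXICON}
```

For example, `past_tense("മറക്കുക")` gives മറന്നു while `past_tense("മറയുക")` gives മറഞ്ഞു, even though both roots begin with മറ.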

== Hunspell ==
Hunspell supports affix rules and compounding, which may be usable for handling agglutination. Still need to figure out how to apply it to Malayalam.
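At the core of Hunspell's affix handling is a suffix rule line of the form `SFX flag strip add condition`: a dictionary word carrying `flag` whose ending matches `condition` can have `strip` removed and `add` appended (`0` means empty). The Python sketch below imitates only that single-level check; it is not Hunspell's code, and the flag `L` and the rule for the locative മരം → മരത്തിൽ ("tree" → "in the tree") are assumptions for illustration. The strip/add pair conveniently encodes this simple sandhi change at the morpheme boundary.

```python
from __future__ import annotations

# Hypothetical rule in Hunspell .aff syntax. A real .aff file would also need
# "SET UTF-8" and a header line such as "SFX L Y 1" before this rule.
AFF_RULES = """
SFX L ം ത്തിൽ ം
"""
# Hypothetical .dic content: word -> set of flags (from a "മരം/L" entry).
DIC_WORDS = {"മരം": {"L"}}


def parse_sfx(text: str) -> list[tuple[str, str, str, str]]:
    """Parse 'SFX flag strip add condition' rule lines; '0' means empty."""
    rules = []
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) != 5 or parts[0] != "SFX":
            continue  # skip headers and anything that is not a rule line
        _, flag, strip, add, cond = parts
        rules.append((flag,
                      "" if strip == "0" else strip,
                      "" if add == "0" else add,
                      cond))
    return rules


def check(word: str, rules, dic) -> bool:
    """Accept a word if it is in the dictionary or a suffixed form of one."""
    if word in dic:
        return True
    for flag, strip, add, cond in rules:
        if not word.endswith(add):
            continue
        # Undo the rule: drop the added suffix and restore the stripped text.
        base = word[: len(word) - len(add)] + strip
        # Only literal conditions and "." (any ending) are handled here.
        matches = cond == "." or base.endswith(cond)
        if base in dic and flag in dic[base] and matches:
            return True
    return False
```

Usage: `check("മരത്തിൽ", parse_sfx(AFF_RULES), DIC_WORDS)` accepts the inflected form because undoing the rule yields the dictionary word മരം. Hunspell itself also offers compounding and twofold suffix stripping, which this sketch leaves out.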

== Implementation in other languages ==
Spell Checking an Agglutinative Language: Quechua (http://www.zora.uzh.ch/52921/1/ltc-106-rios.pdf). Quechua does not seem to have the complexity of Malayalam sandhi, and the automaton presented in the paper does not seem to work for Malayalam.
 * kachichasqa = kachi + cha + sqa
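The decomposition above suggests checking a word by segmenting it into a known root followed by known suffixes, instead of looking it up whole. This is a plain recursive sketch, not the finite-state automaton from the paper; the two tiny lexicons and the cap of 10 suffixes (loosely borrowed from the "generally less than 10" estimate for Malayalam) are assumptions, and Quechua suffix-ordering constraints are ignored. It also joins morphemes by simple concatenation, which is exactly what Malayalam sandhi breaks.

```python
from __future__ import annotations

from functools import lru_cache

# Toy lexicons built only from the example above (hypothetical data).
ROOTS = {"kachi"}
SUFFIXES = {"cha", "sqa"}
MAX_SUFFIXES = 10  # assumption: cap on agglutinated suffixes per word


def segment(word: str) -> list[str] | None:
    """Split a word into root + suffixes, or return None if impossible."""
    for i in range(1, len(word) + 1):
        if word[:i] in ROOTS:
            rest = _suffixes(word[i:], MAX_SUFFIXES)
            if rest is not None:
                return [word[:i], *rest]
    return None


@lru_cache(maxsize=None)
def _suffixes(tail: str, budget: int) -> tuple[str, ...] | None:
    """Cover `tail` completely with suffixes, using at most `budget` of them."""
    if tail == "":
        return ()
    if budget == 0:
        return None
    for j in range(1, len(tail) + 1):
        if tail[:j] in SUFFIXES:
            rest = _suffixes(tail[j:], budget - 1)
            if rest is not None:
                return (tail[:j], *rest)
    return None
```

Here `segment("kachichasqa")` returns `["kachi", "cha", "sqa"]`, and a word that cannot be fully covered, such as `"kachichax"`, returns `None`, which a checker would flag as a possible misspelling.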

http://www.cmpe.boun.edu.tr/~akin/papers/spelling_checking_in_Turkish.pdf

http://arxiv.org/pdf/cmp-lg/9410004.pdf

http://www.ldcil.org/up/conferences/morph/presentations/Vijay%20[Compatibility%20Mode].pdf

http://www.cse.iitb.ac.in/~pb/papers/cicling12-stemming.pdf
 * Stemmer: for finding root words
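A stemmer maps inflected forms back toward a root. The sketch below is a naive longest-match suffix stripper, not the method of the linked paper; the three-entry suffix list (taken from forms used elsewhere on this page) and the minimum stem length are hypothetical. It also exposes a limitation relevant to the homophone problem: പറന്നു (flew) and പറഞ്ഞു (said) both reduce to the same stem പറ, and മരത്തിൽ reduces to മര rather than the dictionary form മരം, so stemming alone cannot recover the correct root.

```python
# Hypothetical suffix list; the forms come from examples on this page.
SUFFIXES = ["ത്തിൽ", "ന്നു", "ഞ്ഞു"]
MIN_STEM = 2  # assumption: never leave a stem shorter than 2 code points


def stem(word: str) -> str:
    """Strip the longest matching suffix, keeping at least MIN_STEM chars."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word  # no known suffix: treat the word as its own stem
```

Trying suffixes longest first keeps a short suffix from matching inside a longer one; lengths here are Unicode code points, not visible letters.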