User:Koulanurag

From SMC Wiki

Project Title: Improve the learning system of Varnam

Personal Information:

Email: koulanurag@gmail.com
Freenode IRC Nick: koulanurag
Education: BE (Computer Engineering), University of Mumbai. Expected graduation: July 2014
Languages: Hindi, English, Kashmiri

Bio: I am a self-learner, technologically curious, and a dedicated student highly interested in research and development work. My interests lie in machine learning, NLP, pattern recognition and data analysis. My primary sources for learning about machine learning and data science are courses from Udacity and Andrew Ng's course on Coursera. I have used Python and Octave for machine learning. I have also done a little work on stock analysis, in which I scraped stock values from a trading website in order to predict their future values. I am an active participant in various programming competitions. I wish to do research work in machine learning and data science in order to learn more and gain insight into the possibilities and applications of this beautiful field. I am currently writing a research paper on Natural Language Database Interfaces. I also recently participated in the "TagMe-Online Machine Learning Competition" held by IISc, Bangalore, in which I held Rank 70 (out of 606 teams) during the validation phase. I would love to contribute to SMC even after GSoC 2014, especially by improving the learning mechanism.

Time duration: I can work full time after the first week of June, since I have my university exams from May 16 to June 3.

Proposal Description

Overview: The idea is to improve the learning ability of the Varnam translator. The major constraint is that an English spelling of a Hindi word can have many variations, and a Hindi root word can take many prefixes. In the current learning approach, Varnam takes all the possible prefixes into account and learns all of them to improve future suggestions. I am not sure, but in this way it seems to lose the relation between the root word and "root word + prefix", so it has to learn "root + prefix" as a new word. In the new approach, instead of learning "root + prefix" as a whole, we could extract the root and the prefix from it and translate them separately to get the desired word in Hindi.

Implementation: I don't have an exact implementation mapping in mind yet, but I believe it will involve writing grammar rules in order to detect root words. We may also apply reinforcement learning for higher accuracy.

Timeline: (I don't have much of an idea yet.) Maybe two and a half weeks for understanding the existing system and the Porter Stemming Algorithm. After that, 3-4 weeks for implementing Porter stemming in Varnam.

Coding skills (on a scale of 0-5, where 0 is novice and 5 is expert): Python (3), C++ (2), Octave (2), Java (3), C (1), R (1), Ruby (1.5)

I haven't yet communicated with any mentor.

SMC Wiki Link:
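
To make the root-plus-prefix idea from the Overview a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not how Varnam works today: the two lookup tables, the function names and the sample words are all hypothetical, and the part attached to the root is called "affix" in the code because the sketch treats it as whatever follows the longest known root.

```python
# Hypothetical learned tables (assumption): in a real system these would
# come from Varnam's learning store, not from hard-coded dictionaries.
LEARNED_ROOTS = {"ghar": "घर", "kitab": "किताब"}
LEARNED_AFFIXES = {"on": "ों", "en": "ें"}


def split_root_affix(word, roots):
    """Return (root, affix) using the longest known root that starts the word.

    Returns None when no known root is a leading part of the word.
    """
    for end in range(len(word), 0, -1):
        if word[:end] in roots:
            return word[:end], word[end:]
    return None


def transliterate(word):
    """Convert the root and the affix separately, then join the results.

    Returns None when either part is unknown, so the caller could fall
    back to learning the whole word as before.
    """
    parts = split_root_affix(word, LEARNED_ROOTS)
    if parts is None:
        return None
    root, affix = parts
    if affix == "":
        return LEARNED_ROOTS[root]
    if affix not in LEARNED_AFFIXES:
        return None
    return LEARNED_ROOTS[root] + LEARNED_AFFIXES[affix]


if __name__ == "__main__":
    for sample in ("ghar", "gharon", "kitaben", "xyz"):
        print(sample, "->", transliterate(sample))
```

With this split, "gharon" does not need to be learned as a new word: it is handled once "ghar" and "on" are each known. A real implementation would need proper grammar rules for finding the root instead of a plain longest-match lookup.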