User:Pskirann/gsoc-2014-proposal
Personal Information
Name : Kiran PS
Email : pskirann@gmail.com
Phone : 9495588058
Blog : recursivelooper.blogspot.com
Freenode IRC Nick: kiranps
University : Calicut University
Current Education : 3rd year, Computer Science and Engineering, Government Engineering College, Thrissur
Why Swathanthra Malayalam Computing?
To make a good contribution to open source, the support of a local community is needed. The community provides a platform to work with experienced people. We have already entered a digital world, and digitization in this field is necessary for the survival of the language. So I think SMC is the best option to work with.
Past involvement with the Swathanthra Malayalam Computing
I was a volunteer when SMC celebrated 'vyazhavattom' at the Kerala Sahithya Academy, Thrissur.
Did you participate in past GSoC programs?
No
Do you have other obligations between May and August?
I have a university exam during May (5 days). I can easily catch up on the missed work in the following weeks.
Will you continue contributing to/supporting Swathanthra Malayalam Computing after the GSoC 2014 program? If yes, which area(s) are you interested in?
The Varnam project and building the word corpus.
Why should we choose you over other applicants?
I have good skills in C, C++, Python, Java, Go, and Qt. I have researched the development of linguistic corpora, which helped me realise the need for building a text corpus. I am already familiar with the code of varnam-ibus and libvarnam. I fixed some bugs in libvarnam and varnam-ibus, and added the Datuk Corpus and the SILPA spell checker dictionary to Varnam.
Proposal Description
Overview
Varnam is a predictive transliterator for Indian languages. Varnam has a self-learning program called libvarnam, which can learn words as you type using an input tool (varnam-ibus). It stores the words it has learned and uses this knowledge to provide suggestions while transliterating. Another way Varnam learns new words is by feeding a word corpus to the learning system; the more data you feed it, the better its suggestions become. The aim of the project is to:
1) Upload the personal learnings from the input tool to the online word corpus
2) Download new releases of the word corpus and feed them to the learning system
The need I believe it fulfills
Unlike other transliterators, Varnam has a prediction and learning system. In my view, Varnam learns not only words but also the user's habits, which helps to increase the user's typing speed.
The word corpus syncing feature will be helpful in many ways:
1) New learnings from the input tool will help to build the online corpus
2) Currently, when a new word corpus is released, users have to feed the training data manually. The sync feature can overcome this problem
3) Users will be able to use their personal learnings from multiple devices and platforms
4) Users with similar learnings can be categorised together
5) Sync works as a backup of their personal learnings
6) The online corpus helps to track how new words and senses are emerging, as well as to spot other trends in usage and spelling
7) The privacy of users will be considered. Before uploading, the user will be given an option to remove unwanted words from the local corpus
Implementation
Varnam uses SQLite to save the learnings from the command line and from varnam-ibus. Each record in the SQLite database will have a timestamp recording when it was created. Records newly added to the learnings file will be selected with an SQL query, based on their timestamps and the date when the last sync was performed, and then uploaded to the online corpus. The user will be given an interface to remove unwanted words and an option to manually train their favorite words. Sync will be performed using curl or rsync. The user will be notified when a new word corpus is released.
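The selection step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the table name `words`, its columns (`word`, `confidence`, `learned_on` as an integer timestamp), and the helper names are hypothetical and are not taken from libvarnam's actual schema. The upload itself (curl or rsync) is left out.

```python
import sqlite3


def select_new_words(conn, last_sync):
    """Return (word, confidence) rows learned after the last sync time.

    Assumes a hypothetical `words` table with an integer `learned_on`
    timestamp column; the real learnings file may be laid out differently.
    """
    cursor = conn.execute(
        "SELECT word, confidence FROM words "
        "WHERE learned_on > ? ORDER BY learned_on",
        (last_sync,),
    )
    return cursor.fetchall()


def remove_unwanted(rows, unwanted):
    """Drop the words the user chose not to upload (privacy step)."""
    blocked = set(unwanted)
    return [(word, conf) for (word, conf) in rows if word not in blocked]


def build_demo_db():
    """Create a small in-memory database for demonstration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE words ("
        "word TEXT PRIMARY KEY, confidence INTEGER, learned_on INTEGER)"
    )
    conn.executemany(
        "INSERT INTO words VALUES (?, ?, ?)",
        [
            ("മലയാളം", 3, 100),
            ("കേരളം", 1, 200),
            ("secret", 1, 300),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    # Only words learned after timestamp 150 are candidates for upload.
    pending = select_new_words(conn, last_sync=150)
    # The user removes a word they do not want to share.
    print(remove_unwanted(pending, ["secret"]))
```

Storing the time of the last successful sync and comparing it against each record's timestamp keeps every upload incremental, so only the words learned since the previous sync need to be reviewed by the user and sent.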
Rough Timeline
May 19 - May 31 : Analyze the current learning database
Jun 1 - Jun 7 : Redesign the learning database to make it suitable for uploading
Jun 7 - Jun 15 : Designing the online database of corpus
Jun 15 - Jun 25 : Create program for uploading
Jun 25 - Jul 5 : Create program for downloading
Jul 5 - Jul 15 : Create GUI using Qt
Jul 15 - Jul 25 : Testing
Jul 25 - Aug 5 : Fix all existing bugs.
Aug 5 - Aug 18 : Review code and database, fix remaining bugs, and get ready for the final evaluation
Something I have created
I have created an application that scrolls the page using the user's face.
Communication with Mentor
I have communicated with Navaneeth K N about the project, and he was very helpful. He answered almost every question I asked.
Reference
facescroll : [1]
Varnamproject - Bugs: bug #40510 : [2]