User:Pskirann/gsoc-2014-proposal

From SMC Wiki
Revision as of 13:04, 19 March 2014 by Tachyons (talk | contribs)

Personal Information

Name : Kiran PS

Email : pskirann@gmail.com

Phone : 9495588058

Blog : recursivelooper.blogspot.com

Freenode IRC Nick: kiranps

University : Calicut University

Current Education : 3rd year, Computer Science and Engineering, Government Engineering College, Thrissur

Why Swathanthra Malayalam Computing?

To make a good contribution to open source, the support of a local community is needed. The community provides a platform to work with experienced people. We have already entered a digital world, and digitization is necessary for the survival of the language. So I think SMC is the best option to work with.

Past involvement with Swathanthra Malayalam Computing

I was a volunteer when SMC celebrated 'vyazhavattom' at the Kerala Sahithya Academy, Thrissur.

Did you participate in past GSoC programs?

No

Do you have other obligations between May and August?

University exams in May (5 days). I can easily catch up in the following weeks.

Will you continue contributing to/supporting Swathanthra Malayalam Computing after the GSoC 2014 program? If yes, which area(s) are you interested in?

The Varnam project and building a word corpus

Why should we choose you over other applicants?

I have good skills in C, C++, Python, Java, Go, and Qt. I have researched developing linguistic corpora, which helped me realise the need for building a text corpus. I am already familiar with the code of varnam-ibus and libvarnam. I fixed some bugs in libvarnam and varnam-ibus, and added the Datuk Corpus and the SILPA spell checker dictionary to Varnam.


Proposal Description

Overview

Varnam is a predictive transliterator for Indian languages. Varnam has a self-learning component called libvarnam, which can learn words as you type using an input tool (varnam-ibus). It stores the words it has learned and uses this knowledge to provide suggestions while transliterating. Varnam can also learn new words when a word corpus is fed to the learning system: the more data you feed it, the better it becomes. The aim of the project is to

1) Upload the personal learnings from the input tool to the online word corpus

2) Download new releases of the word corpus and feed them to the learning system


The need I believe it fulfills

Unlike other transliterators, Varnam has a prediction and learning system. In my view, Varnam learns not only words but also the user's typing habits, which helps increase the user's typing speed.

The word corpus syncing feature will be helpful in many ways:

1) New learnings from the input tool will help build the online corpus

2) Currently, when a new word corpus is released, users have to feed the training data manually. The sync feature removes this step

3) Users will be able to use their personal learnings from multiple devices and platforms

4) Users with similar learnings can be grouped together

5) Sync works as a backup of their personal learnings

6) The online corpus helps track how new words and senses are emerging, and helps spot other trends in usage and spelling

7) User privacy will be respected: before uploading, users will be given an option to remove unwanted words from the local corpus


Implementation

Varnam uses SQLite to save the learnings from the command line and varnam-ibus. Each record in SQLite will have a timestamp recording when it was created. Newly added records in the learnings file will be selected with an SQL query, based on their timestamps and the date of the last sync, and uploaded to the online corpus. Users will be given an interface to remove unwanted words and an option to manually train their favorite words. Sync will be performed using curl or rsync. Users will be notified when a new word corpus is released.
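The timestamp-based selection could look roughly like the sketch below. It is only an illustration: the `learned_words` table, its columns, and the `words_to_upload` helper are hypothetical names chosen for this example, not the actual libvarnam schema or API, and the sample words are ASCII placeholders. The upload step itself (curl or rsync) is left out.

```python
import sqlite3

# Hypothetical schema: the real libvarnam learnings file may be laid out differently.
# learned_on holds a Unix timestamp of when the word was learned.
SCHEMA = """
CREATE TABLE learned_words (
    word       TEXT PRIMARY KEY,
    confidence INTEGER NOT NULL,
    learned_on INTEGER NOT NULL
)
"""


def words_to_upload(conn, last_sync, excluded=()):
    """Return rows learned after last_sync, minus words the user chose to withhold."""
    rows = conn.execute(
        "SELECT word, confidence, learned_on FROM learned_words "
        "WHERE learned_on > ? ORDER BY learned_on",
        (last_sync,),
    ).fetchall()
    blocked = set(excluded)
    return [row for row in rows if row[0] not in blocked]


# Small in-memory database standing in for the local learnings file.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO learned_words VALUES (?, ?, ?)",
    [("malayalam", 3, 1000), ("keralam", 1, 2000), ("rahasyam", 1, 3000)],
)

# Last sync happened at t=1500; the user asked to keep "rahasyam" private.
pending = words_to_upload(conn, last_sync=1500, excluded=["rahasyam"])
print(pending)
```

Filtering the excluded words after the query keeps the SQL simple. With a large exclusion list, it might be better to store the withheld words in their own table and filter inside the query.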


Rough Timeline

May 19 - May 31 : Analyze the current learning database

Jun 1 - Jun 7 : Redesign the learning database to make it suitable for uploading

Jun 7 - Jun 15 : Designing the online database of corpus

Jun 15 - Jun 25 : Create program for uploading

Jun 25 - Jul 5  : Create program for downloading

Jul 5 - Jul 15 : Create GUI using Qt

Jul 15 - Jul 25 : Testing

Jul 25 - Aug 5  : Fix all existing bugs.

Aug 5 - Aug 18 : Review code and database, fix remaining bugs, and get ready for final evaluation


Something I have created

I have created an application that scrolls the page using the user's face.


Communication with Mentor

I have communicated with Navaneeth K N about the project, and he was very helpful. He answered almost every question I asked.


Reference

facescroll : [1]

Varnamproject - Bugs: bug #40510 : [2]