User:Joemathai
Google Summer of Code 2014 Proposal for Swathanthra Malayalam Computing
Personal details
Name: Joe Mathai
IRC nick: joemathai on freenode
Email: joemathai16@gmail.com
Education/College: B.Tech Computer Science and Engineering, Govt. Model Engineering College
Blog: http://joemathai.github.io
GitHub username: joemathai
Why do you want to work with the Swathanthra Malayalam Computing?
Swathanthra Malayalam Computing contributes a great deal to Indic languages and language computing. Apart from that, working in an active open source community like SMC will be a great learning opportunity.
Do you have any past involvement with the Swathanthra Malayalam Computing or another open source project as a contributor?
Yes, I have contributed to open source projects such as ThinkUp and Mozilla, and to a few other projects on GitHub.
Did you participate with the past GSoC programs, if so which years, which organizations?
No, I've not participated in GSoC programs before.
Do you have other obligations between May and August? Please note that we expect the Summer of Code to be a full-time, 40-hour-a-week commitment.
I will be able to work 40 hours per week.
Will you continue contributing/ supporting the Swathanthra Malayalam Computing after the GSoC 2014 program, if yes, which area(s), you are interested in?
I will definitely continue contributing, and I will also maintain the JavaScript port of SILPA that I am proposing to implement as part of GSoC 2014. My interest mainly lies in language processing algorithms and their optimization.
Why should we choose you over other applicants?
I am an active FOSS advocate and have previous experience working on open source projects, which puts me in a good position to write good-quality code with the necessary tests. Apart from this, I have experience working with Node.js and Python modules, both of which are needed to complete this project successfully.
Proposal Description
Title
Converting the Indic language processing modules currently in SILPA into a JavaScript module library
Abstract
The SILPA project has many Indic language processing modules written in Python, many of which could be used to a greater extent, and by more people, if they were available in JavaScript. JavaScript has become the lingua franca of the web: it runs in every browser and, through Node.js, on the server, which makes it a natural fit for web applications. I propose a project in which the individual modules of the SILPA code base, along with their dependencies, are ported to JavaScript modules that can be used on both the client side and the server side, and a RESTful API for SILPA is implemented with Node.js.
Implementation Details
The implementation of this proposal consists of the following steps:
1. Identification of dependencies: The first task is to identify all the dependencies of the various Indic language processing modules within SILPA and then chart out the order in which the modules will be ported.
2. Implementing modules: The next task is to faithfully reproduce the processing done by the corresponding Python module. Each module will then be written as a universal module using Universal Module Definition (https://github.com/umdjs/umd), so that it can be used both on the client side and on the server side. The UMD pattern that suits the project best is the one with support for Node, AMD and browser globals (https://github.com/umdjs/umd/blob/master/returnExports.js).
3. Unit testing: Create unit tests for individual modules using QUnit for client-side code and Mocha (with Grunt as the task runner) for server-side code.
4. Documentation
5. Publishing: The modules can be published easily with npm and Bower.
After the modules are ported:
Implement a RESTful API with Node.js, using the Express framework on top of the ported modules.
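The API could look roughly like the sketch below. The proposal names Express; to keep the example free of third-party dependencies it uses Node's built-in `http` module instead. The route shape (`POST /api/<module>` with a JSON body `{"text": ...}`) and the `charcount` entry are hypothetical placeholders for the ported modules.

```javascript
'use strict';
const http = require('http');

// Hypothetical registry of ported modules; a real server would require()
// the UMD modules here instead of this placeholder. A Map avoids matching
// inherited keys such as "constructor".
const modules = new Map([
  // Counts Unicode code points rather than UTF-16 code units.
  ['charcount', (text) => Array.from(text).length],
]);

// Pure routing function, kept separate from the server so it is easy to test.
function route(method, url, body) {
  const match = /^\/api\/([a-z]+)$/.exec(url);
  if (method !== 'POST' || !match || !modules.has(match[1])) {
    return { status: 404, payload: { error: 'not found' } };
  }
  let input;
  try {
    input = JSON.parse(body);
  } catch (err) {
    return { status: 400, payload: { error: 'invalid JSON' } };
  }
  const text = input && typeof input.text === 'string' ? input.text : '';
  return { status: 200, payload: { result: modules.get(match[1])(text) } };
}

// Wraps route() in an HTTP server; call .listen(port) to start it.
function createServer() {
  return http.createServer((req, res) => {
    let body = '';
    req.setEncoding('utf8');
    req.on('data', (chunk) => { body += chunk; });
    req.on('end', () => {
      const { status, payload } = route(req.method, req.url, body);
      res.writeHead(status, { 'Content-Type': 'application/json; charset=utf-8' });
      res.end(JSON.stringify(payload));
    });
  });
}

module.exports = { route, createServer };
```

Calling `createServer().listen(3000)` starts the server. Keeping `route()` free of I/O means it can be unit-tested without opening a socket, and moving to Express later would only change `createServer()`.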
GSoC 2014 Timeline
This timeline was drawn up after going through the various modules, their dependencies and their implementation difficulty. Porting of the experimental modules within SILPA is planned towards the end.
21st April - 18th May :
1. Set up a clean Linux development environment for the SILPA port project (probably using Docker or Vagrant).
2. Discuss the priority and order of the modules to be ported with the SMC-SILPA community, and contact the authors of the various modules to discuss the challenges and any further optimizations that could be added.
3. Research ways to improve the implementations by optimizing the code with better algorithms, data structures, etc. Also check the feasibility of implementing script rendering on the client side.
4. Create a prototype of SILPA-common as a UMD module to better understand the UMD pattern.
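Deciding the porting order is essentially a topological sort of the module dependency graph: a module can only be ported once everything it imports has been ported. The sketch below uses Kahn's algorithm; the dependency graph in it is a hypothetical example, not SILPA's real import structure.

```javascript
'use strict';

// Kahn's algorithm: returns module names in an order where every module
// appears after all of its dependencies; throws if the graph has a cycle.
// deps has the shape { moduleName: [dependencyName, ...] }.
function portingOrder(deps) {
  const indegree = new Map();   // number of unported dependencies per module
  const dependents = new Map(); // reverse edges: dependency -> modules using it
  for (const [mod, reqs] of Object.entries(deps)) {
    if (!indegree.has(mod)) indegree.set(mod, 0);
    for (const req of reqs) {
      if (!indegree.has(req)) indegree.set(req, 0);
      indegree.set(mod, indegree.get(mod) + 1);
      if (!dependents.has(req)) dependents.set(req, []);
      dependents.get(req).push(mod);
    }
  }
  // Start from modules with no dependencies, sorted for a deterministic order.
  const ready = [...indegree.keys()].filter((m) => indegree.get(m) === 0).sort();
  const order = [];
  while (ready.length > 0) {
    const mod = ready.shift();
    order.push(mod);
    for (const next of dependents.get(mod) || []) {
      indegree.set(next, indegree.get(next) - 1);
      if (indegree.get(next) === 0) ready.push(next);
    }
  }
  if (order.length !== indegree.size) {
    throw new Error('dependency cycle detected');
  }
  return order;
}

// Illustrative, hypothetical dependency graph (not SILPA's real imports).
const exampleDeps = {
  'silpa-common': [],
  langdetect: ['silpa-common'],
  syllabalizer: ['silpa-common', 'langdetect'],
  indicngram: ['syllabalizer'],
  'text-similarity': ['indicngram'],
};

console.log(portingOrder(exampleDeps).join(' -> '));
// silpa-common -> langdetect -> syllabalizer -> indicngram -> text-similarity
```

A cycle in the real graph would surface as an error here, which is itself useful to discuss with the module authors before porting starts.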
19th-26th May :
1. Implement SILPA-common, the language detection module and char-map in JavaScript.
2. Implement the CMU_Dict pronunciation module for Indic languages.
3. Implement the transliteration module.
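Language detection, including the Unicode character based detection that the hyphenation module may reuse, can be approximated by counting which Indic script block each character falls in and picking the most frequent one. This is a sketch under that assumption: the code point ranges are the standard Unicode blocks, but the language codes and the majority rule are illustrative choices rather than a description of SILPA's module. A script block identifies a script, not a language; Devanagari, for example, is shared by Hindi, Marathi and others.

```javascript
'use strict';

// Standard Unicode block ranges for the major Indic scripts. Mapping each
// block to a single language code is a simplification (assumed codes).
const SCRIPT_BLOCKS = [
  { lang: 'hi-IN', start: 0x0900, end: 0x097F }, // Devanagari
  { lang: 'bn-IN', start: 0x0980, end: 0x09FF }, // Bengali
  { lang: 'pa-IN', start: 0x0A00, end: 0x0A7F }, // Gurmukhi
  { lang: 'gu-IN', start: 0x0A80, end: 0x0AFF }, // Gujarati
  { lang: 'or-IN', start: 0x0B00, end: 0x0B7F }, // Oriya (Odia)
  { lang: 'ta-IN', start: 0x0B80, end: 0x0BFF }, // Tamil
  { lang: 'te-IN', start: 0x0C00, end: 0x0C7F }, // Telugu
  { lang: 'kn-IN', start: 0x0C80, end: 0x0CFF }, // Kannada
  { lang: 'ml-IN', start: 0x0D00, end: 0x0D7F }, // Malayalam
];

// Returns the language code whose script block contains the most
// characters of `text`, or null when no Indic character is present.
function detectLanguage(text) {
  const counts = new Map();
  for (const ch of text) { // for...of iterates by code point
    const cp = ch.codePointAt(0);
    const block = SCRIPT_BLOCKS.find((b) => cp >= b.start && cp <= b.end);
    if (block) counts.set(block.lang, (counts.get(block.lang) || 0) + 1);
  }
  let best = null;
  let bestCount = 0;
  for (const [lang, count] of counts) {
    if (count > bestCount) { // ties keep the script seen first
      best = lang;
      bestCount = count;
    }
  }
  return best;
}
```

Running the same function on each word instead of the whole text would give per-word results for mixed-script input.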
27th May - 4th June :
1. Implement the hyphenation module (reusing or implementing Unicode character based language detection).
2. Implement the Soundex module in JavaScript.
3. Implement the inexact search module in JavaScript.
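Inexact search needs some measure of how close two words are. One common choice, assumed here rather than taken from SILPA's module, is Levenshtein edit distance; the sketch computes it over Unicode code points using two rows of the dynamic-programming table, then filters the words of a text by a maximum distance.

```javascript
'use strict';

// Levenshtein distance over code points (not UTF-16 units), keeping only
// the previous and current rows of the dynamic-programming table.
function editDistance(a, b) {
  const s = Array.from(a);
  const t = Array.from(b);
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[t.length];
}

// Returns the whitespace-separated words of `text` whose edit distance
// from `query` is at most maxDistance.
function inexactSearch(text, query, maxDistance) {
  return text.split(/\s+/).filter(
    (word) => word.length > 0 && editDistance(word, query) <= maxDistance
  );
}
```

For example, `inexactSearch('കേരളം കേരള തമിഴ്', 'കേരളം', 1)` returns `['കേരളം', 'കേരള']`, since dropping the anusvara costs one edit.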
5th-20th June :
1. Implement syllabalizer.
2. Implement indicngram.
3. Implement Shingling.
4. Implement text-similarity.
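Shingling and text similarity fit together: split each text into overlapping n-grams (shingles) and compare the resulting sets with the Jaccard coefficient. This sketch uses character shingles with an arbitrary default of k = 3; a port that reuses the syllabalizer would shingle syllables instead of code points, since a Malayalam syllable often spans several code points.

```javascript
'use strict';

// Set of overlapping k-code-point shingles. Whitespace runs are collapsed
// to a single space first; texts shorter than k become a single shingle.
function shingles(text, k) {
  const chars = Array.from(text.replace(/\s+/g, ' ').trim());
  const result = new Set();
  if (chars.length === 0) return result;
  if (chars.length < k) {
    result.add(chars.join(''));
    return result;
  }
  for (let i = 0; i + k <= chars.length; i++) {
    result.add(chars.slice(i, i + k).join(''));
  }
  return result;
}

// Jaccard similarity |A ∩ B| / |A ∪ B|, in [0, 1]; two empty sets count as identical.
function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  for (const s of a) if (b.has(s)) shared++;
  return shared / (a.size + b.size - shared);
}

// Similarity of two texts based on their k-shingle sets.
function textSimilarity(x, y, k = 3) {
  return jaccard(shingles(x, k), shingles(y, k));
}
```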
21st - 23rd June :
Check for issues and bugs reported by the community in the code so far, and apply any recent changes in the Python code to the corresponding JavaScript modules ported so far.
Prepare the necessary documents and code for the midterm evaluation.
23rd June - 10th July :
1. Implement the Payyans module in JavaScript.
2. Discuss the spellchecker, which is listed as an experimental module, with its contributors and port it to JavaScript.
3. Implement Unicode Collation Algorithm (UCA) sorting of Malayalam words.
4. Implement the Katapayadi module.
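For UCA sorting, JavaScript runtimes already expose locale-aware collation through `Intl.Collator`, which in ICU-backed engines such as V8 implements the Unicode Collation Algorithm with CLDR tailorings. Whether that is sufficient, or whether the rules of the existing SILPA module must be ported, is an open question; this sketch assumes the built-in collator is acceptable.

```javascript
'use strict';

// Locale-aware comparator for Malayalam. If the runtime lacks
// Malayalam-specific data, Intl.Collator falls back to the root collation.
const mlCollator = new Intl.Collator('ml');

// Returns a new array sorted in collation order, leaving the input untouched.
// Intl.Collator's `compare` getter returns a bound function, so it can be
// passed to sort() directly.
function sortMalayalam(words) {
  return [...words].sort(mlCollator.compare);
}
```

With this, `sortMalayalam(['കടൽ', 'അമ്മ'])` places the word starting with the vowel അ before the one starting with the consonant ക.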
11th - 31st July :
- Try to implement SILPA's script renderer (scriptrender) as a client-side module.
- Implement the normalizer module after proper discussion with its authors.
- Create unit tests and documentation for every ported module. The unit tests will cover both client-side and server-side use.
August 1 - 10
- Implement a RESTful API in Node.js using the ported modules, or alternatively a Flask RESTful API using the existing Python modules.
August 10 - 16
- Check for recent commits to the Python modules and apply any changes to the corresponding JavaScript modules.
- Code review: publish each finished module after it has been reviewed by members of the community.
August 17 - 18
- Prepare for the pencils-down phase.
About Me
I am Joe Mathai, a computer science engineering student who is passionate about programming, FOSS and, above all, a free and open web. I am currently in the third year of my B.Tech at Govt. Model Engineering College, Kochi. Apart from participating in competitive coding events and hackathons, I spend most of my free time either working with my college FOSS team or cooking up some cool script. Happy to mail my resume on request :).