User:Joemathai


Google Summer of Code 2014 Proposal for Swathanthra Malayalam Computing



Personal details

Name               : Joe Mathai
IRC Nicks          : joemathai on freenode
Email              : joemathai16@gmail.com
Education/College  : B.Tech Computer Science and Engineering, Govt. Model Engineering College.
Blog               : http://joemathai.github.io
Github Username    : joemathai


Why do you want to work with the Swathanthra Malayalam Computing?

Swathanthra Malayalam Computing contributes a great deal to Indic languages and language computing. Working in an active open source community like SMC will also be a great learning opportunity.

Do you have any past involvement with the Swathanthra Malayalam Computing or another open source project as a contributor?

Yes, I have contributed to open source projects such as ThinkUp and Mozilla, and to a few other projects on GitHub.

Did you participate with the past GSoC programs, if so which years, which organizations?

No, I've not participated in GSoC programs before.

Do you have other obligations between May and August? Please note that we expect the Summer of Code to be a full-time, 40-hour-a-week commitment.

I will be able to work 40 hours a week.

Will you continue contributing/ supporting the Swathanthra Malayalam Computing after the GSoC 2014 program, if yes, which area(s), you are interested in?

I will definitely continue contributing, and I will maintain the JavaScript port of SILPA that I am proposing to implement as part of GSoC 2014. My interest lies mainly in language processing algorithms and their optimization.

Why should we choose you over other applicants?

  • I am an active FOSS advocate with previous experience in open source projects, which puts me in a good position to collaborate and write production-quality code.
  • I am proficient in JavaScript, Python and the version control tool Git, all of which are needed to complete my project successfully. I am also fluent in Hindi and Malayalam, both Indic languages, which makes it easier for me to understand the design of the modules.
  • The best reason for choosing me is the project I want to implement: it will increase the use of the existing SILPA modules and enable a whole new class of applications that can use them, extending the reach of Indic language computing.

Proposal Description

Title

 Converting the Indic language processing modules currently in SILPA into a library of JavaScript modules


Abstract

The SILPA project has Indic language processing modules written in Python, many of which could be used more widely and to greater benefit if they were available in JavaScript. JavaScript has become the lingua franca of the web and is very well suited to building web applications, not to mention its speed of execution on modern browser engines. I propose a project in which the individual modules in the SILPA code base are ported to JavaScript so that they can be used on both the client side and the server side (universal modules), and in which a RESTful API for SILPA is implemented.

Implementation Details

The implementation of this proposal consists of the following steps:


1. Identification of dependencies : The first task is to identify all the dependencies of the various Indic language processing modules within SILPA and then chart out the order in which the modules will be ported.

2. Implementing Modules  : The next task is to reproduce in JavaScript the processing done by the corresponding Python module, and then to package each port as a universal module using the Universal Module Definition (https://github.com/umdjs/umd) so that it can be used on both the client side and the server side. The UMD pattern that suits the project best is the one that supports Node, AMD and browser globals (https://github.com/umdjs/umd/blob/master/returnExports.js).

4. Rewrite Script-render in SILPA using Mozilla's pdf.js (https://github.com/mozilla/pdf.js) so that any wiki link can be rendered and viewed on the client side as a PDF. Also use SVG.js to render text in the required font (UTF-8) on the client side as an *.svg file.

5. Unit testing  : Create unit tests for the individual modules using QUnit (from the jQuery project) for client-side code and Mocha/Grunt for server-side code.

6. Documentation

7. Publishing  : The modules can be published easily with npm and Bower.

8. Implementation of a RESTful API using the SILPA modules.
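
As an illustration of step 2, here is a minimal sketch of a module wrapped in a UMD pattern modelled on returnExports (AMD, then Node/CommonJS, then a browser global). The global name `silpaCommon`, the function `detectScript` and its behaviour are hypothetical and are not SILPA's actual API; the block ranges are the standard Unicode blocks for the listed scripts, and the detection logic is a simplification chosen only to show the packaging.

```javascript
// UMD wrapper modelled on the returnExports pattern.
(function (root, factory) {
  if (typeof define === "function" && define.amd) {
    define([], factory);            // AMD loader such as RequireJS
  } else if (typeof module === "object" && module.exports) {
    module.exports = factory();     // Node / CommonJS
  } else {
    root.silpaCommon = factory();   // browser global (hypothetical name)
  }
})(typeof globalThis !== "undefined" ? globalThis : this, function () {
  "use strict";

  // Unicode block ranges for some Indic scripts.
  const BLOCKS = [
    { script: "Devanagari", start: 0x0900, end: 0x097f },
    { script: "Bengali",    start: 0x0980, end: 0x09ff },
    { script: "Gurmukhi",   start: 0x0a00, end: 0x0a7f },
    { script: "Gujarati",   start: 0x0a80, end: 0x0aff },
    { script: "Oriya",      start: 0x0b00, end: 0x0b7f },
    { script: "Tamil",      start: 0x0b80, end: 0x0bff },
    { script: "Telugu",     start: 0x0c00, end: 0x0c7f },
    { script: "Kannada",    start: 0x0c80, end: 0x0cff },
    { script: "Malayalam",  start: 0x0d00, end: 0x0d7f }
  ];

  // Hypothetical helper: returns the script of the first character that
  // falls inside one of the blocks above, or null if none does.
  function detectScript(text) {
    for (const ch of text) {        // for...of iterates by code point
      const cp = ch.codePointAt(0);
      for (const block of BLOCKS) {
        if (cp >= block.start && cp <= block.end) {
          return block.script;
        }
      }
    }
    return null;
  }

  return { detectScript: detectScript };
});
```

In Node the module would be loaded with `require`, while in a browser without a module loader the same file would expose `silpaCommon.detectScript` as a global.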


(The timeline was created after going through the individual modules and their dependencies; a few implementation details are mentioned as well.)

GSoC 2014 Timeline

This timeline was created after going through the various modules, their dependencies and their implementation difficulty. Porting of the experimental modules within SILPA is planned towards the end.


21st April - 18th May :


1. Set up a clean Linux development environment for the SILPA port project (probably using Docker or Vagrant).

2. Discuss the priority and order of the modules to be ported with the SMC-Silpa community, and contact the authors of the various modules to discuss the challenges and any further optimizations that could be added.

3. Start researching ways to improve the implementations by optimizing the code with better algorithms, data structures, etc. Also check the feasibility of implementing the script rendering on the server side.

4. Create a prototype of SILPA-common in UMD to better understand UMD.


19th-26th May :


1. Implementation of SILPA-common, the language detection module and char-map in JavaScript.

2. Implementation of the CMU_Dict pronouncing module for Indic languages.

3. Implementation of Transliteration.


27th May - 4th June :


1. Implement the Hyphenation module (reusing or implementing Unicode character based language detection).

2. Implement the Soundex module in JavaScript.

3. Implement the inexact search module in JavaScript.


5th-20th June :


1. Implement syllabalizer.

2. Implement indicngram.

3. Implement Shingling.

4. Implement text-similarity.
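
To give a rough idea of how the n-gram, shingling and text-similarity ports relate to each other, here is a minimal sketch. The function names are hypothetical, the n-grams are built over Unicode code points rather than syllables, and Jaccard similarity over shingle sets is used only as an illustration; the existing Python modules may use different units and a different similarity measure.

```javascript
// Character n-grams over Unicode code points (not UTF-16 code units).
function ngrams(text, n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  const chars = Array.from(text);   // splits the string by code point
  const out = [];
  for (let i = 0; i + n <= chars.length; i++) {
    out.push(chars.slice(i, i + n).join(""));
  }
  return out;
}

// A shingle set is the set of distinct n-grams of a text.
function shingles(text, n) {
  return new Set(ngrams(text, n));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over the two shingle sets.
function jaccard(a, b, n = 2) {
  const sa = shingles(a, n);
  const sb = shingles(b, n);
  if (sa.size === 0 && sb.size === 0) {
    return 1;                       // two empty sets are treated as identical
  }
  let common = 0;
  for (const s of sa) {
    if (sb.has(s)) common++;
  }
  return common / (sa.size + sb.size - common);
}
```

For Indic text a syllable-level variant would probably be more meaningful, which is one reason the syllabalizer is scheduled in the same period.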


21st - 23rd June :


Check for issues and bugs reported by the community in the code so far. If there have been recent changes to the Python code that has already been ported to JavaScript, implement them too.

Prepare the necessary documents and code for the mid-term evaluation.


23rd June - 10th July :


1. Implement the Payyans module in JavaScript.

2. Discuss with the contributors of the spellchecker, which is listed as an experimental module, and port it to JavaScript.

3. Implement Unicode Collation Algorithm sorting of Malayalam words.

4. Implement the katapayadi module.
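
For the collation item, one option worth evaluating is to delegate to the built-in `Intl.Collator` where it is available, instead of implementing the Unicode Collation Algorithm by hand. The sketch below assumes a JavaScript runtime with ECMAScript Internationalization API support and Malayalam locale data; `sortMalayalam` is a hypothetical name, and a hand-written port would still be needed for environments without `Intl`.

```javascript
// Sort words using the runtime's locale-aware collation for Malayalam.
// Returns a new array and leaves the input untouched.
function sortMalayalam(words) {
  const collator = new Intl.Collator("ml");
  return [...words].sort(collator.compare);
}

// Example usage: three words starting with ഇ, അ and ആ.
const sorted = sortMalayalam(["ഇല", "അമ്മ", "ആന"]);
console.log(sorted.join(" "));
```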


11th - 31st July :


1. Implement SILPA scriptrender as a client-side module, along with a web application to demonstrate it.

2. Implement the normalizer module, after proper discussion with the authors.

3. Create unit tests for every ported module, along with the necessary documentation. The unit tests will cover both client-side and server-side use.


1st - 10th August :


  • Implement a RESTful API using Node.js with the ported modules, or implement a Flask RESTful API using the existing Python modules.
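
A minimal sketch of what the Node.js variant could look like, using only the built-in `http` module. The URL layout `/api/<module>?text=...`, the names `route` and `createApiServer`, and the placeholder `codepoints` handler are all hypothetical; the real routes would be agreed with the community, and the handlers would call the ported modules.

```javascript
const http = require("http");

// Registry of handlers; "codepoints" is a stand-in for a real ported module.
const handlers = new Map([
  ["codepoints", (text) =>
    Array.from(text).map((ch) => ch.codePointAt(0).toString(16))]
]);

// Pure routing function, kept separate from the server so it can be unit tested.
function route(method, rawUrl) {
  const url = new URL(rawUrl, "http://localhost");
  const match = /^\/api\/([a-z]+)$/.exec(url.pathname);
  if (method !== "GET" || !match || !handlers.has(match[1])) {
    return { status: 404, body: { error: "not found" } };
  }
  const text = url.searchParams.get("text");
  if (text === null) {
    return { status: 400, body: { error: "missing 'text' parameter" } };
  }
  return { status: 200, body: { result: handlers.get(match[1])(text) } };
}

// Builds the HTTP server without starting it.
function createApiServer() {
  return http.createServer((req, res) => {
    const { status, body } = route(req.method, req.url);
    res.writeHead(status, { "Content-Type": "application/json; charset=utf-8" });
    res.end(JSON.stringify(body));
  });
}
```

Calling `createApiServer().listen(8080)` would start the service on port 8080 (an arbitrary choice for the example); keeping `route` free of I/O lets the same Mocha tests cover the API logic.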


10th - 16th August :


  • Check for recent commits to the Python modules and, if there are any, implement them in the corresponding JavaScript modules.
  • Code review : Each module is published once it is complete and has been reviewed by members of the community.


17th - 18th August :

  • Prepare for the pencils-down phase.

About Me

I am Joe Mathai, a computer science engineering student who is passionate about programming, FOSS and, above all, a free and open web. I am currently in the 3rd year of my B.Tech at Govt. Model Engineering College, Kochi. Apart from participating in competitive coding events and hackathons, I spend most of my free time either working with my college FOSS team or cooking up some cool script. My areas of interest are natural language processing, distributed computing and operating systems. I am currently developing a distributed computing framework in JavaScript ( http://github.com/joemathai/disco.js ) which uses the internet to distribute and process data. I was also part of the team that created a subtitle sync application using Nokia's LightSpeak technology; our project was judged the best hack at the MIT Labs/Nokia hackathon. I am the current Google Student Ambassador for my university and have been involved in popularizing the use of AngularJS and AppEngine.

Happy to mail my resume on request :).