User:Sujithvm

From SMC Wiki
Revision as of 03:25, 22 March 2014 by Sujithvm

Personal Information

  • Name : Sujith V
  • Telephone : +91 7259281007
  • Freenode IRC Nick : sujithvm
  • University and Current Education : Pursuing 2nd year Computer Science Engineering at PES Institute of Technology, Bangalore South Campus.
  • Hometown : Palakkad, Kerala
  • Github : https://github.com/SujithVadakkepat
  • Why do you want to work with the Swathanthra Malayalam Computing? :

Swathanthra Malayalam Computing has served a large part of the Indian community by developing state-of-the-art modules for Indic language processing. It would be a great privilege to collaborate with SMC and, at the same time, gain the enriching experience of contributing to the open source community.

  • Do you have any past involvement with the Swathanthra Malayalam Computing or another open source project as a contributor? :

No.

  • Did you participate with the past GSoC programs, if so which years, which organizations? :

No, this will be my first time participating in GSoC.

  • Do you have other obligations between May and August ? Please note that we expect the Summer of Code to be a full time, 40 hour a week commitment :

No.

  • Will you continue contributing/ supporting the Swathanthra Malayalam Computing after the GSoC 2014 program, if yes, which area(s), you are interested in? :

Yes, I would love to contribute to SMC even after GSoC 2014. The areas which I am interested in contributing to are Natural Language Processing Algorithms, Java modules and Bug Fixing.

  • Why should we choose you over other applicants? :

I have good programming experience in Java, Android, Python, C#, C, C++ and JavaScript. I have developed a number of Android applications and have worked on Natural Language Processing projects in Python using the NLTK library. I am also a regular participant in online coding competitions and hold a good position on CodeChef, TopCoder and HackerRank.

Proposal Title : Android SDK for SILPA


1 Summary
SILPA (Swathanthra Indian Language Processing Applications) comprises a set of applications for processing Indian languages. The project aims to develop an Android SDK for the SILPA modules that helps developers build their Indic applications.

2 Project Description
Currently all the SILPA modules and applications for processing Indian languages are written in Python, so Android developers can benefit from these language processing modules only by using them as a web service. Android also lacks proper support for Indian languages. The project focuses on developing an Android SDK for SILPA applications by porting all modules to an Android library and packaging fonts so that text in different languages is displayed consistently on all platforms, thereby bridging the gap between the SILPA modules and Android developers. This SDK will shield developers from the effects of Android fragmentation and help them support their Indic applications in the dominant Android application market.

3 Need it fulfills
i) This library will help developers add Indic language support to their Android applications.
ii) It can potentially help millions of Indian mobile phone users who can communicate only in their native languages.
iii) It will popularise SILPA in the Android developer community.

4 Relevant experience
i) I have developed several Android apps, which are detailed below.
ii) I have worked in Python on NLP modules such as a sentiment analyzer and a readability index for text.
iii) I have worked with www.sanfoundry.com, contributing Java programs on Data Structures and Algorithms as part of its Open Learning Project.

5 Project Objectives and Goals
The main goals for the project are summarized as follows:
i) Identifying the modules that can be ported to Java and listing the dependencies of each module.
ii) Preparing asset data and resources for all the modules (such as fonts, dictionaries, language-specific maps and text references).
iii) Porting all selected modules to Java, adding API calls to libraries, and unit testing all modules using JUnit.
iv) Building the Android library for the Android SDK.
v) Stress testing and documentation of all modules.
vi) Developing code snippets and a sample Android application demonstrating the usage of all modules.
vii) Releasing the Android SDK under a license.

6 Implementation Details
i) In-depth scrutiny of all modules and preparation of a list of dependencies and other supporting Java libraries required for the modules to function.
Dependencies and Java alternatives:
i) Java bindings for Cairo, ImageJ – graphics libraries for rendering text.
ii) JSoup for Java, TagSoup for Java – parsing XML/HTML for script render module.
iii) iText – required for PDF creation for script render module.
iv) JOrtho – required for SpellCheck module.
v) Natural language parser for CMU dictionary – required for transliterator module.

ii) Preparing asset data and resources for all the modules :
Gathering asset data and resources for each module, creating Singleton classes, and storing maps in SQLite databases that can be accessed through a cursor loader so that the data is available across all activities in Android.
i) Payyans – required font maps to be added into assets in Android.
ii) Transliterator – CMU dictionary and adding natural language parser for CMU from UC Berkeley.
iii) SpellCheck – Integrating JOrtho (Java Spell Checking dictionary).
iv) Render – Integrating iText library for PDF creation.

List of resource data to be stored in the SQLite database and accessed through a HashMap in Java:

v) Guess language – trigrams and language map.
vi) Indic Soundex – phonetic codes for languages.
vii) Katapayadi - language map.
viii) Hyphenator – hyphenator rules.
ix) UCA sort – UCA sort keys.
x) Stemmer – stemmer rules.
xi) Syllabalizer – syllable list and language map.
xii) Character Details – Unicode character database.
xiii) Fortune – Proverbs list.
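
The Singleton-plus-HashMap idea described above could look roughly like the following. This is only a sketch under stated assumptions: the class name ResourceStore and its put/get methods are hypothetical, and the entries are inserted by hand here, whereas in the SDK they would be read from the SQLite database.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical singleton holding per-module lookup tables in memory.
// In the SDK the rows would come from a SQLite query; here they are
// inserted by hand so that the sketch stays self-contained.
final class ResourceStore {
    private static final ResourceStore INSTANCE = new ResourceStore();

    // module name -> (key -> value), e.g. "katapayadi" -> ("ka" -> "1")
    private final Map<String, Map<String, String>> tables = new HashMap<>();

    private ResourceStore() {
        // private constructor: instances are obtained via getInstance()
    }

    static ResourceStore getInstance() {
        return INSTANCE;
    }

    // Adds or overwrites one entry in the table of the given module.
    synchronized void put(String module, String key, String value) {
        tables.computeIfAbsent(module, m -> new HashMap<>()).put(key, value);
    }

    // Returns the stored value, or null if the module or key is unknown.
    synchronized String get(String module, String key) {
        Map<String, String> table = tables.get(module);
        return table == null ? null : table.get(key);
    }
}
```

Because every caller shares the same instance, a table loaded once (for example by the first activity) remains available to the rest of the application.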


iii) Porting all selected modules to Java and adding API calls to libraries and Unit Testing of all modules :
This task involves writing the core language processing functions of the modules efficiently in Java, after which the API calls to all library functions are written. All modules are to be ported with attention to performance, using time- and memory-efficient data structures and algorithms to ensure speedy processing of data. All library functions will be rigorously unit tested using JUnit, and the code adjusted to rectify any issues found.

Implementation strategy for all modules:
All functions and data members of a module can be accessed through an object of the respective module class. For example, the Soundex module functions can be accessed by:
    Soundex obj = new Soundex();
    int cmp = obj.compare(string1, string2);
    String soundexCode = obj.soundex(string1);
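
A minimal sketch of what a ported module class with this interface might look like. The class and method names follow the usage above. The algorithm shown is the classic English Soundex, used purely as a stand-in for SILPA's Indic Soundex. The return convention of compare (0 for identical strings, 1 for the same code, 2 otherwise) is an assumption made for this example.

```java
// Stand-in implementation: classic English Soundex, not the Indic variant.
final class Soundex {
    // Soundex digit for each letter a..z; '0' marks letters that are not coded.
    private static final String CODES = "01230120022455012623010202";

    // Returns the four-character code, or "" if the word contains no a-z letter.
    String soundex(String word) {
        StringBuilder out = new StringBuilder();
        char last = '0';
        for (int i = 0; i < word.length() && out.length() < 4; i++) {
            char c = Character.toLowerCase(word.charAt(i));
            if (c < 'a' || c > 'z') {
                continue; // ignore anything outside a-z
            }
            char code = CODES.charAt(c - 'a');
            if (out.length() == 0) {
                out.append(Character.toUpperCase(c)); // first letter is kept as-is
            } else if (code != '0' && code != last) {
                out.append(code);
            }
            // 'h' and 'w' do not separate letters that share a code
            if (c != 'h' && c != 'w') {
                last = code;
            }
        }
        if (out.length() == 0) {
            return "";
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }

    // Assumed convention: 0 = identical strings, 1 = same code, 2 = different code.
    int compare(String a, String b) {
        if (a.equals(b)) {
            return 0;
        }
        return soundex(a).equals(soundex(b)) ? 1 : 2;
    }
}
```

With this stand-in, "Robert" and "Rupert" both map to R163, so compare reports them as sounding alike.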

Applying Silpa modules to Android views:

i) Indic Soundex: XML layout/script that can be applied to two EditText/TextViews such that, as the user enters text into the EditText boxes, the strings are compared in the background and the result can be stored or applied elsewhere.

ii) Payyans: XML layout/script that can be applied to EditText/TextViews such that conversion between ASCII and Unicode is carried out in the background and the result can be stored or applied elsewhere.

iii) Transliterator: XML layout/script that can be applied to EditText/TextViews such that the text is transliterated and the output is shown in an ImageView or TextView using the ImageJ library.

iv) SpellCheck: XML layout/script that can be applied to EditText/TextViews such that incorrectly spelt words are highlighted using the JOrtho library.

v) Fortune: XML layout/script that can be applied to a TextView to display a particular proverb or to a ListView to display all proverbs. A daily notification service can also be scheduled so that a proverb pops up at the start of the day or at a specified time.

vi) Hyphenator: XML layout/script that can be applied to all text views such that text overflowing the available space is continued on the next line with a hyphen.

vii) UCA sort: Custom Adapter for ListView such that text of selected component can be sorted and displayed.

viii) Guess Language: XML layout/script that can be applied to an EditText/TextView, where the field text is obtained through the context and the language is predicted by calling GuessLanguage methods.

ix) Stemmer: XML layout/script that can be applied to text views, from which the text is obtained through the context and each word is reduced to its stem and stored.

x) Syllabalizer: XML layout/script that can be applied to text views, from which the text is obtained through the context and each word is split into a list of syllables and stored.

xi) Katapayadi: XML layout/script that can be applied to text views, from which the text is obtained through the context and the Katapayadi number of each word is calculated.

xii) Character Details: XML layout/script that can be applied to text views, from which the text is obtained through the context and passed to the character details function of the module class through an object of that class.
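
The common pattern in the list above (take the text of a view, run a module on it in the background, hand the result to another component) can be sketched without any Android dependency as follows. The class name ModuleBinding and its method are hypothetical. In an Android application onTextChanged would typically be called from a TextWatcher, and the result would have to be posted back to the UI thread; CompletableFuture is used here only to keep the sketch runnable on a plain JVM.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical glue between a text source and a text-processing module.
final class ModuleBinding {
    private final Function<String, String> module; // e.g. a stemmer or transliterator call
    private final Consumer<String> onResult;       // e.g. code that updates another view

    ModuleBinding(Function<String, String> module, Consumer<String> onResult) {
        this.module = module;
        this.onResult = onResult;
    }

    // Runs the module off the calling thread and delivers the result to the callback.
    CompletableFuture<Void> onTextChanged(String text) {
        return CompletableFuture
                .supplyAsync(() -> module.apply(text))
                .thenAccept(onResult);
    }
}
```

Any module method that maps a string to a string can be plugged in as the first constructor argument, so the same binding can serve several of the modules listed above.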

iv) Building Android SDK :
The task of building the Android SDK can be subdivided into:
a) Creating base model classes for all modules.
b) Generating all constructors, getter and setter functions which are required to initialize each module.
c) Creating instances of each module class and creating custom XML layouts for Android UI elements and views such as TextView and EditText, with the module instances linked to them. For example, the TextView of the Transliteration module is designed to convert entered text into a specified language when text is applied to it.
d) Input for Indic languages in the Android application is provided by integrating the Indic Keyboard developed by SMC itself.
e) Defining dependencies for the project, such as Java bindings for Cairo, JSoup, ImageJ and TagSoup, which are required for modules such as Script Renderer.
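
Steps a) and b) could be sketched as below. All names here (SilpaModule, the Fortune constructor, proverbAt, the language tag format) are hypothetical and serve only to illustrate a shared base class with a constructor, getters and a setter; the eventual SDK classes may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base model class shared by all ported modules.
abstract class SilpaModule {
    private final String moduleName;
    private String language;

    protected SilpaModule(String moduleName, String language) {
        this.moduleName = moduleName;
        this.language = language;
    }

    public String getModuleName() {
        return moduleName;
    }

    public String getLanguage() {
        return language;
    }

    public void setLanguage(String language) {
        this.language = language;
    }
}

// Hypothetical concrete module: serves proverbs from a list supplied by the caller.
final class Fortune extends SilpaModule {
    private final List<String> proverbs;

    Fortune(String language, List<String> proverbs) {
        super("fortune", language);
        if (proverbs.isEmpty()) {
            throw new IllegalArgumentException("proverb list must not be empty");
        }
        this.proverbs = new ArrayList<>(proverbs); // defensive copy
    }

    // Wraps around so that any index maps to some proverb in the list.
    String proverbAt(int index) {
        return proverbs.get(Math.floorMod(index, proverbs.size()));
    }
}
```

Keeping the shared state in one base class means every module can be initialized and queried in the same way from the custom views.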


v) Stress Testing and Documentation of all modules :
Extensive, well-written documentation of all classes and methods using the Javadoc tool.


vi) Developing code snippets and sample Android application portraying usage of all modules:
Writing code snippets and other sample Java and Android code to help developers benefit from the Android SDK.


vii) Releasing Android SDK under license:
SILPA is licensed under the GNU Affero General Public License. However, the license for the Android SDK is to be specified by the mentor. Plug-ins for Eclipse and the Android tools will be developed to enable developers to download and use the SDK.


  • Communication with Mentor

Contacted Jishnu Mohan and Hrishikesh K.B

Tentative Timeline

Self experimentation and Research Period

April 1 – April 8: Familiarizing myself with libraries such as ImageJ, JSoup, Cairo, TagSoup, iText, JOrtho and using CMU parser.
April 8 – April 14: Familiarizing myself with creation of Android libraries and creating custom views for each module.
April 14 – April 20: Familiarizing myself with creation of SDK tools and plugins for Eclipse/Android tools to enable developers to download code.

During Community Bonding Period

April 21 – April 30: Careful study of existing modules to gain an in-depth understanding of their functional features.
May 1 – May 10: Communication with mentor regarding the implementation strategy and considering the changes as suggested by the mentor.
May 11 – May 18: Finalizing a definite objective plan for the project and setting up online git repositories.

Coding Period

May 19 – May 26: Preparing asset data and resources for all modules and importing them into the project.
May 26 – June 6: Complete porting of half of the modules and discussion with mentor regarding the same.
June 7 – June 8: Tweaking code and implementing suggested changes as per remarks given by mentor.
June 8 – June 24: Complete porting of all modules with testing.

Mid-Term Evaluation

June 24 – June 30: Integrating all modules into Android library with creation of base model classes, constructors, getters and setters and other essential requirements.
July 1 – July 10: Creating custom xml and views which developers can integrate into their code.
July 11 – July 20: Bug fixing and Stress Testing of android components and discussion with mentor regarding the same.
July 20 – July 30: Optimizing, Documentation, Fine tuning, Code clean up.
August 1 – August 10: Writing sample code in Java and building sample Android fragments and activities demonstrating features of the SDK.
August 11 – August 17 (Pencils down): Discussion with mentor, writing tests, improving documentation and adding project license.
End-Term Evaluation

Post GSoC
I will keep contributing actively to SMC by updating modules for improved efficiency, fixing bugs in the Android SDK, and developing an SDK for the Windows platform to support the use of SILPA modules on Windows desktop and phone platforms.

About Me

I am Sujith V, a BE Computer Science student at PES Institute of Technology, Bangalore. My programming interests lie mainly in algorithms, data structures and application development. I have a fair amount of knowledge and experience in Java, Python and Android application development. I am also a regular participant on online coding platforms such as CodeChef, TopCoder, CodeForces and HackerRank. A few projects I have worked on are as follows:
i) Worked with www.sanfoundry.com in contributing Java programs on Data Structures and Algorithms as a part of Open Learning Project.
ii) GuideDog - An Android application to enable visually challenged people to use features of a smart touch screen phone through user friendly gestures and voice.
iii) Mechanics - An Android application for engineering students to resolve vectors graphically, access engineering calculator, enhanced notes, visual animations.
iv) Sudoku - An Android application that reinvents the popular game Sudoku.
v) Worked in Python on NLP modules such as a sentiment analyzer and a readability index for text.
vi) Worked in Unity 3D to create animations in response to user moves.