User:Sujithvm

From SMC Wiki
Revision as of 13:38, 6 April 2014 by Sujithvm (talk | contribs)

Personal Information

  • Name : Sujith V
  • Telephone : +91 7259281007
  • Freenode IRC Nick : sujithvm
  • University and Current Education : Pursuing 2nd year Computer Science Engineering at PES Institute of Technology, Bangalore South Campus.
  • Hometown : Palakkad, Kerala
  • Github : https://github.com/SujithVadakkepat
  • Why do you want to work with the Swathanthra Malayalam Computing? :

Swathanthra Malayalam Computing has been serving a large part of the Indian community by developing state-of-the-art modules for Indic language processing. It would be a great privilege to collaborate and work with SMC while gaining the enriching experience of contributing to the open source community.

  • Do you have any past involvement with the Swathanthra Malayalam Computing or another open source project as a contributor? :

No.

  • Did you participate with the past GSoC programs, if so which years, which organizations? :

No, this will be my first time participating in GSoC.

  • Do you have other obligations between May and August ? Please note that we expect the Summer of Code to be a full time, 40 hour a week commitment :

No.

  • Will you continue contributing/ supporting the Swathanthra Malayalam Computing after the GSoC 2014 program, if yes, which area(s), you are interested in? :

Yes, I would love to contribute to SMC even after GSoC 2014. The areas which I am interested in contributing to are Natural Language Processing Algorithms, Java modules and Bug Fixing.

  • Why should we choose you over other applicants? :

I have good programming experience in Java, Android, Python, C#, C, C++ and JavaScript. I have developed a number of Android applications and have worked on Natural Language Processing projects in Python using the NLTK library. I am also a regular participant in online coding competitions and have established a good standing on CodeChef, TopCoder and HackerRank.

Proposal Title : Android SDK for SILPA


1 Summary
SILPA (Swathanthra Indian Language Processing Applications) comprises a set of applications for processing Indian languages. The project aims to develop an Android SDK for the SILPA modules that helps developers build their Indic applications.

2 Project Description
Currently all SILPA modules and applications for processing Indian languages are written in Python, so Android developers can benefit from these language processing modules only by calling them as a web service. Android also lacks proper support for Indian languages. The project focuses on developing an Android SDK for SILPA by porting all modules to an Android library and packaging fonts so that text in different languages is displayed consistently on all platforms, bridging the gap between the SILPA modules and Android developers. The SDK will shield developers from Android fragmentation and let them support their Indic applications in the dominant Android application market.

3 Need it fulfills
i) This library will help developers to support their Indic Android applications.
ii) It can help millions of Indian mobile phone users who can communicate only in their native languages.
iii) It will popularise SILPA in the Android developer community.

4 Relevant experience
i) I have developed several Android apps, details of which are given below.
ii) I have worked in Python on NLP modules such as a sentiment analyzer and a readability index for text.
iii) I have worked with www.sanfoundry.com, contributing Java programs on Data Structures and Algorithms as part of the Open Learning Project.

5 Project Objectives and Goals
The main goals for the project are summarized as follows:
i) Identifying the modules that can be ported to Java and listing the dependencies of each module.
ii) Preparing asset data and resources for all the modules (such as fonts, dictionaries, language specific maps and text references etc).
iii) Porting all selected modules to Java and adding API calls to libraries and Unit testing of all modules using JUnit.
iv) Building Android library for Android SDK.
v) Stress Testing and Documentation of all modules.
vi) Developing code snippets and sample Android application portraying usage of all modules.
vii) Releasing Android SDK under a license.

6 Implementation Details
i) In-depth scrutiny of all modules and preparation of a list of dependencies and other supporting Java libraries required for the modules to function.
Dependencies and Java alternatives:
i) Java bindings for Cairo, ImageJ – graphics libraries for rendering text.
ii) JSoup for Java, TagSoup for Java – parsing XML/HTML for the script render module.
iii) iText – required for PDF creation for the script render module.
iv) JOrtho – required for the SpellCheck module.
v) Natural language parser for CMU dictionary – required for the transliterator module.

ii) Preparing asset data and resources for all the modules :
Gathering asset data and resources for each module, creating singleton classes, and storing maps in SQLite databases that can be accessed by means of a cursor loader, so that the data is available across all activities in Android.
i) Payyans – required font maps to be added to the Android assets.
ii) Transliterator – CMU dictionary, plus the natural language parser for CMU from UC Berkeley.
iii) SpellCheck – integrating JOrtho (Java spell checking dictionary).
iv) Render – integrating the iText library for PDF creation.

Resource data to be stored in an SQLite database and accessed through a HashMap in Java:

v) Guess language – trigrams and language map.
vi) Indic Soundex – phonetic codes for languages.
vii) Katapayadi - language map.
viii) Hyphenator – hyphenator rules.
ix) UCA sort – UCA sort keys.
x) Stemmer – stemmer rules.
xi) Syllabalizer – syllable list and language map.
xii) Character Details – Unicode character database.
xiii) Fortune – Proverbs list.
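
The singleton-plus-HashMap access pattern described above could look roughly like the following. This is only a minimal sketch under stated assumptions: the class name ModuleResourceMap, the tab-separated row format and the sample rows are all made up for illustration, and the SQLite or asset loading step is left out so that the code runs on plain Java.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for one module's lookup table; not part of SILPA. */
final class ModuleResourceMap {
    private static ModuleResourceMap instance; // created on first use

    private final Map<String, String> entries = new HashMap<>();

    private ModuleResourceMap() { }

    /** Returns the single shared instance. */
    static synchronized ModuleResourceMap getInstance() {
        if (instance == null) {
            instance = new ModuleResourceMap();
        }
        return instance;
    }

    /**
     * Loads "key<TAB>value" lines. On Android these lines would come from an
     * asset file or database rows; that part is left out of this sketch.
     */
    synchronized void loadLines(Iterable<String> lines) {
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (line.startsWith("#") || tab <= 0) {
                continue; // skip comments, blank lines and malformed rows
            }
            entries.put(line.substring(0, tab).trim(), line.substring(tab + 1).trim());
        }
    }

    /** Returns the mapped value, or null when the key is unknown. */
    synchronized String lookup(String key) {
        return entries.get(key);
    }
}

final class ModuleResourceMapDemo {
    public static void main(String[] args) {
        ModuleResourceMap map = ModuleResourceMap.getInstance();
        // Made-up sample rows, for illustration only.
        map.loadLines(Arrays.asList("# sample rows", "ka\t1", "kha\t2"));
        System.out.println(map.lookup("kha")); // prints 2
    }
}
```

Because every caller goes through getInstance(), the table is parsed once and then shared, which is the property the proposal wants across activities.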


iii) Porting all selected modules to Java and adding API calls to libraries and Unit Testing of all modules :
This task involves writing the core language processing functions of each module efficiently in Java, after which the API calls to all library functions are written. All modules are to be ported with attention to performance, defining and using time- and memory-optimized data structures and algorithms so that the modules process data quickly. All library functions will be unit tested rigorously with JUnit, and the code adjusted to fix any issues found.

Implementation strategy for all modules:
All functions and data members of a module can be accessed via an object of the respective module class. For example, the Soundex module functions can be accessed by:
Soundex obj = new Soundex();
int cmp = obj.compare(string1, string2);
String soundexCode = obj.soundex(string1);
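
To make the shape of such a module class concrete, here is a minimal sketch of a class exposing soundex() and compare(). It implements the classic English (American) Soundex, not the Indic Soundex rules of SILPA, and both the class name SimpleSoundex and the compare() return convention (0 identical, 1 sound alike, 2 different) are assumptions made for this example.

```java
/** Classic English Soundex, used here only as a stand-in for the Indic module. */
final class SimpleSoundex {
    // Code for each letter a..z; '0' marks letters that carry no code (vowels, h, w, y).
    private static final String CODES = "01230120022455012623010202";

    /** Returns the four-character code, or "" when the word has no Latin letters. */
    String soundex(String word) {
        StringBuilder out = new StringBuilder();
        char lastCode = '0';
        for (int i = 0; i < word.length() && out.length() < 4; i++) {
            char c = Character.toLowerCase(word.charAt(i));
            if (c < 'a' || c > 'z') {
                continue; // ignore anything outside a..z
            }
            char code = CODES.charAt(c - 'a');
            if (out.length() == 0) {
                out.append(Character.toUpperCase(c)); // first letter is kept as-is
                lastCode = code;
            } else if (c == 'h' || c == 'w') {
                continue; // h and w do not separate letters with equal codes
            } else {
                if (code != '0' && code != lastCode) {
                    out.append(code);
                }
                lastCode = code; // a vowel resets lastCode, so repeats are coded again
            }
        }
        if (out.length() == 0) {
            return "";
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }

    /** Assumed convention: 0 = identical strings, 1 = sound alike, 2 = different. */
    int compare(String a, String b) {
        if (a.equals(b)) {
            return 0;
        }
        return soundex(a).equals(soundex(b)) ? 1 : 2;
    }
}

final class SimpleSoundexDemo {
    public static void main(String[] args) {
        SimpleSoundex obj = new SimpleSoundex();
        System.out.println(obj.soundex("Robert"));           // prints R163
        System.out.println(obj.compare("Robert", "Rupert")); // prints 1
    }
}
```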

Applying Silpa modules to Android views:

i) Indic Soundex: XML layout/script that can be applied to two EditText/TextViews such that, as the user enters text into the EditText boxes, the strings are compared in the background and the result can be stored or applied elsewhere.

ii) Payyans: XML layout/script that can be applied to EditText/TextViews such that conversion between ASCII and Unicode can be carried out in background and result can be stored or applied elsewhere.

iii) Transliterator: XML layout/script that can be applied to EditText/TextViews such that the text is transliterated and the output is shown in an ImageView or TextView using the ImageJ library.

iv) SpellCheck: XML layout/script that can be applied to EditText/TextViews such that incorrectly spelt words are highlighted using JOrtho library.

v) Fortune: XML layout/script that can be applied to TextView to display a particular proverb or to ListView to display all proverbs. Also scheduling a daily notification service such that a proverb is popped up at the start of the day or at a specified time.

vi) Hyphenator: XML layout/script that can be applied to all text views such that text overflow due to insufficient space is accommodated in the next line with the usage of hyphen.

vii) UCA sort: Custom Adapter for ListView such that text of selected component can be sorted and displayed.

viii) Guess Language: XML layout/script that can be applied to EditText/TextView where field text is obtained by means of context and language is predicted by accessing GuessLanguage methods.

ix) Stemmer: XML layout/script that can be applied to text views from which text can be obtained by means of context and each word can be reduced to stem word and stored.

x) Syllabalizer: XML layout/script that can be applied to text views from which text can be obtained by means of context and each word can be reduced to list of syllables and stored.

xi) Katapayadi: XML layout/script that can be applied to text views from which text can be obtained by means of context and Katapayadi number for each word is calculated.

xii) Character Details: XML layout/script that can be applied to text views from which text can be obtained by means of context, and character details can be obtained by passing the text to the character details function of the module class.
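
For item vii) above, the ordering logic that a custom adapter would delegate to can be sketched with the standard java.text.Collator. This is a stand-in under stated assumptions, not the SILPA UCA sort port: the class name CollatedSorter is hypothetical, and how well a given locale is supported depends on the Java runtime in use.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper that sorts labels with locale-aware collation. */
final class CollatedSorter {
    private final Collator collator;

    CollatedSorter(Locale locale) {
        collator = Collator.getInstance(locale);
        collator.setStrength(Collator.TERTIARY); // case decides only between otherwise equal strings
    }

    /** Returns a sorted copy; the caller's list is left untouched. */
    List<String> sorted(List<String> items) {
        List<String> copy = new ArrayList<>(items);
        Collections.sort(copy, collator); // Collator is a Comparator
        return copy;
    }
}

final class CollatedSorterDemo {
    public static void main(String[] args) {
        CollatedSorter sorter = new CollatedSorter(Locale.ENGLISH);
        // Plain String ordering would put "Cherry" first because 'C' < 'a'.
        System.out.println(sorter.sorted(Arrays.asList("banana", "apple", "Cherry")));
        // prints [apple, banana, Cherry]
    }
}
```

An adapter could call sorted() on its backing list before notifying the ListView, keeping the collation choice in one place.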

iv) Building Android SDK :
Task of building Android SDK can be sub divided into :
a) Creating base model classes for all modules.
b) Generating all constructors, getter and setter functions which are required to initialize each module.
c) Creating instances of each module class and creating custom XML layouts for Android UI elements and views such as TextView, EditText etc., which have instances of the class linked to them. For example, the TextView of the Transliteration module is designed to convert entered text into any specific language when text is applied to it.
d) The input for Indic languages in the Android application is provided by integrating the Indic Keyboard developed by SMC itself.
e) Define dependencies for the project such as Java bindings for Cairo, JSoup, ImageJ, TagSoup which are required for modules such as Script Renderer.
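
Steps a) and b) could be sketched as below. The names SilpaModuleBase and CharacterDetailsModule, the language-code field and the process() method are assumptions chosen for illustration; the proposal does not fix a class design. The concrete module uses only the standard Character.getName to stay self-contained.

```java
/** Hypothetical base model class shared by all modules. */
abstract class SilpaModuleBase {
    private final String name;
    private String languageCode;

    protected SilpaModuleBase(String name, String languageCode) {
        this.name = name;
        this.languageCode = languageCode;
    }

    String getName() {
        return name;
    }

    String getLanguageCode() {
        return languageCode;
    }

    void setLanguageCode(String languageCode) {
        this.languageCode = languageCode;
    }

    /** Each module turns input text into its own kind of result. */
    abstract String process(String input);
}

/** Example module: one "U+XXXX NAME" line per code point of the input. */
final class CharacterDetailsModule extends SilpaModuleBase {
    CharacterDetailsModule(String languageCode) {
        super("chardetails", languageCode);
    }

    @Override
    String process(String input) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            sb.append(String.format("U+%04X %s\n", cp, Character.getName(cp)));
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return sb.toString();
    }
}

final class CharacterDetailsDemo {
    public static void main(String[] args) {
        SilpaModuleBase module = new CharacterDetailsModule("en");
        System.out.print(module.process("A")); // prints U+0041 LATIN CAPITAL LETTER A
    }
}
```

A custom view would then hold a SilpaModuleBase reference and call process() on its text, regardless of which module is attached.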


v) Implementation of Image rendering module into SDK :
i) Need for Image rendering module:

a) Support for Indic languages on Android phones. For example, native Gujarati and Punjabi font support is missing as of Android 4.4 (KitKat).
b) Improper rendering if device does not support complex scripts.
c) Achieve consistency across all devices by packaging popular fonts and providing developers the flexibility to force rendering using packaged fonts.

ii) Implementation procedures:

a) Compiling Pango-Cairo with the Android NDK (Native Development Kit): Pango is a library for laying out and rendering text with an emphasis on internationalization, and the core Pango layout can be used with different font backends. Integrating Pango with Cairo, a vector graphics library with a powerful rendering model, and compiling both with the NDK would provide a complete solution for high-quality rendering on an Android device. The chief difficulty lies in building the libraries with the native toolchain, building JNI (Java Native Interface) libraries for them, and using JCairo (Java binding) to render script on the device.

b) Compiling Harfbuzz-ng from the Android Open Source Project using the Android NDK: Integrating Harfbuzz (an OpenType text shaping library) with FreeType (a font rasterization engine) would be an elegant solution for rendering complex scripts on Android devices. The complex script layout engine (Harfbuzz) maps Unicode code points to glyph IDs, and the font rasterization engine (FreeType) renders glyphs to images. The advantage of this approach over Pango-Cairo is that all but a very few Android devices already include the FreeType library. Hence, by compiling Harfbuzz, Indic text can be rendered efficiently on Android phones.

c) Using the SILPA web service to render Indic text on an Android device: By supplying parameters such as the text to be rendered, file type, file name, path, dimensions, color, font and font size, the path to the generated rendering can be obtained and used to display the text on devices. The major disadvantage of this approach is its dependence on an internet connection for querying the web service.

The final approach to complex script rendering will be selected from the procedures above after close investigation of the feasibility of each.
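
If more than one of the three approaches ends up being shipped, the choice between them could also be made at run time. The following is a minimal sketch of such a fallback chain, assuming a hypothetical RenderBackend interface and made-up backend names; the proposal itself only commits to choosing an approach after a feasibility study.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical abstraction over one way of rendering Indic text. */
interface RenderBackend {
    String name();

    boolean isAvailable(); // e.g. native library loaded, or network reachable
}

/** Test double whose availability is fixed at construction time. */
final class FixedBackend implements RenderBackend {
    private final String name;
    private final boolean available;

    FixedBackend(String name, boolean available) {
        this.name = name;
        this.available = available;
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public boolean isAvailable() {
        return available;
    }
}

final class RenderBackendSelector {
    /** Returns the first available backend, in the caller's order of preference. */
    static RenderBackend choose(List<RenderBackend> inPreferenceOrder) {
        for (RenderBackend backend : inPreferenceOrder) {
            if (backend.isAvailable()) {
                return backend;
            }
        }
        throw new IllegalStateException("no rendering backend available");
    }
}

final class RenderBackendDemo {
    public static void main(String[] args) {
        RenderBackend chosen = RenderBackendSelector.choose(Arrays.<RenderBackend>asList(
                new FixedBackend("harfbuzz-freetype", false),
                new FixedBackend("pango-cairo", false),
                new FixedBackend("silpa-web-service", true)));
        System.out.println(chosen.name()); // prints silpa-web-service
    }
}
```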


vi) Stress Testing and Documentation of all modules :
Extensive well written documentation of all classes and methods using Javadoc tool.


vii) Developing code snippets and sample Android application portraying usage of all modules:
Writing code snippets and other sample Java and Android code to help developers benefit from the Android SDK.


viii) Releasing Android SDK under license:
SILPA is licensed under the GNU Affero General Public License. The license for the Android SDK, however, is to be specified by the mentor. Plug-ins for Eclipse and the Android tools will be developed to enable developers to download and use the SDK.


  • Communication with Mentor

Contacted Jishnu Mohan and Hrishikesh K.B

Tentative Timeline

Self experimentation and Research Period

April 1 – April 8: Familiarizing myself with libraries such as ImageJ, JSoup, Cairo, TagSoup, iText, JOrtho and using CMU parser.
April 8 – April 14: Familiarizing myself with creation of Android libraries and creating custom views for each module and creation of SDK tools and plugins for Eclipse/Android tools to enable developers to download code.
April 14 – April 20: Familiarizing myself with compiling Harfbuzz and Pango-Cairo with the Android NDK to address the fundamental issue of complex script rendering on Android devices.

During Community Bonding Period

April 21 – April 30: Careful scrutiny of the existing modules to gain an in-depth understanding of their functional features.
May 1 – May 10: Communication with the mentor regarding the implementation strategy and considering the changes suggested by the mentor.
May 11 – May 18: Finalizing a definite objective plan for the project and setting up online git repositories.

Coding Period

Targeting to complete implementation of independent modules (no dependencies) namely Payyans, Soundex, Stemmer, Fortune, Syllabalizer, Katapayadi, Character Details and Guess Language.

May 19 - May 26: Preparing asset data and resources for all modules and importing them into the project.
May 27 - June 6: Complete porting of aforementioned modules and discussion with mentor regarding the same.
June 7 - June 8: Tweaking code and implementing the changes suggested in the mentor's remarks.

(Integration into SDK):
June 8 - June 12: Integrating all modules into Android library with creation of base model classes, constructors, getters and setters and other essential requirements.
June 13 - June 19: Creating custom XML scripts and views which developers can integrate into their code.
June 20 - June 24: Bug fixing and Stress Testing of android components and discussion with mentor regarding the same.

Mid-Term Evaluation
Targeting to complete implementation of remaining modules namely Transliterator, UCA Sort, Spell Check, Hyphenator, and Script Renderer.

June 25 – July 6: Porting and integrating the remaining modules into the SDK.
July 7 – July 11: Compiling Harfbuzz with the Android NDK and integrating it with FreeType for script rendering.
July 12 – July 17: Compiling Pango-Cairo with the Android NDK and building the JNI libraries to be used by JCairo for script rendering.
July 18 – July 23: Testing the script rendering modules on Android emulators and devices, finalizing the script rendering implementation based on the above two schemes, and implementing script rendering using the SILPA web service if both schemes fail.
July 24 – July 31: Optimization, documentation, fine tuning and code clean-up.
August 1 – August 10: Writing sample code in Java and building sample Android fragments and activities demonstrating the features of the SDK.
August 11 – August 17: Discussion with mentor, writing tests, improving documentation and adding the project license.
End-Term Evaluation

Post GSoC
Actively contributing to SMC by updating modules for improved efficiency, bug fixing of Android SDK and development of SDK for Windows platform to support usage of SILPA modules on Windows desktop and phone platforms.

About Me

I am Sujith V, a BE Computer Science student at PES Institute of Technology, Bangalore. My programming interests lie mainly in algorithms, data structures and application development. I have a fair amount of knowledge and experience in Java, Python and Android application development. I am also a regular participant on online coding platforms such as CodeChef, TopCoder, CodeForces and HackerRank. A few projects I have developed are as follows:
i) Worked with www.sanfoundry.com in contributing Java programs on Data Structures and Algorithms as a part of Open Learning Project.
ii) GuideDog - An Android application to enable visually challenged people to use features of a smart touch screen phone through user friendly gestures and voice.
iii) Mechanics - An Android application for engineering students to resolve vectors graphically, access engineering calculator, enhanced notes, visual animations.
iv) Sudoku - An Android application that reinvents the popular game Sudoku.
v) Worked in Python on NLP modules such as a sentiment analyzer and a readability index for text.
vi) Worked in Unity 3D to create animation in response to user moves.