User:Haseeb: Difference between revisions

From SMC Wiki
==GSoC 2013: Urdu Support to Silpa==


===Overview===
Silpa (Swathanthra Indian Language Processing Applications) is a web framework and a set of applications for processing Indian languages. Silpa already supports many Indic languages; this project aims to extend its functionality by adding Urdu support to the relevant existing modules during this summer.
===Project Details:===
'''Development & The Way Forward'''
--------
''' Urdu Script:'''
Urdu is an Indo-Aryan language. Its script is derived from the Arabic and Persian scripts, modified to suit the requirements of Indo-Aryan phonology, particularly aspiration, retroflexion and nasalization. It is cursive in nature. The letters are of two types, connectors and non-connectors. Connectors join to the following letter in the word or syllable, while non-connectors cannot. However, all letters join to a preceding connector. Most letters have three shapes: initial when they occur at the beginning of a word, medial when they occur in the middle, and final joined when they occur at the end. The final unjoined shape is the same as the basic letter.
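The joining rule described above can be sketched in a few lines of Python. This is only an illustration, not Silpa code: the `NON_CONNECTORS` set and the `positional_forms` function are hypothetical names, the set lists the commonly cited non-connecting letters (alif, alif-mad, dāl, ḍāl, zāl, rē, ṛē, zē, zhē, vāo, baṛī yē), and the input is assumed to contain only base letters with no diacritics.

```python
# Letters assumed to be non-connectors: they join to a preceding
# connector but never to the following letter.
NON_CONNECTORS = set(
    "\u0627\u0622\u062F\u0688\u0630\u0631\u0691\u0632\u0698\u0648\u06D2"
)


def positional_forms(word):
    """Return (letter, form) pairs for a word made of base letters only."""
    forms = []
    for i, letter in enumerate(word):
        # A letter joins backwards only if the previous letter is a connector.
        joins_prev = i > 0 and word[i - 1] not in NON_CONNECTORS
        # A letter joins forwards only if it is a connector and not word-final.
        joins_next = i < len(word) - 1 and letter not in NON_CONNECTORS
        if joins_prev and joins_next:
            form = "medial"
        elif joins_prev:
            form = "final"
        elif joins_next:
            form = "initial"
        else:
            form = "isolated"  # the "final unjoined" or basic shape
        forms.append((letter, form))
    return forms
```

For the word kitāb (`"\u06A9\u062A\u0627\u0628"`), the sketch labels kāf as initial, tē as medial, alif as final, and bē as isolated, because alif does not join to the letter after it.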
'''Writing System:'''
The script is written from right to left.
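Text direction is something a module can check programmatically. As a small sketch (the helper name `is_right_to_left` is hypothetical), Python's standard `unicodedata` module reports the Unicode bidirectional class of a character, which is `AL` for Arabic-script letters and `R` for other right-to-left letters:

```python
import unicodedata


def is_right_to_left(ch):
    """True if the character has a right-to-left bidirectional class."""
    return unicodedata.bidirectional(ch) in ("R", "AL")
```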
'''Sequence of Urdu Letters:'''
[[File:Example.png]]
Ref: http://en.wikipedia.org/wiki/Urdu_alphabet
'''Vowels:'''
The long vowels in Urdu are indicated by alif ( ا ), alif-mad ( آ ), vāo ( و ), choṭī yē ( ی ) and baṛī yē ( ے ). The superscript mad ( ٓ ) written over alif, e.g. آ, denotes long /ā/ at the beginning of a word. In medial and final position, however, alif ( ا ) by itself stands for long /ā/. Choṭī yē ( ی ) and vāo ( و ), when occurring initially, stand for the semi-vowels /y/ and /v/ respectively, as in /yahā̃/ ( یہاں ) and /vahā̃/ ( وہاں ). In other environments vāo ( و ), choṭī yē ( ی ) and baṛī yē ( ے ) denote long vowels.
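The positional rule above can be written down as a small lookup. The sketch below encodes only the simplified rule stated in this section and is not a full phonological analysis; the function name `letter_role` is hypothetical, and choṭī yē is assumed to be encoded as U+06CC.

```python
ALIF = "\u0627"
ALIF_MAD = "\u0622"
VAO = "\u0648"
CHOTI_YE = "\u06CC"  # assumed code point for choṭī yē
BARI_YE = "\u06D2"


def letter_role(word, index):
    """Classify the letter at word[index] using the simplified rule above."""
    letter = word[index]
    if letter in (VAO, CHOTI_YE):
        # Word-initial vāo and yē are semi-vowels; elsewhere long vowels.
        return "semi-vowel" if index == 0 else "long vowel"
    if letter == ALIF_MAD or letter == BARI_YE:
        return "long vowel"
    if letter == ALIF and index > 0:
        # Medial or final alif by itself stands for long /ā/.
        return "long vowel"
    return "other"  # consonants and cases the rule does not cover
```

For yahā̃ (`"\u06CC\u06C1\u0627\u06BA"`), the initial yē is reported as a semi-vowel and the medial alif as a long vowel.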
'''Short Vowels:'''
The short vowels in Urdu are indicated by superscript or subscript marks, as shown below:
* The mark above a consonant ( َ ) is called 'zabar'. It denotes a following /a/.
* The mark below a consonant ( ِ ) is called 'zēr'. It denotes a following /i/.
* The mark above a consonant ( ُ ) is called 'pēsh'. It denotes a following /u/.
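Because the short-vowel marks are usually omitted in everyday Urdu text, modules such as search or spell checking may want to ignore them. The sketch below is an assumption about how that could be done, not existing Silpa behaviour: it maps zabar, zēr and pēsh to the Unicode characters ARABIC FATHA (U+064E), ARABIC KASRA (U+0650) and ARABIC DAMMA (U+064F), and `strip_short_vowels` is a hypothetical helper name.

```python
import unicodedata

# Assumed mapping from the Urdu names to Unicode combining marks.
SHORT_VOWELS = {
    "zabar": "\u064E",  # ARABIC FATHA, written above, /a/
    "zer": "\u0650",    # ARABIC KASRA, written below, /i/
    "pesh": "\u064F",   # ARABIC DAMMA, written above, /u/
}


def strip_short_vowels(text):
    """Remove zabar, zer and pesh marks, leaving the base letters."""
    marks = set(SHORT_VOWELS.values())
    return "".join(ch for ch in text if ch not in marks)
```

Calling `unicodedata.name()` on each value is a quick way to confirm which mark a code point refers to.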
'''Modules to which I will add Urdu support:'''
* Transliteration - I have already started working on this: [https://github.com/haseebgit/Transliteration Urdu Transliteration]
* Guess Language
* Dictionary - I will use the Urdu Wiktionary [http://dumps.wikimedia.org/urwiktionary/latest/ dump]
* Spell Checker
* Syllabification
* Soundex
* Approximate search
* Shingling Library
* Fortune Cookies - I will add Urdu [http://en.wikipedia.org/wiki/Shayari Shayari] by famous Urdu poets.
* Hyphenator
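As an illustration of what one of these modules computes, here is a minimal sketch of character shingling with a Jaccard similarity on top. It is not Silpa's shingling API: the function names, the default shingle size of 2, and the choice of character rather than word shingles are all assumptions. Because it works on Unicode code points, it handles Urdu strings the same way as any other text.

```python
def shingles(text, size=2):
    """Return the set of overlapping character n-grams of the given size."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return {text[i:i + size] for i in range(len(text) - size + 1)}


def jaccard(a, b, size=2):
    """Jaccard similarity between the shingle sets of two strings."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two strings too short to shingle count as identical
    return len(sa & sb) / len(sa | sb)
```

A score of 1.0 means the two strings have identical shingle sets, and 0.0 means they share none.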
'''Benefits:'''
* It will extend the reach of the Silpa project.
* It will make Silpa more popular among Urdu users, since little Urdu software is available.
* It may motivate organizations to use Silpa, as it furthers their aim of communication and collaboration across Indic languages, adding one more language to Silpa's support.
* It will enable Silpa to take its collaborative nature to the next level, especially in schools, colleges and universities.
'''Roadmap:'''
* Learning: already started and ongoing
* Programming: 8 weeks
* Final review and adding time: 1 week
* Bug fixing time: 1 week
* Documentation: 1 week
'''Methodology'''
----------
My development process will follow the standard Silpa development process, under the guidance of my mentor. Each module will be developed in its own branch.
When the code and matching unit tests are finished, they will go through code review by my mentor to ensure that the code follows the coding standard and is well designed, sufficiently tested and documented.
Once the issues raised in code review have been addressed, the branch will be merged into trunk.
Small, frequent feedback through code review, together with the requirement to write tests and documentation, will ensure that I am learning and improving throughout the summer, and will help me get code merged into Silpa from the very beginning.
'''How I came up with this?'''
----------
Vasudev Kamath (copyninja_) suggested during a discussion that I propose this.
Originally I proposed a "word thesaurus" for Indic languages, but I had to drop the idea because the data was unavailable.
'''Motivation'''
---------
I am a FOSS enthusiast; in my free time I translate for a couple of open source projects.
SMC is a great initiative for Indic languages and I like the community. This project will open new doors for Urdu users.
'''Level of Difficulty'''
--------
Medium
'''Potential Mentor'''
--------
Vasudev Kamath (copyninja_)
'''Why me?'''
------
I like coding in Python, the language in which Silpa is developed. I have some scripts, mostly using Flask, on [https://bitbucket.org/haseebbit Bitbucket]. I am a native speaker of Urdu and can read, write and speak it well. I also know Hindi, Kannada and a little Telugu, and I can read Arabic (but can't speak it :) ). I am familiar with the Silpa source code and have already started working on the [https://github.com/haseebgit/Transliteration Transliteration] module. I am fairly good with Git and Mercurial. In my free time I also translate for a couple of open source projects:
* [http://mozilla.locamotion.org/accounts/haseeb/ Mozilla]
* [https://translations.launchpad.net/~haseeb Launchpad]
===Timeline===
-------
It is difficult to plan in advance exactly how the work will go.
I have semester exams for my bachelor's degree in the last week of June and the second week of July. I would therefore like to start working during the Community Bonding period.
# May 27 - June 17 = Community Bonding + Transliteration + Guess Language
# June 18 - June 28 = Hyphenator
# June 29 - July 7 = Fortune Cookies
# July 9 - July 28 = Spell Checker and Dictionary
# July 29 - August 2 = Midterm time. Submit the code. I'll keep this time free in case I've fallen behind schedule.
# August 3 - August 23 = Soundex and Approximate search
# August 24 - September 4 = Shingling Library
# September 5 - September 12 = Syllabification
# September 13 - September 23 = Documentation, final review and extensive testing to prevent as many bugs as possible
===Personal information===
-------------
'''Email Address:''' abdulraufhaseeb@gmail.com
'''Blog URL:''' http://terrificking.blogspot.in/
'''Freenode IRC Nick:''' haseeb
'''University and current education:''' My university is Visvesvaraya Technological University. I am currently pursuing a Bachelor of Engineering in Electronics and Communication at K.B.N College of Engineering, Gulbarga.
'''Past involvement with the Swathanthra Malayalam Computing or another open source project as a contributor:''' I have not contributed to SMC formally, but I have made some minor contributions: I created requirements.txt and modules.txt in the Silpa source tree for easier installation of Silpa, and did one code clean-up.
I am an active member of the open source community in my college and city. I have taken part in many open source meetups and events (mostly in Bangalore). I have also volunteered at many events, such as HasGeek's events and PyCon India. This time I handled PyCon India's registration, both online and on-site. I am also the founder of the local Linux user group in my town (ilug-gulbarga).
'''Did you participate with the past GSoC programs, if so which years, which organizations?'''
No, I have not participated in any past GSoC program.
'''OS and Editor:''' I use Arch Linux with GNOME as my operating system. I like the Sublime Text 2 editor, particularly features such as code completion and color themes.
'''Will you continue contributing/ supporting the Swathanthra Malayalam Computing after the GSoC 2013 program, if yes, which area(s), you are interested in?'''
Yes, I will continue to work with SMC and will maintain the Urdu-related tasks. I also have a future goal: a "Word Thesaurus" for Indic languages :).

Revision as of 06:59, 26 April 2013