| Google Summer of Code 2013 Proposal for Swathanthra Malayalam Computing
| | GSoC Proposal for Flask based Silpa |
|
| |
|
| | | Separate templates from SILPA and have it inside modules packaged for pypi |
| == Personal Information ==
| | Project: The templates used as the user interface are part of SILPA. They should be separated out and shipped as part of the individual modules. |
| | | More [http://wiki.smc.org.in/User:Janwin/Flask_based_Silpa-template_packaging] |
| Email Address: jisacs1492@gmail.com
| |
| Blog URL: http://techytrends.wordpress.com/
| |
| Freenode IRC Nick: Janvin
| |
| University and current education: BTech Computer Science, Calicut University
| |
| | |
| | |
| === Why do you want to work with the Swathanthra Malayalam Computing? ===
| |
| | |
| I see Swathanthra Malayalam Computing as the best way to contribute to both open source software and my mother tongue. SMC has wide reach and great potential in language processing and in the liberalization of information across regional boundaries. I want to enjoy the spirit of free and open source software development at a global level.
| |
| | |
| === Do you have any past involvement with the Swathanthra Malayalam Computing or another open source project as a contributor? ===
| |
| No. I was not familiar with SMC before. I have not yet contributed to an open source project, although I use free and open source software, modify it, and learn from it as a hobby.
| |
| | |
| === Did you participate with the past GSoC programs, if so which years, which organizations? ===
| |
| No. I have not participated in GSoC programs before.
| |
| | |
| === Do you have other obligations between May and August ? Please note that we expect the Summer of Code to be a full time, 40 hour a week commitment ===
| |
| | |
| I can work on GSoC without fail from the end of May through July, since that is our summer vacation. In August I can spend 3 hours each weekday and about 12 hours over the weekend, which helps me dedicate around 40 hours a week. GSoC is very important to me, so I can assure you of my time.
| |
| | |
| === Will you continue contributing/ supporting the Swathanthra Malayalam Computing after the GSoC 2013 program, if yes, which area(s), you are interested in? ===
| |
| | |
| I want to contribute to SMC even if I am not selected through GSoC. I would be glad to be part of SMC's work on a better text-to-speech converter. I want to see SMC's text-to-speech application being used to translate free online tutorials.
| |
| | |
| === Why should we choose you over other applicants? ===
| |
| | |
| I have been a FOSS enthusiast for years and have been part of the FOSS cell in my college for the past two and a half years. I have taken sessions on FOSS and bash scripting for juniors.
| |
| Swathanthra Malayalam Computing will be a great platform for me to contribute more to FOSS.
| |
| I have created an event notification application for our campus and am familiar with web frameworks like Django. I have used HTML and CSS templates with the Django framework. I am comfortable with Python scripting, which will help me in this project.
| |
| | |
| == Proposal Description ==
| |
| | |
| === An overview of your proposal ===
| |
| The spell checker module for SILPA should be capable of handling inflections and agglutinations. The current spell checker lacks these features for Malayalam. The proposed project aims to build a more advanced spell checker that supports the multi-level suffix stripping Malayalam requires and handles Malayalam inflection and agglutination.
| |
| The plan is to use the Hunspell algorithm. This requires writing Hunspell files for Malayalam, after scripting the data those files need. Hunspell supports twofold suffix stripping as an extension of single suffix stripping. If it cannot support up to five levels of suffix stripping, a Python-based solution will have to be devised.
| |
| | |
| === The need you believe it fulfills ===
| |
| | |
| Beyond the basic spell checker that SILPA provides, the Hunspell algorithm will let people check spelling more effectively. Implementing the Hunspell algorithm has the following benefits: <br />
| |
| | |
| * Performs spell checking of complex words <br />
| |
| * Spell checks inflected and agglutinated words <br />
| |
| * Spell checks words that require multi-level suffix stripping <br />
| |
| * Gives suggestions for complex words <br />
| |
| * Handles conditional affixes <br />
| |
| * Supports complex compounding <br />
| |
| | |
| === Any relevant experience you have ===
| |
| I have experience in Python, the language the current spell checker is written in, and I know the algorithm it uses. I also have a basic understanding of how Hunspell files are written for a language.
| |
| | |
| === How you intend to implement your proposal ===
| |
| | |
| The suffix and prefix patterns for each word in the Malayalam dictionary have to be generated. Inflecting and agglutinating words also have to be identified. Hunspell is a spell checker library and program designed for languages with complex word compounding or character encoding. It also provides an Ispell-like terminal interface.
| |
| The Hunspell algorithm can then be implemented for the spell checker.
| |
| Hunspell uses two files for spell checking.
| |
| | |
| 1. A dictionary file containing the list of Malayalam words. The first line of the file contains the word count. Each word can optionally be followed by a '/' and flags that select the affix classes it accepts. In a personal dictionary, a second word after the slash sets the affixation instead: the entry takes the affixes of that word.
| |
| | |
| 2. An affix file, which contains optional attributes. Some of these attributes are:
| |
| | |
| REP - Sets a replacement table of multi-character corrections used for suggestions. It is not applied to correctly spelled words. The first REP line is a header giving the number of REP entries, and the entries follow on the next lines. REP is useful when the misspelled form differs from the right form by more than one character.
| |
| | |
| PFX – It defines prefix
| |
| | |
| SFX – It defines suffix
| |
| | |
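| TRY – Sets the characters Hunspell tries when generating suggestions. It is not applied to correctly spelled words. It lets Hunspell suggest the right word form when it differs from the misspelled word by one TRY character.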
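The interplay between the two files can be sketched in a few lines of Python. This is a minimal illustration, not the Hunspell implementation: it reads only SFX rules, treats the first line of each flag as the header, and uses Latin-script sample entries as stand-ins for Malayalam data. The function names are hypothetical.

```python
import re

# Toy affix file (illustrative): suffix class "A" with a header and two rules.
# Rule layout: SFX <flag> <strip> <add> <condition>; "0" means "nothing".
AFF_TEXT = """\
SFX A Y 2
SFX A 0 s [^y]
SFX A y ies y
"""

# Toy dictionary file: the first line is the word count, flags follow '/'.
DIC_TEXT = """\
3
book/A
city/A
and
"""


def parse_suffix_rules(aff_text):
    """Return {flag: [(strip, add, condition), ...]} built from SFX lines."""
    rules = {}
    for line in aff_text.splitlines():
        parts = line.split()
        if len(parts) < 4 or parts[0] != "SFX":
            continue
        flag = parts[1]
        if flag not in rules:
            # The first SFX line of a class is its header; it carries no rule.
            rules[flag] = []
            continue
        strip = "" if parts[2] == "0" else parts[2]
        add = "" if parts[3] == "0" else parts[3]
        condition = parts[4] if len(parts) > 4 else "."
        rules[flag].append((strip, add, condition))
    return rules


def build_word_forms(dic_text, aff_text):
    """Expand every dictionary entry with the suffix rules its flags select."""
    rules = parse_suffix_rules(aff_text)
    lines = dic_text.splitlines()
    forms = set()
    for entry in lines[1:]:  # skip the word-count line
        word, _, flags = entry.strip().partition("/")
        if not word:
            continue
        forms.add(word)
        for flag in flags:  # assumes single-character flags
            for strip, add, condition in rules.get(flag, []):
                # The condition is a pattern the end of the word must match.
                if word.endswith(strip) and re.search("(?:" + condition + ")$", word):
                    forms.add(word[: len(word) - len(strip)] + add)
    return forms


if __name__ == "__main__":
    print(sorted(build_word_forms(DIC_TEXT, AFF_TEXT)))
```

Generating all forms up front is only practical for small examples; Hunspell instead strips affixes from the input word at lookup time, which is what makes it workable for agglutinative languages.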
| |
| | |
| Hunspell also has options for word compounding. The compound header gives the number of compound definitions. A word can be allowed as the first, middle, or last element of a compound, or only in the middle, and so on. The flags for this are defined in the affix file and used in the dictionary file.
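As a rough illustration of position-based compounding, the sketch below checks whether a word splits into dictionary entries whose flags fit their position. The flags B, M, and E and the sample entries are hypothetical, and the logic is a simplification rather than Hunspell's actual compounding algorithm.

```python
# Hypothetical flags: B = may begin a compound, M = may sit in the middle,
# E = may end a compound. Latin-script entries stand in for real stems.
WORDS = {
    "sun": "B",
    "rain": "BM",
    "flower": "E",
    "bow": "E",
}


def is_compound(word, words=WORDS, position="begin"):
    """Return True if word splits into 2+ entries whose flags fit their position."""
    for cut in range(1, len(word) + 1):
        head, tail = word[:cut], word[cut:]
        flags = words.get(head)
        if flags is None:
            continue
        if not tail:
            # Last element: needs E, and a single word is not a compound.
            if position != "begin" and "E" in flags:
                return True
            continue
        needed = "B" if position == "begin" else "M"
        if needed in flags and is_compound(tail, words, "inner"):
            return True
    return False


if __name__ == "__main__":
    for candidate in ("sunflower", "sunrainbow", "flowersun", "sun"):
        print(candidate, is_compound(candidate))
```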
| |
| | |
| Hunspell also supports twofold suffix stripping for agglutinative languages, which extends single suffix stripping. It can also handle many affix classes.
| |
| Hunspell provides library routines that give the user word-level linguistic functions: spell checking with spell() and correction with suggest(). Its constructor needs the paths of the affix and dictionary files.
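To show how spell() and suggest() relate to the REP and TRY attributes, here is a toy stand-in written in pure Python. It is not the Hunspell library: the class name ToySpeller is hypothetical, it takes in-memory data instead of file paths, and its suggestion logic is far simpler than Hunspell's.

```python
class ToySpeller:
    """A tiny stand-in that mirrors the spell()/suggest() method names only."""

    def __init__(self, words, try_chars, rep_table):
        self.words = set(words)
        self.try_chars = try_chars  # characters tried for single substitutions
        self.rep_table = rep_table  # list of (wrong, right) substring pairs

    def spell(self, word):
        """Return True if the word is in the word list."""
        return word in self.words

    def suggest(self, word):
        """Return known words reachable by one REP or one TRY change."""
        candidates = []
        # REP-style: replace a typical wrong substring with its correction.
        for wrong, right in self.rep_table:
            start = word.find(wrong)
            while start != -1:
                candidates.append(word[:start] + right + word[start + len(wrong):])
                start = word.find(wrong, start + 1)
        # TRY-style: swap each character for each TRY character.
        for i in range(len(word)):
            for ch in self.try_chars:
                if ch != word[i]:
                    candidates.append(word[:i] + ch + word[i + 1:])
        # Keep only real words, preserving order and dropping duplicates.
        suggestions = []
        for cand in candidates:
            if cand in self.words and cand not in suggestions:
                suggestions.append(cand)
        return suggestions


if __name__ == "__main__":
    speller = ToySpeller({"phone", "photo", "cat", "cot"}, "aeiot", [("f", "ph")])
    print(speller.spell("cat"))
    print(speller.suggest("fone"))
    print(speller.suggest("cit"))
```

The REP pass runs first so that multi-character corrections rank ahead of single-character guesses, which matches the intent described for the replacement table.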
| |
| | |
| We have to create these two files with the data needed for a Malayalam spell checker. Suffix stripping can be extended to achieve multi-level suffix stripping. If that does not work in Hunspell, an algorithm will have to be devised and implemented in Python. Help can be sought from language communities for the scripting if needed. The dictionary file has already been written for SILPA, so the main task will be creating a Hunspell affix file for Malayalam. Since Malayalam is an agglutinative language, the spell-checking program can be more complicated.
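If a Python fallback is needed, multi-level suffix stripping could look roughly like the recursive sketch below. It assumes suffixes are plain concatenations and ignores any spelling changes at the join, which a real Malayalam implementation would have to handle. The function name, the sample stem, and the suffix list are illustrative only.

```python
def strip_suffixes(word, stems, suffixes, max_levels=5):
    """Return (stem, [suffixes in surface order]), or None if no analysis is found.

    Strips at most max_levels suffixes, one per recursion level.
    """
    if word in stems:
        return word, []
    if max_levels == 0:
        return None
    for suffix in suffixes:
        # Require a non-empty remainder so the stem never becomes empty.
        if suffix and word.endswith(suffix) and len(word) > len(suffix):
            result = strip_suffixes(word[: -len(suffix)], stems, suffixes, max_levels - 1)
            if result is not None:
                stem, chain = result
                return stem, chain + [suffix]
    return None


if __name__ == "__main__":
    stems = {"hope"}
    suffixes = ["ly", "ful", "ness"]
    print(strip_suffixes("hopefully", stems, suffixes))
    print(strip_suffixes("hopefully", stems, suffixes, max_levels=1))
```

A word is accepted when an analysis is found within the level limit, so the same function doubles as the spell check for inflected forms.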
| |
| | |
| == A rough timeline for your progress with phases ==
| |
| | |
| {| class="wikitable"
| |
| |-
| |
| ! Duration !! Description !! Milestone
| |
| |-
| |
| | Before May 27 || Before Announcement of Candidates
| |
| || Familiarize myself with the version control system, the code and documentation of the SILPA spell checker module, how Hunspell works, and the requirements of the project. Try Hunspell with Malayalam and other languages.
| |
| | |
| |-
| |
| | May 28 – June 16 || Before Official Coding Period Starts
| |
| || Practice coding in Python to improve my understanding of the concepts involved. Start learning the Hunspell algorithm. During this period I will stay in regular contact with my mentor to be clear about my goals.
| |
| | |
| |-
| |
| | June 17 – July 3 || Official Coding Period Starts
| |
| || Coding, testing, and debugging of various features in the spell checker.
| |
| Start scripting the various suffix and prefix patterns in Malayalam.
| |
| Start writing the affix file for Hunspell.
| |
| Presentation of components to mentor weekly.
| |
| | |
| |-
| |
| | July 3 - July 31 || Preparing for mid term evaluation
| |
| || Do further scripting for inflecting and agglutinating words.
| |
| Scripting for different compound words. Ask language communities for help with further scripting. Submit files to the mentor for evaluation.
| |
| | |
| |-
| |
| | Aug 1 - Aug 15 || After mid term evaluation
| |
| || Refine the scripting as per the mentor's suggestions.
| |
| Scripting for multi level suffix stripping.
| |
| Making changes so as to improve functionality.
| |
| | |
| |-
| |
| August 16 - August 29 || Before Final stage || Implement multi-level suffix stripping.
| |
| Completion of affix and dictionary files.
| |
| Most of the time will be used for rigorous testing.
| |
| | |
| |-
| |
| | August 30 - September 10 || Final Stage || Documentation of the project.
| |
| | |
| | |
| |}
| |
| | |
| A buffer of one week has been kept for unforeseen delays.
| |
| | |
| == Tell us about something you have created ==
| |
| 1. Event Notifier for college
| |
| Event Notifier is a web app that sends notifications about new college events.
| |
| GitHub repository of the project: https://github.com/priyapappachan/eventnotifier/tree/my-remote <br />
| |
| Blog post http://priyapappachan.wordpress.com/ <br />
| |
| | |
| == Have you communicated with a potential mentor? If so who? ==
| |
| I could not reach Santhosh Thottingal, the mentor for my project. I have talked with Hrishikesh K B.
| |